3  Learning, validation, testing

Why slice the data: a model evaluated on the very data used to fit it looks optimistically good, so estimating its generalization performance requires data kept apart from the learning step.

A classic scheme splits the data into three parts: a training set to estimate the rule, a validation set to tune and compare models, and a test set reserved for the final assessment.
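A minimal sketch of such a split, assuming NumPy arrays; the 60/20/20 proportions, the toy data, and the variable names are illustrative choices, not prescribed:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 5))   # toy features (illustrative)
y = rng.normal(size=n)        # toy responses (illustrative)

# Shuffle the indices, then cut them into three parts (60/20/20 here).
idx = rng.permutation(n)
n_train, n_val = int(0.6 * n), int(0.2 * n)
train_idx = idx[:n_train]
val_idx = idx[n_train:n_train + n_val]
test_idx = idx[n_train + n_val:]

X_train, y_train = X[train_idx], y[train_idx]  # learning: fit the rule
X_val, y_val = X[val_idx], y[val_idx]          # validation: tune / select
X_test, y_test = X[test_idx], y[test_idx]      # testing: final assessment only
```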

3.1 Cross-validation

Cross-validation is a resampling method for estimating the reliability of a model: the data are split repeatedly so that every observation is used both to learn the rule and to validate it.

3.1.1 K-fold cross-validation

The original sample is divided into K sub-samples (or blocks); in turn, one of the K sub-samples is selected as the validation set, while the other K − 1 sub-samples constitute the training set. The procedure (a Python sketch follows the list):

  • Split the data into K sub-samples of (approximately) the same size

  • For \(k = 1, \dots, K\):

    • estimate the rule on all the data except sub-sample k

    • predict the data of sub-sample k with this rule

  • Compute the performance criterion on the predictions collected over these K folds
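A minimal sketch of this loop, assuming NumPy arrays and using scikit-learn's LinearRegression as an illustrative learning rule (any estimator with fit/predict would do); the function name and the MSE criterion are our choices:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def kfold_cv_mse(X, y, K=5, seed=0):
    """Cross-validated mean squared error (the criterion is illustrative)."""
    n = len(y)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    folds = np.array_split(idx, K)   # K sub-samples of (almost) equal size
    preds = np.empty(n)
    for k in range(K):
        val = folds[k]                                     # fold k: validation set
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        rule = LinearRegression().fit(X[train], y[train])  # rule estimated without fold k
        preds[val] = rule.predict(X[val])                  # predict fold k
    return np.mean((y - preds) ** 2)  # criterion over the K folds
```

scikit-learn's KFold and cross_val_score (in sklearn.model_selection) wrap this same loop.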

3.1.2 Leave-one-out cross-validation (LOOCV)

Leave-one-out cross-validation is the special case of K-fold cross-validation with K = n. That is, at each learning-validation iteration, learning is performed on \(n − 1\) observations and validation on the single remaining observation (see the sketch after the list).

  • For \(i = 1, \dots, n\):

    • estimate the rule on the data with the \(i\)-th observation removed

    • predict observation \(i\) with this rule

  • Compute the performance criterion on these n predictions
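A matching sketch under the same assumptions as above (LinearRegression again stands in for the rule; the helper name is ours):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def loocv_mse(X, y):
    """Leave-one-out mean squared error."""
    n = len(y)
    preds = np.empty(n)
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        mask[i] = False                          # remove observation i
        rule = LinearRegression().fit(X[mask], y[mask])
        preds[i] = rule.predict(X[i:i + 1])[0]   # predict observation i
        mask[i] = True                           # restore it for the next pass
    return np.mean((y - preds) ** 2)             # criterion over the n predictions
```

Since LOOCV is K-fold cross-validation with K = n, loocv_mse(X, y) returns the same value as kfold_cv_mse(X, y, K=n); the price is n model fits, which grows costly for large n.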