Each row in the data is assigned to one of `1:n_folds` at random.
Then, for each `i` in `1:n_folds`, `training_data[[i]]`
comprises the rows with `sl_fold != i`, i.e., roughly a
`(n_folds - 1)/n_folds` proportion of the data. The validation data
is a list of data frames, each comprising roughly a `1/n_folds` proportion of the
data.
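For intuition, the splitting scheme can be sketched roughly as follows (a simplified illustration, not the package's implementation; the helper name `sketch_random_schema` and the use of `sample()` are assumptions):

# Sketch: assign each row a random fold label, then split on that label.
sketch_random_schema <- function(data, n_folds = 5) {
  sl_fold <- sample(1:n_folds, size = nrow(data), replace = TRUE)
  list(
    training_data   = lapply(1:n_folds, function(i) data[sl_fold != i, , drop = FALSE]),
    validation_data = lapply(1:n_folds, function(i) data[sl_fold == i, , drop = FALSE])
  )
}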
Usage
cv_random_schema(data, n_folds = 5)
Arguments
- data
a data.frame (or similar) to split into training and validation datasets.
- n_folds
the number of folds, i.e., the number of `training_data` and `validation_data` data frames to make.
Value
a named list of two lists, `training_data` and `validation_data`, each a list of `n_folds` data frames.
Details
Because rows are assigned to folds at random, the proportions are not exact or
guaranteed: the sizes of the `training_data` data frames vary somewhat from fold
to fold, and likewise for the `validation_data` data frames.
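This variability can be inspected directly, for example (assuming the output components are named `training_data` and `validation_data`, as implied above):

data(Boston, package = 'MASS')
tv <- cv_random_schema(Boston, n_folds = 3)
# fold sizes hover around 2/3 and 1/3 of nrow(Boston), but are not exactly equal
sapply(tv$training_data, nrow)
sapply(tv$validation_data, nrow)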
Examples
if (FALSE) { # \dontrun{
data(Boston, package = 'MASS')
training_validation_data <- cv_random_schema(Boston, n_folds = 3)
# take a look at what's in the output:
str(training_validation_data, max.level = 2)
} # }