
Each row in the data is assigned to one of `1:n_folds` at random. Then, for each `i` in `1:n_folds`, `training_data[[i]]` comprises the rows with `sl_fold != i`, i.e., roughly a `(n_folds - 1)/n_folds` proportion of the data. The validation data is a list of data frames, each comprising roughly a `1/n_folds` proportion of the data.
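A minimal sketch of that logic (not the package's actual implementation), assuming `data` is a plain data.frame and `n_folds` is as in Usage:

  # Sketch only: random fold labels, then split rows by fold membership
  sl_fold <- sample(1:n_folds, nrow(data), replace = TRUE)
  training_data   <- lapply(1:n_folds, function(i) data[sl_fold != i, , drop = FALSE])
  validation_data <- lapply(1:n_folds, function(i) data[sl_fold == i, , drop = FALSE])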

Usage

cv_random_schema(data, n_folds = 5)

Arguments

data

A data.frame (or similar) to split into training and validation datasets.

n_folds

The number of folds, i.e., the number of `training_data` and `validation_data` data frames to create.

Value

A named list of two components, `training_data` and `validation_data`, each a list of `n_folds` data frames.

Details

Since assignment to folds is random, the proportions are approximate, not guaranteed: there is some variability in the number of rows in each `training_data` data frame, and likewise for the `validation_data` data frames.
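For example, the variability in fold sizes can be inspected directly (a sketch, assuming the output components are named `training_data` and `validation_data` as described above):

  # Row counts will differ somewhat from fold to fold
  folds <- cv_random_schema(mtcars, n_folds = 5)
  sapply(folds$training_data, nrow)
  sapply(folds$validation_data, nrow)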

Examples

if (FALSE) { # \dontrun{
  data(Boston, package = 'MASS')
  training_validation_data <- cv_random_schema(Boston, n_folds = 3)
  # take a look at what's in the output:
  str(training_validation_data, max.level = 2)
} # }