
Each row in the data is assigned to one of `1:n_folds` at random. Then for each `i` in `1:n_folds`, `training_data[[i]]` is composed of the rows with `sl_fold != i`, i.e., roughly a `(n_folds - 1)/n_folds` proportion of the data. The validation data is a list of data frames, each comprising roughly a `1/n_folds` proportion of the data.
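The random-assignment scheme above can be sketched as follows. This is an illustrative sketch of the described behavior, not the package's actual implementation; the function name `cv_random_schema_sketch` is hypothetical.

```r
# Sketch of the schema described above (assumption: this mirrors, but is
# not, the real implementation). Each row gets a random fold label; the
# i-th training set excludes fold i and the i-th validation set is fold i.
cv_random_schema_sketch <- function(data, n_folds = 5) {
  # Random fold assignment for every row
  sl_fold <- sample(seq_len(n_folds), nrow(data), replace = TRUE)
  list(
    training_data = lapply(seq_len(n_folds), function(i) {
      data[sl_fold != i, , drop = FALSE]
    }),
    validation_data = lapply(seq_len(n_folds), function(i) {
      data[sl_fold == i, , drop = FALSE]
    })
  )
}
```

Note that each row appears in exactly one validation fold and in the other `n_folds - 1` training folds.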

Usage

cv_random_schema(data, n_folds = 5)

Arguments

data

a data.frame (or similar) to split into training and validation datasets.

n_folds

The number of `training_data` and `validation_data` data frames to make.

Value

A list of two lists (`$training_data` and `$validation_data`), each of length `n_folds`. Entry `i` of each list is a data.frame containing the i-th training or validation fold of the data.

Details

Because the assignment to folds is random, the fold proportions are approximate rather than guaranteed: the number of rows in each `training_data` data frame varies somewhat from run to run, and likewise for the `validation_data` data frames.
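This variability is easy to see by tabulating a random fold assignment directly (a standalone illustration; the row count of 506 is just an example, matching the `Boston` data used below):

```r
# Fold sizes from random assignment vary around n / n_folds
set.seed(42)                              # for reproducibility
fold <- sample(1:5, 506, replace = TRUE)  # random fold label per row
table(fold)                               # counts are near, not exactly, 506/5
```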

Examples

if (FALSE) { # \dontrun{
  data(Boston, package = 'MASS')
  training_validation_data <- cv_random_schema(Boston, n_folds = 3)
  # take a look at what's in the output:
  str(training_validation_data, max.level = 2)
} # }