Cross-Validation with Origami

Usage

cv_origami_schema(
  data = data,
  n_folds = 5,
  fold_fun = origami::folds_vfold,
  cluster_ids = NULL,
  strata_ids = NULL,
  ...
)

Arguments

data

A data.frame (or similar) to split into training and validation datasets.

n_folds

The number of folds, that is, the number of `training_data` and `validation_data` data frame pairs to make.

fold_fun

An origami::folds_* function, such as origami::folds_vfold (the default) or origami::folds_loo.

cluster_ids

A vector of cluster ids. Clusters are treated as a unit: all observations within a cluster are placed together in either the training set or the validation set. See ?origami::make_folds.

strata_ids

A vector of strata ids. Strata are balanced: insofar as possible, the distribution of strata in the training and validation sets should match the distribution in the full sample. See ?origami::make_folds.

...

Extra arguments to be passed to origami::make_folds().
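To make the cluster_ids and strata_ids descriptions above concrete, here is a minimal base R sketch of the two ideas. It is not how origami or nadir implement fold construction; the helpers `assign_folds_by_cluster` and `assign_folds_by_stratum` and the toy id vectors are hypothetical and exist only for illustration.

```r
# Hypothetical helpers for illustration only; not part of nadir or origami.

# Assign each cluster (not each row) to one of n_folds folds, so that all rows
# sharing a cluster id land in the same fold.
assign_folds_by_cluster <- function(cluster_ids, n_folds) {
  clusters <- unique(cluster_ids)
  stopifnot(length(clusters) >= n_folds)
  # Spread fold labels 1..n_folds across the clusters as evenly as possible
  cluster_fold <- sample(rep_len(seq_len(n_folds), length(clusters)))
  # Map every row to the fold of its cluster
  cluster_fold[match(cluster_ids, clusters)]
}

# Assign rows to folds separately within each stratum, so that every fold
# receives a near-equal share of each stratum.
assign_folds_by_stratum <- function(strata_ids, n_folds) {
  fold_id <- integer(length(strata_ids))
  for (s in unique(strata_ids)) {
    rows <- which(strata_ids == s)
    fold_id[rows] <- sample(rep_len(seq_len(n_folds), length(rows)))
  }
  fold_id
}

set.seed(1)

# Made-up data: 8 clusters of 4 rows each
cluster_ids <- rep(1:8, each = 4)
cluster_folds <- assign_folds_by_cluster(cluster_ids, n_folds = 4)

# Number of distinct folds each cluster appears in; 1 means the cluster is
# never split between a training set and a validation set.
folds_per_cluster <- tapply(cluster_folds, cluster_ids,
                            function(f) length(unique(f)))

# Made-up data: two strata of unequal size
strata_ids <- rep(c("a", "b"), times = c(20, 12))
strata_folds <- assign_folds_by_stratum(strata_ids, n_folds = 4)

# Rows are strata, columns are folds; each fold holds the same share of
# each stratum as the full sample does.
strata_table <- table(strata_ids, strata_folds)
```

In this sketch, rows with fold label k form the validation set of fold k and all remaining rows form its training set. Inspecting `folds_per_cluster` and `strata_table` shows the two properties described in the argument documentation: clusters stay intact, and strata proportions are preserved across folds.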

Examples

if (FALSE) { # \dontrun{

# To use origami::folds_vfold behind the scenes, tell nadir::super_learner
# to use cv_origami_schema as its cv_schema.

sl_model <- super_learner(
  data = mtcars,
  formula = mpg ~ cyl + hp,
  learners = list(rf = lnr_rf, lm = lnr_lm, mean = lnr_mean),
  cv_schema = cv_origami_schema,
  verbose = TRUE
)

# To use a different origami::folds_* function, wrap cv_origami_schema in a
# function that passes fold_fun through.
sl_model <- super_learner(
  data = mtcars,
  formula = mpg ~ cyl + hp,
  learners = list(rf = lnr_rf, lm = lnr_lm, mean = lnr_mean),
  cv_schema = \(data, n_folds) {
    cv_origami_schema(data, n_folds, fold_fun = origami::folds_loo)
  },
  verbose = TRUE
)
} # }