Compute the cross-validated root-mean-squared error (CV-RMSE) for a `super_learner` specified by a closure that accepts data and returns a `super_learner` prediction function.

Usage

cv_super_learner(
  data,
  sl_closure,
  y_variable,
  n_folds = 5,
  cv_schema = cv_random_schema,
  loss_metric
)

Arguments

data

Data to use in training a `super_learner`.

sl_closure

A function that takes in data and produces a `super_learner` predictor.

y_variable

The name of the outcome column in `data`, given as a string.

n_folds

The number of cross-validation folds to use in constructing the `super_learner`.

cv_schema

A function that takes `data` and `n_folds` and returns a list containing `training_data` and `validation_data`, each of which is a list of `n_folds` data frames.
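To make the `cv_schema` contract concrete, here is a minimal sketch of a function with the shape described above. The name `my_random_schema` is hypothetical and is not part of the package; the fold assignment shown (shuffled, balanced fold labels) is an assumption about one reasonable way to split, not a description of how `cv_random_schema` is implemented.

```r
# Hypothetical schema: randomly assign each row of `data` to one of `n_folds` folds.
my_random_schema <- function(data, n_folds = 5) {
  # Shuffle balanced fold labels so fold sizes differ by at most one row.
  fold_id <- sample(rep(seq_len(n_folds), length.out = nrow(data)))

  list(
    # Fold k trains on every row NOT assigned to fold k ...
    training_data = lapply(
      seq_len(n_folds),
      function(k) data[fold_id != k, , drop = FALSE]
    ),
    # ... and validates on the rows that are assigned to fold k.
    validation_data = lapply(
      seq_len(n_folds),
      function(k) data[fold_id == k, , drop = FALSE]
    )
  )
}

set.seed(1)
splits <- my_random_schema(mtcars, n_folds = 4)
```

Each element of `splits$training_data` pairs with the element of `splits$validation_data` at the same index, and the two together cover every row of the input exactly once.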

Details

`cv_super_learner` splits the data into training/validation pairs, trains a `super_learner` on each training split, and evaluates each fitted learner's predictions on the corresponding held-out validation data. The root-mean-squared error is computed on those held-out predictions only.
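The procedure described here can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: `lm_closure` is a hypothetical stand-in for `sl_closure` that fits a plain linear model rather than a `super_learner`, `manual_cv_rmse` is a hypothetical helper, and pooling the squared errors across all folds before taking the root is one possible aggregation that may differ from what `cv_super_learner` reports.

```r
# Hypothetical stand-in for `sl_closure`: takes data, returns a prediction function.
lm_closure <- function(data) {
  fit <- lm(mpg ~ wt + hp, data = data)
  function(newdata) predict(fit, newdata = newdata)
}

# Hypothetical helper: train on each training split, score on the held-out split.
manual_cv_rmse <- function(data, closure, y_variable, n_folds = 5) {
  # Random, balanced fold labels (an assumed splitting scheme).
  fold_id <- sample(rep(seq_len(n_folds), length.out = nrow(data)))

  squared_errors <- unlist(lapply(seq_len(n_folds), function(k) {
    predictor <- closure(data[fold_id != k, , drop = FALSE])  # fit without fold k
    held_out  <- data[fold_id == k, , drop = FALSE]           # rows the fit never saw
    (held_out[[y_variable]] - predictor(held_out))^2
  }))

  # Assumption: pool squared errors over all held-out rows, then take the root.
  sqrt(mean(squared_errors))
}

set.seed(1)
cv_rmse <- manual_cv_rmse(mtcars, lm_closure, y_variable = "mpg", n_folds = 5)
```

Passing a closure rather than a fitted model is what lets the learner be refit from scratch on each training split, so no validation row influences the predictor that scores it.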