How are errors handled in nadir?
We try to handle errors gracefully. We understand that specifying complicated models may not always go smoothly, and models can be finicky.

When a candidate learner throws an error during the super_learner() training process, we set its weight to 0 and collect the thrown errors for the user in the verbose output. For example, lnr_lmer will throw an error if the formula does not use random effects. The code snippet below shows how we collect the errors and ensure that super learning can continue even if one or more learners fail.
library(nadir)
# train a super_learner() model
#
# lnr_lmer will error because we didn't use random effects
sl_model <- super_learner(
  mtcars,
  formula = mpg ~ cyl,
  learners = list(lnr_mean, lnr_lmer)
)
# observe: prediction falls back to the other learners we specified,
# in this case just lnr_mean
sl_model(mtcars)
#> [1] 20.09062 20.09062 20.09062 20.09062 20.09062 20.09062 20.09062 20.09062
#> [9] 20.09062 20.09062 20.09062 20.09062 20.09062 20.09062 20.09062 20.09062
#> [17] 20.09062 20.09062 20.09062 20.09062 20.09062 20.09062 20.09062 20.09062
#> [25] 20.09062 20.09062 20.09062 20.09062 20.09062 20.09062 20.09062 20.09062
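# as a quick sanity check (not part of the package output): with lnr_lmer
# dropped, the ensemble reduces to lnr_mean, so every prediction above is
# just the overall mean of mpg in mtcars
mean(mtcars$mpg)
#> [1] 20.09062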
# if we specify a more complicated super learner, we can see that as long as we
# specify random effects, lnr_lmer doesn't fail.
#
# also, now you can see how {nadir} collects errors in its verbose output.
sl_verbose_output <- super_learner(
  mtcars,
  formulas = list(
    .default = mpg ~ cyl,
    lmer_2 = mpg ~ (1 | cyl) + hp),
  learners = list(lnr_mean, lnr_lmer, lnr_lmer),
  verbose = TRUE
)
sl_verbose_output |> str()
#> List of 8
#> $ sl_predictor :function (newdata)
#> ..- attr(*, "srcref")= 'srcref' int [1:8] 448 39 454 3 39 3 2442 2448
#> .. ..- attr(*, "srcfile")=Classes 'srcfilealias', 'srcfile' <environment: 0x11dbf7440>
#> $ y_variable : chr "mpg"
#> $ outcome_type : chr "continuous"
#> $ learner_weights : Named num [1:2] 0.0539 0.9461
#> ..- attr(*, "names")= chr [1:2] "mean" "lmer_2"
#> $ holdout_predictions : tibble [32 × 4] (S3: tbl_df/tbl/data.frame)
#> ..$ .sl_fold: int [1:32] 1 1 1 1 1 1 1 2 2 2 ...
#> ..$ mean : num [1:32] 20.1 20.1 20.1 20.1 20.1 ...
#> ..$ lmer_2 : Named num [1:32] 20.5 19.9 13.9 27 25.4 ...
#> .. ..- attr(*, "names")= chr [1:32] "Mazda RX4 Wag" "Merc 280" "Chrysler Imperial" "Toyota Corolla" ...
#> ..$ mpg : num [1:32] 21 19.2 14.7 33.9 21.5 15.5 15 21 22.8 14.3 ...
#> $ errors_from_training_cv_stage1 :List of 10
#> ..$ message: chr "No random effects terms specified in formula"
#> ..$ call : language lnr_lmer_1(training_data[[1L]], formula = mpg ~ cyl, NULL)
#> ..$ message: chr "No random effects terms specified in formula"
#> ..$ call : language lnr_lmer_1(training_data[[2L]], formula = mpg ~ cyl, NULL)
#> ..$ message: chr "No random effects terms specified in formula"
#> ..$ call : language lnr_lmer_1(training_data[[3L]], formula = mpg ~ cyl, NULL)
#> ..$ message: chr "No random effects terms specified in formula"
#> ..$ call : language lnr_lmer_1(training_data[[4L]], formula = mpg ~ cyl, NULL)
#> ..$ message: chr "No random effects terms specified in formula"
#> ..$ call : language lnr_lmer_1(training_data[[5L]], formula = mpg ~ cyl, NULL)
#> $ errors_from_predicting_cv_stage2:List of 10
#> ..$ message: chr "attempt to apply non-function"
#> ..$ call : language trained_learners[["lmer_1"]][[1L]](validation_data[[1L]])
#> ..$ message: chr "attempt to apply non-function"
#> ..$ call : language trained_learners[["lmer_1"]][[2L]](validation_data[[2L]])
#> ..$ message: chr "attempt to apply non-function"
#> ..$ call : language trained_learners[["lmer_1"]][[3L]](validation_data[[3L]])
#> ..$ message: chr "attempt to apply non-function"
#> ..$ call : language trained_learners[["lmer_1"]][[4L]](validation_data[[4L]])
#> ..$ message: chr "attempt to apply non-function"
#> ..$ call : language trained_learners[["lmer_1"]][[5L]](validation_data[[5L]])
#> $ erring_learners : chr "lmer_1"
#> - attr(*, "class")= chr [1:2] "list" "nadir_sl_verbose_output"
Note that the language objects in the error calls are modified slightly inside super_learner() to make them more user-friendly: they contain the names of the learners, along with the formula, fold, and extra learner arguments each learner was called with, rather than opaque calls that refer to things like learners[[i]] and use do.call() to pass the extra arguments programmatically.
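As a minimal sketch (relying only on the structure shown in the str() output above), the erring learners and the collected error messages can be pulled directly out of the verbose output:

# which learners failed during cross-validated training?
sl_verbose_output$erring_learners
#> [1] "lmer_1"

# the stage-1 error list alternates message/call entries, so subset by name
# to collect the messages and de-duplicate them across folds
stage1_errors <- sl_verbose_output$errors_from_training_cv_stage1
unique(unlist(stage1_errors[names(stage1_errors) == "message"]))
#> [1] "No random effects terms specified in formula"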