How are errors handled in nadir?
We try to handle errors gracefully. We understand that specifying complicated models may not always go smoothly, and models can be finicky.

When a candidate learner throws an error during the super_learner() training process, we set its weight to 0 and collect the thrown errors for the user in the verbose output. For example, lnr_lmer will throw an error if the formula does not use random effects. The code snippet below shows how we collect the errors and ensure that super learning can continue even if one or more learners fail.
library(nadir)
# train a super_learner() model
#
# lnr_lmer will error because we didn't use random effects
sl_model <- super_learner(
  mtcars,
  formula = mpg ~ cyl,
  learners = list(lnr_mean, lnr_lmer)
)
# observe: prediction falls back to the other learners we specified,
# in this case just lnr_mean
sl_model(mtcars)
#> [1] 20.09062 20.09062 20.09062 20.09062 20.09062 20.09062 20.09062 20.09062
#> [9] 20.09062 20.09062 20.09062 20.09062 20.09062 20.09062 20.09062 20.09062
#> [17] 20.09062 20.09062 20.09062 20.09062 20.09062 20.09062 20.09062 20.09062
#> [25] 20.09062 20.09062 20.09062 20.09062 20.09062 20.09062 20.09062 20.09062
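# as a quick sanity check (not part of the package output): with lnr_lmer
# dropped, the ensemble reduces to lnr_mean, so every prediction above is
# just the overall mean of mpg in mtcars
mean(mtcars$mpg)
#> [1] 20.09062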
# if we specify a more complicated super learner, we can see that as long as we
# specify random effects, lnr_lmer doesn't fail.
#
# also, now you can see how {nadir} collects errors in its verbose output.
sl_verbose_output <- super_learner(
  mtcars,
  formulas = list(
    .default = mpg ~ cyl,
    lmer_2 = mpg ~ (1 | cyl) + hp),
  learners = list(lnr_mean, lnr_lmer, lnr_lmer),
  verbose = TRUE
)
sl_verbose_output |> str()
#> List of 8
#> $ sl_predictor :function (newdata)
#> ..- attr(*, "srcref")= 'srcref' int [1:8] 448 39 454 3 39 3 2442 2448
#> .. ..- attr(*, "srcfile")=Classes 'srcfilealias', 'srcfile' <environment: 0x11dbf7440>
#> $ y_variable : chr "mpg"
#> $ outcome_type : chr "continuous"
#> $ learner_weights : Named num [1:2] 0.0539 0.9461
#> ..- attr(*, "names")= chr [1:2] "mean" "lmer_2"
#> $ holdout_predictions : tibble [32 × 4] (S3: tbl_df/tbl/data.frame)
#> ..$ .sl_fold: int [1:32] 1 1 1 1 1 1 1 2 2 2 ...
#> ..$ mean : num [1:32] 20.1 20.1 20.1 20.1 20.1 ...
#> ..$ lmer_2 : Named num [1:32] 20.5 19.9 13.9 27 25.4 ...
#> .. ..- attr(*, "names")= chr [1:32] "Mazda RX4 Wag" "Merc 280" "Chrysler Imperial" "Toyota Corolla" ...
#> ..$ mpg : num [1:32] 21 19.2 14.7 33.9 21.5 15.5 15 21 22.8 14.3 ...
#> $ errors_from_training_cv_stage1 :List of 10
#> ..$ message: chr "No random effects terms specified in formula"
#> ..$ call : language lnr_lmer_1(training_data[[1L]], formula = mpg ~ cyl, NULL)
#> ..$ message: chr "No random effects terms specified in formula"
#> ..$ call : language lnr_lmer_1(training_data[[2L]], formula = mpg ~ cyl, NULL)
#> ..$ message: chr "No random effects terms specified in formula"
#> ..$ call : language lnr_lmer_1(training_data[[3L]], formula = mpg ~ cyl, NULL)
#> ..$ message: chr "No random effects terms specified in formula"
#> ..$ call : language lnr_lmer_1(training_data[[4L]], formula = mpg ~ cyl, NULL)
#> ..$ message: chr "No random effects terms specified in formula"
#> ..$ call : language lnr_lmer_1(training_data[[5L]], formula = mpg ~ cyl, NULL)
#> $ errors_from_predicting_cv_stage2:List of 10
#> ..$ message: chr "attempt to apply non-function"
#> ..$ call : language trained_learners[["lmer_1"]][[1L]](validation_data[[1L]])
#> ..$ message: chr "attempt to apply non-function"
#> ..$ call : language trained_learners[["lmer_1"]][[2L]](validation_data[[2L]])
#> ..$ message: chr "attempt to apply non-function"
#> ..$ call : language trained_learners[["lmer_1"]][[3L]](validation_data[[3L]])
#> ..$ message: chr "attempt to apply non-function"
#> ..$ call : language trained_learners[["lmer_1"]][[4L]](validation_data[[4L]])
#> ..$ message: chr "attempt to apply non-function"
#> ..$ call : language trained_learners[["lmer_1"]][[5L]](validation_data[[5L]])
#> $ erring_learners : chr "lmer_1"
#> - attr(*, "class")= chr [1:2] "list" "nadir_sl_verbose_output"
Note that the language objects in the error calls are modified slightly inside super_learner() to make them more user-friendly: they contain the names of the learners, along with the formula, fold, and extra learner arguments each learner was called with, rather than opaque calls that refer to things like learners[[i]] and use do.call() to pass the extra arguments programmatically.
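As a minimal sketch (relying only on the structure shown in the str() output above), the erring learners and the collected error messages can be pulled directly out of the verbose output:

# which learners failed during cross-validated training?
sl_verbose_output$erring_learners
#> [1] "lmer_1"

# the stage-1 error list alternates message/call entries, so subset by name
# to collect the messages and de-duplicate them across folds
stage1_errors <- sl_verbose_output$errors_from_training_cv_stage1
unique(unlist(stage1_errors[names(stage1_errors) == "message"]))
#> [1] "No random effects terms specified in formula"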