This article contains some advice for writing and constructing new learners.
Weights
We recommend that learners handle weights explicitly as a named argument, so that it is a protected argument. The internals of different algorithms vary, and some use other names for their weights argument, so handling it explicitly standardizes the weights argument across different learner algorithms. As a concrete example, the ranger::ranger() function takes case.weights as its argument rather than weights.
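To make the ranger::ranger() case concrete, here is a minimal sketch of a learner that accepts the standardized weights argument and forwards it under the name ranger expects. The name lnr_ranger_sketch is hypothetical, and this is not necessarily how nadir implements its own learners; it assumes the ranger package is installed.

```r
# Hypothetical learner: exposes `weights`, but hands it to
# ranger::ranger() under that function's own name, `case.weights`.
lnr_ranger_sketch <- function(data, formula, weights = NULL, ...) {
  model <- ranger::ranger(
    formula = formula,
    data = data,
    case.weights = weights,  # translate the standardized name
    ...
  )
  return(function(newdata) {
    # predict() on a ranger model takes `data` and returns an object
    # whose `predictions` element holds the predicted values
    predict(model, data = newdata)$predictions
  })
}
```

A caller can then write lnr_ranger_sketch(data, formula, weights = w) without needing to know what ranger calls its weights argument.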
A typical learner that supports weights might look like:
lnr_supportsWeights <- function(data, formula, weights = NULL, ...) {
  # train the model
  model <- model_fit(data = data, formula = formula, weights = weights, ...)
  return(function(newdata) {
    predict(model, newdata = newdata)
  })
}
However, some model fitting procedures do not accept weights = NULL, so it may be necessary to avoid passing the default weights = NULL through to the model fitting procedure. As an example of this, we refer the curious reader to the source of lnr_glm in https://github.com/ctesta01/nadir/blob/main/R/learners.R.
model_training_arguments <- list(data = data, formula = formula)
# add weights if they aren't missing
if (!is.null(weights) && length(weights) == nrow(data)) {
  model_training_arguments$weights <- weights
}
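Putting the pieces together, the following is a minimal sketch of a complete learner that builds its argument list conditionally and then calls the fitting function with do.call(). It uses stats::lm() purely as a stand-in fitting procedure; the name lnr_lm_weighted_sketch is hypothetical and the code is not copied from lnr_glm.

```r
# Hypothetical learner: weighted linear regression via stats::lm()
lnr_lm_weighted_sketch <- function(data, formula, weights = NULL, ...) {
  # arguments that are always passed to the fitting function
  model_training_arguments <- list(formula = formula, data = data, ...)

  # add weights only if they were supplied and match the number of rows
  if (!is.null(weights) && length(weights) == nrow(data)) {
    model_training_arguments$weights <- weights
  }

  # the fitting function only receives `weights` when it was added above
  model <- do.call(stats::lm, model_training_arguments)

  return(function(newdata) {
    as.numeric(predict(model, newdata = newdata))
  })
}

# usage on a built-in dataset (32 rows, so 32 weights)
unweighted <- lnr_lm_weighted_sketch(mtcars, mpg ~ wt)
weighted <- lnr_lm_weighted_sketch(mtcars, mpg ~ wt, weights = rep(c(1, 2), 16))
unweighted(head(mtcars, 3))
weighted(head(mtcars, 3))
```

Building the list first keeps the "no weights" and "with weights" cases on a single code path, which is the design the snippet above is aiming at.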
Attributes
If you create learners, we recommend also giving them a couple of attributes, for two reasons:
- If a learner has a sl_lnr_name attribute, it is used automatically in the outputs when a name for the learner is left unspecified.
- If a learner has a sl_lnr_type attribute, it will be checked against the output_type argument to super_learner().
To set these attributes when making a new learner, run something along the lines of:
lnr_myNewLearner <- function(data, formula, ...) {
  # fit your learner given data, formula, ...
  # (stats::lm() is only a stand-in here; replace it with your own model fit)
  model <- stats::lm(formula = formula, data = data, ...)
  predictor_fn <- function(newdata) {
    predict(model, newdata = newdata)
  }
  return(predictor_fn)
}
attr(lnr_myNewLearner, 'sl_lnr_name') <- 'newLearnerName'
attr(lnr_myNewLearner, 'sl_lnr_type') <- 'continuous' # or c('continuous', 'binary') and similar
# see ?nadir_supported_types
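
As an illustration of how such attributes can be read back, the sketch below attaches both attributes to a toy learner and looks the name up with a fallback. Both lnr_constant_sketch and get_learner_name are hypothetical names introduced here for illustration; this is not nadir's internal lookup code.

```r
# Hypothetical toy learner: always predicts the training mean of the outcome
lnr_constant_sketch <- function(data, formula, ...) {
  outcome <- all.vars(formula)[1]
  mean_y <- mean(data[[outcome]])
  function(newdata) rep(mean_y, nrow(newdata))
}
attr(lnr_constant_sketch, 'sl_lnr_name') <- 'constant_sketch'
attr(lnr_constant_sketch, 'sl_lnr_type') <- 'continuous'

# Hypothetical helper: use the attribute if present, otherwise a default
get_learner_name <- function(learner, default = 'unnamed_learner') {
  name <- attr(learner, 'sl_lnr_name', exact = TRUE)
  if (is.null(name)) default else name
}

get_learner_name(lnr_constant_sketch)                # attribute is set
get_learner_name(function(data, formula, ...) NULL)  # falls back to the default
```

Attributes set with attr() stay attached to the function object, so they are still available when the learner is passed around in a list.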