Currying, Closures, and Function Factories • nadir

R is a functional programming language, which allows for functions to build and return functions just like any other return object.

Super Learning heavily rests on the ability to train learners.

We refer to functions that create and return another function as a function factory. For an extended reference, see the Advanced R book.

Function factories are so useful in nadir because, at their essence, a candidate learner needs to be able to 1) accept training data, and 2) produce a prediction function that can make predictions on heldout validation data. So a typical learner in nadir looks like:

lnr_lm <- function(data, formula, ...) {
  model <- stats::lm(formula = formula, data = data, ...)

  predict_from_trained_lm <- function(newdata) {
    predict(model, newdata = newdata, type = 'response')
  }
  return(predict_from_trained_lm)
}

Moreover, given how code-lightweight it is to write a simple learner, this makes it relatively easy for users to write new learners that meet their exact needs.

If you want to implement your own learners, you just need to follow the following pseudocode approach:

lnr_custom <- function(data, formula, ...) {
  model <- # train your model using data, formula, ... 
  
  predict_from_model <- function(newdata) {
    return(predict(model, newdata = newdata)) # return predictions from the trained model 
    # (predictions should be a vector of predictions for each row of newdata)
  }
  return(predict_from_model)
}

Note: At present, the user needs to be careful that the models specified produce predictions for the right outcome type (e.g., non-negative, continuous, densities, etc.).

We refer to the returned predict_from_model function as a closure because the trained model is actually encapsulated inside it in order to be able to produce predictions. A pneumonic/memory-aid that could be useful is that a closure encloses objects (namely a trained model in our case) inside it to facilitate operating with its input to produce its output.

Now you know about function factories and closures. One more functional programming practice used in nadir is that of currying, which is closely related to producing a closure.

If you are familiar with the following mathematical notation, a concise way to describe currying is as follows: to “curry” the function $f(x, y)$ so that it only takes an argument $x$ for some fixed $y$ is to produce the function $x \mapsto f_y(x).$

Currying is perhaps most easily thought of as taking a function of several arguments, fixing some subset of them, and leaving the others unspecified and thereby producing a new function that only takes the subset of arguments.

Let’s do a simple example:

f <- function(x, y) {
  x + y
}

add_five <- function(x) {
  f(x, 5)
}

add_five(5)
#> [1] 10

We would refer to add_five() as a curried function. Why is this useful to us in nadir? We use currying to simplify the process for running cv_super_learner().

nadir internally produces a curried version of the specified super_learner() with everything specified/fixed except for the data argument.

This way, inside nadir:::cv_super_learner_internal() the curried super learner can be called repeatedly (syntactically easily) on different training datasets.