Screeners follow one principle: a screener takes the same arguments as a learner and returns a modified dataset and formula from which variables that fail to meet some threshold have been screened out.
Details
A screener can be added to a learner with the provided add_screener(learner, screener)
function. This returns a modified learner that applies the screener to the
data and formula it is passed before fitting.
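The idea can be sketched in base R. Everything below is a hypothetical illustration, not the package's implementation: the names sketch_screener_cor, sketch_add_screener, and sketch_lnr_lm are made up, and the sketch assumes that a screener returns a list with data and formula elements and that a learner returns a prediction function, as in the examples on this page.

```r
# Hypothetical screener: keep predictors whose absolute Pearson correlation
# with the outcome exceeds `threshold`. Assumes all columns are numeric.
sketch_screener_cor <- function(data, formula, threshold = 0.5) {
  outcome <- all.vars(formula)[1]
  predictors <- setdiff(names(data), outcome)
  cors <- vapply(
    predictors,
    function(p) abs(cor(data[[p]], data[[outcome]])),
    numeric(1)
  )
  kept <- predictors[cors > threshold]
  # fall back to an intercept-only formula if nothing passes
  rhs <- if (length(kept) > 0) kept else "1"
  list(
    data = data[, c(outcome, kept), drop = FALSE],
    formula = reformulate(rhs, response = outcome)
  )
}

# Hypothetical wrapper: returns a learner that screens first, then fits.
sketch_add_screener <- function(learner, screener, screener_extra_args = list()) {
  function(data, formula, ...) {
    screened <- do.call(
      screener,
      c(list(data = data, formula = formula), screener_extra_args)
    )
    learner(data = screened$data, formula = screened$formula, ...)
  }
}

# Hypothetical learner: fits lm() and returns a prediction function.
sketch_lnr_lm <- function(data, formula, ...) {
  fit <- lm(formula, data = data, ...)
  function(newdata) unname(predict(fit, newdata = newdata))
}

screened <- sketch_screener_cor(mtcars, mpg ~ ., threshold = 0.5)
screened$formula

lnr_screened <- sketch_add_screener(
  sketch_lnr_lm, sketch_screener_cor, list(threshold = 0.5)
)
trained <- lnr_screened(data = mtcars, formula = mpg ~ .)

# qsec does not pass the 0.5 threshold, so overwriting it should not
# change the predictions
mtcars_modified <- mtcars
mtcars_modified["qsec"] <- 1
identical(trained(mtcars), trained(mtcars_modified))
```

Because the wrapper has the same data and formula arguments as the learner it wraps, a screened learner can be used wherever an unscreened one would be.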
So far, the screeners implemented rely on being able to call model.matrix and therefore only support standard (generalized) linear model formula syntax like that described in ?formula.
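To see what this reliance involves, the following base R snippet shows how model.matrix expands a formula into design-matrix columns and how the "assign" attribute maps each column back to its formula term. The formula is an arbitrary example, and how any particular screener uses this mapping is an assumption here, not something documented above.

```r
f <- mpg ~ wt + factor(cyl)
mm <- model.matrix(f, data = mtcars)

# one column per numeric term, one per non-reference factor level
colnames(mm)

# "assign" gives, for each column, the index of the term it came from
# (0 is the intercept)
term_labels <- attr(terms(f), "term.labels")
assign_idx <- attr(mm, "assign")
column_to_term <- c("(Intercept)", term_labels)[assign_idx + 1]
data.frame(column = colnames(mm), term = column_to_term)
```

Formula syntax that model.matrix cannot expand this way is what falls outside the supported range.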
Examples
if (FALSE) { # \dontrun{
# examples for setting up a screened regression problem:
#
# users can just run a screener to see what data and formula terms pass the
# given screener conditions:
screened_regression_problem <- screener_cor(data = mtcars, formula = mpg ~ ., threshold = 0.5)
screened_regression_problem
screened_regression_problem2 <- screener_cor(data = mtcars, formula = mpg ~ ., threshold = 0.5, cor... = list(method = 'spearman'))
screened_regression_problem2
screened_regression_problem3 <- screener_t_test(data = mtcars, formula = mpg ~ ., t_statistic_threshold = 10)
screened_regression_problem3
# build a new learner with screening built in:
lnr_rf_screener_top_5_cor_terms <- add_screener(
  learner = lnr_rf,
  screener = screener_cor_top_n,
  screener_extra_args = list(
    cor... = list(method = 'spearman'),
    keep_n_terms = 5
  )
)
# train learner
trained_learner <- lnr_rf_screener_top_5_cor_terms(data = mtcars, formula = mpg ~ .)
mtcars_modified <- mtcars
mtcars_modified['gear'] <- 1 # gear is one of the least correlated variables with mpg
# if gear was screened out, overwriting it should leave the predictions unchanged:
identical(trained_learner(mtcars), trained_learner(mtcars_modified))
} # }