
Screens out variables from the formula and dataset based on the p-value and/or the absolute value of the t statistic from a univariate linear regression (an intercept plus one term) of the outcome (dependent) variable on each predictor.
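A minimal base R sketch of the per-predictor statistics described above, assuming numeric, non-constant predictors. The helper univariate_t_stats is hypothetical and is not part of nadir. For each predictor it fits lm(outcome ~ predictor) and reads the slope's t statistic and two-sided p-value from summary()$coefficients. The simulated data are illustrative only.

```r
# Hypothetical helper (not nadir's implementation): for each predictor, fit
# outcome ~ predictor (intercept plus one term) and keep the slope's t test.
# Assumes numeric, non-constant predictors so the slope is coefficient row 2.
univariate_t_stats <- function(data, outcome, predictors) {
  rows <- lapply(predictors, function(p) {
    fit <- stats::lm(stats::reformulate(p, response = outcome), data = data)
    coefs <- summary(fit)$coefficients
    # Row 1 is the intercept, row 2 the slope; "t value" and "Pr(>|t|)"
    # are the column names used by summary.lm().
    data.frame(variable = p,
               t_statistic = coefs[2, "t value"],
               p_value = coefs[2, "Pr(>|t|)"])
  })
  do.call(rbind, rows)
}

# Illustrative simulated data: only x1 is associated with y.
set.seed(1)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 2 * sim$x1 + rnorm(n)
stats_table <- univariate_t_stats(sim, "y", c("x1", "x2"))
print(stats_table)
```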

Usage

screener_t_test(
  data,
  formula,
  p_value_threshold = NULL,
  t_statistic_threshold = NULL
)

Arguments

data

a dataset with variables mentioned in the formula

formula

a formula with terms from data, intended to be used with a learner from nadir.

p_value_threshold

A numeric scalar. A term passes if the p-value of the t test for its linear model coefficient is less than or equal to p_value_threshold.

t_statistic_threshold

A numeric scalar. A term passes if the absolute value of its t statistic is greater than or equal to t_statistic_threshold.
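The two thresholds can be read as simple pass/fail rules. The sketch below uses a hypothetical helper, passes_threshold, that is not nadir's code, and it assumes that when both thresholds are supplied a term must meet both. The source does not say how the two are combined. The t statistics and p-values are toy inputs chosen only to show the comparisons.

```r
# Hypothetical helper (not nadir's implementation).
# Assumption: if both thresholds are given, a variable must satisfy both;
# if neither is given, every variable passes.
passes_threshold <- function(t_statistic, p_value,
                             p_value_threshold = NULL,
                             t_statistic_threshold = NULL) {
  pass <- rep(TRUE, length(t_statistic))
  if (!is.null(p_value_threshold)) {
    pass <- pass & (p_value <= p_value_threshold)
  }
  if (!is.null(t_statistic_threshold)) {
    # Compare magnitudes so strong negative associations also pass.
    pass <- pass & (abs(t_statistic) >= t_statistic_threshold)
  }
  pass
}

# Toy inputs for illustration only.
t_stats <- c(x1 = 5.2, x2 = -3.1, x3 = 0.4)
p_vals  <- c(x1 = 1e-6, x2 = 0.002, x3 = 0.69)
by_p <- passes_threshold(t_stats, p_vals, p_value_threshold = 0.05)
by_t <- passes_threshold(t_stats, p_vals, t_statistic_threshold = 4)
print(by_p)
print(by_t)
```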

Value

A list with three elements: $data, the dataset with the screened-out columns removed; $formula, the formula with the screened-out variables removed; and $failed_to_pass_threshold, the names of the variables whose association with the outcome did not reach the threshold.
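The shape of the returned list can be illustrated with a small sketch. The helper build_screened_result is hypothetical, not nadir's implementation. It assumes a single response on the left-hand side and right-hand-side terms that are plain column names (no transformations or interactions). Given the names of the variables that passed, it drops the rest from both the data and the formula.

```r
# Hypothetical helper (not nadir's implementation). Assumes the formula's
# terms are plain column names and the first variable is the response.
build_screened_result <- function(data, formula, passed) {
  outcome <- all.vars(formula)[1]
  predictors <- attr(stats::terms(formula), "term.labels")
  failed <- setdiff(predictors, passed)
  kept <- setdiff(predictors, failed)
  # Fall back to an intercept-only formula if nothing passed.
  rhs <- if (length(kept) > 0) kept else "1"
  list(
    data = data[, c(outcome, kept), drop = FALSE],
    formula = stats::reformulate(rhs, response = outcome),
    failed_to_pass_threshold = failed
  )
}

# Toy data for illustration only.
df <- data.frame(y = c(1, 2, 3, 4), x1 = c(1, 2, 3, 5),
                 x2 = c(4, 1, 3, 2), x3 = c(0, 1, 0, 1))
res <- build_screened_result(df, y ~ x1 + x2 + x3, passed = c("x1", "x3"))
names(res$data)               # "y" "x1" "x3"
deparse(res$formula)          # "y ~ x1 + x3"
res$failed_to_pass_threshold  # "x2"
```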

Details

The intended use of screener_t_test and the other screeners is pragmatic. When there is a very large number of candidate predictors, so that super_learner is very slow to run, predictor variables that show no detectable association with the formula's dependent variable should be dropped from the learner.

See also

screeners, add_screener, screener_cor_top_n