
t-test Based Screening: Thresholds on p.values and/or t statistics
Source: R/screeners.R
screener_t_test.Rd
Screens out variables from the formula and dataset based on the p value and/or the absolute value of the t statistic from a univariate linear regression (an intercept plus one term) of the outcome (dependent) variable on each predictor.
Arguments
- data
a dataset with variables mentioned in the formula
- formula
a formula with terms from data, intended to be used with a learner from nadir.
- p_value_threshold
A numeric scalar where terms pass if the t test for the linear model coefficient has p value lower than or equal to the p_value_threshold given.
- t_statistic_threshold
A numeric scalar where terms pass if the absolute value of their t test statistic is greater than or equal to the t_statistic_threshold given.
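The per-predictor test described above can be sketched in base R. This is a minimal illustration, not nadir's implementation: the helper name univariate_t_test is hypothetical, the mtcars dataset and threshold values are chosen only for demonstration, each predictor is assumed to be numeric (so it contributes exactly one coefficient), and a term is assumed to pass only if it meets every threshold supplied.

```r
# Hypothetical helper (not part of nadir): fit outcome ~ predictor and
# return the t statistic and p value of the single predictor coefficient.
univariate_t_test <- function(data, outcome, predictor) {
  fit <- lm(reformulate(predictor, response = outcome), data = data)
  coefs <- summary(fit)$coefficients
  # Row 1 is the intercept; row 2 is the one predictor term
  # (assumes a numeric predictor, so there is exactly one such row).
  c(t_statistic = unname(coefs[2, "t value"]),
    p_value     = unname(coefs[2, "Pr(>|t|)"]))
}

predictors <- c("wt", "hp", "qsec")
stats <- t(sapply(predictors, function(p) univariate_t_test(mtcars, "mpg", p)))

# Illustrative thresholds; combining them with "and" is an assumption.
p_value_threshold <- 0.01
t_statistic_threshold <- 3
passes <- stats[, "p_value"] <= p_value_threshold &
  abs(stats[, "t_statistic"]) >= t_statistic_threshold

print(stats)
print(passes)
```

Taking the absolute value of the t statistic means that strongly negative associations, such as that between mpg and wt, pass the screen as readily as positive ones.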
Value
A list of $data with columns screened out, $formula with variables screened out, and $failed_to_pass_threshold, the names of variables that failed to associate with the outcome at least at the threshold level.
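To make the shape of the return value concrete, the following base R sketch assembles a list with the same three elements. It is a hypothetical illustration and does not call screener_t_test: the dataset, the formula, and the choice of qsec as the failing variable are all assumptions made for the example.

```r
# Hypothetical illustration of the returned list's shape (not nadir code).
data <- mtcars[, c("mpg", "wt", "hp", "qsec")]
formula <- mpg ~ wt + hp + qsec
failed <- "qsec"  # assumed, for illustration, to have failed the thresholds

# Terms that remain in the formula after screening.
kept <- setdiff(attr(terms(formula), "term.labels"), failed)

result <- list(
  # Dataset with the screened-out columns removed.
  data = data[, setdiff(names(data), failed), drop = FALSE],
  # Formula rebuilt from the surviving terms and the original outcome.
  formula = reformulate(kept, response = all.vars(formula)[1]),
  # Names of the variables that did not pass.
  failed_to_pass_threshold = failed
)

str(result, max.level = 1)
```

Both $data and $formula are updated together so that the screened formula never refers to a column that has been dropped from the screened dataset.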
Details
The intended use of screener_t_test and other screeners is for pragmatic purposes: when there are a very large number of candidate predictors, such that super_learner is very slow to run, predictor variables that fail to have a detectable association with the dependent variable of a formula should be dropped from the learner.