
t-test Based Screening: Thresholds on p-values and/or t statistics
Source: R/screeners.R (screener_t_test.Rd)
Screens out variables from the formula and dataset based on a p-value and/or the absolute value of the t statistic from a univariate linear regression (with an intercept and one term) of the outcome (dependent) variable on each predictor, fit separately for each predictor.
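The per-predictor test described above can be sketched in base R. This is a minimal illustration, not nadir's code: `univariate_t_tests` is a hypothetical helper name, and it assumes each predictor is numeric, so that its single-term model has exactly one slope coefficient.

```r
# Hypothetical helper (not part of nadir): fit outcome ~ predictor separately
# for each predictor and record that slope's t statistic and p-value.
# Assumes numeric predictors, so each model has one slope row.
univariate_t_tests <- function(data, outcome, predictors) {
  rows <- lapply(predictors, function(v) {
    fit <- lm(reformulate(v, response = outcome), data = data)
    coefs <- summary(fit)$coefficients
    # Row 1 is the intercept; row 2 is the slope for predictor v
    data.frame(term = v,
               t_statistic = coefs[2, "t value"],
               p_value = coefs[2, "Pr(>|t|)"])
  })
  do.call(rbind, rows)
}

# Simulated example: y depends on x1 but not on x2
set.seed(1)
n <- 200
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$y <- 2 * df$x1 + rnorm(n)
univariate_t_tests(df, "y", c("x1", "x2"))
```

Because each model contains only an intercept and one term, the t statistic measures that predictor's marginal association with the outcome, ignoring the other candidates.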
Arguments
- data
a dataset with variables mentioned in the formula.
- formula
a formula with terms from data, intended to be used with a learner from nadir.
- p_value_threshold
A numeric scalar: a term passes if the p-value of the t test for its linear model coefficient is less than or equal to the p_value_threshold given.
- t_statistic_threshold
A numeric scalar: a term passes if the absolute value of its t statistic is greater than or equal to the t_statistic_threshold given.
Value
A list with elements $data (the dataset with the screened-out columns removed),
$formula (the formula with the screened-out variables removed), and $failed_to_pass_threshold
(the names of the variables whose association with the outcome did not reach the threshold
level).
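A list with the shape described above could be assembled as follows. This is a hypothetical sketch (`build_screened_result` is not a nadir function): it takes the names of the predictors that failed screening as given, assumes the formula has a single response variable, and falls back to an intercept-only formula if every predictor fails, which is an assumption about edge-case behavior.

```r
# Hypothetical sketch (not nadir's implementation): drop failed predictors
# from both the data and the formula, and report which ones were dropped.
build_screened_result <- function(data, formula, failed) {
  outcome <- all.vars(formula)[1]
  # terms(..., data = data) also expands a `.` on the right-hand side
  term_labels <- attr(terms(formula, data = data), "term.labels")
  kept <- setdiff(term_labels, failed)
  # Assumption: if nothing passes, keep an intercept-only formula
  new_formula <- if (length(kept) > 0) {
    reformulate(kept, response = outcome)
  } else {
    reformulate("1", response = outcome)
  }
  list(
    data = data[, setdiff(names(data), failed), drop = FALSE],
    formula = new_formula,
    failed_to_pass_threshold = failed
  )
}

df <- data.frame(y = rnorm(5), x1 = rnorm(5), x2 = rnorm(5))
res <- build_screened_result(df, y ~ x1 + x2, failed = "x2")
deparse(res$formula)   # "y ~ x1"
names(res$data)        # "y" "x1"
```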
Details
screener_t_test and the other screeners are intended for pragmatic use:
when there are so many candidate predictors that super_learner is very
slow to run, predictor variables without a detectable association with
the dependent variable of the formula should be dropped before fitting
the learner.