Correlation Threshold Based Screening
Arguments
- data
A data frame intended to be used with super_learner().
- formula
The formula specifying the regression to be performed.
- threshold
The correlation coefficient cutoff, below which variables are screened out from the dataset and regression formula.
- cor...
An optional list of extra arguments to pass to cor. Use method = 'spearman' for the Spearman rank-based correlation coefficient.
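As a rough illustration of what passing a list of extra arguments on to cor can look like, the base R sketch below forwards such a list with do.call(). The names extra_args, pearson_r, and spearman_r are made up for this example, and whether screener_cor() forwards its cor... argument in exactly this way is an assumption.

```r
# Sketch only: forwarding a list of extra arguments to stats::cor().
# `extra_args` plays the role of the list one would supply as `cor...`.
extra_args <- list(method = "spearman")

# Pearson is the default method of cor().
pearson_r <- cor(mtcars$hp, mtcars$mpg)

# do.call() splices the extra arguments into the call, so this is
# equivalent to cor(mtcars$hp, mtcars$mpg, method = "spearman").
spearman_r <- do.call(
  cor,
  c(list(x = mtcars$hp, y = mtcars$mpg), extra_args)
)

print(c(pearson = pearson_r, spearman = spearman_r))
```

The Spearman coefficient is computed on ranks, so it measures monotonic rather than strictly linear association.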
Value
A list containing $data with columns screened out, $formula with variables screened out, and $failed_to_correlate_names, the names of the variables whose correlation with the outcome did not reach the threshold.
Details
If a variable has little correlation with the outcome being predicted, we might want to screen it out of the predictors.
In large datasets this matters, because running super_learner() with a huge number of columns can be computationally intractable or frustratingly time-consuming.
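The screening idea described above can be sketched in a few lines of base R. This is not the package's implementation: sketch_screen_cor() and its outcome and cor_args parameters are hypothetical, it assumes all columns are numeric, and it assumes the absolute value of the correlation is compared with the threshold.

```r
# Hypothetical helper illustrating correlation-threshold screening.
sketch_screen_cor <- function(data, outcome, threshold, cor_args = list()) {
  predictors <- setdiff(names(data), outcome)

  # Correlation of each predictor with the outcome (assumes numeric columns).
  cors <- vapply(predictors, function(p) {
    do.call(cor, c(list(x = data[[p]], y = data[[outcome]]), cor_args))
  }, numeric(1))

  # Assumption: a predictor is kept when |correlation| >= threshold.
  keep <- predictors[abs(cors) >= threshold]

  list(
    # Data reduced to the outcome plus the retained predictors.
    data = data[, c(outcome, keep), drop = FALSE],
    # Formula rebuilt from the retained predictors (intercept-only if none).
    formula = reformulate(if (length(keep)) keep else "1", response = outcome),
    # Predictors that did not reach the threshold.
    failed_to_correlate_names = setdiff(predictors, keep)
  )
}

screened <- sketch_screen_cor(mtcars, outcome = "mpg", threshold = 0.5)
print(screened$formula)
print(screened$failed_to_correlate_names)
```

The returned list mirrors the three elements described under Value, so the reduced data and formula can be inspected together before any model fitting.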
Examples
if (FALSE) { # \dontrun{
  screener_cor(
    data = mtcars,
    formula = mpg ~ .,
    threshold = 0.5
  )

  # We're also showing how to specify that you want the Spearman rank-based
  # correlation coefficient, to get away from the assumption of linearity.
  screener_cor(
    data = mtcars,
    formula = mpg ~ .,
    threshold = 0.5,
    cor... = list(method = 'spearman')
  )
} # }