Correlation Threshold Based Screening

Usage

screener_cor(data, formula, threshold = 0.2, cor... = NULL)

Arguments

data

A dataframe intended to be used with super_learner()

formula

The formula specifying the regression to be done

threshold

The correlation coefficient cutoff, below which variables are screened out from the dataset and regression formula.

cor...

An optional list of extra arguments to pass to cor(). For example, list(method = 'spearman') requests the Spearman rank-based correlation coefficient.

Value

A list with three elements: $data, the data with the screened-out columns removed; $formula, the formula with the screened-out variables removed; and $failed_to_correlate_names, the names of the variables whose correlation with the outcome did not reach the threshold.

Details

If a candidate predictor has little correlation with the outcome being predicted, we might want to screen it out before fitting.

This matters most in large datasets, where a huge number of columns can make running super_learner() computationally intractable or frustratingly slow.
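
The idea can be illustrated with a minimal base-R sketch. This is not the package's implementation: sketch_screen_cor() is a hypothetical helper written for illustration, it assumes a numeric outcome and numeric predictors, and it assumes the threshold is compared against the absolute value of the correlation.

```r
# Hypothetical helper, not part of the package: a base-R sketch of
# correlation-threshold screening for numeric columns only.
sketch_screen_cor <- function(data, outcome, threshold = 0.2, method = "pearson") {
  predictors <- setdiff(names(data), outcome)

  # Correlation of each candidate predictor with the outcome
  cors <- vapply(
    predictors,
    function(p) stats::cor(data[[p]], data[[outcome]], method = method),
    numeric(1)
  )

  # Assumption: screen on the absolute correlation; treat NA as a failure
  passes <- !is.na(cors) & abs(cors) >= threshold
  keep <- predictors[passes]
  dropped <- predictors[!passes]

  # Fall back to an intercept-only formula if nothing passes
  rhs <- if (length(keep) > 0) keep else "1"

  list(
    data = data[, c(outcome, keep), drop = FALSE],
    formula = stats::reformulate(rhs, response = outcome),
    failed_to_correlate_names = dropped
  )
}

result <- sketch_screen_cor(mtcars, outcome = "mpg", threshold = 0.5)
result$failed_to_correlate_names
result$formula
```

The returned list mirrors the three elements described under Value, so the screened data and formula can be inspected together before fitting.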

Examples

if (FALSE) { # \dontrun{
screener_cor(
  data = mtcars,
  formula = mpg ~ .,
  threshold = .5)

# To use the Spearman rank-based correlation coefficient, which does not
# assume a linear relationship, pass method = 'spearman' through cor...

screener_cor(
  data = mtcars,
  formula = mpg ~ .,
  threshold = .5,
  cor... = list(method = 'spearman')
  )
} # }
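
As a rough illustration of why the choice of method can change which variables pass the threshold, the following base-R sketch compares the Pearson and Spearman coefficients on made-up data with a monotone but strongly nonlinear relationship. The variables x and y are invented for this example and only stats::cor() is used.

```r
# Made-up data: y increases with x, but exponentially rather than linearly
x <- 1:20
y <- exp(x / 2)

pearson  <- stats::cor(x, y, method = "pearson")
spearman <- stats::cor(x, y, method = "spearman")

# Spearman depends only on ranks, so a strictly increasing relationship
# scores higher than it does under Pearson
c(pearson = pearson, spearman = spearman)
```

With a fixed threshold, a predictor like this could be screened out under the Pearson coefficient yet retained under Spearman.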