
lnr_heteroskedastic_density

TODO: The following code has a bug / statistical issue.

Usage

lnr_heteroskedastic_density(
  data,
  formula,
  mean_lnr,
  var_lnr,
  mean_lnr_args = NULL,
  var_lnr_args = NULL,
  density_args = NULL
)

Arguments

data

A data frame on which to train the learner(s).

formula

A regression formula to use inside this learner.

mean_lnr

A learner (function) that is trained on the data with the given formula and then used to predict conditional means for the supplied newdata.

var_lnr

A learner (function) that is trained on the squared residuals from mean_lnr on the given data and then used to predict the conditional variance of the outcome density, which is centered at the predicted conditional mean in the output.

mean_lnr_args

Extra arguments passed to mean_lnr.

var_lnr_args

Extra arguments passed to var_lnr.

density_args

Extra arguments passed to the kernel density smoother stats::density, in particular bw for specifying the smoothing bandwidth. See ?stats::density.
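The mean_lnr / var_lnr two-stage scheme described above can be sketched as follows. This is a minimal standalone illustration using lm for both learners on simulated heteroskedastic data; it is not the package's internals.

```r
# Simulated heteroskedastic data: the outcome's spread grows with x
set.seed(1)
n <- 500
x <- runif(n, 0, 2)
y <- 1 + 2 * x + rnorm(n, sd = 0.5 + x)  # sd depends on x
dat <- data.frame(x = x, y = y)

# Stage 1: fit the conditional mean (the role of mean_lnr)
mean_fit <- lm(y ~ x, data = dat)
mu_hat <- predict(mean_fit)

# Stage 2: regress the squared residuals on the same covariates
# to estimate the conditional variance (the role of var_lnr)
dat$sq_resid <- (dat$y - mu_hat)^2
var_fit <- lm(sq_resid ~ x, data = dat)
sigma2_hat <- pmax(predict(var_fit), 1e-6)  # guard against negative predictions

cor(sigma2_hat, dat$x)  # clearly positive: predicted variance tracks x
```

The pmax guard is needed because a generic regression of squared residuals can predict negative values, which are not valid variances.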

Value

A closure (function) that produces density estimates at the supplied newdata according to the fitted model.
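The returned value can be pictured as a closure over the two fitted learners. The sketch below assumes a Gaussian density form for illustration; make_density_closure is a hypothetical name, not this package's implementation.

```r
# Hypothetical sketch of the kind of closure returned: it captures the
# fitted mean and variance learners and, given newdata, evaluates the
# outcome density at each row's observed outcome.
make_density_closure <- function(mean_fit, var_fit, outcome) {
  function(newdata) {
    mu <- predict(mean_fit, newdata = newdata)
    s2 <- pmax(predict(var_fit, newdata = newdata), 1e-6)
    dnorm(newdata[[outcome]], mean = mu, sd = sqrt(s2))
  }
}

# Tiny usage with stock lm fits on mtcars
mean_fit <- lm(mpg ~ wt, data = mtcars)
mtcars$sq_resid <- residuals(mean_fit)^2
var_fit <- lm(sq_resid ~ wt, data = mtcars)

dens <- make_density_closure(mean_fit, var_fit, outcome = "mpg")
head(dens(mtcars))  # one positive density value per row
```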

Details

I suspect there are bugs here. A basic sanity check: if we fix the conditioning set (X) and integrate the conditional probability density over the outcome, the result should be 1.

In numerical tests, when the variance scaling is applied, integrating the conditional densities yields values exceeding 1 (sometimes by a lot). This likely poses a problem for optimizing the negative log-likelihood loss.

Said numerical tests are displayed in the `Density-Estimation` article.
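The sanity check described above can be sketched numerically. The snippet below uses analytic Gaussian densities as stand-ins for the fitted closure, and also demonstrates one plausible mechanism for integrals exceeding 1 (this is an assumption about the bug, not a confirmed diagnosis): rescaling a standard density by the predicted standard deviation without the 1/sigma change-of-variables factor inflates the integral to sigma.

```r
# Correct conditional density: includes the 1/sigma Jacobian factor
dens_ok  <- function(y, mu, sigma) dnorm((y - mu) / sigma) / sigma
# Hypothetical normalization bug: rescale without dividing by sigma
dens_bad <- function(y, mu, sigma) dnorm((y - mu) / sigma)

# Trapezoid-rule approximation of the integral over the outcome grid
trapz <- function(ys, fy) sum(diff(ys) * (head(fy, -1) + tail(fy, -1)) / 2)

ys <- seq(-50, 50, length.out = 5000)  # fine grid over the outcome
mu <- 0; sigma <- 3                    # values implied by a fixed X

trapz(ys, dens_ok(ys, mu, sigma))   # ~ 1, as a proper density must
trapz(ys, dens_bad(ys, mu, sigma))  # ~ sigma = 3: exceeds 1 whenever sigma > 1
```

Under this mechanism, the integral equals the predicted standard deviation, which would explain integrals exceeding 1 "sometimes by a lot" wherever the predicted variance is large.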