netcal.regression.VarianceScaling¶
- class netcal.regression.VarianceScaling¶
Variance recalibration using maximum likelihood estimation, optionally for multiple independent dimensions. This method rescales the input standard deviation by a single scalar parameter to achieve variance calibration [1], [2], using the negative log likelihood as the loss function to optimize the scaling parameter. The distributional mean is fixed during optimization.
This method accepts the input X either as a tuple X = (mean, stddev) of two NumPy arrays of shape (N,), where N is the number of samples, expressing the estimated mean and standard deviation of a probabilistic forecaster, or as a single NumPy array of shape (R, N), where R denotes the number of probabilistic forecasts per sample. For example, if probabilistic outputs are obtained by Monte-Carlo sampling with R stochastic forward passes over N samples, all outputs can be passed to the calibration function in a single NumPy array.
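As a sketch of the two accepted input layouts (using hypothetical NumPy data, not the netcal API itself), the tuple form carries explicit Gaussian parameters, while the sampled form characterizes the same predictive distribution through stochastic draws:

```python
import numpy as np

rng = np.random.default_rng(0)
N, R = 1000, 50  # N samples, R stochastic forward passes

# Layout 1: tuple X = (mean, stddev), each array of shape (N,)
mean = rng.normal(size=N)
stddev = np.full(N, 1.5)
X_parametric = (mean, stddev)

# Layout 2: single array of shape (R, N), e.g. from Monte-Carlo
# dropout, where each column holds R draws for one sample
X_sampled = mean[None, :] + stddev[None, :] * rng.normal(size=(R, N))

# the sampled layout can be reduced to the same parametric summary
mc_mean = X_sampled.mean(axis=0)
mc_stddev = X_sampled.std(axis=0)
```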
This method supports multiple independent data dimensions, fitting a separate calibration model for each dimension. It outputs the recalibrated standard deviation (stddev) for each of the D dimensions.
Mathematical background: In [1] and [2], regression calibration is defined in terms of variance calibration. A probabilistic forecaster \(h(X)\) outputs for any input \(X \in \mathbb{R}\) a mean \(\mu_Y(X)\) and a variance \(\sigma_Y^2(X)\) for the target domain \(Y \in \mathcal{Y} = \mathbb{R}\). Using this notation, variance calibration [1], [2] is defined as
\[\mathbb{E}_{X,Y}\Big[(\mu_Y(X) - Y)^2 \,\big|\, \sigma^2_Y(X) = \sigma^2 \Big] = \sigma^2, \quad \forall \sigma^2 \in \mathbb{R}_{>0}.\]In other words, the estimated variance should match the observed variance (mean squared error) given a certain variance level. For example, if a forecaster outputs 100 predictions with a variance of \(\sigma^2=2\), we would also expect a mean squared error of 2. Further definitions of regression calibration are quantile calibration and distribution calibration.
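This condition can be checked numerically. A minimal sketch with synthetic data at a single variance level of \(\sigma^2 = 2\) (hypothetical data, not part of the library):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
sigma2 = 2.0                        # predicted variance level

mu = np.zeros(n)                    # predicted means
# observed targets drawn with the predicted variance
y = rng.normal(loc=mu, scale=np.sqrt(sigma2), size=n)

# for a variance-calibrated forecaster, the mean squared error at this
# variance level matches the predicted variance
mse = np.mean((mu - y) ** 2)        # close to 2.0
```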
To achieve variance calibration, the Variance Scaling method applies temperature scaling to the input standard deviation with a single scalar parameter \(\theta\). The method uses the negative log likelihood as the loss for this scalar. Since we are working with Gaussians, the loss is given by
\[\begin{split}\mathcal{L}(\theta) &= -\sum^N_{n=1} \log\Bigg[ \frac{1}{\sqrt{2\pi}\, \theta \cdot \sigma_Y(x_n)} \exp\Bigg( -\frac{\big(y_n - \mu_Y(x_n)\big)^2}{2 (\theta \cdot \sigma_Y(x_n))^2} \Bigg) \Bigg] \\ &\propto N \log(\theta) + \frac{1}{2\theta^2} \sum^N_{n=1} \sigma_Y^{-2}(x_n) \big(y_n - \mu_Y(x_n)\big)^2 ,\end{split}\]which is to be minimized. The minimum of this objective can be determined analytically in this case by setting its first derivative to 0:
\[\begin{split}&\frac{\partial \mathcal{L}(\theta)}{\partial \theta} = 0\\ \leftrightarrow \quad & N \theta^2 = \sum^N_{n=1} \sigma_Y^{-2}(x_n) \big(y_n - \mu_Y(x_n) \big)^2 \\ \leftrightarrow \quad & \theta = \pm \sqrt{\frac{1}{N} \sum^N_{n=1} \sigma_Y^{-2}(x_n) \big(y_n - \mu_Y(x_n) \big)^2} ,\end{split}\]where the positive root is used as the standard deviation scale.
References
Methods
__init__(): Constructor.
clear(): Clear model parameters.
epsilon(dtype): Get the smallest value that is representable depending on the passed dtype (NumPy or PyTorch).
fit(X, y[, tensorboard]): Fit a variance scaling calibration method to the provided data.
fit_transform(X[, y]): Fit to data, then transform it.
get_metadata_routing(): Get metadata routing of this object.
get_params([deep]): Get parameters for this estimator.
load_model(filename): Load model from a saved torch dump.
save_model(filename): Save model instance with torch's save function, as this is safer for torch tensors.
set_fit_request(*[, tensorboard]): Request metadata passed to the fit method.
set_output(*[, transform]): Set output container.
set_params(**params): Set the parameters of this estimator.
transform(X): Transform uncalibrated distributional estimates (mean and stddev or stochastic samples) to calibrated ones by applying variance recalibration.
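The closed-form solution for \(\theta\) derived above can be sketched in plain NumPy (synthetic data; a sketch of the math, not the library's internal implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
mu = rng.normal(size=N)                 # predicted means mu_Y(x_n)
sigma = np.full(N, 1.0)                 # predicted stddevs sigma_Y(x_n)
y = mu + rng.normal(scale=2.0, size=N)  # targets: true noise std is 2.0

# closed-form maximum likelihood estimate for theta (positive root):
# theta = sqrt( (1/N) * sum( sigma^-2 * (y - mu)^2 ) )
theta = np.sqrt(np.mean(sigma ** -2.0 * (y - mu) ** 2))

# rescale the underconfident stddevs; theta should be close to 2.0 here
sigma_recalibrated = theta * sigma
```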