netcal.scaling.BetaCalibrationDependent¶
- class netcal.scaling.BetaCalibrationDependent(*args, method: str = 'momentum', **kwargs)¶
This calibration method uses a multivariate variant of a Beta distribution to obtain a calibration mapping by means of the confidence as well as additional features. This method is originally proposed by [1]. This calibration scheme tries to model several dependencies in the variables given by the input X. It is necessary to provide all data in the input parameter X as a NumPy array of shape (n_samples, n_features), where the confidence must be the first feature given in the input array (see the example below). The ground-truth samples y must be an array of shape (n_samples,) consisting of binary labels \(y \in \{0, 1\}\). These labels indicate whether the corresponding sample has matched a ground-truth box (\(\text{m}=1\)) or is a false prediction (\(\text{m}=0\)).

Mathematical background: For confidence calibration in classification tasks, a confidence mapping \(g\) is applied on top of a miscalibrated scoring classifier \(\hat{p} = h(x)\) to deliver a calibrated confidence score \(\hat{q} = g(h(x))\).
For detection calibration, we can also use the additional box regression output which we denote as \(\hat{r} \in [0, 1]^J\) with \(J\) as the number of dimensions used for the box encoding (e.g. \(J=4\) for x position, y position, width and height). Therefore, the calibration map is not only a function of the confidence score, but also of \(\hat{r}\). To define a general calibration map for binary problems, we use the logistic function and the combined input \(s = (\hat{p}, \hat{r})\) of size K by
\[g(s) = \frac{1}{1 + \exp(-z(s))} .\]
According to [1], we can interpret the logit \(z\) as the logarithm of the posterior odds
\[z(s) = \log \frac{f(\text{m}=1 | s)}{f(\text{m}=0 | s)} \approx \log \frac{f(s | \text{m}=1)}{f(s | \text{m}=0)} = \ell r(s) .\]
For a multivariate probability density function \(f(s|\text{m})\), we use a variant of the beta distribution described in [2] and given by
\[f(s|\text{m}) = \frac{1}{B(\alpha_0, ..., \alpha_K)} \frac{\prod^K_{k=1} \lambda_k^{\alpha_k}(s_k^\ast)^{\alpha_k - 1} \Big(\frac{s_k^\ast}{s_k}\Big)^2} {\Big[1 + \sum^K_{k=1} \lambda_k s_k^\ast\Big]^{\sum^K_{k=0} \alpha_k} }\]
with shape parameters \(\alpha_k, \beta_k > 0\), \(\forall k \in \{0, ..., K \}\). For ease of notation, we denote \(\lambda_k=\frac{\beta_k}{\beta_0}\) and \(s^\ast=\frac{s}{1-s}\) (applied element-wise). Inserting this density function into this framework with \(\alpha_k^+\), \(\beta_k^+\) and \(\alpha_k^-\), \(\beta_k^-\) as the distribution parameters for \(\text{m}=1\) and \(\text{m}=0\), respectively, we get a likelihood ratio of
\[\begin{split}\ell r(s) &= \sum^K_{k=1} \Big[\alpha_k^+ \log(\lambda_k^+) - \alpha_k^- \log(\lambda_k^-)\Big] \\ &+ \sum^K_{k=1} (\alpha_k^+ - \alpha_k^-) \log(s_k^\ast) \\ &+ \sum^K_{k=0} \alpha_k^- \log\Bigg[1 + \sum^K_{j=1} \lambda_j^- s^\ast_j\Bigg] \\ &- \sum^K_{k=0} \alpha_k^+ \log\Bigg[1 + \sum^K_{j=1} \lambda_j^+ s^\ast_j\Bigg] \\ &+ c ,\end{split}\]
where \(c=\log B(\alpha_0^-, ..., \alpha_K^-) - \log B(\alpha_0^+, ..., \alpha_K^+)\).
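For illustration, the following NumPy sketch evaluates this likelihood ratio and the resulting calibration map \(g(s)\) for fixed shape parameters. It assumes that \(B(\cdot)\) is the multivariate beta function (so \(\log B\) can be computed via gammaln) and mirrors the formulas above rather than the library's internal implementation; the parameter values are arbitrary placeholders.

```python
import numpy as np
from scipy.special import gammaln

def log_multivariate_beta(alpha):
    # log B(alpha_0, ..., alpha_K) = sum_k log Gamma(alpha_k) - log Gamma(sum_k alpha_k)
    return np.sum(gammaln(alpha)) - gammaln(np.sum(alpha))

def likelihood_ratio(s, alpha_pos, beta_pos, alpha_neg, beta_neg):
    # s: shape (K,), entries in (0, 1); s[0] is the confidence, the rest are box features
    # alpha_*, beta_*: shape (K+1,) holding the shape parameters alpha_0..alpha_K, beta_0..beta_K
    s_star = s / (1.0 - s)                   # s* = s / (1 - s), element-wise
    lam_pos = beta_pos[1:] / beta_pos[0]     # lambda_k = beta_k / beta_0
    lam_neg = beta_neg[1:] / beta_neg[0]

    lr = np.sum(alpha_pos[1:] * np.log(lam_pos) - alpha_neg[1:] * np.log(lam_neg))
    lr += np.sum((alpha_pos[1:] - alpha_neg[1:]) * np.log(s_star))
    lr += np.sum(alpha_neg) * np.log1p(np.sum(lam_neg * s_star))
    lr -= np.sum(alpha_pos) * np.log1p(np.sum(lam_pos * s_star))

    # constant c = log B(alpha^-) - log B(alpha^+)
    return lr + log_multivariate_beta(alpha_neg) - log_multivariate_beta(alpha_pos)

def calibration_map(s, *params):
    # g(s) = sigmoid(lr(s))
    return 1.0 / (1.0 + np.exp(-likelihood_ratio(s, *params)))

# arbitrary placeholder parameters for K = 5 features (confidence + 4 box features)
rng = np.random.default_rng(0)
alpha_pos, beta_pos = rng.uniform(0.5, 2.0, 6), rng.uniform(0.5, 2.0, 6)
alpha_neg, beta_neg = rng.uniform(0.5, 2.0, 6), rng.uniform(0.5, 2.0, 6)
print(calibration_map(np.array([0.7, 0.4, 0.6, 0.2, 0.3]), alpha_pos, beta_pos, alpha_neg, beta_neg))
```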
The estimation of these distribution parameters is performed by an Adam optimizer with a learning rate of 1e-3 and a batch size of 256 for 1000 iterations (default).
This implementation is also able to capture the epistemic uncertainty of the calibration method [3].
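As an illustration of the input layout described above, the following sketch builds X with the confidence as the first column followed by four hypothetical box features and fits the calibration mapping; the random data is a placeholder and not part of the library.

```python
import numpy as np
from netcal.scaling import BetaCalibrationDependent

n_samples = 1000

# placeholder detection output: the confidence must be the first feature,
# followed by additional (normalized) box features, e.g. cx, cy, width, height
confidence = np.random.uniform(0.0, 1.0, size=(n_samples, 1))
boxes = np.random.uniform(0.0, 1.0, size=(n_samples, 4))

X = np.concatenate([confidence, boxes], axis=1)   # shape: (n_samples, n_features)
y = np.random.randint(0, 2, size=n_samples)       # binary labels: matched (1) / false prediction (0)

beta = BetaCalibrationDependent(method='momentum')
beta.fit(X, y)
calibrated = beta.transform(X)                    # calibrated confidence estimates
```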
- Parameters:
method (str, default: "momentum") – Method that is used to obtain a calibration mapping (see the usage sketch after this parameter list):
  - 'mle': Maximum likelihood estimate without uncertainty using a convex optimizer.
  - 'momentum': MLE estimate using Momentum optimizer for non-convex optimization.
  - 'variational': Variational Inference with uncertainty.
  - 'mcmc': Markov-Chain Monte-Carlo sampling with uncertainty.
momentum_epochs (int, optional, default: 1000) – Number of epochs used by momentum optimizer.
mcmc_steps (int, optional, default: 20) – Number of weight samples obtained by MCMC sampling.
mcmc_chains (int, optional, default: 1) – Number of Markov-chains used in parallel for MCMC sampling (this will result in mcmc_steps * mcmc_chains samples).
mcmc_warmup_steps (int, optional, default: 100) – Warmup steps used for MCMC sampling.
vi_epochs (int, optional, default: 1000) – Number of epochs used for ELBO optimization.
independent_probabilities (bool, optional, default: False) – Boolean for multi-class probabilities. If set to True, the probability estimates for each class are treated as independent of each other (sigmoid).
use_cuda (str or bool, optional, default: False) – Specify if CUDA should be used. If str, you can also specify the device number like ‘cuda:0’, etc.
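As referenced in the description of the method parameter, a brief sketch of how these constructor options might be combined; the concrete values are illustrative only.

```python
from netcal.scaling import BetaCalibrationDependent

# plain MLE with the (default) momentum optimizer
calib_mle = BetaCalibrationDependent(method='momentum', momentum_epochs=1000)

# variational inference with uncertainty, computed on the first CUDA device
calib_vi = BetaCalibrationDependent(method='variational', vi_epochs=1000, use_cuda='cuda:0')

# MCMC sampling with uncertainty: 4 chains x 20 steps = 80 weight samples
calib_mcmc = BetaCalibrationDependent(method='mcmc', mcmc_chains=4, mcmc_steps=20, mcmc_warmup_steps=100)
```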
References
Methods
__init__(*args[, method]) – Create an instance of BetaCalibrationDependent.
clear() – Clear model parameters.
convex(data, y, tensorboard, log_dir) – Convex optimization to find the global optimum of current parameter search.
epsilon(dtype) – Get the smallest digit that is representable depending on the passed dtype (NumPy or PyTorch).
fit(X, y[, random_state, tensorboard, log_dir]) – Build logistic calibration model either conventionally with a single MLE estimate or with Variational Inference (VI) or Markov-Chain Monte-Carlo (MCMC) sampling to also obtain uncertainty estimates.
fit_transform(X[, y]) – Fit to data, then transform it.
get_metadata_routing() – Get metadata routing of this object.
get_params([deep]) – Get parameters for this estimator.
guide([X, y]) – Variational substitution definition for each parameter.
load_model(filename) – Load model from saved torch dump.
mask() – Seek for all relevant weights whose values are negative.
mcmc(data, y, tensorboard, log_dir) – Perform Markov-Chain Monte-Carlo sampling on the (unknown) posterior.
model([X, y]) – Definition of the log regression model.
momentum(data, y, tensorboard, log_dir) – Momentum optimization to find the global optimum of current parameter search.
prepare(X) – Preprocessing of input data, called at the beginning of the fit-function.
prior(dtype) – Prior definition of the weights used for log regression.
save_model(filename) – Save model instance with torch's save function, as this is safer for torch tensors.
set_fit_request(*[, log_dir, random_state, ...]) – Request metadata passed to the fit method.
set_output(*[, transform]) – Set output container.
set_params(**params) – Set the parameters of this estimator.
set_transform_request(*[, mean_estimate, ...]) – Request metadata passed to the transform method.
to(device) – Set distribution parameters to the desired device in order to compute either on CPU or GPU.
transform(X[, num_samples, random_state, ...]) – After model calibration, this function is used to get calibrated outputs of uncalibrated confidence estimates.
variational(data, y, tensorboard, log_dir) – Perform variational inference using the guide.
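A typical end-to-end workflow combining the methods above might look as follows; the num_samples argument to transform and the file name are illustrative assumptions, and X, y are reused from the earlier data-layout sketch.

```python
# X: (n_samples, n_features) with the confidence as first column, y: binary labels
calib = BetaCalibrationDependent(method='variational', vi_epochs=500)
calib.fit(X, y)

# calibrated confidences; for 'variational'/'mcmc', num_samples stochastic forward
# passes are drawn from the weight posterior (assumed usage of the transform signature)
calibrated = calib.transform(X, num_samples=100)

# persist and restore the fitted model via torch dumps
calib.save_model('beta_dependent_calibration.pt')
restored = BetaCalibrationDependent()
restored.load_model('beta_dependent_calibration.pt')
```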