netcal.scaling.BetaCalibrationDependent¶
- class netcal.scaling.BetaCalibrationDependent(*args, method: str = 'momentum', **kwargs)¶
This calibration method uses a multivariate variant of a Beta distribution to obtain a calibration mapping by means of the confidence as well as additional features. This method is originally proposed by [1]. This calibration scheme tries to model several dependencies in the variables given by the input X. It is necessary to provide all data in the input parameter X as a NumPy array of shape (n_samples, n_features), where the confidence must be the first feature given in the input array (see the example below). The ground-truth samples y must be an array of shape (n_samples,) consisting of binary labels \(y \in \{0, 1\}\). These labels indicate whether the corresponding sample has matched a ground-truth box (\(\text{m}=1\)) or is a false prediction (\(\text{m}=0\)).

Mathematical background: For confidence calibration in classification tasks, a confidence mapping \(g\) is applied on top of a miscalibrated scoring classifier \(\hat{p} = h(x)\) to deliver a calibrated confidence score \(\hat{q} = g(h(x))\).
For detection calibration, we can also use the additional box regression output which we denote as \(\hat{r} \in [0, 1]^J\) with \(J\) as the number of dimensions used for the box encoding (e.g. \(J=4\) for x position, y position, width and height). Therefore, the calibration map is not only a function of the confidence score, but also of \(\hat{r}\). To define a general calibration map for binary problems, we use the logistic function and the combined input \(s = (\hat{p}, \hat{r})\) of size K by
\[g(s) = \frac{1}{1 + \exp(-z(s))} .\]
According to [1], we can interpret the logit \(z\) as the logarithm of the posterior odds
\[z(s) = \log \frac{f(\text{m}=1 | s)}{f(\text{m}=0 | s)} \approx \log \frac{f(s | \text{m}=1)}{f(s | \text{m}=0)} = \ell r(s) .\]
For a multivariate probability density function \(f(s|\text{m})\), we use a variant of the beta distribution described in [2] and given by
\[f(s|\text{m}) = \frac{1}{B(\alpha_0, ..., \alpha_K)} \frac{\prod^K_{k=1} \lambda_k^{\alpha_k}(s_k^\ast)^{\alpha_k - 1} \Big(\frac{s_k^\ast}{s_k}\Big)^2} {\Big[1 + \sum^K_{k=1} \lambda_k s_k^\ast\Big]^{\sum^K_{k=0} \alpha_k} }\]
with shape parameters \(\alpha_k, \beta_k > 0\), \(\forall k \in \{0, ..., K \}\). For ease of notation, we denote \(\lambda_k=\frac{\beta_k}{\beta_0}\) and \(s^\ast=\frac{s}{1-s}\) (applied element-wise). Inserting this density function into this framework with \(\alpha_k^+\), \(\beta_k^+\) and \(\alpha_k^-\), \(\beta_k^-\) as the distribution parameters for \(\text{m}=1\) and \(\text{m}=0\), respectively, we get a likelihood ratio of
\[\begin{split}\ell r(s) &= \sum^K_{k=1} \Big[\alpha_k^+ \log(\lambda_k^+) - \alpha_k^- \log(\lambda_k^-)\Big] \\ &+ \sum^K_{k=1} (\alpha_k^+ - \alpha_k^-) \log(s_k^\ast) \\ &+ \sum^K_{k=0} \alpha_k^- \log\Bigg[1 + \sum^K_{j=1} \lambda_j^- s^\ast_j\Bigg] \\ &- \sum^K_{k=0} \alpha_k^+ \log\Bigg[1 + \sum^K_{j=1} \lambda_j^+ s^\ast_j\Bigg] \\ &+ c ,\end{split}\]
where \(c=\log B(\alpha_0^-, ..., \alpha_K^-) - \log B(\alpha_0^+, ..., \alpha_K^+)\).
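For illustration, the following NumPy sketch evaluates this likelihood ratio and the resulting calibration map \(g(s)\) for fixed shape parameters. It assumes that \(B(\cdot)\) is the multivariate beta function (so \(\log B\) can be computed via gammaln) and mirrors the formulas above rather than the library's internal implementation; the parameter values are arbitrary placeholders.

```python
import numpy as np
from scipy.special import gammaln

def log_multivariate_beta(alpha):
    # log B(alpha_0, ..., alpha_K) = sum_k log Gamma(alpha_k) - log Gamma(sum_k alpha_k)
    return np.sum(gammaln(alpha)) - gammaln(np.sum(alpha))

def likelihood_ratio(s, alpha_pos, beta_pos, alpha_neg, beta_neg):
    # s: shape (K,), entries in (0, 1); s[0] is the confidence, the rest are box features
    # alpha_*, beta_*: shape (K+1,) holding the shape parameters alpha_0..alpha_K, beta_0..beta_K
    s_star = s / (1.0 - s)                   # s* = s / (1 - s), element-wise
    lam_pos = beta_pos[1:] / beta_pos[0]     # lambda_k = beta_k / beta_0
    lam_neg = beta_neg[1:] / beta_neg[0]

    lr = np.sum(alpha_pos[1:] * np.log(lam_pos) - alpha_neg[1:] * np.log(lam_neg))
    lr += np.sum((alpha_pos[1:] - alpha_neg[1:]) * np.log(s_star))
    lr += np.sum(alpha_neg) * np.log1p(np.sum(lam_neg * s_star))
    lr -= np.sum(alpha_pos) * np.log1p(np.sum(lam_pos * s_star))

    # constant c = log B(alpha^-) - log B(alpha^+)
    return lr + log_multivariate_beta(alpha_neg) - log_multivariate_beta(alpha_pos)

def calibration_map(s, *params):
    # g(s) = sigmoid(lr(s))
    return 1.0 / (1.0 + np.exp(-likelihood_ratio(s, *params)))

# arbitrary placeholder parameters for K = 5 features (confidence + 4 box features)
rng = np.random.default_rng(0)
alpha_pos, beta_pos = rng.uniform(0.5, 2.0, 6), rng.uniform(0.5, 2.0, 6)
alpha_neg, beta_neg = rng.uniform(0.5, 2.0, 6), rng.uniform(0.5, 2.0, 6)
print(calibration_map(np.array([0.7, 0.4, 0.6, 0.2, 0.3]), alpha_pos, beta_pos, alpha_neg, beta_neg))
```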
The estimation of these distribution parameters is performed by an Adam optimizer with a learning rate of 1e-3 and a batch size of 256 for 1000 iterations (default).
This implementation is also able to capture the epistemic uncertainty of the calibration method [3].
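As an illustration of the input layout described above, the following sketch builds X with the confidence as the first column followed by four hypothetical box features and fits the calibration mapping; the random data is a placeholder and not part of the library.

```python
import numpy as np
from netcal.scaling import BetaCalibrationDependent

n_samples = 1000

# placeholder detection output: the confidence must be the first feature,
# followed by additional (normalized) box features, e.g. cx, cy, width, height
confidence = np.random.uniform(0.0, 1.0, size=(n_samples, 1))
boxes = np.random.uniform(0.0, 1.0, size=(n_samples, 4))

X = np.concatenate([confidence, boxes], axis=1)   # shape: (n_samples, n_features)
y = np.random.randint(0, 2, size=n_samples)       # binary labels: matched (1) / false prediction (0)

beta = BetaCalibrationDependent(method='momentum')
beta.fit(X, y)
calibrated = beta.transform(X)                    # calibrated confidence estimates
```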
- Parameters:
method (str, default: "momentum") – Method that is used to obtain a calibration mapping (see the usage sketch after this parameter list):
  - 'mle': Maximum likelihood estimate without uncertainty using a convex optimizer.
  - 'momentum': MLE estimate using Momentum optimizer for non-convex optimization.
  - 'variational': Variational Inference with uncertainty.
  - 'mcmc': Markov-Chain Monte-Carlo sampling with uncertainty.
momentum_epochs (int, optional, default: 1000) – Number of epochs used by momentum optimizer.
mcmc_steps (int, optional, default: 20) – Number of weight samples obtained by MCMC sampling.
mcmc_chains (int, optional, default: 1) – Number of Markov-chains used in parallel for MCMC sampling (this will result in mcmc_steps * mcmc_chains samples).
mcmc_warmup_steps (int, optional, default: 100) – Warmup steps used for MCMC sampling.
vi_epochs (int, optional, default: 1000) – Number of epochs used for ELBO optimization.
independent_probabilities (bool, optional, default: False) – Boolean for multi-class probabilities. If set to True, the probability estimates for each class are treated as independent of each other (sigmoid).
use_cuda (str or bool, optional, default: False) – Specify if CUDA should be used. If str, you can also specify the device number like ‘cuda:0’, etc.
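As referenced in the description of the method parameter, a brief sketch of how these constructor options might be combined; the concrete values are illustrative only.

```python
from netcal.scaling import BetaCalibrationDependent

# plain MLE with the (default) momentum optimizer
calib_mle = BetaCalibrationDependent(method='momentum', momentum_epochs=1000)

# variational inference with uncertainty, computed on the first CUDA device
calib_vi = BetaCalibrationDependent(method='variational', vi_epochs=1000, use_cuda='cuda:0')

# MCMC sampling with uncertainty: 4 chains x 20 steps = 80 weight samples
calib_mcmc = BetaCalibrationDependent(method='mcmc', mcmc_chains=4, mcmc_steps=20, mcmc_warmup_steps=100)
```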
References
Methods
__init__(*args[, method]) – Create an instance of BetaCalibrationDependent.
clear() – Clear model parameters.
convex(data, y, tensorboard, log_dir) – Convex optimization to find the global optimum of current parameter search.
epsilon(dtype) – Get the smallest digit that is representable depending on the passed dtype (NumPy or PyTorch).
fit(X, y[, random_state, tensorboard, log_dir]) – Build logistic calibration model either conventionally with a single MLE estimate or with Variational Inference (VI) or Markov-Chain Monte-Carlo (MCMC) sampling to also obtain uncertainty estimates.
fit_transform(X[, y]) – Fit to data, then transform it.
get_metadata_routing() – Get metadata routing of this object.
get_params([deep]) – Get parameters for this estimator.
guide([X, y]) – Variational substitution definition for each parameter.
load_model(filename) – Load model from saved torch dump.
mask() – Seek for all relevant weights whose values are negative.
mcmc(data, y, tensorboard, log_dir) – Perform Markov-Chain Monte-Carlo sampling on the (unknown) posterior.
model([X, y]) – Definition of the log regression model.
momentum(data, y, tensorboard, log_dir) – Momentum optimization to find the global optimum of current parameter search.
prepare(X) – Preprocessing of input data, called at the beginning of the fit-function.
prior(dtype) – Prior definition of the weights used for log regression.
save_model(filename) – Save model instance with torch's save function, as this is safer for torch tensors.
set_fit_request(*[, log_dir, random_state, ...]) – Request metadata passed to the fit method.
set_output(*[, transform]) – Set output container.
set_params(**params) – Set the parameters of this estimator.
set_transform_request(*[, mean_estimate, ...]) – Request metadata passed to the transform method.
to(device) – Set distribution parameters to the desired device in order to compute either on CPU or GPU.
transform(X[, num_samples, random_state, ...]) – After model calibration, this function is used to get calibrated outputs of uncalibrated confidence estimates.
variational(data, y, tensorboard, log_dir) – Perform variational inference using the guide.
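A typical end-to-end workflow combining the methods above might look as follows; the num_samples argument to transform and the file name are illustrative assumptions, and X, y are reused from the earlier data-layout sketch.

```python
# X: (n_samples, n_features) with the confidence as first column, y: binary labels
calib = BetaCalibrationDependent(method='variational', vi_epochs=500)
calib.fit(X, y)

# calibrated confidences; for 'variational'/'mcmc', num_samples stochastic forward
# passes are drawn from the weight posterior (assumed usage of the transform signature)
calibrated = calib.transform(X, num_samples=100)

# persist and restore the fitted model via torch dumps
calib.save_model('beta_dependent_calibration.pt')
restored = BetaCalibrationDependent()
restored.load_model('beta_dependent_calibration.pt')
```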