netcal.metrics¶
Metrics Package to Measure Miscalibration¶
Methods for measuring miscalibration in the context of confidence calibration and regression uncertainty calibration.
The common methods for confidence calibration evaluation are netcal.metrics.confidence.ECE (ECE), netcal.metrics.confidence.MCE (MCE), and netcal.metrics.confidence.ACE (ACE). Each method bins the samples by their confidence and measures the accuracy within each bin. The ECE reports the average gap between confidence and observed accuracy across bins, weighted by the number of samples per bin. The MCE reports the largest gap observed over all bins. The ACE is similar to the ECE but weights each bin equally.
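For example, the binning-based confidence metrics share a common interface: construct the metric with a number of bins and call measure() on the predicted confidences and the ground-truth labels. A minimal sketch with placeholder data (the random arrays below only illustrate the expected shapes):

    import numpy as np
    from netcal.metrics import ACE, ECE, MCE

    # placeholder data: (n_samples, n_classes) confidence scores and integer class labels
    confidences = np.random.dirichlet(np.ones(10), size=1000)   # rows sum to 1
    ground_truth = np.random.randint(0, 10, size=1000)

    n_bins = 10
    ece = ECE(n_bins)   # mean gap, weighted by the number of samples per bin
    mce = MCE(n_bins)   # largest gap over all bins
    ace = ACE(n_bins)   # mean gap, each bin weighted equally

    print("ECE:", ece.measure(confidences, ground_truth))
    print("MCE:", mce.measure(confidences, ground_truth))
    print("ACE:", ace.measure(confidences, ground_truth))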
The common methods for regression uncertainty evaluation are netcal.metrics.regression.PinballLoss (Pinball loss), netcal.metrics.regression.NLL (NLL), and netcal.metrics.regression.QCE (M-QCE and C-QCE). The Pinball loss as well as the Marginal/Conditional Quantile Calibration Error (M-QCE and C-QCE) evaluate the quality of the estimated quantiles compared to the observed ground-truth quantile coverage. The NLL is a proper scoring rule that measures the overall quality of the predicted probability distributions.
For a detailed description of the available metrics within regression calibration, see the module doc of netcal.regression.
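The regression metrics operate on the predicted distributions, e.g. Gaussians given as a (mean, stddev) tuple. The following sketch assumes this input convention as well as the quantile argument q and the marginal flag of the quantile-based metrics; these parameter names should be checked against the API reference of the respective class:

    import numpy as np
    from netcal.metrics import NLL, PinballLoss, QCE

    # placeholder probabilistic regression output: per-sample Gaussian mean and stddev
    mean = np.random.randn(1000)
    stddev = np.abs(np.random.randn(1000)) + 0.1
    ground_truth = np.random.randn(1000)

    # quantile levels at which the quantile-based metrics are evaluated
    quantiles = np.linspace(0.05, 0.95, 19)

    nll = NLL()                # proper scoring rule on the full predicted distribution
    pinball = PinballLoss()    # quantile (pinball) regression loss
    qce = QCE(marginal=True)   # M-QCE; marginal=False would yield the binned C-QCE (assumption)

    print("NLL:", nll.measure((mean, stddev), ground_truth))
    print("Pinball:", pinball.measure((mean, stddev), ground_truth, q=quantiles))
    print("M-QCE:", qce.measure((mean, stddev), ground_truth, q=quantiles))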
Available classes¶
netcal.metrics.confidence.ACE: Average Calibration Error (ACE) for classification and Detection Average Calibration Error (D-ACE) for object detection or segmentation.
netcal.metrics.confidence.ECE: Expected Calibration Error (ECE) for classification and Detection Expected Calibration Error (D-ECE) for object detection or segmentation.
netcal.metrics.confidence.MCE: Maximum Calibration Error (MCE) for classification and Detection Maximum Calibration Error (D-MCE) for object detection or segmentation.
netcal.metrics.confidence.MMCE: Maximum Mean Calibration Error (MMCE).
netcal.metrics.regression.QuantileLoss: Pinball aka quantile loss within regression calibration to test for quantile calibration of a probabilistic regression model.
netcal.metrics.regression.PinballLoss: Synonym for Quantile loss.
netcal.metrics.regression.ENCE: Expected Normalized Calibration Error (ENCE) for regression calibration evaluation to test for variance calibration.
netcal.metrics.regression.UCE: Uncertainty Calibration Error (UCE) for regression calibration evaluation to test for variance calibration.
netcal.metrics.regression.PICP: Compute Prediction Interval Coverage Probability (PICP) and Mean Prediction Interval Width (MPIW).
netcal.metrics.regression.QCE: Marginal Quantile Calibration Error (M-QCE) and Conditional Quantile Calibration Error (C-QCE), which both measure the gap between predicted quantiles and observed quantile coverage, also for multivariate distributions.
netcal.metrics.regression.NLL: Negative log likelihood (NLL) for probabilistic regression models.
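The detection variants listed above (D-ECE, D-MCE, D-ACE) bin not only over the confidence but also over additional box features such as the relative center position. A sketch for the D-ECE, assuming the detection=True constructor flag and a feature matrix of confidence plus relative (cx, cy) coordinates:

    import numpy as np
    from netcal.metrics import ECE

    # placeholder detections: confidence plus relative box center (cx, cy) per detection;
    # matched marks whether a detection hits a ground-truth object (binary)
    confidence = np.random.uniform(size=2000)
    cx = np.random.uniform(size=2000)
    cy = np.random.uniform(size=2000)
    features = np.stack([confidence, cx, cy], axis=1)
    matched = np.random.randint(0, 2, size=2000)

    # one bin count per feature dimension; detection=True switches from ECE to D-ECE
    dece = ECE(bins=[10, 10, 10], detection=True)
    print("D-ECE:", dece.measure(features, matched))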
Packages¶
netcal.metrics.confidence (Metrics for Confidence Calibration): Methods for measuring miscalibration in the context of confidence calibration. The common methods for confidence calibration evaluation are netcal.metrics.confidence.ECE (ECE), netcal.metrics.confidence.MCE (MCE), and netcal.metrics.confidence.ACE (ACE). Each method bins the samples by their confidence and measures the accuracy within each bin. The ECE reports the average gap between confidence and observed accuracy across bins, weighted by the number of samples per bin. The MCE reports the largest gap observed over all bins. The ACE is similar to the ECE but weights each bin equally. A further metric is the Maximum Mean Calibration Error (MMCE), a differentiable variant of the ECE that can also be used as a regularization technique during model training.
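Because the MMCE is kernel-based rather than binning-based, it is constructed without a bin count. A minimal sketch, assuming the default constructor and the common measure(X, y) interface of the confidence metrics:

    import numpy as np
    from netcal.metrics import MMCE

    # placeholder binary-classification confidences and labels
    confidences = np.random.uniform(size=1000)
    labels = np.random.randint(0, 2, size=1000)

    mmce = MMCE()  # differentiable, kernel-based miscalibration estimate
    print("MMCE:", mmce.measure(confidences, labels))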
netcal.metrics.regression (Metrics for Regression Uncertainty Calibration): Methods for measuring miscalibration in the context of regression uncertainty calibration for probabilistic regression models. The common methods for regression uncertainty evaluation are netcal.metrics.regression.PinballLoss (Pinball loss), netcal.metrics.regression.NLL (NLL), and netcal.metrics.regression.QCE (M-QCE and C-QCE). The Pinball loss as well as the Marginal/Conditional Quantile Calibration Error (M-QCE and C-QCE) evaluate the quality of the estimated quantiles compared to the observed ground-truth quantile coverage. The NLL is a proper scoring rule that measures the overall quality of the predicted probability distributions. Further metrics are netcal.metrics.regression.UCE (UCE) and netcal.metrics.regression.ENCE (ENCE), which both apply a binning scheme over the predicted standard deviation/variance and test for variance calibration. For a detailed description of the available metrics within regression calibration, see the module doc of netcal.regression.
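For the variance-calibration metrics mentioned above, a sketch under the assumption that ENCE and UCE take a bin count and the same (mean, stddev) input as the other regression metrics:

    import numpy as np
    from netcal.metrics import ENCE, UCE

    # placeholder Gaussian regression output and ground-truth targets
    mean = np.random.randn(1000)
    stddev = np.abs(np.random.randn(1000)) + 0.1
    ground_truth = np.random.randn(1000)

    ence = ENCE(bins=10)  # bins over the predicted stddev and compares RMSE vs. mean stddev per bin
    uce = UCE(bins=10)    # bins over the predicted variance and compares MSE vs. mean variance per bin

    print("ENCE:", ence.measure((mean, stddev), ground_truth))
    print("UCE:", uce.measure((mean, stddev), ground_truth))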