netcal.metrics.confidence.ACE
- class netcal.metrics.confidence.ACE(bins: int | Iterable[int] = 10, equal_intervals: bool = True, detection: bool = False, sample_threshold: int = 1)
- Average Calibration Error (ACE) for classification and Detection Average Calibration Error (D-ACE) for object detection or segmentation. This metric is used for classification [1] or, as D-ACE [2], for object detection tasks. It measures the average difference between accuracy and confidence by grouping all samples into \(B\) bins and calculating

  \[ACE = \frac{1}{B} \sum_{b=1}^B |\text{acc}(b) - \text{conf}(b)| ,\]

  where \(\text{acc}(b)\) and \(\text{conf}(b)\) denote the accuracy and average confidence in the b-th bin. The main difference to netcal.metrics.confidence.ECE is that each bin is weighted equally.

- Parameters:
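As a rough illustration of the formula above, here is a minimal pure-Python sketch (not netcal's implementation; the function name and the clamping of confidence 1.0 into the last bin are my own choices):

```python
def average_calibration_error(confidences, correct, n_bins=10):
    """Average |accuracy - confidence| over the non-empty equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # map a confidence in [0, 1] to a bin index; clamp 1.0 into the last bin
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    gaps = []
    for samples in bins:
        if not samples:  # empty bins are skipped (cf. sample_threshold)
            continue
        avg_conf = sum(c for c, _ in samples) / len(samples)
        accuracy = sum(h for _, h in samples) / len(samples)
        gaps.append(abs(accuracy - avg_conf))
    return sum(gaps) / len(gaps)
```

Unlike ECE, each non-empty bin contributes with the same weight 1/B, regardless of how many samples fall into it.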
- bins (int or iterable, default: 10) – Number of bins used by the ACE. In detection mode: if int, use the same number of bins for each dimension (nx1 = nx2 = … = bins). If iterable, use a different number of bins for each dimension (nx1, nx2, … = bins).
- equal_intervals (bool, optional, default: True) – If True, the bins have the same width. If False, the bins are split so that each bin contains roughly the same number of samples.
- detection (bool, default: False) – If False, the input array ‘X’ is treated as multi-class confidence input (softmax) with shape (n_samples, [n_classes]). If True, the input array ‘X’ is treated as box predictions with several box features (at least the box confidence must be present) with shape (n_samples, [n_box_features]).
- sample_threshold (int, optional, default: 1) – Bins with fewer samples than this threshold are not included in the miscalibration metrics.
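To illustrate what equal_intervals toggles, here is a small sketch of the two binning schemes (a simplified illustration, not netcal's internal binning code; both helper names are hypothetical):

```python
def equal_width_edges(n_bins):
    # equal_intervals=True: every bin spans the same confidence range
    return [i / n_bins for i in range(n_bins + 1)]

def equal_frequency_edges(confidences, n_bins):
    # equal_intervals=False: inner edges are placed at empirical quantiles,
    # so each bin holds roughly the same number of samples
    ordered = sorted(confidences)
    step = len(ordered) / n_bins
    inner = [ordered[int(round(i * step)) - 1] for i in range(1, n_bins)]
    return [0.0] + inner + [1.0]
```

With equal-frequency bins, a metric that weights bins equally (like ACE) is less dominated by sparsely populated confidence regions.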
 
- References

- Methods

  - __init__([bins, equal_intervals, detection, ...]) – Constructor.
  - binning(bin_bounds, samples, *values[, nan]) – Perform binning on value (and all additional values passed) based on samples.
  - frequency(X, y[, batched, uncertainty]) – Measure the frequency of each point by binning.
  - measure(X, y[, batched, uncertainty, ...]) – Measure calibration by given predictions with confidence and the according ground truth.
  - prepare(X, y[, batched, uncertainty]) – Check input data.
  - process(metric, acc_hist, conf_hist, ...) – Determine miscalibration based on passed histograms.
  - reduce(histogram, distribution, axis[, ...]) – Calculate the weighted mean on a given histogram based on a dedicated data distribution.