netcal.metrics.ACE

class netcal.metrics.ACE(bins: int | Iterable[int] = 10, equal_intervals: bool = True, detection: bool = False, sample_threshold: int = 1)

Average Calibration Error (ACE) for classification and Detection Average Calibration Error (D-ACE) for object detection or segmentation. This metric is used for classification [1] or, as D-ACE [2], for object detection tasks. It measures the average difference between accuracy and confidence by grouping all samples into \(B\) bins and calculating

\[ACE = \frac{1}{B} \sum_{b=1}^B |\text{acc}(b) - \text{conf}(b)| ,\]

where \(\text{acc}(b)\) and \(\text{conf}(b)\) denote the accuracy and average confidence in the \(b\)-th bin. The main difference to netcal.metrics.ECE is that each bin is weighted equally.
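To illustrate the formula, the following is a minimal NumPy sketch (not the library implementation) of equal-weight binning: `ace_sketch` is a hypothetical helper, and it assumes `confidences` holds the top-1 confidence per sample and `labels` a binary indicator of whether the top-1 prediction was correct.

```python
import numpy as np

def ace_sketch(confidences: np.ndarray, labels: np.ndarray, n_bins: int = 10) -> float:
    """Illustrative ACE: mean |acc(b) - conf(b)| over non-empty bins.

    Each bin contributes equally, unlike ECE, which weights bins
    by their sample count.
    """
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each sample to a bin by its confidence (interior edges only).
    bin_ids = np.clip(np.digitize(confidences, bin_edges[1:-1]), 0, n_bins - 1)

    gaps = []
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():                 # skip empty bins (cf. sample_threshold)
            continue
        acc = labels[mask].mean()          # acc(b): fraction of correct predictions
        conf = confidences[mask].mean()    # conf(b): average confidence
        gaps.append(abs(acc - conf))

    return float(np.mean(gaps))
```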

Parameters:
  • bins (int or iterable, default: 10) – Number of bins used by the ACE. In detection mode: if int, use the same number of bins for each dimension (nx1 = nx2 = … = bins). If iterable, use a different number of bins for each dimension (nx1, nx2, … = bins).

  • equal_intervals (bool, optional, default: True) – If True, the bins have the same width. If False, the bins are split so that each bin holds the same number of samples.

  • detection (bool, default: False) – If False, the input array ‘X’ is treated as multi-class confidence input (softmax) with shape (n_samples, [n_classes]). If True, the input array ‘X’ is treated as box predictions with several box features (at least the box confidence must be present) with shape (n_samples, [n_box_features]).

  • sample_threshold (int, optional, default: 1) – Bins with fewer samples than this threshold are not included in the miscalibration metric.
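A typical classification call with the parameters above might look as follows; the arrays are random placeholders standing in for real softmax scores and ground-truth labels:

```python
import numpy as np
from netcal.metrics import ACE

np.random.seed(0)

# Placeholder data: 1000 samples, 5 classes.
logits = np.random.randn(1000, 5)
X = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax scores
y = np.random.randint(0, 5, size=1000)                          # ground-truth labels

ace = ACE(bins=10)         # 10 equal-width confidence bins
score = ace.measure(X, y)  # average |accuracy - confidence| over the bins
print(score)
```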

References

[1] Neumann, Lukas, Andrew Zisserman, and Andrea Vedaldi: “Relaxed Softmax: Efficient Confidence Auto-Calibration for Safe Pedestrian Detection.” Conference on Neural Information Processing Systems (NIPS) Workshop MLITS, 2018.

[2] Küppers, Fabian, Jan Kronenberger, Amirhossein Shantia, and Anselm Haselhoff: “Multivariate Confidence Calibration for Object Detection.” IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2020.

Methods

__init__([bins, equal_intervals, detection, ...])
    Constructor.

binning(bin_bounds, samples, *values[, nan])
    Perform binning on value (and all additional values passed) based on samples.

frequency(X, y[, batched, uncertainty])
    Measure the frequency of each point by binning.

measure(X, y[, batched, uncertainty, ...])
    Measure calibration for the given confidence predictions and the corresponding ground truth.

prepare(X, y[, batched, uncertainty])
    Check input data.

process(metric, acc_hist, conf_hist, ...)
    Determine miscalibration based on passed histograms.

reduce(histogram, distribution, axis[, ...])
    Calculate the weighted mean on a given histogram based on a dedicated data distribution.
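In detection mode, measure expects the box confidence in the first column of ‘X’, with any further columns used as additional binning dimensions. A sketch with made-up arrays: the choice of relative cx/cy as extra features and all numbers are placeholders for illustration only.

```python
import numpy as np
from netcal.metrics import ACE

n_samples = 500

# Column 0 must be the box confidence; further columns are extra box
# features used as additional binning dimensions (here: relative cx, cy).
confidence = np.random.uniform(0.2, 1.0, size=n_samples)
cx = np.random.uniform(0.0, 1.0, size=n_samples)
cy = np.random.uniform(0.0, 1.0, size=n_samples)
X = np.stack([confidence, cx, cy], axis=1)   # shape: (n_samples, 3)

matched = np.random.binomial(1, confidence)  # 1 if the box matched a ground-truth object

# Different number of bins per dimension, as described for 'bins' above;
# bins with fewer than 5 samples are excluded via sample_threshold.
d_ace = ACE(bins=[10, 5, 5], detection=True, sample_threshold=5)
score = d_ace.measure(X, matched)
print(score)
```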