netcal.metrics.confidence.ECE

class netcal.metrics.confidence.ECE(bins: int | Iterable[int] = 10, equal_intervals: bool = True, detection: bool = False, sample_threshold: int = 1)

Expected Calibration Error (ECE) for classification [1] and Detection Expected Calibration Error (D-ECE) for object detection or segmentation [2]. The metric measures the expected difference between accuracy (or precision) and confidence by grouping all \(N\) samples into \(B\) bins, so that the ECE is computed as

\[ECE = \sum_{b=1}^B \frac{N_b}{N} |\text{acc}(b) - \text{conf}(b)| ,\]

where \(\text{acc}(b)\) and \(\text{conf}(b)\) denote the accuracy and the average confidence in the \(b\)-th bin, and \(N_b\) denotes the number of samples in bin \(b\).
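The formula above can be sketched in a few lines of dependency-free Python. This is an illustrative re-implementation for the classification case with equal-width bins, not the library's own code; the function name `expected_calibration_error` and the bin-edge convention (a confidence of exactly 1.0 is placed in the last bin) are assumptions made for this example.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width ECE sketch: sum_b (N_b / N) * |acc(b) - conf(b)|."""
    n = len(confidences)
    # Per-bin accumulators: summed confidence, number of correct
    # predictions, and number of samples N_b.
    conf_sum = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins

    for c, ok in zip(confidences, correct):
        # Map a confidence in [0, 1] to a bin index (assumed convention:
        # 1.0 is clipped into the last bin).
        b = min(int(c * n_bins), n_bins - 1)
        conf_sum[b] += c
        hits[b] += int(ok)
        counts[b] += 1

    ece = 0.0
    for b in range(n_bins):
        if counts[b] == 0:
            continue  # empty bins have weight N_b / N = 0
        acc = hits[b] / counts[b]        # acc(b)
        conf = conf_sum[b] / counts[b]   # conf(b)
        ece += counts[b] / n * abs(acc - conf)
    return ece


# Two confident, correct predictions and two uncertain ones (one wrong).
score = expected_calibration_error([0.95, 0.95, 0.55, 0.55], [1, 1, 1, 0])
```

Here `correct` holds 1 where the predicted label matches the ground truth and 0 otherwise, so each bin's accuracy is simply the mean of its entries.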

Parameters:
  • bins (int or iterable, default: 10) – Number of bins used by the Expected Calibration Error. In detection mode: if int, use the same number of bins for each dimension (nx1 = nx2 = … = bins). If iterable, use a different number of bins for each dimension (nx1, nx2, … = bins).

  • equal_intervals (bool, optional, default: True) – If True, all bins have the same width. If False, the bin boundaries are chosen so that each bin contains the same number of samples.

  • detection (bool, default: False) – If False, the input array ‘X’ is treated as multi-class confidence input (softmax) with shape (n_samples, [n_classes]). If True, the input array ‘X’ is treated as box predictions with several box features (at least the box confidence must be present) with shape (n_samples, [n_box_features]).

  • sample_threshold (int, optional, default: 1) – Bins containing fewer samples than this threshold are excluded from the miscalibration metric.
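
To make the effect of equal_intervals and sample_threshold concrete, the following plain-Python sketch builds either equal-width or equal-frequency bin edges and skips sparsely populated bins. It is a hypothetical illustration, not the library's implementation: the helper names `make_edges` and `thresholded_ece`, the quantile-style edge placement, and the choice to renormalize the bin weights over the retained samples are all assumptions of this example.

```python
from bisect import bisect_right


def make_edges(confidences, n_bins, equal_intervals=True):
    """Return n_bins + 1 bin edges on [0, 1]."""
    if equal_intervals:
        # Equal-width bins.
        return [i / n_bins for i in range(n_bins + 1)]
    # Equal-frequency bins: inner edges sit at evenly spaced positions
    # of the sorted confidences (assumed quantile convention).
    ordered = sorted(confidences)
    n = len(ordered)
    inner = [ordered[(i * n) // n_bins] for i in range(1, n_bins)]
    return [0.0] + inner + [1.0]


def thresholded_ece(confidences, correct, n_bins=10,
                    equal_intervals=True, sample_threshold=1):
    edges = make_edges(confidences, n_bins, equal_intervals)
    # Per bin: [summed confidence, correct count, sample count].
    stats = [[0.0, 0, 0] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        # Locate the bin whose left edge is <= c; clip to a valid index.
        b = min(max(bisect_right(edges, c) - 1, 0), n_bins - 1)
        stats[b][0] += c
        stats[b][1] += int(ok)
        stats[b][2] += 1

    # Keep only bins that reach the sample threshold (and are non-empty).
    kept = [s for s in stats if s[2] >= max(sample_threshold, 1)]
    total = sum(s[2] for s in kept)
    if total == 0:
        return 0.0
    # Assumption: weights are renormalized over the retained samples.
    return sum(s[2] / total * abs(s[1] / s[2] - s[0] / s[2]) for s in kept)
```

With equal_intervals=False, bin edges follow the data, so regions with many samples receive narrower bins; raising sample_threshold keeps a handful of outlying samples from contributing a noisy per-bin estimate.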

References

Methods

__init__([bins, equal_intervals, detection, ...])

Constructor.

binning(bin_bounds, samples, *values[, nan])

Perform binning on value (and all additional values passed) based on samples.

frequency(X, y[, batched, uncertainty])

Measure the frequency of each point by binning.

measure(X, y[, batched, uncertainty, ...])

Measure calibration from the given predictions (with confidence) and the corresponding ground truth.

prepare(X, y[, batched, uncertainty])

Check input data.

process(metric, acc_hist, conf_hist, ...)

Determine miscalibration based on passed histograms.

reduce(histogram, distribution, axis[, ...])

Calculate the weighted mean on a given histogram based on a dedicated data distribution.