netcal.metrics.confidence.ECE

class netcal.metrics.confidence.ECE(bins: int | Iterable[int] = 10, equal_intervals: bool = True, detection: bool = False, sample_threshold: int = 1)

Expected Calibration Error (ECE) for classification [1] and Detection Expected Calibration Error (D-ECE) for object detection or segmentation [2]. This metric measures the expected difference between accuracy/precision and confidence by grouping all \(N\) samples into \(B\) bins, so that the ECE is computed by

\[ECE = \sum_{b=1}^B \frac{N_b}{N} |\text{acc}(b) - \text{conf}(b)| ,\]

where \(\text{acc}(b)\) and \(\text{conf}(b)\) denote the accuracy and average confidence in bin \(b\), and \(N_b\) denotes the number of samples in bin \(b\).
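The weighted sum above can be sketched in plain Python. This is a minimal illustration for equal-width bins on confidences in [0, 1], not the library's implementation; the function name `expected_calibration_error` and its arguments are assumptions made for this example. It expects one confidence per sample and a flag saying whether the prediction was correct.

```python
def expected_calibration_error(confidences, correct, bins=10):
    """Equal-width-bin ECE sketch (hypothetical helper, not netcal's code)."""
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one sample")

    # Per-bin accumulators: sample count, summed confidence, summed correctness.
    counts = [0] * bins
    conf_sums = [0.0] * bins
    acc_sums = [0.0] * bins

    for conf, hit in zip(confidences, correct):
        # Map a confidence in [0, 1] to a bin index; conf == 1.0 goes to the last bin.
        b = min(int(conf * bins), bins - 1)
        counts[b] += 1
        conf_sums[b] += conf
        acc_sums[b] += float(hit)

    ece = 0.0
    for b in range(bins):
        if counts[b] == 0:
            continue  # empty bins contribute nothing to the sum
        acc_b = acc_sums[b] / counts[b]    # acc(b)
        conf_b = conf_sums[b] / counts[b]  # conf(b)
        ece += counts[b] / n * abs(acc_b - conf_b)  # (N_b / N) * |acc(b) - conf(b)|
    return ece


# Two samples at confidence 0.95 (both correct), two at 0.55 (one correct).
example_ece = expected_calibration_error([0.95, 0.95, 0.55, 0.55], [1, 1, 1, 0])
```

In this example each occupied bin holds half of the samples and has a gap of 0.05 between accuracy and mean confidence, so the result is approximately 0.05. Samples that sit exactly on a bin edge may land in the neighbouring bin because of floating-point rounding in `int(conf * bins)`.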

Parameters:

bins (int or iterable, default: 10) – Number of bins used by the Expected Calibration Error. In detection mode: if int, use the same number of bins for each dimension (nx1 = nx2 = … = bins). If iterable, use a different number of bins for each dimension (nx1, nx2, … = bins).
equal_intervals (bool, optional, default: True) – If True, the bins have the same width. If False, the bins are split to equalize the number of samples in each bin.
detection (bool, default: False) – If False, the input array ‘X’ is treated as multi-class confidence input (softmax) with shape (n_samples, [n_classes]). If True, the input array ‘X’ is treated as box predictions with several box features (at least the box confidence must be present) with shape (n_samples, [n_box_features]).
sample_threshold (int, optional, default: 1) – Bins with fewer samples than this threshold are not included in the miscalibration metrics.
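To make the effect of `equal_intervals` and `sample_threshold` concrete, the following one-dimensional sketch builds bin edges in both modes and then lists the bins that would survive the threshold. The helpers `bin_edges` and `kept_bins` are hypothetical names for this illustration; how the library itself places edges or reweights the remaining bins is not stated above and is not modeled here.

```python
import bisect


def bin_edges(confidences, bins=10, equal_intervals=True):
    """Return bins + 1 edges on [0, 1] (hypothetical helper, not netcal's code)."""
    if equal_intervals:
        # Same width for every bin.
        return [i / bins for i in range(bins + 1)]
    # Assumption: equal-frequency edges are taken from the sorted samples,
    # so that each bin receives roughly len(confidences) / bins samples.
    ordered = sorted(confidences)
    n = len(ordered)
    inner = [ordered[(i * n) // bins] for i in range(1, bins)]
    return [0.0] + inner + [1.0]


def kept_bins(confidences, edges, sample_threshold=1):
    """Indices of bins holding at least sample_threshold samples."""
    counts = [0] * (len(edges) - 1)
    for conf in confidences:
        # Bins are half-open [lo, hi); the last bin also includes 1.0.
        b = bisect.bisect_right(edges, conf) - 1
        b = min(max(b, 0), len(counts) - 1)
        counts[b] += 1
    return [b for b, c in enumerate(counts) if c >= sample_threshold]


samples = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
width_edges = bin_edges(samples, bins=4, equal_intervals=True)
freq_edges = bin_edges(samples, bins=4, equal_intervals=False)
width_kept = kept_bins(samples, width_edges, sample_threshold=2)
freq_kept = kept_bins(samples, freq_edges, sample_threshold=2)
```

With equal-width edges the last bin holds only one of the eight samples and is dropped at `sample_threshold=2`, while the equal-frequency edges place two samples in every bin, so all four bins are kept.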

References

Methods

`__init__`([bins, equal_intervals, detection, ...])	Constructor.
`binning`(bin_bounds, samples, *values[, nan])	Perform binning on value (and all additional values passed) based on samples.
`frequency`(X, y[, batched, uncertainty])	Measure the frequency of each point by binning.
`measure`(X, y[, batched, uncertainty, ...])	Measure calibration by given predictions with confidence and the according ground truth.
`prepare`(X, y[, batched, uncertainty])	Check input data.
`process`(metric, acc_hist, conf_hist, ...)	Determine miscalibration based on passed histograms.
`reduce`(histogram, distribution, axis[, ...])	Calculate the weighted mean on a given histogram based on a dedicated data distribution.