Configuration

Thetis needs a YAML configuration file that specifies, among other things, the task, the available classes, and the requested evaluation aspects. Download an exemplary configuration file or copy the following one. An explanation of each configuration aspect can be found below.

Example configuration file

An exemplary YAML configuration for Thetis must have the following form:

# meta data of model predictions and dataset
meta:

  model:
    name: "<model name>"
    revision: "<model revision>"

  dataset:
    name: "<dataset name>"
    revision: "r1"


# Examination task. Can be one of: "classification" (binary/multi-class classification),
# "detection" (image-based object detection) or "regression"
task: "classification"

# Language of the final report. Can be one of: "en", "de"
language: "en"

# Task-specific settings. Required and available fields depend on the selected task.
task_settings:

  # List of distinct classes that can occur within the dataset (only for classification or
  # object detection). If specified, this parameter must not be empty.
  distinct_classes: ["no person", "person"]

  # In binary classification (when 'distinct_classes' has a length of 2), you must specify a positive label
  # out of the list of available classes. This is important since you only give a single "confidence" for each
  # prediction, targeting the probability of the positive class. May only be specified for binary classification.
  binary_positive_label: "person"
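  # Example: with 'binary_positive_label' set to "person", a prediction confidence of 0.8 is
  # interpreted as P("person") = 0.8 and, consequently, P("no person") = 0.2.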

  # Bounding-box format. Can be one of: "xyxy" (xmin, ymin, xmax, ymax), "xywh" (xmin, ymin, width, height),
  # or "cxcywh" (center x, center y, width, height).
  detection_bbox_format: "xyxy"

  # List of IoU scores used for object detection evaluation.
  # Note: the IoU score 0.5 is always evaluated. You can specify additional IoU scores if you want.
  detection_bbox_ious: [0.75]

  # String with bounding box matching strategy. Must be one of: "exclusive", "max".
  detection_bbox_matching: "exclusive"

  # Set to true if the bounding boxes are also inferred with a separate variance score (currently not supported)
  detection_bbox_probabilistic: false

  # In detection mode, it is possible to set a confidence threshold
  # to discard predictions with low confidence
  detection_confidence_thr: 0.2

  # In detection mode, it is possible to specify a tolerance zone outside the image bounds.
  # Boxes extending into this zone are clipped to the image dimensions; boxes exceeding the
  # tolerance raise an error instead.
  detection_bbox_clipping: 20%
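  # The clipping tolerance can be given relative to the image width/height (e.g., 20%) or as an
  # absolute value in pixels (e.g., 20px). If 'detection_bbox_clipping' is omitted, no clipping
  # is applied and any out-of-bounds box raises an error.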

# Settings for the data evaluation routine
data_evaluation:
  examine: true

# Settings for the AI baseline performance evaluation (which should always be performed!)
performance:
  examine: true

# Settings for the evaluation of confidence calibration
uncertainty:
  examine: true

  # Number of bins used for ECE calculation, required for classification and detection evaluation
  ece_bins: 20

  # During ECE/D-ECE computation, bins with a number of samples less than this threshold are ignored
  # Required for classification and detection evaluation
  ece_sample_threshold: 10

  # Number of bins used for D-ECE calculation (object detection), required for detection evaluation
  dece_bins: 5

# Settings for the evaluation of model fairness
fairness:
  examine: true

  # Specify sensitive attributes that are used for fairness evaluation. For each of these attributes,
  # you need to specify the classes for which the attribute is actually valid (out of the labels
  # in the 'distinct_classes' list). You can also leave the value empty or use "all" to mark validity for all classes.
  sensitive_attributes:
    gender: ["no person", "person"]
    age: "all"

General application settings

In the following, we give a detailed overview of all possible general configuration settings.

Meta information settings describing the customer information, model properties, and the dataset used.

Key/Specifier (Dtype)
    Description

meta/model/name (string)
    Name of the AI model used to generate predictions.

meta/model/revision (string)
    Revision of the AI model used to generate predictions.

meta/dataset/name (string)
    Name of the dataset holding the ground truth information.

meta/dataset/revision (string)
    Revision of the dataset holding the ground truth information.

General application settings

Key/Specifier (Dtype)
    Description

task (string)
    Selection of the examination task. Can be one of: "classification" (binary/multi-class
    classification), "detection" (image-based object detection), or "regression".

language (string)
    Language of the final evaluation report. Can be one of: "en" (US English), "de" (German).

task_settings/distinct_classes (list of int or string)
    List of distinct classes that can occur within the dataset. Only to be provided for
    classification or detection.

task_settings/binary_positive_label (int or string)
    In binary classification (when "distinct_classes" has a length of 2), you must specify a
    positive label out of the list of available classes. This is important since you only give a
    single "confidence" for each prediction, targeting the probability of the positive class.

task_settings/detection_bbox_format (string)
    Bounding-box format of the provided boxes in object detection mode. Can be one of: "xyxy"
    (xmin, ymin, xmax, ymax), "xywh" (xmin, ymin, width, height), or "cxcywh" (center x,
    center y, width, height).

task_settings/detection_bbox_ious (list of float)
    List of IoU scores (in the [0, 1] interval) used for object detection evaluation. Note: the
    IoU score 0.5 is always evaluated; additional IoU scores can be specified here.

task_settings/detection_bbox_matching (string)
    Bounding box matching strategy within the object detection evaluation. Must be either
    "exclusive", where each prediction and each ground truth are assigned to at most a single
    counterpart, or "max" (maximum/non-exclusive matching), where each ground truth object may
    have multiple predictions assigned to it. The default is "exclusive".

task_settings/detection_bbox_probabilistic (boolean)
    Currently not used.

task_settings/detection_confidence_thr (float)
    In detection mode, it is possible to set a confidence threshold (in the [0, 1] interval) to
    discard predictions with low confidence.

task_settings/detection_bbox_clipping (optional string)
    In detection mode, it is possible to specify a tolerance zone outside the image bounds for
    boxes that exceed the image dimensions. If omitted, no clipping is applied and an error is
    raised for any out-of-bounds box. Otherwise, the tolerance can be given as a relative value
    ([0-100]%, relative to image width and height) or as an absolute value in pixels ([int]px).
    Boxes extending into this tolerance zone are clipped to the image dimensions; boxes exceeding
    the tolerance zone raise an error instead.
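
To make the detection settings above more concrete, the following generic sketch shows how the three bounding-box formats relate to each other and how the IoU score used for matching is computed. The helper names are ours; this is an illustration only, not the Thetis implementation.

import numpy as np

def to_xyxy(box, fmt):
    """Convert a single box from 'xyxy', 'xywh', or 'cxcywh' into 'xyxy'."""
    if fmt == "xyxy":
        xmin, ymin, xmax, ymax = box
    elif fmt == "xywh":
        xmin, ymin, w, h = box
        xmax, ymax = xmin + w, ymin + h
    elif fmt == "cxcywh":
        cx, cy, w, h = box
        xmin, ymin = cx - w / 2, cy - h / 2
        xmax, ymax = cx + w / 2, cy + h / 2
    else:
        raise ValueError(f"unknown bbox format: {fmt}")
    return np.array([xmin, ymin, xmax, ymax], dtype=float)

def iou(box_a, box_b):
    """Intersection over Union of two boxes given in 'xyxy' format."""
    ix_min, iy_min = np.maximum(box_a[:2], box_b[:2])
    ix_max, iy_max = np.minimum(box_a[2:], box_b[2:])
    inter = max(ix_max - ix_min, 0.0) * max(iy_max - iy_min, 0.0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# The same physical box expressed in different formats yields an IoU of 1.0.
print(iou(to_xyxy([10, 10, 50, 30], "xyxy"), to_xyxy([10, 10, 40, 20], "xywh")))    # 1.0
print(iou(to_xyxy([10, 10, 50, 30], "xyxy"), to_xyxy([30, 20, 40, 20], "cxcywh")))  # 1.0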

Configuration of safety evaluation

Configuration settings for dataset evaluation.

Key/Specifier (Dtype)
    Description

data_evaluation/examine (boolean)
    Enables/disables the data evaluation for the final rating & reporting.

Configuration settings for AI performance evaluation.

Key/Specifier (Dtype)
    Description

performance/examine (boolean)
    Enables/disables the AI performance evaluation (e.g., accuracy, mAP, precision, recall, etc.)
    for the final reporting.
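
As a reminder of what the reported baseline metrics refer to, the following sketch computes accuracy, precision, and recall for the binary example from the configuration above using scikit-learn. It is an illustration only, not the Thetis evaluation itself.

from sklearn.metrics import accuracy_score, precision_score, recall_score

# Ground truth and predicted labels for the "person" vs. "no person" example.
y_true = ["person", "person", "no person", "person", "no person"]
y_pred = ["person", "no person", "no person", "person", "person"]

print(accuracy_score(y_true, y_pred))                       # 0.6
print(precision_score(y_true, y_pred, pos_label="person"))  # 0.666...
print(recall_score(y_true, y_pred, pos_label="person"))     # 0.666...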

Configuration settings for uncertainty evaluation (uncertainty calibration).

Key/Specifier (Dtype)
    Description

uncertainty/examine (boolean)
    Enables/disables the uncertainty evaluation (uncertainty calibration, e.g., computation of the
    Expected Calibration Error (ECE)) for the final rating & reporting.

uncertainty/ece_bins (int)
    Number of bins used for the computation of the Expected Calibration Error (ECE), the Maximum
    Calibration Error (MCE), and the respective reliability diagrams. The default value is 20.

uncertainty/ece_sample_threshold (int)
    Sample threshold used for the computation of the ECE, MCE, D-ECE, and the respective
    reliability diagrams: bins with fewer samples than this threshold are discarded. Discarding
    bins with only a small number of samples is recommended to stabilize the ECE/MCE computations.
    The default value is 10.

uncertainty/dece_bins (int)
    Number of bins used for the computation of the Detection Expected Calibration Error (D-ECE,
    object detection only) and the respective reliability diagrams. The D-ECE is the counterpart
    of the ECE for position-dependent calibration evaluation in object detection tasks. The
    default value is 5.
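
The following generic sketch illustrates the binning scheme that ece_bins and ece_sample_threshold refer to: confidences are grouped into equal-width bins, bins with too few samples are discarded, and the remaining confidence/accuracy gaps are averaged. It is an illustration only, not the Thetis implementation.

import numpy as np

def expected_calibration_error(confidences, correct, n_bins=20, sample_threshold=10):
    """ECE over equal-width confidence bins; bins with fewer than
    `sample_threshold` samples are ignored."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece, n_used = 0.0, 0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.sum() < sample_threshold:
            continue  # too few samples, skip unstable bin
        # Gap between average confidence and observed accuracy within the bin.
        gap = abs(confidences[in_bin].mean() - correct[in_bin].mean())
        ece += in_bin.sum() * gap
        n_used += in_bin.sum()
    return ece / n_used if n_used > 0 else float("nan")

# Synthetic example: predictions that are roughly 10 points overconfident.
rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=1000)
hit = rng.uniform(size=1000) < (conf - 0.1)
print(expected_calibration_error(conf, hit, n_bins=20, sample_threshold=10))  # approx. 0.1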

Configuration settings for AI fairness evaluation.

Key/Specifier (Dtype)
    Description

fairness/examine (boolean)
    Enables/disables the AI fairness evaluation for the final rating & reporting.

fairness/sensitive_attributes/<label name> (optional string or list of int/string)
    Specify one or multiple sensitive attributes (e.g., gender or age) that are used for fairness
    evaluation. The value of this entry is a list of target classes (given by the
    "distinct_classes" parameter) for which the sensitive attribute is valid. For example, if
    "distinct_classes" specifies the labels "person" and "car", a sensitive attribute "gender"
    might only be valid for the target label "person". If the attribute is valid for all specified
    target labels, you can also leave the value empty or pass "all".
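
The following sketch shows how the sensitive_attributes mapping can be interpreted: "all" or an empty value marks an attribute as valid for every entry of distinct_classes. It is an illustration only, not the Thetis implementation.

# Values taken from the example configuration above.
distinct_classes = ["no person", "person"]
sensitive_attributes = {
    "gender": ["no person", "person"],
    "age": "all",
}

def valid_classes(value, distinct_classes):
    """Resolve an attribute's validity list; "all" or an empty value means every class."""
    if value in (None, "all", []):
        return list(distinct_classes)
    return list(value)

for attribute, value in sensitive_attributes.items():
    print(attribute, "->", valid_classes(value, distinct_classes))
# gender -> ['no person', 'person']
# age -> ['no person', 'person']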