Preparing input data

The Thetis evaluation library only needs the output of an AI model on a dedicated evaluation dataset. The application therefore requires the ground truth target labels (not the data itself!) and the corresponding AI predictions. The required data formats are described in detail below.

Binary classification

In the case of binary classification, Thetis expects two instances of a Pandas DataFrame pd.DataFrame representing the annotations and the model predictions on the dataset, respectively.

Suppose the variables annotations and predictions hold the ground truth annotations and the model predictions, respectively. Printing the content of annotations to the Python console might give the following:

>>> annotations
           target  gender    age
img_00     person  female  adult
img_01     person    male  child
img_02     person  female  adult
img_03     person  female  adult
img_04     person  female  adult
img_05     person  female  adult
img_06  no person    male  adult
img_07  no person  female  adult
img_08     person  female  child
img_09     person  female  child

In this example, the available target classes are “person” and “no person”, and the sensitive attributes are “gender” and “age”. The target labels must always be given in the column target. Furthermore, each sensitive attribute must be given in its own column, with the attribute name as the column name.

Accordingly, the output of predictions might look like the following:

>>> predictions
           labels  confidence
img_00     person    0.992300
img_01     person    0.962620
img_02  no person    0.146000
img_03     person    0.795490
img_04     person    0.897310
img_05  no person    0.247210
img_06  no person    0.001412
img_07  no person    0.000150
img_08     person    0.970410
img_09     person    0.931941

The required column labels holds the predicted label for each item in the dataset, whereas the required column confidence represents the (binary) label confidence/uncertainty estimated by the AI model.

Important: the confidence always refers to the binary_positive_label specified in the application config, regardless of the predicted label. The confidence for the negative class (“no person” in this case) is given by 1 - confidence.
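
The relation between the two classes can be sketched with plain pandas. This is only an illustration: it assumes that “person” is configured as binary_positive_label, and the name confidence_negative is made up for the example and is not part of Thetis.

```python
import pandas as pd

# First four rows of the example predictions shown above.
predictions = pd.DataFrame(
    {
        "labels": ["person", "person", "no person", "person"],
        "confidence": [0.9923, 0.96262, 0.146, 0.79549],
    },
    index=["img_00", "img_01", "img_02", "img_03"],
)

# Assumption: "person" is the binary_positive_label, so the column
# `confidence` holds the confidence for "person" in every row.
confidence_negative = 1.0 - predictions["confidence"]
print(confidence_negative)
```

For img_02, the model predicts “no person” with a positive-class confidence of 0.146, so the confidence for “no person” is 0.854.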

Note: the indices of the DataFrames annotations and predictions must match.
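
A quick way to check this before handing the data to Thetis is sketched below. It uses only pandas. Whether Thetis tolerates a different row order is not stated here, so the sketch aligns the order as a precaution. The variable names same_ids and indices_match are illustrative.

```python
import pandas as pd

annotations = pd.DataFrame(
    {"target": ["person", "no person"], "gender": ["female", "male"], "age": ["adult", "adult"]},
    index=["img_00", "img_06"],
)
predictions = pd.DataFrame(
    {"labels": ["no person", "person"], "confidence": [0.001412, 0.9923]},
    index=["img_06", "img_00"],  # same identifiers, but in a different order
)

# Both DataFrames must describe exactly the same set of samples.
same_ids = set(annotations.index) == set(predictions.index)

# Precaution (assumption): bring the predictions into the row order of the annotations.
predictions = predictions.reindex(annotations.index)
indices_match = annotations.index.equals(predictions.index)

print(same_ids, indices_match)
```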

Multi-class classification

If you are working in a multi-class classification setting (setting “task” to “classification” with more than 2 entries in “distinct_classes” in the application config), Thetis also expects two Pandas DataFrames annotations and predictions representing the ground truth annotations and the corresponding AI predictions, respectively.

Similar to the binary classification case, the DataFrame annotations must contain the column target as well as a separate column for each specified sensitive attribute:

>>> annotations
         target  gender    age
img_00   person  female  adult
img_01   person    male  child
img_02      car    None   None
img_03  bicycle    None   None
img_04      car    None   None
img_05   person  female  child
img_06      car    None   None
img_07   person  female  adult
img_08   person    male  adult
img_09  bicycle    None   None

Note: in this example, the sensitive attributes “gender” and “age” are only valid for the target class “person”. This must be specified in the fairness section of the application config. For all other target classes, the missing attribute values are marked by passing “None”.
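
As an illustration, such a DataFrame can be built with Python's None for the missing attribute values, and a small consistency check can confirm that the attributes are set exactly for the “person” rows. The check is a plain pandas sketch, not a Thetis function, and the names is_person, attributes_missing, and consistent are made up for the example.

```python
import pandas as pd

annotations = pd.DataFrame(
    {
        "target": ["person", "car", "bicycle"],
        "gender": ["female", None, None],
        "age": ["adult", None, None],
    },
    index=["img_00", "img_02", "img_03"],
)

# Rows of the class for which the sensitive attributes are valid.
is_person = annotations["target"] == "person"

# Rows in which all sensitive attributes are missing (None counts as missing).
attributes_missing = annotations[["gender", "age"]].isna().all(axis=1)

# Consistent if "person" rows carry attributes and all other rows have none.
consistent = bool((is_person != attributes_missing).all())
print(consistent)
```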

The AI predictions in predictions are given in a similar way to binary classification. A column labels is required, representing the predicted label of each individual sample. However, a confidence is required for each possible class (e.g., the output of a Softmax activation function in the context of neural networks). The confidence columns must follow the naming convention confidence_<label>. Thus, given a configuration for “distinct_classes” with the possible classes “person”, “bicycle”, and “car”, the DataFrame predictions might look like the following:

>>> predictions
         labels  confidence_person  confidence_bicycle  confidence_car
img_00   person           0.984100            0.014250        0.001650
img_01   person           0.948210            0.035340        0.016450
img_02      car           0.001020            0.021920        0.977060
img_03      car           0.021412            0.420190        0.558398
img_04      car           0.030120            0.001390        0.968490
img_05  bicycle           0.361530            0.591312        0.047158
img_06      car           0.000326            0.005310        0.994364
img_07   person           0.873920            0.004124        0.121956
img_08   person           0.968320            0.020931        0.010749
img_09  bicycle           0.015182            0.947182        0.037636

Note: the indices of the DataFrames annotations and predictions must match.
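
The following sketch shows one way to turn a Softmax output into a DataFrame with this column layout. The array probabilities and the list sample_ids are hypothetical inputs. Taking the predicted label as the class with the highest confidence is an assumption that agrees with the example table above but is not stated as a requirement.

```python
import numpy as np
import pandas as pd

distinct_classes = ["person", "bicycle", "car"]  # as listed in the application config
sample_ids = ["img_00", "img_02"]

# Hypothetical Softmax output of shape (n_samples, n_classes);
# the columns are ordered like distinct_classes.
probabilities = np.array(
    [
        [0.984100, 0.014250, 0.001650],
        [0.001020, 0.021920, 0.977060],
    ]
)

# One confidence column per class, named confidence_<label>.
predictions = pd.DataFrame(
    probabilities,
    index=sample_ids,
    columns=[f"confidence_{label}" for label in distinct_classes],
)

# Assumption: the predicted label is the class with the highest confidence.
predicted = [distinct_classes[i] for i in probabilities.argmax(axis=1)]
predictions.insert(0, "labels", predicted)

# A Softmax output sums to one for every sample.
rows_sum_to_one = bool(np.allclose(probabilities.sum(axis=1), 1.0))
print(predictions)
```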

Regression

In the case of regression, Thetis expects two instances of a Pandas DataFrame pd.DataFrame representing the annotations (target scores) and the model predictions on the dataset, respectively.

Suppose the variables annotations and predictions hold the target scores and the model predictions, respectively. Printing the content of annotations to the Python console might give the following:

>>> annotations
            target  gender    age
sample_00   -1.246  female  adult
sample_01    0.579    male  child
sample_02    0.000  female  adult
sample_03  -10.798  female  adult
sample_04    3.480  female  adult
sample_05    9.546  female  adult
sample_06   70.892    male  adult
sample_07  -16.721  female  adult
sample_08    0.239  female  child
sample_09   -0.724  female  child

In this example, we have the target scores with the sensitive attributes “gender” and “age”. The target scores must always be given in the column target. Furthermore, each sensitive attribute must be given in its own column, with the attribute name as the column name.

Accordingly, the output of predictions might look like the following:

>>> predictions
          predictions  stddev
sample_00     -0.524    1.272
sample_01      2.725    0.713
sample_02      0.011    0.005
sample_03     -8.372    2.795
sample_04     -2.745    3.657
sample_05      9.546    0.001
sample_06     60.126    9.001
sample_07     -3.913    4.503
sample_08     -0.342    0.098
sample_09     -0.223    0.003

The required column predictions holds the predicted score for each item in the dataset. The optional column stddev holds the estimated standard deviation of each predicted score, which represents the estimation uncertainty. Alternatively, the column variance can be passed with the corresponding estimated variance.

Note: when the evaluation of uncertainty is active, either column stddev or variance must be given.

Note: the indices of the DataFrames annotations and predictions must match.
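
The two uncertainty columns carry the same information, since the standard deviation is the square root of the variance. The sketch below builds a predictions DataFrame with a variance column and derives the standard deviation from it. The variance values are simply the squares of the first three stddev values in the example above.

```python
import numpy as np
import pandas as pd

predictions = pd.DataFrame(
    {
        "predictions": [-0.524, 2.725, 0.011],
        # Squares of the stddev values 1.272, 0.713, and 0.005 from the example.
        "variance": [1.617984, 0.508369, 0.000025],
    },
    index=["sample_00", "sample_01", "sample_02"],
)

# Standard deviation is the square root of the variance.
stddev = np.sqrt(predictions["variance"])
print(stddev)
```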

Object detection

The input for the image-based object detection evaluation case differs compared to the classification cases. In the object detection evaluation mode, Thetis expects two Python dictionaries annotations and predictions, representing the ground truth objects as well as the predicted objects, respectively.

Each entry within these dictionaries must be an instance of a Pandas DataFrame. The dictionary keys are the identifiers for each image. Thus, it is possible to assign predicted objects to real existing ones by identifying the ground truth and prediction information using the given dictionary keys.

The individual pd.DataFrame instances (in annotations and predictions) for each image must follow the format for binary classification, except that more than 2 labels are allowed. Thus, the console output for annotations might look like the following:

>>> annotations
{'__meta__':
            width   height
    img_00   1920     1080
    img_01   1680     720,

 'img_00':
         target  gender    age
     0   person  female  adult
     1   person    male  child
     2      car    None   None
     3   person  female  child
     4      car    None   None
     5   person  female  adult,
'img_01': ...
}

Important: the dictionary annotations requires a field __meta__ holding a pd.DataFrame with the columns width and height. This DataFrame holds the width and height meta information for each image. Its index must match the set of image keys that are present in annotations and predictions.
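
A minimal sketch of such a dictionary, restricted to the columns shown in this section, together with a plain Python check that the index of __meta__ covers exactly the image keys. The names image_ids and meta_matches are illustrative and not part of Thetis.

```python
import pandas as pd

annotations = {
    # Image sizes, indexed by the same identifiers that are used as dictionary keys.
    "__meta__": pd.DataFrame(
        {"width": [1920, 1680], "height": [1080, 720]},
        index=["img_00", "img_01"],
    ),
    # One DataFrame per image, one row per ground truth object.
    "img_00": pd.DataFrame(
        {"target": ["person", "car"], "gender": ["female", None], "age": ["adult", None]}
    ),
    "img_01": pd.DataFrame({"target": ["car"], "gender": [None], "age": [None]}),
}

# All keys except "__meta__" identify images.
image_ids = {key for key in annotations if key != "__meta__"}
meta_matches = set(annotations["__meta__"].index) == image_ids
print(meta_matches)
```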

The console output for predictions might look like:

>>> predictions
{'img_00':
        labels  confidence
    0   person    0.914123
    1   person    0.871923
    2      car    0.921751
    3      car    0.993720
    4      car    0.351152
    5  bicycle    0.639153
    6      car    0.817591
    7   person    0.912730
    8   person    0.981693
    9  bicycle    0.583190,
'img_01': ...
}

Note: the indices of the individual pd.DataFrame instances are not expected to match each other, since the number of predicted objects can differ from the number of ground truth objects.

Furthermore, only a single field for confidence is expected, even when working with multiple labels. This is because most common object detection algorithms only output a single confidence estimate for a detected object, discarding the confidence information for the remaining classes.
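
The predictions dictionary can be sketched in the same way, again restricted to the columns shown in this section. The set annotation_ids stands in for the image keys of an annotations dictionary, and keys_match and single_confidence are illustrative names.

```python
import pandas as pd

# Image identifiers used as keys in the annotations dictionary (without "__meta__").
annotation_ids = {"img_00", "img_01"}

predictions = {
    # One DataFrame per image, one row per detected object,
    # with a single confidence column regardless of the number of classes.
    "img_00": pd.DataFrame(
        {
            "labels": ["person", "car", "bicycle"],
            "confidence": [0.914123, 0.921751, 0.639153],
        }
    ),
    "img_01": pd.DataFrame({"labels": ["car"], "confidence": [0.817591]}),
}

# The image keys must be the same in both dictionaries.
keys_match = set(predictions) == annotation_ids

# Every per-image DataFrame has exactly the columns labels and confidence.
single_confidence = all(
    list(frame.columns) == ["labels", "confidence"] for frame in predictions.values()
)
print(keys_match, single_confidence)
```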