MIT/Tuebingen Saliency Benchmark

Evaluation

We evaluate models using the following metrics: AUC, shuffled AUC (sAUC), Normalized Scanpath Saliency (NSS), Correlation Coefficient (CC), Similarity (SIM), and KL-Divergence. The evaluations are implemented in the pysaliency Python library and run with the code available here.
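As an illustration of what one of these metrics computes, below is a minimal NumPy sketch of NSS (the benchmark itself uses the pysaliency implementations; the function name and interface here are illustrative and not part of pysaliency):

```python
# Minimal sketch of NSS (Normalized Scanpath Saliency): z-score the saliency
# map and average its values at the fixated pixel locations.
import numpy as np

def nss(saliency_map, fixation_xs, fixation_ys):
    """NSS of a single saliency map, fixations given as pixel coordinates."""
    s = saliency_map.astype(float)
    s = (s - s.mean()) / s.std()            # z-score the map
    xs = np.asarray(fixation_xs, dtype=int)
    ys = np.asarray(fixation_ys, dtype=int)
    return float(s[ys, xs].mean())          # average value at fixated pixels

# Example: a 480x640 map evaluated at two fixations.
smap = np.random.rand(480, 640)
print(nss(smap, fixation_xs=[320, 100], fixation_ys=[240, 50]))
```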

For probabilistic models, we compute the metric-specific saliency maps and evaluate each metric on its corresponding map. More precisely, each metric is evaluated on the saliency map which the model itself predicts to yield the highest performance under that metric. This results in models being scored fairly in all metrics, and they can therefore reach higher scores in some metrics than classic saliency map models. For more details, see Kümmerer et al., Saliency Benchmarking Made Easy: Separating Models, Maps and Metrics [ECCV 2018].
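Conceptually, this works roughly as in the following sketch: from the model's predicted fixation density, a separate saliency map is derived for each metric. The derivations shown (the density itself for AUC and NSS, the density divided by the center bias for sAUC) follow the paper; the function and variable names are illustrative, and the more involved CC, SIM, and KL-Div derivations are omitted here.

```python
# Simplified sketch of deriving metric-specific saliency maps from a predicted
# fixation density (illustrative names; the benchmark uses the derivations
# implemented in pysaliency, following the ECCV 2018 paper).
import numpy as np

def metric_specific_maps(density, centerbias):
    """Return one saliency map per metric for a single image.

    density:    predicted fixation density over the image, sums to 1.
    centerbias: density of the shuffled nonfixation distribution (center bias).
    """
    return {
        # AUC is invariant under monotone transforms and NSS under affine ones,
        # so the density itself is an optimal saliency map for both.
        'AUC': density,
        'NSS': density,
        # For shuffled AUC, nonfixations are drawn from the center bias,
        # so the density is divided by the center-bias density.
        'sAUC': density / centerbias,
        # CC, SIM and KL-Div require additional steps (e.g. matching how the
        # empirical saliency maps are constructed); see the paper.
    }

# Example: with a uniform center bias, the sAUC map equals the density.
density = np.full((480, 640), 1.0 / (480 * 640))
centerbias = np.full((480, 640), 1.0 / (480 * 640))
maps = metric_specific_maps(density, centerbias)
```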

Note that when reimplementing the evaluation, we tried to make some details more principled. Therefore, there are a few inconsistencies between the original evaluation code of the MIT Saliency Benchmark and how the saliency maps are computed and evaluated here: