MIT/Tübingen Saliency Benchmark datasets

Dataset Citation Images Observers Tasks Durations Extra Notes
MIT300 Tilke Judd, Fredo Durand, Antonio Torralba. A Benchmark of Computational Models of Saliency to Predict Human Fixations [MIT tech report 2012] 300 natural indoor and outdoor scenes
size: max dim: 1024px, other dim: 457-1024px
1 dva* ~ 35px
ages: 18-50
free viewing 3 sec This was the first data set with held-out human eye movements and is used as the benchmark test set in the MIT/Tübingen Saliency Benchmark.
eyetracker: ETL 400 ISCAN (240Hz)
Download 300 test images.
CAT2000 Ali Borji, Laurent Itti. CAT2000: A Large Scale Fixation Dataset for Boosting Saliency Research [CVPR 2015 workshop on "Future of Datasets"] 4000 images from 20 different categories
size: 1920x1080px
1 dva* ~ 38px
24 per image (120 in total)
ages: 18-27
free viewing 5 sec This dataset contains two sets of images: train and test. Train images (100 from each category) and fixations of 18 observers are shared but 6 observers are held-out. Test images are available but fixations of all 24 observers are held out.
eyetracker: EyeLink1000 (1000Hz)
Download 2000 test images.
Download 2000 train images (with fixations of 18 observers).

Other saliency datasets

[If you have another fixation data set that you would like to list here, email]

Dataset Citation Images Observers Tasks Durations Extra Notes
MIT data set Tilke Judd, Krista Ehinger, Fredo Durand, Antonio Torralba. Learning to Predict where Humans Look [ICCV 2009] 1003 natural indoor and outdoor scenes
size: max dim: 1024px, other dim: 405-1024px
1 dva ~ 35px
ages: 18-35
free viewing 3 sec Includes: 779 landscape images and 228 portrait images. Can be used as training data for MIT benchmark.
eyetracker: ETL 400 ISCAN (240Hz)
Le Meur et al 2020 O. Le Meur, T. Le Pen, & R. Cozot. Can we accurately predict where we look at paintings? PLoS ONE 2020. 150 paintings from Romanticism, Realism, Impressionism, Pointillism and Fauvism
21 free viewing 4 sec
MIE Fo & MIE No O. Le Meur, A. Nebout, M. Chérel, & E. Etchamendy. From Kanner autism to Asperger syndromes, the difficult task to predict where ASD people look at. IEEE Access 2020. 25 (MIE Fo) + 25 (MIE No) images
size: 1920x1200px
17 (MIE Fo) + 12 (MIE No) free viewing 4 sec nppd=38 (horizontal), nppd=27 (vertical)
EyeTrackUAV2 Perrin, A. F., Krassanakis, V., Zhang, L., Ricordel, V., Perreira Da Silva, M., & Le Meur, O. (2020). EyeTrackUAV2: a Large-Scale Binocular Eye-Tracking Dataset for UAV Videos. Drones, 4(1), 2. 43 videos (subset of UAV123, DTB70 and VIRAT)
size: 1280x720px or 720x480px, 30fps, RGB
ages: 29.8 overall, 27.9 T, 31.7 FV
Surveillance-viewing task (T) and free-viewing (FV) Average 33 - min 4, max 106 - total 1408 sec. Total of 42 241 frames. Available data: raw gaze data, fixations, saccades, and heatmaps. Fixation detection was based on an implementation of the I-DT algorithm. Scenario: all of the above is available for both eyes, dominant eye, binocular position, left, and right eye. Eyetracker: EyeLink 1000 Plus (1000 Hz, binocular)
CrowdFix M. Tahira, S. Mehboob, A. Rahman, & O. Arif. CrowdFix: An Eyetracking Dataset of Real Life Crowd Videos. IEEE Access 2019. 434 crowd videos, 1280x720px, 1-3s
26 observers with ages between 17-40 free viewing 1-3 sec Contains 3 categories of crowds: Dense-Congested, Dense-Free Flowing and Sparse. Main focus: Bottom-up attention. Available data: Fixation maps and Saliency maps of each frame ready to be used as ground truth for deep learning tasks. Eyetracker: The Eyetribe Eyetracker (60 Hz).
EyeTrackUAV Krassanakis, Vassilios; Perreira Da Silva, Matthieu; Ricordel, Vincent. Monitoring Human Visual Behavior during the Observation of Unmanned Aerial Vehicles (UAVs) Videos [Drones 2, no. 4: 36, 2018] 19 UAV videos (subset of UAV123 database)
size: 1280x720px (720p), 30fps
ages: 25.4 (+/-3.8)
free viewing average 47 sec (14 - 162 sec each; 14:47 min total) Available data: raw gaze data, fixations, saccades, and heatmaps. Fixation detection was based on an implementation of the I-DT algorithm.
eyetracker: EyeLink 1000 Plus (1000 Hz, binocular)
DHF1K Wenguan Wang, Jianbing Shen, Ming-Ming Cheng, Ali Borji. Revisiting Video Saliency Prediction in the Deep Learning Era [CVPR 2018] 1000 video sequences
size: 640x360px
ages: 20-28
free viewing varied largest dataset in dynamic visual fixation prediction area
eyetracker: Senso Motoric Instruments (SMI) RED 250 (250Hz)
EMOtional attention dataset (EMOd) Shaojing Fan, Zhiqi Shen, Ming Jiang, Bryan L. Koenig, Juan Xu, Mohan Kankanhalli, Qi Zhao. Emotional Attention: A Study of Image Sentiment and Visual Attention [CVPR 2018] (Spotlight) 1019 emotion-eliciting images, including 321 emotion-evoking pictures selected from IAPS
size: 1024x768px
ages: 21-35
free viewing 3 sec The images cover a diversity of emotional contents, arousing various sentiments, such as happiness, excitement, awe, disgust, and fear. Object-level and image-level annotations, code, and CNN models for saliency prediction are available on the project page.
eyetracker: Eyelink 1000 (1000Hz)
ETTO (Eye-Tracking Through Objects) A. Bruno, F. Gugliuzza, R. Pirrone & E. Ardizzone. A Multi-Scale Colour and Keypoint Density-Based Approach for Visual Saliency Detection, IEEE Access, vol. 8, pp. 121330-121343 (2020), IEEE. E. Ardizzone, A. Bruno & F. Gugliuzza. Exploiting visual saliency algorithms for object-based attention: A new color and scale-based approach, International Conference on Image Analysis and Processing, pp. 191-201 (2017), Springer. The dataset consists of enhanced versions of 40 images from OPED. 24
ages: 21-34
free viewing 3 sec 24 subjects placed at approximately 70 cm from the screen.
EToCVD (Eye-Tracking of Colour Vision Deficiencies) A. Bruno, F. Gugliuzza, E. Ardizzone, C.C. Giunta & R. Pirrone. Image content enhancement through salient regions segmentation for people with color vision deficiencies, i-Perception, vol. 10, no. 3, art. no. 2041669519841073 (2019), SAGE Publications. The dataset consists of 90 colourful images, with different sizes and resolutions, obtained from different public datasets: MIT1003, CAT2000, NUSEF, MIT300. 8 subjects with colour vision deficiencies and 8 subjects without. free viewing 3 sec
FIGRIM Fixation Dataset Zoya Bylinskii, Phillip Isola, Constance Bainbridge, Antonio Torralba, Aude Oliva. Intrinsic and Extrinsic Effects on Image Memorability [Vision Research 2015] 2787 natural scenes from 21 different indoor and outdoor scene categories
size: 1000x1000px
1 dva ~ 33px
15 average per image (42 in total)
ages: 17-33
memory task 2 sec Includes: annotated (LabelMe) objects and memory scores for 630 of the images
eyetracker: EyeLink 1000 (500 Hz)
Coutrot Database 1 Antoine Coutrot, Nathalie Guyader. How saliency, faces, and sound influence gaze in dynamic social scenes [JoV 2014]
Antoine Coutrot, Nathalie Guyader. Toward the introduction of auditory information in dynamic visual attention models [WIAMIS 2013]
60 videos
size: 720x576px
ages: 20-35
free viewing average: 17 sec 4 categories: one moving object, several moving objects, landscapes, people having a conversation. Each video has been seen in 4 auditory conditions.
eyetracker: EyeLink 1000 (1000 Hz)
Coutrot Database 2 Antoine Coutrot, Nathalie Guyader. An efficient audiovisual saliency model to predict eye positions when looking at conversations [EUSIPCO 2015] 15 videos
size: 1232x504px
ages: 22-36
free viewing average: 44 sec Videos of 4 people having a meeting. Each video has been seen in 2 auditory conditions (with and without the original soundtrack).
eyetracker: EyeLink 1000 (1000 Hz)
SAVAM Yury Gitman, Mikhail Erofeev, Dmitriy Vatolin, Andrey Bolshakov, Alexey Fedorov. Semiautomatic Visual-Attention Modeling and Its Application to Video Compression [ICIP 2014] 41 videos
size: max dim: 1920px, other dim: 1080px
ages: 18-56
free viewing average: 20 sec Left and right stereoscopic views available for all sequences. Nevertheless, only the left view was demonstrated to observers.
eyetracker: SMI iViewX™ Hi-Speed 1250 (500Hz)
Eye Fixations in Crowd (EyeCrowd) data set Ming Jiang, Juan Xu, Qi Zhao. Saliency in Crowd [ECCV 2014] 500 natural indoor and outdoor images with varying crowd densities
size: 1024x768px
1 dva ~ 26px
ages: 20-30
free viewing 5 sec The images have a diverse range of crowd densities (up to 268 faces per image). Annotations available: faces labelled with rectangles; two annotations of pose and partial occlusion on each face.
eyetracker: Eyelink 1000 (1000Hz)
Fixations in Webpage Images (FiWI) data set Chengyao Shen, Qi Zhao. Webpage Saliency [ECCV 2014] 149 webpage screenshots in 3 categories.
size: 1360x768px
1 dva ~ 26px
ages: 21-25
free viewing 5 sec Text: 50, Pictorial: 50, Mixed:49
eyetracker: Eyelink 1000 (1000Hz)
VIU data set Kathryn Koehler, Fei Guo, Sheng Zhang, Miguel P. Eckstein. What Do Saliency Models Predict? [JoV 2014] 800 natural indoor and outdoor scenes
size: max dim: 405px
1 dva ~ 27px
ages: 18-23
explicit saliency judgement, free viewing, saliency search, cued object search until response, 2 sec, 2 sec, 2 sec eyetracker: Eyelink 1000 (250Hz)
Object and Semantic Images and Eye-tracking (OSIE) data set Juan Xu, Ming Jiang, Shuo Wang, Mohan Kankanhalli, Qi Zhao. Predicting Human Gaze Beyond Pixels [JoV 2014] 700 natural indoor and outdoor scenes, aesthetic photographs from Flickr and Google
size: 800x600px
1 dva ~ 24px
ages: 18-30
free viewing 3 sec A large portion of images have multiple dominant objects in the same image. Annotations available: 5,551 segmented objects with fine contours; annotations of 12 semantic attributes on each of the 5,551 objects
eyetracker: Eyelink 1000 (2000Hz)
VIP data set Keng-Teck Ma, Terence Sim, Mohan Kankanhalli. A Unifying Framework for Computational Eye-Gaze Research [Workshop on Human Behavior Understanding 2013] 150 neutral and affective images, randomly chosen from NUSEF dataset
ages: undergrads, postgrads, working adults
free viewing, anomaly detection 5 sec Annotations available: demographic and personality traits of the viewers (can be used for training trait-specific saliency models)
eyetracker: SMI RED 250 (120Hz)
FocalAmbient Follet, B., Le Meur, O., & Baccino, T. New insights into ambient and focal visual fixations using an automatic classification algorithm. i-Perception, 2(6), 592-610 (2011). 120 images
size: 800x600px
22 men, 18 women, mean age = 36.7
free viewing 4 sec Four categories (Street, Coast, Mountain, and Open Country) with very few salient objects. Available data: original images, fixation maps, saliency maps, and gaze coordinates in the referential image. Visual angle: 36x29. Eyetracker: SMI RED iViewX system (50Hz)
MIT Low-resolution data set Tilke Judd, Fredo Durand, Antonio Torralba. Fixations on Low-Resolution Images [JoV 2011] 168 natural and 25 pink noise images at 8 different resolutions
size: 860x1024px
1 dva ~ 35px
8 viewers per image, 64 in total
ages: 18-55
free viewing 3 sec eyetracker: ETL 400 ISCAN (240Hz)
KTH Kootstra data set Gert Kootstra, Bart de Boer, Lambert R. B. Schomaker. Predicting Eye Fixations on Complex Visual Stimuli using Local Symmetry [Cognitive Computation 2011] 99 photographs from 5 categories.
size: 1024x768px
ages: 17-32
free viewing 5 sec Images by category: 19 images with symmetrical natural objects, 12 images of animals in a natural setting, 12 images of street scenes, 16 images of buildings, 40 images of natural environments.
eyetracker: Eyelink I
NUSEF data set Subramanian Ramanathan, Harish Katti, Nicu Sebe, Mohan Kankanhalli, Tat-Seng Chua. An eye fixation database for saliency detection in images [ECCV 2010] 758 everyday scenes from Flickr, aesthetic content from, Google images, emotion-evoking IAPS pictures
size: 1024x728px
25 on average
ages: 18-35
free viewing 5 sec eyetracker: ASL
TUD Image Quality Database 2 H. Alers, H. Liu, J. Redi and I. Heynderickx. Studying the risks of optimizing the image quality in saliency regions at the expense of background content [SPIE 2010] 160 images (40 at 4 different levels of compression)
size: 600x600px
40, 20
ages: students
free viewing, quality assessment 8 sec eyetracker: iView X RED (50Hz)
Ehinger data set Krista Ehinger, Barbara Hidalgo-Sotelo, Antonio Torralba, Aude Oliva. Modeling search for people in 900 scenes [Visual Cognition 2009] 912 outdoor scenes
size: 800x600px
1 dva ~ 34px
ages: 18-40
search (person detection) until response eyetracker: ISCAN RK-464 (240Hz)
A Database of Visual Eye Movements (DOVES) Ian van der Linde, Umesh Rajashekar, Alan C. Bovik, Lawrence K. Cormack. DOVES: A database of visual eye movements [Spatial Vision 2009] 101 natural calibrated images
size: 1024x768px
ages: mean=27
free viewing 5 sec eyetracker: Fourward Tech. Gen. V (200Hz)
TUD Image Quality Database 1 H. Liu and I. Heynderickx. Studying the Added Value of Visual Attention in Objective Image Quality Metrics Based on Eye Movement Data [ICIP 2009] 29 images from the LIVE image quality database (varying dimensions) 20
ages: students
free viewing 10 sec eyetracker: iView X RED (50Hz)
Visual Attention for Image Quality (VAIQ) Database Ulrich Engelke, Anthony Maeder, Hans-Jurgen Zepernick. Visual Attention Modeling for Subjective Image Quality Databases [MMSP 2009] 42 images from 3 image quality databases: IRCCyN/IVC, MICT, and LIVE (varying dimensions) 15
ages: 20-60 (mean=42)
free viewing 12 sec eyetracker: EyeTech TM3
Toronto data set Neil Bruce, John K. Tsotsos. Attention based on information maximization [JoV 2007] 120 color images of outdoor and indoor scenes
size: 681x511px
ages: undergrads, grads
free viewing 4 sec A large portion of images here do not contain particular regions of interest.
eyetracker: ERICA workstation including a Hitachi CCD camera with an IR emitting LED
Fixations in Faces (FiFA) data base Moran Cerf, Jonathan Harel, Wolfgang Einhauser, Christof Koch. Predicting human gaze using low-level saliency combined with face detection [NIPS 2007] 200 color outdoor and indoor scenes
size: 1024x768px
1 dva ~ 34px
8 free viewing 2 sec Images include salient objects and many different types of faces. This data set was originally used to establish that human faces are very attractive to observers and to test models of saliency that included face detectors. Object annotations are available.
eyetracker: Eyelink 1000 (1000Hz)
Le Meur data set Olivier Le Meur, Patrick Le Callet, Dominique Barba, Dominique Thoreau. A coherent computational approach to model the bottom-up visual attention [PAMI 2006] 27 color images 40 free viewing 15 sec eyetracker: Cambridge Research

(*) dva = degree of visual angle
Matlab code for computing visual angle. This code was written to help standardize this computation and make it easier.
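As a rough illustration of what a visual angle computation involves (a Python sketch, not the Matlab code referred to above), the number of pixels per degree of visual angle can be derived from the physical screen width, the horizontal resolution, and the viewing distance. The function name and the example setup values are hypothetical and do not describe any dataset listed on this page.

```python
import math

def pixels_per_degree(screen_width_cm, screen_width_px, viewing_distance_cm):
    """Approximate pixels per degree of visual angle at the screen centre."""
    # Length on the screen subtended by 1 degree, centred on the line of sight.
    cm_per_degree = 2.0 * viewing_distance_cm * math.tan(math.radians(0.5))
    # Pixel density of the display along the horizontal axis.
    px_per_cm = screen_width_px / screen_width_cm
    return cm_per_degree * px_per_cm

# Hypothetical setup: a 40 cm wide screen, 1280 px across, viewed from 60 cm.
ppd = pixels_per_degree(40.0, 1280, 60.0)
print(round(ppd, 1))
```

Under this small-angle geometry the result scales linearly with viewing distance, so the same image covers more pixels per degree when the observer sits further away.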
Why is this relevant to saliency modeling? A continuous fixation map is calculated by convolving the fixation locations with a Gaussian of a particular sigma. This sigma is most commonly set to approximately 1 degree of visual angle, which is an estimate of the size of the fovea (Le Meur and Baccino, 2013). This gives us an upper bound on how well we can predict where humans look on the images in a particular dataset, and thus should inform how we evaluate saliency models.
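The convolution described above can be sketched in Python with NumPy as follows. This is a minimal illustration rather than the benchmark's own implementation: the function names, the 3-sigma kernel truncation, the zero padding at the image borders, and the toy fixation coordinates are all assumptions. In practice `sigma_px` would be set from the dataset's pixels per dva (for example, about 35 px for MIT300 according to the table above).

```python
import numpy as np

def gaussian_kernel_1d(sigma_px):
    """Normalised 1-D Gaussian, truncated at 3 sigma on each side (assumption)."""
    radius = int(np.ceil(3 * sigma_px))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma_px ** 2))
    return kernel / kernel.sum()

def fixation_map(fixations_xy, height, width, sigma_px):
    """Blur discrete fixation locations into a continuous map that sums to 1.

    Assumes at least one fixation and an image larger than the kernel
    (2 * ceil(3 * sigma_px) + 1 pixels) in both dimensions.
    """
    counts = np.zeros((height, width), dtype=float)
    for x, y in fixations_xy:
        counts[int(y), int(x)] += 1.0  # repeated fixations accumulate
    kernel = gaussian_kernel_1d(sigma_px)
    # A 2-D Gaussian is separable: blur along rows, then along columns.
    blurred = np.apply_along_axis(np.convolve, 1, counts, kernel, mode="same")
    blurred = np.apply_along_axis(np.convolve, 0, blurred, kernel, mode="same")
    return blurred / blurred.sum()

# Toy example: three fixations as (x, y) pixel coordinates, two at the same spot.
fixations = [(80, 60), (20, 30), (80, 60)]
fmap = fixation_map(fixations, height=120, width=160, sigma_px=10.0)
peak_row, peak_col = np.unravel_index(fmap.argmax(), fmap.shape)
print(peak_row, peak_col)
```

Normalising the result to sum to 1 lets the map be read as an empirical fixation density; other conventions, such as scaling the maximum to 1, are equally possible.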

Saliency-related data sets

FixaTons: An open project that consists of a collection of datasets, within a uniform framework in Python, for scanpath and fixation studies. It includes code for data use, statistics calculation, calculation of saliency metrics and metrics for scanpath similarity.

IVC Data sets The Images and Video Communications team (IVC) of IRCCyN lab provides several image and video databases including eye movement recordings. Some of the databases are based on a free viewing task, others on a quality evaluation task.

Regional Saliency Dataset (RSD) [Li, Tian, Huang, Gao 2009] (paper) A dataset for evaluating visual saliency in video.

MSRA Salient Object Database [Liu et al. 2007] A database of 20,000 images with hand-labeled rectangles of the principal salient object by 3 users.