MIT/Tuebingen Saliency Benchmark

MIT/Tübingen Saliency Benchmark datasets

Dataset	Citation	Images	Observers	Tasks	Durations	Extra Notes
MIT300	Tilke Judd, Fredo Durand, Antonio Torralba. A Benchmark of Computational Models of Saliency to Predict Human Fixations [MIT tech report 2012]	300 natural indoor and outdoor scenes size: max dim: 1024px, other dim: 457-1024px 1 dva* ~ 35px	39 ages: 18-50	free viewing	3 sec	This was the first data set with held-out human eye movements, and is used as the benchmark test set in the MIT/Tübingen Saliency Benchmark. eyetracker: ETL 400 ISCAN (240Hz) Download 300 test images.
CAT2000	Ali Borji, Laurent Itti. CAT2000: A Large Scale Fixation Dataset for Boosting Saliency Research [CVPR 2015 workshop on "Future of Datasets"]	4000 images from 20 different categories size: 1920x1080px 1 dva* ~ 38px	24 per image (120 in total) ages: 18-27	free viewing	5 sec	This dataset contains two sets of images: train and test. Train images (100 from each category) and fixations of 18 observers are shared but 6 observers are held-out. Test images are available but fixations of all 24 observers are held out. eyetracker: EyeLink1000 (1000Hz) Download 2000 test images. Download 2000 train images (with fixations of 18 observers).

Other saliency datasets

[If you have another fixation data set that you would like to list here, email saliency@tuebingen.ai]

Dataset	Citation	Images	Observers	Tasks	Durations	Extra Notes
MIT data set	Tilke Judd, Krista Ehinger, Fredo Durand, Antonio Torralba. Learning to Predict where Humans Look [ICCV 2009]	1003 natural indoor and outdoor scenes size: max dim: 1024px, other dim: 405-1024px 1 dva ~ 35px	15 ages: 18-35	free viewing	3 sec	Includes: 779 landscape images and 228 portrait images. Can be used as training data for MIT benchmark. eyetracker: ETL 400 ISCAN (240Hz)
Le Meur et al 2020	O. Le Meur, T. Le Pen, & R. Cozot. Can we accurately predict where we look at paintings? PLoS ONE 2020.	150 paintings from Romanticism, Realism, Impressionism, Pointillism and Fauvism	21	free viewing	4 sec
MIE Fo & MIE No	O. Le Meur, A. Nebout, M. Chérel, & E. Etchamendy. From Kanner autism to Asperger syndromes, the difficult task to predict where ASD people look at. IEEE Access 2020.	25 (MIE Fo) + 25 (MIE No) images size: 1920x1200px	17 (MIE Fo) + 12 (MIE No)	free viewing	4 sec	nppd=38 (horizontal), nppd=27 (vertical)
EyeTrackUAV2	Perrin, A. F., Krassanakis, V., Zhang, L., Ricordel, V., Perreira Da Silva, M., & Le Meur, O. (2020). EyeTrackUAV2: a Large-Scale Binocular Eye-Tracking Dataset for UAV Videos. Drones, 4(1), 2.	43 videos (subset of UAV123, DTB70 and VIRAT) size: 1280x720px or 720x480px, 30fps, RGB	30 ages: 29.8 overall, 27.9 T, 31.7 FV	Surveillance-viewing task (T) and free-viewing (FV)	Average 33 - min 4, max 106 - total 1408 sec. Total of 42 241 frames	Available data: raw gaze data, fixations, saccades, and heatmaps. Fixation detection was based on an implementation of an I-DT-based algorithm. Scenario: all the above is available for both eyes, dominant eye, binocular position, left, and right eye. Eyetracker: EyeLink 1000 Plus (1000 Hz, binocular)
CrowdFix	M. Tahira, S. Mehboob, A. Rahman, & O. Arif. CrowdFix: An Eyetracking Dataset of Real Life Crowd Videos. IEEE Access 2019.	434 crowd videos, 1280x720px, 1-3s	26 observers with ages between 17-40	free viewing	1-3 sec	Contains 3 categories of crowds: Dense-Congested, Dense-Free Flowing and Sparse. Main focus: Bottom-up attention. Available data: Fixation maps and Saliency maps of each frame ready to be used as ground truth for deep learning tasks. Eyetracker: The Eyetribe Eyetracker (60 Hz).
EyeTrackUAV	Krassanakis, Vassilios; Perreira Da Silva, Matthieu; Ricordel, Vincent. Monitoring Human Visual Behavior during the Observation of Unmanned Aerial Vehicles (UAVs) Videos [Drones 2, no. 4: 36, 2018]	19 UAV videos (subset of UAV123 database) size: 1280x720px (720p), 30fps	14 ages: 25.4 (+/-3.8)	free viewing	average 47 sec (14 - 162 sec each; 14:47 min total)	Available data: raw gaze data, fixations, saccades & heatmaps. Fixation detection was based on an implementation of an I-DT-based algorithm. eyetracker: EyeLink 1000 Plus (1000 Hz, binocular)
DHF1K	Wenguan Wang, Jianbing Shen, Ming-Ming Cheng, Ali Borji. Revisiting Video Saliency Prediction in the Deep Learning Era [CVPR 2018]	1000 video sequences size: 640x360px	17 ages: 20-28	free viewing	varied	Largest dataset in the area of dynamic visual fixation prediction. eyetracker: Senso Motoric Instruments (SMI) RED 250 (250Hz)
EMOtional attention dataset (EMOd)	Shaojing Fan, Zhiqi Shen, Ming Jiang, Bryan L. Koenig, Juan Xu, Mohan Kankanhalli, Qi Zhao. Emotional Attention: A Study of Image Sentiment and Visual Attention [CVPR 2018] (Spotlight)	1019 emotion-eliciting images, including 321 emotion-evoking pictures selected from IAPS size: 1024x768px	16 ages: 21-35	free viewing	3 sec	The images cover a diversity of emotional contents, arousing various sentiments, such as happiness, excitement, awe, disgust, and fear. Object-level and image-level annotations, code, and CNN models for saliency prediction are available on the project page. eyetracker: Eyelink 1000 (1000Hz)
ETTO (Eye-Tracking Through Objects)	A. Bruno, F. Gugliuzza, R. Pirrone & E. Ardizzone. A Multi-Scale Colour and Keypoint Density-Based Approach for Visual Saliency Detection, IEEE Access, vol. 8, pp. 121330-121343 (2020), IEEE. E. Ardizzone, A. Bruno & F. Gugliuzza. Exploiting visual saliency algorithms for object-based attention: A new color and scale-based approach, International Conference on Image Analysis and Processing, pp. 191-201 (2017), Springer.	The dataset consists of enhanced versions of 40 images from OPED.	24 ages: 21-34	free viewing	3 sec	24 subjects placed at approximately 70 cm from the screen.
EToCVD (Eye-Tracking of Colour Vision Deficiencies)	A. Bruno, F. Gugliuzza, E. Ardizzone, C.C. Giunta & R. Pirrone. Image content enhancement through salient regions segmentation for people with color vision deficiencies, i-Perception, vol. 10, num. 3, pp. 2041669519841073 (2019), SAGE Publications, London, England.	The dataset consists of 90 colourful images, with different sizes and resolutions, obtained from different public datasets: MIT1003, CAT2000, NUSEF, MIT300.	8 subjects with colour vision deficiencies and 8 subjects without.	free viewing	3 sec
FIGRIM Fixation Dataset	Zoya Bylinskii, Phillip Isola, Constance Bainbridge, Antonio Torralba, Aude Oliva. Intrinsic and Extrinsic Effects on Image Memorability [Vision Research 2015]	2787 natural scenes from 21 different indoor and outdoor scene categories size: 1000x1000px 1 dva ~ 33px	15 average per image (42 in total) ages: 17-33	memory task	2 sec	Includes: annotated (LabelMe) objects and memory scores for 630 of the images eyetracker: EyeLink 1000 (500 Hz)
Coutrot Database 1	Antoine Coutrot, Nathalie Guyader. How saliency, faces, and sound influence gaze in dynamic social scenes [JoV 2014] Antoine Coutrot, Nathalie Guyader. Toward the introduction of auditory information in dynamic visual attention models [WIAMIS 2013]	60 videos size: 720x576px	72 ages: 20-35	free viewing	average: 17 sec	4 categories: one moving object, several moving objects, landscapes, people having a conversation. Each video has been seen in 4 auditory conditions. eyetracker: EyeLink 1000 (1000 Hz)
Coutrot Database 2	Antoine Coutrot, Nathalie Guyader. An efficient audiovisual saliency model to predict eye positions when looking at conversations [EUSIPCO 2015]	15 videos size: 1232x504px	40 ages: 22-36	free viewing	average: 44 sec	Videos of 4 people having a meeting. Each video has been seen in 2 auditory conditions (with and without the original soundtrack). eyetracker: EyeLink 1000 (1000 Hz)
SAVAM	Yury Gitman, Mikhail Erofeev, Dmitriy Vatolin, Andrey Bolshakov, Alexey Fedorov. Semiautomatic Visual-Attention Modeling and Its Application to Video Compression [ICIP 2014]	41 videos size: max dim: 1920px, other dim: 1080px	50 ages: 18-56	free viewing	average: 20 sec	Left and right stereoscopic views available for all sequences. Nevertheless, only the left view was shown to observers. eyetracker: SMI iViewX™ Hi-Speed 1250 (500Hz)
Eye Fixations in Crowd (EyeCrowd) data set	Ming Jiang, Juan Xu, Qi Zhao. Saliency in Crowd [ECCV 2014]	500 natural indoor and outdoor images with varying crowd densities size: 1024x768px 1 dva ~ 26px	16 ages: 20-30	free viewing	5 sec	The images have a diverse range of crowd densities (up to 268 faces per image). Annotations available: faces labelled with rectangles; two annotations of pose and partial occlusion on each face. eyetracker: Eyelink 1000 (1000Hz)
Fixations in Webpage Images (FiWI) data set	Chengyao Shen, Qi Zhao. Webpage Saliency [ECCV 2014]	149 webpage screenshots in 3 categories. size: 1360x768px 1 dva ~ 26px	11 ages: 21-25	free viewing	5 sec	Text: 50, Pictorial: 50, Mixed: 49 eyetracker: Eyelink 1000 (1000Hz)
VIU data set	Kathryn Koehler, Fei Guo, Sheng Zhang, Miguel P. Eckstein. What Do Saliency Models Predict? [JoV 2014]	800 natural indoor and outdoor scenes size: max dim: 405px 1 dva ~ 27px	100, 22, 20, 38 ages: 18-23	explicit saliency judgement, free viewing, saliency search, cued object search	until response, 2 sec, 2 sec, 2 sec	eyetracker: Eyelink 1000 (250Hz)
Object and Semantic Images and Eye-tracking (OSIE) data set	Juan Xu, Ming Jiang, Shuo Wang, Mohan Kankanhalli, Qi Zhao. Predicting Human Gaze Beyond Pixels [JoV 2014]	700 natural indoor and outdoor scenes, aesthetic photographs from Flickr and Google size: 800x600px 1 dva ~ 24px	15 ages: 18-30	free viewing	3 sec	A large portion of images have multiple dominant objects in the same image. Annotations available: 5,551 segmented objects with fine contours; annotations of 12 semantic attributes on each of the 5,551 objects eyetracker: Eyelink 1000 (2000Hz)
VIP data set	Keng-Teck Ma, Terence Sim, Mohan Kankanhalli. A Unifying Framework for Computational Eye-Gaze Research [Workshop on Human Behavior Understanding 2013]	150 neutral and affective images, randomly chosen from NUSEF dataset	75 ages: undergrads, postgrads, working adults	free viewing, anomaly detection	5 sec	Annotations available: demographic and personality traits of the viewers (can be used for training trait-specific saliency models) eyetracker: SMI RED 250 (120Hz)
FocalAmbient	Follet, B., Le Meur, O., & Baccino, T. New insights into ambient and focal visual fixations using an automatic classification algorithm. i-Perception, 2(6), 592-610 (2011).	120 images size: 800x600px	40 (22 men, 18 women), mean age = 36.7	free viewing	4 sec	Four categories (Street, Coast, Mountain, and Open Country) with very few salient objects. Available data: original images, fixation maps, saliency maps, and gaze coordinates in the referential image. Visual angle: 36x29. Eyetracker: SMI RED iViewX system (50Hz)
MIT Low-resolution data set	Tilke Judd, Fredo Durand, Antonio Torralba. Fixations on Low-Resolution Images [JoV 2011]	168 natural and 25 pink noise images at 8 different resolutions size: 860x1024px 1 dva ~ 35px	8 viewers per image, 64 in total ages: 18-55	free viewing	3 sec	eyetracker: ETL 400 ISCAN (240Hz)
KTH Kootstra data set	Gert Kootstra, Bart de Boer, Lambert R. B. Schomaker. Predicting Eye Fixations on Complex Visual Stimuli using Local Symmetry [Cognitive Computation 2011]	99 photographs from 5 categories. size: 1024x768px	31 ages: 17-32	free viewing	5 sec	Images by category: 19 images with symmetrical natural objects, 12 images of animals in a natural setting, 12 images of street scenes, 16 images of buildings, 40 images of natural environments. eyetracker: Eyelink I
NUSEF data set	Subramanian Ramanathan, Harish Katti, Nicu Sebe, Mohan Kankanhalli, Tat-Seng Chua. An eye fixation database for saliency detection in images [ECCV 2010]	758 everyday scenes from Flickr, aesthetic content from Photo.net, Google images, emotion-evoking IAPS pictures size: 1024x728px	25 on average ages: 18-35	free viewing	5 sec	eyetracker: ASL
TUD Image Quality Database 2	H. Alers, H. Liu, J. Redi and I. Heynderickx. Studying the risks of optimizing the image quality in saliency regions at the expense of background content [SPIE 2010]	160 images (40 at 4 different levels of compression) size: 600x600px	40, 20 ages: students	free viewing, quality assessment	8 sec	eyetracker: iView X RED (50Hz)
Ehinger data set	Krista Ehinger, Barbara Hidalgo-Sotelo, Antonio Torralba, Aude Oliva. Modeling search for people in 900 scenes [Visual Cognition 2009]	912 outdoor scenes size: 800x600px 1 dva ~ 34px	14 ages: 18-40	search (person detection)	until response	eyetracker: ISCAN RK-464 (240Hz)
A Database of Visual Eye Movements (DOVES)	Ian van der Linde, Umesh Rajashekar, Alan C. Bovik, Lawrence K. Cormack. DOVES: A database of visual eye movements [Spatial Vision 2009]	101 natural calibrated images size: 1024x768px	29 ages: mean=27	free viewing	5 sec	eyetracker: Fourward Tech. Gen. V (200Hz)
TUD Image Quality Database 1	H. Liu and I. Heynderickx. Studying the Added Value of Visual Attention in Objective Image Quality Metrics Based on Eye Movement Data [ICIP 2009]	29 images from the LIVE image quality database (varying dimensions)	20 ages: students	free viewing	10 sec	eyetracker: iView X RED (50Hz)
Visual Attention for Image Quality (VAIQ) Database	Ulrich Engelke, Anthony Maeder, Hans-Jurgen Zepernick. Visual Attention Modeling for Subjective Image Quality Databases [MMSP 2009]	42 images from 3 image quality databases: IRCCyN/IVC, MICT, and LIVE (varying dimensions)	15 ages: 20-60 (mean=42)	free viewing	12 sec	eyetracker: EyeTech TM3
Toronto data set	Neil Bruce, John K. Tsotsos. Attention based on information maximization [JoV 2007]	120 color images of outdoor and indoor scenes size: 681x511px	20 ages: undergrads, grads	free viewing	4 sec	A large portion of images here do not contain particular regions of interest. eyetracker: ERICA workstation including a Hitachi CCD camera with an IR emitting LED
Fixations in Faces (FiFA) data base	Moran Cerf, Jonathan Harel, Wolfgang Einhauser, Christof Koch. Predicting human gaze using low-level saliency combined with face detection [NIPS 2007]	200 color outdoor and indoor scenes size: 1024x768px 1 dva ~ 34px	8	free viewing	2 sec	Images include salient objects and many different types of faces. This data set was originally used to establish that human faces are very attractive to observers and to test models of saliency that included face detectors. Object annotations are available. eyetracker: Eyelink 1000 (1000Hz)
Le Meur data set	Olivier Le Meur, Patrick Le Callet, Dominique Barba, Dominique Thoreau. A coherent computational approach to model the bottom-up visual attention [PAMI 2006]	27 color images	40	free viewing	15 sec	eyetracker: Cambridge Research

(*) dva = degree of visual angle

MATLAB code for computing visual angle is available. It was written to standardize this computation and make it easier.
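As a rough illustration of what such a computation involves, the conversion can be sketched in Python. This is not the benchmark's MATLAB code: the function name `pixels_per_degree`, its parameters, and the example screen geometry are all assumptions made for illustration, and the sketch assumes a flat screen viewed head-on at its centre.

```python
import math


def pixels_per_degree(viewing_distance_cm, screen_width_cm, screen_width_px):
    """Approximate number of pixels spanned by 1 degree of visual angle (dva).

    Hypothetical helper: assumes a flat screen viewed head-on, with the
    1-degree patch centred on the line of sight.
    """
    # Physical extent on the screen of a 1-degree patch (0.5 degrees either side).
    one_degree_cm = 2.0 * viewing_distance_cm * math.tan(math.radians(0.5))
    # Horizontal pixel density of the display.
    pixels_per_cm = screen_width_px / screen_width_cm
    return one_degree_cm * pixels_per_cm


# Invented example setup (not taken from any dataset listed above).
ppd = pixels_per_degree(viewing_distance_cm=60.0,
                        screen_width_cm=40.0,
                        screen_width_px=1280)
print(f"1 dva ~ {ppd:.1f} px")
```

The result depends on viewing distance, physical screen size, and resolution, which is why the "1 dva" values in the tables differ between datasets.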
Why is this relevant to saliency modeling? A continuous fixation map is calculated by convolving the fixation locations with a Gaussian of a particular sigma. This sigma is most commonly set to approximately 1 degree of visual angle, which is an estimate of the size of the fovea (Le Meur and Baccino, 2013). Because the number of pixels per degree differs between datasets, this choice sets an upper bound on how well we can predict where humans look on the images in a particular dataset, and should therefore inform how we evaluate saliency models.
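The convolution described above can be sketched as follows. This is a minimal NumPy-only illustration, not the benchmark's own code: the function names, the zero padding at image borders, the truncation of the Gaussian at 4 sigma, and the normalisation of the map to sum to 1 are all assumptions. The example uses sigma = 35 px, following the "1 dva ~ 35px" listed for MIT300, with invented fixation coordinates and an invented image size.

```python
import numpy as np


def gaussian_kernel_1d(sigma_px, truncate=4.0):
    """1-D Gaussian kernel, truncated at `truncate` sigmas and normalised to sum to 1."""
    radius = int(truncate * sigma_px + 0.5)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma_px) ** 2)
    return kernel / kernel.sum()


def fixation_map(fixations_xy, height, width, sigma_px):
    """Blur discrete fixation locations into a continuous fixation map.

    fixations_xy: iterable of (x, y) pixel coordinates (column, row).
    sigma_px: Gaussian sigma in pixels, e.g. the pixel size of 1 dva.
    """
    kernel = gaussian_kernel_1d(sigma_px)
    if len(kernel) > min(height, width):
        raise ValueError("image is smaller than the Gaussian kernel")

    # Accumulate a count of fixations per pixel, ignoring out-of-image points.
    counts = np.zeros((height, width), dtype=float)
    for x, y in fixations_xy:
        col, row = int(round(x)), int(round(y))
        if 0 <= row < height and 0 <= col < width:
            counts[row, col] += 1.0

    # A 2-D Gaussian is separable: convolve along rows, then along columns.
    blurred = np.apply_along_axis(np.convolve, 1, counts, kernel, mode="same")
    blurred = np.apply_along_axis(np.convolve, 0, blurred, kernel, mode="same")

    # Normalise so the map sums to 1 (an assumption; other conventions exist).
    total = blurred.sum()
    return blurred / total if total > 0 else blurred


# Invented fixations on a hypothetical 1024x768 image, sigma = 35 px.
example_map = fixation_map([(512, 384), (300, 200), (800, 600)],
                           height=768, width=1024, sigma_px=35.0)
print(example_map.shape, float(example_map.max()))
```

Passing the dataset-specific pixel size of 1 dva as `sigma_px` keeps the blur comparable in visual angle across datasets recorded with different screen geometries.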

Saliency-related data sets

FixaTons: An open project that consists of a collection of datasets, within a uniform framework in Python, for scanpath and fixation studies. It includes code for data use, statistics calculation, calculation of saliency metrics, and metrics for scanpath similarity.

IVC Data sets: The Images and Video Communications team (IVC) of IRCCyN lab provides several image and video databases including eye movement recordings. Some of the databases are based on a free viewing task, others on a quality evaluation task.

Regional Saliency Dataset (RSD) [Li, Tian, Huang, Gao 2009]: A dataset for evaluating visual saliency in video.

MSRA Salient Object Database [Liu et al. 2007]: A database of 20,000 images with hand-labeled rectangles of the principal salient object by 3 users.