Dataset | Citation | Images | Observers | Tasks | Durations | Extra Notes |
MIT300 | Tilke Judd, Fredo Durand, Antonio Torralba. A Benchmark of Computational Models of Saliency to Predict Human Fixations [MIT tech report 2012] | 300 natural indoor and outdoor scenes size: max dim: 1024px, other dim: 457-1024px 1 dva* ~ 35px |
39 ages: 18-50 |
free viewing | 3 sec | This was the first data set with held-out human eye movements, and is used as the benchmark test set in the MIT/Tübingen Saliency Benchmark. The 300 test images are available for download. eyetracker: ETL 400 ISCAN (240Hz) |
CAT2000 | Ali Borji, Laurent Itti. CAT2000: A Large Scale Fixation Dataset for Boosting Saliency Research [CVPR 2015 workshop on "Future of Datasets"] | 4000 images from 20 different categories size: 1920x1080px 1 dva* ~ 38px |
24 per image (120 in total) ages: 18-27 |
free viewing | 5 sec | This dataset contains two sets of images: train and test. Train images (100 from each category) and fixations of 18 observers are shared, but the fixations of the other 6 observers are held out. Test images are available, but fixations of all 24 observers are held out. The 2000 test images and the 2000 train images (with fixations of 18 observers) are available for download. eyetracker: EyeLink1000 (1000Hz) |
Dataset | Citation | Images | Observers | Tasks | Durations | Extra Notes |
MIT data set | Tilke Judd, Krista Ehinger, Fredo Durand, Antonio Torralba. Learning to Predict where Humans Look [ICCV 2009] | 1003 natural indoor and outdoor scenes size: max dim: 1024px, other dim: 405-1024px 1 dva ~ 35px |
15 ages: 18-35 |
free viewing | 3 sec | Includes: 779 landscape images and 228 portrait images. Can be used as training data for MIT benchmark. eyetracker: ETL 400 ISCAN (240Hz) |
Le Meur et al 2020 | O. Le Meur, T. Le Pen, & R. Cozot. Can we accurately predict where we look at paintings? PLoS ONE 2020. | 150 paintings from Romanticism, Realism, Impressionism, Pointillism and Fauvism |
21 | free viewing | 4 sec | |
MIE Fo & MIE No | O. Le Meur, A. Nebout, M. Chérel, & E. Etchamendy. From Kanner autism to Asperger syndromes, the difficult task to predict where ASD people look at. IEEE Access 2020. | 25 (MIE Fo) + 25 (MIE No) images size: 1920x1200px |
17 (MIE Fo) + 12 (MIE No) | free viewing | 4 sec | nppd=38 (horizontal), nppd=27 (vertical) |
EyeTrackUAV2 | Perrin, A. F., Krassanakis, V., Zhang, L., Ricordel, V., Perreira Da Silva, M., & Le Meur, O. (2020). EyeTrackUAV2: a Large-Scale Binocular Eye-Tracking Dataset for UAV Videos. Drones, 4(1), 2. | 43 videos (subset of UAV123, DTB70 and VIRAT) size: 1280x720px or 720x480px, 30fps, RGB |
30 ages: 29.8 overall, 27.9 T, 31.7 FV |
Surveillance-viewing task (T) and free-viewing (FV) | Average 33 - min 4, max 106 - total 1408 sec. Total of 42 241 frames | Available data: raw gaze data, fixations, saccades, and heatmaps. The fixation detection process was based on an implementation of the I-DT based algorithm. Scenario: all the above is available for both eyes, dominant eye, binocular position, left, and right eye. Eyetracker: EyeLink 1000 Plus (1000 Hz, binocular) |
CrowdFix | M. Tahira, S. Mehboob, A. Rahman, & O. Arif. CrowdFix: An Eyetracking Dataset of Real Life Crowd Videos. IEEE Access 2019. | 434 crowd videos, 1280x720px, 1-3s |
26 observers with ages between 17-40 | free viewing | 1-3 sec | Contains 3 categories of crowds: Dense-Congested, Dense-Free Flowing and Sparse. Main focus: Bottom-up attention. Available data: Fixation maps and Saliency maps of each frame ready to be used as ground truth for deep learning tasks. Eyetracker: The Eyetribe Eyetracker (60 Hz). |
EyeTrackUAV | Krassanakis, Vassilios; Perreira Da Silva, Matthieu; Ricordel, Vincent. Monitoring Human Visual Behavior during the Observation of Unmanned Aerial Vehicles (UAVs) Videos [Drones 2, no. 4: 36, 2018] | 19 UAV videos (subset of UAV123 database) size: 1280x720px (720p), 30fps |
14 ages: 25.4 (+/-3.8) |
free viewing | average 47 sec (14 - 162 sec each; 14:47 min total) | Available data: raw gaze data, fixations, saccades, and heatmaps. The fixation detection process was based on an implementation of the I-DT based algorithm. eyetracker: EyeLink 1000 Plus (1000 Hz, binocular) |
DHF1K | Wenguan Wang, Jianbing Shen, Ming-Ming Cheng, Ali Borji. Revisiting Video Saliency Prediction in the Deep Learning Era [CVPR 2018] | 1000 video sequences size: 640x360px |
17 ages: 20-28 |
free viewing | varied | Largest dataset in the area of dynamic visual fixation prediction. eyetracker: Senso Motoric Instruments (SMI) RED 250 (250Hz) |
EMOtional attention dataset (EMOd) | Shaojing Fan, Zhiqi Shen, Ming Jiang, Bryan L. Koenig, Juan Xu, Mohan Kankanhalli, Qi Zhao. Emotional Attention: A Study of Image Sentiment and Visual Attention [CVPR 2018] (Spotlight) | 1019 emotion-eliciting images, including 321 emotion-evoking pictures selected from IAPS size: 1024x768px |
16 ages: 21-35 |
free viewing | 3 sec | The images cover a diversity of emotional contents, arousing various sentiments, such as happiness, excitement, awe, disgust, and fear. Object-level and image-level annotations, code, and CNN models for saliency prediction are available on the project page. eyetracker: Eyelink 1000 (1000Hz) |
ETTO (Eye-Tracking Through Objects) | A. Bruno, F. Gugliuzza, R. Pirrone & E. Ardizzone. A Multi-Scale Colour and Keypoint Density-Based Approach for Visual Saliency Detection, IEEE Access, vol. 8, pp. 121330-121343 (2020), IEEE. E. Ardizzone, A. Bruno & F. Gugliuzza. Exploiting visual saliency algorithms for object-based attention: A new color and scale-based approach, International Conference on Image Analysis and Processing, pp. 191-201 (2017), Springer. | The dataset consists of enhanced versions of 40 images from OPED. | 24 ages: 21-34 |
free viewing | 3 sec | 24 subjects placed at approximately 70 cm from the screen. |
EToCVD (Eye-Tracking of Colour Vision Deficiencies) | A. Bruno, F. Gugliuzza, E. Ardizzone, C.C. Giunta & R. Pirrone. Image content enhancement through salient regions segmentation for people with color vision deficiencies, i-Perception, vol. 10, no. 3, pp. 2041669519841073 (2019), SAGE Publications. | The dataset consists of 90 colourful images, with different sizes and resolutions, obtained from different public datasets: MIT1003, CAT2000, NUSEF, MIT300. | 8 subjects with colour vision deficiencies and 8 subjects without. | free viewing | 3 sec | |
FIGRIM Fixation Dataset | Zoya Bylinskii, Phillip Isola, Constance Bainbridge, Antonio Torralba, Aude Oliva. Intrinsic and Extrinsic Effects on Image Memorability [Vision Research 2015] | 2787 natural scenes from 21 different indoor and outdoor scene categories size: 1000x1000px 1 dva ~ 33px |
15 average per image (42 in total) ages: 17-33 |
memory task | 2 sec | Includes: annotated (LabelMe) objects and memory scores for 630 of the images eyetracker: EyeLink 1000 (500 Hz) |
Coutrot Database 1 | Antoine Coutrot, Nathalie Guyader. How saliency, faces, and sound influence gaze in dynamic social scenes [JoV 2014]. Antoine Coutrot, Nathalie Guyader. Toward the introduction of auditory information in dynamic visual attention models [WIAMIS 2013] | 60 videos size: 720x576px |
72 ages: 20-35 |
free viewing | average: 17 sec | 4 categories: one moving object, several moving objects, landscapes, people having a conversation. Each video has been seen in 4 auditory conditions. eyetracker: EyeLink 1000 (1000 Hz) |
Coutrot Database 2 | Antoine Coutrot, Nathalie Guyader. An efficient audiovisual saliency model to predict eye positions when looking at conversations [EUSIPCO 2015] | 15 videos size: 1232x504px |
40 ages: 22-36 |
free viewing | average: 44 sec | Videos of 4 people having a meeting. Each video has been seen in 2 auditory conditions (with and without the original soundtrack). eyetracker: EyeLink 1000 (1000 Hz) |
SAVAM | Yury Gitman, Mikhail Erofeev, Dmitriy Vatolin, Andrey Bolshakov, Alexey Fedorov. Semiautomatic Visual-Attention Modeling and Its Application to Video Compression [ICIP 2014] | 41 videos size: max dim: 1920px, other dim: 1080px |
50 ages: 18-56 |
free viewing | average: 20 sec | Left and right stereoscopic views are available for all sequences. Nevertheless, only the left view was shown to observers. eyetracker: SMI iViewX™ Hi-Speed 1250 (500Hz) |
Eye Fixations in Crowd (EyeCrowd) data set | Ming Jiang, Juan Xu, Qi Zhao. Saliency in Crowd [ECCV 2014] | 500 natural indoor and outdoor images with varying crowd densities size: 1024x768px 1 dva ~ 26px |
16 ages: 20-30 |
free viewing | 5 sec | The images have a diverse range of crowd densities (up to 268 faces per image). Annotations available: faces labelled with rectangles; two annotations of pose and partial occlusion on each face. eyetracker: Eyelink 1000 (1000Hz) |
Fixations in Webpage Images (FiWI) data set | Chengyao Shen, Qi Zhao. Webpage Saliency [ECCV 2014] | 149 webpage screenshots in 3 categories. size: 1360x768px 1 dva ~ 26px |
11 ages: 21-25 |
free viewing | 5 sec | Images by category: Text: 50, Pictorial: 50, Mixed: 49. eyetracker: Eyelink 1000 (1000Hz) |
VIU data set | Kathryn Koehler, Fei Guo, Sheng Zhang, Miguel P. Eckstein. What Do Saliency Models Predict? [JoV 2014] | 800 natural indoor and outdoor scenes size: max dim: 405px 1 dva ~ 27px |
100, 22, 20, 38 ages: 18-23 |
explicit saliency judgement, free viewing, saliency search, cued object search | until response, 2 sec, 2 sec, 2 sec | eyetracker: Eyelink 1000 (250Hz) |
Object and Semantic Images and Eye-tracking (OSIE) data set | Juan Xu, Ming Jiang, Shuo Wang, Mohan Kankanhalli, Qi Zhao. Predicting Human Gaze Beyond Pixels [JoV 2014] | 700 natural indoor and outdoor scenes, aesthetic photographs from Flickr and Google size: 800x600px 1 dva ~ 24px |
15 ages: 18-30 |
free viewing | 3 sec | A large portion of images have multiple dominant objects in the same image. Annotations available: 5,551 segmented objects with fine contours; annotations of 12 semantic attributes on each of the 5,551 objects eyetracker: Eyelink 1000 (2000Hz) |
VIP data set | Keng-Teck Ma, Terence Sim, Mohan Kankanhalli. A Unifying Framework for Computational Eye-Gaze Research [Workshop on Human Behavior Understanding 2013] | 150 neutral and affective images, randomly chosen from NUSEF dataset |
75 ages: undergrads, postgrads, working adults |
free viewing, anomaly detection | 5 sec | Annotations available: demographic and personality traits of the viewers (can be used for training trait-specific saliency models) eyetracker: SMI RED 250 (120Hz) |
FocalAmbient | Follet, B., Le Meur, O., & Baccino, T. New insights into ambient and focal visual fixations using an automatic classification algorithm. i-Perception, 2(6), 592-610 (2011). | 120 images size: 800x600px |
40 (22 men, 18 women), mean age = 36.7 |
free viewing | 4 sec | Four categories (Street, Coast, Mountain, and Open Country) with very few salient objects. Available data: original images, fixation maps, saliency maps, and gaze coordinates in the referential image. Visual angle: 36x29. Eyetracker: SMI RED iViewX system (50Hz) |
MIT Low-resolution data set | Tilke Judd, Fredo Durand, Antonio Torralba. Fixations on Low-Resolution Images [JoV 2011] | 168 natural and 25 pink noise images at 8 different resolutions size: 860x1024px 1 dva ~ 35px |
8 viewers per image, 64 in total ages: 18-55 |
free viewing | 3 sec | eyetracker: ETL 400 ISCAN (240Hz) |
KTH Kootstra data set | Gert Kootstra, Bart de Boer, Lambert R. B. Schomaker. Predicting Eye Fixations on Complex Visual Stimuli using Local Symmetry [Cognitive Computation 2011] | 99 photographs from 5 categories. size: 1024x768px |
31 ages: 17-32 |
free viewing | 5 sec | Images by category: 19 images with symmetrical natural objects, 12 images of animals in a natural setting, 12 images of street scenes, 16 images of buildings, 40 images of natural environments. eyetracker: Eyelink I |
NUSEF data set | Subramanian Ramanathan, Harish Katti, Nicu Sebe, Mohan Kankanhalli, Tat-Seng Chua. An eye fixation database for saliency detection in images [ECCV 2010] | 758 everyday scenes from Flickr, aesthetic content from Photo.net, Google images, emotion-evoking IAPS pictures size: 1024x728px |
25 on average ages: 18-35 |
free viewing | 5 sec | eyetracker: ASL |
TUD Image Quality Database 2 | H. Alers, H. Liu, J. Redi and I. Heynderickx. Studying the risks of optimizing the image quality in saliency regions at the expense of background content [SPIE 2010] | 160 images (40 at 4 different levels of compression) size: 600x600px |
40, 20 ages: students |
free viewing, quality assessment | 8 sec | eyetracker: iView X RED (50Hz) |
Ehinger data set | Krista Ehinger, Barbara Hidalgo-Sotelo, Antonio Torralba, Aude Oliva. Modeling search for people in 900 scenes [Visual Cognition 2009] | 912 outdoor scenes size: 800x600px 1 dva ~ 34px |
14 ages: 18-40 |
search (person detection) | until response | eyetracker: ISCAN RK-464 (240Hz) |
A Database of Visual Eye Movements (DOVES) | Ian van der Linde, Umesh Rajashekar, Alan C. Bovik, Lawrence K. Cormack. DOVES: A database of visual eye movements [Spatial Vision 2009] | 101 natural calibrated images size: 1024x768px |
29 ages: mean=27 |
free viewing | 5 sec | eyetracker: Fourward Tech. Gen. V (200Hz) |
TUD Image Quality Database 1 | H. Liu and I. Heynderickx. Studying the Added Value of Visual Attention in Objective Image Quality Metrics Based on Eye Movement Data [ICIP 2009] | 29 images from the LIVE image quality database (varying dimensions) | 20 ages: students |
free viewing | 10 sec | eyetracker: iView X RED (50Hz) |
Visual Attention for Image Quality (VAIQ) Database | Ulrich Engelke, Anthony Maeder, Hans-Jurgen Zepernick. Visual Attention Modeling for Subjective Image Quality Databases [MMSP 2009] | 42 images from 3 image quality databases: IRCCyN/IVC, MICT, and LIVE (varying dimensions) | 15 ages: 20-60 (mean=42) |
free viewing | 12 sec | eyetracker: EyeTech TM3 |
Toronto data set | Neil Bruce, John K. Tsotsos. Attention based on information maximization [JoV 2007] | 120 color images of outdoor and indoor scenes size: 681x511px |
20 ages: undergrads, grads |
free viewing | 4 sec | A large portion of images here do not contain particular regions of interest. eyetracker: ERICA workstation including a Hitachi CCD camera with an IR emitting LED |
Fixations in Faces (FiFA) data base | Moran Cerf, Jonathan Harel, Wolfgang Einhauser, Christof Koch. Predicting human gaze using low-level saliency combined with face detection [NIPS 2007] | 200 color outdoor and indoor scenes size: 1024x768px 1 dva ~ 34px |
8 | free viewing | 2 sec | Images include salient objects and many different types of faces. This data set was originally used to establish that human faces are very attractive to observers and to test models of saliency that included face detectors. Object annotations are available. eyetracker: Eyelink 1000 (1000Hz) |
Le Meur data set | Olivier Le Meur, Patrick Le Callet, Dominique Barba, Dominique Thoreau. A coherent computational approach to model the bottom-up visual attention [PAMI 2006] | 27 color images | 40 | free viewing | 15 sec | eyetracker: Cambridge Research |