Face Recognition Homepage

DATABASES

When benchmarking an algorithm it is recommendable to use a standard test data set for researchers to be able to directly compare the results. While there are many databases in use currently, the choice of an appropriate database to be used should be made based on the task given (aging, expressions, lighting etc). Another way is to choose the data set specific to the property to be tested (e.g. how algorithm behaves when given images with lighting changes or images with different facial expressions). If, on the other hand, an algorithm needs to be trained with more images per class (like LDA), Yale face database is probably more appropriate than FERET.

Read more:

R. Gross, Face Databases, Handbook of Face Recognition, Stan Z. Li and Anil K. Jain, ed., Springer-Verlag, February 2005, 22 pages
link

Testing protocols:

Face Image ISO Compliance Verification Benchmark Area - FVC-onGoing is a web-based automated evaluation system developed to evaluate biometric algorithms. Algorithms submitted to the Face Compliance Verification to ISO standard (FICV) benchmark area are required to check the compliance of face images to ISO/IEC 19794-5 standard. To the best of our knowledge this is the first available benchmark that directly assesses the accuracy of algorithms to automatically verify the compliance of face images to the ISO standard, in the attempt of semi-automating the document issuing process.

P. Jonathon Phillips, A. Martin, C.l. Wilson, M. Przybocki, An Introduction to Evaluating Biometric Systems, IEEE Computer, Vol. 33, No. 2, February 2000, pp. 56-63
download here, 407 kB

A. J. Mansfield, J. L. Wayman, Best Practices in Testing and Reporting Performance of Biometric Devices, NPL Report CMSC 14/02, August 2002
download here, 406 kB

K. Delac, M. Grgic, S. Grgic, Independent Comparative Study of PCA, ICA, and LDA on the FERET Data Set, International Journal of Imaging Systems and Technology, Vol. 15, Issue 5, pp. 252-260
download here, 412 kB

Here are some face data sets often used by researchers:

The Color FERET Database, USA

The FERET program set out to establish a large database of facial images that was gathered independently from the algorithm developers. Dr. Harry Wechsler at George Mason University was selected to direct the collection of this database. The database collection was a collaborative effort between Dr. Wechsler and Dr. Phillips. The images were collected in a semi-controlled environment. To maintain a degree of consistency throughout the database, the same physical setup was used in each photography session. Because the equipment had to be reassembled for each session, there was some minor variation in images collected on different dates. The FERET database was collected in 15 sessions between August 1993 and July 1996. The database contains 1564 sets of images for a total of 14,126 images that includes 1199 individuals and 365 duplicate sets of images. A duplicate set is a second set of images of a person already in the database and was usually taken on a different day. For some individuals, over two years had elapsed between their first and last sittings, with some subjects being photographed multiple times. This time lapse was important because it enabled researchers to study, for the first time, changes in a subject's appearance that occur over a year.

SCface - Surveillance Cameras Face Database

SCface is a database of static images of human faces. Images were taken in uncontrolled indoor environment using five video surveillance cameras of various qualities. Database contains 4160 static images (in visible and infrared spectrum) of 130 subjects. Images from different quality cameras mimic the real-world conditions and enable robust face recognition algorithms testing, emphasizing different law enforcement and surveillance use case scenarios. SCface database is freely available to research community. The paper describing the database is available here.

SCfaceDB Landmarks

The database is comprised of 21 facial landmarks (from 4160 face images) from 130 users annotated manually by a human operator, as described in this paper.

Multi-PIE

A close relationship exists between the advancement of face recognition algorithms and the availability of face databases varying factors that affect facial appearance in a controlled manner. The PIE database, collected at Carnegie Mellon University in 2000, has been very influential in advancing research in face recognition across pose and illumination. Despite its success the PIE database has several shortcomings: a limited number of subjects, a single recording session and only few expressions captured. To address these issues researchers at Carnegie Mellon University collected the Multi-PIE database. It contains 337 subjects, captured under 15 view points and 19 illumination conditions in four recording sessions for a total of more than 750,000 images. The paper describing the database is available here.

The Yale Face Database

Contains 165 grayscale images in GIF format of 15 individuals. There are 11 images per subject, one per different facial expression or configuration: center-light, w/glasses, happy, left-light, w/no glasses, normal, right-light, sad, sleepy, surprised, and wink.

The Yale Face Database B

Contains 5760 single light source images of 10 subjects each seen under 576 viewing conditions (9 poses x 64 illumination conditions). For every subject in a particular pose, an image with ambient (background) illumination was also captured.

PIE Database, CMU

A database of 41,368 images of 68 people, each person under 13 different poses, 43 different illumination conditions, and with 4 different expressions.

Project - Face In Action (FIA) Face Video Database, AMP, CMU

Capturing scenario mimics the real world applications, for example, when a person is going through the airport check-in point. Six cameras capture human faces from three different angles. Three out of the six cameras have smaller focus length, and the other three have larger focus length. Plan to capture 200 subjects in 3 sessions in different time period. For one session, both in-door and out-door scenario will be captured. User-dependent pose and expression variation are expected from the video sequences.

AT&T "The Database of Faces" (formerly "The ORL Database of Faces")

Ten different images of each of 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open / closed eyes, smiling / not smiling) and facial details (glasses / no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement).

Cohn-Kanade AU Coded Facial Expression Database

Subjects in the released portion of the Cohn-Kanade AU-Coded Facial Expression Database are 100 university students. They ranged in age from 18 to 30 years. Sixty-five percent were female, 15 percent were African-American, and three percent were Asian or Latino. Subjects were instructed by an experimenter to perform a series of 23 facial displays that included single action units and combinations of action units. Image sequences from neutral to target display were digitized into 640 by 480 or 490 pixel arrays with 8-bit precision for grayscale values. Included with the image files are "sequence" files; these are short text files that describe the order in which images should be read.

MIT-CBCL Face Recognition Database

The MIT-CBCL face recognition database contains face images of 10 subjects. It provides two training sets: 1. High resolution pictures, including frontal, half-profile and profile view; 2. Synthetic images (324/subject) rendered from 3D head models of the 10 subjects. The head models were generated by fitting a morphable model to the high-resolution training images. The 3D models are not included in the database. The test set consists of 200 images per subject. We varied the illumination, pose (up to about 30 degrees of rotation in depth) and the background.

Image Database of Facial Actions and Expressions - Expression Image Database

24 subjects are represented in this database, yielding between about 6 to 18 examples of the 150 different requested actions. Thus, about 7,000 color images are included in the database, and each has a matching gray scale image used in the neural network analysis.

Face Recognition Data, University of Essex, UK

395 individuals (male and female), 20 images per individual. Contains images of people of various racial origins, mainly of first year undergraduate students, so the majority of indivuals are between 18-20 years old but some older individuals are also present. Some individuals are wearing glasses and beards.

NIST Mugshot Identification Database

There are images of 1573 individuals (cases) 1495 male and 78 female. The database contains both front and side (profile) views when available. Separating front views and profiles, there are 131 cases with two or more front views and 1418 with only one front view. Profiles have 89 cases with two or more profiles and 1268 with only one profile. Cases with both fronts and profiles have 89 cases with two or more of both fronts and profiles, 27 with two or more fronts and one profile, and 1217 with only one front and one profile.

NLPR Face Database

450 face images. 896 x 592 pixels. JPEG format. 27 or so unique people under with different lighting/expressions/backgrounds.

M2VTS Multimodal Face Database (Release 1.00)

Database is made up from 37 different faces and provides 5 shots for each person. These shots were taken at one week intervals or when drastic face changes occurred in the meantime. During each shot, people have been asked to count from '0' to '9' in their native language (most of the people are French speaking), rotate the head from 0 to -90 degrees, again to 0, then to +90 and back to 0 degrees. Also, they have been asked to rotate the head once again without glasses if they wear any.

The Extended M2VTS Database, University of Surrey, UK

Contains four recordings of 295 subjects taken over a period of four months. Each recording contains a speaking head shot and a rotating head shot. Sets of data taken from this database are available including high quality colour images, 32 KHz 16-bit sound files, video sequences and a 3D model.

The AR Face Database, The Ohio State University, USA

4,000 color images corresponding to 126 people's faces (70 men and 56 women). Images feature frontal view faces with different facial expressions, illumination conditions, and occlusions (sun glasses and scarf).

The University of Oulu Physics-Based Face Database

Contains 125 different faces each in 16 different camera calibration and illumination condition, an additional 16 if the person has glasses. Faces in frontal position captured under Horizon, Incandescent, Fluorescent and Daylight illuminant. Includes 3 spectral reflectance of skin per person measured from both cheeks and forehead. Contains RGB spectral response of camera used and spectral power distribution of illuminants.

CAS-PEAL Face Database

The CAS-PEAL face database has been constructed under the sponsors of National Hi-Tech Program and ISVISION. The goals to create the PEAL face database include: providing the worldwide researchers of FR community a large-scale Chinese face database for training and evaluating their algorithms; facilitating the development of FR by providing large-scale face images with different sources of variations, especially Pose, Expression, Accessories, and Lighting (PEAL); advancing the state-of-the-art face recognition technologies aiming at practical applications especially for the oriental.

Japanese Female Facial Expression (JAFFE) Database

The database contains 213 images of 7 facial expressions (6 basic facial expressions + 1 neutral) posed by 10 Japanese female models. Each image has been rated on 6 emotion adjectives by 60 Japanese subjects.

BioID Face DB - HumanScan AG, Switzerland

The dataset consists of 1521 gray level images with a resolution of 384x286 pixel. Each one shows the frontal view of a face of one out of 23 different test persons. For comparison reasons the set also contains manually set eye postions.

Psychological Image Collection at Stirling (PICS)

This is a collection of images useful for research in Psychology, such as sets of faces and objects. The images in the database are organised into SETS, with each set often representing a separate experimental study.

The Sheffield Face Database (previously: The UMIST Face Database)

Consists of 564 images of 20 people. Each covering a range of poses from profile to frontal views. Subjects cover a range of race/sex/appearance. Each subject exists in their own directory labelled 1a, 1b, ... 1t and images are numbered consequetively as they were taken. The files are all in PGM format, approximately 220 x 220 pixels in 256 shades of grey.

Face Video Database of the Max Planck Institute for Biological Cybernetics

This database contains short video sequences of facial Action Units recorded simultaneously from six different viewpoints, recorded in 2003 at the Max Planck Institute for Biological Cybernetics. The video cameras were arranged at 18 degrees intervals in a semi-circle around the subject at a distance of roughly 1.3m. The cameras recorded 25 frames/sec at 786x576 video resolution, non-interlaced. In order to facilitate the recovery of rigid head motion, the subject wore a headplate with 6 green markers. The website contains a total of 246 video sequences in MPEG1 format.

Caltech Faces

450 face images. 896 x 592 pixels. JPEG format. 27 or so unique people under with different lighting/expressions/backgrounds.

EQUINOX HID Face Database

Human identification from facial features has been studied primarily using imagery from visible video cameras. Thermal imaging sensors are one of the most innovative emerging techonologies in the market. Fueled by ever lowering costs and improved sensitivity and resolution, our sensors provide exciting new oportunities for biometric identification. As part of our involvement in this effort, Equinox is collecting an extensive database of face imagery in the following modalities: coregistered broadband-visible/LWIR (8-12 microns), MWIR (3-5 microns), SWIR (0.9-1.7 microns). This data collection is made available for experimentation and statistical performance evaluations.

VALID Database

With the aim to facilitate the development of robust audio, face, and multi-modal person recognition systems, the large and realistic multi-modal (audio-visual) VALID database was acquired in a noisy "real world" office scenario with no control on illumination or acoustic noise. The database consists of five recording sessions of 106 subjects over a period of one month. One session is recorded in a studio with controlled lighting and no background noise, the other 4 sessions are recorded in office type scenarios. The database contains uncompressed JPEG Images at resolution of 720x576 pixels.

The UCD Colour Face Image Database for Face Detection

The database has two parts. Part one contains colour pictures of faces having a high degree of variability in scale, location, orientation, pose, facial expression and lighting conditions, while part two has manually segmented results for each of the images in part one of the database. These images are acquired from a wide variety of sources such as digital cameras, pictures scanned using photo-scanner, other face databases and the World Wide Web. The database is intended for distribution to researchers.

Georgia Tech Face Database

The database contains images of 50 people and is stored in JPEG format. For each individual, there are 15 color images captured between 06/01/99 and 11/15/99. Most of the images were taken in two different sessions to take into account the variations in illumination conditions, facial expression, and appearance. In addition to this, the faces were captured at different scales and orientations.

Indian Face Database

The database contains a set of face images taken in February, 2002 in the IIT Kanpur campus. There are eleven different images of each of 40 distinct subjects. For some subjects, some additional photographs are included. All the images were taken against a bright homogeneous background with the subjects in an upright, frontal position. The files are in JPEG format. The size of each image is 640x480 pixels, with 256 grey levels per pixel. The images are organized in two main directories - males and females. In each of these directories, there are directories with name as a serial numbers, each corresponding to a single individual. In each of these directories, there are eleven different images of that subject, which have names of the form abc.jpg, where abc is the image number for that subject. The following orientations of the face are included: looking front, looking left, looking right, looking up, looking up towards left, looking up towards right, looking down. Available emotions are: neutral, smile, laughter, sad/disgust.

VidTIMIT Database

The VidTIMIT database is comprised of video and corresponding audio recordings of 43 people, reciting short sentences. It can be useful for research on topics such as multi-view face recognition, automatic lip reading and multi-modal speech recognition. The dataset was recorded in 3 sessions, with a space of about a week between each session. There are 10 sentences per person, chosen from the TIMIT corpus. In addition to the sentences, each person performed a head rotation sequence in each session. The sequence consists of the person moving their head to the left, right, back to the center, up, then down and finally return to center. The recording was done in an office environment using a broadcast quality digital video camera. The video of each person is stored as a numbered sequence of JPEG images with a resolution of 512 x 384 pixels. The corresponding audio is stored as a mono, 16 bit, 32 kHz WAV file.

Labeled Faces in the Wild

Labeled Faces in the Wild is a database of face photographs designed for studying the problem of unconstrained face recognition. The database contains more than 13,000 images of faces collected from the web. Each face has been labeled with the name of the person pictured. 1680 of the people pictured have two or more distinct photos in the database. The only constraint on these faces is that they were detected by the Viola-Jones face detector. Please see the database web page and the technical report linked there for more details.

The LFWcrop Database

LFWcrop is a cropped version of the Labeled Faces in the Wild (LFW) dataset, keeping only the center portion of each image (i.e. the face). In the vast majority of images almost all of the background is omitted. LFWcrop was created due to concern about the misuse of the original LFW dataset, where face matching accuracy can be unrealistically boosted through the use of background parts of images (i.e. exploitation of possible correlations between faces and backgrounds). As the location and size of faces in LFW was determined through the use of an automatic face locator (detector), the cropped faces in LFWcrop exhibit real-life conditions, including mis-alignment, scale variations, in-plane as well as out-of-plane rotations.

Labeled Faces in the Wild-a (LFW-a)

The "Labeled Faces in the Wild-a" image collection is a database of labeled, face images intended for studying Face Recognition in unconstrained images. It contains the same images available in the original Labeled Faces in the Wild data set, however, here we provide them after alignment using a commercial face alignment software. Some of our results were produced using these images. We show this alignment to improve the performance of face recognition algorithms. We have maintained the same directory structure as in the original LFW data set, and so these images can be used as direct substitutes for those in the original image set. Note, however, that the images available here are grayscale versions of the originals.

3D_RMA database

The 3D_RMA database is a collection of two sessions (Nov 1997 and Jan 1998) consisting of 120 persons. For each session, three shots were recorded with different (but limited) orientations of the head. Details about the population and typical problems affecting the quality are given in the referred link. 3D was captured thanks to a first prototype of a proprietary system based on structured light (analog camera!). The quality was limited but sufficient to show the ability of 3D face recognition. For privacy reasons, the texture images are not made available. In the period 2003-2008, this database has been downloaded by about 100 researchers. A few papers present recognition results with the database (like, of course, papers from the author).

GavabDB: 3D face database, GAVAB research group, Universidad Rey Juan Carlos, Spain

GavabDB is a 3D face database. It contains 549 three-dimensional images of facial surfaces. These meshes correspond to 61 different individuals (45 male and 16 female) having 9 images for each person. The total of the individuals are Caucasian and their age is between 18 and 40 years old. Each image is given by a mesh of connected 3D points of the facial surface without texture. The database provides systematic variations with respect to the pose and the facial expression. In particular, the 9 images corresponding to each individual are: 2 frontal views with neutral expression, 2 x-rotated views (ą30o, looking up and looking down respectively) with neutral expression, 2 y-rotated views (ą90o, left and right profiles respectively) with neutral expression and 3 frontal gesture images (laugh, smile and a random gesture chosen by the user, respectively).

FRAV2D Database

This database is formed by up to 109 subjects (75 men and 34 women), with 32 colour images per person. Each picture has a 320 x 240 pixel resolution, with the face occupying most of the image in an upright position. For one single person, all the photographs were taken on the same day, although the subject was forced to stand up and sit down again in order to change pose and gesture. In all cases, the background is plain and dark blue. The 32 images were classified in six groups according to the pose and lighting conditions: 12 frontal images, 4 15o-turned images, 4 30o-turned images, 4 images with gestures, 4 images with occluded face features and 4 frontal images with a change of illumination. This database is delivered for free exclusively for research purposes.

FRAV3D Database

This database contains 106 subjects, with approximately one woman every three men. The data were acquired with a Minolta VIVID 700 scanner, which provides texture information (2D image) and a VRML file (3D image). If needed, the corresponding range data (2.5D image) can be computed by means of the VRML file. Therefore, it is a multimodal database (2D, 2.5D y 3D). During all time, a strict acquisition protocol was followed, with controlled lighting conditions. The person sat down on an adjustable stool opposite the scanner and in front of a blue wall. No glasses, hats or scarves were allowed. A total of 16 captures per person were taken in every session, with different poses and lighting conditions, trying to cover all possible variations, including turns in different directions, gestures and lighting changes. In every case only one parameter was modified between two captures. This is one of the main advantages of this database, respect to others. This database is delivered for free exclusively for research purposes.

BJUT-3D Chinese Face Database

The BJUT-3D is a three dimension face database including 500 Chinese persons. There are 250 females and 250 males in the database. Everyone has a 3D face data with neutral expression and without accessories. Original high-resolution 3D face data is acquired by the CyberWare 3D scanner in given environment, Every 3D face data has been preprocessed, and cut the redundant parts. Now the face database is available for research purpose only. The Multimedia and Intelligent Software Technology Beijing Municipal Key Laboratory in Beijing University of Technology is serving as the technical agent for distribution of the database and reserves the copyright of all the data in the database.

The Bosphorus Database

The Bosphorus Database is a new 3D face database that includes a rich set of expressions, systematic variation of poses and different types of occlusions. This database is unique from three aspects: (1) The facial expressions are composed of judiciously selected subset of Action Units as well as the six basic emotions, and many actors/actresses are incorporated to obtain more realistic expression data; (2) A rich set of head pose variations are available; (3) Different types of face occlusions are included. Hence, this new database can be a very valuable resource for development and evaluation of algorithms on face recognition under adverse conditions and facial expression analysis as well as for facial expression synthesis.

PUT Face Database

PUT Face Database consists of almost 10000 hi-res images of 100 people. Images were taken in controlled conditions and the database is supplied with additional data including: rectangles containing face, eyes, nose and mouth, landmarks positions and manually annotated contour models. Database is available for research purposes.

The Basel Face Model (BFM)

The Basel Face Model (BFM) is a 3D Morphable Face Model constructed from 100 male and 100 female example faces. The BFM consists of a generative 3D shape model covering the face surface from ear to ear and a high quality texture model. The model can be used either directly for 2D and 3D face recognition or to generate training and test images for any imaging condition. Hence, in addition to being a valuable model for face analysis it can also be viewed as a meta-database which allows the creation of accurately labeled synthetic training and testing images. To allow for a fair comparison with other algorithms, we provide both the training data set (the BFM) and the model fitting results for several standard image data sets (CMU-PIE, FERET) obtained with our fitting algorithm. The BFM web page additionally provides a set of registered scans of ten individuals, together with a set of 270 renderings of these individuals with systematic pose and light variations. These scans are not included in the training set of the BFM and form a standardized test set with a ground truth for pose and illumination.

Plastic Surgery Face Database

The plastic surgery face database is a real world database that contains 1800 pre and post surgery images pertaining to 900 subjects. Different types of facial plastic surgeries have different impact on facial features. To enable the researchers to design and evaluate face recognition algorithms on all types of facial plastic surgeries, the database contains images from a wide variety of cases such as Rhinoplasty (nose surgery), Blepharoplasty (eyelid surgery), brow lift, skin peeling, and Rhytidectomy (face lift). For each individual, there are two frontal face images with proper illumination and neutral expression: the first is taken before surgery and the second is taken after surgery. The database contains 519 image pairs corresponding to local surgeries and 381 cases of global surgery (e.g., skin peeling and face lift). The details of the database and performance evaluation of several well known face recognition algorithms is available in this paper.

The Iranian Face Database (IFDB)

The Iranian Face Database (IFDB), the first image database in middle-east, contains color facial imagery of a large number of Iranian subjects. IFDB is a large database that can support studies of the age classification systems. It contains over 3,600 color images. IFDB can be used for age classification, facial feature extraction, aging, facial ratio extraction, percent of facial similarity, facial surgery, race detection and other similar researches.

The Hong Kong Polytechnic University NIR Face Database

The Biometric Research Centre at The Hong Kong Polytechnic University developed a real time NIR face capture device and used it to construct a large-scale NIR face database. The NIR face image acquisition system consists of a camera, an LED light source, a filter, a frame grabber card and a computer. The camera used is a JAI camera, which is sensitive to NIR band. The active light source is in the NIR spectrum between 780nm - 1,100 nm. The peak wavelength is 850 nm. The strength of the total LED lighting is adjusted to ensure a good quality of the NIR face images when the camera face distance is between 80 cm - 120 cm, which is convenient for the users. By using the data acquisition device described above, we collected NIR face images from 335 subjects. During the recording, the subject was first asked to sit in front of the camera, and the normal frontal face images of him/her were collected. Then the subject was asked to make expression and pose changes and the corresponding images were collected. To collect face images with scale variations, we asked the subjects to move near to or away from the camera in a certain range. At last, to collect face images with time variations, samples from 15 subjects were collected at two different times with an interval of more than two months. In each recording, we collected about 100 images from each subject, and in total about 34,000 images were collected in the PolyU-NIRFD database.

The Hong Kong Polytechnic University Hyperspectral Face Database (PolyU-HSFD)

The Biometric Research Centre at The Hong Kong Polytechnic University established a Hyperspectral Face database. The indoor hyperspectral face acquisition system was built which mainly consists of a CRI's VariSpec LCTF and a Halogen Light, and includes a hyperspectral dataset of 300 hyperspectral image cubes from 25 volunteers with age range from 21 to 33 (8 female and 17 male). For each individual, several sessions were collected with an average time space of 5 month. The minimal interval is 3 months and the maximum is 10 months. Each session consists of three hyperspectral cubes - frontal, right and left views with neutral-expression. The spectral range is from 400 nm to 720 nm with a step length of 10 nm, producing 33 bands in all. Since the database was constructed over a long period of time, significant appearance variations of the subjects, e.g. changes of hair style and skin condition, are presented in the data. In data collection, positions of the camera, light and subject are fixed, which allows us to concentrate on the spectral characteristics for face recognition without masking from environmental changes.

MOBIO - Mobile Biometry Face and Speech Database

The MOBIO database consists of bi-modal (audio and video) data taken from 152 people. The database has a female-male ratio or nearly 1:2 (100 males and 52 females) and was collected from August 2008 until July 2010 in six different sites from five different countries. This led to a diverse bi-modal database with both native and non-native English speakers. In total 12 sessions were captured for each client: 6 sessions for Phase I and 6 sessions for Phase II. The Phase I data consists of 21 questions with the question types ranging from: Short Response Questions, Short Response Free Speech, Set Speech, and Free Speech. The Phase II data consists of 11 questions with the question types ranging from: Short Response Questions, Set Speech, and Free Speech. The database was recorded using two mobile devices: a mobile phone and a laptop computer. The mobile phone used to capture the database was a NOKIA N93i mobile while the laptop computer was a standard 2008 MacBook. The laptop was only used to capture part of the first session, this first session consists of data captured on both the laptop and the mobile phone.

Texas 3D Face Recognition Database (Texas 3DFRD)

Texas 3D Face Recognition database (Texas 3DFRD) contains 1149 pairs of facial color and range images of 105 adult human subjects. The images were acquired at the company Advanced Digital Imaging Research (ADIR), LLC (Friendswood, TX), formerly a subsidiary of Iris International, Inc. (Chatsworth, CA), with assistance from research students and faculty from the Laboratory for Image and Video Engineering (LIVE) at The University of Texas at Austin. This project was sponsored by the Advanced Technology Program of the National Institute of Standards and Technology (NIST). The database is being made available by Dr. Alan C Bovik at UT Austin. The images were acquired using a stereo imaging system at a high spatial resolution of 0.32 mm. The color and range images were captured simultaneously and thus are perfectly registered to each other. All faces have been normalized to the frontal position and the tip of the nose is positioned at the center of the image. The images are of adult humans from all the major ethnic groups and both genders. For each face, is also available information about the subjects' gender, ethnicity, facial expression, and the locations 25 anthropometric facial fiducial points. These fiducial points were located manually on the facial color images using a computer based graphical user interface. Specific data partitions (training, gallery, and probe) that were employed at LIVE to develop the Anthropometric 3D Face Recognition algorithm are also available.

Natural Visible and Infrared facial Expression database (USTC-NVIE)

The database contains both spontaneous and posed expressions of more than 100 subjects, recorded simultaneously by a visible and an infrared thermal camera, with illumination provided from three different directions. The posed database also includes expression images with and without glasses. The paper describing the database is available here.

FEI Face Database

The FEI face database is a Brazilian face database that contains a set of face images taken between June 2005 and March 2006 at the Artificial Intelligence Laboratory of FEI in Sao Bernardo do Campo, Sao Paulo, Brazil. There are 14 images for each of 200 individuals, a total of 2800 images. All images are colourful and taken against a white homogenous background in an upright frontal position with profile rotation of up to about 180 degrees. Scale might vary about 10% and the original size of each image is 640x480 pixels. All faces are mainly represented by students and staff at FEI, between 19 and 40 years old with distinct appearance, hairstyle, and adorns. The number of male and female subjects are exactly the same and equal to 100.

ChokePoint

ChokePoint video dataset is designed for experiments in person identification/verification under real-world surveillance conditions using existing technologies. An array of three cameras was placed above several portals (natural choke points in terms of pedestrian traffic) to capture subjects walking through each portal in a natural way. While a person is walking through a portal, a sequence of face images (ie. a face set) can be captured. Faces in such sets will have variations in terms of illumination conditions, pose, sharpness, as well as misalignment due to automatic face localisation/detection. Due to the three camera configuration, one of the cameras is likely to capture a face set where a subset of the faces is near-frontal. The dataset consists of 25 subjects (19 male and 6 female) in portal 1 and 29 subjects (23 male and 6 female) in portal 2. In total, the dataset consists of 54 video sequences and 64,204 labelled face images.

UMB database of 3D occluded faces

The University of Milano Bicocca 3D face database is a collection of multimodal (3D + 2D colour images) facial acquisitions. The database is available to universities and research centers interested in face detection, face recognition, face synthesis, etc. The UMB-DB has been acquired with a particular focus on facial occlusions, i.e. scarves, hats, hands, eyeglasses and other types of occlusion wich can occur in real-world scenarios.

VADANA: Vims Appearance Dataset for facial ANAlysis

The primary use of VADANA is for the problems of face verification and recognition across age progression. The main characteristics of VADANA, which distinguish it from current benchmarks, is the large number of intra-personal pairs (order of 168 thousand); natural variations in pose, expression and illumination; and the rich set of additional meta-data provided along with standard partitions for direct comparison and bench-marking efforts.

MORPH Database (Craniofacial Longitudinal Morphological Face Database)

The MORPH Longitudinal Database is the largest longitudinal facial recognition database in the world. It contains 202,038 unique images of 40,395 subjects. Subject ages range from 15 to 80 with a median age of 31.87 years. The average number of images per subject is 5 and the average time between consecutive photos is 251 days, with the minimum being 1 day and the maximum being 3,506 days (9 years, and 221 days). The standard deviation of days between images is 308. The maximum duration between images for a single subject is 3,976 days (10 years, and 326 days). The database is licensed for developmental, and commercial uses. Additional information about the MORPH Database can be found here. There is a separate version, Academic MORPH Database, available for academic uses. It is an abbreviated version, both in terms of images and associated metadata, that is available here.

Long Distance Heterogeneous Face Database (LDHF-DB)

LDHF database contains both visible (VIS) and near-infrared (NIR) face images at distances of 60 m, 100 m and 150 m outdoors and at a 1 m distance indoors. Face images of 100 subjects (70 males and 30 females) were captured; for each subject one image was captured at each distance in daytime and nighttime. All the images of individual subjects are frontal faces without glasses and collected in a single sitting.

PhotoFace: Face recognition using photometric stereo

This unique 3D face database is amongst the largest currently available, containing 3187 sessions of 453 subjects, captured in two recording periods of approximately six months each. The Photoface device was located in an unsupervised corridor allowing real-world and unconstrained capture. Each session comprises four differently lit colour photographs of the subject, from which surface normal and albedo estimations can be calculated (photometric stereo Matlab code implementation included). This allows for many testing scenarios and data fusion modalities. Eleven facial landmarks have been manually located on each session for alignment purposes. Additionally, the Photoface Query Tool is supplied (implemented in Matlab), which allows for subsets of the database to be extracted according to selected metadata e.g. gender, facial hair, pose, expression.

The EURECOM Kinect Face Dataset (EURECOM KFD)

The Dataset consists of multimodal facial images of 52 people (14 females, 38 males) acquired with a Kinect sensor. The data is captured in two sessions at different intervals (of about two weeks). In each session, 9 facial images are collected from each person according to different facial expressions, lighting and occlusion conditions: neutral, smile, open mouth, left profile, right profile, occluded eyes, occluded mouth, side occlusion with a sheet of paper and light on. An RGB color image, a depth map (provided both as a bitmap depth image and a text file containing the original depth levels sensed by Kinect) as well as the associated 3D data are provided for all samples. In addition, the dataset includes 6 manually labeled landmark positions for every face: left eye, right eye, tip of the nose, left side of mouth, right side of mouth and the chin. Other information, such as gender, year of birth, ethnicity, glasses (whether a person wears glasses or not) and the time of each session are also available.

YouTube Faces Database

The data set contains 3,425 videos of 1,595 different people. All the videos were downloaded from YouTube. An average of 2.15 videos are available for each subject. The shortest clip duration is 48 frames, the longest clip is 6,070 frames, and the average length of a video clip is 181.3 frames. In designing our video data set and benchmarks we follow the example of the 'Labeled Faces in the Wild' LFW image collection. Specifically, our goal is to produce a large scale collection of videos along with labels indicating the identities of a person appearing in each video. In addition, we publish benchmark tests, intended to measure the performance of video pair-matching techniques on these videos. Finally, we provide descriptor encodings for the faces appearing in these videos, using well established descriptor methods.

YMU (YouTube Makeup) Dataset

The dataset consists of 151 subjects, specifically Caucasian females, from YouTube makeup tutorials. Images of the subjects before and after the application of makeup were captured. There are four shots per subject: two shots before the application of makeup and two shots after the application of makeup. For a few subjects, three shots each before and after the application of makeup were obtained. The makeup in these face images varies from subtle to heavy. The cosmetic alteration is mainly in the ocular area, where the eyes have been accentuated by diverse eye makeup products. Additional changes are on the quality of the skin due to the application of foundation and change in lip color. This dataset includes some variations in expression and pose. The illumination condition is reasonably constant over multiple shots of the same subject. In few cases, the hair style before and after makeup changes drastically.

VMU (Virtual Makeup) Dataset

The VMU dataset was assembled by synthetically adding makeup to 51 female Caucasian subjects in the FRGC dataset. We added makeup by using a publicly available tool from Taaz. Three virtual makeovers were created: (a) application of lipstick only; (b) application of eye makeup only; and (c) application of a full makeup consisting of lipstick, foundation, blush and eye makeup. Hence, the assembled dataset contains four images per subject: one before-makeup shot and three aftermakeup shots.

MIW (Makeup in the "wild") Dataset

The MIW dataset contains 125 subjects with 1-2 images per subject. Total number of images is 154 (77 with makeup and 77 without makeup). The images are obtained from the internet and the faces are unconstrained.

3D Mask Attack Database (3DMAD)

The 3D Mask Attack Database (3DMAD) is a biometric (face) spoofing database. It currently contains 76500 frames of 17 persons, recorded using Kinect for both real-access and spoofing attacks. Each frame consists of: (1) a depth image (640x480 pixels – 1x11 bits); (2) the corresponding RGB image (640x480 pixels – 3x8 bits); (3) manually annotated eye positions (with respect to the RGB image). The data is collected in 3 different sessions for all subjects and for each session 5 videos of 300 frames are captured. The recordings are done under controlled conditions, with frontal-view and neutral expression. The first two sessions are dedicated to the real access samples, in which subjects are recorded with a time delay of ~2 weeks between the acquisitions. In the third session, 3D mask attacks are captured by a single operator (attacker). If you use this database please cite this publication: N. Erdogmus and S. Marcel. "Spoofing in 2D Face Recognition with 3D Masks and Anti-spoofing with Kinect", in IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2013. Source code to reproduce experiments in the paper: https://pypi.python.org/pypi/maskattack.lbp

Senthilkumar Face Database (Version 1.0)

The Senthilkumar Face Database contains 80 grayscale face images of 5 people (all are men), including frontal views of faces with different facial expressions, occlusions and brightness conditions. Each person has 16 different images. The face portion of the image is manually cropped to 140x188 pixels and then it is normalized. Facial images are available in both grayscale and colour images.

McGill Real-world Face Video Database

This database contains 18000 video frames of 640x480 resolution from 60 video sequences, each of which recorded from a different subject (31 female and 29 male). Each video was collected in a different environment (indoor or outdoor) resulting arbitrary illumination conditions and background clutter. Furthermore, the subjects were completely free in their movements, leading to arbitrary face scales, arbitrary facial expressions, head pose (in yaw, pitch and roll), motion blur, and local or global occlusions.

SiblingsDB Database

The SiblingsDB contains two different datasets depicting images of individuals related by sibling relationships. The first, called HQfaces, contains a set of high quality images depicting 184 individuals (92 pairs of siblings). A subset of 79 pairs contains profile images as well, and 56 of them have also smiling frontal and profile pictures. All the images are annotated with, respectively, the position of 76 landmarks on frontal images and 12 landmarks on profile images. For each individual the information on sex, birth date, age (the highest and average age differences between siblings are 30 and 4.6 years, respectively) and votes of the panel of human raters (who were asked to evaluate if the couples depict siblings or not) are also available. The second DB, called LQfaces, contains contains 98 pairs of siblings (196 individuals) found over the Internet, where most of the subjects are celebrities. The position of the 76 frontal facial landmarks are provided as well, but this dataset does not include the age information and human expert ratings were not collected since this dataset is composed mainly of well-known personages and, hence, likely to produce biased ratings.

The Adience image set and benchmark of unfiltered faces for age, gender and subject classification

The dataset consists of 26,580 images, portraying 2,284 individuals, classified for 8 age groups, gender and including subject labels (identity). It is unique in its construction: The sources of the images included in this set are Flickr albums, assembled by automatic upload from iPhone5 or later smartphone devices, and released by their authors to the general public under the Creative Commons (CC) license. This constitutes the largest, fully unconstrained collection of images for age, gender and subject recognition.

FaceScrub - A Dataset With Over 100,000 Face Images of 530 People

Large face datasets are important for advancing face recognition research, but they are tedious to build, because a lot of work has to go into cleaning the huge amount of raw data. To facilitate this task, we developed an approach to building face datasets that detects faces in images returned from searches for public figures on the Internet, followed by automatically discarding those not belonging to each queried person. The FaceScrub dataset was created using this approach, followed by manually checking and cleaning the results. It comprises a total of 107,818 face images of 530 celebrities, with about 200 images per person. As such, it is one of the largest public face databases.

LFW3D and Adience3D sets

Frontalization is the process of synthesizing frontal facing views of faces appearing in single unconstrained photos. Recent reports have suggested that this process may substantially boost the performance of face recognition systems. This, by transforming the challenging problem of recognizing faces viewed from unconstrained viewpoints to the easier problem of recognizing faces in constrained, forward facing poses. Authors provide frontalized versions of both the widely used Labeled Faces in the Wild set (LFW) for face identity verification and the Adience collection for age and gender classification. These sets, (LFW3D and Adience3D) are made available along with our implementation of the method used for the frontalization.

Indian Movie Face database (IMFDB)

Indian Movie Face database (IMFDB) is a large unconstrained face database consisting of 34512 images of 100 Indian actors collected from more than 100 videos. All the images are manually selected and cropped from the video frames resulting in a high degree of variability interms of scale, pose, expression, illumination, age, resolution, occlusion, and makeup. IMFDB is the first face database that provides a detailed annotation of every image in terms of age, pose, gender, expression and type of occlusion that may help other face related applications.

Labeled Wikipedia Faces (LWF)

The goal of this project is to mine facial images and other important information for the Wikipedia Living People category. Currently, there are over 0.5 million biographic entries, and the number continues to grow. Unlike other data sets, such as the Labeled Faces in the Wild (LFW) and PubFig, the Labeled Wikipedia Faces (LWF) comes from Wikipedia, which is a creative common resource. In addition to these faces, useful meta data are released: the source images, image captions (if available), and person name detection results (through a named entity detector). So, mining experiments can also be performed. This is an unique property of this benchmark compared to others. The Labeled Wikipedia Faces (LWF) is a dataset with 8.5k faces for about 1.5k identities.

10k US Adult Faces Database

It is a database of 10,168 natural face photographs of all different individuals, and major celebrities removed. This database was made by randomly sampling Google Images for randomly generated names based on name distributions in the 1990 US Census. Because of this methodology, the distribution of the faces matches the demographic distribution of the US (e.g., age, race, gender). The database also has a wide range of faces in terms of attractiveness and emotion. Ovals surround each face to eliminate any background effects. Additionally, for a random set of 2,222 of the faces, we have demographic information, attribute scores (attractiveness, distinctiveness, perceived personality, etc), and memorability scores included with the images, to help researchers create their own stimulus sets.

Denver Intensity of Spontaneous Facial Action (DISFA) Database

Denver Intensity of Spontaneous Facial Action (DISFA) Database is a non-posed facial expression database for those who are interested in developing computer algorithms for automatic action unit detection and their intensities described by FACS. This database contains stereo videos of 27 adult subjects (12 females and 15 males) with different ethnicities. The images were acquired using PtGrey stereo imaging system at high resolution (1024×768). The intensity of AU’s (0-5 scale) for all video frames were manually scored by two human FACS experts. The database also includes 66 facial landmark points of each image in the database.

BU-3DFE Database (Static Data)

BU-3DFE (Binghamton University 3D Facial Expression) includes 100 subjects with 2,500 facial expression models. The BU-3DFE database is available to the research community (e.g., areas of interest come from as diverse as affective computing, computer vision, human computer interaction, security, biomedicine, law-enforcement, and psychology). The database contains 100 subjects (56% female, 44% male), ranging age from 18 years to 70 years old, with a variety of ethnic/racial ancestries, including White, Black, East-Asian, Middle-east Asian, Indian, and Hispanic Latino.

BU-4DFE Database (Dynamic Data)

To analyze the facial behavior from a static 3D space to a dynamic 3D space, BU-3DFE Database is extended and a new database is formed: BU-4DFE (3D + time): A 3D Dynamic Facial Expression Database. A newly created high-resolution 3D dynamic facial expression database are presented, which is made available to the scientific research community. The 3D facial expressions are captured at a video rate (25 frames per second). For each subject, there are six model sequences showing six prototypic facial expressions (anger, disgust, happiness, fear, sadness, and surprise), respectively. Each expression sequence contains about 100 frames. The database contains 606 3D facial expression sequences captured from 101 subjects, with a total of approximately 60,600 frame models. Each 3D model of a 3D video sequence has the resolution of approximately 35,000 vertices. The texture video has a resolution of about 1040×1329 pixels per frame. The resulting database consists of 58 female and 43 male subjects, with a variety of ethnic/racial ancestries, including Asian, Black, Hispanic/Latino, and White.

BP4D-Spontanous Database

Because posed and un-posed (aka “spontaneous”) 3D facial expressions differ along several dimensions including complexity and timing, well-annotated 3D video of un-posed facial behavior is needed. Therefore, newly developed 3D video database of spontaneous facial expressions in a diverse group of young adults is introduced - BP4D-Spontanous: Binghamton-Pittsburgh 3D Dynamic Spontaneous Facial Expression Database. Well-validated emotion inductions were used to elicit expressions of emotion and paralinguistic communication. Frame-level ground-truth for facial actions was obtained using the Facial Action Coding System. Facial features were tracked in both 2D and 3D domains using both person-specific and generic approaches. The work promotes the exploration of 3D spatiotemporal features in subtle facial expression, better understanding of the relation between pose and motion dynamics in facial action units, and deeper understanding of naturally occurring facial action. The database includes 41 participants (23 women, 18 men). They were 18-29 years of age; 11 were Asian, 6 were African-American, 4 were Hispanic, and 20 were Euro-American. An emotion elicitation protocol was designed to elicit emotions of participants effectively. Eight tasks were covered with an interview process and a series of activities to elicit eight emotions. The database is structured by participants. Each participant is associated with 8 tasks. For each task, there are both 3D and 2D videos. As well, the metadata include manually annotated action units (FACS AU), automatically tracked head pose and 2D/3D facial landmarks. The database is in the size of about 2.6 TB (without compression).

DMCSv1 Database - Multimodal Biometric Database of 3D Face and Hand Scans

The DMCSv1 database is a multimodal biometric database (DMCS stands for Department of Microelectronics and Computer Science and v1 indicates version number of the database). The database contains 3D face and hand scans. It was acquired using the structured light technology. According to our knowledge it is the first publicly available database where both sides of a hand were captured within one scan.

CAFE - The Child Affective Face Set

Although there is a large amount of research examining the perception of emotional facial expressions, almost all of this research has focused on the perception of adult facial expressions. There are several excellent stimulus sets of adult facial expressions that can be easily obtained and used in scientific research (i.e., NimStim, Ekman faces). However, there is no complete stimulus set of child affective facial expressions, and thus research on the perception of children making affective facial expression is sparse. In order to fully understand how humans respond to and process affective facial expressions, it is important to have this understanding across a variety of means. The Child Affective Facial Expressions Set (CAFE) is the first attempt to create a large and representative set of children making a variety of affective facial expressions that can be used for scientific research in this area. The set is made up of 1200 photographs of over 100 child models (ages 2-8) making 7 different facial expressions - happy, angry, sad, fearful, surprise, neutral, and disgust.

UFI - Unconstrained Facial Images

Unconstrained Facial Images (UFI) is a novel real-world database that contains images extracted from real photographs acquired by reporters of the Czech News Agency (ČTK). It is mainly intended to be used for benchmarking of the face identification methods, however it is possible to use this corpus in many related tasks (e.g. face detection, verification, etc.). Two different partitions of the database are available. The first one contains the cropped faces that were automatically extracted from the photographs using the Viola-Jones algorithm. The face size is thus almost uniform and the images contain just a small portion of background. The images in the second partition have more background, the face size also significantly differs and the faces are not localized. The purpose of this set is to evaluate and compare complete face recognition systems where the face detection and extraction is included. Each photograph is annotated with the name of a person.

Senthil IRTT Face Database Version 1.1

This database contains IRTT (Institute of Road and Transport Technology) students of both colour and gray scale facial images. There are 317 facial images for 13 IRTT students. They are of same age factor around 23 to 24 years. The images along with background are captured by canon digital camera of 14.1 megapixels resolution. The actual size of cropped faces 550x780 and they are further resized to downscale factor 5. Out of 13, 12 male and one female. Each subject have variety of face expressions, little makeup, scarf, poses and hat also.

Senthil IRTT Face Database Version1.2

The database version 1.2 contains IRTT students of both colour and gray scale faces. There are 100 facial images for 10 IRTT girl students (all are female) with 10 faces per subject with age factor around 23 to 24 years. The colour images along with background are captured with a pixel resolution of 480x640 and their faces are cropped to 100x100 pixels.

Senthil IRTT Video Face Database 1.0

This IRTT student video database contains one video in .mp4 format. Later more videos will be included in this database. The video duration is 55.938 seconds and contains 30 frames with resolution of 720x1280. This video is captured by smart phone. The faces and other features like eyes, lips and nose are extracted from this video separately.

VT-AAST Bench-marking Dataset

Virginia Tech - Arab Academy for Science & Technology (VT-AAST) Bench-marking Dataset is a color face image database for benchmarking of automatic face detection algorithms and human skin segmentation techniques. It is named the VT-AAST image database, and is divided into four parts. Part one is a set of 286 color photographs that include a total of 1027 faces in the original format given by our digital cameras, offering a wide range of difference in orientation, pose, environment, illumination, facial expression and race. Part two contains the same set in a different file format. The third part is a set of corresponding image files that contain human colored skin regions resulting from a manual segmentation procedure. The fourth part of the database has the same regions converted into grayscale. The database is available on-line for noncommercial use.

SEAS-FR-DB (School of Engineering & Applied Science - Face Video Database)

The database is designed for providing high-quality HD multi-subject banchmarked video inputs for face recognition algorithms. The database is a useful input for offline as well as online (Real-Time) Video scenarios. The database has primarily three classified subjects (3) and two not-classified (unknown/un-labeled) subjects available in 30 fps - High Definition Video (Full HD - 1080p) video developed at School of Engineering & Applied Science, Ahmedabad University in 2016.

IIIT - Cartoon Faces in the Wild

The IIIT-CFW is database for the cartoon faces in the wild. It is harvested from Google image search. Query words such as Obama + cartoon, Modi + cartoon, and so on were used to collect cartoon images of 100 public figures. The dataset contains 8928 annotated cartoon faces of famous personalities of the world with varying profession. Additionally, we also provide 1000 real faces of the public figure to study cross modal retrieval tasks, such as, Photo2Cartoon retrieval. The IIIT-CFW can be used for the study spectrum of problems, such as, face synthesis, heterogeneous face recognition, cross modal retrieval, etc. (Please use this database only for the academic research purpose)

Facial Expression Research Group Database (FERG-DB)

Facial Expression Research Group Database (FERG-DB) is a database of stylized characters with annotated facial expressions. The database contains multiple face images of six stylized characters. The characters were modelled using the MAYA software and rendered out in 2D to create the images. The database contains facial expression images of six stylized characters. The images for each character is grouped into seven types of expressions - anger, disgust, fear, joy, neutral, sadness and surprise.

Large Age-Gap Database (LAG)

Large Age-Gap (LAG) dataset is a dataset containing variations of age in the wild, with images ranging from child/young to adult/old. The dataset contains 3,828 images of 1,010 celebrities. For each identity at least one child/young image and one adult/old image are present.

Specs on Faces (SoF) Dataset

The SoF dataset is a collection of 42,592 (2,662×16) images for 112 persons (66 males and 46 females) who wear glasses under different illumination conditions. The dataset is FREE for reasonable academic fair use. The dataset presents a new challenge regarding face detection and recognition. It is devoted to two problems that affect face detection, recognition, and classification, which are harsh illumination environments and face occlusions. The glasses are the common natural occlusion in all images of the dataset. However, the glasses are not the sole facial occlusion in the dataset; there are two synthetic occlusions (nose and mouth) added to each image. Moreover, three image filters, that may evade face detectors and facial recognition systems, were applied to each image. All generated images are categorized into three levels of difficulty (easy, medium, and hard). That enlarges the number of images to be 42,592 images (26,112 male images and 16,480 female images). Furthermore, the dataset comes with a metadata that describes each subject from different aspects. The original images (without filters or synthetic occlusions) were captured in different countries over a long period. // Usage: 1 - Gender classification; 2 - Face detection; 3 - Facial landmark estimation; 4 - Emotion Recognition; 5 - Eyeglasses detection; 6 - Age classification.

CyberExtruder Ultimate Face Matching Data Set

The CyberExtruder Ultimate Face Matching Data Set contains 10,205 images of 1000 people scraped from the internet. The data set is unrestricted, as such, it contains large pose, lighting, expression, race and age variation. It also contains images which are artistic impressions (drawings, paintings etc.) and a multitude of occlusion (hats, glasses, makeup). All images have size 600 x 600 pixels and are stored with jpeg compression.

The IST-EURECOM Light Field Face Database

The IST-EURECOM Light Field Face Database includes data from 100 subjects, captured by a Lytro ILLUM camera in two 1-6 months separated sessions, with 20 samples per each person per session. To simulate multiple scenarios, the images are captured with several facial variations, covering a range of emotions, actions, poses, illuminations, and occlusions. The database includes the raw light field images, 2D rendered images and associated depth maps, along with a rich set of metadata. The first part of the database, captured at Instituto de Telecomunicações - Instituto Superior Técnico, Lisbon, Portugal can be accessed at http://www.img.lx.it.pt/LFFD/. The second part, captured at EURECOM, SophiaTech Campus, Nice, France can be accessed at http://lffd.eurecom.fr/

The Makeup Induced Face Spoofing (MIFS) dataset

The Makeup Induced Face Spoofing (MIFS) dataset consists of 107 makeup-transformations taken from random YouTube makeup video tutorials. Each subject is attempting to spoof a target identity. Hence this dataset consists of three sets of face images: images of a subject before makeup; images of the same subject after makeup with the intention of spoofing; and images of the target subject who is being spoofed.

RAVDESS: Ryerson Audio-Visual Database of Emotional Speech and Song

The RAVDESS is a validated multimodal database of emotional speech and song. The database is gender balanced consisting of 24 professional actors, vocalizing lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity, with an additional neutral expression. All conditions are available in face-and-voice, face-only, and voice-only formats. The set of 7356 recordings were rated by 319 adult participants. High levels of emotional validity and test-retest intrarater reliability were reported, as described in our PLoS One paper. All recordings are made freely available under a Creative Commons license, non-commerical license.

Disguised Faces in the Wild

Face recognition research community has prepared several large-scale datasets captured in uncontrolled scenarios for performing face recognition. However, none of these focus on the specific challenge of face recognition under the disguise covariate. The Disguised Faces in the Wild (DFW) dataset has been prepared in order to address these limitations. The proposed DFW dataset consists of 11,157 images of 1,000 subjects. The dataset contains a broad set of unconstrained disguised faces, taken from the Internet. The dataset encompasses several disguise variations with respect to hairstyles, beard, mustache, glasses, make-up, caps, hats, turbans, veils, masquerades and ball masks. This is coupled with other variations with respect to pose, lighting, expression, background, ethnicity, age, gender, clothing, hairstyles, and camera quality, thereby making the dataset challenging for the task of face recognition. The paper describing the database and the protocols is available here.

BAUM-1: Bahcesehir University Multimodal Face Database of Spontaneous Affective and Mental States

In affective computing applications, access to labeled spontaneous affective data is essential for testing the designed algorithms under naturalistic and challenging conditions. Most databases available today are acted or do not contain audio data. BAUM-1 is a spontaneous audio-visual affective face database of affective and mental states. The video clips in the database are obtained by recording the subjects from the frontal view using a stereo camera and from the half-profile view using a mono camera. The subjects are first shown a sequence of images and short video clips, which are not only meticulously fashioned but also timed to evoke a set of emotions and mental states. Then, they express their ideas and feelings about the images and video clips they have watched in an unscripted and unguided way in Turkish. The target emotions, include the six basic ones (happiness, anger, sadness, disgust, fear, surprise) as well as boredom and contempt. We also target several mental states, which are unsure (including confused, undecided), thinking, concentrating, and bothered. Baseline experimental results on the BAUM-1 database show that recognition of affective and mental states under naturalistic conditions is quite challenging. The database is expected to enable further research on audio-visual affect and mental state recognition under close-to-real scenarios.

NMAPS - NMAmit Photo Sketch database

NMAPS is a database of human face images and their corresponding sketches generated using a novel approach implemented using Matlab tool. Database contains 208 South Indian images gathered at the computer science research lab of NMAM Institute of Technology, Nitte. Images were taken under the random lighting conditions and environment with varying background and quality. Images captured under the varying conditions and quality mimic the real-world conditions and enables the researches to try out robust algorithms testing in the area of sketch generation and matching. This database is an unique contribution in the field of forensic science research as it contains the photo-sketch data-sets of South Indian people.

EURECOM Visible and Thermal paired Face database

The database was collected from 50 subjects of different age, sex and ethnicity, resulting a total of 2100 images. The camera (camera FLIR Duo R by FLIR Systems) was designed to take simultaneous face images of thermal and visible spectra. Variations include Expression, Pose, Occlusion and Illumination.

VIP_attribute Dataset

Images in the VIP_attribute dataset are obtained in 2017 from the WWW corresponding to 513 female and 513 male subjects (mainly actors, singers and athletes). The images include the frontal pose of the subjects. Co-variates include illumination, expression, image quality and resolution. Further challenging in this dataset are beautification (e.g., photoshop) of the images, as well as the presence of makeup, plastic surgery, beard and mustache. We obtained annotations related to te subjects' body weight and height from websites such as www.celebheights.com, www.howtallis.org and celebsize.com.

Indian Semi Acted Facial Expression Database (iSAFE)

iSAFE is one of it's kind database for human emotions. Human emotion recognition is of par importance for human computer interaction. Dataset and it's quality plays important role in this domain. The dataset contains 395 clips of 44 volunteers between 17 to 22 year of age. All the clips are manually splitted from the video recorded during stimulent clips are watched by volunteers. Facial expressions are self annotated by the volunteers as well as cross annotated by annotators. Analysis of the dataset is done using Resnet34 neural network and baseline for the dataset is provided for research and comparison. The dataset is described in this paper.

Grammatical Facial Expressions Data Set

The automated analysis of facial expressions has been widely used in different research areas, such as biometrics or emotional analysis. Special importance is attached to facial expressions in the area of sign language, since they help to form the grammatical structure of the language and allow for the creation of language disambiguation, and thus are called Grammatical Facial Expressions. This dataset was already used in the experiments described in Freitas et al. (2014). The dataset is composed by eighteen videos recorded using Microsoft Kinect sensor. In each video, a user performs (five times), in front of the sensor, five sentences in Libras (Brazilian Sign Language) that require the use of a grammatical facial expression. By using Microsoft Kinect, we have obtained: (a) a image of each frame, identified by a timestamp; (b) a text file containing one hundred coordinates (x, y, z) of points from eyes, nose, eyebrows, face contour and iris; each line in the file corresponds to points extracted from one frame. The images enabled a manual labeling of each file by a specialist, providing a ground truth for classification. The dataset is organized in 36 files: 18 datapoint files and 18 target files, one pair for each video which compose the dataset.The name of the file refers to each video: the letter corresponding to the user (A and B), name of grammatical facial expression and a specification (target or datapoints).

Sejong Face Database: A Multi-modal disguised face database (SFD)

The database contains images in visible, infrared, visible-plus-infrared, and thermal modalities. A total of 100 subjects, 60 male and 40 female, with various facial disguise add-ons. The database contains images with natural face, real beard, cap, scarf, glasses, mask, makeup, wig, fake beard, fake mustache, and their variations. Each facial addon is captured with multiple variations in pose. The database contains a total of 24,500 facial images. For further information about Sejong Face Database see here.

Indian Institute of Science Indian Face Dataset (IISCIFD)

Faces form the basis for a rich variety of judgments in humans, yet the underlying features remain poorly understood. Fine-grained distinctions within a race might more strongly constrain the possible features but are relatively less studied. Fine-grained race classification is also interesting because even humans may not be perfectly accurate on these tasks. This offers a unique opportunity to compare errors made by humans and machines, in contrast to standard object detection tasks where human performance is nearly perfect. We have benchmarked North Vs South categorization on close to 130 humans and machines performance on a novel database of close to 1650 diverse Indian faces labeled for fine-grained race (South vs North India) as well as for age, weight, height, and gender. We have explored a diverse set of features including part shape and configuration information, non-CNN features such as HOG and LBP as well as various relevant CNNs. The paper describing the database and the protocols is available here.