Existing face recognition algorithms achieve high recognition performance on frontal face images with good illumination and close proximity to the imaging device. However, most of them fail to perform equally well in surveillance scenarios, where videos are captured across varying resolutions and spectra. In surveillance settings, cameras are usually placed far away from the subjects, resulting in variations across pose, illumination, occlusion, and resolution. Current video datasets used for face recognition are often captured in constrained environments and thus fail to simulate real-world scenarios.
To this end, we present the proposed database featuring 252 subjects in 460 videos. The dataset contains over 142K face images, spread across videos captured in both the visible and near-infrared spectra. Each video contains a group of individuals walking from 36 ft towards the imaging device, offering a plethora of challenges common to surveillance settings. A benchmark experimental protocol and baseline results are reported with state-of-the-art algorithms for face detection and recognition. We believe the availability of such a challenging database will help the research community develop more robust face recognition systems for real-world surveillance scenarios.
In this crowd scenario, where multiple subjects appear in the same video and a subject may appear in more than one video, dividing the data into training and testing sets is tricky. To ensure mutual exclusivity between the two sets, the entire dataset is modelled as a graph with various connected components, where a connected component corresponds to a group of people who appeared in a video together, directly or indirectly. Each connected component then goes entirely into either the training set or the testing set, thereby ensuring that no subject occurs in both. The visualized data, with 43 connected components, can be seen below; a sketch of the split procedure follows.
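The following is a minimal sketch of that split, not the exact procedure used for the release: it builds the co-appearance graph with networkx and greedily assigns whole components until the test set reaches the published size of 52 subjects. The input format (a dict mapping video names to subject-ID lists) and the exclusion of the open-set ID 0 are assumptions.

    import networkx as nx

    def split_by_components(video_subjects, test_target=52):
        """video_subjects: dict mapping video name -> list of subject IDs."""
        g = nx.Graph()
        for subjects in video_subjects.values():
            # Subject ID 0 marks open-set subjects; assumed excluded here.
            subjects = [s for s in subjects if s != "0"]
            g.add_nodes_from(subjects)
            # Connect every pair of subjects appearing in the same video.
            for i, a in enumerate(subjects):
                for b in subjects[i + 1:]:
                    g.add_edge(a, b)
        train, test = set(), set()
        for component in nx.connected_components(g):
            # Fill the test set up to the target; everything else trains.
            if len(test) < test_target:
                test |= component
            else:
                train |= component
        return train, test

Because each component is assigned as a whole, no subject can leak from training into testing through a shared video.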
The training set contains 200 subjects, while the testing set contains 52. All videos are named in the following format: 'Time_LocationID_VideoID_SubjectID1...SubjectIDn'. Here, Time refers to the time of day the video was captured and takes one of two values, N or D. LocationID corresponds to the location at which the video was captured and takes one of four values: S1, S2, S3 or S4, where S1 and S2 refer to sessions recorded at the day-time locations, and S3 and S4 to sessions recorded at the night-time locations. VideoID is a unique ID given to each video of a location, and SubjectID is a unique ID given to each subject. For example, consider the video name N_S4_V_28_67_0: N indicates a night-time video, S4 denotes that the video was captured at the fourth location, V_28 is the video's unique ID, and the remaining number(s) are the subject IDs present in the video. Subject ID 0 corresponds to subjects belonging to the open set. This nomenclature ensures that every video receives a unique and informative name. The high-resolution still images are named SubjectID 1, SubjectID 2 and SubjectID 3 for each subject.
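As an illustration, a hypothetical helper for parsing this nomenclature might look as follows. Splitting the VideoID across two underscore-separated tokens assumes it is written as "V_28" in the file name, as in the example above; adjust if the release differs.

    def parse_video_name(name):
        tokens = name.split("_")
        time_of_day = tokens[0]           # 'N' (night) or 'D' (day)
        location_id = tokens[1]           # 'S1' .. 'S4'
        video_id = "_".join(tokens[2:4])  # e.g. 'V_28'
        subject_ids = tokens[4:]          # '0' marks open-set subjects
        return time_of_day, location_id, video_id, subject_ids

    print(parse_video_name("N_S4_V_28_67_0"))
    # -> ('N', 'S4', 'V_28', ['67', '0'])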
The dataset also includes annotated frames, with a bounding box for every face in each frame (about 1.42 lakh, i.e., 142K faces in total), following the nomenclature described above. Along with the loosely cropped face images, each subject's three high-resolution still images are also part of the release. A small set of non-overlapping videos acquired under the same setup is also provided as a training set for learning-based experiments.
The database can be downloaded from the following link.
FaceSurv Database (4.71 GB) (CRC-32: D8285DB0, MD5: 336CBD48A21CD3A0497397907EFB4ADD)
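After downloading, the archive can be checked against the MD5 above with a short script like the one below. The file name 'FaceSurv.zip' is a placeholder; use whatever name the link serves.

    import hashlib

    def md5_of(path, chunk_size=1 << 20):
        # Hash the file in 1 MB chunks so large archives fit in memory.
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest().upper()

    assert md5_of("FaceSurv.zip") == "336CBD48A21CD3A0497397907EFB4ADD"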