• databases@iab-rubric.org
  • IIT Jodhpur
CatScreen: A Large MultiModal Benchmark Dataset for Cataract Screening

Overview

CatScreen is a large multimodal benchmark dataset for cataract screening, designed to support the development of robust, clinically relevant, and accessible AI systems for slit-lamp-based eye screening. The dataset comprises *18,640 slit-lamp images from 2,251 subjects*, acquired using a *Remidio Portable Slit Lamp PSL-D20* device. It is structured to reflect the variability in image quality, illumination, and acquisition conditions encountered in real-world screening. The dataset is intended to support research in cataract detection, grading, robust learning, multimodal modeling, and clinically grounded interpretability.

 

Dataset Structure

CatScreen is organized into three complementary subsets:

  • Clean Set: Contains 9,915 images from 1,271 participants and serves as the primary benchmark split for supervised learning. This subset includes structured annotations for clinically relevant attributes such as image quality, illumination type, diagnosis, cataract subtype, and severity grading.
  • Noisy Set: Contains 3,267 images from 505 participants and is designed to reflect realistic label noise arising in practical screening settings.
  • Unlabelled Set: Contains 5,481 images and is intended to support self-supervised, semi-supervised, and active learning research.

 

What the Dataset Contains

CatScreen goes beyond conventional slit-lamp image collections by combining image data with subject-level metadata and structured annotations. Each sample may include slit-lamp imagery together with metadata describing demographic factors, health conditions, and ocular history. In addition, a subset of the dataset includes annotations of anatomical and pathological regions.

 

Benchmark Tasks Supported

CatScreen is designed as a benchmark for multiple clinically relevant tasks in cataract screening, including:

  • Image Quality Assessment: classifying slit-lamp images into quality categories such as good, acceptable, or poor.
  • Illumination Type Classification: identifying the illumination setup, including diffuse, direct focal, and retro-illumination.
  • Diagnosis: classifying images into normal, cataract, or other ocular conditions.
  • Cataract Subtype Identification: distinguishing among categories such as nuclear, cortical, posterior subcapsular, pseudophakia, and related cases.
  • Severity Grading: grading cataract severity into clinically meaningful categories.
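
To make the task list concrete, the following Python sketch shows one way the five label types could be represented as a typed record. It is illustrative only: the class names, field names, and the `parse_label` helper are hypothetical and do not describe the released annotation format. The category values are limited to those named in the list above, which may not be exhaustive, and the severity scale is left as free text because it is not specified here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Quality(Enum):
    GOOD = "good"
    ACCEPTABLE = "acceptable"
    POOR = "poor"


class Illumination(Enum):
    DIFFUSE = "diffuse"
    DIRECT_FOCAL = "direct focal"
    RETRO = "retro-illumination"


class Diagnosis(Enum):
    NORMAL = "normal"
    CATARACT = "cataract"
    OTHER = "other"


class Subtype(Enum):
    # Only the subtypes named in the task list; the real label set may be larger.
    NUCLEAR = "nuclear"
    CORTICAL = "cortical"
    POSTERIOR_SUBCAPSULAR = "posterior subcapsular"
    PSEUDOPHAKIA = "pseudophakia"


@dataclass
class SlitLampLabel:
    """Hypothetical per-image label record; field names are assumptions."""
    image_id: str
    quality: Quality
    illumination: Illumination
    diagnosis: Diagnosis
    subtype: Optional[Subtype] = None  # absent when no subtype applies
    severity: Optional[str] = None     # grading scale not specified here


def parse_label(row: dict) -> SlitLampLabel:
    """Convert a dict of raw strings into a typed record.

    Raises ValueError if a category string is not one of the known values.
    """
    def norm(value: str) -> str:
        return value.strip().lower()

    raw_subtype = row.get("subtype")
    return SlitLampLabel(
        image_id=row["image_id"],
        quality=Quality(norm(row["quality"])),
        illumination=Illumination(norm(row["illumination"])),
        diagnosis=Diagnosis(norm(row["diagnosis"])),
        subtype=Subtype(norm(raw_subtype)) if raw_subtype else None,
        severity=row.get("severity") or None,
    )
```

Using enums rather than bare strings means a misspelled or unexpected category fails loudly at parse time instead of silently creating a new class.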

 

Sample Dataset

A sample snapshot of the dataset is currently available. In the current repository structure, the sample release is organized into three parts: *Clean*, *Noisy*, and *Unlabelled*. The Clean part includes image data and split files for *train/validation/test*, the Noisy part includes image data with noisy labels, and the Unlabelled part includes unlabelled slit-lamp images.
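
As a quick sanity check after downloading the sample, a short Python sketch like the one below could count the image files found under each part. It rests on assumptions not confirmed here: that the three parts are top-level folders named `Clean`, `Noisy`, and `Unlabelled`, and that images use `.jpg`, `.jpeg`, or `.png` extensions. The function name `count_images` is hypothetical; adjust the constants to match the actual layout.

```python
from pathlib import Path

# Assumed image extensions and folder names; change to match the real release.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}
PARTS = ("Clean", "Noisy", "Unlabelled")


def count_images(root):
    """Return {part: number of image files found recursively under root/part}.

    A part whose folder is missing is reported with a count of 0.
    """
    root = Path(root)
    counts = {}
    for part in PARTS:
        part_dir = root / part
        if not part_dir.is_dir():
            counts[part] = 0
            continue
        counts[part] = sum(
            1
            for path in part_dir.rglob("*")
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

Calling `count_images("path/to/sample")` returns a dictionary keyed by part name, which can be compared against the counts expected for the sample release.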

 

Availability

The full CatScreen dataset will be released soon.

 

Sample Dataset: Dataset Link
Full Dataset Download: Coming Soon...