Dataset

We collected a structured ophthalmic database of 5,000 patients (ODIR-5K for short) that contains each patient's age, color fundus photographs of the left and right eyes, and doctors' diagnostic keywords. This dataset is a "real-life" set of patient information collected by Shanggong Medical Technology Co., Ltd. from different hospitals and medical centers in China. In these institutions, fundus images are captured by various commercially available cameras, such as Canon, Zeiss, and Kowa, resulting in varied image resolutions. Patient-identifying information is removed. Annotations are produced by trained human readers under quality-control management. The readers classify each patient into eight labels: normal (N), diabetes (D), glaucoma (G), cataract (C), AMD (A), hypertension (H), myopia (M), and other diseases/abnormalities (O), based on the images of both eyes and the patient's age. Publication of this dataset follows the ethical and privacy rules of China. Table 1 shows one record from the ODIR-5K dataset. Note: in the testing round, the diagnostic keywords will not be provided.

Table 1. The first structured ophthalmic record in the ODIR-5K database.
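A common way to represent these multi-label diagnoses for model training is a multi-hot vector with one 0/1 entry per label. The sketch below assumes each patient's diagnosis is available as a collection of the label letters N, D, G, C, A, H, M, O, and orders the vector in the sequence in which the labels are listed above. The helpers `to_multi_hot` and `from_multi_hot` are hypothetical illustrations, not part of any official ODIR-5K tooling.

```python
from typing import Iterable, List

# Label letters in the order they are listed in the dataset description.
# Using this order for the vector is an assumption made for illustration.
LABELS = ["N", "D", "G", "C", "A", "H", "M", "O"]


def to_multi_hot(patient_labels: Iterable[str]) -> List[int]:
    """Encode one patient's label letters as an 8-element 0/1 vector."""
    present = set(patient_labels)
    unknown = present - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if label in present else 0 for label in LABELS]


def from_multi_hot(vector: List[int]) -> List[str]:
    """Decode an 8-element 0/1 vector back into label letters."""
    if len(vector) != len(LABELS):
        raise ValueError(f"expected a vector of length {len(LABELS)}")
    return [label for label, flag in zip(LABELS, vector) if flag]
```

For example, `to_multi_hot({"D", "H"})` returns `[0, 1, 0, 0, 0, 1, 0, 0]`, and a normal patient encodes as a single 1 in the first position.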

The 5,000 patients in this challenge are divided into training, off-site testing, and on-site testing subsets. Nearly 4,000 cases are used in the training stage, while the remaining cases are used in the testing stages (off-site and on-site). Table 2 shows the distribution of cases across the eight labels in each stage. Note: one patient may have one or more labels.

Table 2. Proportion of images per category in the training and testing datasets.
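Because one patient may carry several labels, per-label counts such as those in Table 2 can add up to more than the number of patients in a subset. A minimal sketch of such a tally, assuming each patient's labels are given as an iterable of the letters N, D, G, C, A, H, M, O (the helper name `label_distribution` and the toy input are hypothetical, not taken from the dataset):

```python
from collections import Counter
from typing import Dict, Iterable

# Label letters in the order they are listed in the dataset description.
LABELS = ["N", "D", "G", "C", "A", "H", "M", "O"]


def label_distribution(patients: Iterable[Iterable[str]]) -> Dict[str, int]:
    """Count how many patients in one subset carry each label."""
    counts: Counter = Counter()
    for labels in patients:
        # set() ensures a patient is counted at most once per label,
        # but a patient with several labels contributes to several counts.
        counts.update(set(labels))
    return {label: counts.get(label, 0) for label in LABELS}
```

For example, `label_distribution([{"N"}, {"D", "H"}, {"D"}])` counts N once, D twice, and H once: four label assignments for three patients.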

