The aim of this study is to validate a set of deep learning algorithms that detect various abnormalities on chest X-rays, with the consensus of three independent radiologist reads as the gold standard.
A dataset of about 1.2 million X-rays was retrospectively collected from various centres in India. 75,000 chest X-rays (QXR-75k) were set aside for testing and the rest were used for development.
For this validation study, a dataset of 2,000 chest X-rays (QXR-2k) was collected from centres that did not contribute to our training/testing dataset, in two batches, B1 and B2.
B1 was randomly sampled from X-rays collected in a specific time period; B2 was enriched with X-rays containing various abnormalities.
For each scan, a Yes/No ground truth for each abnormality was established as the majority decision of three radiologist reads (two independent reads carried out as part of the study and one from the original clinical radiology report). Areas under the receiver operating characteristic curve (AUCs) were used to evaluate the individual abnormality detection algorithms.
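The ground-truthing and evaluation procedure above can be sketched in a few lines. This is an illustrative toy example, not the study's code: the data are made up, and the AUC is computed with the rank-based (Mann-Whitney) formulation rather than any particular library.

```python
# Sketch of majority-vote ground truth and AUC evaluation.
# All names and data here are illustrative, not from the study.

def majority_vote(reads):
    """Ground truth = majority decision of three Yes/No (1/0) reads."""
    return sum(reads) >= 2  # at least 2 of 3 reads say "Yes"

def auc(scores, labels):
    """AUC via the rank (Mann-Whitney) formulation: the probability that a
    randomly chosen positive scores higher than a randomly chosen negative,
    counting ties as half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: three reads per scan, one model score per scan.
reads = [(1, 1, 0), (0, 0, 1), (1, 1, 1), (0, 0, 0)]
labels = [majority_vote(r) for r in reads]   # [True, False, True, False]
scores = [0.9, 0.2, 0.8, 0.4]
print(auc(scores, labels))                   # 1.0: every positive outranks every negative
```

In practice this is computed per abnormality, so each algorithm gets its own AUC against its own majority-vote labels.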
| Abnormality | AUC |
| --- | --- |
| Blunted Costophrenic Angle | 0.9372 |
This study demonstrates that deep learning algorithms can detect a large number of significant abnormalities on chest X-rays with high AUCs.
High AUCs on external, independent validation show that deep learning algorithms could be used either in a screening/triage setting or as a second read for quality control.
To assess the sensitivity and specificity of Artificial Intelligence (AI) for detecting and classifying pulmonary, pleural and cardiac abnormalities on frontal chest radiographs.
We processed 374 de-identified frontal chest radiographs of adult patients with an AI algorithm (Qure AI) trained on 1,150,084 X-rays and tested separately on 73,908 X-rays. Separate scores (from 0 to 1) and prediction statistics (0 for scores of 0 to 0.5 and 1 for scores of 0.6 to 1) from the AI were generated and recorded for the presence of pulmonary opacities (O), pleural effusions (EF), hilar prominence (HP) and cardiomegaly (C). The AI generated duplicate image sets with annotated abnormalities (heat maps). To establish the standard of reference (SOR), two thoracic radiologists assessed all radiographs for these abnormalities; all disagreements were resolved in consensus with another radiologist. Scores of 0 and 1 were given for the absence or presence of each finding. Two other radiologists, unaware of the SOR and AI findings, independently assessed the presence of abnormalities on each chest radiograph (test radiologists). Descriptive statistics and ROC analysis were performed to determine the accuracies of the AI and the test radiologists.
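The scoring scheme above reduces to binarizing each continuous AI score and comparing the result against the SOR. The sketch below assumes a 0.5 cut-off (the abstract maps 0-0.5 to 0 and 0.6-1 to 1, leaving the exact boundary unstated) and uses invented toy data.

```python
# Illustrative sketch: turn continuous AI scores into binary predictions
# and compute sensitivity/specificity against the standard of reference.
# The 0.5 threshold and all data below are assumptions for illustration.

def binarize(score, threshold=0.5):
    """Map a 0-1 AI score to a 0/1 prediction."""
    return 1 if score > threshold else 0

def sens_spec(preds, truth):
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP)."""
    tp = sum(p == 1 and t == 1 for p, t in zip(preds, truth))
    tn = sum(p == 0 and t == 0 for p, t in zip(preds, truth))
    fp = sum(p == 1 and t == 0 for p, t in zip(preds, truth))
    fn = sum(p == 0 and t == 1 for p, t in zip(preds, truth))
    return tp / (tp + fn), tn / (tn + fp)

scores = [0.91, 0.12, 0.78, 0.40, 0.66]
truth  = [1, 0, 1, 1, 0]                  # toy SOR labels
preds  = [binarize(s) for s in scores]    # [1, 0, 1, 0, 1]
sens, spec = sens_spec(preds, truth)      # sensitivity 2/3, specificity 1/2
```

The same computation is repeated per abnormality (O, EF, HP, C), once for the AI and once for each test radiologist.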
According to the SOR, about 29% (109/374) of radiographs had no findings; single and multiple abnormalities were seen in 28% (105/374) and 71% (265/374) of radiographs, respectively. True positives were 77 (C), 82 (EF), 211 (O) and 93 (HP). There was no statistically significant difference between the AI and the SOR for any abnormality (p = 0.2-0.8).
| Abnormality | AI vs SOR | Radiologists vs SOR | AI vs Radiologists |
| --- | --- | --- | --- |
The Qure AI algorithm is accurate for the detection of abnormalities on chest radiographs and helps radiologists detect these abnormalities. Overlying implanted devices lead to false-positive interpretations by the AI.
With these caveats, the AI algorithm can help radiologists interpret multiple abnormalities on chest X-rays.
To determine whether deep learning algorithms can detect abnormalities on chest X-rays (CXR) before they are visible to radiologists.
We trained deep learning models to identify abnormal X-rays and CXR opacities using a set of 1,150,084 chest X-rays.
We used a retrospectively obtained, independent set of de-identified chest X-rays from patients who had undergone a chest CT scan within 1 day (TS-1, n=187), 3 days (TS-3, n=197) or 10 days (TS-10, n=230) of the X-ray to evaluate the algorithms' ability to detect abnormalities that were not visible to the radiologist at the time of reporting on the X-ray. Natural language processing (NLP) algorithms were used to establish ground truth from the radiologist reports of the CT scans on two parameters: 'any abnormality' and 'hyperdense abnormality (HA)', defined as any abnormal focal or diffuse hyperdense abnormality in the lung fields, including but not limited to nodule, mass, fibrosis and calcification.
The CT scans were used as ground truth to evaluate the accuracy of the original CXR report and the deep learning algorithms.
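The study's NLP algorithms are not described in detail, but the idea of extracting the two ground-truth parameters from free-text CT reports can be sketched with a simple rule-based labeler. Everything below (keyword list, negation pattern, helper name) is a hypothetical illustration; real report NLP needs far more robust negation and uncertainty handling.

```python
# Minimal rule-based sketch of labelling CT reports for
# 'any abnormality' and 'hyperdense abnormality (HA)'.
# Keyword lists and patterns are illustrative assumptions.
import re

# Findings the abstract names as hyperdense abnormalities.
HYPERDENSE_TERMS = ["nodule", "mass", "fibrosis", "calcification", "calcified"]

def label_report(report):
    """Return (any_abnormality, hyperdense_abnormality) for a CT report."""
    text = report.lower()
    # Crude negation check: reports stating "no (significant) abnormality".
    normal = bool(re.search(r"\bno (significant )?abnormalit", text))
    any_abnormality = not normal
    hyperdense = any(term in text for term in HYPERDENSE_TERMS)
    return any_abnormality, hyperdense

print(label_report("Calcified nodule in the right upper lobe."))  # (True, True)
print(label_report("No significant abnormality detected."))       # (False, False)
```

Labels extracted this way from the CT reports then serve as the reference against which both the original CXR report and the algorithm are scored.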
Of the 187 CT scans in TS-1, 153 contained an HA. Of these, 52 (34%) had been picked up on the original CXR by the reporting radiologist, and 63 (41%) were picked up by the deep learning algorithm. Of the 180 abnormal scans in TS-1, 106 (59%) had been flagged as abnormal on the original CXR by the reporting radiologist, and 120 (67%) were picked up by the deep learning algorithm. Similar results were observed on TS-3 and TS-10.
| Comparison | Sensitivity |
| --- | --- |
| Hyperdense Abnormality vs AI | 0.49 |
| Hyperdense Abnormality vs Radiologist | 0.44 |
| All Abnormalities vs AI | 0.67 |
| All Abnormalities vs Radiologist | 0.59 |
Deep learning algorithms can pick up abnormalities that have been missed on chest X-rays but identified on a subsequent chest CT.
Using deep learning algorithms to screen chest X-rays could achieve higher sensitivity in identifying abnormal scans than is currently possible, with only a small corresponding increase in the number of false positives.
Collaboration with clinicians helps us carry out further research on chest X-rays. Most of our research is conducted in partnership with radiologists across the globe.
If you are interested, please reach out to us at email@example.com