Lund University Publications

LUND UNIVERSITY LIBRARIES

Breast cancer detection accuracy of AI in an entire screening population : a retrospective, multicentre study

Elhakim, Mohammad Talal ; Stougaard, Sarah Wordenskjold ; Graumann, Ole ; Nielsen, Mads ; Lång, Kristina LU ; Gerke, Oke ; Larsen, Lisbet Brønsro and Rasmussen, Benjamin Schnack Brandt (2023) In Cancer Imaging 23(1).
Abstract

Background: Artificial intelligence (AI) systems are proposed as a replacement of the first reader in double reading within mammography screening. We aimed to assess the cancer detection accuracy of an AI system in a Danish screening population.

Methods: We retrieved a consecutive screening cohort from the Region of Southern Denmark including all participating women between August 4, 2014, and August 15, 2018. Screening mammograms were processed by a commercial AI system, and detection accuracy was evaluated in two scenarios: Standalone AI, with first reader as comparator, and AI-integrated screening replacing first reader, with double reading with arbitration (combined reading) as comparator. Two AI-score cut-off points were applied by matching at mean first reader sensitivity (AIsens) and specificity (AIspec). The reference standard was histopathology-proven breast cancer or cancer-free follow-up within 24 months. Coprimary endpoints were sensitivity and specificity, and secondary endpoints were positive predictive value (PPV), negative predictive value (NPV), recall rate, and arbitration rate. Accuracy estimates were compared using McNemar’s test or the exact binomial test.

Results: Out of 272,008 screening mammograms from 158,732 women, 257,671 (94.7%) with adequate image data were included in the final analyses. Sensitivity and specificity were 63.7% (95% CI 61.6-65.8%) and 97.8% (97.7-97.8%) for first reader, and 73.9% (72.0-75.8%) and 97.9% (97.9-98.0%) for combined reading, respectively. Standalone AIsens showed a lower specificity (-1.3%) and PPV (-6.1%), and a higher recall rate (+1.3%) compared to first reader (p < 0.0001 for all), while Standalone AIspec had a lower sensitivity (-5.1%; p < 0.0001), PPV (-1.3%; p = 0.01) and NPV (-0.04%; p = 0.0002). Compared to combined reading, Integrated AIsens achieved higher sensitivity (+2.3%; p = 0.0004), but lower specificity (-0.6%) and PPV (-3.9%) as well as higher recall rate (+0.6%) and arbitration rate (+2.2%; p < 0.0001 for all). Integrated AIspec showed no significant difference in any outcome measure apart from a slightly higher arbitration rate (p < 0.0001). Subgroup analyses showed higher detection of interval cancers by Standalone AI and Integrated AI at both thresholds (p < 0.0001 for all), with a varying composition of detected cancers across multiple subgroups of tumour characteristics.

Conclusions: Replacing the first reader in double reading with an AI system could be feasible, but choosing an appropriate AI threshold is crucial to maintaining cancer detection accuracy and workload.
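
The endpoints and the threshold matching named in the abstract can be illustrated with a small sketch. This is not the authors' code, and the abstract does not describe the exact matching procedure or how the tests were run. The function names, the assumption that a higher AI score means higher suspicion, and the toy numbers in the usage note are all hypothetical. The sketch computes the accuracy measures from a 2x2 table, picks the highest score cut-off that reaches a target sensitivity, and computes a two-sided exact McNemar p-value from discordant pair counts.

```python
from math import comb


def accuracy_metrics(tp, fp, fn, tn):
    """Screening accuracy measures from 2x2 counts (recalled vs. reference standard)."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "recall_rate": (tp + fp) / total,
    }


def cutoff_matching_sensitivity(scores, has_cancer, target):
    """Highest cut-off whose sensitivity reaches `target`.

    Assumption: a screen is called positive when score >= cut-off.
    """
    cancer_scores = [s for s, c in zip(scores, has_cancer) if c]
    if not cancer_scores:
        raise ValueError("no cancer cases to match on")
    # Walk candidate cut-offs from strictest to most lenient.
    for cutoff in sorted(set(cancer_scores), reverse=True):
        detected = sum(s >= cutoff for s in cancer_scores)
        if detected / len(cancer_scores) >= target:
            return cutoff
    raise ValueError("target sensitivity cannot be reached")


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b and c are the discordant pair counts, e.g. cancers found only by
    reader A and cancers found only by reader B.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Binomial(n, 0.5) lower tail, doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

As a usage example with invented counts, `accuracy_metrics(tp=8, fp=2, fn=2, tn=88)` gives a sensitivity of 0.8 and a recall rate of 0.1, and `mcnemar_exact(1, 9)` tests whether two readers differ on ten discordant cases. Matching on specificity would follow the same pattern, using the scores of cancer-free screens instead.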

author
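Elhakim, Mohammad Talal ; Stougaard, Sarah Wordenskjold ; Graumann, Ole ; Nielsen, Mads ; Lång, Kristina ; Gerke, Oke ; Larsen, Lisbet Brønsro and Rasmussen, Benjamin Schnack Brandt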
organization
publishing date
type
Contribution to journal
publication status
published
subject
keywords
Artificial intelligence, Breast cancer, Deep learning, Double reading, Mammography screening
in
Cancer Imaging
volume
23
issue
1
article number
127
pages
13 pages
publisher
International Cancer Imaging Society
external identifiers
  • pmid:38124111
  • scopus:85180254657
ISSN
1470-7330
DOI
10.1186/s40644-023-00643-x
language
English
LU publication?
yes
additional info
Funding Information: We are grateful to the Region of Southern Denmark for the funding of this study. We thank ScreenPoint Medical for providing the AI system for this study. We are grateful to the Danish Clinical Quality Program – National Clinical Registries (RKKP), the Danish Breast Cancer Cooperative Group (DBCG) and the Danish Quality Database on Mammography Screening (DKMS) for the provision of data. We thank Henrik Johansen (Regional IT) for technical assistance and data management. We thank all supporting breast radiologists and mammography centres in the Region of Southern Denmark for contributing their expertise and collaboration during the conduct of the study. We thank the women and patients for their participation. The authors alone are responsible for the views expressed in this article and they do not necessarily represent the decisions, policies, or views of the Region of Southern Denmark or any other collaborator. Funding Information: The study was funded through the Innovation Fund by the Region of Southern Denmark (grant number 10240300). The funder of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report. Publisher Copyright: © 2023, The Author(s).
id
1032766a-cf89-46ab-9ef4-e550c58ec2d9
date added to LUP
2023-12-30 00:38:38
date last changed
2024-04-14 05:45:35
@article{1032766a-cf89-46ab-9ef4-e550c58ec2d9,
  abstract     = {{<p>Background: Artificial intelligence (AI) systems are proposed as a replacement of the first reader in double reading within mammography screening. We aimed to assess cancer detection accuracy of an AI system in a Danish screening population. Methods: We retrieved a consecutive screening cohort from the Region of Southern Denmark including all participating women between Aug 4, 2014, and August 15, 2018. Screening mammograms were processed by a commercial AI system and detection accuracy was evaluated in two scenarios, Standalone AI and AI-integrated screening replacing first reader, with first reader and double reading with arbitration (combined reading) as comparators, respectively. Two AI-score cut-off points were applied by matching at mean first reader sensitivity (AI<sub>sens</sub>) and specificity (AI<sub>spec</sub>). Reference standard was histopathology-proven breast cancer or cancer-free follow-up within 24 months. Coprimary endpoints were sensitivity and specificity, and secondary endpoints were positive predictive value (PPV), negative predictive value (NPV), recall rate, and arbitration rate. Accuracy estimates were calculated using McNemar’s test or exact binomial test. Results: Out of 272,008 screening mammograms from 158,732 women, 257,671 (94.7%) with adequate image data were included in the final analyses. Sensitivity and specificity were 63.7% (95% CI 61.6%-65.8%) and 97.8% (97.7-97.8%) for first reader, and 73.9% (72.0-75.8%) and 97.9% (97.9-98.0%) for combined reading, respectively. Standalone AI<sub>sens</sub> showed a lower specificity (-1.3%) and PPV (-6.1%), and a higher recall rate (+ 1.3%) compared to first reader (p &lt; 0.0001 for all), while Standalone AI<sub>spec</sub> had a lower sensitivity (-5.1%; p &lt; 0.0001), PPV (-1.3%; p = 0.01) and NPV (-0.04%; p = 0.0002). 
Compared to combined reading, Integrated AI<sub>sens</sub> achieved higher sensitivity (+ 2.3%; p = 0.0004), but lower specificity (-0.6%) and PPV (-3.9%) as well as higher recall rate (+ 0.6%) and arbitration rate (+ 2.2%; p &lt; 0.0001 for all). Integrated AI<sub>spec</sub> showed no significant difference in any outcome measures apart from a slightly higher arbitration rate (p &lt; 0.0001). Subgroup analyses showed higher detection of interval cancers by Standalone AI and Integrated AI at both thresholds (p &lt; 0.0001 for all) with a varying composition of detected cancers across multiple subgroups of tumour characteristics. Conclusions: Replacing first reader in double reading with an AI could be feasible but choosing an appropriate AI threshold is crucial to maintaining cancer detection accuracy and workload.</p>}},
  author       = {{Elhakim, Mohammad Talal and Stougaard, Sarah Wordenskjold and Graumann, Ole and Nielsen, Mads and Lång, Kristina and Gerke, Oke and Larsen, Lisbet Brønsro and Rasmussen, Benjamin Schnack Brandt}},
  issn         = {{1470-7330}},
  keywords     = {{Artificial intelligence; Breast cancer; Deep learning; Double reading; Mammography screening}},
  language     = {{eng}},
  month        = {{12}},
  number       = {{1}},
  publisher    = {{International Cancer Imaging Society}},
  series       = {{Cancer Imaging}},
  title        = {{Breast cancer detection accuracy of AI in an entire screening population : a retrospective, multicentre study}},
  url          = {{http://dx.doi.org/10.1186/s40644-023-00643-x}},
  doi          = {{10.1186/s40644-023-00643-x}},
  volume       = {{23}},
  year         = {{2023}},
}