
Lund University Publications

LUND UNIVERSITY LIBRARIES

Performance across different versions of an artificial intelligence model for screen-reading of mammograms

Larsen, Marthe ; Lee, Christoph I. ; Bergan, Marie B. ; Holen, Åsne S. ; Lund-Hanssen, Håkon ; Hoff, Solveig R. ; Auensen, Steinar ; Nygård, Jan F. ; Lång, Kristina (LU) and Chen, Yan (LU), et al. (2026) In European Radiology
Abstract


Objectives: Studies have reported promising results regarding artificial intelligence (AI) as a tool for improving the interpretive performance of mammographic screening. We analyzed AI malignancy risk scores from two versions of the same commercial AI model.

Materials and methods: This retrospective cohort study used data from 117,709 screening examinations performed in BreastScreen Norway from 2009 to 2018. The mammograms were processed by two versions of the commercially available AI model Transpara (versions 1.7 and 2.1). The distributions of exam-level risk scores (AI score 1–10) and risk categories were evaluated for both AI versions on all examinations, including 737 screen-detected and 200 interval cancers. Scores of 1–7 were categorized as low risk, 8–9 as intermediate risk, and 10 as high risk of malignancy.

Results: The area under the receiver operating characteristic curve was 0.908 (95% CI: 0.896–0.920) for version 1.7 and 0.928 (95% CI: 0.917–0.939) for version 2.1 when screen-detected and interval cancers were considered positive cases (p < 0.001). With versions 1.7 and 2.1, 87.1% (642/737) and 93.5% (689/737) of the screen-detected cancers, respectively, had an AI score of 10. Among interval cancers, 45.0% (90/200) had an AI score of 10 with version 1.7 and 44.5% (89/200) with version 2.1.

Conclusion: A higher proportion of screen-detected breast cancers received the highest AI score of 10 with the newer version of the AI model than with the older version. For interval cancers, there was no difference between the two versions in the proportion of cases assigned the highest score.

Key Points:
Question: Studies have reported promising results regarding the use of AI in mammography screening, but comparisons of updated versus older model versions are less studied.
Findings: In our study, 87.1% (642/737) of the screen-detected cancers received the high malignancy risk score (AI score 10) from the older version, compared with 93.5% (689/737) from the newer version.
Clinical relevance: Understanding how version updates of AI models might affect screening mammography performance will be important for future quality assurance and validation of AI models.
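A short Python sketch can check the score-to-category rule and the reported proportions from the abstract. The function names (risk_category, percent, auc_pairwise) are hypothetical. The AUC helper uses the generic pairwise (Mann-Whitney) estimator with ties counted as one half. This is an assumption and not necessarily how the authors computed their AUCs or confidence intervals. The toy scores are invented for illustration and are not study data.

```python
from collections import Counter


def risk_category(score: int) -> str:
    """Map an exam-level AI score (1-10) to the categories defined in the abstract."""
    if not 1 <= score <= 10:
        raise ValueError(f"AI score must be between 1 and 10, got {score}")
    if score <= 7:
        return "low"            # scores 1-7
    if score <= 9:
        return "intermediate"   # scores 8-9
    return "high"               # score 10


def percent(count: int, total: int) -> float:
    """Percentage rounded to one decimal place, as the abstract reports proportions."""
    return round(100 * count / total, 1)


def auc_pairwise(pos_scores: list[int], neg_scores: list[int]) -> float:
    """Empirical AUC (hypothetical helper): share of (positive, negative) pairs in
    which the positive exam scores higher, counting ties as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Counts from the abstract: cancers assigned AI score 10, per model version.
    print("screen-detected, v1.7:", percent(642, 737))
    print("screen-detected, v2.1:", percent(689, 737))
    print("interval, v1.7:", percent(90, 200))
    print("interval, v2.1:", percent(89, 200))

    # Hypothetical toy scores, NOT study data, to show how a tied 1-10 scale feeds the AUC.
    cancers = [10, 10, 9, 6]
    non_cancers = [1, 3, 7, 10]
    print("toy AUC:", auc_pairwise(cancers, non_cancers))
    print("toy categories:", Counter(risk_category(s) for s in cancers + non_cancers))
```

The AI score has only ten levels, so many positive/negative pairs tie. This is why the tie-handling rule matters when an AUC is computed from such a coarse ordinal score.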

author
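Larsen, Marthe ; Lee, Christoph I. ; Bergan, Marie B. ; Holen, Åsne S. ; Lund-Hanssen, Håkon ; Hoff, Solveig R. ; Auensen, Steinar ; Nygård, Jan F. ; Lång, Kristina ; Chen, Yan ; Ursin, Giske and Hofvind, Solveig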
publishing date
2026-01
type
Contribution to journal
publication status
published
keywords
Artificial intelligence, Breast cancer, Mammography, Screening
in
European Radiology
publisher
Springer Science and Business Media B.V.
external identifiers
  • pmid:41528472
  • scopus:105027912404
ISSN
0938-7994
DOI
10.1007/s00330-025-12240-6
language
English
LU publication?
yes
additional info
Publisher Copyright: © The Author(s) 2025.
id
3ed55003-15b6-49ce-96ff-d80cf04c4dbe
date added to LUP
2026-02-04 11:43:36
date last changed
2026-02-04 12:52:49
@article{3ed55003-15b6-49ce-96ff-d80cf04c4dbe,
  abstract     = {{Objectives: Studies have reported promising results regarding artificial intelligence (AI) as a tool for improving the interpretive performance of mammographic screening. We analyzed AI malignancy risk scores from two versions of the same commercial AI model. Materials and methods: This retrospective cohort study used data from 117,709 screening examinations performed in BreastScreen Norway from 2009 to 2018. The mammograms were processed by two versions of the commercially available AI model Transpara (versions 1.7 and 2.1). The distributions of exam-level risk scores (AI score 1–10) and risk categories were evaluated for both AI versions on all examinations, including 737 screen-detected and 200 interval cancers. Scores of 1–7 were categorized as low risk, 8–9 as intermediate risk, and 10 as high risk of malignancy. Results: The area under the receiver operating characteristic curve was 0.908 (95\% CI: 0.896–0.920) for version 1.7 and 0.928 (95\% CI: 0.917–0.939) for version 2.1 when screen-detected and interval cancers were considered positive cases ($p < 0.001$). With versions 1.7 and 2.1, 87.1\% (642/737) and 93.5\% (689/737) of the screen-detected cancers, respectively, had an AI score of 10. Among interval cancers, 45.0\% (90/200) had an AI score of 10 with version 1.7 and 44.5\% (89/200) with version 2.1. Conclusion: A higher proportion of screen-detected breast cancers received the highest AI score of 10 with the newer version of the AI model than with the older version. For interval cancers, there was no difference between the two versions in the proportion of cases assigned the highest score. Key Points: Question: Studies have reported promising results regarding the use of AI in mammography screening, but comparisons of updated versus older model versions are less studied.
Findings: In our study, 87.1\% (642/737) of the screen-detected cancers received the high malignancy risk score (AI score 10) from the older version, compared with 93.5\% (689/737) from the newer version. Clinical relevance: Understanding how version updates of AI models might affect screening mammography performance will be important for future quality assurance and validation of AI models.}},
  author       = {{Larsen, Marthe and Lee, Christoph I. and Bergan, Marie B. and Holen, Åsne S. and Lund-Hanssen, Håkon and Hoff, Solveig R. and Auensen, Steinar and Nygård, Jan F. and Lång, Kristina and Chen, Yan and Ursin, Giske and Hofvind, Solveig}},
  issn         = {{0938-7994}},
  keywords     = {{Artificial intelligence; Breast cancer; Mammography; Screening}},
  language     = {{eng}},
  month        = {{01}},
  publisher    = {{Springer Science and Business Media B.V.}},
  journal      = {{European Radiology}},
  title        = {{Performance across different versions of an artificial intelligence model for screen-reading of mammograms}},
  url          = {{http://dx.doi.org/10.1007/s00330-025-12240-6}},
  doi          = {{10.1007/s00330-025-12240-6}},
  year         = {{2026}},
}