Lund University Publications

LUND UNIVERSITY LIBRARIES

The analytical and clinical validity of AI algorithms to score TILs in TNBC: can we use different machine learning models interchangeably?

Vidal, Joan Martínez; Tsiknakis, Nikos; Staaf, Johan; Bosch, Ana; Ehinger, Anna; Nimeus, Emma; Salgado, Roberto; Bai, Yalai; Rimm, David L. and Hartman, Johan, et al. (2024) In EClinicalMedicine 78.
Abstract

Background: Pathologist-read tumor-infiltrating lymphocytes (TILs) have shown predictive and prognostic potential in early and metastatic triple-negative breast cancer (TNBC), but their assessment remains subject to variability. Artificial intelligence (AI) is a promising approach toward eliminating this variability and objectively automating TILs assessment. However, demonstrating robust analytical and prognostic validity is the key challenge currently preventing the integration of AI tools into clinical workflows. Methods: We evaluated the impact of ten AI models on TILs scoring, emphasizing differences in their analytical and prognostic validity. The ten AI-based TILs scoring models (seven newly developed and three previously validated) were tested in a retrospective analytical cohort and in an independent prospective cohort to compare prognostic validity against the invasive disease-free survival (IDFS) endpoint, with a median follow-up of 4 years. The development and analytical validation set consisted of diagnostic tissue slides from 79 women with surgically resected primary invasive TNBC tumors diagnosed between 2012 and 2016 from the Yale School of Medicine. An independent set comprising 215 TNBC patients from Sweden, diagnosed between 2010 and 2015, was used to test prognostic validity. Findings: Analytical validity differed significantly across AI methodologies and training strategies (Spearman's r = 0.63–0.73, p < 0.001). Prognostic performance of digital TILs was demonstrated for eight of the ten AI models, including less extensively trained ones, with similar and overlapping hazard ratios (HR) in the external validation cohort (Cox regression on the IDFS endpoint, HR = 0.40–0.47; p < 0.004). Interpretation: The prognostic validity demonstrated for most of the AI TIL models can be attributed to the intrinsic robustness of host anti-tumor immunity (measured by TILs) as a biomarker. 
However, the discrepancies between AI models should not be overlooked; rather, we believe that there is a critical need for an accessible, large, multi-centric dataset that will serve as a benchmark ensuring the comparability and reliability of different AI tools in clinical implementation. Funding: Nikos Tsiknakis is supported by the Swedish Research Council (Grant Number 2021-03061, Theodoros Foukakis). Balazs Acs is supported by The Swedish Society for Medical Research (Svenska Sällskapet för Medicinsk Forskning) postdoctoral grant. Roberto Salgado is supported by a grant from Breast Cancer Research Foundation (BCRF).
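The abstract summarizes analytical validity with Spearman's rank correlation. Below is a minimal Python sketch of that statistic. It assumes, as is typical for TIL studies although the abstract does not say so, that each model's slide-level TIL scores are compared against pathologist-read TIL scores for the same slides. The function names `average_ranks` and `spearman_r` and the five-slide data are hypothetical and purely illustrative. They are not the study's code or data.

```python
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_r(x: list[float], y: list[float]) -> float:
    """Spearman's rank correlation, computed as Pearson's r on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when all values are tied")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical TIL percentages for five slides (illustrative only).
    pathologist_tils = [5, 10, 20, 30, 60]
    ai_model_tils = [8, 15, 12, 40, 55]
    print(round(spearman_r(pathologist_tils, ai_model_tils), 3))  # prints 0.9
```

Spearman's r depends only on rank order. Two models can therefore agree well with pathologists on which slides have more TILs while still producing different absolute scores. For real analyses, `scipy.stats.spearmanr` computes the same statistic and also returns a p-value.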

author
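Vidal, Joan Martínez; Tsiknakis, Nikos; Staaf, Johan; Bosch, Ana; Ehinger, Anna; Nimeus, Emma; Salgado, Roberto; Bai, Yalai; Rimm, David L.; Hartman, Johan and Acs, Balazs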
publishing date
2024
type
Contribution to journal
publication status
published
keywords
Artificial intelligence, Breast cancer, Deep learning, Machine learning, TILs, Tumor infiltrating lymphocytes
in
EClinicalMedicine
volume
78
article number
102928
publisher
Lancet Publishing Group
external identifiers
  • pmid:39634035
  • scopus:85209128329
ISSN
2589-5370
DOI
10.1016/j.eclinm.2024.102928
language
English
LU publication?
yes
additional info
Publisher Copyright: © 2024 The Author(s)
id
4be84451-49ff-45ad-bc17-fc9c2f5083f6
date added to LUP
2025-01-09 10:07:32
date last changed
2025-07-11 01:36:54
@article{4be84451-49ff-45ad-bc17-fc9c2f5083f6,
  abstract     = {{Background: Pathologist-read tumor-infiltrating lymphocytes (TILs) have shown predictive and prognostic potential in early and metastatic triple-negative breast cancer (TNBC), but their assessment remains subject to variability. Artificial intelligence (AI) is a promising approach toward eliminating this variability and objectively automating TILs assessment. However, demonstrating robust analytical and prognostic validity is the key challenge currently preventing the integration of AI tools into clinical workflows. Methods: We evaluated the impact of ten AI models on TILs scoring, emphasizing differences in their analytical and prognostic validity. The ten AI-based TILs scoring models (seven newly developed and three previously validated) were tested in a retrospective analytical cohort and in an independent prospective cohort to compare prognostic validity against the invasive disease-free survival (IDFS) endpoint, with a median follow-up of 4 years. The development and analytical validation set consisted of diagnostic tissue slides from 79 women with surgically resected primary invasive TNBC tumors diagnosed between 2012 and 2016 from the Yale School of Medicine. An independent set comprising 215 TNBC patients from Sweden, diagnosed between 2010 and 2015, was used to test prognostic validity. Findings: Analytical validity differed significantly across AI methodologies and training strategies (Spearman's r = 0.63–0.73, p $<$ 0.001). Prognostic performance of digital TILs was demonstrated for eight of the ten AI models, including less extensively trained ones, with similar and overlapping hazard ratios (HR) in the external validation cohort (Cox regression on the IDFS endpoint, HR = 0.40–0.47; p $<$ 0.004). Interpretation: The prognostic validity demonstrated for most of the AI TIL models can be attributed to the intrinsic robustness of host anti-tumor immunity (measured by TILs) as a biomarker. 
However, the discrepancies between AI models should not be overlooked; rather, we believe that there is a critical need for an accessible, large, multi-centric dataset that will serve as a benchmark ensuring the comparability and reliability of different AI tools in clinical implementation. Funding: Nikos Tsiknakis is supported by the Swedish Research Council (Grant Number 2021-03061, Theodoros Foukakis). Balazs Acs is supported by The Swedish Society for Medical Research (Svenska Sällskapet för Medicinsk Forskning) postdoctoral grant. Roberto Salgado is supported by a grant from Breast Cancer Research Foundation (BCRF).}},
  author       = {{Vidal, Joan Martínez and Tsiknakis, Nikos and Staaf, Johan and Bosch, Ana and Ehinger, Anna and Nimeus, Emma and Salgado, Roberto and Bai, Yalai and Rimm, David L. and Hartman, Johan and Acs, Balazs}},
  issn         = {{2589-5370}},
  keywords     = {{Artificial intelligence; Breast cancer; Deep learning; Machine learning; TILs; Tumor infiltrating lymphocytes}},
  language     = {{eng}},
  publisher    = {{Lancet Publishing Group}},
  journal      = {{EClinicalMedicine}},
  pages        = {{102928}},
  title        = {{The analytical and clinical validity of AI algorithms to score TILs in TNBC: can we use different machine learning models interchangeably?}},
  url          = {{https://doi.org/10.1016/j.eclinm.2024.102928}},
  doi          = {{10.1016/j.eclinm.2024.102928}},
  volume       = {{78}},
  year         = {{2024}},
}