Lund University Publications

Pilot study to evaluate tools to collect pathologist annotations for validating machine learning algorithms

Elfer, Katherine; Dudgeon, Sarah; Garcia, Victor; Blenman, Kim; Hytopoulos, Evangelos; Wen, Si; Li, Xiaoxian; Ly, Amy; Werness, Bruce and Sheth, Manasi S, et al. (2022) In Journal of Medical Imaging 9(4). p.1-14
Abstract

Purpose: Validation of artificial intelligence (AI) algorithms in digital pathology with a reference standard is necessary before widespread clinical use, but few examples focus on creating a reference standard based on pathologist annotations. This work assesses the results of a pilot study that collects density estimates of stromal tumor-infiltrating lymphocytes (sTILs) in breast cancer biopsy specimens. This work will inform the creation of a validation dataset for the evaluation of AI algorithms fit for a regulatory purpose. Approach: Collaborators and crowdsourced pathologists contributed glass slides, digital images, and annotations. Here, "annotations" refer to any marks, segmentations, measurements, or labels a pathologist adds to a report, image, region of interest (ROI), or biological feature. Pathologists estimated sTILs density in 640 ROIs from hematoxylin and eosin stained slides of 64 patients via two modalities: an optical light microscope and two digital image viewing platforms. Results: The pilot study generated 7373 sTILs density estimates from 29 pathologists. Analysis of annotations found the variability of density estimates per ROI increases with the mean; the root mean square differences were 4.46, 14.25, and 26.25 as the mean density ranged from 0% to 10%, 11% to 40%, and 41% to 100%, respectively. The pilot study informs three areas of improvement for future work: technical workflows, annotation platforms, and agreement analysis methods. Upgrades to the workflows and platforms will improve operability and increase annotation speed and consistency. Conclusions: Exploratory data analysis demonstrates the need to develop new statistical approaches for agreement. The pilot study dataset and analysis methods are publicly available to allow community feedback. The development and results of the validation dataset will be publicly available to serve as an instructive tool that can be replicated by developers and researchers.
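The agreement analysis summarized above groups ROIs by their mean sTILs density and reports the root mean square (RMS) of the differences between pathologist estimates within each bin. A minimal sketch of that kind of binned RMS-difference computation is shown below; the ROI names, reader values, and bin boundaries are illustrative assumptions, not the study's actual data or code.

```python
import itertools
import math

# Hypothetical sTILs density estimates (percent) from several readers,
# keyed by region of interest (ROI). Values are invented for illustration.
estimates = {
    "roi_01": [2, 5, 3, 4],      # low-density ROI
    "roi_02": [20, 30, 15, 25],  # mid-density ROI
    "roi_03": [60, 85, 70, 90],  # high-density ROI
}

# Bins matching those in the abstract: 0-10%, 11-40%, 41-100% mean density.
bins = {"0-10%": (0, 10), "11-40%": (11, 40), "41-100%": (41, 100)}

def rms_pairwise_difference(values):
    """Root mean square of all pairwise differences between reader estimates."""
    diffs = [a - b for a, b in itertools.combinations(values, 2)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Assign each ROI's RMS difference to the bin its mean density falls into.
by_bin = {label: [] for label in bins}
for roi, vals in estimates.items():
    mean = sum(vals) / len(vals)
    for label, (lo, hi) in bins.items():
        if lo <= mean <= hi:
            by_bin[label].append(rms_pairwise_difference(vals))

# Average RMS difference per bin; higher-density bins show more spread.
for label, rms_values in by_bin.items():
    if rms_values:
        print(label, round(sum(rms_values) / len(rms_values), 2))
```

This reproduces the qualitative pattern the abstract reports (variability increasing with the mean); the study's own analysis code and dataset are stated to be publicly available and may differ in detail.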

author
Elfer, Katherine; Dudgeon, Sarah; Garcia, Victor; Blenman, Kim; Hytopoulos, Evangelos; Wen, Si; Li, Xiaoxian; Ly, Amy; Werness, Bruce; Sheth, Manasi S; Amgad, Mohamed; Gupta, Rajarsi; Saltz, Joel; Hanna, Matthew G; Ehinger, Anna; Peeters, Dieter; Salgado, Roberto and Gallas, Brandon D
organization
publishing date
2022
type
Contribution to journal
publication status
published
subject
in
Journal of Medical Imaging
volume
9
issue
4
article number
047501
pages
1 - 14
publisher
SPIE
external identifiers
  • pmid:35911208
  • scopus:85142224830
ISSN
2329-4302
DOI
10.1117/1.JMI.9.4.047501
language
English
LU publication?
yes
additional info
© 2022 The Authors.
id
4eac308b-8e3d-4754-a8fd-8fe072b34b4b
date added to LUP
2022-09-06 08:26:22
date last changed
2024-06-27 17:11:01
@article{4eac308b-8e3d-4754-a8fd-8fe072b34b4b,
  abstract     = {{<p>Purpose: Validation of artificial intelligence (AI) algorithms in digital pathology with a reference standard is necessary before widespread clinical use, but few examples focus on creating a reference standard based on pathologist annotations. This work assesses the results of a pilot study that collects density estimates of stromal tumor-infiltrating lymphocytes (sTILs) in breast cancer biopsy specimens. This work will inform the creation of a validation dataset for the evaluation of AI algorithms fit for a regulatory purpose. Approach: Collaborators and crowdsourced pathologists contributed glass slides, digital images, and annotations. Here, "annotations" refer to any marks, segmentations, measurements, or labels a pathologist adds to a report, image, region of interest (ROI), or biological feature. Pathologists estimated sTILs density in 640 ROIs from hematoxylin and eosin stained slides of 64 patients via two modalities: an optical light microscope and two digital image viewing platforms. Results: The pilot study generated 7373 sTILs density estimates from 29 pathologists. Analysis of annotations found the variability of density estimates per ROI increases with the mean; the root mean square differences were 4.46, 14.25, and 26.25 as the mean density ranged from 0% to 10%, 11% to 40%, and 41% to 100%, respectively. The pilot study informs three areas of improvement for future work: technical workflows, annotation platforms, and agreement analysis methods. Upgrades to the workflows and platforms will improve operability and increase annotation speed and consistency. Conclusions: Exploratory data analysis demonstrates the need to develop new statistical approaches for agreement. The pilot study dataset and analysis methods are publicly available to allow community feedback. 
The development and results of the validation dataset will be publicly available to serve as an instructive tool that can be replicated by developers and researchers.</p>}},
  author       = {{Elfer, Katherine and Dudgeon, Sarah and Garcia, Victor and Blenman, Kim and Hytopoulos, Evangelos and Wen, Si and Li, Xiaoxian and Ly, Amy and Werness, Bruce and Sheth, Manasi S and Amgad, Mohamed and Gupta, Rajarsi and Saltz, Joel and Hanna, Matthew G and Ehinger, Anna and Peeters, Dieter and Salgado, Roberto and Gallas, Brandon D}},
  issn         = {{2329-4302}},
  language     = {{eng}},
  number       = {{4}},
  pages        = {{1--14}},
  publisher    = {{SPIE}},
  series       = {{Journal of Medical Imaging}},
  title        = {{Pilot study to evaluate tools to collect pathologist annotations for validating machine learning algorithms}},
  url          = {{http://dx.doi.org/10.1117/1.JMI.9.4.047501}},
  doi          = {{10.1117/1.JMI.9.4.047501}},
  volume       = {{9}},
  year         = {{2022}},
}