Man against machine : diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists

Haenssle, H A; Fink, C; Schneiderbauer, R; Toberer, F; Buhl, T; Blum, A; Kalloo, A; Hassen, A Ben Hadj; Thomas, L; Enk, A; Uhlmann, L

Haenssle, H A; Fink, C; Schneiderbauer, R; Toberer, F; Buhl, T; Blum, A; Kalloo, A; Hassen, A Ben Hadj; Thomas, L; Enk, A, et al. (2018). In Annals of oncology : official journal of the European Society for Medical Oncology 29(8), p. 1836-1842

Abstract

Background: Deep learning convolutional neural networks (CNN) may facilitate melanoma detection, but data comparing a CNN's diagnostic performance to larger groups of dermatologists are lacking.

Methods: Google's Inception v4 CNN architecture was trained and validated using dermoscopic images and corresponding diagnoses. In a comparative cross-sectional reader study a 100-image test-set was used (level-I: dermoscopy only; level-II: dermoscopy plus clinical information and images). Main outcome measures were sensitivity, specificity and area under the curve (AUC) of receiver operating characteristics (ROC) for diagnostic classification (dichotomous) of lesions by the CNN versus an international group of 58 dermatologists during level-I or -II of the reader study. Secondary end points included the dermatologists' diagnostic performance in their management decisions and differences in the diagnostic performance of dermatologists during level-I and -II of the reader study. Additionally, the CNN's performance was compared with the top-five algorithms of the 2016 International Symposium on Biomedical Imaging (ISBI) challenge.
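The outcome measures named above can be illustrated with a minimal sketch. It assumes dichotomous labels (1 = melanoma, 0 = benign) and a continuous classifier score; the function names and the toy data are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for dichotomous labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # melanomas called melanoma
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # melanomas missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign lesions called benign
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign lesions called melanoma
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation of the AUC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.3, 0.2, 0.1, 0.7, 0.4]
toy_calls = [1 if s >= 0.5 else 0 for s in toy_scores]  # dichotomise at 0.5

sens, spec = sensitivity_specificity(toy_labels, toy_calls)
auc = roc_auc(toy_labels, toy_scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc:.3f}")
```

Sensitivity and specificity depend on the chosen cut-off, whereas the AUC summarises performance across all cut-offs, which is presumably why the study reports both.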

Results: In level-I dermatologists achieved a mean (±standard deviation) sensitivity and specificity for lesion classification of 86.6% (±9.3%) and 71.3% (±11.2%), respectively. More clinical information (level-II) improved the sensitivity to 88.9% (±9.6%, P = 0.19) and specificity to 75.7% (±11.7%, P < 0.05). The CNN ROC curve revealed a higher specificity of 82.5% when compared with dermatologists in level-I (71.3%, P < 0.01) and level-II (75.7%, P < 0.01) at their sensitivities of 86.6% and 88.9%, respectively. The CNN ROC AUC was greater than the mean ROC area of dermatologists (0.86 versus 0.79, P < 0.01). The CNN scored results close to the top three algorithms of the ISBI 2016 challenge.
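Comparing a classifier with readers "at their sensitivities" means reading the classifier's specificity off its ROC curve at the operating point that matches the readers' mean sensitivity. The sketch below shows one simple way to do this, by scanning empirical thresholds; it is an assumption about the general technique, not the study's actual procedure, and the function name and toy data are hypothetical.

```python
def specificity_at_sensitivity(y_true, scores, target_sensitivity):
    """Highest specificity reachable while sensitivity >= target_sensitivity.

    Thresholds are scanned from strictest to most lenient. Sensitivity can only
    rise and specificity can only fall along the way, so the first threshold
    that reaches the target gives the best specificity.
    """
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = sum(1 for t in y_true if t == 0)
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= thr)
        fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= thr)
        if tp / n_pos >= target_sensitivity:
            return 1.0 - fp / n_neg
    return 0.0  # only reached if the target exceeds 1.0


# Toy data, invented for illustration only.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.3, 0.2, 0.1, 0.7, 0.4]
print(specificity_at_sensitivity(toy_labels, toy_scores, 0.66))
print(specificity_at_sensitivity(toy_labels, toy_scores, 1.0))
```

With a 100-image test set the empirical ROC curve has few distinct operating points, so an exact match to a reader sensitivity such as 86.6% may require interpolation between neighbouring thresholds, which this sketch does not attempt.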

Conclusions: For the first time we compared a CNN's diagnostic performance with a large international group of 58 dermatologists, including 30 experts. Most dermatologists were outperformed by the CNN. Irrespective of any physicians' experience, they may benefit from assistance by a CNN's image classification.

Clinical trial number: This study was registered at the German Clinical Trial Register (DRKS-Study-ID: DRKS00013570; https://www.drks.de/drks_web/).

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/45bd6566-8465-4d6d-b3fd-c04bb76bd276

author

Haenssle, H A; Fink, C; Schneiderbauer, R; Toberer, F; Buhl, T; Blum, A; Kalloo, A; Hassen, A Ben Hadj; Thomas, L; Enk, A and Uhlmann, L

contributor

Nielsen, Kari ^LU

author collaboration

Reader study level-I and level-II Groups

organization

Dermatology and Venereology (Lund)

publishing date

2018-08-01

type

Contribution to journal

publication status

published

subject

keywords

melanoma, melanocytic nevi, dermoscopy, deep learning convolutional neural network, computer algorithm, automated melanoma detection

in

Annals of oncology : official journal of the European Society for Medical Oncology

volume

29

issue

8

pages

7 pages

publisher

Oxford University Press

external identifiers

scopus:85054158054
pmid:29846502

ISSN

1569-8041

DOI

10.1093/annonc/mdy166

language

English

LU publication?

yes

id

45bd6566-8465-4d6d-b3fd-c04bb76bd276

date added to LUP

2019-05-23 09:14:29

date last changed

2025-10-02 08:21:49

@article{45bd6566-8465-4d6d-b3fd-c04bb76bd276,
  abstract     = {{Background: Deep learning convolutional neural networks (CNN) may facilitate melanoma detection, but data comparing a CNN's diagnostic performance to larger groups of dermatologists are lacking. Methods: Google's Inception v4 CNN architecture was trained and validated using dermoscopic images and corresponding diagnoses. In a comparative cross-sectional reader study a 100-image test-set was used (level-I: dermoscopy only; level-II: dermoscopy plus clinical information and images). Main outcome measures were sensitivity, specificity and area under the curve (AUC) of receiver operating characteristics (ROC) for diagnostic classification (dichotomous) of lesions by the CNN versus an international group of 58 dermatologists during level-I or -II of the reader study. Secondary end points included the dermatologists' diagnostic performance in their management decisions and differences in the diagnostic performance of dermatologists during level-I and -II of the reader study. Additionally, the CNN's performance was compared with the top-five algorithms of the 2016 International Symposium on Biomedical Imaging (ISBI) challenge. Results: In level-I dermatologists achieved a mean (±standard deviation) sensitivity and specificity for lesion classification of 86.6% (±9.3%) and 71.3% (±11.2%), respectively. More clinical information (level-II) improved the sensitivity to 88.9% (±9.6%, P = 0.19) and specificity to 75.7% (±11.7%, P < 0.05). The CNN ROC curve revealed a higher specificity of 82.5% when compared with dermatologists in level-I (71.3%, P < 0.01) and level-II (75.7%, P < 0.01) at their sensitivities of 86.6% and 88.9%, respectively. The CNN ROC AUC was greater than the mean ROC area of dermatologists (0.86 versus 0.79, P < 0.01). The CNN scored results close to the top three algorithms of the ISBI 2016 challenge. Conclusions: For the first time we compared a CNN's diagnostic performance with a large international group of 58 dermatologists, including 30 experts. Most dermatologists were outperformed by the CNN. Irrespective of any physicians' experience, they may benefit from assistance by a CNN's image classification. Clinical trial number: This study was registered at the German Clinical Trial Register (DRKS-Study-ID: DRKS00013570; https://www.drks.de/drks_web/).}},
  author       = {{Haenssle, H A and Fink, C and Schneiderbauer, R and Toberer, F and Buhl, T and Blum, A and Kalloo, A and Hassen, A Ben Hadj and Thomas, L and Enk, A and Uhlmann, L}},
  issn         = {{1569-8041}},
  keywords     = {{melanoma; melanocytic nevi; dermoscopy; deep learning convolutional neural network; computer algorithm; automated melanoma detection}},
  language     = {{eng}},
  month        = {{08}},
  number       = {{8}},
  pages        = {{1836--1842}},
  publisher    = {{Oxford University Press}},
  journal      = {{Annals of oncology : official journal of the European Society for Medical Oncology}},
  title        = {{Man against machine : diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists}},
  url          = {{http://dx.doi.org/10.1093/annonc/mdy166}},
  doi          = {{10.1093/annonc/mdy166}},
  volume       = {{29}},
  year         = {{2018}},
}

Lund University Publications
