Skip to main content

Lund University Publications

LUND UNIVERSITY LIBRARIES

On the unfounded enthusiasm for soft selective sweeps iii : The supervised machine learning algorithm that isn’t

Elhaik, Eran LU orcid and Graur, Dan (2021) In Genes 12(4).
Abstract

In the last 15 years or so, soft selective sweep mechanisms have been catapulted from a curiosity of little evolutionary importance to a ubiquitous mechanism claimed to explain most adaptive evolution and, in some cases, most evolution. This transformation was aided by a series of articles by Daniel Schrider and Andrew Kern. Within this series, a paper entitled “Soft sweeps are the dominant mode of adaptation in the human genome” (Schrider and Kern, Mol. Biol. Evolut. 2017, 34(8), 1863–1877) attracted a great deal of attention, in particular in conjunction with another paper (Kern and Hahn, Mol. Biol. Evolut. 2018, 35(6), 1366–1371), for purporting to discredit the Neutral Theory of Molecular Evolution (Kimura 1968). Here, we address an... (More)

In the last 15 years or so, soft selective sweep mechanisms have been catapulted from a curiosity of little evolutionary importance to a ubiquitous mechanism claimed to explain most adaptive evolution and, in some cases, most evolution. This transformation was aided by a series of articles by Daniel Schrider and Andrew Kern. Within this series, a paper entitled “Soft sweeps are the dominant mode of adaptation in the human genome” (Schrider and Kern, Mol. Biol. Evolut. 2017, 34(8), 1863–1877) attracted a great deal of attention, in particular in conjunction with another paper (Kern and Hahn, Mol. Biol. Evolut. 2018, 35(6), 1366–1371), for purporting to discredit the Neutral Theory of Molecular Evolution (Kimura 1968). Here, we address an alleged novelty in Schrider and Kern’s paper, i.e., the claim that their study involved an artificial intelligence technique called supervised machine learning (SML). SML is predicated upon the existence of a training dataset in which the correspondence between the input and output is known empirically to be true. Curiously, Schrider and Kern did not possess a training dataset of genomic segments known a priori to have evolved either neutrally or through soft or hard selective sweeps. Thus, their claim of using SML is thoroughly and utterly misleading. In the absence of legitimate training datasets, Schrider and Kern used: (1) simulations that employ many manipulatable variables and (2) a system of data cherry-picking rivaling the worst excesses in the literature. These two factors, in addition to the lack of negative controls and the irreproducibility of their results due to incomplete methodological detail, lead us to conclude that all evolutionary inferences derived from so-called SML algorithms (e.g., S/HIC) should be taken with a huge shovel of salt.

(Less)
Please use this url to cite or link to this publication:
author
and
organization
publishing date
type
Contribution to journal
publication status
published
subject
keywords
Artificial intelligence (AI), Evolutionary biology, Molecular and genome evolution, Population size, Selective sweeps, Supervised machine learning (SML)
in
Genes
volume
12
issue
4
article number
527
publisher
MDPI AG
external identifiers
  • scopus:85104295844
  • pmid:33916341
ISSN
2073-4425
DOI
10.3390/genes12040527
language
English
LU publication?
yes
id
3d977fdb-9f51-4253-968c-c8bb6c329b49
date added to LUP
2021-04-26 10:29:01
date last changed
2024-04-20 05:27:25
@article{3d977fdb-9f51-4253-968c-c8bb6c329b49,
  abstract     = {{<p>In the last 15 years or so, soft selective sweep mechanisms have been catapulted from a curiosity of little evolutionary importance to a ubiquitous mechanism claimed to explain most adaptive evolution and, in some cases, most evolution. This transformation was aided by a series of articles by Daniel Schrider and Andrew Kern. Within this series, a paper entitled “Soft sweeps are the dominant mode of adaptation in the human genome” (Schrider and Kern, Mol. Biol. Evolut. 2017, 34(8), 1863–1877) attracted a great deal of attention, in particular in conjunction with another paper (Kern and Hahn, Mol. Biol. Evolut. 2018, 35(6), 1366–1371), for purporting to discredit the Neutral Theory of Molecular Evolution (Kimura 1968). Here, we address an alleged novelty in Schrider and Kern’s paper, i.e., the claim that their study involved an artificial intelligence technique called supervised machine learning (SML). SML is predicated upon the existence of a training dataset in which the correspondence between the input and output is known empirically to be true. Curiously, Schrider and Kern did not possess a training dataset of genomic segments known a priori to have evolved either neutrally or through soft or hard selective sweeps. Thus, their claim of using SML is thoroughly and utterly misleading. In the absence of legitimate training datasets, Schrider and Kern used: (1) simulations that employ many manipulatable variables and (2) a system of data cherry-picking rivaling the worst excesses in the literature. These two factors, in addition to the lack of negative controls and the irreproducibility of their results due to incomplete methodological detail, lead us to conclude that all evolutionary inferences derived from so-called SML algorithms (e.g., S/HIC) should be taken with a huge shovel of salt.</p>}},
  author       = {{Elhaik, Eran and Graur, Dan}},
  issn         = {{2073-4425}},
  keywords     = {{Artificial intelligence (AI); Evolutionary biology; Molecular and genome evolution; Population size; Selective sweeps; Supervised machine learning (SML)}},
  language     = {{eng}},
  number       = {{4}},
  publisher    = {{MDPI AG}},
  series       = {{Genes}},
  title        = {{On the unfounded enthusiasm for soft selective sweeps iii : The supervised machine learning algorithm that isn’t}},
  url          = {{http://dx.doi.org/10.3390/genes12040527}},
  doi          = {{10.3390/genes12040527}},
  volume       = {{12}},
  year         = {{2021}},
}