Lund University Publications

LUND UNIVERSITY LIBRARIES

Information Dropout Patterns in Restriction Site Associated DNA Phylogenomics and a Comparison with Multilocus Sanger Data in a Species-Rich Moth Genus

Lee, Kyung Min; Kivelä, Sami M.; Ivanov, Vladislav; Hausmann, Axel; Kaila, Lauri; Wahlberg, Niklas and Mutanen, Marko (2018) In Systematic Biology 67(6), pp. 925-939
Abstract

A rapid shift from traditional Sanger sequencing-based molecular methods to phylogenomic approaches using large numbers of loci is underway. Among phylogenomic methods, restriction site associated DNA (RAD) sequencing approaches have gained much attention because they rapidly generate up to thousands of loci scattered randomly across the genome and are suitable for nonmodel species. RAD data sets, however, suffer from large amounts of missing data and from rapid locus dropout as relatedness among taxa decreases. The relationship between locus dropout and the amount of phylogenetic information retained in the data has remained largely uninvestigated. Similarly, phylogenetic hypotheses based on RAD have rarely been compared with those based on multilocus Sanger sequencing, and even more rarely using exactly the same species and specimens. We compared the Sanger-based phylogenetic hypothesis (8 loci; 6172 bp) for 32 species of the diverse moth genus Eupithecia (Lepidoptera, Geometridae) with one based on double-digest RAD sequencing (3256 loci; 726,658 bp). The topologies were largely congruent, with some notable exceptions that we discuss. The locus dropout effect was strong. We demonstrate that the number of loci is not a precise measure of phylogenetic information, since the number of single-nucleotide polymorphisms (SNPs) may remain low at very shallow phylogenetic levels despite large numbers of loci. As hypothesized, the numbers of SNPs and parsimony-informative SNPs (PIS) are low at shallow phylogenetic levels, peak at intermediate levels, and decline again at the deepest levels as a result of the decay of available loci. Moreover, we show with empirical data that locus dropout affects the type of loci retained: loci found in many species tend to show lower interspecific distances than those shared among fewer species.
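The abstract separates three quantities: loci, SNPs and PIS. As a minimal sketch, assuming each RAD locus is held in memory as a dictionary mapping taxon name to aligned sequence (a hypothetical layout, not the output format of any particular RAD pipeline and not the authors' own code), these could be tallied as below. The site definitions are the standard ones: a site is variable (a SNP) if at least two nucleotide states occur, and parsimony informative if at least two states each occur in at least two taxa. `shared_loci` illustrates locus dropout as the count of loci for which a set of taxa all have data, and `mean_distance_by_occupancy` groups loci by how many taxa share them. The uncorrected p-distance used there is an assumed stand-in, since the abstract does not name the distance measure. All function names are illustrative.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical layout: one RAD locus = {taxon name: aligned sequence}.
# A taxon absent from the dict has dropped out for that locus.
Locus = dict[str, str]
BASES = "ACGT"


def site_stats(locus: Locus) -> tuple[int, int]:
    """Return (variable sites, parsimony-informative sites) for one locus.

    Only unambiguous bases A/C/G/T are counted; gaps and ambiguity codes
    are ignored. A site is variable if at least two states occur, and
    parsimony informative if at least two states each occur in >= 2 taxa.
    """
    snps = pis = 0
    for column in zip(*locus.values()):
        counts = Counter(b.upper() for b in column if b.upper() in BASES)
        if len(counts) >= 2:
            snps += 1
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                pis += 1
    return snps, pis


def shared_loci(loci: list[Locus], taxa: set[str]) -> int:
    """Count loci for which every taxon in `taxa` has data."""
    return sum(1 for locus in loci if taxa.issubset(locus))


def p_distance(a: str, b: str) -> float | None:
    """Uncorrected distance over sites where both sequences have A/C/G/T."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in BASES and y in BASES]
    if not pairs:
        return None
    return sum(x != y for x, y in pairs) / len(pairs)


def mean_distance_by_occupancy(loci: list[Locus]) -> dict[int, float]:
    """Mean pairwise p-distance of loci, grouped by number of taxa sharing each locus."""
    groups: dict[int, list[float]] = defaultdict(list)
    for locus in loci:
        dists = [d for a, b in combinations(locus.values(), 2)
                 if (d := p_distance(a, b)) is not None]
        if dists:
            groups[len(locus)].append(sum(dists) / len(dists))
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}
```

Summing `site_stats` over loci gives totals of SNPs and PIS that can be compared with the raw locus count. Evaluating `shared_loci` for progressively larger or more distantly related sets of taxa shows how many loci survive at each depth.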
We also examine the effects of the numbers of loci, SNPs, and PIS on nodal bootstrap support, but our data did not demonstrate the positive correlation we expected between them. We conclude that RAD methods provide a powerful tool for phylogenomics at intermediate phylogenetic levels, as indicated by their broad congruence with an eight-gene Sanger data set in a genus of moths. When assessing the quality of data for phylogenetic inference, the focus should be on the number and distribution of SNPs and PIS rather than on the number of loci.
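The expected relationship between nodal bootstrap support and the numbers of loci, SNPs and PIS is a correlation question. The abstract does not state which statistic was used, so the following is only an illustrative option, assumed here rather than taken from the paper: Spearman's rank correlation between one per-node count (for example PIS) and the bootstrap value of the same node. Ties receive average ranks, which matters because bootstrap values often cluster at 100. The function names are hypothetical.

```python
from math import sqrt


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined when all values are tied")
    return sxy / sqrt(sxx * syy)
```

A call such as `spearman(pis_per_node, bootstrap_per_node)`, where both hypothetical lists hold one value per node in the same order, returns a coefficient between -1 and 1. Assessing significance would additionally require a permutation test or a library routine such as `scipy.stats.spearmanr`.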

type: Contribution to journal
publication status: published
in: Systematic Biology
volume: 67
issue: 6
pages: 15 pages
publisher: Oxford University Press
external identifiers:
  • pmid:29669013
  • scopus:85055074118
ISSN: 1063-5157
DOI: 10.1093/sysbio/syy029
language: English
LU publication?: yes
id: 0d3db0da-a5c3-4314-832d-3d8f1da6a37a
date added to LUP: 2018-11-14 14:33:59
date last changed: 2024-02-14 12:23:30

@article{0d3db0da-a5c3-4314-832d-3d8f1da6a37a,
  author       = {{Lee, Kyung Min and Kivelä, Sami M. and Ivanov, Vladislav and Hausmann, Axel and Kaila, Lauri and Wahlberg, Niklas and Mutanen, Marko}},
  issn         = {{1063-5157}},
  language     = {{eng}},
  number       = {{6}},
  pages        = {{925--939}},
  publisher    = {{Oxford University Press}},
  journal      = {{Systematic Biology}},
  title        = {{Information Dropout Patterns in Restriction Site Associated DNA Phylogenomics and a Comparison with Multilocus Sanger Data in a Species-Rich Moth Genus}},
  url          = {{http://dx.doi.org/10.1093/sysbio/syy029}},
  doi          = {{10.1093/sysbio/syy029}},
  volume       = {{67}},
  year         = {{2018}},
}