Lund University Publications

Looking beyond the reference genomes : Integrating sparse and dense datasets for robust phylogenetic inference of species-rich groups of moths

Yapar, Etka LU (2025)
Abstract
Phylogenies are central to many disciplines in biology. While disciplines such as systematics focus on phylogenetic hypotheses themselves, others depend on robust, well-supported phylogenies to put biological data into evolutionary context through phylogenetic comparative studies. Examples include macroevolutionary research (e.g. studying the evolution of a focal character across a phylogeny); comparative genomics and transcriptomics, in which the studied phenotypes are the genomic features themselves (e.g. gene copy number variation, gene expression differences); and other disciplines that may be interested in, e.g., allometry but need to account for the non-independence of data resulting from shared ancestry. We are living in the genomic era of phylogenetic inference, though this is more true for some groups of organisms than for others. Lepidoptera (butterflies and moths), with approximately 160,000 described species, comprise about 11% of all animal species on Earth and are one of the groups with a very large number of reference genomes. More than 1,200 species of Lepidoptera now have such high-quality genomes. This makes them the best-sampled order of animals in terms of reference genomes and presents an unprecedented opportunity for phylogenomics of Lepidoptera. When the aim is to infer the best possible phylogeny, taxon sampling is of utmost importance, so these reference genomes need to be complemented with other types of data. Fortunately, a wealth of additional data is available: contig-level genome assemblies, raw transcriptomic data, and, perhaps richest of all, the tens of thousands of species that have been sequenced with low-throughput methods in previous phylogenetic studies.
Although promising, this data integration approach brings challenges such as fragmented or wrongly identified gene sequences and errors in orthology assessment. In the first four chapters of this thesis, I formulate the steps necessary to efficiently integrate these additional types of already available data with reference genomes, and apply this methodology to infer robust phylogenies for four species-rich groups of Lepidoptera (one superfamily and three families): Gelechioidea, Geometridae, Noctuidae, and Erebidae. I also develop a reproducible phylogenetic data preprocessing pipeline that makes this approach available for use. Then, in the final chapter, I demonstrate the power of such robust phylogenies in evolutionary biology by studying the macroevolution of diapause (i.e. the ability to suppress development or reproduction), a key adaptive trait against harsh environmental conditions, across butterflies (superfamily Papilionoidea), leveraging a robust time-calibrated phylogeny inferred earlier for the group.
author
supervisor
opponent
  • Docent Irestedt, Martin, Department of Bioinformatics and Genetics, Naturhistoriska riksmuseet; Department of Zoology, Stockholms Universitet
organization
publishing date
type
Thesis
publication status
published
subject
keywords
Phylogenetics, Geometridae, Noctuidae, Gelechioidea, Erebidae, Papilionoidea, Butterflies, Evolution
pages
70 pages
publisher
Media-Tryck, Lund University, Sweden
defense location
Blåhallen, Ekologihuset, Biologiska Institutionen
defense date
2026-02-06 09:00:00
ISBN
978-91-8104-796-7
978-91-8104-797-4
project
Developing core–technologies for tree based models
language
English
LU publication?
yes
id
a2630b74-6aaa-4e69-9409-e5dc010feef7
date added to LUP
2026-01-08 11:11:18
date last changed
2026-01-12 10:49:53
@phdthesis{a2630b74-6aaa-4e69-9409-e5dc010feef7,
  abstract     = {{Phylogenies are central to many disciplines in biology. While disciplines such as systematics are focused on phylogenetic hypotheses themselves, other disciplines depend on robust and well-supported phylogenies in order to put biological data into evolutionary context through phylogenetic comparative studies. Examples of such fields are macroevolutionary research e.g. studying the evolution of a focal character across a phylogeny; comparative genomics and transcriptomics, in which the studied phenotype is the genomic features themselves (e.g. gene copy number variation, gene expression differences); as well as other disciplines that may be interested in e.g. allometry but need to take into account the non-independent nature of the data resulting from shared ancestry. We are certainly living in the genomic era for phylogenetic inference, and this is more true for some groups of organisms than for others. Lepidoptera (butterflies and moths), with the approximately 160,000 described species, comprise about 11\% of all the animal species on earth and are one of the groups with a very large number of reference genomes. There are now more than 1,200 species of Lepidoptera with such high-quality genomes. This makes them the best sampled order of animals with reference genomes and thus it presents an unprecedented opportunity for phylogenomics of Lepidoptera. When the aim is to infer the best possible phylogeny, taxon sampling is of utmost importance, and thus these reference genomes need to be complemented with other types of data to achieve that. Luckily, there are a lot of additional data available in the form of a rich collection of contig-level genome assemblies, transcriptomic raw data, and maybe the richest of them all, the tens of thousands of species that have been sequenced by low throughput methods in previous phylogenetic studies. 
Although promising, this data integration approach brings with it some challenges such as fragmented or wrongly identified gene sequences and errors in orthology assessment. Throughout the first four chapters of this thesis, I formulate the necessary steps to efficiently integrate these additional types of already available data with reference genomes, and apply this methodology to infer robust phylogenies for four species-rich groups (three families and a superfamily) of Lepidoptera: Gelechioidea, Geometridae, Noctuidae, and Erebidae; and develop a reproducible phylogenetic data preprocessing pipeline that makes this approach available for use. Then, in the final chapter, I demonstrate the power of such robust phylogenies in evolutionary biology by studying the macroevolution of a key adaptive trait against harsh environmental conditions, the ability to diapause (i.e. to suppress development or reproduction), across butterflies (superfamily Papilionoidea) by leveraging a robust time-calibrated phylogeny inferred earlier for the group.}},
  author       = {{Yapar, Etka}},
  isbn         = {{978-91-8104-796-7}},
  keywords     = {{Phylogenetics; Geometridae; Noctuidae; Gelechioidea; Erebidae; Papilionoidea; Butterflies; Evolution}},
  language     = {{eng}},
  publisher    = {{Media-Tryck, Lund University, Sweden}},
  school       = {{Lund University}},
  title        = {{Looking beyond the reference genomes : Integrating sparse and dense datasets for robust phylogenetic inference of species-rich groups of moths}},
  url          = {{https://lup.lub.lu.se/search/files/238375556/Avhandling_Etka_Yapar_LUCRIS.pdf}},
  year         = {{2025}},
}