Improving taxonomic inference from ancient environmental metagenomes by masking microbial-like regions in reference genomes
(2025) In GigaScience 14.
- Abstract
Ancient environmental DNA is increasingly vital for reconstructing past ecosystems, particularly when paleontological and archaeological tissue remains are absent. Detecting ancient plant and animal DNA in environmental samples relies on using extensive eukaryotic reference genome databases for profiling metagenomics data. However, many eukaryotic genomes contain regions with high sequence similarity to microbial DNA, which can lead to the misclassification of bacterial and archaeal reads as eukaryotic. This issue is especially problematic in ancient eDNA datasets, where plant and animal DNA is typically present at very low abundance. In this study, we present a method for identifying bacterial- and archaeal-like sequences in eukaryotic genomes and apply it to nearly 3,000 reference genomes from NCBI RefSeq and GenBank (vertebrates, invertebrates, plants) as well as the 1,323 PhyloNorway plant genome assemblies from herbarium material from northern high-latitude regions. We find that microbial-like regions are widespread across eukaryotic genomes and provide a comprehensive resource of their genomic coordinates and taxonomic annotations. This resource enables the masking of microbial-like regions during profiling analyses, thereby improving the reliability of ancient environmental metagenomic datasets for downstream analyses.
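The masking step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the microbial-like regions are available as 0-based, half-open (start, end) intervals per sequence, in the style of BED files; the abstract does not state the resource's actual file format. The function name `mask_regions` and the example data are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Tuple


def mask_regions(sequence: str,
                 regions: Iterable[Tuple[int, int]],
                 mask_char: str = "N") -> str:
    """Replace every base inside the given intervals with mask_char.

    Intervals are assumed to be 0-based and half-open (BED-style);
    this convention is an assumption, not taken from the paper.
    """
    masked = list(sequence)
    for start, end in regions:
        # Clamp to the sequence bounds so out-of-range coordinates are ignored.
        start = max(0, start)
        end = min(len(masked), end)
        for i in range(start, end):
            masked[i] = mask_char
    return "".join(masked)


# Hypothetical contig and hypothetical microbial-like intervals.
contig = "ACGTACGTACGT"
microbial_like = [(2, 5), (8, 10)]
print(mask_regions(contig, microbial_like))  # prints ACNNNCGTNNGT
```

Hard-masking with N means reads can no longer align to those positions, which is the effect the abstract describes for profiling analyses. For whole genomes, an established tool that performs the same operation on FASTA files would be more practical than this per-string sketch.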
- author
- Oskolkov, Nikolay LU ; Jin, Chenyu ; López Clinton, Samantha ; Guinet, Benjamin ; Wijnands, Flore ; Johnson, Ernst ; Kutschera, Verena E. ; Kinsella, Cormac M. ; Heintzman, Peter D. and Van der Valk, Tom
- publishing date
- 2025
- type
- Contribution to journal
- publication status
- published
- keywords
- ancient metagenomics, environmental DNA, microbial-like regions
- in
- GigaScience
- volume
- 14
- article number
- giaf108
- publisher
- Oxford University Press
- external identifiers
- scopus:105017743900
- pmid:41041810
- ISSN
- 2047-217X
- DOI
- 10.1093/gigascience/giaf108
- language
- English
- LU publication?
- yes
- id
- 11b0544d-85d2-43e2-942f-faac13d26d61
- date added to LUP
- 2025-12-05 14:52:16
- date last changed
- 2025-12-06 03:00:09
@article{11b0544d-85d2-43e2-942f-faac13d26d61,
abstract = {{Ancient environmental DNA is increasingly vital for reconstructing past ecosystems, particularly when paleontological and archaeological tissue remains are absent. Detecting ancient plant and animal DNA in environmental samples relies on using extensive eukaryotic reference genome databases for profiling metagenomics data. However, many eukaryotic genomes contain regions with high sequence similarity to microbial DNA, which can lead to the misclassification of bacterial and archaeal reads as eukaryotic. This issue is especially problematic in ancient eDNA datasets, where plant and animal DNA is typically present at very low abundance. In this study, we present a method for identifying bacterial- and archaeal-like sequences in eukaryotic genomes and apply it to nearly 3,000 reference genomes from NCBI RefSeq and GenBank (vertebrates, invertebrates, plants) as well as the 1,323 PhyloNorway plant genome assemblies from herbarium material from northern high-latitude regions. We find that microbial-like regions are widespread across eukaryotic genomes and provide a comprehensive resource of their genomic coordinates and taxonomic annotations. This resource enables the masking of microbial-like regions during profiling analyses, thereby improving the reliability of ancient environmental metagenomic datasets for downstream analyses.}},
author = {{Oskolkov, Nikolay and Jin, Chenyu and López Clinton, Samantha and Guinet, Benjamin and Wijnands, Flore and Johnson, Ernst and Kutschera, Verena E. and Kinsella, Cormac M. and Heintzman, Peter D. and Van der Valk, Tom}},
issn = {{2047-217X}},
keywords = {{ancient metagenomics; environmental DNA; microbial-like regions}},
language = {{eng}},
publisher = {{Oxford University Press}},
series = {{GigaScience}},
title = {{Improving taxonomic inference from ancient environmental metagenomes by masking microbial-like regions in reference genomes}},
url = {{http://dx.doi.org/10.1093/gigascience/giaf108}},
doi = {{10.1093/gigascience/giaf108}},
volume = {{14}},
year = {{2025}},
}