Lund University Publications

Improving taxonomic inference from ancient environmental metagenomes by masking microbial-like regions in reference genomes

Oskolkov, Nikolay LU ; Jin, Chenyu ; López Clinton, Samantha ; Guinet, Benjamin ; Wijnands, Flore ; Johnson, Ernst ; Kutschera, Verena E. ; Kinsella, Cormac M. ; Heintzman, Peter D. and Van der Valk, Tom (2025) In GigaScience 14.
Abstract

Ancient environmental DNA is increasingly vital for reconstructing past ecosystems, particularly when paleontological and archaeological tissue remains are absent. Detecting ancient plant and animal DNA in environmental samples relies on using extensive eukaryotic reference genome databases for profiling metagenomics data. However, many eukaryotic genomes contain regions with high sequence similarity to microbial DNA, which can lead to the misclassification of bacterial and archaeal reads as eukaryotic. This issue is especially problematic in ancient eDNA datasets, where plant and animal DNA is typically present at very low abundance. In this study, we present a method for identifying bacterial- and archaeal-like sequences in eukaryotic genomes and apply it to nearly 3,000 reference genomes from NCBI RefSeq and GenBank (vertebrates, invertebrates, plants) as well as the 1,323 PhyloNorway plant genome assemblies from herbarium material from northern high-latitude regions. We find that microbial-like regions are widespread across eukaryotic genomes and provide a comprehensive resource of their genomic coordinates and taxonomic annotations. This resource enables the masking of microbial-like regions during profiling analyses, thereby improving the reliability of ancient environmental metagenomic datasets for downstream analyses.
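The masking step mentioned at the end of the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes the microbial-like regions are available as 0-based, half-open (BED-style) intervals per contig, and the names `mask_regions` and `mask_genome` are hypothetical helpers introduced only for this example. The toy sequences and coordinates are made up.

```python
from typing import Dict, Iterable, List, Tuple

# Assumption: intervals are 0-based and half-open [start, end), as in BED files.
Interval = Tuple[int, int]


def mask_regions(seq: str, intervals: Iterable[Interval], mask_char: str = "N") -> str:
    """Return seq with every position covered by an interval replaced by mask_char."""
    chars = list(seq)
    for start, end in intervals:
        # Clamp to the sequence bounds so out-of-range coordinates are harmless.
        start = max(0, start)
        end = min(len(chars), end)
        for i in range(start, end):
            chars[i] = mask_char
    return "".join(chars)


def mask_genome(genome: Dict[str, str], regions: Dict[str, List[Interval]]) -> Dict[str, str]:
    """Mask each contig with its listed intervals; contigs without intervals are unchanged."""
    return {name: mask_regions(seq, regions.get(name, [])) for name, seq in genome.items()}


# Toy data (hypothetical): two contigs, microbial-like intervals only on chr1.
genome = {"chr1": "ACGTACGTAC", "chr2": "GGGGCCCC"}
regions = {"chr1": [(2, 5), (8, 12)]}

masked = mask_genome(genome, regions)
print(masked["chr1"])  # ACNNNCGTNN
print(masked["chr2"])  # GGGGCCCC
```

Replacing bases with `N` (hard masking) keeps all coordinates of the reference unchanged, so reads that would have aligned to a microbial-like region simply find no match there.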

author
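Oskolkov, Nikolay ; Jin, Chenyu ; López Clinton, Samantha ; Guinet, Benjamin ; Wijnands, Flore ; Johnson, Ernst ; Kutschera, Verena E. ; Kinsella, Cormac M. ; Heintzman, Peter D. and Van der Valk, Tom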
organization
publishing date
2025
type
Contribution to journal
publication status
published
subject
keywords
ancient metagenomics, environmental DNA, microbial-like regions
in
GigaScience
volume
14
article number
giaf108
publisher
Oxford University Press
external identifiers
  • scopus:105017743900
  • pmid:41041810
ISSN
2047-217X
DOI
10.1093/gigascience/giaf108
language
English
LU publication?
yes
id
11b0544d-85d2-43e2-942f-faac13d26d61
date added to LUP
2025-12-05 14:52:16
date last changed
2025-12-06 03:00:09
@article{11b0544d-85d2-43e2-942f-faac13d26d61,
  abstract     = {{Ancient environmental DNA is increasingly vital for reconstructing past ecosystems, particularly when paleontological and archaeological tissue remains are absent. Detecting ancient plant and animal DNA in environmental samples relies on using extensive eukaryotic reference genome databases for profiling metagenomics data. However, many eukaryotic genomes contain regions with high sequence similarity to microbial DNA, which can lead to the misclassification of bacterial and archaeal reads as eukaryotic. This issue is especially problematic in ancient eDNA datasets, where plant and animal DNA is typically present at very low abundance. In this study, we present a method for identifying bacterial- and archaeal-like sequences in eukaryotic genomes and apply it to nearly 3,000 reference genomes from NCBI RefSeq and GenBank (vertebrates, invertebrates, plants) as well as the 1,323 PhyloNorway plant genome assemblies from herbarium material from northern high-latitude regions. We find that microbial-like regions are widespread across eukaryotic genomes and provide a comprehensive resource of their genomic coordinates and taxonomic annotations. This resource enables the masking of microbial-like regions during profiling analyses, thereby improving the reliability of ancient environmental metagenomic datasets for downstream analyses.}},
  author       = {{Oskolkov, Nikolay and Jin, Chenyu and López Clinton, Samantha and Guinet, Benjamin and Wijnands, Flore and Johnson, Ernst and Kutschera, Verena E. and Kinsella, Cormac M. and Heintzman, Peter D. and Van der Valk, Tom}},
  issn         = {{2047-217X}},
  keywords     = {{ancient metagenomics; environmental DNA; microbial-like regions}},
  language     = {{eng}},
  publisher    = {{Oxford University Press}},
  journal      = {{GigaScience}},
  title        = {{Improving taxonomic inference from ancient environmental metagenomes by masking microbial-like regions in reference genomes}},
  url          = {{http://dx.doi.org/10.1093/gigascience/giaf108}},
  doi          = {{10.1093/gigascience/giaf108}},
  volume       = {{14}},
  year         = {{2025}},
}