Lund University Publications

LUND UNIVERSITY LIBRARIES

HAPP: High-accuracy pipeline for processing deep metabarcoding data

Sundh, John; Granqvist, Emma; Iwaszkiewicz-Eggebrecht, Ela; Manoharan, Lokeshwaran; van Dijk, Laura J A; Goodsell, Robert; Godeiro, Nerivania N; Bellini, Bruno C; Orsholm, Johanna and Łukasik, Piotr, et al. (2025) In PLoS Computational Biology 21(11).
Abstract

Deep metabarcoding offers an efficient and reproducible approach to biodiversity monitoring, but noisy data and incomplete reference databases challenge accurate diversity estimation and taxonomic annotation. Here, we introduce a novel algorithm, NEEAT, for removing spurious operational taxonomic units (OTUs) originating from nuclear-embedded mitochondrial DNA sequences (NUMTs) or sequencing errors. It integrates 'echo' signals across samples with the identification of unusual evolutionary patterns among similar DNA sequences. We also extensively benchmark current tools for chimera removal, taxonomic annotation and OTU clustering of deep metabarcoding data. The best-performing tools and parameter settings are integrated into HAPP, a high-accuracy pipeline for processing deep metabarcoding data. Tests using CO1 data from BOLD and large-scale metabarcoding data on insects demonstrate that HAPP significantly outperforms existing methods, while enabling efficient analysis of extensive datasets by parallelizing computations across taxonomic groups.
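To make the idea of an 'echo' signal more concrete, the sketch below shows one generic way to flag an OTU that behaves like a low-abundance shadow of a similar "parent" OTU: it appears almost only in samples where the parent is present, and always at a small fraction of the parent's read count. This is an illustration under stated assumptions, not NEEAT's actual criteria. The function name `echo_candidates`, the thresholds `max_ratio` and `min_cooccurrence`, and the input layout are all hypothetical, and the sketch assumes the list of sequence-similar (candidate, parent) pairs has already been computed elsewhere. The second NEEAT component, detecting unusual evolutionary patterns among similar sequences, is not modelled here.

```python
from typing import Dict, List, Set, Tuple


def echo_candidates(
    counts: Dict[str, List[int]],
    similar_pairs: List[Tuple[str, str]],
    max_ratio: float = 0.1,  # hypothetical: max child/parent read ratio
    min_cooccurrence: float = 0.95,  # hypothetical: share of child samples with parent present
) -> Set[str]:
    """Flag OTUs that look like low-abundance 'echoes' of a similar parent OTU.

    Hypothetical illustration only, not the NEEAT algorithm.
    counts: OTU id -> read counts per sample (same sample order for every OTU).
    similar_pairs: (candidate, parent) pairs already judged similar in sequence.
    """
    flagged: Set[str] = set()
    for child, parent in similar_pairs:
        c, p = counts[child], counts[parent]
        # Samples in which the candidate OTU was observed at all.
        present = [i for i, n in enumerate(c) if n > 0]
        if not present:
            continue
        # Fraction of the candidate's samples that also contain the parent.
        shared = sum(1 for i in present if p[i] > 0) / len(present)
        # Candidate/parent abundance ratios in the samples they share.
        ratios = [c[i] / p[i] for i in present if p[i] > 0]
        if shared >= min_cooccurrence and ratios and max(ratios) <= max_ratio:
            flagged.add(child)
    return flagged


counts = {
    "otu1": [1000, 500, 0, 800],
    "otu2": [20, 10, 0, 5],  # always co-occurs with otu1 at about 2% of its reads
    "otu3": [0, 0, 300, 0],  # occurs only where otu1 is absent
}
print(echo_candidates(counts, [("otu2", "otu1"), ("otu3", "otu1")]))  # {'otu2'}
```

Requiring both co-occurrence and a consistently low abundance ratio keeps a genuinely distinct but rare OTU (such as `otu3` above, which occurs independently of its similar neighbour) from being discarded.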

author
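Sundh, John; Granqvist, Emma; Iwaszkiewicz-Eggebrecht, Ela; Manoharan, Lokeshwaran; van Dijk, Laura J A; Goodsell, Robert; Godeiro, Nerivania N; Bellini, Bruno C; Orsholm, Johanna; Łukasik, Piotr; Miraldo, Andreia; Roslin, Tomas; Tack, Ayco J M; Andersson, Anders F and Ronquist, Fredrik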
publishing date
2025-11
type
Contribution to journal
publication status
epub
in
PLoS Computational Biology
volume
21
issue
11
article number
e1013558
pages
23 pages
publisher
Public Library of Science (PLoS)
external identifiers
  • pmid:41202092
ISSN
1553-7358
DOI
10.1371/journal.pcbi.1013558
language
English
LU publication?
yes
additional info
Copyright: © 2025 Sundh et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
id
f2065479-5023-4357-b71c-53bc4ca3513d
date added to LUP
2025-11-11 07:42:42
date last changed
2025-11-11 09:33:29
@article{f2065479-5023-4357-b71c-53bc4ca3513d,
  abstract     = {{<p>Deep metabarcoding offers an efficient and reproducible approach to biodiversity monitoring, but noisy data and incomplete reference databases challenge accurate diversity estimation and taxonomic annotation. Here, we introduce a novel algorithm, NEEAT, for removing spurious operational taxonomic units (OTUs) originating from nuclear-embedded mitochondrial DNA sequences (NUMTs) or sequencing errors. It integrates 'echo' signals across samples with the identification of unusual evolutionary patterns among similar DNA sequences. We also extensively benchmark current tools for chimera removal, taxonomic annotation and OTU clustering of deep metabarcoding data. The best performing tools/parameter settings are integrated into HAPP, a high-accuracy pipeline for processing deep metabarcoding data. Tests using CO1 data from BOLD and large-scale metabarcoding data on insects demonstrate that HAPP significantly outperforms existing methods, while enabling efficient analysis of extensive datasets by parallelizing computations across taxonomic groups.</p>}},
  author       = {{Sundh, John and Granqvist, Emma and Iwaszkiewicz-Eggebrecht, Ela and Manoharan, Lokeshwaran and van Dijk, Laura J A and Goodsell, Robert and Godeiro, Nerivania N and Bellini, Bruno C and Orsholm, Johanna and Łukasik, Piotr and Miraldo, Andreia and Roslin, Tomas and Tack, Ayco J M and Andersson, Anders F and Ronquist, Fredrik}},
  issn         = {{1553-7358}},
  language     = {{eng}},
  month        = {{11}},
  number       = {{11}},
  publisher    = {{Public Library of Science (PLoS)}},
  series       = {{PLoS Computational Biology}},
  title        = {{HAPP: High-accuracy pipeline for processing deep metabarcoding data}},
  url          = {{http://dx.doi.org/10.1371/journal.pcbi.1013558}},
  doi          = {{10.1371/journal.pcbi.1013558}},
  volume       = {{21}},
  year         = {{2025}},
}