Lund University Publications

Ultrasensitive sequencing of STR markers utilizing unique molecular identifiers and the SiMSen-Seq method

Sidstedt, Maja; Gynnå, Arvid H.; Kiesler, Kevin M.; Jansson, Linda; Steffen, Carolyn R.; Håkansson, Joakim; Johansson, Gustav; Österlund, Tobias; Bogestål, Yalda; Tillmar, Andreas, et al. (2024). In Forensic Science International: Genetics 71.
Abstract

Massively parallel sequencing (MPS) is increasingly applied in forensic short tandem repeat (STR) analysis. The presence of stutter artefacts and other PCR or sequencing errors in the MPS-STR data partly limits the detection of low DNA amounts, e.g., in complex mixtures. Unique molecular identifiers (UMIs) have been applied in several scientific fields to reduce noise in sequencing. UMIs consist of a stretch of random nucleotides, a unique barcode for each starting DNA molecule, that is incorporated in the DNA template using either ligation or PCR. The barcode is used to generate consensus reads, thus removing errors. The SiMSen-Seq (Simple, multiplexed, PCR-based barcoding of DNA for sensitive mutation detection using sequencing) method relies on PCR-based introduction of UMIs and includes a sophisticated hairpin design to reduce unspecific primer binding as well as PCR protocol adjustments to further optimize the reaction. In this study, SiMSen-Seq is applied to develop a proof-of-concept seven STR multiplex for MPS library preparation and an associated bioinformatics pipeline. Additionally, machine learning (ML) models were evaluated to further improve UMI allele calling. Overall, the seven STR multiplex resulted in complete detection and concordant alleles for 47 single-source samples at 1 ng input DNA as well as for low-template samples at 62.5 pg input DNA. For twelve challenging mixtures with minor contributions of 10 pg to 150 pg and ratios of 1–15% relative to the major donor, 99.2% of the expected alleles were detected by applying the UMIs in combination with an ML filter. The main impact of UMIs was a substantially lowered number of artefacts as well as reduced stutter ratios, which were generally below 5% of the parental allele. In conclusion, UMI-based STR sequencing opens new means for improved analysis of challenging crime scene samples including complex mixtures.

author
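Sidstedt, Maja; Gynnå, Arvid H.; Kiesler, Kevin M.; Jansson, Linda; Steffen, Carolyn R.; Håkansson, Joakim; Johansson, Gustav; Österlund, Tobias; Bogestål, Yalda; Tillmar, Andreas; Rådström, Peter; Ståhlberg, Anders; Vallone, Peter M. and Hedman, Johannes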
type
Contribution to journal
publication status
published
keywords
Forensic DNA, Machine learning, Massively parallel sequencing, Short tandem repeats, Targeted PCR, UMI
in
Forensic Science International: Genetics
volume
71
article number
103047
publisher
Elsevier
external identifiers
  • pmid:38598919
  • scopus:85189897942
ISSN
1872-4973
DOI
10.1016/j.fsigen.2024.103047
language
English
LU publication?
yes
id
304632e3-0336-4516-a93d-670e9c6e010e
date added to LUP
2024-04-19 10:31:36
date last changed
2024-06-14 15:43:54
@article{304632e3-0336-4516-a93d-670e9c6e010e,
  abstract     = {{Massively parallel sequencing (MPS) is increasingly applied in forensic short tandem repeat (STR) analysis. The presence of stutter artefacts and other PCR or sequencing errors in the MPS-STR data partly limits the detection of low DNA amounts, e.g., in complex mixtures. Unique molecular identifiers (UMIs) have been applied in several scientific fields to reduce noise in sequencing. UMIs consist of a stretch of random nucleotides, a unique barcode for each starting DNA molecule, that is incorporated in the DNA template using either ligation or PCR. The barcode is used to generate consensus reads, thus removing errors. The SiMSen-Seq (Simple, multiplexed, PCR-based barcoding of DNA for sensitive mutation detection using sequencing) method relies on PCR-based introduction of UMIs and includes a sophisticated hairpin design to reduce unspecific primer binding as well as PCR protocol adjustments to further optimize the reaction. In this study, SiMSen-Seq is applied to develop a proof-of-concept seven STR multiplex for MPS library preparation and an associated bioinformatics pipeline. Additionally, machine learning (ML) models were evaluated to further improve UMI allele calling. Overall, the seven STR multiplex resulted in complete detection and concordant alleles for 47 single-source samples at 1 ng input DNA as well as for low-template samples at 62.5 pg input DNA. For twelve challenging mixtures with minor contributions of 10 pg to 150 pg and ratios of 1–15% relative to the major donor, 99.2% of the expected alleles were detected by applying the UMIs in combination with an ML filter. The main impact of UMIs was a substantially lowered number of artefacts as well as reduced stutter ratios, which were generally below 5% of the parental allele. In conclusion, UMI-based STR sequencing opens new means for improved analysis of challenging crime scene samples including complex mixtures.}},
  author       = {{Sidstedt, Maja and Gynnå, Arvid H. and Kiesler, Kevin M. and Jansson, Linda and Steffen, Carolyn R. and Håkansson, Joakim and Johansson, Gustav and Österlund, Tobias and Bogestål, Yalda and Tillmar, Andreas and Rådström, Peter and Ståhlberg, Anders and Vallone, Peter M. and Hedman, Johannes}},
  issn         = {{1872-4973}},
  keywords     = {{Forensic DNA; Machine learning; Massively parallel sequencing; Short tandem repeats; Targeted PCR; UMI}},
  language     = {{eng}},
  publisher    = {{Elsevier}},
  journal      = {{Forensic Science International: Genetics}},
  title        = {{Ultrasensitive sequencing of STR markers utilizing unique molecular identifiers and the SiMSen-Seq method}},
  url          = {{http://dx.doi.org/10.1016/j.fsigen.2024.103047}},
  doi          = {{10.1016/j.fsigen.2024.103047}},
  volume       = {{71}},
  year         = {{2024}},
}