Multienzyme deep learning models improve peptide de novo sequencing by mass spectrometry proteomics

Gueto-Tettay, Carlos; Tang, Di; Happonen, Lotta; Heusel, Moritz; Khakzad, Hamed; Malmström, Johan; Malmström, Lars

Gueto-Tettay, Carlos ^LU ; Tang, Di ^LU ; Happonen, Lotta ^LU ; Heusel, Moritz ^LU ; Khakzad, Hamed ; Malmström, Johan ^LU and Malmström, Lars ^LU (2023) In PLoS Computational Biology 19(1).

Abstract: Generating and analyzing overlapping peptides through multienzymatic digestion is an efficient procedure for de novo protein sequencing using bottom-up mass spectrometry (MS). Despite improved instrumentation and software, de novo MS data analysis remains challenging. In recent years, deep learning models have represented a performance breakthrough. Incorporating that technology into de novo protein sequencing workflows requires machine-learning models capable of handling highly diverse MS data. In this study, we analyzed the requirements for assembling such generalizable deep learning models by systematically varying the composition and size of the training set. We assessed the generated models' performances using two test sets composed of peptides originating from the multienzyme digestion of samples from various species. The peptide recall values on the test sets showed that the deep learning models generated from a collection of peptides with highly diverse N- and C-termini generalized 76% better than the termini-restricted ones. Moreover, expanding the training set's size by adding peptides from the multienzymatic digestion, with five proteases, of samples from several species led to a 2-3 fold generalizability gain.
Furthermore, we tested the applicability of these multienzyme deep learning (MEM) models by fully de novo sequencing the heavy and light monomeric chains of five commercial monoclonal antibodies (mAbs). MEMs extracted over 10,000 matching and overlapping peptides across mAb samples digested with six different proteases, achieving 100% sequence coverage for eight of the ten polypeptide chains. We anticipate that the MEMs' proven improvements to de novo analysis will positively impact several applications, such as the analysis of samples of high complexity or unknown nature, and the peptidomics field.
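The two metrics named in the abstract, peptide recall and sequence coverage, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that peptide recall is the fraction of spectra whose predicted peptide matches the reference sequence exactly, and that sequence coverage is the fraction of residues in a chain covered by at least one matching peptide. The function names and the toy sequence are hypothetical and do not come from the publication.

```python
def sequence_coverage(protein: str, peptides: list[str]) -> float:
    """Fraction of residues in `protein` covered by at least one peptide.

    Assumption: a peptide counts only if it occurs as an exact substring.
    """
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for pep in peptides:
        if not pep:
            continue  # ignore empty predictions
        start = protein.find(pep)
        while start != -1:
            # Mark every residue spanned by this occurrence.
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return sum(covered) / len(protein)


def peptide_recall(predicted: list[str], truth: list[str]) -> float:
    """Fraction of spectra whose predicted peptide equals the reference.

    Assumption: exact string match, one prediction per spectrum.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    if not truth:
        return 0.0
    hits = sum(p == t for p, t in zip(predicted, truth))
    return hits / len(truth)


# Toy data (hypothetical): three overlapping peptides tiling one chain.
chain = "ACDEFGHIKLMNPQRS"
overlapping = ["ACDEFGH", "FGHIKLM", "LMNPQRS"]
print(sequence_coverage(chain, overlapping))
print(peptide_recall(["ACDEFGH", "FGHIKLN"], ["ACDEFGH", "FGHIKLM"]))
```

Overlap matters in this sketch because peptides from a single protease leave gaps wherever cleavage sites are sparse; peptides from several proteases tile the same chain with different boundaries, which is what lets coverage reach 100%.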

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/4c4d0359-e5be-4657-8c26-f47af388e476

author

Gueto-Tettay, Carlos ^LU ; Tang, Di ^LU ; Happonen, Lotta ^LU ; Heusel, Moritz ^LU ; Khakzad, Hamed ; Malmström, Johan ^LU and Malmström, Lars ^LU

organization

publishing date

2023-01

type

Contribution to journal

publication status

published

subject

Bioinformatics (Computational Biology)

in

PLoS Computational Biology

volume

19

issue

1

article number

e1010457

publisher

Public Library of Science (PLoS)

external identifiers

pmid:36668672
scopus:85147040937

ISSN

1553-734X

DOI

10.1371/journal.pcbi.1010457

project

Properties of Protective Antibody Responses against Bacterial Pathogens

language

English

LU publication?

yes

id

4c4d0359-e5be-4657-8c26-f47af388e476

date added to LUP

2023-02-13 11:22:45

date last changed

2026-01-12 04:08:04

@article{4c4d0359-e5be-4657-8c26-f47af388e476,
  abstract     = {{Generating and analyzing overlapping peptides through multienzymatic digestion is an efficient procedure for de novo protein sequencing using bottom-up mass spectrometry (MS). Despite improved instrumentation and software, de novo MS data analysis remains challenging. In recent years, deep learning models have represented a performance breakthrough. Incorporating that technology into de novo protein sequencing workflows requires machine-learning models capable of handling highly diverse MS data. In this study, we analyzed the requirements for assembling such generalizable deep learning models by systematically varying the composition and size of the training set. We assessed the generated models' performances using two test sets composed of peptides originating from the multienzyme digestion of samples from various species. The peptide recall values on the test sets showed that the deep learning models generated from a collection of peptides with highly diverse N- and C-termini generalized 76% better than the termini-restricted ones. Moreover, expanding the training set's size by adding peptides from the multienzymatic digestion, with five proteases, of samples from several species led to a 2-3 fold generalizability gain. Furthermore, we tested the applicability of these multienzyme deep learning (MEM) models by fully de novo sequencing the heavy and light monomeric chains of five commercial monoclonal antibodies (mAbs). MEMs extracted over 10,000 matching and overlapping peptides across mAb samples digested with six different proteases, achieving 100% sequence coverage for eight of the ten polypeptide chains. We anticipate that the MEMs' proven improvements to de novo analysis will positively impact several applications, such as the analysis of samples of high complexity or unknown nature, and the peptidomics field.}},
  author       = {{Gueto-Tettay, Carlos and Tang, Di and Happonen, Lotta and Heusel, Moritz and Khakzad, Hamed and Malmström, Johan and Malmström, Lars}},
  issn         = {{1553-734X}},
  language     = {{eng}},
  number       = {{1}},
  publisher    = {{Public Library of Science (PLoS)}},
  journal      = {{PLoS Computational Biology}},
  title        = {{Multienzyme deep learning models improve peptide de novo sequencing by mass spectrometry proteomics}},
  url          = {{http://dx.doi.org/10.1371/journal.pcbi.1010457}},
  doi          = {{10.1371/journal.pcbi.1010457}},
  volume       = {{19}},
  year         = {{2023}},
}
