A Comprehensive Evaluation of Consensus Spectrum Generation Methods in Proteomics

Luo, Xiyang; Bittremieux, Wout; Griss, Johannes; Deutsch, Eric W.; Sachsenberg, Timo; Levitsky, Lev I.; Ivanov, Mark V.; Bubis, Julia A.; Gabriels, Ralf; Webel, Henry; Sanchez, Aniel; Bai, Mingze; Käll, Lukas; Perez-Riverol, Yasset


Luo, Xiyang; Bittremieux, Wout; Griss, Johannes; Deutsch, Eric W.; Sachsenberg, Timo; Levitsky, Lev I.; Ivanov, Mark V.; Bubis, Julia A.; Gabriels, Ralf; Webel, Henry, et al. (2022). In Journal of Proteome Research 21(6), p. 1566-1574

Abstract: Spectrum clustering is a powerful strategy to minimize redundant mass spectra by grouping them based on similarity, with the aim of forming groups of mass spectra from the same repeatedly measured analytes. Each such group of near-identical spectra can be represented by its so-called consensus spectrum for downstream processing. Although several algorithms for spectrum clustering have been adequately benchmarked and tested, the influence of the consensus spectrum generation step is rarely evaluated. Here, we present an implementation and benchmark of common consensus spectrum algorithms, including spectrum averaging, spectrum binning, the most similar spectrum, and the best-identified spectrum. We have analyzed diverse public data sets using two different clustering algorithms (spectra-cluster and MaRaCluster) to evaluate how the consensus spectrum generation procedure influences downstream peptide identification. The BEST and BIN methods were found to be the most reliable methods for consensus spectrum generation, including for data sets with post-translational modifications (PTM) such as phosphorylation.
All source code and data of the present study are freely available on GitHub at https://github.com/statisticalbiotechnology/representative-spectra-benchmark.
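
As an illustration of the spectrum binning idea named in the abstract, the following is a minimal sketch and not the implementation from the study's repository. The function name `bin_consensus`, the parameters `bin_width` and `min_fraction`, and the choice of an intensity-weighted m/z per bin are all assumptions made for this example.

```python
from collections import defaultdict


def bin_consensus(spectra, bin_width=0.02, min_fraction=0.5):
    """Build a consensus spectrum from a cluster by m/z binning.

    spectra: list of spectra, each a list of (mz, intensity) tuples.
    bin_width and min_fraction are illustrative defaults (assumptions).
    """
    if not spectra:
        return []

    # Per bin: [sum of mz * intensity, sum of intensity, number of spectra hit]
    bins = defaultdict(lambda: [0.0, 0.0, 0])
    for peaks in spectra:
        seen = set()  # bins already counted for this spectrum
        for mz, intensity in peaks:
            idx = int(mz // bin_width)
            acc = bins[idx]
            acc[0] += mz * intensity
            acc[1] += intensity
            if idx not in seen:
                acc[2] += 1
                seen.add(idx)

    n = len(spectra)
    consensus = []
    for idx in sorted(bins):
        weighted_mz, total, count = bins[idx]
        # Keep only peaks observed in a sufficient share of the cluster.
        if total > 0 and count / n >= min_fraction:
            consensus.append((weighted_mz / total, total / n))
    return consensus


cluster = [
    [(100.001, 10.0), (200.005, 5.0)],
    [(100.003, 30.0), (300.0, 1.0)],
]
strict = bin_consensus(cluster, min_fraction=1.0)
```

With `min_fraction=1.0`, only the peak near m/z 100 survives, because it is the only one present in both spectra. Fixed bins can split a peak that falls near a bin edge, so a practical implementation would likely need a tolerance-aware merge.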

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/5ec52b9c-5e95-4ff2-8458-9fb782a1d647

author

Luo, Xiyang; Bittremieux, Wout; Griss, Johannes; Deutsch, Eric W.; Sachsenberg, Timo; Levitsky, Lev I.; Ivanov, Mark V.; Bubis, Julia A.; Gabriels, Ralf; Webel, Henry; Sanchez, Aniel (LU); Bai, Mingze; Käll, Lukas and Perez-Riverol, Yasset

publishing date

2022-06-03

type

Contribution to journal

publication status

published

subject

Other Natural Sciences

keywords

benchmark, big data, clustering, consensus spectra, mass spectrometry, pride database, ProteomeXchange, spectral libraries

in

Journal of Proteome Research

volume

21

issue

6

pages

9 pages

publisher

The American Chemical Society (ACS)

external identifiers

pmid:35549218
scopus:85131214939

ISSN

1535-3893

DOI

10.1021/acs.jproteome.2c00069

language

English

LU publication?

yes

additional info

Funding Information: The authors would like to acknowledge the EuBIC-MS community that organized the EuBIC-MS Developer Meeting in January 2020, triggering the original discussions and implementations of this work. L.K. was supported by a grant from the Swedish Research Council (Grant 2017-04030). Publisher Copyright: © 2022 American Chemical Society. All rights reserved.

id

5ec52b9c-5e95-4ff2-8458-9fb782a1d647

date added to LUP

2022-10-07 13:54:34

date last changed

2026-01-12 00:40:28

@article{5ec52b9c-5e95-4ff2-8458-9fb782a1d647,
  abstract     = {{Spectrum clustering is a powerful strategy to minimize redundant mass spectra by grouping them based on similarity, with the aim of forming groups of mass spectra from the same repeatedly measured analytes. Each such group of near-identical spectra can be represented by its so-called consensus spectrum for downstream processing. Although several algorithms for spectrum clustering have been adequately benchmarked and tested, the influence of the consensus spectrum generation step is rarely evaluated. Here, we present an implementation and benchmark of common consensus spectrum algorithms, including spectrum averaging, spectrum binning, the most similar spectrum, and the best-identified spectrum. We have analyzed diverse public data sets using two different clustering algorithms (spectra-cluster and MaRaCluster) to evaluate how the consensus spectrum generation procedure influences downstream peptide identification. The BEST and BIN methods were found to be the most reliable methods for consensus spectrum generation, including for data sets with post-translational modifications (PTM) such as phosphorylation. All source code and data of the present study are freely available on GitHub at https://github.com/statisticalbiotechnology/representative-spectra-benchmark.}},
  author       = {{Luo, Xiyang and Bittremieux, Wout and Griss, Johannes and Deutsch, Eric W. and Sachsenberg, Timo and Levitsky, Lev I. and Ivanov, Mark V. and Bubis, Julia A. and Gabriels, Ralf and Webel, Henry and Sanchez, Aniel and Bai, Mingze and Käll, Lukas and Perez-Riverol, Yasset}},
  issn         = {{1535-3893}},
  keywords     = {{benchmark; big data; clustering; consensus spectra; mass spectrometry; pride database; ProteomeXchange; spectral libraries}},
  language     = {{eng}},
  month        = {{06}},
  number       = {{6}},
  pages        = {{1566--1574}},
  publisher    = {{The American Chemical Society (ACS)}},
  journal      = {{Journal of Proteome Research}},
  title        = {{A Comprehensive Evaluation of Consensus Spectrum Generation Methods in Proteomics}},
  url          = {{http://dx.doi.org/10.1021/acs.jproteome.2c00069}},
  doi          = {{10.1021/acs.jproteome.2c00069}},
  volume       = {{21}},
  year         = {{2022}},
}
