Lund University Publications

A community effort to identify and correct mislabeled samples in proteogenomic studies

Yoo, Seungyeul; Shi, Zhiao; Wen, Bo; Kho, SoonJye; Pan, Renke; Feng, Hanying; Chen, Hong; Carlsson, Anders (LU); Edén, Patrik (LU); Ma, Weiping, et al. (2021). In Patterns 2(5).
Abstract
Sample mislabeling or misannotation has been a long-standing problem in scientific research, particularly prevalent in large-scale, multi-omic studies due to the complexity of multi-omic workflows. There exists an urgent need for implementing quality controls to automatically screen for and correct sample mislabels or misannotations in multi-omic studies. Here, we describe a crowdsourced precisionFDA NCI-CPTAC Multi-omics Enabled Sample Mislabeling Correction Challenge, which provides a framework for systematic benchmarking and evaluation of mislabel identification and correction methods for integrative proteogenomic studies. The challenge received a large number of submissions from domestic and international data scientists, with highly variable performance observed across the submitted methods. Post-challenge collaboration between the top-performing teams and the challenge organizers has created an open-source software, COSMO, with demonstrated high accuracy and robustness in mislabeling identification and correction in simulated and real multi-omic datasets.
Summary
In a community effort to combat sample mislabeling in multi-omic studies, the submitted computational solutions showed a wide range of accuracy. The final collaborative product, COSMO, achieves high performance. Applying COSMO to published datasets demonstrates the biological impact of the tool.
author
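Yoo, Seungyeul; Shi, Zhiao; Wen, Bo; Kho, SoonJye; Pan, Renke; Feng, Hanying; Chen, Hong; Carlsson, Anders; Edén, Patrik; Ma, Weiping; Raymer, Michael; Maier, Ezekiel J.; Tezak, Zivana; Johansson, Elaine; Hinton, Denise; Rodriguez, Henry; Zhu, Jun; Boja, Emily; Wang, Pei and Zhang, Bing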
organization
publishing date
type
Contribution to journal
publication status
published
subject
keywords
proteomics, genomics, mislabeling
in
Patterns
volume
2
issue
5
article number
100245
pages
14 pages
publisher
Cell Press
external identifiers
  • scopus:85105706455
  • pmid:34036290
ISSN
2666-3899
DOI
10.1016/j.patter.2021.100245
language
English
LU publication?
yes
id
6db11878-fef5-4572-8ae5-4094615c5b88
date added to LUP
2021-05-19 11:45:42
date last changed
2024-05-17 12:50:04
@article{6db11878-fef5-4572-8ae5-4094615c5b88,
  abstract     = {{Sample mislabeling or misannotation has been a long-standing problem in scientific research, particularly prevalent in large-scale, multi-omic studies due to the complexity of multi-omic workflows. There exists an urgent need for implementing quality controls to automatically screen for and correct sample mislabels or misannotations in multi-omic studies. Here, we describe a crowdsourced precisionFDA NCI-CPTAC Multi-omics Enabled Sample Mislabeling Correction Challenge, which provides a framework for systematic benchmarking and evaluation of mislabel identification and correction methods for integrative proteogenomic studies. The challenge received a large number of submissions from domestic and international data scientists, with highly variable performance observed across the submitted methods. Post-challenge collaboration between the top-performing teams and the challenge organizers has created an open-source software, COSMO, with demonstrated high accuracy and robustness in mislabeling identification and correction in simulated and real multi-omic datasets.}},
  author       = {{Yoo, Seungyeul and Shi, Zhiao and Wen, Bo and Kho, SoonJye and Pan, Renke and Feng, Hanying and Chen, Hong and Carlsson, Anders and Edén, Patrik and Ma, Weiping and Raymer, Michael and Maier, Ezekiel J. and Tezak, Zivana and Johansson, Elaine and Hinton, Denise and Rodriguez, Henry and Zhu, Jun and Boja, Emily and Wang, Pei and Zhang, Bing}},
  issn         = {{2666-3899}},
  keywords     = {{proteomics, genomics, mislabeling}},
  language     = {{eng}},
  month        = {{05}},
  number       = {{5}},
  publisher    = {{Cell Press}},
  journal      = {{Patterns}},
  title        = {{A community effort to identify and correct mislabeled samples in proteogenomic studies}},
  url          = {{http://dx.doi.org/10.1016/j.patter.2021.100245}},
  doi          = {{10.1016/j.patter.2021.100245}},
  volume       = {{2}},
  year         = {{2021}},
}