A community effort to identify and correct mislabeled samples in proteogenomic studies
(2021) In Patterns 2(5).
- Abstract
- Sample mislabeling or misannotation has been a long-standing problem in scientific research, particularly prevalent in large-scale, multi-omic studies due to the complexity of multi-omic workflows. There exists an urgent need for implementing quality controls to automatically screen for and correct sample mislabels or misannotations in multi-omic studies. Here, we describe a crowdsourced precisionFDA NCI-CPTAC Multi-omics Enabled Sample Mislabeling Correction Challenge, which provides a framework for systematic benchmarking and evaluation of mislabel identification and correction methods for integrative proteogenomic studies. The challenge received a large number of submissions from domestic and international data scientists, with highly variable performance observed across the submitted methods. Post-challenge collaboration between the top-performing teams and the challenge organizers has created an open-source software, COSMO, with demonstrated high accuracy and robustness in mislabeling identification and correction in simulated and real multi-omic datasets.
- Abstract (Swedish)
- In a community effort to combat sample mislabeling in multi-omic studies, computational solutions received show a wide range of accuracy. The final collaborative product, COSMO, achieves high performance. Applying COSMO to published datasets demonstrates biological impact of the tool.
Please use this url to cite or link to this publication:
https://lup.lub.lu.se/record/6db11878-fef5-4572-8ae5-4094615c5b88
- author
- Yoo, Seungyeul; Shi, Zhiao; Wen, Bo; Kho, SoonJye; Pan, Renke; Feng, Hanying; Chen, Hong; Carlsson, Anders; Edén, Patrik; Ma, Weiping; Raymer, Michael; Maier, Ezekiel J.; Tezak, Zivana; Johansson, Elaine; Hinton, Denise; Rodriguez, Henry; Zhu, Jun; Boja, Emily; Wang, Pei; Zhang, Bing
- organization
- publishing date
- 2021-05-14
- type
- Contribution to journal
- publication status
- published
- subject
- keywords
- proteomics, genomics, mislabeling
- in
- Patterns
- volume
- 2
- issue
- 5
- article number
- 100245
- pages
- 14 pages
- publisher
- Cell Press
- external identifiers
- scopus:85105706455
- pmid:34036290
- ISSN
- 2666-3899
- DOI
- 10.1016/j.patter.2021.100245
- language
- English
- LU publication?
- yes
- id
- 6db11878-fef5-4572-8ae5-4094615c5b88
- date added to LUP
- 2021-05-19 11:45:42
- date last changed
- 2024-05-17 12:50:04
@article{6db11878-fef5-4572-8ae5-4094615c5b88,
  abstract     = {{Sample mislabeling or misannotation has been a long-standing problem in scientific research, particularly prevalent in large-scale, multi-omic studies due to the complexity of multi-omic workflows. There exists an urgent need for implementing quality controls to automatically screen for and correct sample mislabels or misannotations in multi-omic studies. Here, we describe a crowdsourced precisionFDA NCI-CPTAC Multi-omics Enabled Sample Mislabeling Correction Challenge, which provides a framework for systematic benchmarking and evaluation of mislabel identification and correction methods for integrative proteogenomic studies. The challenge received a large number of submissions from domestic and international data scientists, with highly variable performance observed across the submitted methods. Post-challenge collaboration between the top-performing teams and the challenge organizers has created an open-source software, COSMO, with demonstrated high accuracy and robustness in mislabeling identification and correction in simulated and real multi-omic datasets.}},
  author       = {{Yoo, Seungyeul and Shi, Zhiao and Wen, Bo and Kho, SoonJye and Pan, Renke and Feng, Hanying and Chen, Hong and Carlsson, Anders and Edén, Patrik and Ma, Weiping and Raymer, Michael and Maier, Ezekiel J. and Tezak, Zivana and Johansson, Elaine and Hinton, Denise and Rodriguez, Henry and Zhu, Jun and Boja, Emily and Wang, Pei and Zhang, Bing}},
  issn         = {{2666-3899}},
  keywords     = {{proteomics, genomics, mislabeling}},
  language     = {{eng}},
  month        = {{05}},
  number       = {{5}},
  publisher    = {{Cell Press}},
  journal      = {{Patterns}},
  title        = {{A community effort to identify and correct mislabeled samples in proteogenomic studies}},
  url          = {{http://dx.doi.org/10.1016/j.patter.2021.100245}},
  doi          = {{10.1016/j.patter.2021.100245}},
  volume       = {{2}},
  year         = {{2021}},
}