
Robust assignment of cancer subtypes from expression data using a uni-variate gene expression average as classifier

Lauss, Martin; Frigyesi, Attila; Rydén, Tobias and Höglund, Mattias (2010) In BMC Cancer 10.
Abstract
Background: Genome-wide gene expression data is a rich source for the identification of gene signatures suitable for clinical purposes, and a number of statistical algorithms have been described for both the identification and the evaluation of such signatures. Some of the algorithms employed are fairly complex and hence sensitive to over-fitting, whereas others are simpler and more straightforward. Here we present a new type of simple algorithm based on ROC analysis and the use of metagenes that we believe will be a good complement to existing algorithms.

Results: The basis for the proposed approach is the use of metagenes, instead of collections of individual genes, and feature selection using AUC values obtained by ROC analysis. Each gene in a data set is assigned an AUC value relative to the tumor class under investigation, and the genes are ranked according to these values. Metagenes are then formed by calculating the mean expression level for an increasing number of ranked genes, and the metagene expression value that optimally discriminates tumor classes in the training set is used for classification of new samples. The performance of the metagene is then evaluated using leave-one-out cross-validation (LOOCV) and balanced accuracies.

Conclusions: We show that the simple uni-variate gene expression average algorithm performs as well as several alternative algorithms, such as discriminant analysis, and as more complex approaches such as SVM and neural networks. The R package rocc is freely available at http://cran.r-project.org/web/packages/rocc/index.
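The procedure summarized in the abstract (per-gene AUC, ranking, a metagene built as the mean of the top-ranked genes, a cutoff on the metagene, and LOOCV with balanced accuracy) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code of the rocc package: all function names are hypothetical, only genes with high AUC for the class of interest are used, and the number of genes k and the cutoff are chosen jointly by maximizing training balanced accuracy (ties go to the smaller k). The abstract does not spell out these details, and the toy data are made up.

```python
from statistics import mean


def auc(scores, labels):
    """ROC AUC of `scores` for separating class 1 from class 0, computed as the
    Mann-Whitney probability P(score_pos > score_neg), with ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def balanced_accuracy(pred, labels):
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp = sum(1 for p, y in zip(pred, labels) if p == 1 and y == 1)
    tn = sum(1 for p, y in zip(pred, labels) if p == 0 and y == 0)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def train_metagene(X, labels, max_genes=50):
    """X: list of samples, each a list of expression values (one per gene).
    labels: 1 for the tumor class under investigation, 0 otherwise.
    Returns (indices of the genes in the metagene, metagene cutoff).
    Assumption: k and the cutoff are chosen by training balanced accuracy."""
    n_genes = len(X[0])
    # Step 1: one AUC per gene, then rank genes from highest to lowest AUC.
    gene_auc = [auc([s[g] for s in X], labels) for g in range(n_genes)]
    ranked = sorted(range(n_genes), key=lambda g: gene_auc[g], reverse=True)
    best = None  # (balanced accuracy, -k, genes, cutoff)
    # Step 2: metagene = mean expression of the top-k genes, for k = 1, 2, ...
    for k in range(1, min(max_genes, n_genes) + 1):
        genes = ranked[:k]
        meta = [mean(s[g] for g in genes) for s in X]
        # Step 3: candidate cutoffs are midpoints between sorted metagene values.
        vals = sorted(set(meta))
        cuts = [(a + b) / 2 for a, b in zip(vals, vals[1:])] or vals
        for c in cuts:
            pred = [1 if m > c else 0 for m in meta]
            cand = (balanced_accuracy(pred, labels), -k, genes, c)
            # Keep the highest balanced accuracy; on ties keep the smaller k.
            if best is None or cand[:2] > best[:2]:
                best = cand
    return best[2], best[3]


def predict(sample, genes, cutoff):
    """Assign class 1 if the sample's metagene value exceeds the cutoff."""
    return 1 if mean(sample[g] for g in genes) > cutoff else 0


def loocv_balanced_accuracy(X, labels, max_genes=50):
    """Leave-one-out CV: retrain (including gene ranking) without sample i,
    predict sample i, then score all held-out predictions together."""
    preds = []
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], labels[:i] + labels[i + 1:]
        genes, cutoff = train_metagene(X_tr, y_tr, max_genes)
        preds.append(predict(X[i], genes, cutoff))
    return balanced_accuracy(preds, labels)


# Made-up toy data: genes 0 and 1 are higher in class 1, genes 2 and 3 are noise.
X = [
    [5.0, 4.0, 1.0, 2.0],
    [6.0, 5.0, 2.0, 1.0],
    [5.5, 4.5, 1.5, 1.5],
    [1.0, 1.0, 2.0, 1.0],
    [2.0, 1.5, 1.0, 2.0],
    [1.5, 0.5, 1.5, 1.5],
]
labels = [1, 1, 1, 0, 0, 0]

genes, cutoff = train_metagene(X, labels)
print(genes, cutoff)                       # [0] 3.5
print(loocv_balanced_accuracy(X, labels))  # 1.0
```

The gene ranking is recomputed inside every LOOCV fold. Ranking genes once on the full data set before cross-validation would let the held-out sample influence feature selection and inflate the accuracy estimate.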
author: Lauss, Martin; Frigyesi, Attila; Rydén, Tobias; Höglund, Mattias
publishing date: 2010
type: Contribution to journal
publication status: published
in: BMC Cancer
volume: 10
publisher: BioMed Central
external identifiers:
  • wos:000283654600002
  • scopus:77957314278
ISSN: 1471-2407
DOI: 10.1186/1471-2407-10-532
language: English
LU publication?: yes
id: 0612ced6-8c47-42ab-b2d6-4b36a42003a1 (old id 1720171)
date added to LUP: 2010-12-01 11:14:00
date last changed: 2018-05-29 12:31:08
@article{0612ced6-8c47-42ab-b2d6-4b36a42003a1,
  abstract     = {Background: Genome-wide gene expression data is a rich source for the identification of gene signatures suitable for clinical purposes, and a number of statistical algorithms have been described for both the identification and the evaluation of such signatures. Some of the algorithms employed are fairly complex and hence sensitive to over-fitting, whereas others are simpler and more straightforward. Here we present a new type of simple algorithm based on ROC analysis and the use of metagenes that we believe will be a good complement to existing algorithms. Results: The basis for the proposed approach is the use of metagenes, instead of collections of individual genes, and feature selection using AUC values obtained by ROC analysis. Each gene in a data set is assigned an AUC value relative to the tumor class under investigation, and the genes are ranked according to these values. Metagenes are then formed by calculating the mean expression level for an increasing number of ranked genes, and the metagene expression value that optimally discriminates tumor classes in the training set is used for classification of new samples. The performance of the metagene is then evaluated using leave-one-out cross-validation (LOOCV) and balanced accuracies. Conclusions: We show that the simple uni-variate gene expression average algorithm performs as well as several alternative algorithms, such as discriminant analysis, and as more complex approaches such as SVM and neural networks. The R package rocc is freely available at http://cran.r-project.org/web/packages/rocc/index.},
  author       = {Lauss, Martin and Frigyesi, Attila and Rydén, Tobias and Höglund, Mattias},
  issn         = {1471-2407},
  language     = {eng},
  publisher    = {BioMed Central},
  journal      = {BMC Cancer},
  doi          = {10.1186/1471-2407-10-532},
  title        = {Robust assignment of cancer subtypes from expression data using a uni-variate gene expression average as classifier},
  url          = {http://dx.doi.org/10.1186/1471-2407-10-532},
  volume       = {10},
  year         = {2010},
}