Lund University Publications

LUND UNIVERSITY LIBRARIES

FS-GBDT : identification multicancer-risk module via a feature selection algorithm by integrating Fisher score and GBDT

Zhang, Jialin ; Xu, Da ; Hao, Kaijing ; Zhang, Yusen ; Chen, Wei ; Liu, Jiaguo ; Gao, Rui ; Wu, Chuanyan and De Marinis, Yang LU (2021) In Briefings in Bioinformatics 22(3).
Abstract

Cancer is a highly heterogeneous disease caused by dysregulation in different cell types and tissues. However, different cancers may share common mechanisms. It is critical to identify decisive genes involved in the development and progression of cancer, and joint analysis of multiple cancers may help to discover overlapping mechanisms among different cancers. In this study, we proposed a fusion feature selection framework based on an ensemble method, named Fisher score and Gradient Boosting Decision Tree (FS-GBDT), to select robust and decisive feature genes in high-dimensional gene expression datasets. A joint analysis of 11 human cancer types was conducted to explore the key feature gene subset of cancer. To verify the efficacy of FS-GBDT, we compared it with four other common feature selection algorithms using a Support Vector Machine (SVM) classifier. FS-GBDT achieved the highest values on the evaluation indicators, outperforming the other four methods. In addition, we performed gene ontology analysis and literature validation of the key gene subset, and the subset was classified into several functional modules. Functional modules can be used as disease markers in place of single genes, which are difficult to reproduce across gene chip applications, and to study the core mechanisms of cancer.
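
The abstract names the Fisher score as one of the two ingredients of FS-GBDT but does not say how the scores are computed or combined with GBDT. As an illustration only, the following is a minimal sketch of the Fisher score in one common formulation (between-class spread of the class means divided by the within-class variance, both weighted by class size). The function names `fisher_scores` and `top_k_features` and the toy data are hypothetical and are not taken from the paper; the GBDT stage is not shown.

```python
from statistics import fmean


def fisher_scores(X, y):
    """Return one Fisher score per feature (column) of X.

    X: list of samples, each a list of feature values (e.g. gene expression).
    y: list of class labels, one per sample.
    Assumed formulation: sum_k n_k (mu_kj - mu_j)^2 / sum_k n_k var_kj.
    """
    n_features = len(X[0])
    classes = sorted(set(y))
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        overall_mean = fmean(column)
        between = 0.0  # class-size-weighted spread of the class means
        within = 0.0   # class-size-weighted variance inside each class
        for c in classes:
            values = [v for v, label in zip(column, y) if label == c]
            mu = fmean(values)
            var = fmean([(v - mu) ** 2 for v in values])  # population variance
            between += len(values) * (mu - overall_mean) ** 2
            within += len(values) * var
        if within > 0:
            scores.append(between / within)
        elif between > 0:
            # Constant within every class but different across classes.
            scores.append(float("inf"))
        else:
            scores.append(0.0)
    return scores


def top_k_features(X, y, k):
    """Indices of the k features with the largest Fisher score."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Hypothetical toy data: 6 samples, 3 features, 2 classes.
X = [
    [1.0, 5.0, 0.2],
    [1.2, 4.0, 0.9],
    [0.8, 6.0, 0.4],
    [3.0, 5.5, 0.7],
    [3.2, 4.5, 0.1],
    [2.8, 5.0, 0.8],
]
y = ["tumor_a", "tumor_a", "tumor_a", "tumor_b", "tumor_b", "tumor_b"]

selected = top_k_features(X, y, 2)
```

In this toy data only the first feature separates the two classes cleanly, so it receives by far the largest score. In a two-stage pipeline, a filter of this kind would typically shrink the gene set before a tree-based model ranks the remainder; whether FS-GBDT works this way is not stated in the abstract.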

author
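Zhang, Jialin ; Xu, Da ; Hao, Kaijing ; Zhang, Yusen ; Chen, Wei ; Liu, Jiaguo ; Gao, Rui ; Wu, Chuanyan and De Marinis, Yang LU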
organization
publishing date
2021-05
type
Contribution to journal
publication status
published
subject
keywords
bioinformatics, cancer classification, decision support systems, feature gene selection
in
Briefings in Bioinformatics
volume
22
issue
3
publisher
Oxford University Press
external identifiers
  • scopus:85106746209
  • pmid:34020547
ISSN
1477-4054
DOI
10.1093/bib/bbaa189
language
English
LU publication?
yes
id
64eed584-27bb-4118-8ade-56da92cca050
date added to LUP
2021-06-08 16:03:57
date last changed
2024-06-15 12:12:06
@article{64eed584-27bb-4118-8ade-56da92cca050,
  abstract     = {{Cancer is a highly heterogeneous disease caused by dysregulation in different cell types and tissues. However, different cancers may share common mechanisms. It is critical to identify decisive genes involved in the development and progression of cancer, and joint analysis of multiple cancers may help to discover overlapping mechanisms among different cancers. In this study, we proposed a fusion feature selection framework based on an ensemble method, named Fisher score and Gradient Boosting Decision Tree (FS-GBDT), to select robust and decisive feature genes in high-dimensional gene expression datasets. A joint analysis of 11 human cancer types was conducted to explore the key feature gene subset of cancer. To verify the efficacy of FS-GBDT, we compared it with four other common feature selection algorithms using a Support Vector Machine (SVM) classifier. FS-GBDT achieved the highest values on the evaluation indicators, outperforming the other four methods. In addition, we performed gene ontology analysis and literature validation of the key gene subset, and the subset was classified into several functional modules. Functional modules can be used as disease markers in place of single genes, which are difficult to reproduce across gene chip applications, and to study the core mechanisms of cancer.}},
  author       = {{Zhang, Jialin and Xu, Da and Hao, Kaijing and Zhang, Yusen and Chen, Wei and Liu, Jiaguo and Gao, Rui and Wu, Chuanyan and De Marinis, Yang}},
  issn         = {{1477-4054}},
  keywords     = {{bioinformatics; cancer classification; decision support systems; feature gene selection}},
  language     = {{eng}},
  month        = {{05}},
  number       = {{3}},
  publisher    = {{Oxford University Press}},
  journal      = {{Briefings in Bioinformatics}},
  title        = {{FS-GBDT : identification multicancer-risk module via a feature selection algorithm by integrating Fisher score and GBDT}},
  url          = {{http://dx.doi.org/10.1093/bib/bbaa189}},
  doi          = {{10.1093/bib/bbaa189}},
  volume       = {{22}},
  year         = {{2021}},
}