
Improved modeling of clinical data with kernel methods

Daemen, Anneleen; Timmerman, Dirk; Van den Bosch, Thierry; Bottomley, Cecilia; Kirk, Emma; Van Holsbeke, Caroline; Valentin, Lil; Bourne, Tom and De Moor, Bart (2012) In Artificial Intelligence in Medicine 54(2). p. 103-114
Abstract
Objective: Despite the rise of high-throughput technologies, clinical data such as age, gender and medical history guide clinical management for most diseases and examinations. To improve clinical management, available patient information should be fully exploited. This requires appropriate modeling of relevant parameters. Methods: When kernel methods are used, traditional kernel functions such as the linear kernel are often applied to the set of clinical parameters. These kernel functions, however, have their disadvantages due to the specific characteristics of clinical data, being a mix of variable types with each variable its own range. We propose a new kernel function specifically adapted to the characteristics of clinical data. Results: The clinical kernel function provides a better representation of patients' similarity by equalizing the influence of all variables and taking into account the range r of the variables. Moreover, it is robust with respect to changes in r. Incorporated in a least squares support vector machine, the new kernel function results in significantly improved diagnosis, prognosis and prediction of therapy response. This is illustrated on four clinical data sets within gynecology, with an average increase in test area under the ROC curve (AUC) of 0.023, 0.021, 0.122 and 0.019, respectively. Moreover, when combining clinical parameters and expression data in three case studies on breast cancer, results improved overall with use of the new kernel function and when considering both data types in a weighted fashion, with a larger weight assigned to the clinical parameters. The increase in AUC with respect to a standard kernel function and/or unweighted data combination was maximum 0.127, 0.042 and 0.118 for the three case studies. 
Conclusion: For clinical data consisting of variables of different types, the proposed kernel function, which takes into account the type and range of each variable, has been shown to be a better alternative for linear and non-linear classification problems. (C) 2011 Elsevier B.V. All rights reserved.
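The abstract describes the clinical kernel only in words, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes that the similarity for a continuous or ordinal variable with range r is (r - |a - b|) / r, that nominal variables score 1 on an exact match and 0 otherwise, and that the kernel is the unweighted mean over variables. The function names, the example patients and the weight w are hypothetical.

```python
import numpy as np


def clinical_kernel(X, Z, ranges, is_nominal):
    """Mean per-variable similarity between the rows of X and the rows of Z.

    Assumed form (not quoted from the abstract):
      continuous/ordinal variable: (r - |a - b|) / r, with r the variable's range
      nominal variable:            1 if a == b else 0
    """
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float)
    ranges = np.asarray(ranges, dtype=float)
    is_nominal = np.asarray(is_nominal, dtype=bool)

    # Pairwise absolute differences, shape (n_x, n_z, n_variables).
    diff = np.abs(X[:, None, :] - Z[None, :, :])

    # Nominal variables do not use a range; substitute 1 to avoid dividing by 0.
    safe_r = np.where(is_nominal, 1.0, ranges)
    ranged_sim = (safe_r - diff) / safe_r
    exact_sim = (diff == 0).astype(float)

    sim = np.where(is_nominal, exact_sim, ranged_sim)
    # Averaging gives every variable the same influence on the result.
    return sim.mean(axis=2)


def weighted_combination(K_clinical, K_expression, w):
    """Convex combination of two kernel matrices; w is a hypothetical weight."""
    return w * K_clinical + (1.0 - w) * K_expression


# Two hypothetical patients: age (range 50), gender (nominal), parity (range 4).
patients = [[30, 0, 1],
            [60, 1, 3]]
K = clinical_kernel(patients, patients,
                    ranges=[50, 1, 4],
                    is_nominal=[False, True, False])

# Weight the clinical kernel more heavily than a placeholder expression kernel.
K_combined = weighted_combination(K, np.eye(2), w=0.75)
```

For the two patients above, the per-variable similarities are 0.4 (age), 0 (gender) and 0.5 (parity), so the off-diagonal kernel value is their mean, 0.3. Dividing by the range keeps a wide-ranged variable such as age from dominating the similarity, which is the equalizing effect the abstract refers to.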
type
Contribution to journal
publication status
published
keywords
Machine learning, Support vector machine, Kernel function, Biostatistics, Clinical data representation, Clinical decision support system, Gynecology, Breast cancer
in
Artificial Intelligence in Medicine
volume
54
issue
2
pages
103 - 114
publisher
Elsevier
external identifiers
  • wos:000300604200002
  • scopus:84855961334
ISSN
1873-2860
DOI
10.1016/j.artmed.2011.11.001
language
English
LU publication?
yes
id
d34ebe7b-6e4d-4ce4-9bbe-d6465875f68a (old id 2403156)
date added to LUP
2012-04-02 09:27:07
date last changed
2017-07-30 03:03:43
@article{d34ebe7b-6e4d-4ce4-9bbe-d6465875f68a,
  abstract     = {Objective: Despite the rise of high-throughput technologies, clinical data such as age, gender and medical history guide clinical management for most diseases and examinations. To improve clinical management, available patient information should be fully exploited. This requires appropriate modeling of relevant parameters. Methods: When kernel methods are used, traditional kernel functions such as the linear kernel are often applied to the set of clinical parameters. These kernel functions, however, have their disadvantages due to the specific characteristics of clinical data, being a mix of variable types with each variable its own range. We propose a new kernel function specifically adapted to the characteristics of clinical data. Results: The clinical kernel function provides a better representation of patients' similarity by equalizing the influence of all variables and taking into account the range r of the variables. Moreover, it is robust with respect to changes in r. Incorporated in a least squares support vector machine, the new kernel function results in significantly improved diagnosis, prognosis and prediction of therapy response. This is illustrated on four clinical data sets within gynecology, with an average increase in test area under the ROC curve (AUC) of 0.023, 0.021, 0.122 and 0.019, respectively. Moreover, when combining clinical parameters and expression data in three case studies on breast cancer, results improved overall with use of the new kernel function and when considering both data types in a weighted fashion, with a larger weight assigned to the clinical parameters. The increase in AUC with respect to a standard kernel function and/or unweighted data combination was maximum 0.127, 0.042 and 0.118 for the three case studies. 
Conclusion: For clinical data consisting of variables of different types, the proposed kernel function, which takes into account the type and range of each variable, has been shown to be a better alternative for linear and non-linear classification problems. (C) 2011 Elsevier B.V. All rights reserved.},
  author       = {Daemen, Anneleen and Timmerman, Dirk and Van den Bosch, Thierry and Bottomley, Cecilia and Kirk, Emma and Van Holsbeke, Caroline and Valentin, Lil and Bourne, Tom and De Moor, Bart},
  issn         = {1873-2860},
  keyword      = {Machine learning,Support vector machine,Kernel function,Biostatistics,Clinical data representation,Clinical decision support system,Gynecology,Breast cancer},
  language     = {eng},
  number       = {2},
  pages        = {103--114},
  publisher    = {Elsevier},
  journal      = {Artificial Intelligence in Medicine},
  title        = {Improved modeling of clinical data with kernel methods},
  url          = {http://dx.doi.org/10.1016/j.artmed.2011.11.001},
  volume       = {54},
  year         = {2012},
}