Manifold Learning in Computational Biology

Nilsson, Jens LU (2008)
Abstract
This thesis deals with manifold learning techniques and their application in gene expression data analysis. Manifold learning is the study of methods that aim to infer geometrical structure from data sampled from manifolds, enabling nonlinear solutions to various machine learning tasks. Gene expression data analysis is the analysis of measurements of the abundance of gene products from a set of genes in the cell; with microarray technology, this set can cover the whole genome. Since the expression of one gene is dynamically linked to the expression of others, it is reasonable to assume that such expression data exhibit nonlinear structure, which makes nonlinear methods such as manifold learning a natural approach to their analysis.

Within the methodological development of manifold learning, this thesis presents a method for robust estimation of geodesic distances (paper I) and a method for supervised manifold learning based on kernel dimension reduction (paper II). An extension of the latter algorithm to partitioned data is also presented. Further, a method for variable importance assessment in manifold learning is proposed (paper IV).

Within gene expression data analysis, results are presented that demonstrate better performance of manifold learning methods compared to linear methods in visualization of microarray samples (paper III). It is also demonstrated how genes can be ranked according to their influence on the observed structure in such nonlinear representations (paper IV). Finally, it is shown how biologically relevant gene/gene similarity measures can be obtained using unsupervised and supervised manifold learning (paper V).
author
supervisor
opponent
  • Professor Tegnér, Jesper, Karolinska Institutet, Stockholm
organization
publishing date
type
Thesis
publication status
published
subject
keywords
Nonlinear Dimensionality Reduction, Gene Expression Data, Computational Biology, Machine Learning, Manifold Learning
pages
180 pages
publisher
Centre for Mathematical Sciences, Lund University
defense location
Lecture room MH:B, Centre for Math. Sciences, Sölvegatan 18, Lund
defense date
2008-03-14 13:15
ISSN
1404-0034
ISBN
978-91-628-7407-0
language
English
LU publication?
yes
id
9d52eb24-e1f4-4687-b1f4-6a7cf139a115 (old id 1034593)
date added to LUP
2008-02-19 13:53:02
date last changed
2016-09-19 08:44:47
@phdthesis{9d52eb24-e1f4-4687-b1f4-6a7cf139a115,
  abstract     = {This thesis deals with manifold learning techniques and their application in gene expression data analysis. Manifold learning is the study of methods that aim to infer geometrical structure from data sampled from manifolds, enabling nonlinear solutions to various machine learning tasks. Gene expression data analysis is the analysis of measurements of the abundance of gene products from a set of genes in the cell; with microarray technology, this set can cover the whole genome. Since the expression of one gene is dynamically linked to the expression of others, it is reasonable to assume that such expression data exhibit nonlinear structure, which makes nonlinear methods such as manifold learning a natural approach to their analysis.
Within the methodological development of manifold learning, this thesis presents a method for robust estimation of geodesic distances (paper I) and a method for supervised manifold learning based on kernel dimension reduction (paper II). An extension of the latter algorithm to partitioned data is also presented. Further, a method for variable importance assessment in manifold learning is proposed (paper IV).
Within gene expression data analysis, results are presented that demonstrate better performance of manifold learning methods compared to linear methods in visualization of microarray samples (paper III). It is also demonstrated how genes can be ranked according to their influence on the observed structure in such nonlinear representations (paper IV). Finally, it is shown how biologically relevant gene/gene similarity measures can be obtained using unsupervised and supervised manifold learning (paper V).},
  author       = {Nilsson, Jens},
  isbn         = {978-91-628-7407-0},
  issn         = {1404-0034},
  keyword      = {Nonlinear Dimensionality Reduction,Gene Expression Data,Computational Biology,Machine Learning,Manifold Learning},
  language     = {eng},
  pages        = {180},
  publisher    = {Centre for Mathematical Sciences, Lund University},
  school       = {Lund University},
  title        = {Manifold Learning in Computational Biology},
  year         = {2008},
}