
Lund University Publications

LUND UNIVERSITY LIBRARIES

Binary tree-structured vector quantization approach to clustering and visualizing microarray data

Sultan, M. ; Wigle, D. A. ; Cumbaa, C. A. ; Maziarz, M. ; Glasgow, J. ; Tsao, M. S. and Jurisica, I. (2002) In Bioinformatics 18(SUPPL. 1).
Abstract


Motivation: With the increasing number of gene expression databases, the need for more powerful analysis and visualization tools is growing. Many techniques have successfully been applied to unravel latent similarities among genes and/or experiments. Most current systems for microarray data analysis use statistical methods, hierarchical clustering, self-organizing maps, support vector machines, or k-means clustering to organize genes or experiments into 'meaningful' groups. Without an explicit prior bias, almost all of these clustering methods, when applied to gene expression data, not only produce different results but may also produce clusters with little or no biological relevance. Of these methods, agglomerative hierarchical clustering has been the most widely applied, although many limitations have been identified.

Results: Starting with a systematic comparison of the theories underlying different clustering approaches, we have devised a technique that combines tree-structured vector quantization and partitive k-means clustering (BTSVQ). This hybrid technique has revealed clinically relevant clusters in three large publicly available data sets. In contrast to existing systems, our approach is less sensitive to data preprocessing and data normalization. In addition, the clustering results produced by the technique are strongly similar to those of self-organizing maps (SOMs). We discuss the advantages of our approach and the mathematical reasoning behind it.

Availability: The BTSVQ system is implemented in Matlab R12 using the SOM toolbox for the visualization and preprocessing of the data (http://www.cis.hut.fi/projects/somtoolbox/). BTSVQ is available for non-commercial use (http://www.uhnres.utoronto.ca/ta3/BTSVQ).
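The abstract describes BTSVQ only at a high level. To illustrate the general idea of binary tree-structured vector quantization (repeatedly splitting the data into two clusters with k-means, k = 2, so that every tree node carries a centroid or "codevector"), here is a minimal Python sketch. It is not the authors' Matlab R12 implementation and does not use the SOM Toolbox; the function names (`two_means`, `build_tsvq`, `leaves`), the farthest-point seeding, the stopping parameters `max_depth` and `min_size`, and the toy data are all hypothetical choices made for illustration.

```python
"""Minimal sketch of binary tree-structured vector quantization (TSVQ):
recursively split the data in two with k-means (k = 2).
Illustrative only; not the BTSVQ Matlab implementation."""
from dataclasses import dataclass, field
from typing import List, Optional, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors (the codevector)."""
    return [sum(col) / len(points) for col in zip(*points)]


def two_means(points: List[Vector], iters: int = 50) -> List[int]:
    """Label each point 0 or 1 using Lloyd's k-means with k = 2.

    Assumption: deterministic farthest-point seeding instead of random starts.
    """
    a = max(points, key=lambda p: sq_dist(p, points[0]))
    b = max(points, key=lambda p: sq_dist(p, a))
    centers = [list(a), list(b)]
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster empties
                centers[k] = centroid(members)
    return labels


@dataclass
class Node:
    codevector: List[float]  # centroid of the points in this node
    members: List[int]       # row indices into the original data
    children: List["Node"] = field(default_factory=list)


def build_tsvq(data: List[Vector], indices: Optional[List[int]] = None,
               depth: int = 0, max_depth: int = 3, min_size: int = 2) -> Node:
    """Grow a binary tree by splitting each node with two_means.

    Hypothetical stopping rules: stop at max_depth, or when a split would
    leave either child with fewer than min_size points.
    """
    if indices is None:
        indices = list(range(len(data)))
    points = [data[i] for i in indices]
    node = Node(centroid(points), indices)
    if depth >= max_depth or len(indices) < 2 * min_size:
        return node
    labels = two_means(points)
    left = [i for i, lab in zip(indices, labels) if lab == 0]
    right = [i for i, lab in zip(indices, labels) if lab == 1]
    if len(left) < min_size or len(right) < min_size:
        return node  # split too unbalanced (or degenerate): keep as a leaf
    node.children = [build_tsvq(data, part, depth + 1, max_depth, min_size)
                     for part in (left, right)]
    return node


def leaves(node: Node) -> List[List[int]]:
    """Return the member indices of every leaf, left to right."""
    if not node.children:
        return [node.members]
    return [m for child in node.children for m in leaves(child)]


if __name__ == "__main__":
    # Toy 2-D "profiles" forming four tight groups (made-up data).
    data = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9],
            [0.0, 5.0], [0.2, 5.1], [5.0, 0.0], [4.9, 0.1]]
    tree = build_tsvq(data, max_depth=2, min_size=1)
    print(leaves(tree))  # four leaves, one per toy group
```

Farthest-point seeding keeps the sketch deterministic and easy to test; practical k-means code usually runs several random or k-means++ initializations and keeps the split with the lowest distortion.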

author
Sultan, M. ; Wigle, D. A. ; Cumbaa, C. A. ; Maziarz, M. ; Glasgow, J. ; Tsao, M. S. and Jurisica, I.
publishing date
2002
type
Contribution to journal
publication status
published
subject
keywords
Lung cancer, Microarray data clustering and visualization, Self-organizing maps, partitive k-means clustering
in
Bioinformatics
volume
18
issue
SUPPL. 1
publisher
Oxford University Press
external identifiers
  • pmid:12169538
  • scopus:0012321068
ISSN
1367-4803
DOI
10.1093/bioinformatics/18.suppl_1.S111
language
English
LU publication?
no
id
e7b3b85f-34db-4e5f-8dc7-3553a625dc67
date added to LUP
2019-08-05 13:20:08
date last changed
2024-02-15 18:26:37
@article{e7b3b85f-34db-4e5f-8dc7-3553a625dc67,
  abstract     = {{<p>Motivation: With the increasing number of gene expression databases, the need for more powerful analysis and visualization tools is growing. Many techniques have successfully been applied to unravel latent similarities among genes and/or experiments. Most current systems for microarray data analysis use statistical methods, hierarchical clustering, self-organizing maps, support vector machines, or k-means clustering to organize genes or experiments into 'meaningful' groups. Without an explicit prior bias, almost all of these clustering methods, when applied to gene expression data, not only produce different results but may also produce clusters with little or no biological relevance. Of these methods, agglomerative hierarchical clustering has been the most widely applied, although many limitations have been identified. Results: Starting with a systematic comparison of the theories underlying different clustering approaches, we have devised a technique that combines tree-structured vector quantization and partitive k-means clustering (BTSVQ). This hybrid technique has revealed clinically relevant clusters in three large publicly available data sets. In contrast to existing systems, our approach is less sensitive to data preprocessing and data normalization. In addition, the clustering results produced by the technique are strongly similar to those of self-organizing maps (SOMs). We discuss the advantages of our approach and the mathematical reasoning behind it. Availability: The BTSVQ system is implemented in Matlab R12 using the SOM toolbox for the visualization and preprocessing of the data (http://www.cis.hut.fi/projects/somtoolbox/). BTSVQ is available for non-commercial use (http://www.uhnres.utoronto.ca/ta3/BTSVQ).</p>}},
  author       = {{Sultan, M. and Wigle, D. A. and Cumbaa, C. A. and Maziarz, M. and Glasgow, J. and Tsao, M. S. and Jurisica, I.}},
  issn         = {{1367-4803}},
  keywords     = {{Lung cancer; Microarray data clustering and visualization; Self-organizing maps; partitive k-means clustering}},
  language     = {{eng}},
  month        = {{01}},
  number       = {{SUPPL. 1}},
  publisher    = {{Oxford University Press}},
  journal      = {{Bioinformatics}},
  title        = {{Binary tree-structured vector quantization approach to clustering and visualizing microarray data}},
  url          = {{http://dx.doi.org/10.1093/bioinformatics/18.suppl_1.S111}},
  doi          = {{10.1093/bioinformatics/18.suppl_1.S111}},
  volume       = {{18}},
  year         = {{2002}},
}