Fast iterative gene clustering based on information theoretic criteria for selecting the cluster structure.

Giurcăneanu, Ciprian Doru; Tăbuş, Ioan; Astola, Jaakko; Ollila, Juha; Vihinen, Mauno

(2004) In Journal of Computational Biology 11(4). p.660-682

Abstract: Grouping of genes into clusters according to their expression levels is important for deriving biological information, e.g., on gene functions, from microarray and other related analyses. The paper introduces a method, based on the minimum description length (MDL) principle, for selecting the number of clusters in gene expression data. The main feature of the new method is its ability to estimate the number of clusters quickly according to the sound MDL principle, without exhaustive evaluation of all possible partitions of the gene set. The estimation method can be used in conjunction with various clustering algorithms. A recent clustering algorithm based on principal component analysis, the "gene shaving" (GS) procedure, can be modified to use the new MDL estimation method in place of the Gap statistic originally used in the GS algorithm. The resulting clustering algorithm is shown to perform better than GS-Gap and CEM (classification expectation maximization) in simulations using artificial data. The proposed method is applied to B-cell differentiation data, and the resulting clusters are compared with those found by self-organizing maps (SOM).
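The abstract describes choosing the number of clusters by minimizing a description length instead of using the Gap statistic. This sketch does not reproduce the paper's MDL criterion or its gene-shaving integration. It is a minimal Python illustration of the general idea, under the following stated assumptions:

- k-means with k-means++ seeding, not gene shaving, produces a candidate partition for each k.
- A crude two-part code length scores each partition. It combines a Gaussian data cost with one shared isotropic variance, an entropy cost for the cluster labels, and (1/2) log n nats per parameter, which is a BIC-style approximation.

All function names (`kmeans`, `description_length`, `select_k`, `make_demo_data`) and the synthetic data are hypothetical.

```python
import math
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, iters=100):
    """Lloyd's k-means with k-means++ seeding (stand-in for gene shaving).
    Returns (labels, sum of squared errors)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # k-means++: next center drawn with probability proportional to D^2
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # keep the old center if a cluster becomes empty
            new_centers.append(
                tuple(sum(col) / len(members) for col in zip(*members)) if members else centers[j]
            )
        if new_centers == centers:
            break
        centers = new_centers
    labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, sse


def description_length(points, labels, sse, k):
    """Hypothetical two-part code length in nats (BIC-style, NOT the paper's criterion)."""
    n, d = len(points), len(points[0])
    sigma2 = max(sse / (n * d), 1e-12)  # shared isotropic variance (MLE), floored
    # negative log-likelihood of the data at the MLE: (nd/2) log(2*pi*e*sigma^2)
    data_cost = 0.5 * n * d * math.log(2 * math.pi * math.e * sigma2)
    # cost of transmitting each point's cluster label (empirical entropy)
    label_cost = -sum(c * math.log(c / n) for c in Counter(labels).values())
    # k*d means + 1 variance + (k-1) mixing proportions, (1/2) log n each
    param_cost = 0.5 * (k * d + 1 + (k - 1)) * math.log(n)
    return data_cost + label_cost + param_cost


def select_k(points, k_max, restarts=10, seed=0):
    """Return the k in 1..k_max with the smallest description length, plus all scores."""
    rng = random.Random(seed)
    scores = {}
    for k in range(1, k_max + 1):
        # best of several restarts (lowest SSE) to avoid poor local optima
        labels, sse = min((kmeans(points, k, rng) for _ in range(restarts)), key=lambda r: r[1])
        scores[k] = description_length(points, labels, sse, k)
    return min(scores, key=scores.get), scores


def make_demo_data(seed=42, per_cluster=30):
    """Three well-separated 2-D Gaussian blobs (synthetic, for illustration only)."""
    rng = random.Random(seed)
    centres = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    return [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
            for cx, cy in centres for _ in range(per_cluster)]


if __name__ == "__main__":
    best_k, scores = select_k(make_demo_data(), k_max=6)
    print(best_k, {k: round(v, 1) for k, v in scores.items()})
```

The label and parameter costs grow with k while the data cost shrinks, so the minimum trades goodness of fit against model complexity. Swapping in a different clustering algorithm only changes how each candidate partition is produced, which mirrors the abstract's point that the estimator can be paired with various clustering algorithms.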

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/3635446

author

Giurcăneanu, Ciprian Doru ; Tăbuş, Ioan ; Astola, Jaakko ; Ollila, Juha and Vihinen, Mauno (LU)

publishing date

2004

type

Contribution to journal

publication status

published

subject

Medical Genetics and Genomics (including Gene Therapy)

keywords

B-Lymphocytes: cytology, B-Lymphocytes: physiology, Gene Expression Profiling: statistics & numerical data

in

Journal of Computational Biology

volume

11

issue

4

pages

660 - 682

publisher

Mary Ann Liebert, Inc.

external identifiers

pmid:15579237
scopus:4544279049

ISSN

1557-8666

DOI

10.1089/1066527041887285

language

English

LU publication?

no

id

c698a663-fd56-4e12-a0d6-4ad4751752bc (old id 3635446)

alternative location

http://www.ncbi.nlm.nih.gov/pubmed/15579237?dopt=Abstract

date added to LUP

2016-04-04 08:44:27

date last changed

2025-10-14 11:32:38

@article{c698a663-fd56-4e12-a0d6-4ad4751752bc,
  abstract     = {{Grouping of genes into clusters according to their expression levels is important for deriving biological information, e.g., on gene functions, from microarray and other related analyses. The paper introduces a method, based on the minimum description length (MDL) principle, for selecting the number of clusters in gene expression data. The main feature of the new method is its ability to estimate the number of clusters quickly according to the sound MDL principle, without exhaustive evaluation of all possible partitions of the gene set. The estimation method can be used in conjunction with various clustering algorithms. A recent clustering algorithm based on principal component analysis, the "gene shaving" (GS) procedure, can be modified to use the new MDL estimation method in place of the Gap statistic originally used in the GS algorithm. The resulting clustering algorithm is shown to perform better than GS-Gap and CEM (classification expectation maximization) in simulations using artificial data. The proposed method is applied to B-cell differentiation data, and the resulting clusters are compared with those found by self-organizing maps (SOM).}},
  author       = {{Giurcăneanu, Ciprian Doru and Tăbuş, Ioan and Astola, Jaakko and Ollila, Juha and Vihinen, Mauno}},
  issn         = {{1557-8666}},
  keywords     = {{B-Lymphocytes: cytology; B-Lymphocytes: physiology; Gene Expression Profiling: statistics & numerical data}},
  language     = {{eng}},
  number       = {{4}},
  pages        = {{660--682}},
  publisher    = {{Mary Ann Liebert, Inc.}},
  journal      = {{Journal of Computational Biology}},
  title        = {{Fast iterative gene clustering based on information theoretic criteria for selecting the cluster structure.}},
  url          = {{http://dx.doi.org/10.1089/1066527041887285}},
  doi          = {{10.1089/1066527041887285}},
  volume       = {{11}},
  year         = {{2004}},
}

Lund University Publications

LUND UNIVERSITY LIBRARIES
