Nonparametric methods for microarray data based on exchangeability and borrowed power

Lee, MLT; Whitmore, GA; Björkbacka, Harry and Freeman, MW (2005). In Journal of Biopharmaceutical Statistics 15(5), p. 783-797
Abstract
This article proposes nonparametric inference procedures for analyzing microarray gene expression data that are reliable, robust, and simple to implement. They are conceptually transparent and require no special-purpose software. The analysis begins by normalizing gene expression data in a unique way. The resulting adjusted observations consist of gene-treatment interaction terms (representing differential expression) and error terms. The error terms are considered to be exchangeable, which is the only substantial assumption. Thus, under a family null hypothesis of no differential expression, the adjusted observations are exchangeable and all permutations of the observations are equally probable. The investigator may use the adjusted observations directly in a distribution-free test method or use their ranks in a rank-based method, where the ranking is taken over the whole data set. For the latter, the essential steps are as follows: 1. Calculate a Wilcoxon rank-sum difference or a corresponding Kruskal-Wallis rank statistic for each gene. 2. Randomly permute the observations and repeat the previous step. 3. Independently repeat the random permutation a suitable number of times. Under the exchangeability assumption, the permutation statistics are independent random draws from a null cumulative distribution function (c.d.f.) approximated by the empirical c.d.f. Reference to the empirical c.d.f. tells if the test statistic for a gene is outlying and, hence, shows differential expression. This feature is judged by using an appropriate rejection region or computing a p-value for each test statistic, taking into account multiple testing. The distribution-free analog of the rank-based approach is also available and has parallel steps which are described in the article. The proposed nonparametric analysis tends to give good results with no additional refinement, although a few refinements are presented that may interest some investigators.
The implementation is illustrated with a case application involving differential gene expression in wild-type and knockout mice of an E. coli lipopolysaccharide (LPS) endotoxin treatment, relative to a baseline untreated condition.
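The three rank-based steps listed in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`global_ranks`, `rank_difference`, `permutation_pvalues`) and the toy data are hypothetical, the design is assumed to have two groups, the "Wilcoxon rank-sum difference" is taken to be the difference in mean ranks between the groups, the permutation statistics are pooled over all genes to form the empirical null c.d.f., and the input is assumed to be already normalized. The normalization step and the multiple-testing adjustment described in the article are not shown.

```python
import random
from bisect import bisect_left


def global_ranks(data):
    """Rank every observation over the whole genes x arrays matrix (ties get midranks)."""
    flat = sorted((v, g, j) for g, row in enumerate(data) for j, v in enumerate(row))
    ranks = [[0.0] * len(row) for row in data]
    i = 0
    while i < len(flat):
        k = i
        while k + 1 < len(flat) and flat[k + 1][0] == flat[i][0]:
            k += 1
        mid = (i + k) / 2 + 1  # average of the 1-based ranks i+1 .. k+1
        for _, g, j in flat[i:k + 1]:
            ranks[g][j] = mid
        i = k + 1
    return ranks


def rank_difference(row, group):
    """Assumed statistic: mean rank in group 1 minus mean rank in group 0."""
    a = [r for r, s in zip(row, group) if s == 1]
    b = [r for r, s in zip(row, group) if s == 0]
    return sum(a) / len(a) - sum(b) / len(b)


def permutation_pvalues(data, group, n_perm=200, seed=0):
    """Return per-gene statistics and two-sided permutation p-values."""
    rng = random.Random(seed)
    ranks = global_ranks(data)
    # Step 1: one rank statistic per gene.
    observed = [rank_difference(row, group) for row in ranks]
    n_cols = len(group)
    pool = [r for row in ranks for r in row]
    null = []
    # Steps 2 and 3: permute all ranks, recompute the statistics, repeat.
    for _ in range(n_perm):
        rng.shuffle(pool)
        for g in range(len(ranks)):
            perm_row = pool[g * n_cols:(g + 1) * n_cols]
            null.append(abs(rank_difference(perm_row, group)))
    null.sort()
    # Empirical null c.d.f.: share of permutation statistics at least as extreme.
    pvals = []
    for stat in observed:
        n_ge = len(null) - bisect_left(null, abs(stat))
        pvals.append((n_ge + 1) / (len(null) + 1))
    return observed, pvals


# Toy data (made up for illustration): 5 genes x 6 arrays, gene 0 shifted in group 1.
group = [0, 0, 0, 1, 1, 1]
data = [
    [0.1, -0.2, 0.0, 5.0, 5.2, 4.9],
    [0.3, -0.1, 0.2, 0.25, -0.15, 0.05],
    [-0.3, 0.15, -0.05, -0.25, 0.35, 0.12],
    [0.4, -0.4, 0.07, -0.35, 0.45, -0.02],
    [-0.12, 0.22, 0.02, 0.17, -0.22, 0.32],
]
stats, pvals = permutation_pvalues(data, group)
for g, (s, p) in enumerate(zip(stats, pvals)):
    print(f"gene {g}: rank difference = {s:.2f}, p = {p:.4f}")
```

Pooling the permutation statistics across genes is one reading of the "borrowed power" in the title; a per-gene null distribution would be a different design choice. The raw p-values printed here would still need a multiple-testing adjustment before any gene is declared differentially expressed.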
author: Lee, MLT; Whitmore, GA; Björkbacka, Harry and Freeman, MW
publishing date: 2005
type: Contribution to journal
publication status: published
keywords: rank methods, normalization, nonparametric methods, multiple testing, microarray, gene expression, false discovery rate, distribution-free, exchangeable random variables, SAM, statistical analysis
in: Journal of Biopharmaceutical Statistics
volume: 15
issue: 5
pages: 783-797
publisher: Taylor & Francis
external identifiers:
  • pmid:16078385
  • wos:000236233000003
  • scopus:22544456869
ISSN: 1520-5711
DOI: 10.1081/BIP-200067778
language: English
LU publication?: yes
id: e951fff9-23ca-47e6-9fbe-9529edb167cc (old id 208505)
date added to LUP: 2007-08-20 13:21:26
date last changed: 2017-01-01 05:00:58
@article{e951fff9-23ca-47e6-9fbe-9529edb167cc,
  abstract     = {This article proposes nonparametric inference procedures for analyzing microarray gene expression data that are reliable, robust, and simple to implement. They are conceptually transparent and require no special-purpose software. The analysis begins by normalizing gene expression data in a unique way. The resulting adjusted observations consist of gene-treatment interaction terms (representing differential expression) and error terms. The error terms are considered to be exchangeable, which is the only substantial assumption. Thus, under a family null hypothesis of no differential expression, the adjusted observations are exchangeable and all permutations of the observations are equally probable. The investigator may use the adjusted observations directly in a distribution-free test method or use their ranks in a rank-based method, where the ranking is taken over the whole data set. For the latter, the essential steps are as follows: 1. Calculate a Wilcoxon rank-sum difference or a corresponding Kruskal-Wallis rank statistic for each gene. 2. Randomly permute the observations and repeat the previous step. 3. Independently repeat the random permutation a suitable number of times. Under the exchangeability assumption, the permutation statistics are independent random draws from a null cumulative distribution function (c.d.f.) approximated by the empirical c.d.f. Reference to the empirical c.d.f. tells if the test statistic for a gene is outlying and, hence, shows differential expression. This feature is judged by using an appropriate rejection region or computing a p-value for each test statistic, taking into account multiple testing. The distribution-free analog of the rank-based approach is also available and has parallel steps which are described in the article. The proposed nonparametric analysis tends to give good results with no additional refinement, although a few refinements are presented that may interest some investigators.
The implementation is illustrated with a case application involving differential gene expression in wild-type and knockout mice of an E. coli lipopolysaccharide (LPS) endotoxin treatment, relative to a baseline untreated condition.},
  author       = {Lee, MLT and Whitmore, GA and Björkbacka, Harry and Freeman, MW},
  issn         = {1520-5711},
  keywords     = {rank methods,normalization,nonparametric methods,multiple testing,microarray,gene expression,false discovery rate,distribution-free,exchangeable random variables,SAM,statistical analysis},
  language     = {eng},
  number       = {5},
  pages        = {783--797},
  publisher    = {Taylor \& Francis},
  journal      = {Journal of Biopharmaceutical Statistics},
  title        = {Nonparametric methods for microarray data based on exchangeability and borrowed power},
  url          = {http://dx.doi.org/10.1081/BIP-200067778},
  volume       = {15},
  year         = {2005},
}