Nonparametric methods for microarray data based on exchangeability and borrowed power

Lee, MLT; Whitmore, GA; Björkbacka, Harry; Freeman, MW

Lee, MLT; Whitmore, GA; Björkbacka, Harry and Freeman, MW (2005) In Journal of Biopharmaceutical Statistics 15(5). p.783-797

Abstract: This article proposes nonparametric inference procedures for analyzing microarray gene expression data that are reliable, robust, and simple to implement. They are conceptually transparent and require no special-purpose software. The analysis begins by normalizing gene expression data in a unique way. The resulting adjusted observations consist of gene-treatment interaction terms (representing differential expression) and error terms. The error terms are considered to be exchangeable, which is the only substantial assumption. Thus, under a family null hypothesis of no differential expression, the adjusted observations are exchangeable and all permutations of the observations are equally probable. The investigator may use the adjusted observations directly in a distribution-free test method or use their ranks in a rank-based method, where the ranking is taken over the whole data set. For the latter, the essential steps are as follows:

1. Calculate a Wilcoxon rank-sum difference or a corresponding Kruskal-Wallis rank statistic for each gene.
2. Randomly permute the observations and repeat the previous step.
3. Independently repeat the random permutation a suitable number of times.

Under the exchangeability assumption, the permutation statistics are independent random draws from a null cumulative distribution function (c.d.f.) approximated by the empirical c.d.f. Reference to the empirical c.d.f. tells if the test statistic for a gene is outlying and, hence, shows differential expression. This feature is judged by using an appropriate rejection region or computing a p-value for each test statistic, taking into account multiple testing. The distribution-free analog of the rank-based approach is also available and has parallel steps, which are described in the article. The proposed nonparametric analysis tends to give good results with no additional refinement, although a few refinements are presented that may interest some investigators. The implementation is illustrated with a case application involving differential gene expression in wild-type and knockout mice of an E. coli lipopolysaccharide (LPS) endotoxin treatment, relative to a baseline untreated condition.
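The three rank-based steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: the statistic is taken to be the mean rank in one group minus the mean rank in the other, the p-value is two-sided with a "+1" correction, the null statistics from all genes are pooled into one empirical distribution, and the data are synthetic. The function names (`global_ranks`, `rank_differences`, `permutation_pvalues`) are hypothetical. The normalization step and any multiple-testing adjustment are left out.

```python
import random
from bisect import bisect_left


def global_ranks(values):
    """Rank a flat list over the whole data set (1 = smallest), averaging ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mid = (i + j) / 2 + 1  # average rank shared by the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = mid
        i = j + 1
    return ranks


def rank_differences(flat_ranks, n_genes, labels):
    """Step 1 (assumed form): per gene, mean rank in group 1 minus group 0."""
    n = len(labels)
    stats = []
    for g in range(n_genes):
        row = flat_ranks[g * n:(g + 1) * n]
        g1 = [r for r, lab in zip(row, labels) if lab == 1]
        g0 = [r for r, lab in zip(row, labels) if lab == 0]
        stats.append(sum(g1) / len(g1) - sum(g0) / len(g0))
    return stats


def permutation_pvalues(data, labels, n_perm=200, seed=0):
    """Steps 2-3: permute all ranks, pool the statistics, read off p-values."""
    rng = random.Random(seed)
    n_genes = len(data)
    ranks = global_ranks([x for row in data for x in row])
    observed = rank_differences(ranks, n_genes, labels)

    null = []
    shuffled = ranks[:]
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute over the whole data set
        null.extend(abs(s) for s in rank_differences(shuffled, n_genes, labels))
    null.sort()

    pvals = []
    for s in observed:
        n_ge = len(null) - bisect_left(null, abs(s))  # null values >= |s|
        pvals.append((n_ge + 1) / (len(null) + 1))
    return observed, pvals


# Synthetic example: 50 genes, 4 + 4 arrays; only gene 0 is shifted.
demo_rng = random.Random(1)
labels = [0, 0, 0, 0, 1, 1, 1, 1]
data = [[-100.0, -101.0, -102.0, -103.0, 100.0, 101.0, 102.0, 103.0]]
data += [[demo_rng.uniform(-1, 1) for _ in labels] for _ in range(49)]
observed, pvals = permutation_pvalues(data, labels)
```

Because the ranks are taken over the whole data set and every permutation contributes one statistic per gene, the empirical null here is built from `n_perm` times the number of genes. The resulting p-values would still need a multiple-testing adjustment before genes are declared differentially expressed.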

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/208505

author

Lee, MLT; Whitmore, GA; Björkbacka, Harry and Freeman, MW

organization

Cardiovascular Research - Immunity and Atherosclerosis (research group)

publishing date

2005

type

Contribution to journal

publication status

published

subject

Cardiology and Cardiovascular Disease

keywords

rank methods, normalization, nonparametric methods, multiple testing, microarray, gene expression, false discovery rate, distribution-free, exchangeable random variables, SAM, statistical analysis

in

Journal of Biopharmaceutical Statistics

volume

15

issue

5

pages

783 - 797

publisher

Taylor & Francis

external identifiers

pmid:16078385
wos:000236233000003
scopus:22544456869

ISSN

1520-5711

DOI

10.1081/BIP-200067778

language

English

LU publication?

yes

id

e951fff9-23ca-47e6-9fbe-9529edb167cc (old id 208505)

date added to LUP

2016-04-01 12:17:38

date last changed

2025-10-14 11:18:16

@article{e951fff9-23ca-47e6-9fbe-9529edb167cc,
  abstract     = {{This article proposes nonparametric inference procedures for analyzing microarray gene expression data that are reliable, robust, and simple to implement. They are conceptually transparent and require no special-purpose software. The analysis begins by normalizing gene expression data in a unique way. The resulting adjusted observations consist of gene-treatment interaction terms (representing differential expression) and error terms. The error terms are considered to be exchangeable, which is the only substantial assumption. Thus, under a family null hypothesis of no differential expression, the adjusted observations are exchangeable and all permutations of the observations are equally probable. The investigator may use the adjusted observations directly in a distribution-free test method or use their ranks in a rank-based method, where the ranking is taken over the whole data set. For the latter, the essential steps are as follows: 1. Calculate a Wilcoxon rank-sum difference or a corresponding Kruskal-Wallis rank statistic for each gene. 2. Randomly permute the observations and repeat the previous step. 3. Independently repeat the random permutation a suitable number of times. Under the exchangeability assumption, the permutation statistics are independent random draws from a null cumulative distribution function (c.d.f.) approximated by the empirical c.d.f. Reference to the empirical c.d.f. tells if the test statistic for a gene is outlying and, hence, shows differential expression. This feature is judged by using an appropriate rejection region or computing a p-value for each test statistic, taking into account multiple testing. The distribution-free analog of the rank-based approach is also available and has parallel steps, which are described in the article. The proposed nonparametric analysis tends to give good results with no additional refinement, although a few refinements are presented that may interest some investigators. The implementation is illustrated with a case application involving differential gene expression in wild-type and knockout mice of an E. coli lipopolysaccharide (LPS) endotoxin treatment, relative to a baseline untreated condition.}},
  author       = {{Lee, MLT and Whitmore, GA and Björkbacka, Harry and Freeman, MW}},
  issn         = {{1520-5711}},
  keywords     = {{rank methods; normalization; nonparametric methods; multiple testing; microarray; gene expression; false discovery rate; distribution-free; exchangeable random variables; SAM; statistical analysis}},
  language     = {{eng}},
  number       = {{5}},
  pages        = {{783--797}},
  publisher    = {{Taylor \& Francis}},
  journal      = {{Journal of Biopharmaceutical Statistics}},
  title        = {{Nonparametric methods for microarray data based on exchangeability and borrowed power}},
  url          = {{http://dx.doi.org/10.1081/BIP-200067778}},
  doi          = {{10.1081/BIP-200067778}},
  volume       = {{15}},
  year         = {{2005}},
}

Lund University Publications
