Nonparametric methods for microarray data based on exchangeability and borrowed power
(2005) In Journal of Biopharmaceutical Statistics 15(5). p.783-797
- Abstract
- This article proposes nonparametric inference procedures for analyzing microarray gene expression data that are reliable, robust, and simple to implement. They are conceptually transparent and require no special-purpose software. The analysis begins by normalizing gene expression data in a unique way. The resulting adjusted observations consist of gene-treatment interaction terms (representing differential expression) and error terms. The error terms are assumed to be exchangeable, which is the only substantial assumption. Thus, under a family null hypothesis of no differential expression, the adjusted observations are exchangeable and all permutations of the observations are equally probable. The investigator may use the adjusted observations directly in a distribution-free test method or use their ranks in a rank-based method, where the ranking is taken over the whole data set. For the latter, the essential steps are as follows: 1. Calculate a Wilcoxon rank-sum difference or a corresponding Kruskal-Wallis rank statistic for each gene. 2. Randomly permute the observations and repeat the previous step. 3. Independently repeat the random permutation a suitable number of times. Under the exchangeability assumption, the permutation statistics are independent random draws from a null cumulative distribution function (c.d.f.), which is approximated by their empirical c.d.f. Comparing a gene's test statistic with the empirical c.d.f. shows whether it is outlying and, hence, whether the gene is differentially expressed. This is judged by using an appropriate rejection region or by computing a p-value for each test statistic, taking multiple testing into account. A distribution-free analog of the rank-based approach is also available; its parallel steps are described in the article. The proposed nonparametric analysis tends to give good results with no additional refinement, although a few refinements are presented that may interest some investigators.
The implementation is illustrated with a case application involving differential gene expression in wild-type and knockout mice subjected to an E. coli lipopolysaccharide (LPS) endotoxin treatment, relative to a baseline untreated condition.
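The rank-based steps summarized in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. Assumptions: the two-way (gene and array) centering in `double_center` is a hypothetical stand-in for the article's normalization; the per-gene statistic is a difference in mean ranks, a rescaled form of the Wilcoxon rank-sum difference; permutation statistics from all genes are pooled into one empirical null c.d.f., which is our reading of "borrowed power"; and the Benjamini-Hochberg step-up procedure is a swapped-in choice for the multiple-testing adjustment. All function names are hypothetical.

```python
import numpy as np


def double_center(y: np.ndarray) -> np.ndarray:
    """Remove gene (row) and array (column) main effects by two-way centering.

    Hypothetical stand-in for the article's normalization: what remains in each
    cell is treated as gene-by-treatment interaction plus error.
    """
    return y - y.mean(axis=1, keepdims=True) - y.mean(axis=0, keepdims=True) + y.mean()


def global_ranks(x: np.ndarray) -> np.ndarray:
    """Rank every entry against the whole data set (1 = smallest); ties broken by position."""
    flat = x.ravel()
    ranks = np.empty(flat.size)
    ranks[np.argsort(flat, kind="stable")] = np.arange(1, flat.size + 1)
    return ranks.reshape(x.shape)


def mean_rank_diff(ranks: np.ndarray, treated: np.ndarray) -> np.ndarray:
    """Per-gene difference in mean rank, treated minus control (a rescaled rank-sum difference)."""
    return ranks[:, treated].mean(axis=1) - ranks[:, ~treated].mean(axis=1)


def benjamini_hochberg(p: np.ndarray) -> np.ndarray:
    """Benjamini-Hochberg adjusted p-values (q-values) for false discovery rate control."""
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Step-up: each q-value is the smallest scaled value at or beyond its rank.
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q


def permutation_rank_test(y: np.ndarray, treated: np.ndarray, n_perm: int = 200, seed: int = 0):
    """Two-sided permutation test per gene, with a null c.d.f. pooled over all genes.

    y: genes x arrays matrix of log-expression values (assumed input format).
    treated: NumPy boolean mask over arrays, True for the treatment condition.
    """
    rng = np.random.default_rng(seed)
    ranks = global_ranks(double_center(y))          # ranking taken over the whole data set
    observed = mean_rank_diff(ranks, treated)       # step 1: statistic for each gene

    flat = ranks.ravel()
    null = np.empty((n_perm, y.shape[0]))
    for b in range(n_perm):                         # steps 2-3: repeated random permutations
        permuted = rng.permutation(flat).reshape(ranks.shape)
        null[b] = mean_rank_diff(permuted, treated)

    # Empirical null c.d.f. built from all genes x permutations.
    pooled = np.sort(np.abs(null).ravel())
    n_null = pooled.size
    n_as_extreme = n_null - np.searchsorted(pooled, np.abs(observed), side="left")
    p = (1 + n_as_extreme) / (1 + n_null)
    return observed, p, benjamini_hochberg(p)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = rng.normal(size=(200, 8))                   # simulated data: 200 genes, 8 arrays
    treated = np.array([False] * 4 + [True] * 4)
    y[:5, treated] += 5.0                           # plant differential expression in genes 0-4
    stat, p, q = permutation_rank_test(y, treated)
    print("genes with q < 0.05:", np.flatnonzero(q < 0.05))
```

Pooling the permutations over all genes gives genes × permutations null draws, so even a modest `n_perm` yields a fine-grained empirical c.d.f.; this relies on the exchangeability assumption holding across genes.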
Please use this URL to cite or link to this publication:
https://lup.lub.lu.se/record/208505
- author
- Lee, MLT; Whitmore, GA; Björkbacka, Harry; Freeman, MW
- publishing date
- 2005
- type
- Contribution to journal
- publication status
- published
- keywords
- rank methods, normalization, nonparametric methods, multiple testing, microarray, gene expression, false discovery rate, distribution-free, exchangeable random variables, SAM, statistical analysis
- in
- Journal of Biopharmaceutical Statistics
- volume
- 15
- issue
- 5
- pages
- 783 - 797
- publisher
- Taylor & Francis
- external identifiers
- pmid:16078385
- wos:000236233000003
- scopus:22544456869
- ISSN
- 1520-5711
- DOI
- 10.1081/BIP-200067778
- language
- English
- LU publication?
- yes
- id
- e951fff9-23ca-47e6-9fbe-9529edb167cc (old id 208505)
- date added to LUP
- 2016-04-01 12:17:38
- date last changed
- 2022-04-21 05:26:20
@article{e951fff9-23ca-47e6-9fbe-9529edb167cc,
  abstract = {{This article proposes nonparametric inference procedures for analyzing microarray gene expression data that are reliable, robust, and simple to implement. They are conceptually transparent and require no special-purpose software. The analysis begins by normalizing gene expression data in a unique way. The resulting adjusted observations consist of gene-treatment interaction terms (representing differential expression) and error terms. The error terms are assumed to be exchangeable, which is the only substantial assumption. Thus, under a family null hypothesis of no differential expression, the adjusted observations are exchangeable and all permutations of the observations are equally probable. The investigator may use the adjusted observations directly in a distribution-free test method or use their ranks in a rank-based method, where the ranking is taken over the whole data set. For the latter, the essential steps are as follows: 1. Calculate a Wilcoxon rank-sum difference or a corresponding Kruskal-Wallis rank statistic for each gene. 2. Randomly permute the observations and repeat the previous step. 3. Independently repeat the random permutation a suitable number of times. Under the exchangeability assumption, the permutation statistics are independent random draws from a null cumulative distribution function (c.d.f.), which is approximated by their empirical c.d.f. Comparing a gene's test statistic with the empirical c.d.f. shows whether it is outlying and, hence, whether the gene is differentially expressed. This is judged by using an appropriate rejection region or by computing a p-value for each test statistic, taking multiple testing into account. A distribution-free analog of the rank-based approach is also available; its parallel steps are described in the article. The proposed nonparametric analysis tends to give good results with no additional refinement, although a few refinements are presented that may interest some investigators. The implementation is illustrated with a case application involving differential gene expression in wild-type and knockout mice subjected to an E. coli lipopolysaccharide (LPS) endotoxin treatment, relative to a baseline untreated condition.}},
  author = {Lee, MLT and Whitmore, GA and Björkbacka, Harry and Freeman, MW},
  issn = {{1520-5711}},
  keywords = {{rank methods; normalization; nonparametric methods; multiple testing; microarray; gene expression; false discovery rate; distribution-free; exchangeable random variables; SAM; statistical analysis}},
  language = {{eng}},
  number = {{5}},
  pages = {{783--797}},
  publisher = {{Taylor \& Francis}},
  journal = {{Journal of Biopharmaceutical Statistics}},
  title = {{Nonparametric methods for microarray data based on exchangeability and borrowed power}},
  url = {{http://dx.doi.org/10.1081/BIP-200067778}},
  doi = {{10.1081/BIP-200067778}},
  volume = {{15}},
  year = {{2005}},
}