Power in the Phenotypic Extremes: A Simulation Study of Power in Discovery and Replication of Rare Variants

Guey, Lin T.; Kravic, Jasmina; Melander, Olle; Burtt, Noel P.; Laramie, Jason M.; Lyssenko, Valeriya; Jonsson, Anna; Lindholm, Eero; Tuomi, Tiinamaija; Isomaa, Bo; Nilsson, Peter; Almgren, Peter; Kathiresan, Sekar; Groop, Leif; Seymour, Albert B.; Altshuler, David; Voight, Benjamin F.


Guey, Lin T. ; Kravic, Jasmina ^LU ; Melander, Olle ^LU ; Burtt, Noel P. ; Laramie, Jason M. ; Lyssenko, Valeriya ^LU ; Jonsson, Anna ^LU ; Lindholm, Eero ^LU ; Tuomi, Tiinamaija ; Isomaa, Bo ; et al. (2011) In Genetic Epidemiology 35(4), p. 236-246

Abstract: Next-generation sequencing technologies are making it possible to study the role of rare variants in human disease. Many studies balance statistical power with cost-effectiveness by (a) sampling from phenotypic extremes and (b) utilizing a two-stage design. Two-stage designs include a broad-based discovery phase and selection of a subset of potential causal genes/variants to be further examined in independent samples. We evaluate three parameters: first, the gain in statistical power due to extreme sampling to discover causal variants; second, the informativeness of initial (Phase I) association statistics to select genes/variants for follow-up; third, the impact of extreme and random sampling in (Phase 2) replication. We present a quantitative method to select individuals from the phenotypic extremes of a binary trait, and simulate disease association studies under a variety of sample sizes and sampling schemes. First, we find that while studies sampling from extremes have excellent power to discover rare variants, they have limited power to associate them to phenotype, suggesting high false-negative rates for upcoming studies. Second, consistent with previous studies, we find that the effect sizes estimated in these studies are expected to be systematically larger compared with the overall population effect size; in a well-cited lipids study, we estimate the reported effect to be twofold larger. Third, replication studies require large samples from the general population to have sufficient power; extreme sampling could reduce the required sample size as much as fourfold. Our observations offer practical guidance for the design and interpretation of studies that utilize extreme sampling. Genet. Epidemiol. 35: 236-246, 2011. (c) 2011 Wiley-Liss, Inc.
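The second finding in the abstract (effect sizes estimated from phenotypic extremes exceed the population effect size) can be illustrated with a toy simulation. This is a minimal sketch and not the authors' method: it assumes a simple liability model in which a single rare variant shifts a normally distributed liability, and it assumes liability can be observed directly so that the extremes can be picked by sorting. The function names and every parameter value (carrier frequency, effect size `beta`, prevalence, sample sizes) are hypothetical choices made for illustration.

```python
import random


def simulate_population(n, carrier_freq, beta, rng):
    """Draw (liability, is_carrier) pairs: liability = beta * carrier + N(0, 1) noise."""
    people = []
    for _ in range(n):
        carrier = rng.random() < carrier_freq
        people.append((beta * carrier + rng.gauss(0.0, 1.0), carrier))
    return people


def odds_ratio(cases, controls):
    """Carrier odds ratio; 0.5 is added to every cell so empty cells cannot divide by zero."""
    case_carriers = sum(carrier for _, carrier in cases)
    control_carriers = sum(carrier for _, carrier in controls)
    a = case_carriers + 0.5
    b = len(cases) - case_carriers + 0.5
    c = control_carriers + 0.5
    d = len(controls) - control_carriers + 0.5
    return (a * d) / (b * c)


def compare_sampling(n=200_000, carrier_freq=0.01, beta=1.0,
                     prevalence=0.10, n_extreme=1_000, seed=1):
    """Return (population odds ratio, odds ratio estimated from the two liability tails)."""
    rng = random.Random(seed)
    # Sort ascending by liability (assumed observable, which real studies cannot do directly).
    people = sorted(simulate_population(n, carrier_freq, beta, rng))

    # Binary trait: the top `prevalence` fraction of liabilities are cases.
    n_cases = int(n * prevalence)
    population_or = odds_ratio(people[-n_cases:], people[:-n_cases])

    # Extreme sampling: highest-liability individuals versus lowest-liability individuals.
    extreme_or = odds_ratio(people[-n_extreme:], people[:n_extreme])
    return population_or, extreme_or


if __name__ == "__main__":
    population_or, extreme_or = compare_sampling()
    print(f"population OR: {population_or:.2f}")
    print(f"extreme-sample OR: {extreme_or:.2f}")
```

Under these assumptions the odds ratio from the tails is larger than the population value, because carriers are enriched in the upper tail and nearly absent from the lower tail. The size of the gap depends entirely on the illustrative parameters and says nothing about the twofold estimate reported in the abstract.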

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/1965707

author

Guey, Lin T. ; Kravic, Jasmina ^LU ; Melander, Olle ^LU ; Burtt, Noel P. ; Laramie, Jason M. ; Lyssenko, Valeriya ^LU ; Jonsson, Anna ^LU ; Lindholm, Eero ^LU ; Tuomi, Tiinamaija ; Isomaa, Bo ; Nilsson, Peter ^LU ; Almgren, Peter ; Kathiresan, Sekar ; Groop, Leif ^LU ; Seymour, Albert B. ; Altshuler, David and Voight, Benjamin F.

publishing date

2011

type

Contribution to journal

publication status

published

keywords

* next-generation sequencing
* liability ascertainment
* variant discovery
* replication of association
* phenotype extremes

in

Genetic Epidemiology

volume

35

issue

4

pages

236 - 246

publisher

Wiley-Liss Inc.

external identifiers

wos:000289375400004
scopus:79959557454

ISSN

0741-0395

DOI

10.1002/gepi.20572

language

English

LU publication?

yes

additional info

PMID:21308769

id

a5d54699-2b4d-4be9-b152-cae4de0b066d (old id 1965707)

date added to LUP

2016-04-01 11:08:16

date last changed

2025-10-14 13:14:48

@article{a5d54699-2b4d-4be9-b152-cae4de0b066d,
  abstract     = {{Next-generation sequencing technologies are making it possible to study the role of rare variants in human disease. Many studies balance statistical power with cost-effectiveness by (a) sampling from phenotypic extremes and (b) utilizing a two-stage design. Two-stage designs include a broad-based discovery phase and selection of a subset of potential causal genes/variants to be further examined in independent samples. We evaluate three parameters: first, the gain in statistical power due to extreme sampling to discover causal variants; second, the informativeness of initial (Phase I) association statistics to select genes/variants for follow-up; third, the impact of extreme and random sampling in (Phase 2) replication. We present a quantitative method to select individuals from the phenotypic extremes of a binary trait, and simulate disease association studies under a variety of sample sizes and sampling schemes. First, we find that while studies sampling from extremes have excellent power to discover rare variants, they have limited power to associate them to phenotype, suggesting high false-negative rates for upcoming studies. Second, consistent with previous studies, we find that the effect sizes estimated in these studies are expected to be systematically larger compared with the overall population effect size; in a well-cited lipids study, we estimate the reported effect to be twofold larger. Third, replication studies require large samples from the general population to have sufficient power; extreme sampling could reduce the required sample size as much as fourfold. Our observations offer practical guidance for the design and interpretation of studies that utilize extreme sampling. Genet. Epidemiol. 35: 236-246, 2011. (c) 2011 Wiley-Liss, Inc.}},
  author       = {{Guey, Lin T. and Kravic, Jasmina and Melander, Olle and Burtt, Noel P. and Laramie, Jason M. and Lyssenko, Valeriya and Jonsson, Anna and Lindholm, Eero and Tuomi, Tiinamaija and Isomaa, Bo and Nilsson, Peter and Almgren, Peter and Kathiresan, Sekar and Groop, Leif and Seymour, Albert B. and Altshuler, David and Voight, Benjamin F.}},
  issn         = {{0741-0395}},
  keywords     = {{next-generation sequencing; liability ascertainment; variant discovery; replication of association; phenotype extremes}},
  language     = {{eng}},
  number       = {{4}},
  pages        = {{236--246}},
  publisher    = {{Wiley-Liss Inc.}},
  journal      = {{Genetic Epidemiology}},
  title        = {{Power in the Phenotypic Extremes: A Simulation Study of Power in Discovery and Replication of Rare Variants}},
  url          = {{http://dx.doi.org/10.1002/gepi.20572}},
  doi          = {{10.1002/gepi.20572}},
  volume       = {{35}},
  year         = {{2011}},
}

Lund University Publications
