
Lund University Publications

LUND UNIVERSITY LIBRARIES

Monitoring of technical variation in quantitative high-throughput datasets.

Lauss, Martin LU; Visne, Ilhami; Kriegner, Albert; Ringnér, Markus LU; Jönsson, Göran B LU and Höglund, Mattias LU (2013) In Cancer Informatics 12(Sep 23). p.193-201
Abstract
High-dimensional datasets can be confounded by variation from technical sources, such as batches. Undetected batch effects can have severe consequences for the validity of a study's conclusion(s). We evaluate high-throughput RNAseq and miRNAseq as well as DNA methylation and gene expression microarray datasets, mainly from the Cancer Genome Atlas (TCGA) project, with respect to technical and biological annotations. We observe technical bias in these datasets and discuss corrective interventions. We then suggest a general procedure to control study design, detect technical bias using linear regression of principal components, correct for batch effects, and re-evaluate principal components. This procedure is implemented in the R package swamp, and as graphical user interface software. In conclusion, high-throughput platforms that generate continuous measurements are sensitive to various forms of technical bias. For such data, monitoring of technical variation is an important analysis step.
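The detection, correction and re-evaluation steps outlined in the abstract can be illustrated with a short Python/NumPy sketch. It is not the swamp package (which is written in R) and does not reproduce its interface; every function name below is hypothetical. Assumptions: the data matrix has features in rows and samples in columns, the technical annotation is a single categorical label per sample (for example a batch or plate ID), the association of each principal component with that label is summarized by the R² of a one-way linear regression (an F-test p-value would be an equally reasonable summary), and the correction is plain per-batch mean-centering rather than a model-based method such as ComBat.

```python
import numpy as np


def pc_annotation_r2(data, labels, n_pcs=5):
    """Hypothetical helper: R^2 of each top principal component regressed on a
    categorical sample annotation (e.g. batch). `data` is features x samples."""
    x = np.asarray(data, dtype=float)
    x = x - x.mean(axis=1, keepdims=True)           # center each feature across samples
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    scores = vt[:n_pcs].T * s[:n_pcs]               # samples x n_pcs principal component scores
    labels = np.asarray(labels)
    levels = np.unique(labels)
    # Design matrix: intercept plus one dummy column per non-reference level.
    design = np.column_stack(
        [np.ones(len(labels))] + [(labels == lv).astype(float) for lv in levels[1:]]
    )
    r2 = []
    for k in range(scores.shape[1]):
        y = scores[:, k]
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss = np.sum((y - design @ coef) ** 2)      # residual sum of squares
        tss = np.sum((y - y.mean()) ** 2)           # total sum of squares
        r2.append(1.0 - rss / tss if tss > 0 else 0.0)
    return np.array(r2)


def center_by_batch(data, labels):
    """Hypothetical helper: shift each batch so every feature's batch mean equals
    its overall mean (a simple location-only batch correction)."""
    x = np.asarray(data, dtype=float).copy()
    labels = np.asarray(labels)
    overall = x.mean(axis=1, keepdims=True)
    for lv in np.unique(labels):
        cols = labels == lv
        x[:, cols] -= x[:, cols].mean(axis=1, keepdims=True) - overall
    return x


# Simulated example (made-up data): 200 features, 20 samples in two batches,
# with an additive shift of 2.0 on every feature in batch "B".
rng = np.random.default_rng(0)
batch = np.array(["A"] * 10 + ["B"] * 10)
data = rng.normal(size=(200, 20))
data[:, batch == "B"] += 2.0

before = pc_annotation_r2(data, batch, n_pcs=3)                         # detect
after = pc_annotation_r2(center_by_batch(data, batch), batch, n_pcs=3)  # correct, re-evaluate
print("R^2 before:", before.round(3))
print("R^2 after: ", after.round(3))
```

On the simulated data the first component should be almost entirely explained by batch before correction. After per-batch centering, every feature has the same mean in both batches, so the batch label explains essentially none of the variance in any component. Mean-centering can also remove genuine biological signal when batch is confounded with a biological group, which is why checking the study design comes first in the procedure.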
author
Lauss, Martin; Visne, Ilhami; Kriegner, Albert; Ringnér, Markus; Jönsson, Göran B and Höglund, Mattias
publishing date
2013
type
Contribution to journal
publication status
published
in
Cancer Informatics
volume
12
issue
Sep 23
pages
193 - 201
publisher
Libertas Academica
external identifiers
  • pmid:24092958
  • scopus:84884558982
ISSN
1176-9351
DOI
10.4137/CIN.S12862
language
English
LU publication?
yes
id
2d64ff30-ca1b-456d-957e-02968028efd8 (old id 4143705)
alternative location
http://www.ncbi.nlm.nih.gov/pubmed/24092958?dopt=Abstract
date added to LUP
2016-04-01 14:21:16
date last changed
2022-04-06 18:12:15
@article{2d64ff30-ca1b-456d-957e-02968028efd8,
  abstract     = {{High-dimensional datasets can be confounded by variation from technical sources, such as batches. Undetected batch effects can have severe consequences for the validity of a study's conclusion(s). We evaluate high-throughput RNAseq and miRNAseq as well as DNA methylation and gene expression microarray datasets, mainly from the Cancer Genome Atlas (TCGA) project, with respect to technical and biological annotations. We observe technical bias in these datasets and discuss corrective interventions. We then suggest a general procedure to control study design, detect technical bias using linear regression of principal components, correct for batch effects, and re-evaluate principal components. This procedure is implemented in the R package swamp, and as graphical user interface software. In conclusion, high-throughput platforms that generate continuous measurements are sensitive to various forms of technical bias. For such data, monitoring of technical variation is an important analysis step.}},
  author       = {{Lauss, Martin and Visne, Ilhami and Kriegner, Albert and Ringnér, Markus and Jönsson, Göran B and Höglund, Mattias}},
  issn         = {{1176-9351}},
  language     = {{eng}},
  number       = {{Sep 23}},
  pages        = {{193--201}},
  publisher    = {{Libertas Academica}},
  journal      = {{Cancer Informatics}},
  title        = {{Monitoring of technical variation in quantitative high-throughput datasets.}},
  url          = {{https://lup.lub.lu.se/search/files/3930611/4253834}},
  doi          = {{10.4137/CIN.S12862}},
  volume       = {{12}},
  year         = {{2013}},
}