Robust smooth segmentation approach for array CGH data analysis

Huang, J; Gusnanto, Arief; O'Sullivan, Kathleen; Staaf, Johan; Borg, Åke and Pawitan, Yudi (2007) In Bioinformatics 23(18). p.2463-2469
Abstract
Motivation: Array comparative genomic hybridization (aCGH) provides a genome-wide technique to screen for copy number alteration. The existing segmentation approaches for analyzing aCGH data are based on modeling data as a series of discrete segments with unknown boundaries and unknown heights. Although the biological process of copy number alteration is discrete, in reality a variety of biological and experimental factors can cause the signal to deviate from a stepwise function. To take this into account, we propose a smooth segmentation (smoothseg) approach. Methods: To achieve a robust segmentation, we use a doubly heavy-tailed random-effect model. The first heavy-tailed structure on the errors deals with outliers in the observations, and the second deals with possible jumps in the underlying pattern associated with different segments. We develop a fast and reliable computational procedure based on the iterative weighted least-squares algorithm with band-limited matrix inversion. Results: Using simulated and real data sets, we demonstrate how smoothseg can aid in identification of regions with genomic alteration and in classification of samples. For the real data sets, smoothseg leads to smaller false discovery rate and classification error rate than the circular binary segmentation (CBS) algorithm. In a realistic simulation setting, smoothseg is better than wavelet smoothing and CBS in identification of regions with genomic alterations and better than CBS in classification of samples. For comparative analyses, we demonstrate that segmenting the t-statistics performs better than segmenting the data.
type
Contribution to journal
publication status
published
in
Bioinformatics
volume
23
issue
18
pages
2463 - 2469
publisher
Oxford University Press
external identifiers
  • wos:000249818700015
  • scopus:34548725570
ISSN
1367-4803
DOI
10.1093/bioinformatics/btm359
language
English
LU publication?
yes
id
8bec22c8-1923-4e09-a487-eb390e84f335 (old id 656197)
date added to LUP
2007-12-13 08:37:56
date last changed
2017-11-12 03:24:57
@article{8bec22c8-1923-4e09-a487-eb390e84f335,
  abstract     = {Motivation: Array comparative genomic hybridization (aCGH) provides a genome-wide technique to screen for copy number alteration. The existing segmentation approaches for analyzing aCGH data are based on modeling data as a series of discrete segments with unknown boundaries and unknown heights. Although the biological process of copy number alteration is discrete, in reality a variety of biological and experimental factors can cause the signal to deviate from a stepwise function. To take this into account, we propose a smooth segmentation (smoothseg) approach. Methods: To achieve a robust segmentation, we use a doubly heavy-tailed random-effect model. The first heavy-tailed structure on the errors deals with outliers in the observations, and the second deals with possible jumps in the underlying pattern associated with different segments. We develop a fast and reliable computational procedure based on the iterative weighted least-squares algorithm with band-limited matrix inversion. Results: Using simulated and real data sets, we demonstrate how smoothseg can aid in identification of regions with genomic alteration and in classification of samples. For the real data sets, smoothseg leads to smaller false discovery rate and classification error rate than the circular binary segmentation (CBS) algorithm. In a realistic simulation setting, smoothseg is better than wavelet smoothing and CBS in identification of regions with genomic alterations and better than CBS in classification of samples. For comparative analyses, we demonstrate that segmenting the t-statistics performs better than segmenting the data.},
  author       = {Huang, J and Gusnanto, Arief and O'Sullivan, Kathleen and Staaf, Johan and Borg, Åke and Pawitan, Yudi},
  issn         = {1367-4803},
  language     = {eng},
  number       = {18},
  pages        = {2463--2469},
  publisher    = {Oxford University Press},
  journal      = {Bioinformatics},
  title        = {Robust smooth segmentation approach for array CGH data analysis},
  url          = {http://dx.doi.org/10.1093/bioinformatics/btm359},
  volume       = {23},
  year         = {2007},
}