Regression analysis and modelling of data acquisition for SELDI-TOF mass spectrometry.

Sköld, Martin LU; Rydén, Tobias LU; Samuelsson, Viktoria; Welinder, Charlotte LU; Ekblad, Lars LU; Olsson, Håkan LU and Baldetorp, Bo LU (2007). In Bioinformatics 23(11), p. 1401-1409
Abstract
Motivation: Pre-processing of SELDI-TOF mass spectrometry data is currently performed on a largely ad hoc basis. This makes comparison of results from independent analyses troublesome and does not provide a framework for distinguishing different sources of variation in data. Results: In this article, we consider the task of pooling a large number of single-shot spectra, a task commonly performed automatically by the instrument software. By viewing the underlying statistical problem as one of heteroscedastic linear regression, we provide a framework for introducing robust methods and for dealing with missing data resulting from a limited span of recordable intensity values provided by the instrument. Our framework provides an interpretation of currently used methods as a maximum-likelihood estimator and allows theoretical derivation of its variance. We observe that this variance depends crucially on the total number of ionic species, which can vary considerably between different pooled spectra. This variation in variance can potentially invalidate the results from naive methods of discrimination/classification and we outline appropriate data transformations. Introducing methods from robust statistics did not improve the standard errors of the pooled samples. Imputing missing values using the EM algorithm, however, had a notable effect on the result; for our data, the pooled height of peaks which were frequently truncated increased by up to 30%.
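The abstract's point about truncated intensities can be illustrated with a small sketch. It is not the authors' model: it assumes, purely for illustration, that single-shot readings at one mass position are normally distributed with constant variance and that the instrument clips every reading at a known upper limit. The function name `em_censored_mean`, the limit of 11.0, and the simulated data are all hypothetical. Under those assumptions, the EM algorithm replaces each clipped reading by its conditional expectation given the current parameter estimates and then re-estimates the mean and standard deviation.

```python
import math
import random


def em_censored_mean(values, limit, n_iter=200):
    """EM estimate of (mean, sd) for normal data right-censored at `limit`.

    Illustrative assumption: a reading recorded at or above `limit` only tells
    us that the true intensity was at least `limit`.
    """
    observed = [v for v in values if v < limit]
    n = len(values)
    n_cens = n - len(observed)
    s1 = sum(observed)
    s2 = sum(v * v for v in observed)

    # Start from the naive estimates that treat clipped readings as exact.
    mu = (s1 + n_cens * limit) / n
    var = (s2 + n_cens * limit * limit) / n - mu * mu
    sigma = math.sqrt(max(var, 1e-12))

    for _ in range(n_iter):
        # E-step: first and second moments of a normal truncated below at `limit`.
        a = (limit - mu) / sigma
        pdf = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
        tail = max(0.5 * math.erfc(a / math.sqrt(2.0)), 1e-300)
        lam = pdf / tail
        e1 = mu + sigma * lam
        e2 = sigma * sigma * (1.0 + a * lam - lam * lam) + e1 * e1

        # M-step: update mean and sd using the expected sufficient statistics.
        mu = (s1 + n_cens * e1) / n
        var = (s2 + n_cens * e2) / n - mu * mu
        sigma = math.sqrt(max(var, 1e-12))
    return mu, sigma


# Simulated single-shot intensities (hypothetical values), clipped at LIMIT.
rng = random.Random(0)
LIMIT = 11.0
shots = [min(rng.gauss(10.0, 2.0), LIMIT) for _ in range(5000)]

naive_mean = sum(shots) / len(shots)
em_mean, em_sd = em_censored_mean(shots, LIMIT)
print(f"naive pooled mean: {naive_mean:.2f}, EM pooled mean: {em_mean:.2f}")
```

In this toy setting the naive average of the clipped readings underestimates the true mean, and the EM estimate moves it back up, which is the same direction as the increase in pooled peak height reported in the abstract.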
type: Contribution to journal
publication status: published
in: Bioinformatics
volume: 23
issue: 11
pages: 1401-1409
publisher: Oxford University Press
external identifiers:
  • wos:000247781300013
  • scopus:34447310845
ISSN: 1367-4803
DOI: 10.1093/bioinformatics/btm104
language: English
LU publication?: yes
id: 78f285a9-4bb0-40e5-9664-edbbf6527e83 (old id 166222)
date added to LUP: 2007-07-24 10:29:45
date last changed: 2017-09-06 10:53:50
@article{78f285a9-4bb0-40e5-9664-edbbf6527e83,
  abstract     = {Motivation: Pre-processing of SELDI-TOF mass spectrometry data is currently performed on a largely ad hoc basis. This makes comparison of results from independent analyses troublesome and does not provide a framework for distinguishing different sources of variation in data. Results: In this article, we consider the task of pooling a large number of single-shot spectra, a task commonly performed automatically by the instrument software. By viewing the underlying statistical problem as one of heteroscedastic linear regression, we provide a framework for introducing robust methods and for dealing with missing data resulting from a limited span of recordable intensity values provided by the instrument. Our framework provides an interpretation of currently used methods as a maximum-likelihood estimator and allows theoretical derivation of its variance. We observe that this variance depends crucially on the total number of ionic species, which can vary considerably between different pooled spectra. This variation in variance can potentially invalidate the results from naive methods of discrimination/classification and we outline appropriate data transformations. Introducing methods from robust statistics did not improve the standard errors of the pooled samples. Imputing missing values using the EM algorithm, however, had a notable effect on the result; for our data, the pooled height of peaks which were frequently truncated increased by up to 30%.},
  author       = {Sköld, Martin and Rydén, Tobias and Samuelsson, Viktoria and Welinder, Charlotte and Ekblad, Lars and Olsson, Håkan and Baldetorp, Bo},
  issn         = {1367-4803},
  language     = {eng},
  number       = {11},
  pages        = {1401--1409},
  publisher    = {Oxford University Press},
  journal      = {Bioinformatics},
  title        = {Regression analysis and modelling of data acquisition for SELDI-TOF mass spectrometry.},
  url          = {http://dx.doi.org/10.1093/bioinformatics/btm104},
  volume       = {23},
  year         = {2007},
}