A probabilistic treatment of the missing spot problem in 2D gel electrophoresis experiments

Krogh, Morten; Fernandez, Celine; Teilum, Maria; Bengtsson, Sofia; James, Peter

(2007) In Journal of Proteome Research 6(8). p.3335-3343

Abstract: Two-dimensional SDS-PAGE gel electrophoresis using post-run staining is widely used to measure the abundances of thousands of protein spots simultaneously. Usually, the protein abundances of two or more biological groups are compared using biological and technical replicates. After gel separation and staining, the spots are detected, spot volumes are quantified, and spots are matched across gels. There are almost always many missing values in the resulting data set. The missing values arise either because the corresponding proteins have very low abundances (or are absent) or because of experimental errors such as incomplete/over focusing in the first dimension or varying run times in the second dimension as well as faulty spot detection and matching. In this study, we show that the probability for a spot to be missing can be modeled by a logistic regression function of the logarithm of the volume. Furthermore, we present an algorithm that takes a set of gels with technical and biological replicates as input and estimates the average protein abundances in the biological groups from the number of missing spots and measured volumes of the present spots using a maximum likelihood approach. Confidence intervals for abundances and p-values for differential expression between two groups are calculated using bootstrap sampling. The algorithm is compared to two standard approaches, one that discards missing values and one that sets all missing values to zero. We have evaluated this approach in two different gel data sets of different biological origin. An R-program, implementing the algorithm, is freely available at http://bioinfo.thep.lu.se/MissingValues2Dgels.html.
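The idea in the abstract of estimating abundance from both the measured volumes and the number of missing spots can be illustrated with a small sketch. This is not the authors' program. It assumes, purely for illustration, that log-volumes within one group are normally distributed with a known spread `sigma`, that the missing probability follows a logistic curve with hypothetical parameters `x_half` (log-volume at which half the spots are missing) and `slope`, and that a simple grid search is good enough to maximise the likelihood. All function and parameter names are invented for this example.

```python
import math


def p_missing(x, x_half, slope):
    """Assumed logistic model: P(spot missing | log-volume x)."""
    return 1.0 / (1.0 + math.exp(slope * (x - x_half)))


def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def marginal_p_missing(mu, sigma, x_half, slope, n_grid=200):
    """P(missing) averaged over log-volumes ~ Normal(mu, sigma).

    Midpoint-rule integration over mu +/- 8 sigma.
    """
    lo = mu - 8.0 * sigma
    width = 16.0 * sigma / n_grid
    total = 0.0
    for i in range(n_grid):
        x = lo + (i + 0.5) * width
        total += math.exp(normal_logpdf(x, mu, sigma)) * p_missing(x, x_half, slope)
    return total * width


def log_likelihood(mu, observed, n_missing, sigma, x_half, slope):
    """Log-likelihood of a group mean mu given present and missing spots."""
    ll = 0.0
    for x in observed:
        # A present spot contributes its density times P(not missing | x).
        ll += normal_logpdf(x, mu, sigma)
        ll += math.log(1.0 - p_missing(x, x_half, slope))
    if n_missing > 0:
        # Each missing spot contributes the marginal missing probability.
        ll += n_missing * math.log(marginal_p_missing(mu, sigma, x_half, slope))
    return ll


def estimate_mu(observed, n_missing, sigma, x_half, slope,
                lo=0.0, hi=10.0, steps=500):
    """Grid-search maximum likelihood estimate of the mean log-volume."""
    grid = (lo + i * (hi - lo) / steps for i in range(steps + 1))
    return max(grid, key=lambda mu: log_likelihood(
        mu, observed, n_missing, sigma, x_half, slope))


if __name__ == "__main__":
    observed = [4.8, 5.1, 5.4, 4.7]  # made-up log-volumes of present spots
    for n_missing in (0, 3):
        mu_hat = estimate_mu(observed, n_missing,
                             sigma=1.0, x_half=3.0, slope=2.0)
        print(n_missing, "missing ->", round(mu_hat, 2))
```

With no missing spots the estimate reduces to the mean of the observed log-volumes, because the detection term does not depend on `mu`. Each missing spot pulls the estimate downward, which is the behaviour that separates this approach from simply discarding missing values.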

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/646358

author

Krogh, Morten ^LU ; Fernandez, Celine ^LU ; Teilum, Maria ^LU ; Bengtsson, Sofia and James, Peter ^LU

publishing date

2007

type

Contribution to journal

publication status

published

keywords

missing values, maximum likelihood, 2D-PAGE

in

Journal of Proteome Research

volume

6

issue

8

pages

3335 - 3343

publisher

The American Chemical Society (ACS)

external identifiers

wos:000248683200041
scopus:34548149778

ISSN

1535-3893

DOI

10.1021/pr070137p

language

English

LU publication?

yes

additional info

The information about affiliations in this record was updated in December 2015. The record was previously connected to the following departments: Computational biology and biological physics (000006113), Department of Immunotechnology (011029300), Laboratory for Experimental Brain Research (013041000), Molecular Endocrinology (013212018)

id

6c878b1c-f8e8-4973-bda6-8a1cfa05643c (old id 646358)

date added to LUP

2016-04-01 11:45:46

date last changed

2025-10-14 09:44:32

@article{6c878b1c-f8e8-4973-bda6-8a1cfa05643c,
  abstract     = {{Two-dimensional SDS-PAGE gel electrophoresis using post-run staining is widely used to measure the abundances of thousands of protein spots simultaneously. Usually, the protein abundances of two or more biological groups are compared using biological and technical replicates. After gel separation and staining, the spots are detected, spot volumes are quantified, and spots are matched across gels. There are almost always many missing values in the resulting data set. The missing values arise either because the corresponding proteins have very low abundances (or are absent) or because of experimental errors such as incomplete/over focusing in the first dimension or varying run times in the second dimension as well as faulty spot detection and matching. In this study, we show that the probability for a spot to be missing can be modeled by a logistic regression function of the logarithm of the volume. Furthermore, we present an algorithm that takes a set of gels with technical and biological replicates as input and estimates the average protein abundances in the biological groups from the number of missing spots and measured volumes of the present spots using a maximum likelihood approach. Confidence intervals for abundances and p-values for differential expression between two groups are calculated using bootstrap sampling. The algorithm is compared to two standard approaches, one that discards missing values and one that sets all missing values to zero. We have evaluated this approach in two different gel data sets of different biological origin. An R-program, implementing the algorithm, is freely available at http://bioinfo.thep.lu.se/MissingValues2Dgels.html.}},
  author       = {{Krogh, Morten and Fernandez, Celine and Teilum, Maria and Bengtsson, Sofia and James, Peter}},
  issn         = {{1535-3893}},
  keywords     = {{missing values; maximum likelihood; 2D-PAGE}},
  language     = {{eng}},
  number       = {{8}},
  pages        = {{3335--3343}},
  publisher    = {{The American Chemical Society (ACS)}},
  journal      = {{Journal of Proteome Research}},
  title        = {{A probabilistic treatment of the missing spot problem in 2D gel electrophoresis experiments}},
  url          = {{http://dx.doi.org/10.1021/pr070137p}},
  doi          = {{10.1021/pr070137p}},
  volume       = {{6}},
  year         = {{2007}},
}
