
Lund University Publications


Methodological considerations for identifying multiple plasma proteins associated with all-cause mortality in a population-based prospective cohort

Drake, Isabel; Hindy, George; Almgren, Peter; Engström, Gunnar; Nilsson, Jan; Melander, Olle and Orho-Melander, Marju (2021) In Scientific Reports 11(1).
Abstract

Novel methods to characterize the plasma proteome have made it possible to examine a wide range of proteins in large longitudinal cohort studies, but the complexity of the human proteome makes it difficult to identify robust protein-disease associations. Nevertheless, identification of individuals at high risk of early mortality is a central issue in clinical decision making, and novel biomarkers may be useful to improve risk stratification. With adjustment for established risk factors, we examined the associations between 138 plasma proteins measured using two proximity extension assays and long-term risk of all-cause mortality in 3,918 participants of the population-based Malmö Diet and Cancer Study. To examine the reproducibility of protein-mortality associations, we used a two-step random-split approach to simulate a discovery and a replication cohort and conducted analyses using four different methods: Cox regression, stepwise Cox regression, Lasso-Cox regression, and random survival forest (RSF). In the total study population, we identified eight proteins that were associated with all-cause mortality after adjustment for established risk factors and Bonferroni correction for multiple testing. In the two-step analyses, the number of proteins selected for model inclusion in both random samples ranged from 6 to 21, depending on the method used. However, only three proteins were consistently included in both samples across all four methods: growth/differentiation factor-15 (GDF-15), N-terminal pro-B-type natriuretic peptide, and epididymal secretory protein E4. Using the total study population, the C-statistic for a model including established risk factors was 0.7222 and increased to 0.7284 with inclusion of the most predictive protein (GDF-15; P < 0.0001). All multiple-protein models showed additional improvement in the C-statistic compared with the single-protein model (all P < 0.0001).
We identified several plasma proteins associated with increased risk of all-cause mortality independently of established risk factors. Further investigation into the putatively causal role of these proteins in longevity is needed. In addition, the examined methods for identifying multiple proteins showed a tendency to overfit, including several putatively false-positive findings. Thus, the reproducibility of findings obtained with such approaches may be limited.
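The two-step random-split design and the Bonferroni threshold described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: model fitting (Cox, stepwise Cox, Lasso-Cox, RSF) is assumed to happen elsewhere and to return a set of selected proteins for each half, a family-wise alpha of 0.05 is assumed, and all function names are hypothetical. With 138 proteins, the assumed threshold is 0.05/138, roughly 3.6 × 10⁻⁴.

```python
import random

def bonferroni_hits(pvalues, alpha=0.05):
    """Names whose p-value passes the Bonferroni threshold alpha / m.

    `pvalues` maps a protein name to its p-value from a per-protein model
    (assumed: a risk-factor-adjusted Cox regression fitted elsewhere).
    """
    threshold = alpha / len(pvalues)
    return {name for name, p in pvalues.items() if p < threshold}

def random_split(ids, seed=0):
    """Randomly split participant IDs into two halves (sizes differ by at most 1)."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def replicated(selected_a, selected_b):
    """Proteins selected in both halves, i.e. 'discovered' and 'replicated'."""
    return set(selected_a) & set(selected_b)

def consistent_across_methods(per_method):
    """Proteins replicated under every method; `per_method` maps method -> set."""
    sets = [set(s) for s in per_method.values()]
    return set.intersection(*sets) if sets else set()

if __name__ == "__main__":
    # Hypothetical inputs for illustration only; these are not study results.
    discovery, replication = random_split(range(3918), seed=42)
    print(len(discovery), len(replication))  # 1959 1959
    print(0.05 / 138)  # assumed Bonferroni threshold for 138 proteins
    per_method = {
        "cox": replicated({"protein_A", "protein_B"},
                          {"protein_A", "protein_B", "protein_C"}),
        "lasso_cox": replicated({"protein_A", "protein_C"},
                                {"protein_A", "protein_C"}),
    }
    print(consistent_across_methods(per_method))  # {'protein_A'}
```

Intersecting the per-method replicated sets mirrors the abstract's notion of a protein being "consistently included in both samples across all four methods"; a single split is shown here, although repeating the split with different seeds would indicate how stable such selections are.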

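The C-statistic quoted in the abstract measures how well a model's risk scores rank participants by time to death. The abstract does not state which estimator was used; Harrell's C, sketched below in plain Python, is one common choice and is used here only as an assumption. Risk scores are assumed to come from a fitted model (for example the linear predictor of a Cox model), which is not shown, and the function name `harrell_c` is hypothetical.

```python
from itertools import combinations

def harrell_c(times, events, risk):
    """Harrell's concordance index for right-censored survival data.

    times:  follow-up time per participant
    events: 1 if the participant died during follow-up, 0 if censored
    risk:   predicted risk score (higher = expected to die sooner)
    A pair is usable when the member with the shorter time had an event.
    Ties in risk count as half-concordant; pairs tied in time are skipped
    (a simplification of the full definition).
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied follow-up times are ignored in this sketch
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier time is censored, so the true order is unknown
        usable += 1
        if risk[first] > risk[second]:
            concordant += 1.0
        elif risk[first] == risk[second]:
            concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable

if __name__ == "__main__":
    # Hypothetical toy data, not from the study.
    times = [2.0, 5.0, 7.5, 9.0, 12.0]
    events = [1, 1, 0, 1, 0]
    baseline_risk = [0.9, 0.4, 0.5, 0.3, 0.1]
    print(harrell_c(times, events, baseline_risk))
```

Evaluating such a function on the same participants with risk scores from a risk-factor-only model and from a model that adds a protein gives the kind of before/after comparison reported in the abstract (0.7222 vs 0.7284). The P-values for that comparison require a formal test, which this sketch does not implement. The pairwise loop is O(n²): adequate for illustration, but slow for several thousand participants.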
author: Drake, Isabel; Hindy, George; Almgren, Peter; Engström, Gunnar; Nilsson, Jan; Melander, Olle and Orho-Melander, Marju
publishing date: 2021
type: Contribution to journal
publication status: published
in: Scientific Reports
volume: 11
issue: 1
article number: 6734
publisher: Nature Publishing Group
external identifiers:
  • scopus:85103070489
  • pmid:33762603
ISSN: 2045-2322
DOI: 10.1038/s41598-021-85991-z
language: English
LU publication: yes
id: 71921500-5e04-4297-b1bf-4c59bdefa8da
date added to LUP: 2021-04-06 13:39:17
date last changed: 2025-10-14 10:04:32
@article{71921500-5e04-4297-b1bf-4c59bdefa8da,
  abstract     = {{Novel methods to characterize the plasma proteome have made it possible to examine a wide range of proteins in large longitudinal cohort studies, but the complexity of the human proteome makes it difficult to identify robust protein-disease associations. Nevertheless, identification of individuals at high risk of early mortality is a central issue in clinical decision making, and novel biomarkers may be useful to improve risk stratification. With adjustment for established risk factors, we examined the associations between 138 plasma proteins measured using two proximity extension assays and long-term risk of all-cause mortality in 3,918 participants of the population-based Malmö Diet and Cancer Study. To examine the reproducibility of protein-mortality associations, we used a two-step random-split approach to simulate a discovery and a replication cohort and conducted analyses using four different methods: Cox regression, stepwise Cox regression, Lasso-Cox regression, and random survival forest (RSF). In the total study population, we identified eight proteins that were associated with all-cause mortality after adjustment for established risk factors and Bonferroni correction for multiple testing. In the two-step analyses, the number of proteins selected for model inclusion in both random samples ranged from 6 to 21, depending on the method used. However, only three proteins were consistently included in both samples across all four methods: growth/differentiation factor-15 (GDF-15), N-terminal pro-B-type natriuretic peptide, and epididymal secretory protein E4. Using the total study population, the C-statistic for a model including established risk factors was 0.7222 and increased to 0.7284 with inclusion of the most predictive protein (GDF-15; $P < 0.0001$). All multiple-protein models showed additional improvement in the C-statistic compared with the single-protein model (all $P < 0.0001$).
We identified several plasma proteins associated with increased risk of all-cause mortality independently of established risk factors. Further investigation into the putatively causal role of these proteins in longevity is needed. In addition, the examined methods for identifying multiple proteins showed a tendency to overfit, including several putatively false-positive findings. Thus, the reproducibility of findings obtained with such approaches may be limited.}},
  author       = {{Drake, Isabel and Hindy, George and Almgren, Peter and Engström, Gunnar and Nilsson, Jan and Melander, Olle and Orho-Melander, Marju}},
  issn         = {{2045-2322}},
  language     = {{eng}},
  number       = {{1}},
  publisher    = {{Nature Publishing Group}},
  journal      = {{Scientific Reports}},
  pages        = {{6734}},
  title        = {{Methodological considerations for identifying multiple plasma proteins associated with all-cause mortality in a population-based prospective cohort}},
  url          = {{http://dx.doi.org/10.1038/s41598-021-85991-z}},
  doi          = {{10.1038/s41598-021-85991-z}},
  volume       = {{11}},
  year         = {{2021}},
}