
Lund University Publications

LUND UNIVERSITY LIBRARIES

Adaptive Bayesian SLOPE: Model Selection With Incomplete Data

Jiang, Wei ; Bogdan, Małgorzata ; Josse, Julie ; Majewski, Szymon ; Miasojedow, Błażej and Ročková, Veronika (2022) In Journal of Computational and Graphical Statistics 31(1). p.113-137
Abstract

We consider the problem of variable selection in high-dimensional settings with missing observations among the covariates. To address this relatively understudied problem, we propose a new synergistic procedure, adaptive Bayesian SLOPE with missing values, which combines SLOPE (sorted ℓ1 regularization) with the spike-and-slab LASSO (SSL) and is accompanied by an efficient stochastic approximation expectation-maximization (SAEM) algorithm to handle missing data. As in SSL, the regression coefficients are regarded as arising from a hierarchical model consisting of two groups: the spike for the inactive coefficients and the slab for the active ones. However, instead of assigning independent spike and slab Laplace priors to each covariate, we deploy a joint SLOPE “spike-and-slab” prior that takes into account the ordering of coefficient magnitudes in order to control false discoveries. We position our approach within a Bayesian framework that allows simultaneous variable selection and parameter estimation while handling missing data. Through extensive simulations, we demonstrate satisfactory performance in terms of power, false discovery rate (FDR), and estimation bias under a wide range of scenarios, with both complete and incomplete data. Finally, we analyze a real dataset of patients from Paris hospitals who underwent severe trauma, where we show competitive performance in predicting platelet levels. Our methodology has been implemented in C++ and wrapped into open-source R programs for public use. Supplemental files for this article are available online.
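To make “sorted ℓ1 regularization” and “the ordering of coefficient magnitudes” concrete, here is a minimal sketch of the plain (non-Bayesian, non-adaptive) SLOPE building blocks: the penalty Σᵢ λᵢ|β|₍ᵢ₎ with nonincreasing weights λ₁ ≥ … ≥ λₚ applied to the magnitudes sorted in decreasing order, and its proximal operator. It assumes the Benjamini-Hochberg-type sequence λᵢ = Φ⁻¹(1 − iq/(2p)) from the original SLOPE proposal. It is not the authors' ABSLOPE algorithm or their C++/R code, and the function names `bh_lambdas`, `sorted_l1_norm`, and `prox_sorted_l1` are hypothetical.

```python
from statistics import NormalDist


def bh_lambdas(p, q=0.1):
    """Assumed BH-type SLOPE weights: lambda_i = Phi^{-1}(1 - i*q/(2p)), i = 1..p.

    The sequence is strictly decreasing, so larger coefficients get larger penalties.
    """
    z = NormalDist().inv_cdf
    return [z(1 - i * q / (2 * p)) for i in range(1, p + 1)]


def sorted_l1_norm(beta, lambdas):
    """SLOPE penalty: sum_i lambda_i * |beta|_(i), with |beta|_(1) >= ... >= |beta|_(p)."""
    mags = sorted((abs(b) for b in beta), reverse=True)
    return sum(lam * m for lam, m in zip(lambdas, mags))


def prox_sorted_l1(y, lambdas):
    """Proximal operator of the sorted-L1 norm (illustrative sketch, not ABSLOPE).

    Steps: sort |y| decreasingly, subtract lambda, project onto nonincreasing
    sequences with a stack-based pool-adjacent-violators pass, clip at zero,
    then undo the sort and restore the signs of y. Assumes len(lambdas) == len(y).
    """
    p = len(y)
    order = sorted(range(p), key=lambda j: abs(y[j]), reverse=True)
    z = [abs(y[j]) - lam for j, lam in zip(order, lambdas)]

    # Each block is [sum, count]; merge while the previous block's mean
    # does not exceed the current one (a violation of nonincreasing order).
    blocks = []
    for v in z:
        blocks.append([v, 1])
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] <= blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c

    # Expand block means and clip negatives to zero (sparsity).
    x_sorted = []
    for s, c in blocks:
        x_sorted.extend([max(s / c, 0.0)] * c)

    # Undo the sort and restore signs.
    x = [0.0] * p
    for j, v in zip(order, x_sorted):
        x[j] = v if y[j] >= 0 else -v
    return x
```

For example, `prox_sorted_l1(y, bh_lambdas(len(y), q=0.1))` shrinks the largest observations hardest and can fuse coefficients of similar magnitude into equal values; with all λᵢ equal it reduces to ordinary soft thresholding. Sorting dominates the cost, so the sketch runs in O(p log p). The adaptive, Bayesian part of the method (the spike-and-slab weighting and the SAEM treatment of missing covariates) is not modeled here.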

author
Jiang, Wei ; Bogdan, Małgorzata ; Josse, Julie ; Majewski, Szymon ; Miasojedow, Błażej and Ročková, Veronika
publishing date
2022
type
Contribution to journal
publication status
published
keywords
FDR control, Health data, Incomplete data, Penalized regression, Spike and slab prior, Stochastic approximation EM
in
Journal of Computational and Graphical Statistics
volume
31
issue
1
pages
113 - 137
publisher
American Statistical Association
external identifiers
  • scopus:85117475479
ISSN
1061-8600
DOI
10.1080/10618600.2021.1963263
language
English
LU publication?
yes
additional info
Publisher Copyright: © 2021 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America.
id
ad1a2291-e9e2-4a7c-aa10-00ffa6e0b75b
date added to LUP
2021-11-22 09:20:21
date last changed
2022-06-29 18:19:43
@article{ad1a2291-e9e2-4a7c-aa10-00ffa6e0b75b,
  abstract     = {{<p>We consider the problem of variable selection in high-dimensional settings with missing observations among the covariates. To address this relatively understudied problem, we propose a new synergistic procedure, adaptive Bayesian SLOPE with missing values, which combines SLOPE (sorted ℓ<sub>1</sub> regularization) with the spike-and-slab LASSO (SSL) and is accompanied by an efficient stochastic approximation expectation-maximization (SAEM) algorithm to handle missing data. As in SSL, the regression coefficients are regarded as arising from a hierarchical model consisting of two groups: the spike for the inactive coefficients and the slab for the active ones. However, instead of assigning independent spike and slab Laplace priors to each covariate, we deploy a joint SLOPE “spike-and-slab” prior that takes into account the ordering of coefficient magnitudes in order to control false discoveries. We position our approach within a Bayesian framework that allows simultaneous variable selection and parameter estimation while handling missing data. Through extensive simulations, we demonstrate satisfactory performance in terms of power, false discovery rate (FDR), and estimation bias under a wide range of scenarios, with both complete and incomplete data. Finally, we analyze a real dataset of patients from Paris hospitals who underwent severe trauma, where we show competitive performance in predicting platelet levels. Our methodology has been implemented in C++ and wrapped into open-source R programs for public use. Supplemental files for this article are available online.</p>}},
  author       = {{Jiang, Wei and Bogdan, Małgorzata and Josse, Julie and Majewski, Szymon and Miasojedow, Błażej and Ročková, Veronika}},
  issn         = {{1061-8600}},
  keywords     = {{FDR control; Health data; Incomplete data; Penalized regression; Spike and slab prior; Stochastic approximation EM}},
  language     = {{eng}},
  number       = {{1}},
  pages        = {{113--137}},
  publisher    = {{American Statistical Association}},
  journal      = {{Journal of Computational and Graphical Statistics}},
  title        = {{Adaptive Bayesian SLOPE: Model Selection With Incomplete Data}},
  url          = {{https://doi.org/10.1080/10618600.2021.1963263}},
  doi          = {{10.1080/10618600.2021.1963263}},
  volume       = {{31}},
  year         = {{2022}},
}