Lund University Publications

Best-subset instrumental variable selection method using mixed integer optimization with applications to health-related quality of life and education–wage analyses

Qasim, Muhammad LU ; Månsson, Kristofer and Balakrishnan, Narayanaswamy (2026) In Statistics and Computing 36(1).
Abstract

The classical best-subset selection method has been demonstrated to be nondeterministic polynomial-time-hard and thus presents computational challenges. This problem can now be solved via advanced mixed integer optimization (MIO) algorithms for linear regression. We extend this methodology to linear instrumental variable (IV) regression and propose the best-subset instrumental variable (BSIV) method incorporating the MIO procedure. Classical IV estimation methods assume that IVs must not directly impact the outcome variable and should remain uncorrelated with nonmeasured variables. However, in practice, IVs are likely to be invalid, and existing methods can lead to a large bias relative to standard errors in certain situations. The proposed BSIV estimator is robust in estimating causal effects in the presence of unknown IV validity. We demonstrate that the BSIV using MIO algorithms outperforms two-stage least squares, Lasso-type IVs, and two-sample analysis (median and mode estimators) through Monte Carlo simulations in terms of bias and relative efficiency. We analyze two datasets involving the health-related quality of life index and proximity and the education–wage relationship to demonstrate the utility of the proposed method.

author
Qasim, Muhammad LU ; Månsson, Kristofer and Balakrishnan, Narayanaswamy
publishing date
2026
type
Contribution to journal
publication status
published
subject
keywords
Best-subset selection, Instrumental variables, Lasso, Mendelian randomization, Mixed integer programming, Variable selection
in
Statistics and Computing
volume
36
issue
1
article number
12
publisher
Springer
external identifiers
  • scopus:105020716282
ISSN
0960-3174
DOI
10.1007/s11222-025-10760-1
language
English
LU publication?
yes
id
f27c9736-ec67-438c-b3e2-0a05ea69ffe4
date added to LUP
2026-01-29 15:41:02
date last changed
2026-01-29 15:42:04
@article{f27c9736-ec67-438c-b3e2-0a05ea69ffe4,
  abstract     = {{The classical best-subset selection method has been demonstrated to be nondeterministic polynomial-time-hard and thus presents computational challenges. This problem can now be solved via advanced mixed integer optimization (MIO) algorithms for linear regression. We extend this methodology to linear instrumental variable (IV) regression and propose the best-subset instrumental variable (BSIV) method incorporating the MIO procedure. Classical IV estimation methods assume that IVs must not directly impact the outcome variable and should remain uncorrelated with nonmeasured variables. However, in practice, IVs are likely to be invalid, and existing methods can lead to a large bias relative to standard errors in certain situations. The proposed BSIV estimator is robust in estimating causal effects in the presence of unknown IV validity. We demonstrate that the BSIV using MIO algorithms outperforms two-stage least squares, Lasso-type IVs, and two-sample analysis (median and mode estimators) through Monte Carlo simulations in terms of bias and relative efficiency. We analyze two datasets involving the health-related quality of life index and proximity and the education–wage relationship to demonstrate the utility of the proposed method.}},
  author       = {{Qasim, Muhammad and Månsson, Kristofer and Balakrishnan, Narayanaswamy}},
  issn         = {{0960-3174}},
  keywords     = {{Best-subset selection; Instrumental variables; Lasso; Mendelian randomization; Mixed integer programming; Variable selection}},
  language     = {{eng}},
  number       = {{1}},
  publisher    = {{Springer}},
  journal      = {{Statistics and Computing}},
  title        = {{Best-subset instrumental variable selection method using mixed integer optimization with applications to health-related quality of life and education–wage analyses}},
  url          = {{http://dx.doi.org/10.1007/s11222-025-10760-1}},
  doi          = {{10.1007/s11222-025-10760-1}},
  volume       = {{36}},
  year         = {{2026}},
}