Best-subset instrumental variable selection method using mixed integer optimization with applications to health-related quality of life and education–wage analyses
(2026) In Statistics and Computing 36(1).
- Abstract
The classical best-subset selection method has been demonstrated to be nondeterministic polynomial-time-hard and thus presents computational challenges. This problem can now be solved via advanced mixed integer optimization (MIO) algorithms for linear regression. We extend this methodology to linear instrumental variable (IV) regression and propose the best-subset instrumental variable (BSIV) method incorporating the MIO procedure. Classical IV estimation methods assume that IVs must not directly impact the outcome variable and should remain uncorrelated with nonmeasured variables. However, in practice, IVs are likely to be invalid, and existing methods can lead to a large bias relative to standard errors in certain situations. The proposed BSIV estimator is robust in estimating causal effects in the presence of unknown IV validity. We demonstrate that the BSIV using MIO algorithms outperforms two-stage least squares, Lasso-type IVs, and two-sample analysis (median and mode estimators) through Monte Carlo simulations in terms of bias and relative efficiency. We analyze two datasets involving the health-related quality of life index and proximity and the education–wage relationship to demonstrate the utility of the proposed method.
- author
- Qasim, Muhammad (LU); Månsson, Kristofer and Balakrishnan, Narayanaswamy
- publishing date
- 2026-02
- type
- Contribution to journal
- publication status
- published
- subject
- keywords
- Best-subset selection, Instrumental variables, Lasso, Mendelian randomization, Mixed integer programming, Variable selection
- in
- Statistics and Computing
- volume
- 36
- issue
- 1
- article number
- 12
- publisher
- Springer
- external identifiers
- scopus:105020716282
- ISSN
- 0960-3174
- DOI
- 10.1007/s11222-025-10760-1
- language
- English
- LU publication?
- yes
- id
- f27c9736-ec67-438c-b3e2-0a05ea69ffe4
- date added to LUP
- 2026-01-29 15:41:02
- date last changed
- 2026-01-29 15:42:04
@article{f27c9736-ec67-438c-b3e2-0a05ea69ffe4,
abstract = {{The classical best-subset selection method has been demonstrated to be nondeterministic polynomial-time-hard and thus presents computational challenges. This problem can now be solved via advanced mixed integer optimization (MIO) algorithms for linear regression. We extend this methodology to linear instrumental variable (IV) regression and propose the best-subset instrumental variable (BSIV) method incorporating the MIO procedure. Classical IV estimation methods assume that IVs must not directly impact the outcome variable and should remain uncorrelated with nonmeasured variables. However, in practice, IVs are likely to be invalid, and existing methods can lead to a large bias relative to standard errors in certain situations. The proposed BSIV estimator is robust in estimating causal effects in the presence of unknown IV validity. We demonstrate that the BSIV using MIO algorithms outperforms two-stage least squares, Lasso-type IVs, and two-sample analysis (median and mode estimators) through Monte Carlo simulations in terms of bias and relative efficiency. We analyze two datasets involving the health-related quality of life index and proximity and the education–wage relationship to demonstrate the utility of the proposed method.}},
author = {{Qasim, Muhammad and Månsson, Kristofer and Balakrishnan, Narayanaswamy}},
issn = {{0960-3174}},
keywords = {{Best-subset selection; Instrumental variables; Lasso; Mendelian randomization; Mixed integer programming; Variable selection}},
language = {{eng}},
number = {{1}},
publisher = {{Springer}},
journal = {{Statistics and Computing}},
title = {{Best-subset instrumental variable selection method using mixed integer optimization with applications to health-related quality of life and education–wage analyses}},
url = {{http://dx.doi.org/10.1007/s11222-025-10760-1}},
doi = {{10.1007/s11222-025-10760-1}},
volume = {{36}},
year = {{2026}},
}