Rank-based lasso - Efficient methods for high-dimensional robust model selection
(2020) In Journal of Machine Learning Research 21, p. 1-47

Abstract
We consider the problem of identifying significant predictors in large databases, where the response variable depends on a linear combination of explanatory variables through an unknown monotonic link function, corrupted with noise from an unknown distribution. We utilize a natural, robust and efficient approach, which relies on replacing the values of the response variable by their ranks and then identifying significant predictors by using the well-known Lasso. We provide new consistency results for the proposed procedure (called "RankLasso") and extend the scope of its applications by proposing its thresholded and adaptive versions. Our theoretical results show that these modifications can identify the set of relevant predictors under a wide range of data-generating scenarios. The theoretical results are supported by a simulation study and a real data analysis, which show that our methods can properly identify relevant predictors even when the error terms come from the Cauchy distribution and the link function is nonlinear. They also demonstrate the superiority of the modified versions of RankLasso over its regular version when the predictors are substantially correlated. The numerical study also shows that RankLasso performs substantially better in model selection than LADLasso, a well-established methodology for robust model selection.
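The core recipe the abstract describes — replace the response values by their ranks, then run an ordinary Lasso on the ranked response — can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the cubic link, the Cauchy noise scale, the penalty level `alpha`, the rank centering, and the small coordinate-descent solver are all assumptions made for the demo.

```python
import numpy as np

def centered_ranks(y):
    """Replace responses by their ranks, centered and scaled into (-1/2, 1/2)."""
    n = len(y)
    r = np.empty(n)
    r[np.argsort(y)] = np.arange(1, n + 1)  # ranks 1..n (ties are generic here)
    return r / n - (n + 1) / (2 * n)

def lasso_cd(X, y, alpha, n_sweeps=200):
    """Plain coordinate-descent Lasso for (1/2n)||y - Xb||^2 + alpha * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y - X @ b
    for _ in range(n_sweeps):
        for j in range(p):
            resid += X[:, j] * b[j]                 # remove j-th contribution
            rho = X[:, j] @ resid / n               # marginal fit for coordinate j
            b[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            resid -= X[:, j] * b[j]                 # add back the updated term
    return b

# Single-index data: unknown monotone link (cubic) plus heavy-tailed Cauchy noise.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]                         # three relevant predictors
y = (X @ beta + rng.standard_cauchy(n)) ** 3

# RankLasso idea: ranks are invariant to the monotone link and robust to heavy
# tails, so an ordinary Lasso on the ranked response can recover the support.
b = lasso_cd(X, centered_ranks(y), alpha=0.03)
selected = np.flatnonzero(b)
```

Note that the rank transform makes the procedure insensitive to the choice of link: replacing the cube by any strictly increasing function leaves `centered_ranks(y)` unchanged.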
- author
- Rejchel, Wojciech LU and Bogdan, Malgorzata LU
- organization
- publishing date
- 2020-11
- type
- Contribution to journal
- publication status
- published
- subject
- keywords
- Lasso, Model Selection, Ranks, Single Index Model, Sparsity, U-statistics
- in
- Journal of Machine Learning Research
- volume
- 21
- pages
- 1 - 47
- publisher
- Microtome Publishing
- external identifiers
-
- scopus:85098455101
- ISSN
- 1532-4435
- language
- English
- LU publication?
- yes
- additional info
- Funding Information: We would like to thank Patrick Tardivel for helpful comments. We gratefully acknowledge the grant of the Wroclaw Center of Networking and Supercomputing (WCSS), where most of the computations were performed. Małgorzata Bogdan is supported by Polish National Science Center grant no. 2016/23/B/ST1/00454. We would also like to thank the Associate Editor and two reviewers for their comments, which helped us improve the manuscript. Publisher Copyright: © 2020 Wojciech Rejchel and Małgorzata Bogdan. License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v21/20-120.html.
- id
- 78681a42-91db-408f-a421-c592749e9ce8
- alternative location
- https://www.jmlr.org/papers/volume21/20-120/20-120.pdf
- date added to LUP
- 2023-12-08 09:26:59
- date last changed
- 2023-12-11 12:51:28
@article{78681a42-91db-408f-a421-c592749e9ce8,
  abstract  = {{<p>We consider the problem of identifying significant predictors in large databases, where the response variable depends on a linear combination of explanatory variables through an unknown monotonic link function, corrupted with noise from an unknown distribution. We utilize a natural, robust and efficient approach, which relies on replacing the values of the response variable by their ranks and then identifying significant predictors by using the well-known Lasso. We provide new consistency results for the proposed procedure (called "RankLasso") and extend the scope of its applications by proposing its thresholded and adaptive versions. Our theoretical results show that these modifications can identify the set of relevant predictors under a wide range of data-generating scenarios. The theoretical results are supported by a simulation study and a real data analysis, which show that our methods can properly identify relevant predictors even when the error terms come from the Cauchy distribution and the link function is nonlinear. They also demonstrate the superiority of the modified versions of RankLasso over its regular version when the predictors are substantially correlated. The numerical study also shows that RankLasso performs substantially better in model selection than LADLasso, a well-established methodology for robust model selection.</p>}},
  author    = {{Rejchel, Wojciech and Bogdan, Malgorzata}},
  issn      = {{1532-4435}},
  keywords  = {{Lasso; Model Selection; Ranks; Single Index Model; Sparsity; U-statistics}},
  language  = {{eng}},
  pages     = {{1--47}},
  publisher = {{Microtome Publishing}},
  series    = {{Journal of Machine Learning Research}},
  title     = {{Rank-based lasso - Efficient methods for high-dimensional robust model selection}},
  url       = {{https://www.jmlr.org/papers/volume21/20-120/20-120.pdf}},
  volume    = {{21}},
  year      = {{2020}},
}