
Lund University Publications

LUND UNIVERSITY LIBRARIES

The Lasso and Ridge Regression Yield Biased Estimates of Imbalanced Binary Features

Larsson, Johan and Wallin, Jonas (2024)
Abstract
Many regularized methods, such as the lasso and ridge regression, are sensitive to the scales of the features in the data. As a consequence, it has become standard practice to normalize (center and scale) features so that they share the same scale. For continuous data, the most common strategy is standardization: centering and scaling each feature by its mean and standard deviation, respectively. For binary data, however, especially when it is high-dimensional and sparse, the most common strategy is not to scale at all. In this paper, we show that this choice has dramatic effects on the estimated model when the binary features are imbalanced, and that these effects, moreover, depend on the type of regularization (lasso or ridge) used. In particular, we demonstrate that the size of a feature’s corresponding coefficient in the lasso is directly related to its class imbalance and that this effect depends on the normalization used. We suggest possible remedies for this problem and also discuss the case when data is mixed, that is, contains both continuous and binary features.
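The claim that coefficient size depends on class imbalance, normalization, and penalty type can be made concrete in the simplest possible setting. The sketch below is a minimal illustration under simplifying assumptions of our own, not the authors' code or analysis: a single centered binary feature whose sample proportion of ones is exactly q, a noiseless response y = βx, and an unpenalized intercept absorbed by centering. Under these assumptions the one-feature lasso and ridge solutions have closed forms, so the estimate on the original 0/1 scale can be computed exactly for any scaling factor s. The function names and the three scalings compared (no scaling, standardization with s = √(q(1 - q)), and scaling by the variance s = q(1 - q)) are hypothetical illustrative choices, not necessarily the remedies proposed in the paper.

```python
import math


def soft_threshold(a: float, lam: float) -> float:
    """Soft-thresholding operator: sign(a) * max(|a| - lam, 0)."""
    return math.copysign(max(abs(a) - lam, 0.0), a)


def _moments(beta_true: float, q: float, scale: float) -> tuple[float, float]:
    """Return (z'y/n, z'z/n) for z = (x - q) / scale and y = beta_true * x.

    Assumption (illustrative only): x is binary with sample proportion of
    ones exactly q, so the centered feature has x_c'x_c/n = q(1 - q), and
    the intercept is unpenalized (absorbed by centering).
    """
    var = q * (1.0 - q)
    return beta_true * var / scale, var / scale**2


def lasso_binary_coef(beta_true: float, q: float, lam: float, scale: float) -> float:
    """Lasso estimate on the original 0/1 scale for a single binary feature.

    Minimizes (1/2n)||y - z b||^2 + lam * |b| over b (closed form in the
    one-feature case), then maps back to the original scale: b / scale.
    """
    zty, ztz = _moments(beta_true, q, scale)
    return soft_threshold(zty, lam) / ztz / scale


def ridge_binary_coef(beta_true: float, q: float, lam: float, scale: float) -> float:
    """Ridge estimate on the original 0/1 scale for a single binary feature.

    Minimizes (1/2n)||y - z b||^2 + (lam/2) * b^2 over b, then maps back
    to the original scale: b / scale.
    """
    zty, ztz = _moments(beta_true, q, scale)
    return zty / (ztz + lam) / scale


# Hypothetical scaling factors s(q) for a binary feature with proportion q.
SCALINGS = {
    "none": lambda q: 1.0,                      # center only, no scaling
    "std": lambda q: math.sqrt(q * (1.0 - q)),  # standardization
    "var": lambda q: q * (1.0 - q),             # scale by the variance
}

if __name__ == "__main__":
    beta_true, lam = 1.0, 0.02
    for q in (0.5, 0.1, 0.01):  # increasingly imbalanced features
        for name, s in SCALINGS.items():
            l1 = lasso_binary_coef(beta_true, q, lam, s(q))
            l2 = ridge_binary_coef(beta_true, q, lam, s(q))
            print(f"q={q:<5} scaling={name:<4} lasso={l1:.3f} ridge={l2:.3f}")
```

Working through the closed forms for β > 0, the lasso estimate is max(β - λ/(q(1 - q)), 0) without scaling and max(β - λ/√(q(1 - q)), 0) with standardization, so shrinkage grows as the feature becomes more imbalanced; scaling by the variance gives max(β - λ, 0) for every q. Ridge behaves differently: standardization gives β/(1 + λ) regardless of q, while no scaling gives βq(1 - q)/(q(1 - q) + λ) and variance scaling gives β/(1 + λq(1 - q)). In this toy setting, which normalization removes the dependence on imbalance therefore differs between the lasso and ridge.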
author
Larsson, Johan and Wallin, Jonas
publishing date
2024
type
Other contribution
publication status
unpublished
pages
27 pages
project
Optimization and Algorithms in Sparse Regression: Screening Rules, Coordinate Descent, and Normalization
language
English
LU publication?
yes
id
9666f660-209f-48e3-8a66-3ac213cea9a0
date added to LUP
2024-05-20 11:54:36
date last changed
2024-05-22 10:56:57
@misc{9666f660-209f-48e3-8a66-3ac213cea9a0,
  abstract     = {{Many regularized methods, such as the lasso and ridge regression, are sensitive to the scales of the features in the data. As a consequence, it has become standard practice to normalize (center and scale) features so that they share the same scale. For continuous data, the most common strategy is standardization: centering and scaling each feature by its mean and standard deviation, respectively. For binary data, however, especially when it is high-dimensional and sparse, the most common strategy is not to scale at all. In this paper, we show that this choice has dramatic effects on the estimated model when the binary features are imbalanced, and that these effects, moreover, depend on the type of regularization (lasso or ridge) used. In particular, we demonstrate that the size of a feature’s corresponding coefficient in the lasso is directly related to its class imbalance and that this effect depends on the normalization used. We suggest possible remedies for this problem and also discuss the case when data is mixed, that is, contains both continuous and binary features.}},
  author       = {{Larsson, Johan and Wallin, Jonas}},
  language     = {{eng}},
  title        = {{The Lasso and Ridge Regression Yield Biased Estimates of Imbalanced Binary Features}},
  url          = {{https://lup.lub.lu.se/search/files/184650909/larsson-wallin-2024a.pdf}},
  year         = {{2024}},
}