Using machine learning to select variables in data envelopment analysis : Simulations and application using electricity distribution data

Duras, Toni; Javed, Farrukh; Månsson, Kristofer; Sjölander, Pär; Söderberg, Magnus

Duras, Toni; Javed, Farrukh; Månsson, Kristofer; Sjölander, Pär and Söderberg, Magnus (2023) In Energy Economics 120.

Abstract: Agencies that regulate electricity providers often apply nonparametric data envelopment analysis (DEA) to assess the relative efficiency of each firm. The reliability and validity of DEA are contingent upon selecting relevant input variables. In the era of big (wide) data, the assumptions of traditional variable selection techniques are often violated due to challenges related to high-dimensional data and their standard empirical properties. Currently, regulators have access to a large number of potential input variables. Therefore, our aim is to introduce new machine learning methods for regulators of the energy market. We also propose a new two-step analytical approach where, in the first step, the machine learning-based adaptive least absolute shrinkage and selection operator (ALASSO) is used to select variables and, in the second step, selected variables are used in a DEA model. In contrast to previous research, we find, by using a more realistic data-generating process common for production functions (i.e., Cobb–Douglas and Translog), that the performance of different machine learning techniques differs substantially in different empirically relevant situations. Simulations also reveal that the ALASSO is superior to other machine learning and regression-based methods when the collinearity is low or moderate. However, in situations of multicollinearity, the LASSO approach exhibits the best performance. We also use real data from the Swedish electricity distribution market to illustrate the empirical relevance of selecting the most appropriate variable selection method.
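The first step of the two-step approach described in the abstract (variable selection with the adaptive LASSO) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the selection step regresses log output on log candidate inputs, as in a Cobb–Douglas form, and it uses a plain coordinate-descent solver with penalty weights equal to the inverse of an initial least-squares estimate. The function names, the simulated data, the penalty level `lam` and the exponent `gamma` are all hypothetical choices made for illustration. The second step (the DEA model) is not shown.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in LASSO coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, lam, weights, n_iter=300):
    """Minimise (1/2n)*||y - X b||^2 + lam * sum_j weights[j] * |b_j|."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, starting from b = 0
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of column j with the partial residual (column j excluded)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta


def center(X, y):
    """Subtract column means so that no intercept term is needed."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    return Xc, yc


def adaptive_lasso_select(X, y, lam, gamma=1.0, eps=1e-8):
    """Return indices of columns with non-zero adaptive-LASSO coefficients."""
    Xc, yc = center(X, y)
    p = len(Xc[0])
    # Initial estimate: unpenalised least squares (lam = 0)
    beta_init = weighted_lasso(Xc, yc, 0.0, [1.0] * p)
    # Adaptive weights: small initial coefficients receive a large penalty
    weights = [1.0 / (abs(b) ** gamma + eps) for b in beta_init]
    beta = weighted_lasso(Xc, yc, lam, weights)
    selected = [j for j, b in enumerate(beta) if b != 0.0]
    return selected, beta


# Hypothetical simulated data: four candidate inputs (in logs), of which
# only the first two enter the Cobb-Douglas production function.
random.seed(0)
n = 200
log_x = [[random.uniform(0.0, 2.0) for _ in range(4)] for _ in range(n)]
log_y = [1.0 + 0.6 * r[0] + 0.3 * r[1] + random.gauss(0.0, 0.05) for r in log_x]

selected, beta = adaptive_lasso_select(log_x, log_y, lam=0.005)
print("selected candidate inputs:", selected)
```

In a full two-step analysis, only the columns listed in `selected` would be passed on as inputs to the DEA model. In practice the penalty level would normally be chosen by cross-validation or an information criterion rather than fixed by hand.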

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/8a6c86d1-b54e-4111-871b-073420133db9

author

Duras, Toni; Javed, Farrukh; Månsson, Kristofer; Sjölander, Pär and Söderberg, Magnus

organization

Department of Statistics

publishing date

2023-04

type

Contribution to journal

publication status

published

subject

Computational Mathematics

keywords

Curse of dimensionality, Data envelopment analysis, Machine learning, Regulation, Variable selection

in

Energy Economics

volume

120

article number

106621

publisher

Elsevier

external identifiers

scopus:85150299788

ISSN

0140-9883

DOI

10.1016/j.eneco.2023.106621

language

English

LU publication?

yes

id

8a6c86d1-b54e-4111-871b-073420133db9

date added to LUP

2023-04-24 13:26:55

date last changed

2025-10-14 10:19:26

@article{8a6c86d1-b54e-4111-871b-073420133db9,
  abstract     = {{Agencies that regulate electricity providers often apply nonparametric data envelopment analysis (DEA) to assess the relative efficiency of each firm. The reliability and validity of DEA are contingent upon selecting relevant input variables. In the era of big (wide) data, the assumptions of traditional variable selection techniques are often violated due to challenges related to high-dimensional data and their standard empirical properties. Currently, regulators have access to a large number of potential input variables. Therefore, our aim is to introduce new machine learning methods for regulators of the energy market. We also propose a new two-step analytical approach where, in the first step, the machine learning-based adaptive least absolute shrinkage and selection operator (ALASSO) is used to select variables and, in the second step, selected variables are used in a DEA model. In contrast to previous research, we find, by using a more realistic data-generating process common for production functions (i.e., Cobb–Douglas and Translog), that the performance of different machine learning techniques differs substantially in different empirically relevant situations. Simulations also reveal that the ALASSO is superior to other machine learning and regression-based methods when the collinearity is low or moderate. However, in situations of multicollinearity, the LASSO approach exhibits the best performance. We also use real data from the Swedish electricity distribution market to illustrate the empirical relevance of selecting the most appropriate variable selection method.}},
  author       = {{Duras, Toni and Javed, Farrukh and Månsson, Kristofer and Sjölander, Pär and Söderberg, Magnus}},
  issn         = {{0140-9883}},
  keywords     = {{Curse of dimensionality; Data envelopment analysis; Machine learning; Regulation; Variable selection}},
  language     = {{eng}},
  publisher    = {{Elsevier}},
  journal      = {{Energy Economics}},
  title        = {{Using machine learning to select variables in data envelopment analysis : Simulations and application using electricity distribution data}},
  url          = {{http://dx.doi.org/10.1016/j.eneco.2023.106621}},
  doi          = {{10.1016/j.eneco.2023.106621}},
  volume       = {{120}},
  year         = {{2023}},
}

Lund University Publications
