A Reinforcement Learning Approach Based on Group Relative Policy Optimization for Economic Dispatch in Smart Grids

Rizki, Adil; Touil, Achraf; Echchatbi, Abdelwahed; Oucheikh, Rachid

Rizki, Adil; Touil, Achraf; Echchatbi, Abdelwahed and Oucheikh, Rachid (2025). In Electricity 6(3), article 49.

Abstract: The Economic Dispatch Problem (EDP) plays a critical role in power system operations: it allocates power generation across multiple units at minimal cost while satisfying complex operational constraints. Traditional optimization techniques struggle with the non-convexities introduced by factors such as valve-point effects, prohibited operating zones, and spinning reserve requirements. While metaheuristic methods have shown promise, they often suffer from convergence issues and constraint-handling limitations. In this study, we introduce a novel application of Group Relative Policy Optimization (GRPO), a reinforcement learning framework that extends Proximal Policy Optimization by integrating group-based learning and relative performance assessments. The proposed GRPO approach incorporates smart initialization, adaptive exploration, and elite-guided updates tailored to the EDP’s structure. Our method consistently produces high-quality, feasible solutions with faster convergence than state-of-the-art metaheuristics and learning-based methods.
For instance, in the case of the 15-unit system, GRPO achieved the best cost of USD 32,421.67/h with full constraint satisfaction in just 4.24 s, surpassing many previous solutions. The algorithm also demonstrates excellent scalability, generalizability, and stability across larger-scale systems without requiring parameter retuning. These results highlight GRPO’s potential as a robust and efficient tool for real-time energy scheduling in smart grid environments.
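The non-convex ingredients named in the abstract are usually folded into a single penalized objective that an optimizer or learning agent tries to minimize. The following is a minimal sketch, assuming the common textbook formulation: a quadratic fuel cost with a rectified-sine valve-point term, a lossless power balance, prohibited operating zones as open intervals, and a simplified spinning-reserve check with no per-unit reserve cap. The `Unit` class, the function names, the penalty weight, and all coefficients are hypothetical illustrations, not the authors' implementation or their test-system data.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Unit:
    """Hypothetical generator parameters (illustrative only)."""
    a: float      # fixed cost coefficient ($/h)
    b: float      # linear cost coefficient ($/MWh)
    c: float      # quadratic cost coefficient ($/MW^2 h)
    e: float      # valve-point ripple amplitude ($/h)
    f: float      # valve-point ripple frequency (rad/MW)
    p_min: float  # minimum output (MW)
    p_max: float  # maximum output (MW)
    poz: list = field(default_factory=list)  # prohibited zones as (low, high) MW


def fuel_cost(u: Unit, p: float) -> float:
    # Quadratic cost plus the rectified-sine valve-point term, which makes
    # the objective non-smooth and non-convex.
    return u.a + u.b * p + u.c * p ** 2 + abs(u.e * math.sin(u.f * (u.p_min - p)))


def penalized_cost(units, powers, demand, reserve_req=0.0, weight=1e4):
    """Total fuel cost plus a weighted sum of constraint violations.

    Assumptions: transmission losses are ignored, and each unit's reserve
    contribution is simply its remaining headroom (p_max - p).
    """
    cost = sum(fuel_cost(u, p) for u, p in zip(units, powers))
    violation = abs(sum(powers) - demand)  # power balance (lossless)
    for u, p in zip(units, powers):
        # Generation limits.
        violation += max(0.0, u.p_min - p) + max(0.0, p - u.p_max)
        # Prohibited zones: penalize the distance to the nearest zone edge.
        for low, high in u.poz:
            if low < p < high:
                violation += min(p - low, high - p)
    # Simplified spinning reserve: total headroom must cover the requirement.
    reserve = sum(u.p_max - p for u, p in zip(units, powers))
    violation += max(0.0, reserve_req - reserve)
    return cost + weight * violation
```

A static penalty weight keeps the sketch short; reported EDP methods often use repair heuristics or adaptive penalties instead, and the abstract does not specify which constraint-handling scheme the authors used.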
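The "group-based learning and relative performance assessments" that distinguish GRPO from standard Proximal Policy Optimization can be illustrated with the group-relative advantage from the generic GRPO formulation. Each candidate in a group of sampled solutions is scored against the group's own mean and standard deviation instead of a learned value function, and the result feeds a PPO-style clipped surrogate. The sketch below assumes that generic formulation, takes the reward to be the negative dispatch cost, and omits the KL-regularization term. The function names are hypothetical, and the code does not reproduce the smart initialization, adaptive exploration, or elite-guided updates described in the paper.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Score each sample against its own group (no critic network).

    Uses the population standard deviation; eps guards against a group
    in which every reward is identical.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratios, advantages, clip_eps=0.2):
    """PPO-style clipped objective averaged over the group (to be maximized).

    ratios[i] is pi_new(a_i | s) / pi_old(a_i | s) for the i-th sample.
    """
    terms = []
    for ratio, adv in zip(ratios, advantages):
        clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
        # Taking the minimum keeps the update pessimistic, so large policy
        # changes are not rewarded beyond the clip range.
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)
```

For example, with hypothetical dispatch costs of 100, 110, and 90 in one group, `group_relative_advantages([-100.0, -110.0, -90.0])` gives the cheapest candidate a positive advantage (about 1.22), the most expensive a negative one (about -1.22), and the middle one zero. Because advantages are relative within the group, no separate value network is needed to estimate a baseline.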

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/7bbc0e85-f099-45ef-a185-eb40f76787ca

author

Rizki, Adil; Touil, Achraf; Echchatbi, Abdelwahed and Oucheikh, Rachid (LU)

organization

publishing date

2025-09

type

Contribution to journal

publication status

published

subject

Embedded Systems

keywords

constraint handling, economic dispatch problem, energy scheduling, group relative policy optimization, non-convex optimization, reinforcement learning, smart grid

in

Electricity

volume

6

issue

3

article number

49

publisher

MDPI AG

external identifiers

scopus:105017058901

DOI

10.3390/electricity6030049

language

English

LU publication?

yes

id

7bbc0e85-f099-45ef-a185-eb40f76787ca

date added to LUP

2025-11-28 09:58:06

date last changed

2025-11-28 09:59:08

@article{7bbc0e85-f099-45ef-a185-eb40f76787ca,
  abstract     = {{The Economic Dispatch Problem (EDP) plays a critical role in power system operations: it allocates power generation across multiple units at minimal cost while satisfying complex operational constraints. Traditional optimization techniques struggle with the non-convexities introduced by factors such as valve-point effects, prohibited operating zones, and spinning reserve requirements. While metaheuristic methods have shown promise, they often suffer from convergence issues and constraint-handling limitations. In this study, we introduce a novel application of Group Relative Policy Optimization (GRPO), a reinforcement learning framework that extends Proximal Policy Optimization by integrating group-based learning and relative performance assessments. The proposed GRPO approach incorporates smart initialization, adaptive exploration, and elite-guided updates tailored to the EDP’s structure. Our method consistently produces high-quality, feasible solutions with faster convergence than state-of-the-art metaheuristics and learning-based methods. For instance, in the case of the 15-unit system, GRPO achieved the best cost of USD 32,421.67/h with full constraint satisfaction in just 4.24 s, surpassing many previous solutions. The algorithm also demonstrates excellent scalability, generalizability, and stability across larger-scale systems without requiring parameter retuning. These results highlight GRPO’s potential as a robust and efficient tool for real-time energy scheduling in smart grid environments.}},
  author       = {{Rizki, Adil and Touil, Achraf and Echchatbi, Abdelwahed and Oucheikh, Rachid}},
  keywords     = {{constraint handling; economic dispatch problem; energy scheduling; group relative policy optimization; non-convex optimization; reinforcement learning; smart grid}},
  language     = {{eng}},
  number       = {{3}},
  publisher    = {{MDPI AG}},
  journal      = {{Electricity}},
  title        = {{A Reinforcement Learning Approach Based on Group Relative Policy Optimization for Economic Dispatch in Smart Grids}},
  url          = {{http://dx.doi.org/10.3390/electricity6030049}},
  doi          = {{10.3390/electricity6030049}},
  volume       = {{6}},
  year         = {{2025}},
}

Lund University Publications

LUND UNIVERSITY LIBRARIES
