Lund University Publications

A Reinforcement Learning Approach Based on Group Relative Policy Optimization for Economic Dispatch in Smart Grids

Rizki, Adil ; Touil, Achraf ; Echchatbi, Abdelwahed and Oucheikh, Rachid LU (2025) In Electricity 6(3).
Abstract

The Economic Dispatch Problem (EDP) plays a critical role in power system operations by trying to allocate power generation across multiple units at minimal cost while satisfying complex operational constraints. Traditional optimization techniques struggle with the non-convexities introduced by factors such as valve-point effects, prohibited operating zones, and spinning reserve requirements. While metaheuristics methods have shown promise, they often suffer from convergence issues and constraint-handling limitations. In this study, we introduce a novel application of Group Relative Policy Optimization (GRPO), a reinforcement learning framework that extends Proximal Policy Optimization by integrating group-based learning and relative performance assessments. The proposed GRPO approach incorporates smart initialization, adaptive exploration, and elite-guided updates tailored to the EDP’s structure. Our method consistently produces high-quality, feasible solutions with faster convergence compared to state-of-the-art metaheuristics and learning-based methods. For instance, in the case of the 15-unit system, GRPO achieved the best cost of USD 32,421.67/h with full constraint satisfaction in just 4.24 s, surpassing many previous solutions. The algorithm also demonstrates excellent scalability, generalizability, and stability across larger-scale systems without requiring parameter retuning. These results highlight GRPO’s potential as a robust and efficient tool for real-time energy scheduling in smart grid environments.

author
Rizki, Adil ; Touil, Achraf ; Echchatbi, Abdelwahed and Oucheikh, Rachid
organization
publishing date
2025
type
Contribution to journal
publication status
published
subject
keywords
constraint handling, economic dispatch problem, energy scheduling, group relative policy optimization, non-convex optimization, reinforcement learning, smart grid
in
Electricity
volume
6
issue
3
article number
49
publisher
MDPI AG
external identifiers
  • scopus:105017058901
DOI
10.3390/electricity6030049
language
English
LU publication?
yes
id
7bbc0e85-f099-45ef-a185-eb40f76787ca
date added to LUP
2025-11-28 09:58:06
date last changed
2025-11-28 09:59:08
@article{7bbc0e85-f099-45ef-a185-eb40f76787ca,
  abstract     = {{The Economic Dispatch Problem (EDP) plays a critical role in power system operations by trying to allocate power generation across multiple units at minimal cost while satisfying complex operational constraints. Traditional optimization techniques struggle with the non-convexities introduced by factors such as valve-point effects, prohibited operating zones, and spinning reserve requirements. While metaheuristics methods have shown promise, they often suffer from convergence issues and constraint-handling limitations. In this study, we introduce a novel application of Group Relative Policy Optimization (GRPO), a reinforcement learning framework that extends Proximal Policy Optimization by integrating group-based learning and relative performance assessments. The proposed GRPO approach incorporates smart initialization, adaptive exploration, and elite-guided updates tailored to the EDP’s structure. Our method consistently produces high-quality, feasible solutions with faster convergence compared to state-of-the-art metaheuristics and learning-based methods. For instance, in the case of the 15-unit system, GRPO achieved the best cost of USD 32,421.67/h with full constraint satisfaction in just 4.24 s, surpassing many previous solutions. The algorithm also demonstrates excellent scalability, generalizability, and stability across larger-scale systems without requiring parameter retuning. These results highlight GRPO’s potential as a robust and efficient tool for real-time energy scheduling in smart grid environments.}},
  author       = {{Rizki, Adil and Touil, Achraf and Echchatbi, Abdelwahed and Oucheikh, Rachid}},
  keywords     = {{constraint handling; economic dispatch problem; energy scheduling; group relative policy optimization; non-convex optimization; reinforcement learning; smart grid}},
  language     = {{eng}},
  number       = {{3}},
  publisher    = {{MDPI AG}},
  journal      = {{Electricity}},
  title        = {{A Reinforcement Learning Approach Based on Group Relative Policy Optimization for Economic Dispatch in Smart Grids}},
  url          = {{https://doi.org/10.3390/electricity6030049}},
  doi          = {{10.3390/electricity6030049}},
  volume       = {{6}},
  year         = {{2025}},
}