A Reinforcement Learning Approach Based on Group Relative Policy Optimization for Economic Dispatch in Smart Grids
(2025) In Electricity 6(3).
- Abstract
The Economic Dispatch Problem (EDP) plays a critical role in power system operations by trying to allocate power generation across multiple units at minimal cost while satisfying complex operational constraints. Traditional optimization techniques struggle with the non-convexities introduced by factors such as valve-point effects, prohibited operating zones, and spinning reserve requirements. While metaheuristic methods have shown promise, they often suffer from convergence issues and constraint-handling limitations. In this study, we introduce a novel application of Group Relative Policy Optimization (GRPO), a reinforcement learning framework that extends Proximal Policy Optimization by integrating group-based learning and relative performance assessments. The proposed GRPO approach incorporates smart initialization, adaptive exploration, and elite-guided updates tailored to the EDP’s structure. Our method consistently produces high-quality, feasible solutions with faster convergence compared to state-of-the-art metaheuristics and learning-based methods. For instance, in the case of the 15-unit system, GRPO achieved the best cost of USD 32,421.67/h with full constraint satisfaction in just 4.24 s, surpassing many previous solutions. The algorithm also demonstrates excellent scalability, generalizability, and stability across larger-scale systems without requiring parameter retuning. These results highlight GRPO’s potential as a robust and efficient tool for real-time energy scheduling in smart grid environments.
- author
- Rizki, Adil ; Touil, Achraf ; Echchatbi, Abdelwahed and Oucheikh, Rachid
- publishing date
- 2025-09
- type
- Contribution to journal
- publication status
- published
- subject
- keywords
- constraint handling, economic dispatch problem, energy scheduling, group relative policy optimization, non-convex optimization, reinforcement learning, smart grid
- in
- Electricity
- volume
- 6
- issue
- 3
- article number
- 49
- publisher
- MDPI AG
- external identifiers
- scopus:105017058901
- DOI
- 10.3390/electricity6030049
- language
- English
- LU publication?
- yes
- id
- 7bbc0e85-f099-45ef-a185-eb40f76787ca
- date added to LUP
- 2025-11-28 09:58:06
- date last changed
- 2025-11-28 09:59:08
@article{7bbc0e85-f099-45ef-a185-eb40f76787ca,
abstract = {{The Economic Dispatch Problem (EDP) plays a critical role in power system operations by trying to allocate power generation across multiple units at minimal cost while satisfying complex operational constraints. Traditional optimization techniques struggle with the non-convexities introduced by factors such as valve-point effects, prohibited operating zones, and spinning reserve requirements. While metaheuristic methods have shown promise, they often suffer from convergence issues and constraint-handling limitations. In this study, we introduce a novel application of Group Relative Policy Optimization (GRPO), a reinforcement learning framework that extends Proximal Policy Optimization by integrating group-based learning and relative performance assessments. The proposed GRPO approach incorporates smart initialization, adaptive exploration, and elite-guided updates tailored to the EDP’s structure. Our method consistently produces high-quality, feasible solutions with faster convergence compared to state-of-the-art metaheuristics and learning-based methods. For instance, in the case of the 15-unit system, GRPO achieved the best cost of USD 32,421.67/h with full constraint satisfaction in just 4.24 s, surpassing many previous solutions. The algorithm also demonstrates excellent scalability, generalizability, and stability across larger-scale systems without requiring parameter retuning. These results highlight GRPO’s potential as a robust and efficient tool for real-time energy scheduling in smart grid environments.}},
author = {{Rizki, Adil and Touil, Achraf and Echchatbi, Abdelwahed and Oucheikh, Rachid}},
keywords = {{constraint handling; economic dispatch problem; energy scheduling; group relative policy optimization; non-convex optimization; reinforcement learning; smart grid}},
language = {{eng}},
number = {{3}},
eid = {{49}},
publisher = {{MDPI AG}},
journal = {{Electricity}},
title = {{A Reinforcement Learning Approach Based on Group Relative Policy Optimization for Economic Dispatch in Smart Grids}},
url = {{https://doi.org/10.3390/electricity6030049}},
doi = {{10.3390/electricity6030049}},
volume = {{6}},
year = {{2025}},
}