LUP Student Papers

LUND UNIVERSITY LIBRARIES

Improving a Reinforcement Learning Algorithm for Resource Scheduling

Wilson Andersson, Elin and Håkansson, Johan (2022)
Department of Automatic Control
Abstract
This thesis aims to further investigate the viability of using reinforcement learning, specifically Q-learning, to schedule shared resources on the Ericsson Many-Core Architecture (EMCA). This was first explored by Patrik Trulsson in his master's thesis Dynamic Scheduling of Shared Resources using Reinforcement Learning (2021). The shared resources complete jobs assigned to them, and each job has a deadline as well as a latency. The Q-learning based scheduler should minimize the latency in the system and, most importantly, avoid missing deadlines. In this work, the Q-learning algorithm was tested on a simulation model of the EMCA that Trulsson built, and its performance was compared to a baseline scheduler and a random scheduler. Several parts of the Q-learning algorithm were evaluated and modified. The action and state spaces were made smaller, and the state space was made more applicable to the real system. The reward function, as well as other parameters of the Q-learning algorithm, were altered for better performance. The result of all of these changes was an increase in the Q-learning algorithm's performance. Initially, it performed slightly better than the baseline on only one of the two configurations it was evaluated on, but in the end it performed significantly better on both. It also handles the introduction of noise to the simulation without a significant decrease in performance. While some aspects still require further investigation, the algorithm consistently performs better than a baseline scheduler provided by Ericsson and is overall better suited for a real implementation due to the changes that have been made.
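The record does not reproduce any implementation details, but as a rough illustration of the kind of tabular Q-learning update such a scheduler builds on, the sketch below shows an epsilon-greedy agent with a latency- and deadline-penalizing reward. The state encoding, action set, reward shaping, and hyperparameters are placeholder assumptions for illustration, not the thesis implementation.

import random
from collections import defaultdict

# Illustrative tabular Q-learning for a resource scheduler.
# NOT the thesis implementation; states, actions, reward, and
# hyperparameters below are assumed placeholders.

ALPHA = 0.1      # learning rate
GAMMA = 0.95     # discount factor
EPSILON = 0.1    # exploration probability

ACTIONS = [0, 1, 2]  # assumed: index of the queue/resource to serve next
Q = defaultdict(lambda: [0.0 for _ in ACTIONS])  # Q-table, one row per state

def choose_action(state):
    """Epsilon-greedy selection over the tabular Q-values."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    values = Q[state]
    return values.index(max(values))

def update(state, action, reward, next_state):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[next_state])
    td_error = reward + GAMMA * best_next - Q[state][action]
    Q[state][action] += ALPHA * td_error

def reward(latency, missed_deadline):
    """Penalize latency, and penalize missed deadlines much more heavily
    (mirroring the priorities in the abstract; exact shaping assumed)."""
    return -latency - (100.0 if missed_deadline else 0.0)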
author: Wilson Andersson, Elin and Håkansson, Johan
supervisor:
organization: Department of Automatic Control
year: 2022
type: H3 - Professional qualifications (4 Years - )
subject:
report number: TFRT-6162
ISSN: 0280-5316
language: English
id: 9093862
date added to LUP: 2022-08-12 10:02:18
date last changed: 2022-08-12 10:02:18
@misc{9093862,
  abstract     = {{This thesis aims to further investigate the viability of using reinforcement learning, specifically Q-learning, to schedule shared resources on the Ericsson Many-Core Architecture (EMCA). This was first explored by Patrik Trulsson in his master thesis Dynamic Scheduling of Shared Resources using Reinforcement Learning (2021). The shared resources complete jobs assigned to them, and the jobs have deadlines as well as a latency. The Q-learning based scheduler should minimize the latency in the system. Most importantly, it should avoid missing deadlines. In this work, the Q-learning algorithm was tested on a simulation model of the EMCA that Trulsson built. Its performance was compared to a baseline and random scheduler. Several parts of the Q-learning algorithm were evaluated and modified. The action and state space have been made smaller, and the state space has been made more applicable to the real system. The reward function, as well as other parameters of the Q-learning, were altered for better performance. The result of all of these changes was that the Q-learning algorithm saw an increase in performance. Initially, it performed slightly better than the baseline on only one of the two configurations it was evaluated on, but in the end it performed significantly better on both. It also handles the introduction of noise to the simulation without a significant decrease in performance. While there are still things that might require further investigation, the algorithm always performs better than a baseline scheduler provided by Ericsson and is overall more suited for a real implementation due to the changes that have been done.}},
  author       = {{Wilson Andersson, Elin and Håkansson, Johan}},
  issn         = {{0280-5316}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Improving a Reinforcement Learning Algorithm for Resource Scheduling}},
  year         = {{2022}},
}