
LUP Student Papers, Lund University Libraries

Robust Reinforcement Learning Control of a Furuta Pendulum

Olhager, Philip (2021)
Department of Automatic Control
Abstract
The use of Reinforcement Learning (RL) to design controllers for safety-critical systems is an important research area. On the one hand, RL can function in and adapt to complex and changing environments without requiring a model of the system. On the other hand, robustness is of high importance in such systems, as are ways to guarantee and certify a level of robustness. This project investigates state-of-the-art methods for training for, and evaluating, robustness in a neural network applied to RL control of a real-world system. Projected Gradient Descent (PGD) is evaluated as an adversary for robust Deep RL, and the Lipschitz constant as a measure of a neural network controller's robustness.
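As a rough illustration of the PGD-adversary idea described above, the sketch below perturbs a controller's observation within a small l-infinity ball so as to change the policy's action as much as possible. The PyTorch interface, the perturbation budget eps, the step size alpha, and the adversarial objective are illustrative assumptions, not a reproduction of the thesis's actual training setup.

    import torch

    def pgd_observation_attack(policy, obs, eps=0.05, alpha=0.01, steps=10):
        """Projected Gradient Descent on an observation: search, inside an
        l-infinity ball of radius eps, for the perturbation that changes the
        policy's action the most (one common adversarial objective in RL)."""
        with torch.no_grad():
            clean_action = policy(obs)          # action on the unperturbed observation
        adv_obs = obs.clone()
        for _ in range(steps):
            adv_obs.requires_grad_(True)
            # How far the perturbed action has drifted from the clean one
            loss = torch.norm(policy(adv_obs) - clean_action)
            grad, = torch.autograd.grad(loss, adv_obs)
            with torch.no_grad():
                adv_obs = adv_obs + alpha * grad.sign()                         # gradient ascent step
                adv_obs = torch.max(torch.min(adv_obs, obs + eps), obs - eps)   # project back into the eps-ball
        return adv_obs.detach()

During adversarial training, observations fed to the agent would be replaced by (or mixed with) their PGD-perturbed counterparts.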
The study is conducted by training an agent to perform swing-up and balancing of a Furuta pendulum, and then further training it with PGD perturbations of varying magnitudes as an adversary. The agents are evaluated by their robustness to normally distributed measurement noise as well as by their estimated Lipschitz constant.
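The two evaluation criteria mentioned above can be sketched as follows, assuming a Gymnasium-style environment env and a NumPy policy function; the noise level, episode length, and the sampling-based Lipschitz estimate (a crude lower bound, not the thesis's estimation method, which the abstract does not specify) are illustrative assumptions.

    import numpy as np

    def noisy_episode_return(env, policy, noise_std=0.01, max_steps=500, seed=0):
        """Run one episode with zero-mean Gaussian noise added to every
        measurement and return the accumulated reward (Gymnasium-style API)."""
        rng = np.random.default_rng(seed)
        obs, _ = env.reset(seed=seed)
        total_reward = 0.0
        for _ in range(max_steps):
            noisy_obs = obs + rng.normal(0.0, noise_std, size=obs.shape)
            obs, reward, terminated, truncated, _ = env.step(policy(noisy_obs))
            total_reward += reward
            if terminated or truncated:
                break
        return total_reward

    def lipschitz_lower_bound(policy, observations, delta=1e-3, seed=0):
        """Crude sampling-based lower bound on the policy's Lipschitz constant:
        the largest observed ratio of output change to input change."""
        rng = np.random.default_rng(seed)
        bound = 0.0
        for obs in observations:
            d = rng.normal(0.0, delta, size=obs.shape)
            ratio = np.linalg.norm(policy(obs + d) - policy(obs)) / np.linalg.norm(d)
            bound = max(bound, ratio)
        return bound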
The results show that while training with PGD does yield better robustness for a classifier on the MNIST dataset, applying the technique to the Furuta pendulum in a Deep Reinforcement Learning setting is less straightforward. Only one of the 30 trained agents managed to outperform the baseline agent, indicating that while the technique may have some promise, further fine-tuning of the training process is necessary. Furthermore, the Lipschitz constant did not correlate with robustness performance, indicating that it may not be an ideal measure of a neural network's robustness.
author: Olhager, Philip
supervisor:
organization: Department of Automatic Control
year: 2021
type: H3 - Professional qualifications (4 Years - )
subject:
report number: TFRT-6150
other publication id: 0280-5316
language: English
id: 9069000
date added to LUP: 2021-12-09 15:56:17
date last changed: 2021-12-09 15:56:17
@misc{9069000,
  author       = {{Olhager, Philip}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Robust Reinforcement Learning Control of a Furuta Pendulum}},
  year         = {{2021}},
}