Robust Reinforcement Learning Control of a Furuta Pendulum
(2021) Department of Automatic Control
- Abstract
- The use of Reinforcement Learning (RL) to design controllers for safety-critical systems is an important research area. On the one hand, RL can function in and adapt to complex and changing environments without requiring a model of the system. On the other hand, in such systems robustness is of high importance, as are ways to guarantee and certify a level of robustness. This project investigates state-of-the-art methods for training for and evaluating robustness in a neural network when applied to RL control of a real-world system. Both the use of Projected Gradient Descent (PGD) as an adversary for robust Deep RL and the Lipschitz constant as a measure of a neural network controller’s robustness are evaluated.
The study is conducted by training an agent to perform swing-up and balancing of a Furuta pendulum, then training it further with a PGD adversary of varying magnitude. The agents are evaluated on their robustness to normally distributed measurement noise and on their estimated Lipschitz constants.
The results show that while training with PGD does improve the robustness of a classifier on the MNIST dataset, the technique does not transfer straightforwardly to the Furuta pendulum in a Deep Reinforcement Learning setting. Only one of 30 agents outperformed the baseline agent, indicating that while the technique may hold some promise, further fine-tuning of the training process is necessary. Moreover, the Lipschitz constant did not correlate with robustness performance, indicating that it may not be an ideal measure of a neural network’s robustness.
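The abstract's central attack, PGD, can be illustrated with a minimal sketch. This is not the thesis's implementation (the record does not reproduce it); it is a generic L-infinity PGD adversary applied to a toy differentiable loss, and every name and parameter value here is a hypothetical stand-in:

```python
import numpy as np

def num_grad(f, x, h=1e-5):
    """Central-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def pgd_attack(loss, x0, epsilon=0.1, alpha=0.02, steps=10):
    """L-infinity PGD adversary: repeated signed gradient-ascent steps
    on the loss, each followed by projection back onto the epsilon-ball
    around the clean input x0."""
    x = x0.copy()
    for _ in range(steps):
        x = x + alpha * np.sign(num_grad(loss, x))   # ascent step
        x = np.clip(x, x0 - epsilon, x0 + epsilon)   # projection
    return x

# Toy stand-in for a policy loss: squared output of a linear "policy".
w = np.array([1.0, -2.0, 0.5])
loss = lambda s: float((w @ s) ** 2)
s_clean = np.array([0.2, 0.1, -0.3])
s_adv = pgd_attack(loss, s_clean)
# s_adv stays within the epsilon-ball around s_clean but raises the loss.
```

In adversarial training, the perturbed input `s_adv` would replace (or augment) the clean observation during the agent's updates, which is the pattern the thesis evaluates on the pendulum.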
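The Lipschitz constant used as a robustness measure can, for a feed-forward ReLU network, be upper-bounded by the product of the spectral norms of its weight matrices (ReLU itself is 1-Lipschitz). A minimal sketch on a random, untrained network, which is an assumption for illustration and not the thesis's actual estimator:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((16, 4))   # hidden-layer weights
W2 = rng.standard_normal((1, 16))   # output-layer weights

def mlp(x):
    """Tiny ReLU network, the kind of controller the measure applies to."""
    return W2 @ np.maximum(W1 @ x, 0.0)

# Product of spectral norms (ord=2 matrix norm) upper-bounds the
# network's Lipschitz constant.
L_bound = np.linalg.norm(W1, 2) * np.linalg.norm(W2, 2)

# Empirical check: slopes between random input pairs stay below the bound.
slopes = []
for _ in range(200):
    a, b = rng.standard_normal(4), rng.standard_normal(4)
    slopes.append(np.linalg.norm(mlp(a) - mlp(b)) / np.linalg.norm(a - b))
max_slope = max(slopes)
```

The gap between `max_slope` and `L_bound` hints at why such norm-product bounds can be loose, which is consistent with the abstract's finding that the estimated constant need not track robustness in practice.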
Please use this URL to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9069000
- author
- Olhager, Philip
- supervisor
- Johan Grönqvist LU
- Richard Pates LU
- Anders Rantzer LU
- organization
- Department of Automatic Control
- year
- 2021
- type
- H3 - Professional qualifications (4 Years - )
- subject
- report number
- TFRT-6150
- other publication id
- 0280-5316
- language
- English
- id
- 9069000
- date added to LUP
- 2021-12-09 15:56:17
- date last changed
- 2021-12-09 15:56:17
@misc{9069000,
  abstract = {{The use of Reinforcement Learning (RL) to design controllers for safety critical systems is an important research area. On the one hand, RL can function in and adapt to complex and changing environments without requiring a model of the system. On the other hand, in such systems robustness is of high importance, as well as ways to guarantee and certify a level of robustness. This project investigates state-of-the-art methods for training for and evaluating robustness in a neural network, when applied to RL control of a real-world system. The use of Projected Gradient Descent (PGD) as an adversary for robust Deep RL, as well as the Lipschitz constant as a measure of a neural network controller’s robustness, are evaluated. The study is conducted by training an agent to perform swing-up and balancing of a Furuta pendulum, and further training it with PGD of varying magnitudes as an adversary. The agents are evaluated by their robustness towards normally-distributed measurement noise as well as their estimated Lipschitz constant. The results show that while training with PGD does result in better robustness for a classifier on the MNIST dataset, applying the technique to the Furuta pendulum in a Deep Reinforcement Learning setting is not so simple. One of 30 agents managed to outperform the baseline agent, indicating that while the technique may have some promise, further fine-tuning of the training process is necessary. Further, the Lipschitz constant did not correlate with robustness performance, indicating that it may not be an ideal measure of a neural network’s robustness.}},
  author = {{Olhager, Philip}},
  language = {{eng}},
  note = {{Student Paper}},
  title = {{Robust Reinforcement Learning Control of a Furuta Pendulum}},
  year = {{2021}},
}