Robust Reinforcement Learning Control of a Furuta Pendulum
(2021) Department of Automatic Control
- Abstract
- The use of Reinforcement Learning (RL) to design controllers for safety-critical systems is an important research area. On the one hand, RL can function in and adapt to complex and changing environments without requiring a model of the system. On the other hand, in such systems robustness is of high importance, as are ways to guarantee and certify a level of robustness. This project investigates state-of-the-art methods for training for and evaluating robustness in a neural network when applied to RL control of a real-world system. Both the use of Projected Gradient Descent (PGD) as an adversary for robust Deep RL and the Lipschitz constant as a measure of a neural network controller’s robustness are evaluated.
The study is conducted by training an agent to perform swing-up and balancing of a Furuta pendulum, then training it further with a PGD adversary of varying magnitude. The agents are evaluated on their robustness to normally distributed measurement noise and on their estimated Lipschitz constants.
The results show that while training with PGD does improve the robustness of a classifier on the MNIST dataset, the technique does not transfer straightforwardly to the Furuta pendulum in a Deep Reinforcement Learning setting. Only one of 30 agents outperformed the baseline agent, indicating that while the technique may hold some promise, further fine-tuning of the training process is necessary. Moreover, the Lipschitz constant did not correlate with robustness performance, indicating that it may not be an ideal measure of a neural network’s robustness.
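The abstract's central attack, PGD, can be illustrated with a minimal sketch. This is not the thesis's implementation (the record does not reproduce it); it is a generic L-infinity PGD adversary applied to a toy differentiable loss, and every name and parameter value here is a hypothetical stand-in:

```python
import numpy as np

def num_grad(f, x, h=1e-5):
    """Central-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def pgd_attack(loss, x0, epsilon=0.1, alpha=0.02, steps=10):
    """L-infinity PGD adversary: repeated signed gradient-ascent steps
    on the loss, each followed by projection back onto the epsilon-ball
    around the clean input x0."""
    x = x0.copy()
    for _ in range(steps):
        x = x + alpha * np.sign(num_grad(loss, x))   # ascent step
        x = np.clip(x, x0 - epsilon, x0 + epsilon)   # projection
    return x

# Toy stand-in for a policy loss: squared output of a linear "policy".
w = np.array([1.0, -2.0, 0.5])
loss = lambda s: float((w @ s) ** 2)
s_clean = np.array([0.2, 0.1, -0.3])
s_adv = pgd_attack(loss, s_clean)
# s_adv stays within the epsilon-ball around s_clean but raises the loss.
```

In adversarial training, the perturbed input `s_adv` would replace (or augment) the clean observation during the agent's updates, which is the pattern the thesis evaluates on the pendulum.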
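The Lipschitz constant used as a robustness measure can, for a feed-forward ReLU network, be upper-bounded by the product of the spectral norms of its weight matrices (ReLU itself is 1-Lipschitz). A minimal sketch on a random, untrained network, which is an assumption for illustration and not the thesis's actual estimator:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((16, 4))   # hidden-layer weights
W2 = rng.standard_normal((1, 16))   # output-layer weights

def mlp(x):
    """Tiny ReLU network, the kind of controller the measure applies to."""
    return W2 @ np.maximum(W1 @ x, 0.0)

# Product of spectral norms (ord=2 matrix norm) upper-bounds the
# network's Lipschitz constant.
L_bound = np.linalg.norm(W1, 2) * np.linalg.norm(W2, 2)

# Empirical check: slopes between random input pairs stay below the bound.
slopes = []
for _ in range(200):
    a, b = rng.standard_normal(4), rng.standard_normal(4)
    slopes.append(np.linalg.norm(mlp(a) - mlp(b)) / np.linalg.norm(a - b))
max_slope = max(slopes)
```

The gap between `max_slope` and `L_bound` hints at why such norm-product bounds can be loose, which is consistent with the abstract's finding that the estimated constant need not track robustness in practice.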
Please use this URL to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9069000
- author
- Olhager, Philip
- supervisor
- Johan Grönqvist LU
- Richard Pates LU
- Anders Rantzer LU
- organization
- Department of Automatic Control
- year
- 2021
- type
- H3 - Professional qualifications (4 Years - )
- subject
- report number
- TFRT-6150
- other publication id
- 0280-5316
- language
- English
- id
- 9069000
- date added to LUP
- 2021-12-09 15:56:17
- date last changed
- 2021-12-09 15:56:17
@misc{9069000,
  abstract = {{The use of Reinforcement Learning (RL) to design controllers for safety critical systems is an important research area. On the one hand, RL can function in and adapt to complex and changing environments without requiring a model of the system. On the other hand, in such systems robustness is of high importance, as well as ways to guarantee and certify a level of robustness. This project investigates state-of-the-art methods for training for and evaluating robustness in a neural network, when applied to RL control of a real-world system. The use of Projected Gradient Descent (PGD) as an adversary for robust Deep RL, as well as the Lipschitz constant as a measure of a neural network controller’s robustness, are evaluated. The study is conducted by training an agent to perform swing-up and balancing of a Furuta pendulum, and further training it with PGD of varying magnitudes as an adversary. The agents are evaluated by their robustness towards normally-distributed measurement noise as well as their estimated Lipschitz constant. The results show that while training with PGD does result in better robustness for a classifier on the MNIST dataset, applying the technique to the Furuta pendulum in a Deep Reinforcement Learning setting is not so simple. One of 30 agents managed to outperform the baseline agent, indicating that while the technique may have some promise, further fine-tuning of the training process is necessary. Further, the Lipschitz constant did not correlate with robustness performance, indicating that it may not be an ideal measure of a neural network’s robustness.}},
  author = {{Olhager, Philip}},
  language = {{eng}},
  note = {{Student Paper}},
  title = {{Robust Reinforcement Learning Control of a Furuta Pendulum}},
  year = {{2021}},
}