Reinforcement learning for optimal control problems in continuous time and space
(2024) Department of Automatic Control
- Abstract
- This thesis considers the problem of finding an optimal control policy for control problems in continuous time and space, utilising reinforcement learning techniques. Finding analytical solutions to optimal control problems is often intractable due to involved constraints, cost-function objectives, and uncertainty. The focal point is on reinforcement learning (RL) techniques, either to find approximate replacements for analytical solutions or as a tool to create initial control policies, revealing findings and potential problems early in the design process. The thesis focuses on model-free stochastic RL methods and on how the introduction of quasi-stochastic noise can be used as a tool for variance reduction. Specifically, we study an off-policy temporal difference (TD) learning method, based on the quasi-stochastic approximation (QSA) method, to find optimal control policies. In this approach, data gathering is done by sampling the system with a policy entirely unrelated to the policy being optimised.
The results show that the algorithm produces policies of increasing quality with each iteration, in many cases reaching optimality. The algorithm was tested on linear systems of varying, though still relatively small, complexity, and on a basic nonlinear system. We discussed how applicable the method is to optimal control problems and concluded that it is applicable to a majority of systems, as it imposes neither harder nor softer requirements than most optimal control optimisers.
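The variance-reduction role of quasi-stochastic noise described in the abstract can be illustrated with a minimal toy sketch (not the thesis's actual algorithm). Assumptions introduced here for illustration: a quadratic cost `cost`, a one-point perturbation gradient estimator `grad_estimate`, a probing frequency `omega = 1.7`, and a perturbation scale `a = 0.1` — none of these come from the thesis. The sketch compares the run-to-run variance of a gradient estimate when the exploration signal is i.i.d. Gaussian versus a deterministic sinusoid with a random phase (a simple quasi-stochastic probing signal); both signals have zero mean and unit variance, but the sinusoid's time averages settle much faster.

```python
import math
import random

def cost(x):
    """Toy quadratic cost; the true gradient at theta = 0 is -4."""
    return (x - 2.0) ** 2

def grad_estimate(theta, noise, a=0.1):
    """One-point perturbation estimate: (1/a) * mean(cost(theta + a*xi) * xi).

    Approximately unbiased when the probing signal xi has zero mean and
    unit variance, whether xi is random or quasi-stochastic.
    """
    return sum(cost(theta + a * xi) * xi for xi in noise) / (a * len(noise))

def gaussian_noise(T, rng):
    # i.i.d. N(0, 1) exploration noise (the standard stochastic choice)
    return [rng.gauss(0.0, 1.0) for _ in range(T)]

def quasi_stochastic_noise(T, rng, omega=1.7):
    # deterministic sinusoid with a random phase: zero mean, unit variance,
    # but its time averages converge far faster than i.i.d. averages
    phase = rng.uniform(0.0, 2.0 * math.pi)
    return [math.sqrt(2.0) * math.sin(omega * t + phase) for t in range(T)]

def estimate_stats(noise_fn, runs=200, T=500, seed=0):
    # repeat the gradient estimate over many runs and report its
    # empirical mean and variance across runs
    rng = random.Random(seed)
    ests = [grad_estimate(0.0, noise_fn(T, rng)) for _ in range(runs)]
    mean = sum(ests) / runs
    var = sum((e - mean) ** 2 for e in ests) / runs
    return mean, var

mean_g, var_g = estimate_stats(gaussian_noise)
mean_q, var_q = estimate_stats(quasi_stochastic_noise)
```

Both estimators land near the true gradient of -4, but the quasi-stochastic probing signal yields a markedly lower variance across runs, which is the effect the thesis exploits in its QSA-based TD learning method.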
Please use this URL to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9174621
- author
- Forsell, Anton
- supervisor
- organization
- year
- 2024
- type
- H3 - Professional qualifications (4 Years - )
- subject
- report number
- TFRT-6258
- other publication id
- 0280-5316
- language
- English
- id
- 9174621
- date added to LUP
- 2024-09-16 08:50:59
- date last changed
- 2024-09-16 08:50:59
@misc{9174621, abstract = {{This thesis considers the problem of finding an optimal control policy for control problems in continuous time and space, utilising reinforcement learning techniques. Finding analytical solutions to optimal control problems is often intractable due to involved constraints, cost-function objectives, and uncertainty. The focal point is on reinforcement learning (RL) techniques, either to find approximate replacements for analytical solutions or as a tool to create initial control policies, revealing findings and potential problems early in the design process. The thesis focuses on model-free stochastic RL methods and on how the introduction of quasi-stochastic noise can be used as a tool for variance reduction. Specifically, we study an off-policy temporal difference (TD) learning method, based on the quasi-stochastic approximation (QSA) method, to find optimal control policies. In this approach, data gathering is done by sampling the system with a policy entirely unrelated to the policy being optimised. The results show that the algorithm produces policies of increasing quality with each iteration, in many cases reaching optimality. The algorithm was tested on linear systems of varying, though still relatively small, complexity, and on a basic nonlinear system. We discussed how applicable the method is to optimal control problems and concluded that it is applicable to a majority of systems, as it imposes neither harder nor softer requirements than most optimal control optimisers.}}, author = {{Forsell, Anton}}, language = {{eng}}, note = {{Student Paper}}, title = {{Reinforcement learning for optimal control problems in continuous time and space}}, year = {{2024}}, }