
LUP Student Papers

LUND UNIVERSITY LIBRARIES

Reinforcement learning for optimal control problems in continuous time and space

Forsell, Anton (2024)
Department of Automatic Control
Abstract
This thesis considers the problem of finding an optimal control policy for control problems in continuous time and space, utilising reinforcement learning techniques. Finding analytical solutions to optimal control problems is often intractable due to the constraints, cost-function objectives, and uncertainty involved. The focus is on reinforcement learning (RL) techniques, both as approximate replacements for analytical solutions and as a tool for creating initial control policies, surfacing findings and potential problems early in the design process. The thesis concentrates on model-free stochastic RL methods and on how the introduction of quasi-stochastic noise can be used as a tool for variance reduction. Specifically, we study an off-policy temporal-difference (TD) learning method, based on the quasi-stochastic approximation (QSA) method, for finding optimal control policies. In this approach, data are gathered by sampling the system with a policy entirely unrelated to the policy being optimised.
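The variance-reduction idea the abstract alludes to can be illustrated in miniature. The sketch below is not from the thesis: it uses a hypothetical scalar quadratic cost `f` and a one-point gradient estimator, and compares i.i.d. random probing against a deterministic sinusoidal ("quasi-stochastic") probing signal. Because the sinusoid's partial sums stay bounded, its estimation error shrinks roughly like 1/N rather than 1/sqrt(N).

```python
import math
import random

# Illustrative sketch (not the thesis algorithm): one-point gradient
# estimation of a hypothetical quadratic cost, comparing i.i.d. random
# probing with a deterministic sinusoidal (quasi-stochastic) probe.

def f(theta):
    # toy cost with minimiser at theta = 2; f'(theta) = 2 * (theta - 2)
    return (theta - 2.0) ** 2

def gradient_estimate(probe, theta=0.0, eps=0.1, n=10_000):
    # one-point estimate: g_t = xi_t * f(theta + eps * xi_t) / eps,
    # whose average approaches f'(theta) for a zero-mean, unit-energy probe
    total = 0.0
    for t in range(1, n + 1):
        xi = probe(t)
        total += xi * f(theta + eps * xi) / eps
    return total / n

true_grad = -4.0  # f'(0) for the toy cost above

random.seed(0)
stoch = gradient_estimate(lambda t: random.choice([-1.0, 1.0]))
quasi = gradient_estimate(lambda t: math.sqrt(2.0) * math.cos(t))

print(f"i.i.d. probe error:        {abs(stoch - true_grad):.4f}")
print(f"quasi-stochastic error:    {abs(quasi - true_grad):.4f}")
```

With the same probe energy and sample budget, the sinusoidal probe's error is typically orders of magnitude smaller, which is the variance-reduction effect QSA exploits.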

The results show that the algorithm produces policies of increasing quality with each iteration, in many cases reaching optimality. The algorithm was tested on linear systems of varying, though still relatively modest, complexity and on a basic nonlinear system. We discuss how applicable the method is to optimal control problems and conclude that it applies to a majority of systems, since its requirements are neither stricter nor looser than those of most optimal control optimisers.
author
Forsell, Anton
supervisor
organization
year
type
H3 - Professional qualifications (4 Years - )
subject
report number
TFRT-6258
other publication id
0280-5316
language
English
id
9174621
date added to LUP
2024-09-16 08:50:59
date last changed
2024-09-16 08:50:59
@misc{9174621,
  abstract     = {{This thesis considers the problem of finding an optimal control policy for control problems in continuous time and space, utilising reinforcement learning techniques. Finding analytical solutions to optimal control problems is often intractable due to the constraints, cost-function objectives, and uncertainty involved. The focus is on reinforcement learning (RL) techniques, both as approximate replacements for analytical solutions and as a tool for creating initial control policies, surfacing findings and potential problems early in the design process. The thesis concentrates on model-free stochastic RL methods and on how the introduction of quasi-stochastic noise can be used as a tool for variance reduction. Specifically, we study an off-policy temporal-difference (TD) learning method, based on the quasi-stochastic approximation (QSA) method, for finding optimal control policies. In this approach, data are gathered by sampling the system with a policy entirely unrelated to the policy being optimised.

 The results show that the algorithm produces policies of increasing quality with each iteration, in many cases reaching optimality. The algorithm was tested on linear systems of varying, though still relatively modest, complexity and on a basic nonlinear system. We discuss how applicable the method is to optimal control problems and conclude that it applies to a majority of systems, since its requirements are neither stricter nor looser than those of most optimal control optimisers.}},
  author       = {{Forsell, Anton}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Reinforcement learning for optimal control problems in continuous time and space}},
  year         = {{2024}},
}