
LUP Student Papers

LUND UNIVERSITY LIBRARIES

Reinforcement learning for optimal control problems in continuous time and space

Forsell, Anton (2024)
Department of Automatic Control
Abstract
This thesis considers the problem of finding an optimal control policy for control problems in continuous time and space, utilising reinforcement learning techniques. Finding analytical solutions to optimal control problems is often intractable due to the constraints, cost-function objectives, and uncertainty involved. The focus is on reinforcement learning (RL) techniques, both as approximate replacements for analytical solutions and as a tool for creating initial control policies, surfacing findings and potential problems early in the design process. The thesis concentrates on model-free stochastic RL methods and on how the introduction of quasi-stochastic noise can be used as a tool for variance reduction. Specifically, we study an off-policy temporal-difference (TD) learning method, based on the quasi-stochastic approximation (QSA) method, for finding optimal control policies. In this approach, data are gathered by sampling the system with a policy entirely unrelated to the policy being optimised.
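The variance-reduction idea the abstract alludes to can be illustrated in miniature. The sketch below is not from the thesis: it uses a hypothetical scalar quadratic cost `f` and a one-point gradient estimator, and compares i.i.d. random probing against a deterministic sinusoidal ("quasi-stochastic") probing signal. Because the sinusoid's partial sums stay bounded, its estimation error shrinks roughly like 1/N rather than 1/sqrt(N).

```python
import math
import random

# Illustrative sketch (not the thesis algorithm): one-point gradient
# estimation of a hypothetical quadratic cost, comparing i.i.d. random
# probing with a deterministic sinusoidal (quasi-stochastic) probe.

def f(theta):
    # toy cost with minimiser at theta = 2; f'(theta) = 2 * (theta - 2)
    return (theta - 2.0) ** 2

def gradient_estimate(probe, theta=0.0, eps=0.1, n=10_000):
    # one-point estimate: g_t = xi_t * f(theta + eps * xi_t) / eps,
    # whose average approaches f'(theta) for a zero-mean, unit-energy probe
    total = 0.0
    for t in range(1, n + 1):
        xi = probe(t)
        total += xi * f(theta + eps * xi) / eps
    return total / n

true_grad = -4.0  # f'(0) for the toy cost above

random.seed(0)
stoch = gradient_estimate(lambda t: random.choice([-1.0, 1.0]))
quasi = gradient_estimate(lambda t: math.sqrt(2.0) * math.cos(t))

print(f"i.i.d. probe error:        {abs(stoch - true_grad):.4f}")
print(f"quasi-stochastic error:    {abs(quasi - true_grad):.4f}")
```

With the same probe energy and sample budget, the sinusoidal probe's error is typically orders of magnitude smaller, which is the variance-reduction effect QSA exploits.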

The results show that the algorithm produces policies of increasing quality with each iteration, in many cases reaching optimality. The algorithm was tested on linear systems of varying, though still relatively modest, complexity and on a basic nonlinear system. We discuss how applicable the method is to optimal control problems and conclude that it applies to a majority of systems, since its requirements are neither stricter nor looser than those of most optimal control optimisers.
author
Forsell, Anton
supervisor
organization
year
type
H3 - Professional qualifications (4 Years - )
subject
report number
TFRT-6258
other publication id
0280-5316
language
English
id
9174621
date added to LUP
2024-09-16 08:50:59
date last changed
2024-09-16 08:50:59
@misc{9174621,
  abstract     = {{This thesis considers the problem of finding an optimal control policy for control problems in continuous time and space, utilising reinforcement learning techniques. Finding analytical solutions to optimal control problems is often intractable due to the constraints, cost-function objectives, and uncertainty involved. The focus is on reinforcement learning (RL) techniques, both as approximate replacements for analytical solutions and as a tool for creating initial control policies, surfacing findings and potential problems early in the design process. The thesis concentrates on model-free stochastic RL methods and on how the introduction of quasi-stochastic noise can be used as a tool for variance reduction. Specifically, we study an off-policy temporal-difference (TD) learning method, based on the quasi-stochastic approximation (QSA) method, for finding optimal control policies. In this approach, data are gathered by sampling the system with a policy entirely unrelated to the policy being optimised.

 The results show that the algorithm produces policies of increasing quality with each iteration, in many cases reaching optimality. The algorithm was tested on linear systems of varying, though still relatively modest, complexity and on a basic nonlinear system. We discuss how applicable the method is to optimal control problems and conclude that it applies to a majority of systems, since its requirements are neither stricter nor looser than those of most optimal control optimisers.}},
  author       = {{Forsell, Anton}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Reinforcement learning for optimal control problems in continuous time and space}},
  year         = {{2024}},
}