Handling long-term safety and uncertainty in safe reinforcement learning

Günster, Jonas; Liu, Puze; Peters, Jan; Tateo, Davide

(2024) 8th Conference on Robot Learning, CoRL 2024. In Proceedings of Machine Learning Research 270, p. 4670-4697

Abstract: Safety is one of the key issues preventing the deployment of reinforcement learning techniques in real-world robots. While most approaches in the Safe Reinforcement Learning area do not require prior knowledge of constraints and robot kinematics and rely solely on data, it is often difficult to deploy them in complex real-world settings. Instead, model-based approaches that incorporate prior knowledge of the constraints and dynamics into the learning framework have proven capable of deploying the learning algorithm directly on the real robot. Unfortunately, while an approximated model of the robot dynamics is often available, the safety constraints are task-specific and hard to obtain: they may be too complicated to encode analytically, too expensive to compute, or it may be difficult to envision a priori the long-term safety requirements. In this paper, we bridge this gap by extending the safe exploration method, ATACOM, with learnable constraints, with a particular focus on ensuring long-term safety and handling of uncertainty. Our approach is competitive or superior to state-of-the-art methods in final performance while maintaining safer behavior during training.

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/3b06c0c6-7807-49fe-a713-49592852c88a

author

Günster, Jonas; Liu, Puze; Peters, Jan and Tateo, Davide (LU)

publishing date

2024

type

Chapter in Book/Report/Conference proceeding

publication status

published

subject

keywords

Chance constraints, Distributional RL, Safe reinforcement learning

host publication

8th Conference on Robot Learning, CoRL 2024

series title

Proceedings of Machine Learning Research

volume

270

pages

28 pages

conference name

8th Conference on Robot Learning, CoRL 2024

conference location

Munich, Germany

conference dates

2024-11-06 - 2024-11-09

external identifiers

scopus:86000761633

ISSN

2640-3498

language

English

LU publication?

no

id

3b06c0c6-7807-49fe-a713-49592852c88a

date added to LUP

2025-10-16 14:07:43

date last changed

2025-11-03 16:10:54

@inproceedings{3b06c0c6-7807-49fe-a713-49592852c88a,
  abstract     = {{Safety is one of the key issues preventing the deployment of reinforcement learning techniques in real-world robots. While most approaches in the Safe Reinforcement Learning area do not require prior knowledge of constraints and robot kinematics and rely solely on data, it is often difficult to deploy them in complex real-world settings. Instead, model-based approaches that incorporate prior knowledge of the constraints and dynamics into the learning framework have proven capable of deploying the learning algorithm directly on the real robot. Unfortunately, while an approximated model of the robot dynamics is often available, the safety constraints are task-specific and hard to obtain: they may be too complicated to encode analytically, too expensive to compute, or it may be difficult to envision a priori the long-term safety requirements. In this paper, we bridge this gap by extending the safe exploration method, ATACOM, with learnable constraints, with a particular focus on ensuring long-term safety and handling of uncertainty. Our approach is competitive or superior to state-of-the-art methods in final performance while maintaining safer behavior during training.}},
  author       = {{Günster, Jonas and Liu, Puze and Peters, Jan and Tateo, Davide}},
  booktitle    = {{8th Conference on Robot Learning, CoRL 2024}},
  issn         = {{2640-3498}},
  keywords     = {{Chance constraints; Distributional RL; Safe reinforcement learning}},
  language     = {{eng}},
  pages        = {{4670--4697}},
  series       = {{Proceedings of Machine Learning Research}},
  title        = {{Handling long-term safety and uncertainty in safe reinforcement learning}},
  volume       = {{270}},
  year         = {{2024}},
}
