Towards self-reliant robots : skill learning, failure recovery, and real-time adaptation: integrating behavior trees, reinforcement learning, and vision-language models for robust robotic autonomy

Ahmad, Faseeh

Ahmad, Faseeh ^LU (2025)

Abstract: Robots operating in real-world settings must manage task variability, environmental uncertainty, and failures during execution. This thesis presents a unified framework for building self-reliant robotic systems by integrating symbolic planning, reinforcement learning, behavior trees (BTs), and vision-language models (VLMs).

At the core of the approach is an interpretable policy representation based on behavior trees and motion generators (BTMGs), supporting both manual design and automated parameter tuning. Multi-objective Bayesian optimization enables learning skill parameters that balance performance metrics such as safety, speed, and task success. Policies are trained in simulation and successfully transferred to real robots for contact-rich manipulation tasks.

To support generalization, the framework models task variations using Gaussian processes, enabling interpolation of BTMG parameters across unseen scenarios. This allows adaptive behavior without retraining for each new task instance.

Failure recovery is addressed through a hierarchical scheme. BTs are extended with a reactive planner that dynamically updates execution policies based on runtime observations. Vision-language models assist in detecting and identifying failures, and in generating symbolic corrections when tasks are predicted to fail.

The thesis concludes with a discussion of future work, including (1) using vision-language-action (VLA) models or diffusion policies to generate new skills on the fly from multimodal inputs, and (2) extending the reactive planner with proactive failure prediction to anticipate and prevent execution errors before they occur. Together, these directions aim to advance robotic systems that are more robust, adaptable, and autonomous.

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/7fcb2da0-0594-41a4-8050-e9367d0c408b

author

Ahmad, Faseeh ^LU

supervisor

Volker Krueger ^LU
Jacek Malec ^LU

opponent

Prof. Nalpantidis, Lazaros, DTU Technical University of Denmark, Denmark.

organization

Robotics and Semantic Systems

publishing date

2025-09-15

type

Thesis

publication status

published

subject

Robotics and automation

keywords

Autonomous Robotics, Behavior Trees, Reinforcement learning, Vision-Language Models, Failure Recovery

pages

258 pages

publisher

Computer Science, Lund University

defense location

Lecture Hall E:1406, building E, Klas Anshelms väg 10, Faculty of Engineering LTH, Lund University, Lund. The dissertation will be live streamed, but part of the premises is to be excluded from the live stream.

defense date

2025-10-10 13:00:00

ISBN

978-91-8104-681-6

978-91-8104-682-3

language

English

LU publication?

yes

id

7fcb2da0-0594-41a4-8050-e9367d0c408b

date added to LUP

2025-09-15 18:53:45

date last changed

2025-09-18 09:23:32

@phdthesis{7fcb2da0-0594-41a4-8050-e9367d0c408b,
  abstract     = {{Robots operating in real-world settings must manage task variability, environmental uncertainty, and failures during execution. This thesis presents a unified framework for building self-reliant robotic systems by integrating symbolic planning, reinforcement learning, behavior trees (BTs), and vision-language models (VLMs). At the core of the approach is an interpretable policy representation based on behavior trees and motion generators (BTMGs), supporting both manual design and automated parameter tuning. Multi-objective Bayesian optimization enables learning skill parameters that balance performance metrics such as safety, speed, and task success. Policies are trained in simulation and successfully transferred to real robots for contact-rich manipulation tasks. To support generalization, the framework models task variations using Gaussian processes, enabling interpolation of BTMG parameters across unseen scenarios. This allows adaptive behavior without retraining for each new task instance. Failure recovery is addressed through a hierarchical scheme. BTs are extended with a reactive planner that dynamically updates execution policies based on runtime observations. Vision-language models assist in detecting and identifying failures, and in generating symbolic corrections when tasks are predicted to fail. The thesis concludes with a discussion of future work, including (1) using vision-language-action (VLA) models or diffusion policies to generate new skills on the fly from multimodal inputs, and (2) extending the reactive planner with proactive failure prediction to anticipate and prevent execution errors before they occur. Together, these directions aim to advance robotic systems that are more robust, adaptable, and autonomous.}},
  author       = {{Ahmad, Faseeh}},
  isbn         = {{978-91-8104-681-6}},
  keywords     = {{Autonomous Robotics; Behavior Trees; Reinforcement learning; Vision-Language Models; Failure Recovery}},
  language     = {{eng}},
  month        = {{09}},
  publisher    = {{Computer Science, Lund University}},
  school       = {{Lund University}},
  title        = {{Towards self-reliant robots : skill learning, failure recovery, and real-time adaptation: integrating behavior trees, reinforcement learning, and vision-language models for robust robotic autonomy}},
  url          = {{https://lup.lub.lu.se/search/files/227932138/Thesis_final_v2.pdf}},
  year         = {{2025}},
}

Lund University Publications
