A Unified Framework for Real-Time Failure Handling in Robotics Using Vision-Language Models, Reactive Planner and Behavior Trees

Ahmad, Faseeh; Ismail, Hashim; Styrud, Jonathan; Stenmark, Maj; Krueger, Volker

(2025) 21st IEEE International Conference on Automation Science and Engineering, CASE 2025, p. 887-894

Abstract: Robotic systems often face execution failures due to unexpected obstacles, sensor errors, or environmental changes. Traditional failure recovery methods rely on predefined strategies or human intervention, making them less adaptable. This paper presents a unified failure recovery framework that combines Vision-Language Models (VLMs), a reactive planner, and Behavior Trees (BTs) to enable real-time failure handling. Our approach includes pre-execution verification, which checks for potential failures before execution, and reactive failure handling, which detects and corrects failures during execution by verifying existing BT conditions, adding missing preconditions and, when necessary, generating new skills. The framework uses a scene graph for structured environmental perception and an execution history for continuous monitoring, enabling context-aware and adaptive failure handling. We evaluate our framework through real-world experiments with an ABB YuMi robot on tasks like peg insertion, object sorting, and drawer placement, as well as in the AI2-THOR simulator. Compared to using pre-execution and reactive methods separately, our approach achieves higher task success rates and greater adaptability. Ablation studies highlight the importance of VLM-based reasoning, structured scene representation, and execution history tracking for effective failure recovery in robotics.

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/2f6d895f-afd2-431e-a284-55a53a4162bc

author

Ahmad, Faseeh (LU); Ismail, Hashim (LU); Styrud, Jonathan; Stenmark, Maj (LU) and Krueger, Volker (LU)

publishing date

2025

type

Chapter in Book/Report/Conference proceeding

publication status

published

subject

Robotics and automation

host publication

2025 IEEE 21st International Conference on Automation Science and Engineering, CASE 2025

pages

8 pages

publisher

IEEE Computer Society

conference name

21st IEEE International Conference on Automation Science and Engineering, CASE 2025

conference location

Los Angeles, United States

conference dates

2025-08-17 - 2025-08-21

external identifiers

scopus:105018322736

ISBN

9798331522469

DOI

10.1109/CASE58245.2025.11164021

language

English

LU publication?

yes

id

2f6d895f-afd2-431e-a284-55a53a4162bc

date added to LUP

2026-01-08 15:54:01

date last changed

2026-01-08 15:54:28

@inproceedings{2f6d895f-afd2-431e-a284-55a53a4162bc,
  abstract     = {{Robotic systems often face execution failures due to unexpected obstacles, sensor errors, or environmental changes. Traditional failure recovery methods rely on predefined strategies or human intervention, making them less adaptable. This paper presents a unified failure recovery framework that combines Vision-Language Models (VLMs), a reactive planner, and Behavior Trees (BTs) to enable real-time failure handling. Our approach includes pre-execution verification, which checks for potential failures before execution, and reactive failure handling, which detects and corrects failures during execution by verifying existing BT conditions, adding missing preconditions and, when necessary, generating new skills. The framework uses a scene graph for structured environmental perception and an execution history for continuous monitoring, enabling context-aware and adaptive failure handling. We evaluate our framework through real-world experiments with an ABB YuMi robot on tasks like peg insertion, object sorting, and drawer placement, as well as in the AI2-THOR simulator. Compared to using pre-execution and reactive methods separately, our approach achieves higher task success rates and greater adaptability. Ablation studies highlight the importance of VLM-based reasoning, structured scene representation, and execution history tracking for effective failure recovery in robotics.}},
  author       = {{Ahmad, Faseeh and Ismail, Hashim and Styrud, Jonathan and Stenmark, Maj and Krueger, Volker}},
  booktitle    = {{2025 IEEE 21st International Conference on Automation Science and Engineering, CASE 2025}},
  isbn         = {{9798331522469}},
  language     = {{eng}},
  pages        = {{887--894}},
  publisher    = {{IEEE Computer Society}},
  title        = {{A Unified Framework for Real-Time Failure Handling in Robotics Using Vision-Language Models, Reactive Planner and Behavior Trees}},
  url          = {{http://dx.doi.org/10.1109/CASE58245.2025.11164021}},
  doi          = {{10.1109/CASE58245.2025.11164021}},
  year         = {{2025}},
}
