Synthetic versus real: an analysis of critical scenarios for autonomous vehicle testing
(2025) In Automated Software Engineering 32(2).
- Abstract
With the emergence of autonomous vehicles comes the requirement of adequate and rigorous testing, particularly in critical scenarios that are both challenging and potentially hazardous. Generating synthetic simulation-based critical scenarios for testing autonomous vehicles has therefore received considerable interest, yet it is unclear how such scenarios relate to the actual crash or near-crash scenarios in the real world. Consequently, their realism is unknown. In this paper, we define realism as the degree of similarity of synthetic critical scenarios to real-world critical scenarios. We propose a methodology to measure realism using two metrics, namely attribute distribution and Euclidean distance. The methodology extracts various attributes from synthetic and realistic critical scenario datasets and performs a set of statistical tests to compare their distributions and distances. As a proof of concept for our methodology, we compare synthetic collision scenarios from DeepScenario against realistic autonomous vehicle collisions collected by the Department of Motor Vehicles in California, to analyse how well DeepScenario synthetic collision scenarios are aligned with real autonomous vehicle collisions recorded in California. We focus on five key attributes that are extractable from both datasets, and analyse the attribute distribution and distance between scenarios in the two datasets. Further, we derive recommendations to improve the realism of synthetic scenarios based on our analysis. Our study of realism provides a framework that can be replicated and extended for other datasets, concerning both real-world and synthetically generated scenarios.
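The abstract names two realism metrics (attribute distribution and Euclidean distance) but not the specific statistical tests or the five attributes. The following is a minimal sketch under stated assumptions, not the authors' implementation: each scenario is assumed to be a tuple of numeric attributes, the two-sample Kolmogorov-Smirnov statistic is used as one possible per-attribute distribution comparison, and distance is taken as the Euclidean distance from each synthetic scenario to its nearest real scenario after min-max scaling. All function names and the toy values are hypothetical and are not drawn from DeepScenario or the California DMV data.

```python
import math
from typing import List, Sequence, Tuple

Row = Sequence[float]


def ks_statistic(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max_t |F_x(t) - F_y(t)|."""
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        t = min(xs[i], ys[j])
        # Advance both empirical CDFs past every value equal to t (handles ties).
        while i < n and xs[i] <= t:
            i += 1
        while j < m and ys[j] <= t:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Once one sample is exhausted its CDF is 1 and the gap can only shrink.
    return d


def min_max_normalise(a: List[Row], b: List[Row]) -> Tuple[List[List[float]], List[List[float]]]:
    """Scale every attribute to [0, 1] using bounds taken over both datasets.

    Assumption: without scaling, attributes with large ranges would dominate
    the Euclidean distance. Constant attributes map to 0.0.
    """
    k = len(a[0])
    combined = list(a) + list(b)
    lo = [min(r[c] for r in combined) for c in range(k)]
    hi = [max(r[c] for r in combined) for c in range(k)]

    def scale(r: Row) -> List[float]:
        return [(r[c] - lo[c]) / (hi[c] - lo[c]) if hi[c] > lo[c] else 0.0 for c in range(k)]

    return [scale(r) for r in a], [scale(r) for r in b]


def nearest_real_distance(synthetic: List[Row], real: List[Row]) -> List[float]:
    """For each synthetic scenario, Euclidean distance to the closest real scenario."""
    return [min(math.dist(s, r) for r in real) for s in synthetic]


if __name__ == "__main__":
    # Hypothetical toy scenarios with two numeric attributes each.
    synthetic = [(10.0, 2.0), (12.0, 3.0), (30.0, 2.0)]
    real = [(10.0, 2.0), (11.0, 3.0), (14.0, 2.0)]
    for c in range(2):
        d = ks_statistic([s[c] for s in synthetic], [r[c] for r in real])
        print(f"attribute {c}: KS D = {d:.3f}")
    syn_n, real_n = min_max_normalise(synthetic, real)
    print("nearest-real distances:", nearest_real_distance(syn_n, real_n))
```

A large KS statistic for an attribute, or a synthetic scenario far from every real one, would flag a candidate realism gap; in practice one would also compute p-values (for example with a statistics library) rather than rely on the raw statistic alone.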
- author
- Song, Qunying (LU); Bensoussan, Avner and Mousavi, Mohammad Reza
- publishing date
- 2025-11
- type
- Contribution to journal
- publication status
- published
- keywords
- Autonomous driving systems, Autonomous vehicles, Collision scenarios, Critical scenario identification, Realism, Synthetic scenarios, Testing
- in
- Automated Software Engineering
- volume
- 32
- issue
- 2
- article number
- 37
- publisher
- Springer
- external identifiers
- scopus:105002927935
- ISSN
- 0928-8910
- DOI
- 10.1007/s10515-025-00499-4
- language
- English
- LU publication?
- yes
- id
- 7eff06c0-84ef-4295-b966-65e174c77a3b
- date added to LUP
- 2025-08-07 12:53:48
- date last changed
- 2025-08-07 12:54:41
@article{7eff06c0-84ef-4295-b966-65e174c77a3b,
  abstract = {{With the emergence of autonomous vehicles comes the requirement of adequate and rigorous testing, particularly in critical scenarios that are both challenging and potentially hazardous. Generating synthetic simulation-based critical scenarios for testing autonomous vehicles has therefore received considerable interest, yet it is unclear how such scenarios relate to the actual crash or near-crash scenarios in the real world. Consequently, their realism is unknown. In this paper, we define realism as the degree of similarity of synthetic critical scenarios to real-world critical scenarios. We propose a methodology to measure realism using two metrics, namely attribute distribution and Euclidean distance. The methodology extracts various attributes from synthetic and realistic critical scenario datasets and performs a set of statistical tests to compare their distributions and distances. As a proof of concept for our methodology, we compare synthetic collision scenarios from DeepScenario against realistic autonomous vehicle collisions collected by the Department of Motor Vehicles in California, to analyse how well DeepScenario synthetic collision scenarios are aligned with real autonomous vehicle collisions recorded in California. We focus on five key attributes that are extractable from both datasets, and analyse the attribute distribution and distance between scenarios in the two datasets. Further, we derive recommendations to improve the realism of synthetic scenarios based on our analysis. Our study of realism provides a framework that can be replicated and extended for other datasets, concerning both real-world and synthetically generated scenarios.}},
  author = {{Song, Qunying and Bensoussan, Avner and Mousavi, Mohammad Reza}},
  issn = {{0928-8910}},
  journal = {{Automated Software Engineering}},
  keywords = {{Autonomous driving systems; Autonomous vehicles; Collision scenarios; Critical scenario identification; Realism; Synthetic scenarios; Testing}},
  language = {{eng}},
  number = {{2}},
  publisher = {{Springer}},
  title = {{Synthetic versus real: an analysis of critical scenarios for autonomous vehicle testing}},
  url = {{https://doi.org/10.1007/s10515-025-00499-4}},
  doi = {{10.1007/s10515-025-00499-4}},
  volume = {{32}},
  year = {{2025}},
}