Which approach better samples extreme traffic conflicts? Conventional- vs. machine learning-based sampling methods
(2026) In Accident Analysis and Prevention 229.
- Abstract
Extreme value theory has recently received considerable attention as a means of proactively estimating crash risk through a two-step procedure that first samples extreme traffic conflicts and then estimates crash risk from those sampled extremes. Although existing research has largely confined sampling to one predominant conventional technique, there is no universally accepted practice for efficiently selecting threshold values, nor for evaluating how well the sampled extreme conflicts align with the conceptual crash severity level framework. This research addresses these issues by employing machine learning-based sampling methods, which do not require predefined thresholds, and by comparing the sampled extremes with the conceptual severity levels to assess their alignment. After a review of recent developments in machine learning techniques in transportation and other engineering fields, two promising machine learning sampling models, an autoencoder neural network and an isolation forest, were investigated using a database of vehicle-to-pedestrian conflicts at urban signalized intersections. Extreme conflicts sampled with the machine learning techniques and with the conventional sampling technique (as a baseline) were assessed and compared using two criteria: their visual alignment with the conceptual severity level framework, and their compatibility with the extreme value distribution. The results demonstrate that the extreme conflicts selected by the machine learning methods mirror the conceptual severity levels better than those selected by the conventional sampling technique. Moreover, extremes classified by the isolation forest more closely preserve the characteristics of the empirical tail distributions, indicating that they are better suited to modeling with the extreme value distribution than those sampled by the autoencoder neural network or the conventional method.
- author
- Hasanpour, Maryam; Chen, Zhankun (LU); D'Agostino, Carmelo (LU); Persaud, Bhagwant and Milligan, Craig
- publishing date
- 2026-05
- type
- Contribution to journal
- publication status
- published
- keywords
- Autoencoder neural network, Extreme value theory, Isolation forest, Sampling techniques, Traffic conflicts
- in
- Accident Analysis and Prevention
- volume
- 229
- article number
- 108423
- publisher
- Elsevier
- external identifiers
- pmid:41633088
- scopus:105029003657
- ISSN
- 0001-4575
- DOI
- 10.1016/j.aap.2026.108423
- language
- English
- LU publication?
- yes
- id
- d3049320-0d13-4c1e-81c6-dfa5a3f19b39
- date added to LUP
- 2026-02-18 09:13:04
- date last changed
- 2026-02-19 03:34:00
@article{d3049320-0d13-4c1e-81c6-dfa5a3f19b39,
abstract = {{Extreme value theory has recently received considerable attention as a means of proactively estimating crash risk through a two-step procedure that first samples extreme traffic conflicts and then estimates crash risk from those sampled extremes. Although existing research has largely confined sampling to one predominant conventional technique, there is no universally accepted practice for efficiently selecting threshold values, nor for evaluating how well the sampled extreme conflicts align with the conceptual crash severity level framework. This research addresses these issues by employing machine learning-based sampling methods, which do not require predefined thresholds, and by comparing the sampled extremes with the conceptual severity levels to assess their alignment. After a review of recent developments in machine learning techniques in transportation and other engineering fields, two promising machine learning sampling models, an autoencoder neural network and an isolation forest, were investigated using a database of vehicle-to-pedestrian conflicts at urban signalized intersections. Extreme conflicts sampled with the machine learning techniques and with the conventional sampling technique (as a baseline) were assessed and compared using two criteria: their visual alignment with the conceptual severity level framework, and their compatibility with the extreme value distribution. The results demonstrate that the extreme conflicts selected by the machine learning methods mirror the conceptual severity levels better than those selected by the conventional sampling technique. Moreover, extremes classified by the isolation forest more closely preserve the characteristics of the empirical tail distributions, indicating that they are better suited to modeling with the extreme value distribution than those sampled by the autoencoder neural network or the conventional method.}},
author = {{Hasanpour, Maryam and Chen, Zhankun and D'Agostino, Carmelo and Persaud, Bhagwant and Milligan, Craig}},
issn = {{0001-4575}},
keywords = {{Autoencoder neural network; Extreme value theory; Isolation forest; Sampling techniques; Traffic conflicts}},
language = {{eng}},
publisher = {{Elsevier}},
journal = {{Accident Analysis and Prevention}},
title = {{Which approach better samples extreme traffic conflicts? Conventional- vs. machine learning-based sampling methods}},
url = {{http://dx.doi.org/10.1016/j.aap.2026.108423}},
doi = {{10.1016/j.aap.2026.108423}},
volume = {{229}},
year = {{2026}},
}