Lund University Publications

Which approach better samples extreme traffic conflicts? Conventional- vs. machine learning-based sampling methods

Hasanpour, Maryam ; Chen, Zhankun LU ; D'Agostino, Carmelo LU ; Persaud, Bhagwant and Milligan, Craig (2026) In Accident Analysis and Prevention 229.
Abstract

Extreme value theory has recently received considerable attention for proactively estimating crash risk through a two-step procedure that first samples extreme traffic conflicts and then estimates crash risk from those sampled extremes. Although existing research has relied predominantly on a conventional sampling technique, there is no universally accepted practice for efficiently selecting threshold values, nor for evaluating how well the sampled extreme conflicts align with the conceptual crash severity level framework. This research aims to address these issues by employing machine learning-based sampling methods, which do not require predefined thresholds, and by comparing the sampled extremes with the conceptual severity levels to assess their alignment. After a review of recent developments in machine learning techniques in transportation and other engineering fields, two promising machine learning sampling models, an autoencoder neural network and an isolation forest, were investigated using a database of vehicle-to-pedestrian conflicts at urban signalized intersections. Extreme conflicts sampled using the machine learning techniques and, as a baseline, the conventional sampling technique were assessed and compared using two criteria: their visual alignment with the conceptual severity level framework and their compatibility with the extreme value distribution. The results demonstrate that the extreme conflicts selected by the machine learning methods mirror the conceptual severity levels better than those selected by the conventional sampling technique. Moreover, extremes classified by the isolation forest more closely preserve the characteristics of the empirical tail distributions, demonstrating a better contextual representation for modeling with the extreme value distribution than the autoencoder neural network and conventional sampling methods.
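
The two-step procedure described in the abstract, and its sensitivity to the threshold choice, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the conflict indicator (negated time-to-collision, so that values at or above 0 represent a crash), the synthetic gamma-distributed data, the quantile-based thresholds, and the method-of-moments fit of the generalized Pareto distribution (used in place of maximum likelihood to keep the example dependency-free).

```python
import math
import random
import statistics

def sample_extremes(values, quantile):
    """Step 1 (conventional): keep the excesses above a quantile threshold u."""
    ordered = sorted(values)
    u = ordered[int(quantile * (len(ordered) - 1))]
    excesses = [v - u for v in values if v > u]
    return u, excesses

def fit_gpd_moments(excesses):
    """Method-of-moments estimates of the GPD shape (xi) and scale (sigma)."""
    m = statistics.fmean(excesses)
    s2 = statistics.variance(excesses)
    ratio = m * m / s2
    return 0.5 * (1.0 - ratio), 0.5 * m * (1.0 + ratio)

def crash_probability(values, quantile):
    """Step 2: estimate P(X >= 0) from the fitted tail, with X = -TTC (assumed)."""
    u, excesses = sample_extremes(values, quantile)
    xi, sigma = fit_gpd_moments(excesses)
    rate = len(excesses) / len(values)  # empirical P(X > u)
    y = 0.0 - u                         # distance from threshold to crash boundary
    if abs(xi) < 1e-9:
        tail = math.exp(-y / sigma)     # exponential limit of the GPD
    else:
        base = 1.0 + xi * y / sigma
        # base <= 0 means the fitted upper endpoint lies below the crash boundary
        tail = base ** (-1.0 / xi) if base > 0 else 0.0
    return u, len(excesses), xi, sigma, rate * tail

# Hypothetical data: 5000 synthetic TTC values in seconds, negated.
rng = random.Random(0)
conflicts = [-rng.gammavariate(4.0, 0.8) for _ in range(5000)]

# The estimate shifts with the threshold, which is the practical difficulty
# that threshold-free sampling is meant to avoid.
for q in (0.85, 0.90, 0.95):
    u, n, xi, sigma, p = crash_probability(conflicts, q)
    print(f"q={q:.2f} u={u:.3f} n={n} xi={xi:.3f} sigma={sigma:.3f} p_crash={p:.3e}")
```

Running the loop with different quantiles gives a quick view of how much the fitted shape, scale, and resulting crash probability depend on where the threshold is placed.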

author
Hasanpour, Maryam ; Chen, Zhankun ; D'Agostino, Carmelo ; Persaud, Bhagwant and Milligan, Craig
publishing date
2026
type
Contribution to journal
publication status
published
subject
keywords
Autoencoder neural network, Extreme value theory, Isolation forest, Sampling techniques, Traffic conflicts
in
Accident Analysis and Prevention
volume
229
article number
108423
publisher
Elsevier
external identifiers
  • pmid:41633088
  • scopus:105029003657
ISSN
0001-4575
DOI
10.1016/j.aap.2026.108423
language
English
LU publication?
yes
id
d3049320-0d13-4c1e-81c6-dfa5a3f19b39
date added to LUP
2026-02-18 09:13:04
date last changed
2026-02-19 03:34:00
@article{d3049320-0d13-4c1e-81c6-dfa5a3f19b39,
  abstract     = {{Extreme value theory has recently received considerable attention for proactively estimating crash risk through a two-step procedure that first samples extreme traffic conflicts and then estimates crash risk from those sampled extremes. Although existing research has relied predominantly on a conventional sampling technique, there is no universally accepted practice for efficiently selecting threshold values, nor for evaluating how well the sampled extreme conflicts align with the conceptual crash severity level framework. This research aims to address these issues by employing machine learning-based sampling methods, which do not require predefined thresholds, and by comparing the sampled extremes with the conceptual severity levels to assess their alignment. After a review of recent developments in machine learning techniques in transportation and other engineering fields, two promising machine learning sampling models, an autoencoder neural network and an isolation forest, were investigated using a database of vehicle-to-pedestrian conflicts at urban signalized intersections. Extreme conflicts sampled using the machine learning techniques and, as a baseline, the conventional sampling technique were assessed and compared using two criteria: their visual alignment with the conceptual severity level framework and their compatibility with the extreme value distribution. The results demonstrate that the extreme conflicts selected by the machine learning methods mirror the conceptual severity levels better than those selected by the conventional sampling technique. Moreover, extremes classified by the isolation forest more closely preserve the characteristics of the empirical tail distributions, demonstrating a better contextual representation for modeling with the extreme value distribution than the autoencoder neural network and conventional sampling methods.}},
  author       = {{Hasanpour, Maryam and Chen, Zhankun and D'Agostino, Carmelo and Persaud, Bhagwant and Milligan, Craig}},
  issn         = {{0001-4575}},
  keywords     = {{Autoencoder neural network; Extreme value theory; Isolation forest; Sampling techniques; Traffic conflicts}},
  language     = {{eng}},
  publisher    = {{Elsevier}},
  journal      = {{Accident Analysis and Prevention}},
  title        = {{Which approach better samples extreme traffic conflicts? Conventional- vs. machine learning-based sampling methods}},
  url          = {{http://dx.doi.org/10.1016/j.aap.2026.108423}},
  doi          = {{10.1016/j.aap.2026.108423}},
  volume       = {{229}},
  year         = {{2026}},
}