Which approach better samples extreme traffic conflicts? Conventional- vs. machine learning-based sampling methods

Hasanpour, Maryam; Chen, Zhankun; D'Agostino, Carmelo; Persaud, Bhagwant; Milligan, Craig

Hasanpour, Maryam; Chen, Zhankun (LU); D'Agostino, Carmelo (LU); Persaud, Bhagwant and Milligan, Craig (2026). In Accident Analysis and Prevention 229.

Abstract: Extreme value theory has been receiving much attention of late for proactively estimating crash risk through a two-step procedure that first samples extreme traffic conflicts and then estimates crash risk based on those sampled extremes. Although the existing body of research has encapsulated sampling methods within a predominant conventional technique, there is no universally accepted practice on how to efficiently select threshold values, nor on how to evaluate the alignment of the sampled extreme conflicts with the conceptual crash severity level framework. This research aims to address these issues by employing machine learning-based sampling methods, which do not require predefined thresholds, and by comparing the sampled extremes with the conceptual severity levels to assess their alignment. After a review of recent developments in machine learning techniques in transportation and other engineering fields, two promising machine learning sampling models, an autoencoder neural network and an isolation forest, were investigated using a database of vehicle-to-pedestrian conflicts at urban signalized intersections. The extreme conflicts sampled by the machine learning techniques and by the conventional sampling technique, used as a baseline, were assessed and compared using two criteria: their visual alignment with the conceptual severity level framework, and their compatibility with the extreme value distribution. The results demonstrate that the extreme conflicts selected by the machine learning methods mirror the conceptual severity levels better than those selected by the conventional sampling technique. Moreover, extremes classified by the isolation forest more closely preserve the characteristics of the empirical tail distributions, demonstrating a better contextual representation for modeling with the extreme value distribution than the autoencoder neural network and conventional sampling methods.
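The abstract contrasts threshold-based sampling with threshold-free machine learning sampling but gives no implementation details. As a rough illustration only, the Python sketch below (standard library only) applies both ideas to synthetic data. Everything in it is an assumption rather than the authors' setup: the single conflict indicator is post-encroachment time (PET, in seconds, where smaller means more severe), the data are simulated, the 1.5 s threshold is arbitrary, and the isolation forest is a minimal from-scratch version of the standard algorithm (random axis-aligned splits, anomaly score s = 2^(-E[h(x)]/c(ψ))). The rule that keeps points scoring above 0.5 on the short-PET side of the median is a hypothetical choice. The autoencoder model is not reproduced.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def c_factor(n):
    """Average path length of an unsuccessful search in a binary search tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Grow one isolation tree using a random feature and a uniform random split value."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # identical values cannot be separated further
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(point, node, depth=0):
    """Depth of the leaf reached by `point`, plus c(size) for points left unseparated there."""
    if node[0] == "leaf":
        return depth + c_factor(node[1])
    _, dim, split, left, right = node
    child = left if point[dim] < split else right
    return path_length(point, child, depth + 1)


def isolation_scores(points, n_trees=100, sample_size=256, seed=0):
    """Score s = 2^(-mean path length / c(psi)); values near 1 isolate easily."""
    rng = random.Random(seed)
    psi = min(sample_size, len(points))  # assumes at least 2 points
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(points, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(psi)
    scores = []
    for p in points:
        mean_h = sum(path_length(p, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_h / norm))
    return scores


# Hypothetical data: PET (s) of ordinary interactions plus three injected severe conflicts.
data_rng = random.Random(42)
pets = [max(0.8, data_rng.gauss(4.0, 1.0)) for _ in range(500)] + [0.2, 0.3, 0.4]

# Conventional sampling: an analyst-chosen PET threshold (1.5 s is arbitrary here).
PET_THRESHOLD = 1.5
conventional = [pet for pet in pets if pet < PET_THRESHOLD]
# Exceedances of negated PET over the negated threshold: (-pet) - (-u) = u - pet.
exceedances = [PET_THRESHOLD - pet for pet in conventional]

# Isolation-forest sampling: no PET threshold; keep easily isolated points
# (hypothetical s > 0.5 rule) on the short-PET, i.e. severe, side of the median.
scores = isolation_scores([(pet,) for pet in pets], seed=1)
median_pet = statistics.median(pets)
ml_sample = [pet for pet, s in zip(pets, scores) if s > 0.5 and pet < median_pet]

print(f"conventional: {len(conventional)} extremes, isolation forest: {len(ml_sample)} extremes")
```

Even without a PET threshold, the isolation forest sketch still depends on choices (the 0.5 score cutoff, the subsample size ψ = 256, and 100 trees). An autoencoder-based variant would typically replace the isolation score with each conflict's reconstruction error and keep the conflicts with the largest errors.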

Please use this URL to cite or link to this publication: https://lup.lub.lu.se/record/d3049320-0d13-4c1e-81c6-dfa5a3f19b39

author

Hasanpour, Maryam; Chen, Zhankun (LU); D'Agostino, Carmelo (LU); Persaud, Bhagwant and Milligan, Craig

publishing date

2026-05

type

Contribution to journal

publication status

published

subject

Transport Systems and Logistics

keywords

Autoencoder neural network, Extreme value theory, Isolation forest, Sampling techniques, Traffic conflicts

in

Accident Analysis and Prevention

volume

229

article number

108423

publisher

Elsevier

external identifiers

pmid:41633088
scopus:105029003657

ISSN

0001-4575

DOI

10.1016/j.aap.2026.108423

language

English

LU publication?

yes

id

d3049320-0d13-4c1e-81c6-dfa5a3f19b39

date added to LUP

2026-02-18 09:13:04

date last changed

2026-02-19 03:34:00

@article{d3049320-0d13-4c1e-81c6-dfa5a3f19b39,
  abstract     = {{Extreme value theory has been receiving much attention of late for proactively estimating crash risk through a two-step procedure that first samples extreme traffic conflicts and then estimates crash risk based on those sampled extremes. Although the existing body of research has encapsulated sampling methods within a predominant conventional technique, there is no universally accepted practice on how to efficiently select threshold values, nor on how to evaluate the alignment of the sampled extreme conflicts with the conceptual crash severity level framework. This research aims to address these issues by employing machine learning-based sampling methods, which do not require predefined thresholds, and by comparing the sampled extremes with the conceptual severity levels to assess their alignment. After a review of recent developments in machine learning techniques in transportation and other engineering fields, two promising machine learning sampling models, an autoencoder neural network and an isolation forest, were investigated using a database of vehicle-to-pedestrian conflicts at urban signalized intersections. The extreme conflicts sampled by the machine learning techniques and by the conventional sampling technique, used as a baseline, were assessed and compared using two criteria: their visual alignment with the conceptual severity level framework, and their compatibility with the extreme value distribution. The results demonstrate that the extreme conflicts selected by the machine learning methods mirror the conceptual severity levels better than those selected by the conventional sampling technique. Moreover, extremes classified by the isolation forest more closely preserve the characteristics of the empirical tail distributions, demonstrating a better contextual representation for modeling with the extreme value distribution than the autoencoder neural network and conventional sampling methods.}},
  author       = {{Hasanpour, Maryam and Chen, Zhankun and D'Agostino, Carmelo and Persaud, Bhagwant and Milligan, Craig}},
  issn         = {{0001-4575}},
  keywords     = {{Autoencoder neural network; Extreme value theory; Isolation forest; Sampling techniques; Traffic conflicts}},
  language     = {{eng}},
  publisher    = {{Elsevier}},
  journal      = {{Accident Analysis and Prevention}},
  title        = {{Which approach better samples extreme traffic conflicts? Conventional- vs. machine learning-based sampling methods}},
  url          = {{http://dx.doi.org/10.1016/j.aap.2026.108423}},
  doi          = {{10.1016/j.aap.2026.108423}},
  volume       = {{229}},
  year         = {{2026}},
}

