Lund University Publications

Injury severity prediction of cyclist crashes using random forests and random parameters logit models

Scarano, Antonella LU ; Rella Riccardi, Maria ; Mauriello, Filomena LU ; D'Agostino, Carmelo LU ; Pasquino, Nicola and Montella, Alfonso (2023) In Accident Analysis and Prevention 192.
Abstract

Cycling provides numerous benefits to individuals and to society, but cyclists bear a disproportionate share of road traffic injuries and fatalities. Without awareness of the factors contributing to cyclist death and injury, the capability to implement appropriate, context-specific measures is severely limited. In this paper, we investigated how characteristics of the road, the environment, the vehicle involved, the driver, and the cyclist affect the severity of crashes involving cyclists, analysing 72,363 crashes that occurred in Great Britain in the period 2016–2018. Both a machine learning method, the Random Forest (RF), and an econometric model, the Random Parameters Logit Model (RPLM), were implemented. Three RF algorithms were applied: the traditional RF, the Weighted Subspace RF, and the Random Survival Forest. The last of these demonstrated superior predictive performance in terms of both F-measure and G-mean. The main result of the Random Survival Forest is the variable importance, which provides a ranked list of the predictors associated with fatal and severe cyclist crashes. For fatal classification, 19 variables showed a normalized importance higher than 5%, with the second involved vehicle manoeuvring and the gender of the driver of the second vehicle having the greatest predictive ability. For serious injury classification, 13 variables showed a normalized importance higher than 5%, with the bike leaving the carriageway having the greatest normalized importance. Furthermore, each path from the root node to the leaf nodes was retraced, generating 361 if-then rules with fatal crash as the consequent and 349 if-then rules with serious injury crash as the consequent.
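The two scores used to compare the forests can be illustrated with a minimal sketch. It assumes the usual binary (one severity class versus the rest) definitions computed from confusion-matrix counts; the function names and the example counts are hypothetical and are not taken from the paper, which may use a multi-class variant.

```python
import math


def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-measure: weighted harmonic mean of precision and recall.

    beta = 1 gives the common F1 score (an assumption; the abstract
    does not state which beta was used).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def g_mean(tp: int, fp: int, fn: int, tn: int) -> float:
    """G-mean: geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical counts for one class (e.g. "fatal") versus all other crashes.
print(f_measure(tp=50, fp=50, fn=50))
print(g_mean(tp=50, fp=50, fn=50, tn=50))
```

Both scores reward correct detection of the rare class, which is why they are commonly preferred to plain accuracy when severe outcomes are a small share of the records.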
The RPLM showed significant unobserved heterogeneity in the data, finding four indicator variables with normally distributed random parameters: cyclist age ≥ 75 (fatal prediction), cyclist gender male (fatal and serious prediction), and driver aged 55–64 (serious prediction). The model's McFadden pseudo R² is equal to 0.21, indicating a very good fit. Furthermore, to understand the magnitude of the effects and the contribution of each variable to injury severity probabilities, pseudo-elasticities were assessed, giving insight into the relative importance and influence of the variables. The RF and the RPLM proved complementary in identifying several roadway, environmental, vehicle, driver, and cyclist-related factors associated with higher crash severity. Based on the identified contributory factors, safety countermeasures were recommended to support strategies for making cycling a safer and friendlier form of transport.
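For readers unfamiliar with the two RPLM quantities named in the abstract, their conventional textbook forms are sketched below. The symbols are assumptions chosen for illustration, not notation taken from the paper: LL(β̂) is the log-likelihood at convergence, LL(0) the log-likelihood of the baseline model (zero coefficients or constants only, depending on convention), and P_i the predicted probability of severity outcome i.

```latex
% McFadden pseudo R^2 (assumed standard form)
\begin{equation}
  \rho^2 = 1 - \frac{LL(\hat{\beta})}{LL(0)}
\end{equation}

% Pseudo-elasticity of an indicator variable x_k for severity outcome i:
% percentage change in P_i when x_k switches from 0 to 1
\begin{equation}
  E^{P_i}_{x_k} = \frac{P_i(x_k = 1) - P_i(x_k = 0)}{P_i(x_k = 0)} \times 100
\end{equation}
```

Under the first definition, the reported value of 0.21 corresponds to a fitted log-likelihood that is 21% smaller in magnitude than that of the baseline model.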

author
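Scarano, Antonella ; Rella Riccardi, Maria ; Mauriello, Filomena ; D'Agostino, Carmelo ; Pasquino, Nicola and Montella, Alfonso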
publishing date
2023
type
Contribution to journal
publication status
published
subject
keywords
Active travel, Crash contributory factors, Cyclist safety, Econometric models, Machine learning, Safety countermeasures
in
Accident Analysis and Prevention
volume
192
article number
107275
publisher
Elsevier
external identifiers
  • pmid:37683568
  • scopus:85171585916
ISSN
0001-4575
DOI
10.1016/j.aap.2023.107275
language
English
LU publication?
yes
additional info
Publisher Copyright: © 2023 The Authors
id
9685aebf-7492-43e2-9afa-918a96162d78
date added to LUP
2023-09-28 07:23:10
date last changed
2024-04-19 01:45:44
@article{9685aebf-7492-43e2-9afa-918a96162d78,
  abstract     = {{Cycling provides numerous benefits to individuals and to society, but cyclists bear a disproportionate share of road traffic injuries and fatalities. Without awareness of the factors contributing to cyclist death and injury, the capability to implement appropriate, context-specific measures is severely limited. In this paper, we investigated how characteristics of the road, the environment, the vehicle involved, the driver, and the cyclist affect the severity of crashes involving cyclists, analysing 72,363 crashes that occurred in Great Britain in the period 2016–2018. Both a machine learning method, the Random Forest (RF), and an econometric model, the Random Parameters Logit Model (RPLM), were implemented. Three RF algorithms were applied: the traditional RF, the Weighted Subspace RF, and the Random Survival Forest. The last of these demonstrated superior predictive performance in terms of both F-measure and G-mean. The main result of the Random Survival Forest is the variable importance, which provides a ranked list of the predictors associated with fatal and severe cyclist crashes. For fatal classification, 19 variables showed a normalized importance higher than 5%, with the second involved vehicle manoeuvring and the gender of the driver of the second vehicle having the greatest predictive ability. For serious injury classification, 13 variables showed a normalized importance higher than 5%, with the bike leaving the carriageway having the greatest normalized importance. Furthermore, each path from the root node to the leaf nodes was retraced, generating 361 if-then rules with fatal crash as the consequent and 349 if-then rules with serious injury crash as the consequent.
The RPLM showed significant unobserved heterogeneity in the data, finding four indicator variables with normally distributed random parameters: cyclist age ≥ 75 (fatal prediction), cyclist gender male (fatal and serious prediction), and driver aged 55–64 (serious prediction). The model's McFadden pseudo R$^2$ is equal to 0.21, indicating a very good fit. Furthermore, to understand the magnitude of the effects and the contribution of each variable to injury severity probabilities, pseudo-elasticities were assessed, giving insight into the relative importance and influence of the variables. The RF and the RPLM proved complementary in identifying several roadway, environmental, vehicle, driver, and cyclist-related factors associated with higher crash severity. Based on the identified contributory factors, safety countermeasures were recommended to support strategies for making cycling a safer and friendlier form of transport.}},
  author       = {{Scarano, Antonella and Rella Riccardi, Maria and Mauriello, Filomena and D'Agostino, Carmelo and Pasquino, Nicola and Montella, Alfonso}},
  issn         = {{0001-4575}},
  keywords     = {{Active travel; Crash contributory factors; Cyclist safety; Econometric models; Machine learning; Safety countermeasures}},
  language     = {{eng}},
  publisher    = {{Elsevier}},
  journal      = {{Accident Analysis and Prevention}},
  title        = {{Injury severity prediction of cyclist crashes using random forests and random parameters logit models}},
  url          = {{http://dx.doi.org/10.1016/j.aap.2023.107275}},
  doi          = {{10.1016/j.aap.2023.107275}},
  volume       = {{192}},
  year         = {{2023}},
}