
LUP Student Papers

LUND UNIVERSITY LIBRARIES

Forecasting Travel Patterns with Machine Learning: A Case Study Using Public Transport Data

Augustsson, Ellinor LU (2025) In Master's Theses in Mathematical Sciences FMAM05 20251
Mathematics (Faculty of Engineering)
Abstract
This thesis investigates the application of machine learning and statistical methods to forecasting hourly public transport validations from ticket-validation data. After the validations are cleaned and aggregated into hourly counts, temporal features such as hour of day, weekday, and month are engineered. The modeling approaches include long short-term memory (LSTM) networks trained with mean absolute error and Huber losses, an extremely randomized trees ensemble benchmarked via LazyPredict, and a naive persistence baseline. The models are trained on stable pre-pandemic data and evaluated on both held-out pre-pandemic and pandemic-affected test sets to assess generalization under structural breaks. Forecast accuracy is measured using mean absolute error, root mean squared error, and the coefficient of determination. The Huber-trained LSTM achieves the lowest root mean squared error on stable data, while the extremely randomized trees model is more robust when demand patterns shift abruptly. All learned models outperform the naive baseline, but none fully adapts to such disruptions without additional contextual inputs. These results can guide the development of real-time predictive features in transit applications and highlight the potential benefits of incorporating exogenous factors such as weather conditions, public holidays, major events, or service disruptions, along with personalization and adaptive retraining strategies.
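The preprocessing and evaluation steps summarized in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the thesis code: the input format (a list of naive `datetime` validation timestamps), all function names, the one-hour persistence lag, and the Huber threshold `delta=1.0` are assumptions made for illustration. The sketch aggregates validations into hourly counts, derives hour-of-day, weekday, and month features, builds a naive persistence forecast, and scores it with MAE, RMSE, and R², plus the Huber loss named as one of the LSTM training objectives.

```python
import math
from collections import Counter
from datetime import datetime, timedelta


def hourly_counts(timestamps):
    """Aggregate validation timestamps into a gap-free hourly count series.

    Assumed input: an iterable of naive datetime objects, one per validation.
    Hours inside the observed range with no validations are filled with 0.
    """
    counts = Counter(ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps)
    if not counts:
        return []
    series = []
    hour, last = min(counts), max(counts)
    while hour <= last:
        series.append((hour, counts.get(hour, 0)))
        hour += timedelta(hours=1)
    return series


def calendar_features(hour):
    """Temporal features of the kind the abstract names: hour of day, weekday (Mon=0), month."""
    return {"hour": hour.hour, "weekday": hour.weekday(), "month": hour.month}


def persistence_forecast(values, lag=1):
    """Naive persistence baseline: predict y[t] = y[t - lag].

    lag=1 (previous hour) is an assumption; lag=168 would give a same-hour-last-week baseline.
    Returns aligned (actual, predicted) lists. Requires lag >= 1.
    """
    return values[lag:], values[:-lag]


def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def r2(y, y_hat):
    """Coefficient of determination; NaN when the targets have zero variance."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")


def huber(y, y_hat, delta=1.0):
    """Mean Huber loss: quadratic for |residual| <= delta, linear beyond it."""
    total = 0.0
    for a, b in zip(y, y_hat):
        r = abs(a - b)
        total += 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)
    return total / len(y)


if __name__ == "__main__":
    # Hypothetical toy data: two validations at 07:xx, none at 08:xx, one at 09:xx.
    stamps = [datetime(2019, 3, 4, 7, 5), datetime(2019, 3, 4, 7, 40), datetime(2019, 3, 4, 9, 15)]
    series = hourly_counts(stamps)
    values = [count for _, count in series]  # [2, 0, 1]
    actual, predicted = persistence_forecast(values)
    print(calendar_features(series[0][0]))
    # A negative R² means the forecast does worse than predicting the mean of the targets.
    print(mae(actual, predicted), rmse(actual, predicted), r2(actual, predicted))
```

The LSTM and extremely randomized trees models themselves are omitted; any model's hourly forecasts can be scored with the same metric functions, which makes the comparison against the persistence baseline direct.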
author: Augustsson, Ellinor LU
course: FMAM05 20251
year: 2025
type: H2 - Master's Degree (Two Years)
publication/series: Master's Theses in Mathematical Sciences
report number: LUTFMA-3591-2025
ISSN: 1404-6342
other publication id: 2025:E48
language: English
id: 9196200
date added to LUP: 2025-09-15 11:13:38
date last changed: 2025-09-15 11:13:38
@misc{9196200,
  abstract     = {{This thesis investigates the application of machine learning and statistical methods for forecasting hourly public transport validations using ticket‐validation data. After cleaning and aggregating validations into hourly counts, temporal features such as hour of day, weekday, and month are engineered. The modeling approaches include long short‐term memory networks trained with mean absolute error and Huber losses, an extremely randomized trees ensemble benchmarked via LazyPredict, and a naive persistence baseline. Models are trained on stable pre‐pandemic data and evaluated on both held‐out pre‐pandemic and pandemic‐affected test sets to assess generalization under structural breaks. Forecast accuracy is measured using mean absolute error, root mean squared error, and the coefficient of determination. The Huber‐trained long short-term memory achieves the lowest root mean squared error on stable data, while the extremely randomized trees model demonstrates greater robustness when demand patterns shift abruptly. All methods outperform the naive baseline but struggle to fully adapt to such disruptions without additional contextual inputs. These results can guide the development of real-time predictive features in transit applications and highlight the potential benefits of incorporating exogenous factors such as weather conditions, public holidays, major events, or service disruptions, along with personalization and adaptive retraining strategies.}},
  author       = {{Augustsson, Ellinor}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master's Theses in Mathematical Sciences}},
  title        = {{Forecasting Travel Patterns with Machine Learning: A Case Study Using Public Transport Data}},
  year         = {{2025}},
}