Forecasting Travel Patterns with Machine Learning: A Case Study Using Public Transport Data
(2025) In Master's Theses in Mathematical Sciences FMAM05 20251
Mathematics (Faculty of Engineering)
- Abstract
- This thesis investigates the application of machine learning and statistical methods for forecasting hourly public transport validations using ticket-validation data. After cleaning and aggregating validations into hourly counts, temporal features such as hour of day, weekday, and month are engineered. The modeling approaches include long short-term memory networks trained with mean absolute error and Huber losses, an extremely randomized trees ensemble benchmarked via LazyPredict, and a naive persistence baseline. Models are trained on stable pre-pandemic data and evaluated on both held-out pre-pandemic and pandemic-affected test sets to assess generalization under structural breaks. Forecast accuracy is measured using mean absolute error, root mean squared error, and the coefficient of determination. The Huber-trained long short-term memory achieves the lowest root mean squared error on stable data, while the extremely randomized trees model demonstrates greater robustness when demand patterns shift abruptly. All methods outperform the naive baseline but struggle to fully adapt to such disruptions without additional contextual inputs. These results can guide the development of real-time predictive features in transit applications and highlight the potential benefits of incorporating exogenous factors such as weather conditions, public holidays, major events, or service disruptions, along with personalization and adaptive retraining strategies.
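The pipeline named in the abstract (hourly aggregation, calendar features, a naive persistence baseline, and the three error metrics) can be illustrated with a minimal standard-library sketch. This is not the thesis code: all function names are hypothetical, the one-step lag for the persistence baseline is an assumption (the abstract does not state the lag), and hours with no validations are simply absent from the aggregated result rather than filled with zeros.

```python
from collections import Counter
from datetime import datetime
import math


def hourly_counts(timestamps):
    """Aggregate raw validation timestamps into counts per hour.

    Hours without any validation do not appear in the result.
    """
    counts = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )
    return dict(sorted(counts.items()))


def temporal_features(hour_start):
    """Calendar features mentioned in the abstract: hour of day, weekday, month."""
    return {
        "hour": hour_start.hour,
        "weekday": hour_start.weekday(),  # Monday is 0
        "month": hour_start.month,
    }


def persistence_pairs(series, lag=1):
    """Naive persistence: forecast each value by the value `lag` steps earlier.

    Returns (y_true, y_pred) as aligned lists. The lag is an assumed parameter.
    """
    return series[lag:], series[:-lag]


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def r2(y_true, y_pred):
    """Coefficient of determination (undefined if y_true is constant)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Any learned model would be scored with the same metric functions on the same aligned targets, which is what makes the persistence baseline a usable point of comparison.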
Please use this url to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9196200
- author
- Augustsson, Ellinor LU
- supervisor
- organization
- course
- FMAM05 20251
- year
- 2025
- type
- H2 - Master's Degree (Two Years)
- subject
- publication/series
- Master's Theses in Mathematical Sciences
- report number
- LUTFMA-3591-2025
- ISSN
- 1404-6342
- other publication id
- 2025:E48
- language
- English
- id
- 9196200
- date added to LUP
- 2025-09-15 11:13:38
- date last changed
- 2025-09-15 11:13:38
@misc{9196200,
  abstract = {{This thesis investigates the application of machine learning and statistical methods for forecasting hourly public transport validations using ticket-validation data. After cleaning and aggregating validations into hourly counts, temporal features such as hour of day, weekday, and month are engineered. The modeling approaches include long short-term memory networks trained with mean absolute error and Huber losses, an extremely randomized trees ensemble benchmarked via LazyPredict, and a naive persistence baseline. Models are trained on stable pre-pandemic data and evaluated on both held-out pre-pandemic and pandemic-affected test sets to assess generalization under structural breaks. Forecast accuracy is measured using mean absolute error, root mean squared error, and the coefficient of determination. The Huber-trained long short-term memory achieves the lowest root mean squared error on stable data, while the extremely randomized trees model demonstrates greater robustness when demand patterns shift abruptly. All methods outperform the naive baseline but struggle to fully adapt to such disruptions without additional contextual inputs. These results can guide the development of real-time predictive features in transit applications and highlight the potential benefits of incorporating exogenous factors such as weather conditions, public holidays, major events, or service disruptions, along with personalization and adaptive retraining strategies.}},
  author = {{Augustsson, Ellinor}},
  issn = {{1404-6342}},
  language = {{eng}},
  note = {{Student Paper}},
  series = {{Master's Theses in Mathematical Sciences}},
  title = {{Forecasting Travel Patterns with Machine Learning: A Case Study Using Public Transport Data}},
  year = {{2025}},
}