Hybrid Machine Learning for Stock Return Prediction: Integrating Technical and Fundamental Information in the S&P 100
(2026) NEKH02 20252
Department of Economics
- Abstract
- This thesis investigates whether integrating technical market data with fundamental accounting variables improves the out-of-sample predictability of stock returns in the highly liquid S&P 100 universe. Using an expanding window approach from 2010 to 2024, the study compares the performance of regularized linear (Ridge Regression) and non-linear (Histogram Gradient Boosting) estimators against an autoregressive benchmark.
The empirical results reveal a divergence between statistical accuracy and economic value. Statistical tests reject the hypothesis that a hybrid feature set is necessary for minimizing forecast error; the Ridge Fundamental model achieved the lowest out-of-sample RMSE (0.0668), significantly outperforming the benchmark (p < 0.01). However, in terms of economic significance, the Ridge Hybrid model proved superior, delivering the highest annualized Sharpe ratio (1.46) and significantly reducing portfolio volatility compared to the market benchmark.
Contrasting with the "virtue of complexity" often discussed in machine learning literature, this study finds that simple linear models consistently outperform complex non-linear specifications in this asset universe. Feature importance analysis supports the Signal Persistence Hypothesis: non-linear models underperformed by prioritizing high-frequency technical signals that decay rapidly within the monthly prediction horizon, whereas regularized linear models succeeded by isolating persistent fundamental characteristics such as valuation and asset growth. These findings suggest that in efficient markets, hybrid models create value not by improving raw accuracy, but by conditioning fundamental signals on liquidity constraints to manage downside risk.
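The evaluation quantities named in the abstract (expanding-window forecasts, out-of-sample RMSE, annualized Sharpe ratio) can be illustrated with a minimal sketch. This is not the thesis's code. The function names, the minimum training length, the historical-mean forecaster standing in for the Ridge and gradient boosting models, and the factor of 12 used to annualize monthly returns are all assumptions made for illustration.

```python
import math
import statistics
from typing import Callable, Iterator, List, Sequence, Tuple


def expanding_window_splits(n_obs: int, min_train: int) -> Iterator[Tuple[range, int]]:
    """Yield (train_indices, test_index): train on 0..t-1, then predict period t."""
    for t in range(min_train, n_obs):
        yield range(0, t), t


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared forecast error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def annualized_sharpe(excess_returns: Sequence[float], periods_per_year: int = 12) -> float:
    """Mean over sample standard deviation, scaled by sqrt(periods per year).

    Assumes the inputs are already excess returns at a monthly frequency.
    """
    mean = statistics.fmean(excess_returns)
    sd = statistics.stdev(excess_returns)
    return mean / sd * math.sqrt(periods_per_year)


def historical_mean_forecast(train: Sequence[float]) -> float:
    """Hypothetical placeholder forecaster; NOT one of the thesis's models."""
    return statistics.fmean(train)


def evaluate(
    returns: Sequence[float],
    min_train: int,
    forecaster: Callable[[Sequence[float]], float],
) -> float:
    """Out-of-sample RMSE of one-step-ahead forecasts under an expanding window."""
    preds: List[float] = []
    actuals: List[float] = []
    for train_idx, t in expanding_window_splits(len(returns), min_train):
        preds.append(forecaster([returns[i] for i in train_idx]))
        actuals.append(returns[t])
    return rmse(actuals, preds)
```

Each forecast uses only observations dated before the target period, which is what makes the resulting RMSE an out-of-sample figure. Any model that maps a training history to a one-step forecast can be passed in as `forecaster`.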
Please use this url to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9221892
- author
- Gummesson, Elias LU
- supervisor
- organization
- course
- NEKH02 20252
- year
- 2026
- type
- M2 - Bachelor Degree
- subject
- keywords
- finance, machine learning, stock forecasting, fundamental analysis
- language
- English
- id
- 9221892
- date added to LUP
- 2026-02-04 08:21:57
- date last changed
- 2026-02-04 08:21:57
@misc{9221892,
abstract = {{This thesis investigates whether integrating technical market data with fundamental accounting variables improves the out-of-sample predictability of stock returns in the highly liquid S\&P 100 universe. Using an expanding window approach from 2010 to 2024, the study compares the performance of regularized linear (Ridge Regression) and non-linear (Histogram Gradient Boosting) estimators against an autoregressive benchmark.
The empirical results reveal a divergence between statistical accuracy and economic value. Statistical tests reject the hypothesis that a hybrid feature set is necessary for minimizing forecast error; the Ridge Fundamental model achieved the lowest out-of-sample RMSE (0.0668), significantly outperforming the benchmark ($p<0.01$). However, in terms of economic significance, the Ridge Hybrid model proved superior, delivering the highest annualized Sharpe ratio ($1.46$) and significantly reducing portfolio volatility compared to the market benchmark.
Contrasting with the ``virtue of complexity'' often discussed in machine learning literature, this study finds that simple linear models consistently outperform complex non-linear specifications in this asset universe. Feature importance analysis supports the Signal Persistence Hypothesis: non-linear models underperformed by prioritizing high-frequency technical signals that decay rapidly within the monthly prediction horizon, whereas regularized linear models succeeded by isolating persistent fundamental characteristics such as valuation and asset growth. These findings suggest that in efficient markets, hybrid models create value not by improving raw accuracy, but by conditioning fundamental signals on liquidity constraints to manage downside risk.}},
author = {{Gummesson, Elias}},
language = {{eng}},
note = {{Student Paper}},
title = {{Hybrid Machine Learning for Stock Return Prediction: Integrating Technical and Fundamental Information in the S\&P 100}},
year = {{2026}},
}