
LUP Student Papers

LUND UNIVERSITY LIBRARIES

Hybrid Machine Learning for Stock Return Prediction: Integrating Technical and Fundamental Information in the S&P 100

Gummesson, Elias LU (2026) NEKH02 20252
Department of Economics
Abstract
This thesis investigates whether integrating technical market data with fundamental accounting variables improves the out-of-sample predictability of stock returns in the highly liquid S&P 100 universe. Using an expanding window approach from 2010 to 2024, the study compares the performance of regularized linear (Ridge Regression) and non-linear (Histogram Gradient Boosting) estimators against an autoregressive benchmark.
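The expanding-window evaluation described above can be sketched as follows. This is a minimal illustration and not the thesis's implementation: the data are synthetic, the one-feature, no-intercept ridge estimator is a simplification, and the names `expanding_window_splits`, `fit_ridge_1d`, `min_train` and `alpha` are hypothetical choices made for this example.

```python
import math
import random


def expanding_window_splits(n_obs, min_train):
    """Yield (train_indices, test_index); the training set grows by one each step."""
    for t in range(min_train, n_obs):
        yield range(0, t), t


def fit_ridge_1d(x, y, alpha):
    """Closed-form ridge slope for one feature, no intercept:
    beta = sum(x*y) / (sum(x*x) + alpha)."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + alpha)


def rmse(errors):
    """Root mean squared error of a list of forecast errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Synthetic data (assumption): a hypothetical signal x with a weak
# linear link to the next-period return y.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(120)]
y = [0.02 * xi + rng.gauss(0, 0.05) for xi in x]

errors = []
for train, test in expanding_window_splits(len(x), min_train=36):
    # Fit only on observations before the test period, then forecast one step.
    beta = fit_ridge_1d([x[i] for i in train], [y[i] for i in train], alpha=1.0)
    errors.append(y[test] - beta * x[test])

oos_rmse = rmse(errors)
print(f"out-of-sample RMSE: {oos_rmse:.4f}")
```

Each forecast uses only data available before the test period, which is what makes the resulting RMSE an out-of-sample measure.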

The empirical results reveal a divergence between statistical accuracy and economic value. Statistical tests reject the hypothesis that a hybrid feature set is necessary for minimizing forecast error; the Ridge Fundamental model achieved the lowest out-of-sample RMSE (0.0668), significantly outperforming the benchmark (p < 0.01). However, in terms of economic significance, the Ridge Hybrid model proved superior, delivering the highest annualized Sharpe ratio (1.46) and significantly reducing portfolio volatility compared to the market benchmark.
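For readers unfamiliar with the metric, an annualized Sharpe ratio can be computed from periodic returns roughly as below. This sketch assumes monthly excess returns, the sample standard deviation and square-root-of-time scaling; the abstract does not state which conventions the thesis uses, and the function name `annualized_sharpe` is hypothetical.

```python
import math
import statistics


def annualized_sharpe(monthly_excess_returns, periods_per_year=12):
    """Mean over sample standard deviation, scaled by sqrt(periods per year).

    Assumes the inputs are already excess returns (risk-free rate subtracted).
    """
    mean = statistics.mean(monthly_excess_returns)
    stdev = statistics.stdev(monthly_excess_returns)
    return mean / stdev * math.sqrt(periods_per_year)


# Illustrative returns only, not data from the thesis.
example_returns = [0.01, 0.03, -0.02, 0.015, 0.005, 0.02]
print(f"annualized Sharpe: {annualized_sharpe(example_returns):.2f}")
```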

Contrasting with the "virtue of complexity" often discussed in machine learning literature, this study finds that simple linear models consistently outperform complex non-linear specifications in this asset universe. Feature importance analysis supports the Signal Persistence Hypothesis: non-linear models underperformed by prioritizing high-frequency technical signals that decay rapidly within the monthly prediction horizon, whereas regularized linear models succeeded by isolating persistent fundamental characteristics such as valuation and asset growth. These findings suggest that in efficient markets, hybrid models create value not by improving raw accuracy, but by conditioning fundamental signals on liquidity constraints to manage downside risk.
author: Gummesson, Elias LU
supervisor:
organization: Department of Economics
course: NEKH02 20252
year: 2026
type: M2 - Bachelor Degree
subject:
keywords: finance, machine learning, stock forecasting, fundamental analysis
language: English
id: 9221892
date added to LUP: 2026-02-04 08:21:57
date last changed: 2026-02-04 08:21:57
@misc{9221892,
  abstract     = {{This thesis investigates whether integrating technical market data with fundamental accounting variables improves the out-of-sample predictability of stock returns in the highly liquid S\&P 100 universe. Using an expanding window approach from 2010 to 2024, the study compares the performance of regularized linear (Ridge Regression) and non-linear (Histogram Gradient Boosting) estimators against an autoregressive benchmark.

The empirical results reveal a divergence between statistical accuracy and economic value. Statistical tests reject the hypothesis that a hybrid feature set is necessary for minimizing forecast error; the Ridge Fundamental model achieved the lowest out-of-sample RMSE (0.0668), significantly outperforming the benchmark (p<0.01). However, in terms of economic significance, the Ridge Hybrid model proved superior, delivering the highest annualized Sharpe ratio ($1.46$) and significantly reducing portfolio volatility compared to the market benchmark.

Contrasting with the "virtue of complexity" often discussed in machine learning literature, this study finds that simple linear models consistently outperform complex non-linear specifications in this asset universe. Feature importance analysis supports the Signal Persistence Hypothesis: non-linear models underperformed by prioritizing high-frequency technical signals that decay rapidly within the monthly prediction horizon, whereas regularized linear models succeeded by isolating persistent fundamental characteristics such as valuation and asset growth. These findings suggest that in efficient markets, hybrid models create value not by improving raw accuracy, but by conditioning fundamental signals on liquidity constraints to manage downside risk.}},
  author       = {{Gummesson, Elias}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Hybrid Machine Learning for Stock Return Prediction: Integrating Technical and Fundamental Information in the S\&P 100}},
  year         = {{2026}},
}