Hybrid Machine Learning for Stock Return Prediction: Integrating Technical and Fundamental Information in the S&P 100

Gummesson, Elias

Gummesson, Elias (LU) (2026) NEKH02 20252
Department of Economics

Abstract: This thesis investigates whether integrating technical market data with fundamental accounting variables improves the out-of-sample predictability of stock returns in the highly liquid S&P 100 universe. Using an expanding window approach from 2010 to 2024, the study compares the performance of regularized linear (Ridge Regression) and non-linear (Histogram Gradient Boosting) estimators against an autoregressive benchmark.
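The expanding window approach mentioned above can be illustrated with a minimal sketch. It assumes the general scheme (train on all data observed so far, test on the next block, then extend the training set); the function name, the `min_train` and `horizon` parameters, and the yearly granularity are hypothetical choices for illustration and are not taken from the thesis.

```python
from typing import Iterator, List, Tuple


def expanding_window_splits(
    periods: List[str], min_train: int, horizon: int = 1
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train, test) period lists where the training set grows each step.

    `min_train` and `horizon` are hypothetical parameters, not thesis values.
    """
    if min_train < 1 or horizon < 1:
        raise ValueError("min_train and horizon must be positive")
    end = min_train
    while end + horizon <= len(periods):
        # Train on everything observed so far, test on the next `horizon` periods.
        yield periods[:end], periods[end:end + horizon]
        # Extend the training window to absorb the block just tested.
        end += horizon


# Illustrative yearly labels covering the 2010 to 2024 sample period.
years = [str(y) for y in range(2010, 2025)]
splits = list(expanding_window_splits(years, min_train=5))
first_train, first_test = splits[0]
last_train, last_test = splits[-1]
```

Because each test block lies strictly after its training data, no future information leaks into the fitted model, which is what makes the resulting errors out-of-sample.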

The empirical results reveal a divergence between statistical accuracy and economic value. Statistical tests reject the hypothesis that a hybrid feature set is necessary for minimizing forecast error; the Ridge Fundamental model achieved the lowest out-of-sample RMSE (0.0668), significantly outperforming the benchmark (p < 0.01). However, in terms of economic significance, the Ridge Hybrid model proved superior, delivering the highest annualized Sharpe ratio (1.46) and significantly reducing portfolio volatility compared to the market benchmark.
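The two metrics contrasted above, RMSE and the annualized Sharpe ratio, can be sketched as follows. The record does not state the exact formulas used in the thesis, so this assumes the textbook definitions: RMSE as the square root of the mean squared forecast error, and a Sharpe ratio computed from monthly excess returns with the sample standard deviation and a square-root-of-12 annualization. The function names and the `periods_per_year` parameter are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between realized and forecast returns."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))


def annualized_sharpe(
    excess_returns: Sequence[float], periods_per_year: int = 12
) -> float:
    """Mean over sample standard deviation, scaled by sqrt(periods per year).

    Assumes returns are already in excess of the risk-free rate and are
    sampled monthly by default; both are assumptions, not thesis details.
    """
    if len(excess_returns) < 2:
        raise ValueError("need at least two observations")
    volatility = stdev(excess_returns)
    if volatility == 0:
        raise ValueError("volatility is zero; Sharpe ratio is undefined")
    return mean(excess_returns) / volatility * math.sqrt(periods_per_year)
```

The two measures answer different questions: RMSE scores every forecast equally, while the Sharpe ratio depends only on the returns of the portfolio built from those forecasts, so a model can rank first on one and not on the other.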

Contrasting with the "virtue of complexity" often discussed in machine learning literature, this study finds that simple linear models consistently outperform complex non-linear specifications in this asset universe. Feature importance analysis supports the Signal Persistence Hypothesis: non-linear models underperformed by prioritizing high-frequency technical signals that decay rapidly within the monthly prediction horizon, whereas regularized linear models succeeded by isolating persistent fundamental characteristics such as valuation and asset growth. These findings suggest that in efficient markets, hybrid models create value not by improving raw accuracy, but by conditioning fundamental signals on liquidity constraints to manage downside risk.

Please use this url to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/9221892

author

Gummesson, Elias (LU)

supervisor

Luca Margaritella

organization

Department of Economics

course

NEKH02 20252

year

2026

type

M2 - Bachelor Degree

subject

Business and Economics

keywords

finance, machine learning, stock forecasting, fundamental analysis

language

English

id

9221892

date added to LUP

2026-02-04 08:21:57

date last changed

2026-02-04 08:21:57

@misc{9221892,
  abstract     = {{This thesis investigates whether integrating technical market data with fundamental accounting variables improves the out-of-sample predictability of stock returns in the highly liquid S&P 100 universe. Using an expanding window approach from 2010 to 2024, the study compares the performance of regularized linear (Ridge Regression) and non-linear (Histogram Gradient Boosting) estimators against an autoregressive benchmark.

The empirical results reveal a divergence between statistical accuracy and economic value. Statistical tests reject the hypothesis that a hybrid feature set is necessary for minimizing forecast error; the Ridge Fundamental model achieved the lowest out-of-sample RMSE (0.0668), significantly outperforming the benchmark (p<0.01). However, in terms of economic significance, the Ridge Hybrid model proved superior, delivering the highest annualized Sharpe ratio ($1.46$) and significantly reducing portfolio volatility compared to the market benchmark.

Contrasting with the "virtue of complexity" often discussed in machine learning literature, this study finds that simple linear models consistently outperform complex non-linear specifications in this asset universe. Feature importance analysis supports the Signal Persistence Hypothesis: non-linear models underperformed by prioritizing high-frequency technical signals that decay rapidly within the monthly prediction horizon, whereas regularized linear models succeeded by isolating persistent fundamental characteristics such as valuation and asset growth. These findings suggest that in efficient markets, hybrid models create value not by improving raw accuracy, but by conditioning fundamental signals on liquidity constraints to manage downside risk.}},
  author       = {{Gummesson, Elias}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Hybrid Machine Learning for Stock Return Prediction: Integrating Technical and Fundamental Information in the S\&P 100}},
  year         = {{2026}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES
