Lund University Publications


Exploiting structure and uncertainty of Bellman updates in Markov decision processes

Tateo, Davide ; D'Eramo, Carlo ; Nuara, Alessandro ; Restelli, Marcello and Bonarini, Andrea (2017) 2017 IEEE Symposium Series on Computational Intelligence, SSCI 2017. In 2017 IEEE Symposium Series on Computational Intelligence, SSCI 2017 - Proceedings, 2018-January, p. 1-8.
Abstract

In many real-world problems, stochasticity is a critical issue for the learning process. Its sources are the transition model, the explorative component of the policy or, even worse, noisy observations of the reward function. For a finite number of samples, traditional Reinforcement Learning (RL) methods provide biased estimates of the action-value function, possibly leading to poor estimates that are then propagated by the application of the Bellman operator. While some approaches assume that the estimation bias is the key problem in the learning process, we show that in some cases this assumption does not necessarily hold. We propose a method that exploits the structure of the Bellman update and the uncertainty of the estimation in order to make better use of the information provided by the samples. We present theoretical considerations about this method and its relation to Q-Learning. Moreover, we test it in environments from the literature in order to demonstrate its effectiveness against other algorithms that focus on bias and sample efficiency.
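
The record does not reproduce the paper's algorithm, but the abstract positions it relative to standard Q-Learning, whose sample-based Bellman update is the usual source of the estimation bias being discussed. Purely as a point of reference, a minimal tabular Q-Learning sketch is given below; the state/action sizes, learning rate and discount factor are illustrative assumptions, not values from the paper.

import numpy as np

# Illustrative sizes and hyperparameters (assumptions, not from the paper)
n_states, n_actions = 10, 4
alpha, gamma = 0.1, 0.99
Q = np.zeros((n_states, n_actions))

def q_learning_step(s, a, r, s_next, done):
    # Sample-based Bellman optimality update: the target takes the max over
    # the current estimates of Q(s_next, .). With few, noisy samples this max
    # tends to be biased upwards, which is the estimation issue the abstract
    # refers to when discussing bias propagated by the Bellman operator.
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

The max over noisy estimates in the target is what biases the baseline update; the paper proposes instead exploiting the structure of the Bellman update together with the uncertainty of the estimates.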
Please use this url to cite or link to this publication:
author
Tateo, Davide ; D'Eramo, Carlo ; Nuara, Alessandro ; Restelli, Marcello and Bonarini, Andrea
publishing date
2017
type
Chapter in Book/Report/Conference proceeding
publication status
published
subject
host publication
2017 IEEE Symposium Series on Computational Intelligence, SSCI 2017 - Proceedings
series title
2017 IEEE Symposium Series on Computational Intelligence, SSCI 2017 - Proceedings
volume
2018-January
pages
8 pages
publisher
IEEE - Institute of Electrical and Electronics Engineers Inc.
conference name
2017 IEEE Symposium Series on Computational Intelligence, SSCI 2017
conference location
Honolulu, United States
conference dates
2017-11-27 - 2017-12-01
external identifiers
  • scopus:85046103830
ISBN
9781538627259
DOI
10.1109/SSCI.2017.8280923
language
English
LU publication?
no
id
043fca42-d1a6-4b3b-aa95-12c407c07809
date added to LUP
2025-10-16 14:43:03
date last changed
2025-10-24 03:39:45
@inproceedings{043fca42-d1a6-4b3b-aa95-12c407c07809,
  abstract     = {{In many real-world problems stochasticity is a critical issue for the learning process. The sources of stochasticity come from the transition model, the explorative component of the policy or, even worse, from noisy observations of the reward function. For a finite number of samples, traditional Reinforcement Learning (RL) methods provide biased estimates of the action-value function possibly leading to poor estimates, then propagated by the application of the Bellman operator. While some approaches assume that the estimation bias is the key problem in the learning process, we show that in some cases this assumption does not necessarily hold. We propose a method that exploits the structure of the Bellman update and the uncertainty of the estimation in order to better use the amount of information provided by the samples. We show theoretical considerations about this method and its relation w.r.t. Q-Learning. Moreover, we test it in environments available in literature in order to demonstrate its effectiveness against other algorithms that focus on bias and sample-efficiency.}},
  author       = {{Tateo, Davide and D'Eramo, Carlo and Nuara, Alessandro and Restelli, Marcello and Bonarini, Andrea}},
  booktitle    = {{2017 IEEE Symposium Series on Computational Intelligence, SSCI 2017 - Proceedings}},
  isbn         = {{9781538627259}},
  language     = {{eng}},
  month        = {{07}},
  pages        = {{1--8}},
  publisher    = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
  series       = {{2017 IEEE Symposium Series on Computational Intelligence, SSCI 2017 - Proceedings}},
  title        = {{Exploiting structure and uncertainty of Bellman updates in Markov decision processes}},
  url          = {{http://dx.doi.org/10.1109/SSCI.2017.8280923}},
  doi          = {{10.1109/SSCI.2017.8280923}},
  volume       = {{2018-January}},
  year         = {{2017}},
}