Exploiting structure and uncertainty of Bellman updates in Markov decision processes
(2017) In 2017 IEEE Symposium Series on Computational Intelligence, SSCI 2017 - Proceedings, vol. 2018-January, p. 1-8.
- abstract
In many real-world problems, stochasticity is a critical issue for the learning process. Its sources are the transition model, the explorative component of the policy or, even worse, noisy observations of the reward function. For a finite number of samples, traditional Reinforcement Learning (RL) methods provide biased estimates of the action-value function, possibly leading to poor estimates that are then propagated by the application of the Bellman operator. While some approaches assume that the estimation bias is the key problem in the learning process, we show that in some cases this assumption does not necessarily hold. We propose a method that exploits the structure of the Bellman update and the uncertainty of the estimation in order to make better use of the information provided by the samples. We present theoretical considerations about this method and its relation to Q-Learning. Moreover, we test it in environments available in the literature in order to demonstrate its effectiveness against other algorithms that focus on bias and sample efficiency.
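For context, the abstract refers to the Bellman update used by Q-Learning and the bias that noisy action-value estimates inject into it. The sketch below is not the method proposed in the paper; it shows the standard tabular Q-Learning update on a toy two-state MDP with noisy rewards, where the environment, step size and exploration parameters are illustrative assumptions. The max over noisy estimates in the bootstrap target makes E[max_a Q(s',a)] >= max_a E[Q(s',a)], which is the overestimation bias the abstract discusses.

```python
# Minimal sketch (not the paper's method): standard tabular Q-Learning on a
# toy two-state MDP.  From state 0, action 0 moves to state 1 (reward 0) and
# action 1 terminates (reward 0); every action in state 1 terminates with a
# noisy reward of mean -0.1.  All numeric choices here are assumptions made
# for illustration.
import numpy as np

rng = np.random.default_rng(0)

gamma, alpha, epsilon = 1.0, 0.1, 0.1
n_b_actions = 8                                   # many noisy-reward actions in state 1
Q = {0: np.zeros(2), 1: np.zeros(n_b_actions)}    # tabular action-value estimates

def step(state, action):
    """Toy transition/reward model; None marks the terminal state."""
    if state == 0:
        return (1, 0.0, False) if action == 0 else (None, 0.0, True)
    return None, rng.normal(-0.1, 1.0), True      # noisy reward observation

for episode in range(10000):
    s, done = 0, False
    while not done:
        qs = Q[s]
        # epsilon-greedy exploration (one of the stochasticity sources named
        # in the abstract)
        a = rng.integers(len(qs)) if rng.random() < epsilon else int(np.argmax(qs))
        s_next, r, done = step(s, a)
        # Bellman update with the max operator: the target bootstraps on the
        # maximum of noisy estimates, so E[max Q] >= max E[Q], and the
        # resulting positive bias propagates back to state 0.
        target = r + (0.0 if done else gamma * np.max(Q[s_next]))
        Q[s][a] += alpha * (target - Q[s][a])
        s = s_next

# The true value of moving to state 1 is -0.1, yet the estimate is usually
# positive because of the max over the noisy Q(1, .) estimates.
print("Q(0, move-to-state-1) estimate:", Q[0][0], " true value: -0.1")
```

Running this typically yields a clearly positive estimate for the action that leads to the noisy-reward state even though its true value is -0.1, which is the kind of bias propagation by repeated Bellman updates that the paper addresses.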
- author
- Tateo, Davide; D'Eramo, Carlo; Nuara, Alessandro; Restelli, Marcello and Bonarini, Andrea
- publishing date
- 2017-07-01
- type
- Chapter in Book/Report/Conference proceeding
- publication status
- published
- subject
- host publication
- 2017 IEEE Symposium Series on Computational Intelligence, SSCI 2017 - Proceedings
- series title
- 2017 IEEE Symposium Series on Computational Intelligence, SSCI 2017 - Proceedings
- volume
- 2018-January
- pages
- 8 pages
- publisher
- IEEE - Institute of Electrical and Electronics Engineers Inc.
- conference name
- 2017 IEEE Symposium Series on Computational Intelligence, SSCI 2017
- conference location
- Honolulu, United States
- conference dates
- 2017-11-27 - 2017-12-01
- external identifiers
- scopus:85046103830
- ISBN
- 9781538627259
- DOI
- 10.1109/SSCI.2017.8280923
- language
- English
- LU publication?
- no
- id
- 043fca42-d1a6-4b3b-aa95-12c407c07809
- date added to LUP
- 2025-10-16 14:43:03
- date last changed
- 2025-10-24 03:39:45
@inproceedings{043fca42-d1a6-4b3b-aa95-12c407c07809,
abstract = {{In many real-world problems stochasticity is a critical issue for the learning process. The sources of stochasticity come from the transition model, the explorative component of the policy or, even worse, from noisy observations of the reward function. For a finite number of samples, traditional Reinforcement Learning (RL) methods provide biased estimates of the action-value function possibly leading to poor estimates, then propagated by the application of the Bellman operator. While some approaches assume that the estimation bias is the key problem in the learning process, we show that in some cases this assumption does not necessarily hold. We propose a method that exploits the structure of the Bellman update and the uncertainty of the estimation in order to better use the amount of information provided by the samples. We show theoretical considerations about this method and its relation w.r.t. Q-Learning. Moreover, we test it in environments available in literature in order to demonstrate its effectiveness against other algorithms that focus on bias and sample-efficiency.}},
author = {{Tateo, Davide and D'Eramo, Carlo and Nuara, Alessandro and Restelli, Marcello and Bonarini, Andrea}},
booktitle = {{2017 IEEE Symposium Series on Computational Intelligence, SSCI 2017 - Proceedings}},
isbn = {{9781538627259}},
language = {{eng}},
month = {{07}},
pages = {{1--8}},
publisher = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
series = {{2017 IEEE Symposium Series on Computational Intelligence, SSCI 2017 - Proceedings}},
title = {{Exploiting structure and uncertainty of Bellman updates in Markov decision processes}},
url = {{http://dx.doi.org/10.1109/SSCI.2017.8280923}},
doi = {{10.1109/SSCI.2017.8280923}},
volume = {{2018-January}},
year = {{2017}},
}