Lund University Publications


Exploiting structure and uncertainty of Bellman updates in Markov decision processes

Tateo, Davide ; D'Eramo, Carlo ; Nuara, Alessandro ; Restelli, Marcello and Bonarini, Andrea (2017) 2017 IEEE Symposium Series on Computational Intelligence, SSCI 2017. In 2017 IEEE Symposium Series on Computational Intelligence, SSCI 2017 - Proceedings, 2018-January, p. 1-8.
Abstract

In many real-world problems, stochasticity is a critical issue for the learning process. Its sources are the transition model, the explorative component of the policy or, even worse, noisy observations of the reward function. For a finite number of samples, traditional Reinforcement Learning (RL) methods provide biased estimates of the action-value function, possibly leading to poor estimates that are then propagated by the application of the Bellman operator. While some approaches assume that the estimation bias is the key problem in the learning process, we show that in some cases this assumption does not necessarily hold. We propose a method that exploits the structure of the Bellman update and the uncertainty of the estimation in order to make better use of the information provided by the samples. We present theoretical considerations about this method and its relation to Q-Learning. Moreover, we test it in environments from the literature in order to demonstrate its effectiveness against other algorithms that focus on bias and sample efficiency.
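
The record does not reproduce the paper's algorithm, but the abstract positions it relative to standard Q-Learning, whose sample-based Bellman update is the usual source of the estimation bias being discussed. Purely as a point of reference, a minimal tabular Q-Learning sketch is given below; the state/action sizes, learning rate and discount factor are illustrative assumptions, not values from the paper.

import numpy as np

# Illustrative sizes and hyperparameters (assumptions, not from the paper)
n_states, n_actions = 10, 4
alpha, gamma = 0.1, 0.99
Q = np.zeros((n_states, n_actions))

def q_learning_step(s, a, r, s_next, done):
    # Sample-based Bellman optimality update: the target takes the max over
    # the current estimates of Q(s_next, .). With few, noisy samples this max
    # tends to be biased upwards, which is the estimation issue the abstract
    # refers to when discussing bias propagated by the Bellman operator.
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

The max over noisy estimates in the target is what biases the baseline update; the paper proposes instead exploiting the structure of the Bellman update together with the uncertainty of the estimates.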
Please use this url to cite or link to this publication:
author
Tateo, Davide ; D'Eramo, Carlo ; Nuara, Alessandro ; Restelli, Marcello and Bonarini, Andrea
publishing date
2017
type
Chapter in Book/Report/Conference proceeding
publication status
published
subject
host publication
2017 IEEE Symposium Series on Computational Intelligence, SSCI 2017 - Proceedings
series title
2017 IEEE Symposium Series on Computational Intelligence, SSCI 2017 - Proceedings
volume
2018-January
pages
8 pages
publisher
IEEE - Institute of Electrical and Electronics Engineers Inc.
conference name
2017 IEEE Symposium Series on Computational Intelligence, SSCI 2017
conference location
Honolulu, United States
conference dates
2017-11-27 - 2017-12-01
external identifiers
  • scopus:85046103830
ISBN
9781538627259
DOI
10.1109/SSCI.2017.8280923
language
English
LU publication?
no
id
043fca42-d1a6-4b3b-aa95-12c407c07809
date added to LUP
2025-10-16 14:43:03
date last changed
2025-10-24 03:39:45
@inproceedings{043fca42-d1a6-4b3b-aa95-12c407c07809,
  abstract     = {{In many real-world problems stochasticity is a critical issue for the learning process. The sources of stochasticity come from the transition model, the explorative component of the policy or, even worse, from noisy observations of the reward function. For a finite number of samples, traditional Reinforcement Learning (RL) methods provide biased estimates of the action-value function possibly leading to poor estimates, then propagated by the application of the Bellman operator. While some approaches assume that the estimation bias is the key problem in the learning process, we show that in some cases this assumption does not necessarily hold. We propose a method that exploits the structure of the Bellman update and the uncertainty of the estimation in order to better use the amount of information provided by the samples. We show theoretical considerations about this method and its relation w.r.t. Q-Learning. Moreover, we test it in environments available in literature in order to demonstrate its effectiveness against other algorithms that focus on bias and sample-efficiency.}},
  author       = {{Tateo, Davide and D'Eramo, Carlo and Nuara, Alessandro and Restelli, Marcello and Bonarini, Andrea}},
  booktitle    = {{2017 IEEE Symposium Series on Computational Intelligence, SSCI 2017 - Proceedings}},
  isbn         = {{9781538627259}},
  language     = {{eng}},
  month        = {{07}},
  pages        = {{1--8}},
  publisher    = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
  series       = {{2017 IEEE Symposium Series on Computational Intelligence, SSCI 2017 - Proceedings}},
  title        = {{Exploiting structure and uncertainty of Bellman updates in Markov decision processes}},
  url          = {{http://dx.doi.org/10.1109/SSCI.2017.8280923}},
  doi          = {{10.1109/SSCI.2017.8280923}},
  volume       = {{2018-January}},
  year         = {{2017}},
}