
Lund University Publications


An empirical analysis of Measure-Valued Derivatives for policy gradients

Carvalho, Joao; Tateo, Davide; Muratore, Fabio and Peters, Jan (2021) 2021 International Joint Conference on Neural Networks, IJCNN 2021. In Proceedings of the International Joint Conference on Neural Networks, 2021-July.
Abstract

Reinforcement learning methods for robotics are increasingly successful due to the constant development of better policy gradient techniques. A precise (low variance) and accurate (low bias) gradient estimator is crucial for tackling increasingly complex tasks. Traditional policy gradient algorithms use the likelihood-ratio trick, which is known to produce unbiased but high-variance estimates. More modern approaches exploit the reparametrization trick, which gives lower-variance gradient estimates but requires differentiable value function approximators. In this work, we study a different type of stochastic gradient estimator: the Measure-Valued Derivative. This estimator is unbiased, has low variance, and can be used with differentiable and non-differentiable function approximators. We empirically evaluate this estimator in the actor-critic policy gradient setting and show that it can reach performance comparable to methods based on the likelihood-ratio or reparametrization tricks, in both low- and high-dimensional action spaces.
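To make the comparison concrete: the Measure-Valued Derivative rewrites the gradient of an expectation as a weighted difference of expectations under two simpler "positive" and "negative" distributions, so the objective is only ever evaluated, never differentiated. The following is a minimal sketch (not code from the paper) assuming the known Gaussian weak-derivative triple for the mean, applied to a toy objective f(x) = x^2 whose true gradient with respect to the mean is 2*mu; it contrasts the MVD estimate with the likelihood-ratio estimator mentioned in the abstract.

import numpy as np

def f(x):
    # Toy objective; E[x^2] under N(mu, sigma^2) is mu^2 + sigma^2, so the
    # true gradient w.r.t. mu is 2*mu.
    return x ** 2

def mvd_grad_mu(mu, sigma, n_samples=100_000, seed=0):
    # Gaussian weak-derivative triple w.r.t. the mean:
    # constant c = 1/(sigma*sqrt(2*pi)), positive part mu + sigma*W,
    # negative part mu - sigma*W, with W ~ Weibull(shape=2, scale=sqrt(2)).
    rng = np.random.default_rng(seed)
    c = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    w = np.sqrt(2.0) * rng.weibull(2.0, size=n_samples)  # numpy's weibull has unit scale
    # Coupling positive and negative samples through the same draws of w
    # keeps the variance of the difference low.
    return c * (f(mu + sigma * w) - f(mu - sigma * w)).mean()

def lr_grad_mu(mu, sigma, n_samples=100_000, seed=0):
    # Likelihood-ratio (score function) estimator: E[f(x) * (x - mu) / sigma^2].
    rng = np.random.default_rng(seed)
    x = rng.normal(mu, sigma, size=n_samples)
    return (f(x) * (x - mu) / sigma ** 2).mean()

mu, sigma = 1.5, 0.5
print("true gradient:", 2.0 * mu)
print("MVD estimate :", mvd_grad_mu(mu, sigma))
print("LR  estimate :", lr_grad_mu(mu, sigma))

Note that f appears only inside evaluations, never inside a derivative, which is why the abstract can claim compatibility with non-differentiable function approximators.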

author
Carvalho, Joao; Tateo, Davide; Muratore, Fabio and Peters, Jan
publishing date
2021
type
Chapter in Book/Report/Conference proceeding
publication status
published
subject
host publication
IJCNN 2021 - International Joint Conference on Neural Networks, Proceedings
series title
Proceedings of the International Joint Conference on Neural Networks
volume
2021-July
publisher
IEEE - Institute of Electrical and Electronics Engineers Inc.
conference name
2021 International Joint Conference on Neural Networks, IJCNN 2021
conference location
Virtual, Shenzhen, China
conference dates
2021-07-18 - 2021-07-22
external identifiers
  • scopus:85116438311
ISBN
9780738133669
DOI
10.1109/IJCNN52387.2021.9533642
language
English
LU publication?
no
id
3e3de528-c141-4380-8908-a9f5a7503d9d
date added to LUP
2025-10-16 14:36:39
date last changed
2025-10-23 03:43:29
@inproceedings{3e3de528-c141-4380-8908-a9f5a7503d9d,
  abstract     = {{Reinforcement learning methods for robotics are increasingly successful due to the constant development of better policy gradient techniques. A precise (low variance) and accurate (low bias) gradient estimator is crucial for tackling increasingly complex tasks. Traditional policy gradient algorithms use the likelihood-ratio trick, which is known to produce unbiased but high-variance estimates. More modern approaches exploit the reparametrization trick, which gives lower-variance gradient estimates but requires differentiable value function approximators. In this work, we study a different type of stochastic gradient estimator: the Measure-Valued Derivative. This estimator is unbiased, has low variance, and can be used with differentiable and non-differentiable function approximators. We empirically evaluate this estimator in the actor-critic policy gradient setting and show that it can reach performance comparable to methods based on the likelihood-ratio or reparametrization tricks, in both low- and high-dimensional action spaces.}},
  author       = {{Carvalho, Joao and Tateo, Davide and Muratore, Fabio and Peters, Jan}},
  booktitle    = {{IJCNN 2021 - International Joint Conference on Neural Networks, Proceedings}},
  isbn         = {{9780738133669}},
  language     = {{eng}},
  month        = {{07}},
  publisher    = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
  series       = {{Proceedings of the International Joint Conference on Neural Networks}},
  title        = {{An empirical analysis of Measure-Valued Derivatives for policy gradients}},
  url          = {{http://dx.doi.org/10.1109/IJCNN52387.2021.9533642}},
  doi          = {{10.1109/IJCNN52387.2021.9533642}},
  volume       = {{2021-July}},
  year         = {{2021}},
}