A new Q-learning algorithm based on the Metropolis criterion
(2004) In IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 34(5), p. 2140-2143
- Abstract
- The balance between exploration and exploitation is one of the key problems of action selection in Q-learning. Pure exploitation causes the agent to reach locally optimal policies quickly, whereas excessive exploration degrades the performance of the Q-learning algorithm even though it may accelerate the learning process and help avoid locally optimal policies. In this paper, finding the optimum policy in Q-learning is described as a search for the optimum solution in combinatorial optimization. The Metropolis criterion of the simulated annealing algorithm is introduced in order to balance exploration and exploitation in Q-learning, and the modified Q-learning algorithm based on this criterion, SA-Q-learning, is presented. Experiments show that SA-Q-learning converges more quickly than Q-learning or Boltzmann exploration, and that the search does not suffer from performance degradation due to excessive exploration.
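As an illustration of the action-selection idea described in the abstract, below is a minimal Python sketch of Metropolis-criterion action selection combined with a standard tabular Q-learning update. The environment interface (reset/step), the hyperparameter values, and the geometric cooling schedule are assumptions made for this example, not details taken from the paper.

```python
import math
import random

def metropolis_action(q_values, temperature):
    """Select an action using the Metropolis criterion (SA-Q-learning style).

    q_values: list of Q(s, a) for every action in the current state s.
    temperature: current annealing temperature T (> 0).
    """
    greedy = max(range(len(q_values)), key=lambda a: q_values[a])  # exploitation candidate
    candidate = random.randrange(len(q_values))                    # exploration candidate
    delta = q_values[candidate] - q_values[greedy]
    # Accept the random action if it is no worse, or with probability exp(delta / T).
    if delta >= 0 or random.random() < math.exp(delta / temperature):
        return candidate
    return greedy

def sa_q_learning(env, n_states, n_actions, episodes=500,
                  alpha=0.1, gamma=0.9, t0=10.0, cooling=0.99):
    """Tabular one-step Q-learning with Metropolis-criterion action selection.

    `env` is assumed to expose reset() -> state and step(a) -> (state, reward, done);
    alpha, gamma, t0 and cooling are illustrative defaults, not values from the paper.
    """
    q = [[0.0] * n_actions for _ in range(n_states)]
    temperature = t0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = metropolis_action(q[state], temperature)
            next_state, reward, done = env.step(action)
            # Standard Q-learning update toward the one-step bootstrap target.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
        # Anneal the temperature so behavior shifts from exploration to exploitation.
        temperature = max(temperature * cooling, 1e-3)
    return q
```

With a high temperature the random candidate is accepted almost regardless of its Q-value (exploration); as the temperature cools, worse candidates are accepted with vanishing probability and the policy approaches pure greedy exploitation.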
Please use this url to cite or link to this publication:
https://lup.lub.lu.se/record/267109
- author
- Guo, MZ; Liu, Y and Malec, Jacek (LU)
- organization
- publishing date
- 2004
- type
- Contribution to journal
- publication status
- published
- subject
- keywords
- reinforcement learning, Q-learning, metropolis criterion, exploitation, exploration
- in
- IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
- volume
- 34
- issue
- 5
- pages
- 2140 - 2143
- publisher
- IEEE - Institute of Electrical and Electronics Engineers Inc.
- external identifiers
- pmid:15503510
- wos:000223937400020
- scopus:4844223639
- ISSN
- 1083-4419
- DOI
- 10.1109/TSMCB.2004.832154
- language
- English
- LU publication?
- yes
- id
- d3d2883b-8aff-4013-91af-a9c8fb579fd0 (old id 267109)
- date added to LUP
- 2016-04-01 16:58:51
- date last changed
- 2025-10-14 10:20:47
@article{d3d2883b-8aff-4013-91af-a9c8fb579fd0,
abstract = {{The balance between exploration and exploitation is one of the key problems of action selection in Q-learning. Pure exploitation causes the agent to reach locally optimal policies quickly, whereas excessive exploration degrades the performance of the Q-learning algorithm even though it may accelerate the learning process and help avoid locally optimal policies. In this paper, finding the optimum policy in Q-learning is described as a search for the optimum solution in combinatorial optimization. The Metropolis criterion of the simulated annealing algorithm is introduced in order to balance exploration and exploitation in Q-learning, and the modified Q-learning algorithm based on this criterion, SA-Q-learning, is presented. Experiments show that SA-Q-learning converges more quickly than Q-learning or Boltzmann exploration, and that the search does not suffer from performance degradation due to excessive exploration.}},
author = {{Guo, MZ and Liu, Y and Malec, Jacek}},
issn = {{1083-4419}},
keywords = {{reinforcement learning; Q-learning; metropolis criterion; exploitation; exploration}},
language = {{eng}},
number = {{5}},
pages = {{2140--2143}},
publisher = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
series = {{IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics}},
title = {{A new Q-learning algorithm based on the Metropolis criterion}},
url = {{http://dx.doi.org/10.1109/TSMCB.2004.832154}},
doi = {{10.1109/TSMCB.2004.832154}},
volume = {{34}},
year = {{2004}},
}