
Lund University Publications


A new Q-learning algorithm based on the Metropolis criterion

Guo, MZ ; Liu, Y and Malec, Jacek LU (2004) In IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 34(5). p. 2140-2143
Abstract
The balance between exploration and exploitation is one of the key problems of action selection in Q-learning. Pure exploitation causes the agent to reach locally optimal policies quickly, whereas excessive exploration degrades the performance of the Q-learning algorithm even though it may accelerate the learning process and help avoid locally optimal policies. In this paper, finding the optimal policy in Q-learning is described as a search for the optimal solution in combinatorial optimization. The Metropolis criterion of the simulated annealing algorithm is introduced to balance exploration and exploitation in Q-learning, and a modified Q-learning algorithm based on this criterion, SA-Q-learning, is presented. Experiments show that SA-Q-learning converges more quickly than Q-learning or Boltzmann exploration, and that the search does not suffer from performance degradation due to excessive exploration.
Please use this URL to cite or link to this publication:
author
Guo, MZ ; Liu, Y and Malec, Jacek
organization
publishing date
2004
type
Contribution to journal
publication status
published
subject
keywords
reinforcement learning, Q-learning, Metropolis criterion, exploitation, exploration
in
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
volume
34
issue
5
pages
2140 - 2143
publisher
IEEE - Institute of Electrical and Electronics Engineers Inc.
external identifiers
  • pmid:15503510
  • wos:000223937400020
  • scopus:4844223639
ISSN
1083-4419
DOI
10.1109/TSMCB.2004.832154
language
English
LU publication?
yes
id
d3d2883b-8aff-4013-91af-a9c8fb579fd0 (old id 267109)
date added to LUP
2016-04-01 16:58:51
date last changed
2022-04-15 08:15:20
@article{d3d2883b-8aff-4013-91af-a9c8fb579fd0,
  abstract     = {{The balance between exploration and exploitation is one of the key problems of action selection in Q-learning. Pure exploitation causes the agent to reach locally optimal policies quickly, whereas excessive exploration degrades the performance of the Q-learning algorithm even though it may accelerate the learning process and help avoid locally optimal policies. In this paper, finding the optimal policy in Q-learning is described as a search for the optimal solution in combinatorial optimization. The Metropolis criterion of the simulated annealing algorithm is introduced to balance exploration and exploitation in Q-learning, and a modified Q-learning algorithm based on this criterion, SA-Q-learning, is presented. Experiments show that SA-Q-learning converges more quickly than Q-learning or Boltzmann exploration, and that the search does not suffer from performance degradation due to excessive exploration.}},
  author       = {{Guo, MZ and Liu, Y and Malec, Jacek}},
  issn         = {{1083-4419}},
  keywords     = {{reinforcement learning; Q-learning; Metropolis criterion; exploitation; exploration}},
  language     = {{eng}},
  number       = {{5}},
  pages        = {{2140--2143}},
  publisher    = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
  series       = {{IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics}},
  title        = {{A new Q-learning algorithm based on the Metropolis criterion}},
  url          = {{http://dx.doi.org/10.1109/TSMCB.2004.832154}},
  doi          = {{10.1109/TSMCB.2004.832154}},
  volume       = {{34}},
  year         = {{2004}},
}