
A new Q-learning algorithm based on the Metropolis criterion

Guo, MZ; Liu, Y and Malec, Jacek LU (2004) In IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics 34(5). p.2140-2143
Abstract
The balance between exploration and exploitation is one of the key problems of action selection in Q-learning. Pure exploitation causes the agent to reach locally optimal policies quickly, whereas excessive exploration degrades the performance of the Q-learning algorithm even though it may accelerate the learning process and help avoid locally optimal policies. In this paper, finding the optimum policy in Q-learning is described as a search for the optimum solution in combinatorial optimization. The Metropolis criterion of the simulated annealing algorithm is introduced in order to balance exploration and exploitation in Q-learning, and the modified Q-learning algorithm based on this criterion, SA-Q-learning, is presented. Experiments show that SA-Q-learning converges more quickly than Q-learning or Boltzmann exploration, and that the search does not suffer from performance degradation due to excessive exploration.
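
As a minimal, hedged illustration of the idea (not code from the paper), the Python sketch below combines Metropolis-criterion action selection with a standard tabular Q-learning update. The environment interface (env.reset() returning a state, env.step(action) returning (next_state, reward, done)), the geometric cooling schedule, and all parameter values are assumptions made for the example.

import math
import random
from collections import defaultdict

def metropolis_action(Q, state, actions, temperature):
    # Metropolis criterion: start from the greedy action, propose a random
    # candidate, and accept a worse candidate with probability exp(dQ / T).
    greedy = max(actions, key=lambda a: Q[(state, a)])
    candidate = random.choice(actions)
    delta = Q[(state, candidate)] - Q[(state, greedy)]
    if delta >= 0 or random.random() < math.exp(delta / max(temperature, 1e-8)):
        return candidate  # exploratory (or equally good) action accepted
    return greedy         # otherwise exploit

def sa_q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.95,
                  t_start=1.0, cooling=0.99):
    # Hypothetical environment interface: env.reset() -> state,
    # env.step(action) -> (next_state, reward, done).
    Q = defaultdict(float)
    temperature = t_start
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = metropolis_action(Q, state, actions, temperature)
            next_state, reward, done = env.step(action)
            # Standard one-step Q-learning update.
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next
                                           - Q[(state, action)])
            state = next_state
        temperature *= cooling  # anneal: exploration decreases over episodes
    return Q

At a high temperature the criterion accepts almost any proposed action (exploration); as the temperature is annealed it degenerates to greedy selection (exploitation), which is the balance the abstract describes.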
author
Guo, MZ; Liu, Y and Malec, Jacek (LU)
organization
publishing date
2004
type
Contribution to journal
publication status
published
subject
keywords
reinforcement learning, Q-learning, Metropolis criterion, exploitation, exploration
in
IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics
volume
34
issue
5
pages
2140 - 2143
publisher
IEEE--Institute of Electrical and Electronics Engineers Inc.
external identifiers
  • pmid:15503510
  • wos:000223937400020
  • scopus:4844223639
ISSN
1083-4419
DOI
10.1109/TSMCB.2004.832154
language
English
LU publication?
yes
id
d3d2883b-8aff-4013-91af-a9c8fb579fd0 (old id 267109)
date added to LUP
2007-08-02 14:20:00
date last changed
2017-12-10 04:38:09
@article{d3d2883b-8aff-4013-91af-a9c8fb579fd0,
  abstract     = {The balance between exploration and exploitation is one of the key problems of action selection in Q-learning. Pure exploitation causes the agent to reach locally optimal policies quickly, whereas excessive exploration degrades the performance of the Q-learning algorithm even though it may accelerate the learning process and help avoid locally optimal policies. In this paper, finding the optimum policy in Q-learning is described as a search for the optimum solution in combinatorial optimization. The Metropolis criterion of the simulated annealing algorithm is introduced in order to balance exploration and exploitation in Q-learning, and the modified Q-learning algorithm based on this criterion, SA-Q-learning, is presented. Experiments show that SA-Q-learning converges more quickly than Q-learning or Boltzmann exploration, and that the search does not suffer from performance degradation due to excessive exploration.},
  author       = {Guo, MZ and Liu, Y and Malec, Jacek},
  issn         = {1083-4419},
  keyword      = {reinforcement learning, Q-learning, Metropolis criterion, exploitation, exploration},
  language     = {eng},
  number       = {5},
  pages        = {2140--2143},
  publisher    = {IEEE--Institute of Electrical and Electronics Engineers Inc.},
  series       = {IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics},
  title        = {A new Q-learning algorithm based on the Metropolis criterion},
  url          = {http://dx.doi.org/10.1109/TSMCB.2004.832154},
  volume       = {34},
  year         = {2004},
}