A new Q-learning algorithm based on the Metropolis criterion
(2004) In IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 34(5). p. 2140-2143
- Abstract
- The balance between exploration and exploitation is one of the key problems of action selection in Q-learning. Pure exploitation causes the agent to reach locally optimal policies quickly, whereas excessive exploration degrades the performance of the Q-learning algorithm even though it may accelerate the learning process and allow avoiding locally optimal policies. In this paper, finding the optimum policy in Q-learning is described as a search for the optimum solution in combinatorial optimization. The Metropolis criterion of the simulated annealing algorithm is introduced in order to balance exploration and exploitation in Q-learning, and the modified Q-learning algorithm based on this criterion, SA-Q-learning, is presented. Experiments show that SA-Q-learning converges more quickly than Q-learning or Boltzmann exploration, and that the search does not suffer from performance degradation due to excessive exploration.
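The Metropolis-based action selection described in the abstract can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's implementation: the function name, the candidate-sampling scheme, and the geometric cooling schedule are assumptions.

```python
import math
import random

def metropolis_action(q_values, temperature):
    """Pick an action via the Metropolis criterion (SA-Q-learning sketch).

    q_values:    list of Q(s, a) for each action in the current state.
    temperature: annealing temperature T; high T favours exploration,
                 low T favours exploitation of the greedy action.
    """
    greedy = max(range(len(q_values)), key=lambda a: q_values[a])
    candidate = random.randrange(len(q_values))
    # Metropolis criterion: accept a candidate at least as good as the
    # greedy action outright; otherwise accept it with probability
    # exp((Q(s, candidate) - Q(s, greedy)) / T).
    delta = q_values[candidate] - q_values[greedy]
    if delta >= 0 or random.random() < math.exp(delta / temperature):
        return candidate
    return greedy

# Usage sketch: anneal T across episodes so the agent explores early
# and exploits late (cooling factor is an assumed hyperparameter).
T, cooling = 1.0, 0.99
q = [0.2, 0.5, 0.1]
for episode in range(100):
    a = metropolis_action(q, T)
    # ... take action a, observe reward, apply the usual Q-learning update ...
    T *= cooling
```

As T approaches zero the acceptance probability for worse actions vanishes and the rule degenerates to pure greedy exploitation, which is the annealing behaviour the abstract credits for balancing the two regimes.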
Please use this url to cite or link to this publication:
https://lup.lub.lu.se/record/267109
- author
- Guo, MZ ; Liu, Y and Malec, Jacek LU
- organization
- publishing date
- 2004
- type
- Contribution to journal
- publication status
- published
- subject
- keywords
- reinforcement learning, Q-learning, metropolis criterion, exploitation, exploration
- in
- IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
- volume
- 34
- issue
- 5
- pages
- 2140 - 2143
- publisher
- IEEE - Institute of Electrical and Electronics Engineers Inc.
- external identifiers
- pmid:15503510
- wos:000223937400020
- scopus:4844223639
- ISSN
- 1083-4419
- DOI
- 10.1109/TSMCB.2004.832154
- language
- English
- LU publication?
- yes
- id
- d3d2883b-8aff-4013-91af-a9c8fb579fd0 (old id 267109)
- date added to LUP
- 2016-04-01 16:58:51
- date last changed
- 2022-04-15 08:15:20
@article{d3d2883b-8aff-4013-91af-a9c8fb579fd0,
  abstract  = {{The balance between exploration and exploitation is one of the key problems of action selection in Q-learning. Pure exploitation causes the agent to reach locally optimal policies quickly, whereas excessive exploration degrades the performance of the Q-learning algorithm even though it may accelerate the learning process and allow avoiding locally optimal policies. In this paper, finding the optimum policy in Q-learning is described as a search for the optimum solution in combinatorial optimization. The Metropolis criterion of the simulated annealing algorithm is introduced in order to balance exploration and exploitation in Q-learning, and the modified Q-learning algorithm based on this criterion, SA-Q-learning, is presented. Experiments show that SA-Q-learning converges more quickly than Q-learning or Boltzmann exploration, and that the search does not suffer from performance degradation due to excessive exploration.}},
  author    = {{Guo, MZ and Liu, Y and Malec, Jacek}},
  issn      = {{1083-4419}},
  keywords  = {{reinforcement learning; Q-learning; metropolis criterion; exploitation; exploration}},
  language  = {{eng}},
  number    = {{5}},
  pages     = {{2140--2143}},
  publisher = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
  series    = {{IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics}},
  title     = {{A new Q-learning algorithm based on the Metropolis criterion}},
  url       = {{http://dx.doi.org/10.1109/TSMCB.2004.832154}},
  doi       = {{10.1109/TSMCB.2004.832154}},
  volume    = {{34}},
  year      = {{2004}},
}