A new Q-learning algorithm based on the Metropolis criterion
(2004) In IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 34(5). p. 2140-2143
- Abstract
- The balance between exploration and exploitation is one of the key problems of action selection in Q-learning. Pure exploitation causes the agent to reach locally optimal policies quickly, whereas excessive exploration degrades the performance of the Q-learning algorithm even though it may accelerate the learning process and allow avoiding locally optimal policies. In this paper, finding the optimum policy in Q-learning is described as a search for the optimum solution in combinatorial optimization. The Metropolis criterion of the simulated annealing algorithm is introduced in order to balance exploration and exploitation in Q-learning, and the modified Q-learning algorithm based on this criterion, SA-Q-learning, is presented. Experiments show that SA-Q-learning converges more quickly than Q-learning or Boltzmann exploration, and that the search does not suffer from performance degradation due to excessive exploration.
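The Metropolis-based action selection described in the abstract can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's implementation: the function name, the candidate-sampling scheme, and the geometric cooling schedule are assumptions.

```python
import math
import random

def metropolis_action(q_values, temperature):
    """Pick an action via the Metropolis criterion (SA-Q-learning sketch).

    q_values:    list of Q(s, a) for each action in the current state.
    temperature: annealing temperature T; high T favours exploration,
                 low T favours exploitation of the greedy action.
    """
    greedy = max(range(len(q_values)), key=lambda a: q_values[a])
    candidate = random.randrange(len(q_values))
    # Metropolis criterion: accept a candidate at least as good as the
    # greedy action outright; otherwise accept it with probability
    # exp((Q(s, candidate) - Q(s, greedy)) / T).
    delta = q_values[candidate] - q_values[greedy]
    if delta >= 0 or random.random() < math.exp(delta / temperature):
        return candidate
    return greedy

# Usage sketch: anneal T across episodes so the agent explores early
# and exploits late (cooling factor is an assumed hyperparameter).
T, cooling = 1.0, 0.99
q = [0.2, 0.5, 0.1]
for episode in range(100):
    a = metropolis_action(q, T)
    # ... take action a, observe reward, apply the usual Q-learning update ...
    T *= cooling
```

As T approaches zero the acceptance probability for worse actions vanishes and the rule degenerates to pure greedy exploitation, which is the annealing behaviour the abstract credits for balancing the two regimes.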
Please use this url to cite or link to this publication:
https://lup.lub.lu.se/record/267109
- author
- Guo, MZ ; Liu, Y and Malec, Jacek LU
- organization
- publishing date
- 2004
- type
- Contribution to journal
- publication status
- published
- subject
- keywords
- reinforcement learning, Q-learning, metropolis criterion, exploitation, exploration
- in
- IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
- volume
- 34
- issue
- 5
- pages
- 2140 - 2143
- publisher
- IEEE - Institute of Electrical and Electronics Engineers Inc.
- external identifiers
- pmid:15503510
- wos:000223937400020
- scopus:4844223639
- ISSN
- 1083-4419
- DOI
- 10.1109/TSMCB.2004.832154
- language
- English
- LU publication?
- yes
- id
- d3d2883b-8aff-4013-91af-a9c8fb579fd0 (old id 267109)
- date added to LUP
- 2016-04-01 16:58:51
- date last changed
- 2022-04-15 08:15:20
@article{d3d2883b-8aff-4013-91af-a9c8fb579fd0,
  abstract  = {{The balance between exploration and exploitation is one of the key problems of action selection in Q-learning. Pure exploitation causes the agent to reach locally optimal policies quickly, whereas excessive exploration degrades the performance of the Q-learning algorithm even though it may accelerate the learning process and allow avoiding locally optimal policies. In this paper, finding the optimum policy in Q-learning is described as a search for the optimum solution in combinatorial optimization. The Metropolis criterion of the simulated annealing algorithm is introduced in order to balance exploration and exploitation in Q-learning, and the modified Q-learning algorithm based on this criterion, SA-Q-learning, is presented. Experiments show that SA-Q-learning converges more quickly than Q-learning or Boltzmann exploration, and that the search does not suffer from performance degradation due to excessive exploration.}},
  author    = {{Guo, MZ and Liu, Y and Malec, Jacek}},
  issn      = {{1083-4419}},
  keywords  = {{reinforcement learning; Q-learning; metropolis criterion; exploitation; exploration}},
  language  = {{eng}},
  number    = {{5}},
  pages     = {{2140--2143}},
  publisher = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
  series    = {{IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics}},
  title     = {{A new Q-learning algorithm based on the Metropolis criterion}},
  url       = {{http://dx.doi.org/10.1109/TSMCB.2004.832154}},
  doi       = {{10.1109/TSMCB.2004.832154}},
  volume    = {{34}},
  year      = {{2004}},
}