Nonconvergence to saddle boundary points under perturbed reinforcement learning

Chasparis, Georgios C.; Shamma, Jeff S.; Rantzer, Anders

(2015) In International Journal of Game Theory 44(3), p. 667-699

Abstract: For several reinforcement learning models in strategic-form games, convergence to action profiles that are not Nash equilibria may occur with positive probability under certain conditions on the payoff function. In this paper, we explore how an alternative reinforcement learning model, where the strategy of each agent is perturbed by a strategy-dependent perturbation (or mutations) function, may exclude convergence to non-Nash pure strategy profiles. This approach extends prior analysis on reinforcement learning in games that addresses the issue of convergence to saddle boundary points. It further provides a framework under which the effect of mutations can be analyzed in the context of reinforcement learning.

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/7767746

author: Chasparis, Georgios C.; Shamma, Jeff S. and Rantzer, Anders (LU)
organization:
publishing date: 2015
type: Contribution to journal
publication status: published
subject: Control Engineering
keywords: Learning in games, Reinforcement learning, Replicator dynamics
in: International Journal of Game Theory
volume: 44
issue: 3
pages: 667 - 699
publisher: Springer
external identifiers: wos:000358664800007, scopus:84938261308
ISSN: 1432-1270
DOI: 10.1007/s00182-014-0449-3
language: English
LU publication?: yes
id: 2cfd7fa5-07d9-40b0-bc26-e362cce96d14 (old id 7767746)
date added to LUP: 2016-04-01 10:43:11
date last changed: 2025-10-14 11:55:15

@article{2cfd7fa5-07d9-40b0-bc26-e362cce96d14,
  abstract     = {{For several reinforcement learning models in strategic-form games, convergence to action profiles that are not Nash equilibria may occur with positive probability under certain conditions on the payoff function. In this paper, we explore how an alternative reinforcement learning model, where the strategy of each agent is perturbed by a strategy-dependent perturbation (or mutations) function, may exclude convergence to non-Nash pure strategy profiles. This approach extends prior analysis on reinforcement learning in games that addresses the issue of convergence to saddle boundary points. It further provides a framework under which the effect of mutations can be analyzed in the context of reinforcement learning.}},
  author       = {{Chasparis, Georgios C. and Shamma, Jeff S. and Rantzer, Anders}},
  issn         = {{1432-1270}},
  keywords     = {{Learning in games; Reinforcement learning; Replicator dynamics}},
  language     = {{eng}},
  number       = {{3}},
  pages        = {{667--699}},
  publisher    = {{Springer}},
  journal      = {{International Journal of Game Theory}},
  title        = {{Nonconvergence to saddle boundary points under perturbed reinforcement learning}},
  url          = {{http://dx.doi.org/10.1007/s00182-014-0449-3}},
  doi          = {{10.1007/s00182-014-0449-3}},
  volume       = {{44}},
  year         = {{2015}},
}
