Lund University Publications

Distributed Dynamic Reinforcement of Efficient Outcomes in Multiagent Coordination and Network Formation

Chasparis, Georgios (LU) and Shamma, Jeff S. (2012). In Dynamic Games and Applications 2(1), pp. 18–50.
Abstract
We analyze reinforcement learning under so-called "dynamic reinforcement." In reinforcement learning, each agent repeatedly interacts with an unknown environment (i.e., other agents), receives a reward, and updates the probabilities of its next action based on its own previous actions and received rewards. Unlike standard reinforcement learning, dynamic reinforcement uses a combination of long-term rewards and recent rewards to construct myopically forward looking action selection probabilities. We analyze the long-term stability of the learning dynamics for general games with pure strategy Nash equilibria and specialize the results for coordination games and distributed network formation. In this class of problems, more than one stable equilibrium (i.e., coordination configuration) may exist. We demonstrate equilibrium selection under dynamic reinforcement. In particular, we show how a single agent is able to destabilize an equilibrium in favor of another by appropriately adjusting its dynamic reinforcement parameters. We contrast the conclusions with prior game theoretic results according to which the risk-dominant equilibrium is the only robust equilibrium when agents' decisions are subject to small randomized perturbations. The analysis throughout is based on the ODE method for stochastic approximations, where a special form of perturbation in the learning dynamics allows for analyzing its behavior at the boundary points of the state space.
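The abstract describes the learning rule only at a high level: each agent samples an action from its current action-selection probabilities, observes a reward, and reinforces the chosen action using a blend of long-term and recent rewards. The sketch below illustrates that idea in a two-agent coordination game; it is not the paper's actual update rule. The payoff matrix, the blend weight `lam`, the step size `step`, and the normalization step are all illustrative assumptions, since the record gives no equations.

```python
import numpy as np

# Minimal sketch of reinforcement learning in a 2x2 coordination game.
# The exact "dynamic reinforcement" update of Chasparis & Shamma is not
# given on this page; blending a long-term reward estimate with the most
# recent reward (weight `lam`) is an illustrative assumption.

payoff = np.array([[1.0, 0.0],   # coordinating on action 0 pays 1,
                   [0.0, 2.0]])  # coordinating on action 1 pays 2

rng = np.random.default_rng(0)
n_agents, n_actions, T = 2, 2, 5000
step = 0.01   # learning step size (assumed)
lam = 0.5     # weight on recent vs. long-term reward (assumed)

avg_reward = np.zeros((n_agents, n_actions))  # long-term reward estimates
x = np.full((n_agents, n_actions), 0.5)       # action-selection probabilities

for t in range(1, T + 1):
    # each agent samples an action from its current mixed strategy
    a = [rng.choice(n_actions, p=x[i]) for i in range(n_agents)]
    rewards = [payoff[a[0], a[1]], payoff[a[1], a[0]]]  # symmetric game

    for i in range(n_agents):
        # decreasing-step estimate of the long-term reward of the chosen action
        avg_reward[i, a[i]] += (rewards[i] - avg_reward[i, a[i]]) / t
        # "dynamic" reinforcement: blend long-term and recent reward
        reinforcement = (1 - lam) * avg_reward[i, a[i]] + lam * rewards[i]
        # reinforce the chosen action, then renormalize the probability vector
        x[i, a[i]] += step * reinforcement * (1 - x[i, a[i]])
        x[i] /= x[i].sum()

print("final action probabilities:\n", x)
```

With both agents starting from uniform probabilities, the run typically locks onto one of the two pure coordination outcomes; which one is selected depends on the early random draws and on the reinforcement parameters, which is the kind of equilibrium-selection question the paper studies.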
author: Chasparis, Georgios (LU) and Shamma, Jeff S.
organization:
publishing date: 2012
type: Contribution to journal
publication status: published
subject:
keywords: Evolutionary games, Reinforcement learning, Endogenous network formation, Dynamic reinforcement, Coordination games
in: Dynamic Games and Applications
volume: 2
issue: 1
pages: 18–50
publisher: Springer
external identifiers: scopus:84869216520
ISSN: 2153-0793
DOI: 10.1007/s13235-011-0038-z
language: English
LU publication?: yes
additional info: key=chas_sham2012
id: f07fb3d4-0b20-4dc1-8d57-18ccc4f6bff1 (old id 2435608)
alternative location: http://www.springerlink.com/content/k4x1586588u02216/
date added to LUP: 2016-04-01 10:06:37
date last changed: 2024-04-06 23:56:39
@article{f07fb3d4-0b20-4dc1-8d57-18ccc4f6bff1,
  abstract     = {{We analyze reinforcement learning under so-called "dynamic reinforcement." In reinforcement learning, each agent repeatedly interacts with an unknown environment (i.e., other agents), receives a reward, and updates the probabilities of its next action based on its own previous actions and received rewards. Unlike standard reinforcement learning, dynamic reinforcement uses a combination of long-term rewards and recent rewards to construct myopically forward looking action selection probabilities. We analyze the long-term stability of the learning dynamics for general games with pure strategy Nash equilibria and specialize the results for coordination games and distributed network formation. In this class of problems, more than one stable equilibrium (i.e., coordination configuration) may exist. We demonstrate equilibrium selection under dynamic reinforcement. In particular, we show how a single agent is able to destabilize an equilibrium in favor of another by appropriately adjusting its dynamic reinforcement parameters. We contrast the conclusions with prior game theoretic results according to which the risk-dominant equilibrium is the only robust equilibrium when agents' decisions are subject to small randomized perturbations. The analysis throughout is based on the ODE method for stochastic approximations, where a special form of perturbation in the learning dynamics allows for analyzing its behavior at the boundary points of the state space.}},
  author       = {{Chasparis, Georgios and Shamma, Jeff S.}},
  issn         = {{2153-0793}},
  keywords     = {{Evolutionary games; Reinforcement learning; Endogenous network formation; Dynamic reinforcement; Coordination games}},
  language     = {{eng}},
  number       = {{1}},
  pages        = {{18--50}},
  publisher    = {{Springer}},
  journal      = {{Dynamic Games and Applications}},
  title        = {{Distributed Dynamic Reinforcement of Efficient Outcomes in Multiagent Coordination and Network Formation}},
  url          = {{http://dx.doi.org/10.1007/s13235-011-0038-z}},
  doi          = {{10.1007/s13235-011-0038-z}},
  volume       = {{2}},
  year         = {{2012}},
}