Lund University Publications

LUND UNIVERSITY LIBRARIES

Learning Optimal Team-Decisions

Kjellqvist, Olle (LU) and Gattami, Ather (LU) (2022). 61st IEEE Conference on Decision and Control, CDC 2022. In Proceedings of the IEEE Conference on Decision and Control 2022-December, p. 1441-1446
Abstract

In this paper, we study linear quadratic team decision problems, where a team of agents minimizes a convex quadratic cost function over T time steps subject to possibly distinct linear measurements of the state of nature. We assume that the state of nature is a Gaussian random variable and that the agents do not know the cost function nor the linear functions mapping the state of nature to their measurements. We present a gradient-descent based algorithm with an expected regret of O(log(T)) for full information gradient feedback and O(√(T)) for bandit feedback. In the case of bandit feedback, the expected regret has an additional multiplicative term O(d) where d reflects the number of learned parameters.
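
The full-information setting described in the abstract can be illustrated with a small toy sketch. This is not the authors' algorithm: the two-agent setup, the particular cost function, the linear policies u_i = k_i * y_i, the step-size schedule 1/(mu * (t + t0)), and every name in the code (run_team_ogd, mu, t0) are assumptions chosen only for illustration. Each agent sees one component of a Gaussian state, receives the gradient of the cost with respect to its own decision, and updates its gain without knowing the cost function itself.

```python
import random


def run_team_ogd(T=5000, seed=0, mu=3.0, t0=10):
    """Toy two-agent team problem with online gradient descent.

    Assumed (hypothetical) setup, not taken from the paper:
      state x = (x1, x2), independent standard Gaussians
      agent i measures y_i = x_i and plays u_i = k_i * y_i
      cost = (u1 + u2 - x1 - x2)^2 + 0.5 * (u1^2 + u2^2)
    For this cost the expected loss in the gains (k1, k2) is strongly
    convex with parameter mu = 3 and is minimized at k1 = k2 = 2/3.
    """
    rng = random.Random(seed)
    k = [0.0, 0.0]          # gains learned by the two agents
    total_cost = 0.0
    for t in range(1, T + 1):
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        y = x               # agent i observes only component i
        u = [k[0] * y[0], k[1] * y[1]]
        err = u[0] + u[1] - (x[0] + x[1])
        total_cost += err ** 2 + 0.5 * (u[0] ** 2 + u[1] ** 2)
        # Full-information gradient feedback: d(cost)/d(u_i) per agent.
        grad_u = [2.0 * err + u[0], 2.0 * err + u[1]]
        # Decaying step size; the offset t0 is an assumed damping choice.
        eta = 1.0 / (mu * (t + t0))
        for i in range(2):
            # Chain rule: d(cost)/d(k_i) = d(cost)/d(u_i) * y_i.
            k[i] -= eta * grad_u[i] * y[i]
    return k, total_cost / T


if __name__ == "__main__":
    gains, avg_cost = run_team_ogd()
    print("learned gains:", gains)
    print("average cost:", avg_cost)
```

In this toy problem both gains should drift toward 2/3, the minimizer of the assumed expected cost. A step size that decays like 1/t is the standard choice for strongly convex online problems and is the kind of schedule usually associated with logarithmic regret; whether the paper uses this exact schedule is not stated in the abstract.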

type
Chapter in Book/Report/Conference proceeding
publication status
published
host publication
2022 IEEE 61st Conference on Decision and Control, CDC 2022
series title
Proceedings of the IEEE Conference on Decision and Control
volume
2022-December
pages
6 pages
publisher
IEEE - Institute of Electrical and Electronics Engineers Inc.
conference name
61st IEEE Conference on Decision and Control, CDC 2022
conference location
Cancun, Mexico
conference dates
2022-12-06 - 2022-12-09
external identifiers
  • scopus:85147018169
ISSN
2576-2370
0743-1546
ISBN
9781665467612
DOI
10.1109/CDC51059.2022.9992786
language
English
LU publication?
yes
id
a224e208-58a3-4e04-b1af-100b1557c40f
date added to LUP
2023-02-14 11:38:18
date last changed
2024-04-04 16:31:49
@inproceedings{a224e208-58a3-4e04-b1af-100b1557c40f,
  abstract     = {{In this paper, we study linear quadratic team decision problems, where a team of agents minimizes a convex quadratic cost function over T time steps subject to possibly distinct linear measurements of the state of nature. We assume that the state of nature is a Gaussian random variable and that the agents do not know the cost function nor the linear functions mapping the state of nature to their measurements. We present a gradient-descent based algorithm with an expected regret of O(log(T)) for full information gradient feedback and O(√(T)) for bandit feedback. In the case of bandit feedback, the expected regret has an additional multiplicative term O(d) where d reflects the number of learned parameters.}},
  author       = {{Kjellqvist, Olle and Gattami, Ather}},
  booktitle    = {{2022 IEEE 61st Conference on Decision and Control, CDC 2022}},
  isbn         = {{9781665467612}},
  issn         = {{2576-2370}},
  language     = {{eng}},
  pages        = {{1441--1446}},
  publisher    = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
  series       = {{Proceedings of the IEEE Conference on Decision and Control}},
  title        = {{Learning Optimal Team-Decisions}},
  url          = {{http://dx.doi.org/10.1109/CDC51059.2022.9992786}},
  doi          = {{10.1109/CDC51059.2022.9992786}},
  volume       = {{2022-December}},
  year         = {{2022}},
}