Learning Optimal Team-Decisions
(2022) 61st IEEE Conference on Decision and Control, CDC 2022. In Proceedings of the IEEE Conference on Decision and Control, vol. 2022-December, pp. 1441-1446.
Abstract
In this paper, we consider linear quadratic team decision problems, in which a team of agents minimizes a convex quadratic cost function over T time steps based on possibly distinct linear measurements of the state of nature. We assume that the state of nature is a Gaussian random variable and that the agents know neither the cost function nor the linear functions mapping the state of nature to their measurements. We present a gradient-descent-based algorithm with an expected regret of O(log(T)) for full-information gradient feedback and O(√(T)) for bandit feedback. In the case of bandit feedback, the expected regret has an additional multiplicative factor O(d), where d reflects the number of learned parameters.
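The abstract does not spell out the algorithm, so the following is only a minimal sketch of the full-information setting, built on assumptions that do not come from the paper: the state of nature is x ~ N(0, I), agent i receives a scalar measurement y_i = c_i · x and applies a linear rule u_i = k_i y_i, and the team cost is c(u, x) = u^T R u + 2 u^T S x with R positive definite. After each round, agent i observes only its own partial derivative ∂c/∂u_i; by the chain rule ∂c/∂k_i = (∂c/∂u_i) y_i, so each agent can update its gain from local data alone. The function names (`team_ogd`, `expected_cost`), the step-size schedule η_t = step0/(t + offset), and all parameter values are hypothetical illustrations, not the authors' method.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def matvec(M, v):
    """Matrix (given as a list of rows) times vector."""
    return [dot(row, v) for row in M]


def expected_cost(R, S, C, k):
    """E[u^T R u + 2 u^T S x] for u_i = k_i * (C[i] . x) and x ~ N(0, I).

    Hypothetical model: S has one row per agent and one column per state entry.
    """
    m = len(C)
    quad = sum(R[i][j] * k[i] * k[j] * dot(C[i], C[j])
               for i in range(m) for j in range(m))
    lin = sum(k[i] * dot(C[i], S[i]) for i in range(m))
    return quad + 2.0 * lin


def team_ogd(R, S, C, T, step0=0.5, offset=10, seed=0):
    """Online gradient descent for scalar linear team rules u_i = k_i * y_i.

    Hypothetical setup: each round nature draws x ~ N(0, I), agent i sees
    y_i = C[i] . x, and after acting receives only its own partial gradient
    dc/du_i of the realized cost c(u, x) = u^T R u + 2 u^T S x.
    """
    rng = random.Random(seed)
    n = len(C[0])
    k = [0.0] * len(C)
    for t in range(1, T + 1):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # state of nature
        y = [dot(c, x) for c in C]                   # private linear measurements
        u = [ki * yi for ki, yi in zip(k, y)]        # team decision
        # Gradient of the realized cost w.r.t. u: 2 (R u + S x)
        g = [2.0 * (a + b) for a, b in zip(matvec(R, u), matvec(S, x))]
        eta = step0 / (t + offset)  # decaying O(1/t) step size (assumed schedule)
        # Chain rule: dc/dk_i = g_i * y_i, so agent i uses only local data
        k = [ki - eta * gi * yi for ki, gi, yi in zip(k, g, y)]
    return k
```

For example, with R = [[2, 0.5], [0.5, 1]], S = [[1, 0], [0, -1]] and C = [[1, 0.5], [0.2, 1]], `expected_cost` is the quadratic k^T A k + 2 b^T k with A_ij = R_ij (c_i · c_j) and b_i = c_i · s_i, whose minimizer is k* = -A^(-1) b ≈ (-0.561, 1.150); `team_ogd(R, S, C, T=5000)` should return gains close to it. A decaying O(1/t) step is the usual choice when the expected cost is strongly convex, the setting in which logarithmic regret is typically obtained. The bandit case, where agents see only the scalar cost, would replace `g` with a zeroth-order gradient estimate and is not sketched here.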
Please use this URL to cite or link to this publication:
https://lup.lub.lu.se/record/a224e208-58a3-4e04-b1af-100b1557c40f
- author
- Kjellqvist, Olle and Gattami, Ather
- publishing date
- 2022
- type
- Chapter in Book/Report/Conference proceeding
- publication status
- published
- host publication
- 2022 IEEE 61st Conference on Decision and Control, CDC 2022
- series title
- Proceedings of the IEEE Conference on Decision and Control
- volume
- 2022-December
- pages
- 6 pages
- publisher
- IEEE - Institute of Electrical and Electronics Engineers Inc.
- conference name
- 61st IEEE Conference on Decision and Control, CDC 2022
- conference location
- Cancun, Mexico
- conference dates
- 2022-12-06 - 2022-12-09
- external identifiers
- scopus:85147018169
- ISSN
- 2576-2370
- 0743-1546
- ISBN
- 9781665467612
- DOI
- 10.1109/CDC51059.2022.9992786
- language
- English
- LU publication?
- yes
- id
- a224e208-58a3-4e04-b1af-100b1557c40f
- date added to LUP
- 2023-02-14 11:38:18
- date last changed
- 2024-04-04 16:31:49
@inproceedings{a224e208-58a3-4e04-b1af-100b1557c40f,
  abstract     = {{In this paper, we consider linear quadratic team decision problems, in which a team of agents minimizes a convex quadratic cost function over T time steps based on possibly distinct linear measurements of the state of nature. We assume that the state of nature is a Gaussian random variable and that the agents know neither the cost function nor the linear functions mapping the state of nature to their measurements. We present a gradient-descent-based algorithm with an expected regret of O(log(T)) for full-information gradient feedback and O(√(T)) for bandit feedback. In the case of bandit feedback, the expected regret has an additional multiplicative factor O(d), where d reflects the number of learned parameters.}},
  author       = {{Kjellqvist, Olle and Gattami, Ather}},
  booktitle    = {{2022 IEEE 61st Conference on Decision and Control, CDC 2022}},
  isbn         = {{9781665467612}},
  issn         = {{2576-2370}},
  language     = {{eng}},
  pages        = {{1441--1446}},
  publisher    = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
  series       = {{Proceedings of the IEEE Conference on Decision and Control}},
  title        = {{Learning Optimal Team-Decisions}},
  url          = {{http://dx.doi.org/10.1109/CDC51059.2022.9992786}},
  doi          = {{10.1109/CDC51059.2022.9992786}},
  volume       = {{2022-December}},
  year         = {{2022}},
}