Lund University Publications


Continuous action reinforcement learning from a mixture of interpretable experts

Akrour, Riad; Tateo, Davide and Peters, Jan (2022) In IEEE Transactions on Pattern Analysis and Machine Intelligence 44(10). p. 6795-6806
Abstract

Reinforcement learning (RL) has demonstrated its ability to solve high dimensional tasks by leveraging non-linear function approximators. However, these successes are mostly achieved by 'black-box' policies in simulated domains. When deploying RL to the real world, several concerns regarding the use of a 'black-box' policy might be raised. In order to make the learned policies more transparent, we propose in this paper a policy iteration scheme that retains a complex function approximator for its internal value predictions but constrains the policy to have a concise, hierarchical, and human-readable structure, based on a mixture of interpretable experts. Each expert selects a primitive action according to a distance to a prototypical state. A key design decision to keep such experts interpretable is to select the prototypical states from trajectory data. The main technical contribution of the paper is to address the challenges introduced by this non-differentiable prototypical state selection procedure. Experimentally, we show that our proposed algorithm can learn compelling policies on continuous action deep RL benchmarks, matching the performance of neural network based policies, but returning policies that are more amenable to human inspection than neural network or linear-in-feature policies.
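To make the idea in the abstract concrete, the following minimal Python sketch shows one plausible way a distance-based mixture of interpretable experts could be structured: each expert is anchored at a prototypical state taken from trajectory data and proposes a primitive action, and the gating weights come from distances to those prototypes. This is an illustration under assumed details, not the authors' published implementation; all names (PrototypeMixturePolicy, temperature, the softmax gating) are hypothetical choices for the sketch.

import numpy as np

class PrototypeMixturePolicy:
    """Hypothetical sketch of a prototype-anchored mixture-of-experts policy."""

    def __init__(self, prototype_states, primitive_actions, temperature=1.0):
        # prototype_states: (K, state_dim) states picked from logged trajectories
        # primitive_actions: (K, action_dim) one primitive action per expert
        self.prototypes = np.asarray(prototype_states, dtype=float)
        self.actions = np.asarray(primitive_actions, dtype=float)
        self.temperature = temperature

    def gating_weights(self, state):
        # Squared distance of the query state to every prototypical state;
        # closer prototypes get exponentially larger weight.
        d2 = np.sum((self.prototypes - state) ** 2, axis=1)
        logits = -d2 / self.temperature
        logits -= logits.max()  # numerical stability
        w = np.exp(logits)
        return w / w.sum()

    def act(self, state):
        # Blend the experts' primitive actions by their gating weights,
        # so the behaviour is readable as "do what the nearest prototype does".
        w = self.gating_weights(np.asarray(state, dtype=float))
        return w @ self.actions

# Hypothetical usage on a 2-D toy task with three prototypes.
policy = PrototypeMixturePolicy(
    prototype_states=[[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    primitive_actions=[[0.5], [-0.5], [0.0]],
)
print(policy.act([0.2, 0.1]))

Because the prototypes are actual visited states, a reader can inspect the policy by looking at a small list of (prototype state, primitive action) pairs; the paper's main technical contribution, as stated in the abstract, is handling the non-differentiable selection of these prototypes from trajectory data.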

author
Akrour, Riad ; Tateo, Davide and Peters, Jan
publishing date
2022-10
type
Contribution to journal
publication status
published
subject
keywords
Interpretability, Mixture of experts, Reinforcement learning, Robotics
in
IEEE Transactions on Pattern Analysis and Machine Intelligence
volume
44
issue
10
pages
6795 - 6806 (12 pages)
publisher
IEEE - Institute of Electrical and Electronics Engineers Inc.
external identifiers
  • pmid:34375280
  • scopus:85138448448
ISSN
0162-8828
DOI
10.1109/TPAMI.2021.3103132
language
English
LU publication?
no
id
266e057d-a685-4a83-9298-9d0d26baf9eb
date added to LUP
2025-10-16 14:34:57
date last changed
2025-10-22 03:43:16
@article{266e057d-a685-4a83-9298-9d0d26baf9eb,
  abstract     = {{Reinforcement learning (RL) has demonstrated its ability to solve high dimensional tasks by leveraging non-linear function approximators. However, these successes are mostly achieved by 'black-box' policies in simulated domains. When deploying RL to the real world, several concerns regarding the use of a 'black-box' policy might be raised. In order to make the learned policies more transparent, we propose in this paper a policy iteration scheme that retains a complex function approximator for its internal value predictions but constrains the policy to have a concise, hierarchical, and human-readable structure, based on a mixture of interpretable experts. Each expert selects a primitive action according to a distance to a prototypical state. A key design decision to keep such experts interpretable is to select the prototypical states from trajectory data. The main technical contribution of the paper is to address the challenges introduced by this non-differentiable prototypical state selection procedure. Experimentally, we show that our proposed algorithm can learn compelling policies on continuous action deep RL benchmarks, matching the performance of neural network based policies, but returning policies that are more amenable to human inspection than neural network or linear-in-feature policies.}},
  author       = {{Akrour, Riad and Tateo, Davide and Peters, Jan}},
  issn         = {{0162-8828}},
  keywords     = {{Interpretability; Mixture of experts; Reinforcement learning; Robotics}},
  language     = {{eng}},
  month        = {{10}},
  number       = {{10}},
  pages        = {{6795--6806}},
  publisher    = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
  series       = {{IEEE Transactions on Pattern Analysis and Machine Intelligence}},
  title        = {{Continuous action reinforcement learning from a mixture of interpretable experts}},
  url          = {{http://dx.doi.org/10.1109/TPAMI.2021.3103132}},
  doi          = {{10.1109/TPAMI.2021.3103132}},
  volume       = {{44}},
  year         = {{2022}},
}