MAIA: The role of innate behaviors when picking flowers in Minecraft with Q-learning

Siljebråt, Henrik

Siljebråt, Henrik (LU) (2015) KOGM20 20151
Cognitive Science

Abstract: Recent advances in reinforcement learning research have achieved human-level performance in playing video games (Mnih et al., 2015). This inspired me to understand the methods of reinforcement learning (RL) and investigate whether there is any basis for those methods in neurobiology and animal learning theories. The current study shows how RL is based on theories of animal conditioning and that there is solid evidence for neurobiological correlates of RL algorithms, primarily in the basal ganglia complex. This motivated a simple perceptron-based model of the basal ganglia called Q-tron, which utilizes the Q-learning algorithm. Additionally, I wanted to explore the hypothesis that adding an innate behavior to a Q-learning agent would increase performance. Thus, four different agents were tasked with picking red flowers in the video game Minecraft, where performance was measured as the number of actions needed to pick a flower. A “pure” Q-learner called PQ used only the Q-tron model. MAIA (Minecraft Artificial Intelligence Agent) used the Q-tron model together with an innate behavior causing it to try picking when it saw red. Two mechanisms of the innate behavior were tested, creating MAIA1 and MAIA2, respectively. The fourth agent, called random walker (RW), chose actions at random and acted as a baseline performance measure. We show that both MAIA versions have better performance than PQ, and MAIA1 has performance comparable to RW. Additionally, we show a difference in performance between MAIA1 and MAIA2 and argue that this shows the importance of investigations into the precise mechanisms underlying innate behaviors in animals in order to understand learning in general.
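
The idea of pairing a Q-learner with an innate "pick when it sees red" reflex can be illustrated with a small sketch. This is not the thesis's Q-tron model or its Minecraft setup: the one-dimensional corridor, the reward values, the feature encoding, and every name below (Corridor, LinearQAgent, run_episode) are assumptions made only for illustration. The sketch uses one linear (perceptron-like) unit per action, the standard Q-learning update, and an optional reflex that overrides the learned choice whenever the "sees red" feature is active.

```python
import random

ACTIONS = ["left", "right", "pick"]  # hypothetical action set, not from the thesis


class Corridor:
    """Toy stand-in for the flower task: a 1-D strip with one red flower."""

    def __init__(self, size=7, flower=5):
        self.size, self.flower = size, flower
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.features()

    def features(self):
        # One-hot position plus a final "sees red" bit, set when on the flower.
        x = [0.0] * (self.size + 1)
        x[self.pos] = 1.0
        x[self.size] = 1.0 if self.pos == self.flower else 0.0
        return x

    def step(self, action):
        if action == "left":
            self.pos = max(0, self.pos - 1)
        elif action == "right":
            self.pos = min(self.size - 1, self.pos + 1)
        picked = action == "pick" and self.pos == self.flower
        reward = 1.0 if picked else -0.01  # assumed reward scheme
        return self.features(), reward, picked


class LinearQAgent:
    """One linear unit per action; innate=True adds the pick-on-red reflex."""

    def __init__(self, n_features, innate=False, alpha=0.1, gamma=0.9,
                 epsilon=0.1, seed=0):
        self.w = {a: [0.0] * n_features for a in ACTIONS}
        self.innate = innate
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def q(self, x, a):
        return sum(wi * xi for wi, xi in zip(self.w[a], x))

    def act(self, x):
        if self.innate and x[-1] == 1.0:
            return "pick"  # innate behavior overrides the learned policy
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)  # epsilon-greedy exploration
        best = max(self.q(x, a) for a in ACTIONS)
        return self.rng.choice([a for a in ACTIONS if self.q(x, a) == best])

    def update(self, x, a, r, x_next, done):
        # Q-learning target: r + gamma * max_b Q(s', b), or just r at the end.
        target = r if done else r + self.gamma * max(
            self.q(x_next, b) for b in ACTIONS)
        delta = target - self.q(x, a)  # prediction error
        for i, xi in enumerate(x):
            self.w[a][i] += self.alpha * delta * xi


def run_episode(env, agent, max_steps=200):
    """Return the number of actions taken before the flower is picked."""
    x = env.reset()
    for t in range(1, max_steps + 1):
        a = agent.act(x)
        x_next, r, done = env.step(a)
        agent.update(x, a, r, x_next, done)
        x = x_next
        if done:
            return t
    return max_steps
```

Running `run_episode` repeatedly with `innate=False` and `innate=True` gives a per-episode action count for each agent, which mirrors the performance measure described in the abstract. The toy world is far simpler than Minecraft, so the numbers it produces say nothing about the thesis's results.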

Please use this url to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/8051962

author

Siljebråt, Henrik (LU)

supervisor

Christian Balkenius (LU)

organization

Cognitive Science

alternative title

MAIA: Instinkters roll vid blomplockning i Minecraft med Q-inlärning

course

KOGM20 20151

year

2015

type

H2 - Master's Degree (Two Years)

subject

Science General

keywords

reinforcement learning, q-learning, minecraft, innate behavior, artificial intelligence, conditioning, neuroscience, dopamine, prediction error, cognition, learning, neural networks, cognitive science

language

English

id

8051962

date added to LUP

2015-11-18 10:55:12

date last changed

2015-11-18 10:55:12

@misc{8051962,
  abstract     = {{Recent advances in reinforcement learning research have achieved human-level performance in playing video games (Mnih et al., 2015). This inspired me to understand the methods of reinforcement learning (RL) and investigate whether there is any basis for those methods in neurobiology and animal learning theories. The current study shows how RL is based on theories of animal conditioning and that there is solid evidence for neurobiological correlates of RL algorithms, primarily in the basal ganglia complex. This motivated a simple perceptron-based model of the basal ganglia called Q-tron, which utilizes the Q-learning algorithm. Additionally, I wanted to explore the hypothesis that adding an innate behavior to a Q-learning agent would increase performance. Thus, four different agents were tasked with picking red flowers in the video game Minecraft, where performance was measured as the number of actions needed to pick a flower. A “pure” Q-learner called PQ used only the Q-tron model. MAIA (Minecraft Artificial Intelligence Agent) used the Q-tron model together with an innate behavior causing it to try picking when it saw red. Two mechanisms of the innate behavior were tested, creating MAIA1 and MAIA2, respectively. The fourth agent, called random walker (RW), chose actions at random and acted as a baseline performance measure. We show that both MAIA versions have better performance than PQ, and MAIA1 has performance comparable to RW. Additionally, we show a difference in performance between MAIA1 and MAIA2 and argue that this shows the importance of investigations into the precise mechanisms underlying innate behaviors in animals in order to understand learning in general.}},
  author       = {{Siljebråt, Henrik}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{MAIA: The role of innate behaviors when picking flowers in Minecraft with Q-learning}},
  year         = {{2015}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES
