Application of Deep Q-learning for Vision Control on Atari Environments

Öhman, Jim

Öhman, Jim ^LU (2021) FYTM03 20201
Computational Biology and Biological Physics - Has been reorganised

Abstract: The success of Reinforcement Learning (RL) has mostly been in artificial domains, with only some successful real-world applications. One reason is that most real-world domains fail to satisfy a set of assumptions underlying RL theory.

In recent years, a popular way to gauge the performance of RL agents has been a suite of Atari 2600 games. This suite has been used to benchmark progress in building successively more intelligent agents. However, these games do not capture all the challenges that make real-world tasks difficult for RL, such as having to learn and act with incomplete information.

This thesis modifies a set of Atari games to include the task of adaptive sensing for RL agents. The games are made partially observable by restricting the visible portion of the screen. The agents are then tasked with learning to control their vision while simultaneously learning to play the game. This modification adds one of the extra challenges present in many real-world environments.

To solve these new tasks, an algorithm based on a slight modification of Deep Q-learning is proposed, referred to as Myopic Deep Q-Learning (MyDQL). Furthermore, two network architectures for MyDQL are compared: a feed-forward neural network and a recurrent neural network.

It is shown that MyDQL can be successfully applied to the modified Atari games. Additionally, it is shown that using a recurrent neural network greatly enhances the agent's performance on these tasks. Such an agent is able to achieve near-optimal performance on Pong, Breakout, and Space Invaders with only 35% of the screen visible at any given time.

It is also shown that an agent with its visibility further reduced to 13% is still able to achieve impressive performance on these games.
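As a rough illustration of what "restricting the visible portion of the screen" could look like, the sketch below blanks every pixel outside a rectangular window whose area is about 35% of the frame. This is not taken from the thesis: the function name `mask_frame`, the 84x84 frame size, the rectangular window shape, and the choice to zero hidden pixels are all assumptions made for the example.

```python
import math


def mask_frame(frame, center, visible_fraction=0.35):
    """Return a copy of `frame` with pixels outside a window set to 0.

    Hypothetical helper: the window keeps the frame's aspect ratio and
    covers roughly `visible_fraction` of the frame area. `frame` is a
    list of rows and `center` is a (row, column) pair.
    """
    height, width = len(frame), len(frame[0])
    # Scaling both sides by sqrt(f) scales the area by f.
    scale = math.sqrt(visible_fraction)
    win_h = max(1, round(height * scale))
    win_w = max(1, round(width * scale))
    # Clamp the window so it stays fully inside the frame.
    top = min(max(center[0] - win_h // 2, 0), height - win_h)
    left = min(max(center[1] - win_w // 2, 0), width - win_w)
    return [
        [
            pixel if top <= r < top + win_h and left <= c < left + win_w else 0
            for c, pixel in enumerate(row)
        ]
        for r, row in enumerate(frame)
    ]


# Usage: an all-ones 84x84 frame (assumed size) viewed around its centre.
frame = [[1] * 84 for _ in range(84)]
masked = mask_frame(frame, center=(42, 42))
visible_share = sum(map(sum, masked)) / (84 * 84)
```

In a setup like this, the agent's vision-control action would move `center` from step to step, while the game action is chosen from the masked frame alone.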
Popular Abstract (translated from Swedish): We humans, and animals in general, learn an enormous amount simply by interacting with our surroundings. We constantly gather information about how our actions affect the environment, and adjust our behaviour so that we better achieve our goals.

One branch of machine learning, called reinforcement learning, is a group of methods describing an agent that can learn to achieve goals from experience. Such an agent interacts with its environment through various actions and receives numerical rewards depending on how the environment is affected. The agent's ultimate goal is to learn a behaviour that collects the most reward in the long run.

These methods have once again become a hot topic now that they have been combined with the latest techniques in deep learning. This combination allowed reinforcement learning to be applied to larger environments and more complex problems.

The majority of applications of reinforcement learning, however, have been in artificial environments. One reason is that real-world environments often violate the assumption that the environment is fully observable to the agent. To apply reinforcement learning more effectively to real-world environments, methods that relax this assumption are needed.

Much of the research in reinforcement learning in recent years has been carried out on Atari 2600 games. These environments test an agent's ability on a variety of challenges, and have been used to create agents with better problem-solving ability. These games also have the advantage of being relatively simple to implement.

This master's thesis introduces a modified version of Atari 2600 games in which the agent does not have access to the entire game screen, but only to a movable, reduced portion of it. The agent must therefore learn to observe the environment in the best possible way so that it can also learn to play as well as possible.

Agents that manage to perform well on these modified Atari 2600 games could possibly be applied to real-world environments that require the same kind of adaptive observation. This could, for example, be in robotics, where a robot uses cameras to sense its surroundings.

This master's thesis also introduces a method called Myopic Deep Q-Learning (MyDQL), which makes it possible for an Atari agent to simultaneously learn to observe and to play from its experience.

It turns out that such an agent, with only 35% of the screen visible, can learn to play almost optimally on Atari 2600 games such as Pong, Breakout, and Space Invaders.

Please use this url to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/9041446

author

Öhman, Jim ^LU

supervisor

Mattias Ohlsson ^LU

organization

Computational Biology and Biological Physics - Has been reorganised

course

FYTM03 20201

year

2021

type

H2 - Master's Degree (Two Years)

subject

Physics and Astronomy

keywords

Reinforcement learning, Atari 2600, Deep Q-learning, Myopic Agents, Vision Control

language

English

id

9041446

date added to LUP

2021-03-08 14:38:22

date last changed

2021-03-08 14:38:22

@misc{9041446,
  abstract     = {{The success of Reinforcement Learning (RL) has mostly been in artificial domains, with only some successful real-world applications. One of the reasons being that most real-world domains fail to satisfy a set of assumptions of RL theory.

In the past years, a popular way to gauge the performance of RL agents has been through a suite of Atari 2600 games. This suite has been used to benchmark the progress of building successively more intelligent agents. However, they do not capture all the challenges that make real-world tasks difficult for RL, such as having to learn and act with incomplete information.

This thesis modifies a set of Atari games to include the task of adaptive sensing for RL agents. The games are made partially observable by restricting the visible portion of the screen. The agents are then tasked to learn to control their vision while at the same time learn to play the game. This modification adds one of the extra challenges that are present in many real-world environments.

To solve these new tasks an algorithm based on a slight modification of Deep Q-learning is proposed, referred to as Myopic Deep Q-Learning (MyDQL). Furthermore, a comparison is made between two different network architectures for MyDQL, a feed-forward neural network, and a recurrent neural network.

It is shown that MyDQL can be successfully applied to the modified Atari games. Additionally, it is shown that using a recurrent neural network greatly enhances the performance of the agent on these tasks. Such an agent is able to achieve near-optimal performance on Pong, Breakout, and Space Invaders, with only 35% of the screen visible at any given time.

It is also shown that an agent with its visibility further reduced to 13% is still able to achieve impressive performance on these games.}},
  author       = {{Öhman, Jim}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Application of Deep Q-learning for Vision Control on Atari Environments}},
  year         = {{2021}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES
