Computer Games Workshop 2016
© Springer International Publishing AG 2017
Tristan Cazenave, Mark H.M. Winands, Stefan Edelkamp, Stephan Schiffel, Michael Thielscher and Julian Togelius (eds.), Computer Games, Communications in Computer and Information Science, DOI 10.1007/978-3-319-57969-6_1
NeuroHex: A Deep Q-learning Hex Agent
Kenny Young (1), Gautham Vasan (1) and Ryan Hayward (1)
(1)
Department of Computing Science, University of Alberta, Edmonton, Canada
Kenny Young
Email:
Abstract
DeepMind's recent spectacular success in using deep convolutional neural nets and machine learning to build superhuman-level agents (e.g. for Atari games via deep Q-learning and for the game of Go via other deep reinforcement learning methods) raises many questions, including to what extent these methods will succeed in other domains. In this paper we consider DQL for the game of Hex: after supervised initializing, we use self-play to train NeuroHex, an 11-layer convolutional neural network that plays Hex on the 13×13 board. Hex is the classic two-player alternate-turn stone-placement game played on a rhombus of hexagonal cells in which the winner is whoever connects their two opposing sides. Despite the large action and state space, our system trains a Q-network capable of strong play with no search. After two weeks of Q-learning, NeuroHex achieves respective win-rates of 20.4% as first player and 2.1% as second player against a 1-s/move version of MoHex, the current ICGA Olympiad Hex champion. Our data suggests further improvement might be possible with more training time.
1 Motivation, Introduction, Background
1.1 Motivation
DeepMind's recent spectacular success in using deep convolutional neural nets and machine learning to build superhuman-level agents (e.g. for Atari games via deep Q-learning and for the game of Go via other deep reinforcement learning methods) raises many questions, including to what extent these methods will succeed in other domains. Motivated by this success, we explore whether DQL can work to build a strong network for the game of Hex.
1.2 The Game of Hex
Hex is the classic two-player connection game played on an n×n rhombus of hexagonal cells. Each player is assigned two opposite sides of the board and a set of colored stones; in alternating turns, each player puts one of their stones on an empty cell; the winner is whoever joins their two sides with a contiguous chain of their stones. Draws are not possible (at most one player can have a winning chain, and if the game ends with the board full, then exactly one player will have such a chain), and for each n×n board there exists a winning strategy for the first player [].
Despite its simple rules, Hex has deep tactics and strategy. Hex has served as a test bed for algorithms in artificial intelligence since Shannon and E.F. Moore built a resistance network to play the game [].
In this paper we consider Hex on the 13×13 board (Fig. 1).
Fig. 1. The game of Hex.
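To make the connection rule concrete, the following sketch checks whether a player has joined their two sides. The board encoding (0 for empty, 1 for the player connecting top to bottom, 2 for the player connecting left to right) and the coordinate convention are illustrative assumptions, not details taken from the paper.

```python
from collections import deque
import numpy as np

# Neighbours of cell (r, c) on a hexagonal grid stored as a parallelogram of rows.
NEIGHBOURS = [(-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0)]

def has_winning_chain(board: np.ndarray, player: int) -> bool:
    """Return True if `player` joins their two sides with a contiguous chain of stones."""
    n = board.shape[0]
    if player == 1:
        # Player 1's sides are the top and bottom rows.
        starts = [(0, c) for c in range(n) if board[0, c] == player]
        reached_far_side = lambda r, c: r == n - 1
    else:
        # Player 2's sides are the left and right columns.
        starts = [(r, 0) for r in range(n) if board[r, 0] == player]
        reached_far_side = lambda r, c: c == n - 1

    seen, frontier = set(starts), deque(starts)
    while frontier:
        r, c = frontier.popleft()
        if reached_far_side(r, c):
            return True
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < n and 0 <= nc < n and board[nr, nc] == player and (nr, nc) not in seen:
                seen.add((nr, nc))
                frontier.append((nr, nc))
    return False
```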
1.3 Related Work
The two works that inspire this paper are [], both from Google DeepMind.
[] introduces Deep Q-learning with Experience Replay. Q-learning is a reinforcement learning (RL) algorithm that learns a mapping from states to action values by backing up action-value estimates from subsequent states to improve those in previous states. In Deep Q-learning the mapping from states to action values is learned by a deep neural network. Experience replay extends standard Q-learning by storing agent experiences in a memory buffer and sampling from these experiences at every time step to perform updates. This algorithm achieved superhuman performance on several classic Atari games using only raw visual input.
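As an illustration of the mechanics described above, here is a minimal sketch of Q-learning with experience replay. The callables q_values_fn (state to array of action values) and train_step_fn (one gradient step toward a target) are hypothetical placeholders standing in for the deep network; none of this is taken from the paper's code.

```python
import random
from collections import deque
import numpy as np

GAMMA = 1.0      # no discounting within an episode (an assumption)
EPSILON = 0.1    # exploration rate for epsilon-greedy play

class ReplayAgent:
    def __init__(self, q_values_fn, train_step_fn, buffer_size=100_000, batch_size=32):
        self.q_values_fn = q_values_fn      # placeholder for the Q-network's forward pass
        self.train_step_fn = train_step_fn  # placeholder for one gradient update
        self.buffer = deque(maxlen=buffer_size)
        self.batch_size = batch_size

    def act(self, state, num_actions):
        """Epsilon-greedy choice over the network's action values."""
        if random.random() < EPSILON:
            return random.randrange(num_actions)
        return int(np.argmax(self.q_values_fn(state)))

    def store(self, state, action, reward, next_state, terminal):
        self.buffer.append((state, action, reward, next_state, terminal))

    def replay_update(self):
        """Sample stored transitions and back up action-value estimates."""
        if len(self.buffer) < self.batch_size:
            return
        for state, action, reward, next_state, terminal in random.sample(self.buffer, self.batch_size):
            target = reward
            if not terminal:
                # Improve the earlier state's estimate using the best value in the next state.
                target += GAMMA * float(np.max(self.q_values_fn(next_state)))
            self.train_step_fn(state, action, target)
```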
[] introduces AlphaGo, a Go-playing program that combines Monte Carlo tree search with convolutional neural networks: one guides the search (policy network), another evaluates position quality (value network). Deep reinforcement learning (RL) is used to train both the value and policy networks, which each take a representation of the game state as input. The policy network outputs a probability distribution over available moves indicating the likelihood of choosing each move. The value network outputs a single scalar estimating V(S), the expected win probability minus the expected loss probability for the current board state S. Before applying RL, AlphaGo's network training begins with supervised mentoring: the policy network is trained to replicate moves from a database of human games. Then the policy network is trained by playing full games against past versions of the network, followed by increasing the probability of moves played by the winner and decreasing the probability of moves played by the loser. Finally the value network is trained by playing full games from various positions using the trained policy network, and performing a gradient descent update based on the observed game outcome. Temporal difference (TD) methods, which update value estimates for previous states based on the system's own evaluation of subsequent states rather than waiting for the true outcome, are not used.
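For concreteness, the following is a stripped-down sketch of this style of self-play policy update: raise the probability of the winner's moves and lower the loser's. The PyTorch usage, network interface, and tensor shapes are our own illustrative assumptions; this is not AlphaGo's implementation.

```python
import torch

def policy_gradient_update(policy_net, optimizer, states, actions, outcome):
    """states: board encodings for one player's turns in a finished game (T, ...);
    actions: that player's chosen moves (T,); outcome: +1.0 if that player won, -1.0 if lost."""
    logits = policy_net(states)                                    # (T, num_moves)
    log_probs = torch.log_softmax(logits, dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)  # log pi(a_t | s_t)
    loss = -(outcome * chosen).mean()   # ascend the winner's move probabilities, descend the loser's
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```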
An early example of applying RL with a neural network to games is TD-gammon []. There a network trained with TD methods to approximate state values achieved superhuman play. Recent advances in deep learning have opened the door to applying such methods to more games.
1.4 Overview of this Work
In this work we explore the application of Deep Q-learning with Experience Replay, introduced in [], to Hex. There are several challenges involved in applying this method, so successful with Atari, to Hex. One challenge is that there are fewer available actions in Atari than in Hex (e.g. there are 169 possible initial moves in 13×13 Hex). Since Q-learning performs a maximization over all available actions, this large number might cause the noise in estimation to overwhelm the useful signal, resulting in catastrophic maximization bias. However, in our work we found that the use of a convolutional neural network, which by design learns features that generalize over spatial location, achieved good results.
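One practical detail of that maximization is restricting it to legal moves. Below is a minimal sketch that masks occupied cells before taking the max, assuming the network outputs one action value per cell; the masking convention is our assumption rather than a detail stated here.

```python
import numpy as np

def masked_best_action(q_values: np.ndarray, board: np.ndarray):
    """q_values: flat array with one action value per cell; board: 0 where a cell is empty.
    Occupied cells are excluded so the max (and the chosen move) is always a legal placement."""
    q = q_values.astype(float).copy()
    q[board.flatten() != 0] = -np.inf   # never select or back up an illegal move
    best = int(np.argmax(q))
    return best, float(q[best])
```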
Another challenge is that the reward signal in Hex occurs only at the end of a game, so with respect to move actions it is infrequent, meaning that most updates are based only on network evaluations without immediate win/loss feedback. The question is whether the learning process will allow this end-of-game reward information to propagate back to the middle and early game. To address this challenge, we use supervised mentoring, training the network first to replicate the action values produced by a heuristic over a database of positions. Such training is faster than RL, and allows the middle- and early-game updates to be meaningful at the start of Q-learning, without having to rely on end-of-game reward propagating back from the endgame. As with AlphaGo [], we apply this heuristic only to initialize the network: the reward in our Q-learning is based only on the outcome of the game being played.
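A minimal sketch of such supervised mentoring is shown below: the network's action values are regressed toward a heuristic's values over a batch of stored positions. The framework, loss, and interface are illustrative assumptions; the paper does not specify these details at this point.

```python
import torch
import torch.nn.functional as F

def mentoring_step(q_net, optimizer, positions, heuristic_values):
    """positions: batch of board encodings (B, ...); heuristic_values: the heuristic's
    action values for each position (B, num_cells). Returns the batch loss."""
    predicted = q_net(positions)                    # (B, num_cells)
    loss = F.mse_loss(predicted, heuristic_values)  # match the heuristic before Q-learning begins
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```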