AU2018101314A4 - A MCST and deep neural network based FIR battle platform - Google Patents

A MCST and deep neural network based FIR battle platform

Info

Publication number
AU2018101314A4
AU2018101314A4 (application AU2018101314A)
Authority
AU
Australia
Prior art keywords
model
monte carlo
neural network
mcts
player
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2018101314A
Inventor
Kaidi Chen
Zeyuan Dai
Andi Liu
Lihang Liu
Lixian Zhao
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to AU2018101314A priority Critical patent/AU2018101314A4/en
Application granted granted Critical
Publication of AU2018101314A4 publication Critical patent/AU2018101314A4/en
Ceased legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/60 Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor
    • A63F13/67 Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor adaptively or by learning from player actions, e.g. skill level adjustment or by storing successful combat sequences for re-use
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Computational Mathematics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Pure & Applied Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Evolutionary Biology (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

This invention is a self-play model that combines a neural network with Monte Carlo tree search (MCTS). The model plays against itself, simulating matches against a human opponent, and improves through this self-play. In the neural network, we adopt a combination of two outputs: a value network that analyses board states and a policy network that selects the position of the next piece. The role of the Monte Carlo tree search in this model is to use Monte Carlo simulation to assess the value of every state in the tree. The model dramatically reduces the time consumed because, when the neural network is combined with the Monte Carlo tree search, the network guides the search toward promising moves, so the model does not need to simulate the entire course of every game to reach a reliable result. After finishing all the steps in the loop, the model restarts the loop from the beginning and initiates the whole process again. As a result, each iteration differs from the previous one, since the model constantly improves itself by repeating the loop.

Description

TITLE
A MCST and deep neural network based FIR battle platform
FIELD OF THE INVENTION
This invention relates to an FIR battle platform that serves both entertainment and educational purposes. It improves the experience of players when they compete with AI robots, and players can learn tips and skills of FIR through these matches.
BACKGROUND OF THE INVENTION
With the rapid development of the Internet, the number of electronic products on the market keeps increasing, and Reinforcement Learning has become a popular research topic in the analysis and prediction fields. In this invention, we use Reinforcement Learning to play FIR (Five in a Row). Traditionally, many game-playing algorithms used exhaustive search to calculate the next move in human-machine play. However, this method becomes impractical when the board is relatively large or when more pieces are required in a line, since the computer needs far more time to calculate. The exhaustive method is therefore not efficient in such cases. Yet if the AI learns how to play the game and acquires the tactics needed to win, it becomes harder to defeat and its calculation time is shortened as well.
Reinforcement Learning is an important machine learning method. It uses a reward signal that lets the system make choices on its own; the system starts with no prior information. The system receives a reward when it completes a certain task, so in the next iteration it may adjust its behaviour to obtain a greater reward than the last time. The system thus becomes smarter and is eventually able to perform the task well, sometimes even better than a human, just like AlphaGo.
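The following minimal Python sketch (an illustration only, not part of the claimed system) shows this reward-driven adjustment: an agent repeatedly chooses an action, observes a reward, and nudges its value estimates so that better actions are chosen more often.

import random

values = {"a": 0.0, "b": 0.0}          # estimated value of each action
true_reward = {"a": 0.2, "b": 0.8}     # hidden reward probabilities (toy example)
alpha, epsilon = 0.1, 0.2              # learning rate, exploration rate

for step in range(1000):
    # explore occasionally, otherwise exploit the current best estimate
    action = random.choice(list(values)) if random.random() < epsilon \
        else max(values, key=values.get)
    reward = 1.0 if random.random() < true_reward[action] else 0.0
    # move the estimate toward the observed reward
    values[action] += alpha * (reward - values[action])

print(values)  # the estimate for "b" should end up higher than for "a"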
In this invention, we plan to use the Monte Carlo tree search and a Convolutional Neural Network to build a Reinforcement Learning system. The Monte Carlo tree search provides a set of rules for exploring the game, and with the tree we can use a computer to run FIR experiments, making the process faster and easier. In addition, we use the Convolutional Neural Network to analyse and evaluate every move, since a CNN can use numerous layers and the gradient descent method to approach accurate values. With these components, the computer is able to simulate a thinking process similar to the human brain.
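As a small illustration of the gradient descent idea mentioned above (purely illustrative, not the network used in this invention), the following snippet steps a single parameter against the gradient of a squared error until it approaches the target value.

def gradient_descent(target=3.0, lr=0.1, steps=100):
    w = 0.0
    for _ in range(steps):
        grad = 2 * (w - target)    # derivative of (w - target)**2
        w -= lr * grad             # move against the gradient
    return w

print(gradient_descent())          # converges to approximately 3.0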
SUMMARY OF THE INVENTION
Our invention is the design and implementation of an artificial intelligence program that plays the simple board game Gomoku based on Monte Carlo tree search and a convolutional neural network. We used no human expert knowledge to guide the AI, which means there is no supervised component in our program.
Self-play is the key to our entire design. The machine plays against itself and learns from self-play. We use the Monte Carlo tree search to simulate self-play, and whenever a new move changes the state of the board, we feed the board state into a CNN. The outputs of the CNN then influence the Monte Carlo tree search. The MCTS produces self-play data that trains the network, and the network produces information that guides the MCTS. These procedures compose the entire loop of our program, sketched below.
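The following Python sketch illustrates this closed loop under simplifying assumptions: the DummyNet class, the flat toy board and the random move sampling are stand-ins for the real policy-value CNN and game logic, and no actual learning takes place. Only the data flow is shown: self-play fills a buffer, the buffer trains the network, and the updated network guides the next games.

import random

class DummyNet:
    """Stand-in for the policy-value CNN (uniform priors, no learning)."""
    def evaluate(self, board):
        moves = [i for i, v in enumerate(board) if v == 0]
        return {m: 1.0 / len(moves) for m in moves}, 0.0   # (move priors, value)
    def train(self, batch):
        pass  # the real network would be fitted on the batch here

def self_play(net, board_size=9):
    """Play one toy game, sampling moves from the net's priors; record (state, move)."""
    board, data, player = [0] * board_size, [], 1
    while 0 in board:
        probs, _value = net.evaluate(board)
        move = random.choices(list(probs), weights=list(probs.values()))[0]
        data.append((tuple(board), move))
        board[move] = player
        player = -player                 # players alternate
    return data

net, buffer = DummyNet(), []
for iteration in range(3):               # outer training loop
    for _ in range(5):                   # several self-play games per iteration
        buffer.extend(self_play(net))
    net.train(buffer)                    # new data updates the net for the next round
print(f"collected {len(buffer)} training samples")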
Monte Carlo tree search is used to decide which position the machine should take on the next move. It provides very powerful search ability and is quite easy to implement. In our program, each node of the tree represents a certain state of the board. First we select a node that is a leaf node or represents the end of a game. If the selected node does not represent the end of a game, we try to expand it, which corresponds to deciding on the next move. When the expansion is done, we update the node and all of its parent nodes. This is a brief description of the Monte Carlo tree search used in our program.
Once the tree is updated, we feed the board state into our CNN. The CNN takes a certain board state as input and outputs the probabilities of every next move together with an evaluation of that board state. The move probabilities are then used in the Monte Carlo tree search to decide which node to select.
Our CNN is composed of three convolutional layers containing 32, 64 and 128 filters of size 3 by 3, respectively. A ReLU activation function follows each convolutional layer, after which the network splits into two output heads: a policy output and a value output. The policy head uses four 1 by 1 filters to reduce the dimension, applies one fully connected layer, and then uses a softmax function to output the probability of every position for the next move. The value head uses two 1 by 1 filters to reduce the dimension, applies one fully connected layer, and then uses a tanh function to output the evaluation of the board state.
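A sketch of this architecture in PyTorch is shown below. The three convolutional layers and the two heads follow the description above; the padding, the fully connected layer sizes, the class name PolicyValueNet and the four-plane 8 by 8 input are assumptions where the text does not fix them.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyValueNet(nn.Module):
    def __init__(self, board_size=8):
        super().__init__()
        # three convolutional layers with 32, 64 and 128 filters of size 3x3
        self.conv1 = nn.Conv2d(4, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        # policy head: four 1x1 filters -> fully connected -> softmax
        self.policy_conv = nn.Conv2d(128, 4, kernel_size=1)
        self.policy_fc = nn.Linear(4 * board_size * board_size,
                                   board_size * board_size)
        # value head: two 1x1 filters -> fully connected -> tanh
        self.value_conv = nn.Conv2d(128, 2, kernel_size=1)
        self.value_fc = nn.Linear(2 * board_size * board_size, 1)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.relu(self.conv3(x))
        p = F.relu(self.policy_conv(x)).flatten(start_dim=1)
        p = F.softmax(self.policy_fc(p), dim=1)        # probability of every position
        v = F.relu(self.value_conv(x)).flatten(start_dim=1)
        v = torch.tanh(self.value_fc(v))               # board evaluation in [-1, 1]
        return p, v

# example: a batch containing one 8x8 board with four feature planes
probs, value = PolicyValueNet()(torch.zeros(1, 4, 8, 8))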
The board size is 8 by 8, and our AI program performed very well when it played against a pure MCTS AI program. We then tried to expand the board size, for example to 10 by 10.
DESCRIPTION OF THE DRAWINGS
The attached drawings serve as explanation and description and consist of:
Fig. 1-5: Flow charts.
DESCRIPTION OF PREFERRED EMBODIMENT
As shown in Figure 1, before the training process begins, the system first initializes parameters, including the size of the chess board, the learning rate, the buffer size, the batch size, the training duration and the checking frequency. After initialization, the system enters a cycle and performs numerous games of self-play in order to collect board-state data. The board states are also rotated and mirrored by the system to generate equivalent states in different orientations; this process avoids redundant computation while enlarging the database, as sketched below.
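The rotation and mirroring step can be sketched as follows; the array shapes and the helper name augment are illustrative only. Each recorded board state and its move-probability map yield eight equivalent training samples.

import numpy as np

def augment(state, probs):
    """state, probs: 2-D arrays of identical shape (board layout / move-probability map)."""
    samples = []
    for k in range(4):                        # 0, 90, 180 and 270 degree rotations
        rot_s, rot_p = np.rot90(state, k), np.rot90(probs, k)
        samples.append((rot_s, rot_p))
        samples.append((np.fliplr(rot_s), np.fliplr(rot_p)))  # mirrored copy
    return samples                            # 8 equivalent training samples

board = np.zeros((8, 8)); board[3, 4] = 1
print(len(augment(board, np.full((8, 8), 1 / 64))))   # -> 8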
The data then passes to the buffer, where the system checks whether the amount of data collected has reached the batch size. If not, the system continues to collect self-play data. If it has, a batch is randomly sampled from the buffer and used to train the policy-value network several times. After training, the network returns the loss and entropy values, which are used to adjust the learning rate accordingly.
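A minimal sketch of this buffer check is given below; the buffer size, the batch size and the train_step callable are placeholders rather than the actual configuration.

import random
from collections import deque

buffer_size, batch_size = 10000, 512
buffer = deque(maxlen=buffer_size)            # bounded replay buffer of self-play data

def maybe_train(train_step):
    if len(buffer) < batch_size:
        return None                           # not enough data: keep collecting self-play
    batch = random.sample(list(buffer), batch_size)   # random selection from the buffer
    return train_step(batch)                  # in the real system this returns loss and entropy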
After the entire process is completed, we compare our model with a standalone Monte Carlo tree search by letting the two play against each other to test the result.
As shown in Figure 2, during each training update the system uses the previous data set to calculate the old move probabilities and winning rate, then performs a training cycle. After the update, the system uses the same data set to calculate the new probabilities and winning rate according to the policy network. The system then uses both the old and new results to calculate the KL divergence, which is used to assess whether the learning rate is appropriate and to adjust it accordingly.
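The following sketch shows one plausible form of this KL-based adjustment. The thresholds and the 1.5 multiplier are assumptions borrowed from common AlphaZero-style implementations, not values taken from this invention.

import numpy as np

def adjust_learning_rate(old_probs, new_probs, lr, lr_multiplier, kl_target=0.02):
    """old_probs, new_probs: (batch, moves) arrays of move probabilities."""
    kl = np.sum(old_probs * (np.log(old_probs + 1e-10)
                             - np.log(new_probs + 1e-10)), axis=1).mean()
    if kl > kl_target * 2 and lr_multiplier > 0.1:
        lr_multiplier /= 1.5            # the update moved the policy too far: slow down
    elif kl < kl_target / 2 and lr_multiplier < 10:
        lr_multiplier *= 1.5            # the update was too timid: speed up
    return lr * lr_multiplier, lr_multiplier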
Figure 3 represents the application of the Monte Carlo tree search. During each tree search, the Monte Carlo tree collects the current state and establishes a new root according to that state. The tree then repeatedly selects the child with the maximum Q plus U value until a leaf is reached. When a leaf is detected, the tree determines whether the game ends at that leaf. If not, the leaf is expanded, and the system recursively updates the visit counts and Q values of its parents using the value calculated by the policy-value network. If the leaf does represent the end of the game, a fixed terminal value is used to update the parents instead. After numerous tree searches, a single Monte Carlo tree is able to complete one round of a game.
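The tree-node behaviour described above can be sketched as follows. The exact form of the exploration term U is not specified in the text; the PUCT-style term used here, and the constant c_puct, are assumptions.

import math

class TreeNode:
    def __init__(self, parent=None, prior=1.0):
        self.parent, self.children = parent, {}     # children: move -> TreeNode
        self.n_visits, self.q, self.prior = 0, 0.0, prior

    def u(self, c_puct=5.0):
        # exploration bonus: large for rarely visited children with high prior
        return c_puct * self.prior * math.sqrt(self.parent.n_visits) / (1 + self.n_visits)

    def select(self):
        # pick the child with the maximum Q + U
        return max(self.children.items(), key=lambda kv: kv[1].q + kv[1].u())

    def expand(self, move_priors):
        # create one child per legal move, with priors from the policy-value net
        for move, p in move_priors:
            self.children.setdefault(move, TreeNode(self, p))

    def backup(self, value):
        # recursively update the visit counts and Q of every parent
        if self.parent:
            self.parent.backup(-value)      # the value flips sign between players
        self.n_visits += 1
        self.q += (value - self.q) / self.n_visits

root = TreeNode()
root.expand([(12, 0.6), (20, 0.4)])          # priors from the policy-value net
move, child = root.select()                   # child with the maximum Q + U
child.backup(1.0)                             # propagate a result back to the root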
Figure 4 shows the structure of the Convolutional Neural Network, which takes board states of size 8 x 8 x 4 as input. The CNN is composed of three convolutional layers, each with a 3 by 3 kernel and 32, 64 and 128 filters respectively, to extract features. The network then applies separate convolutional layers to produce the move probabilities and the winning rate, which allows the two outputs to be trained separately. Next, the two branches are flattened and fed into fully connected layers to obtain the new probabilities and winning rate. Each time the CNN completes this pass, it calculates the loss value and then updates its parameters.
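The loss calculation mentioned above is not spelled out in the text; the sketch below uses the common AlphaZero-style combination of a value mean-squared error and a policy cross-entropy, which should be read as an assumption rather than the exact loss of this invention.

import torch
import torch.nn.functional as F

def policy_value_loss(pred_probs, pred_value, mcts_probs, winner):
    """pred_probs: (N, 64), pred_value: (N, 1), mcts_probs: (N, 64), winner: (N,)"""
    value_loss = F.mse_loss(pred_value.view(-1), winner)          # winning-rate error
    policy_loss = -torch.mean(torch.sum(mcts_probs * torch.log(pred_probs + 1e-10), dim=1))
    entropy = -torch.mean(torch.sum(pred_probs * torch.log(pred_probs + 1e-10), dim=1))
    return value_loss + policy_loss, entropy   # entropy is reported for monitoring only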
THE FUNCTION OF THE PLATFORM
Figure 5 shows the interactive platform in our project, which is intended for both entertainment and learning. Users can select a difficulty level to optimize their experience. The platform also includes a hint system: when users are unsure of the next move, they can enable the hint system, which automatically recommends moves, indicated by pieces of different sizes and colours that show the level of recommendation. The platform also provides undo and redo functions, all dedicated to improving the users' learning and entertainment experience.
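A hypothetical sketch of the undo/redo behaviour is shown below: moves are kept on two stacks so that a player can step backwards and forwards through the game. This is an illustration, not the platform's actual implementation.

class MoveHistory:
    def __init__(self):
        self.undo_stack, self.redo_stack = [], []

    def play(self, move):
        self.undo_stack.append(move)
        self.redo_stack.clear()          # a fresh move invalidates the redo history

    def undo(self):
        if self.undo_stack:
            self.redo_stack.append(self.undo_stack.pop())

    def redo(self):
        if self.redo_stack:
            self.undo_stack.append(self.redo_stack.pop())

history = MoveHistory()
history.play((3, 4)); history.play((4, 4))
history.undo(); history.redo()
print(history.undo_stack)                # [(3, 4), (4, 4)]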
COMPARISON WITH ANOTHER MODEL: pure MCTS vs. neural network + MCTS
Number of game rounds: 20
Basic information of the pure MCTS player: 2000 Monte Carlo searches per move.
Basic information of the neural network + MCTS player: 800 Monte Carlo searches per move; this player's network had already been trained for almost 2000 iterations.
Per-game timing results (total time, number of moves, and the average time per move of each player):
average cost time: mcts player: 2.115824 s, pure mcts player: 13.560057 s
cost time: 277.254147 s, play_count: 33; average cost time: mcts player: 2.254340 s, pure mcts player: 14.932645 s
cost time: 77.882141 s, play_count: 10; average cost time: mcts player: 2.219113 s, pure mcts player: 13.356912 s
cost time: 60.968000 s, play_count: 9; average cost time: mcts player: 1.869358 s, pure mcts player: 12.905060 s
cost time: 442.511613 s, play_count: 64; average cost time: mcts player: 2.049253 s, pure mcts player: 11.778762 s
cost time: 65.496403 s, play_count: 9; average cost time: mcts player: 2.312566 s, pure mcts player: 13.483017 s
cost time: 81.285615 s, play_count: 10; average cost time: mcts player: 2.275838 s, pure mcts player: 13.980887 s
cost time: 64.253853 s, play_count: 9; average cost time: mcts player: 2.118642 s, pure mcts player: 13.414910 s
cost time: 113.065399 s, play_count: 14; average cost time: mcts player: 2.529724 s, pure mcts player: 13.622332 s
cost time: 65.616047 s, play_count: 9; average cost time: mcts player: 2.083739 s, pure mcts player: 13.799088 s
cost time: 269.548855 s, play_count: 34; average cost time: mcts player: 2.127301 s, pure mcts player: 13.728455 s
cost time: 67.489317 s, play_count: 9; average cost time: mcts player: 2.317580 s, pure mcts player: 13.975354 s
cost time: 85.224424 s, play_count: 10; average cost time: mcts player: 2.110235 s, pure mcts player: 14.934650 s
cost time: 66.184902 s, play_count: 9; average cost time: mcts player: 2.591482 s, pure mcts player: 13.306873 s
cost time: 166.066459 s, play_count: 20; average cost time: mcts player: 2.527148 s, pure mcts player: 14.079400 s
cost time: 65.603459 s, play_count: 9; average cost time: mcts player: 2.111826 s, pure mcts player: 13.760837 s
cost time: 282.713064 s, play_count: 33; average cost time: mcts player: 2.258475 s, pure mcts player: 14.504319 s
cost time: 279.071302 s, play_count: 32; average cost time: mcts player: 2.286256 s, pure mcts player: 15.155575 s
cost time: 146.410816 s, play_count: 18; average cost time: mcts player: 2.509988 s, pure mcts player: 13.757546 s
Total number of rounds: 20
Pure MCTS player: Win: 2, Lose: 17, Tie: 1
Neural network + MCTS player: Win: 17, Lose: 2, Tie: 1
According to the game results collected, we can conclude that the MCTS-plus-neural-network player prevailed in most matches. Combining the neural network with the MCTS therefore produces a much more intelligent AI. Besides the winning ratio, the average computation time of the player we generated is remarkably shorter than that of the standalone MCTS player. The reason is that the neural network collaborating with the MCTS is significantly more efficient than a standalone MCTS, which requires a complete simulation of the remaining game from every position. In contrast, our combined model enables the AI to compute precisely while also reducing the time consumed.

Claims (3)

  CLAIMS
  1. A method for human-computer battle, especially suited for children so that they can improve their intelligence from a very early age, wherein:
    1) human players can adjust the AI player's difficulty to suit different kinds of players; there are three difficulty levels to choose from: easy, middle and hard;
  2. 2) if a human player puts a piece on a wrong position, the player has the chance to undo the last step and select another position; if the player decides not to undo the move after all, clicking "redo" restores the position;
  3. 3) when a human player does not know how to make the next move, grey dots of different sizes appear; the player can choose these positions so as to have a much higher winning rate.
AU2018101314A 2018-09-07 2018-09-07 A MCST and deep neural network based FIR battle platform Ceased AU2018101314A4 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2018101314A AU2018101314A4 (en) 2018-09-07 2018-09-07 A MCST and deep neural network based FIR battle platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2018101314A AU2018101314A4 (en) 2018-09-07 2018-09-07 A MCST and deep neural network based FIR battle platform

Publications (1)

Publication Number Publication Date
AU2018101314A4 true AU2018101314A4 (en) 2018-10-11

Family

ID=63719890

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2018101314A Ceased AU2018101314A4 (en) 2018-09-07 2018-09-07 A MCST and deep neural network based FIR battle platform

Country Status (1)

Country Link
AU (1) AU2018101314A4 (en)


Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry