CN111667043B - Chess game playing method, system, terminal and storage medium - Google Patents

Chess game playing method, system, terminal and storage medium

Info

Publication number
CN111667043B
CN111667043B (application CN202010429905.6A)
Authority
CN
China
Prior art keywords
neural network
chess
network
playing
chess game
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010429905.6A
Other languages
Chinese (zh)
Other versions
CN111667043A (en)
Inventor
戚骁亚
张校志
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ji Hua Laboratory
Original Assignee
Ji Hua Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ji Hua Laboratory filed Critical Ji Hua Laboratory
Priority to CN202010429905.6A priority Critical patent/CN111667043B/en
Publication of CN111667043A publication Critical patent/CN111667043A/en
Application granted granted Critical
Publication of CN111667043B publication Critical patent/CN111667043B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F3/00 Board games; Raffle games
    • A63F3/02 Chess; Similar board games
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the application relate to a chess game playing method, system, terminal, and storage medium. The method comprises the following steps: optimizing an ant colony algorithm using a neural network, and performing self-play of the chess game by combining the optimized ant colony algorithm with a tree search algorithm; iteratively training the neural network on the self-play data, and testing the chess strength of the neural network obtained from the iterative training; and, when the neural network meets a preset condition for stopping training, using the neural network whose chess strength test win rate reaches a preset value, obtained from the most recent training or an earlier training, as the optimal neural network for actually playing the chess game. The embodiments of the application improve the search capability and chess strength of the ant colony algorithm and widen its application to more chess games.

Description

Chess game playing method, system, terminal and storage medium
Technical Field
The embodiments of the application belong to the technical field of artificial intelligence, and in particular relate to a chess game playing method, system, terminal, and storage medium.
Background
In chess games, the strength of an agent is determined by the capability of its search algorithm. Commonly used search algorithms include the ant colony algorithm and tree search algorithms. The ant colony algorithm is an optimization algorithm that simulates the foraging behavior of ants, first proposed by the Italian scholars Dorigo, Maniezzo, et al. in the 1990s. While studying ant foraging, they found that an ant colony can always find the shortest route to a food source under different circumstances. On further study, they concluded that this is because ants release a substance, which may be called a "pheromone", along the path they travel. Ants can perceive the pheromone: while walking along paths with a high pheromone concentration, they leave more pheromone behind to help subsequent ants find food, so that a positive feedback mechanism is formed and the ant colony exhibits collective intelligent behavior. Algorithms inspired by ant colony foraging are traditionally referred to as ant colony algorithms. The algorithm has the characteristics of distributed computation, positive information feedback, and heuristic search, and is essentially a heuristic global optimization method within the family of evolutionary algorithms. However, because of these very characteristics, when searching in a chess game the ant colony algorithm easily falls into locally optimal solutions and takes too long to converge to the globally optimal solution, so it cannot achieve good chess strength.
Among chess-related algorithms, tree search algorithms such as Monte Carlo tree search perform extremely well. Compared with the ant colony algorithm, Monte Carlo tree search has a better search effect, but in its basic form it cannot find the best action within a limited time even in some chess games that are not very complex. This is because the search space is too large and the key nodes cannot be given reasonable estimates with a sufficient number of visits. In addition, owing to its tree-based nature, the method lacks the characteristics of the ant colony algorithm, such as a network-like search capability, and it cannot be parallelized as quickly as the ant colony algorithm, where parallelism is obtained simply by changing the number of ants.
Recently, with the development of deep learning and reinforcement learning, the well-known AlphaZero algorithm has appeared in board games, described in the patent application Training action selection neural networks using look-ahead search (Application number: PCT/EP2018/063869, Publication number: WO2018/215665) and the journal paper A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play (DOI: 10.1126/science.aar6404), and its chess strength far exceeds that of humans. AlphaZero combines Monte Carlo tree search with a deep convolutional neural network, so that search results can be used as label data to train the neural network, and the trained neural network in turn guides the next Monte Carlo tree search, achieving excellent performance in chess games such as Go and chess.
The core of the AlphaZero algorithm is to use self-play to generate game data and then enhance the capability of the original Monte Carlo tree search algorithm through a neural network. Therefore, in chess games, combining a neural network with a tree search algorithm is the key to obtaining strong chess strength. However, tree search algorithms are better suited to board games with tree-like decision structures and cannot be used in board games outside this pattern.
Disclosure of Invention
The application provides a chess game playing method, system, terminal, and storage medium, which are used to solve technical problems of the ant colony algorithm in chess games in the prior art, such as slow convergence, weak chess strength, and low algorithmic efficiency.
In order to solve the problems, the application provides the following technical scheme:
a method of playing a chess game comprising the steps of:
step a: optimizing an ant colony algorithm by utilizing a neural network, and performing self-playing of the chess game by combining the optimized ant colony algorithm and a tree search algorithm; wherein the self-playing comprises: using the chessboard state of the real chess game as a root node, expanding nodes in tree search by using the optimized ant colony algorithm, simulating the real chess game, selecting actions according to the simulated search probability, and executing the actions on the real chess game to obtain self-playing data of the real chess game;
step b: performing iterative training on the neural network according to the self-playing data, and performing chess strength testing on the neural network obtained by the iterative training;
step c: when the neural network meets a preset condition for stopping training, the neural network whose chess strength test win rate reaches a preset value, obtained from the most recent training or an earlier training, is used as the optimal neural network for actually playing the chess game.
The technical scheme adopted by the embodiment of the application further comprises the following steps: in the step a, the neural network is a convolutional neural network comprising three convolutional layers and three fully connected layers; the output of the three convolutional layers is split into two heads, a policy network head and a value network head; the policy network head is connected to a first fully connected layer, while the value network head uses a filter for dimensionality reduction, is connected to a second fully connected layer, and is finally output to a third fully connected layer.
The technical scheme adopted by the embodiment of the application further comprises the following steps: in the step a, the optimizing the ant colony algorithm by using the neural network specifically includes:
converting the win/lose information of the chess game into ant colony pheromone information;
initializing the pheromone of a node only when the node is newly expanded during the search;
automatically predicting tag data using the neural network;
the tag data comprises a priori probabilities given by the policy network and state values given by the value network.
The technical scheme adopted by the embodiment of the application further comprises the following steps: the automatically predicting tag data using the neural network includes:
calculating the action probability of ants by using the ant colony pheromone and the policy network:

$$p_k(s_t, a_t) = \frac{\tau(s_t, a_t)\,[\eta_{net}(s_t, a_t)]^{\beta}}{\sum_{u \in J_k(s_t)} \tau(s_t, u)\,[\eta_{net}(s_t, u)]^{\beta}}$$

in the above formula, t represents time, a_t is the action of an ant at time t, k is the serial number of the ant, τ is the pheromone, s_t is the state at time t, η_net(s_t, a_t) is the prior probability given by the policy network, and β is a parameter that adjusts the weight of the prior probability; J_k(s_t) is the set of all actions available to ant k in state s_t;

accumulating the pheromone on the traversed nodes after the ants complete their searches:

$$\tau(s_t, a_t) \leftarrow (1-\alpha)\,\tau(s_t, a_t) + \sum_{k=1}^{m} \Delta\tau_k(s_t, a_t), \qquad \Delta\tau_k(s_t, a_t) = Q \cdot V_k$$

in the above formula, α is the evaporation parameter of the pheromone, V_k is the result obtained after the k-th ant finishes its search, m is the number of ants, and Q is the hyperparameter of the pheromone weight; after each ant completes its search, the value network is trained according to the final win/lose result of the game;

updating the global pheromone of all nodes on a search path by using the value network:

$$\tau(s_t, a_t) \leftarrow (1-\alpha)\,\tau(s_t, a_t) + \alpha\, Q\, V_{net}(s_{t+1})$$
the technical scheme adopted by the embodiment of the application further comprises the following steps: in the step a, the expanding nodes in the tree search by using the optimized ant colony algorithm includes:
taking the root node as a current node;
cloning the current chess game, and initializing a search path;
expanding the current node by utilizing the neural network, and selecting actions according to the state transition probability distribution of the ant colony algorithm;
executing the action on the cloned chess game, and returning the chessboard state of the cloned chess game at the next moment to serve as a new current node;
adding the new current node on the search path and re-expanding the new current node by using the neural network;
and after all the nodes are expanded, backtracking the search path and globally updating the pheromone.
The technical scheme adopted by the embodiment of the application further comprises the following steps: in the step b, the self-playing data comprises a chessboard state, search probability and a chess game winning or losing result of each step; the iterative training of the neural network according to the self-playing data comprises the following steps:
initializing a neural network and an optimizer;
sampling sample data from the self-playing data;
forward propagating each sample data by using the neural network, and accumulating a loss function;
adding a weight decay to the loss function;
back-propagating through the neural network to compute the gradients;
and updating the weight of the neural network by using the optimizer.
The technical scheme adopted by the embodiment of the application further comprises the following steps: in the step b, the iterative training of the neural network according to the self-playing data further includes:
judging whether the neural network in the iterative training process meets the preset condition for storing the neural network, if so, storing the neural network obtained by the latest training, and performing chess strength test on the neural network; otherwise, continuing to carry out iterative training on the neural network;
the preset condition for storing the neural network is as follows:
the current node of the neural network is a preset checkpoint, or the number of iterative training rounds of the neural network reaches a preset interval.
The technical scheme adopted by the embodiment of the application further comprises the following steps: in the step c, wherein the step c, using the neural network with the chess force test winning rate obtained by the last or last training reaching a preset value as an optimal neural network, comprises the following steps:
judging whether the neural network meets the preset condition for stopping training, and if not, continuing the iterative training of the neural network; if it is satisfied,
judging whether the chess strength test win rate of the neural network obtained from the most recent training reaches the preset value, and if so, taking the neural network obtained from the most recent training as the optimal neural network; otherwise, selecting the neural network from an earlier training whose chess strength test win rate reached the preset value as the optimal neural network.
The technical scheme adopted by the embodiment of the application further comprises the following steps: the preset condition for stopping training comprises:
the number of iterative training rounds of the neural network reaches a preset number of iterations;
or the neural network wins all games when the chess strength test reaches the preset number of games.
The embodiment of the application adopts another technical scheme that: a chess game playing system comprising:
the ant colony algorithm optimization module: the method is used for optimizing the ant colony algorithm by utilizing the neural network, and performing self-playing of the chess game by combining the optimized ant colony algorithm and the tree search algorithm; wherein the self-playing comprises: using the chessboard state of the real chess game as a root node, expanding nodes in tree search by using the optimized ant colony algorithm, simulating the real chess game, selecting actions according to the simulated search probability, and executing the actions on the real chess game to obtain self-playing data of the real chess game;
and the network training module: used for carrying out iterative training on the neural network according to the self-playing data;
the chess force testing module: the method is used for performing chess strength test on the neural network obtained through iterative training;
and an optimal network selection module: used for, when the neural network meets the preset condition for stopping training, taking the neural network whose chess strength test win rate reaches a preset value, obtained from the most recent training or an earlier training, as the optimal neural network for actually playing the chess game.
The embodiment of the application adopts the following technical scheme: a terminal comprising a processor, a memory coupled to the processor, wherein,
the memory stores program instructions for implementing the chess game playing method;
the processor is used for executing the program instructions stored in the memory to control the chess game to play.
The embodiment of the application adopts the following technical scheme: a storage medium storing program instructions executable by a processor for performing the chess game playing method.
Compared with the prior art, the embodiments of the application have the following beneficial effects: the chess game playing method, system, terminal, and storage medium of the embodiments optimize the traditional ant colony algorithm through a neural network, and perform self-play by combining the relationships among the value network, the policy network, and the ant colony pheromone with a tree search algorithm, thereby improving the search capability and chess strength of the ant colony algorithm while retaining the net-like parallel search capability of the traditional ant colony algorithm. The embodiments of the application keep the advantages of the ant colony algorithm while achieving chess strength capable of beating tree search algorithms, and widen the application of the ant colony algorithm to more chess games.
Drawings
FIG. 1 is a flow chart of a method of playing a chess game according to a first embodiment of the present application;
FIG. 2 is a flow chart of a method of playing a chess game according to a second embodiment of the present application;
FIG. 3 is a flow chart of a neural network self-playing algorithm according to an embodiment of the present application;
FIG. 4 is a process diagram of expanding nodes in a tree search using an optimized ant colony algorithm in accordance with an embodiment of the present application;
FIG. 5 is a schematic diagram of a neural network iterative training algorithm according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a chess game playing system according to an embodiment of the present application;
fig. 7 is a schematic diagram of a terminal structure according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a storage medium according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
To address the defects in the prior art, the chess game playing method of the embodiments of the application is based on the traditional ant colony algorithm and organically combines the ant colony pheromone, the policy network, and the value network: the search width is reduced by the policy network trained by the neural network and the search depth is reduced by the value network, thereby improving the search efficiency of the ant colony algorithm while obtaining the net-like parallel search capability of the traditional ant colony algorithm. In traditional tree-search-based chess games it can beat the Monte Carlo tree search algorithm under the same number of searches, and it is also suitable for chess games to which the Monte Carlo tree search algorithm cannot be directly applied.
Specifically, referring to fig. 1, a flowchart of a chess game playing method according to a first embodiment of the present application is shown. The chess game playing method of the first embodiment of the application comprises the following steps:
step 100: optimizing an ant colony algorithm by utilizing a neural network, and carrying out self-playing of the chess game by combining the optimized ant colony algorithm and a tree search algorithm;
in step 100, the optimization of the ant colony algorithm includes:
1. converting the win-lose information of the chess game into ant colony pheromone information;
2. initializing the pheromone only when the node is searched and expanded;
3. the label data is automatically predicted using a neural network.
The self-playing of the chess game comprises: taking the chessboard state of the real chess game as a root node, expanding nodes in the tree search by using the optimized ant colony algorithm, simulating the real chess game, selecting actions according to the simulated search probabilities, and executing the actions on the real chess game to obtain the self-playing data of the real chess game, where the self-playing data comprises the chessboard state, the search probabilities, and the win/lose result of each step.
Step 110: performing iterative training on the neural network according to the self-playing data, and performing chess strength testing on the neural network obtained by the iterative training;
In step 110, the chess strength test specifically includes: when the current node of the neural network is a preset checkpoint, or the number of iterative training rounds of the neural network reaches a preset interval, the neural network obtained from training is stored and a chess strength test is carried out. The chess strength test is performed as follows: the optimized ant colony algorithm plays against a Monte Carlo tree search algorithm using 1000 to 10000 searches (ten games, alternating who moves first) and the win rate is recorded; whenever the win rate of the optimized ant colony algorithm reaches 100%, the number of searches of the Monte Carlo tree search algorithm is increased by 1000, and play stops when the number of searches of the Monte Carlo tree search algorithm reaches 10000; the result of this play is the chess strength test result of the neural network of the embodiment of the application. It will be appreciated that the number of searches of the Monte Carlo tree search algorithm and the increment added each time may be set according to the actual application during the chess strength test.
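For illustration only, the test loop described above can be sketched as follows; the helpers play_match (which plays ten games between the two players, alternating who moves first, and returns the ant colony player's win rate) and the two player factory functions are hypothetical names assumed for the sketch, not part of the embodiment.

```python
def chess_strength_test(net, make_aco_player, make_mcts_player,
                        start=1000, step=1000, stop=10000):
    aco_player = make_aco_player(net)
    results = {}
    searches = start
    while searches <= stop:
        win_rate = play_match(aco_player, make_mcts_player(searches), games=10)
        results[searches] = win_rate
        if win_rate < 1.0:      # the search budget is only raised after a 100% win rate
            break
        searches += step        # increase the MCTS search count by 1000 each time
    return results              # win rate against MCTS at each search budget
```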
Step 120: when the neural network meets the preset condition for stopping training, the neural network whose chess strength test win rate reaches the preset value, obtained from the most recent training or an earlier training, is used as the optimal neural network for actually playing the chess game.
Referring to fig. 2, a flow chart of a chess game playing method according to a second embodiment of the present application is shown. The chess game playing method of the second embodiment of the application comprises the following steps:
step 200: initializing neural network parameters;
In step 200, the neural network adopted in the embodiment of the application is a conventional convolutional neural network comprising three convolutional layers and three fully connected layers, where the three convolutional layers contain 32, 64, and 128 convolution kernels of size 3×3 respectively and are activated with the ReLU function.
Specifically, the output of the three convolutional layers is split into two heads, policy (policy network) and value (value network). At the policy head, 4 convolution kernels of size 1×1 are used for dimensionality reduction, the first fully connected layer is connected, and the move probability of each position on the board is output using a softmax function; at the value head, 2 filters of size 1×1 are used for dimensionality reduction, a second fully connected layer containing 64 neurons is connected, the result is finally output to a third fully connected layer, and a position score in [-1, 1] is output using a tanh nonlinearity. By adopting a neural network with this structure, the embodiment of the application ensures high computational efficiency for both training and prediction in the chess game.
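A minimal sketch of this policy/value network is given below, assuming PyTorch; the layer sizes follow the text above (32/64/128 kernels of size 3×3, and 4 and 2 kernels of size 1×1 for the two heads), while the padding and the 3×3 Tic-Tac-Toe board size are assumptions made only for the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyValueNet(nn.Module):
    """Sketch of the three-convolution, two-head network described above."""
    def __init__(self, board_size=3, in_planes=4):
        super().__init__()
        # three convolutional layers with 32, 64 and 128 3x3 kernels, ReLU activated
        self.conv1 = nn.Conv2d(in_planes, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        # policy head: 4 1x1 kernels for dimensionality reduction, then a fully
        # connected layer producing one probability per board position
        self.policy_conv = nn.Conv2d(128, 4, kernel_size=1)
        self.policy_fc = nn.Linear(4 * board_size * board_size, board_size * board_size)
        # value head: 2 1x1 filters, a 64-neuron fully connected layer, then a
        # scalar position score squashed to [-1, 1] with tanh
        self.value_conv = nn.Conv2d(128, 2, kernel_size=1)
        self.value_fc1 = nn.Linear(2 * board_size * board_size, 64)
        self.value_fc2 = nn.Linear(64, 1)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.relu(self.conv3(x))
        p = F.relu(self.policy_conv(x)).flatten(1)
        log_p = F.log_softmax(self.policy_fc(p), dim=1)   # move probabilities (log)
        v = F.relu(self.value_conv(x)).flatten(1)
        v = F.relu(self.value_fc1(v))
        v = torch.tanh(self.value_fc2(v))                 # position score in [-1, 1]
        return log_p, v
```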
In order to describe more clearly how the neural network is applied in the embodiment of the application, a 3×3 board and the game of Tic-Tac-Toe are taken as an example. Four 3×3 binary feature planes are used: the first two planes represent the piece positions of the current player and of the opposing player respectively, where a position holding a piece is set to 1 and an empty position is set to 0; the third plane represents the opponent's last move, i.e. only one position of the whole plane is 1 and the rest are 0; the fourth plane indicates whether the current player moves first, and is all 1s if the current player is the first player and all 0s otherwise.
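As an illustration, the four feature planes can be built as in the following sketch; the board representation (0 empty, 1 current player, -1 opponent) and the function name are assumptions made for the example.

```python
from typing import Optional, Tuple
import numpy as np

def encode_state(board: np.ndarray, last_move: Optional[Tuple[int, int]],
                 current_is_first: bool) -> np.ndarray:
    """Return the 4 x 3 x 3 binary feature planes described above."""
    planes = np.zeros((4, 3, 3), dtype=np.float32)
    planes[0] = (board == 1)          # current player's pieces
    planes[1] = (board == -1)         # opposing player's pieces
    if last_move is not None:
        planes[2][last_move] = 1.0    # opponent's last move position
    if current_is_first:
        planes[3][:] = 1.0            # current player moves first
    return planes

# Example: empty board, no previous move, current player moves first.
print(encode_state(np.zeros((3, 3)), None, True).shape)  # (4, 3, 3)
```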
Step 210: optimizing an ant colony algorithm by utilizing a neural network algorithm, carrying out self-playing on the chess game by combining the optimized ant colony algorithm and a tree search algorithm, and storing generated self-playing data in a data pool;
in step 210, in order to make the ant colony algorithm more suitable for searching chess games, the embodiment of the application optimizes the traditional ant colony algorithm by combining a neural network algorithm, and the specific optimization mode comprises:
1. converting the win/lose information of the chess game into ant colony pheromone information; specifically: in a chess game, 0 (draw), -1 (loss), and 1 (win) are usually used to represent the outcome, while the pheromone in the ant colony algorithm must be greater than or equal to zero. Therefore, in order to apply the ant colony algorithm to chess games, the win/lose information must be converted into ant colony pheromone information, establishing the correlation between game outcomes and pheromone. Preferably, the conversion rule defined in the embodiment of the application is: the outcomes are converted to 0.5 (draw), 0 (loss), and 1 (win), i.e. the win/lose information -1, 0, 1 is mapped to the non-negative pheromone values 0, 0.5, 1; the specific values can be set according to the actual application.
2. Initializing the pheromone only when a node is expanded during the search; specifically: each node needs to be given an initial pheromone amount when it is expanded, but because the search space is too large it is impractical to initialize the pheromone of all nodes at the beginning. Therefore, the embodiment of the application continuously expands nodes during the search and initializes the pheromone of each node only when the node is newly expanded.
3. Automatically predicting tag data using a neural network; the tag data comprises the prior probability given by the policy network and the state value given by the value network, and the prediction of the tag data is specifically as follows:
(1) Combining the ant colony pheromone and the policy network to fit the optimal policy, the action probability function of the ants is obtained, which guides the search of the ant colony; the action probability function is calculated as:

$$p_k(s_t, a_t) = \frac{\tau(s_t, a_t)\,[\eta_{net}(s_t, a_t)]^{\beta}}{\sum_{u \in J_k(s_t)} \tau(s_t, u)\,[\eta_{net}(s_t, u)]^{\beta}} \qquad (1)$$

In formula (1), t represents time, a_t is the action of an ant at time t, k is the serial number of the ant, τ is the pheromone, s_t is the state at time t, η_net(s_t, a_t) is the prior probability given by the policy network, and β is a parameter that adjusts the weight of the prior probability; J_k(s_t) is the set of all actions available to ant k in state s_t, and u ranges over J_k(s_t). For ant k at time t, the chosen action obeys the distribution p_k(s_t, a_t), i.e. the probability of taking action a_t in state s_t.

After the ants complete their searches, the pheromone on the traversed nodes is accumulated:

$$\tau(s_t, a_t) \leftarrow (1-\alpha)\,\tau(s_t, a_t) + \sum_{k=1}^{m} \Delta\tau_k(s_t, a_t), \qquad \Delta\tau_k(s_t, a_t) = Q \cdot V_k \qquad (2)$$

In formula (2), α is the evaporation parameter of the pheromone, V_k is the result obtained after the k-th ant finishes its search, m is the number of ants, and Q is the hyperparameter of the pheromone weight. After each ant completes its search, the value network V_net is trained according to the final win/lose result of the game.

(2) Predicting the state value of the current state using the value network, and updating the global pheromone of all nodes on the search path according to the predicted state value:

$$\tau(s_t, a_t) \leftarrow (1-\alpha)\,\tau(s_t, a_t) + \alpha\, Q\, V_{net}(s_{t+1})$$
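The following sketch, given only as an illustration, expresses the pheromone conversion of item 1 and the formulas reconstructed above in code; the dictionary-based storage of pheromone and prior values per action is an assumption of the example, not a limitation of the embodiment.

```python
import numpy as np

def pheromone_from_result(z):
    """Map the game outcome {-1: loss, 0: draw, 1: win} to pheromone {0, 0.5, 1}."""
    return {-1: 0.0, 0: 0.5, 1: 1.0}[z]

def action_probabilities(tau, prior, actions, beta=1.0):
    """Formula (1): p_k(s_t, a) proportional to tau(s_t, a) * eta_net(s_t, a)**beta."""
    weights = np.array([tau[a] * prior[a] ** beta for a in actions])
    return weights / weights.sum()

def global_pheromone_update(path, v_net_value, alpha=0.1, q=1.0):
    """Value-network-based global update of every node on the search path; `path`
    is a list of (tau_dict, action) pairs and v_net_value the predicted state value."""
    for tau, a in path:
        tau[a] = (1 - alpha) * tau[a] + alpha * q * v_net_value
```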
further, the neural network for predicting tag data includes two parts, namely self-playing and training, please refer to fig. 3 together, which is a flowchart of the neural network self-playing algorithm according to an embodiment of the present application. The method specifically comprises the following steps:
Step 211: acquiring the latest neural network θ from the storage area;
Step 212: initializing a real chess game G′ and a time step number t′ ← 0;
step 213: taking the chessboard state of the real chess game G' as a root node, and circularly executing the following steps until the chess game is finished:
Step 2131: expanding the current node using the neural network, (p, v) = f_θ(S_0), and adding Dirichlet noise to the prior probabilities of the current node, P(S_0, a) = (1-ε)·p_a + ε·η_a, where the proportion of added noise is 0.9;
step 2132: expanding nodes in tree search by using the optimized ant colony algorithm, and simulating a real chess game G' for a plurality of times; the simulation times of the chess game can be set according to practical application;
Step 2133: selecting an action a′_t′ ~ π_t′ according to the search probabilities obtained from the multiple simulations;
Step 2134: executing the action a′_t′ on the real game G′ and returning the chessboard state S′_(t′+1) of the real game at the next moment;
Step 214: after the game is finished, saving the self-playing data of the real game G′ in the replay buffer, including the chessboard state, the search probability, and the win/lose result of each step, (S′_t′, π_t′, Z_t′).
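A compact sketch of this self-play loop follows; the Game object and the aco_search routine (which runs the optimized ant colony search and returns the simulated search probabilities π) are hypothetical helpers assumed for the example.

```python
import numpy as np

def self_play_one_game(net, game, n_simulations=400, temperature=1.0):
    """Play one game against itself and return (state, pi, z) training samples."""
    replay = []
    while not game.is_over():
        state = game.encoded_state()                    # e.g. the 4x3x3 planes above
        pi = aco_search(net, game, n_simulations)       # simulated search probabilities
        probs = pi ** (1.0 / temperature)
        probs /= probs.sum()
        action = np.random.choice(len(probs), p=probs)  # select an action from pi
        replay.append((state, pi, game.current_player()))
        game.play(action)                               # execute it on the real game
    z = game.result()                                   # +1 / 0 / -1 for the first player
    # attach the final outcome to every step from the viewpoint of the player to move
    return [(s, p, z if player == game.first_player() else -z)
            for s, p, player in replay]
```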
Referring to fig. 4, a process diagram of expanding nodes in tree search by using an optimized ant colony algorithm according to an embodiment of the present application specifically includes:
the first step: taking the root node as a current node;
and a second step of: cloning the current chess game, and initializing a search path;
and a third step of: judging whether the current node is expanded, if so, executing a fourth step, otherwise, executing a fifth step;
fourth step: selecting actions according to the state transition probability distribution of the ant colony algorithm;
fifth step: performing actions on the cloned chess game, and returning to the chessboard state of the cloned chess game at the next moment to serve as a current node;
sixth step: adding a current node on the search path;
seventh step: expanding the newly added current node by utilizing a neural network;
eighth step: backtracking the search path, and accumulating pheromone on the traversed path according to the win/lose result of the game obtained by the simulation.
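As a sketch only, one ant's walk through the first to eighth steps above might look as follows; the Node and Game helpers, net.predict, and the pheromone_from_result conversion are assumptions carried over from the earlier examples, and the descent is simplified to continue through newly expanded nodes.

```python
import numpy as np

def ant_walk(root, game, net, alpha=0.1, beta=1.0, q=1.0):
    sim = game.clone()                        # second step: clone the current game
    path, node = [], root                     # initialize the search path
    while not sim.is_over():
        if not node.expanded:                 # expand unexpanded nodes with the network
            priors, _ = net.predict(sim.encoded_state())
            node.expand(sim.legal_actions(), priors)   # pheromone initialized here
        # state-transition distribution of the ant colony algorithm:
        # p(a) proportional to tau(s, a) * prior(s, a) ** beta
        actions = list(node.children)
        weights = np.array([node.tau[a] * node.prior[a] ** beta for a in actions])
        action = actions[np.random.choice(len(actions), p=weights / weights.sum())]
        sim.play(action)                      # fifth step: execute on the cloned game
        path.append((node, action))           # sixth step: add to the search path
        node = node.children[action]
    v = pheromone_from_result(sim.result())   # win/lose mapped to {1, 0.5, 0}
    for node, action in path:                 # eighth step: backtrack and deposit
        node.tau[action] = (1 - alpha) * node.tau[action] + q * v
    return v
```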
Step 220: randomly selecting a part of data from the data pool as sample data, performing iterative training on the neural network through the sample data, and updating the parameters of the neural network;
in step 220, the iterative training algorithm of the neural network is shown in fig. 5, and specifically includes the following steps:
step 221: initializing a neural network and an optimizer;
step 222: sampling m sample data in batch in a playback buffer;
step 223: forward propagation is carried out on each sample data by using a neural network, and a loss function is accumulated;
step 224: for each weight of the neural network, adding a weight decay to the loss function;
step 225: back-propagating through the neural network to compute the gradients;
step 226: updating the weight of the neural network by using an optimizer;
step 227: judging whether the current neural network meets the preset conditions for storing the neural network, and executing step 228 if the current neural network meets the preset conditions; otherwise, step 222 is re-executed;
in this step, meeting the preset conditions for storing the neural network includes:
1. the current node is a preset check point; checkpoints are nodes which are arranged at intervals and are used for storing a neural network in the training process and testing chess strength of the neural network along with the increase of training times. The interval distance for setting checkpoints can be set according to practical application, and the embodiment of the application is preferably set to 50, namely, one checkpoint is set every 50 nodes.
2. The training times of the neural network reach the preset interval times; namely, when the training times of the neural network reach the preset interval times, the neural network is stored once and subjected to chess strength testing operation. Preferably, the preset interval number is set to 50, which can be specifically set according to practical applications.
Step 228: storing the current neural network in the shared storage area, and performing chess strength test on the current neural network;
In this step, the chess strength test of the neural network is specifically as follows: the optimized ant colony algorithm plays against a Monte Carlo tree search algorithm using 1000 to 10000 searches (ten games, alternating who moves first) and the win rate is recorded; whenever the win rate of the optimized ant colony algorithm reaches 100%, the number of searches of the Monte Carlo tree search algorithm is increased by a first set amount (preferably 1000), and play stops when the number of searches of the Monte Carlo tree search algorithm reaches a second set amount (preferably 10000); the play result is the chess strength test result of the neural network. It will be appreciated that the number of searches of the Monte Carlo tree search algorithm and the increment added each time may be set according to the actual application during the chess strength test. In another embodiment of the application, the number of games between the ant colony algorithm and the Monte Carlo tree search algorithm may also be set, and the play stops when the number of games reaches the set number.
It can be understood that the embodiment of the application stores the neural network and tests the chess strength by setting check points or preset interval times in the training process of the neural network, and can be used for evaluating whether the chess strength of the neural network is improved along with the training process.
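For illustration, the core update loop of steps 221 to 226 can be sketched as follows, assuming PyTorch and the PolicyValueNet sketched earlier; the optimizer, batch size, and loss weighting are assumptions of the example rather than values fixed by the embodiment.

```python
import random
import torch
import torch.nn.functional as F

def train_step(net, optimizer, replay_buffer, batch_size=64, weight_decay=1e-4):
    batch = random.sample(replay_buffer, batch_size)          # step 222: sample data
    states = torch.stack([torch.as_tensor(s, dtype=torch.float32) for s, _, _ in batch])
    target_pi = torch.stack([torch.as_tensor(p, dtype=torch.float32) for _, p, _ in batch])
    target_z = torch.tensor([[z] for _, _, z in batch], dtype=torch.float32)

    log_p, v = net(states)                                    # step 223: forward pass
    policy_loss = -(target_pi * log_p).sum(dim=1).mean()      # cross-entropy with search probs
    value_loss = F.mse_loss(v, target_z)
    l2 = sum((w ** 2).sum() for w in net.parameters())        # step 224: weight decay term
    loss = policy_loss + value_loss + weight_decay * l2

    optimizer.zero_grad()
    loss.backward()                                           # step 225: back-propagate
    optimizer.step()                                          # step 226: update the weights
    return loss.item()
```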
Step 230: judging whether the neural network meets the preset condition of stopping training, if so, executing step 240, otherwise, continuing to execute step 220;
In this step, meeting the preset condition for stopping training includes: the number of iterative training rounds reaches the preset number of iterations, or the neural network of the embodiment of the application wins all games when the number of searches of the Monte Carlo tree search algorithm in the chess strength test reaches 10000 (or the number of games reaches the preset number).
Step 240: judging whether the chess force test winning rate of the neural network obtained by the last training reaches a preset value, if so, executing step 250, otherwise, executing step 260;
in step 240, the preset value is preferably set to 100%, which may be specifically set according to practical applications.
Step 250: taking the neural network obtained by the last training as an optimal neural network;
step 260: selecting a neural network with the chess force test winning rate reaching a preset value, which is obtained by the last training, as an optimal neural network;
In step 260, taking the preset value of 100% as an example, if the win rate of the most recently trained neural network in the chess strength test fails to reach 100%, this indicates that the neural network has over-fitted during training; therefore the most recent neural network whose chess strength test win rate reached 100% is selected as the optimal neural network.
Step 270: performing actual chess playing by using the optimal neural network;
In step 270, during actual play of the chess game, the current position is fed into the neural network as input, and the policy probability distribution over the next actions for the current position is obtained from the network. Unlike self-play, where the most probable action is not necessarily chosen in order to increase exploration, here the most probable action is selected after adding a small amount of noise, and this operation is repeated until the game ends. In practical applications, the chess strength of the agent in more complex chess games can be improved by increasing the number of ants in the search; compared with the AlphaZero algorithm, the embodiment of the application parallelizes more simply and is suitable for chess games requiring net-like search.
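A minimal sketch of this move-selection procedure is given below; the torch_state helper (which encodes the current position as a batch of feature planes for the network), the noise type, and the noise scale are illustrative assumptions.

```python
import numpy as np

def select_move(net, game, noise_scale=0.05, rng=np.random.default_rng()):
    log_p, _ = net(torch_state(game))              # policy distribution for the position
    probs = np.exp(log_p.detach().numpy().ravel())
    legal = np.asarray(game.legal_actions())
    noisy = np.full_like(probs, -np.inf)
    noisy[legal] = probs[legal] + noise_scale * rng.random(len(legal))
    return int(np.argmax(noisy))                   # most probable action after noise

def play_game(net, game):
    while not game.is_over():
        game.play(select_move(net, game))          # repeat until the game ends
    return game.result()
```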
It will be appreciated that, since the ant colony algorithm has many variants (elite ant, max-min ant colony, etc.) and the neural network likewise has many variants (e.g. changes to the hierarchical structure or to the number of neurons), an algorithm that modifies only part of the ant colony algorithm or of the neural network, yet still contains the association among the ant colony pheromone, the policy network, and the value network of the embodiments of the application and is used as the search algorithm of a chess-playing agent, should be regarded as having characteristics similar to the embodiments of the application. In addition, including certain algorithm hyperparameters, or using different distribution functions as noise during self-play, should also be regarded as features consistent with the embodiments of the application.
Referring to fig. 6, a schematic diagram of a chess game playing system according to an embodiment of the present application is shown. The chess game playing system of the embodiment of the application comprises:
the ant colony algorithm optimization module: used for optimizing the ant colony algorithm by utilizing the neural network, and performing self-playing of the chess game by combining the optimized ant colony algorithm and the tree search algorithm; wherein the self-playing of the chess game comprises: taking the chessboard state of the real chess game as a root node, expanding nodes in the tree search by using the optimized ant colony algorithm, simulating the real chess game, selecting actions according to the simulated search probabilities, and executing the actions on the real chess game to obtain the self-playing data of the real chess game, where the self-playing data comprises the chessboard state, the search probabilities, and the win/lose result of each step;
and the network training module: the method is used for carrying out iterative training on the neural network according to the self-playing data;
the chess force testing module: the method is used for performing chess force test on the neural network obtained through iterative training;
and an optimal network selection module: used for, when the neural network meets the preset condition for stopping training, taking the neural network whose chess strength test win rate reaches the preset value, obtained from the most recent training or an earlier training, as the optimal neural network for actually playing the chess game.
Fig. 7 is a schematic diagram of a terminal structure according to an embodiment of the application. The terminal 50 includes a processor 51, a memory 52 coupled to the processor 51.
The memory 52 stores program instructions for implementing the above-described chess game playing method.
The processor 51 is configured to execute program instructions stored in the memory 52 to control playing of the chess game.
The processor 51 may also be referred to as a CPU (Central Processing Unit ). The processor 51 may be an integrated circuit chip with signal processing capabilities. Processor 51 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
Fig. 8 is a schematic structural diagram of a storage medium according to an embodiment of the application. The storage medium of the embodiment of the present application stores a program file 61 capable of implementing all the methods described above, where the program file 61 may be stored in the storage medium in the form of a software product, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor (processor) to execute all or part of the steps of the methods of the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, an optical disk, or other various media capable of storing program codes, or a terminal device such as a computer, a server, a mobile phone, a tablet, or the like.
According to the chess game playing method, system, terminal, and storage medium of the embodiments of the application, the traditional ant colony algorithm is optimized through a neural network, and self-play is performed by combining the relationships among the value network, the policy network, and the ant colony pheromone with a tree search algorithm, thereby improving the search capability and chess strength of the ant colony algorithm while retaining the net-like parallel search capability of the traditional ant colony algorithm. The embodiments of the application keep the advantages of the ant colony algorithm while achieving chess strength capable of beating tree search algorithms, and widen the application of the ant colony algorithm to more chess games.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A method of playing a chess game, comprising the steps of:
step a: optimizing an ant colony algorithm by utilizing a neural network, and performing self-playing of the chess game by combining the optimized ant colony algorithm and a tree search algorithm; wherein the self-playing comprises: using the chessboard state of the real chess game as a root node, expanding nodes in tree search by using the optimized ant colony algorithm, simulating the real chess game, selecting actions according to the simulated search probability, and executing the actions on the real chess game to obtain self-playing data of the real chess game;
step b: performing iterative training on the neural network according to the self-playing data, and performing chess strength testing on the neural network obtained by the iterative training;
step c: when the neural network meets the preset condition for stopping training, taking the neural network whose chess strength test win rate reaches a preset value, obtained from the most recent training or an earlier training, as the optimal neural network for actually playing the chess game;
the neural network is a convolutional neural network comprising three convolutional layers and three fully connected layers; the output of the three convolutional layers is split into two heads, a policy network head and a value network head; the policy network head is connected to a first fully connected layer, while the value network head uses a filter for dimensionality reduction, is connected to a second fully connected layer, and is finally output to a third fully connected layer; in the step a, the optimizing the ant colony algorithm by using the neural network specifically includes:
converting the win/lose information of the chess game into ant colony pheromone information;
initializing the pheromone of a node only when the node is newly expanded;
automatically predicting tag data using the neural network,
the tag data comprises prior probability given by the strategy network and state value given by the value network;
the automatically predicting tag data using the neural network includes:
calculating the action probability of ants by using the ant colony pheromone and the policy network:

$$p_k(s_t, a_t) = \frac{\tau(s_t, a_t)\,[\eta_{net}(s_t, a_t)]^{\beta}}{\sum_{u \in J_k(s_t)} \tau(s_t, u)\,[\eta_{net}(s_t, u)]^{\beta}}$$

in the above formula, t represents time, a_t is the action of an ant at time t, k is the serial number of the ant, τ is the pheromone, s_t is the state at time t, η_net(s_t, a_t) is the prior probability given by the policy network, and β is a parameter that adjusts the weight of the prior probability; J_k(s_t) is the set of all actions available to ant k in state s_t;

accumulating the pheromone on the traversed nodes after the ants complete their searches:

$$\tau(s_t, a_t) \leftarrow (1-\alpha)\,\tau(s_t, a_t) + \sum_{k=1}^{m} \Delta\tau_k(s_t, a_t), \qquad \Delta\tau_k(s_t, a_t) = Q \cdot V_k$$

in the above formula, α is the evaporation parameter of the pheromone, V_k is the result obtained after the k-th ant finishes its search, m is the number of ants, and Q is the hyperparameter of the pheromone weight; after each ant completes its search, the value network is trained according to the final win/lose result of the game;

updating the global pheromone of all nodes on a search path by using the value network:

$$\tau(s_t, a_t) \leftarrow (1-\alpha)\,\tau(s_t, a_t) + \alpha\, Q\, V_{net}(s_{t+1})$$
2. The chess game playing method as claimed in claim 1, wherein in said step a, said expanding nodes in a tree search using said optimized ant colony algorithm comprises:
taking the root node as a current node;
cloning the current chess game, and initializing a search path;
expanding the current node by utilizing the neural network, and selecting actions according to the state transition probability distribution of the ant colony algorithm;
executing the action on the cloned chess game, and returning the chessboard state of the cloned chess game at the next moment to serve as a new current node;
adding the new current node on the search path and re-expanding the new current node by using the neural network;
and after all the nodes are expanded, backtracking the search path and globally updating the pheromone.
3. The method of playing a chess game as recited in claim 1, wherein in said step b, said self-playing data includes a board status, a search probability and a game win/lose result for each step; the iterative training of the neural network according to the self-playing data comprises the following steps:
initializing a neural network and an optimizer;
sampling sample data from the self-playing data;
forward propagating each sample data by using the neural network, and accumulating a loss function;
adding a weight decay to the loss function;
back-propagating through the neural network to compute the gradients;
and updating the weight of the neural network by using the optimizer.
4. A chess game playing method as claimed in claim 3, wherein in said step b, said iterative training of said neural network in accordance with said self-playing data further comprises:
judging whether the neural network in the iterative training process meets the preset condition for storing the neural network, if so, storing the neural network obtained by the latest training, and performing chess strength test on the neural network; otherwise, continuing to carry out iterative training on the neural network;
the preset condition for storing the neural network is as follows:
the current node of the neural network is a preset checkpoint, or the number of iterative training rounds of the neural network reaches a preset interval.
5. The chess game playing method according to claim 4, wherein in said step c, using the neural network whose chess strength test win rate, obtained from the most recent or an earlier training, reaches a preset value as the optimal neural network comprises:
judging whether the neural network meets the preset condition for stopping training, and if not, continuing the iterative training of the neural network; if it is satisfied,
judging whether the chess strength test win rate of the neural network obtained from the most recent training reaches the preset value, and if so, taking the neural network obtained from the most recent training as the optimal neural network; otherwise, selecting the neural network from an earlier training whose chess strength test win rate reached the preset value as the optimal neural network.
6. The chess game playing method as claimed in claim 5, wherein meeting the preset condition for stopping training comprises:
the number of iterative training rounds of the neural network reaches a preset number of iterations;
or the neural network wins all games when the chess strength test reaches the preset number of test games.
7. A chess game playing system, comprising:
the ant colony algorithm optimization module: used for optimizing the ant colony algorithm by utilizing the neural network, and performing self-playing of the chess game by combining the optimized ant colony algorithm and the tree search algorithm; wherein the self-playing comprises: using the chessboard state of the real chess game as a root node, expanding nodes in the tree search by using the optimized ant colony algorithm, simulating the real chess game, selecting actions according to the simulated search probability, and executing the actions on the real chess game to obtain the self-playing data of the real chess game;
and the network training module: used for carrying out iterative training on the neural network according to the self-playing data;
the chess force testing module: the method is used for performing chess strength test on the neural network obtained through iterative training;
and an optimal network selection module: used for, when the neural network meets the preset condition for stopping training, taking the neural network whose chess strength test win rate reaches the preset value, obtained from the most recent training or an earlier training, as the optimal neural network for actually playing the chess game;
the neural network is a convolutional neural network comprising three convolutional layers and three fully connected layers; the output of the three convolutional layers is split into two heads, a policy network head and a value network head; the policy network head is connected to a first fully connected layer, while the value network head uses a filter for dimensionality reduction, is connected to a second fully connected layer, and is finally output to a third fully connected layer;
the optimizing the ant colony algorithm by using the neural network specifically comprises the following steps:
converting the win/lose information of the chess game into ant colony pheromone information;
initializing the pheromone of a node only when the node is newly expanded;
automatically predicting tag data using the neural network,
the tag data comprises prior probability given by the strategy network and state value given by the value network;
the automatically predicting tag data using the neural network includes:
calculating the action probability of ants by using the ant colony pheromone and the policy network:

$$p_k(s_t, a_t) = \frac{\tau(s_t, a_t)\,[\eta_{net}(s_t, a_t)]^{\beta}}{\sum_{u \in J_k(s_t)} \tau(s_t, u)\,[\eta_{net}(s_t, u)]^{\beta}}$$

in the above formula, t represents time, a_t is the action of an ant at time t, k is the serial number of the ant, τ is the pheromone, s_t is the state at time t, η_net(s_t, a_t) is the prior probability given by the policy network, and β is a parameter that adjusts the weight of the prior probability; J_k(s_t) is the set of all actions available to ant k in state s_t;

accumulating the pheromone on the traversed nodes after the ants complete their searches:

$$\tau(s_t, a_t) \leftarrow (1-\alpha)\,\tau(s_t, a_t) + \sum_{k=1}^{m} \Delta\tau_k(s_t, a_t), \qquad \Delta\tau_k(s_t, a_t) = Q \cdot V_k$$

in the above formula, α is the evaporation parameter of the pheromone, V_k is the result obtained after the k-th ant finishes its search, m is the number of ants, and Q is the hyperparameter of the pheromone weight; after each ant completes its search, the value network is trained according to the final win/lose result of the game;

updating the global pheromone of all nodes on a search path by using the value network:

$$\tau(s_t, a_t) \leftarrow (1-\alpha)\,\tau(s_t, a_t) + \alpha\, Q\, V_{net}(s_{t+1})$$
8. a terminal comprising a processor, a memory coupled to the processor, wherein,
the memory storing program instructions for implementing the chess game playing method of any one of claims 1-7;
the processor is used for executing the program instructions stored in the memory to control the chess game to play.
9. A storage medium storing program instructions executable by a processor for performing the chess playing method of any one of claims 1 to 7.
CN202010429905.6A 2020-05-20 2020-05-20 Chess game playing method, system, terminal and storage medium Active CN111667043B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010429905.6A CN111667043B (en) 2020-05-20 2020-05-20 Chess game playing method, system, terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010429905.6A CN111667043B (en) 2020-05-20 2020-05-20 Chess game playing method, system, terminal and storage medium

Publications (2)

Publication Number Publication Date
CN111667043A CN111667043A (en) 2020-09-15
CN111667043B true CN111667043B (en) 2023-09-19

Family

ID=72384055

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010429905.6A Active CN111667043B (en) 2020-05-20 2020-05-20 Chess game playing method, system, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN111667043B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112704882B (en) * 2021-01-14 2021-09-14 广州云从鼎望科技有限公司 Method, system, medium, and apparatus for model-based chess and card game strategy update
CN113426094A (en) * 2021-06-30 2021-09-24 北京市商汤科技开发有限公司 Chess force adjusting method, device, equipment and storage medium
CN113935618B (en) * 2021-10-12 2022-08-05 网易有道信息技术(江苏)有限公司 Evaluation method and device for chess playing capability, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101539968A (en) * 2009-04-27 2009-09-23 姚斯宇 Go human-computer chess-playing system
CN109078323A (en) * 2018-08-27 2018-12-25 深圳前海益链网络科技有限公司 A kind of game process data production system
CN110083748A (en) * 2019-04-30 2019-08-02 南京邮电大学 A kind of searching method based on adaptive Dynamic Programming and the search of Monte Carlo tree
CN110119804A (en) * 2019-05-07 2019-08-13 安徽大学 A kind of Ai Ensitan chess game playing algorithm based on intensified learning
CN110717591A (en) * 2019-09-28 2020-01-21 复旦大学 Falling strategy and layout evaluation method suitable for various chess
CN110852436A (en) * 2019-10-18 2020-02-28 桂林力港网络科技股份有限公司 Data processing method, device and storage medium for electronic poker game

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101539968A (en) * 2009-04-27 2009-09-23 姚斯宇 Go human-computer chess-playing system
CN109078323A (en) * 2018-08-27 2018-12-25 深圳前海益链网络科技有限公司 A kind of game process data production system
CN110083748A (en) * 2019-04-30 2019-08-02 南京邮电大学 A kind of searching method based on adaptive Dynamic Programming and the search of Monte Carlo tree
CN110119804A (en) * 2019-05-07 2019-08-13 安徽大学 A kind of Ai Ensitan chess game playing algorithm based on intensified learning
CN110717591A (en) * 2019-09-28 2020-01-21 复旦大学 Falling strategy and layout evaluation method suitable for various chess
CN110852436A (en) * 2019-10-18 2020-02-28 桂林力港网络科技股份有限公司 Data processing method, device and storage medium for electronic poker game

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Mastering the game of Go without human knowledge; David Silver et al.; NATURE; 2017-10-19; pp. 354-370 *

Also Published As

Publication number Publication date
CN111667043A (en) 2020-09-15

Similar Documents

Publication Publication Date Title
CN111667043B (en) Chess game playing method, system, terminal and storage medium
Holcomb et al. Overview on deepmind and its alphago zero ai
Świechowski et al. Improving hearthstone ai by combining mcts and supervised learning algorithms
Janusz et al. Helping ai to play hearthstone: Aaia'17 data mining challenge
Justesen et al. Playing multiaction adversarial games: Online evolutionary planning versus tree search
Nogueira et al. End-to-end goal-driven web navigation
Zhang et al. AlphaZero
Wang et al. Warm-start AlphaZero self-play search enhancements
Teófilo et al. Computing card probabilities in Texas Hold'em
Zhou et al. Discovering of game AIs’ characters using a neural network based AI imitator for AI clustering
Companez et al. Can Monte-Carlo Tree Search learn to sacrifice?
CN112836805B (en) KRFPV algorithm, execution device, electronic device, storage medium, and neural network
Liu et al. Towards understanding chinese checkers with heuristics, monte carlo tree search, and deep reinforcement learning
Liu et al. An improved minimax-Q algorithm based on generalized policy iteration to solve a Chaser-Invader game
Khamesian et al. Hybrid self-attention NEAT: a novel evolutionary self-attention approach to improve the NEAT algorithm in high dimensional inputs
Lee et al. A novel ontology for computer Go knowledge management
Galván et al. Statistical tree-based population seeding for rolling horizon eas in general video game playing
Kodama et al. Distributed deep reinforcement learning method using profit sharing for learning acceleration
Deng et al. Neural‐augmented two‐stage Monte Carlo tree search with over‐sampling for protein folding in HP Model
Askren Survey of Deep Neural Networks Handling Plan Development using Simulations of Real-World Environments
Babin et al. Leveraging Reinforcement Learning and WaveFunctionCollapse for Improved Procedural Level Generation
CN111178541B (en) Game artificial intelligence system and performance improving system and method thereof
He A Review of the Application of Artificial Intelligence in Imperfect Information Games Represented by DouDiZhu
Heng Tree Search Algorithms For Chinese Chess
CN117828958B (en) Truss structure optimization design method and device based on cricket fight optimization algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant