CN116881656A - Reinforcement learning military chess AI system based on deep Monte Carlo - Google Patents

Reinforcement learning military chess AI system based on deep Monte Carlo

Info

Publication number
CN116881656A
Authority
CN
China
Prior art keywords
chess
current
module
player
military
Prior art date
Legal status
Granted
Application number
CN202310825710.7A
Other languages
Chinese (zh)
Other versions
CN116881656B (en)
Inventor
林文斌
吕航
王玮
杨雪晴
Current Assignee
University of South China
Original Assignee
University of South China
Priority date
Filing date
Publication date
Application filed by University of South China
Priority to CN202310825710.7A
Publication of CN116881656A
Application granted
Publication of CN116881656B
Current legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention discloses a reinforcement learning military chess (Junqi) AI system based on deep Monte Carlo, belonging to the technical field of artificial intelligence game AI. The invention provides a deep-Monte-Carlo-based reinforcement learning military chess AI system, together with a matching training method and an actual-combat execution method. By using the deep Monte Carlo method and a piece upper- and lower-bound evaluation algorithm, the playing strength of the military chess AI is greatly improved, the application prospect is good, and a gap of military chess AI in the artificial intelligence field is filled. The invention addresses the difficulty of designing and training an AI for the imperfect-information game of military chess, and improves the performance of the military chess AI when playing against humans. It is of great significance for the training and study of military chess enthusiasts, and provides both human-machine play and military chess AI self-play training functions.

Description

Reinforcement learning military chess AI system based on deep Monte Carlo
Technical Field
The invention relates to the technical field of artificial intelligence game AI, in particular to a reinforcement learning military chess AI system based on deep Monte Carlo.
Background
In the field of artificial intelligence, combining deep learning with reinforcement learning has produced outstanding game-playing AI such as AlphaGo. Typical board games such as Go and chess are perfect-information games, in which both players can observe the entire board. Military chess, however, is an imperfect-information game: during play, a player cannot see the identity of the opponent's pieces, which makes designing a military chess AI very challenging.
No satisfactory solution to this problem currently exists at home or abroad. A deep-Monte-Carlo-based reinforcement learning military chess AI system and its training method are therefore designed. By using the deep Monte Carlo method and a piece upper- and lower-bound evaluation algorithm, the playing strength of the military chess AI is greatly improved; the system has practical significance and good application prospects, and fills a gap of military chess AI in the artificial intelligence field.
Disclosure of Invention
The purpose of the invention is to provide a deep-Monte-Carlo-based reinforcement learning military chess AI system that improves the performance of the military chess AI when playing against humans and provides a convenient and interesting artificial-intelligence opponent for military chess enthusiasts.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a deep monte carlo based reinforcement learning military chess AI system, the system comprising: the system comprises a military chess fight module, a military chess law generation module, a military chess characteristic acquisition module, a military chess law decision module and a decision evaluation module;
the army chess fight module is used for displaying the situation of both sides of the chess, executing and interacting decisions of both sides, and judging the fight result between the chesses;
the military chess method generating module is used for searching the current chess situation, giving out all feasible methods of the current player and sending the current player into the military chess characteristic acquisition module;
the army chess characteristic acquisition module acquires current own chess piece information, enemy chess piece information and latest two-step recruitment methods from the army chess fight module, acquires all feasible recruitment methods from the army chess recruitment method generation module, converts the data into a proper coding format as a state value and inputs the state value into the army chess recruitment method decision module;
the army chess method decision-making module is divided into a training stage and an actual fight stage;
wherein the training phase comprises: generating evaluation values of all feasible recruitment methods of the current player by adopting a deep Monte Carlo network decision technique according to the input state values, namely evaluating all the feasible recruitment methods of the player under the current situation, then selecting the recruitment method with the maximum evaluation value, continuously choosing until the game is ended to win or lose, training a decision network according to the final feedback information of a decision evaluation module, and optimizing parameters of the decision network;
the actual combat stage comprises the following steps: the fully trained decision model does not update network parameters any more, and an optimal recruitment method is selected by adopting a deep Monte Carlo decision technology according to the input state value;
the decision evaluation module evaluates the scores of all decisions in the whole game according to the final game result and generates feedback information to the military chess law-solicitation decision module.
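As an illustration only, the following Python sketch shows one way the five modules described above could be expressed as interfaces; every class and method name here (CombatModule, legal_moves, and so on) is a hypothetical choice of this sketch and is not specified by the patent.

```python
# Hypothetical interface sketch of the five-module architecture; names are illustrative.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class GameState:
    own_pieces: list            # own-side piece records (types known)
    enemy_pieces: list          # enemy piece records (types unknown)
    recent_moves: list = field(default_factory=list)   # most recent moves of both players
    finished: bool = False
    winner: Optional[int] = None

class CombatModule:             # 101: holds the board, executes moves, adjudicates engagements
    def initial_state(self) -> GameState: ...
    def execute(self, state: GameState, move) -> GameState: ...

class MoveGenerationModule:     # 102: enumerates all feasible moves for the current player
    def legal_moves(self, state: GameState) -> list: ...

class FeatureAcquisitionModule: # 103: encodes the visible information into a state value
    def encode(self, state: GameState, moves: list): ...

class MoveDecisionModule:       # 104: deep Monte Carlo network scoring every candidate move
    def choose(self, features, moves): ...

class DecisionEvaluationModule: # 105: scores a finished game and feeds back a training signal
    def feedback(self, trajectory, result) -> None: ...
```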
Preferably, the specific execution flow of the system comprises the following steps:
S1, the military chess combat module stores the current board information in real time, and executes and adjudicates the decisions of the two combatants;
S2, the military chess move generation module analyses and matches all possible moves of the current player according to the current board information; the structure of a move comprises: the coordinates of the own-side piece, the type of the own-side piece, the target coordinates, and the type of the target coordinates (a move encoding sketch is given after this list);
S3, the military chess feature acquisition module extracts all information visible from the current player's viewpoint according to the current board information, including own-side piece information, enemy piece information, the number of rounds in which no engagement has occurred, all moves of the current player, and the moves of both players over the most recent twenty rounds; this information is used as the input of the military chess move decision module;
S4, the military chess move decision module evaluates each move under the current position through the deep Monte Carlo network according to the input state value and selects a move; the military chess combat module executes the move, switches to the other player's viewpoint, and the flow returns to step S2, until the military chess combat module judges that the game has ended;
S5, the decision evaluation module optimizes the deep Monte Carlo network in the military chess move decision module by computing the mean square error (MSE) from the collected game data, the final result, and the state values recorded during the game.
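The move structure named in step S2 can be pictured with the following minimal sketch; the piece-type and square-type names are common Junqi terms used here as assumptions, and "type of the target coordinates" is read as the target square type, which the patent does not spell out.

```python
# Hypothetical encoding of a single move: (own piece coordinates, own piece type,
# target coordinates, type of the target coordinates).
from dataclasses import dataclass

PIECE_TYPES = ("flag", "landmine", "bomb", "engineer", "platoon", "company",
               "battalion", "regiment", "brigade", "division", "corps", "commander")
SQUARE_TYPES = ("normal", "camp", "railway", "headquarters")

@dataclass(frozen=True)
class Move:
    src: tuple          # coordinates of the moving own-side piece, e.g. (row, col)
    piece_type: str     # type of the moving piece, one of PIECE_TYPES
    dst: tuple          # target coordinates
    dst_type: str       # type of the target square, one of SQUARE_TYPES

# Example: an engineer moving from (0, 4) onto a railway square at (2, 4).
move = Move(src=(0, 4), piece_type="engineer", dst=(2, 4), dst_type="railway")
```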
Preferably, the piece information in S3 specifically includes: the current coordinates, piece type, survival state, whether the piece is on a railway, and whether the piece is in a camp, for every piece; according to the survival state, pieces are divided into surviving pieces and dead pieces; for dead pieces, the current coordinates, survival state, on-railway and in-camp features are all set to 0, distinguishing them from surviving pieces. Because the type of an enemy piece is unknown, it is represented by a probability distribution: the initial probabilities are set according to the initial coordinates of the pieces and the opening-setup rules of military chess, and during the game the probabilities are updated according to engagement results and movement behavior. A sketch of this encoding is given below.
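The following is a minimal sketch of the per-piece encoding just described, assuming twelve piece types and a simple probability update; the exact dimensions and the update rule are assumptions of this sketch, not taken from the patent.

```python
# Illustrative per-piece feature encoding; dead pieces are all zeros, and enemy piece types
# are a probability vector that is updated from engagement results.
import numpy as np

N_TYPES = 12  # assumed number of military chess piece types

def encode_own_piece(x, y, type_idx, alive, on_railway, in_camp):
    type_vec = np.zeros(N_TYPES)
    if not alive:
        # For dead pieces the coordinate, survival, railway and camp features are all 0.
        return np.concatenate(([0, 0, 0, 0, 0], type_vec))
    type_vec[type_idx] = 1.0   # own piece: type is known, encoded one-hot
    return np.concatenate(([x, y, 1, float(on_railway), float(in_camp)], type_vec))

def encode_enemy_piece(x, y, alive, on_railway, in_camp, type_probs):
    if not alive:
        return np.zeros(5 + N_TYPES)
    # Enemy piece: type unknown, represented by a probability vector over N_TYPES.
    return np.concatenate(([x, y, 1, float(on_railway), float(in_camp)], type_probs))

def update_after_capture(type_probs, captured_rank):
    """Assumed update rule: an enemy piece that captured one of ours and survived must
    outrank the captured piece, so lower-or-equal ranks are zeroed and renormalised."""
    p = np.asarray(type_probs, dtype=float).copy()
    p[:captured_rank + 1] = 0.0
    total = p.sum()
    return p / total if total > 0 else p
```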
Preferably, the deep Monte Carlo network in S4 is a value network that estimates the value of actions and states; the evaluation network generates the value function of a move decision. The network structure comprises an LSTM network that receives the features of the most recent twenty moves; the output of the LSTM is concatenated with the action state value and the board state value and fed into fully connected layers, producing a number of evaluation values equal to the number of candidate moves. The information fed into the fully connected layers specifically comprises own-side piece information, enemy piece information, the feature vector produced by the LSTM from the most recent twenty moves, and the currently feasible moves. A minimal network sketch follows.
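A minimal PyTorch sketch of the evaluation network structure just described, under the assumption of illustrative layer sizes (the patent gives no concrete dimensions): an LSTM summarises the most recent twenty moves, its output is concatenated with the board-state and move encodings, and fully connected layers produce one evaluation value per candidate move.

```python
import torch
import torch.nn as nn

class MoveEvaluationNet(nn.Module):
    def __init__(self, move_feat_dim=60, state_dim=512, action_dim=60, hidden=512):
        super().__init__()
        self.lstm = nn.LSTM(input_size=move_feat_dim, hidden_size=128, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(128 + state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),            # one evaluation value per (state, move) pair
        )

    def forward(self, move_history, board_state, candidate_moves):
        # move_history:    (batch, 20, move_feat_dim)   features of the last twenty moves
        # board_state:     (batch, state_dim)           own-side and enemy piece features
        # candidate_moves: (batch, n_moves, action_dim) encodings of every feasible move
        _, (h_n, _) = self.lstm(move_history)            # h_n: (1, batch, 128)
        history = h_n.squeeze(0)                         # (batch, 128)
        n_moves = candidate_moves.size(1)
        ctx = torch.cat([history, board_state], dim=-1)  # (batch, 128 + state_dim)
        ctx = ctx.unsqueeze(1).expand(-1, n_moves, -1)   # repeat the context for each move
        x = torch.cat([ctx, candidate_moves], dim=-1)
        return self.mlp(x).squeeze(-1)                   # (batch, n_moves) evaluation values
```

With this shape convention, `net(history, state, moves)` returns one score per feasible move, and the move with the maximum score is chosen, matching the argmax selection described for the training stage.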
Preferably, the training method of the training stage specifically comprises the following contents:
A1, establishing experience pools B_1 and B_2 for the two players; each experience pool is a buffer that stores the input features F of every round of the two players' games, its capacity is manually set to S, and when the stored data reaches S, training is performed, the experience pool is emptied, and storage restarts;
A2, establishing decision networks Q_1 and Q_2 for the two players; each decision network reads the input feature F and outputs an evaluation value, and the move action a_t in round t is selected as follows:
a_t = argmax_a Q(s_t, a), with probability p = (1 - ε)
a_t = random(s_t, a), with probability p = ε
the reward value of each round is determined by the final result:
r_t ← r_t + γ·r_{t+1}
where s_t is the board state from the current player's viewpoint in round t; r_t is the reward obtained by the current player's action in round t; argmax_a means selecting, from the evaluation values computed by the decision network Q over the board state s_t and the action set a of the current round, the action with the maximum evaluation value as the current-round action a_t; random means randomly selecting an action from the current-round action set a as the current-round action a_t; ε is the exploration probability of the current round; p is the probability, derived from the exploration probability, with which each selection rule is applied; γ is a decay factor, meaning that the current-round reward is determined jointly by the current-round reward and the decayed next-round reward;
A3, learning from the experience pool a total of T times: after each game ends, the round features accumulated during the game are put into the experience pool B, and one learning pass is performed whenever the experience pool is full, until T learning passes have finally been completed;
A4, in each learning pass, the feature F_t in the experience pool B is first fed into the network Q to obtain an evaluation value G_t; the loss function is the mean square error between G_t and the recorded reward r_t, and the Adam algorithm is used to mitigate problems of learning-rate vanishing, slow convergence, and abnormal parameter updates in training the network Q.
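The following sketch illustrates the A1-A4 procedure under the assumptions above: ε-greedy move selection, backing the final result up with decay factor γ, and fitting the network with a mean-squared-error loss optimised by Adam (e.g. torch.optim.Adam(q_net.parameters())). All names and hyperparameters are illustrative, not taken from the patent.

```python
import random
import torch
import torch.nn as nn

def select_move(q_net, features, n_candidates, epsilon):
    """A2: with probability 1 - epsilon take the argmax of the evaluation values,
    with probability epsilon pick a random feasible move."""
    if random.random() < epsilon:
        return random.randrange(n_candidates)
    with torch.no_grad():
        values = q_net(*features)          # assumed to return one value per candidate move
    return int(values.argmax().item())

def backed_up_rewards(rewards, gamma=0.99):
    """r_t <- r_t + gamma * r_{t+1}, computed backwards from the final game result."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return list(reversed(returns))

def train_on_pool(q_net, pool, optimizer):
    """A4: fit the evaluation G_t of each stored (feature, chosen move) pair to the
    backed-up reward r_t with a mean-squared-error loss; Adam performs the update."""
    loss_fn = nn.MSELoss()
    for features, r_t in pool:             # pool B holds (features of chosen move, reward)
        g_t = q_net(*features).squeeze()   # assumed scalar evaluation for the chosen move
        loss = loss_fn(g_t, torch.tensor(float(r_t)))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```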
Preferably, the actual-combat stage refers to human-machine play, where the "human" is a human military chess player and the "machine" is the military chess AI; the specific execution flow comprises the following steps:
B1, the human military chess player chooses to move first or second on the start interface and arranges his or her own military chess layout;
B2, the first-moving player plays; the military chess feature acquisition module extracts all information visible from the current player's viewpoint according to the current board information, specifically including own-side piece information, enemy piece information, the number of rounds in which no engagement has occurred, all moves of the current player, and the moves of both players over the most recent twenty rounds; the military chess move generation module checks whether the current player's move violates the game rules, and if so, the operation cannot be executed;
B3, the military chess combat module executes the first-moving player's move, adjudicates the engagement result according to the rules, and updates the board information of both sides;
B4, the second-moving player plays; the military chess feature acquisition module extracts all information visible from the current player's viewpoint according to the current board information, specifically including own-side piece information, enemy piece information, the number of rounds in which no engagement has occurred, all moves of the current player, and the moves of both players over the most recent twenty rounds; the military chess move generation module checks whether the current player's move violates the game rules, and if so, the operation cannot be executed;
B5, the military chess combat module executes the second-moving player's move, adjudicates the engagement result according to the rules, and updates the board information of both sides;
B6, if the game has not ended, the flow returns to step B2; otherwise, the winner is declared.
Preferably, the military chess AI also has the function of deciding whether to move first or second; if the military chess AI is the first-moving player, the first-moving player's move in B2 is evaluated and determined by the deep Monte Carlo network of the military chess AI; if the military chess AI is the second-moving player, the second-moving player's move in B4 is evaluated and determined by the deep Monte Carlo network of the military chess AI.
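The B1-B6 flow can be summarised by the following loop, a sketch that reuses the hypothetical module interfaces from the earlier architecture sketch; none of the names come from the patent.

```python
# Illustrative human-vs-AI game loop following steps B1-B6.
def play_match(combat, movegen, features, first_player, second_player):
    state = combat.initial_state()                      # B1: layouts arranged, move order decided
    while not state.finished:
        for player in (first_player, second_player):    # B2/B4: players move in turn
            legal = movegen.legal_moves(state)           # moves violating the rules are rejected
            feats = features.encode(state, legal)
            move = player.choose(feats, legal)           # human input or the AI's deep MC network
            state = combat.execute(state, move)          # B3/B5: adjudicate engagement, update board
            if state.finished:
                break
    return state.winner                                  # B6: declare the winner
```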
Compared with the prior art, the invention provides a deep-Monte-Carlo-based reinforcement learning military chess AI system with the following beneficial effects:
The invention provides a deep-Monte-Carlo-based reinforcement learning military chess AI system, together with a matching training method and an actual-combat execution method, solving the difficulty of designing and training an AI for the imperfect-information game of military chess and improving the performance of the military chess AI when playing against humans. It is of great significance for the training and study of military chess enthusiasts, and provides both human-machine play and military chess AI self-play training functions. With the invention, military chess enthusiasts can hone their skill easily, conveniently and efficiently, relax, and relieve the pressures of daily life.
Drawings
FIG. 1 is a flow chart of actual combat in embodiment 1 of the present invention;
FIG. 2 is a schematic diagram of the interaction between the modules of the military chess AI system in embodiment 1 of the present invention;
FIG. 3 is a schematic diagram of the deep-Monte-Carlo-based move evaluation network in embodiment 1 of the present invention.
The reference numerals in the figures illustrate:
101. military chess combat module; 102. military chess move generation module; 103. military chess feature acquisition module; 104. military chess move decision module; 105. decision evaluation module.
Detailed Description
The following is a clear and complete description of the embodiments of the present invention with reference to the accompanying drawings; it is apparent that the described embodiments are only some, not all, of the embodiments of the present invention.
Example 1:
referring to fig. 1-2, the present invention provides a deep Monte Carlo based reinforcement learning military chess AI system, and the normal game process is shown in fig. 1.
S1, initializing a chess bureau, wherein a human player of the military chess decides a first hand and a second hand in a starting interface, and arranges own military chess layout;
s2, performing real-time fight interaction, sequentially playing chess by players, planning a fight method by using a deep Monte Carlo algorithm, judging fight results by a military chess fight module 101, and updating a real-time situation;
and S3, feature processing is inserted in the fight, and after the fight module 101 updates the situation, new features including own chess piece information, enemy chess piece information, the number of rounds of the non-occurrence, all the recruitment methods of the current player and the latest twenty rounds of both the recruitment methods are transmitted to the player who is going to play chess. Wherein the known information comprises all information of the chess pieces of the chess player, the coordinates of the visible enemy chess pieces and the necessary information which can be obtained according to the rules of the military chess. The unknown information includes guesses for the enemy's surviving pieces and dead pieces. After processing these features, new features are generated.
The interactive behavior of each module of the military chess AI system provided by the invention is shown in fig. 2, a military chess fight module 101 stores current chess game information in real time, performs and judges decisions of fight parties, transmits the chess game information to a military chess fight method generation module 102 and a military chess feature acquisition module 103, and sends training data to a decision evaluation module 105 after meeting requirements;
the military chess method generating module 102 receives the current chess information and generates all feasible chess methods to be sent to the military chess characteristic collecting module 103 for further processing.
And the military chess feature acquisition module 103 receives the law and chess information, refines the information into features and sends the features to the military chess law decision module 104.
The military chess method decision module 104 accepts the features and adopts the deep Monte Carlo network decision technology to generate evaluation values of all feasible methods of the current player, namely, the evaluation of all the feasible methods of the player in the current situation, and then selects the method with the largest evaluation value.
The decision evaluation module 105 receives the training data packaged by the military chess fight module 101 to evaluate when the game data is generated to a certain amount, and calculates the gradient of the loss value generated after the evaluation for parameter updating.
Specifically, the deep Monte Carlo algorithm is used for the evaluation of the recruitment method, and the LSTM network is used for receiving the last few steps of the recruitment method to generate the characteristics about the history period. The characteristic network structure is shown in fig. 3, the input layer is a full-connection layer, the input state value is the current coordinate of the chess piece, the type of the chess piece, the survival state, whether the chess piece is positioned on a railway, whether the chess piece is positioned on camping and the recruitment method, and the characteristics generated by the LSTM in the last step are also shown, the middle layer of the network is the full-connection layer, the final output layer is a softmax layer, and the output value is the evaluation value of the current environment and the recruitment method.
Adam algorithm used for parameter updating, and loss function is used for calculating evaluation value G t And the final game result r t Is a mean square error of (c).
The foregoing is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art, who is within the scope of the present invention, should make equivalent substitutions or modifications according to the technical scheme of the present invention and the inventive concept thereof, and should be covered by the scope of the present invention.

Claims (7)

1. A deep-Monte-Carlo-based reinforcement learning military chess AI system, the system comprising: a military chess combat module, a military chess move generation module, a military chess feature acquisition module, a military chess move decision module, and a decision evaluation module;
the military chess combat module is used for displaying the situation of both sides, executing and exchanging the decisions of both sides, and adjudicating the engagement results between pieces;
the military chess move generation module is used for searching the current board position, producing all feasible moves of the current player, and sending them to the military chess feature acquisition module;
the military chess feature acquisition module obtains the current own-side piece information, enemy piece information and the two most recent moves from the military chess combat module, obtains all feasible moves from the military chess move generation module, converts these data into a suitable encoded format as the state value, and inputs it into the military chess move decision module;
the military chess move decision module is divided into a training stage and an actual-combat stage;
wherein the training stage comprises: generating evaluation values for all feasible moves of the current player with a deep Monte Carlo network decision technique according to the input state value, i.e., evaluating all feasible moves of the player under the current situation, then selecting the move with the maximum evaluation value, and continuing to select moves in this way until the game ends in a win or loss; the decision network is then trained according to the final feedback information of the decision evaluation module and its parameters are optimized;
the actual-combat stage comprises: the fully trained decision model no longer updates its network parameters, and the optimal move is selected with the deep Monte Carlo decision technique according to the input state value;
the decision evaluation module evaluates the scores of all decisions in the whole game according to the final game result and generates feedback information for the military chess move decision module.
2. The deep-Monte-Carlo-based reinforcement learning military chess AI system of claim 1, wherein the specific execution flow of the system comprises the following steps:
S1, the military chess combat module stores the current board information in real time, and executes and adjudicates the decisions of the two combatants;
S2, the military chess move generation module analyses and matches all possible moves of the current player according to the current board information; the structure of a move comprises: the coordinates of the own-side piece, the type of the own-side piece, the target coordinates, and the type of the target coordinates;
S3, the military chess feature acquisition module extracts all information visible from the current player's viewpoint according to the current board information, including own-side piece information, enemy piece information, the number of rounds in which no engagement has occurred, all moves of the current player, and the moves of both players over the most recent twenty rounds; this information is used as the input of the military chess move decision module;
S4, the military chess move decision module evaluates each move under the current position through the deep Monte Carlo network according to the input state value and selects a move; the military chess combat module executes the move, switches to the other player's viewpoint, and the flow returns to step S2, until the military chess combat module judges that the game has ended;
S5, the decision evaluation module optimizes the deep Monte Carlo network in the military chess move decision module by computing the mean square error from the collected game data, the final result, and the state values recorded during the game.
3. The deep-Monte-Carlo-based reinforcement learning military chess AI system of claim 2, wherein the piece information in S3 specifically comprises: the current coordinates, piece type, survival state, whether the piece is on a railway, and whether the piece is in a camp, for every piece; according to the survival state, pieces are divided into surviving pieces and dead pieces; for dead pieces, the current coordinates, survival state, on-railway and in-camp features are all 0, distinguishing them from surviving pieces.
4. The deep-Monte-Carlo-based reinforcement learning military chess AI system of claim 2, wherein the deep Monte Carlo network in S4 is a value network that estimates the value of actions and states; the evaluation network generates the value function of a move decision; the network structure comprises an LSTM network that receives the features of the most recent twenty moves, the output of the LSTM network is concatenated with the action state value and the board state value and fed into fully connected layers, producing a number of evaluation values equal to the number of candidate moves; the information fed into the fully connected layers specifically comprises own-side piece information, enemy piece information, the feature vector produced by the LSTM network from the most recent twenty moves, and the currently feasible moves.
5. The deep-Monte-Carlo-based reinforcement learning military chess AI system of claim 1, wherein the training method of the training stage specifically comprises the following contents:
A1, establishing experience pools B_1 and B_2 for the two players; each experience pool is a buffer that stores the input features F of every round of the two players' games, its capacity is manually set to S, and when the stored data reaches S, training is performed, the experience pool is emptied, and storage restarts;
A2, establishing decision networks Q_1 and Q_2 for the two players; each decision network reads the input feature F and outputs an evaluation value, and the move action a_t in round t is selected as follows:
a_t = argmax_a Q(s_t, a), with probability p = (1 - ε)
a_t = random(s_t, a), with probability p = ε
the reward value of each round is determined by the final result:
r_t ← r_t + γ·r_{t+1}
where s_t is the board state from the current player's viewpoint in round t; r_t is the reward obtained by the current player's action in round t; argmax_a means selecting, from the evaluation values computed by the decision network Q over the board state s_t and the action set a of the current round, the action with the maximum evaluation value as the current-round action a_t; random means randomly selecting an action from the current-round action set a as the current-round action a_t; ε is the exploration probability of the current round; p is the probability, derived from the exploration probability, with which each selection rule is applied; γ is a decay factor, meaning that the current-round reward is determined jointly by the current-round reward and the decayed next-round reward;
A3, learning from the experience pool a total of T times: after each game ends, the round features accumulated during the game are put into the experience pool B, and one learning pass is performed whenever the experience pool is full, until T learning passes have finally been completed;
A4, in each learning pass, the feature F_t in the experience pool B is first fed into the network Q to obtain an evaluation value G_t; the loss function is the mean square error between G_t and the recorded reward r_t, and the Adam algorithm is used to mitigate problems of learning-rate vanishing, slow convergence, and abnormal parameter updates in training the network Q.
6. The deep-Monte-Carlo-based reinforcement learning military chess AI system of claim 1, wherein the actual-combat stage refers to human-machine play, where the "human" is a human military chess player and the "machine" is the military chess AI; the specific execution flow comprises the following steps:
B1, the human military chess player chooses to move first or second on the start interface and arranges his or her own military chess layout;
B2, the first-moving player plays; the military chess feature acquisition module extracts all information visible from the current player's viewpoint according to the current board information, specifically including own-side piece information, enemy piece information, the number of rounds in which no engagement has occurred, all moves of the current player, and the moves of both players over the most recent twenty rounds; the military chess move generation module checks whether the current player's move violates the game rules, and if so, the operation cannot be executed;
B3, the military chess combat module executes the first-moving player's move, adjudicates the engagement result according to the rules, and updates the board information of both sides;
B4, the second-moving player plays; the military chess feature acquisition module extracts all information visible from the current player's viewpoint according to the current board information, specifically including own-side piece information, enemy piece information, the number of rounds in which no engagement has occurred, all moves of the current player, and the moves of both players over the most recent twenty rounds; the military chess move generation module checks whether the current player's move violates the game rules, and if so, the operation cannot be executed;
B5, the military chess combat module executes the second-moving player's move, adjudicates the engagement result according to the rules, and updates the board information of both sides;
B6, if the game has not ended, the flow returns to step B2; otherwise, the winner is declared.
7. The deep-Monte-Carlo-based reinforcement learning military chess AI system of claim 6, wherein the military chess AI also has the function of deciding whether to move first or second; if the military chess AI is the first-moving player, the first-moving player's move in B2 is evaluated and determined by the deep Monte Carlo network of the military chess AI; if the military chess AI is the second-moving player, the second-moving player's move in B4 is evaluated and determined by the deep Monte Carlo network of the military chess AI.
CN202310825710.7A 2023-07-06 2023-07-06 Reinforced learning military chess AI system based on deep Monte Carlo Active CN116881656B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310825710.7A CN116881656B (en) 2023-07-06 2023-07-06 Reinforced learning military chess AI system based on deep Monte Carlo

Publications (2)

Publication Number Publication Date
CN116881656A (en) 2023-10-13
CN116881656B (en) 2024-03-22

Family

ID=88263647

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310825710.7A Active CN116881656B (en) 2023-07-06 2023-07-06 Reinforced learning military chess AI system based on deep Monte Carlo

Country Status (1)

Country Link
CN (1) CN116881656B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160314580A1 (en) * 2013-11-06 2016-10-27 H. Lee Moffitt Cancer Center And Research Institute, Inc. Pathology case review, analysis and prediction
CN108985458A (en) * 2018-07-23 2018-12-11 东北大学 A kind of double tree monte carlo search algorithms of sequential synchronous game
CN110119804A (en) * 2019-05-07 2019-08-13 安徽大学 A kind of Ai Ensitan chess game playing algorithm based on intensified learning
US20190311220A1 (en) * 2018-04-09 2019-10-10 Diveplane Corporation Improvements To Computer Based Reasoning and Artificial Intellignence Systems
CN110555517A (en) * 2019-09-05 2019-12-10 中国石油大学(华东) Improved chess game method based on Alphago Zero
CN113599798A (en) * 2021-08-25 2021-11-05 上海交通大学 Chinese chess game learning method and system based on deep reinforcement learning method
CN114462566A (en) * 2022-02-25 2022-05-10 中国科学技术大学 Method for realizing real-time determination of optimal decision action by intelligent real-time decision system
CN114997054A (en) * 2022-05-31 2022-09-02 清华大学 Method and device for simulating chess playing of chess
CN115054906A (en) * 2022-06-30 2022-09-16 成都潜在人工智能科技有限公司 Chess and card reinforcement learning method, system and medium based on Monte Carlo sampling
CN116128060A (en) * 2023-02-18 2023-05-16 之江实验室 Chess game method based on opponent modeling and Monte Carlo reinforcement learning


Also Published As

Publication number Publication date
CN116881656B (en) 2024-03-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant