CN113222106A - Intelligent military chess deduction method based on distributed reinforcement learning - Google Patents
- Publication number
- CN113222106A (application number CN202110185566.6A)
- Authority
- CN
- China
- Prior art keywords
- operator
- state
- variable
- network
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/60—Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor
- A63F13/67—Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor adaptively or by learning from player actions, e.g. skill level adjustment or by storing successful combat sequences for re-use
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/80—Special adaptations for executing a specific game genre or game mode
- A63F13/822—Strategy games; Role-playing games
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F2300/00—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
- A63F2300/60—Methods for processing data by generating or executing the game program
- A63F2300/6027—Methods for processing data by generating or executing the game program using adaptive systems learning from user actions, e.g. for skill level adjustment
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F2300/00—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
- A63F2300/80—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game specially adapted for executing a specific type of game
- A63F2300/807—Role playing or strategy games
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention provides an intelligent wargame deduction method based on distributed reinforcement learning. First, the state variables and action variables of the wargame operator decision network are determined. Second, a Markov decision process is determined from the state input and output variables, and a reinforcement-learning training pool is constructed for neural-network training. A neural network is then built for each operator using the Actor-Critic algorithm; with minimization of the evaluation function as the training objective, the parameters of each operator's neural network are trained step by step using the information in the experience pool. Finally, the trained Actor networks are connected to the wargame deduction system, and each operator makes decisions according to the battlefield situation. The method builds a decision neural network for each operator, and in describing the battlefield situation it uses image data, in addition to conventional labels, to represent environment states and unit attributes that are hard to quantify, so that effective decision results can be obtained more accurately across diverse battlefield situations.
Description
Technical Field
The invention relates to the technical field of intelligent wargame deduction, and in particular to an intelligent wargame deduction method based on distributed reinforcement learning.
Background
Wargame deduction is a military-science tool that simulates the parties to an armed confrontation: using a board representing the battlefield and pieces representing the military forces on it, the course of a war is logically deduced, studied, and evaluated according to rules distilled from combat experience. It is a turn-based game problem. Moving war onto sand tables and computers to build a virtual battlefield, and seeking greater advantage in future wars through simulation as close to real combat as possible, is the significance of wargame deduction. A wargame usually consists of three parts: a map (the board), the pieces (operators), and adjudication rules (deduction rules). Modern wargames more often take the form of electronic wargame systems running on computers.
As shown in Fig. 1, a typical wargame map is a grid of squares or hexagons, with each cell's terrain type or elevation indicated by distinct markings or colors. A wargame involves at least two sides, each equipped with pieces of roughly equal overall capability but differing mobility, firing modes, and so on. Broadly, a wargame piece pursues two tasks: destroying enemy pieces and occupying territory. On a large wargame map, valuable cells are sparsely distributed: among thousands of cells, usually only a few dozen are of real significance for strategy at the current moment, together with the cells within each piece's feasible movement range.
Wargame deduction is an effective tool for understanding and mastering future warfare. Using wargames to deduce future combat actions in a virtual operational environment helps commanders pursue advantage and avoid harm, and turns operational concepts into concrete action plans. As the concept of multi-domain operations develops and matures, wargame deduction will continue to play an important role in it, and joint deduction improves cooperation among the various services.
With the new military revolution's shift from "informatization" to "intelligentization", "artificial intelligence + wargaming" will see wider application in the military field. It is therefore necessary to study wargame deduction strategies using artificial-intelligence techniques.
Researchers at the National Defense University have proposed a reinforcement-learning framework for wargame deduction decision-making that adopts hierarchical reinforcement learning and describes the battlefield situation with hand-designed labels and vectorization. In practical application, however, certain battlefield situations cannot be quantified accurately, so the hierarchical scheme cannot produce effective decisions across all battlefield situations. In addition, existing reinforcement-learning wargame deduction methods use a single unified model, whose large network scale requires high-performance computing support.
Disclosure of Invention
To solve the problems in the prior art, the invention provides an intelligent wargame deduction method based on distributed reinforcement learning. A decision neural network is built for each operator, and in describing the battlefield situation, image data is used in addition to conventional labels to represent environment states and unit attributes that are hard to quantify, so that effective decision results can be obtained more accurately across diverse battlefield situations.
The technical scheme of the invention is as follows:
the intelligent war game deduction method based on distributed reinforcement learning comprises the following steps:
step 1: determining state variables and action variables of a chess operator decision network;
For operator i, the state variable S_i comprises the state information of own-side operators, enemy-operator information, the position of the control point, and the intervisibility (line-of-sight) state of operator i's current position; the action variable A_i is the action taken by operator i at the current moment;
Step 2: in the algorithm training stage, the score of each operator action, obtained by interacting with the wargame deduction platform during deduction, is used as the reward function R;
Step 3: determining the Markov decision process <S, A, R, γ> from the state input and output variables, where S is the state-variable input of an operator and A is the action-variable output of the operator; R is the reward function and γ is the discount factor; the training pool for reinforcement learning is constructed as

D_i = {(s_t^i, a_t^i, r_t^i, s_{t+1}^i)}

where s_t^i is the state-variable input of operator i at time t, a_t^i is the action variable output by operator i at time t, r_t^i is the reward value obtained by operator i from the reward function at time t, and s_{t+1}^i is the updated state variable after operator i takes action a_t^i in state s_t^i; in the algorithm training stage, information generated by each operator's interaction with the platform is stored in that operator's experience pool for neural-network training;
Step 4: for each operator, a neural network is built using the Actor-Critic algorithm. The Actor network takes the operator's state-variable observation as input and outputs the operator's action variable; the Critic network takes the operator's state-variable observation and action variable as input and outputs an evaluation function, namely the difference between the operator's real reward value and the estimated reward value;
with minimization of the evaluation function as the training objective, the parameters of each operator's neural network are trained step by step using the information in the experience pool, until the network converges;
the converged Actor networks obtained from training are then connected to the wargame deduction system, and each operator makes decisions according to the battlefield situation.
Further, the intervisibility state of operator i's current position is represented by an intervisibility view, an image formed by the area visible from the operator's current position; the view is processed by a convolutional neural network to obtain the intervisibility state of operator i's current position.
Further, the state information of an own-side operator comprises position, mobility, and remaining hit points; the enemy-operator information comprises position, operator type, and remaining hit points.
Further, the action variable is a 4-dimensional vector formed by a maneuver, a maneuver position, an attack and an attack target.
Further, the reward function R comprises the control-point capture score R_con, the kill score R_des, and the remaining-strength score R_rem:

R = R_con + R_des + R_rem.
Further, the Actor network is a three-layer fully connected neural network: the number of first-layer neurons is determined by the dimension of the input operator state variable, the second layer has 256 neurons, and the number of third-layer neurons is determined by the dimension of the operator's action variable.
Further, the Critic network is a three-layer fully connected neural network: the number of first-layer neurons is determined by the dimensions of the state variable input to the Actor network and the action variable output by the Actor network, the second layer has 128 neurons, and the third layer has 1 neuron.
Further, the parameters of each operator's neural network are trained step by step using gradient descent and backpropagation.
Advantageous effects
The invention adopts distributed reinforcement learning: each operator has its own decision network, so the network scale is small, the search space is small, and migration is convenient.
By adding image information to the operators' state variables, the invention can describe complex battlefield situations that cannot be simply quantified, and thus obtains effective decision results more accurately across diverse battlefield situations.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1: a military chess map;
FIG. 2: the general view is schematic;
FIG. 3: an Actor-Critic algorithm operation flow chart;
FIG. 4: a decision flow chart of a reinforcement learning network in a war game deduction environment.
Detailed Description
Wargame deduction is a typical incomplete-information game problem. At present, different confrontation strategies are explored mainly through human-versus-human play, which has clear limitations. Using an artificial-intelligence method based on distributed reinforcement learning, the invention designs an intelligent wargame deduction scheme that requires little human intervention, thereby realizing intelligent wargame confrontation.
The method comprises the following specific steps:
step 1: and determining the state variable and the action variable of the chess operator decision network.
For operator i, the state variable S_i consists of two parts. The first part is obtained by interacting with the wargame deduction platform and is a vector composed of the state information (position, mobility, remaining hit points) of own-side operators (including operator i itself), enemy-operator information (position, operator type, remaining hit points), the position of the control point, and so on. The second part is the intervisibility state of the operator's current position, represented by an intervisibility view: because the terrain differs from cell to cell on a wargame map, whether operators in different cells can observe each other is called the intervisibility relation, and the image formed by the area visible from a cell is its intervisibility view. As shown in Fig. 2, the red cell is the operator's current position and the blue area is the range visible from that position. The view is processed by a convolutional neural network to obtain the intervisibility state of the current position. Together the two parts form operator i's state variable S_i. The state variable serves as the input of the policy network, which outputs the action variable A_i, the action operator i should take at the current moment, mainly including maneuvering, shooting, and so on.
In this embodiment there are 3 own-side operators and 3 enemy operators, of which 1 enemy operator is observed. For own-side operator i, the state variable S_i is a 42-dimensional vector composed of the state information (position, mobility, remaining hit points) of the own-side operators (including operator i itself), enemy-operator information (position, operator type, remaining hit points), the position of the control point, and the intervisibility information (a 24-dimensional vector obtained by dimensionality reduction through the convolutional neural network); the states of unobserved enemy operators are all set to 0. This vector is the input of the decision network. The action variable A_i output by the decision network is the action operator i should take at the current moment, a 4-dimensional vector composed of maneuver, maneuver position, attack, and attack target.
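The assembly of such a state vector can be sketched as follows. This is a purely illustrative sketch, not part of the claimed method: the patent states the total is 42 dimensions but not the exact per-field widths, so the field layout here (4 values per operator) is an assumption and the resulting total differs.

```python
def build_state(own_ops, enemy_ops, control_point, vis_vec, n_enemies=3):
    """Assemble a flat state vector for one operator (illustrative layout).

    own_ops       : list of (x, y, mobility, hp) tuples for own-side operators
    enemy_ops     : dict {index: (x, y, op_type, hp)} of *observed* enemies;
                    unobserved enemies are zero-filled, as in the patent
    control_point : (x, y) position of the control point
    vis_vec       : 24-dim intervisibility feature vector (e.g. CNN output)
    """
    parts = []
    for op in own_ops:                              # own-side operator fields
        parts.extend(op)
    for i in range(n_enemies):                      # zeros for unobserved enemies
        parts.extend(enemy_ops.get(i, (0, 0, 0, 0)))
    parts.extend(control_point)                     # control-point position
    parts.extend(vis_vec)                           # intervisibility features
    return parts
```

Zero-filling unobserved enemies keeps the input dimension fixed, which the fully connected decision network requires.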
Step 2: in the algorithm training stage, the score of each operator action, obtained by interacting with the wargame deduction platform during deduction, is used as the reward function R. It mainly comprises the control-point capture score R_con, the kill score R_des, and the remaining-strength score R_rem:

R = R_con + R_des + R_rem

The higher the value of R, the better the side's wargame performance.
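The composition of the reward can be sketched as follows. The weights and event encodings are assumptions made for illustration; the patent only specifies that the three component scores are summed.

```python
def step_reward(captured_control, enemies_destroyed, own_hp_ratio,
                w_con=50.0, w_des=10.0, w_rem=5.0):
    """Sketch of R = R_con + R_des + R_rem (weights are illustrative)."""
    r_con = w_con if captured_control else 0.0   # control-point capture score
    r_des = w_des * enemies_destroyed            # kill score
    r_rem = w_rem * own_hp_ratio                 # remaining-strength score
    return r_con + r_des + r_rem
```

For example, capturing the control point while destroying two enemy pieces at half remaining strength would yield 50 + 20 + 2.5 = 72.5 under these assumed weights.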
Step 3: determine the Markov decision process from the state input and output variables, expressed as

<S, A, R, γ>

where S is the operator's state-variable input from step 1, A is the operator's action-variable output from step 1, R is the reward function from step 2, and γ is the discount factor, with γ ∈ [0, 1].
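The role of the discount factor γ can be illustrated with the standard discounted return from reinforcement learning; this is textbook material, not specific to the patented method.

```python
def discounted_return(rewards, gamma=0.9):
    """Discounted return G_t = r_t + gamma*r_{t+1} + gamma^2*r_{t+2} + ...

    Computed backwards so each reward is discounted by its distance in time.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

With γ near 0 the operator values only immediate scores; with γ near 1 it weighs the long-run outcome of the engagement.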
Based on this, a training pool for reinforcement learning is constructed as

D_i = {(s_t^i, a_t^i, r_t^i, s_{t+1}^i)}

where s_t^i is the state-variable input of operator i at time t, a_t^i is the action variable output by operator i at time t, r_t^i is the reward value obtained from the reward function at time t, and s_{t+1}^i is the updated state variable after operator i takes action a_t^i in state s_t^i.
In the algorithm training stage, information generated by each operator's interaction with the platform is stored in that operator's experience pool and used for neural-network training.
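A minimal per-operator experience pool matching this description might look as follows; the capacity and uniform random sampling are assumptions, since the patent does not specify them.

```python
import random
from collections import deque

class ExperiencePool:
    """Per-operator replay buffer storing (s_t, a_t, r_t, s_{t+1}) tuples."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions are evicted

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        """Draw a random mini-batch (at most the current pool size)."""
        k = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)
```

Each operator keeps its own pool, which is what makes the training distributed: no operator's network ever sees another operator's transitions.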
Step 4: a distributed learning method is adopted in which each operator has its own neural network, implemented with the Actor-Critic algorithm commonly used in reinforcement learning. It mainly comprises two neural networks: an Actor network, whose input is the operator's state-variable observation and whose output is the operator's action; and a Critic network, whose input is the operator's state-variable observation and action variables and whose output is an evaluation function. The main procedure is shown in Fig. 3.
The input of each operator's Actor network is the state variable determined in step 1, whose initial value is the situation information given by the operator's initial position and initial state; the output is the action variable determined in step 1. The Actor network is a three-layer fully connected neural network: the number of first-layer neurons is determined by the dimension of the input state variable (42 in this embodiment), the second layer has 256 neurons, and the number of third-layer neurons is determined by the dimension of the operator's action variable (4 in this embodiment).
The input of each operator's Critic network is defined as the input and the output of the Actor network, and the evaluation function output by the Critic network is the difference between the real reward value computed by the operator through the reward function of step 2 and the reward value estimated by the Critic network. The Critic network is a three-layer fully connected neural network: the number of first-layer neurons is determined by the dimensions of the Actor network's input state variable and output action variable (46 in this embodiment), the second layer has 128 neurons, and the third layer has 1 neuron.
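The layer sizes described above (42 → 256 → 4 for the Actor, 46 → 128 → 1 for the Critic) can be sketched with a plain NumPy forward pass. The initialization scale and the tanh hidden activation are assumptions for illustration, as the patent does not specify them.

```python
import numpy as np

def init_mlp(sizes, rng):
    """(weight, bias) pairs for a fully connected net with the given layer sizes."""
    return [(0.1 * rng.standard_normal((m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    """Forward pass with tanh on hidden layers and a linear output layer."""
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.tanh(x)
    return x

rng = np.random.default_rng(0)
actor  = init_mlp([42, 256, 4], rng)      # 42-dim state -> 4-dim action
critic = init_mlp([42 + 4, 128, 1], rng)  # 46-dim state+action -> scalar evaluation
```

Feeding the Actor's 4-dimensional output back into the Critic alongside the 42-dimensional state gives the Critic its 46-dimensional input.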
With minimization of the evaluation function as the training objective, the parameters of each operator's neural network are trained step by step, combining the information in the experience pool with gradient descent and backpropagation, until the networks converge.
The converged Actor networks obtained from training are then connected to the wargame deduction system, and each operator makes decisions according to the battlefield situation.
Although embodiments of the present invention have been shown and described above, it should be understood that they are exemplary and not limiting; those of ordinary skill in the art may make variations, modifications, substitutions, and alterations to them without departing from the principle and spirit of the present invention.
Claims (8)
1. An intelligent war game deduction method based on distributed reinforcement learning is characterized in that: the method comprises the following steps:
step 1: determining state variables and action variables of a chess operator decision network;
for operator i, the state variable S_i comprises the state information of own-side operators, enemy-operator information, the position of the control point, and the intervisibility state of operator i's current position; the action variable A_i is the action taken by operator i at the current moment;
step 2: in the algorithm training stage, the score of each operator action, obtained by interacting with the wargame deduction platform during deduction, is used as the reward function R;
step 3: determining the Markov decision process <S, A, R, γ> from the state input and output variables, where S is the state-variable input of an operator and A is the action-variable output of the operator; R is the reward function and γ is the discount factor; the training pool for reinforcement learning is constructed as

D_i = {(s_t^i, a_t^i, r_t^i, s_{t+1}^i)}

where s_t^i is the state-variable input of operator i at time t, a_t^i is the action variable output by operator i at time t, r_t^i is the reward value obtained by operator i from the reward function at time t, and s_{t+1}^i is the updated state variable after operator i takes action a_t^i in state s_t^i; in the algorithm training stage, information generated by each operator's interaction with the platform is stored in that operator's experience pool for neural-network training;
step 4: for each operator, a neural network is built using the Actor-Critic algorithm, wherein the input of the Actor network is the operator's state-variable observation and the output of the Actor network is the operator's action variable; the input of the Critic network is the operator's state-variable observation and action variable, and the output of the Critic network is an evaluation function, namely the difference between the operator's real reward value and the estimated reward value;
training parameters of each operator neural network step by taking the minimization of the evaluation function as a training target and combining the information in the experience pool until the network converges;
and then accessing the convergent Actor network obtained by training into a chess deduction system, and making a corresponding decision by each operator according to the battlefield situation.
2. The intelligent chess deduction method based on distributed reinforcement learning as claimed in claim 1, characterized in that: the intervisibility state of operator i's current position is represented by an intervisibility view, an image formed by the area visible from operator i's current position; the view is processed by a convolutional neural network to obtain the intervisibility state of operator i's current position.
3. The intelligent chess deduction method based on distributed reinforcement learning as claimed in claim 1, characterized in that: the state information of an own-side operator comprises position, mobility, and remaining hit points; the enemy-operator information comprises position, operator type, and remaining hit points.
4. The intelligent chess deduction method based on distributed reinforcement learning as claimed in claim 1, characterized in that: the action variables are 4-dimensional vectors formed by maneuvering, maneuvering positions, attacks and attack targets.
5. The intelligent chess deduction method based on distributed reinforcement learning as claimed in claim 1, characterized in that: the reward function R comprises the control-point capture score R_con, the kill score R_des, and the remaining-strength score R_rem:

R = R_con + R_des + R_rem.
6. The intelligent chess deduction method based on distributed reinforcement learning as claimed in claim 1, characterized in that: the Actor network is a three-layer fully connected neural network, where the number of first-layer neurons is determined by the dimension of the input operator state variable, the second layer has 256 neurons, and the number of third-layer neurons is determined by the dimension of the operator's action variable.
7. The intelligent chess deduction method based on distributed reinforcement learning as claimed in claim 1, characterized in that: the Critic network is a three-layer fully connected neural network, where the number of first-layer neurons is determined by the dimensions of the state variable input to the Actor network and the action variable output by the Actor network, the second layer has 128 neurons, and the third layer has 1 neuron.
8. The intelligent chess deduction method based on distributed reinforcement learning as claimed in claim 1, characterized in that: the parameters of each operator's neural network are trained step by step using the gradient descent method and backpropagation of gradients.
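The update rule of claim 8 can be shown on a toy problem: a single weight trained by gradient descent, where the gradient plays the role of the backpropagated gradient in the full network. The learning rate, target, and iteration count are illustrative assumptions.

```python
# Minimal sketch of step-by-step gradient-descent training (claim 8):
# fit y = w * x by repeatedly stepping against the squared-error gradient.
w = 0.0
x, y = 2.0, 6.0                 # hypothetical training pair; optimum is w = 3
for _ in range(200):
    pred = w * x
    grad = 2 * (pred - y) * x   # d/dw of (pred - y)^2, i.e. the backprop gradient
    w -= 0.05 * grad            # one gradient-descent step
print(round(w, 3))              # 3.0
```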
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110185566.6A CN113222106B (en) | 2021-02-10 | 2021-02-10 | Intelligent soldier chess deduction method based on distributed reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113222106A true CN113222106A (en) | 2021-08-06 |
CN113222106B CN113222106B (en) | 2024-04-30 |
Family
ID=77084912
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110185566.6A Active CN113222106B (en) | 2021-02-10 | 2021-02-10 | Intelligent soldier chess deduction method based on distributed reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113222106B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200410351A1 (en) * | 2015-07-24 | 2020-12-31 | Deepmind Technologies Limited | Continuous control with deep reinforcement learning |
CN108647374A (en) * | 2018-03-22 | 2018-10-12 | 中国科学院自动化研究所 | Tank tactics Behavior modeling method and system and equipment in ground force's tactics war game game |
CN112131786A (en) * | 2020-09-14 | 2020-12-25 | 中国人民解放军军事科学院评估论证研究中心 | Target detection and distribution method and device based on multi-agent reinforcement learning |
Non-Patent Citations (3)
Title |
---|
LUNTONG LI ET AL.: "Actor-Critic Learning Control Based on ℓ2-Regularized Temporal-Difference Prediction With Gradient Correction", IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 12 |
LI CHEN ET AL.: "Multi-agent decision method under the Actor-Critic framework and its application to wargames", Systems Engineering and Electronics, pages 1-4 |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113723013A (en) * | 2021-09-10 | 2021-11-30 | 中国人民解放军国防科技大学 | Multi-agent decision method for continuous space chess deduction |
CN114091358A (en) * | 2022-01-20 | 2022-02-25 | 浙江建木智能系统有限公司 | Estimation method, device and system based on deduction simulation target area situation |
CN114662655A (en) * | 2022-02-28 | 2022-06-24 | 南京邮电大学 | Attention mechanism-based weapon and chess deduction AI hierarchical decision method and device |
CN114662655B (en) * | 2022-02-28 | 2024-07-16 | 南京邮电大学 | Attention mechanism-based method and device for deriving AI layering decision by soldier chess |
CN114722998A (en) * | 2022-03-09 | 2022-07-08 | 三峡大学 | Method for constructing chess deduction intelligent body based on CNN-PPO |
CN114722998B (en) * | 2022-03-09 | 2024-02-02 | 三峡大学 | Construction method of soldier chess deduction intelligent body based on CNN-PPO |
CN114611669A (en) * | 2022-03-14 | 2022-06-10 | 三峡大学 | Intelligent strategy-chess-deducing decision-making method based on double experience pool DDPG network |
CN114611669B (en) * | 2022-03-14 | 2023-10-13 | 三峡大学 | Intelligent decision-making method for chess deduction based on double experience pool DDPG network |
CN114880955A (en) * | 2022-07-05 | 2022-08-09 | 中国人民解放军国防科技大学 | War and chess multi-entity asynchronous collaborative decision-making method and device based on reinforcement learning |
CN114880955B (en) * | 2022-07-05 | 2022-09-20 | 中国人民解放军国防科技大学 | War and chess multi-entity asynchronous collaborative decision-making method and device based on reinforcement learning |
Also Published As
Publication number | Publication date |
---|---|
CN113222106B (en) | 2024-04-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113222106A (en) | Intelligent military chess deduction method based on distributed reinforcement learning | |
CN110991545B (en) | Multi-agent confrontation oriented reinforcement learning training optimization method and device | |
CN113893539B (en) | Cooperative fighting method and device for intelligent agent | |
CN115291625A (en) | Multi-unmanned aerial vehicle air combat decision method based on multi-agent layered reinforcement learning | |
CN105678030B (en) | 2018-03-13 | Air-combat tactical team simulation method based on expert system and tactics fractalization
CN114460959A (en) | Unmanned aerial vehicle group cooperative autonomous decision-making method and device based on multi-body game | |
CN116596343A (en) | Intelligent soldier chess deduction decision method based on deep reinforcement learning | |
CN112215350A (en) | Smart agent control method and device based on reinforcement learning | |
Shao et al. | Cooperative reinforcement learning for multiple units combat in StarCraft | |
CN111723931B (en) | Multi-agent confrontation action prediction method and device | |
CN114722998B (en) | Construction method of soldier chess deduction intelligent body based on CNN-PPO | |
CN113282100A (en) | Unmanned aerial vehicle confrontation game training control method based on reinforcement learning | |
CN111437605B (en) | Method for determining virtual object behaviors and hosting virtual object behaviors | |
CN114662655B (en) | Attention mechanism-based method and device for deriving AI layering decision by soldier chess | |
CN113705828B (en) | Battlefield game strategy reinforcement learning training method based on cluster influence degree | |
CN114611664A (en) | Multi-agent learning method, device and equipment | |
CN111723941B (en) | Rule generation method and device, electronic equipment and storage medium | |
Zuo | A deep reinforcement learning methods based on deterministic policy gradient for multi-agent cooperative competition | |
Bian et al. | Cooperative strike target assignment algorithm based on deep reinforcement learning | |
CN118001744A (en) | Intelligent decision-making method, device and storage medium for deduction of chess | |
CN116679742B (en) | Multi-six-degree-of-freedom aircraft collaborative combat decision-making method | |
CN117151224A (en) | Strategy evolution training method, device, equipment and medium for strong random game of soldiers | |
CN114254722B (en) | Multi-intelligent-model fusion method for game confrontation | |
Pan et al. | An algorithm to estimate enemy's location in WarGame based on pheromone | |
CN117647994A (en) | Collaborative countermeasure method, device, equipment and storage medium for unmanned aerial vehicle cluster |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||