CN113222106A - Intelligent military chess deduction method based on distributed reinforcement learning - Google Patents

Intelligent military chess deduction method based on distributed reinforcement learning

Info

Publication number
CN113222106A
Authority
CN
China
Prior art keywords
operator
state
variable
network
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110185566.6A
Other languages
Chinese (zh)
Other versions
CN113222106B (en
Inventor
彭星光
李亚男
宋保维
潘光
张福斌
高剑
李乐
张立川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202110185566.6A priority Critical patent/CN113222106B/en
Publication of CN113222106A publication Critical patent/CN113222106A/en
Application granted granted Critical
Publication of CN113222106B publication Critical patent/CN113222106B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/60Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor
    • A63F13/67Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor adaptively or by learning from player actions, e.g. skill level adjustment or by storing successful combat sequences for re-use
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/80Special adaptations for executing a specific game genre or game mode
    • A63F13/822Strategy games; Role-playing games
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/60Methods for processing data by generating or executing the game program
    • A63F2300/6027Methods for processing data by generating or executing the game program using adaptive systems learning from user actions, e.g. for skill level adjustment
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/80Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game specially adapted for executing a specific type of game
    • A63F2300/807Role playing or strategy games
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Electrically Operated Instructional Devices (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides an intelligent wargame deduction method based on distributed reinforcement learning. The method first determines the state variables and action variables of the wargame operator decision network; it then determines a Markov decision process from the state input and output variables and constructs an experience pool for reinforcement-learning training of the neural networks. An Actor-Critic neural network is established for each operator, and the parameters of each operator's network are trained step by step, taking minimization of an evaluation function as the training target and combining the information in the experience pool. Finally, the trained Actor networks are connected to the wargame deduction system, and each operator makes the corresponding decision according to the battlefield situation. The method establishes a decision neural network for each operator, and in the battlefield situation description, in addition to conventional labels, image data are used to represent environment states and individual attributes that are difficult to quantify, so that effective decision results can be obtained more accurately for a variety of battlefield situations.

Description

Intelligent military chess deduction method based on distributed reinforcement learning
Technical Field
The invention relates to the technical field of intelligent wargame deduction, and in particular to an intelligent wargame deduction method based on distributed reinforcement learning.
Background
Wargame deduction is a military-science tool that simulates the parties to a military confrontation: using a board that represents the battlefield and pieces that represent the forces on it, the course of a war is logically deduced, studied, and evaluated according to rules distilled from combat experience; it is a turn-based game problem. Moving war onto the sand table and into the computer builds a virtual battlefield, and enabling the military to achieve greater victory in future wars through simulation as close to actual combat as possible is the significance of "wargame deduction". A wargame usually consists of three parts: the map (board), the deduction pieces (operators), and the adjudication rules (deduction rules). Modern wargaming increasingly uses electronic wargame systems hosted on computers.
As shown in fig. 1, a general wargame deduction uses a square- or hexagonal-grid map, and the terrain, landform, or elevation of each cell is marked with different symbols or colors. A wargame is played by at least two sides, each equipped with pieces of approximately the same overall capability but differing in mobility, firing mode, and so on. A wargame generally involves roughly two kinds of tasks: destroying enemy units and seizing ground. On a large wargame map, valuable cells are sparsely distributed: of thousands of cells, usually only a few dozen have important reference significance for strategy formulation at the current moment, and only the cells around each piece's feasible range of movement matter.
Wargame deduction is an effective tool for understanding and mastering future war. Using wargames to deduce future combat actions in a virtual combat environment helps to seek advantage and avoid harm, and turns various operational concepts into concrete action plans. In the future, the concept of multi-domain operations will be further developed and refined; wargame deduction will continue to play an important role in that development, and joint deduction will improve cooperation among the different services.
With the transformation of the new military revolution from "informatization" to "intelligentization", "artificial intelligence + wargaming" will see wider application in the military field. It is therefore necessary to study wargame deduction strategies using artificial-intelligence techniques.
Researchers at the National Defense University have proposed a reinforcement-learning-based framework for wargame deduction decision-making that adopts a hierarchical reinforcement-learning scheme and describes the battlefield situation with hand-designed labels and vectorization. In practice, however, some battlefield situations cannot be accurately quantified, so the hierarchical reinforcement-learning scheme cannot produce effective decisions for all kinds of battlefield situations. In addition, existing reinforcement-learning-based wargame deduction decision methods use a single unified reinforcement-learning model, whose network scale is large and which requires high-performance computing support.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides an intelligent wargame deduction method based on distributed reinforcement learning. A decision neural network is established for each operator, and in the battlefield situation description, in addition to conventional labels, image data are used to represent environment states and individual attributes that are difficult to quantify, so that effective decision results can be obtained more accurately for a variety of battlefield situations.
The technical scheme of the invention is as follows:
the intelligent war game deduction method based on distributed reinforcement learning comprises the following steps:
step 1: determining state variables and action variables of a chess operator decision network;
for operator i, the state variable S_i comprises the state information of the own-side operators, the enemy operator information, the position information of the control points, and the line-of-sight state of the current position of operator i; the action variable A_i is the action to be taken by operator i at the current moment;
step 2: in the algorithm training stage, the score of each operator action, obtained by interacting with the wargame deduction platform during the deduction, is used as the reward function R;
step 3: determining the Markov decision process <S, A, R, γ> from the state input and output variables, wherein S is the state-variable input of the operator, A is the action-variable output of the operator, R is the reward function, and γ is the discount factor; the experience pool for reinforcement learning is constructed as

{ (s^i_t, a^i_t, r^i_t, s^i_{t+1}) }

wherein s^i_t is the state-variable input of operator i at time t, a^i_t is the action variable output by operator i at time t, r^i_t is the reward value obtained by operator i through the reward function at time t, and s^i_{t+1} is the updated state variable after operator i takes action a^i_t in state s^i_t; in the algorithm training stage, the information generated by each operator's interaction with the platform is stored in that operator's experience pool and used for neural network training;
step 4: for each operator, establishing a neural network using the Actor-Critic algorithm, wherein the input of the Actor network is the operator's state-variable observation and its output is the operator's action variable; the input of the Critic network is the operator's state-variable observation together with the operator's action variable, and its output is an evaluation function, namely the difference between the operator's real reward value and the estimated reward value;
training the parameters of each operator's neural network step by step, taking minimization of the evaluation function as the training target and combining the information in the experience pool, until the network converges;
and then connecting the trained, converged Actor networks to the wargame deduction system, where each operator makes the corresponding decision according to the battlefield situation.
Further, the line-of-sight state of the current position of operator i is represented by a line-of-sight view, which is an image formed by the range visible from the operator's current position; the line-of-sight view is processed by a convolutional neural network to obtain the line-of-sight state of the current position of operator i.
Further, the state information of the own-side operators comprises position, mobility, and remaining hit points; the enemy operator information comprises position, operator type, and remaining hit points.
Further, the action variable is a 4-dimensional vector formed by a maneuver, a maneuver position, an attack and an attack target.
Further, the reward function R comprises a control-point capture score R_con, an enemy-destruction score R_des, and a remaining-force score R_rem:
R = R_con + R_des + R_rem
Further, the Actor network is a three-layer fully-connected neural network: the number of neurons in the first layer is determined by the dimension of the input operator state variable, the second layer comprises 256 neurons, and the number of neurons in the third layer is determined by the dimension of the operator action variable.
Further, the Critic network is a three-layer fully-connected neural network: the number of neurons in the first layer is determined by the dimension of the state variable input to the Actor network plus the dimension of the action variable output by the Actor network, the second layer comprises 128 neurons, and the third layer has 1 neuron.
Further, the parameters of each operator's neural network are trained step by step using gradient descent and backpropagation.
Advantageous effects
Because a distributed reinforcement learning method is adopted and each operator corresponds to its own decision network, the network scale is small, the search space is small, and the model is easy to migrate.
By also adding image information to the operator's state variable, the invention can describe complex battlefield situations that cannot be simply quantified, so that effective decision results can be obtained more accurately for a variety of battlefield situations.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1: a military chess map;
FIG. 2: schematic of the line-of-sight view;
FIG. 3: an Actor-Critic algorithm operation flow chart;
FIG. 4: a decision flow chart of a reinforcement learning network in a war game deduction environment.
Detailed Description
Wargame deduction is a typical incomplete-information game problem. At present, different confrontation strategies are explored mainly through human-versus-human play, which has clear limitations. Using an artificial-intelligence method based on distributed reinforcement learning, the invention designs an intelligent wargame deduction scheme that requires little human intervention, thereby realizing intelligent wargame confrontation.
The method comprises the following specific steps:
step 1: and determining the state variable and the action variable of the chess operator decision network.
For operator i, the state variable S_i mainly comprises two parts. One part is obtained by interacting with the wargame deduction and confrontation platform and mainly consists of a vector formed from the state information (position, mobility, remaining hit points) of the own-side operators (including operator i itself), the enemy operator information (position, operator type, remaining hit points), the position information of the control points, and so on. The other part is the line-of-sight state of the operator's current position, represented by a line-of-sight view: because the terrain of each cell of the wargame map differs, whether operators in different cells can observe each other is called the line-of-sight relation, and the image formed by the range visible from each cell is the line-of-sight view. As shown in fig. 2, the red cell is the operator's current position and the blue area is the line-of-sight range of the operator at that position. The line-of-sight view is processed by a convolutional neural network to obtain the line-of-sight state of the operator's current position. These two parts together form the state variable S_i of operator i. The state variable is the input of the policy network, and the output action variable A_i is the action that operator i should take at the current moment, mainly including maneuvering, shooting, and so on.
In this embodiment, there are 3 own-side operators, 3 enemy operators, and 1 observed enemy operator. For own-side operator i, the state variable S_i is therefore a 42-dimensional vector composed of the state information (position, mobility, remaining hit points) of the own-side operators (including operator i itself), the enemy operator information (position, operator type, remaining hit points), the position information of the control point, and the line-of-sight information (a 24-dimensional vector obtained by dimensionality reduction through the convolutional neural network); the states of unobserved enemy operators are all set to 0. This vector is used as the input of the decision network. The action variable A_i output by the decision network is the action that operator i should take at the current moment: a 4-dimensional vector formed by the maneuver, the maneuver position, the attack, and the attack target.
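For illustration only, a minimal sketch of how such a 42-dimensional state could be assembled is given below, assuming a small convolutional encoder that reduces the line-of-sight image to the 24-dimensional feature and an 18-dimensional hand-crafted scalar part (42 − 24); the class names, network shapes, and helper functions are assumptions, not the patent's implementation.

```python
# Illustrative sketch only (not the patent's code): assemble the 42-dim state of operator i.
# Assumption: 18 scalar situation features + 24-dim line-of-sight feature = 42.
import torch
import torch.nn as nn

class LineOfSightEncoder(nn.Module):
    """Small CNN that reduces a single-channel line-of-sight image to a 24-dim feature."""
    def __init__(self, feat_dim: int = 24):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(16, feat_dim)

    def forward(self, los_image: torch.Tensor) -> torch.Tensor:
        # los_image: (batch, 1, H, W) mask of the range visible from the operator's cell
        x = self.conv(los_image).flatten(1)          # (batch, 16)
        return self.fc(x)                            # (batch, 24)

def build_state(scalar_features: torch.Tensor, los_image: torch.Tensor,
                encoder: LineOfSightEncoder) -> torch.Tensor:
    """Concatenate own/enemy/control-point scalar features (unobserved enemies zeroed)
    with the CNN line-of-sight feature to form the decision-network input."""
    return torch.cat([scalar_features, encoder(los_image)], dim=-1)  # (batch, 18 + 24 = 42)
```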
Step 2: in the algorithm training stage, the score of each action taken by an operator is obtained by interacting with the wargame deduction platform during the deduction and used as the reward function R, which mainly comprises a control-point capture score R_con, an enemy-destruction score R_des, and a remaining-force score R_rem:
R = R_con + R_des + R_rem
The higher the value of R, the better the side's performance in the wargame.
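As a trivial illustration of the reward defined in step 2 (the three score terms are assumed to come from the deduction platform's adjudication; the function and argument names are hypothetical placeholders):

```python
# Illustrative only: the three score terms are assumed to be returned by the
# deduction platform's adjudication; names are hypothetical placeholders.
def compute_reward(control_score: float, destruction_score: float,
                   remaining_force_score: float) -> float:
    """R = R_con + R_des + R_rem, as defined in step 2."""
    return control_score + destruction_score + remaining_force_score
```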
And step 3: determining a Markov decision process according to the state input variables and the state output variables, wherein the Markov decision process is expressed as follows:
<S,A,R,γ>
wherein S is the state-variable input of the operator from step 1, A is the action-variable output of the operator from step 1, R is the reward function from step 2, and γ is the discount factor, with γ ∈ [0, 1].
Based on this, the experience pool for reinforcement learning is constructed as

{ (s^i_t, a^i_t, r^i_t, s^i_{t+1}) }

wherein s^i_t is the state-variable input of operator i at time t, a^i_t is the action variable output by operator i at time t, r^i_t is the reward value obtained by operator i through the reward function at time t, and s^i_{t+1} is the updated state variable after operator i takes action a^i_t in state s^i_t.
In the algorithm training stage, the information generated by each operator's interaction with the platform is stored in that operator's experience pool and used for neural network training.
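A minimal sketch of such a per-operator experience pool is given below; the capacity and uniform random sampling are assumptions, since the patent does not specify them.

```python
# Minimal per-operator experience pool sketch; capacity and uniform random
# sampling are assumptions not specified in the patent.
import random
from collections import deque
from typing import Deque, List, Tuple

Transition = Tuple[list, list, float, list]  # (s_t, a_t, r_t, s_{t+1})

class ExperiencePool:
    def __init__(self, capacity: int = 100_000):
        self.buffer: Deque[Transition] = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state) -> None:
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size: int) -> List[Transition]:
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

# One pool per operator, matching the distributed design (3 own-side operators here).
pools = {operator_id: ExperiencePool() for operator_id in range(3)}
```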
Step 4: a distributed learning method is adopted, so each operator has its own neural network, implemented with the Actor-Critic algorithm commonly used in reinforcement learning. The method mainly comprises two neural networks: an Actor network, whose input is the operator's state-variable observation and whose output is the operator's action; and a Critic network, whose input is the operator's state-variable observation together with the operator's action variables and whose output is an evaluation function. The main flow is shown in FIG. 3.
The input of each operator's Actor network is the operator's state variable determined in step 1 (its initial value is the situation information given by the operator's initial position and initial state), and its output is the operator's action variable determined in step 1. The Actor network is a three-layer fully-connected neural network: the number of neurons in the first layer is determined by the dimension of the input state variable (42 in this embodiment), the second layer comprises 256 neurons, and the number of neurons in the third layer is determined by the dimension of the operator's action variable (4 in this embodiment).
The input of each operator's Critic network is defined as the input of the Actor network together with the output of the Actor network, and the evaluation function output by the Critic network is the difference between the real reward value computed by the operator through the reward function in step 2 and the reward value estimated by the Critic network. The Critic network is a three-layer fully-connected neural network: the number of neurons in the first layer is determined by the dimension of the state variable input to the Actor network plus the dimension of the action variable output by the Actor network (46 in this embodiment), the second layer comprises 128 neurons, and the third layer has 1 neuron.
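A sketch of the two networks with the layer widths stated in this embodiment (42-dimensional state, 4-dimensional action, hidden layers of 256 and 128, a 46-dimensional Critic input) is shown below; the activation functions and unsquashed outputs are assumptions, since the patent only fixes the layer sizes.

```python
# Sketch of the per-operator Actor and Critic with the layer widths of this embodiment;
# ReLU activations and raw (unsquashed) outputs are assumptions.
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, state_dim: int = 42, action_dim: int = 4, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),   # layer 1 (42) -> layer 2 (256)
            nn.Linear(hidden, action_dim),             # layer 2 (256) -> layer 3 (4)
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Raw preferences for (maneuver, maneuver position, attack, attack target).
        return self.net(state)

class Critic(nn.Module):
    def __init__(self, state_dim: int = 42, action_dim: int = 4, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),  # layer 1 (46) -> layer 2 (128)
            nn.Linear(hidden, 1),                                  # layer 2 (128) -> layer 3 (1)
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))  # scalar evaluation
```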
Taking minimization of the evaluation function as the training target, the parameters of each operator's neural network are trained step by step with the information in the experience pool, using gradient descent and backpropagation, until the network converges.
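One possible training step, written in the deterministic (DDPG-style) Actor-Critic form, is sketched below; the TD target, the absence of target networks, and the handling of the discrete action components are assumptions, since the patent only states that the evaluation function is minimized by gradient descent and backpropagation.

```python
# Hedged sketch of one gradient-descent/backpropagation update per operator;
# the TD target and loss forms are standard Actor-Critic assumptions.
import torch
import torch.nn.functional as F

def train_step(actor, critic, actor_opt, critic_opt, batch, gamma: float = 0.99):
    states, actions, rewards, next_states = batch  # tensors sampled from this operator's pool

    # Critic: minimize the squared TD error (the "evaluation function" of step 4).
    with torch.no_grad():
        td_target = rewards + gamma * critic(next_states, actor(next_states)).squeeze(-1)
    critic_loss = F.mse_loss(critic(states, actions).squeeze(-1), td_target)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: move the policy in the direction the critic evaluates more highly.
    actor_loss = -critic(states, actor(states)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
    return critic_loss.item(), actor_loss.item()
```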
The trained, converged Actor networks are then connected to the wargame deduction system, and each operator makes the corresponding decision according to the battlefield situation.
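Deployment can then be as simple as the following sketch, in which each operator's trained Actor maps its observed state to an action at every decision step; the dictionary-based interface is a hypothetical placeholder for the deduction platform's API.

```python
# Deployment sketch: each operator's trained Actor maps its observed state to an action.
# The dictionary-based interface to the deduction platform is a hypothetical placeholder.
import torch

def decide_actions(actors: dict, observations: dict) -> dict:
    """actors: {operator_id: trained Actor}; observations: {operator_id: 42-dim state tensor}."""
    actions = {}
    for op_id, actor in actors.items():
        with torch.no_grad():
            actions[op_id] = actor(observations[op_id].unsqueeze(0)).squeeze(0)
    return actions
```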
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made in the above embodiments by those of ordinary skill in the art without departing from the principle and spirit of the present invention.

Claims (8)

1. An intelligent war game deduction method based on distributed reinforcement learning is characterized in that: the method comprises the following steps:
step 1: determining state variables and action variables of a chess operator decision network;
for operator i, the state variable S_i comprises the state information of the own-side operators, the enemy operator information, the position information of the control points, and the line-of-sight state of the current position of operator i; the action variable A_i is the action to be taken by operator i at the current moment;
step 2: in the algorithm training stage, the score of each operator action, obtained by interacting with the wargame deduction platform during the deduction, is used as the reward function R;
step 3: determining the Markov decision process <S, A, R, γ> from the state input and output variables, wherein S is the state-variable input of the operator, A is the action-variable output of the operator, R is the reward function, and γ is the discount factor; the experience pool for reinforcement learning is constructed as

{ (s^i_t, a^i_t, r^i_t, s^i_{t+1}) }

wherein s^i_t is the state-variable input of operator i at time t, a^i_t is the action variable output by operator i at time t, r^i_t is the reward value obtained by operator i through the reward function at time t, and s^i_{t+1} is the updated state variable after operator i takes action a^i_t in state s^i_t; in the algorithm training stage, the information generated by each operator's interaction with the platform is stored in that operator's experience pool and used for neural network training;
step 4: for each operator, establishing a neural network using the Actor-Critic algorithm, wherein the input of the Actor network is the operator's state-variable observation and its output is the operator's action variable; the input of the Critic network is the operator's state-variable observation together with the operator's action variable, and its output is an evaluation function, namely the difference between the operator's real reward value and the estimated reward value;
training the parameters of each operator's neural network step by step, taking minimization of the evaluation function as the training target and combining the information in the experience pool, until the network converges;
and then connecting the trained, converged Actor networks to the war game deduction system, where each operator makes the corresponding decision according to the battlefield situation.
2. The intelligent war game deduction method based on distributed reinforcement learning as claimed in claim 1, characterized in that: the line-of-sight state of the current position of operator i is represented by a line-of-sight view, which is an image formed by the range visible from the current position of operator i; and the line-of-sight view is processed by a convolutional neural network to obtain the line-of-sight state of the current position of operator i.
3. The intelligent war game deduction method based on distributed reinforcement learning as claimed in claim 1, characterized in that: the state information of the own-side operators comprises position, mobility, and remaining hit points; the enemy operator information comprises position, operator type, and remaining hit points.
4. The intelligent war game deduction method based on distributed reinforcement learning as claimed in claim 1, characterized in that: the action variable is a 4-dimensional vector formed by the maneuver, the maneuver position, the attack, and the attack target.
5. The intelligent war game deduction method based on distributed reinforcement learning as claimed in claim 1, characterized in that: the reward function R comprises a control-point capture score R_con, an enemy-destruction score R_des, and a remaining-force score R_rem:
R = R_con + R_des + R_rem
6. The intelligent war game deduction method based on distributed reinforcement learning as claimed in claim 1, characterized in that: the Actor network is a three-layer fully-connected neural network, wherein the number of neurons in the first layer is determined by the dimension of the input operator state variable, the second layer comprises 256 neurons, and the number of neurons in the third layer is determined by the dimension of the operator action variable.
7. The intelligent war game deduction method based on distributed reinforcement learning as claimed in claim 1, characterized in that: the Critic network is a three-layer fully-connected neural network, wherein the number of neurons in the first layer is determined by the dimension of the state variable input to the Actor network plus the dimension of the action variable output by the Actor network, the second layer comprises 128 neurons, and the third layer has 1 neuron.
8. The intelligent war game deduction method based on distributed reinforcement learning as claimed in claim 1, characterized in that: the parameters of each operator's neural network are trained step by step using gradient descent and backpropagation.
CN202110185566.6A 2021-02-10 2021-02-10 Intelligent soldier chess deduction method based on distributed reinforcement learning Active CN113222106B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110185566.6A CN113222106B (en) 2021-02-10 2021-02-10 Intelligent soldier chess deduction method based on distributed reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110185566.6A CN113222106B (en) 2021-02-10 2021-02-10 Intelligent soldier chess deduction method based on distributed reinforcement learning

Publications (2)

Publication Number Publication Date
CN113222106A true CN113222106A (en) 2021-08-06
CN113222106B CN113222106B (en) 2024-04-30

Family

ID=77084912

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110185566.6A Active CN113222106B (en) 2021-02-10 2021-02-10 Intelligent soldier chess deduction method based on distributed reinforcement learning

Country Status (1)

Country Link
CN (1) CN113222106B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113723013A (en) * 2021-09-10 2021-11-30 中国人民解放军国防科技大学 Multi-agent decision method for continuous space chess deduction
CN114091358A (en) * 2022-01-20 2022-02-25 浙江建木智能系统有限公司 Estimation method, device and system based on deduction simulation target area situation
CN114611669A (en) * 2022-03-14 2022-06-10 三峡大学 Intelligent strategy-chess-deducing decision-making method based on double experience pool DDPG network
CN114662655A (en) * 2022-02-28 2022-06-24 南京邮电大学 Attention mechanism-based weapon and chess deduction AI hierarchical decision method and device
CN114722998A (en) * 2022-03-09 2022-07-08 三峡大学 Method for constructing chess deduction intelligent body based on CNN-PPO
CN114880955A (en) * 2022-07-05 2022-08-09 中国人民解放军国防科技大学 War and chess multi-entity asynchronous collaborative decision-making method and device based on reinforcement learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108647374A (en) * 2018-03-22 2018-10-12 中国科学院自动化研究所 Tank tactics Behavior modeling method and system and equipment in ground force's tactics war game game
CN112131786A (en) * 2020-09-14 2020-12-25 中国人民解放军军事科学院评估论证研究中心 Target detection and distribution method and device based on multi-agent reinforcement learning
US20200410351A1 (en) * 2015-07-24 2020-12-31 Deepmind Technologies Limited Continuous control with deep reinforcement learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200410351A1 (en) * 2015-07-24 2020-12-31 Deepmind Technologies Limited Continuous control with deep reinforcement learning
CN108647374A (en) * 2018-03-22 2018-10-12 中国科学院自动化研究所 Tank tactics Behavior modeling method and system and equipment in ground force's tactics war game game
CN112131786A (en) * 2020-09-14 2020-12-25 中国人民解放军军事科学院评估论证研究中心 Target detection and distribution method and device based on multi-agent reinforcement learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Luntong Li et al., "Actor-Critic Learning Control Based on ℓ2-Regularized Temporal-Difference Prediction With Gradient Correction", IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 12 *
Li Chen et al., "Multi-agent decision-making method under the Actor-Critic framework and its application to wargames", Systems Engineering and Electronics, pages 1-4 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113723013A (en) * 2021-09-10 2021-11-30 中国人民解放军国防科技大学 Multi-agent decision method for continuous space chess deduction
CN114091358A (en) * 2022-01-20 2022-02-25 浙江建木智能系统有限公司 Estimation method, device and system based on deduction simulation target area situation
CN114662655A (en) * 2022-02-28 2022-06-24 南京邮电大学 Attention mechanism-based weapon and chess deduction AI hierarchical decision method and device
CN114662655B (en) * 2022-02-28 2024-07-16 南京邮电大学 Attention mechanism-based method and device for deriving AI layering decision by soldier chess
CN114722998A (en) * 2022-03-09 2022-07-08 三峡大学 Method for constructing chess deduction intelligent body based on CNN-PPO
CN114722998B (en) * 2022-03-09 2024-02-02 三峡大学 Construction method of soldier chess deduction intelligent body based on CNN-PPO
CN114611669A (en) * 2022-03-14 2022-06-10 三峡大学 Intelligent strategy-chess-deducing decision-making method based on double experience pool DDPG network
CN114611669B (en) * 2022-03-14 2023-10-13 三峡大学 Intelligent decision-making method for chess deduction based on double experience pool DDPG network
CN114880955A (en) * 2022-07-05 2022-08-09 中国人民解放军国防科技大学 War and chess multi-entity asynchronous collaborative decision-making method and device based on reinforcement learning
CN114880955B (en) * 2022-07-05 2022-09-20 中国人民解放军国防科技大学 War and chess multi-entity asynchronous collaborative decision-making method and device based on reinforcement learning

Also Published As

Publication number Publication date
CN113222106B (en) 2024-04-30

Similar Documents

Publication Publication Date Title
CN113222106A (en) Intelligent military chess deduction method based on distributed reinforcement learning
CN110991545B (en) Multi-agent confrontation oriented reinforcement learning training optimization method and device
CN113893539B (en) Cooperative fighting method and device for intelligent agent
CN115291625A (en) Multi-unmanned aerial vehicle air combat decision method based on multi-agent layered reinforcement learning
CN105678030B (en) Divide the air-combat tactics team emulation mode of shape based on expert system and tactics tactics
CN114460959A (en) Unmanned aerial vehicle group cooperative autonomous decision-making method and device based on multi-body game
CN116596343A (en) Intelligent soldier chess deduction decision method based on deep reinforcement learning
CN112215350A (en) Smart agent control method and device based on reinforcement learning
Shao et al. Cooperative reinforcement learning for multiple units combat in StarCraft
CN111723931B (en) Multi-agent confrontation action prediction method and device
CN114722998B (en) Construction method of soldier chess deduction intelligent body based on CNN-PPO
CN113282100A (en) Unmanned aerial vehicle confrontation game training control method based on reinforcement learning
CN111437605B (en) Method for determining virtual object behaviors and hosting virtual object behaviors
CN114662655B (en) Attention mechanism-based method and device for deriving AI layering decision by soldier chess
CN113705828B (en) Battlefield game strategy reinforcement learning training method based on cluster influence degree
CN114611664A (en) Multi-agent learning method, device and equipment
CN111723941B (en) Rule generation method and device, electronic equipment and storage medium
Zuo A deep reinforcement learning methods based on deterministic policy gradient for multi-agent cooperative competition
Bian et al. Cooperative strike target assignment algorithm based on deep reinforcement learning
CN118001744A (en) Intelligent decision-making method, device and storage medium for deduction of chess
CN116679742B (en) Multi-six-degree-of-freedom aircraft collaborative combat decision-making method
CN117151224A (en) Strategy evolution training method, device, equipment and medium for strong random game of soldiers
CN114254722B (en) Multi-intelligent-model fusion method for game confrontation
Pan et al. An algorithm to estimate enemy's location in WarGame based on pheromone
CN117647994A (en) Collaborative countermeasure method, device, equipment and storage medium for unmanned aerial vehicle cluster

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant