CN114611669B - Intelligent decision-making method for chess deduction based on double experience pool DDPG network - Google Patents

Intelligent decision-making method for chess deduction based on double experience pool DDPG network

Info

Publication number
CN114611669B
CN114611669B
Authority
CN
China
Prior art keywords
data
deduction
network
experience
chess
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210244709.0A
Other languages
Chinese (zh)
Other versions
CN114611669A (en)
Inventor
张震 (Zhang Zhen)
臧兆祥 (Zang Zhaoxiang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Three Gorges University CTGU
Original Assignee
China Three Gorges University CTGU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Three Gorges University CTGU filed Critical China Three Gorges University CTGU
Priority to CN202210244709.0A priority Critical patent/CN114611669B/en
Publication of CN114611669A publication Critical patent/CN114611669A/en
Application granted granted Critical
Publication of CN114611669B publication Critical patent/CN114611669B/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The application discloses an intelligent decision-making method for chess deduction based on a double experience pool DDPG network, which comprises the following steps: obtaining chess deduction data and constructing a double experience pool DDPG model; preprocessing the chess deduction data and vectorizing the preprocessed data to obtain vectorized data; and inputting the vectorized data into the double experience pool DDPG model for training, completing training when the double experience pool DDPG model reaches a preset degree of convergence, and generating intelligent chess deduction decisions based on the trained double experience pool DDPG model. Compared with a general reinforcement learning architecture, the method converges faster, saves training time, and learns the overall strategy more quickly. The double experience pool DDPG structure is applied to chess deduction, and the double experience pools increase the training speed, so that a usable neural network model is trained faster. By screening and using high-quality samples, the dependence of model performance on sample quality is alleviated to some extent.

Description

Intelligent decision-making method for chess deduction based on double experience pool DDPG network
Technical Field
The application belongs to the field of intelligent decision making, and particularly relates to an intelligent decision-making method for chess deduction based on a double experience pool DDPG network.
Background
The purpose of intelligent decision making is to solve complex decision problems with artificial intelligence methods, drawing on human knowledge and relying on computers. A typical complex decision problem is chess deduction (wargaming). Chess deduction is a common confrontation pattern in military exercises: a sand table replaces the battlefield, different chess pieces represent different arms and forces, and field combat is simulated as realistically as possible on the basis of a background database and electronic situation information, so that strategies and tactics can be tested and commanders can be inspired in tactical planning. With the development of artificial intelligence technology, the fusion of intelligent decision making and chess deduction has become a research hotspot in the fields of chess deduction and artificial intelligence, and research on intelligent decision making for chess deduction has already produced many achievements, which are expected to practically improve the combat effectiveness of troops and deepen the intelligentization of military affairs.
Existing intelligent decision-making methods are mainly divided into two types:
Rule type: for example, the decision tree method solves decision problems by specifying in advance the coping strategies to be adopted in different situations. The main problem of this technique is that the situation complexity in chess deduction is high: a rule-based agent needs far too many branches to choose an action by judging the situation, and the complexity of the whole decision tree grows exponentially as the complexity of the problem rises.
Learning type: a network model is built with deep learning and reinforcement learning techniques; the battlefield situation is used as the network input, the actions to be taken by one's own forces are used as the network output, and the network parameters are updated according to some evaluation, so that the whole decision framework is learned. After a certain amount of training, the network model can be used directly for combat. The main limitation of this type of technique is that the convergence rate of the network model is greatly affected by sample quality and is therefore not guaranteed.
Disclosure of Invention
The application aims to provide an intelligent decision-making method for chess deduction based on a double experience pool DDPG network, so as to solve the problems in the prior art.
In order to achieve the above purpose, the application provides an intelligent decision-making method for chess deduction based on a double experience pool DDPG network, comprising the following steps:
obtaining chess deduction data and constructing a double experience pool DDPG model;
preprocessing the chess deduction data, vectorizing the preprocessed data, and obtaining vectorized data;
and inputting the vectorized data into the double experience pool DDPG model for training, completing the training when the double experience pool DDPG model reaches a preset degree of convergence, and generating intelligent chess deduction decisions based on the trained double experience pool DDPG model.
Optionally, the step of obtaining the chess deduction data includes running a chess deduction environment and obtaining the chess deduction data in the chess deduction environment;
the chess deduction data comprises: own entity attribute information, entity attribute information of which an enemy has been found, deduction time, map attribute information, and scoreboard information;
wherein the own entity attribute information comprises the residual blood volume of own units, the positions of own units and the residual ammunition of own units;
the entity attribute information of enemies that have been discovered includes the enemy residual blood volume and the enemy location;
the map attribute information comprises an elevation and a number;
the scoreboard information includes score information that is currently obtained.
Optionally, in preprocessing the chess deduction data, data cleaning is adopted as the preprocessing mode, and the data cleaning includes:
carrying out data extraction on the acquired chess deduction data to obtain normalized data;
and classifying the normalized data and removing redundant data.
Optionally, the process of carrying out data extraction on the acquired chess deduction data to obtain normalized data includes:
when the chess deduction data are extracted, removing non-canonical data from the deduction data to obtain normalized data;
the non-canonical data includes: blank data and scrambled data.
Optionally, the process of classifying the normalized data and removing the redundant data includes:
dividing the normalized data into the own entity attribute information, the entity attribute information of enemies that have been discovered, the deduction time and the scoreboard information;
and eliminating redundant data in the classified data, wherein the redundant data comprises information which is useless for decision.
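As an illustration of the data cleaning described above, the following is a minimal Python sketch; the record fields and the example of a redundant field are assumptions for illustration only.

```python
# Minimal data-cleaning sketch (the field names and the example redundant
# field "aggregated" are assumptions, not the application's actual format).
REQUIRED_KEYS = {"own_units", "enemy_units", "deduction_time", "scoreboard"}
REDUNDANT_FIELDS = {"aggregated"}   # assumed example of information useless for decision making

def is_canonical(record: dict) -> bool:
    """Keep a record only if it is non-empty and contains every required key."""
    return bool(record) and REQUIRED_KEYS.issubset(record.keys())

def clean(records: list) -> list:
    cleaned = []
    for rec in records:
        if not is_canonical(rec):                        # drop blank or scrambled records
            continue
        cleaned.append({k: v for k, v in rec.items()     # classify and keep only useful fields
                        if k in REQUIRED_KEYS and k not in REDUNDANT_FIELDS})
    return cleaned
```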
Optionally, the vectorizing the preprocessed data includes:
encoding the deduction time, the own entity attribute information and the entity attribute information of discovered enemies in a one-hot encoding mode;
and taking the scoreboard information directly as part of the vectorized data, without encoding the map attribute information or the scoreboard information.
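A minimal sketch of this vectorization step is given below; the field names, value ranges and dimensions are assumptions for illustration and are not specified by the application.

```python
import numpy as np

def one_hot(index: int, size: int) -> np.ndarray:
    """Standard one-hot encoding of a discrete value."""
    v = np.zeros(size, dtype=np.float32)
    v[index] = 1.0
    return v

def vectorize(record: dict) -> np.ndarray:
    # Deduction time and discrete unit attributes are one-hot encoded;
    # MAX_STEPS and the position-index range are assumed values.
    MAX_STEPS, NUM_HEXES = 100, 64
    time_vec = one_hot(record["deduction_time"], MAX_STEPS)
    own_vec = np.concatenate([one_hot(u["pos_id"], NUM_HEXES) for u in record["own_units"]])
    enemy_vec = np.concatenate([one_hot(u["pos_id"], NUM_HEXES) for u in record["enemy_units"]])
    # Scoreboard information is used directly, without encoding.
    score_vec = np.asarray(record["scoreboard"], dtype=np.float32)
    return np.concatenate([time_vec, own_vec, enemy_vec, score_vec])
```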
Optionally, the process of constructing the dual experience pool DDPG model includes:
constructing a DDPG neural network based on the DDPG algorithm architecture, wherein the DDPG neural network comprises an Actor network, a Critic network, an Actor_target network and a Critic_target network;
constructing two experience pools for storing experiences generated in the training process, wherein the experience pools are multidimensional arrays;
and constructing the double experience pool DDPG model based on the DDPG neural network and the two experience pools.
Optionally, inputting the vectorized data into the double experience pool DDPG model for training comprises:
inputting the vectorized data into the Actor network, and inputting the obtained value into the Critic network for processing;
updating the Actor_target network based on the parameters of the Actor network at every preset time step, and updating the Critic_target network based on the parameters of the Critic network;
and, each time a training step is completed, storing the current experience into a first experience pool, and, if the reward obtained in the current experience is larger than the average reward in the first experience pool, also storing the current experience into a second experience pool.
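The storage rule for the two experience pools can be sketched as follows; the list-based buffer is a simplification for illustration, but the write rule follows the description above.

```python
import numpy as np

class DualExperiencePool:
    """Pool A stores every experience; pool B additionally stores an experience
    only when its reward exceeds the current average reward in pool A."""

    def __init__(self):
        self.pool_a = []
        self.pool_b = []

    def store(self, state, action, reward, next_state):
        experience = (state, action, reward, next_state)
        self.pool_a.append(experience)
        average_reward = np.mean([e[2] for e in self.pool_a])
        if reward > average_reward:
            self.pool_b.append(experience)
```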
Optionally, in the process of updating the Actor_target network based on the parameters of the Actor network, the Actor network is updated by a gradient descent method; in the process of updating the Critic_target network based on the parameters of the Critic network, the Critic network is also updated by a gradient descent method, and in the updating process the loss function of the Critic network uses the mean square error loss.
The application has the technical effects that:
compared with a general reinforcement learning architecture, the method has the advantages that the convergence speed is higher, the training time is saved, and the whole strategy is learned faster. The double experience pool DDPG structure is applied to chess deduction, and the training speed is improved by the double experience pools, so that an available neural network model is trained faster. By screening and utilizing high quality samples, the problem of model performance dependence on sample quality is ameliorated to some extent.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application. In the drawings:
FIG. 1 is a flow chart of a method in an embodiment of the application;
fig. 2 is a schematic diagram of a training process in an embodiment of the present application.
Detailed Description
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
Example 1
As shown in FIGS. 1-2, this embodiment provides an intelligent decision-making method for chess deduction based on a double experience pool DDPG network, which comprises the following steps:
Step 1: running a chess deduction environment and collecting data;
Step 2: data cleaning, which comprises data extraction, data classification and redundant data removal;
Step 3: vectorizing the text data;
Step 4: constructing the double experience pool DDPG model;
Step 5: inputting the data into the model and filling the two experience pools;
Step 6: training until the model converges.
The step 1 specifically comprises the following steps:
1.1 The chess game used here is a real-time tactical chess game, as distinguished from turn-based chess games.
1.2 The data mainly comprise own entity attribute information, entity attribute information of enemies that have been discovered, map attribute information and scoreboard information. The own entity attribute information comprises the residual blood volume, position, remaining ammunition and the like of each own unit; the entity attribute information of discovered enemies comprises their residual blood volume and position, but not their ammunition; the map attribute information comprises elevation, number and the like; and the scoreboard information is the score information obtained so far.
The step 2 specifically comprises the following steps:
2.1 Each piece of collected data is extracted, and non-canonical data such as blank lines and garbled characters are removed, so that normalized data are obtained.
2.2 Redundant data are removed: according to expert experience, data that are useless for decision making are discarded to reduce the state space. For example, the "aggregated or not" information given by the environment is not helpful to the agent's decision and can be deleted.
The step 3 specifically comprises the following steps:
3.1 The formatted data output by the environment are converted into vector format. The deduction time, own unit information and the obtained enemy unit information are encoded in a one-hot encoding mode; the scoreboard information can be used directly and does not need to be converted by one-hot encoding. "Formatted data" here means that the data format given by the environment is fixed, and the position of each item of data is predetermined by its category.
3.2 To eliminate the influence of different scales on the data, the data are normalized. The normalization formula is as follows:
x'_ij = (x_ij − min(x_i)) / (max(x_i) − min(x_i))
where x'_ij is the value of x_ij after normalization, x_ij is the j-th dimension of the i-th column feature, x_i is the i-th column feature, min(x_i) is the minimum value over all dimensions of column i, and max(x_i) is the maximum value over all dimensions of column i.
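A NumPy equivalent of this column-wise min-max normalization (a sketch, assuming the samples are stacked into a 2-D array with one column per feature):

```python
import numpy as np

def min_max_normalize(x: np.ndarray) -> np.ndarray:
    """x'_ij = (x_ij - min(x_i)) / (max(x_i) - min(x_i)), applied column by column."""
    col_min = x.min(axis=0)
    col_max = x.max(axis=0)
    return (x - col_min) / (col_max - col_min + 1e-8)   # small epsilon avoids division by zero
```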
The step 4 specifically comprises the following steps:
the neural network is built according to a DDPG algorithm architecture, and the DDPG architecture needs to build 4 neural networks, namely an Actor network, a Critic network, an actor_target network and a cirtic_target network.
4.1 The convolution layer uses multiple convolution kernels; different convolution kernels focus on different features, so features are extracted separately for the more important attributes such as blood volume and coordinates. The update formula of the convolutional neural network is as follows:
x_t = σ_cnn(w_cnn ⊙ x_t + b_cnn)
where x_t represents the current state feature, w_cnn represents the filter weights, b_cnn represents the bias parameter, and σ_cnn is the activation function.
4.2 The Actor network is a three-layer neural network: the first layer is a convolution layer whose number of neurons is determined by the dimension of the situation information, the second layer has 128 neurons, and the number of neurons in the third layer is determined by the dimension of the action variables.
4.3 The Critic network is a three-layer fully connected neural network: the number of neurons in the first layer is determined jointly by the dimension of the Actor network's input variables and the dimension of the Actor network's output variables, the second layer has 128 neurons, and the third layer has 1 neuron.
4.4 The Actor_target network has the same structure as the Actor network, and the Critic_target network has the same structure as the Critic network.
4.5 Two experience pools are constructed for storing the experiences obtained in the deduction process. Each experience pool is a multidimensional array with the following structure:
(s_t^i, a_t^i, r_t^i, s_{t+1}^i), where s_t^i is the state at the current moment, a_t^i is the action taken at time t, r_t^i is the reward obtained, s_{t+1}^i is the state at time t+1, and i denotes the i-th experience.
The dimensions of the experience pool are determined by the following formula:
dim=2*state_dim+action_dim+1
where dim represents the dimension of the experience pool, state_dim is the dimension of the situation information, action_dim is the dimension of the action vector, and 1 additional dimension is required for the reward value.
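A minimal PyTorch sketch of this step is given below. The layer sizes follow the description above; the use of plain linear layers in place of the convolution layer of 4.1, and the buffer capacity, are simplifications and assumptions for illustration.

```python
import copy
import numpy as np
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Actor: situation vector -> action vector (sizes state_dim -> 128 -> action_dim).
    The convolutional first layer of 4.1/4.2 is simplified to a linear layer here."""
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.fc1 = nn.Linear(state_dim, 128)
        self.fc2 = nn.Linear(128, action_dim)

    def forward(self, state):
        return torch.tanh(self.fc2(torch.relu(self.fc1(state))))

class Critic(nn.Module):
    """Critic: (state, action) -> scalar value (sizes state_dim + action_dim -> 128 -> 1)."""
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.fc1 = nn.Linear(state_dim + action_dim, 128)
        self.fc2 = nn.Linear(128, 1)

    def forward(self, state, action):
        return self.fc2(torch.relu(self.fc1(torch.cat([state, action], dim=-1))))

def build_double_pool_ddpg(state_dim: int, action_dim: int, capacity: int = 10000):
    actor, critic = Actor(state_dim, action_dim), Critic(state_dim, action_dim)
    actor_target, critic_target = copy.deepcopy(actor), copy.deepcopy(critic)
    # Each experience pool is a multidimensional array with
    # dim = 2 * state_dim + action_dim + 1 columns (s_t, a_t, r_t, s_{t+1}).
    dim = 2 * state_dim + action_dim + 1
    pool_a = np.zeros((capacity, dim), dtype=np.float32)
    pool_b = np.zeros((capacity, dim), dtype=np.float32)
    return actor, critic, actor_target, critic_target, pool_a, pool_b
```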
The step 5 specifically comprises the following steps:
and 5.1, inputting the vectorized data in the step 3 into an Actor network, and mining potential links in situation information. The output value of the Actor network.
And 5.2, splicing the output value of the convolutional neural network and the output value of the Actor network together, and inputting the spliced output value and the output value into the Critic network to obtain the output of the Critic network. And updating the actor_target network and the critic_target network respectively by using the parameters of the Actor network and the parameters of the Critic network at fixed time steps. The method for updating the parameters of the target network is soft-update, and is carried out according to the following formula:
θ_targ ← ρ·θ_targ + (1 − ρ)·θ
Here θ_targ refers to the parameters of the target network; ρ generally takes a large value to ensure that the parameters are updated slowly, which is more robust than the hard update of directly copying the parameters (a short sketch of this soft update follows step 5.3).
5.3 Each experience tuple provided by the environment is stored in experience pool A; if its reward R_t^i is greater than the average reward in experience pool A, the experience is also copied into experience pool B.
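A short sketch of the soft update described in 5.2, assuming the PyTorch networks sketched in step 4:

```python
def soft_update(target_net, source_net, rho: float = 0.95):
    """theta_targ <- rho * theta_targ + (1 - rho) * theta (soft update of target parameters)."""
    for targ_param, param in zip(target_net.parameters(), source_net.parameters()):
        targ_param.data.copy_(rho * targ_param.data + (1.0 - rho) * param.data)
```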
The step 6 specifically comprises the following steps:
6.1 The Actor network is updated using the gradient descent method.
6.2 The loss function of the Critic network uses the mean square error (MSE) loss, and the Critic network is also updated using the gradient descent method.
6.3 The data in the experience pools are updated by overwriting: when a pool is full, old experiences are overwritten by new ones.
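A minimal sketch of the updates in step 6, assuming the networks sketched in step 4, a batch sampled from the experience pools, and a standard DDPG target; the discount factor value is an assumption.

```python
import torch
import torch.nn.functional as F

def update_networks(actor, critic, actor_target, critic_target,
                    actor_optim, critic_optim, batch, gamma: float = 0.9):
    # Tensors drawn from the experience pools; rewards shaped (batch, 1).
    states, actions, rewards, next_states = batch

    # 6.2 Critic update: mean square error loss, gradient descent.
    with torch.no_grad():
        targets = rewards + gamma * critic_target(next_states, actor_target(next_states))
    critic_loss = F.mse_loss(critic(states, actions), targets)
    critic_optim.zero_grad()
    critic_loss.backward()
    critic_optim.step()

    # 6.1 Actor update: gradient descent on the negated Critic value.
    actor_loss = -critic(states, actor(states)).mean()
    actor_optim.zero_grad()
    actor_loss.backward()
    actor_optim.step()
```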
Example two
As shown in FIGS. 1-2, this embodiment provides an intelligent decision-making method for chess deduction based on a double experience pool DDPG network: a chess deduction environment is run and data are collected; data cleaning is performed, comprising data extraction, data classification and redundant data removal; the text data are vectorized; the double experience pool DDPG model is constructed; the experience pools are filled; the network parameters are updated; and training continues until the model converges. The method specifically comprises the following steps:
Step 1: A chess deduction environment is run and combat data are collected, including the state at each step, the actions taken, the partial score and so on. These data can be generated by letting a manually written hard-coded strategy fight against the bot built into the deduction environment.
Step 2: normalizing collected data
x′ ij Is x ij Normalized x ij The value thereafter is the ith column, the jth dimension, the feature, x i Is the ith column feature, min (x i ) Is the minimum of the values in all dimensions of column i, max (x i ) Is the numerical maximum in all dimensions of column i.
Step 3: the DDPG network model of the double experience pool designed by the application is composed of 5 parts, namely an Actor network, a Critic network, an Actor-target network, a Critic-target network and a double experience pool. The input of the Actor network is the current observed state, and the output of the Actor network is the action of each unit of the current state; the input of the Critic network is the current observed state and the current actions of each unit, and the output is an estimated value, and the main process is shown in the figure.
Step 4: the observed state is observable situation information in the environment, and comprises a current score; step number; position of own unit, blood volume, ammunition allowance; the location of the observed enemy units, blood volume; the position of the robotically controlled point. The actions of the above units mainly include movement, masking, shooting, and the like.
In this embodiment, the number of own units is 3, and the number of enemy unit operators is also 3. The specific state information is a 36-dimensional vector composed of the own unit states, the enemy unit states and the scoreboard information returned by the deduction environment. This vector represents the information on the current battlefield that needs attention and is used as the input of the Actor network. The action of each unit is a 15-dimensional vector composed of actions such as maneuvering, maneuver targets, control point seizing and shooting targets.
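One possible way to assemble these vectors is sketched below; the per-unit feature layout is not specified above, so the fields used here are assumptions for illustration and do not reproduce the exact 36-dimensional encoding.

```python
import numpy as np

def build_state_vector(own_units, enemy_units, scoreboard) -> np.ndarray:
    """Concatenate own-unit states, observed enemy-unit states and scoreboard
    information into a single situation vector used as the Actor input.
    The per-unit fields below are assumed for illustration only."""
    own = np.concatenate([[u["x"], u["y"], u["hp"], u["ammo"]] for u in own_units])   # 3 own units
    enemy = np.concatenate([[u["x"], u["y"], u["hp"]] for u in enemy_units])          # 3 enemy units
    score = np.asarray(scoreboard, dtype=np.float32)
    return np.concatenate([own, enemy, score]).astype(np.float32)
```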
Step 5: according to the interaction between the network model and the environment, the current time state S is obtained t Action A at the present moment t The next time state S t+1 Rewarding information R given by score board t . The experience pool was constructed accordingly as follows:
wherein the subscript i represents the ith experience.The double experience pools are marked as experience pool A and experience pool B, wherein the experience pool A normally stores combat data, and the experience pool B only receives R in the experience pool A i t Experience with values above the average prize value.
The reward value is defined as:
R = Σ_i γ^i · r(s_i, a_i)
where γ is the discount factor and r(s_i, a_i) is the reward value obtained by taking action a_i in state s_i.
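A short sketch of this cumulative discounted reward, assuming the per-step rewards of a finished deduction are available as a list (the discount factor value is an assumption):

```python
def discounted_return(rewards, gamma: float = 0.9) -> float:
    """R = sum_i gamma**i * r(s_i, a_i): cumulative discounted reward over an episode."""
    return sum((gamma ** i) * r for i, r in enumerate(rewards))
```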
Step 6: after the data in the experience pool reaches a certain number, experience is extracted from the experience pools A and B according to different proportions to train the network. Wherein the number of neurons of the first layer of the Actor network is 36, the number of neurons of the second layer is 128, and the number of neurons of the third layer is 15. The Critic network has a first layer of neurons 51, a second layer of neurons 128, and a third layer of neurons 1.
Parameters of the Actor network are updated layer by layer using a gradient descent algorithm.
The Actor network update formula is:
∇_θ J(θ) ≈ (1/N) Σ_i ∇_a Q(s_i, a | φ)|_{a=μ_θ(s_i)} · ∇_θ μ_θ(s_i)
where μ_θ denotes the Actor (policy) network with parameters θ and Q(·,· | φ) denotes the Critic network with parameters φ.
The Critic network parameters are likewise updated layer by layer, using a mean square error loss function and the gradient descent method. The Critic network update formula is:
L(φ) = (1/N) Σ_i ( r_i + γ·Q_targ(s_{i+1}, μ_targ(s_{i+1})) − Q(s_i, a_i | φ) )²
where Q_targ and μ_targ denote the Critic_target and Actor_target networks.
the following formula is used when updating the actor_target network:
θ_targ ← ρ·θ_targ + (1 − ρ)·θ
the following formula is used when updating the critic_target network:
φ_targ ← ρ·φ_targ + (1 − ρ)·φ
in this embodiment, ρ is set to 0.95.
The present application is not limited to the above embodiments. Any change or substitution that can readily be conceived by those skilled in the art within the technical scope disclosed by the present application is intended to fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (6)

1. An intelligent decision-making method for chess deduction based on a double experience pool DDPG network, characterized by comprising the following steps:
obtaining chess deduction data and constructing a double experience pool DDPG model;
preprocessing the chess deduction data, vectorizing the preprocessed data, and obtaining vectorized data;
inputting the vectorized data into the double experience pool DDPG model for training, completing training when the double experience pool DDPG model reaches a preset convergence degree, and generating a chess deduction intelligent decision based on the trained double experience pool DDPG model;
the process of constructing the double experience pool DDPG model comprises the following steps:
constructing a DDPG neural network based on the DDPG algorithm architecture, wherein the DDPG neural network comprises an Actor network, a Critic network, an Actor_target network and a Critic_target network;
constructing two experience pools for storing experiences generated in the training process, wherein the experience pools are multidimensional arrays;
constructing a double experience pool DDPG model based on the DDPG neural network and the two experience pools;
the process of inputting the vectorized data into the double experience pool DDPG model for training comprises the following steps:
inputting the vectorized data into the Actor network, and inputting the obtained value into the Critic network for processing;
updating the Actor_target network based on the parameters of the Actor network at every preset time step, and updating the Critic_target network based on the parameters of the Critic network;
when each training step is completed, storing the current experience into a first experience pool, and, if the reward obtained in the current experience is larger than the average reward in the first experience pool, also storing the current experience into a second experience pool;
in the process of updating the Actor_target network based on the parameters of the Actor network, the Actor network is updated by a gradient descent method; in the process of updating the Critic_target network based on the parameters of the Critic network, the Critic network is also updated by a gradient descent method, and in the updating process the loss function of the Critic network uses the mean square error loss.
2. The method of claim 1, wherein the step of obtaining the chess deduction data comprises operating a chess deduction environment and obtaining the chess deduction data in the chess deduction environment;
the chess deduction data comprises: own entity attribute information, entity attribute information of which an enemy has been found, deduction time, map attribute information, and scoreboard information;
wherein the own entity attribute information comprises the residual blood volume of own units, the positions of own units and the residual ammunition of own units;
the entity attribute information of enemies that have been discovered includes the enemy residual blood volume and the enemy location;
the map attribute information comprises an elevation and a number;
the scoreboard information includes score information that is currently obtained.
3. The method according to claim 2, wherein in preprocessing the chess deduction data, data cleaning is adopted as the preprocessing mode, and the data cleaning includes:
carrying out data extraction on the acquired chess deduction data to obtain normalized data;
and classifying the normalized data and removing redundant data.
4. The method according to claim 3, wherein the process of carrying out data extraction on the acquired chess deduction data to obtain normalized data comprises:
when the chess deduction data are extracted, removing non-canonical data from the deduction data to obtain normalized data;
the non-canonical data includes: blank data and scrambled data.
5. A method according to claim 3, wherein classifying the normalized data and eliminating redundant data comprises:
dividing the normalized data into the own entity attribute information, the entity attribute information of enemies that have been discovered, the deduction time and the scoreboard information;
and eliminating redundant data in the classified data, wherein the redundant data comprises information which is useless for decision.
6. The method of claim 2, wherein vectorizing the preprocessed data comprises:
coding deduction time, own entity attribute information and entity attribute information of discovered enemies based on a one-hot coding mode;
and directly taking the scoreboard information as one of the vectorized data without encoding the map attribute information and the scoreboard information.
CN202210244709.0A 2022-03-14 2022-03-14 Intelligent decision-making method for chess deduction based on double experience pool DDPG network Active CN114611669B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210244709.0A CN114611669B (en) 2022-03-14 2022-03-14 Intelligent decision-making method for chess deduction based on double experience pool DDPG network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210244709.0A CN114611669B (en) 2022-03-14 2022-03-14 Intelligent decision-making method for chess deduction based on double experience pool DDPG network

Publications (2)

Publication Number Publication Date
CN114611669A CN114611669A (en) 2022-06-10
CN114611669B true CN114611669B (en) 2023-10-13

Family

ID=81863363

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210244709.0A Active CN114611669B (en) 2022-03-14 2022-03-14 Intelligent decision-making method for chess deduction based on double experience pool DDPG network

Country Status (1)

Country Link
CN (1) CN114611669B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109784490B (en) * 2019-02-02 2020-07-03 北京地平线机器人技术研发有限公司 Neural network training method and device and electronic equipment

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107451195A (en) * 2017-07-03 2017-12-08 三峡大学 One kind is based on the visual war game simulation system of big data
CN112131786A (en) * 2020-09-14 2020-12-25 中国人民解放军军事科学院评估论证研究中心 Target detection and distribution method and device based on multi-agent reinforcement learning
CN112801249A (en) * 2021-02-09 2021-05-14 中国人民解放军国防科技大学 Intelligent chess and card identification and positioning device for tabletop chess deduction and use method thereof
CN113222106A (en) * 2021-02-10 2021-08-06 西北工业大学 Intelligent military chess deduction method based on distributed reinforcement learning
CN113341958A (en) * 2021-05-21 2021-09-03 西北工业大学 Multi-agent reinforcement learning movement planning method with mixed experience
CN113723013A (en) * 2021-09-10 2021-11-30 中国人民解放军国防科技大学 Multi-agent decision method for continuous space chess deduction

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Combat Unit Selection Based on Hybrid Neural Network in Real-Time Strategy Games; Zhaoxiang Zang et al.; ICONIP 2021: Neural Information Processing; pp. 344-352 *
Multi-critic DDPG Method and Double Experience Replay; Jiao Wu et al.; 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC); pp. 165-171 *
Research on Deep Reinforcement Learning Exploration Strategy in Wargame Deduction; Tongfei Shang et al.; 2019 2nd International Conference on Information Systems and Computer Aided Education (ICISCAE); pp. 622-625 *
Research on Key Technologies of Personalized Recommendation Systems Based on Machine Learning; Ma Mengdi; China Masters' Theses Full-text Database (Information Science and Technology); pp. I138-1445 *
A Decision-Making Method Framework for Wargame Deduction Based on Deep Reinforcement Learning; Cui Wenhua; Li Dong; Tang Yubo; Liu Shaojun; National Defense Technology (02); pp. 118-126 *

Also Published As

Publication number Publication date
CN114611669A (en) 2022-06-10

Similar Documents

Publication Publication Date Title
CN112329348B (en) Intelligent decision-making method for military countermeasure game under incomplete information condition
CN112465151A (en) Multi-agent federal cooperation method based on deep reinforcement learning
CN108819948B (en) Driver behavior modeling method based on reverse reinforcement learning
CN108791302B (en) Driver behavior modeling system
CN111461294B (en) Intelligent aircraft brain cognitive learning method facing dynamic game
CN116757497B (en) Multi-mode military intelligent auxiliary combat decision-making method based on graph-like perception transducer
CN113625569B (en) Small unmanned aerial vehicle prevention and control decision method and system based on hybrid decision model
CN112906888B (en) Task execution method and device, electronic equipment and storage medium
CN113298260B (en) Confrontation simulation deduction method based on deep reinforcement learning
Jaafra et al. A review of meta-reinforcement learning for deep neural networks architecture search
CN108891421B (en) Method for constructing driving strategy
CN115511069A (en) Neural network training method, data processing method, device and storage medium
CN108944940B (en) Driver behavior modeling method based on neural network
Qi et al. Battle damage assessment based on an improved Kullback-Leibler divergence sparse autoencoder
CN114611669B (en) Intelligent decision-making method for chess deduction based on double experience pool DDPG network
CN114722998B (en) Construction method of soldier chess deduction intelligent body based on CNN-PPO
Zhang et al. Intelligent battlefield situation comprehension method based on deep learning in wargame
CN114004282A (en) Method for extracting deep reinforcement learning emergency control strategy of power system
CN112906871A (en) Temperature prediction method and system based on hybrid multilayer neural network model
CN112926729B (en) Man-machine confrontation intelligent agent strategy making method
CN115238832B (en) CNN-LSTM-based air formation target intention identification method and system
CN112295232B (en) Navigation decision making method, AI model training method, server and medium
CN117786469A (en) Air combat rule generation method and device based on network interpretability analysis
CN112329948B (en) Multi-agent strategy prediction method and device
Puebla et al. Learning Relational Rules from Rewards

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant