CN109598342A - Decision network model self-game training method and system - Google Patents

Decision network model self-game training method and system

Info

Publication number
CN109598342A
CN109598342A
Authority
CN
China
Prior art keywords
network
game
module
variation
decision
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811410380.0A
Other languages
Chinese (zh)
Other versions
CN109598342B (en)
Inventor
任金磊
路鹰
张耀磊
李君
黄虎
郑本昌
张佳
晁鲁静
倪越
吕静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Academy of Launch Vehicle Technology CALT
Original Assignee
China Academy of Launch Vehicle Technology CALT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Academy of Launch Vehicle Technology CALT
Priority to CN201811410380.0A
Publication of CN109598342A
Application granted
Publication of CN109598342B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

A self-play training method for a decision network model includes the following steps. Step 1: mutate the initial network parameters of the EN network using a simulated annealing algorithm to obtain a red-side EN network and a blue-side EN network. Step 2: place the red-side EN network and the blue-side EN network of step 1 into a confrontation environment for game confrontation, and record the decision data and EN values at key confrontation nodes. Step 3: save the decision data and EN values of the winning side of the confrontation in step 2 as effective samples, and discard the losing side's data. Step 4: train the EN network on the effective samples of step 3 to obtain optimized network parameters, and use the optimized network parameters as the new initial network parameters. Step 5: repeat steps 1 to 4 in a loop to realize self-play training. By using this self-play training method, the present invention can form a hierarchical AI intelligent decision-making agent and provide high-level decision support for game commanders.

Description

Decision network model self-game training method and system
Technical field
The present invention relates to a self-game (self-play) training method and system for decision network models, and belongs to the field of artificial intelligence technology.
Background art
In recent years, artificial intelligence technology has developed rapidly and made great progress in autonomous gaming, reaching or surpassing the highest human level in fields such as board-game confrontation, image and speech recognition, and simple game confrontation. Military powers, with the United States as a representative, have invested large amounts of research funding in AI-based equipment combat command and confrontation control. It is foreseeable that artificial intelligence will play an increasingly important role in the decision-making domain: intelligent simulation deduction can effectively improve the training level of commanders, and the use of intelligent decision support is an inevitable trend of future development. Representative training methods at present include the AlphaGo Zero self-play training strategy, the error backpropagation learning algorithm, and Monte Carlo tree search (MCTS).
Self-play training technology has achieved epoch-making, widely noted results in the field of Go. A core technique of AlphaGo Zero, developed by DeepMind, is self-play: two intertwined copies of the model fight each other repeatedly, and the model continuously evolves on its own.
In addition, supervised learning training represented by the error backpropagation learning algorithm (BP algorithm for short) has become the standard procedure for training deep neural network models. In terms of network structure, a deep neural network has more hidden layers, and more neurons per layer, than a traditional artificial neural network.
The Monte Carlo tree search (MCTS) strategy is only applicable to tree-structured games such as Go, in which self-play training can randomly select a single path from among many candidate paths.
Summary of the invention
The technical problem to be solved by the present invention is to overcome the deficiencies of the prior art and to provide a self-play training method and system for a decision network model. By mutating the parameters of a single-output decision network within a self-play training loop, the method effectively improves parameter search efficiency across game iterations and solves the problems of sample scarcity and of game confrontation faced by single-output decision networks in intelligent decision making.
The object of the invention is achieved by the following technical solution:
A self-play training method for a decision network model includes the following steps:
Step 1: mutate the initial network parameters of the EN network using a simulated annealing algorithm to obtain a red-side EN network and a blue-side EN network;
Step 2: place the red-side EN network and the blue-side EN network of step 1 into a confrontation environment for game confrontation, and record the decision data and EN values at key confrontation nodes;
Step 3: save the decision data and EN values of the winning side of the confrontation in step 2 as effective samples, and discard the losing side's data;
Step 4: train the EN network on the effective samples of step 3 to obtain optimized network parameters, and use the optimized network parameters as the new initial network parameters;
Step 5: repeat steps 1 to 4 in a loop to realize self-play training.
In the above self-play training method for a decision network model, the confrontation environment in step 2 is a symmetric game confrontation scene under incomplete-information conditions.
In the above self-play training method for a decision network model, the effective samples in step 4 are learned using a backpropagation algorithm and then fed into the EN network for training.
In the above self-play training method for a decision network model, the initial network parameters in step 1 are mutated using a simulated annealing algorithm, and the mutation of the initial network parameters is a random mutation.
In the above self-play training method for a decision network model, the EN network is composed of multiple EN sub-networks; the feature input of each EN sub-network is of the same type, and the network structure of each EN sub-network is identical.
In the above self-play training method for a decision network model, the number of loop repetitions in step 5 is greater than or equal to 100,000.
In the above self-play training method for a decision network model, the decision network model is a single-output decision network model. The training loop is sketched below.
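The loop of steps 1 to 5 can be summarized in the following minimal sketch. The Gaussian form of the random mutation, the geometric annealing schedule, and the placeholder interfaces fight and train_backprop (standing in for the confrontation environment and the backpropagation training routine) are assumptions; the disclosure does not fix them at this level.

    import random

    def self_play_training(initial_params, fight, train_backprop,
                           t0=1.0, decay=0.5, iterations=100_000):
        # sketch of steps 1 to 5; all interfaces are hypothetical
        params, temperature = list(initial_params), t0
        for _ in range(iterations):
            # step 1: mutate the shared parameters twice, giving the
            # red-side and blue-side EN networks
            red = [w + random.gauss(0.0, temperature) for w in params]
            blue = [w + random.gauss(0.0, temperature) for w in params]
            # steps 2 and 3: game confrontation; keep only the winning
            # side's decision data and EN values as effective samples
            effective_samples = fight(red, blue)
            # step 4: train on the effective samples; the optimized
            # parameters become the new initial parameters
            params = train_backprop(params, effective_samples)
            temperature *= decay      # lower the annealing temperature
        return params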
A self-play training system for a decision network model includes a network parameter mutation module, a game confrontation module, a data selection module, a network training module, and a loop repetition module;
The network parameter mutation module mutates the initial network parameters of the EN network using a simulated annealing algorithm to obtain a mutated red-side EN network and blue-side EN network, which are then output to the game confrontation module;
The game confrontation module places the mutated red-side EN network and blue-side EN network into a confrontation environment for game confrontation, records the decision data and EN values at key confrontation nodes, and then outputs them to the data selection module;
The data selection module saves the decision data and EN values of the winning side of the confrontation as effective samples, and then outputs the saved decision data and EN values to the network training module;
The network training module trains the EN network on the effective samples to obtain optimized network parameters, and outputs the optimized network parameters, as the new initial network parameters, to the loop repetition module;
The loop repetition module outputs the initial network parameters to the network parameter mutation module, realizing self-play training.
In the above self-play training system for a decision network model, the network training module learns the effective samples using a backpropagation algorithm and then feeds them into the EN network for training.
In the above self-play training system for a decision network model, the network parameter mutation module mutates the initial network parameters using a simulated annealing algorithm, and the mutation of the initial network parameters is a random mutation.
In the above self-play training system for a decision network model, the confrontation environment in the game confrontation module is a symmetric game confrontation scene under incomplete-information conditions.
Compared with the prior art, the present invention has the following beneficial effects:
(1) By using the self-play training method, the present invention can form a hierarchical AI intelligent decision-making agent and provide high-level decision support for game commanders;
(2) The quantitative evaluation method for the situational strength of the two confrontation sides can be effectively applied to situation analysis and assessment problems in complex confrontation environments, achieving accurate and rapid strength judgment in the self-play environment;
(3) The present invention covers symmetric game confrontation scenes under incomplete-information conditions, so it is widely applicable and highly practical;
(4) The method and system of the present invention obtain valid data from a large number of training confrontations, so their reliability and accuracy are high.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of the method of the present invention;
Fig. 2 is a diagram of self-play training sample generation according to the present invention;
Fig. 3 is a structure diagram of the EN network of the present invention;
Fig. 4 is a structure diagram of the EN1 network of the present invention, with hp as input;
Fig. 5 is a structure diagram of the EN2 network of the present invention, with detection status as input.
Specific embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, embodiments of the invention are described in further detail below with reference to the accompanying drawings.
A self-play training method for a decision network model, as shown in Fig. 1, in which the decision network model is a single-output decision network model, includes the following steps:
Step 1: randomly mutate the initial network parameters of the EN network using a simulated annealing algorithm to obtain a red-side EN network and a blue-side EN network. The EN network is composed of multiple EN sub-networks; the feature input of each EN sub-network is of the same type, and the network structure of each EN sub-network is identical.
Step 2: place the red-side EN network and the blue-side EN network of step 1 into a symmetric game confrontation scene under incomplete-information conditions for game confrontation, and record the decision data and EN values at key confrontation nodes.
Step 3: save the decision data and EN values of the winning side of the confrontation in step 2 as effective samples, and discard the losing side's data.
Step 4: learn the effective samples of step 3 using the backpropagation algorithm, then feed them into the EN network for training to obtain optimized network parameters, and use the optimized network parameters as the new initial network parameters (a sketch of this training step follows the step list below).
Step 5: repeat steps 1 to 4 in a loop to realize self-play training; the number of loop repetitions is greater than or equal to 100,000.
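A minimal sketch of the backpropagation training in step 4 is given below for one EN sub-network of the four-layer fully connected form described in the embodiment (input layer, two hidden layers, one-dimensional output). The mean-squared-error loss, the tanh activation, the learning rate, and the absence of bias terms are assumptions; the disclosure states only that the error backpropagation algorithm is applied to the effective samples.

    import numpy as np

    def train_backprop(W, samples, lr=0.01, epochs=10):
        # W: [W0 (1,h), W1 (h,h), W2 (h,1)], weights of a four-layer tanh MLP
        # samples: (feature_value, en_value) pairs recorded from the winning side
        for _ in range(epochs):
            for x, y in samples:
                a0 = np.array([[float(x)]])       # input layer
                a1 = np.tanh(a0 @ W[0])           # hidden layer a(2)
                a2 = np.tanh(a1 @ W[1])           # hidden layer a(3)
                out = a2 @ W[2]                   # scalar EN output
                d_out = out - y                   # gradient of 0.5 * (out - y)**2
                d_z2 = (d_out @ W[2].T) * (1.0 - a2 ** 2)
                d_z1 = (d_z2 @ W[1].T) * (1.0 - a1 ** 2)
                grads = (a0.T @ d_z1, a1.T @ d_z2, a2.T @ d_out)
                for Wi, g in zip(W, grads):
                    Wi -= lr * g                  # in-place gradient-descent update
        return W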
A self-play training system for a decision network model includes a network parameter mutation module, a game confrontation module, a data selection module, a network training module, and a loop repetition module;
The network parameter mutation module randomly mutates the initial network parameters of the EN network using a simulated annealing algorithm to obtain a mutated red-side EN network and blue-side EN network, which are then output to the game confrontation module;
The game confrontation module places the mutated red-side EN network and blue-side EN network into a symmetric game confrontation scene under incomplete-information conditions for game confrontation, records the decision data and EN values at key confrontation nodes, and then outputs them to the data selection module;
The data selection module saves the decision data and EN values of the winning side of the confrontation as effective samples, and then outputs the saved decision data and EN values to the network training module;
The network training module learns the effective samples using the backpropagation algorithm, then feeds them into the EN network for training to obtain optimized network parameters, and outputs the optimized network parameters, as the new initial network parameters, to the loop repetition module;
The loop repetition module outputs the initial network parameters to the network parameter mutation module, realizing self-play training.
Embodiment:
The present invention addresses the dynamic decision-making requirements of strong confrontation games under the incomplete-information conditions of complex scenes. Based on self-play training technology and dynamic non-cooperative game theory, with the value evaluation network (hereinafter, EN network) as the core, red-side and blue-side initial networks are generated as mirror images. Before each confrontation, the network parameters of the red-side and blue-side initial networks are randomly mutated using a simulated annealing algorithm, and the mutated red-side AI and blue-side AI carry out game confrontation in a grafted, symmetric confrontation scene. Once enough confrontation samples have been collected, the decision data and EN values of the winning side are retained as effective samples, and the invalid samples of the losing side are picked out and discarded. The effective samples are learned by the backpropagation algorithm, training a strengthened version of the EN network after one round of self-play; the new EN network replaces the initial network, mutation and self-play training continue, and this cycle achieves the purpose of evolution. Self-play training sample generation is shown in Fig. 2.
(1) EN network
The entire EN network is composed of several sub-networks {EN1, EN2, ..., ENn}. Each EN sub-network has a feature input of the same type and an identical network structure, and the sub-network outputs serve as the inputs from which the output of the entire EN is finally obtained. The present invention is illustrated with a fully connected network structure as an example (it can likewise be extended to network structures such as convolutional neural networks). This splitting method allows each EN sub-network to be trained individually, which improves training efficiency. Another advantage is ease of subsequent expansion: when a new understanding of the confrontation scene requires adding a new feature type to the EN, the corresponding sub-network can be trained first and then, once it performs well, added to the training of the whole network, improving the accuracy of the EN.
In the present invention, the EN network is composed of two sub-networks {EN1, EN2}, as shown in Fig. 3. The network input feature of EN1 is the combat-unit hit-point value (hp), and the network is a four-layer network with two hidden layers. The input hp feature takes values in the discretized space {0, 1, 2, 3, 4, 5, 6}, and the output is a one-dimensional real value representing the strength of our side or of the enemy as determined by the combat-unit hp: the higher the hp, the larger the EN1 output, and as hp decreases, EN1 gradually decreases. The EN1 network structure is designed as shown in Fig. 4, where hp is the input layer, a(2) and a(3) are the hidden layers, and EN is the output layer.
The network input feature of EN2 is the detection status (Ship_Detect) of a combat unit, and its network structure is identical to that of EN1. EN2 represents the strength of our side or of the enemy as determined by whether the combat units have been detected; the detection status is inversely related to the EN2 value, i.e., when no ship has been detected, EN2 is large, and as combat units are detected one by one, EN2 gradually decreases. The EN2 network structure is designed as shown in Fig. 5. An illustrative sketch of the two sub-networks follows.
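For illustration only, the two sub-networks and their combination might be organized as in the sketch below. The hidden-layer width, the tanh activation, and the plain summation used to merge the EN1 and EN2 outputs into the overall EN value are assumptions; the disclosure fixes only the four-layer depth, the same-type feature inputs, the [-10, 10] parameter initialization, and the one-dimensional real output.

    import numpy as np

    class ENSubNet:
        # one EN sub-network: input layer, hidden layers a(2) and a(3), output EN
        def __init__(self, hidden=8, rng=None):
            rng = rng if rng is not None else np.random.default_rng()
            # parameters drawn uniformly from [-10, 10], per the embodiment
            self.W = [rng.uniform(-10.0, 10.0, size=s)
                      for s in [(1, hidden), (hidden, hidden), (hidden, 1)]]

        def forward(self, feature):
            a = np.array([[float(feature)]])      # e.g. hp in {0, 1, ..., 6}
            a = np.tanh(a @ self.W[0])            # hidden layer a(2)
            a = np.tanh(a @ self.W[1])            # hidden layer a(3)
            return (a @ self.W[2]).item()         # one-dimensional real output

    # the overall EN value combines the sub-network outputs; the combination
    # is not spelled out in the disclosure, so a plain sum is assumed here
    def en_value(en1, en2, hp, ship_detect):
        return en1.forward(hp) + en2.forward(ship_detect)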
(2) Self-play training method
The present invention learns the effective samples by the backpropagation algorithm to train a strengthened version of the EN network, replaces the initial network with the new EN network, and continues mutation and self-play training, cycling in this way to achieve evolution. The specific training steps are as follows:
Step 1: generate the self-play networks;
Step 2: for the initial-state network EN0, randomly draw the network parameters from [-10, 10] according to the network model, and denote them W0;
Step 3: mutate W0 according to the simulated annealing initial temperature t0 to generate two mutated value networks EN0A and EN0B, whose parameters are denoted W0A and W0B;
Step 4: place the two mutated networks into the countermeasure system for self-play confrontation, retain the winning side's samples as effective samples, and discard the losing side's data;
Step 5: train the network parameters W0 of EN0 on the effective samples using the error backpropagation algorithm to obtain the evolved network EN1, whose corresponding network parameters are W1;
Step 6: take a new temperature according to the temperature drop coefficient a = 0.5 and mutate EN1;
Step 7: repeat steps 4 to 6 above in a loop to realize self-play training. The mutation schedule is illustrated in the sketch below.
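The mutation schedule of steps 2, 3 and 6 can be illustrated as follows. The [-10, 10] initialization range and the temperature drop coefficient a = 0.5 come from the steps above; the Gaussian perturbation, the parameter count, and the numeric value of the initial temperature t0 are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    def anneal_mutate(w, temperature):
        # random mutation whose magnitude scales with the current temperature
        return w + rng.normal(0.0, temperature, size=w.shape)

    w0 = rng.uniform(-10.0, 10.0, size=128)   # step 2: initial parameters W0
    t = 2.0                                   # step 3: initial temperature t0 (assumed value)
    a = 0.5                                   # step 6: temperature drop coefficient

    for _ in range(3):
        w0a = anneal_mutate(w0, t)            # mutated red-side parameters W0A
        w0b = anneal_mutate(w0, t)            # mutated blue-side parameters W0B
        # steps 4 and 5: self-play confrontation and backpropagation
        # training would update w0 here using the winning side's samples
        t = a * t                             # step 6: take the new temperature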
The method of the present invention was applied to ship position prediction analysis, and the trained network model was subjected to confrontation testing; the test results are shown in Table 1 below. The results show that the trained ship-position prediction network model achieves an average accuracy of 81.8% in predicting enemy ships, whereas situation prediction without the ship-position prediction network model has an average accuracy of 50.07%.
Table 1
Model                                         Average prediction accuracy
With ship-position prediction network         81.8%
Without ship-position prediction network      50.07%
In addition, the present invention has the following features:
(1) Implementing the self-play training method of the present invention requires a simulation platform;
(2) Achieving the effect of the present invention requires training over at least 100,000 rounds of confrontation;
(3) The application scenario of the present invention must be a symmetric game confrontation scene.
The difference between the self-play training technique of AlphaGo Zero and that of the present invention is that the present invention mainly solves the training of single-output decision network models.
The difference between the present invention and the BP algorithm is that the present invention mainly solves game confrontation under symmetric scenes and is a training method realized without pre-existing samples.
At the same time, the self-play training of the present invention requires mutating the parameters of the value evaluation network, for which MCTS is clearly unsuitable.
Content not described in detail in this description belongs to techniques well known to those skilled in the art.

Claims (11)

1. A self-play training method for a decision network model, characterized by comprising the following steps:
Step 1: mutating the initial network parameters of an EN network using a simulated annealing algorithm to obtain a red-side EN network and a blue-side EN network after mutation;
Step 2: placing the red-side EN network and the blue-side EN network of step 1 into a confrontation environment for game confrontation, and recording the decision data and EN values at key confrontation nodes;
Step 3: saving the decision data and EN values of the winning side of the confrontation in step 2 as effective samples, and discarding the losing side's data;
Step 4: training the EN network on the effective samples of step 3 to obtain optimized network parameters, and using the optimized network parameters as the new initial network parameters;
Step 5: repeating steps 1 to 4 in a loop to realize self-play training.
2. The self-play training method for a decision network model according to claim 1, characterized in that: the confrontation environment in step 2 is a symmetric game confrontation scene under incomplete-information conditions.
3. The self-play training method for a decision network model according to claim 1, characterized in that: the effective samples in step 4 are learned using a backpropagation algorithm and then fed into the EN network for training.
4. The self-play training method for a decision network model according to claim 1, characterized in that: in step 1, the initial network parameters are mutated using a simulated annealing algorithm, and the mutation of the initial network parameters is a random mutation.
5. The self-play training method for a decision network model according to claim 1, characterized in that: the EN network is composed of multiple EN sub-networks, the feature input of each EN sub-network is of the same type, and the network structure of each EN sub-network is identical.
6. The self-play training method for a decision network model according to claim 1, characterized in that: the number of loop repetitions in step 5 is greater than or equal to 100,000.
7. The self-play training method for a decision network model according to claim 1, characterized in that: the decision network model is a single-output decision network model.
8. A self-play training system for a decision network model, characterized by comprising a network parameter mutation module, a game confrontation module, a data selection module, a network training module, and a loop repetition module;
the network parameter mutation module mutates the initial network parameters of an EN network using a simulated annealing algorithm to obtain a mutated red-side EN network and blue-side EN network, which are then output to the game confrontation module;
the game confrontation module places the mutated red-side EN network and blue-side EN network into a confrontation environment for game confrontation, records the decision data and EN values at key confrontation nodes, and then outputs them to the data selection module;
the data selection module saves the decision data and EN values of the winning side of the confrontation as effective samples, and then outputs the saved decision data and EN values to the network training module;
the network training module trains the EN network on the effective samples to obtain optimized network parameters, and outputs the optimized network parameters, as the new initial network parameters, to the loop repetition module;
the loop repetition module outputs the initial network parameters to the network parameter mutation module, realizing self-play training.
9. The self-play training system for a decision network model according to claim 8, characterized in that: the network training module learns the effective samples using a backpropagation algorithm and then feeds them into the EN network for training.
10. The self-play training system for a decision network model according to claim 8, characterized in that: the network parameter mutation module mutates the initial network parameters using a simulated annealing algorithm, and the mutation of the initial network parameters is a random mutation.
11. The self-play training system for a decision network model according to claim 8, characterized in that: the confrontation environment in the game confrontation module is a symmetric game confrontation scene under incomplete-information conditions.
CN201811410380.0A 2018-11-23 2018-11-23 Decision network model self-game training method and system Active CN109598342B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811410380.0A CN109598342B (en) 2018-11-23 2018-11-23 Decision network model self-game training method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811410380.0A CN109598342B (en) 2018-11-23 2018-11-23 Decision network model self-game training method and system

Publications (2)

Publication Number Publication Date
CN109598342A (en) 2019-04-09
CN109598342B CN109598342B (en) 2021-07-13

Family

ID=65958708

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811410380.0A Active CN109598342B (en) 2018-11-23 2018-11-23 Decision network model self-game training method and system

Country Status (1)

Country Link
CN (1) CN109598342B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110782039A (en) * 2019-10-11 2020-02-11 南京摄星智能科技有限公司 Artificial intelligence instant combat guide platform based on layered structure and multiple modules
CN110852436A (en) * 2019-10-18 2020-02-28 桂林力港网络科技股份有限公司 Data processing method, device and storage medium for electronic poker game
CN110841295A (en) * 2019-11-07 2020-02-28 腾讯科技(深圳)有限公司 Data processing method based on artificial intelligence and related device
CN111598253A (en) * 2019-05-13 2020-08-28 谷歌有限责任公司 Training machine learning models using teacher annealing
CN111667075A (en) * 2020-06-12 2020-09-15 杭州浮云网络科技有限公司 Service execution method, device and related equipment
CN112380780A (en) * 2020-11-27 2021-02-19 中国运载火箭技术研究院 Symmetric scene grafting method for asymmetric confrontation scene self-game training
CN112434791A (en) * 2020-11-13 2021-03-02 北京圣涛平试验工程技术研究院有限责任公司 Multi-agent strong countermeasure simulation method and device and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105426969A (en) * 2015-08-11 2016-03-23 浙江大学 Game strategy generation method of non-complete information
US20160252902A1 (en) * 2011-11-08 2016-09-01 United States Of America, As Represented By The Secretary Of The Navy System and Method for Predicting An Adequate Ratio of Unmanned Vehicles to Operators
CN106503642A (en) * 2016-10-18 2017-03-15 长园长通新材料股份有限公司 A kind of model of vibration method for building up for being applied to optical fiber sensing system
CN107729953A (en) * 2017-09-18 2018-02-23 清华大学 Robot plume method for tracing based on continuous state behavior domain intensified learning
CN107958206A (en) * 2017-11-07 2018-04-24 北京临近空间飞行器系统工程研究所 A kind of aircraft surface heat flux unit temp measurement data preprocess method
CN108170531A (en) * 2017-12-26 2018-06-15 北京工业大学 A kind of cloud data center request stream scheduling method based on depth belief network
US20180247107A1 (en) * 2015-09-30 2018-08-30 Siemens Healthcare Gmbh Method and system for classification of endoscopic images using deep decision networks

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160252902A1 (en) * 2011-11-08 2016-09-01 United States Of America, As Represented By The Secretary Of The Navy System and Method for Predicting An Adequate Ratio of Unmanned Vehicles to Operators
CN105426969A (en) * 2015-08-11 2016-03-23 浙江大学 Game strategy generation method of non-complete information
US20180247107A1 (en) * 2015-09-30 2018-08-30 Siemens Healthcare Gmbh Method and system for classification of endoscopic images using deep decision networks
CN106503642A (en) * 2016-10-18 2017-03-15 长园长通新材料股份有限公司 A kind of model of vibration method for building up for being applied to optical fiber sensing system
CN107729953A (en) * 2017-09-18 2018-02-23 清华大学 Robot plume method for tracing based on continuous state behavior domain intensified learning
CN107958206A (en) * 2017-11-07 2018-04-24 北京临近空间飞行器系统工程研究所 A kind of aircraft surface heat flux unit temp measurement data preprocess method
CN108170531A (en) * 2017-12-26 2018-06-15 北京工业大学 A kind of cloud data center request stream scheduling method based on depth belief network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
U. Janjarassuk et al.: "A Simulated Annealing Algorithm to the Stochastic Network Interdiction Problem", IEEE *
顾佼佼 et al.: "Air combat maneuver decision framework based on game theory and the Memetic algorithm", Electronics Optics & Control (《电光与控制》) *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111598253A (en) * 2019-05-13 2020-08-28 谷歌有限责任公司 Training machine learning models using teacher annealing
CN110782039A (en) * 2019-10-11 2020-02-11 南京摄星智能科技有限公司 Artificial intelligence instant combat guide platform based on layered structure and multiple modules
CN110782039B (en) * 2019-10-11 2021-10-01 南京摄星智能科技有限公司 Artificial intelligence instant combat guide platform based on layered structure and multiple modules
CN110852436A (en) * 2019-10-18 2020-02-28 桂林力港网络科技股份有限公司 Data processing method, device and storage medium for electronic poker game
CN110852436B (en) * 2019-10-18 2023-08-01 桂林力港网络科技股份有限公司 Data processing method, device and storage medium for electronic poker game
CN110841295A (en) * 2019-11-07 2020-02-28 腾讯科技(深圳)有限公司 Data processing method based on artificial intelligence and related device
CN111667075A (en) * 2020-06-12 2020-09-15 杭州浮云网络科技有限公司 Service execution method, device and related equipment
CN112434791A (en) * 2020-11-13 2021-03-02 北京圣涛平试验工程技术研究院有限责任公司 Multi-agent strong countermeasure simulation method and device and electronic equipment
CN112380780A (en) * 2020-11-27 2021-02-19 中国运载火箭技术研究院 Symmetric scene grafting method for asymmetric confrontation scene self-game training

Also Published As

Publication number Publication date
CN109598342B (en) 2021-07-13

Similar Documents

Publication Publication Date Title
CN109598342A (en) A kind of decision networks model is from game training method and system
CN113420326B (en) Deep reinforcement learning-oriented model privacy protection method and system
CN114358141A (en) Multi-agent reinforcement learning method oriented to multi-combat-unit cooperative decision
CN114757351B (en) Defense method for resisting attack by deep reinforcement learning model
Chong et al. Observing the evolution of neural networks learning to play the game of Othello
CN113052289B (en) Method for selecting cluster hitting position of unmanned ship based on game theory
CN113392396A (en) Strategy protection defense method for deep reinforcement learning
Hou et al. Advances in memetic automaton: Toward human-like autonomous agents in complex multi-agent learning problems
CN117078182A (en) Air defense and reflection conductor system cooperative method, device and equipment of heterogeneous network
CN102955948B (en) A kind of distributed mode recognition methods based on multiple agent
CN107169561A (en) Towards the hybrid particle swarm impulsive neural networks mapping method of power consumption
Sasaki et al. A neural network program of tsume-go
Liu et al. An improved minimax-Q algorithm based on generalized policy iteration to solve a Chaser-Invader game
CN113255883A (en) Weight initialization method based on power law distribution
Zhang et al. Tactical reward shaping: Bypassing reinforcement learning with strategy-based goals
Teixeira et al. A new hybrid nature-inspired metaheuristic for problem solving based on the social interaction genetic algorithm employing fuzzy systems
Yin et al. Computer Assisted Operational Agent Training Method through Deep Learning and Artificial Intelligence Technology
Ikuta et al. Multi-layer perceptron with glial network for solving two-spiral problem
CN112380780A (en) Symmetric scene grafting method for asymmetric confrontation scene self-game training
Niu et al. A neural-evolutionary model for case-based planning in real time strategy games
Tao et al. Design and Application of Computer Games Algorithm of Checkers
CN117454966A (en) Multi-domain collaborative reinforcement learning solution method oriented to large-scale decision space
Bossuyt et al. Introduction: The EU and China in Central Asia:(Un) natural partners?
Mo et al. Research on virtual human swarm football collaboration technology based on reinforcement learning
Battaglia SMaILE game: application of search and learning algorithm within combinatorial game theory

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant