CN108921298A - Multi-agent communication and decision-making method for reinforcement learning - Google Patents
- Publication number
- CN108921298A (application CN201810606662.1A)
- Authority
- CN
- China
- Prior art keywords
- agent
- cluster
- decision
- intelligent
- state feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Abstract
The invention discloses a multi-agent communication and decision-making method for reinforcement learning, comprising: extracting a corresponding state feature from the observed state information of each agent through a neural network; feeding the state features of all agents, as communication information, into a VLAD layer for soft assignment and clustering to obtain clustered communication information; and distributing the clustered communication information to each agent, where each agent aggregates its own state feature with the received clustered communication information and makes an action decision through a fully connected neural network inside the agent. The method clusters the state information of each agent and communicates it to the other agents, thereby improving the agents' decision-making.
Description
Technical field
The present invention relates to the technical field of multi-agent deep reinforcement learning, and in particular to a multi-agent communication and decision-making method for reinforcement learning.
Background
Reinforcement learning is a class of algorithms that learn a direct mapping from environment perception to action. Perception information (such as visual information or state information) is taken as input, and a mapping model is built that outputs actions, thereby realizing the decision process of an agent in an unknown environment. Deep reinforcement learning combines the advantages of deep neural networks and reinforcement learning, and can effectively solve perception and decision problems for an agent in unfamiliar, high-dimensional, complex environments. Traditional supervised learning algorithms usually require a large amount of manually labeled training data, and the level of the trained model is limited by the level of the training data. Reinforcement learning generates data by continually interacting with the environment and iteratively improves its own policy according to the environment's feedback. To a certain extent this addresses supervised learning's dependence on manually labeled data and its being limited by the level of human data. Deep reinforcement learning is therefore a frontier research direction in general artificial intelligence and has broad application prospects.
Common deep reinforcement learning is mainly applied to the single-agent case: there is only one agent in the environment, which continually interacts with the environment to obtain samples, and one deep policy network is trained to control that one agent. Real environments, however, more often pose multi-agent problems: multiple agents in the environment make decisions, influence one another, and jointly change the state of the environment. Different relationships (such as competition or cooperation) also exist among the agents. When a single agent makes decisions in a multi-agent environment, it must also consider the states of its teammates and opponents and their policies. Many problems in the natural world and human society can be regarded as multi-agent games (such as road traffic, or games involving several players), so multi-agent reinforcement learning algorithms have broad application prospects and are also a necessary step toward strong artificial intelligence.
However, existing reinforcement learning algorithms can usually only be paired with lightweight neural network models and perform poorly with complex models. How to design an efficient, concise, and practical neural network model that describes the relationships among agents comprehensively while keeping the network structure compact has therefore become the key to multi-agent reinforcement learning methods.
Summary of the invention
The object of the present invention is to provide a multi-agent communication and decision-making method for reinforcement learning that clusters the state information of each agent and communicates it to the other agents, thereby improving the agents' decision-making.
The object of the present invention is achieved through the following technical solution:
A multi-agent communication and decision-making method for reinforcement learning, comprising:
extracting a corresponding state feature from the observed state information of each agent through a neural network;
feeding the state features of all agents, as communication information, into a VLAD layer for soft assignment and clustering to obtain clustered communication information;
distributing the clustered communication information to each agent, where each agent aggregates its own state feature with the received clustered communication information and makes an action decision through a fully connected neural network inside the agent.
As can be seen from the technical solution above, the invention provides a VLAD-based communication mechanism for multi-agent reinforcement learning through which gradients can propagate and whose cluster centers are learnable. For the problem of cooperation among agents in a multi-agent environment, it enables effective communication and exchange of state information among agents, is highly robust to dynamic changes in the number of agents, and ultimately improves the performance of the neural network model.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are obviously only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the network architecture provided by an embodiment of the present invention;
Fig. 2 is a flowchart of a multi-agent communication and decision-making method for reinforcement learning provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the network architecture of the VLAD layer provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are obviously only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the invention.
To let agents cooperate and compete better, and to make the algorithm more robust to dynamic changes in the number of agents, the embodiments of the present invention provide a multi-agent communication and decision-making method for reinforcement learning. During training and optimization of the multi-agent reinforcement learning policy network, a communication mechanism is established among the agents and the states of the agents are cluster-encoded; each agent then makes decisions based on its own state information and the encoded state information of the other agents. The whole communication mechanism is simple and effective, is highly robust to dynamic changes in the number of agents, and realizes an end-to-end mapping from environment state to agent policy.
In the embodiments of the present invention, joint decision-making by multiple agents is implemented with a multi-layer neural network. The network model structure is shown in Fig. 1, and the implementation process of the method is shown in Fig. 2.
Referring to Fig. 1, assume there are N agents in the environment, and the environment state information they observe is $s_1, s_2, \dots, s_N$ respectively. At each time step t, the neural network modules $f_1, \dots, f_m$ inside each agent generate a corresponding action according to the agent's state. Let the actions taken by the agents be $a_1, a_2, \dots, a_N$. After all agents have executed their actions, each agent receives the reward $r_t$ fed back by the environment. $r_t$ depends on the actions selected by all agents in the environment; that is, in the embodiments of the present invention, all agents receive the same environment reward at each time step.
Referring to Fig. 2, the method mainly comprises the following steps.
Step 1: extract a corresponding state feature from the observed state information of each agent through a neural network.
In the embodiments of the present invention, the observed state information of each agent is manually encoded to map it from the physical world into a mathematical space; the encoding result may be in vector form or image form. If the encoding result is a vector, the state feature is extracted by an MLP network; if it is an image, the state feature is extracted by a CNN.
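As a minimal sketch of Step 1 for the vector-encoded case, the snippet below maps each agent's observation to a state feature with a small MLP. The layer sizes, ReLU activation, random initialization, and the names `init_mlp` and `mlp_features` are illustrative assumptions; the source only states that an MLP is used for vector inputs and a CNN for image inputs.

```python
import math
import random


def init_mlp(sizes, seed=0):
    """Create random (weights, biases) pairs for layer sizes such as [4, 8, 3]."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def mlp_features(layers, observation):
    """Map one encoded observation vector to a state feature vector."""
    h = list(observation)
    for weights, biases in layers:
        # Affine transform followed by ReLU (the activation is an assumption).
        h = [max(0.0, sum(w * x for w, x in zip(row, h)) + b)
             for row, b in zip(weights, biases)]
    return h


# One extractor applied to every agent's encoded observation (hypothetical data).
extractor = init_mlp([4, 8, 3])
observations = [[0.1, 0.2, 0.3, 0.4], [1.0, 0.0, -1.0, 0.5]]
features = [mlp_features(extractor, s) for s in observations]
```

Applying the same `extractor` to every observation mirrors the parameter sharing among same-type agents that the description mentions for training.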
Step 2: feed the state features of all agents, as communication information, into the VLAD layer for soft assignment and clustering to obtain the clustered communication information.
In the embodiments of the present invention, a VLAD (vector of locally aggregated descriptors) layer is used through which gradients can propagate and whose cluster centers are learnable; its structure is shown in Fig. 3.
In the embodiments of the present invention, VLAD clustering is performed on the state features of the agents by soft assignment. The weight assigned to each cluster center is obtained by a weighted multiplication of the state feature followed by the softmax formula, expressed as:
$$w_k(X_i) = \frac{e^{a_k x_i + b_k}}{\sum_{k'} e^{a_{k'} x_i + b_{k'}}}$$
In the above formula, $w_k(X_i)$ denotes the weight with which the state feature $X_i$ of the i-th agent is assigned to the k-th cluster center; $a_k$ and $b_k$ are the soft-assignment weights of the k-th cluster center, where $a_k$ is a row vector and $b_k$ is a scalar; $x_i$ is the column vector representing the state feature $X_i$ of the i-th agent; $k'$ ranges over all cluster centers; and $a_{k'}$ and $b_{k'}$ are the soft-assignment weights of the k'-th cluster center, where $a_{k'}$ is a row vector and $b_{k'}$ is a scalar.
In the embodiments of the present invention, a 1×1 convolution kernel can be used to compute $a_k x_i + b_k$ in the soft assignment; a softmax layer in the neural network then computes the soft-assignment weight $w_k(X_i)$.
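The soft-assignment step, a softmax over the scores a_k x_i + b_k across cluster centers, can be sketched in plain Python as follows. The function name `soft_assign` and the example numbers are hypothetical, and the max-subtraction is a standard numerical-stability trick rather than something stated in the source.

```python
import math


def soft_assign(x, a, b):
    """Return softmax weights w_k for one feature x over K cluster centers.

    a is a list of K row vectors a_k and b is a list of K scalars b_k.
    """
    logits = [sum(p * q for p, q in zip(a_k, x)) + b_k for a_k, b_k in zip(a, b)]
    m = max(logits)  # subtracting the max leaves the softmax unchanged
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Two cluster centers whose scores for this feature are equal.
a = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
w = soft_assign([2.0, 2.0], a, b)
```

Because both scores equal 2.0 here, the feature is split evenly between the two centers; in a trained layer `a` and `b` would be learned along with the cluster centers.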
Then, following the idea of VLAD clustering, the final clustering result is characterized by the distances between the vectors and the cluster centers in feature space. The clustering result for the k-th cluster center is:
$$V(j,k) = \sum_{i=1}^{N} w_k(X_i)\,\bigl(x_i(j) - c_k(j)\bigr)$$
where $V(j,k)$ is the j-th dimension of the clustering result for the k-th cluster center, i.e. the clustered communication information; $x_i(j)$ is the j-th dimension of the column vector representing the state feature $X_i$ of the i-th agent; $c_k(j)$ is the j-th coordinate of the k-th cluster center; and N is the number of agents.
In the embodiments of the present invention, the VLAD core layer performs the assignment to specific cluster centers and generates the final VLAD vector from $w_k(X_i)$ and $X_i$; this layer consists mainly of vector addition and subtraction modules.
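The aggregation step can be sketched as below, assuming the usual VLAD residual sum V(j,k) = Σ_i w_k(X_i)·(x_i(j) − c_k(j)) with softmax soft-assignment weights. The names `soft_assign` and `vlad_message` and all numbers are illustrative assumptions. The sketch stores the result as `message[k][j]`, one row per cluster center.

```python
import math


def soft_assign(x, a, b):
    """Softmax over the scores a_k . x + b_k for each cluster center k."""
    logits = [sum(p * q for p, q in zip(a_k, x)) + b_k for a_k, b_k in zip(a, b)]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def vlad_message(features, a, b, centers):
    """Aggregate all agents' features into a K x D clustered message."""
    num_centers, dim = len(centers), len(centers[0])
    message = [[0.0] * dim for _ in range(num_centers)]
    for x in features:  # loop over agents
        w = soft_assign(x, a, b)
        for k in range(num_centers):
            for j in range(dim):
                # Weighted residual between the feature and cluster center k.
                message[k][j] += w[k] * (x[j] - centers[k][j])
    return message


a = [[0.0, 0.0], [0.0, 0.0]]  # zero scores give uniform assignment weights
b = [0.0, 0.0]
centers = [[0.0, 0.0], [1.0, 1.0]]
two_agents = vlad_message([[1.0, 0.0], [0.0, 1.0]], a, b, centers)
five_agents = vlad_message([[0.2, 0.1]] * 5, a, b, centers)
```

Both calls return a 2×2 message even though the number of agents differs, which illustrates why a fixed-size downstream network can tolerate a changing agent count.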
Step 3: distribute the clustered communication information to each agent; each agent aggregates its own state feature with the received clustered communication information and makes an action decision through the fully connected neural network module inside the agent.
In the embodiments of the present invention, each agent aggregates its own state feature with the received clustered communication information by concatenation. The fully connected neural network module inside the agent then generates the probability distribution $p_1, p_2, \dots, p_n$ over the agent's n available actions $a_1, a_2, \dots, a_n$. The fully connected neural network has one or more layers; its input dimension is the sum of the dimensions of the state feature and the clustered communication information, and its output dimension corresponds to the available actions $a_1, a_2, \dots, a_n$ and is therefore n. After the probability distribution over the n actions is generated, the final action can be sampled according to the probabilities, or the action with the highest probability can be chosen as the agent's final action. Depending on their own states, combined with the communication information, different agents may produce the same action or different actions.
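A minimal sketch of the decision step follows: concatenate the agent's own feature with the flattened clustered message, apply a fully connected layer, and turn the outputs into action probabilities. A single layer with a softmax output is an assumption made for brevity (the source allows one or more layers), and `action_probabilities`, `choose_action`, and the weights are hypothetical.

```python
import math
import random


def action_probabilities(own_feature, message, weights, biases):
    """Concatenate feature and message, apply one FC layer, then softmax."""
    flat = list(own_feature) + [v for row in message for v in row]
    logits = [sum(w * x for w, x in zip(row, flat)) + b
              for row, b in zip(weights, biases)]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def choose_action(probs, rng=None):
    """Pick the most probable action, or sample if a random generator is given."""
    if rng is None:
        return max(range(len(probs)), key=probs.__getitem__)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


own_feature = [1.0, 2.0]                 # state feature, dimension 2
message = [[0.5, 0.5], [-0.5, -0.5]]     # clustered message, K=2 and D=2
# Input dimension is 2 + 4 = 6; output dimension is n = 3 actions.
weights = [[0.0] * 6, [1.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0] * 6]
biases = [0.0, 0.0, 0.0]

probs = action_probabilities(own_feature, message, weights, biases)
greedy = choose_action(probs)
sampled = choose_action(probs, random.Random(0))
```

Sampling is typically useful for exploration during training, while the highest-probability choice gives deterministic behaviour at evaluation time; the source permits either.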
In addition, in the embodiments of the present invention each agent receives the reward fed back by the environment after executing an action. The agents share model parameters and the reward fed back by the environment; the magnitude of the reward measures the quality of the previously taken action, and the agent model is trained accordingly to adopt a better policy the next time it interacts with the environment. Curriculum transfer learning is also used: the complexity of the environment and the number of agents are increased step by step during training, which speeds up model training.
Curriculum transfer learning means gradually increasing the complexity of the environment during training: the model is first trained in a relatively simple environment (for example, one with fewer agents), the trained parameters are then used for training in a more complex environment, and training finally transitions gradually to the desired complex environment. Meanwhile, during training, the parameters of all network models of agents of the same type (the neural network that processes observed state information, the VLAD layer, and the fully connected neural network that generates the final action decision) are shared, and the reward feedback signals the agents obtain from the environment are also identical; each agent iteratively updates the same model parameters according to its own state. Agents of different types have different model parameters but receive the same environment reward feedback signal. The model in the embodiments of the present invention is therefore highly robust to changes in the number of agents.
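The curriculum schedule described above can be outlined as a loop that carries the shared parameters from one stage to the next while the number of agents grows. This is only a structural sketch: `curriculum` and `dummy_train_stage` are hypothetical names, and the training step is a placeholder that records bookkeeping instead of running reinforcement learning updates.

```python
def curriculum(stages, train_stage, params):
    """Train through stages with a non-decreasing number of agents.

    The parameters returned by one stage initialize the next stage.
    """
    if any(later < earlier for earlier, later in zip(stages, stages[1:])):
        raise ValueError("curriculum stages must not decrease in difficulty")
    history = []
    for num_agents in stages:
        params = train_stage(params, num_agents)
        history.append(num_agents)
    return params, history


def dummy_train_stage(params, num_agents):
    """Placeholder: a real stage would run RL updates on the shared model."""
    updated = dict(params)
    updated["stages_seen"] = params.get("stages_seen", 0) + 1
    updated["last_num_agents"] = num_agents
    return updated


final_params, history = curriculum([2, 4, 8], dummy_train_stage, {})
```

Reusing the same parameters across stages is possible because the clustered message has a fixed size regardless of how many agents contribute to it.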
From the above description of the embodiments, a person skilled in the art can clearly understand that the embodiments can be implemented in software, or in software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the embodiments can be embodied as a software product, which can be stored on a non-volatile storage medium (such as a CD-ROM, USB flash drive, or removable hard disk) and includes instructions that cause a computer device (such as a personal computer, server, or network device) to execute the methods described in the embodiments of the present invention.
The above is only a preferred embodiment of the present invention, but the protection scope of the invention is not limited to it. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the invention shall fall within the protection scope of the invention. The protection scope of the invention shall therefore be determined by the protection scope of the claims.
Claims (5)
1. A multi-agent communication and decision-making method for reinforcement learning, characterized by comprising:
extracting a corresponding state feature from the observed state information of each agent through a neural network;
feeding the state features of all agents, as communication information, into a VLAD layer for soft assignment and clustering to obtain clustered communication information;
distributing the clustered communication information to each agent, where each agent aggregates its own state feature with the received clustered communication information and makes an action decision through a fully connected neural network inside the agent.
2. The multi-agent communication and decision-making method for reinforcement learning according to claim 1, characterized in that the process of extracting the state feature comprises:
manually encoding the observed state information of each agent to map it from the physical world into a mathematical space, the encoding result being in vector form or image form;
if the encoding result is in vector form, extracting the state feature by an MLP network;
if the encoding result is in image form, extracting the state feature by a CNN.
3. The multi-agent communication and decision-making method for reinforcement learning according to claim 1, characterized in that the soft assignment and clustering performed by the VLAD layer comprises:
performing VLAD clustering on the state features of the agents by soft assignment, where the weight assigned to each cluster center is obtained by a weighted multiplication of the state feature followed by the softmax formula, expressed as:
$$w_k(X_i) = \frac{e^{a_k x_i + b_k}}{\sum_{k'} e^{a_{k'} x_i + b_{k'}}}$$
where $w_k(X_i)$ denotes the weight with which the state feature $X_i$ of the i-th agent is assigned to the k-th cluster center; $a_k$ and $b_k$ are the soft-assignment weights of the k-th cluster center; $x_i$ is the column vector representing the state feature $X_i$ of the i-th agent; $k'$ ranges over all cluster centers; and $a_{k'}$ and $b_{k'}$ are the soft-assignment weights of the k'-th cluster center;
the final clustering result is characterized by the distances between the vectors and the cluster centers in feature space, and the clustering result for the k-th cluster center is:
$$V(j,k) = \sum_{i=1}^{N} w_k(X_i)\,\bigl(x_i(j) - c_k(j)\bigr)$$
where $V(j,k)$ is the j-th dimension of the clustering result for the k-th cluster center, i.e. the clustered communication information; $x_i(j)$ is the j-th dimension of the column vector representing the state feature $X_i$ of the i-th agent; $c_k(j)$ is the j-th coordinate of the k-th cluster center; and N is the number of agents.
4. The multi-agent communication and decision-making method for reinforcement learning according to claim 1, characterized in that aggregating the agent's own state feature with the received clustered communication information and making an action decision through the fully connected neural network inside the agent comprises:
each agent aggregating its own state feature with the received clustered communication information by concatenation;
then generating, through the fully connected neural network inside the agent, the probability distribution $p_1, p_2, \dots, p_n$ over the agent's n available actions $a_1, a_2, \dots, a_n$; after the probability distribution over the n actions is generated, sampling the final action according to the probabilities, or choosing the action with the highest probability as the agent's final action;
where the fully connected neural network has one or more layers, its input dimension is the sum of the dimensions of the state feature and the clustered communication information, and its output dimension corresponds to the available actions $a_1, a_2, \dots, a_n$ and is n.
5. The multi-agent communication and decision-making method for reinforcement learning according to claim 1, characterized in that each agent receives the reward fed back by the environment after executing an action; the agents share model parameters and the reward fed back by the environment; the magnitude of the reward measures the quality of the previously taken action, and the agents are thereby trained to adopt a better policy the next time they interact with the environment; and curriculum transfer learning is used, in which the complexity of the environment and the number of agents are increased step by step during training.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810606662.1A CN108921298B (en) | 2018-06-12 | 2018-06-12 | Multi-agent communication and decision-making method for reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810606662.1A CN108921298B (en) | 2018-06-12 | 2018-06-12 | Multi-agent communication and decision-making method for reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108921298A true CN108921298A (en) | 2018-11-30 |
CN108921298B CN108921298B (en) | 2022-04-19 |
Family
ID=64419238
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810606662.1A Active CN108921298B (en) | 2018-06-12 | 2018-06-12 | Multi-agent communication and decision-making method for reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108921298B (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109960259A (en) * | 2019-02-15 | 2019-07-02 | 青岛大学 | Multi-agent reinforcement learning unmanned guided vehicle path planning method based on gradient potential |
CN109978176A (en) * | 2019-03-05 | 2019-07-05 | 华南理工大学 | Multi-agent cooperative learning method based on dynamic state perception |
CN110070099A (en) * | 2019-02-20 | 2019-07-30 | 北京航空航天大学 | Industrial data feature construction method based on reinforcement learning |
CN110119749A (en) * | 2019-05-16 | 2019-08-13 | 北京小米智能科技有限公司 | Method and apparatus for identifying product images, and storage medium |
CN110554604A (en) * | 2019-08-08 | 2019-12-10 | 中国地质大学(武汉) | multi-agent synchronous control method, equipment and storage equipment |
WO2020199690A1 (en) * | 2019-03-29 | 2020-10-08 | 深圳先进技术研究院 | Cloud platform-based sharing learning system and method, sharing platform and method, and medium |
CN112215350A (en) * | 2020-09-17 | 2021-01-12 | 天津(滨海)人工智能军民融合创新中心 | Smart agent control method and device based on reinforcement learning |
CN112260733A (en) * | 2020-11-10 | 2021-01-22 | 东南大学 | Multi-agent deep reinforcement learning-based MU-MISO hybrid precoding design method |
CN112507104A (en) * | 2020-12-18 | 2021-03-16 | 北京百度网讯科技有限公司 | Dialog system acquisition method, apparatus, storage medium and computer program product |
CN112926729A (en) * | 2021-05-06 | 2021-06-08 | 中国科学院自动化研究所 | Man-machine confrontation intelligent agent strategy making method |
CN113110582A (en) * | 2021-04-22 | 2021-07-13 | 中国科学院重庆绿色智能技术研究院 | Unmanned aerial vehicle cluster intelligent system control method |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104698854A (en) * | 2015-03-26 | 2015-06-10 | 哈尔滨工业大学 | Distributed fuzzy cooperative tracking control method of network Euler-Lagrange system |
CN106649456A (en) * | 2016-09-23 | 2017-05-10 | 西安电子科技大学 | Cluster and outlier detection method based on multi-agent evolution |
US9860391B1 (en) * | 2003-03-07 | 2018-01-02 | Wai Wu | Method and system for matching entities in an auction |
US20180032858A1 (en) * | 2015-12-14 | 2018-02-01 | Stats Llc | System and method for predictive sports analytics using clustered multi-agent data |
CN108108759A (en) * | 2017-12-19 | 2018-06-01 | 四川九洲电器集团有限责任公司 | Dynamic grouping method for multiple agents |
-
2018
- 2018-06-12 CN CN201810606662.1A patent/CN108921298B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9860391B1 (en) * | 2003-03-07 | 2018-01-02 | Wai Wu | Method and system for matching entities in an auction |
CN104698854A (en) * | 2015-03-26 | 2015-06-10 | 哈尔滨工业大学 | Distributed fuzzy cooperative tracking control method of network Euler-Lagrange system |
US20180032858A1 (en) * | 2015-12-14 | 2018-02-01 | Stats Llc | System and method for predictive sports analytics using clustered multi-agent data |
CN106649456A (en) * | 2016-09-23 | 2017-05-10 | 西安电子科技大学 | Cluster and outlier detection method based on multi-agent evolution |
CN108108759A (en) * | 2017-12-19 | 2018-06-01 | 四川九洲电器集团有限责任公司 | Dynamic grouping method for multiple agents |
Non-Patent Citations (4)
Title |
---|
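JAKOB N. FOERSTER et al.: "Learning to Communicate with Deep Multi-Agent Reinforcement Learning", arXiv * |
ROYA ASADI et al.: "A Framework For Intelligent Multi Agent System Based Neural Network Classification Model", arXiv * |
PAN Xiaoying et al.: "Density-sensitive multi-agent evolutionary clustering algorithm", Journal of Software (软件学报) * |
FAN Bo et al.: "A multi-agent coordination method based on distributed reinforcement learning", Computer Simulation (计算机仿真) * |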
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109960259B (en) * | 2019-02-15 | 2021-09-24 | 青岛大学 | Multi-agent reinforcement learning unmanned guided vehicle path planning method based on gradient potential |
CN109960259A (en) * | 2019-02-15 | 2019-07-02 | 青岛大学 | Multi-agent reinforcement learning unmanned guided vehicle path planning method based on gradient potential |
CN110070099A (en) * | 2019-02-20 | 2019-07-30 | 北京航空航天大学 | Industrial data feature construction method based on reinforcement learning |
CN109978176A (en) * | 2019-03-05 | 2019-07-05 | 华南理工大学 | Multi-agent cooperative learning method based on dynamic state perception |
WO2020199690A1 (en) * | 2019-03-29 | 2020-10-08 | 深圳先进技术研究院 | Cloud platform-based sharing learning system and method, sharing platform and method, and medium |
CN110119749A (en) * | 2019-05-16 | 2019-08-13 | 北京小米智能科技有限公司 | Method and apparatus for identifying product images, and storage medium |
CN110554604A (en) * | 2019-08-08 | 2019-12-10 | 中国地质大学(武汉) | multi-agent synchronous control method, equipment and storage equipment |
CN110554604B (en) * | 2019-08-08 | 2021-07-09 | 中国地质大学(武汉) | Multi-agent synchronous control method, equipment and storage equipment |
CN112215350A (en) * | 2020-09-17 | 2021-01-12 | 天津(滨海)人工智能军民融合创新中心 | Smart agent control method and device based on reinforcement learning |
CN112215350B (en) * | 2020-09-17 | 2023-11-03 | 天津(滨海)人工智能军民融合创新中心 | Method and device for controlling agent based on reinforcement learning |
CN112260733A (en) * | 2020-11-10 | 2021-01-22 | 东南大学 | Multi-agent deep reinforcement learning-based MU-MISO hybrid precoding design method |
CN112260733B (en) * | 2020-11-10 | 2022-02-01 | 东南大学 | Multi-agent deep reinforcement learning-based MU-MISO hybrid precoding design method |
CN112507104A (en) * | 2020-12-18 | 2021-03-16 | 北京百度网讯科技有限公司 | Dialog system acquisition method, apparatus, storage medium and computer program product |
CN113110582A (en) * | 2021-04-22 | 2021-07-13 | 中国科学院重庆绿色智能技术研究院 | Unmanned aerial vehicle cluster intelligent system control method |
CN113110582B (en) * | 2021-04-22 | 2023-06-02 | 中国科学院重庆绿色智能技术研究院 | Unmanned aerial vehicle cluster intelligent system control method |
CN112926729B (en) * | 2021-05-06 | 2021-08-03 | 中国科学院自动化研究所 | Man-machine confrontation intelligent agent strategy making method |
CN112926729A (en) * | 2021-05-06 | 2021-06-08 | 中国科学院自动化研究所 | Man-machine confrontation intelligent agent strategy making method |
Also Published As
Publication number | Publication date |
---|---|
CN108921298B (en) | 2022-04-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108921298A (en) | Multi-agent communication and decision-making method for reinforcement learning | |
CN111291836B (en) | Method for generating student network model | |
CN104662526B (en) | Apparatus and method for efficiently updating spiking neuron network | |
CN109299262A (en) | Textual entailment recognition method fusing multi-granularity information | |
CN107092959A (en) | Hardware-friendly spiking neural network model based on the STDP unsupervised learning algorithm | |
CN110134774A (en) | Image visual question-answering model, method, and system based on attention decision | |
CN108090658A (en) | Arc fault diagnosis method based on time-domain characteristic parameter fusion | |
CN108427989B (en) | Deep space-time prediction neural network training method for radar echo extrapolation | |
CN107247989A (en) | Neural network training method and device | |
CN108171323A (en) | Artificial neural network computing device and method | |
CN104504520B (en) | Autonomous mission planning method for a deep space probe based on a neural network | |
CN104636985A (en) | Method for predicting radio disturbance of electric transmission line by using improved BP (back propagation) neural network | |
CN104636801A (en) | Transmission line audible noise prediction method based on BP neural network optimization | |
CN107705556A (en) | Traffic flow forecasting method combining a support vector machine and a BP neural network | |
CN107292352A (en) | Image classification method and device based on convolutional neural networks | |
CN106529818A (en) | Water quality evaluation prediction method based on fuzzy wavelet neural network | |
CN110334196B (en) | Neural network Chinese problem generation system based on strokes and self-attention mechanism | |
CN105976020A (en) | Network flow prediction method considering wavelet cross-layer correlations | |
CN113780002A (en) | Knowledge reasoning method and device based on graph representation learning and deep reinforcement learning | |
CN109726676A (en) | Planning method for an automated driving system | |
CN114398976A (en) | Machine reading understanding method based on BERT and gate control type attention enhancement network | |
CN111931934A (en) | Affine transformation solving method under mass control points based on improved genetic algorithm | |
An et al. | A unified information perceptron using deep reservoir computing | |
Tong et al. | Enhancing rolling horizon evolution with policy and value networks | |
Guo et al. | Skewed normal cloud modified whale optimization algorithm for degree reduction of S-λ curves |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||