CN108921298A - Reinforcement learning multi-agent communication and decision-making method - Google Patents

Reinforcement learning multi-agent communication and decision-making method

Info

Publication number
CN108921298A
CN108921298A (application CN201810606662.1A)
Authority
CN
China
Prior art keywords
agent
cluster
decision-making
state feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810606662.1A
Other languages
Chinese (zh)
Other versions
CN108921298B (en)
Inventor
查正军
李厚强
温忻
李斌
王子磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN201810606662.1A priority Critical patent/CN108921298B/en
Publication of CN108921298A publication Critical patent/CN108921298A/en
Application granted granted Critical
Publication of CN108921298B publication Critical patent/CN108921298B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Abstract

The invention discloses a reinforcement learning multi-agent communication and decision-making method, including: extracting a corresponding state feature from the observed state information of each agent through a neural network; feeding the state features of all agents, as communication information, into a VLAD layer for soft assignment and clustering to obtain a clustered communication message; and distributing the clustered communication message to each agent, each agent aggregating its own state feature with the received clustered message and making an action decision through a fully connected neural network inside the agent. The method can cluster the state information of the agents and communicate it among them, thereby improving the decision-making level of the agents.

Description

Reinforcement learning multi-agent communication and decision-making method
Technical field
The present invention relates to the field of multi-agent deep reinforcement learning, and in particular to a reinforcement learning multi-agent communication and decision-making method.
Background technique
Reinforcement learning is a class of algorithms that learn a direct mapping from environmental perception to action. By taking perceptual information (such as visual information or state information) as input, a mapping model is built to output actions, thereby realizing the decision process of an agent in an unknown environment. Deep reinforcement learning combines the advantages of deep neural networks and reinforcement learning and can effectively solve the perception and decision problems of an agent in high-dimensional, unfamiliar and complex environments. Traditional supervised learning algorithms usually require a large amount of manually labelled training data, and the level of the trained model is limited by the level of that data. Reinforcement learning instead generates data continuously by interacting with the environment and iteratively improves its own policy according to the environment's feedback. This alleviates, to some extent, the dependence of supervised methods on manually labelled data and the limitation imposed by the level of human-provided data. Deep reinforcement learning is therefore a frontier research direction of general artificial intelligence and has broad application prospects.
Common deep reinforcement learning is mainly applied to the single-agent (Single-Agent) case: there is only one agent in the environment, which continuously interacts with the environment to obtain samples, and a single deep policy network is trained to control that agent. However, many problems in real environments are multi-agent problems, in which multiple agents make decisions in the same environment, influence one another and jointly change the state of the environment. Different relationships may also exist between the agents (such as competition or cooperation). Compared with the single-agent case, an agent making decisions in a multi-agent environment must also take into account the states of its teammates and opponents and their strategies. Many problems in nature and in human society can be regarded as multi-agent game processes (for example, vehicle traffic, or games involving several players), so reinforcement learning algorithms based on multiple agents have broad application prospects and are also a necessary step towards strong artificial intelligence.
However, existing reinforcement learning algorithms can usually only work with lightweight neural network models and perform poorly with complex models. How to design an efficient, concise and practical neural network model that fully describes the relationships between agents while keeping the network structure compact is therefore the key problem of multi-agent reinforcement learning methods.
Summary of the invention
The object of the present invention is to provide a reinforcement learning multi-agent communication and decision-making method, which can cluster the state information of each agent and communicate it with the other agents, thereby improving the decision-making level of the agents.
The object of the present invention is achieved through the following technical solution:
A reinforcement learning multi-agent communication and decision-making method, including:
extracting a corresponding state feature through a neural network according to the observed state information of each agent;
feeding the state features of all agents, as communication information, into a VLAD layer for soft assignment and clustering to obtain a clustered communication message;
distributing the clustered communication message to each agent, each agent aggregating its own state feature with the received clustered message and making an action decision through a fully connected neural network inside the agent.
It can be seen from the above technical solution that the invention provides a multi-agent communication mechanism for reinforcement learning based on a VLAD layer whose cluster centres are learnable and through which gradients can propagate. For the cooperation problem among agents in a multi-agent environment, it realizes effective communication and state-information exchange between agents, is highly robust to dynamic changes in the number of agents, and ultimately improves the performance of the neural network model.
Detailed description of the invention
In order to explain the technical solutions of the embodiments of the present invention more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from them without creative effort.
Fig. 1 is a schematic diagram of the network structure provided by an embodiment of the present invention;
Fig. 2 is a flow chart of a reinforcement learning multi-agent communication and decision-making method provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the network structure of the VLAD layer provided by an embodiment of the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
In order to enable agents to cooperate and compete better, and to make the algorithm robust to dynamic changes in the number of agents, an embodiment of the present invention provides a reinforcement learning multi-agent communication and decision-making method. During the training and optimization of the multi-agent reinforcement learning policy network, a communication mechanism is established between the agents: the states of the agents are encoded by clustering, and each agent then makes its decision according to its own state information together with the encoded state information of the other agents. The whole communication mechanism is simple and effective, is highly robust to dynamic changes in the number of agents, and realizes an end-to-end mapping from environment states to agent policies.
In the embodiment of the present invention, a multi-layer neural network is used to realize the joint decision-making of multiple agents. The network model structure is shown in Fig. 1, and the flow of the method is shown in Fig. 2.
Referring to Fig. 1, suppose there are N agents in the environment, and the different state information they observe is s_1, s_2, ..., s_N respectively. At each time step t, the neural network modules f_1, ..., f_m inside each agent generate an action according to the agent's current state; let the actions taken by the agents be a_1, a_2, ..., a_N respectively. After all agents have executed their actions, each agent receives the reward r_t fed back by the environment. Here r_t depends on the actions chosen by all agents in the environment; in other words, in the embodiment of the present invention, all agents receive the same reward from the environment at each time step.
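For illustration only, a minimal Python sketch of this interaction loop is given below; the env.reset()/env.step() interface, the policies list and max_steps are assumptions made for the example and are not part of the disclosed embodiment.

    # Sketch of one episode with N agents that all receive the same reward r_t.
    def run_episode(env, policies, max_steps=200):
        observations = env.reset()                 # observed states s_1 ... s_N
        total_reward = 0.0
        for t in range(max_steps):
            # each agent i chooses action a_i from its own observation s_i
            actions = [policy(obs) for policy, obs in zip(policies, observations)]
            observations, shared_reward, done = env.step(actions)
            total_reward += shared_reward          # identical reward for all agents
            if done:
                break
        return total_reward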
Referring to Fig. 2, the method mainly includes the following steps:
Step 1: extract a corresponding state feature through a neural network according to the observed state information of each agent.
In the embodiment of the present invention, the observed state information of each agent is manually encoded to realize the mapping from the physical world to a mathematical space; the encoding result may be in vector form or in image form. If the encoding result is in vector form, the state feature is extracted by an MLP network; if the encoding result is in image form, the state feature is extracted by a CNN network.
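A minimal PyTorch sketch of such a per-agent state encoder is given below; the choice of PyTorch, the layer sizes and the default dimensions are illustrative assumptions rather than values taken from the embodiment.

    import torch.nn as nn

    class StateEncoder(nn.Module):
        """MLP for vector-encoded observations, small CNN for image-like ones."""
        def __init__(self, obs_is_image, obs_dim=32, feat_dim=64, channels=3):
            super().__init__()
            if obs_is_image:
                self.net = nn.Sequential(
                    nn.Conv2d(channels, 16, 3, stride=2, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                    nn.Linear(32, feat_dim),
                )
            else:
                self.net = nn.Sequential(
                    nn.Linear(obs_dim, 128), nn.ReLU(),
                    nn.Linear(128, feat_dim),
                )

        def forward(self, obs):
            # obs: (N, obs_dim) vectors or (N, C, H, W) images for N agents
            return self.net(obs)                   # (N, feat_dim) state features X_i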
Step 2: feed the state features of all agents, as communication information, into the VLAD layer for soft assignment and clustering to obtain the clustered communication message.
In the embodiment of the present invention, a VLAD (Vector of Locally Aggregated Descriptors) layer with learnable cluster centres, through which gradients can propagate, is used; its structure is shown in Fig. 3.
In the embodiment of the present invention, VLAD clustering is applied to the state features of the agents by means of soft assignment: the weight with which each feature is assigned to each cluster is obtained by a weighted linear transform of the state feature followed by a softmax, expressed as:

w_k(X_i) = exp(a_k·x_i + b_k) / Σ_{k'} exp(a_{k'}·x_i + b_{k'})

In the above formula, w_k(X_i) denotes the weight with which the state feature X_i of the i-th agent is assigned to the k-th cluster centre; a_k and b_k are the soft-assignment weights corresponding to the k-th cluster centre, a_k being a row vector and b_k a scalar; x_i is the column vector representing the state feature X_i of the i-th agent; k' runs over all cluster centres; and a_k' and b_k' denote the soft-assignment weights corresponding to the k'-th cluster centre, a_k' being a row vector and b_k' a scalar.
In the embodiment of the present invention, a 1×1 convolution kernel can be used to compute the quantity a_k·x_i + b_k in the soft assignment; the softmax layer of the neural network then yields the soft-assignment weight w_k(X_i).
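The soft assignment can be sketched as follows (PyTorch), with the N state features of dimension D stacked into one tensor; the number of cluster centres K and the dimensions are assumptions made for the example.

    import torch.nn as nn
    import torch.nn.functional as F

    class SoftAssign(nn.Module):
        def __init__(self, feat_dim=64, num_clusters=8):
            super().__init__()
            # the 1x1 convolution realises a_k . x_i + b_k for every cluster k
            self.conv = nn.Conv1d(feat_dim, num_clusters, kernel_size=1)

        def forward(self, X):
            # X: (N, D) state features of the N agents
            logits = self.conv(X.t().unsqueeze(0))   # (1, K, N)
            w = F.softmax(logits, dim=1)             # w_k(X_i), softmax over the K centres
            return w.squeeze(0)                      # (K, N)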
Following the idea of VLAD clustering, the final clustering result is characterized by the distances between the feature vectors and the cluster centres in feature space. The clustering result of the k-th cluster centre is:

V(j, k) = Σ_{i=1}^{N} w_k(X_i) · (x_i(j) - c_k(j))

where V(j, k) is the j-th dimension of the clustering result of the k-th cluster centre, i.e. the clustered communication message; x_i(j) is the j-th dimension of the column vector representing the state feature X_i of the i-th agent; c_k(j) is the j-th coordinate of the k-th cluster centre; and N is the number of agents.
In the embodiment of the present invention, the VLAD core layer completes the assignment to the specific cluster centres and the generation of the final VLAD vector according to w_k(X_i) and X_i; this layer mainly consists of vector addition and subtraction modules.
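The aggregation itself is a direct restatement of the formula for V(j, k) above, with the cluster centres c_k held as a learnable parameter tensor; the dimensions are again assumptions made for the example.

    import torch
    import torch.nn as nn

    class VLADCore(nn.Module):
        def __init__(self, feat_dim=64, num_clusters=8):
            super().__init__()
            self.centres = nn.Parameter(torch.randn(num_clusters, feat_dim))  # c_k

        def forward(self, X, w):
            # X: (N, D) agent features, w: (K, N) soft-assignment weights
            residuals = X.unsqueeze(0) - self.centres.unsqueeze(1)   # (K, N, D): x_i - c_k
            V = (w.unsqueeze(-1) * residuals).sum(dim=1)             # (K, D): sum over agents
            return V.flatten()                                       # clustered message for all agents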
Step 3: distribute the clustered communication message to each agent; each agent aggregates its own state feature with the received clustered message and makes an action decision through the fully connected neural network module inside the agent.
In the embodiment of the present invention, each agent aggregates its own state feature with the received clustered message by concatenation. The fully connected neural network module inside the agent then produces the probability distribution p_1, p_2, ..., p_n over the agent's n optional actions a_1, a_2, ..., a_n. The fully connected neural network has one or more layers; the dimension of its input layer is the sum of the dimensions of the state feature and the clustered message, and its output layer corresponds to the optional actions a_1, a_2, ..., a_n, so its dimension is n. After the probability distribution over the n actions is produced, the final action may be sampled according to the probabilities, or the action with the highest probability may be chosen as the agent's final action. Since the agents are in different states, they may produce the same action or different actions even when combined with the same communication message.
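A sketch of the per-agent decision head, under assumed layer widths and an assumed number of actions, is given below; sampling versus taking the argmax corresponds to the two options described above.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DecisionHead(nn.Module):
        def __init__(self, feat_dim=64, msg_dim=512, n_actions=5):
            super().__init__()
            self.fc = nn.Sequential(
                nn.Linear(feat_dim + msg_dim, 128), nn.ReLU(),
                nn.Linear(128, n_actions),
            )

        def forward(self, own_feat, message, sample=True):
            x = torch.cat([own_feat, message], dim=-1)   # aggregation by concatenation
            probs = F.softmax(self.fc(x), dim=-1)        # p_1 ... p_n over actions a_1 ... a_n
            if sample:
                return torch.multinomial(probs, 1)       # sample the final action
            return probs.argmax(dim=-1)                  # or take the most probable action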
On the other hand, in the embodiment of the present invention, after executing its action each agent receives the reward information fed back by the environment. The agents share model parameters as well as the reward fed back by the environment; the quality of the previously taken actions is measured by the size of the reward, and the agent model is thereby trained to use a better policy the next time it interacts with the environment (a policy-gradient style update is sketched below). In addition, curriculum transfer learning is adopted: the complexity of the environment and the number of agents are gradually increased during training, which speeds up the training of the model.
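One common way to realize this reward-based policy improvement is a REINFORCE-style policy-gradient update; this particular algorithm is an assumption for illustration, as the embodiment does not name a specific reinforcement learning update rule.

    import torch

    def reinforce_update(log_probs, rewards, optimizer, gamma=0.99):
        # log_probs: log pi(a_t | s_t) tensors collected during an episode
        # rewards:   the shared rewards r_t received at each step
        returns, g = [], 0.0
        for r in reversed(rewards):                  # discounted return G_t
            g = r + gamma * g
            returns.insert(0, g)
        returns = torch.tensor(returns)
        loss = -(torch.stack(log_probs) * returns).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()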
Curriculum transfer learning means gradually increasing the complexity of the environment during the training of the model: the model is first trained in a relatively simple environment (for example, an environment with few agents), the trained parameters are then used for training in a more complex environment, and the process finally transitions to the desired complex environment. Meanwhile, during training, the parameters of the entire network model of agents of the same type (including the neural network that processes the observed state information, the VLAD layer, and the fully connected network that finally produces the action decision) are all shared, and the reward feedback signal each agent obtains from the environment is also identical; each agent iteratively updates the same model parameters according to its own state. Agents of different types have different model parameters, while the reward feedback from the environment is the same. The model of the embodiment of the present invention is therefore highly robust to changes in the number of agents.
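The curriculum schedule itself can be sketched as a plain loop over training stages; make_env, train and the stage list are hypothetical helpers assumed for the example.

    # Train with few agents in a simple environment first, then reuse the shared
    # parameters in progressively harder settings.
    def curriculum_training(make_env, model, train,
                            stages=((2, "easy"), (4, "medium"), (8, "hard"))):
        for num_agents, difficulty in stages:
            env = make_env(num_agents=num_agents, difficulty=difficulty)
            # agents of the same type share the parameters of `model`, so the
            # network trained on the simpler stage is reused directly here
            train(model, env)
        return model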
Through the above description of the embodiments, those skilled in the art can clearly understand that the above embodiments can be implemented by software, or by software plus a necessary general hardware platform. Based on this understanding, the technical solutions of the above embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive or a removable hard disk) and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments of the present invention.
The foregoing is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any changes or substitutions that can easily be conceived by those skilled in the art within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (5)

1. A reinforcement learning multi-agent communication and decision-making method, characterized in that it includes:
extracting a corresponding state feature through a neural network according to the observed state information of each agent;
feeding the state features of all agents, as communication information, into a VLAD layer for soft assignment and clustering to obtain a clustered communication message;
distributing the clustered communication message to each agent, each agent aggregating its own state feature with the received clustered message and making an action decision through a fully connected neural network inside the agent.
2. The reinforcement learning multi-agent communication and decision-making method according to claim 1, characterized in that the process of extracting the state features includes:
manually encoding the observed state information of each agent to realize the mapping from the physical world to a mathematical space, the encoding result being in vector form or in image form;
if the encoding result is in vector form, extracting the state feature by an MLP network;
if the encoding result is in image form, extracting the state feature by a CNN network.
3. The reinforcement learning multi-agent communication and decision-making method according to claim 1, characterized in that the process of performing soft assignment and clustering in the VLAD layer includes:
performing VLAD clustering on the state features of the agents by means of soft assignment, the weight with which each feature is assigned to each cluster being obtained by a weighted linear transform of the state feature followed by a softmax, expressed as:

w_k(X_i) = exp(a_k·x_i + b_k) / Σ_{k'} exp(a_{k'}·x_i + b_{k'})

where w_k(X_i) denotes the weight with which the state feature X_i of the i-th agent is assigned to the k-th cluster centre; a_k and b_k are the soft-assignment weights corresponding to the k-th cluster centre; x_i is the column vector representing the state feature X_i of the i-th agent; k' runs over all cluster centres; and a_k' and b_k' denote the soft-assignment weights corresponding to the k'-th cluster centre;
the final clustering result being characterized by the distances between the feature vectors and the cluster centres in feature space, the clustering result of the k-th cluster centre being:

V(j, k) = Σ_{i=1}^{N} w_k(X_i) · (x_i(j) - c_k(j))

where V(j, k) is the j-th dimension of the clustering result of the k-th cluster centre, i.e. the clustered communication message; x_i(j) is the j-th dimension of the column vector representing the state feature X_i of the i-th agent; c_k(j) is the j-th coordinate of the k-th cluster centre; and N is the number of agents.
4. The reinforcement learning multi-agent communication and decision-making method according to claim 1, characterized in that aggregating the agent's own state feature with the received clustered message and making an action decision through the fully connected neural network inside the agent includes:
each agent aggregating its own state feature with the received clustered message by concatenation;
then producing, through the fully connected neural network inside the agent, the probability distribution p_1, p_2, ..., p_n over the agent's n optional actions a_1, a_2, ..., a_n, and, after the probability distribution over the n actions is produced, sampling the final action according to the probabilities or choosing the action with the highest probability as the agent's final action;
the fully connected neural network having one or more layers, the dimension of its input layer being the sum of the dimensions of the state feature and the clustered message, and its output layer corresponding to the optional actions a_1, a_2, ..., a_n, so that its dimension is n.
5. The reinforcement learning multi-agent communication and decision-making method according to claim 1, characterized in that after executing an action each agent receives the reward information fed back by the environment; the agents share model parameters and the reward fed back by the environment; the quality of the previously taken actions is measured by the size of the reward, so that the agent is trained to use a better policy the next time it interacts with the environment; and meanwhile, curriculum transfer learning is adopted, gradually increasing the complexity of the environment and the number of agents during training.
CN201810606662.1A 2018-06-12 2018-06-12 Multi-agent communication and decision-making method for reinforcement learning Active CN108921298B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810606662.1A CN108921298B (en) 2018-06-12 2018-06-12 Multi-agent communication and decision-making method for reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810606662.1A CN108921298B (en) 2018-06-12 2018-06-12 Multi-agent communication and decision-making method for reinforcement learning

Publications (2)

Publication Number Publication Date
CN108921298A true CN108921298A (en) 2018-11-30
CN108921298B CN108921298B (en) 2022-04-19

Family

ID=64419238

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810606662.1A Active CN108921298B (en) 2018-06-12 2018-06-12 Multi-agent communication and decision-making method for reinforcement learning

Country Status (1)

Country Link
CN (1) CN108921298B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109960259A (en) * 2019-02-15 2019-07-02 青岛大学 A kind of unmanned guiding vehicle paths planning method of the multiple agent intensified learning based on gradient gesture
CN109978176A (en) * 2019-03-05 2019-07-05 华南理工大学 A kind of multiple agent cooperative learning methods based on state dynamic sensing
CN110070099A (en) * 2019-02-20 2019-07-30 北京航空航天大学 A kind of industrial data feature structure method based on intensified learning
CN110119749A (en) * 2019-05-16 2019-08-13 北京小米智能科技有限公司 Identify method and apparatus, the storage medium of product image
CN110554604A (en) * 2019-08-08 2019-12-10 中国地质大学(武汉) multi-agent synchronous control method, equipment and storage equipment
WO2020199690A1 (en) * 2019-03-29 2020-10-08 深圳先进技术研究院 Cloud platform-based sharing learning system and method, sharing platform and method, and medium
CN112215350A (en) * 2020-09-17 2021-01-12 天津(滨海)人工智能军民融合创新中心 Smart agent control method and device based on reinforcement learning
CN112260733A (en) * 2020-11-10 2021-01-22 东南大学 Multi-agent deep reinforcement learning-based MU-MISO hybrid precoding design method
CN112507104A (en) * 2020-12-18 2021-03-16 北京百度网讯科技有限公司 Dialog system acquisition method, apparatus, storage medium and computer program product
CN112926729A (en) * 2021-05-06 2021-06-08 中国科学院自动化研究所 Man-machine confrontation intelligent agent strategy making method
CN113110582A (en) * 2021-04-22 2021-07-13 中国科学院重庆绿色智能技术研究院 Unmanned aerial vehicle cluster intelligent system control method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104698854A (en) * 2015-03-26 2015-06-10 哈尔滨工业大学 Distributed fuzzy cooperative tracking control method of network Euler-Lagrange system
CN106649456A (en) * 2016-09-23 2017-05-10 西安电子科技大学 Cluster and outlier detection method based on multi-agent evolution
US9860391B1 (en) * 2003-03-07 2018-01-02 Wai Wu Method and system for matching entities in an auction
US20180032858A1 (en) * 2015-12-14 2018-02-01 Stats Llc System and method for predictive sports analytics using clustered multi-agent data
CN108108759A (en) * 2017-12-19 2018-06-01 四川九洲电器集团有限责任公司 A kind of dynamic of multiple agent compiles group's method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9860391B1 (en) * 2003-03-07 2018-01-02 Wai Wu Method and system for matching entities in an auction
CN104698854A (en) * 2015-03-26 2015-06-10 哈尔滨工业大学 Distributed fuzzy cooperative tracking control method of network Euler-Lagrange system
US20180032858A1 (en) * 2015-12-14 2018-02-01 Stats Llc System and method for predictive sports analytics using clustered multi-agent data
CN106649456A (en) * 2016-09-23 2017-05-10 西安电子科技大学 Cluster and outlier detection method based on multi-agent evolution
CN108108759A (en) * 2017-12-19 2018-06-01 四川九洲电器集团有限责任公司 A kind of dynamic of multiple agent compiles group's method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JAKOB N. FOERSTER et al.: "Learning to Communicate with Deep Multi-Agent Reinforcement Learning", arXiv *
ROYA ASADI et al.: "A Framework For Intelligent Multi Agent System Based Neural Network Classification Model", arXiv *
潘晓英 (Pan Xiaoying) et al.: "Density-sensitive multi-agent evolutionary clustering algorithm", Journal of Software (软件学报) *
范波 (Fan Bo) et al.: "A multi-agent coordination method based on distributed reinforcement learning", Computer Simulation (计算机仿真) *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109960259B (en) * 2019-02-15 2021-09-24 青岛大学 Multi-agent reinforcement learning unmanned guided vehicle path planning method based on gradient potential
CN109960259A (en) * 2019-02-15 2019-07-02 青岛大学 A kind of unmanned guiding vehicle paths planning method of the multiple agent intensified learning based on gradient gesture
CN110070099A (en) * 2019-02-20 2019-07-30 北京航空航天大学 A kind of industrial data feature structure method based on intensified learning
CN109978176A (en) * 2019-03-05 2019-07-05 华南理工大学 A kind of multiple agent cooperative learning methods based on state dynamic sensing
WO2020199690A1 (en) * 2019-03-29 2020-10-08 深圳先进技术研究院 Cloud platform-based sharing learning system and method, sharing platform and method, and medium
CN110119749A (en) * 2019-05-16 2019-08-13 北京小米智能科技有限公司 Identify method and apparatus, the storage medium of product image
CN110554604A (en) * 2019-08-08 2019-12-10 中国地质大学(武汉) multi-agent synchronous control method, equipment and storage equipment
CN110554604B (en) * 2019-08-08 2021-07-09 中国地质大学(武汉) Multi-agent synchronous control method, equipment and storage equipment
CN112215350A (en) * 2020-09-17 2021-01-12 天津(滨海)人工智能军民融合创新中心 Smart agent control method and device based on reinforcement learning
CN112215350B (en) * 2020-09-17 2023-11-03 天津(滨海)人工智能军民融合创新中心 Method and device for controlling agent based on reinforcement learning
CN112260733A (en) * 2020-11-10 2021-01-22 东南大学 Multi-agent deep reinforcement learning-based MU-MISO hybrid precoding design method
CN112260733B (en) * 2020-11-10 2022-02-01 东南大学 Multi-agent deep reinforcement learning-based MU-MISO hybrid precoding design method
CN112507104A (en) * 2020-12-18 2021-03-16 北京百度网讯科技有限公司 Dialog system acquisition method, apparatus, storage medium and computer program product
CN113110582A (en) * 2021-04-22 2021-07-13 中国科学院重庆绿色智能技术研究院 Unmanned aerial vehicle cluster intelligent system control method
CN113110582B (en) * 2021-04-22 2023-06-02 中国科学院重庆绿色智能技术研究院 Unmanned aerial vehicle cluster intelligent system control method
CN112926729B (en) * 2021-05-06 2021-08-03 中国科学院自动化研究所 Man-machine confrontation intelligent agent strategy making method
CN112926729A (en) * 2021-05-06 2021-06-08 中国科学院自动化研究所 Man-machine confrontation intelligent agent strategy making method

Also Published As

Publication number Publication date
CN108921298B (en) 2022-04-19

Similar Documents

Publication Publication Date Title
CN108921298A (en) Intensified learning multiple agent is linked up and decision-making technique
CN111291836B (en) Method for generating student network model
CN104662526B (en) Apparatus and method for efficiently updating spiking neuron network
CN109299262A (en) A kind of text implication relation recognition methods for merging more granular informations
CN107092959A (en) Hardware friendly impulsive neural networks model based on STDP unsupervised-learning algorithms
CN110134774A (en) It is a kind of based on the image vision Question-Answering Model of attention decision, method and system
CN108090658A (en) Arc fault diagnostic method based on time domain charactreristic parameter fusion
CN108427989B (en) Deep space-time prediction neural network training method for radar echo extrapolation
CN107247989A (en) A kind of neural network training method and device
CN108171323A (en) A kind of artificial neural networks device and method
CN104504520B (en) A kind of autonomous mission planning method of deep space probe based on neutral net
CN104636985A (en) Method for predicting radio disturbance of electric transmission line by using improved BP (back propagation) neural network
CN104636801A (en) Transmission line audible noise prediction method based on BP neural network optimization
CN107705556A (en) A kind of traffic flow forecasting method combined based on SVMs and BP neural network
CN107292352A (en) Image classification method and device based on convolutional neural networks
CN106529818A (en) Water quality evaluation prediction method based on fuzzy wavelet neural network
CN110334196B (en) Neural network Chinese problem generation system based on strokes and self-attention mechanism
CN105976020A (en) Network flow prediction method considering wavelet cross-layer correlations
CN113780002A (en) Knowledge reasoning method and device based on graph representation learning and deep reinforcement learning
CN109726676A (en) The planing method of automated driving system
CN114398976A (en) Machine reading understanding method based on BERT and gate control type attention enhancement network
CN111931934A (en) Affine transformation solving method under mass control points based on improved genetic algorithm
An et al. A unified information perceptron using deep reservoir computing
Tong et al. Enhancing rolling horizon evolution with policy and value networks
Guo et al. Skewed normal cloud modified whale optimization algorithm for degree reduction of S-λ curves

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant