CN113286275A - Unmanned aerial vehicle cluster efficient communication method based on multi-agent reinforcement learning
- Publication number: CN113286275A
- Application number: CN202110441049.0A
- Authority: CN (China)
- Prior art keywords: unmanned aerial vehicle, captain, reinforcement learning, communication method
- Legal status: Pending
Classifications
- H04W4/46 - Services specially adapted for wireless communication networks; services for vehicles, e.g. vehicle-to-vehicle communication [V2V]
- G06N3/045 - Computing arrangements based on biological models; neural networks; combinations of networks
- H04W24/02 - Supervisory, monitoring or testing arrangements; arrangements for optimising operational condition
- H04W24/06 - Supervisory, monitoring or testing arrangements; testing, supervising or monitoring using simulated traffic
Abstract
The invention discloses an efficient unmanned aerial vehicle (UAV) cluster communication method based on multi-agent reinforcement learning. The method comprises: constructing a UAV flight environment simulator; randomly selecting one UAV as the captain and marking it; each UAV acquiring and maintaining its own local observation, encoding it, and sending the encoded observation to the captain; the captain applying attention-mechanism processing to the global observation with respect to each UAV's own observation, weighting each piece of information by its importance, and then sending the computed observation to each teammate as that teammate's global observation; in the training stage, using the global observation as training data until the policy network converges; carrying out the execution stage in a distributed manner; and giving the captain an additional reward for survival. The invention solves centralized information interaction in UAV clusters under low communication overhead while granting each UAV autonomous decision-making power.
Description
Technical Field
The invention relates to an unmanned aerial vehicle (UAV) cluster learning method and a UAV cluster information interaction method based on a multi-agent reinforcement learning algorithm, and belongs to the technical field of UAV cluster communication and cooperation.
Background
With the rapid development of science and technology, UAVs have become increasingly miniaturized and intelligent and are now widely used in operations such as battlefield reconnaissance, joint strikes, and emergency rescue. Compared with a single UAV, a UAV cluster has clear advantages in scale and cooperation, effectively overcoming the single UAV's limitations in effectiveness and endurance. At present, most UAV cluster solutions rely on deep reinforcement learning, a branch of machine learning whose paradigm is trial-and-error learning through interaction with the environment. Reinforcement learning is highly robust; its core is iterative learning through interaction with the environment, and it has two basic characteristics: continuous trial-and-error exploration through interaction with the environment, and delayed returns obtained after that interaction. The goal of reinforcement learning is to acquire new knowledge during the iterative process, thereby improving the state-action function to adapt to the environment.
Whether a UAV cluster is trained cooperatively or deployed jointly, the communication scheme among the UAVs is an important component of the cluster solution. Existing control strategies for UAV cluster information interaction generally fall into three modes, namely centralized, decentralized, and distributed control, each with its own advantages and disadvantages. Centralized control performs best, but it requires a large amount of information interaction, incurs heavy computation, and suffers low communication efficiency; under this approach the UAV cluster often lacks flexibility and autonomy.
To improve communication efficiency within UAV clusters, academia has proposed a number of methods, such as the centralized-training decentralized-execution framework, but these ignore the security of the physical layer. The centralized-training decentralized-execution framework comprises an Actor network and a Critic network; each agent's reinforcement learning takes the action policies of the other agents into account, training centrally and executing in a distributed manner: the Critic network can access global information during training, while the Actor's input during execution contains only a single agent's local information. Applying this framework to UAV cluster control prevents the cluster from losing autonomy, since each UAV can make decisions from the local information gathered by its own sensors and thereby achieve a degree of autonomous capability. However, in a UAV cluster based on this framework, if every UAV must exchange position, velocity, attitude, and moving-target information with all other UAVs in the formation, bandwidth is wasted and the information is easily intercepted by an enemy.
Disclosure of Invention
The purpose of the invention is as follows: to overcome the drawbacks of low communication efficiency and insecure communication in UAV clusters based on the centralized-training decentralized-execution framework, the invention provides an efficient UAV cluster communication method based on a multi-agent deep reinforcement learning framework, set against the background of multi-agent reinforcement learning algorithms in UAV clusters. The cluster adopts a "captain-teammate" mode: a specific reward mechanism is designed for the captain to ensure that the captain survives the longest; the UAVs in the cluster are then numbered and one captain is selected, and each UAV acquires and maintains its own local observation through its own sensors. To reduce information dimensionality, each UAV applies embedding encoding to its local observation, and the captain collects the teammates' local observations and maintains them as a global observation. To reduce the search space and tighten the coupling among the pieces of information in the global observation, the captain applies attention-mechanism processing to the global observation with respect to each UAV's own observation and sends the computed observation to each teammate as that teammate's global observation. The scheme reduces the number of communications from N(N-1) to 2(N-1), where N is the number of UAVs; for example, with N = 10 the count drops from 90 to 18. This greatly reduces communication links and traffic and better addresses communication efficiency and security.
The technical scheme is as follows: an efficient UAV cluster communication method based on multi-agent reinforcement learning comprises the following steps: (1) constructing a UAV flight environment simulator; (2) in the UAV cluster, randomly selecting one UAV as the captain and marking it, with the remaining UAVs as teammates; (3) the captain serving as an observation relay station that collects the teammates' local observations, maintains them as a global observation, and sends the global observation to the teammates for information interaction; (4) training centrally and executing decentrally based on the centralized-training decentralized-execution framework, for example training and executing UAV actions with the MADDPG or QMIX algorithm, where the training stage uses the global observation as training data until the policy network converges, and the execution stage runs in a distributed manner, i.e., each UAV feeds its own local observation to the policy execution network to obtain the corresponding action; (5) to keep the captain safe from attack, giving the captain an additional reward for survival through the reward function.
In step (1), an aerodynamics-based UAV flight environment simulator is constructed on a simulation environment engine.
In step (3), each UAV acquires and maintains its own local observation, encodes it, and sends it to the captain. Because the pieces of information in the global observation formed from the UAVs' own observations differ in importance, the captain applies attention-mechanism processing to the global observation separately with respect to each UAV's own local observation, determines the weight of each piece of information by its importance, and then sends the computed observation to each teammate as that teammate's global observation, thereby reducing the exploration space, improving learning efficiency, and constructing a more tightly related global observation.
Meanwhile, to further reduce bandwidth consumption, each UAV applies embedding encoding to its own local observation $o_\omega$. All UAVs share the same encoding mechanism; after encoding, each teammate obtains an encoded observation containing its position, velocity, attitude, and state information, and sends it to the captain, who collects the teammates' local observations and maintains them as one global observation.
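The following sketch illustrates such a shared embedding encoder (PyTorch is used for illustration; the 12-dimensional observation layout, the hidden width of 64, and the 32-dimensional embedding size are assumptions, as the patent does not fix them):

```python
import torch
import torch.nn as nn

class ObservationEncoder(nn.Module):
    """Shared embedding encoder: every UAV holds identical weights, so all
    messages reaching the captain come from one common encoding mechanism."""
    def __init__(self, obs_dim: int = 12, embed_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, o: torch.Tensor) -> torch.Tensor:
        return self.net(o)

# A hypothetical 12-dim local observation o_w packing position, velocity,
# attitude, and state; the 32-dim embedding is the message actually sent
# to the captain, which is where the bandwidth saving comes from.
encoder = ObservationEncoder()
o_local = torch.randn(12)
message = encoder(o_local)
```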
In the initial stage, UAV information is scarce, and each UAV can generate a corresponding action $a_\omega$ from its own local observation $o_\omega$ using its policy $\pi_\omega$.
In the training stage, to guarantee the security of UAV cluster communication, the captain must survive to the end of the whole cooperation process. A specific reward function is therefore designed, comprising: a course (process) reward function, an outcome reward function, and a captain reward function.
The attention mechanism function contains three basic elements: query, key, and value, where the query represents a given element and each <key, value> pair gives the value associated with a key. Here, the query represents the action performed by a UAV, each key represents the current state of a UAV, and each value, corresponding to its key, is the content presented by that key. First, the similarity function $s_i = Q^{\mathsf{T}} K_i$ computes the similarity between the given query $Q$ and each key $K_i$, where the superscript $\mathsf{T}$ denotes the transpose of $Q$. The softmax function $\alpha_i = \exp(s_i) / \sum_j \exp(s_j)$ then yields the normalized attention weights, and finally the weighted sum $\mathrm{Attention}(Q, K, V) = \sum_i \alpha_i \, V_i$ is taken, where $\alpha_i$ represents the weight of key $K_i$ with respect to the query; computing the final attention value thus amounts to weighting each value by its key's similarity to the query and summing.
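A minimal sketch of this dot-product attention step as the captain might apply it (the shapes, and the choice of each teammate's own encoded observation as the query, are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def captain_attention(query: torch.Tensor, keys: torch.Tensor,
                      values: torch.Tensor):
    """Dot-product attention as written above: s_i = Q^T K_i, softmax
    normalization to alpha_i, then the weighted sum sum_i alpha_i V_i."""
    scores = keys @ query             # similarity of the query with every key
    alpha = F.softmax(scores, dim=0)  # normalized attention weights
    return alpha @ values, alpha      # attention value and its weights

# The captain recomputes the global observation once per teammate, using that
# teammate's own encoded observation as the query (sizes are illustrative).
n_uavs, d = 3, 32
keys = values = torch.randn(n_uavs, d)  # encoded observations of all UAVs
query = torch.randn(d)                  # the receiving teammate's encoding
weighted_global_obs, alpha = captain_attention(query, keys, values)
```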
In the UAV flight environment simulator, the UAV cluster interacts with the environment to acquire training data. Each UAV acquires its local observation, takes an action according to its own action policy, and obtains a reward value. The resulting tuples composed of the global observation, the actions, and the rewards are stored into an experience replay pool $\mathcal{D}$.
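A minimal experience replay pool along these lines (the capacity and exact tuple layout are assumptions):

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience replay pool D: stores (global_obs, actions, rewards,
    next_global_obs) tuples and serves uniform random minibatches."""
    def __init__(self, capacity: int = 100_000):
        self.storage = deque(maxlen=capacity)  # oldest tuples evicted first

    def push(self, global_obs, actions, rewards, next_global_obs):
        self.storage.append((global_obs, actions, rewards, next_global_obs))

    def sample(self, batch_size: int):
        return random.sample(self.storage, batch_size)

    def __len__(self):
        return len(self.storage)
```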
In the training stage, the Critic (evaluation) network is trained centrally. Its joint Q-value function is defined as $Q^{\pi}(\mathbf{x}, a_1, \ldots, a_N; \theta)$, where $\theta$ is a parameter of the action policy function, and the optimization goal is to minimize $\mathcal{L}(\theta) = \mathbb{E}_{(\mathbf{x}, a, r, \mathbf{x}') \sim \mathcal{D}} \big[ (Q^{\pi}(\mathbf{x}, a_1, \ldots, a_N; \theta) - y)^2 \big]$ with $y = r + \gamma \, Q^{\pi'}(\mathbf{x}', a_1', \ldots, a_N')$, where $a_k'$ is the target action at the next moment. Partial samples are drawn from the training data and the function is optimized until the model (the Critic network) converges. Here $r$ represents the global reward and $\gamma$ is a discount factor representing the influence of future rewards on the current action.
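A sketch of one such centralized critic update (the critic and target-actor call signatures and the batch layout are assumptions; only the loss and TD target follow the formulas above):

```python
import torch
import torch.nn.functional as F

def critic_update(critic, target_critic, target_actors, optimizer,
                  batch, gamma: float = 0.95):
    """One gradient step on L(theta) = E[(Q(x, a_1..a_N; theta) - y)^2],
    with TD target y = r + gamma * Q'(x', a'_1..a'_N) from target networks."""
    x, actions, r, x_next, obs_next = batch   # tensors sampled from pool D
    with torch.no_grad():
        a_next = torch.cat([pi(o) for pi, o in zip(target_actors, obs_next)],
                           dim=-1)            # target actions at next moment
        y = r + gamma * target_critic(x_next, a_next)
    loss = F.mse_loss(critic(x, actions), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```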
In the training stage, the policy is trained by gradient descent to maximize the cumulative reward $R = \sum_{t=0}^{T} \gamma^t r_t$, where $T$ is the time horizon, $\gamma^t$ represents the influence of the reward at time $t$ on the current action (the further in the future, the smaller the weight), and $r_t$ is the reward at time $t$.
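For instance, the discounted return of a short reward sequence (the values are made up) can be computed directly:

```python
# R = sum_{t=0}^{T} gamma^t * r_t: later rewards are discounted more heavily.
gamma = 0.95
rewards = [1.0, 0.5, 2.0, 0.0, 1.5]
R = sum(gamma**t * r_t for t, r_t in enumerate(rewards))
print(round(R, 2))  # 1.0 + 0.475 + 1.805 + 0.0 + 1.222 -> 4.5
```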
The optimization target is $\nabla_{\theta_\omega} J(\pi_\omega) = \mathbb{E}_{\mathbf{x} \sim \mathcal{D}} \big[ \nabla_{\theta_\omega} \pi_\omega(a_\omega \mid o_\omega) \, \nabla_{a_\omega} Q^{\pi}(\mathbf{x}, a_x, a_y, a_z) \big|_{a_\omega = \pi_\omega(o_\omega)} \big]$, where $\pi_\omega$ represents the policy under the different roles, $x$, $y$, and $z$ are the numbers of the UAVs (and extend naturally to more UAVs), and $\theta_\omega$ represents the policy parameters of UAV $\omega$.
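A sketch of the corresponding per-UAV policy update in the deterministic-policy-gradient style of MADDPG (the slicing of the joint action and the call signatures are assumptions):

```python
import torch

def actor_update(actor, critic, optimizer, own_obs, x, joint_actions,
                 action_slice: slice):
    """Ascend Q along this UAV's own action: substitute a_w = pi_w(o_w) into
    its slice of the joint action and maximize Q(x, a_x, a_y, a_z)."""
    a = actor(own_obs)                # a_w = pi_w(o_w)
    joint = joint_actions.clone()
    joint[:, action_slice] = a        # replace own action, keep the others
    loss = -critic(x, joint).mean()   # minimizing -Q maximizes Q
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```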
In the execution stage, each UAV feeds its own local observation to the policy execution network to obtain the corresponding action.
Beneficial effects: compared with the prior art, in the efficient UAV cluster communication method based on multi-agent reinforcement learning provided by the invention, the UAVs obtain rewards from the environment through a deep reinforcement learning algorithm and possess autonomous decision-making capability, achieving UAV cluster flight control superior to the traditional rule-based control mode;
the invention realizes autonomous UAV cluster control through a multi-agent deep reinforcement learning algorithm, which effectively solves the problems caused by the centralized information-interaction mode and gives the UAVs autonomous capability;
the invention strips the decision-making power of the control center in centralized information interaction, hands each unmanned aerial vehicle to make a decision autonomously, adopts a frame based on centralized training and decentralized execution, and effectively solves the problems of unmanned aerial vehicle cluster communication calculated amount, autonomy and the like by centralized training and distributed execution.
the invention encodes the local observations, effectively reducing the dimensionality of the information and thus the bandwidth consumption; and it applies attention-mechanism processing to the global observation, reflecting the differing importance of the pieces of information and reducing the exploration space, thereby improving learning efficiency.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of the interaction of a UAV cluster with its environment.
Detailed Description
The present invention is further illustrated by the following examples, which are purely exemplary and are not intended to limit the scope of the invention; various equivalent modifications of the invention will occur to those skilled in the art upon reading the present disclosure, and these fall within the scope of the appended claims.
First, the main flow of the method of the invention is briefly described with reference to FIG. 1. The experiment begins by constructing the UAV flight environment simulator, initializing the intelligent UAVs, and then selecting and marking the captain. Each UAV starts acquiring environment information and shares it with the cluster, after which it is judged whether the training information is sufficient. If the information is insufficient, each UAV continues to acquire environment information; if it is sufficient, it is judged whether the information center contains the captain's information. If there is no captain information, the captain is reselected; if the captain's information is present, the model is trained. Convergence is then checked: if the model has not converged, the captain is reselected; if it has converged, training ends.
The interaction of the UAV cluster with the environment is described with reference to FIG. 2. Each UAV interacts with the environment, obtaining environment information and applying actions in the environment. Each UAV then applies embedding encoding to the environment information it has obtained, and the teammates send the encoded information to the captain. The captain feeds the training information into the attention mechanism, assigning weights to the information, and then feeds it into the model for training. The trained model is deployed on the agents (i.e., the UAVs), the agents interact with the environment, and the next round of training proceeds, and so on in a loop.
In a specific embodiment, the method of the present invention is further detailed by means of pseudo code.
Algorithm 1: UAV cluster cooperation algorithm based on multi-agent reinforcement learning
Input: agent structure, attention network, environment information, other model training parameters
Output: structure and weights of each agent's policy network
1: construct a simulator of the UAV flight physical environment
2: initialize the agents
3: while the policy model has not converged do
4: select the captain from among the agents according to a given strategy
5: while the accumulated training information has not reached the threshold do
6: the agents interact with the environment simulator and accumulate interaction information
7: each teammate embeds its accumulated interaction information
8: each teammate sends the embedded data to the captain
9: the captain collects and processes the data, giving the information different weights through the attention mechanism
10: end while
11: check whether the captain-related information is legal
12: if the captain information is illegal then
13: discard the training data accumulated in this iteration
14: continue
15: end if
16: train the policy model with the newly added data
17: end while
First, the environment simulator is constructed: an aerodynamics-based UAV flight environment simulator built on a simulation environment engine. This embodiment contains three cooperating UAVs and can be conveniently extended to larger multi-UAV clusters. Next, the reward function is designed. The roles among the three cooperating UAVs comprise a captain and teammates; to guarantee the security of UAV cluster communication, the captain must survive to the end of the whole cooperation process, so a specific reward function is designed, comprising: a course (process) reward function, an outcome reward function, and a captain reward function.
Secondly, the three cooperating UAVs are initialized and numbered $\langle x, y, z \rangle$; one UAV is selected as the captain and marked $x_c$. The following steps are then carried out:
step 1:
in the training phase, the unmanned aerial vehicle x, y and z interact with the environment and transmit through the unmanned aerial vehicle x, y and zThe sensor (sensor combination) acquires a local observed value, then takes the observed value as the input of the embedding layer, and outputs the encoded observed value
Step 2: the captain collects the teammates' encoded observations, integrates its own local observation and the local observations from the teammates into a global observation, applies attention-mechanism processing to the global observation, and sends the computed global observation to the teammates.
and step 3:
designing a neural network structure, selecting a neural network hyper-parameter, and building a neural network.
For example, a policy network may include 5-layer fully-connected neural networks, each layer using a relu function as an activation function.
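A sketch of such a network (the layer width is an assumption; the five-layer, ReLU structure follows the example above, and whether the output layer also carries an activation is left open here):

```python
import torch.nn as nn

def make_policy_net(obs_dim: int, act_dim: int, hidden: int = 64) -> nn.Module:
    """Five fully connected layers with ReLU activations after each hidden
    layer, matching the example policy network described above."""
    return nn.Sequential(
        nn.Linear(obs_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, act_dim),  # action head; final activation omitted
    )
```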
Step 4: following the training flow of the centralized-training decentralized-execution framework, train the flight control policy in the simulator using the current flight control policy until the model converges.
Step 5: following the execution flow of the centralized-training decentralized-execution framework, feed the obtained local observation to the policy execution network (the model obtained after convergence) to obtain the corresponding action.
In the UAV flight environment simulator, the UAV cluster interacts with the environment to acquire training data. Each UAV acquires its local observation, takes action $a_\omega$ according to its own action policy $\pi_\omega$, and obtains a reward value $r$. The tuple $(\mathbf{x}, a, r, \mathbf{x}')$ composed of the global observation, the actions, and the reward obtained above is stored into the experience replay pool $\mathcal{D}$, where $\mathbf{x}'$ denotes the environment information at the next moment.
In step 4, the Critic network is trained centrally, with its joint Q-value function defined as $Q^{\pi}(\mathbf{x}, a_1, \ldots, a_N; \theta)$, where $\theta$ is a parameter of the action policy function, and the optimization goal is $\mathcal{L}(\theta) = \mathbb{E}_{(\mathbf{x}, a, r, \mathbf{x}') \sim \mathcal{D}} \big[ (Q^{\pi}(\mathbf{x}, a_1, \ldots, a_N; \theta) - y)^2 \big]$ with $y = r + \gamma \, Q^{\pi'}(\mathbf{x}', a_1', \ldots, a_N')$, where $a_k'$ is the target action at the next moment. Partial samples are drawn from the training data and the function is optimized until the model converges.
Claims (10)
1. An efficient unmanned aerial vehicle cluster communication method based on multi-agent reinforcement learning, characterized by comprising the following steps: (1) constructing an unmanned aerial vehicle flight environment simulator; (2) in the unmanned aerial vehicle cluster, randomly selecting one unmanned aerial vehicle as the captain and marking it, with the remaining unmanned aerial vehicles as teammates; (3) the captain serving as an observation relay station that collects the teammates' local observations, maintains them as a global observation, and sends the global observation to the teammates for information interaction; (4) based on a centralized-training decentralized-execution framework, using the global observation as training data in the training stage until the policy network converges, and carrying out the execution stage in a distributed manner, i.e., each unmanned aerial vehicle feeding its own local observation to the policy execution network to obtain the corresponding action; (5) to keep the captain safe from attack, giving the captain an additional reward for survival through the reward function.
2. The multi-agent reinforcement-learning-based efficient unmanned aerial vehicle cluster communication method according to claim 1, wherein in step (1) an aerodynamics-based unmanned aerial vehicle flight environment simulator is constructed on a simulation environment engine.
3. The multi-agent reinforcement-learning-based efficient unmanned aerial vehicle cluster communication method according to claim 1, wherein in step (3) each unmanned aerial vehicle acquires and maintains its own local observation, encodes it, and sends it to the captain; and the captain applies attention-mechanism processing to the global observation separately with respect to each unmanned aerial vehicle's own local observation, determines the weight of each piece of information by its importance, and then sends the computed observation to each teammate as that teammate's global observation.
4. The multi-agent reinforcement-learning-based efficient unmanned aerial vehicle cluster communication method according to claim 1, wherein each unmanned aerial vehicle applies embedding encoding to its own local observation $o_\omega$; all unmanned aerial vehicles share the same encoding mechanism, and after encoding each teammate obtains an encoded observation containing its position, velocity, attitude, and state information and sends it to the captain, who collects the teammates' local observations and maintains them as one global observation.
5. The multi-agent reinforcement-learning-based efficient unmanned aerial vehicle cluster communication method according to claim 1, wherein in the initial stage each unmanned aerial vehicle can generate a corresponding action $a_\omega$ from its own local observation $o_\omega$ using its policy $\pi_\omega$.
6. The multi-agent reinforcement-learning-based efficient unmanned aerial vehicle cluster communication method according to claim 1, wherein the captain must survive to the end of the whole unmanned aerial vehicle cooperation process, and the reward function is designed accordingly, comprising: a course (process) reward function, an outcome reward function, and a captain reward function.
7. The multi-agent reinforcement-learning-based efficient unmanned aerial vehicle cluster communication method according to claim 3, wherein the attention mechanism function comprises three basic elements: query, key, and value; first the similarity function $s_i = Q^{\mathsf{T}} K_i$ computes the similarity between the given query and each key, then the softmax function $\alpha_i = \exp(s_i) / \sum_j \exp(s_j)$ yields the normalized attention weights, and finally the weighted sum $\mathrm{Attention}(Q, K, V) = \sum_i \alpha_i \, V_i$ is taken.
8. The multi-agent reinforcement-learning-based efficient unmanned aerial vehicle cluster communication method according to claim 1, wherein in the unmanned aerial vehicle flight environment simulator the unmanned aerial vehicle cluster interacts with the environment to acquire training data; each unmanned aerial vehicle acquires its local observation, takes an action according to its own action policy, and obtains a reward value; and the resulting tuples composed of the global observation, the actions, and the rewards are stored into an experience replay pool $\mathcal{D}$.
9. The multi-agent reinforcement-learning-based efficient unmanned aerial vehicle cluster communication method according to claim 1, wherein the Critic network is trained centrally, with its joint Q-value function defined as $Q^{\pi}(\mathbf{x}, a_1, \ldots, a_N; \theta)$, where $\theta$ is a parameter of the action policy function; the optimization goal is $\mathcal{L}(\theta) = \mathbb{E}\big[(Q^{\pi}(\mathbf{x}, a_1, \ldots, a_N; \theta) - y)^2\big]$ with $y = r + \gamma \, Q^{\pi'}(\mathbf{x}', a_1', \ldots, a_N')$, where $a_k'$ is the target action at the next moment; and partial samples are drawn from the training data for function optimization until the model converges.
10. The multi-agent reinforcement-learning-based efficient unmanned aerial vehicle cluster communication method according to claim 1, wherein a gradient descent method is adopted to train the policy to maximize the cumulative reward $R = \sum_{t=0}^{T} \gamma^t r_t$; the optimization target is $\nabla_{\theta_\omega} J(\pi_\omega) = \mathbb{E}_{\mathbf{x} \sim \mathcal{D}} \big[ \nabla_{\theta_\omega} \pi_\omega(a_\omega \mid o_\omega) \, \nabla_{a_\omega} Q^{\pi}(\mathbf{x}, a_1, \ldots, a_N) \big|_{a_\omega = \pi_\omega(o_\omega)} \big]$, where $\pi_\omega$ represents the policy under the different roles and $\omega$ denotes the number of the unmanned aerial vehicle.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110441049.0A CN113286275A (en) | 2021-04-23 | 2021-04-23 | Unmanned aerial vehicle cluster efficient communication method based on multi-agent reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113286275A (en) | 2021-08-20
Family
ID=77277177
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110441049.0A Pending CN113286275A (en) | 2021-04-23 | 2021-04-23 | Unmanned aerial vehicle cluster efficient communication method based on multi-agent reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113286275A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110991545A (en) * | 2019-12-10 | 2020-04-10 | 中国人民解放军军事科学院国防科技创新研究院 | Multi-agent confrontation oriented reinforcement learning training optimization method and device |
CN111786713A (en) * | 2020-06-04 | 2020-10-16 | 大连理工大学 | Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning |
CN111860162A (en) * | 2020-06-17 | 2020-10-30 | 上海交通大学 | Video crowd counting system and method |
CN112131660A (en) * | 2020-09-10 | 2020-12-25 | 南京大学 | Unmanned aerial vehicle cluster collaborative learning method based on multi-agent reinforcement learning |
CN112269876A (en) * | 2020-10-26 | 2021-01-26 | 南京邮电大学 | Text classification method based on deep learning |
CN112561395A (en) * | 2020-12-25 | 2021-03-26 | 桂林电子科技大学 | Unmanned aerial vehicle cooperation method, system, device, electronic equipment and storage medium |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113703482A (en) * | 2021-08-30 | 2021-11-26 | 西安电子科技大学 | Task planning method based on simplified attention network in large-scale unmanned aerial vehicle cluster |
CN113703482B (en) * | 2021-08-30 | 2022-08-12 | 西安电子科技大学 | Task planning method based on simplified attention network in large-scale unmanned aerial vehicle cluster |
WO2023109699A1 (en) * | 2021-12-17 | 2023-06-22 | 深圳先进技术研究院 | Multi-agent communication learning method |
CN114980020A (en) * | 2022-05-17 | 2022-08-30 | 重庆邮电大学 | Unmanned aerial vehicle data collection method based on MADDPG algorithm |
CN115150408A (en) * | 2022-06-21 | 2022-10-04 | 中国电子科技集团公司第五十四研究所 | Unmanned cluster distributed situation maintenance method based on information extraction |
CN115857556A (en) * | 2023-01-30 | 2023-03-28 | 中国人民解放军96901部队 | Unmanned aerial vehicle collaborative detection planning method based on reinforcement learning |
CN117572893A (en) * | 2024-01-15 | 2024-02-20 | 白杨时代(北京)科技有限公司 | Unmanned plane cluster countermeasure strategy acquisition method based on reinforcement learning and related equipment |
CN117572893B (en) * | 2024-01-15 | 2024-03-19 | 白杨时代(北京)科技有限公司 | Unmanned plane cluster countermeasure strategy acquisition method based on reinforcement learning and related equipment |
CN117707219A (en) * | 2024-02-05 | 2024-03-15 | 西安羚控电子科技有限公司 | Unmanned aerial vehicle cluster investigation countermeasure method and device based on deep reinforcement learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113286275A (en) | Unmanned aerial vehicle cluster efficient communication method based on multi-agent reinforcement learning | |
CN109635917B (en) | Multi-agent cooperation decision and training method | |
CN110852448A (en) | Cooperative intelligent agent learning method based on multi-intelligent agent reinforcement learning | |
CN112465151A (en) | Multi-agent federal cooperation method based on deep reinforcement learning | |
CN108921298B (en) | Multi-agent communication and decision-making method for reinforcement learning | |
CN112131660A (en) | Unmanned aerial vehicle cluster collaborative learning method based on multi-agent reinforcement learning | |
CN113176776B (en) | Unmanned ship weather self-adaptive obstacle avoidance method based on deep reinforcement learning | |
CN111160525A (en) | Task unloading intelligent decision method based on unmanned aerial vehicle group in edge computing environment | |
CN108573303A (en) | It is a kind of that recovery policy is improved based on the complex network local failure for improving intensified learning certainly | |
CN113784410B (en) | Heterogeneous wireless network vertical switching method based on reinforcement learning TD3 algorithm | |
Andersen et al. | Towards safe reinforcement-learning in industrial grid-warehousing | |
CN114757351A (en) | Defense method for resisting attack by deep reinforcement learning model | |
CN116136945A (en) | Unmanned aerial vehicle cluster countermeasure game simulation method based on anti-facts base line | |
CN114861747A (en) | Method, device, equipment and storage medium for identifying key nodes of multilayer network | |
Kong et al. | Hierarchical multi‐agent reinforcement learning for multi‐aircraft close‐range air combat | |
CN115268493A (en) | Large-scale multi-unmanned-aerial-vehicle task scheduling method based on double-layer reinforcement learning | |
CN113276852B (en) | Unmanned lane keeping method based on maximum entropy reinforcement learning framework | |
Gisslén et al. | Sequential constant size compressors for reinforcement learning | |
CN116757249A (en) | Unmanned aerial vehicle cluster strategy intention recognition method based on distributed reinforcement learning | |
CN115759199A (en) | Multi-robot environment exploration method and system based on hierarchical graph neural network | |
Liu et al. | Forward-looking imaginative planning framework combined with prioritized-replay double DQN | |
CN114879738A (en) | Model-enhanced unmanned aerial vehicle flight trajectory reinforcement learning optimization method | |
CN115938104A (en) | Dynamic short-time road network traffic state prediction model and prediction method | |
CN114004282A (en) | Method for extracting deep reinforcement learning emergency control strategy of power system | |
Faber | The sensor management prisoners dilemma: a deep reinforcement learning approach |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20210820