CN113286275A - Unmanned aerial vehicle cluster efficient communication method based on multi-agent reinforcement learning - Google Patents

Unmanned aerial vehicle cluster efficient communication method based on multi-agent reinforcement learning

Info

Publication number
CN113286275A
Authority
CN
China
Prior art keywords
unmanned aerial vehicle
captain
reinforcement learning
communication method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110441049.0A
Other languages
Chinese (zh)
Inventor
俞扬
詹德川
周志华
练娅莉
袁雷
秦熔均
庞竟成
管聪
罗凡明
张云天
陈雄辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN202110441049.0A priority Critical patent/CN113286275A/en
Publication of CN113286275A publication Critical patent/CN113286275A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W 4/30 Services specially adapted for particular environments, situations or purposes
    • H04W 4/40 Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]
    • H04W 4/46 Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P], for vehicle-to-vehicle communication [V2V]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 24/00 Supervisory, monitoring or testing arrangements
    • H04W 24/02 Arrangements for optimising operational condition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 24/00 Supervisory, monitoring or testing arrangements
    • H04W 24/06 Testing, supervising or monitoring using simulated traffic

Abstract

The invention discloses an unmanned aerial vehicle cluster efficient communication method based on multi-agent reinforcement learning, which comprises the following steps: constructing an unmanned aerial vehicle flight environment simulator; randomly selecting one unmanned aerial vehicle as the captain and marking it; each unmanned aerial vehicle acquires and maintains its own local observation, encodes it, and sends the encoded observation to the captain; the captain applies attention-mechanism processing to the global observation according to each unmanned aerial vehicle's own observation, determines the weight of each piece of information according to its importance, and then sends the computed observation to each teammate as that teammate's global observation; in the training phase the global observation is used as training data until the policy network converges; the execution phase is carried out in a distributed manner; the captain is given an additional reward for survival. The invention solves the problem of centralized information interaction in an unmanned aerial vehicle cluster under low communication overhead and grants each unmanned aerial vehicle autonomous decision-making authority.

Description

Unmanned aerial vehicle cluster efficient communication method based on multi-agent reinforcement learning
Technical Field
The invention relates to an unmanned aerial vehicle cluster learning method and an unmanned aerial vehicle cluster information interaction method based on a multi-agent reinforcement learning algorithm, and belongs to the technical field of unmanned aerial vehicle cluster communication cooperation.
Background
With the rapid development of science and technology, unmanned aerial vehicles have become increasingly miniaturized and intelligent, and are therefore widely applied to missions such as battlefield reconnaissance, joint strikes and emergency rescue. Compared with a single unmanned aerial vehicle, an unmanned aerial vehicle cluster has obvious advantages in scale and cooperation, effectively overcoming the limitations of a single aircraft in efficiency and endurance. At present, most unmanned aerial vehicle cluster solutions rely on deep reinforcement learning. Deep reinforcement learning is a branch of machine learning in which an agent learns through interactive trial and error with the environment. Reinforcement learning is highly robust; its core is iterative learning through interaction with the environment, and its two basic characteristics are continuous trial-and-error exploration through interaction with the environment and delayed returns obtained after that interaction. The goal of reinforcement learning is to acquire new knowledge in an iterative process, thereby improving the state-action value function so as to adapt to the environment.
Whether an unmanned aerial vehicle cluster is trained cooperatively or deployed cooperatively, the communication scheme among the unmanned aerial vehicles is an important component of the cluster solution. Existing control strategies for unmanned aerial vehicle cluster information interaction generally fall into three modes, namely centralized control, decentralized control and distributed control, each with its own advantages and disadvantages. Centralized control gives the best effect, but it requires a large amount of information interaction, incurs heavy computation and low communication efficiency, and under this method the unmanned aerial vehicle cluster often lacks flexibility and autonomy.
In order to improve communication efficiency within unmanned aerial vehicle clusters, the academic community has proposed a number of methods, such as the centralized-training, decentralized-execution framework, but such methods ignore physical-layer security. The centralized-training, decentralized-execution framework comprises an Actor network and a Critic network; the reinforcement learning of each agent takes the action policies of the other agents into account, training is centralized and execution is distributed, the Critic network can obtain global information during training, and the input of the Actor during execution contains only the local information of a single agent. Applying this framework to unmanned aerial vehicle cluster control prevents the cluster from losing autonomy, since each unmanned aerial vehicle can make decisions based on the local information acquired by its own sensors and thus has a certain autonomous capability. However, in an unmanned aerial vehicle cluster based on this framework, if each unmanned aerial vehicle has to exchange information such as position, speed, attitude and moving targets with all the other unmanned aerial vehicles in the formation, bandwidth is wasted and the information is easily intercepted by an enemy.
Disclosure of Invention
The purpose of the invention is as follows: in order to overcome the defects of low communication efficiency and insecure communication in an unmanned aerial vehicle cluster based on the centralized-training, decentralized-execution framework, the invention provides an efficient unmanned aerial vehicle cluster communication method based on a multi-agent deep reinforcement learning framework, set against the background of multi-agent reinforcement learning algorithms for unmanned aerial vehicle clusters. The cluster adopts a 'captain-teammate' mode: a specific reward mechanism is designed for the captain to ensure that the captain survives as long as possible; the unmanned aerial vehicles in the cluster are then numbered, one captain is selected, and each unmanned aerial vehicle acquires and maintains its own local observation through its own sensors. To reduce the information dimensionality, each unmanned aerial vehicle applies an embedding encoding to its local observation, and the captain collects the teammates' local observations and maintains them as a global observation. To reduce the search space and improve the compactness of the information in the global observation, the captain applies attention-mechanism processing to the global observation according to each unmanned aerial vehicle's own observation and sends the computed observation to each teammate as that teammate's global observation. The scheme of the invention reduces the number of communications per exchange from N(N-1) to 2(N-1), where N is the number of unmanned aerial vehicles (for example, from 90 to 18 for a cluster of 10 aircraft), which greatly reduces communication links and traffic and better addresses the problems of communication efficiency and security.
The technical scheme is as follows: an unmanned aerial vehicle cluster efficient communication method based on multi-agent reinforcement learning comprises the following steps: (1) constructing an unmanned aerial vehicle flight environment simulator; (2) in the unmanned aerial vehicle cluster, randomly selecting one unmanned aerial vehicle as the captain and marking it, the remaining unmanned aerial vehicles serving as teammates; (3) the captain acting as an observation relay station, collecting the teammates' local observations, maintaining them as a global observation and sending the global observation to the teammates for information interaction; (4) performing centralized training and decentralized execution based on the centralized-training, decentralized-execution framework, for example carrying out unmanned aerial vehicle action training and execution with the MADDPG algorithm or the QMIX algorithm, where in the training phase the global observation is used as training data until the policy network converges, and the execution phase is carried out in a distributed manner, i.e. each unmanned aerial vehicle feeds its own local observation into the policy execution network to obtain the corresponding action; (5) to keep the captain free from attack, giving the captain an additional reward for survival through the reward function.
In the step (1), an unmanned aerial vehicle flight environment simulator based on aerodynamics is constructed based on a simulation environment engine.
In step (3), each unmanned aerial vehicle acquires and maintains its own local observation, encodes it and sends the encoded local observation to the captain. Because the pieces of information in the global observation formed from the unmanned aerial vehicles' own observations differ in importance, the captain applies attention-mechanism processing to the global observation separately according to each unmanned aerial vehicle's own local observation, determines the weight of each piece of information according to its importance, and then sends the computed observation to each teammate as that teammate's global observation, thereby reducing the exploration space, improving learning efficiency and constructing a more tightly related global observation.
Meanwhile, in order to further reduce bandwidth consumption, each unmanned aerial vehicle applies an embedding encoding to its local observation o_ω. All unmanned aerial vehicles share the same encoding mechanism; after encoding, each teammate obtains its encoded observation containing position, speed, attitude and state information and sends it to the captain, and the captain collects the teammates' local observations and maintains them as a global observation.
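As an illustration of this encoding step, the following minimal sketch (PyTorch is assumed; the 12-dimensional raw observation and 16-dimensional embedding are illustrative choices, not values taken from the patent) shows a shared encoder that every drone could use to compress its local observation before sending it to the captain.

import torch
import torch.nn as nn

class ObservationEncoder(nn.Module):
    # Embedding encoder shared by every drone: compresses the raw local
    # observation (position, speed, attitude, state) into a short message.
    def __init__(self, obs_dim: int = 12, embed_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64),
            nn.ReLU(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

# Every drone shares the same encoder (the same coding mechanism),
# so the captain can interpret each teammate's compressed message.
shared_encoder = ObservationEncoder()
local_obs = torch.randn(12)              # assumed layout: position, speed, attitude, state
encoded_obs = shared_encoder(local_obs)  # low-dimensional message sent to the captain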
In the initial stage, when information is scarce, each unmanned aerial vehicle can generate a corresponding action a_ω from its own local observation o_ω using its policy π_ω.
In the training phase, in order to guarantee the safety of unmanned aerial vehicle cluster communication, the captain needs to survive to the end of the whole cooperative process, so a specific reward function is designed. The reward function comprises three terms: a course reward function, an outcome reward function and a captain reward function (each defined by a formula given in the original figures).
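Since the reward formulas are given only as figures, the following hypothetical sketch merely illustrates the structure described above: a course term, an outcome term and an extra survival bonus paid only to the captain. The concrete terms and coefficients are assumptions for illustration, not values from the patent.

def total_reward(is_captain: bool,
                 progress: float,          # course term: progress toward the mission goal this step
                 mission_succeeded: bool,  # outcome term: paid at the end of the episode
                 captain_alive: bool) -> float:
    r_course = 0.1 * progress
    r_outcome = 10.0 if mission_succeeded else 0.0
    # the captain receives an additional reward for staying alive
    r_captain = 1.0 if (is_captain and captain_alive) else 0.0
    return r_course + r_outcome + r_captain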
The attention mechanism function contains three basic elements: query, key and value, where the query represents a given element and each <key, value> pair gives the value corresponding to a key. Here the query represents the action performed by an unmanned aerial vehicle, each key represents the current state of an unmanned aerial vehicle, and each value is the value corresponding to that key. First, a similarity function s_i = Q^T·K_i computes the similarity between the given query and each key, where Q stands for the query and the superscript T denotes its transpose. The similarities are then normalized with the softmax function α_i = exp(s_i)/∑_j exp(s_j) to obtain the normalized attention weights. Finally the values are combined by the weighted sum Attention(Q, K_i) = ∑_i α_i·Value_i, where α_i is the weight reflecting the similarity between key i and the given query; summing the weighted values over all keys yields the final attention value.
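A minimal sketch of the captain's attention step is given below, assuming the dot-product similarity and softmax normalization described above; PyTorch, a single attention head and the tensor shapes are assumptions.

import torch

def captain_attention(query: torch.Tensor,   # query, e.g. one drone's encoded observation, shape (d,)
                      keys: torch.Tensor,    # current states of all drones, shape (N, d)
                      values: torch.Tensor   # encoded observations of all drones, shape (N, d)
                      ) -> torch.Tensor:
    scores = keys @ query                    # similarity s_i = Q^T K_i for each key
    weights = torch.softmax(scores, dim=0)   # normalized attention weights alpha_i
    return weights @ values                  # Attention(Q, K) = sum_i alpha_i * Value_i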
In the unmanned aerial vehicle flight environment simulator, the unmanned aerial vehicle cluster interacts with the environment to acquire training data. Each unmanned aerial vehicle acquires its local observation, takes an action according to its own action policy and obtains a reward value. The resulting tuples, composed of the global observation, the actions and the reward, are stored in an experience replay pool D.
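A simple sketch of such an experience replay pool is shown below; the fixed capacity and uniform random sampling are assumptions.

import random
from collections import deque

class ReplayPool:
    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)

    def store(self, global_obs, actions, reward, next_global_obs):
        # one tuple per environment step: (o, a, r, o')
        self.buffer.append((global_obs, actions, reward, next_global_obs))

    def sample(self, batch_size: int):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)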
In the training stage, the evaluation network (the Critic network) is trained in a centralized manner. Its joint Q-value function is defined as Q(o, a_1, …, a_N), where the actions are produced by the action policy functions with parameters θ. The optimization goal is to minimize the loss L = E[(Q(o, a_1, …, a_N) - y)²] over tuples (o, a, r, o′) sampled from the training data D, where the target value is y = r + γ·Q′(o′, a_1′, …, a_N′) and a_i′ is the target action of the next moment. Partial samples (mini-batches) are drawn from the training data and the loss is optimized until the model (the Critic network) converges. Here r represents the reward, specifically the global reward shared by the cluster, and γ is the discount factor, describing the influence of future reward on the current action.
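The following sketch shows one centralized Critic update consistent with the loss reconstructed above (MADDPG-style temporal-difference learning); the network sizes, the single shared critic and the helper names are assumptions.

import torch
import torch.nn as nn

class CentralCritic(nn.Module):
    # Q(o, a_1..a_N): evaluates the global observation together with the joint action.
    def __init__(self, global_obs_dim: int, joint_act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(global_obs_dim + joint_act_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, global_obs, joint_action):
        return self.net(torch.cat([global_obs, joint_action], dim=-1)).squeeze(-1)

def critic_update(critic, target_critic, optimizer, batch, target_actions, gamma=0.99):
    obs, actions, rewards, next_obs = batch      # tensors sampled from the replay pool D
    with torch.no_grad():
        # y = r + gamma * Q'(o', a'), a' being the target actions of the next moment
        y = rewards + gamma * target_critic(next_obs, target_actions)
    loss = nn.functional.mse_loss(critic(obs, actions), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()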
In the training stage, the policy is trained by a gradient descent method so as to maximize the cumulative reward R = Σ_{t=0..T} γ^t·r_t, where T is the time horizon, γ^t describes the influence of the reward at step t on the current action (the smaller γ is, the less future rewards matter), and r_t is the reward at time t.
The optimization target is:
max J(θ_ω) = E[Σ_{t=0..T} γ^t·r_t],
where π_x, π_y and π_z represent the policies under the different roles, x, y and z index the unmanned aerial vehicles (and can be extended to more aircraft), and θ_ω represents the policy parameters of drone ω.
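A corresponding sketch of one policy (Actor) update is given below: the drone's policy parameters θ_ω are adjusted in the direction that increases the centralized Q value, i.e. gradient ascent on Q implemented as descent on -Q. The deterministic policy output and the helper names are assumptions.

import torch

def actor_update(actor, critic, actor_optimizer, global_obs, local_obs, other_actions):
    own_action = actor(local_obs)                        # a_omega = pi_omega(o_omega)
    joint_action = torch.cat([own_action, other_actions], dim=-1)
    loss = -critic(global_obs, joint_action).mean()      # maximize Q  <=>  minimize -Q
    actor_optimizer.zero_grad()
    loss.backward()
    actor_optimizer.step()
    return loss.item()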
In the execution stage, each unmanned aerial vehicle sends its own local observation to the policy execution network to obtain the corresponding action.
Beneficial effects: compared with the prior art, in the unmanned aerial vehicle cluster efficient communication method based on multi-agent reinforcement learning provided by the invention, each unmanned aerial vehicle obtains rewards from the environment through a deep reinforcement learning algorithm and has autonomous decision-making capability, achieving better cluster flight control than the traditional rule-based control mode;
according to the invention, the cluster autonomous control of the unmanned aerial vehicle is realized through a multi-agent deep reinforcement learning algorithm, so that the problem caused by a centralized information interaction mode can be effectively solved, and the unmanned aerial vehicle has autonomous capability;
the invention strips the decision-making power of the control center in centralized information interaction, hands each unmanned aerial vehicle to make a decision autonomously, adopts a frame based on centralized training and decentralized execution, and effectively solves the problems of unmanned aerial vehicle cluster communication calculated amount, autonomy and the like by centralized training and distributed execution.
The invention encodes the local observations, effectively reducing the dimensionality of the information and therefore the bandwidth consumption; it also applies attention-mechanism processing to the global observation, reflecting the importance of different pieces of information and reducing the exploration space, thereby improving learning efficiency.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
fig. 2 is a schematic diagram of interaction of a cluster of drones with an environment.
Detailed Description
The present invention is further illustrated by the following examples, which are intended to be purely exemplary and are not intended to limit the scope of the invention, as various equivalent modifications of the invention will occur to those skilled in the art upon reading the present disclosure and fall within the scope of the appended claims.
First, the main flow of the method of the present invention is briefly described with reference to FIG. 1. The experiment begins by constructing the unmanned aerial vehicle flight environment simulator, initializing the intelligent unmanned aerial vehicles, and then selecting and marking the captain. Each unmanned aerial vehicle starts acquiring environmental information and then shares it with the unmanned aerial vehicle cluster, after which it is judged whether the training information is sufficient. If the information is not sufficient, each unmanned aerial vehicle continues to acquire environmental information; if it is sufficient, it is judged whether the information center contains the captain's information. If there is no captain information, the captain is reselected; if the captain's information is present, the model is trained. It is then judged whether the model has converged: if not, the captain is reselected; if so, training is finished.
The process by which the drone cluster interacts with the environment is described with reference to FIG. 2. Each unmanned aerial vehicle interacts with the environment: it obtains environmental information and applies actions in the environment, and then applies an embedding encoding to the environmental information it has obtained. Each team member sends the encoded information to the captain; the captain feeds the training information into the attention mechanism, which assigns weights to the information, and then puts it into the model for training. The model deploys the trained parameters to the agents (i.e., the drones), the agents interact with the environment, and the next round of training is carried out, and so on in a loop.
In a specific embodiment, the method of the present invention is further detailed by means of pseudo code.
Algorithm 1 unmanned aerial vehicle cluster cooperation algorithm based on multi-agent reinforcement learning
Inputting: agent structure, attention network, environmental information, other model training parameters
And (3) outputting: structure and weight of each agent policy network
1: simulator constructed according to flight physical environment of unmanned aerial vehicle
2: initializing Agents
3: while policy model does not converge yet
4: selecting the queue length from each agent according to a certain strategy
5: while training information accumulation failing to reach threshold do
6: the intelligent agent interacts with the environment simulator to accumulate interaction information
7: each team member embedding self-accumulated interactive information
8: each team member sends the embedding data to the captain
9: the captain collects and processes the data, and gives different weights to the information through an attention mechanism
10: end while
11: checking whether the information related to the captain is legal
12: if captain information is illegal the
13: abandoning the training data accumulated by the iteration
14: continue
15: end if
16: training strategy model by using newly added data
17: according to the optimization objective
Figure BDA0003035015690000051
Training a Critic network in a centralized manner
18: end while
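For readability, a rough Python rendering of Algorithm 1 follows. Every name used here (build_simulator, select_captain, drone.embed, captain.attend, captain_info_valid, train_step, policy_converged) is a hypothetical placeholder standing in for the corresponding step of the pseudocode, not an API defined by the patent.

def train_cluster(drones, threshold):
    env = build_simulator()                         # line 1: flight-physics simulator
    while not policy_converged():                   # line 3
        captain = select_captain(drones)            # line 4: pick the captain
        batch = []
        while len(batch) < threshold:               # line 5: accumulate training information
            transitions = env.step(drones)          # line 6: interact with the simulator
            messages = [d.embed(t) for d, t in zip(drones, transitions)]  # lines 7-8: embed and send
            batch.append(captain.attend(messages))  # line 9: captain weights the information by attention
        if not captain_info_valid(batch):           # lines 11-15: drop data lacking valid captain info
            continue
        train_step(batch)                           # lines 16-17: centralized Critic/policy training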
First, the environment simulator is constructed: an aerodynamics-based unmanned aerial vehicle flight environment simulator is built on a simulation environment engine. This embodiment contains three cooperative unmanned aerial vehicles and can be conveniently extended to larger unmanned aerial vehicle clusters. A reward function is then designed. The roles among the three cooperative unmanned aerial vehicles comprise one captain and two team members; in order to guarantee the safety of unmanned aerial vehicle cluster communication, the captain needs to survive to the end of the whole cooperation, so a specific reward function is designed consisting of a course reward function, an outcome reward function and a captain reward function (each given as a formula in the original figures).
Next, the three cooperative unmanned aerial vehicles are initialized and numbered <x, y, z>, and one unmanned aerial vehicle is selected as the captain and marked as x_c. The following steps are then carried out:
step 1:
in the training phase, the unmanned aerial vehicle x, y and z interact with the environment and transmit through the unmanned aerial vehicle x, y and zThe sensor (sensor combination) acquires a local observed value, then takes the observed value as the input of the embedding layer, and outputs the encoded observed value
Figure BDA0003035015690000062
Step 2:
The captain collects the teammates' encoded observations and integrates its own local observation with the local observations from the teammates into a global observation; it applies attention-mechanism processing to this global observation and sends the resulting computed global observation to the teammates.
and step 3:
designing a neural network structure, selecting a neural network hyper-parameter, and building a neural network.
For example, a policy network may include 5-layer fully-connected neural networks, each layer using a relu function as an activation function.
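A sketch of such a 5-layer fully connected policy network is shown below (PyTorch assumed; the hidden width of 64 and the input/output dimensions are illustrative, and the final layer is left linear here as an assumption so that it can output arbitrary action values).

import torch.nn as nn

def build_policy_network(obs_dim: int = 16, action_dim: int = 4, hidden: int = 64) -> nn.Sequential:
    return nn.Sequential(
        nn.Linear(obs_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, action_dim),   # fifth fully connected layer produces the action
    )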
Step 4:
Train the flight control policy in the simulator according to the training procedure of the centralized-training, decentralized-execution framework until the model converges.
Step 5:
According to the execution flow of the centralized-training, decentralized-execution framework, each unmanned aerial vehicle sends its obtained local observation to the policy execution network (the model obtained after convergence) to obtain the corresponding action.
In the unmanned aerial vehicle flight environment simulator, the unmanned aerial vehicle cluster interacts with the environment to acquire training data. Each unmanned aerial vehicle acquires its local observation, takes an action a_ω according to its own action policy π_ω and obtains a reward value r_ω. The global observation, actions and reward obtained above are combined into a tuple (o, a, r, o′) and stored in the experience replay pool D, where o′ denotes the environment information of the next moment.
In step 4, the Critic network is trained in a centralized manner, and the joint Q-value function is defined as Q(o, a_x, a_y, a_z), where the actions are produced by the action policy functions with parameters θ_ω. The optimization goal is to minimize L = E[(Q(o, a_x, a_y, a_z) - y)²] over tuples sampled from D, with the target value y = r + γ·Q′(o′, a_x′, a_y′, a_z′), where a_i′ is the target action of the next moment. Partial samples are drawn from the training data and the loss is optimized until the model converges.
The policy is trained by a gradient descent method to maximize the cumulative reward R = Σ_{t=0..T} γ^t·r_t.
The optimization target is:
max J(θ_ω) = E[Σ_{t=0..T} γ^t·r_t],
where π_x, π_y and π_z represent the policies under the different roles.

Claims (10)

1. An unmanned aerial vehicle cluster efficient communication method based on multi-agent reinforcement learning, characterized by comprising the following steps: (1) constructing an unmanned aerial vehicle flight environment simulator; (2) in the unmanned aerial vehicle cluster, randomly selecting one unmanned aerial vehicle as the captain and marking it, the remaining unmanned aerial vehicles serving as teammates; (3) the captain acting as an observation relay station, collecting the teammates' local observations, maintaining them as a global observation and sending the global observation to the teammates for information interaction; (4) based on the centralized-training, decentralized-execution framework, using the global observation as training data in the training phase until the policy network converges, the execution phase being carried out in a distributed manner, i.e. each unmanned aerial vehicle sending its own local observation to the policy execution network to obtain the corresponding action; (5) to keep the captain free from attack, giving the captain an additional reward for survival through the reward function.
2. The multi-agent reinforcement learning-based unmanned aerial vehicle cluster high-efficiency communication method according to claim 1, wherein in the step (1), an aerodynamic-based unmanned aerial vehicle flight environment simulator is constructed based on a simulation environment engine.
3. The multi-agent reinforcement learning-based unmanned aerial vehicle cluster efficient communication method according to claim 1, wherein in step (3), each unmanned aerial vehicle acquires and maintains its own local observation, encodes it and transmits the encoded local observation to the captain; and the captain applies attention-mechanism processing to the global observation separately according to each unmanned aerial vehicle's own local observation, determines the weight of each piece of information according to its importance, and then sends the computed observation to each teammate as that teammate's global observation.
4. The multi-agent reinforcement learning-based unmanned aerial vehicle cluster efficient communication method according to claim 1, wherein each unmanned aerial vehicle applies an embedding encoding to its own local observation o_ω, all unmanned aerial vehicles sharing the same encoding mechanism; after encoding, each teammate obtains its encoded observation containing position, speed, attitude and state information and sends it to the captain, and the captain collects the teammates' local observations and maintains them as a global observation.
5. The multi-agent reinforcement learning-based unmanned aerial vehicle cluster efficient communication method according to claim 1, wherein in the initial stage each unmanned aerial vehicle can generate a corresponding action a_ω from its own local observation o_ω using its policy π_ω.
6. The multi-agent reinforcement learning-based unmanned aerial vehicle cluster efficient communication method according to claim 1, wherein the captain needs to survive to the end of the whole unmanned aerial vehicle cooperation process, for which a reward function is designed comprising a course reward function, an outcome reward function and a captain reward function (each given as a formula in the original figures).
7. The multi-agent reinforcement learning-based unmanned aerial vehicle cluster efficient communication method according to claim 3, wherein the attention mechanism function comprises three basic elements, query, key and value; first a similarity function s_i = Q^T·K_i computes the similarity between the given query and each key, then the softmax function α_i = exp(s_i)/∑_j exp(s_j) yields the normalized attention weights, and finally the values are combined by the weighted sum Attention(Q, K_i) = ∑_i α_i·Value_i.
8. The multi-agent reinforcement learning-based unmanned aerial vehicle cluster efficient communication method according to claim 1, wherein in the unmanned aerial vehicle flight environment simulator the unmanned aerial vehicle cluster interacts with the environment to obtain training data; each unmanned aerial vehicle acquires its local observation, takes an action according to its own action policy and obtains a reward value; and the resulting tuples composed of the global observation, the actions and the reward are stored in an experience replay pool D.
9. The multi-agent reinforcement learning-based unmanned aerial vehicle cluster efficient communication method according to claim 1, wherein the Critic network is trained in a centralized manner, its joint Q-value function being defined as Q(o, a_1, …, a_N), where the actions are produced by the action policy functions; the optimization goal is to minimize L = E[(Q(o, a_1, …, a_N) - y)²] with the target value y = r + γ·Q′(o′, a_1′, …, a_N′), where a_i′ is the target action of the next moment; and partial samples are drawn from the training data to optimize the function until the model converges.
10. The multi-agent reinforcement learning-based unmanned aerial vehicle cluster efficient communication method according to claim 1, wherein a gradient descent method is adopted to train the policy so as to maximize the accumulated reward R = Σ_{t=0..T} γ^t·r_t; the optimization target is max J(θ_ω) = E[Σ_{t=0..T} γ^t·r_t], where π_ω represents the policy under a given role and ω denotes the number of the drone.
CN202110441049.0A 2021-04-23 2021-04-23 Unmanned aerial vehicle cluster efficient communication method based on multi-agent reinforcement learning Pending CN113286275A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110441049.0A CN113286275A (en) 2021-04-23 2021-04-23 Unmanned aerial vehicle cluster efficient communication method based on multi-agent reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110441049.0A CN113286275A (en) 2021-04-23 2021-04-23 Unmanned aerial vehicle cluster efficient communication method based on multi-agent reinforcement learning

Publications (1)

Publication Number Publication Date
CN113286275A true CN113286275A (en) 2021-08-20

Family

ID=77277177

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110441049.0A Pending CN113286275A (en) 2021-04-23 2021-04-23 Unmanned aerial vehicle cluster efficient communication method based on multi-agent reinforcement learning

Country Status (1)

Country Link
CN (1) CN113286275A (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991545A (en) * 2019-12-10 2020-04-10 中国人民解放军军事科学院国防科技创新研究院 Multi-agent confrontation oriented reinforcement learning training optimization method and device
CN111786713A (en) * 2020-06-04 2020-10-16 大连理工大学 Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning
CN111860162A (en) * 2020-06-17 2020-10-30 上海交通大学 Video crowd counting system and method
CN112131660A (en) * 2020-09-10 2020-12-25 南京大学 Unmanned aerial vehicle cluster collaborative learning method based on multi-agent reinforcement learning
CN112269876A (en) * 2020-10-26 2021-01-26 南京邮电大学 Text classification method based on deep learning
CN112561395A (en) * 2020-12-25 2021-03-26 桂林电子科技大学 Unmanned aerial vehicle cooperation method, system, device, electronic equipment and storage medium

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113703482A (en) * 2021-08-30 2021-11-26 西安电子科技大学 Task planning method based on simplified attention network in large-scale unmanned aerial vehicle cluster
CN113703482B (en) * 2021-08-30 2022-08-12 西安电子科技大学 Task planning method based on simplified attention network in large-scale unmanned aerial vehicle cluster
WO2023109699A1 (en) * 2021-12-17 2023-06-22 深圳先进技术研究院 Multi-agent communication learning method
CN114980020A (en) * 2022-05-17 2022-08-30 重庆邮电大学 Unmanned aerial vehicle data collection method based on MADDPG algorithm
CN115150408A (en) * 2022-06-21 2022-10-04 中国电子科技集团公司第五十四研究所 Unmanned cluster distributed situation maintenance method based on information extraction
CN115857556A (en) * 2023-01-30 2023-03-28 中国人民解放军96901部队 Unmanned aerial vehicle collaborative detection planning method based on reinforcement learning
CN117572893A (en) * 2024-01-15 2024-02-20 白杨时代(北京)科技有限公司 Unmanned plane cluster countermeasure strategy acquisition method based on reinforcement learning and related equipment
CN117572893B (en) * 2024-01-15 2024-03-19 白杨时代(北京)科技有限公司 Unmanned plane cluster countermeasure strategy acquisition method based on reinforcement learning and related equipment
CN117707219A (en) * 2024-02-05 2024-03-15 西安羚控电子科技有限公司 Unmanned aerial vehicle cluster investigation countermeasure method and device based on deep reinforcement learning

Similar Documents

Publication Publication Date Title
CN113286275A (en) Unmanned aerial vehicle cluster efficient communication method based on multi-agent reinforcement learning
CN109635917B (en) Multi-agent cooperation decision and training method
CN110852448A (en) Cooperative intelligent agent learning method based on multi-intelligent agent reinforcement learning
CN112465151A (en) Multi-agent federal cooperation method based on deep reinforcement learning
CN108921298B (en) Multi-agent communication and decision-making method for reinforcement learning
CN112131660A (en) Unmanned aerial vehicle cluster collaborative learning method based on multi-agent reinforcement learning
CN113176776B (en) Unmanned ship weather self-adaptive obstacle avoidance method based on deep reinforcement learning
CN111160525A (en) Task unloading intelligent decision method based on unmanned aerial vehicle group in edge computing environment
CN108573303A (en) It is a kind of that recovery policy is improved based on the complex network local failure for improving intensified learning certainly
CN113784410B (en) Heterogeneous wireless network vertical switching method based on reinforcement learning TD3 algorithm
Andersen et al. Towards safe reinforcement-learning in industrial grid-warehousing
CN114757351A (en) Defense method for resisting attack by deep reinforcement learning model
CN116136945A (en) Unmanned aerial vehicle cluster countermeasure game simulation method based on anti-facts base line
CN114861747A (en) Method, device, equipment and storage medium for identifying key nodes of multilayer network
Kong et al. Hierarchical multi‐agent reinforcement learning for multi‐aircraft close‐range air combat
CN115268493A (en) Large-scale multi-unmanned-aerial-vehicle task scheduling method based on double-layer reinforcement learning
CN113276852B (en) Unmanned lane keeping method based on maximum entropy reinforcement learning framework
Gisslén et al. Sequential constant size compressors for reinforcement learning
CN116757249A (en) Unmanned aerial vehicle cluster strategy intention recognition method based on distributed reinforcement learning
CN115759199A (en) Multi-robot environment exploration method and system based on hierarchical graph neural network
Liu et al. Forward-looking imaginative planning framework combined with prioritized-replay double DQN
CN114879738A (en) Model-enhanced unmanned aerial vehicle flight trajectory reinforcement learning optimization method
CN115938104A (en) Dynamic short-time road network traffic state prediction model and prediction method
CN114004282A (en) Method for extracting deep reinforcement learning emergency control strategy of power system
Faber The sensor management prisoners dilemma: a deep reinforcement learning approach

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210820