CN112256056B - Unmanned aerial vehicle control method and system based on multi-agent deep reinforcement learning - Google Patents

Unmanned aerial vehicle control method and system based on multi-agent deep reinforcement learning Download PDF

Info

Publication number
CN112256056B
CN112256056B (application number CN202011118496.4A)
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
value
network
information acquisition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011118496.4A
Other languages
Chinese (zh)
Other versions
CN112256056A (en)
Inventor
陈武辉
杨志华
郑子彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202011118496.4A priority Critical patent/CN112256056B/en
Publication of CN112256056A publication Critical patent/CN112256056A/en
Application granted granted Critical
Publication of CN112256056B publication Critical patent/CN112256056B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05D - SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 - Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10 - Simultaneous control of position or course in three dimensions
    • G05D1/101 - Simultaneous control of position or course in three dimensions specially adapted for aircraft
    • G05D1/104 - Simultaneous control of position or course in three dimensions specially adapted for aircraft involving a plurality of aircrafts, e.g. formation flying
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods


Abstract

The invention provides an unmanned aerial vehicle control method and system based on multi-agent deep reinforcement learning, wherein the method comprises the following steps: establishing an information acquisition task model according to parameters of the unmanned aerial vehicle group information acquisition system, the information acquisition task being divided into an acquisition subtask and a calculation subtask; constructing a deep neural network model according to the task model, and training the deep neural network model with a multi-agent deep reinforcement learning algorithm combined with an attention mechanism; and controlling the unmanned aerial vehicle group in the actual environment to complete the information acquisition task with the trained deep neural network model. In the invention, each unmanned aerial vehicle serves as an agent, and the performance of the actor network is evaluated by a critic network with an attention unit, so that more accurate evaluation values accelerate the training of the actor network; when the information acquisition task is executed, each unmanned aerial vehicle does not need to communicate with the other agents, so that the communication time delay is reduced.

Description

Unmanned aerial vehicle control method and system based on multi-agent deep reinforcement learning
Technical Field
The invention relates to the technical field of wireless communication, in particular to an unmanned aerial vehicle control method and system based on multi-agent deep reinforcement learning.
Background
An Unmanned Aerial Vehicle (UAV) is an unmanned aircraft that is remotely controlled by an operator via a radio remote control device or automatically controlled by a computer program. The majority of unmanned aerial vehicle applications are information acquisition tasks, and in the prior art the control instructions for the data acquisition tasks of a multi-unmanned-aerial-vehicle system are obtained mainly by two kinds of methods, namely heuristic methods and methods based on machine learning.
A heuristic algorithm needs multiple rounds of calculation after receiving the tasks before the optimal information acquisition and computation migration scheme is obtained, which generates a large time delay and is unfavorable for urgent tasks; a single-agent deep reinforcement learning algorithm needs to acquire the states of all unmanned aerial vehicles through communication after receiving a task, which generates a certain time delay, and meanwhile, as the number of unmanned aerial vehicles increases, the number of training iterations required for a single deep neural network to converge also increases greatly, so the obtained strategy has difficulty achieving good energy consumption and time consumption.
Therefore, it is difficult for the drone system to make appropriate strategies within a short time delay when confronted with various complex tasks and environments.
Disclosure of Invention
The invention aims to provide an unmanned aerial vehicle control method and system based on multi-agent deep reinforcement learning, and aims to solve the technical problem that an unmanned aerial vehicle system is difficult to make a proper strategy within a short time delay when facing various complex tasks and environments.
The purpose of the invention can be realized by the following technical scheme:
an unmanned aerial vehicle control method based on multi-agent deep reinforcement learning comprises the following steps:
establishing an information acquisition task model according to parameters of the unmanned aerial vehicle group information acquisition system; the information acquisition task is divided into an acquisition subtask and a calculation subtask;
constructing a deep neural network model according to the task model, and training the deep neural network model by utilizing a multi-agent deep reinforcement learning algorithm combined with an attention mechanism; wherein the agent is an unmanned aerial vehicle;
and controlling the unmanned aerial vehicle group in the actual environment to complete an information acquisition task by using the trained deep neural network model.
Optionally, before constructing the deep neural network model according to the task model, the method further includes:
the parameters of the unmanned aerial vehicle group information acquisition system are converted into a state space of the system and an action space of the intelligent agent, and an instant reward function is set.
Optionally, the deep neural network model specifically comprises an actor network and a critic network, wherein the actor network comprises an estimated actor network and a target actor network, the critic network comprises an estimated critic network and a target critic network, and an attention unit is embedded in the critic network on top of its three fully connected layers.
Optionally, the method further comprises: when an actor network is trained, a critic network with an attention unit is used for evaluating the performance of the actor network, and the specific process is as follows:
firstly, the number of unmanned aerial vehicles in the unmanned aerial vehicle cluster is N, and the observed value o_i and the action value a_i of unmanned aerial vehicle i (1 ≤ i ≤ N) are input into a single fully connected layer to obtain the state-action feature value g(o_i, a_i) of each unmanned aerial vehicle, and the state-action feature values of all unmanned aerial vehicles are input into the attention unit;
the attention unit calculates the attention weight α_j of unmanned aerial vehicle j according to the feature value of unmanned aerial vehicle i and the feature values of the other unmanned aerial vehicles j (j ≠ i):

$$\alpha_j = \mathrm{softmax}\!\left(\frac{\big(W_k\, g(o_j, a_j)\big)^{\mathsf T}\, W_q\, g(o_i, a_i)}{\sqrt{d}}\right)$$

wherein W_k and W_q are learnable attention parameter matrices and d is the dimension of the state-action feature values;
the influence value e_i of the other unmanned aerial vehicles on unmanned aerial vehicle i is calculated in a weighted-sum manner according to the attention weights and the feature values of the other unmanned aerial vehicles:

$$e_i = \sum_{j \ne i} \alpha_j\, h\!\big(W_o\, g(o_j, a_j)\big)$$

wherein W_o is a learnable attention parameter matrix and h is a dot-product operation;
the state-action feature value g(o_i, a_i) of unmanned aerial vehicle i and the influence value e_i are input into a two-layer fully connected network to obtain the action-state value Q_i of the unmanned aerial vehicle.
Optionally, the training of the deep neural network model by using a multi-agent deep reinforcement learning algorithm with an attention mechanism specifically includes:
s201: randomly initializing a system state and a neural network parameter;
s202: acquiring an observation value X = [o_1, o_2, …, o_M] of the current time slot of each unmanned aerial vehicle according to the system state and the observation range of the unmanned aerial vehicle; wherein M is the number of unmanned aerial vehicles in the unmanned aerial vehicle cluster;
s203: inputting the observed value o_i of each unmanned aerial vehicle into the corresponding actor network to obtain the action value a_i corresponding to each unmanned aerial vehicle; wherein 1 ≤ i ≤ M;
s204: obtaining the rewards R = [r_1, r_2, …, r_M] of all unmanned aerial vehicles, the system state S′ of the next time slot and the observed values X′ = [o′_1, o′_2, …, o′_M] according to the system state and the action values A = [a_1, a_2, …, a_M] of all unmanned aerial vehicles in the current time slot, and storing the experience sample (X, A, R, X′) in the experience pool of the agents;
s205: repeating S202-S204 until the number of samples in the experience pool reaches a set threshold, and extracting a certain number of experience samples from the experience pool to update the neural network parameters until the policy function of the actor network converges.
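As a non-authoritative illustration of steps S202 to S204, the sketch below shows how the per-slot experience samples (X, A, R, X′) could be collected into a shared experience pool; the environment interface (env.observe, env.step) and the actor.act method are assumptions for illustration only, not part of the patent.

```python
# A minimal sketch of experience collection and sampling, assuming a simulation
# environment `env` with observe()/step() and one actor network per drone.
import random
from collections import deque

replay_buffer = deque(maxlen=100_000)   # shared experience pool of (X, A, R, X') tuples

def collect_one_slot(env, actors, state):
    X = env.observe(state)                                # local observations [o_1, ..., o_M]
    A = [actor.act(o) for actor, o in zip(actors, X)]     # action a_i from each actor network
    next_state, R = env.step(state, A)                    # rewards [r_1, ..., r_M] and next state
    X_next = env.observe(next_state)
    replay_buffer.append((X, A, R, X_next))               # store the experience sample
    return next_state

def sample_batch(batch_size=256):
    # used once the experience pool exceeds the set threshold (step S205)
    return random.sample(replay_buffer, batch_size)
```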
Optionally, the step of controlling the unmanned aerial vehicle group in the actual environment to complete the information acquisition task by using the trained deep neural network model specifically includes:
parameterizing the state of the unmanned aerial vehicle system and the observed value of each unmanned aerial vehicle in the actual environment;
inputting the parameterized observation value of the unmanned aerial vehicle into the trained actor network to obtain the action value of the unmanned aerial vehicle;
and converting the action value into an acquisition instruction and a computation instruction, according to which the unmanned aerial vehicle performs information acquisition and computation migration.
The invention also provides an unmanned aerial vehicle control system based on multi-agent deep reinforcement learning, which comprises the following components:
the task model establishing module is used for establishing an information acquisition task model according to parameters of the unmanned aerial vehicle group information acquisition system; the information acquisition task is divided into an acquisition subtask and a calculation subtask;
the deep neural network building and training module is used for building a deep neural network model according to the task model and training the deep neural network model with a multi-agent deep reinforcement learning algorithm combined with an attention mechanism; wherein the agent is an unmanned aerial vehicle;
and the information acquisition task execution module is used for controlling the unmanned aerial vehicle group in the actual environment to complete the information acquisition task by utilizing the trained deep neural network model.
Optionally, the method further comprises:
and the system parameter conversion module is used for converting the parameters of the unmanned aerial vehicle group information acquisition system into a state space of the system and an action space of the intelligent agent and setting an instant reward function.
The present invention also provides a computer-readable storage medium for storing a computer program; wherein the computer program, when executed by a processor, implements the multi-agent deep reinforcement learning-based drone control method.
The present invention also provides an electronic device, comprising:
a memory for storing a computer program;
and the processor is used for executing the computer program to realize the unmanned aerial vehicle control method based on multi-agent deep reinforcement learning.
The invention provides an unmanned aerial vehicle control method and system based on multi-agent deep reinforcement learning, wherein the method comprises the following steps: establishing an information acquisition task model according to parameters of the unmanned aerial vehicle group information acquisition system; the information acquisition task is divided into an acquisition subtask and a calculation subtask; constructing a deep neural network model according to the task model, and training the deep neural network model by utilizing a multi-agent deep reinforcement learning algorithm combined with an attention mechanism; wherein the agent is an unmanned aerial vehicle; and controlling the unmanned aerial vehicle group in the actual environment to complete an information acquisition task by using the trained deep neural network model.
In the invention, each unmanned aerial vehicle is regarded as an agent, and multi-agent deep reinforcement learning only needs each agent to interact with the environment to obtain reward values, through which the agents continuously learn and improve their strategies; the state information of the whole system does not need to be acquired through communication when the agents make decisions, thereby avoiding the communication time delay. When the actor network is trained, a critic network with an attention unit is used to evaluate the performance of the actor network, so that the influence of other agents with higher similarity can be better noticed during evaluation and more accurate evaluation values are obtained to guide the training of the actor network, thereby accelerating its training. When the information acquisition task is executed, each unmanned aerial vehicle only needs to input its own observed value directly into the trained actor network to obtain the control instruction for the task period, which avoids the need of a single-agent deep reinforcement learning algorithm to collect the states and observed values of all unmanned aerial vehicles through communication before formulating control instructions, thereby reducing the reaction time delay.
Drawings
FIG. 1 is a schematic diagram of a neural network training framework of the multi-agent deep reinforcement learning-based unmanned aerial vehicle control method and system of the present invention;
FIG. 2 is a method flow diagram of the multi-agent deep reinforcement learning-based unmanned aerial vehicle control method and system of the present invention;
fig. 3 is a flowchart of a method of an embodiment of the multi-agent deep reinforcement learning-based drone control method and system of the present invention.
Detailed Description
Interpretation of terms:
compute offload (computing offload) is the transfer of resource-intensive computing tasks onto a separate processor (e.g., hardware accelerator) or an external platform (e.g., cloud server, edge server). Offloading to a coprocessor may be used to accelerate applications, including image rendering and mathematical computations. Offloading the computing to an external platform over a network may provide computing power and overcome hardware limitations of the device, such as limited computing power, storage, and energy.
Multi-agent deep reinforcement learning (Multi-agent deep reinforcement learning): in a multi-agent system, each agent learns to improve its own policy by interacting with the environment to obtain reward values (rewards), thereby obtaining the optimal policy in the environment.
Attention mechanism (Attention mechanism): the attention mechanism in deep learning is similar to the human selective mechanism in nature, and the core target is to select information which is more critical to the current task target from a plurality of information. Currently, attention mechanism has been widely used in various deep learning tasks such as natural language processing, image recognition and speech recognition, and is one of the most important core technologies in deep learning technology.
The embodiment of the invention provides an unmanned aerial vehicle control method and system based on multi-agent deep reinforcement learning, and aims to solve the technical problem that an unmanned aerial vehicle system is difficult to make a proper strategy within a short time delay when facing various complex tasks and environments.
To facilitate an understanding of the invention, the invention will now be described more fully with reference to the accompanying drawings. Preferred embodiments of the present invention are shown in the drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The unmanned aerial vehicle is mainly applied to the military field at the beginning of birth and is used for replacing a common manned aircraft to perform 'dull' or 'dangerous' tasks, such as intelligence reconnaissance, ammunition release and the like. With the improvement of the manufacturing technology of unmanned aerial vehicles and the emergence of unmanned aerial vehicles with various functions in recent years, the application range of unmanned aerial vehicles has been expanded to a plurality of civil fields, such as terrain exploration, traffic road condition monitoring, scenic spot aerial photography, natural disaster observation and the like. And as the complexity of the application is gradually increased, the unmanned aerial vehicle cluster is cooperated to gradually replace a single unmanned aerial vehicle so as to improve the efficiency of the system. For a single unmanned aerial vehicle, the most common control mode is manual remote control, but for a multi-unmanned aerial vehicle system, a large amount of manpower is consumed for configuring one controller for each unmanned aerial vehicle to control, and therefore the industry often uses a computer program to perform automatic control. For example, in the flight performance of the unmanned aerial vehicle cluster, each unmanned aerial vehicle is controlled by a preset program. However, in a complex and variable environment, a preset program cannot give a proper instruction to the unmanned aerial vehicle according to specific conditions. Therefore, a method for making different flight control commands according to different specific environmental conditions is needed.
The majority of the applications of the unmanned aerial vehicle can be regarded as information acquisition tasks, and the information data of the earth surface is acquired by utilizing devices such as a high-definition camera and an infrared sensor which are equipped by the unmanned aerial vehicle. Meanwhile, the data result required by the user is not only the original collected data such as a photo, but also a result obtained by calculating the collected original data to a certain degree. For example, for terrain exploration, the user-desired result is often a 3D terrain map rendered from the acquired data; for traffic road condition monitoring, the result required by the user is often road condition data such as traffic flow calculated according to the shot picture. Therefore, the information collection task of the unmanned aerial vehicle can be divided into an acquisition subtask and a calculation subtask. With the development of chip technology, the chip carried on the unmanned aerial vehicle can already complete a certain calculation task, but due to the limitations of electric quantity, time and the like, the unmanned aerial vehicle is difficult to independently complete all calculation tasks. In order to solve the problem, part of the computing tasks of the unmanned aerial vehicle can be calculated and migrated, namely, part of the computing tasks of the unmanned aerial vehicle are uploaded to the cloud server or the edge server, and the computing tasks are rapidly completed by the aid of the cloud server and the edge server which are higher in computing capacity. When calculation migration is carried out, the unmanned aerial vehicle system needs to pay for consumed server resources, so that the control program of the unmanned aerial vehicle information acquisition system needs to make a flight control instruction and also needs to make a calculation migration control instruction according to the balance of time and cost.
From the perspective of the unmanned aerial vehicle system, the aim is to minimize the energy consumption of the system and the processing time of the tasks. The unmanned aerial vehicle system therefore needs to adjust its own control commands according to the actual task state and the environmental state (e.g., the state of the servers), so as to achieve the optimal energy consumption and task completion time; such a problem can be regarded as a joint optimization problem. In existing research, the control instructions for the data acquisition tasks of a multi-unmanned-aerial-vehicle system are obtained mainly by two kinds of methods, namely heuristic methods and methods based on machine learning. In a heuristic algorithm, the multi-unmanned-aerial-vehicle information acquisition task is modeled as an NP-hard combinatorial optimization problem, and the optimal information acquisition strategy is obtained after multiple rounds of calculation on multiple randomly generated candidate solutions, using algorithms such as the genetic algorithm, the particle swarm algorithm and simulated annealing. A traditional heuristic algorithm needs multiple rounds of calculation after receiving the tasks before the information acquisition and computation migration control instructions are obtained, which generates a larger time delay and is unfavorable for the execution of some urgent tasks. Deep reinforcement learning, as one of the machine learning methods, can train a deep neural network to serve as a policy function: the system state of each time slot is input into the neural network and the specific actions of the unmanned aerial vehicles are output, which helps the unmanned aerial vehicle system make appropriate flight and computation migration decisions. However, current research adopts deep reinforcement learning methods based on a single agent, in which the whole system is regarded as one agent and the flight and computation migration strategies of all unmanned aerial vehicles in the system are formulated uniformly by a centralized policy network. This requires the unmanned aerial vehicle system to collect the states of all unmanned aerial vehicles in every time slot, resulting in a certain communication delay. Moreover, as the number of unmanned aerial vehicles increases and the environment becomes more complex, single-agent deep reinforcement learning has difficulty obtaining the optimal strategy or fails to converge. In view of these problems, multi-agent deep reinforcement learning only needs each agent to interact with the environment to obtain reward values, through which each agent continuously learns and improves its strategy, and the agent does not need to obtain the global state information of the system through communication when making a decision, so the communication delay is avoided.
Referring to fig. 1 to 3, the following provides a method for controlling an unmanned aerial vehicle based on multi-agent deep reinforcement learning, including:
s101: establishing an information acquisition task model according to parameters of the unmanned aerial vehicle group information acquisition system; the information acquisition task is divided into an acquisition subtask and a calculation subtask;
s102: constructing a deep neural network model according to the task model, and training the deep neural network model by utilizing a multi-agent deep reinforcement learning algorithm combined with an attention mechanism; wherein the agent is an unmanned aerial vehicle;
s103: and controlling the unmanned aerial vehicle group in the actual environment to complete an information acquisition task by using the trained deep neural network model.
In this embodiment, there are M unmanned aerial vehicles in the unmanned aerial vehicle system and K edge servers that the unmanned aerial vehicles can access. Meanwhile, time is discretized into equal-length time slots τ, and in each time slot τ the system needs to perform N information acquisition tasks. Before using the multi-agent deep reinforcement learning algorithm, the system model needs to be parameterized into a system state space and an agent action space, and an instant reward function needs to be set.
The specific process of parameterizing the system state space in this embodiment is as follows:
in each time slot, the total state of the system comprises the states of the N information acquisition tasks generated by the system, the states of the K edge servers and the states of the M unmanned aerial vehicles, which are respectively defined as follows.

Let s_j^{task} = [x_j^{task}, y_j^{task}, b_j] denote the state of the j-th information acquisition task in the current time slot, wherein x_j^{task} is the abscissa of the position of the j-th acquisition task, y_j^{task} is the ordinate of the position of the j-th acquisition task, and b_j is the size of the data that needs to be collected for the j-th task.

Let s_k^{edge} = [f_k, B_k] denote the state of the k-th edge server in the current time slot, wherein f_k is the computation rate of the k-th edge server and B_k is the uplink bandwidth of the k-th edge server.

Let s_i^{uav} = [x_i^{uav}, y_i^{uav}] denote the state of the i-th unmanned aerial vehicle in the current time slot, wherein x_i^{uav} is the abscissa and y_i^{uav} the ordinate of its current position.

It is worth explaining that, in the multi-agent unmanned aerial vehicle system, an unmanned aerial vehicle does not need to communicate with the other unmanned aerial vehicles when deciding its specific action for the current time slot, so each unmanned aerial vehicle cannot obtain the full state of the current system; instead, it obtains a local observation based on the total system state and its own observation range, o_i = [T_i, U_i, E], wherein T_i is the set of states of all information points within the observation range of unmanned aerial vehicle i in the current time slot, U_i is the set of states of all other unmanned aerial vehicles within the observation range of unmanned aerial vehicle i in the current time slot, and E is the set of states of all edge servers in the system.
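A minimal sketch of how the parameterized states and the local observation o_i = [T_i, U_i, E] described above could be represented; all class and field names below are illustrative assumptions, not taken from the patent.

```python
# Illustrative data structures for the task, edge-server and drone states and
# for assembling drone i's local observation from its observation range.
from dataclasses import dataclass

@dataclass
class TaskState:
    x: float; y: float; data_size: float         # position of the task and data volume b_j

@dataclass
class EdgeState:
    compute_rate: float; uplink_bw: float         # f_k and B_k of an edge server

@dataclass
class UavState:
    x: float; y: float                            # current position of a drone

def local_observation(i, uavs, tasks, edges, obs_range):
    me = uavs[i]
    in_range = lambda x, y: (x - me.x) ** 2 + (y - me.y) ** 2 <= obs_range ** 2
    T_i = [t for t in tasks if in_range(t.x, t.y)]                       # visible info points
    U_i = [u for j, u in enumerate(uavs) if j != i and in_range(u.x, u.y)]
    return T_i, U_i, edges                        # E: states of all edge servers are known
```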
The specific process of parameterizing the action space of the agent in this embodiment is as follows:
Given the observation o_i it obtains in each time slot, each unmanned aerial vehicle needs to derive its action from its own policy function. The action output by the policy function of the i-th unmanned aerial vehicle in time slot τ is defined as a_i = [d_i, θ_i, ρ_i, z_i], wherein d_i is the distance the i-th unmanned aerial vehicle flies, θ_i is the angle at which the i-th unmanned aerial vehicle flies in time slot τ, ρ_i is the computation-migration ratio of the i-th unmanned aerial vehicle, and z_i is the number of the edge server accessed by the i-th unmanned aerial vehicle.
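A minimal sketch of decoding a raw actor output into the action a_i = [d_i, θ_i, ρ_i, z_i]; it assumes the actor emits four values in [-1, 1] (e.g., from a tanh output layer), and the scaling ranges are illustrative assumptions, not values from the patent.

```python
import math

def decode_action(raw, d_max, num_servers):
    # raw: four values in [-1, 1] from the actor network's output layer (assumption)
    d_i = (raw[0] + 1) / 2 * d_max                       # flight distance in [0, d_max]
    theta_i = (raw[1] + 1) * math.pi                     # flight angle in [0, 2*pi)
    rho_i = (raw[2] + 1) / 2                             # computation-migration ratio in [0, 1]
    z_i = min(int((raw[3] + 1) / 2 * num_servers), num_servers - 1)   # edge server index
    return d_i, theta_i, rho_i, z_i
```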
The specific process of setting the reward function in this embodiment is as follows:
for the unmanned aerial vehicle information acquisition system, the goal of unmanned aerial vehicle i is to maximize its own income. After unmanned aerial vehicle i finishes an information acquisition task it obtains a certain benefit, while the energy consumption and the time cost incurred in completing the acquisition task must also be considered. The reward of a single unmanned aerial vehicle in a time slot is therefore defined from G_i, the benefit unmanned aerial vehicle i obtains from the acquisition tasks it completes in the time slot, T_i^{cost}, the time cost of unmanned aerial vehicle i in the time slot, and E_i^{cost}, the energy consumption of unmanned aerial vehicle i during the time slot:

$$r_i = G_i - T_i^{cost} - E_i^{cost}$$

It is worth noting that the benefit of a task is related to the data size of the task and is defined as

$$G_i = \sum_{j=1}^{N} \beta_{ij}\, g\, b_j$$

wherein β_{ij} = 1 indicates that unmanned aerial vehicle i has performed information acquisition on information point j, β_{ij} = 0 indicates that unmanned aerial vehicle i has not performed information acquisition on information point j, b_j is the total amount of data of the j-th task, and g is the benefit per unit of completed data.

The time cost T_i^{cost} is mainly obtained by adding the flight time of the unmanned aerial vehicle, the time for acquiring information and the time for performing data computation:

$$T_i^{cost} = t_i^{fly} + t_i^{col} + t_i^{comp}, \qquad t_i^{fly} = \frac{d_i}{v_i}$$

wherein d_i is the flight distance of the i-th unmanned aerial vehicle and v_i is its flight speed; the information acquisition time t_i^{col} is determined by the information acquisition rate of the i-th unmanned aerial vehicle; and the computation time t_i^{comp} consists of the local computation time, determined by the computation rate of the i-th unmanned aerial vehicle and the time it spends computing task j, and the edge computation time. For the edge part, ε_i is the data uploading rate of unmanned aerial vehicle i; according to Shannon's theorem and the bandwidth B_{z_i} of the edge server z_i accessed by unmanned aerial vehicle i, it is obtained as

$$\varepsilon_i = \frac{B_{z_i}}{n_{z_i}} \log_2\!\left(1 + \mathrm{SNR}\right)$$

wherein n_{z_i} is the number of unmanned aerial vehicles accessing edge server z_i in the same time slot and SNR is the signal-to-noise ratio. The time cost of edge computing comprises the time cost of uploading the migrated data at rate ε_i and the time cost of computation at the edge server.

The energy consumption E_i^{cost} is mainly obtained by adding the flight energy consumption of the unmanned aerial vehicle, the energy consumption of information acquisition and the energy consumption of data computation, each obtained as the corresponding power multiplied by the corresponding time, wherein p_i^{fly} is the flight power of unmanned aerial vehicle i, p_i^{col} is the information acquisition power of unmanned aerial vehicle i, p_i^{loc} is the power of local computation of unmanned aerial vehicle i, p_i^{up} is the power of data uploading of unmanned aerial vehicle i, and p_{z_i}^{edge} is the computation power of the z_i-th edge server.
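A simplified, hedged sketch of the per-slot reward described above (benefit minus time cost minus energy cost). The Shannon-style uplink rate follows the text; the equal weighting of the cost terms and the omission of the uploading and edge-computing energy terms are simplifying assumptions made only for illustration.

```python
import math

def upload_rate(bandwidth, n_sharing, snr):
    # Shannon-style uplink rate when n_sharing drones share edge server z_i
    return (bandwidth / n_sharing) * math.log2(1 + snr)

def reward(collected_bits, gain_per_bit, flight_time, collect_time, compute_time,
           p_fly, p_collect, p_compute):
    G_i = gain_per_bit * collected_bits                              # benefit of completed tasks
    T_i = flight_time + collect_time + compute_time                  # time cost
    E_i = p_fly * flight_time + p_collect * collect_time + p_compute * compute_time
    return G_i - T_i - E_i                                           # relative weights assumed equal
```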
The multi-agent deep reinforcement learning algorithm combined with the attention mechanism in this embodiment is mainly divided into two parts: the first part establishes a deep neural network according to the information acquisition task model and trains it in a computer simulation environment; the second part uses the trained deep neural network to obtain the information acquisition and computation migration control instructions of the unmanned aerial vehicles in the actual environment.
In this embodiment, a multi-agent deep reinforcement learning algorithm combined with an attention mechanism is adopted to train the deep neural network model. The adopted deep reinforcement learning algorithm is based on the Actor-Critic framework, and the deep neural network is divided into an actor network and a critic network. The actor network serves as the policy function of the agent and is used to obtain the specific action of the agent; the critic network serves as the action-state value function of the agent and is used during training to evaluate the policy performance, namely the Q value, of the agent's actor network. In the neural network training stage, the actor network and the critic network need to be trained simultaneously. It should be noted that the deep reinforcement learning algorithm of this embodiment is a multi-agent deep reinforcement learning algorithm, that is, each agent has its own actor network.
The actor network comprises an estimated actor network and a target actor network. The actor network is a three-layer fully connected deep neural network whose input is the observed value o_i of unmanned aerial vehicle i and whose output is the action a_i of unmanned aerial vehicle i in the current time slot. The actor network is trained to obtain a better action policy function, which is used to obtain the corresponding optimal action for different state inputs from the actual environment.
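A minimal sketch, assuming PyTorch, of the three-layer fully connected actor network described above; the hidden width and the tanh output squashing are illustrative choices rather than values from the patent.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),   # actions squashed to [-1, 1]
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, obs_dim) -> action: (batch, act_dim)
        return self.net(obs)
```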
The critic network also comprises an estimation critic network and a target critic network, and in order to enable the critic network to obtain more accurate estimation values, the critic network adds an attention unit on the basis of a three-layer fully-connected layer deep neural network, and the structure is shown in fig. 1.
The specific method is as follows: firstly, the observed value o_i and the action value a_i of each unmanned aerial vehicle are input into a single-layer fully connected deep neural network (1-layer MLP) to obtain the state-action feature value g(o_i, a_i) of each unmanned aerial vehicle;
then, the state-action feature values of all unmanned aerial vehicles are input into the attention unit.
In the attention unit, the attention weight α_j of each other unmanned aerial vehicle j (j ≠ i) is calculated from the feature value of unmanned aerial vehicle i and the feature values of the other unmanned aerial vehicles j as follows:

$$\alpha_j = \mathrm{softmax}\!\left(\frac{\big(W_k\, g(o_j, a_j)\big)^{\mathsf T}\, W_q\, g(o_i, a_i)}{\sqrt{d}}\right)$$

wherein W_k and W_q are learnable attention parameter matrices and d is the dimension of the state-action feature values. The formula obtains an attention coefficient by taking the scaled dot product of the state-action feature values with the attention parameter matrices, and the attention weight of unmanned aerial vehicle j is then obtained by normalizing the attention coefficients with a softmax function.
Then, the influence value e_i of the other unmanned aerial vehicles on unmanned aerial vehicle i is calculated as a weighted sum of their feature values using the attention weights:

$$e_i = \sum_{j \ne i} \alpha_j\, h\!\big(W_o\, g(o_j, a_j)\big)$$

wherein W_o is a learnable attention parameter matrix and h is a dot-product operation.
Finally, the state-action feature value g(o_i, a_i) of unmanned aerial vehicle i and the influence value e_i are input into a two-layer fully connected deep neural network (2-layer MLP) to obtain the action-state value Q_i of the unmanned aerial vehicle.
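A minimal sketch, assuming PyTorch, of the attention-based critic described above: a single-layer MLP produces g(o_i, a_i), a scaled-dot-product attention unit with learnable matrices W_q, W_k, W_o yields the influence value e_i, and a two-layer head outputs Q_i. The layer sizes and the use of a single attention head are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionCritic(nn.Module):
    def __init__(self, obs_dim, act_dim, n_agents, d_model=64):
        super().__init__()
        self.g = nn.Linear(obs_dim + act_dim, d_model)            # 1-layer MLP: g(o_i, a_i)
        self.W_q = nn.Linear(d_model, d_model, bias=False)
        self.W_k = nn.Linear(d_model, d_model, bias=False)
        self.W_o = nn.Linear(d_model, d_model, bias=False)
        self.q_head = nn.Sequential(                              # 2-layer MLP -> Q_i
            nn.Linear(2 * d_model, d_model), nn.ReLU(), nn.Linear(d_model, 1))
        self.n_agents = n_agents
        self.scale = d_model ** 0.5

    def forward(self, obs, act, i):
        # obs: (batch, N, obs_dim), act: (batch, N, act_dim); returns Q_i: (batch, 1)
        g = F.relu(self.g(torch.cat([obs, act], dim=-1)))         # (batch, N, d_model)
        q = self.W_q(g[:, i])                                     # query from agent i
        k, v = self.W_k(g), self.W_o(g)                           # keys/values from all agents
        logits = torch.einsum('bd,bjd->bj', q, k) / self.scale    # scaled dot product
        mask = torch.zeros(self.n_agents, dtype=torch.bool, device=obs.device)
        mask[i] = True                                            # exclude agent i itself
        alpha = F.softmax(logits.masked_fill(mask, float('-inf')), dim=-1)   # weights alpha_j
        e_i = torch.einsum('bj,bjd->bd', alpha, v)                # weighted influence value e_i
        return self.q_head(torch.cat([g[:, i], e_i], dim=-1))
```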
The training process of the deep neural network in this embodiment is as follows: first, a parameterized simulation environment model is built with python and the total system state S is initialized; according to the total system state in the current time slot and the observation range of each unmanned aerial vehicle, the observation values X = [o_1, o_2, …, o_M] of the unmanned aerial vehicles in the current time slot are generated. The observed value o_i of each unmanned aerial vehicle is input into the corresponding actor network to obtain the action value a_i of each unmanned aerial vehicle. According to the total state of the current system and the action values A = [a_1, a_2, …, a_M] of all unmanned aerial vehicles in the current time slot, the simulation environment computes the rewards R = [r_1, r_2, …, r_M] of all unmanned aerial vehicles, the system state S′ of the next time slot and the observation values X′ = [o′_1, o′_2, …, o′_M]. The tuple (X, A, R, X′) of a time slot is stored as an experience sample in the agents' experience pool for updating the network parameters. The network parameters are updated after the number of samples in the experience pool reaches a certain threshold; taking the update of the network parameters of unmanned aerial vehicle i as an example, the update steps for the other unmanned aerial vehicles are the same.
The process of updating the estimated critic network parameters is as follows: H experience samples (X_j, A_j, R_j, X′_j), j ∈ {1, 2, …, H}, are taken randomly from the experience pool, and the next-slot observations X′_j in each sample j are input into the target actor networks of the corresponding agents to obtain the actions of all agents in the next time slot of the experience sample, A′_j = [a′_1, a′_2, …, a′_M]. The next-slot observation X′_j and action values A′_j in sample j are input into the target critic network to obtain the target Q value of agent i,

$$y_j^i = r_j^i + \gamma\, Q_i'\big(X_j', a_1', a_2', \ldots, a_M'\big)$$

and the Q value of agent i, Q_i(X_j, a_1, a_2, …, a_M), is obtained from the estimated critic network, whose inputs are the observed values and action values of the current time slot in sample j.
The above steps are repeated for all agents and the mean-square-error loss function of the estimated critic network is calculated according to the following formula; the smaller the loss value, the more accurate the evaluation obtained by the critic network, wherein γ is the discount factor of the reward and r_j^i is the reward value of agent i in the j-th experience sample:

$$L(\theta_i^{Q}) = \frac{1}{H} \sum_{j=1}^{H} \Big(y_j^i - Q_i(X_j, a_1, a_2, \ldots, a_M)\Big)^2$$

The parameters θ^Q of the critic network are then updated by minimizing the loss function with a stochastic gradient descent method.
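A minimal sketch of the estimated-critic update described above, reusing the illustrative PyTorch interfaces from the earlier sketches (batched per-agent observation and action tensors); the batch layout is an assumption made for illustration.

```python
import torch
import torch.nn.functional as F

def update_critic(critic, target_critic, target_actors, optimizer, batch, i, gamma=0.95):
    X, A, R, X_next = batch                                   # (H, N, obs_dim), (H, N, act_dim), (H, N), ...
    with torch.no_grad():
        A_next = torch.stack([mu(X_next[:, j]) for j, mu in enumerate(target_actors)], dim=1)
        y = R[:, i:i + 1] + gamma * target_critic(X_next, A_next, i)   # target Q value y_j^i
    q = critic(X, A, i)                                       # estimated Q value
    loss = F.mse_loss(q, y)                                   # mean-square-error loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                          # stochastic gradient descent step
    return loss.item()
```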
The process of updating the estimated actor network parameters is as follows: for each agent i, the observed values and action values of the current time slot in the H sampled experience samples are input into the estimated critic network to obtain the Q value Q_i(X, a_1, a_2, …, a_M). The objective of the actor network is to maximize the Q value, and its performance function is expressed as the expected value of the Q value:

$$J(\mu_i) = \mathbb{E}_{X, a \sim D}\big[\,Q_i(X, a_1, a_2, \ldots, a_M)\,\big]_{a_i = \mu_i(o_i)}$$

wherein E_{X,a∼D} denotes the expected value of the Q value calculated with the sampled experience, and μ_i(o_i) is the policy function approximated by the estimated actor network of unmanned aerial vehicle i. When updating the estimated actor network parameters θ_i^{μ}, the performance function is differentiated with respect to θ_i^{μ},

$$\nabla_{\theta_i^{\mu}} J \approx \mathbb{E}_{X, a \sim D}\Big[\nabla_{\theta_i^{\mu}} \mu_i(o_i)\, \nabla_{a_i} Q_i(X, a_1, \ldots, a_M)\big|_{a_i = \mu_i(o_i)}\Big]$$

and the parameters are updated with a stochastic gradient ascent method.
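A minimal sketch of the estimated-actor update described above: agent i's sampled action is replaced by the output of its current policy and the expected Q value is maximized by gradient ascent. The interfaces are the same illustrative ones used in the previous sketches.

```python
import torch

def update_actor(actor, critic, optimizer, X, A, i):
    # replace agent i's stored action with the action its current policy outputs
    actions = [A[:, j] for j in range(A.shape[1])]
    actions[i] = actor(X[:, i])
    A_new = torch.stack(actions, dim=1)
    performance = critic(X, A_new, i).mean()     # J = E[Q_i(X, a_1, ..., a_M)]
    optimizer.zero_grad()
    (-performance).backward()                    # gradient ascent on J via descent on -J
    optimizer.step()
    return performance.item()
```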
The process of updating the target network parameters is as follows: finally, the parameters θ^{Q'} of the target critic network and the parameters θ_i^{μ'} of the target actor networks of all agents i (i ∈ {1, 2, …, M}) are soft-updated according to the following formulas, wherein κ is the learning rate of the target networks:

$$\theta^{Q'} \leftarrow \kappa\, \theta^{Q} + (1 - \kappa)\, \theta^{Q'}$$

$$\theta_i^{\mu'} \leftarrow \kappa\, \theta_i^{\mu} + (1 - \kappa)\, \theta_i^{\mu'}$$

The above training operations are repeated for multiple rounds until the policy function approximated by the estimated actor network converges.
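A minimal sketch of the soft (Polyak-style) update of the target network parameters described above; kappa plays the role of the target-network learning rate in the formulas.

```python
import torch

@torch.no_grad()
def soft_update(target_net, source_net, kappa=0.01):
    for t, s in zip(target_net.parameters(), source_net.parameters()):
        t.mul_(1.0 - kappa).add_(kappa * s)      # theta' <- kappa*theta + (1 - kappa)*theta'
```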
The pseudo code for the deep neural network training is given as an image in the original specification.
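Since that pseudo code is only available as an image, the loop below is a hedged, high-level reconstruction of the training procedure from the surrounding text; it reuses the illustrative helper functions sketched earlier (collect_one_slot, sample_batch, update_critic, update_actor, soft_update) and is not the patent's original pseudo code.

```python
import torch

def collate(samples):
    # stack a list of (X, A, R, X') samples (each entry already stored as a tensor) into batches
    Xs, As, Rs, Xns = zip(*samples)
    return torch.stack(Xs), torch.stack(As), torch.stack(Rs), torch.stack(Xns)

def train(env, actors, critics, target_actors, target_critics,
          actor_opts, critic_opts, steps=100_000, warmup=5_000, batch_size=256):
    state = env.reset()
    for _ in range(steps):
        state = collect_one_slot(env, actors, state)          # steps S202-S204
        if len(replay_buffer) < warmup:
            continue                                          # wait until the pool reaches the threshold
        batch = collate(sample_batch(batch_size))
        X, A, _, _ = batch
        for i in range(len(actors)):                          # update every agent in turn
            update_critic(critics[i], target_critics[i], target_actors,
                          critic_opts[i], batch, i)
            update_actor(actors[i], critics[i], actor_opts[i], X, A, i)
            soft_update(target_critics[i], critics[i])
            soft_update(target_actors[i], actors[i])
```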
after the deep neural network is trained for multiple times, namely after the deep neural network is trained, the estimation operator network can be used for controlling the unmanned aerial vehicle group in the actual environment to complete the information acquisition task, and the method specifically comprises the following steps:
firstly, parameterizing the state of an unmanned aerial vehicle system in an actual environment and the observation of each unmanned aerial vehicle; then, inputting the parameterized observation value of each task period of the unmanned aerial vehicle into a trained operator network to obtain the action value of the task period of the unmanned aerial vehicle; and finally, converting the obtained action value into a flight instruction and a calculation migration instruction, enabling the unmanned aerial vehicle to execute flight action according to the instruction and fly to the target position, acquiring all information tasks in the range, and calculating and migrating the acquired original data according to the migration rate to complete the calculation task. The above steps are repeated at each task cycle.
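A minimal sketch of this deployment phase, with hypothetical drone-side methods (parameterized_observation, fly, collect_tasks_in_range, offload) used purely for illustration; decode_action is the illustrative helper sketched earlier.

```python
def run_task_cycle(drones, actors, d_max, num_servers):
    for drone, actor in zip(drones, actors):
        o_i = drone.parameterized_observation()              # local observation only, no inter-drone comms
        raw = actor.act(o_i)                                  # trained actor network (act() wraps a forward pass)
        d_i, theta_i, rho_i, z_i = decode_action(raw, d_max, num_servers)
        drone.fly(distance=d_i, angle=theta_i)                # flight command
        data = drone.collect_tasks_in_range()                 # acquisition sub-task
        drone.offload(data, ratio=rho_i, server=z_i)          # computation migration command
```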
The following is another embodiment of the method for controlling an unmanned aerial vehicle based on multi-agent deep reinforcement learning, which includes:
s1: building a simulation environment by using the parameterized multi-unmanned aerial vehicle information acquisition task model, and randomly initializing a system state and neural network parameters;
s2: acquiring an observed value of the current time slot of the unmanned aerial vehicle;
s3: determining the information acquisition and the calculation migration action of the unmanned aerial vehicle in the current time slot by adopting an actor network according to the observed value;
s4: calculating the reward of the current time slot and the observation value of the unmanned aerial vehicle of the next time slot according to the parameterized model, and storing the time slot experience sample into an experience pool; repeating S2-S4 until the number of experience pool samples reaches a certain threshold;
s5: randomly sampling a certain number of experience samples from the experience pool, updating the neural network parameters to obtain updated network parameters, and repeating S2-S5 until the strategy function is converged;
s6: parameterizing an information acquisition task in an actual environment, and acquiring an actual observation value of the unmanned aerial vehicle;
s7: and determining the action value of the unmanned aerial vehicle by adopting an actor network according to the observed value, and controlling the unmanned aerial vehicle to carry out information acquisition and calculation migration according to the action value.
Various methods may be used in the embodiment, including but not limited to:
(1) modifying the parameter update formulas of the deep neural network, adopting the update schemes of deep reinforcement learning algorithms such as PPO or SAC;
(2) modifying the computing method of the reward function R in the model.
On the basis of the technical scheme of the invention, the improvement and equivalent transformation of individual steps of the algorithm according to the principle of the invention are not excluded from the protection scope of the invention.
The embodiment models the information acquisition task of the unmanned aerial vehicle, decomposes the information acquisition task of the unmanned aerial vehicle into an acquisition subtask and a calculation subtask, and parameterizes the whole unmanned aerial vehicle system model. In the embodiment, each unmanned aerial vehicle is regarded as an intelligent agent, and the multi-intelligent-agent deep reinforcement learning only needs to acquire the reward value through interaction between each intelligent agent and the environment, so that the strategy of the intelligent agent is continuously learned and improved, and the global state information of the system is not required to be acquired through communication when the intelligent agent makes a decision, so that the communication delay is avoided.
In the embodiment, when the actor network is trained, an attention mechanism is combined, and the critic network with the attention unit is used for evaluating the performance of the actor network, so that the influence of other agents with higher similarity on the actor network can be better noticed during evaluation, more accurate evaluation values are obtained to guide the training of the actor network, and the training speed of the actor network is accelerated.
Compared with an ordinary single-agent deep reinforcement learning algorithm that treats the whole unmanned aerial vehicle system as one agent, this embodiment regards every unmanned aerial vehicle as an agent; when executing the information acquisition task, each unmanned aerial vehicle only needs to input its own observed value directly into the trained actor network to obtain the control instruction for the task period, which avoids the need of the single-agent deep reinforcement learning algorithm to collect the states and observed values of all unmanned aerial vehicles through communication before formulating control instructions, thereby reducing the reaction delay.
The invention also provides an embodiment of the unmanned aerial vehicle control system based on multi-agent deep reinforcement learning, which comprises the following steps:
the task model establishing module is used for establishing an information acquisition task model according to parameters of the unmanned aerial vehicle group information acquisition system; the information acquisition task is divided into an acquisition subtask and a calculation subtask;
the deep neural network building and training module is used for building a deep neural network model according to the task model and training the deep neural network model with a multi-agent deep reinforcement learning algorithm combined with an attention mechanism; wherein the agent is an unmanned aerial vehicle;
and the information acquisition task execution module is used for controlling the unmanned aerial vehicle group in the actual environment to complete the information acquisition task by utilizing the trained deep neural network model.
The present invention also provides a computer-readable storage medium for storing a computer program; wherein the computer program, when executed by a processor, implements the multi-agent deep reinforcement learning-based drone control method.
In addition, the present invention also provides an electronic device including:
a memory for storing a computer program;
and the processor is used for executing the computer program to realize the unmanned aerial vehicle control method based on multi-agent deep reinforcement learning.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (6)

1. An unmanned aerial vehicle control method based on multi-agent deep reinforcement learning is characterized by comprising the following steps:
establishing an information acquisition task model according to parameters of the unmanned aerial vehicle group information acquisition system; the information acquisition task is divided into an acquisition subtask and a calculation subtask;
converting parameters of the unmanned aerial vehicle group information acquisition system into a state space of the system and an action space of the intelligent agent, and setting an instant reward function;
constructing a deep neural network model according to the task model, and training the deep neural network model by utilizing a multi-agent deep reinforcement learning algorithm combined with an attention mechanism; the deep neural network model comprises an actor network and a critic network, the actor network comprises an estimated actor network and a target actor network, the critic network comprises an estimated critic network and a target critic network, an attention unit is embedded in the critic network on top of its three fully connected layers, and the agent is an unmanned aerial vehicle;
when an actor network is trained, a critic network with an attention unit is used for evaluating the performance of the actor network, and the specific process is as follows:
firstly, the number of unmanned aerial vehicles in the unmanned aerial vehicle cluster is N, and the observed value o_i and the action value a_i of unmanned aerial vehicle i are input into a single fully connected layer to obtain the state-action feature value g(o_i, a_i) of each unmanned aerial vehicle, and the state-action feature values of all unmanned aerial vehicles are input into the attention unit;
the attention unit calculates the attention weight α_j of unmanned aerial vehicle j according to the feature value of unmanned aerial vehicle i and the feature values of the other unmanned aerial vehicles j:

$$\alpha_j = \mathrm{softmax}\!\left(\frac{\big(W_k\, g(o_j, a_j)\big)^{\mathsf T}\, W_q\, g(o_i, a_i)}{\sqrt{d}}\right)$$

wherein W_k and W_q are learnable attention parameter matrices, d is the dimension of the state-action feature values, 1 ≤ i ≤ N, and j ≠ i;
calculating the influence value e_i of the other unmanned aerial vehicles on unmanned aerial vehicle i in a weighted-sum manner according to the attention weights and the feature values of the other unmanned aerial vehicles:

$$e_i = \sum_{j \ne i} \alpha_j\, h\!\big(W_o\, g(o_j, a_j)\big)$$

wherein W_o is a learnable attention parameter matrix and h is a dot-product operation;
inputting the state-action feature value g(o_i, a_i) of unmanned aerial vehicle i and the influence value e_i into a two-layer fully connected network to obtain the action-state value Q_i of the unmanned aerial vehicle;
And controlling the unmanned aerial vehicle group in the actual environment to complete an information acquisition task by using the trained deep neural network model.
2. The method for controlling the multi-agent deep reinforcement learning-based unmanned aerial vehicle according to claim 1, wherein training the deep neural network model by using a multi-agent deep reinforcement learning algorithm combined with an attention mechanism specifically comprises:
S201: randomly initializing the system state and the neural network parameters;
S202: acquiring, according to the system state and the observation range of the unmanned aerial vehicles, the observation values X = [o_1, o_2, ..., o_M] of the current time slot, wherein M is the number of unmanned aerial vehicles in the unmanned aerial vehicle cluster;
S203: inputting the observed value o_i of each unmanned aerial vehicle into its corresponding actor network to obtain the corresponding action value a_i, where 1 ≤ i ≤ M;
S204: according to the system state and the action values A = [a_1, a_2, ..., a_M] of all unmanned aerial vehicles in the current time slot, obtaining the rewards R = [r_1, r_2, ..., r_M] of all unmanned aerial vehicles, the system state S' of the next time slot and the observed values X' = [o'_1, o'_2, ..., o'_M], and storing the experience sample (X, A, R, X') in the experience pool of the intelligent agents;
S205: repeating S202-S204 until the number of samples in the experience pool reaches a set threshold, then extracting a batch of experience samples from the experience pool to update the neural network parameters, until the policy function of the actor network converges (see the training-loop sketch following this claim).
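Steps S201-S205 describe a standard centralised-training loop over an experience pool. The sketch below restates them under stated assumptions: env, actor.act, critic.update and actor.update are placeholder interfaces introduced for illustration, not interfaces defined by the patent.

```python
import random
from collections import deque

def train(env, actors, critics, episodes=1000, batch_size=256, buffer_size=100_000):
    """Sketch of the S201-S205 loop: gather (X, A, R, X') samples, then update."""
    replay = deque(maxlen=buffer_size)                  # experience pool (S204)
    for _ in range(episodes):
        X = env.reset()                                 # S201/S202: random state, observations o_1..o_M
        done = False
        while not done:
            A = [actor.act(o) for actor, o in zip(actors, X)]  # S203: per-drone action values a_i
            X_next, R, done = env.step(A)               # S204: rewards r_i and next observations
            replay.append((X, A, R, X_next))            # store the experience sample
            X = X_next
            if len(replay) >= batch_size:               # S205: update once the pool is large enough
                batch = random.sample(replay, batch_size)
                for i, (actor, critic) in enumerate(zip(actors, critics)):
                    critic.update(batch, agent_index=i)         # TD update against the target networks
                    actor.update(batch, critic, agent_index=i)  # policy gradient through the critic
    return actors
```

The claim leaves the stopping criterion as convergence of the actor's policy function; a fixed episode budget stands in for that here.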
3. The multi-agent deep reinforcement learning-based unmanned aerial vehicle control method according to claim 1, wherein the step of controlling the unmanned aerial vehicle cluster in the actual environment by using the trained deep neural network model to complete the information acquisition task specifically comprises:
parameterizing the state of the unmanned aerial vehicle system and the observed value of each unmanned aerial vehicle in the actual environment;
inputting each unmanned aerial vehicle's parameterized observation value into the trained actor network to obtain the action value of the unmanned aerial vehicle;
and converting the action value into an acquisition instruction and a computation instruction, according to which the unmanned aerial vehicle acquires information, performs computation and offloads data (see the deployment sketch following this claim).
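For deployment as in claim 3, one minimal loop could look as follows; observe, to_instructions, and the drone API are illustrative names only, since the claim does not fix them, and the even split of the action value is likewise an assumption.

```python
from typing import Sequence, Tuple

def to_instructions(action: Sequence[float]) -> Tuple[Sequence[float], Sequence[float]]:
    """Illustrative split of an action value into an acquisition instruction and a
    computation instruction; the real mapping follows the patent's action space,
    which claim 3 does not spell out."""
    half = len(action) // 2
    return action[:half], action[half:]

def run_mission(env, actors):
    """Sketch of claim 3: drive the trained actor networks in the real environment."""
    while not env.mission_complete():
        X = env.observe()                                  # parameterized observation per drone
        for drone, actor, o in zip(env.drones, actors, X):
            a = actor.act(o)                               # action value from the trained actor network
            collect_cmd, compute_cmd = to_instructions(a)  # acquisition vs. computation instruction
            drone.execute(collect_cmd, compute_cmd)        # collect, compute, and offload accordingly
```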
4. An unmanned aerial vehicle control system based on multi-agent deep reinforcement learning, characterized by comprising:
the task model establishing module is used for establishing an information acquisition task model according to parameters of the unmanned aerial vehicle group information acquisition system; the information acquisition task is divided into an acquisition subtask and a calculation subtask;
the system parameter conversion module is used for converting the parameters of the unmanned aerial vehicle group information acquisition system into a state space of the system and an action space of the intelligent agent and setting an instant reward function;
the deep neural network building and training module is used for building a deep neural network model according to the task model and training the deep neural network model by using a multi-agent deep reinforcement learning algorithm combined with an attention mechanism; the deep neural network model comprises an actor network and a critic network, the actor network comprises an estimated-value actor network and a target actor network, the critic network comprises an estimated-value critic network and a target critic network, an attention unit is embedded among the three fully connected layers of the critic network, and the intelligent agent is an unmanned aerial vehicle;
when an actor network is trained, a critic network with an attention unit is used for evaluating the performance of the actor network, and the specific process is as follows:
firstly, given that the number of unmanned aerial vehicles in the cluster is N, the observed value o_i and the action value a_i of unmanned aerial vehicle i are input into a single fully connected layer to obtain the state-action characteristic value g(o_i, a_i) of each unmanned aerial vehicle, and the state-action characteristic values of all unmanned aerial vehicles are input into the attention unit;
the attention unit calculates the attention weight α_j of each other unmanned aerial vehicle j according to the characteristic value of unmanned aerial vehicle i and the characteristic values of the other unmanned aerial vehicles j:
[attention-weight formula, rendered as image FDA0003374387210000031 in the original publication]
where
[auxiliary definition, rendered as image FDA0003374387210000032 in the original publication]
and W_q is a parameter of the attention unit, with 1 ≤ i ≤ N and j ≠ i;
the influence value e_i of the other unmanned aerial vehicles on unmanned aerial vehicle i is calculated as a weighted sum of the attention weights and the other unmanned aerial vehicles' characteristic values:
[weighted-sum formula for e_i, rendered as image FDA0003374387210000033 in the original publication]
the state-action characteristic value g(o_i, a_i) of unmanned aerial vehicle i and the influence value e_i are input into a two-layer fully connected network to obtain the state-action value Q_i of the unmanned aerial vehicle;
And the information acquisition task execution module is used for controlling the unmanned aerial vehicle group in the actual environment to complete the information acquisition task by utilizing the trained deep neural network model.
5. A computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the multi-agent deep reinforcement learning-based unmanned aerial vehicle control method according to any one of claims 1 to 3.
6. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the multi-agent deep reinforcement learning-based unmanned aerial vehicle control method according to any one of claims 1 to 3.
CN202011118496.4A 2020-10-19 2020-10-19 Unmanned aerial vehicle control method and system based on multi-agent deep reinforcement learning Active CN112256056B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011118496.4A CN112256056B (en) 2020-10-19 2020-10-19 Unmanned aerial vehicle control method and system based on multi-agent deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN112256056A CN112256056A (en) 2021-01-22
CN112256056B true CN112256056B (en) 2022-03-01

Family

ID=74244867

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011118496.4A Active CN112256056B (en) 2020-10-19 2020-10-19 Unmanned aerial vehicle control method and system based on multi-agent deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN112256056B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112966431B (en) * 2021-02-04 2023-04-28 西安交通大学 Data center energy consumption joint optimization method, system, medium and equipment
CN113128698B (en) * 2021-03-12 2022-09-20 合肥工业大学 Reinforced learning method for multi-unmanned aerial vehicle cooperative confrontation decision
CN112947575B (en) * 2021-03-17 2023-05-16 中国人民解放军国防科技大学 Unmanned aerial vehicle cluster multi-target searching method and system based on deep reinforcement learning
CN113146624B (en) * 2021-03-25 2022-04-29 重庆大学 Multi-agent control method based on maximum angle aggregation strategy
CN113269329B (en) * 2021-04-30 2024-03-19 北京控制工程研究所 Multi-agent distributed reinforcement learning method
CN113381824B (en) * 2021-06-08 2023-01-31 清华大学 Underwater acoustic channel measuring method and device, unmanned underwater vehicle and storage medium
CN113572548B (en) * 2021-06-18 2023-07-07 南京理工大学 Unmanned plane network cooperative fast frequency hopping method based on multi-agent reinforcement learning
CN113625757B (en) * 2021-08-12 2023-10-24 中国电子科技集团公司第二十八研究所 Unmanned aerial vehicle group scheduling method based on reinforcement learning and attention mechanism
CN113703482B (en) * 2021-08-30 2022-08-12 西安电子科技大学 Task planning method based on simplified attention network in large-scale unmanned aerial vehicle cluster
CN113900445A (en) * 2021-10-13 2022-01-07 厦门渊亭信息科技有限公司 Unmanned aerial vehicle cooperative control training method and system based on multi-agent reinforcement learning
CN114423061B (en) * 2022-01-20 2024-05-07 重庆邮电大学 Wireless route optimization method based on attention mechanism and deep reinforcement learning
CN115086914B (en) * 2022-05-20 2023-11-10 成都飞机工业(集团)有限责任公司 Remote online reconstruction method for acquisition strategy of airborne test system
CN115507852B (en) * 2022-09-07 2023-11-03 广东工业大学 Multi-unmanned aerial vehicle path planning method based on blockchain and enhanced attention learning
CN115599129A (en) * 2022-11-07 2023-01-13 北京卓翼智能科技有限公司(Cn) Unmanned aerial vehicle cluster system and unmanned aerial vehicle
CN116009590B (en) * 2023-02-01 2023-11-17 中山大学 Unmanned aerial vehicle network distributed track planning method, system, equipment and medium
CN116741019A (en) * 2023-08-11 2023-09-12 成都飞航智云科技有限公司 Flight model training method and training system based on AI

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109884897A (en) * 2019-03-21 2019-06-14 中山大学 A kind of matching of unmanned plane task and computation migration method based on deeply study
CN110956148A (en) * 2019-12-05 2020-04-03 上海舵敏智能科技有限公司 Autonomous obstacle avoidance method and device for unmanned vehicle, electronic device and readable storage medium
CN111144793A (en) * 2020-01-03 2020-05-12 南京邮电大学 Commercial building HVAC control method based on multi-agent deep reinforcement learning
CN111786713A (en) * 2020-06-04 2020-10-16 大连理工大学 Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"基于强化学习的多无人机协同任务规划算法研究";樊龙涛;《中国优秀硕士学位论文全文数据库 工程科技Ⅱ辑》;20191115;正文第17-21页 *

Also Published As

Publication number Publication date
CN112256056A (en) 2021-01-22

Similar Documents

Publication Publication Date Title
CN112256056B (en) Unmanned aerial vehicle control method and system based on multi-agent deep reinforcement learning
CN111667513B (en) Unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning
CN107479368B (en) Method and system for training unmanned aerial vehicle control model based on artificial intelligence
CN110673620B (en) Four-rotor unmanned aerial vehicle air line following control method based on deep reinforcement learning
CN110806759B (en) Aircraft route tracking method based on deep reinforcement learning
CN109520504B (en) Grid discretization-based unmanned aerial vehicle patrol route optimization method
CN109884897B (en) Unmanned aerial vehicle task matching and calculation migration method based on deep reinforcement learning
CN112465151A (en) Multi-agent federal cooperation method based on deep reinforcement learning
CN112711271B (en) Autonomous navigation unmanned aerial vehicle power optimization method based on deep reinforcement learning
Hong et al. Energy-efficient online path planning of multiple drones using reinforcement learning
CN110587606B (en) Open scene-oriented multi-robot autonomous collaborative search and rescue method
CN110531786B (en) Unmanned aerial vehicle maneuvering strategy autonomous generation method based on DQN
CN113110546B (en) Unmanned aerial vehicle autonomous flight control method based on offline reinforcement learning
CN107092987B (en) Method for predicting autonomous landing wind speed of small and medium-sized unmanned aerial vehicles
CN112580537B (en) Deep reinforcement learning method for multi-unmanned aerial vehicle system to continuously cover specific area
CN112651437A (en) Spatial non-cooperative target pose estimation method based on deep learning
CN112766499A (en) Method for realizing autonomous flight of unmanned aerial vehicle through reinforcement learning technology
CN113268074B (en) Unmanned aerial vehicle flight path planning method based on joint optimization
Yue et al. Deep reinforcement learning and its application in autonomous fitting optimization for attack areas of UCAVs
CN114967721A (en) Unmanned aerial vehicle self-service path planning and obstacle avoidance strategy method based on DQ-CapsNet
CN114679729A (en) Radar communication integrated unmanned aerial vehicle cooperative multi-target detection method
Puente-Castro et al. Q-learning based system for path planning with unmanned aerial vehicles swarms in obstacle environments
CN114371729B (en) Unmanned aerial vehicle air combat maneuver decision method based on distance-first experience playback
Feng et al. Infrared camera assisted uav autonomous control via deep reinforcement learning
CN116400726A (en) Rotor unmanned aerial vehicle escape method and system based on reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant