CN115951707A - Unmanned aerial vehicle cluster task planning method and device, storage medium and equipment - Google Patents

Publication number: CN115951707A (China)
Application number: CN202310006543.3A
Other languages: Chinese (zh)
Inventors: 丘昌镇, 刘紫薇, 张志勇, 徐雪阳
Original and current assignee: Sun Yat Sen University
Application filed by Sun Yat Sen University; priority to CN202310006543.3A
Legal status: Pending


Abstract

The application provides a method, a device, a storage medium and equipment for unmanned aerial vehicle cluster task planning, comprising the following steps: randomly selecting one unmanned aerial vehicle from the unmanned aerial vehicle cluster as a first unmanned aerial vehicle, using the other unmanned aerial vehicles as second unmanned aerial vehicles, the second unmanned aerial vehicles forming the remaining unmanned aerial vehicle cluster; acquiring the actual task execution environment of the first unmanned aerial vehicle and an unmanned aerial vehicle cluster task planning model; and inputting the actual task execution environment into the unmanned aerial vehicle cluster task planning model to obtain the task planning of the unmanned aerial vehicle cluster. The unmanned aerial vehicle cluster task planning model is obtained by training an improved COMA model with a simulated task execution environment as training samples; the improved COMA model comprises a COMA network and a graph convolution module, the graph convolution module being arranged in the COMA network. Through the setting of the joint reward and the continuous updating of the evaluation value of the evaluation function, the unmanned aerial vehicle cluster can be guided to take globally optimal actions.

Description

Unmanned aerial vehicle cluster task planning method and device, storage medium and equipment
Technical Field
The invention relates to the technical field of unmanned aerial vehicles, in particular to a method, a device, a storage medium and equipment for planning a cluster task of an unmanned aerial vehicle.
Background
Unmanned aerial vehicles have the characteristics of low manufacturing cost, flexible and rapid deployment, long endurance and the like, are increasingly widely applied in military and civil fields, and are a natural choice for executing tasks such as reconnaissance and cruising. Because the area a single unmanned aerial vehicle can cover is small and the effectiveness it can exert is limited, under increasingly complex task conditions, cooperative task execution by unmanned aerial vehicle clusters has become the development trend for unmanned aerial vehicles.
Existing unmanned aerial vehicle cluster task planning is divided into a flight path planning part and a task allocation part, both of which are preset, so the coupling between task allocation and flight path planning is not considered at setup time, and such methods cannot cope with a dynamic environment containing unstable factors. In addition, when existing reinforcement learning algorithms set up the task planning, every unmanned aerial vehicle in the cluster is treated equally and is assumed to influence the cluster reward value identically, so the computed result may be only a local optimum; and when the cluster contains many unmanned aerial vehicles, the interaction process easily occupies a large amount of communication and computation resources, which does not improve the efficiency of task planning.
Disclosure of Invention
Based on this, the present invention provides an unmanned aerial vehicle cluster task planning method, device, storage medium and equipment, which enable the unmanned aerial vehicle to acquire global environment information and its changes in time, and guide the unmanned aerial vehicle cluster to make optimal decisions by considering the credit assignment problem among the unmanned aerial vehicles.
In a first aspect, the present invention provides a method for planning a cluster task of an unmanned aerial vehicle, including:
randomly selecting one unmanned aerial vehicle from the unmanned aerial vehicle cluster as a first unmanned aerial vehicle, and using other unmanned aerial vehicles as second unmanned aerial vehicles, wherein the second unmanned aerial vehicles form the rest unmanned aerial vehicle cluster;
acquiring a first unmanned aerial vehicle actual task execution environment and an unmanned aerial vehicle cluster task planning model;
inputting the actual task execution environment into an unmanned aerial vehicle cluster task planning model to obtain the task planning of the unmanned aerial vehicle cluster;
the unmanned aerial vehicle cluster task planning model is obtained by learning and training an improved COMA model by taking a simulated task execution environment as a training sample; the improved COMA model includes a COMA network and a graph convolution module, wherein the graph convolution module is disposed in the COMA network.
In a second aspect, the present invention provides an unmanned aerial vehicle cluster mission planning apparatus, including:
the unmanned aerial vehicle selecting module is used for selecting one unmanned aerial vehicle from the unmanned aerial vehicle cluster as a first unmanned aerial vehicle, and other unmanned aerial vehicles are used as second unmanned aerial vehicles, and the second unmanned aerial vehicles form the rest unmanned aerial vehicle cluster;
the parameter acquisition module is used for acquiring a first unmanned aerial vehicle actual task execution environment and an unmanned aerial vehicle cluster task planning model;
the task planning module is used for inputting the actual task execution environment into an unmanned aerial vehicle cluster task planning model to obtain the task planning of the unmanned aerial vehicle cluster;
the unmanned aerial vehicle cluster task planning model is obtained by learning and training an improved COMA model by using a training sample for simulating a task execution environment; the improved COMA model includes a COMA network and a graph convolution module, wherein the graph convolution module is disposed in the COMA network.
In a third aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, performs the steps of the method for task planning for a cluster of drones according to the first aspect.
In a fourth aspect, the present invention provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor executes the computer program to perform the method for planning the unmanned aerial vehicle cluster mission according to any one of the first aspect.
The beneficial effects of adopting the above technical scheme are: the unmanned aerial vehicle cluster task planning model is constructed based on COMA and graph convolution deep reinforcement learning, and the unmanned aerial vehicle cluster can be guided to make globally optimal actions according to the setting of joint rewards and the continuous updating of the estimation value of an evaluation function; and in the calculation process of the evaluation function estimation value, the local state stacking result of the adjacent unmanned aerial vehicle is considered, so that the use of communication resources and calculation resources in the interaction process is reduced, and the efficiency of task planning is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below.
Fig. 1 is a schematic diagram of a method for planning a task of an unmanned aerial vehicle cluster in an embodiment of the present application;
fig. 2 is a schematic diagram of a task planning process of an unmanned aerial vehicle cluster in an embodiment of the present application;
fig. 3 is a frame diagram of a task planning graph convolution module of an unmanned aerial vehicle cluster according to an embodiment of the present application;
fig. 4a is a test environment for a cluster of drones to perform cooperative communication tasks according to an embodiment of the present application;
fig. 4b is a test environment of an drone cluster executing a physical spoofing task according to an embodiment of the present application;
fig. 5 is a schematic diagram of an unmanned aerial vehicle cluster mission planning device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. In order to explain the present invention in more detail, the method, the apparatus, the storage medium, and the device for planning the unmanned aerial vehicle cluster mission provided by the present invention are specifically described below with reference to the accompanying drawings.
Unmanned aerial vehicle cluster task planning compensates for the insufficient task execution capacity of a single unmanned aerial vehicle in the face of increasingly complex task processing requirements by adopting the cooperation of multiple unmanned aerial vehicles. At present, when an unmanned aerial vehicle cluster executes a regional defense task, because the information acquired by a single unmanned aerial vehicle is limited, the optimal strategy in the task planning process cannot be obtained in time from the global environment. Aiming at this problem, the application provides an unmanned aerial vehicle cluster task planning method, device, storage medium and equipment.
The embodiment of the application provides a specific application scenario of the unmanned aerial vehicle cluster task planning method. The application scenario includes the terminal device provided by the embodiment, and the terminal device may be various electronic devices including, but not limited to, a smart phone and a computer device, where the computer device may be at least one of a desktop computer, a portable computer, a laptop computer, a tablet computer, and the like. The user operates the terminal device, sends out an operation instruction of unmanned aerial vehicle cluster task planning, and the terminal device executes the unmanned aerial vehicle cluster task planning method.
Based on this, an unmanned aerial vehicle cluster mission planning method is provided in the embodiment of the present application, which is described by taking the application of the method to a terminal device as an example, and with reference to the schematic diagram of the unmanned aerial vehicle cluster mission planning method shown in fig. 1.
In this application embodiment, each unmanned aerial vehicle in the cluster is regarded as a spherical agent with radius r_uav. The initial position of the ith unmanned aerial vehicle is set to P_i = [x_i, y_i, z_i]^T, its initial velocity to V_i = [v_{i,x}, v_{i,y}, v_{i,z}]^T, and its velocity at the preset time to

V'_i = [v'_{i,x}, v'_{i,y}, v'_{i,z}]^T = V_i + a·Δt

where v_{i,x}, v_{i,y}, v_{i,z} are the x-, y- and z-axis components of the initial velocity of the ith unmanned aerial vehicle; v'_{i,x}, v'_{i,y}, v'_{i,z} are the corresponding components of its velocity at the preset time; a is the acceleration of the ith unmanned aerial vehicle; and Δt is the preset time. The speed of any unmanned aerial vehicle satisfies V_i ≤ V_max, where V_max is the preset maximum flight speed, and the y-axis component of any unmanned aerial vehicle position satisfies h_min ≤ y_i ≤ h_max, where h_min is the preset minimum flight height and h_max is the preset maximum flight height.
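The motion model above amounts to a discrete-time kinematic update with speed and altitude limits. A minimal sketch in Python; the function name and the numeric limits `V_MAX`, `H_MIN`, `H_MAX` are illustrative assumptions, not values from the patent:

```python
import math

V_MAX = 20.0                # preset maximum flight speed (assumed value)
H_MIN, H_MAX = 5.0, 120.0   # preset min/max flight height (assumed values)

def step_drone(p, v, a, dt):
    """One motion-model step: V' = V + a*dt, then clamp speed and altitude."""
    v = [vi + ai * dt for vi, ai in zip(v, a)]
    speed = math.sqrt(sum(vi * vi for vi in v))
    if speed > V_MAX:                       # enforce V <= V_max
        v = [vi * V_MAX / speed for vi in v]
    p = [pi + vi * dt for pi, vi in zip(p, v)]
    p[1] = min(max(p[1], H_MIN), H_MAX)     # enforce h_min <= y <= h_max
    return p, v
```

The same update applies to obstacles with acceleration a_k in place of a.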
One or more obstacles and destinations are also present while the unmanned aerial vehicle cluster performs the task. An obstacle is likewise regarded as a sphere, with radius set to r_adv, initial position P_k = [x_k, y_k, z_k]^T, initial velocity V_k = [v_{k,x}, v_{k,y}, v_{k,z}]^T, and velocity at the preset time

V'_k = [v'_{k,x}, v'_{k,y}, v'_{k,z}]^T = V_k + a_k·Δt

where v_{k,x}, v_{k,y}, v_{k,z} are the x-, y- and z-axis components of the initial velocity of the obstacle; v'_{k,x}, v'_{k,y}, v'_{k,z} are the corresponding components at the preset time; a_k is the acceleration of the obstacle; and Δt is the preset time. The speed of the obstacle satisfies V_k ≤ V_max, where V_max is the preset maximum flight speed, and the y-axis component of the obstacle position satisfies h_min ≤ y_k ≤ h_max, with h_min the preset minimum flight height and h_max the preset maximum flight height.
The location of the destination is set to g = [x_g, y_g, z_g]^T, and the radius of the destination is set to r_aim.
The collision distance between the ith unmanned aerial vehicle and an obstacle is set to D_col = r_uav + r_adv; when the ith unmanned aerial vehicle reaches the destination area, its distance to the destination satisfies D_aim ≤ r_uav + r_aim.
In the embodiment of the application, the unmanned aerial vehicle cluster task planning can be represented by a Markov game model ⟨N, S, A, Γ, R, O, γ⟩, where N is the total number of unmanned aerial vehicles in the simulated task execution environment; S is the set of local states of all unmanned aerial vehicles of the cluster; A is the set of action vectors of all unmanned aerial vehicles, A = A_1 × A_2 × … × A_N; Γ is the probability that the cluster transfers to the next state when taking a joint action in the current state, Γ: S × A_1 × A_2 × … × A_N → S', where S' is the next local state of all unmanned aerial vehicles; R is the joint reward of the unmanned aerial vehicles, accumulated with discount coefficient γ from the reward values r_i obtained by each ith unmanned aerial vehicle interacting with the environment; and O is the local state of each unmanned aerial vehicle.
Based on the physical models and the motion models of the unmanned aerial vehicle, the obstacle and the destination, the unmanned aerial vehicle cluster task planning method in the embodiment of the application specifically comprises the following steps:
step S101: randomly selecting one unmanned aerial vehicle as a first unmanned aerial vehicle in the unmanned aerial vehicle cluster, using other unmanned aerial vehicles as second unmanned aerial vehicles, and forming the rest unmanned aerial vehicle cluster by the second unmanned aerial vehicles.
For convenience of description, in the embodiment of the present application, the first unmanned aerial vehicle is denoted as the ith unmanned aerial vehicle in the cluster, the remaining unmanned aerial vehicle cluster is denoted as B_i, and a second unmanned aerial vehicle is denoted as the jth unmanned aerial vehicle in the cluster, j ∈ B_i; the whole unmanned aerial vehicle cluster can then be denoted as B_{+i}.
Step S102: and acquiring a first unmanned aerial vehicle actual task execution environment and an unmanned aerial vehicle cluster task planning model.
The unmanned aerial vehicle cluster task planning model is obtained by performing learning training on an improved COMA model by taking a simulated task execution environment as a training sample; the improved COMA model includes a COMA network and a graph convolution module, wherein the graph convolution module is disposed in the COMA network.
Step S103: and inputting the actual task execution environment into the unmanned aerial vehicle cluster task planning model to obtain the task planning of the unmanned aerial vehicle cluster.
Specifically, the local state of each unmanned aerial vehicle in the actual task execution environment is input into the unmanned aerial vehicle cluster task planning model, and the task planning of the unmanned aerial vehicle cluster in the actual task execution environment is obtained.
Further, further explanation is made with respect to the drone cluster mission planning model used in steps S102-S103:
the unmanned aerial vehicle cluster task planning model is obtained by performing learning training on an improved COMA model by taking a simulated task execution environment as a training sample, wherein the improved COMA model comprises a COMA network and a graph convolution module; furthermore, the COMA network comprises a strategy network and an evaluation network which are sequentially connected, and the graph volume module is embedded in the evaluation network.
The simulated task execution environment can be obtained using OpenAI's Gym or Universe simulation platforms.
As shown in the flowchart of fig. 2, the establishing of the unmanned aerial vehicle cluster mission planning model specifically includes the following steps:
step S201: obtaining training samples including simulation local states S of all unmanned aerial vehicles in the unmanned aerial vehicle cluster at the current moment, simulation local states S' of all unmanned aerial vehicles in the unmanned aerial vehicle cluster at the next moment, simulation action vectors A of all unmanned aerial vehicles in the unmanned aerial vehicle cluster, and simulation joint rewards of all unmanned aerial vehicles in the unmanned aerial vehicle cluster
Figure BDA0004037132870000071
And a remaining cluster set B of drones i
Each training sample may be recorded as
Figure BDA0004037132870000072
Wherein done is an end signal of the training sample; s = { o = t,1 ,o t,2 ,...,o t,N },o t,i The simulation local state of the ith unmanned aerial vehicle in the unmanned aerial vehicle cluster at the current moment is shown, and N is the number of the unmanned aerial vehicles in the unmanned aerial vehicle cluster; a = { a = t,1 ,a t,2 ,...,a t,N },a t,i The simulation motion vector of the ith unmanned aerial vehicle in the unmanned aerial vehicle cluster at the current moment is obtained; s '= { o' t,1 ,o′ t,2 ,...,o′ t,N },o′ t,i Simulating a local state of an ith unmanned aerial vehicle in the unmanned aerial vehicle cluster at the next moment; />
Figure BDA0004037132870000073
Figure BDA0004037132870000074
And (4) performing simulated joint reward for the ith unmanned aerial vehicle in the unmanned aerial vehicle cluster at the current moment.
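A training sample of this shape can be sketched as a simple record; the field names and toy values below are illustrative only:

```python
from collections import namedtuple

# One stored transition (S, A, r~_t, S', done, B_i); field names are illustrative.
Transition = namedtuple("Transition",
                        ["S", "A", "joint_rewards", "S_next", "done", "B_i"])

sample = Transition(
    S=[[0.0, 1.0], [0.5, 0.2]],        # o_{t,i}: toy 2-D local states, N = 2 drones
    A=[[0.1], [0.3]],                  # a_{t,i}: simulated action vectors
    joint_rewards=[1.5, -0.5],         # r~_{t,i} per drone
    S_next=[[0.1, 1.1], [0.6, 0.1]],   # o'_{t,i}: next-moment local states
    done=False,                        # end signal of the training sample
    B_i=[[1], [0]],                    # remaining-cluster index set for each drone
)
```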
Specifically, the training and learning of the unmanned aerial vehicle cluster mission planning model by the training samples comprises the following steps:
step S202: obtaining simulated local states S = { o ] of all unmanned aerial vehicles in unmanned aerial vehicle cluster at current moment t,1 ,o t,2 ,...,o t,N };
Step S203: calculate the simulated action vector of each unmanned aerial vehicle from its simulated local state, which specifically comprises the following steps:

input the simulated local state o_{t,i} of each unmanned aerial vehicle at the current moment into the policy network to obtain the simulated intermediate action vector μ_i(o_{t,i}) of each unmanned aerial vehicle;

superpose the simulated intermediate action vector μ_i(o_{t,i}) of each unmanned aerial vehicle with a noise vector 𝒩 to obtain the simulated action vector a'_{t,i} of each unmanned aerial vehicle in the cluster, which can be recorded as a'_{t,i} = μ_i(o_{t,i}) + 𝒩. The introduction of the noise vector increases the exploratory property of the policy function.
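The noise superposition step can be sketched as follows; the function name, the Gaussian form of the noise, and the clipping range are illustrative assumptions (the patent only specifies adding a noise vector to the policy output):

```python
import random

def select_action(policy, obs, noise_scale=0.1, a_max=1.0):
    """a'_{t,i} = mu_i(o_{t,i}) + noise: add exploration noise to the
    deterministic policy output, then clip to the action range."""
    mu = policy(obs)
    noisy = [m + random.gauss(0.0, noise_scale) for m in mu]
    return [min(max(x, -a_max), a_max) for x in noisy]
```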
Step S204: input the simulated local states S = {o_{t,1}, o_{t,2}, …, o_{t,N}} of all unmanned aerial vehicles at the current moment and the remaining-cluster set B_i into the graph convolution module to obtain the simulated local state feature of the first unmanned aerial vehicle and the weights between the first unmanned aerial vehicle and each second unmanned aerial vehicle.
The graph convolution module framework shown in fig. 3 comprises an observation coding layer, convolution layers, a fully-connected network layer and a ReLU nonlinear activation function layer. In step S204, inputting the simulated local states S = {o_{t,1}, o_{t,2}, …, o_{t,N}} and the remaining-cluster set B_i into the graph convolution module to obtain the simulated local state feature h'_i of the first unmanned aerial vehicle and the weights α^m_{i,j} between the first unmanned aerial vehicle and each second unmanned aerial vehicle comprises the following steps:
step S301: simulating local states S = { o) of all unmanned aerial vehicles at the current moment t,1 ,o t,2 ,...,o t,N Respectively observing the coding layer to obtain initial characteristics of each unmanned aerial vehicle, and recording the initial characteristics of the ith unmanned aerial vehicle as h t,i
Step S302: and inputting the initial characteristics of each unmanned aerial vehicle into an attention mechanism for processing to obtain the weight between the first unmanned aerial vehicle and each second unmanned aerial vehicle.
The specific expression is:

α^m_{i,j} = exp( (W_Q^m h_{t,i}) · (W_K^m h_{t,j})^T ) / Σ_{k∈B_{+i}} exp( (W_Q^m h_{t,i}) · (W_K^m h_{t,k})^T )

where α^m_{i,j} is the weight between the first unmanned aerial vehicle and any second unmanned aerial vehicle in the mth convolution layer, W_Q^m is the query linear-mapping parameter of the attention mechanism in the mth convolution layer, W_K^m is the key linear-mapping parameter of the attention mechanism in the mth convolution layer, h_{t,i} is the initial feature of the ith unmanned aerial vehicle at time t, and h_{t,j} is the initial feature of the jth unmanned aerial vehicle at time t.
Step S303: input the initial features h_{t,i} of the unmanned aerial vehicles and the weights α^m_{i,j} into the fully-connected network layer and the ReLU nonlinear activation function layer in sequence to obtain the simulated local state feature h'_i of each unmanned aerial vehicle. The specific expression is:

h'_i = σ( concat_{m=1…M} [ Σ_{j∈B_{+i}} α^m_{i,j} h_{t,j} ] )

where h'_i is the simulated local state feature of the ith unmanned aerial vehicle and M is the number of convolution layers; the M weighted sums of initial features are connected, and processed through the function σ (comprising the fully-connected network layer and the ReLU nonlinear activation function layer) to obtain the simulated local state feature of the unmanned aerial vehicle.
Step S205: stack the simulated local state features h'_j of the second unmanned aerial vehicles to obtain the local stacking result S_{i,C} of the second unmanned aerial vehicles.
Step S206: input the local stacking result S_{i,C} of the second unmanned aerial vehicles, the simulated action vector a'_{t,i} of the first unmanned aerial vehicle, and the simulated joint reward r̃_{t,i} of the first unmanned aerial vehicle into the evaluation network to obtain the policy parameters. The method specifically comprises the following steps:
step S401: stacking the local state of the second drone i,C And calculating to obtain a first unmanned simulation motion vector a t,i ' input evaluation network
Figure BDA0004037132870000091
Obtaining an evaluation networkIn a mean evaluation value of%>
Figure BDA0004037132870000092
Step S402: correct the intermediate evaluation value Q_i(S_{i,C}, a'_{t,i}) and superpose it with the simulated joint reward r̃_{t,i} of the first unmanned aerial vehicle to obtain the estimated value y_i of the evaluation network. The specific expression of the estimated value of the evaluation network is:

y_i = r̃_{t,i} + γ Q_i(S'_{i,C}, a'_{t+1,i})

where γ is the discount coefficient. The estimated value is obtained from the simulated action vector calculated by the policy network.
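The estimated value y_i is a standard temporal-difference target; a minimal sketch (the handling of the episode-end signal `done` is an assumption the patent does not spell out):

```python
def td_target(joint_reward, q_next, gamma=0.95, done=False):
    """y_i = r~_{t,i} + gamma * Q(S'_{i,C}, a'_{t+1,i});
    the bootstrap term is dropped at the end of an episode."""
    return joint_reward + (0.0 if done else gamma * q_next)
```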
Step S403: input the intermediate evaluation value of the evaluation network and the estimated value of the evaluation network into the loss function to obtain the policy parameter θ_i. The specific expression of the loss function is:

L(θ_i) = (1/G) Σ ( y_i − Q_i(S_{i,C}, a_{t,i}) )²

where L(θ_i) is the loss function, G is the number of training samples, and Q_i(S_{i,C}, a_{t,i}) is the intermediate evaluation value obtained for the simulated action vector taken directly from the training sample.
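The loss above is a mean squared error between the targets y_i and the intermediate evaluation values over a batch of G training samples; a minimal sketch:

```python
def critic_loss(targets, q_values):
    """L(theta_i) = (1/G) * sum (y_i - Q(S_{i,C}, a_{t,i}))^2 over a batch."""
    G = len(targets)
    return sum((y - q) ** 2 for y, q in zip(targets, q_values)) / G
```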
Step S207: calculate the policy gradient ∇_{θ_i} J from the policy parameter θ_i, and update the policy network and the evaluation network according to the policy gradient until the number of updates is reached, so as to obtain the unmanned aerial vehicle cluster task planning model.

The specific expression of the policy gradient of the policy network is:

∇_{θ_i} J ≈ (1/G) Σ ∇_{θ_i} μ_i(o_i) ∇_{a_i} Q_i(s_{t,i}, a_i) |_{a_i = μ_i(o_i)}

where G is the number of training samples, ∇_{θ_i} μ_i(o_i) is the gradient of the policy function of the policy network, ∇_{a_i} Q_i is the gradient of the action-value function of the evaluation network, μ_i(o_i) is the action value selected by the policy network under local state vector o_i, and Q_i(s_{t,i}, a_i)|_{a_i=μ_i(o_i)} is the evaluation function when action a_i = μ_i(o_i) is taken in state space s_{t,i}.
The policy parameters are continuously updated by the difference between the evaluation value computed from the action chosen by the policy network and the evaluation value computed from the action in the training sample; the policy network and the evaluation network are updated according to these continuously updated parameters and gradually converge, so that the computed evaluation values come ever closer to the evaluation function values in the samples.
In addition, the calculation of the joint reward of the first unmanned aerial vehicle comprises the following steps:
step S501: first reward value r obtained by interaction of first unmanned machine and simulated task execution environment t,i And a plurality of second reward values r obtained by interaction of each second unmanned aerial vehicle and the simulated task execution environment t,j
Wherein the first reward value comprises a first collision reward r of the unmanned aerial vehicle and the obstacle c The first drone arrives at the destination with an arrival reward r g And an action reward r for the first unmanned machine to perform the action s
The specific expression of the collision reward r_c is:

r_c = r_col if D ≤ D_col, and r_c = 0 otherwise

where r_col = −5, D is the distance between the unmanned aerial vehicle and the obstacle, and D_col is the collision distance between the unmanned aerial vehicle and the obstacle.
The specific expression of the arrival reward r_g is:

r_g = r_arr if D_aim ≤ r_uav + r_aim, and r_g = −ε‖P_i − g‖ otherwise

where r_arr = 10, ε = 1.1 is the guiding coefficient drawing the unmanned aerial vehicle toward the destination, P_i is the position of the ith unmanned aerial vehicle, g is the position of the destination, D_aim is the distance of the unmanned aerial vehicle from the destination, r_uav is the radius of the unmanned aerial vehicle, and r_aim is the radius of the destination range.
The specific expression of the action reward r_s is: r_s = −3.
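The three reward components can be sketched as follows; the constants are the ones given in the text, but the distance-penalty form of the guidance term is an assumption (the exact shaping expression is an image in the original):

```python
import math

R_COL, R_ARR, R_S = -5.0, 10.0, -3.0   # reward constants from the text
EPS = 1.1                              # guiding coefficient epsilon

def collision_reward(d, d_col):
    """r_c: penalty r_col when within the collision distance D_col, else 0."""
    return R_COL if d <= d_col else 0.0

def arrival_reward(p_i, g, r_uav, r_aim):
    """r_g: bonus r_arr inside the destination area; otherwise a
    distance-based guidance penalty (shaping form is an assumption)."""
    d_aim = math.dist(p_i, g)
    return R_ARR if d_aim <= r_uav + r_aim else -EPS * d_aim
```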
In the same way, a plurality of second reward values obtained by interaction between each second unmanned aerial vehicle and the simulated task execution environment can be obtained, and the details are not repeated here.
Step S502: weight the plurality of second reward values r_{t,j} by the weights α_{i,j} between the first unmanned aerial vehicle and each second unmanned aerial vehicle to obtain the weighted reward value of the remaining unmanned aerial vehicle cluster.
Step S503: combine the first reward value and the weighted reward value to obtain the joint reward of the first unmanned aerial vehicle. The specific expression is:

r̃_{t,i} = r_{t,i} + Σ_{j∈B_i} α_{i,j} r_{t,j}
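Steps S502–S503 reduce to one weighted sum; a minimal sketch, where the weights are the attention weights α_{i,j} produced by the graph convolution module:

```python
def joint_reward(r_i, neighbor_rewards, weights):
    """r~_{t,i} = r_{t,i} + sum_{j in B_i} alpha_{i,j} * r_{t,j}:
    own reward plus the weighted rewards of the remaining cluster."""
    return r_i + sum(w * r for w, r in zip(weights, neighbor_rewards))
```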
The unmanned aerial vehicle cluster tasks executed by the embodiment of the invention comprise unmanned aerial vehicle cluster cooperative communication and unmanned aerial vehicle cluster physical deception, and as shown in the attached drawings 4a and 4b, a test environment for the unmanned aerial vehicle cluster to execute the cooperative communication tasks and the physical deception is provided through simulation.
According to the present application, the unmanned aerial vehicle cluster task planning model is constructed using COMA and graph convolution deep reinforcement learning, so that the unmanned aerial vehicle cluster can be guided to take globally optimal actions according to the setting of the joint reward and the continuously updated estimated value of the evaluation function. In training, a combination of policy network and evaluation network is adopted: the policy network of each unmanned aerial vehicle is trained separately, the action values produced by each policy network are gathered into the evaluation network to obtain an estimated value, the estimated value is compared with the evaluation network value computed from the training sample, and the policy network and evaluation network are adjusted, so that the results computed by the updated unmanned aerial vehicle cluster task planning model come closer to the training sample values.
In addition, because a large amount of data is used to update and adjust the policy network and the evaluation network during training, the final unmanned aerial vehicle cluster mission planning model can achieve globally optimal planning. In an unknown dynamic three-dimensional environment, the unmanned aerial vehicle cluster therefore adopts centralized training with distributed execution: in the training environment the unmanned aerial vehicles communicate with one another to learn a cooperation strategy, whereas in the actual task execution environment each unmanned aerial vehicle makes decisions relying only on its own locally observed state, without any communication, which greatly shortens decision time.
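The distributed-execution side of this scheme can be sketched as below: each drone's trained policy maps only its own local observation to an action, with no inter-drone communication at execution time. The linear stand-in policy, dimensions, and names are illustrative assumptions, not the patent's network.

```python
import numpy as np

rng = np.random.default_rng(0)

def policy_network(local_obs, weights):
    # stand-in linear policy: maps ONE drone's local observation
    # to a softmax distribution over candidate actions
    logits = weights @ local_obs
    exp = np.exp(logits - logits.max())   # subtract max for stability
    return exp / exp.sum()

def decentralized_step(observations, weights):
    # every drone decides independently from its own local state only
    return [int(np.argmax(policy_network(o, weights))) for o in observations]

obs = [rng.standard_normal(4) for _ in range(3)]   # 3 drones, 4-dim local states
W = rng.standard_normal((2, 4))                    # 2 candidate actions
print(decentralized_step(obs, W))
```

Because each call to `policy_network` sees only one drone's observation, removing any drone from the cluster does not affect the others' decisions.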
It should be understood that although the steps in the flowchart of fig. 1 are displayed sequentially, as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not limited to the exact order described and may be performed in other orders. Moreover, at least some of the steps in fig. 1 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different times, and their order of execution is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least some sub-steps or stages of other steps.
The unmanned aerial vehicle cluster mission planning method has been described in detail in the embodiments disclosed above. Since the disclosed method can be implemented by devices of various forms, the invention also discloses an unmanned aerial vehicle cluster mission planning device corresponding to the method, described in detail in the following specific embodiment with reference to fig. 5.
An unmanned aerial vehicle selection module 601, configured to randomly select one unmanned aerial vehicle from the unmanned aerial vehicle cluster as the first unmanned aerial vehicle, the other unmanned aerial vehicles serving as second unmanned aerial vehicles, the second unmanned aerial vehicles forming the remaining unmanned aerial vehicle cluster.
A parameter obtaining module 602, configured to acquire the actual task execution environment of the first unmanned aerial vehicle and the unmanned aerial vehicle cluster task planning model.
And the task planning module 603 is configured to input the actual task execution environment to the unmanned aerial vehicle cluster task planning model, so as to obtain a task plan of the unmanned aerial vehicle cluster.
The unmanned aerial vehicle cluster task planning model is obtained by learning and training an improved COMA model by using a training sample for simulating a task execution environment; the improved COMA model includes a COMA network and a graph convolution module, wherein the graph convolution module is disposed in the COMA network.
For specific limitations of the drone cluster mission planning device, reference may be made to the above limitations on the drone cluster mission planning method, which are not repeated here. The various modules in the above-described device may be implemented in whole or in part by software, hardware, or combinations thereof. The modules may be embedded in, or independent of, a processor of the terminal device in hardware form, or stored in a memory of the terminal device in software form, so that the processor can call and execute the operations corresponding to the modules.
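As a purely illustrative software-form sketch of the three modules 601–603 described above (the class, method names, and the `model` callable interface are assumptions, not from the patent):

```python
import random

class DroneClusterPlanner:
    """Illustrative sketch of the mission planning device's three modules."""

    def select_drones(self, cluster):
        # module 601: randomly pick one drone as the first drone;
        # the remaining drones form the remaining cluster
        first = random.choice(cluster)
        remaining = [d for d in cluster if d is not first]
        return first, remaining

    def get_parameters(self, env, model):
        # module 602: obtain the actual task execution environment
        # and the trained cluster mission planning model
        return env, model

    def plan(self, env, model):
        # module 603: feed the actual environment into the model
        # to obtain the cluster's mission plan
        return model(env)

planner = DroneClusterPlanner()
first, rest = planner.select_drones(["uav1", "uav2", "uav3"])
mission = planner.plan({"obstacles": []},
                       lambda e: {"waypoints": [(0, 0), (5, 5)]})
print(first, len(rest), mission)
```

The lambda here stands in for the trained model; in practice module 602 would load the learned policy networks instead.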
In an embodiment, the present invention further provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the unmanned aerial vehicle cluster mission planning method described above.
The computer-readable storage medium may be an electronic memory such as a flash memory, an EEPROM (electrically erasable and programmable read only memory), an EPROM (erasable and programmable read only memory), a hard disk, or a ROM. Optionally, the computer-readable storage medium comprises a non-transitory computer-readable medium. The computer readable storage medium has a storage space for program code for performing any of the method steps of the above-described method. These program codes can be read from or written to one or more computer program products, which can be compressed in a suitable form.
In one embodiment, the present invention provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and the processor executes the computer program to perform the steps of the unmanned aerial vehicle cluster mission planning method.
The computer device comprises a memory, a processor, and one or more computer programs, wherein the one or more computer programs may be stored in the memory and configured to be executed by the one or more processors, the one or more application programs configured to perform the drone cluster mission planning method described above.
A processor may include one or more processing cores. The processor connects the various parts of the computer device using various interfaces and lines, and performs the various functions of the computer device and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory and by calling data stored in the memory. Optionally, the processor may be implemented in hardware in at least one of the forms of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, user interface, application programs, and the like; the GPU renders and draws display content; the modem handles wireless communication. It is to be understood that the modem may also be implemented by a separate communication chip without being integrated into the processor.
The memory may include Random Access Memory (RAM) or Read-Only Memory (ROM). The memory may be used to store instructions, programs, code, code sets, or instruction sets. The memory may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, or an image playing function), instructions for implementing the various method embodiments described above, and the like. The data storage area may store data created by the terminal device in use, and the like.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. An unmanned aerial vehicle cluster task planning method is characterized by comprising the following steps:
randomly selecting one unmanned aerial vehicle from the unmanned aerial vehicle cluster as a first unmanned aerial vehicle, and using other unmanned aerial vehicles as second unmanned aerial vehicles, wherein the second unmanned aerial vehicles form the rest unmanned aerial vehicle cluster;
acquiring a first unmanned aerial vehicle actual task execution environment and an unmanned aerial vehicle cluster task planning model;
inputting the actual task execution environment into an unmanned aerial vehicle cluster task planning model to obtain task planning of an unmanned aerial vehicle cluster;
the unmanned aerial vehicle cluster task planning model is obtained by learning and training an improved COMA model by taking a simulated task execution environment as a training sample; the improved COMA model includes a COMA network and a graph convolution module, wherein the graph convolution module is disposed in the COMA network.
2. The unmanned aerial vehicle cluster mission planning method of claim 1, wherein the COMA network comprises a policy network and an evaluation network connected in sequence, wherein the graph convolution module is embedded in the evaluation network; the unmanned aerial vehicle cluster task planning model establishment method comprises the following steps:
acquiring training samples, wherein the training samples comprise simulated local states of all unmanned aerial vehicles in an unmanned aerial vehicle cluster at the current moment, simulated local states of all unmanned aerial vehicles in the unmanned aerial vehicle cluster at the next moment, simulated motion vectors of all unmanned aerial vehicles in the unmanned aerial vehicle cluster, and simulated joint rewards of all unmanned aerial vehicles in the unmanned aerial vehicle cluster;
respectively inputting the simulated local states of all unmanned aerial vehicles in the unmanned aerial vehicle cluster at the current moment into a strategy network to obtain a simulated intermediate action vector of each unmanned aerial vehicle;
superposing the simulated intermediate motion vector of each unmanned aerial vehicle with the noise vector respectively to obtain the simulated motion vector of each unmanned aerial vehicle in the unmanned aerial vehicle cluster;
inputting the simulated local states of all the unmanned aerial vehicles and the rest unmanned aerial vehicle cluster set at the current moment into a graph convolution module to obtain the simulated local state characteristics of the first unmanned aerial vehicle and the weights between the first unmanned aerial vehicle and each second unmanned aerial vehicle;
stacking the simulated local state features of the second unmanned aerial vehicle to obtain a local state stacking result of the second unmanned aerial vehicle;
inputting the local state stacking result of the second unmanned aerial vehicle, the simulated action vector of the first unmanned aerial vehicle obtained through calculation by the strategy network, and the simulated joint reward of the first unmanned aerial vehicle in the unmanned aerial vehicle cluster into an evaluation network to obtain strategy parameters;
and calculating according to the strategy parameters to obtain a strategy gradient, and updating the strategy network and the evaluation network according to the strategy gradient until the updating times are reached so as to obtain the unmanned aerial vehicle cluster task planning model.
3. The unmanned aerial vehicle cluster mission planning method of claim 2, wherein the graph convolution module comprises an observation coding layer, a convolution layer, a fully connected network layer, and a ReLU nonlinear activation function layer; the method for obtaining the simulation local state characteristics of the first unmanned aerial vehicle and the weights between the first unmanned aerial vehicle and each second unmanned aerial vehicle comprises the following steps of:
respectively carrying out observation coding layer processing on the simulated local states of all the unmanned aerial vehicles at the current moment to obtain the initial characteristics of all the unmanned aerial vehicles;
inputting the initial characteristics of each unmanned aerial vehicle into an attention mechanism for processing to obtain the weight between the first unmanned aerial vehicle and each second unmanned aerial vehicle;
and sequentially inputting the initial characteristics and the weight of each unmanned aerial vehicle into a full-connection network layer and a ReLU nonlinear activation function layer to obtain the simulated local state characteristics of each unmanned aerial vehicle.
4. The method for unmanned aerial vehicle cluster mission planning of claim 2, wherein the step of inputting the local state stacking result of the second unmanned aerial vehicle, the simulated action vector of the first unmanned aerial vehicle calculated by the policy network, and the simulated joint reward of the first unmanned aerial vehicle in the unmanned aerial vehicle cluster into the evaluation network to obtain the policy parameters comprises:
inputting the local state stacking result of the second unmanned aerial vehicle and the simulated motion vector of the first unmanned aerial vehicle obtained by the calculation of the strategy network into an evaluation network to obtain a middle evaluation value of the evaluation network;
multiplying the intermediate evaluation value by a discount coefficient, and then superposing it with the simulated joint reward of the first unmanned aerial vehicle to obtain an evaluation network estimation value;
and inputting the intermediate evaluation value of the evaluation network and the estimation value of the evaluation network into a loss function to obtain a strategy parameter.
5. The unmanned aerial vehicle cluster mission planning method of claim 2, wherein calculating the joint reward of the first unmanned aerial vehicle comprises:
the first unmanned aerial vehicle interacting with the simulated task execution environment to obtain a first reward value, and each second unmanned aerial vehicle interacting with the simulated task execution environment to obtain a plurality of second reward values;
weighting the plurality of second reward values with the weights between the first unmanned aerial vehicle and each second unmanned aerial vehicle to obtain a weighted reward value of the remaining unmanned aerial vehicle cluster;
and combining the first reward value and the weighted reward value to obtain the joint reward of the first unmanned aerial vehicle.
6. The unmanned aerial vehicle cluster mission planning method of claim 5, wherein the first reward value comprises:
the collision reward of the first unmanned machine and the obstacle, the arrival reward of the first unmanned machine to the destination and the action reward of the first unmanned machine to execute the action.
7. The unmanned aerial vehicle cluster mission planning method of claim 2, wherein calculating the simulated local state of the first unmanned aerial vehicle in the unmanned aerial vehicle cluster at the next time comprises:
and interacting the simulated action vector of the first unmanned aerial vehicle with the simulated task execution environment to obtain the simulated local state of the first unmanned aerial vehicle at the next moment.
8. An unmanned aerial vehicle cluster mission planning device, the device includes:
the unmanned aerial vehicle selecting module is used for selecting one unmanned aerial vehicle from the unmanned aerial vehicle cluster as a first unmanned aerial vehicle, and other unmanned aerial vehicles are used as second unmanned aerial vehicles, and the second unmanned aerial vehicles form the rest unmanned aerial vehicle cluster;
the parameter acquisition module is used for acquiring a first unmanned aerial vehicle actual task execution environment and an unmanned aerial vehicle cluster task planning model;
the task planning module is used for inputting the actual task execution environment into an unmanned aerial vehicle cluster task planning model to obtain the task planning of the unmanned aerial vehicle cluster;
the unmanned aerial vehicle cluster task planning model is obtained by learning and training an improved COMA model by using a training sample for simulating a task execution environment; the improved COMA model includes a COMA network and a graph convolution module, wherein the graph convolution module is disposed in the COMA network.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method for drone cluster mission planning according to any one of claims 1 to 7.
10. A computer arrangement comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, performs the method of drone cluster mission planning according to any of claims 1-7.
CN202310006543.3A 2023-01-04 2023-01-04 Unmanned aerial vehicle cluster task planning method and device, storage medium and equipment Pending CN115951707A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310006543.3A CN115951707A (en) 2023-01-04 2023-01-04 Unmanned aerial vehicle cluster task planning method and device, storage medium and equipment

Publications (1)

Publication Number Publication Date
CN115951707A true CN115951707A (en) 2023-04-11

Family

ID=87296606

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310006543.3A Pending CN115951707A (en) 2023-01-04 2023-01-04 Unmanned aerial vehicle cluster task planning method and device, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN115951707A (en)

Similar Documents

Publication Publication Date Title
CN112465151A (en) Multi-agent federal cooperation method based on deep reinforcement learning
CN112001585B (en) Multi-agent decision method, device, electronic equipment and storage medium
CN109690576A (en) The training machine learning model in multiple machine learning tasks
US20130325773A1 (en) Stochastic apparatus and methods for implementing generalized learning rules
CN112015174A (en) Multi-AGV motion planning method, device and system
CN111178545B (en) Dynamic reinforcement learning decision training system
CN109405843B (en) Path planning method and device and mobile device
US11663522B2 (en) Training reinforcement machine learning systems
CN115018017B (en) Multi-agent credit allocation method, system and equipment based on ensemble learning
CN113561986A (en) Decision-making method and device for automatically driving automobile
CN114510012A (en) Unmanned cluster evolution system and method based on meta-action sequence reinforcement learning
CN108229536A (en) Optimization method, device and the terminal device of classification prediction model
CN114261400A (en) Automatic driving decision-making method, device, equipment and storage medium
CN113052253A (en) Hyper-parameter determination method, device, deep reinforcement learning framework, medium and equipment
CN115648204A (en) Training method, device, equipment and storage medium of intelligent decision model
CN113894780B (en) Multi-robot cooperation countermeasure method, device, electronic equipment and storage medium
CN115951707A (en) Unmanned aerial vehicle cluster task planning method and device, storage medium and equipment
US20210357692A1 (en) Multi-fidelity simulated data for machine learning
Noureddine et al. Towards an Agent-Based Architecture using Deep Reinforcement Learning for Intelligent Internet of Things Applications. pdf
CN116301022A (en) Unmanned aerial vehicle cluster task planning method and device based on deep reinforcement learning
CN112926729B (en) Man-machine confrontation intelligent agent strategy making method
Han et al. Three‐dimensional obstacle avoidance for UAV based on reinforcement learning and RealSense
CN117093010B (en) Underwater multi-agent path planning method, device, computer equipment and medium
CN112295232B (en) Navigation decision making method, AI model training method, server and medium
Zhang et al. Digital Twin Enhanced Reinforcement Learning for Integrated Scheduling in Automated Container Terminals

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination