CN115951707A - Unmanned aerial vehicle cluster task planning method and device, storage medium and equipment - Google Patents

Publication number: CN115951707A (China)
Application number: CN202310006543.3A
Other languages: Chinese (zh)
Inventors: 丘昌镇, 刘紫薇, 张志勇, 徐雪阳
Original and current assignee: Sun Yat Sen University
Application filed by Sun Yat Sen University; priority to CN202310006543.3A
Legal status: Pending


Abstract

The application provides a method, a device, a storage medium and equipment for unmanned aerial vehicle cluster task planning, comprising the following steps: randomly selecting one unmanned aerial vehicle from the unmanned aerial vehicle cluster as a first unmanned aerial vehicle, using the other unmanned aerial vehicles as second unmanned aerial vehicles, the second unmanned aerial vehicles forming the remaining unmanned aerial vehicle cluster; acquiring the actual task execution environment of the first unmanned aerial vehicle and an unmanned aerial vehicle cluster task planning model; and inputting the actual task execution environment into the unmanned aerial vehicle cluster task planning model to obtain the task planning of the unmanned aerial vehicle cluster. The unmanned aerial vehicle cluster task planning model is obtained by training an improved COMA model with a simulated task execution environment as training samples; the improved COMA model comprises a COMA network and a graph convolution module, the graph convolution module being arranged in the COMA network. Through the setting of the joint reward and the continuous updating of the evaluation value of the evaluation function, the unmanned aerial vehicle cluster can be guided to take globally optimal actions.

Description

Unmanned aerial vehicle cluster task planning method and device, storage medium and equipment
Technical Field
The invention relates to the technical field of unmanned aerial vehicles, in particular to a method, a device, a storage medium and equipment for planning a cluster task of an unmanned aerial vehicle.
Background
Unmanned aerial vehicles have the characteristics of low manufacturing cost, flexible and rapid deployment, long endurance and the like, are increasingly widely applied in military and civil fields, and are a natural choice for executing tasks such as reconnaissance and cruising. Because the area a single unmanned aerial vehicle can cover is small and the effectiveness it can exert is limited, under increasingly complex task conditions, cooperative task execution by unmanned aerial vehicle clusters has become the development trend for unmanned aerial vehicles.
Existing unmanned aerial vehicle cluster task planning is divided into a flight path planning part and a task allocation part, both of which are preset, so the coupling between task allocation and flight path planning is not considered at setup time, and such methods cannot cope with a dynamic environment containing unstable factors. In addition, when existing reinforcement learning algorithms set up the task planning, every unmanned aerial vehicle in the cluster is treated equally and is assumed to influence the cluster reward value identically, so the computed result may be only a local optimum; and when the cluster contains many unmanned aerial vehicles, the interaction process easily occupies a large amount of communication and computation resources, which does not improve the efficiency of task planning.
Disclosure of Invention
Based on this, the present invention provides an unmanned aerial vehicle cluster task planning method, device, storage medium and equipment, which enable the unmanned aerial vehicle to acquire global environment information and its changes in time, and guide the unmanned aerial vehicle cluster to make optimal decisions by considering the credit assignment problem among the unmanned aerial vehicles.
In a first aspect, the present invention provides a method for planning a cluster task of an unmanned aerial vehicle, including:
randomly selecting one unmanned aerial vehicle from the unmanned aerial vehicle cluster as a first unmanned aerial vehicle, and using other unmanned aerial vehicles as second unmanned aerial vehicles, wherein the second unmanned aerial vehicles form the rest unmanned aerial vehicle cluster;
acquiring a first unmanned aerial vehicle actual task execution environment and an unmanned aerial vehicle cluster task planning model;
inputting the actual task execution environment into an unmanned aerial vehicle cluster task planning model to obtain the task planning of the unmanned aerial vehicle cluster;
the unmanned aerial vehicle cluster task planning model is obtained by learning and training an improved COMA model by taking a simulated task execution environment as a training sample; the improved COMA model includes a COMA network and a graph convolution module, wherein the graph convolution module is disposed in the COMA network.
In a second aspect, the present invention provides an unmanned aerial vehicle cluster mission planning apparatus, including:
the unmanned aerial vehicle selecting module is used for selecting one unmanned aerial vehicle from the unmanned aerial vehicle cluster as a first unmanned aerial vehicle, and other unmanned aerial vehicles are used as second unmanned aerial vehicles, and the second unmanned aerial vehicles form the rest unmanned aerial vehicle cluster;
the parameter acquisition module is used for acquiring a first unmanned aerial vehicle actual task execution environment and an unmanned aerial vehicle cluster task planning model;
the task planning module is used for inputting the actual task execution environment into an unmanned aerial vehicle cluster task planning model to obtain the task planning of the unmanned aerial vehicle cluster;
the unmanned aerial vehicle cluster task planning model is obtained by learning and training an improved COMA model by using a training sample for simulating a task execution environment; the improved COMA model includes a COMA network and a graph convolution module, wherein the graph convolution module is disposed in the COMA network.
In a third aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, performs the steps of the method for task planning for a cluster of drones according to the first aspect.
In a fourth aspect, the present invention provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor executes the computer program to perform the method for planning the unmanned aerial vehicle cluster mission according to any one of the first aspect.
The beneficial effects of adopting the above technical scheme are: the unmanned aerial vehicle cluster task planning model is constructed based on COMA and graph convolution deep reinforcement learning, and the unmanned aerial vehicle cluster can be guided to make globally optimal actions according to the setting of joint rewards and the continuous updating of the estimation value of an evaluation function; and in the calculation process of the evaluation function estimation value, the local state stacking result of the adjacent unmanned aerial vehicle is considered, so that the use of communication resources and calculation resources in the interaction process is reduced, and the efficiency of task planning is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below.
Fig. 1 is a schematic diagram of a method for planning a task of an unmanned aerial vehicle cluster in an embodiment of the present application;
fig. 2 is a schematic diagram of a task planning process of an unmanned aerial vehicle cluster in an embodiment of the present application;
fig. 3 is a frame diagram of a task planning graph convolution module of an unmanned aerial vehicle cluster according to an embodiment of the present application;
fig. 4a is a test environment for a cluster of drones to perform cooperative communication tasks according to an embodiment of the present application;
fig. 4b is a test environment of an drone cluster executing a physical spoofing task according to an embodiment of the present application;
fig. 5 is a schematic diagram of an unmanned aerial vehicle cluster mission planning device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. In order to explain the present invention in more detail, the method, the apparatus, the storage medium, and the device for planning the unmanned aerial vehicle cluster mission provided by the present invention are specifically described below with reference to the accompanying drawings.
Unmanned aerial vehicle cluster task planning compensates for the insufficient task execution capacity of a single unmanned aerial vehicle in the face of increasingly complex task processing requirements by adopting the cooperation of multiple unmanned aerial vehicles. At present, when an unmanned aerial vehicle cluster executes a regional defense task, because the information acquired by a single unmanned aerial vehicle is limited, the optimal strategy in the task planning process cannot be obtained in time from the global environment. Aiming at this problem, the application provides an unmanned aerial vehicle cluster task planning method, device, storage medium and equipment.
The embodiment of the application provides a specific application scenario of the unmanned aerial vehicle cluster task planning method. The application scenario includes the terminal device provided by the embodiment, and the terminal device may be various electronic devices including, but not limited to, a smart phone and a computer device, where the computer device may be at least one of a desktop computer, a portable computer, a laptop computer, a tablet computer, and the like. The user operates the terminal device, sends out an operation instruction of unmanned aerial vehicle cluster task planning, and the terminal device executes the unmanned aerial vehicle cluster task planning method.
Based on this, an unmanned aerial vehicle cluster mission planning method is provided in the embodiment of the present application, which is described by taking the application of the method to a terminal device as an example, and with reference to the schematic diagram of the unmanned aerial vehicle cluster mission planning method shown in fig. 1.
In this application embodiment, each unmanned aerial vehicle in the cluster is regarded as a spherical agent with radius r_uav. The initial position of the ith unmanned aerial vehicle is set to P_i = [x_i, y_i, z_i]^T, its initial velocity to V_i = [v_{i,x}, v_{i,y}, v_{i,z}]^T, and its velocity at the preset time to

V'_i = [v'_{i,x}, v'_{i,y}, v'_{i,z}]^T = V_i + a·Δt

where v_{i,x}, v_{i,y}, v_{i,z} are the x-, y- and z-axis components of the initial velocity of the ith unmanned aerial vehicle; v'_{i,x}, v'_{i,y}, v'_{i,z} are the corresponding components of its velocity at the preset time; a is the acceleration of the ith unmanned aerial vehicle; and Δt is the preset time. The speed of any unmanned aerial vehicle satisfies V_i ≤ V_max, where V_max is the preset maximum flight speed, and the y-axis component of any unmanned aerial vehicle position satisfies h_min ≤ y_i ≤ h_max, where h_min is the preset minimum flight height and h_max is the preset maximum flight height.
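The motion model above amounts to a discrete-time kinematic update with speed and altitude limits. A minimal sketch in Python; the function name and the numeric limits `V_MAX`, `H_MIN`, `H_MAX` are illustrative assumptions, not values from the patent:

```python
import math

V_MAX = 20.0                # preset maximum flight speed (assumed value)
H_MIN, H_MAX = 5.0, 120.0   # preset min/max flight height (assumed values)

def step_drone(p, v, a, dt):
    """One motion-model step: V' = V + a*dt, then clamp speed and altitude."""
    v = [vi + ai * dt for vi, ai in zip(v, a)]
    speed = math.sqrt(sum(vi * vi for vi in v))
    if speed > V_MAX:                       # enforce V <= V_max
        v = [vi * V_MAX / speed for vi in v]
    p = [pi + vi * dt for pi, vi in zip(p, v)]
    p[1] = min(max(p[1], H_MIN), H_MAX)     # enforce h_min <= y <= h_max
    return p, v
```

The same update applies to obstacles with acceleration a_k in place of a.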
One or more obstacles and destinations are also present while the unmanned aerial vehicle cluster performs the task. An obstacle is likewise regarded as a sphere, with radius set to r_adv, initial position P_k = [x_k, y_k, z_k]^T, initial velocity V_k = [v_{k,x}, v_{k,y}, v_{k,z}]^T, and velocity at the preset time

V'_k = [v'_{k,x}, v'_{k,y}, v'_{k,z}]^T = V_k + a_k·Δt

where v_{k,x}, v_{k,y}, v_{k,z} are the x-, y- and z-axis components of the initial velocity of the obstacle; v'_{k,x}, v'_{k,y}, v'_{k,z} are the corresponding components at the preset time; a_k is the acceleration of the obstacle; and Δt is the preset time. The speed of the obstacle satisfies V_k ≤ V_max, where V_max is the preset maximum flight speed, and the y-axis component of the obstacle position satisfies h_min ≤ y_k ≤ h_max, with h_min the preset minimum flight height and h_max the preset maximum flight height.
The location of the destination is set to g = [x_g, y_g, z_g]^T, and the radius of the destination is set to r_aim.
The collision distance between the ith unmanned aerial vehicle and an obstacle is set to D_col = r_uav + r_adv; when the ith unmanned aerial vehicle reaches the destination area, its distance to the destination satisfies D_aim ≤ r_uav + r_aim.
In the embodiment of the application, the unmanned aerial vehicle cluster task planning can be represented by a Markov game model ⟨N, S, A, Γ, R, O, γ⟩, where N is the total number of unmanned aerial vehicles in the simulated task execution environment; S is the set of local states of all unmanned aerial vehicles of the cluster; A is the set of action vectors of all unmanned aerial vehicles, A = A_1 × A_2 × … × A_N; Γ is the probability that the cluster transfers to the next state when taking a joint action in the current state, Γ: S × A_1 × A_2 × … × A_N → S', where S' is the next local state of all unmanned aerial vehicles; R is the joint reward of the unmanned aerial vehicles, accumulated with discount coefficient γ from the reward values r_i obtained by each ith unmanned aerial vehicle interacting with the environment; and O is the local state of each unmanned aerial vehicle.
Based on the physical models and the motion models of the unmanned aerial vehicle, the obstacle and the destination, the unmanned aerial vehicle cluster task planning method in the embodiment of the application specifically comprises the following steps:
step S101: randomly selecting one unmanned aerial vehicle as a first unmanned aerial vehicle in the unmanned aerial vehicle cluster, using other unmanned aerial vehicles as second unmanned aerial vehicles, and forming the rest unmanned aerial vehicle cluster by the second unmanned aerial vehicles.
For convenience of description, in the embodiment of the present application, the first unmanned aerial vehicle is denoted as the ith unmanned aerial vehicle in the cluster, the remaining unmanned aerial vehicle cluster is denoted as B_i, and a second unmanned aerial vehicle is denoted as the jth unmanned aerial vehicle in the cluster, j ∈ B_i; the whole unmanned aerial vehicle cluster can then be denoted as B_{+i}.
Step S102: and acquiring a first unmanned aerial vehicle actual task execution environment and an unmanned aerial vehicle cluster task planning model.
The unmanned aerial vehicle cluster task planning model is obtained by performing learning training on an improved COMA model by taking a simulated task execution environment as a training sample; the improved COMA model includes a COMA network and a graph convolution module, wherein the graph convolution module is disposed in the COMA network.
Step S103: and inputting the actual task execution environment into the unmanned aerial vehicle cluster task planning model to obtain the task planning of the unmanned aerial vehicle cluster.
Specifically, the local state of each unmanned aerial vehicle in the actual task execution environment is input into the unmanned aerial vehicle cluster task planning model, and the task planning of the unmanned aerial vehicle cluster in the actual task execution environment is obtained.
Further, further explanation is made with respect to the drone cluster mission planning model used in steps S102-S103:
the unmanned aerial vehicle cluster task planning model is obtained by performing learning training on an improved COMA model by taking a simulated task execution environment as a training sample, wherein the improved COMA model comprises a COMA network and a graph convolution module; furthermore, the COMA network comprises a strategy network and an evaluation network which are sequentially connected, and the graph volume module is embedded in the evaluation network.
The simulated task execution environment can be obtained using OpenAI's Gym or Universe simulation platforms.
As shown in the flowchart of fig. 2, the establishing of the unmanned aerial vehicle cluster mission planning model specifically includes the following steps:
step S201: obtaining training samples including simulation local states S of all unmanned aerial vehicles in the unmanned aerial vehicle cluster at the current moment, simulation local states S' of all unmanned aerial vehicles in the unmanned aerial vehicle cluster at the next moment, simulation action vectors A of all unmanned aerial vehicles in the unmanned aerial vehicle cluster, and simulation joint rewards of all unmanned aerial vehicles in the unmanned aerial vehicle cluster
Figure BDA0004037132870000071
And a remaining cluster set B of drones i
Each training sample may be recorded as
Figure BDA0004037132870000072
Wherein done is an end signal of the training sample; s = { o = t,1 ,o t,2 ,...,o t,N },o t,i The simulation local state of the ith unmanned aerial vehicle in the unmanned aerial vehicle cluster at the current moment is shown, and N is the number of the unmanned aerial vehicles in the unmanned aerial vehicle cluster; a = { a = t,1 ,a t,2 ,...,a t,N },a t,i The simulation motion vector of the ith unmanned aerial vehicle in the unmanned aerial vehicle cluster at the current moment is obtained; s '= { o' t,1 ,o′ t,2 ,...,o′ t,N },o′ t,i Simulating a local state of an ith unmanned aerial vehicle in the unmanned aerial vehicle cluster at the next moment; />
Figure BDA0004037132870000073
Figure BDA0004037132870000074
And (4) performing simulated joint reward for the ith unmanned aerial vehicle in the unmanned aerial vehicle cluster at the current moment.
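A training sample of this shape can be sketched as a simple record; the field names and toy values below are illustrative only:

```python
from collections import namedtuple

# One stored transition (S, A, r~_t, S', done, B_i); field names are illustrative.
Transition = namedtuple("Transition",
                        ["S", "A", "joint_rewards", "S_next", "done", "B_i"])

sample = Transition(
    S=[[0.0, 1.0], [0.5, 0.2]],        # o_{t,i}: toy 2-D local states, N = 2 drones
    A=[[0.1], [0.3]],                  # a_{t,i}: simulated action vectors
    joint_rewards=[1.5, -0.5],         # r~_{t,i} per drone
    S_next=[[0.1, 1.1], [0.6, 0.1]],   # o'_{t,i}: next-moment local states
    done=False,                        # end signal of the training sample
    B_i=[[1], [0]],                    # remaining-cluster index set for each drone
)
```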
Specifically, the training and learning of the unmanned aerial vehicle cluster mission planning model by the training samples comprises the following steps:
step S202: obtaining simulated local states S = { o ] of all unmanned aerial vehicles in unmanned aerial vehicle cluster at current moment t,1 ,o t,2 ,...,o t,N };
Step S203: calculate the simulated action vector of each unmanned aerial vehicle from its simulated local state, which specifically comprises the following steps:

input the simulated local state o_{t,i} of each unmanned aerial vehicle at the current moment into the policy network to obtain the simulated intermediate action vector μ_i(o_{t,i}) of each unmanned aerial vehicle;

superpose the simulated intermediate action vector μ_i(o_{t,i}) of each unmanned aerial vehicle with a noise vector 𝒩 to obtain the simulated action vector a'_{t,i} of each unmanned aerial vehicle in the cluster, which can be recorded as a'_{t,i} = μ_i(o_{t,i}) + 𝒩. The introduction of the noise vector increases the exploratory property of the policy function.
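The noise superposition step can be sketched as follows; the function name, the Gaussian form of the noise, and the clipping range are illustrative assumptions (the patent only specifies adding a noise vector to the policy output):

```python
import random

def select_action(policy, obs, noise_scale=0.1, a_max=1.0):
    """a'_{t,i} = mu_i(o_{t,i}) + noise: add exploration noise to the
    deterministic policy output, then clip to the action range."""
    mu = policy(obs)
    noisy = [m + random.gauss(0.0, noise_scale) for m in mu]
    return [min(max(x, -a_max), a_max) for x in noisy]
```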
Step S204: input the simulated local states S = {o_{t,1}, o_{t,2}, …, o_{t,N}} of all unmanned aerial vehicles at the current moment and the remaining-cluster set B_i into the graph convolution module to obtain the simulated local state feature of the first unmanned aerial vehicle and the weights between the first unmanned aerial vehicle and each second unmanned aerial vehicle.
The graph convolution module framework shown in fig. 3 comprises an observation coding layer, convolution layers, a fully-connected network layer and a ReLU nonlinear activation function layer. In step S204, inputting the simulated local states S = {o_{t,1}, o_{t,2}, …, o_{t,N}} and the remaining-cluster set B_i into the graph convolution module to obtain the simulated local state feature h'_i of the first unmanned aerial vehicle and the weights α^m_{i,j} between the first unmanned aerial vehicle and each second unmanned aerial vehicle comprises the following steps:
step S301: simulating local states S = { o) of all unmanned aerial vehicles at the current moment t,1 ,o t,2 ,...,o t,N Respectively observing the coding layer to obtain initial characteristics of each unmanned aerial vehicle, and recording the initial characteristics of the ith unmanned aerial vehicle as h t,i
Step S302: and inputting the initial characteristics of each unmanned aerial vehicle into an attention mechanism for processing to obtain the weight between the first unmanned aerial vehicle and each second unmanned aerial vehicle.
The specific expression is:

α^m_{i,j} = exp( (W_Q^m h_{t,i}) · (W_K^m h_{t,j})^T ) / Σ_{k∈B_{+i}} exp( (W_Q^m h_{t,i}) · (W_K^m h_{t,k})^T )

where α^m_{i,j} is the weight between the first unmanned aerial vehicle and any second unmanned aerial vehicle in the mth convolution layer, W_Q^m is the query linear-mapping parameter of the attention mechanism in the mth convolution layer, W_K^m is the key linear-mapping parameter of the attention mechanism in the mth convolution layer, h_{t,i} is the initial feature of the ith unmanned aerial vehicle at time t, and h_{t,j} is the initial feature of the jth unmanned aerial vehicle at time t.
Step S303: input the initial features h_{t,i} of the unmanned aerial vehicles and the weights α^m_{i,j} into the fully-connected network layer and the ReLU nonlinear activation function layer in sequence to obtain the simulated local state feature h'_i of each unmanned aerial vehicle. The specific expression is:

h'_i = σ( concat_{m=1…M} [ Σ_{j∈B_{+i}} α^m_{i,j} h_{t,j} ] )

where h'_i is the simulated local state feature of the ith unmanned aerial vehicle and M is the number of convolution layers; the M weighted sums of initial features are connected, and processed through the function σ (comprising the fully-connected network layer and the ReLU nonlinear activation function layer) to obtain the simulated local state feature of the unmanned aerial vehicle.
Step S205: stack the simulated local state features h'_j of the second unmanned aerial vehicles to obtain the local stacking result S_{i,C} of the second unmanned aerial vehicles.
Step S206: input the local stacking result S_{i,C} of the second unmanned aerial vehicles, the simulated action vector a'_{t,i} of the first unmanned aerial vehicle, and the simulated joint reward r̃_{t,i} of the first unmanned aerial vehicle into the evaluation network to obtain the policy parameters. The method specifically comprises the following steps:
step S401: stacking the local state of the second drone i,C And calculating to obtain a first unmanned simulation motion vector a t,i ' input evaluation network
Figure BDA0004037132870000091
Obtaining an evaluation networkIn a mean evaluation value of%>
Figure BDA0004037132870000092
Step S402: correct the intermediate evaluation value Q_i(S_{i,C}, a'_{t,i}) and superpose it with the simulated joint reward r̃_{t,i} of the first unmanned aerial vehicle to obtain the estimated value y_i of the evaluation network. The specific expression of the estimated value of the evaluation network is:

y_i = r̃_{t,i} + γ Q_i(S'_{i,C}, a'_{t+1,i})

where γ is the discount coefficient. The estimated value is obtained from the simulated action vector calculated by the policy network.
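The estimated value y_i is a standard temporal-difference target; a minimal sketch (the handling of the episode-end signal `done` is an assumption the patent does not spell out):

```python
def td_target(joint_reward, q_next, gamma=0.95, done=False):
    """y_i = r~_{t,i} + gamma * Q(S'_{i,C}, a'_{t+1,i});
    the bootstrap term is dropped at the end of an episode."""
    return joint_reward + (0.0 if done else gamma * q_next)
```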
Step S403: input the intermediate evaluation value of the evaluation network and the estimated value of the evaluation network into the loss function to obtain the policy parameter θ_i. The specific expression of the loss function is:

L(θ_i) = (1/G) Σ ( y_i − Q_i(S_{i,C}, a_{t,i}) )²

where L(θ_i) is the loss function, G is the number of training samples, and Q_i(S_{i,C}, a_{t,i}) is the intermediate evaluation value obtained for the simulated action vector taken directly from the training sample.
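The loss above is a mean squared error between the targets y_i and the intermediate evaluation values over a batch of G training samples; a minimal sketch:

```python
def critic_loss(targets, q_values):
    """L(theta_i) = (1/G) * sum (y_i - Q(S_{i,C}, a_{t,i}))^2 over a batch."""
    G = len(targets)
    return sum((y - q) ** 2 for y, q in zip(targets, q_values)) / G
```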
Step S207: calculate the policy gradient ∇_{θ_i} J from the policy parameter θ_i, and update the policy network and the evaluation network according to the policy gradient until the number of updates is reached, so as to obtain the unmanned aerial vehicle cluster task planning model.

The specific expression of the policy gradient of the policy network is:

∇_{θ_i} J ≈ (1/G) Σ ∇_{θ_i} μ_i(o_i) ∇_{a_i} Q_i(s_{t,i}, a_i) |_{a_i = μ_i(o_i)}

where G is the number of training samples, ∇_{θ_i} μ_i(o_i) is the gradient of the policy function of the policy network, ∇_{a_i} Q_i is the gradient of the action-value function of the evaluation network, μ_i(o_i) is the action value selected by the policy network under local state vector o_i, and Q_i(s_{t,i}, a_i)|_{a_i=μ_i(o_i)} is the evaluation function when action a_i = μ_i(o_i) is taken in state space s_{t,i}.
The policy parameters are continuously updated by the difference between the evaluation value computed from the action chosen by the policy network and the evaluation value computed from the action in the training sample; the policy network and the evaluation network are updated according to these continuously updated parameters and gradually converge, so that the computed evaluation values come ever closer to the evaluation function values in the samples.
In addition, the calculation of the joint reward of the first unmanned aerial vehicle comprises the following steps:
step S501: first reward value r obtained by interaction of first unmanned machine and simulated task execution environment t,i And a plurality of second reward values r obtained by interaction of each second unmanned aerial vehicle and the simulated task execution environment t,j
Wherein the first reward value comprises a first collision reward r of the unmanned aerial vehicle and the obstacle c The first drone arrives at the destination with an arrival reward r g And an action reward r for the first unmanned machine to perform the action s
The specific expression of the collision reward r_c is:

r_c = r_col if D ≤ D_col, and r_c = 0 otherwise

where r_col = −5, D is the distance between the unmanned aerial vehicle and the obstacle, and D_col is the collision distance between the unmanned aerial vehicle and the obstacle.
The specific expression of the arrival reward r_g is:

r_g = r_arr if D_aim ≤ r_uav + r_aim, and r_g = −ε‖P_i − g‖ otherwise

where r_arr = 10, ε = 1.1 is the guiding coefficient drawing the unmanned aerial vehicle toward the destination, P_i is the position of the ith unmanned aerial vehicle, g is the position of the destination, D_aim is the distance of the unmanned aerial vehicle from the destination, r_uav is the radius of the unmanned aerial vehicle, and r_aim is the radius of the destination range.
The specific expression of the action reward r_s is: r_s = −3.
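The three reward components can be sketched as follows; the constants are the ones given in the text, but the distance-penalty form of the guidance term is an assumption (the exact shaping expression is an image in the original):

```python
import math

R_COL, R_ARR, R_S = -5.0, 10.0, -3.0   # reward constants from the text
EPS = 1.1                              # guiding coefficient epsilon

def collision_reward(d, d_col):
    """r_c: penalty r_col when within the collision distance D_col, else 0."""
    return R_COL if d <= d_col else 0.0

def arrival_reward(p_i, g, r_uav, r_aim):
    """r_g: bonus r_arr inside the destination area; otherwise a
    distance-based guidance penalty (shaping form is an assumption)."""
    d_aim = math.dist(p_i, g)
    return R_ARR if d_aim <= r_uav + r_aim else -EPS * d_aim
```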
In the same way, a plurality of second reward values obtained by interaction between each second unmanned aerial vehicle and the simulated task execution environment can be obtained, and the details are not repeated here.
Step S502: weight the plurality of second reward values r_{t,j} by the weights α_{i,j} between the first unmanned aerial vehicle and each second unmanned aerial vehicle to obtain the weighted reward value of the remaining unmanned aerial vehicle cluster.
Step S503: combine the first reward value and the weighted reward value to obtain the joint reward of the first unmanned aerial vehicle. The specific expression is:

r̃_{t,i} = r_{t,i} + Σ_{j∈B_i} α_{i,j} r_{t,j}
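Steps S502–S503 reduce to one weighted sum; a minimal sketch, where the weights are the attention weights α_{i,j} produced by the graph convolution module:

```python
def joint_reward(r_i, neighbor_rewards, weights):
    """r~_{t,i} = r_{t,i} + sum_{j in B_i} alpha_{i,j} * r_{t,j}:
    own reward plus the weighted rewards of the remaining cluster."""
    return r_i + sum(w * r for w, r in zip(weights, neighbor_rewards))
```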
The unmanned aerial vehicle cluster tasks executed by the embodiment of the invention comprise unmanned aerial vehicle cluster cooperative communication and unmanned aerial vehicle cluster physical deception, and as shown in the attached drawings 4a and 4b, a test environment for the unmanned aerial vehicle cluster to execute the cooperative communication tasks and the physical deception is provided through simulation.
According to the present application, the unmanned aerial vehicle cluster task planning model is constructed using COMA and graph convolution deep reinforcement learning, so that the unmanned aerial vehicle cluster can be guided to take globally optimal actions according to the setting of the joint reward and the continuously updated estimated value of the evaluation function. In training, a combination of policy network and evaluation network is adopted: the policy network of each unmanned aerial vehicle is trained separately, the action values produced by each policy network are gathered into the evaluation network to obtain an estimated value, the estimated value is compared with the evaluation network value computed from the training sample, and the policy network and evaluation network are adjusted, so that the results computed by the updated unmanned aerial vehicle cluster task planning model come closer to the training sample values.
In addition, because a large amount of data is used to update and adjust the policy network and the evaluation network during training, the final unmanned aerial vehicle cluster mission planning model can achieve globally optimal planning. In an unknown dynamic three-dimensional environment, the unmanned aerial vehicle cluster therefore adopts centralized training with distributed execution: in the training environment the unmanned aerial vehicles communicate with one another to learn a cooperation strategy, whereas in the actual task execution environment each unmanned aerial vehicle makes decisions relying only on its own locally observed state, without any communication, which greatly shortens decision time.
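The distributed-execution side of this scheme can be sketched as below: each drone's trained policy maps only its own local observation to an action, with no inter-drone communication at execution time. The linear stand-in policy, dimensions, and names are illustrative assumptions, not the patent's network.

```python
import numpy as np

rng = np.random.default_rng(0)

def policy_network(local_obs, weights):
    # stand-in linear policy: maps ONE drone's local observation
    # to a softmax distribution over candidate actions
    logits = weights @ local_obs
    exp = np.exp(logits - logits.max())   # subtract max for stability
    return exp / exp.sum()

def decentralized_step(observations, weights):
    # every drone decides independently from its own local state only
    return [int(np.argmax(policy_network(o, weights))) for o in observations]

obs = [rng.standard_normal(4) for _ in range(3)]   # 3 drones, 4-dim local states
W = rng.standard_normal((2, 4))                    # 2 candidate actions
print(decentralized_step(obs, W))
```

Because each call to `policy_network` sees only one drone's observation, removing any drone from the cluster does not affect the others' decisions.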
It should be understood that although the steps in the flowchart of fig. 1 are displayed sequentially, as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not limited to the exact order described and may be performed in other orders. Moreover, at least some of the steps in fig. 1 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different times, and their order of execution is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least some sub-steps or stages of other steps.
The unmanned aerial vehicle cluster mission planning method has been described in detail in the embodiments disclosed above. Since the disclosed method can be implemented by devices of various forms, the invention also discloses an unmanned aerial vehicle cluster mission planning device corresponding to the method, described in detail in the following specific embodiment with reference to fig. 5.
An unmanned aerial vehicle selection module 601, configured to randomly select one unmanned aerial vehicle from the unmanned aerial vehicle cluster as the first unmanned aerial vehicle, the other unmanned aerial vehicles serving as second unmanned aerial vehicles, the second unmanned aerial vehicles forming the remaining unmanned aerial vehicle cluster.
A parameter obtaining module 602, configured to acquire the actual task execution environment of the first unmanned aerial vehicle and the unmanned aerial vehicle cluster task planning model.
And the task planning module 603 is configured to input the actual task execution environment to the unmanned aerial vehicle cluster task planning model, so as to obtain a task plan of the unmanned aerial vehicle cluster.
The unmanned aerial vehicle cluster task planning model is obtained by learning and training an improved COMA model by using a training sample for simulating a task execution environment; the improved COMA model includes a COMA network and a graph convolution module, wherein the graph convolution module is disposed in the COMA network.
For specific limitations of the drone cluster mission planning device, reference may be made to the above limitations on the drone cluster mission planning method, which are not repeated here. The various modules in the above-described device may be implemented in whole or in part by software, hardware, or combinations thereof. The modules may be embedded in, or independent of, a processor of the terminal device in hardware form, or stored in a memory of the terminal device in software form, so that the processor can call and execute the operations corresponding to the modules.
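As a purely illustrative software-form sketch of the three modules 601–603 described above (the class, method names, and the `model` callable interface are assumptions, not from the patent):

```python
import random

class DroneClusterPlanner:
    """Illustrative sketch of the mission planning device's three modules."""

    def select_drones(self, cluster):
        # module 601: randomly pick one drone as the first drone;
        # the remaining drones form the remaining cluster
        first = random.choice(cluster)
        remaining = [d for d in cluster if d is not first]
        return first, remaining

    def get_parameters(self, env, model):
        # module 602: obtain the actual task execution environment
        # and the trained cluster mission planning model
        return env, model

    def plan(self, env, model):
        # module 603: feed the actual environment into the model
        # to obtain the cluster's mission plan
        return model(env)

planner = DroneClusterPlanner()
first, rest = planner.select_drones(["uav1", "uav2", "uav3"])
mission = planner.plan({"obstacles": []},
                       lambda e: {"waypoints": [(0, 0), (5, 5)]})
print(first, len(rest), mission)
```

The lambda here stands in for the trained model; in practice module 602 would load the learned policy networks instead.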
In an embodiment, the present invention further provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the unmanned aerial vehicle cluster mission planning method described above.
The computer-readable storage medium may be an electronic memory such as a flash memory, an EEPROM (electrically erasable and programmable read only memory), an EPROM (erasable and programmable read only memory), a hard disk, or a ROM. Optionally, the computer-readable storage medium comprises a non-transitory computer-readable medium. The computer readable storage medium has a storage space for program code for performing any of the method steps of the above-described method. These program codes can be read from or written to one or more computer program products, which can be compressed in a suitable form.
In one embodiment, the present invention provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and the processor executes the computer program to perform the steps of the unmanned aerial vehicle cluster mission planning method.
The computer device comprises a memory, a processor, and one or more computer programs, wherein the one or more computer programs may be stored in the memory and configured to be executed by the one or more processors, the one or more application programs configured to perform the drone cluster mission planning method described above.
A processor may include one or more processing cores. The processor connects the various parts of the computer device using various interfaces and lines, and performs the various functions of the computer device and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory and by calling data stored in the memory. Optionally, the processor may be implemented in hardware in at least one of the forms of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, user interface, application programs, and the like; the GPU renders and draws display content; the modem handles wireless communication. It is to be understood that the modem may also be implemented by a separate communication chip without being integrated into the processor.
The memory may include Random Access Memory (RAM) or Read-Only Memory (ROM). The memory may be used to store instructions, programs, code, code sets, or instruction sets. The memory may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, or an image playing function), instructions for implementing the various method embodiments described above, and the like. The data storage area may store data created by the terminal device in use, and the like.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. An unmanned aerial vehicle cluster task planning method is characterized by comprising the following steps:
randomly selecting one unmanned aerial vehicle from the unmanned aerial vehicle cluster as a first unmanned aerial vehicle, and using other unmanned aerial vehicles as second unmanned aerial vehicles, wherein the second unmanned aerial vehicles form the rest unmanned aerial vehicle cluster;
acquiring a first unmanned aerial vehicle actual task execution environment and an unmanned aerial vehicle cluster task planning model;
inputting the actual task execution environment into an unmanned aerial vehicle cluster task planning model to obtain task planning of an unmanned aerial vehicle cluster;
the unmanned aerial vehicle cluster task planning model is obtained by learning and training an improved COMA model by taking a simulated task execution environment as a training sample; the improved COMA model includes a COMA network and a graph convolution module, wherein the graph convolution module is disposed in the COMA network.
2. The unmanned aerial vehicle cluster mission planning method of claim 1, wherein the COMA network comprises a policy network and an evaluation network connected in sequence, wherein the graph convolution module is embedded in the evaluation network; the unmanned aerial vehicle cluster task planning model establishment method comprises the following steps:
acquiring training samples, wherein the training samples comprise simulated local states of all unmanned aerial vehicles in an unmanned aerial vehicle cluster at the current moment, simulated local states of all unmanned aerial vehicles in the unmanned aerial vehicle cluster at the next moment, simulated motion vectors of all unmanned aerial vehicles in the unmanned aerial vehicle cluster, and simulated joint rewards of all unmanned aerial vehicles in the unmanned aerial vehicle cluster;
respectively inputting the simulated local states of all unmanned aerial vehicles in the unmanned aerial vehicle cluster at the current moment into a strategy network to obtain a simulated intermediate action vector of each unmanned aerial vehicle;
superposing the simulated intermediate motion vector of each unmanned aerial vehicle with the noise vector respectively to obtain the simulated motion vector of each unmanned aerial vehicle in the unmanned aerial vehicle cluster;
inputting the simulated local states of all the unmanned aerial vehicles and the rest unmanned aerial vehicle cluster set at the current moment into a graph convolution module to obtain the simulated local state characteristics of the first unmanned aerial vehicle and the weights between the first unmanned aerial vehicle and each second unmanned aerial vehicle;
stacking the simulated local state features of the second unmanned aerial vehicle to obtain a local state stacking result of the second unmanned aerial vehicle;
inputting the local state stacking result of the second unmanned aerial vehicle, the simulated action vector of the first unmanned aerial vehicle obtained through calculation by the strategy network, and the simulated joint reward of the first unmanned aerial vehicle in the unmanned aerial vehicle cluster into an evaluation network to obtain strategy parameters;
and calculating according to the strategy parameters to obtain a strategy gradient, and updating the strategy network and the evaluation network according to the strategy gradient until the updating times are reached so as to obtain the unmanned aerial vehicle cluster task planning model.
3. The unmanned aerial vehicle cluster mission planning method of claim 2, wherein the graph convolution module comprises an observation coding layer, a convolution layer, a fully connected network layer, and a ReLU nonlinear activation function layer; the method for obtaining the simulation local state characteristics of the first unmanned aerial vehicle and the weights between the first unmanned aerial vehicle and each second unmanned aerial vehicle comprises the following steps of:
respectively carrying out observation coding layer processing on the simulated local states of all the unmanned aerial vehicles at the current moment to obtain the initial characteristics of all the unmanned aerial vehicles;
inputting the initial characteristics of each unmanned aerial vehicle into an attention mechanism for processing to obtain the weight between the first unmanned aerial vehicle and each second unmanned aerial vehicle;
and sequentially inputting the initial characteristics and the weight of each unmanned aerial vehicle into a full-connection network layer and a ReLU nonlinear activation function layer to obtain the simulated local state characteristics of each unmanned aerial vehicle.
4. The method for unmanned aerial vehicle cluster mission planning of claim 2, wherein the step of inputting the local state stacking result of the second unmanned aerial vehicle, the simulated action vector of the first unmanned aerial vehicle calculated by the policy network, and the simulated joint reward of the first unmanned aerial vehicle in the unmanned aerial vehicle cluster into the evaluation network to obtain the policy parameters comprises:
inputting the local state stacking result of the second unmanned aerial vehicle and the simulated motion vector of the first unmanned aerial vehicle obtained by the calculation of the strategy network into an evaluation network to obtain a middle evaluation value of the evaluation network;
multiplying the intermediate evaluation value by a discount coefficient, and then superposing it with the simulated joint reward of the first unmanned aerial vehicle to obtain an evaluation network estimation value;
and inputting the intermediate evaluation value of the evaluation network and the estimation value of the evaluation network into a loss function to obtain a strategy parameter.
5. The unmanned aerial vehicle cluster mission planning method of claim 2, wherein calculating the joint reward of the first unmanned aerial vehicle comprises:
the first unmanned aerial vehicle interacting with the simulated task execution environment to obtain a first reward value, and each second unmanned aerial vehicle interacting with the simulated task execution environment to obtain a plurality of second reward values;
weighting the plurality of second reward values with the weights between the first unmanned aerial vehicle and each second unmanned aerial vehicle to obtain a weighted reward value of the remaining unmanned aerial vehicle cluster;
and combining the first reward value and the weighted reward value to obtain the joint reward of the first unmanned aerial vehicle.
6. The unmanned aerial vehicle cluster mission planning method of claim 5, wherein the first reward value comprises:
the collision reward of the first unmanned machine and the obstacle, the arrival reward of the first unmanned machine to the destination and the action reward of the first unmanned machine to execute the action.
7. The unmanned aerial vehicle cluster mission planning method of claim 2, wherein calculating the simulated local state of the first unmanned aerial vehicle in the unmanned aerial vehicle cluster at the next time comprises:
and interacting the simulated action vector of the first unmanned aerial vehicle with the simulated task execution environment to obtain the simulated local state of the first unmanned aerial vehicle at the next moment.
8. An unmanned aerial vehicle cluster mission planning device, the device includes:
the unmanned aerial vehicle selecting module is used for selecting one unmanned aerial vehicle from the unmanned aerial vehicle cluster as a first unmanned aerial vehicle, and other unmanned aerial vehicles are used as second unmanned aerial vehicles, and the second unmanned aerial vehicles form the rest unmanned aerial vehicle cluster;
the parameter acquisition module is used for acquiring a first unmanned aerial vehicle actual task execution environment and an unmanned aerial vehicle cluster task planning model;
the task planning module is used for inputting the actual task execution environment into an unmanned aerial vehicle cluster task planning model to obtain the task planning of the unmanned aerial vehicle cluster;
the unmanned aerial vehicle cluster task planning model is obtained by learning and training an improved COMA model by using a training sample for simulating a task execution environment; the improved COMA model includes a COMA network and a graph convolution module, wherein the graph convolution module is disposed in the COMA network.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method for drone cluster mission planning according to any one of claims 1 to 7.
10. A computer arrangement comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, performs the method of drone cluster mission planning according to any of claims 1-7.
CN202310006543.3A 2023-01-04 2023-01-04 Unmanned aerial vehicle cluster task planning method and device, storage medium and equipment Pending CN115951707A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310006543.3A CN115951707A (en) 2023-01-04 2023-01-04 Unmanned aerial vehicle cluster task planning method and device, storage medium and equipment

Publications (1)

Publication Number Publication Date
CN115951707A true CN115951707A (en) 2023-04-11

Family

ID=87296606

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310006543.3A Pending CN115951707A (en) 2023-01-04 2023-01-04 Unmanned aerial vehicle cluster task planning method and device, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN115951707A (en)

Similar Documents

Publication Publication Date Title
CN112465151A (en) Multi-agent federal cooperation method based on deep reinforcement learning
CN112001585B (en) Multi-agent decision method, device, electronic equipment and storage medium
CN109690576A (en) The training machine learning model in multiple machine learning tasks
US20130325773A1 (en) Stochastic apparatus and methods for implementing generalized learning rules
CN112015174A (en) Multi-AGV motion planning method, device and system
CN111178545B (en) Dynamic reinforcement learning decision training system
CN109405843B (en) Path planning method and device and mobile device
US11663522B2 (en) Training reinforcement machine learning systems
CN115018017B (en) Multi-agent credit allocation method, system and equipment based on ensemble learning
CN113561986A (en) Decision-making method and device for automatically driving automobile
CN114510012A (en) Unmanned cluster evolution system and method based on meta-action sequence reinforcement learning
CN108229536A (en) Optimization method, device and the terminal device of classification prediction model
CN114261400A (en) Automatic driving decision-making method, device, equipment and storage medium
CN113052253A (en) Hyper-parameter determination method, device, deep reinforcement learning framework, medium and equipment
CN115648204A (en) Training method, device, equipment and storage medium of intelligent decision model
CN113894780B (en) Multi-robot cooperation countermeasure method, device, electronic equipment and storage medium
CN115951707A (en) Unmanned aerial vehicle cluster task planning method and device, storage medium and equipment
US20210357692A1 (en) Multi-fidelity simulated data for machine learning
Noureddine et al. Towards an Agent-Based Architecture using Deep Reinforcement Learning for Intelligent Internet of Things Applications. pdf
CN116301022A (en) Unmanned aerial vehicle cluster task planning method and device based on deep reinforcement learning
CN112926729B (en) Man-machine confrontation intelligent agent strategy making method
Han et al. Three‐dimensional obstacle avoidance for UAV based on reinforcement learning and RealSense
CN117093010B (en) Underwater multi-agent path planning method, device, computer equipment and medium
CN112295232B (en) Navigation decision making method, AI model training method, server and medium
Zhang et al. Digital Twin Enhanced Reinforcement Learning for Integrated Scheduling in Automated Container Terminals

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination