CN116382339A

CN116382339A - Multi-unmanned aerial vehicle path planning method, system and electronic equipment

Info

Publication number: CN116382339A
Application number: CN202310377014.4A
Authority: CN
Inventors: 杨秀霞; 张毅; 王晨蕾; 杨林; 梁勇; 李文强; 姜子劼; 于浩
Original assignee: Naval Aeronautical University
Current assignee: Naval Aeronautical University
Priority date: 2023-04-11
Filing date: 2023-04-11
Publication date: 2023-07-04

Abstract

The invention discloses a multi-unmanned aerial vehicle (UAV) path planning method, system and electronic device, and relates to the technical field of path planning. First, using a pilot-referenced formation description, the geometric formation of the UAV cluster is described by a formation matrix, and an extensible adaptive formation library is established, laying the foundation for formation transformation. Second, dynamic formation transformation under obstacle avoidance conditions is studied: formation transformation evaluation criteria covering environmental constraints and UAV cluster track constraints are established, a formation transformation evaluation function is constructed, and the function is applied to design the optimal formation transformation strategy under obstacle avoidance conditions. Finally, according to the selected formation transformation scheme, the MADDPG algorithm is used to achieve safe flight of the multi-UAV formation in an obstacle environment. This improves the accuracy and real-time performance of multi-UAV path planning and gives the multi-UAV formation stronger fault tolerance.

Description

Multi-unmanned aerial vehicle path planning method, system and electronic equipment

Technical Field

The invention relates to the technical field of path planning, in particular to a path planning method, a system and electronic equipment for a multi-unmanned aerial vehicle.

Background

A single unmanned aerial vehicle (UAV) is limited by its own resources and cannot handle large-scale tasks such as collaborative search and fuel supply. Coordinated multi-UAV formations can not only perform large-scale tasks that a single UAV cannot complete, but also add redundancy through numerical advantage and improve the task success rate. Path planning research for multi-UAV formations therefore has great practical significance.

Formation transformation and obstacle avoidance for multi-UAV formations are key and difficult research topics in the field of multi-UAV control. As task demands and environmental states change continuously, the ability to weigh the priorities of the various performance parameters and to formulate an optimal formation transformation strategy reasonably and efficiently for the situation at hand, so that the formation flies safely while avoiding obstacles, is an important index for evaluating UAV formation control technology. Richert of the University of California proposed a formation partition cooperation algorithm that minimizes the voyage cost while the formation executes its task, achieving dynamic transformation of the UAV formation during flight. To address collision avoidance during formation transformation, Xiamen University used a PID algorithm to control the UAVs into a given formation structure and avoided obstacles by scaling the distance between UAVs, thereby ensuring both formation forming and formation transformation. Giacomin P. et al. proposed a control algorithm based on trajectory segmentation, which completes formation reorganization by computing the UAV navigation trajectories segment by segment. Rong Xin et al. constructed a formation transformation evaluation model from the total energy consumption and time cost of the formation flight, converting the complex constraint problem of multi-UAV formation transformation in a complex obstacle environment into the problem of finding the optimal solution of a function model, thereby realizing formation transformation.

Existing research on formation transformation and obstacle avoidance has certain strengths, but many problems remain to be solved. UAV cluster consistency based on classical control algorithms has long been one of the main research directions. However, classical control theory usually relies on strict mathematical derivation and requires many assumptions and ideal conditions to be set empirically. These ideal conditions are often hard to satisfy in practice, so environmental adaptability is poor and development is limited. In contrast, the rapid development of artificial intelligence, particularly deep reinforcement learning, offers a new approach to the study of multi-agent systems. Unlike classical control theory, deep reinforcement learning lets UAVs learn through continuous interaction with the environment, which gives them greater practicality and broader applicability in complex unknown environments. Nevertheless, most current research still uses centralized control, whose fault tolerance is poor, and distributed formation path planning based on multi-agent deep reinforcement learning is not yet mature.

Disclosure of Invention

In order to solve the problems in the prior art, the invention provides a path planning method, a system and electronic equipment for a multi-unmanned aerial vehicle.

In order to achieve the above object, the present invention provides the following solutions:

A method of path planning for a multi-unmanned aerial vehicle, comprising:

describing geometric formations of the UAV cluster in a formation matrix mode, and establishing an adaptive formation library based on the geometric formations obtained through description;

determining formation dynamic transformation under the obstacle avoidance condition based on the adaptive formation library;

establishing a formation transformation evaluation criterion of environmental constraint and unmanned aerial vehicle cluster track constraint based on the formation dynamic transformation;

constructing a formation transformation evaluation function based on the formation transformation evaluation criteria;

selecting an optimal transformation strategy of formation under the obstacle avoidance condition based on the formation transformation evaluation function; the optimal transformation strategy is generated by adopting an MADDPG algorithm; the MADDPG algorithm is an improved algorithm based on an Actor-Critic algorithm and a DDPG algorithm;

and realizing the safe flight of the multi-UAV formation in the obstacle environment based on the optimal transformation strategy.

Preferably, the formation matrix is F:

wherein b is the formation expansion parameter, f is the geometric configuration parameter, the angle symbol in F is the formation angle parameter, and n is the total number of UAVs in the cluster.

Preferably, establishing the formation transformation evaluation criteria for environmental constraints and UAV cluster track constraints based on the formation dynamic transformation specifically includes:

constructing constraint conditions;

determining the formation structure difference degree based on the formation geometric parameters corresponding to the current formation matrix and the formation geometric parameters corresponding to the formation matrix after transformation;

determining the formation transformation convergence time and the formation transformation path cost;

under the constraint conditions, determining the evaluation vector of a single formation transformation based on the formation structure difference degree, the formation transformation convergence time and the formation transformation path cost; and taking this evaluation vector as the formation transformation evaluation criterion.

Preferably, the formation transformation evaluation function is:

$$R(F_{start}, F_{end}) = H(F_{start}, F_{end}) \cdot \alpha$$

wherein $R(F_{start}, F_{end})$ is the evaluation result of a single formation transformation, $H(F_{start}, F_{end})$ is the evaluation vector of a single formation transformation, $\alpha$ is a column vector, $F_{start}$ is the current formation matrix, and $F_{end}$ is the formation matrix after transformation.

Preferably, the selecting the optimal transformation strategy for formation under the obstacle avoidance condition based on the formation transformation evaluation function specifically includes:

determining a formation transformation factor based on a maximum length of the obstacle region and a formation width of the current UAV cluster;

and selecting an optimal transformation strategy according to the relation between the formation transformation factors and a preset transformation threshold range.

Preferably, the selecting an optimal transformation strategy according to the relation between the formation transformation factor and a preset transformation threshold range specifically includes:

when the formation transformation factor is larger than a maximum preset transformation threshold, the UAV cluster can pass through the obstacle region without any formation transformation;

when the formation transformation factor is larger than a minimum preset transformation threshold and smaller than a maximum preset transformation threshold, the UAV cluster executes formation telescopic transformation;

and when the formation transformation factor is smaller than a minimum preset transformation threshold, the UAV cluster executes formation structural transformation.

Preferably, the process by which the UAV cluster performs the formation telescopic (scaling) transformation is as follows:

the pilot in the cluster determines the final formation matrix to be transformed to according to the formation transformation factor, and then sends this formation matrix to the other UAVs, which carry out the formation telescopic transformation.

Preferably, the process of performing the formation structural transformation by the UAV cluster is:

traversing a formation library to determine a formation transformation evaluation function corresponding to each of the different geometric formations;

selecting the geometric formation with the smallest formation transformation evaluation function as the transformed formation, and executing formation structural transformation;

when the UAV cluster encounters an obstacle and cannot bypass from the same side, the cluster is divided into two sub-formations according to a division strategy, the two sub-formations bypass from two sides of the obstacle respectively, and the sub-formations are restored to the original formation after the sub-formations pass over the obstacle.

According to the specific embodiment provided by the invention, the invention discloses the following technical effects:

In the multi-unmanned aerial vehicle path planning method provided by the invention, first, using a pilot-referenced formation description, the geometric formation of the UAV cluster is described by a formation matrix, and an extensible adaptive formation library is established, laying the foundation for formation transformation. Second, dynamic formation transformation under obstacle avoidance conditions is studied: formation transformation evaluation criteria covering environmental constraints and UAV cluster track constraints are established, a formation transformation evaluation function is constructed, and the function is applied to design the optimal formation transformation strategy under obstacle avoidance conditions. Finally, according to the selected formation transformation scheme, the MADDPG algorithm is used to achieve safe flight of the multi-UAV formation in an obstacle environment. This improves the accuracy and real-time performance of multi-UAV path planning and gives the multi-UAV formation stronger fault tolerance.

Corresponding to the multi-unmanned aerial vehicle path planning method provided by the invention, the invention also provides the following implementation structure:

a multiple unmanned aerial vehicle path planning system, comprising:

the formation library construction module is used for describing the geometric formation of the UAV cluster in a formation matrix mode and establishing an adaptive formation library based on the geometric formation obtained by description;

The dynamic transformation determining module is used for determining formation dynamic transformation under the obstacle avoidance condition based on the self-adaptive formation library;

the evaluation criterion establishing module is used for establishing formation transformation evaluation criteria of environmental constraints and unmanned aerial vehicle cluster track constraints based on the formation dynamic transformation;

an evaluation function determining module for constructing a formation transformation evaluation function based on the formation transformation evaluation criteria;

the optimal transformation strategy selection module is used for selecting an optimal transformation strategy of formation under the condition of obstacle avoidance based on the formation transformation evaluation function; the optimal transformation strategy is generated by adopting an MADDPG algorithm; the MADDPG algorithm is an improved algorithm based on an Actor-Critic algorithm and a DDPG algorithm;

and the safe flight control module is used for realizing the safe flight of the multi-UAV formation in the obstacle environment based on the optimal transformation strategy.

An electronic device, comprising:

a memory for storing a control program;

and the processor is connected with the memory and used for calling and executing the control program so as to implement the multi-unmanned aerial vehicle path planning method.

The technical effects achieved by the two implementation structures provided by the invention are the same as those achieved by the multi-unmanned aerial vehicle path planning method provided by the invention, so that the description is omitted here.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of a method for path planning for multiple unmanned aerial vehicles provided by the invention;

FIG. 2 is a schematic diagram of common UAV formations provided by an embodiment of the present invention; FIG. 2 (a) is a schematic diagram of a line-shaped column formation; FIG. 2 (b) is a schematic diagram of a polygon formation;

FIG. 2 (c) is a schematic diagram of a V-shaped formation; FIG. 2 (d) is a schematic diagram of a trapezoidal formation;

FIG. 2 (e) is a schematic diagram of a line-shaped row (abreast) formation; FIG. 2 (f) is a schematic diagram of a serpentine formation;

FIG. 3 is a schematic diagram of formation transformation provided in an embodiment of the present invention;

fig. 4 is a schematic diagram of an unmanned aerial vehicle formation segmentation strategy provided by an embodiment of the present invention;

FIG. 5 is a framework diagram of the MADDPG algorithm provided by an embodiment of the present invention;

FIG. 6 is a diagram of a UAV platoon flight trajectory provided by an embodiment of the present invention;

FIG. 7 is a graph of distance between UAVs according to an embodiment of the present invention;

FIG. 8 is a graph of the distance between a UAV and an obstacle 1 according to an embodiment of the present invention;

FIG. 9 is a graph of the distance between a UAV and an obstacle 2 provided by an embodiment of the present invention;

fig. 10 is a graph of a distance between a UAV and an obstacle 3 according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The invention aims to provide a path planning method, a system and electronic equipment for a multi-unmanned aerial vehicle, which can improve the accuracy and the instantaneity of the path planning of the multi-unmanned aerial vehicle and have stronger fault tolerance capability.

In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.

As shown in fig. 1, the method for planning a path of a multi-unmanned aerial vehicle provided by the invention comprises the following steps:

s1, describing geometric formations of the UAV cluster in a formation matrix mode, and establishing an adaptive formation library based on the geometric formations obtained through description.

UAV cluster formation flight can accomplish many different kinds of tasks, such as cooperative aerial observation, refueling and multi-aircraft cooperative strikes. Because the number of UAVs in the cluster, the application environment and the tasks all differ, few formation tasks keep one fixed geometric formation from start to finish. To maximize the benefits of UAV formation flight, a good formation control algorithm is required so that formations can be transformed flexibly and quickly.

UAV formation transformation means that, while the formed-up cluster flies to the task area, the formation is changed with the UAV flight constraints and environmental constraints taken into account, either to balance the energy consumption of the UAVs in the cluster or to pass smoothly through an obstacle area. Efficient and reasonable formation transformation helps the UAV formation achieve its best effect.

Because existing UAV flight operations are mostly single-aircraft, formation design can draw on the flight experience of manned aircraft formations. Long-term aircraft combat experience shows that formations can be selected according to different mission requirements. When air units carry out combat tasks, correctly choosing and using the combat formation brings the full strength of the force to bear and enables close cooperative combat.

The formations commonly used in formation control are the line-shaped column, regular polygon, V shape, trapezoid, line-shaped row (abreast) and serpentine formations, as shown in FIG. 2. The line-shaped column formation is often used for air drop and interception tasks; it is highly maneuverable and easy to transform, so it is better able to perform tasks in complex areas and is safer there. Regular polygon formations are often used to protect a target; because the UAVs are close to each other, they can protect one another well in emergencies. V-shaped formations are commonly used for reconnaissance. Trapezoidal formations are commonly used for attacks in batches. The line-shaped row formation is often used for searching along a wide front; its coverage area is relatively large, and in open areas with a wide field of view it has stronger target detection capability and higher task efficiency. The serpentine formation is often used for large-formation voyages.

Based on this, in order to build or maintain a particular geometric formation, a representation of the geometric formation needs to be established first. There is currently no unified representation method for describing the formation of UAV clusters. In step S1 described above, the present invention proposes a formation matrix for representing UAV formations. To express the position of each UAV in the formation, a structural model of the formation is established in combination with the pilot-referenced formation description. The formation structural model uses a formation matrix F of 4 rows and n columns to represent the geometric formation of the UAVs, where n is the number of UAVs in the cluster. In the formation matrix F, the first row holds the UAV numbers, and the second, third and fourth rows hold the distances between each UAV and the reference UAV in the x, y and z directions, respectively. In this representation, the formation of the whole UAV cluster is described by the matrix F, and the formation information of each UAV is given by a column vector $a_i^T$. For example, assuming that the lead aircraft is the pilot reference UAV in the V-shaped formation shown in FIG. 2 (c), the formation matrix F of the entire UAV cluster may be expressed as:

wherein b is the formation expansion parameter corresponding to the formation, f is the geometric configuration parameter corresponding to the formation, and the angle symbol in F is the formation angle parameter.

For the geometric formations commonly used in the formation control field, such as the "V", "line", "polygon" and "trapezoid" formations, the invention establishes a formation library that supports UAV clusters of any size. The library supports both formation forming and formation transformation during UAV cluster formation flight, making formation transformation more flexible.
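The 4 x n layout described above (one row of UAV numbers, then x, y and z offsets from the pilot) can be illustrated with a minimal sketch. The function name `v_formation_matrix` and the parameters `spacing` and `half_angle_deg` are hypothetical, and the V-shape parameterization below is an assumption chosen for illustration; it is not the patent's formula in b, f and the angle parameter.

```python
import math

def v_formation_matrix(n, spacing, half_angle_deg):
    """Build a 4 x n formation matrix as a list of four rows.

    Row 0: UAV numbers 1..n (UAV 1 is the pilot reference).
    Rows 1-3: x, y, z offsets of each UAV from the pilot.
    The V-shape geometry used here is illustrative only.
    """
    half_angle = math.radians(half_angle_deg)
    ids, xs, ys, zs = [], [], [], []
    for i in range(n):
        rank = (i + 1) // 2             # 0 for the pilot, then 1, 1, 2, 2, ...
        side = 1 if i % 2 == 1 else -1  # alternate between the two wings
        ids.append(i + 1)
        xs.append(-rank * spacing * math.cos(half_angle))  # trail behind the pilot
        ys.append(side * rank * spacing * math.sin(half_angle))
        zs.append(0.0)                  # level flight assumed
    return [ids, xs, ys, zs]

def column(F, i):
    """Return the column (number, dx, dy, dz) of the i-th UAV (0-based)."""
    return [row[i] for row in F]

F = v_formation_matrix(n=5, spacing=10.0, half_angle_deg=30.0)
pilot = column(F, 0)   # offsets of the pilot from itself are all zero
```

A formation library in this style could simply be a dictionary from formation name to such a generator function, so that the same shape can be produced for any cluster size n.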

S2, determining formation dynamic transformation under the obstacle avoidance condition based on the adaptive formation library.

S3, establishing formation transformation evaluation criteria for environmental constraints and UAV cluster track constraints based on the formation dynamic transformation. When a UAV cluster formation encounters an obstacle while executing a task, the existing literature generally either lets the individual UAVs break the original formation and re-form after bypassing the obstacle, or changes the formation into a "line" formation to pass through the obstacle region. These existing methods either ignore the environmental constraints on formation transformation or ignore the UAV cluster track constraints or flight performance constraints, so they are optimal in terms of neither time consumption nor formation keeping. Therefore, to optimize the efficiency of formation transformation in real scenarios, the invention establishes an extensible formation transformation evaluation criterion that measures the efficiency of a single formation transformation from the initial formation $F_{start}$ to the final formation $F_{end}$, so that the optimal transformation can be chosen. Accordingly, the process of establishing the formation transformation evaluation criteria for environmental constraints and UAV cluster track constraints in step S3 specifically includes:

S3-1, UAV kinematic constraints

Throughout the formation transformation, the heading angle and heading angular velocity of the UAV must stay within certain ranges in order to satisfy the UAV flight performance constraint $J_{uav}$. The constraint conditions are as follows:

$$J_{uav}: \quad \psi_{min} \le \psi \le \psi_{max}, \qquad \dot{\psi}_{min} \le \dot{\psi} \le \dot{\psi}_{max}$$

where $\psi_{min}$ and $\psi_{max}$ are the minimum and maximum heading angles of the UAV, respectively, and $\dot{\psi}_{min}$ and $\dot{\psi}_{max}$ are the minimum and maximum heading angular velocities of the UAV, respectively.
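The range constraints on heading angle and heading angular velocity can be checked with a few lines of code. This is a minimal sketch; the function names and the example limits are hypothetical and are not values from the patent.

```python
def satisfies_kinematic_constraints(psi, psi_rate,
                                    psi_min, psi_max,
                                    rate_min, rate_max):
    """Return True when heading and heading rate both lie inside their ranges."""
    return psi_min <= psi <= psi_max and rate_min <= psi_rate <= rate_max

def clamp(value, lower, upper):
    """Saturate a commanded value so that it respects [lower, upper]."""
    return max(lower, min(upper, value))

# Example limits in radians and radians per second (illustrative only).
ok = satisfies_kinematic_constraints(0.3, 0.1, -1.0, 1.0, -0.5, 0.5)
too_fast = satisfies_kinematic_constraints(0.3, 0.8, -1.0, 1.0, -0.5, 0.5)
limited_rate = clamp(0.8, -0.5, 0.5)
```

In a learning-based planner, one common design choice is to clamp the action output in this way rather than reject it, so that every commanded action stays feasible.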

S3-2, formation Structure Difference degree

When a formation performs a task, it often needs to keep a particular geometry to ensure that the task is completed efficiently. For example, the V-shaped formation is typically kept when the overall energy consumption of the UAV cluster should be minimized, while the regular polygon formation is advantageous for maximizing the observation range of the formation. Therefore, when an obstacle region is encountered and a formation transformation is unavoidable, keeping the same geometry through the transformation helps the formation perform better.

The formation structure difference degree (formation structural difference) describes the geometric difference between the formations before and after the transformation. It is denoted $H_{fsd}(F_{start}, F_{end})$ and is calculated as follows:

In the above formula, $F_{start}$ is the current formation matrix and $F_{end}$ is the formation matrix after transformation; n is the total number of UAVs in the cluster; $F_{start}(k, i)$ describes the formation geometry corresponding to UAV i in the cluster, where k is the row index of the matrix. The formation structure difference degree $H_{fsd}(F_{start}, F_{end})$ quantifies, from the viewpoint of the geometric description of the formation, the difference between the geometric configurations of the two formations before and after the transformation. It removes the influence of the absolute values of the formation parameters on the measured value and has a clear geometric meaning.

S3-3, formation transformation convergence time

In UAV cluster formation tracking observation tasks, UAV clusters involve three processes from formation, formation holding to formation transformation until the task requirements are finally completed. In the whole formation process, the formation convergence rate measures the quality of a formation control algorithm. While a formation transformation describes the process from the destruction of the original formation until convergence to a new formation, the shorter the convergence time used for the formation transformation, the more efficient the overall formation task.

The formation transformation convergence time describes the time taken from the moment the UAV cluster starts the formation transformation until it converges to the new formation. It is denoted $H_{ftct}(F_{start}, F_{end})$ and is calculated as follows:

In the above formula, the two time symbols denote, respectively, the time at which the i-th UAV in the cluster starts the formation transformation and the time at which the i-th UAV in the cluster converges to the new formation.
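As a rough illustration of the quantity described here, the sketch below measures the span from the earliest start of the transformation to the latest convergence among all UAVs. Taking the earliest start and the latest end is an assumption made for this example, and the function name `convergence_time` is hypothetical.

```python
def convergence_time(start_times, end_times):
    """Time from the first UAV starting to transform until the last one converges.

    start_times[i]: time at which UAV i starts the formation transformation.
    end_times[i]:   time at which UAV i has converged to the new formation.
    Assumption: the cluster-level time spans earliest start to latest end.
    """
    if not start_times or len(start_times) != len(end_times):
        raise ValueError("need one start time and one end time per UAV")
    return max(end_times) - min(start_times)

# Three UAVs, times in seconds (illustrative values only).
t_conv = convergence_time([0.0, 0.5, 1.0], [4.0, 6.5, 5.0])
```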

S3-4, formation transformation path cost

Because UAVs have short endurance and limited energy for flight, researchers often use the length of the path a UAV flies in one mission as a measure of its energy consumption.

To optimize the energy consumption of the UAV cluster during formation transformation, the formation transformation path cost describes the overall path cost of all UAVs in the cluster from the start of the formation transformation until convergence to the new formation. It is denoted $H_{ftpc}(F_{start}, F_{end})$ and is calculated as follows:

In the above formula, $v_i$ is the real-time speed of the i-th UAV during the formation transformation, and its time integral is the total flight path of the i-th UAV during the formation transformation. The path cost of the UAV cluster during the formation transformation is measured by the sum of the absolute values of these paths.
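A numerical version of this cost can be sketched by sampling each UAV's speed at a fixed time step and summing the resulting path lengths. The rectangle-rule integration, the fixed step `dt` and the function name `path_cost` are assumptions for illustration only.

```python
def path_cost(speed_samples, dt):
    """Approximate the total path length flown by all UAVs.

    speed_samples[i] is the list of speeds of UAV i sampled every dt seconds
    during the formation transformation. Each path length is approximated by
    a rectangle rule, and the absolute values are summed over all UAVs.
    """
    total = 0.0
    for samples in speed_samples:
        path_i = sum(abs(v) * dt for v in samples)  # path length of one UAV
        total += path_i
    return total

# Two UAVs, three samples each, 0.5 s apart (illustrative values only).
cost = path_cost([[2.0, 2.0, 2.0], [1.0, 3.0, 2.0]], dt=0.5)
```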

S3-5, collecting the above evaluation factors into the evaluation vector of the formation transformation. The evaluation vector of a single formation transformation can be expressed as:

$$H(F_{start}, F_{end}) = [H_{fsd}(F_{start}, F_{end}),\ H_{ftct}(F_{start}, F_{end}),\ H_{ftpc}(F_{start}, F_{end})]$$

S4, constructing a formation transformation evaluation function based on the formation transformation evaluation criteria. Based on the implementation of step S3, a column vector $\alpha = [\alpha_1, \alpha_2, \alpha_3]^T$ is introduced, where $\alpha_1$, $\alpha_2$ and $\alpha_3$ are the weight factors corresponding to the formation structure difference degree, the formation transformation convergence time and the formation transformation path cost, respectively. $\alpha_1$ represents the importance of keeping the geometry during the formation transformation, $\alpha_2$ represents the importance of keeping the formation stable during the formation transformation, and $\alpha_3$ represents the importance of saving UAV cluster energy. The evaluation function that measures a single formation transformation can be expressed as:

$$R(F_{start}, F_{end}) = H(F_{start}, F_{end}) \cdot \alpha$$

the formation evaluation function fully considers the environmental constraints of the formation transformation and the flight path constraints of the UAV cluster. The formation transformation evaluation vector is used for calculation, so that newly added evaluation factors can be easily added into the evaluation vector, and the expandability of the formation transformation evaluation criterion is increased.
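The evaluation function is a weighted sum of the evaluation factors, which is why adding a factor only requires appending one entry to each vector. The following minimal sketch shows this; the function name and all numeric values are hypothetical.

```python
def evaluate_transformation(h, alpha):
    """Return R = H . alpha for one candidate formation transformation.

    h:     evaluation vector, e.g. [H_fsd, H_ftct, H_ftpc]
    alpha: weight vector of the same length
    """
    if len(h) != len(alpha):
        raise ValueError("evaluation vector and weights must have equal length")
    return sum(h_k * a_k for h_k, a_k in zip(h, alpha))

# Structure difference, convergence time, path cost (illustrative values only).
h = [0.2, 5.0, 30.0]
alpha = [0.5, 0.25, 0.25]
r = evaluate_transformation(h, alpha)

# Extending the criterion: append a new factor and its weight.
r_extended = evaluate_transformation(h + [1.0], alpha + [0.1])
```

In practice the factors have different units and magnitudes, so some normalization before weighting would likely be needed; the source does not specify one.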

S5, selecting an optimal transformation strategy of formation under the obstacle avoidance condition based on the formation transformation evaluation function. The optimal transformation strategy is generated by adopting a MADDPG algorithm. The MADDPG algorithm is an improved algorithm based on an Actor-Critic algorithm and a DDPG algorithm.

To take the environmental constraints and the UAV cluster flight path constraints fully into account during formation transformation, the optimal formation transformation mode is selected dynamically whenever a formation transformation is required. This step introduces a formation transformation factor and divides dynamic formation transformation into two modes, formation scaling (telescopic) transformation and formation structural transformation, from which the optimal formation transformation strategy is selected.

The formation transformation factor, denoted by the symbol ω, is the parameter by which the UAV cluster scales the size of its formation while keeping its existing geometry. Each geometric formation has its own minimum formation transformation factor, denoted $\omega_{min}$, which is the smallest value of the factor that guarantees that the UAVs in the cluster do not collide. The formation transformation factor is calculated as:

$$\omega = \frac{Z}{D_F}$$

where Z is the maximum length of the obstacle region and $D_F$ is the formation width of the UAV cluster.

When the formation scaling transformation is performed, the following holds:

where $\omega_{start}$ and $\omega_{end}$ are the formation transformation factors before and after the formation transformation, respectively. Thus, in a formation scaling transformation, the formation matrix $F_{end}$ after scaling is obtained from the matrix $F_{start}$ before the transformation by elementary matrix operations, that is, $F_{end}$ and $F_{start}$ are similar matrices. According to the formation transformation evaluation function obtained above, the value of the evaluation function is minimal when $F_{end}$ and $F_{start}$ are similar matrices. Therefore, under otherwise equal conditions, the formation scaling transformation is the optimal transformation strategy compared with the formation structural transformation.
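A scaling transformation that preserves the geometry can be sketched as multiplying the three offset rows of the formation matrix by one common factor while leaving the row of UAV numbers unchanged. The helper names and the factor `s` are hypothetical; how `s` is derived from the formation transformation factors follows the patent's formula and is simply taken as an input here.

```python
def scale_formation(F_start, s):
    """Scale the x, y, z offset rows of a 4 x n formation matrix by s > 0.

    The first row (UAV numbers) is copied unchanged, so every UAV keeps
    its slot and only the spacing of the formation changes.
    """
    if s <= 0:
        raise ValueError("scale factor must be positive")
    ids, xs, ys, zs = F_start
    return [list(ids),
            [s * x for x in xs],
            [s * y for y in ys],
            [s * z for z in zs]]

def formation_width(F):
    """Lateral extent of the formation, taken here as the spread of the y row."""
    return max(F[2]) - min(F[2])

F_start = [[1, 2, 3],
           [0.0, -8.0, -8.0],
           [0.0, 4.0, -4.0],
           [0.0, 0.0, 0.0]]
F_end = scale_formation(F_start, 0.5)
```

Because every offset is multiplied by the same factor, the ratios between offsets, and hence the shape, are unchanged, while the formation width shrinks in proportion to `s`.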

Further, the optimal transformation strategy of the dynamic formation based on obstacle avoidance is as follows:

When the UAV cluster compares its own position data with the known environment information and finds that an obstacle or threat area exists within its safe distance, it first calculates the formation transformation factor of the current formation and then decides according to the value of the factor, as illustrated in Fig. 3:

(1) If the formation transformation factor ω > 1, the distance between the obstacles is larger than the formation width of the UAV cluster, and the cluster can pass through the obstacle region without any formation transformation.

(2) If the formation transformation factor satisfies ω_min < ω < 1, the formation scaling transformation is performed: the spacing between UAVs is shortened while the geometric configuration is kept unchanged, so that the formation can pass through the obstacle region. The leader UAV (navigator) of the cluster determines the final formation matrix F_end from the formation transformation factor and sends it to the other UAVs, which then carry out the scaling. Because the formation width already accounts for the minimum safe distance between the UAV cluster and the obstacle region, adjusting the width of the transformed formation to the maximum passable distance of the obstacle region ensures that the cluster passes through safely. The formation matrix after the transformation is:

where D_Hstart is the formation width of the UAV cluster before the formation transformation.

(3) If the formation transformation factor ω < ω_min, the formation cannot pass through the obstacle region even if the spacing is shortened to the minimum safe spacing that maintains the current geometry, so a structural transformation of the formation is required. The formation library is traversed, the formation transformation evaluation function is calculated for each candidate geometric formation, and the geometric formation with the minimum evaluation function value is selected as the transformed formation F_end, after which the formation structural transformation is performed.
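The three-way decision above can be sketched in Python. This is a minimal illustration rather than the patented implementation. It assumes that the factor is the ratio ω = Z / D_F (passable width over formation width), which is consistent with case (1) and with the widths reported in the simulation section. The function name `choose_transformation` and the handling of the boundary values ω = 1 and ω = ω_min are hypothetical.

```python
def choose_transformation(gap_width, formation_width, omega_min):
    """Return (omega, decision) for a formation facing an obstacle gap.

    Assumption: omega = gap_width / formation_width (Z / D_F).
    Boundary values (omega == 1, omega == omega_min) are not specified in
    the text; here they fall into the less disruptive branch.
    """
    if formation_width <= 0:
        raise ValueError("formation_width must be positive")
    omega = gap_width / formation_width
    if omega >= 1.0:
        decision = "none"        # case (1): pass through unchanged
    elif omega >= omega_min:
        decision = "scaling"     # case (2): shrink spacing, keep geometry
    else:
        decision = "structural"  # case (3): pick a new geometry from the library
    return omega, decision


# Widths taken from the simulation section: current width 109.9 m,
# minimum width of the same geometry 46.9 m.
omega_min = 46.9 / 109.9
print(choose_transformation(72.2, 109.9, omega_min))
print(choose_transformation(45.7, 109.9, omega_min))
```

With these numbers the 72.2 m gap falls into the scaling branch and the 45.7 m gap into the structural branch.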

In particular, when the UAV cluster encounters an obstacle that it cannot bypass on a single side, the cluster can be divided into two sub-formations according to a splitting strategy. The two sub-formations bypass the obstacle on either side and restore the original formation after passing it.

Whether to split the formation is determined by the relative position of the formation and the most threatening obstacle. Fig. 4 is a schematic diagram of the formation splitting strategy, which depends mainly on two factors. The first is the current positions of all UAVs: a straight line through the obstacle center p_obs, along the direction of the leader's currently desired relative velocity, preliminarily divides the formation into two parts. The second is that each UAV's desired relative velocity is taken to be the desired relative velocity of the leader UAV; the UAVs in the part that does not contain the leader after the preliminary division are then judged according to the following formula:

where the two distances in the formula are the distance between UAV i and the center of the obstacle and the distance between the leader and the center of the obstacle circle.

The UAVs meeting the conditions are then finally separated from the current formation to form a sub-formation. The two sub-formations will detour from both sides of the two obstacles respectively and, after both sub-formations have passed the current obstacle, reassemble into a new formation.
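The preliminary division step can be illustrated as follows. This sketch covers only the first factor (the line through the obstacle center along the leader's desired relative velocity); the distance-based judgment formula is not reproduced in the text, so it is left out rather than guessed. The function name, the argument layout and the sample coordinates are hypothetical.

```python
def preliminary_split(positions, leader_index, p_obs, v_leader):
    """Split UAV indices by which side of a line they lie on.

    The line passes through the obstacle center p_obs along the leader's
    desired relative velocity v_leader. A UAV exactly on the line is
    assigned to the non-negative side (an arbitrary choice).
    """
    def side(p):
        # Sign of the 2D cross product v_leader x (p - p_obs).
        cross = v_leader[0] * (p[1] - p_obs[1]) - v_leader[1] * (p[0] - p_obs[0])
        return cross >= 0.0

    leader_side = side(positions[leader_index])
    with_leader, candidates = [], []
    for i, p in enumerate(positions):
        (with_leader if side(p) == leader_side else candidates).append(i)
    return with_leader, candidates


# Hypothetical layout: obstacle at (0, 10), leader (index 0) flying along +y.
uavs = [(-5.0, 0.0), (-15.0, -5.0), (5.0, -5.0), (15.0, -10.0)]
print(preliminary_split(uavs, 0, (0.0, 10.0), (0.0, 1.0)))
```

The second list holds the UAVs that would still have to satisfy the distance condition before being separated into a sub-formation.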

Further, the optimal transformation strategy employed in step S5 is generated using the MADDPG algorithm. In a multi-UAV environment, each UAV must learn continuously to obtain its optimal policy, so from the viewpoint of any single UAV the environment is non-stationary, because the policies of the other UAVs keep changing as they learn. If each UAV uses an independent reinforcement learning algorithm and optimizes its policy through a local action-value function or value function, this non-stationarity can make the policies difficult to converge. MADDPG is a reinforcement learning algorithm for multi-agent environments that was proposed to address these problems.

The MADDPG algorithm builds on Actor-Critic and DDPG and follows the principle of centralized training and decentralized execution, which makes it suitable for complex multi-agent environments that traditional reinforcement learning algorithms cannot handle. Whereas a traditional reinforcement learning algorithm must use the same information for both learning and application, MADDPG allows additional information (i.e., global information) to be used during learning while only local information is used when making decisions in application. In contrast to the traditional Actor-Critic setting, the MADDPG environment contains n agents; the policy of the ith agent is denoted π_i and its policy parameters θ_i, so the joint policy set of the n agents is π = {π_1, π_2, …, π_n} and the policy parameter set is θ = {θ_1, θ_2, …, θ_n}. The core idea is to search for the optimal joint policy within the framework of centralized training and decentralized execution, which addresses both the non-stationarity of the multi-agent reinforcement learning environment and the resulting failure of experience replay.

The experience pool of the MADDPG algorithm stores transitions made up of four elements: the set of observations of all UAVs at time t, the set of actions of all UAVs at time t, the rewards obtained by all UAVs after performing their respective actions at time t, and the set of observations of all UAVs at time t+1.
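A minimal sketch of such an experience pool is given below. It assumes a single joint buffer with a fixed capacity and uniform random sampling; the class name `JointReplayBuffer`, the capacity and the toy transitions are hypothetical and only illustrate the four-element layout described above.

```python
import random
from collections import deque


class JointReplayBuffer:
    """Fixed-size pool of joint transitions for all UAVs."""

    def __init__(self, capacity):
        self._data = deque(maxlen=capacity)  # oldest entries drop out first

    def add(self, obs, actions, rewards, next_obs):
        # Each argument is a list with one entry per UAV; obs, actions and
        # rewards belong to time t, next_obs to time t + 1.
        self._data.append((obs, actions, rewards, next_obs))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement, as in plain experience replay.
        return rng.sample(list(self._data), batch_size)

    def __len__(self):
        return len(self._data)


buffer = JointReplayBuffer(capacity=3)
for t in range(5):
    buffer.add([t, t], [0.1, 0.2], [-1.0, -1.0], [t + 1, t + 1])
print(len(buffer), buffer.sample(2))
```

Because the capacity is 3, only the transitions from the last three time steps remain available for sampling.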

Centralized training and decentralized execution means that training is carried out centrally, while the optimal policy obtained through training can, when applied, output the optimal action using only the UAV's own observation, i.e., local information. During centralized training, additional information is added on top of the basic DDPG algorithm to obtain a more accurate Q value, which is fed back to the Actor network. This additional information can be the states and actions of the other UAVs: each UAV evaluates the action output by its current Actor network according to its own observation and action together with the actions of the other UAVs. The Q value is calculated as:

where θ^Q denotes the parameters of the online Critic network, and Q(s_t, a_1, a_2, …, a_n | θ^Q) is a centralized state-action function that covers not only the UAV's own observed state and executed action but also the actions of the other UAVs (a_1, a_2, …, a_n). The reward obtained from the environment after the action is executed in state s_t is added to Q.

The online Critic networks of all UAVs receive the same input and are updated by minimizing a loss function, which is equivalent to building a single centralized Critic network. The loss function is:

where N is the number of training iterations and Q is the centralized state-action function described above, which covers not only the UAV's own observed state and executed action but also the actions of the other UAVs.

Thus, the Critic network of each UAV knows not only the changes of its own UAV but also the action policies of all other UAVs. In this case the environment can be regarded as stationary even though the policies are constantly being updated, because the following equation still holds even when π_i ≠ π′_i. This is why MADDPG can cope with environmental non-stationarity.

where P is the dynamics model of the multi-agent system, π′_i denotes a policy different from π_i, and i = 1, 2, …, n.

The policy gradient used to update the online Actor network is:

where N is the number of training iterations, Q(s, a | θ^Q) is the online Critic network, μ(s | θ^μ) is the online Actor network with parameters θ^μ, ∇ denotes the gradient operator, μ(s_i) is the output of the online Actor network in state s_i, and J denotes the policy objective whose gradient is taken.

Decentralized execution means that, after training is complete, each Actor can take an appropriate action based on its own observation alone, without needing the actions of the other UAVs. The Actor and Critic networks in MADDPG work together: each UAV uses an independent Actor to output a deterministic action, while the input of its Critic network comprises the UAV's own observed state and action as well as the action information of the other UAVs. Each UAV thus corresponds to a centralized Critic network that simultaneously receives the data generated by the Actor networks of all UAVs.
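The difference between what an Actor sees and what a Critic sees can be made concrete with a small sketch. It assumes the common MADDPG layout in which the centralized Critic input is the concatenation of all observations and all actions; the function names and the dimensions (3 UAVs, 4-dimensional observations, 2-dimensional actions) are hypothetical.

```python
def actor_input(observations, i):
    """Decentralized execution: actor i sees only its own observation."""
    return list(observations[i])


def critic_input(observations, actions):
    """Centralized training: the critic sees every observation and action."""
    joint = []
    for obs in observations:
        joint.extend(obs)
    for act in actions:
        joint.extend(act)
    return joint


# Hypothetical sizes: 3 UAVs, 4-dimensional observations, 2-dimensional actions.
observations = [[0.0] * 4 for _ in range(3)]
actions = [[0.0] * 2 for _ in range(3)]
print(len(actor_input(observations, 0)))         # 4
print(len(critic_input(observations, actions)))  # 18
```

The Critic input grows with the number of UAVs, whereas each Actor input stays fixed, which is why only the Actors are needed at execution time.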

The MADDPG algorithm framework is shown in Fig. 5. For a single UAV, its state is first fed into its own policy network, and the resulting action is output and applied to the environment; a new state and a reward are then obtained, and the state transition is stored in the UAV's own experience pool. All UAVs interact with the environment continuously, generating data and storing it in their respective experience pools. When the networks are updated, a batch of data from the same time steps is randomly drawn from the experience pool of each UAV and concatenated to form a new experience (S, A, S′, R). Here S and S′ are the observations of all UAVs at the same time step, S being the state information observed from the environment at a given time and S′ the state information observed after executing action a; A is the set of actions taken by all UAVs at that time step; and R is the reward of the ith UAV. Finally, the observation S′ is fed into the target Actor network of the ith UAV to obtain the action A′, and A′ and S′ are then fed together into the target Critic network of the ith UAV to obtain the target Q value estimated for the next time step. The target Q value at the current time step is calculated as follows.

y_i = r_i + γQ′(s_{i+1}, μ′(s_{i+1} | θ^μ′) | θ^Q′)

where μ′ = [μ′_1, μ′_2, …, μ′_n] denotes the Actor networks of the UAVs after performing action a, Q′ is the Q value after performing action a, θ^μ′ denotes the target Actor network parameters, θ^Q′ denotes the target Critic network parameters, and γ is the discount factor, which expresses how much weight the current feedback carries: the further away in time a reward is, the smaller its influence.
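The target computation can be checked numerically with a short sketch. The first function implements the target formula given above. The second assumes the usual DDPG/MADDPG mean squared error between the targets and the online Critic outputs, which is an assumption here because the loss formula itself is not reproduced in the text. The numbers stand in for network outputs and are hypothetical.

```python
def td_targets(rewards, next_q_values, gamma):
    """y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1})) for each sample i."""
    return [r + gamma * q_next for r, q_next in zip(rewards, next_q_values)]


def critic_loss(targets, q_values):
    """Mean squared error between targets y_i and online Critic outputs.

    Assumption: the usual form (1/N) * sum_i (y_i - Q_i)^2.
    """
    n = len(targets)
    return sum((y - q) ** 2 for y, q in zip(targets, q_values)) / n


# Toy numbers standing in for network outputs.
rewards = [1.0, 0.0, -1.0]
next_q = [2.0, 4.0, 0.0]     # target Critic evaluated at (s', mu'(s'))
online_q = [2.5, 3.8, -1.0]  # online Critic evaluated at (s, a)
y = td_targets(rewards, next_q, gamma=0.95)
print(y, critic_loss(y, online_q))
```

Only the first sample contributes to the loss in this toy batch, since the other two targets coincide with the online estimates.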

S6, realizing safe flight of the multi-UAV formation in the obstacle environment based on the optimal transformation strategy. Building on step S5, the UAVs complete the learning task through reinforcement learning under the guidance of a reward function, which directs each UAV to interact with the environment according to the task requirements and tells it which aspects of the environment to prioritize when learning. Choosing a reward function usually requires extensive experimentation, and an ill-formed reward function can cause unexpected problems and lead the UAV to learn an undesirable solution. A multi-UAV learning environment faces more complex situations than a single-UAV one, so the reward functions must be tried and selected even more carefully for the UAVs to fulfill the task requirements. In general, the reward function for multi-UAV reinforcement learning considers both the behavior of each UAV itself and the cooperative behavior between UAVs, the latter also being expressed through the reward function.

The invention studies UAV cluster path planning in a two-dimensional dynamic obstacle environment. The cluster contains n UAVs, each with its own state S_uavi, which comprises its velocity vector at the current time and its position coordinates in the environment. The environmental state S_env comprises the distance and velocity information (d, ψ_d, θ_d, v, ψ_v, θ_v) of all j dynamic and static obstacles in the environment relative to the UAV, where d is the Euclidean distance of the obstacle relative to the UAV, ψ_d is the heading angle of the relative distance, θ_d is the climb angle of the relative distance, v is the speed of the obstacle relative to the UAV, ψ_v is the heading angle of the relative velocity, and θ_v is the climb angle of the relative velocity.

In the MADDPG algorithm, the state of each UAV includes its own state, the states of the other UAVs, and the environmental state. The state of UAV 1 at time t can be defined as:

the final network inputs for each UAV are:

The way different multi-UAV formation models are trained depends primarily on differences in their reward functions. The reward function R consists of four parts. The first guides the UAV to fly with a stable attitude and speed and is denoted R_single. The second guides the UAV to fly cooperatively with the other UAVs and maintain a certain formation distance and is denoted R_form. The third guides the UAV to avoid obstacles and is denoted R_obstacle. The fourth guides the UAV to determine the least costly formation transformation and is denoted R(F_start, F_end). The reward functions are as follows:

R_form = -|d_ij - d_safe|

R(F_start, F_end) = H(F_start, F_end)·α

R = α_s R_single + α_f R_form + α_o R_obstacle + α_FF R(F_start, F_end)

where V = [v, ψ, θ], d_ij is the distance between the ith UAV and the jth UAV, d_safe is the safe formation distance, dist^ij is the distance between the ith UAV and the jth obstacle, d_det is the detection distance of the UAV, and α_s, α_f, α_o and α_FF are constants satisfying α_s + α_f + α_o + α_FF = 1.
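The weighted combination can be sketched as follows. Only R_form and the weighted sum are taken from the formulas above. The expressions for R_single and R_obstacle are not reproduced in the text, so the sketch accepts them as precomputed values. The function names, weights and sample values are hypothetical.

```python
def formation_reward(d_ij, d_safe):
    """R_form = -|d_ij - d_safe|: zero when the spacing equals d_safe."""
    return -abs(d_ij - d_safe)


def total_reward(r_single, r_form, r_obstacle, r_transform, weights):
    """R = a_s*R_single + a_f*R_form + a_o*R_obstacle + a_FF*R(F_start, F_end)."""
    a_s, a_f, a_o, a_ff = weights
    if abs(a_s + a_f + a_o + a_ff - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return a_s * r_single + a_f * r_form + a_o * r_obstacle + a_ff * r_transform


# Hypothetical weights and component values.
weights = (0.25, 0.25, 0.25, 0.25)
r_form = formation_reward(d_ij=34.0, d_safe=30.0)        # -4.0
print(total_reward(-1.0, r_form, -2.0, -3.0, weights))  # -2.5
```

Checking that the weights sum to 1 keeps the total on the same scale as its components, in line with the constraint stated above.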

In addition, in the process of performing the path planning method for multiple unmanned aerial vehicles provided by the invention, UAV cluster problem description is also required, which specifically includes:

Step 1, task modeling

Assume that a multi-UAV formation consisting of n UAVs flies through a complex obstacle environment in some initial formation. After receiving a formation transformation instruction in flight, the formation must complete the transformation in the shortest possible time while selecting the optimal position for each UAV. For convenience, the following assumptions are made:

(1) Within the UAV formation, each UAV can obtain the position and heading of the other UAVs in real time through sensors.

(2) Communication delay and packet loss are not considered when the formation transmits information.

(3) The aerodynamic impact between the UAVs during formation transformation is not considered.

Step 2, motion model

Each UAV in the formation transformation problem is modeled as a point mass (particle) whose motion is controlled through its acceleration and heading angle. The equation of motion of the UAV can be expressed as:

where i = 1, 2, …, n, n is the number of UAVs, v_i is the speed of the ith UAV in the XOY plane, ψ is the heading angle of the UAV, a is the acceleration of the UAV, x_i and y_i are the positions of the ith UAV in the x and y directions, and the time derivatives of x_i and y_i are the velocities of the ith UAV in the x and y directions.

Given the saturation constraints on the control inputs, the acceleration a and heading angle ψ of the UAV satisfy the following conditions:

where the specific acceleration constraint parameters depend on the model and flight parameters of the UAV.
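A particle model of this kind can be simulated with a few lines of Python. Because the equations and the constraint bounds are not reproduced in the text, the sketch assumes the standard planar point-mass kinematics (dx/dt = v·cos ψ, dy/dt = v·sin ψ, dv/dt = a), symmetric saturation limits and explicit Euler integration. The function names and the limit values are hypothetical.

```python
import math


def clamp(value, low, high):
    """Saturate value to the interval [low, high]."""
    return max(low, min(high, value))


def step(state, accel_cmd, heading_cmd, dt, a_max, psi_max):
    """Advance one UAV by one explicit Euler step.

    state is (x, y, v). Assumed kinematics: dx/dt = v*cos(psi),
    dy/dt = v*sin(psi), dv/dt = a, with both inputs saturated.
    """
    x, y, v = state
    a = clamp(accel_cmd, -a_max, a_max)
    psi = clamp(heading_cmd, -psi_max, psi_max)
    x_new = x + v * math.cos(psi) * dt
    y_new = y + v * math.sin(psi) * dt
    v_new = v + a * dt
    return (x_new, y_new, v_new)


# Hypothetical limits; 0.3 s matches the sampling period used in the simulation.
state = (0.0, 0.0, 10.0)
state = step(state, accel_cmd=5.0, heading_cmd=0.0, dt=0.3, a_max=2.0, psi_max=math.pi)
print(state)
```

In this call the commanded acceleration of 5.0 is saturated to 2.0 before it is applied, so the speed rises by 0.6 rather than 1.5 over the step.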

The obstacle avoidance performance of the proposed UAV cluster formation transformation strategy is tested in a simulation environment, so as to explain the effectiveness of the multi-unmanned aerial vehicle path planning method.

The simulation experiments were run with Python 3.7 on an Intel Core i5 processor with a 2.42 GHz clock frequency under the Windows 10 operating system. The test scenario is shown in Fig. 6, and the simulation environment is set to 500 m × 500 m.

Four UAVs are used in the experiment. Starting from initial positions (57.30 m, 44.04 m), (11.82 m, 21.95 m), (263.83 m, 29.89 m) and (91.10 m, 12.34 m), they gradually form the target polygonal formation in flight and hold it until an obstacle is detected. Following the optimal formation transformation strategy designed in step S5, the formation first performs the scaling transformation, then bypasses the obstacle on both sides according to the splitting strategy, completing the structural transformation, and, after passing safely through the obstacle region, restores the original polygonal formation and flies on to the end point.

According to the formation transformation evaluation criteria established in step S3, and in order to keep the formation as stable as possible, the formation structure difference degree and the formation transformation time cost are set to 0.4, 0.4 and 0.2, respectively. At 80.4 s of simulation, UAV 2 detects that it is only 26.1 m from obstacle 1, whereas the safe distance set for the UAV formation is 30 m, so the formation must be adjusted. The three obstacles in the simulation environment (obstacle 1, obstacle 2 and obstacle 3) are located at (84 m, 180 m), (200 m, 220 m) and (110 m, 280 m), with diameters of 36 m, 65 m and 60 m, respectively. The maximum passable width between obstacle 1 and obstacle 2 is 72.2 m, while the currently maintained square formation has a width of 109.9 m, so the formation cannot pass through the obstacle region directly. Calculation shows that, at the minimum formation transformation factor of the current formation, the formation width can be reduced to 46.9 m; the formation scaling parameter b can therefore be adjusted so that the formation passes safely at this point without a structural transformation. However, the maximum passable width between obstacle 1 and obstacle 3 is 55.3 m and that between obstacle 2 and obstacle 3 is only 45.7 m, which is smaller than the minimum formation width of the current quadrilateral formation. A formation structural transformation is therefore required, splitting the original formation into two sub-formations that bypass obstacle 3 on either side. At 125.4 s, the UAV formation leaves the obstacle region and restores the original polygonal formation. The UAV flight trajectories are shown in Fig. 6.
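The passable widths quoted above can be reproduced from the obstacle data. The sketch below assumes that the obstacles are circles, that the given coordinates are their centers, and that the passable width is the center distance minus both radii; under these assumptions the computed gaps round to the quoted 72.2 m, 55.3 m and 45.7 m. The function and variable names are hypothetical.

```python
import math

# Obstacle positions (m) and diameters (m) as reported for the simulation.
obstacles = {
    1: ((84.0, 180.0), 36.0),
    2: ((200.0, 220.0), 65.0),
    3: ((110.0, 280.0), 60.0),
}


def passable_width(a, b):
    """Gap between two circular obstacles: center distance minus both radii."""
    (xa, ya), da = obstacles[a]
    (xb, yb), db = obstacles[b]
    return math.hypot(xb - xa, yb - ya) - da / 2.0 - db / 2.0


MIN_WIDTH = 46.9  # minimum width of the current geometry, from the text
for pair in [(1, 2), (1, 3), (2, 3)]:
    gap = passable_width(*pair)
    verdict = "scaling suffices" if gap >= MIN_WIDTH else "too narrow"
    print(pair, round(gap, 1), verdict)
```

Only the gap between obstacle 2 and obstacle 3 falls below the 46.9 m minimum width, which matches the stated need for a structural transformation around obstacle 3.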

The simulation uses a sampling period of 300 ms. Figs. 7-10 show the position information of each UAV in the formation at each sampling time: Fig. 7 plots the distances between the UAVs, and Figs. 8-10 plot the distances between each UAV and the three obstacles, respectively. As Figs. 7-10 show, the UAVs begin to form the quadrilateral formation at time T_1. At time T_2, after an obstacle is detected within the safe distance, the formation scaling transformation is performed and the spacing between UAVs decreases accordingly. During period T_4, the formation is split and the four UAVs pass on both sides of the obstacle; once the formation has cleared the obstacle region, it reassembles and restores the original polygonal formation.

In addition, corresponding to the multi-unmanned aerial vehicle path planning method provided by the invention, the invention also provides the following implementation structure:

A multi-unmanned aerial vehicle path planning system is used to implement the multi-unmanned aerial vehicle path planning method described above. The system comprises:

a formation library construction module, used for describing the geometric formation of the UAV cluster by means of a formation matrix and establishing an adaptive formation library based on the described geometric formations;

a dynamic transformation determining module, used for determining the formation dynamic transformation under the obstacle avoidance condition based on the adaptive formation library;

an evaluation criterion establishing module, used for establishing formation transformation evaluation criteria for the environmental constraints and the UAV cluster track constraints based on the formation dynamic transformation;

an evaluation function determining module, used for constructing a formation transformation evaluation function based on the formation transformation evaluation criteria;

an optimal transformation strategy selection module, used for selecting the optimal formation transformation strategy under the obstacle avoidance condition based on the formation transformation evaluation function, wherein the optimal transformation strategy is generated by the MADDPG algorithm, an improved algorithm based on the Actor-Critic algorithm and the DDPG algorithm; and

a safe flight control module, used for realizing safe flight of the multi-UAV formation in the obstacle environment based on the optimal transformation strategy.

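An electronic device, comprising:

a memory for storing a control program; and

a processor, connected to the memory, for retrieving and executing the control program to implement the multi-unmanned aerial vehicle path planning method described above.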

In this specification, the embodiments are described progressively: each embodiment focuses on its differences from the others, and for the parts that are identical or similar, the embodiments may be referred to one another. Because the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant details can be found in the description of the method.

The principles and embodiments of the present invention have been described herein with reference to specific examples, which are intended only to assist in understanding the method of the present invention and its core ideas. Those of ordinary skill in the art may, in light of these ideas, make changes to the specific embodiments and the scope of application. In view of the foregoing, this description should not be construed as limiting the invention.

Claims

1. A method for path planning for a multi-unmanned aerial vehicle, comprising:

2. The method of claim 1, wherein the formation matrix is F:

3. The multi-unmanned aerial vehicle path planning method of claim 1, wherein the establishing the formation transformation evaluation criteria for the environmental constraint and the unmanned aerial vehicle cluster track constraint based on the formation dynamic transformation specifically comprises:

constructing constraint conditions;

4. A method of path planning for a multi-unmanned aerial vehicle as claimed in claim 3, wherein the formation transformation evaluation function is:

R(F_start, F_end) = H(F_start, F_end)·α;

where R(F_start, F_end) is the result of one formation transformation evaluation, H(F_start, F_end) is the evaluation vector of one formation transformation, α is a column vector, F_start is the current formation matrix, and F_end is the formation matrix after the transformation.

5. The method for planning a path for a plurality of unmanned aerial vehicles according to claim 1, wherein the selecting an optimal transformation strategy for formation in case of obstacle avoidance based on the formation transformation evaluation function specifically comprises:

6. The method for planning a path for a plurality of unmanned aerial vehicles according to claim 5, wherein the selecting an optimal transformation strategy according to the relationship between the formation transformation factor and a preset transformation threshold range specifically comprises:

7. The method of claim 6, wherein the UAV cluster performs a formation telescoping transformation by:

8. The method of claim 6, wherein the UAV cluster performs a formation structural transformation by:

9. A multiple unmanned aerial vehicle path planning system, comprising:

10. An electronic device, comprising:

a memory for storing a control program;

a processor, connected to the memory, for retrieving and executing the control program to implement the multi-unmanned aerial vehicle path planning method according to any one of claims 1-8.