CN115494866A

CN115494866A - Multi-unmanned aerial vehicle global and local path intelligent planning method and system

Info

Publication number: CN115494866A
Application number: CN202211160137.4A
Authority: CN
Inventors: 谢拥军; 贾培; 谭川; 王学慧; 刘莹
Original assignee: Zhuhai Anqing Technology Co ltd
Current assignee: Zhuhai Anqing Technology Co ltd
Priority date: 2022-09-22
Filing date: 2022-09-22
Publication date: 2022-12-20

Abstract

The invention discloses a method and a system for intelligently planning global and local paths of multiple unmanned aerial vehicles. The method comprises: constructing a grid map, obtaining a global path planning model by using an improved bidirectional search A* algorithm, and planning the global path; acquiring obstacle motion state information in the environment through a millimeter-wave radar, and acquiring flight state information of the unmanned aerial vehicle; judging whether a flight conflict exists, designing a reward function, and generating a corresponding obstacle avoidance strategy; acquiring a state space and an action space, performing obstacle avoidance training with the deep deterministic policy gradient (DDPG) algorithm, controlling the action of the unmanned aerial vehicle over a continuous action space to avoid obstacles, obtaining a local path planning model, and tracking the global navigation path. Each unmanned aerial vehicle plans its path using the global path planning model and the local path planning model together. The invention can improve the trajectory planning and obstacle avoidance efficiency of a multi-unmanned-aerial-vehicle system, achieve control of the continuous action output of multiple unmanned aerial vehicles in a dynamic, unknown, complex environment, and meet all-weather working conditions.

Description

Multi-unmanned aerial vehicle global and local path intelligent planning method and system

Technical Field

The invention relates to the technical field of unmanned aerial vehicles, in particular to a method and a system for intelligently planning global and local paths of multiple unmanned aerial vehicles.

Background

With the continuous progress of science and technology, unmanned aerial vehicles are widely applied in fields such as military affairs, agriculture, transportation, and public management by virtue of their strong maneuverability, low cost, and convenient operation. Such applications generally involve multiple unmanned aerial vehicles completing tasks jointly to improve efficiency. Because static and dynamic obstacles exist in the air, the unmanned aerial vehicles need strong global path planning capability and local dynamic obstacle avoidance capability to complete their designated tasks, and research on global path planning and local dynamic obstacle avoidance for multiple unmanned aerial vehicles is one of the important technologies in the field.

Traditional global path planning algorithms mainly include the A* search algorithm, the particle swarm algorithm, the rapidly-exploring random tree (RRT) algorithm, and the genetic algorithm. Local dynamic obstacle avoidance methods mainly include obstacle avoidance algorithms based on conductivity, on the velocity obstacle method, and on the artificial potential field. However, these traditional obstacle avoidance methods are severely limited and are not suitable for complex, dynamic, unknown environments. One direction that has attracted attention in recent years is intelligent obstacle avoidance combining deep learning and reinforcement learning, but such methods suffer from problems such as difficult convergence and poor real-time performance.

Therefore, how to provide an intelligent planning method and system for global and local paths of multiple unmanned aerial vehicles is a problem that needs to be solved urgently by those skilled in the art.

Disclosure of Invention

In view of this, the invention provides a method and a system for intelligently planning global and local paths of multiple unmanned aerial vehicles, which can improve the track planning and obstacle avoidance efficiency of a multiple unmanned aerial vehicle system, realize the control of continuous action output of the multiple unmanned aerial vehicles in a dynamic unknown complex environment, and meet all-weather working conditions.

In order to achieve the purpose, the invention adopts the following technical scheme:

A multi-unmanned aerial vehicle global and local path intelligent planning method comprises the following steps:

S1, constructing a grid map according to the starting point and the target point of each unmanned aerial vehicle navigation task, obtaining a global path planning model by using an improved bidirectional search A* algorithm, and performing global path planning to obtain a reference global navigation path of each unmanned aerial vehicle;

S2, acquiring obstacle motion state information in the surrounding environment of the unmanned aerial vehicle through a millimeter-wave radar, and acquiring flight state information of the unmanned aerial vehicle;

S3, judging whether to plan a local obstacle avoidance path according to the relative position relation between the unmanned aerial vehicle and the obstacle;

S4, judging whether the unmanned aerial vehicle and the obstacle generate a flight conflict according to the obstacle motion state information and the unmanned aerial vehicle flight state information, designing a reward function according to the flight conflict, and generating a corresponding local obstacle avoidance strategy;

S5, acquiring a state space and an action space, performing obstacle avoidance training with the deep deterministic policy gradient (DDPG) algorithm, controlling the action of the unmanned aerial vehicle over a continuous action space to avoid obstacles, obtaining a local path planning model, and tracking the global navigation path;

S6, simultaneously using the global path planning model and the local path planning model to plan paths for each unmanned aerial vehicle in the unmanned aerial vehicle cluster.

Preferably, the improved bidirectional search A* algorithm in S1 includes:

S11, optimizing the path planning;

specifically: searching for the shortest route to the destination by adjusting the heuristic function of the A* algorithm;

S12, smoothing the flight path;

specifically: smoothing the navigation path with a quasi-uniform B-spline curve, wherein the quasi-uniform B-spline curve is a piecewise continuous polynomial.

Preferably, the obstacle motion state information includes a distance between the obstacle and the unmanned aerial vehicle, and position information and motion speed information of the obstacle relative to the unmanned aerial vehicle; the flight state information of the unmanned aerial vehicle comprises unmanned aerial vehicle position information and flight speed information.

Preferably, when the speed direction of the unmanned aerial vehicle relative to the obstacle and the position direction of the obstacle relative to the unmanned aerial vehicle are detected to be in the same quadrant, local path planning is carried out;

Preferably, the specific method for judging whether the unmanned aerial vehicle and the obstacle generate a flight conflict in S4 is as follows:

the unmanned aerial vehicle is simplified into a particle $A$, and the obstacle is regarded as an obstacle circle $O$ with a safe radius; the distance between the two points $A$ and $O$ is $d_i$, the line segment $OA$ lies on the straight line $l$, and the relative velocity vector of the unmanned aerial vehicle and the obstacle is $v_{uioi}$:

$$v_{uioi} = v_{ui} - v_{oi}$$

wherein $v_{ui}$ is the velocity vector of the unmanned aerial vehicle and $v_{oi}$ is the velocity vector of the obstacle;

the tangent lines $l_1$ and $l_2$ passing through the point $A$ and tangent to the obstacle circle are obtained; the included angle between the straight line $l$ and the tangent line $l_1$ or $l_2$ is $\alpha_i$, the included angle between the relative velocity vector $v_{uioi}$ and the straight line $l$ is $\beta_i$, and the radius of the obstacle circle is $R_i$; when $\beta_i \le \alpha_i$, a flight conflict exists between the unmanned aerial vehicle and the obstacle; otherwise, when $\beta_i > \alpha_i$, there is no flight conflict;

Preferably, the state space after the interaction between the unmanned aerial vehicle and the environment in S5 consists of the following quantities: $u_i$, the real-time position of the $i$-th unmanned aerial vehicle; $v_{ui}$, the real-time velocity vector of the $i$-th unmanned aerial vehicle; the real-time heading angle of the $i$-th unmanned aerial vehicle; $\gamma_i$, the real-time pitch angle of the unmanned aerial vehicle; $o$, the real-time position vector of the obstacle detected by the millimeter-wave radar; $v_o$, the real-time velocity vector of the obstacle detected by the millimeter-wave radar; $d$, the real-time distance vector between the obstacle and the millimeter-wave radar; $p$, the position of the real-time sub-target point on the global path; and $g$, the position of the end point of the global path.

Preferably, the action space after the interaction between the unmanned aerial vehicle and the environment in S5 consists of: $v_{ui}$, the real-time velocity vector of the unmanned aerial vehicle; the real-time heading angle of the unmanned aerial vehicle; and $\gamma_i$, the real-time pitch angle of the unmanned aerial vehicle.

The variation ranges of $v_{ui}$ and $\gamma_i$ are:

$$v_{ui} \in [v_{\min}, v_{\max}] \cap [v - v_a \Delta t,\ v + v_b \Delta t]$$

$$\gamma_i \in [0, \pi] \cap [\gamma - \gamma_a \Delta t,\ \gamma + \gamma_b \Delta t]$$

wherein $v_{\min}$ and $v_{\max}$ are the minimum and maximum speeds of the unmanned aerial vehicle, $v_a$ and $v_b$ are the maximum deceleration and the maximum acceleration of the unmanned aerial vehicle in the direction of travel, and $\gamma_a$ and $\gamma_b$ are the maximum deceleration and the maximum acceleration of the unmanned aerial vehicle in pitch angle. The heading angle has an analogous variation range, governed by the maximum deceleration and the maximum acceleration of the unmanned aerial vehicle in heading angle.

Preferably, the reward function is:

$$R(s, a) = R_1(t) + R_2(t) + R_3(t) + R_4(t)$$

wherein $R_1(t)$ is the reward generated by the change in distance between the unmanned aerial vehicle and the obstacle: within the safe distance, the closer the unmanned aerial vehicle is to the obstacle, the smaller the reward; outside the safe distance, the reward is a positive constant;

$R_2(t)$ is the reward generated by the speed change when the unmanned aerial vehicle detects that the obstacle threatens its flight: when a flight conflict exists between the unmanned aerial vehicle and the obstacle, a negative reward is obtained; otherwise, when no flight conflict exists, a positive reward is obtained;

$R_3(t)$ is the reward generated by the change in distance between the unmanned aerial vehicle and the navigation task target point: the closer the distance, the larger the reward;

$R_4(t)$ is the reward generated by the change in distance between the unmanned aerial vehicle and the temporary sub-target point: the closer the distance, the larger the reward.

Preferably, the specific content of performing obstacle avoidance training with the deep deterministic policy gradient algorithm in S5 includes:

the algorithm comprises a policy-based policy network, a target policy network, a value-based value network, and a target value network;

the unmanned aerial vehicle obtains, through the policy network, the action $a_t$ for the current state $s_t$ and interacts with the environment to obtain the next state $s_{t+1}$, wherein the values of the state and the action satisfy the constraints of the state space and the action space, and the current reward $r_t$ and the reward $r_{t+1}$ in the next state are calculated through the reward function;

the tuple $(s_t, a_t, r_{t+1}, s_{t+1})$ is saved in an experience pool, and when the number of samples meets the condition for starting training, $N$ samples are randomly selected from the experience pool to train the networks and update the parameters.

A multi-unmanned aerial vehicle global and local path intelligent planning system comprises a global path planning module, an acquisition module, a local path planning module and a training module;

the global path planning module is used for constructing a grid map according to the starting point and the target point of each unmanned aerial vehicle navigation task, obtaining a global path planning model by using an improved bidirectional search A* algorithm, planning a global path, obtaining a reference global navigation path of each unmanned aerial vehicle, and uniformly selecting a plurality of sub-target points on the reference global navigation path;

the acquisition module is used for acquiring obstacle motion state information in the surrounding environment of the unmanned aerial vehicle through a millimeter-wave radar and acquiring flight state information of the unmanned aerial vehicle, wherein the obstacle motion state information comprises the distance between the obstacle and the unmanned aerial vehicle, and the position information and motion speed information of the obstacle relative to the unmanned aerial vehicle; the flight state information of the unmanned aerial vehicle comprises unmanned aerial vehicle position information and flight speed information;

the local path planning module is used for judging whether to plan a local obstacle avoidance path according to the relative position relationship between the unmanned aerial vehicle and the obstacle, judging whether the unmanned aerial vehicle and the obstacle generate a flight conflict according to the obstacle motion state information and the unmanned aerial vehicle flight state information, controlling the action of the unmanned aerial vehicle over a continuous action space by using the obstacle avoidance strategy of the local path planning model, avoiding the obstacle, and making the unmanned aerial vehicle track the global navigation path;

the training module is used for designing a reward function, acquiring a state space and an action space, performing obstacle avoidance training with the deep deterministic policy gradient (DDPG) algorithm, controlling the action of the unmanned aerial vehicle over a continuous action space, and obtaining a local path planning model;

the global path planning model and the local path planning model are used for planning paths of all unmanned planes in the unmanned plane cluster.

Compared with the prior art, the method and the system for intelligently planning the global and local paths of the multiple unmanned aerial vehicles have the advantages that:

(1) The invention provides a global and local path intelligent planning method for multiple unmanned aerial vehicles in environments with multiple types of obstacles; compared with global-only or local-only path planning it is more flexible, and each unmanned aerial vehicle in the cluster has both global path planning and local dynamic obstacle avoidance capabilities;

(2) The invention controls the continuous action output of the multi-unmanned-aerial-vehicle system through the deep deterministic policy gradient algorithm, and is applicable to dynamic, unknown, complex environments with strong real-time performance and high efficiency;

(3) The invention considers the flight state information of the unmanned aerial vehicle and the motion state information of the obstacle at the same time, judges whether to take obstacle avoidance measures according to the threat level of the obstacle, and designs the reward function accordingly, thereby improving the path planning and obstacle avoidance efficiency of the multi-unmanned-aerial-vehicle system;

(4) The invention uses millimeter-wave radar to collect environmental data around each unmanned aerial vehicle and obtain obstacle information; compared with lidar, millimeter-wave radar has a wider detection range and penetrates fog, smoke, dust and the like well, so it can work in all weather, adapts well to the environment, and has a wider range of application.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.

Fig. 1 is a flowchart of an intelligent planning method for global and local paths of multiple unmanned aerial vehicles according to the present invention;

Fig. 2 is a schematic diagram illustrating an intelligent planning method for global and local paths of multiple unmanned aerial vehicles according to the present invention;

Fig. 3 is a schematic diagram of the division into a flight conflict range and a non-flight conflict range of the unmanned aerial vehicle relative to an obstacle, provided by the invention;

Fig. 4 is a flowchart illustrating obstacle avoidance training performed with the deep deterministic policy gradient algorithm according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The embodiment of the invention discloses a multi-unmanned aerial vehicle global and local path intelligent planning method, which comprises the following steps as shown in figures 1 and 2:

S2, acquiring obstacle motion state information in the surrounding environment of the unmanned aerial vehicle through a millimeter-wave radar, and acquiring flight state information of the unmanned aerial vehicle;

S4, judging whether the unmanned aerial vehicle and the obstacle generate a flight conflict according to the obstacle motion state information and the unmanned aerial vehicle flight state information, designing a reward function according to the flight conflict, and generating a corresponding local obstacle avoidance strategy;

In order to further implement the above technical solution, the improved bidirectional search A* algorithm in S1 includes:

S11, optimizing the path planning;

specifically: by adjusting the heuristic function of the A* algorithm, the search favors paths that reach the destination quickly at the beginning of the search and favors the shortest path to the destination toward the end of the search;

in this embodiment, the improved heuristic function is:

$$F(n) = G(n) + \omega H(n)$$

wherein $n$ is the current node, $F(n)$ is the estimated cost from the starting point to the target point through $n$, $G(n)$ is the cost from the starting point to the current node, $H(n)$ is the estimated cost from the current node to the target point, and $\omega$ is a dynamic coefficient whose value decreases as the diagonal distance from the current node to the target point decreases;

S12, smoothing the flight path;

specifically: smoothing the navigation path with a quasi-uniform B-spline curve, wherein the quasi-uniform B-spline curve is a piecewise continuous polynomial;

the whole curve is described by a single expression, yet it is piecewise internally; at the same order it therefore gives a better optimization effect than a Bezier curve and makes it easier to modify the path locally during local path planning.

In one embodiment, each curve segment is determined by continuously adjacent 4 track control points, and the navigation path is smoothed.
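The smoothing step can be illustrated with a minimal sketch of a quasi-uniform cubic B-spline, in which each curve segment depends on 4 adjacent control points. The function names (`quasi_uniform_knots`, `basis`, `smooth_path`), the normalization of the knots to [0, 1], and the sampling density are assumptions made for this illustration and are not taken from the patent.

```python
def quasi_uniform_knots(n_ctrl, degree):
    """Clamped knot vector: degree+1 repeated knots at each end, uniform inside."""
    n_inner = n_ctrl - degree - 1
    inner = [i / (n_inner + 1) for i in range(1, n_inner + 1)]
    return [0.0] * (degree + 1) + inner + [1.0] * (degree + 1)


def basis(i, k, t, knots):
    """Cox-de Boor recursion for the i-th B-spline basis function of degree k."""
    if k == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    d1 = knots[i + k] - knots[i]
    d2 = knots[i + k + 1] - knots[i + 1]
    if d1 > 0:
        left = (t - knots[i]) / d1 * basis(i, k - 1, t, knots)
    if d2 > 0:
        right = (knots[i + k + 1] - t) / d2 * basis(i + 1, k - 1, t, knots)
    return left + right


def smooth_path(ctrl_pts, degree=3, samples=50):
    """Sample a quasi-uniform B-spline through the waypoints used as control points."""
    n = len(ctrl_pts)
    if n <= degree or samples < 2:
        raise ValueError("need more control points than the degree and >= 2 samples")
    knots = quasi_uniform_knots(n, degree)
    dim = len(ctrl_pts[0])
    curve = []
    for s in range(samples - 1):
        t = s / (samples - 1)
        weights = [basis(i, degree, t, knots) for i in range(n)]
        curve.append(tuple(sum(w * p[d] for w, p in zip(weights, ctrl_pts))
                           for d in range(dim)))
    # The clamped curve ends exactly at the last control point (t = 1).
    curve.append(tuple(ctrl_pts[-1]))
    return curve
```

Because the end knots are repeated, the curve starts and ends at the first and last waypoints, and moving one control point only changes the nearby segments, which is what makes local modification convenient.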

In this embodiment, the improved bidirectional search A* algorithm can greatly shorten the path search time and improve the search efficiency.
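The dynamically weighted heuristic of S11 can be sketched as follows. This is a simplified illustration only: it searches in one direction on a 2D, 8-connected grid rather than bidirectionally, and the weight schedule in `dynamic_weight` (roughly 2 near the start, approaching 1 near the target) is a hypothetical choice, since the text only states that the coefficient decreases as the diagonal distance to the target decreases.

```python
import heapq
import math


def diagonal_distance(a, b):
    """Diagonal (octile) distance between two grid cells."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return (dx + dy) + (math.sqrt(2) - 2) * min(dx, dy)


def dynamic_weight(node, start, goal):
    """Hypothetical schedule: the weight shrinks toward 1 as the node nears the goal."""
    total = diagonal_distance(start, goal)
    if total == 0:
        return 1.0
    return 1.0 + diagonal_distance(node, goal) / total


def weighted_a_star(grid, start, goal):
    """Search a grid (0 = free, 1 = blocked) with F = G + weight * H."""
    rows, cols = len(grid), len(grid[0])
    moves = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    g_cost = {start: 0.0}
    parent = {start: None}
    open_heap = [(0.0, start)]
    closed = set()
    while open_heap:
        _, node = heapq.heappop(open_heap)
        if node in closed:
            continue
        if node == goal:
            path = []
            while node is not None:  # walk parents back to the start
                path.append(node)
                node = parent[node]
            return path[::-1]
        closed.add(node)
        for dr, dc in moves:
            nxt = (node[0] + dr, node[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if grid[nxt[0]][nxt[1]] == 1 or nxt in closed:
                continue
            new_g = g_cost[node] + math.hypot(dr, dc)
            if new_g < g_cost.get(nxt, math.inf):
                g_cost[nxt] = new_g
                parent[nxt] = node
                h = diagonal_distance(nxt, goal)
                f = new_g + dynamic_weight(nxt, start, goal) * h
                heapq.heappush(open_heap, (f, nxt))
    return None  # no path exists
```

A larger weight early on makes the search greedier and faster; as the weight approaches 1 near the target, the search behaves more like standard A*. With a weight above 1 the returned path is not guaranteed to be the shortest one.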

In order to further implement the technical scheme, the obstacle motion state information comprises the distance between the obstacle and the unmanned aerial vehicle, and position information and motion speed information of the obstacle relative to the unmanned aerial vehicle; the flight state information of the unmanned aerial vehicle comprises unmanned aerial vehicle position information and flight speed information.

In this embodiment, millimeter-wave radars are arranged on the left, right, front, and rear sides of the unmanned aerial vehicle to collect information on surrounding obstacles. Millimeter-wave radar penetrates fog, smoke, dust and the like well and offers high ranging accuracy and a long ranging distance, which provides a strong guarantee for the safe flight of a multi-unmanned-aerial-vehicle system under complex weather and environmental conditions. The flight state information of the unmanned aerial vehicle is obtained through monitoring by the ground station.

In order to further implement the technical solution, local path planning is carried out when the velocity direction of the unmanned aerial vehicle relative to the obstacle and the position direction of the obstacle relative to the unmanned aerial vehicle are detected to be in the same quadrant.

Specifically:

1. When the obstacle is to the front right of the unmanned aerial vehicle, then in the track coordinate system the unmanned aerial vehicle avoids the obstacle immediately if all three of the following conditions are met: a. the projection onto the xoy plane of the velocity direction of the unmanned aerial vehicle relative to the obstacle lies in the first quadrant; b. the obstacle is close to the flight altitude of the unmanned aerial vehicle; c. the obstacle would threaten the flight of the unmanned aerial vehicle within a preset time period. If these conditions are not all met, the unmanned aerial vehicle continues along its original flight path.

2. When the obstacle is to the front left of the unmanned aerial vehicle, then in the track coordinate system the unmanned aerial vehicle avoids the obstacle immediately if all three of the following conditions are met: a. the projection onto the xoy plane of the velocity direction of the unmanned aerial vehicle relative to the obstacle lies in the second quadrant; b. the obstacle is close to the flight altitude of the unmanned aerial vehicle; c. the obstacle would threaten the flight of the unmanned aerial vehicle within a preset time period. If these conditions are not all met, the unmanned aerial vehicle continues along its original flight path.

3. When the obstacle is to the rear left of the unmanned aerial vehicle, then in the track coordinate system the unmanned aerial vehicle avoids the obstacle immediately if all three of the following conditions are met: a. the projection onto the xoy plane of the velocity direction of the unmanned aerial vehicle relative to the obstacle lies in the third quadrant; b. the obstacle is close to the flight altitude of the unmanned aerial vehicle; c. the obstacle would threaten the flight of the unmanned aerial vehicle within a preset time period. If these conditions are not all met, the unmanned aerial vehicle continues along its original flight path.

4. When the obstacle is to the rear right of the unmanned aerial vehicle, then in the track coordinate system the unmanned aerial vehicle avoids the obstacle immediately if all three of the following conditions are met: a. the projection onto the xoy plane of the velocity direction of the unmanned aerial vehicle relative to the obstacle lies in the fourth quadrant; b. the obstacle is close to the flight altitude of the unmanned aerial vehicle; c. the obstacle would threaten the flight of the unmanned aerial vehicle within a preset time period. If these conditions are not all met, the unmanned aerial vehicle continues along its original flight path.
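The four cases above share one pattern, which can be sketched as a single check. The text does not quantify conditions b and c, so the altitude tolerance `height_tol`, the time window `horizon`, and the closing-time estimate used here are hypothetical stand-ins, and the axis convention is assumed to be the one in which the four relative positions map to quadrants one to four as listed.

```python
import math


def quadrant(x, y):
    """Return the quadrant (1-4) of a planar vector, or 0 if it lies on an axis."""
    if x > 0 and y > 0:
        return 1
    if x < 0 and y > 0:
        return 2
    if x < 0 and y < 0:
        return 3
    if x > 0 and y < 0:
        return 4
    return 0


def needs_local_planning(rel_pos, rel_vel, height_tol, horizon):
    """rel_pos: obstacle position relative to the UAV (x, y, z).
    rel_vel: UAV velocity relative to the obstacle (x, y, z)."""
    q_pos = quadrant(rel_pos[0], rel_pos[1])
    q_vel = quadrant(rel_vel[0], rel_vel[1])
    # Condition a: xoy projections of relative velocity and position share a quadrant.
    same_quadrant = q_pos != 0 and q_pos == q_vel
    # Condition b (assumed form): obstacle within an altitude tolerance of the UAV.
    near_altitude = abs(rel_pos[2]) <= height_tol
    # Condition c (assumed form): horizontal closing time within the preset window.
    dist = math.hypot(rel_pos[0], rel_pos[1])
    closing = 0.0
    if dist > 0:
        closing = (rel_pos[0] * rel_vel[0] + rel_pos[1] * rel_vel[1]) / dist
    threat_soon = closing > 0 and dist / closing <= horizon
    return same_quadrant and near_altitude and threat_soon
```

If the function returns `False`, the unmanned aerial vehicle would keep following its original flight path.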

In order to further implement the above technical solution, as shown in Fig. 3, the specific method for judging whether the unmanned aerial vehicle and the obstacle generate a flight conflict in S4 includes:

the unmanned aerial vehicle is simplified into a particle $A$, and the obstacle is regarded as an obstacle circle $O$ with a safe radius; the distance between the two points $A$ and $O$ is $d_i$, the line segment $OA$ lies on the straight line $l$, and the relative velocity vector of the unmanned aerial vehicle and the obstacle is $v_{uioi}$:

$$v_{uioi} = v_{ui} - v_{oi}$$

the tangent lines $l_1$ and $l_2$ passing through the point $A$ and tangent to the obstacle circle are obtained; the included angle between the straight line $l$ and the tangent line $l_1$ or $l_2$ is $\alpha_i$, the included angle between the relative velocity vector $v_{uioi}$ and the straight line $l$ is $\beta_i$, and the radius of the obstacle circle is $R_i$; when $\beta_i \le \alpha_i$, a flight conflict exists between the unmanned aerial vehicle and the obstacle; otherwise, when $\beta_i > \alpha_i$, there is no flight conflict.
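The conflict test can be sketched in the plane as below. Two points are assumptions of this illustration rather than statements of the patent: the half-angle is computed from the tangent geometry as the arcsine of the circle radius divided by the distance between A and O, and the second angle is measured from the direction pointing from A toward O, so that a relative velocity pointing away from the obstacle is not flagged. The function name and the handling of the degenerate cases are likewise hypothetical.

```python
import math


def has_flight_conflict(uav_pos, uav_vel, obs_pos, obs_vel, safe_radius):
    """Return True when the relative velocity lies inside the tangent cone."""
    ax, ay = obs_pos[0] - uav_pos[0], obs_pos[1] - uav_pos[1]  # vector A -> O
    d = math.hypot(ax, ay)
    if d <= safe_radius:
        return True  # assumed: already inside the obstacle circle counts as conflict
    rx, ry = uav_vel[0] - obs_vel[0], uav_vel[1] - obs_vel[1]  # relative velocity
    speed = math.hypot(rx, ry)
    if speed == 0.0:
        return False  # assumed: no relative motion, so the gap never closes
    alpha = math.asin(safe_radius / d)  # angle between line l and a tangent
    cos_beta = (rx * ax + ry * ay) / (speed * d)
    beta = math.acos(max(-1.0, min(1.0, cos_beta)))  # angle to the A -> O direction
    return beta <= alpha
```

For example, an unmanned aerial vehicle at the origin flying along +x toward a static obstacle of radius 2 at (10, 0) is in conflict, while one flying along +y is not.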

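In order to further implement the above technical solution, the state space after the interaction between the unmanned aerial vehicle and the environment in S5 consists of the following quantities: $u_i$, the real-time position of the $i$-th unmanned aerial vehicle; $v_{ui}$, the real-time velocity vector of the $i$-th unmanned aerial vehicle; the real-time heading angle of the $i$-th unmanned aerial vehicle; $\gamma_i$, the real-time pitch angle of the unmanned aerial vehicle; $o$, the real-time position vector of the obstacle detected by the millimeter-wave radar; $v_o$, the real-time velocity vector of the obstacle detected by the millimeter-wave radar; $d$, the real-time distance vector between the obstacle and the millimeter-wave radar; $p$, the position of the real-time sub-target point on the global path; and $g$, the position of the end point of the global path.

The action space consists of: $v_{ui}$, the real-time velocity vector of the unmanned aerial vehicle; the real-time heading angle of the unmanned aerial vehicle; and $\gamma_i$, the real-time pitch angle of the unmanned aerial vehicle.

The variation ranges of $v_{ui}$ and $\gamma_i$ are:

$$v_{ui} \in [v_{\min}, v_{\max}] \cap [v - v_a \Delta t,\ v + v_b \Delta t]$$

$$\gamma_i \in [0, \pi] \cap [\gamma - \gamma_a \Delta t,\ \gamma + \gamma_b \Delta t]$$

wherein $v_{\min}$ and $v_{\max}$ are the minimum and maximum speeds of the unmanned aerial vehicle, $v_a$ and $v_b$ are the maximum deceleration and the maximum acceleration of the unmanned aerial vehicle in the direction of travel, and $\gamma_a$ and $\gamma_b$ are the maximum deceleration and the maximum acceleration of the unmanned aerial vehicle in pitch angle. The heading angle has an analogous variation range, governed by the maximum deceleration and the maximum acceleration of the unmanned aerial vehicle in heading angle.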
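The variation ranges above are intersections of a fixed interval with a rate-limited interval around the current value. A minimal sketch of how a commanded action could be clipped to such a range follows; the function names and the numeric values in the usage note are illustrative assumptions only.

```python
import math


def feasible_interval(current, lower, upper, max_dec, max_acc, dt):
    """Intersect [lower, upper] with [current - max_dec*dt, current + max_acc*dt]."""
    low = max(lower, current - max_dec * dt)
    high = min(upper, current + max_acc * dt)
    return low, high


def clip_action(command, current, lower, upper, max_dec, max_acc, dt):
    """Clip a commanded value (speed or pitch angle) into the feasible interval."""
    low, high = feasible_interval(current, lower, upper, max_dec, max_acc, dt)
    return min(max(command, low), high)


# Pitch angle uses the fixed interval [0, pi] from the text.
PITCH_BOUNDS = (0.0, math.pi)
```

With a current speed of 10, speed limits of 5 and 20, a maximum deceleration of 2, a maximum acceleration of 3, and a time step of 0.5, the feasible speed interval is [9, 11.5], so a commanded speed of 15 would be clipped to 11.5.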

In order to further implement the above technical solution, the reward function is:

$$R(s, a) = R_1(t) + R_2(t) + R_3(t) + R_4(t)$$

wherein $R_1(t)$ is the reward generated by the change in distance between the unmanned aerial vehicle and the obstacle: within the safe distance, the closer the unmanned aerial vehicle is to the obstacle, the smaller the reward; outside the safe distance, the reward is a positive constant.

$R_2(t)$ is the reward generated by the speed change when the unmanned aerial vehicle detects that the obstacle threatens its flight: when a flight conflict exists between the unmanned aerial vehicle and the obstacle, namely $\beta_i \le \alpha_i$, a negative reward is obtained; otherwise, when no flight conflict exists, a positive reward is obtained;

In the present embodiment, the four terms are preferably specified as follows:

For $R_1(t)$, the reward generated by the change in distance between the unmanned aerial vehicle and the obstacle, the greater the distance, the greater the reward; its purpose is to make the unmanned aerial vehicle avoid the obstacle. $r_1$ is a negative constant representing the penalty obtained when the unmanned aerial vehicle enters the obstacle threat zone, $d_o$ is the distance between the unmanned aerial vehicle and the edge of the unexpanded obstacle at different moments, $D_o$ is the width by which the obstacle is expanded, $D_u$ is the safe distance between the unmanned aerial vehicle and the obstacle, $k_1$ is a negative reward-penalty coefficient, and $v_r$ is the speed of the unmanned aerial vehicle relative to the obstacle;

For $R_2(t)$, the reward generated by the speed change when the unmanned aerial vehicle detects that the obstacle threatens its flight, the purpose is to make the unmanned aerial vehicle fly within the non-flight-conflict range. $r_2$ is a negative constant representing the penalty obtained when the unmanned aerial vehicle flies within the flight conflict range; $r_3$ is a positive constant representing the reward obtained when the unmanned aerial vehicle flies within the non-flight-conflict range;

For $R_3(t)$, the reward generated by the change in distance between the unmanned aerial vehicle and the navigation task target point, the closer the distance, the greater the reward; its purpose is to make the unmanned aerial vehicle deviate from the navigation task target point as little as possible during obstacle avoidance. $r_4$ is a positive constant representing the reward obtained on arriving at the navigation task target point, $k_2$ is a positive reward-penalty coefficient, and $d_g$ is the distance between the unmanned aerial vehicle and the navigation task target point at different moments; when $d_g < d_{m1}$, the unmanned aerial vehicle is regarded as having reached the navigation task target point; $v_u$ is the flight speed of the unmanned aerial vehicle;

For $R_4(t)$, the reward generated by the change in distance between the unmanned aerial vehicle and the temporary sub-target point, the closer the distance, the greater the reward; its purpose is to ensure that, after obstacle avoidance is finished, the unmanned aerial vehicle returns to the global path planned by the improved bidirectional search A* algorithm by tracking the temporary sub-target points on that path. $r_5$ is a positive constant representing the reward obtained on reaching a temporary sub-target point, $k_3$ is a positive reward-penalty coefficient, and $d_p$ is the distance between the unmanned aerial vehicle and the temporary sub-target point at different moments; when $d_p < d_{m2}$, the unmanned aerial vehicle is regarded as having reached the temporary sub-target point; $v_u$ is the flight speed of the unmanned aerial vehicle.
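Only the second reward term is fully determined by the text (a negative constant inside the flight conflict range, a positive constant outside it), so the sketch below implements that term and the four-term sum and deliberately leaves the other three terms as inputs. The function names and the default constant values are hypothetical placeholders, not values from the patent.

```python
def conflict_reward(beta, alpha, r2=-1.0, r3=0.5):
    """Second term: penalty r2 when beta <= alpha (conflict), reward r3 otherwise."""
    if r2 >= 0 or r3 <= 0:
        raise ValueError("r2 must be negative and r3 must be positive")
    return r2 if beta <= alpha else r3


def total_reward(r1_t, r2_t, r3_t, r4_t):
    """Sum of the four reward terms at one time step."""
    return r1_t + r2_t + r3_t + r4_t
```

In a training loop, the obstacle-distance, target-distance, and sub-target-distance terms would be computed from their own definitions and passed to `total_reward` together with `conflict_reward(beta, alpha)`.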

In order to further implement the above technical solution, the specific content of performing obstacle avoidance training with the deep deterministic policy gradient algorithm in S5 includes:

the unmanned aerial vehicle obtains, through the policy network, the action $a_t$ for the current state $s_t$ and interacts with the environment to obtain the next state $s_{t+1}$, wherein the values of the state and the action satisfy the constraints of the state space and the action space, and the current reward $r_t$ and the reward $r_{t+1}$ in the next state are calculated through the reward function.

The tuple $(s_t, a_t, r_{t+1}, s_{t+1})$ is saved in an experience pool; when the number of samples meets the condition for starting training, $N$ samples are randomly selected from the experience pool to train the networks and update the parameters; parameter updating stops when the maximum number of rounds or a network convergence condition is reached, yielding the planned local obstacle avoidance path.
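The experience pool described here can be sketched as a fixed-size buffer with uniform random sampling. The class name `ExperiencePool`, the `min_samples` threshold standing in for the "condition for starting training", and the policy of discarding the oldest sample when the pool is full are assumptions of this illustration.

```python
import random
from collections import deque


class ExperiencePool:
    """Fixed-size pool of (state, action, next_reward, next_state) tuples."""

    def __init__(self, capacity, batch_size, min_samples=None):
        self.batch_size = batch_size
        # Assumed start-of-training condition: at least min_samples stored.
        self.min_samples = batch_size if min_samples is None else min_samples
        self._buffer = deque(maxlen=capacity)  # oldest samples drop out when full

    def __len__(self):
        return len(self._buffer)

    def store(self, state, action, next_reward, next_state):
        self._buffer.append((state, action, next_reward, next_state))

    def ready(self):
        return len(self._buffer) >= max(self.min_samples, self.batch_size)

    def sample(self):
        """Draw batch_size distinct samples uniformly at random."""
        if not self.ready():
            raise RuntimeError("not enough samples to start training")
        return random.sample(list(self._buffer), self.batch_size)
```

Sampling at random from the pool, rather than training on consecutive transitions, reduces the correlation between the samples in one batch.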

In one embodiment, as shown in fig. 4, the specific content of the depth deterministic strategy gradient algorithm for obstacle avoidance training is as follows:

S51, initializing the parameters θ and η of the policy network π(s; θ) and the value network q(s, a; η); initializing the parameters θ′ and η′ of the target policy network π′(s; θ′) and the target value network q′(s, a; η′); initializing the target network learning rate ε, the batch size N of learning samples drawn each time, and the experience pool size Z;

S52, according to the state space and the action space, outputting through the policy network the action a_t = π(s_t) for the current state s_t, wherein the selection of states and actions satisfies the constraints of the state space and the action space;

S53, executing the action, and obtaining from the environment the reward r_{t+1} according to the reward function and the state s_{t+1} at the next moment;

S54, saving (s_t, a_t, r_{t+1}, s_{t+1}) in the experience pool, and, when the number of samples reaches the training start condition, randomly selecting N samples from the experience pool;

S55, updating the policy network and the value network:

y_i = r_{i+1} + γ q(s_{i+1}, π(s_{i+1}; θ′); η′)

S56, updating the target value network and the target policy network at intervals:

η′←εη+(1-ε)η′

θ′←εθ+(1-ε)θ′

S57, returning to S52 and looping until the maximum number of rounds is reached or a network convergence condition is met.
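The target value in S55 and the target network update in S56 can be illustrated with a minimal Python sketch. The function names td_targets and soft_update, the toy linear stand-ins for the target networks, and the numerical values are hypothetical; gamma is taken to be the discount factor. The gradient steps that update the policy network and the value network themselves are not reproduced here, since only the target value is given above.

```python
def td_targets(batch, target_policy, target_value, gamma):
    """Return y_i = r_{i+1} + gamma * q(s_{i+1}, pi(s_{i+1}; theta'); eta') per sample."""
    targets = []
    for _s, _a, r_next, s_next in batch:
        a_next = target_policy(s_next)  # action from the target policy network
        targets.append(r_next + gamma * target_value(s_next, a_next))
    return targets


def soft_update(target_params, online_params, epsilon):
    """Return epsilon * online + (1 - epsilon) * target for each parameter."""
    return [epsilon * w + (1.0 - epsilon) * w_target
            for w, w_target in zip(online_params, target_params)]


# Toy stand-ins for the target networks (hypothetical linear functions of scalars).
theta_target = [0.5]
eta_target = [1.0, 2.0]


def toy_target_policy(s):
    return theta_target[0] * s


def toy_target_value(s, a):
    return eta_target[0] * s + eta_target[1] * a


batch = [(0.0, 0.1, 1.0, 2.0)]  # one (s_t, a_t, r_{t+1}, s_{t+1}) sample
y = td_targets(batch, toy_target_policy, toy_target_value, gamma=0.9)

# S56: move the target value parameters a small step towards the online ones.
eta_target = soft_update(eta_target, [3.0, 0.0], epsilon=0.1)
```

With a small ε the target networks change slowly, so the targets y_i used to train the value network change slowly as well.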

the acquisition module is used for acquiring obstacle motion state information in the environment surrounding the unmanned aerial vehicle through a millimeter wave radar, and for acquiring flight state information of the unmanned aerial vehicle, wherein the obstacle motion state information comprises the distance between the obstacle and the unmanned aerial vehicle, and the position information and motion speed information of the obstacle relative to the unmanned aerial vehicle; the flight state information of the unmanned aerial vehicle comprises position information and flight speed information of the unmanned aerial vehicle;

the local path planning module is used for dividing a flight conflict range and a non-flight conflict range of the unmanned aerial vehicle relative to the obstacle according to the obstacle motion state information and the unmanned aerial vehicle flight state information, generating a corresponding obstacle avoidance strategy, uniformly selecting a plurality of sub-target points on the reference global navigation path, and, after the unmanned aerial vehicle has finished obstacle avoidance, tracking the closest sub-target point on the reference global navigation path as the temporary sub-target point;
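The selection and tracking of sub-target points may be sketched in Python as follows. This is an illustration under stated assumptions only: the function names select_sub_targets and nearest_sub_target are hypothetical, "uniformly" is interpreted as even spacing by waypoint index (which assumes roughly evenly spaced waypoints from the grid map), and "closest" is interpreted as closest to the current position of the unmanned aerial vehicle.

```python
import math


def select_sub_targets(path, count):
    """Pick up to `count` waypoints evenly spaced by index, endpoints included."""
    if count < 2:
        raise ValueError("count must be at least 2")
    last = len(path) - 1
    # A set removes duplicate indices when the path is shorter than `count`.
    indices = sorted({round(k * last / (count - 1)) for k in range(count)})
    return [path[i] for i in indices]


def nearest_sub_target(position, sub_targets):
    """Return the sub-target point with the smallest Euclidean distance to `position`."""
    return min(sub_targets, key=lambda p: math.dist(position, p))


# Hypothetical straight reference path at a constant altitude of 10.
path = [(float(x), 0.0, 10.0) for x in range(9)]
sub_targets = select_sub_targets(path, 3)
temporary_target = nearest_sub_target((5.0, 2.0, 10.0), sub_targets)
```

Under these assumptions, the point returned by nearest_sub_target plays the role of the temporary sub-target point that the unmanned aerial vehicle tracks in order to rejoin the reference global navigation path.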

In this specification, the embodiments are described progressively: each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be referred to one another. Because the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, it is described briefly, and the relevant details can be found in the description of the method.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A multi-unmanned aerial vehicle global and local path intelligent planning method is characterized by comprising the following steps:

S4, judging whether the unmanned aerial vehicle and the obstacle generate flight conflicts or not according to the obstacle motion state information and the unmanned aerial vehicle flight state information, designing a reward function, and generating a corresponding local obstacle avoidance strategy;

2. The method according to claim 1, wherein the improved bidirectional search A star algorithm in S1 comprises:

S11, optimizing the path planning;

S12, smoothing the flight path;

3. The method according to claim 1, wherein the obstacle motion state information includes a distance between an obstacle and the drone, position information and motion speed information of the obstacle relative to the drone; the flight state information of the unmanned aerial vehicle comprises unmanned aerial vehicle position information and flight speed information.

4. The method according to claim 3, wherein the local path planning is performed when the speed direction of the UAV relative to the obstacle and the position direction of the obstacle relative to the UAV are detected to be in the same quadrant.

5. The method for intelligently planning the global and local paths of multiple unmanned aerial vehicles according to claim 1, wherein the specific method for judging whether the unmanned aerial vehicles and the obstacles generate flight conflicts in the step S4 is as follows:

v_uioi = v_ui − v_oi

obtaining tangent lines l₁ and l₂ that pass through point A and are tangent to the obstacle circle, wherein the included angle between straight line l and tangent line l₁ or l₂ is α_i, the included angle between the relative velocity vector v_uioi and straight line l is β_i, and the radius of the obstacle circle is R_i; when β_i ≤ α_i, a flight conflict exists between the unmanned aerial vehicle and the obstacle; otherwise, when β_i > α_i, there is no flight conflict.

6. The method according to claim 1, wherein the state space of the unmanned aerial vehicle after interaction with the environment in S5 is:

real-time heading angle information for the i-th unmanned aerial vehicle; γ_i is the real-time pitch angle information of the unmanned aerial vehicle; o is the real-time position vector information of the obstacle detected by the millimeter wave radar; v_o is the real-time velocity vector information of the obstacle detected by the millimeter wave radar; d is the real-time distance vector information between the obstacle and the millimeter wave radar; p is the position information of the real-time sub-target point on the global path; and g is the position information of the global path end point.

7. The method according to claim 1, wherein an action space after interaction between the unmanned aerial vehicle and the environment in S5 is:

the variation ranges of v_ui, the heading angle, and γ_i are as follows:

v_ui ∈ [v_min, v_max] ∩ [v − v_a Δt, v + v_b Δt]

γ_i ∈ [0, π] ∩ [γ − γ_a Δt, γ + γ_b Δt]

wherein v_min and v_max are the minimum and maximum speeds of the unmanned aerial vehicle; v_a and v_b are the maximum deceleration and the maximum acceleration of the unmanned aerial vehicle in the forward direction; the two heading-angle terms are the maximum deceleration and the maximum acceleration of the unmanned aerial vehicle in heading angle; and γ_a and γ_b are the maximum deceleration and the maximum acceleration of the unmanned aerial vehicle in pitch angle.

8. The method of claim 1, wherein the reward function is:

R(s, a) = R₁(t) + R₂(t) + R₃(t) + R₄(t)

R₂(t) is the reward generated by the speed change when the unmanned aerial vehicle detects that an obstacle threatens its flight: when a flight conflict exists between the unmanned aerial vehicle and the obstacle, a negative reward is obtained; otherwise, when no flight conflict exists, a positive reward is obtained;

R₃(t) is the reward generated by the change in distance between the unmanned aerial vehicle and the navigation task target point, wherein the closer the distance, the greater the reward;

R₄(t) is the reward generated by the change in distance between the unmanned aerial vehicle and the temporary sub-target point, wherein the closer the distance, the greater the reward.

9. The method for intelligently planning the global and local paths of multiple unmanned aerial vehicles according to claim 1, wherein performing obstacle avoidance training by using the deep deterministic policy gradient algorithm in S5 specifically comprises:

the unmanned aerial vehicle obtains, through the policy network, the action a_t for the current state s_t, and interacts with the environment to obtain the next state s_{t+1}, wherein the values of the state and the action satisfy the constraints of the state space and the action space; the current reward r_t and the reward r_{t+1} in the next state are calculated through the reward function;

the tuple (s_t, a_t, r_{t+1}, s_{t+1}) is saved in an experience pool, and, when the number of samples reaches the training start condition, N samples are randomly selected from the experience pool to train the network and complete the parameter update.

10. An intelligent planning system for global and local paths of multiple unmanned aerial vehicles based on the intelligent planning method for global and local paths of multiple unmanned aerial vehicles of any one of claims 1-9 is characterized by comprising a global path planning module, an acquisition module, a local path planning module and a training module;

the global path planning module is used for constructing a grid map according to the starting point and the target point of each unmanned aerial vehicle navigation task, obtaining a global path planning model by using an improved bidirectional search A star algorithm, and carrying out global path planning to obtain a reference global navigation path of each unmanned aerial vehicle;

the acquisition module is used for acquiring obstacle motion state information in the environment surrounding the unmanned aerial vehicle through a millimeter wave radar and acquiring flight state information of the unmanned aerial vehicle;