CN117148857A - Unmanned aerial vehicle path planning inspection method applied to complex environment detection - Google Patents


Info

Publication number: CN117148857A
Application number: CN202310783208.4A
Authority: CN (China)
Other languages: Chinese (zh)
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Prior art keywords: aerial vehicle, unmanned aerial, inspection, obstacle, path
Inventors: 袁伟康, 赵晓东, 孙延旭, 杨帆, 陈泽林, 吕强, 陈张平, 潘一帆, 俞永杰
Current and original assignees (the listed assignees may be inaccurate; Google has not performed a legal analysis): State Grid Zhejiang Electric Power Co Ltd Hangzhou Qiantang District Power Supply Co; Hangzhou Dianzi University
Application filed by State Grid Zhejiang Electric Power Co Ltd Hangzhou Qiantang District Power Supply Co and Hangzhou Dianzi University
Priority: CN202310783208.4A

Classifications

    • Y: General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02: Technologies or applications for mitigation or adaptation against climate change
    • Y02T: Climate change mitigation technologies related to transportation
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The application discloses an unmanned aerial vehicle (UAV) path planning and inspection method applied to complex environment detection. The globally optimal inspection path generated by training the DRLGA model is input into a proximal policy optimization (PPO) model, so that the UAV makes corresponding decisions according to the current environment and state to avoid obstacles on the path, explores and plans an efficient route in the environment, and finally completes the inspection and monitoring of farm plants on this basis.

Description

Unmanned aerial vehicle path planning inspection method applied to complex environment detection
Technical Field
The application belongs to the technical field of path planning, and relates to an unmanned aerial vehicle (UAV) path planning and inspection method applied to complex environment detection.
Background
In agricultural production, pests and diseases are important factors affecting crop yield, so crops must be monitored for pests to ensure good quality and yield. Agricultural inspection can efficiently collect and record pest and disease information in the environment, quickly identify pest and disease types, and support effective control measures, thereby improving crop yield. Agricultural inspection therefore plays a critical role in agricultural production.
The traditional mode of agricultural inspection is manual inspection, but with a large number of plants this consumes a great deal of manpower and is inefficient. To improve inspection efficiency, many enterprises have adopted semi-automatic inspection. However, this approach faces challenges in complex environments: sensor modules are vulnerable to damage or inaccurate data acquisition amid diverse plant layouts or varying terrain, and their fixed locations limit the flexibility of the inspection system. In addition, static obstacles exist in complex environments, so the UAV must have obstacle avoidance capability.
To cope with these complex environmental factors, an efficient agricultural inspection method is needed that both supports effective pest and disease control and meets the requirement of avoiding static obstacles. By adopting UAV technology together with a flexible obstacle avoidance algorithm and sensing technology, accurate environmental perception and data acquisition can be achieved. The UAV can flexibly plan a path in a complex environment, avoid static obstacles, and combine sensor data with later-stage image analysis, thereby improving inspection efficiency and accuracy.
Disclosure of Invention
In order to solve the above problems, the application designs a UAV path planning and inspection scheme applied to complex environment detection, which exploits the UAV's advantages of small size, high flexibility, high operating efficiency, good maneuverability, and the ability to carry various sensors, for plant inspection and agricultural information acquisition and monitoring tasks. The scheme builds a UAV path planning model in a farm environment based on a preset UAV and inspection points.
The application provides a UAV path planning and inspection method applied to complex environment detection, comprising the following steps:
Step 1: generating an initial population using a greedy algorithm, encoding each solution as a real vector, and evaluating the fitness of each individual using a fitness function;
reconstructing the crossover-mutation mechanism of the genetic algorithm with the SAC (Soft Actor-Critic) algorithm to construct a DRLGA model;
determining an objective function of the problem of the UAV traversing all inspection points, i.e. a travelling salesman problem (TSP), for solving the UAV inspection point path planning problem;
Step 2: setting the DRLGA model parameters, and designing the population state space, the genetic operation action space and the reward function;
training to obtain the globally optimal inspection path along which the UAV traverses the inspection points;
Step 3: importing the globally optimal inspection path into a proximal policy optimization (PPO) model training environment, and defining the state representation of the UAV according to information related to the UAV, the target point and the environment;
calculating the course angle the UAV needs to adjust and designing the terminal states, adopting the climbing tiger (Boston ivy) algorithm for obstacle avoidance path optimization, and setting the loss function and reward function;
Step 4: training and optimizing the proximal policy optimization model to generate an optimal obstacle avoidance path.
Another aspect of the present application provides a UAV path planning and inspection apparatus applied to complex environment detection, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; the UAV path planning and inspection method applied to complex environment detection described above is implemented when the processor executes the program.
A further aspect of the present application provides a computer-readable storage medium storing a computer program for executing the UAV path planning and inspection method applied to complex environment detection described above.
Compared with the prior art, the application has the following advantages:
1. A novel scene state representation and reward function is presented that effectively maps the environment to UAV maneuvers. The new scene state representation can contain more information and better adapt to the characteristics of the UAV control problem, and the trained model can generate continuous course-angle control commands and speed commands.
2. The crossover-mutation mechanism in the genetic algorithm is reconstructed: the SAC algorithm generates the action policy that controls population evolution, replacing the conventional crossover-mutation mechanism. This better balances exploration and exploitation of the search space and effectively maintains population diversity. It also avoids the limitations and disturbances that the traditional crossover-mutation mechanism may introduce, thereby improving the search efficiency and performance of the genetic algorithm.
3. The reward function is introduced into the loss function, providing more direct feedback to strengthen the guidance of the UAV's actions. The UAV can evaluate its own behaviour more accurately and adjust its behaviour policy faster to maximize the cumulative reward, improving the UAV's learning efficiency and performance and guiding its learning process more effectively.
4. For course-angle control, a climbing tiger algorithm is proposed, which improves the UAV's obstacle avoidance capability and path planning quality. The algorithm computes a path intelligently, avoids collision with obstacles, and ensures safety while pursuing the shortest path. This innovative solution provides effective support for the flight safety and autonomy of the UAV, so that it can cope with complex environments and reach the target site quickly.
5. On the basis of the DRLGA algorithm model, a PPO algorithm model is introduced to better handle complex environments. In practical applications the TSP is not simply a matter of distances between points; it often includes complicating factors such as obstacles, which make the state and action spaces of the problem more complex. Reinforcement learning can handle such environments and action spaces, but when solving the TSP the obstacles must be avoided as they are encountered, which increases the difficulty of the problem. Meanwhile, DRLGA can handle complex optimization problems, has global search capability, and can find a globally optimal solution. By fusing DRLGA with PPO, the learning capability of PPO allows the optimal obstacle-avoiding path to be learned quickly on top of the DRLGA result, improving efficiency and accuracy on the TSP in complex environments. The method also adapts to many different obstacle layouts and therefore has wider applicability.
Drawings
FIG. 1 is a schematic diagram of a technical path of the present application;
FIG. 2 is a DRLGA algorithm model;
FIG. 3 is a flow chart of a DRLGA algorithm model implementation;
FIG. 4 is a schematic view of the unmanned aerial vehicle inspection and obstacle avoidance environment;
FIG. 5 is a PPO framework diagram;
FIG. 6 is a diagram of the climbing tiger algorithm;
FIG. 7 is a schematic view of the apparatus structure of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings.
The application adopts a genetic algorithm (GA) as the basic optimization method and uses the SAC (Soft Actor-Critic) algorithm to reconstruct its crossover-mutation mechanism. A DRLGA self-learning model is established with the minimum inspection path length as the objective, solving the problem of the UAV traversing all inspection points, i.e. a travelling salesman problem (TSP); the SAC algorithm intelligently selects the evolution strategy of the population so as to find a complete, lowest-cost globally optimal inspection path that visits each inspection point exactly once. However, in practical applications, due to the uncertainty and complexity of the environment, the UAV may encounter various obstacles on the inspection path. The globally optimal inspection path generated by training the DRLGA model is therefore input into a proximal policy optimization (PPO) model, so that the UAV makes corresponding decisions according to the current environment and state to avoid obstacles on the path, explores and plans an efficient route in the environment, and finally completes the inspection and monitoring of farm plants on this basis.
As shown in fig. 1, an embodiment of the present application provides a UAV path planning and inspection method applied to complex environment detection, including the following steps:
Step 1: an initial population is generated using a greedy algorithm, each solution is encoded as a real vector, and the fitness of each individual is evaluated with a fitness function. The crossover-mutation mechanism of the genetic algorithm is reconstructed with the SAC algorithm to construct a DRLGA model. The objective function of the TSP is determined to solve the UAV inspection point path planning problem.
Step 2: the DRLGA model parameters are set, and the population state space, the genetic operation action space and the reward function are designed. Training yields an accurate and efficient globally optimal inspection path along which the UAV traverses the inspection points.
Step 3: the globally optimal inspection path trained by the DRLGA model is imported into the PPO model training environment, and the state representation of the UAV is defined according to information related to the UAV, the target point and the environment. The course angle the UAV needs to adjust is calculated, the terminal states are designed, the climbing tiger algorithm is adopted for obstacle avoidance path optimization, and the loss and reward functions are set.
Step 4: the proximal policy optimization (PPO) model is trained and optimized to generate an optimal obstacle avoidance path. Through the fusion of the DRLGA model and the PPO model, the UAV traverses the inspection points along an optimal path while reliably avoiding obstacles.
In a preferred embodiment, the specific steps of step 1 are as follows:
Step 1.1: preset inspection point position information for the field is input, and a greedy algorithm is used to generate an initial population in which each individual represents a solution.
Step 1.2: each solution is encoded and converted into a chromosome representation. Binary or real-number coding may generally be used; this embodiment uses real-number coding, converting the value of each variable into a real number to form an n-dimensional vector. The fitness of each individual is evaluated with a fitness function, and the relative contribution of each individual in the population is determined.
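As an illustrative sketch of steps 1.1 and 1.2, the greedy construction of the initial population and a path-length-based fitness can be written as follows; the nearest-neighbour rule, the random start points, and fitness = 1/length are assumptions, since the published text does not fix these details:

```python
import math
import random

def tour_length(tour, pts):
    """Total length of the closed tour over the inspection points."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def greedy_tour(pts, start=0):
    """Nearest-neighbour construction of one individual (one solution)."""
    unvisited = set(range(len(pts))) - {start}
    tour = [start]
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda j: math.dist(pts[last], pts[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def init_population(pts, size):
    """Greedy tours from random start points; fitness = 1 / path length."""
    pop = [greedy_tour(pts, random.randrange(len(pts))) for _ in range(size)]
    fitness = [1.0 / tour_length(t, pts) for t in pop]
    return pop, fitness
```

A shorter tour thus receives a higher fitness, matching the minimum-path-length objective of the DRLGA model.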
Step 1.3: the crossover-mutation mechanism of the genetic algorithm is reconstructed with the SAC algorithm to construct the DRLGA model, see figs. 2 and 3. The SAC algorithm generates the action policy that controls population evolution, replacing the traditional crossover-mutation mechanism, and the evolution process of the population is optimized to improve algorithm performance.
Step 1.4: construction of the TSP objective function. The TSP is a classical combinatorial optimization problem of the NP-hard class; the mathematical model seeks a globally optimal inspection path V = [v_1, v_2, ..., v_n] that minimizes the path length while satisfying the visiting order of the inspection points and the constraint of returning to the starting point.
The objective function quantifies the path length, comprising the distances between consecutive inspection points and the distance from the last inspection point back to the starting point:

f(V) = Σ_{i=1}^{n−1} d(v_i, v_{i+1}) + d(v_n, v_1)

where v_i is the inspection point number, 1 ≤ i ≤ n; n is the number of inspection points; d(v_i, v_{i+1}) is the distance from inspection point v_i to inspection point v_{i+1}; and d(v_n, v_1) is the distance from the last inspection point v_n back to the first inspection point v_1, i.e. the distance finally required to return to the starting point.
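The objective function can be sketched directly; `objective` below computes f(V) for a tour, assuming the inspection points are given as planar coordinates (the coordinate representation is an assumption):

```python
import math

def objective(path, coords):
    """Closed-tour length: f(V) = sum_{i=1}^{n-1} d(v_i, v_{i+1}) + d(v_n, v_1).
    The modulo index folds the return-to-start distance into the same sum."""
    n = len(path)
    return sum(math.dist(coords[path[i]], coords[path[(i + 1) % n]])
               for i in range(n))
```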
In a preferred embodiment, the specific steps of step 2 are as follows:
Step 2.1: population state space design. During evolution, the path length of each individual in the population determines the quality of the current solution set and the individual's fitness value. The difference between each individual's path length and the globally optimal path length is taken as the population state s; at time t the population state can be expressed as s_t = {D_1, D_2, ..., D_m}, where D_i (i ∈ {1, 2, ..., m}) is the state of the i-th individual, computed from the difference between that individual's path length and the set globally optimal path length d, scaled by a constant C_1 that prevents gradient explosion of the neural network model during training. Here m is the population size, and the entire population state space can be represented as S = [s_j], j ∈ {1, 2, ..., N}, where N is the number of population states.
Step 2.2: determine the genetic operation action space:
A = {a_1, a_2}

where a_1 is the crossover operation and a_2 is the mutation operation.
Step 2.3: design the reward function. The shorter the path of each individual in the population, the higher the quality of the current solution set and, correspondingly, the greater the probability that genetic operations on the population yield the optimal solution of the problem. The reward function is set as follows:
In the above, r_t^i represents the reward value obtained by selecting action a_t^i in population state s_t; C_2 is a set constant; L̄_t is the average path length of the population after performing action a_t^i; and a_t^i is the action i selected at time t. If the selected action brings the whole solution set closer to the globally optimal path length d, a larger reward value is obtained.
Step 2.4: finally, with the goal of maximizing the cumulative reward of the population evolution process, appropriate genetic operation actions (crossover or mutation) are selected according to the current population state and applied to the population, thereby controlling the population evolution process.
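The two actions in the action space A = {a_1, a_2} can be illustrated with standard permutation operators; order crossover (OX) and swap mutation are common choices for permutation-encoded tours that the published text does not specify, so they are assumptions:

```python
import random

def order_crossover(p1, p2):
    """a_1 (crossover): OX operator. Copies a random slice of p1, then
    fills the remaining positions with p2's genes in their relative order,
    so the child is always a valid permutation."""
    n = len(p1)
    a, b = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[a:b] = p1[a:b]
    fill = [g for g in p2 if g not in child]
    for i in range(n):
        if child[i] is None:
            child[i] = fill.pop(0)
    return child

def swap_mutation(tour):
    """a_2 (mutation): swaps two randomly chosen inspection points."""
    t = tour[:]
    i, j = random.sample(range(len(t)), 2)
    t[i], t[j] = t[j], t[i]
    return t
```

In the DRLGA setting it is the SAC policy, not a fixed probability, that decides which of these two operators is applied to the population at each step.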
The DRLGA model is trained, and the evolution process continues until a preset termination condition is reached, i.e. the maximum number of iterations or a population-fitness threshold.
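The outer DRLGA evolution loop with its two termination conditions can be sketched as follows; the SAC agent is stood in for by an injected `select_action` callable, which is an assumption made to keep the sketch self-contained:

```python
def evolve(pop, fitness, select_action, apply_action,
           max_iters=500, fitness_threshold=None):
    """DRLGA-style outer loop sketch. Each generation, a policy
    (select_action, standing in for the SAC agent) picks a genetic
    operation; apply_action applies it and returns the new population
    and fitness. Stops at max_iters or when the best fitness reaches
    fitness_threshold."""
    best = max(fitness)
    for _ in range(max_iters):
        action = select_action(pop, fitness)        # SAC policy stand-in
        pop, fitness = apply_action(action, pop, fitness)
        best = max(best, max(fitness))
        if fitness_threshold is not None and best >= fitness_threshold:
            break
    return pop, fitness, best
```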
In a preferred embodiment, the specific steps of step 3 are as follows:
Step 3.1: first determine the environment for path planning, referring to fig. 4: the aircraft's starting point and target point, the obstacles, and so on are preset, and the globally optimal inspection path trained by the DRLGA model is imported into the PPO model training environment; the PPO model is shown in fig. 5.
Step 3.2: define the state representation. The state of the UAV at time t is expressed as γ_t and is divided into two parts: one part includes information associated with the UAV itself and the target point, and the other includes information associated with the environment, such as obstacle information; let w_t^i denote the information of obstacle i.
Here d_g is the distance from the UAV to the target point, v_x represents the UAV's speed in the x-direction, and v_y its speed in the y-direction.
The information w_t^i of obstacle i comprises the following:
in the above-mentioned method, the step of,delta for the position of obstacle i on the y-axis i For the distance between the unmanned aerial vehicle and the centre of obstacle i, < > j->Encouraging the drone to find a globally optimal solution, e.g., when the drone approaches an obstacle, the +.>The positive value of (2) indicates that the drone is to the right of the straight line connecting through the center of the obstacle and the destination, and therefore a small angle of counter-clockwise rotation is recommended.
Step 3.3: calculate the course angle the UAV needs to adjust. The course-angle change of the controlled UAV at each time t is limited to:
A_h = [−30°, 30°]

At each instant t the UAV selects an action a_h ∈ A_h and changes its course angle; the course angle at time t+1 is:

ψ_{t+1} = ψ_t + a_h
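The bounded course-angle update can be sketched as follows; clipping the action into A_h and wrapping the result into [0, 2π) are added assumptions, since the text only states the update and the action limits:

```python
import math

MAX_TURN = math.radians(30)  # action space A_h = [-30°, 30°]

def step_heading(psi_t, a_h):
    """psi_{t+1} = psi_t + a_h, with a_h clipped to A_h.
    Wrapping into [0, 2*pi) is an added assumption."""
    a_h = max(-MAX_TURN, min(MAX_TURN, a_h))
    return (psi_t + a_h) % (2 * math.pi)
```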
step 3.4, terminal state design: when the distance between the unmanned aerial vehicle and the obstacle is smaller than the separation requirement, conflict can occur. When the operation of the unmanned aerial vehicle is deterministic, a buffer area is not required to be arranged, and in the static obstacle avoidance implementation with uncertainty, the uncertainty of the unmanned aerial vehicle position is considered, and the minimum interval distance is required to be arranged in a separation mode. The terminal state consists of two different types of states:
conflict state: the distance between the unmanned aerial vehicle and the obstacle is smaller than the separation requirement;
target state: the drone is within 2m of the target.
Step 3.5: optimize the UAV's obstacle avoidance path. In course-angle control, the UAV faces the problem of the optimal obstacle avoidance path, i.e. how to avoid obstacles along the shortest path. To solve this problem, this embodiment proposes the climbing tiger algorithm, which assists path planning using the idea of the climbing behaviour of the climbing tiger (Boston ivy) plant, as shown in fig. 6.
The climbing tiger is a vine with strong climbing ability that can grow along the edges of obstacles and find the most suitable growth path. This climbing behaviour inspired the present application and can be applied to the UAV's course-angle control to help the UAV find the shortest path around an obstacle.
In course-angle control, this embodiment treats the UAV's current position as the position of the climbing tiger plant and the obstacle edge as the path along which the plant climbs. By analyzing the obstacle's position, the optimal direction for the UAV's next move from its current position can be determined. Specifically, the UAV approaches the obstacle edge as closely as possible, keeps a certain distance, and detours along the edge, thereby avoiding the obstacle. In this way, the UAV can bypass the obstacle along the shortest path to reach the target location. Course-angle control is as follows:
F_a = k_attraction · (P_a − B) / ||P_a − B||²

F_r = Σ k_r · (1/||P_a − B|| − 1/d_max) · (1/||P_a − B||²) · (P_a − B) / ||P_a − B||

ψ_adjust = F_a + F_r + a_h

In the above formulas, F_a is the attractive force between the UAV and the obstacle, F_r is the repulsive force between the UAV and the obstacle, P_a is the current UAV position, and B = (x_B, y_B) is a point on the obstacle boundary; the sum runs over the set of boundary points, and the attractive and repulsive contribution of each boundary point to the UAV can be calculated.
k_attraction is the guiding-force constant representing the boundary points' guiding force on the UAV, k_r is the repulsive-force constant, ||P_a − B|| denotes the Euclidean distance, d_max is the maximum range of action of the repulsive force, and ψ_adjust is the total obtained by adding the attractive force F_a, the repulsive force F_r, and the action a_h selected by the UAV at time t.
In this way, the UAV can avoid obstacles along their edges and find a relatively safe path.
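A minimal sketch of the climbing tiger heading correction, following the attraction/repulsion formulas above: contributions are accumulated over the obstacle boundary points and reduced to a heading adjustment. Collapsing the 2-D force vector to an angle via atan2 is an assumption, since the formulas add forces and the action directly:

```python
import math

def climbing_tiger_adjust(p_a, boundary, a_h,
                          k_attraction=1.0, k_r=1.0, d_max=5.0):
    """Accumulates F_a and F_r over the boundary points and returns
    psi_adjust = angle(F_a + F_r) + a_h (the atan2 reduction is an
    assumption; constants k_attraction, k_r, d_max are illustrative)."""
    fx = fy = 0.0
    for b in boundary:
        dx, dy = p_a[0] - b[0], p_a[1] - b[1]
        dist = math.hypot(dx, dy) or 1e-9
        # guiding term per F_a: k_attraction * (P_a - B) / ||P_a - B||^2
        fx += k_attraction * dx / dist ** 2
        fy += k_attraction * dy / dist ** 2
        # repulsive term per F_r, active only within the range d_max
        if dist < d_max:
            mag = k_r * (1.0 / dist - 1.0 / d_max) / dist ** 2
            fx += mag * dx / dist
            fy += mag * dy / dist
    return math.atan2(fy, fx) + a_h
```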
Step 3.6: set the loss function and reward function of the PPO model. The reward function is set as follows:
in the above equation, R (γ, ψ) represents a prize obtained after the unmanned aerial vehicle selects the adjustment heading angle ψ in the state γ. Obtaining rewards when the unmanned aerial vehicle reaches a target state, giving a certain punishment in a conflict state, and obtaining linear terms-0.001 d in a rewarding function R (gamma, phi) g The unmanned aerial vehicle is guided to approach the target point, and the shortest path can be planned by the constant penalty of each step.
A penalty term based on the reward function is introduced into the loss-function setting:

In the above equation, the first term of the loss function is the policy loss, used to maximize the expected return of the policy function. Here K is the number of samples, π_θ(a_i|γ_i) is the policy function, π_θold(a_i|γ_i) is the old policy function, A(γ_i, a_i) is the advantage function, and a is the coefficient of the reward-function penalty term, which penalizes the UAV for performing bad actions or for failing to move toward the target location. By adjusting the value of a, the proportion of the reward function in the loss function can be controlled.
In a preferred embodiment, the proximal policy optimization (PPO) model is trained and optimized in step 4 to generate an optimal obstacle avoidance path. Through the fusion of the DRLGA algorithm model and the PPO algorithm model, the UAV traverses the inspection points along an optimal path while reliably avoiding obstacles.
The embodiment of the application also discloses a UAV path planning and inspection device applied to complex environment detection, as shown in fig. 7, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; the UAV path planning and inspection method applied to complex environment detection described above is implemented when the processor executes the program.
The embodiment of the application also discloses a computer-readable storage medium storing a computer program for executing the above UAV path planning and inspection method applied to complex environment detection.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by way of a computer program, which may be stored on a non-transitory computer readable storage medium and which, when executed, may comprise the steps of the above-described embodiments of the methods. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include non-volatile and/or volatile memory. The nonvolatile memory can include Read Only Memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double Data Rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), memory bus direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), among others.
A processor in the present application may include one or more processing cores. The processor performs the various functions of the application and processes data by executing or invoking instructions, programs, code sets, or instruction sets stored in memory and calling data stored in memory. The processor may be at least one of an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a digital signal processor (Digital Signal Processor, DSP), a digital signal processing device (Digital Signal Processing Device, DSPD), a programmable logic device (Programmable Logic Device, PLD), a field programmable gate array (Field Programmable Gate Array, FPGA), a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, or a microprocessor. It will be appreciated that the electronics implementing the above-described processor functions may differ for different devices, and embodiments of the present application are not particularly limited thereto.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions.
The above embodiments are not intended to limit the scope of the present application; all equivalent changes in the structure, shape and principle of the application shall be covered by its scope of protection.

Claims (9)

1. An unmanned aerial vehicle path planning and inspection method applied to complex environment detection, characterized by comprising the following steps:
Step 1: generating an initial population using a greedy algorithm, encoding each solution as a real vector, and evaluating the fitness of each individual using a fitness function;
reconstructing the crossover-mutation mechanism of the genetic algorithm with the SAC algorithm to construct a DRLGA model;
determining an objective function of the TSP, and solving the unmanned aerial vehicle inspection point path planning problem;
Step 2: setting the DRLGA model parameters, and designing the population state space, the genetic operation action space and the reward function;
training to obtain the globally optimal inspection path along which the unmanned aerial vehicle traverses the inspection points;
Step 3: importing the globally optimal inspection path into the PPO model training environment, and defining the state representation of the unmanned aerial vehicle according to information related to the unmanned aerial vehicle, the target point and the environment;
calculating the course angle the unmanned aerial vehicle needs to adjust and designing the terminal states, adopting the climbing tiger algorithm for obstacle avoidance path optimization, and setting the loss function and reward function;
Step 4: training and optimizing the proximal policy optimization algorithm model to generate an optimal obstacle avoidance path.
2. The unmanned aerial vehicle path planning inspection method applied to complex environment detection according to claim 1, wherein step 1 specifically comprises:
step 1.1, presetting the inspection-point position information in the input field;
generating an initial population using the greedy algorithm, each individual representing one solution;
step 1.2, encoding each solution, converting it into a chromosome representation;
converting the value of each variable into a real number using real-number encoding, forming a multidimensional vector;
evaluating the fitness of each individual using the fitness function, determining the relative contribution of each individual in the population;
step 1.3, constructing the DRLGA model by reconstructing the crossover and mutation mechanism of the genetic algorithm with the SAC algorithm;
step 1.4, constructing the TSP objective function, wherein the mathematical model searches for a global optimal inspection path V = [v_1, v_2, ..., v_n] satisfying the objective function:

min f(V) = Σ_{i=1}^{n-1} d(v_i, v_{i+1}) + d(v_n, v_1)

where v_i is the number of an inspection point, 1 ≤ i ≤ n; n is the number of inspection points; d(v_i, v_{i+1}) is the distance from inspection point v_i to inspection point v_{i+1}; and d(v_n, v_1) is the distance from the last inspection point v_n back to the first inspection point v_1, i.e. the distance of the final return to the starting point.
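The greedy initialization of step 1.1 and the tour-length objective of step 1.4 can be sketched as follows. This is an illustrative Python sketch, not the claimed implementation; Euclidean distances and the nearest-neighbour construction rule are assumptions.

```python
import math

def tour_length(points, order):
    """Claim-2 objective: sum of consecutive distances d(v_i, v_{i+1})
    plus the closing leg d(v_n, v_1) back to the starting point."""
    n = len(order)
    total = 0.0
    for i in range(n):
        x1, y1 = points[order[i]]
        x2, y2 = points[order[(i + 1) % n]]  # index wraps to the first point
        total += math.hypot(x2 - x1, y2 - y1)
    return total

def greedy_tour(points, start=0):
    """Greedy construction used to seed the initial population in step 1.1:
    always visit the closest not-yet-visited inspection point."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        last = points[order[-1]]
        nxt = min(unvisited,
                  key=lambda j: math.hypot(points[j][0] - last[0],
                                           points[j][1] - last[1]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order
```

Each greedy tour (and random perturbations of it) would then be real-encoded as one chromosome of the initial population.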
3. The unmanned aerial vehicle path planning inspection method applied to complex environment detection according to claim 1, wherein step 2 specifically comprises:
step 2.1, population state space design:
taking the difference between each individual's path length in the current population and the global optimal path length as the population state s;
step 2.2, determining the genetic-operation action space:

A = {a_1, a_2}

where a_1 is the crossover operation and a_2 is the mutation operation;
step 2.3, designing the reward function: if the selected action brings the whole solution set closer to the global optimal path length, a larger reward value is obtained;
step 2.4, with the goal of maximizing the cumulative reward of the population evolution process, selecting the appropriate genetic operation (crossover or mutation) to act on the population according to the current population state, thereby controlling the population evolution process.
4. The unmanned aerial vehicle path planning inspection method applied to complex environment detection according to claim 3, wherein the reward function r_t(s_t, a_t^i) is set as a function of C_2 and L̄(a_t^i), where r_t(s_t, a_t^i) represents the reward value obtained by selecting action a_t^i in population state s_t; C_2 represents a set constant; L̄(a_t^i) represents the average path length of the population after executing action a_t^i; and a_t^i represents the action i selected by the unmanned aerial vehicle at time t.
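A minimal sketch of a reward of this kind follows. The exact closed form in the claim appears as an image in the original publication; the improvement-based shape, the constant `c2`, and the normalization by the previous mean path length are assumptions.

```python
def mean_path_length(population, length_fn):
    """Average tour length over all individuals in the population."""
    return sum(length_fn(ind) for ind in population) / len(population)

def drlga_reward(pop_before, pop_after, length_fn, c2=100.0):
    """Illustrative reward r_t(s_t, a_t^i): larger when the chosen genetic
    action (crossover or mutation) reduces the average path length of the
    population, negative when it makes the population worse."""
    before = mean_path_length(pop_before, length_fn)
    after = mean_path_length(pop_after, length_fn)
    return c2 * (before - after) / before  # positive iff the action helped
```

In the DRLGA loop of step 2.4, the SAC agent would observe the population state, pick crossover or mutation, apply it, and receive this reward.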
5. The unmanned aerial vehicle path planning inspection method applied to complex environment detection according to claim 1, wherein step 3 specifically comprises:
step 3.1, determining the environment for path planning, presetting the start point and target point of the aircraft and the obstacles, and importing the global optimal inspection path trained by the DRLGA model into the proximal policy optimization algorithm model training environment;
step 3.2, defining the state representation: the state of the unmanned aerial vehicle at time t is expressed as g_t, and g_t is divided into two parts, i.e. g_t = [g_t^u, g_t^e], where g_t^u includes information associated with the unmanned aerial vehicle itself and the target point, and g_t^e includes information associated with the environment;
step 3.3, calculating the heading angle the unmanned aerial vehicle needs to adjust, with the heading-angle change of the controlled unmanned aerial vehicle at each time t limited to A_h, defined as:

A_h = [-30°, 30°]

the unmanned aerial vehicle selects an action a_h ∈ A_h at each instant t and changes its heading angle; the heading angle at time t+1 is:

ψ_{t+1} = ψ_t + a_h

step 3.4, terminal state design, the terminal state consisting of two different types of state:
conflict state: the distance between the unmanned aerial vehicle and an obstacle is smaller than the separation requirement;
target state: the unmanned aerial vehicle is within 2 m of the target;
step 3.5, unmanned aerial vehicle obstacle-avoidance path optimization: in heading-angle control, the climbing behavior of the climbing-tiger (Boston ivy) plant is used to assist path planning; the unmanned aerial vehicle approaches the edge of the obstacle as closely as possible, keeps a certain distance, and goes around along the edge, thereby avoiding the obstacle;
step 3.6, setting the loss function and reward function of the proximal policy optimization algorithm model, the reward function being set such that R(γ, ψ) represents the reward obtained after the unmanned aerial vehicle selects heading-angle adjustment ψ in state γ;
a reward is obtained when the unmanned aerial vehicle reaches the target state, and a certain penalty is given in the conflict state; the linear term -0.001·d_g in the reward function R(γ, ψ) guides the unmanned aerial vehicle toward the target point, and the constant per-step penalty drives the planner toward the shortest path.
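The terminal states of step 3.4 and the reward structure of step 3.6 can be sketched as below. The -0.001·d_g linear term and the 2 m target radius come from the claims; the separation distance, terminal reward/penalty magnitudes, and per-step cost are assumed values.

```python
def step_reward(d_goal, d_nearest_obstacle,
                d_sep=1.0, d_target=2.0,
                r_goal=10.0, r_collision=-10.0, step_cost=-0.1):
    """Sketch of the claim-5 reward R(gamma, psi).

    d_goal             -- distance from UAV to the target point (d_g)
    d_nearest_obstacle -- distance from UAV to the closest obstacle
    """
    if d_nearest_obstacle < d_sep:      # conflict state: separation violated
        return r_collision
    if d_goal <= d_target:              # target state: within 2 m of goal
        return r_goal
    return step_cost - 0.001 * d_goal   # constant penalty + linear guidance
```

Checking the conflict state before the target state means a trajectory that reaches the goal while violating separation is still penalized, which matches the claim's treatment of the conflict state as terminal.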
6. The unmanned aerial vehicle path planning inspection method applied to complex environment detection according to claim 5, wherein:
g_t^e is set to [o_t^1, ..., o_t^N], where o_t^i denotes the information of obstacle i;
g_t^u is set to [d_g, v_x, v_y], where d_g represents the distance from the unmanned aerial vehicle to the target point, v_x represents the velocity of the unmanned aerial vehicle in the x direction, and v_y represents its velocity in the y direction;
the information o_t^i of obstacle i comprises y_i, the position of obstacle i on the y-axis, and δ_i, the distance between the unmanned aerial vehicle and the center of obstacle i; this environment representation encourages the unmanned aerial vehicle to find a globally optimal solution.
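A sketch of assembling the state of claims 5-6 follows. The split into a UAV part and an environment part, and the components d_g, v_x, v_y, y_i, and δ_i, come from the claims; the tuple layout and variable names are reconstructions, since the original inline symbols are images.

```python
import math

def build_state(uav_pos, uav_vel, goal, obstacles):
    """Assemble g_t = [g_u, g_e]: the UAV part holds the goal distance d_g
    and the velocity components (vx, vy); each obstacle i contributes its
    y-axis position and the distance delta_i from the UAV to its center."""
    d_g = math.hypot(goal[0] - uav_pos[0], goal[1] - uav_pos[1])
    g_u = [d_g, uav_vel[0], uav_vel[1]]
    g_e = []
    for (ox, oy) in obstacles:
        delta_i = math.hypot(ox - uav_pos[0], oy - uav_pos[1])
        g_e.append([oy, delta_i])
    return g_u, g_e
```

The concatenation of `g_u` and the flattened `g_e` would form the observation vector fed to the PPO policy network.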
7. The unmanned aerial vehicle path planning inspection method applied to complex environment detection according to claim 6, wherein the heading-angle control is:

F_a = k_attraction · (P_a - B) / ‖P_a - B‖² + a_h
F_r = Σ( k_r · (1/‖P_a - B‖ - 1/d_max) · (1/‖P_a - B‖²) · (P_a - B)/‖P_a - B‖ )
ψ_adjust = F_a + F_r + a_h

where F_a represents the attractive force between the unmanned aerial vehicle and the obstacle, F_r represents the repulsive force between the unmanned aerial vehicle and the obstacle, P_a represents the current unmanned aerial vehicle position, and B(x_B, y_B) represents a point on the obstacle boundary; k_attraction represents the guiding force of the boundary point on the unmanned aerial vehicle, k_r represents the repulsion constant, ‖P_a - B‖ represents the Euclidean distance, d_max represents the maximum range of action of the repulsive force, and ψ_adjust represents the total resultant.
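An illustrative two-dimensional reading of the claim-7 potential-field control is sketched below. Turning the resultant force into a heading change clipped to A_h, the attraction sign convention (pointing toward the boundary point B), and all gain values are assumptions, not the claimed formulation.

```python
import math

def heading_adjust(p_a, boundary_pts, k_att=1.0, k_r=1.0, d_max=5.0):
    """Potential-field heading control in the spirit of claim 7: attraction
    toward the nearest boundary point B (the guiding force k_attraction),
    plus a repulsive term k_r*(1/d - 1/d_max)*(1/d^2) along (P_a - B)/d
    from every boundary point within the influence radius d_max.
    Returns a heading change in degrees, clipped to A_h = [-30, 30]."""
    fx = fy = 0.0
    # attraction toward the closest boundary point (sign convention assumed)
    bx, by = min(boundary_pts,
                 key=lambda b: math.hypot(b[0] - p_a[0], b[1] - p_a[1]))
    d = math.hypot(bx - p_a[0], by - p_a[1])
    fx += k_att * (bx - p_a[0]) / d**2
    fy += k_att * (by - p_a[1]) / d**2
    # repulsion from every boundary point inside the influence radius
    for (x, y) in boundary_pts:
        d = math.hypot(p_a[0] - x, p_a[1] - y)
        if 0.0 < d < d_max:
            mag = k_r * (1.0 / d - 1.0 / d_max) / d**2
            fx += mag * (p_a[0] - x) / d
            fy += mag * (p_a[1] - y) / d
    angle = math.degrees(math.atan2(fy, fx))
    return max(-30.0, min(30.0, angle))  # clip to A_h = [-30 deg, 30 deg]
```

Because attraction pulls the UAV toward the boundary while repulsion grows sharply at short range, the equilibrium keeps the UAV skimming the obstacle edge, which is the climbing-tiger (Boston ivy) behavior of step 3.5.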
8. An unmanned aerial vehicle path planning inspection device applied to complex environment detection, characterized by comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the unmanned aerial vehicle path planning inspection method applied to complex environment detection according to any one of claims 1-7.
9. A computer-readable storage medium, characterized in that the storage medium stores a computer program for executing the unmanned aerial vehicle path planning inspection method applied to complex environment detection according to any one of claims 1-7.
CN202310783208.4A 2023-06-29 2023-06-29 Unmanned aerial vehicle path planning inspection method applied to complex environment detection Pending CN117148857A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310783208.4A CN117148857A (en) 2023-06-29 2023-06-29 Unmanned aerial vehicle path planning inspection method applied to complex environment detection

Publications (1)

Publication Number Publication Date
CN117148857A true CN117148857A (en) 2023-12-01

Family

ID=88910754

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310783208.4A Pending CN117148857A (en) 2023-06-29 2023-06-29 Unmanned aerial vehicle path planning inspection method applied to complex environment detection

Country Status (1)

Country Link
CN (1) CN117148857A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117739925A (en) * 2023-12-19 2024-03-22 广东省水利水电第三工程局有限公司 Intelligent image analysis method for unmanned aerial vehicle
CN117739925B (en) * 2023-12-19 2024-05-24 广东省水利水电第三工程局有限公司 Intelligent image analysis method for unmanned aerial vehicle

Similar Documents

Publication Publication Date Title
CN111612126B (en) Method and apparatus for reinforcement learning
CN110470301B (en) Unmanned aerial vehicle path planning method under multi-dynamic task target point
Rückin et al. Adaptive informative path planning using deep reinforcement learning for uav-based active sensing
CN111142522A (en) Intelligent agent control method for layered reinforcement learning
de Morais et al. Vision-based robust control framework based on deep reinforcement learning applied to autonomous ground vehicles
EP3719603A1 (en) Action control method and apparatus
Faryadi et al. A reinforcement learning‐based approach for modeling and coverage of an unknown field using a team of autonomous ground vehicles
CN117148857A (en) Unmanned aerial vehicle path planning inspection method applied to complex environment detection
CN112947591A (en) Path planning method, device, medium and unmanned aerial vehicle based on improved ant colony algorithm
CN113268074B (en) Unmanned aerial vehicle flight path planning method based on joint optimization
CN113128381A (en) Obstacle trajectory prediction method, system and computer storage medium
CN112148008A (en) Real-time unmanned aerial vehicle path prediction method based on deep reinforcement learning
Osanlou et al. Optimal solving of constrained path-planning problems with graph convolutional networks and optimized tree search
Xue et al. Multi-agent deep reinforcement learning for UAVs navigation in unknown complex environment
CN111310919B (en) Driving control strategy training method based on scene segmentation and local path planning
Carolina Jara Ten Kathen et al. A comparison of pso-based informative path planners for autonomous surface vehicles for water resource monitoring
CN117406753A (en) Irregular environment-oriented unmanned agricultural machinery global path planning method
CN110779526B (en) Path planning method, device and storage medium
CN116448134B (en) Vehicle path planning method and device based on risk field and uncertain analysis
Jiang et al. Research on autonomous obstacle avoidance and target tracking of UAV based on improved dueling DQN algorithm
CN113486871B (en) Unmanned vehicle local autonomous control method, device and equipment based on depth map
CN114495036A (en) Vehicle track prediction method based on three-stage attention mechanism
CN114970819B (en) Moving target searching and tracking method and system based on intention reasoning and deep reinforcement learning
Biswas et al. QuAD: Query-based Interpretable Neural Motion Planning for Autonomous Driving
CN115439510B (en) Active target tracking method and system based on expert strategy guidance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination