CN115290096B - Unmanned aerial vehicle dynamic flight path planning method based on reinforcement learning differential algorithm

Unmanned aerial vehicle dynamic flight path planning method based on reinforcement learning differential algorithm

Info

Publication number
CN115290096B
CN115290096B (application CN202211195962.8A)
Authority
CN
China
Prior art keywords
aerial vehicle
unmanned aerial
algorithm
reinforcement learning
track planning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211195962.8A
Other languages
Chinese (zh)
Other versions
CN115290096A (en)
Inventor
谭志平
唐宇
黄明浩
黄文轩
邢诗曼
黄华盛
郭琪伟
方明伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Polytechnic Normal University
Original Assignee
Guangdong Polytechnic Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Polytechnic Normal University
Priority to CN202211195962.8A
Publication of CN115290096A
Application granted
Publication of CN115290096B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01C - MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20 - Instruments for performing navigational calculations
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 - Computer-aided design [CAD]
    • G06F30/20 - Design optimisation, verification or simulation
    • G06F30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00 - Details relating to CAD techniques
    • G06F2111/04 - Constraint-based CAD
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00 - Details relating to CAD techniques
    • G06F2111/10 - Numerical modelling
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Automation & Control Theory (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention relates to the technical field of unmanned aerial vehicle dynamic flight path planning, and discloses an unmanned aerial vehicle dynamic flight path planning method based on a reinforcement learning differential algorithm, which comprises the following steps. S1: acquire the terrain environment in which the unmanned aerial vehicle needs to fly. S2: establish a flight path planning model according to the acquired environmental data and the performance constraints of the unmanned aerial vehicle; represent the environment as an artificial potential field, with an attractive potential field centered on the target point and repulsive potential fields centered on obstacles and threats. S3: when establishing the flight path planning model, add a function structure for correcting positioning errors; calculate the current resultant force on the unmanned aerial vehicle from the artificial potential field, and make the unmanned aerial vehicle advance under the action of that resultant force. S4: design a reinforcement learning differential algorithm based on the flight path planning model. S5: optimize the reinforcement learning differential algorithm, implant the optimized algorithm into the intelligent system of the unmanned aerial vehicle, and solve with it to complete the flight path planning of the unmanned aerial vehicle.

Description

Unmanned aerial vehicle dynamic flight path planning method based on reinforcement learning differential algorithm
Technical Field
The invention relates to the technical field of unmanned aerial vehicle dynamic flight path planning, and in particular to an unmanned aerial vehicle dynamic flight path planning method based on a reinforcement learning differential algorithm.
Background
Citrus in the hilly south is planted mainly in hill and mountain areas, which are characterized by high planting density, small plot scale, high dispersion, variable topographic relief and many steep slopes and sharp bends. Traditional manual plant protection is therefore very difficult, and autonomous operation with a plant protection unmanned aerial vehicle offers clear advantages.
However, the complex terrain makes the hilly climate unstable, with frequent environmental disturbances such as gusts, heavy fog and rainstorms; flight under manual remote control, or autonomous flight along a fixed route, can hardly meet the flight path planning requirements of a plant protection unmanned aerial vehicle in the complex environment of hilly and mountainous areas. Studying a dynamic flight path planning algorithm suited to the planting characteristics of hilly and mountainous areas, and realizing dynamic planning and autonomous operation of the plant protection unmanned aerial vehicle's flight path in complex environments, is therefore a key link in improving the efficiency of unmanned aerial vehicle plant protection for citrus in the hilly south.
As the core of a flight path planning system, searching for an optimal flight path with a planning algorithm has long been a popular research subject. The flight path planning problem of a plant protection unmanned aerial vehicle in the complex environment of hilly and mountainous areas is a high-dimensional, multi-constraint, strongly coupled dynamic optimization problem, and an NP-hard one. In solving a dynamic multi-constraint optimization problem, the most difficult task is to maintain the diversity of solutions, which requires the algorithm to have very fast convergence and high calculation accuracy. Traditional evolutionary algorithms are better suited to static flight path planning; they handle dynamic multi-constraint planning under complex conditions inefficiently, generally suffering from slow convergence and a tendency to fall into local optima, so their performance is unsatisfactory. Real-time operation further requires very high planning speed and calculation precision.
To date, few scholars have studied dynamic flight path planning. Hidalgo et al. combined an RRT algorithm with a GPU to achieve autonomous real-time planning of unmanned aerial vehicle flight paths in several simulated scene environments; the algorithm's efficiency in various scenes was verified by numerical simulation, but because it relies on GPU computation its hardware requirements are very high. Cai et al. used an optimization algorithm based on cognitive behavior to plan unmanned aerial vehicle flight paths in real time in a 3-dimensional environment; the method first designs the route with a three-level function model, grading the flight path objective function into high, medium and low levels, and then optimizes with a cognitive behavior optimization algorithm. Experiments showed it to be superior to the particle swarm algorithm and the RRT algorithm, but the route is difficult to grade in a real flight environment. Wan et al. used the DeepLabV3+ deep learning model to segment fruit tree canopy images and extracted the route from the canopy barycenters of the segmented binary images with 95% accuracy, but the method only applies to crops with canopies and thus has certain limitations.
In summary, algorithms for dynamic flight path planning remain few, and conventional planning algorithms and intelligent optimization algorithms generally converge slowly and fall into local optima when solving complex dynamic planning problems. It is therefore necessary to design an algorithm that can efficiently handle the dynamic multi-constraint flight path planning problem, and for this purpose an unmanned aerial vehicle dynamic flight path planning method based on a reinforcement learning differential algorithm is provided.
Disclosure of Invention
The invention aims to disclose an unmanned aerial vehicle dynamic flight path planning method based on a reinforcement learning differential algorithm, solving the problem of how to efficiently handle dynamic multi-constraint flight path planning.
In order to achieve the purpose, the invention adopts the following technical scheme:
An unmanned aerial vehicle dynamic flight path planning method based on a reinforcement learning differential algorithm comprises the following steps:
S1: acquiring the terrain environment in which the unmanned aerial vehicle needs to fly;
S2: establishing a flight path planning model according to the acquired environmental data and the performance constraints of the unmanned aerial vehicle, representing the environment as an artificial potential field, establishing an attractive potential field centered on the target point, and establishing repulsive potential fields centered on obstacles and threats;
S3: when the flight path planning model is established, adding a function structure for correcting positioning errors, calculating the current resultant force on the unmanned aerial vehicle from the artificial potential field, and making the unmanned aerial vehicle advance under the action of the resultant force;
S4: designing a reinforcement learning differential algorithm based on the flight path planning model;
S5: optimizing the reinforcement learning differential algorithm, implanting the optimized reinforcement learning differential algorithm into the unmanned aerial vehicle's intelligent system, and solving with the algorithm optimized on the basis of the reinforcement learning differential algorithm to complete the flight path planning of the unmanned aerial vehicle.
Preferably, adding the function structure for positioning error correction in S3 comprises the following steps:
S21: setting an unmanned aerial vehicle flight path planning area consisting of 1 departure point, 1 destination, R horizontal correction points and L vertical correction points;
S22: constructing an unmanned aerial vehicle flight path planning area containing 2 + R + L points; the unmanned aerial vehicle needs real-time positioning while flying through this space, and the positioning error comprises a vertical error and a horizontal error, each of which increases by δ units for every 1 m flown; both errors must be smaller than θ units when the unmanned aerial vehicle reaches the target point, so that it can fly according to the planned flight path;
S23: the unmanned aerial vehicle needs to correct its positioning error during flight; correction points that can be used for error correction exist in the flight path planning area. When the unmanned aerial vehicle reaches a correction point, the error is corrected according to that point's error correction type, and the positions at which vertical and horizontal errors can be corrected are determined from the terrain before flight path planning. Provided both errors are corrected in time, the unmanned aerial vehicle can fly along the preset route and, after error correction at several correction points, finally reach the destination.
Preferably, the design of the reinforcement learning differential evolution algorithm in S4 comprises the following steps:
S31: combining reinforcement learning with a differential evolution algorithm, and adopting a Q-learning algorithm or a deep Q-learning algorithm as the agent for intelligent decision-making;
S32: analyzing the optimization problem with the dispersion metric, autocorrelation ruggedness, landscape information ruggedness and fitness cloud, and taking the fitness landscape feature information of the optimization problem as the state space of the reinforcement learning agent;
S33: selecting the control parameters and mutation strategies of the differential evolution algorithm as the action space of the agent, and designing the population evolution efficiency as the agent's reward;
S34: finally, the agent obtains local information about the optimization problem through the state space, executes the corresponding action-space operation according to the state information, calculates the reward obtained after the action is executed, and returns the reward to the agent.
Preferably, the calculation of the resultant force in S2 determines the direction of motion of the drone according to the following formula:

F_att = k (X_g - X)

where F_att denotes the attraction of the target to the drone, X_g is the coordinate vector of the target, X is the coordinate vector of the drone's current position, and k is a coefficient with a value between 0 and 1. F_rep denotes the repulsion of the no-fly zones on the drone; in this scheme an existing repulsive field function is adopted to compute F_rep. The resultant force F = F_att + F_rep of the attraction and the repulsion gives the drone's direction of motion.
Preferably, in step S5 the solution is performed with the algorithm optimized on the basis of the reinforcement learning differential algorithm, completing the flight path planning of the unmanned aerial vehicle and the obstacle avoidance under the constraint conditions on the flight path.
Preferably, the constraint-condition obstacle avoidance comprises the following steps:
S61: inputting the initial position of the unmanned aerial vehicle as the current position X_c, the center positions O_1, ..., O_m of the m no-fly zones, and the target position G assigned to the drone;
S62: taking two variables G1 and G2 to represent, respectively, the target position during the calculation and the final target position, and initializing G1 = G2 = G; opening up two storage spaces A and B, and storing the drone's current position X_c in A; initializing the iteration count num = 0;
S63: determining the direction of motion of the drone, setting its motion step length to L, moving the drone from the current position X_c by the step length L in the determined direction, updating the current position X_c with the position after the move, storing the drone's position in A, and setting the iteration count num = num + 1;
S64: judging whether num > N holds; if so, setting num = 0 and performing step S65, otherwise returning to step S63, where N is a preset total number of iterations;
S65: judging whether the distance d between the current position X_c and G1 satisfies d < d_0, where d_0 is a preset distance threshold;
S66: judging whether the last M position points stored in A all lie within a preset circular area; if so, the drone is currently at an equilibrium position or a local minimum point, and jump-out processing is performed; if not, continuing with step S63;
S67: solving the straight-line expression between the last two points stored in A;
S68: judging whether the straight line intersects any circular no-fly zone; if not, returning to step S63; otherwise assigning the last stored position of A to G1, emptying A, and then performing step S63;
S69: storing all the positions in A into B and judging whether G1 equals G2; if not, setting X_c = G1 and G1 = G2, and then proceeding to step S63;
S610: the position points stored in B form the obstacle avoidance track of the unmanned aerial vehicle.
Preferably, the establishment of the flight path planning model in S2 further comprises the following steps:
S71: acquiring image data of the target area, including surface topography data and crop planting data;
S72: obtaining an initial route of the unmanned aerial vehicle based on the image data of the target area;
S73: extracting first actual geographic coordinates of the initial route based on the inflection-point positions on the initial route, and adjusting the first actual geographic coordinates based on the elevation values of the surface topography data to obtain first elevation coordinates;
S74: adjusting the initial route based on the first elevation coordinates to obtain a terrain route;
S75: dividing the initial route into segments of a preset distance, and extracting, point by point, second actual geographic coordinates of the endpoint of each segment;
S76: adjusting the second actual geographic coordinates based on the crop planting data to obtain second elevation coordinates, and adjusting the initial route based on the second elevation coordinates to obtain a crop planting route;
S77: establishing the flight path planning model based on the terrain route and the crop planting route.
Compared with the prior art, the unmanned aerial vehicle dynamic flight path planning method based on the reinforcement learning differential algorithm has the following beneficial effects:
1. Aiming at the insufficient diversity of solutions under complex dynamic multi-constraint conditions, the method provides a constraint handling approach combining an adaptive relaxation variable method with feasibility criteria; this avoids the reduction in solution diversity, shortens the time the algorithm spends searching for the optimal solution, lowers the difficulty of optimization and improves the efficiency of the algorithm.
2. The method obtains landscape information about the optimization problem with fitness landscape analysis methods such as the dispersion metric, autocorrelation ruggedness, landscape information ruggedness and fitness cloud, and uses it as the state space of the agent. By combining deep reinforcement learning with differential evolution, the differential evolution algorithm can adaptively select the optimal mutation strategy through the agent's decisions while solving bound-constrained continuous-domain optimization problems, find the optimal solution quickly and efficiently in real time, and so realize dynamic planning of the flight path.
Drawings
The invention is further illustrated by means of the accompanying drawing; the embodiment in the drawing does not constitute any limitation of the invention, and a person skilled in the art may derive further drawings from the following figure without inventive effort.
Fig. 1 is a schematic flow diagram of an unmanned aerial vehicle dynamic flight path planning method based on a reinforcement learning differential algorithm according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
As shown in fig. 1, the present invention provides a method for dynamic flight path planning of an unmanned aerial vehicle based on a reinforcement learning differential algorithm, comprising:
s1: acquiring a terrain environment in which the unmanned aerial vehicle needs to fly;
S2: establishing a flight path planning model according to the acquired environment data and the performance constraints of the unmanned aerial vehicle, representing the environment as an artificial potential field, establishing an attractive potential field centered on the target point, and establishing repulsive potential fields centered on obstacles and threats;
S3: when the flight path planning model is established, adding a function structure for correcting positioning errors, calculating the current resultant force on the unmanned aerial vehicle from the artificial potential field, and making the unmanned aerial vehicle advance under the action of the resultant force;
S4: designing a reinforcement learning differential algorithm based on the flight path planning model. The differential calculation is an operation performed on differences. Reinforcement learning, also called evaluative learning, is one of the paradigms and methodologies of machine learning; it describes and solves the problem of an agent maximizing its return, or achieving a specific goal, through a learning strategy while interacting with its environment.
A common model for reinforcement learning is the standard Markov decision process. Under given conditions, reinforcement learning can be divided into model-based and model-free reinforcement learning, and into active and passive reinforcement learning. Variants of reinforcement learning include inverse reinforcement learning, hierarchical reinforcement learning and reinforcement learning for partially observable systems. Algorithms for solving the reinforcement learning problem fall into two classes: policy search algorithms and value function algorithms. Deep learning models can be used within reinforcement learning, forming deep reinforcement learning.
Reinforcement learning does not require any data to be given in advance; instead it obtains learning information and updates its model parameters by receiving the environment's feedback on its actions.
The reinforcement learning problem is also discussed in information theory, game theory, automatic control and other fields, where it is used to explain equilibrium states under bounded rationality, design recommendation systems and build robot interaction systems.
S5: optimizing the reinforcement learning differential algorithm, implanting the optimized reinforcement learning differential algorithm into the intelligent system of the unmanned aerial vehicle, and solving with the algorithm optimized on the basis of the reinforcement learning differential algorithm to complete the flight path planning of the unmanned aerial vehicle.
Preferably, adding the function structure for positioning error correction in S3 comprises the following steps:
S21: setting an unmanned aerial vehicle flight path planning area consisting of 1 departure point, 1 destination, R horizontal correction points and L vertical correction points;
S22: constructing an unmanned aerial vehicle flight path planning area containing 2 + R + L points; the unmanned aerial vehicle needs real-time positioning while flying through this space, and the positioning error comprises a vertical error and a horizontal error, each of which increases by δ units for every 1 m flown; both errors must be smaller than θ units when the unmanned aerial vehicle reaches the target point, so that it can fly according to the planned flight path;
S23: the unmanned aerial vehicle needs to correct its positioning error during flight; correction points that can be used for error correction exist in the flight path planning area. When the unmanned aerial vehicle reaches a correction point, the error is corrected according to that point's error correction type, and the positions at which vertical and horizontal errors can be corrected are determined from the terrain before flight path planning. Provided both errors are corrected in time, the unmanned aerial vehicle can fly along the preset route and, after error correction at several correction points, finally reach the destination, as illustrated by the sketch below.
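The error model of S21-S23 can be made concrete with a short sketch. The following Python snippet is a minimal illustration of the accumulation-and-correction rule only; the data types, function names and numeric values are assumptions, not the patent's implementation.

```python
from dataclasses import dataclass

@dataclass
class CorrectionPoint:
    x: float
    y: float
    z: float
    kind: str  # "horizontal" or "vertical" error correction type

def fly_leg(v_err, h_err, dist_m, delta):
    """S22: both errors grow by delta units for every metre flown."""
    return v_err + delta * dist_m, h_err + delta * dist_m

def correct(v_err, h_err, point):
    """S23: at a correction point, the error of the matching type is reset."""
    return (0.0, h_err) if point.kind == "vertical" else (v_err, 0.0)

def track_feasible(leg_lengths, points, delta, theta):
    """Check that both errors are below theta on arrival at the target.
    leg_lengths has one more entry than points (the final leg ends at the target)."""
    v = h = 0.0
    for dist, pt in zip(leg_lengths, points + [None]):
        v, h = fly_leg(v, h, dist, delta)
        if pt is not None:
            v, h = correct(v, h, pt)
    return v < theta and h < theta

# example: two correction points, delta = 0.001 units/m, theta = 0.3 units
pts = [CorrectionPoint(100, 0, 50, "vertical"), CorrectionPoint(200, 0, 50, "horizontal")]
print(track_feasible([120.0, 110.0, 90.0], pts, delta=0.001, theta=0.3))  # True
```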
Preferably, the design of the reinforcement learning differential evolution algorithm in S4 comprises the following steps:
S31: combining reinforcement learning with a differential evolution algorithm, and adopting a Q-learning algorithm or a deep Q-learning algorithm as the agent for intelligent decision-making;
S32: analyzing the optimization problem with the dispersion metric, autocorrelation ruggedness, landscape information ruggedness and fitness cloud, and taking the fitness landscape feature information of the optimization problem as the state space of the reinforcement learning agent;
S33: selecting the control parameters and mutation strategies of the differential evolution algorithm as the action space of the agent, and designing the population evolution efficiency as the agent's reward;
S34: finally, the agent obtains local information about the optimization problem through the state space, executes the corresponding action-space operation according to the state information, calculates the reward obtained after the action is executed, and returns it to the agent. The agent is then trained and tested continuously on the dynamic optimization problem test sets of the IEEE Congress on Evolutionary Computation (CEC) competition series, so that the reinforcement learning differential evolution algorithm can quickly and efficiently find the optimal solution in real time as the constraint conditions change, realizing dynamic planning of the flight path.
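A hedged sketch of the S31-S34 loop follows: a tabular Q-learning agent whose state is a discretized fitness-landscape feature vector, whose actions are differential evolution mutation strategies with control parameters, and whose reward is the population evolution efficiency. The strategy list, the discretization and all parameter values are illustrative assumptions.

```python
import random

# Action space (S33): candidate DE mutation strategies with (F, CR) control parameters.
ACTIONS = [("DE/rand/1", 0.5, 0.9),
           ("DE/best/1", 0.8, 0.7),
           ("DE/current-to-best/1", 0.6, 0.8)]

def landscape_state(dispersion, ruggedness, n_bins=4):
    """State space (S32): discretize fitness-landscape features (here only a
    dispersion metric and an autocorrelation ruggedness, both scaled to [0, 1))
    into a single table index."""
    d = min(int(dispersion * n_bins), n_bins - 1)
    r = min(int(ruggedness * n_bins), n_bins - 1)
    return d * n_bins + r

class QAgent:
    def __init__(self, n_states, alpha=0.1, gamma=0.9, eps=0.1):
        self.q = [[0.0] * len(ACTIONS) for _ in range(n_states)]
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, state):
        if random.random() < self.eps:       # epsilon-greedy exploration
            return random.randrange(len(ACTIONS))
        row = self.q[state]
        return row.index(max(row))

    def learn(self, s, a, reward, s_next):
        """S34: the reward is the population evolution efficiency observed
        after running one generation with the chosen strategy."""
        target = reward + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])
```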
Preferably, in S2 the direction of motion of the drone is determined according to the following equation:

F_att = k (X_g - X)

where F_att denotes the attraction of the target to the drone, X_g is the coordinate vector of the target, X is the coordinate vector of the drone's current position, and k is a coefficient with a value between 0 and 1. F_rep denotes the repulsion of the no-fly zones on the drone; in this scheme an existing repulsive field function is adopted to compute F_rep. The resultant force F = F_att + F_rep of the attraction and the repulsion gives the drone's direction of motion.
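Read this way, the resultant-force step can be sketched in a few lines. The attraction follows F_att = k(X_g - X) as above; the repulsive field is passed in as a callable, because the text only says an existing repulsive field function is used, and the stand-in below (an inverse-square falloff inside an influence radius) is an assumption.

```python
import numpy as np

def attraction(x, x_goal, k=0.5):
    """F_att = k * (X_g - X), with 0 < k <= 1."""
    return k * (np.asarray(x_goal, float) - np.asarray(x, float))

def resultant(x, x_goal, repulsion, k=0.5):
    """F = F_att + F_rep; F gives the drone's direction of motion."""
    return attraction(x, x_goal, k) + repulsion(np.asarray(x, float))

# illustrative stand-in for the existing repulsive field of one no-fly zone
def make_repulsion(centre, eta=1.0, rho0=5.0):
    centre = np.asarray(centre, float)
    def rep(x):
        diff = x - centre
        rho = np.linalg.norm(diff)
        if rho <= 0.0 or rho >= rho0:      # outside the influence radius
            return np.zeros_like(x)
        return eta * (1.0 / rho - 1.0 / rho0) / rho**2 * diff / rho
    return rep

F = resultant([0.0, 0.0], [10.0, 0.0], make_repulsion([5.0, 0.5]))
```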
Preferably, in step S5 the algorithm optimized on the basis of the reinforcement learning differential algorithm is used for solving, completing the flight path planning of the unmanned aerial vehicle and the obstacle avoidance under the constraint conditions on the flight path.
The constraint-condition obstacle avoidance method comprises the following steps:
S61: inputting the initial position of the unmanned aerial vehicle as the current position X_c, the center positions O_1, ..., O_m of the m no-fly zones, and the target position G assigned to the drone;
S62: taking two variables G1 and G2 to represent, respectively, the target position during the calculation and the final target position, and initializing G1 = G2 = G; opening up two storage spaces A and B, and storing the drone's current position X_c in A; initializing the iteration count num = 0;
S63: determining the direction of motion of the drone, setting its motion step length to L, moving the drone from the current position X_c by the step length L in the determined direction, updating the current position X_c with the position after the move, storing the drone's position in A, and setting the iteration count num = num + 1;
S64: judging whether num > N holds; if so, setting num = 0 and performing step S65, otherwise returning to step S63, where N is a preset total number of iterations;
S65: judging whether the distance d between the current position X_c and G1 satisfies d < d_0, where d_0 is a preset distance threshold;
S66: judging whether the last M position points stored in A all lie within a preset circular area; if so, the drone is currently at an equilibrium position or a local minimum point, and jump-out processing is performed; if not, continuing with step S63;
S67: solving the straight-line expression between the last two points stored in A;
S68: judging whether the straight line intersects any circular no-fly zone; if not, returning to step S63; otherwise assigning the last stored position of A to G1, emptying A, and then performing step S63;
S69: storing all the positions in A into B and judging whether G1 equals G2; if not, setting X_c = G1 and G1 = G2, and then proceeding to step S63;
S610: the position points stored in B form the obstacle avoidance track of the unmanned aerial vehicle, as summarized in the sketch below.
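The main loop of S61-S610 can be summarized in a sketch. The version below keeps the two storage spaces A and B, the step-length move along the resultant-force direction, the periodic distance check and the stalled-position (local minimum) test, but omits the S67-S68 straight-line intersection test that reassigns G1, so G1 never differs from G2 here; all thresholds are illustrative.

```python
import numpy as np

def direction(x, goal, zones, k=0.5, eta=1.0, rho0=5.0):
    """Resultant APF direction: attraction to the goal plus repulsion
    from circular no-fly zones given as (centre, radius) pairs."""
    f = k * (goal - x)
    for centre, radius in zones:
        diff = x - np.asarray(centre, float)
        rho = np.linalg.norm(diff) - radius
        if 0.0 < rho < rho0:
            f += eta * (1.0 / rho - 1.0 / rho0) / rho**2 * diff / np.linalg.norm(diff)
    return f

def plan(x0, goal, zones, step=0.2, n_check=25, m_stall=10,
         d0=0.3, stall_radius=0.05, max_iter=20000):
    g1 = g2 = np.asarray(goal, float)                  # S62
    x = np.asarray(x0, float)
    A, B, num = [x.copy()], [], 0
    for _ in range(max_iter):
        d = direction(x, g1, zones)                    # S63: move one step L
        x = x + step * d / (np.linalg.norm(d) + 1e-12)
        A.append(x.copy()); num += 1
        if num < n_check:                              # S64: check every N steps
            continue
        num = 0
        if np.linalg.norm(x - g1) < d0:                # S65: sub-goal reached
            B.extend(A)                                # S69
            if np.array_equal(g1, g2):
                return B                               # S610: finished track
            x, g1, A = g1.copy(), g2.copy(), [x.copy()]
            continue
        last = np.array(A[-m_stall:])
        if np.linalg.norm(last - last.mean(axis=0), axis=1).max() < stall_radius:
            # S66: equilibrium / local minimum -> simple jump-out perturbation
            x = x + step * np.random.uniform(-1.0, 1.0, size=x.shape)
    return B + A
```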
Preferably, in S2, the establishment of the flight path planning model further comprises the following steps:
S71: acquiring image data of the target area, including surface topography data and crop planting data;
S72: obtaining an initial route of the unmanned aerial vehicle based on the image data of the target area;
S73: extracting first actual geographic coordinates of the initial route based on the inflection-point positions on the initial route, and adjusting the first actual geographic coordinates based on the elevation values of the surface topography data to obtain first elevation coordinates;
S74: adjusting the initial route based on the first elevation coordinates to obtain a terrain route;
S75: dividing the initial route into segments of a preset distance, and extracting, point by point, second actual geographic coordinates of the endpoint of each segment;
S76: adjusting the second actual geographic coordinates based on the crop planting data to obtain second elevation coordinates, and adjusting the initial route based on the second elevation coordinates to obtain a crop planting route;
S77: establishing the flight path planning model based on the terrain route and the crop planting route, as in the sketch below.
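As a minimal sketch of S73-S75: the elevation lookup is modelled as a callable dem(x, y) and the clearance value is an assumption; the patent itself only specifies adjusting coordinates by the elevation values and segmenting the route at a preset distance.

```python
import numpy as np

def adjust_to_terrain(waypoints_xy, dem, clearance=5.0):
    """S73-S74: lift each inflection-point coordinate to the local surface
    elevation plus a safety clearance, yielding the terrain route."""
    return [(x, y, dem(x, y) + clearance) for x, y in waypoints_xy]

def densify(waypoints_xy, spacing):
    """S75: divide the initial route into segments of a preset distance and
    collect the endpoint of each segment point by point."""
    pts = [np.asarray(p, float) for p in waypoints_xy]
    out = [pts[0]]
    for a, b in zip(pts, pts[1:]):
        n = max(1, int(np.linalg.norm(b - a) // spacing))
        for i in range(1, n + 1):
            out.append(a + (b - a) * i / n)
    return out

# example with a synthetic sloped surface as the DEM
route = adjust_to_terrain([(0.0, 0.0), (100.0, 50.0)], dem=lambda x, y: 0.01 * x)
```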
Preferably, in S71 the crop planting data comprise the crop type and the planted area, and are obtained by:
S711: dividing the target area into a plurality of sub-areas;
S712: respectively acquiring aerial photographs of each sub-area;
S713: performing image recognition processing on the aerial photographs, and obtaining the crop types contained in each sub-area and the area occupied by each crop type.
Specifically, an aerial photography unmanned aerial vehicle can be controlled in manual flight mode to acquire the aerial photographs of each sub-area. Because the endurance of the unmanned aerial vehicle is limited, it is difficult to photograph the whole target area directly; the invention therefore divides the target area and then acquires the aerial photographs of each sub-area separately.
Preferably, in S713, the image recognition processing of the aerial photographs comprises:
carrying out enhancement processing on the aerial photograph to obtain an enhanced image;
and inputting the enhanced image into a pre-trained neural network model for image recognition, obtaining the crop types contained in the enhanced image and calculating the area occupied by each crop type, as sketched below.
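One hedged reading of this output step, assuming the pre-trained model is a semantic segmentation network (the patent does not fix the architecture): crop types are read off the predicted class mask, and each crop's occupied area is its pixel count times the ground area per pixel.

```python
import numpy as np

def classify_and_measure(enhanced_image, segment, m2_per_pixel):
    """`segment` is any pre-trained model (assumed) mapping an image to an
    HxW class-id mask; returns {class_id: occupied area in square metres}."""
    mask = segment(enhanced_image)
    ids, counts = np.unique(mask, return_counts=True)
    return {int(i): float(c) * m2_per_pixel for i, c in zip(ids, counts)}
```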
Preferably, enhancing the aerial photograph to obtain an enhanced image comprises:
performing illumination optimization processing on the aerial photo to obtain a first image;
carrying out noise reduction processing on the first image to obtain a second image;
and extracting the region of interest of the second image to obtain an enhanced image.
During aerial photography the camera's field of view can be affected by cloud cover, making the illumination distribution unbalanced, and it is also affected by air quality. By performing illumination optimization on the aerial photograph, the invention effectively reduces the influence of the illumination distribution problem on the final identification of crop type and occupied area, thereby improving the safety of the flight path planning of the invention.
Preferably, performing the illumination optimization processing on the aerial photograph to obtain the first image comprises:
S81: decomposing the aerial photograph with an improved Retinex model into an illumination component image L and a reflection component image S;
S82: dividing the illumination component image L into a plurality of sub-images, and storing all sub-images obtained by the division in a set cutLSet;
S83: respectively acquiring the illumination distribution value of each sub-image in the set cutLSet;
S84: dividing the reflection component image S into a plurality of sub-images, and storing all sub-images obtained by the division in a set cutSSet;
S85: respectively optimizing each sub-image in the set cutSSet through a preset model to obtain the first image.
The existing Retinex algorithm generally processes the obtained reflection component image directly to obtain the illumination optimization result, but this ignores the information carried by the illumination component, so the final result is not accurate enough. The invention therefore, after obtaining the illumination component image and the reflection component image, obtains illumination distribution values by dividing the illumination component image into blocks and feeds them into the illumination optimization of the reflection component image S, further improving the accuracy of the result.
When acquiring the illumination distribution values, the invention speeds up their acquisition by dividing the illumination component image L. Likewise, when optimizing the reflection component image S, the division avoids computing the parameters of the processing formula separately for every pixel point, reducing the amount of parameter calculation and accelerating the computation while preserving accuracy.
Preferably, S81 comprises:
S811: the pixel value L(d) of each pixel point d in the illumination component image L is obtained by solving an equation in L(d) defined over a local window, in which Ω_d denotes the set of pixel points within a window of preset size centered on d in the illumination component image L, |Ω_d| denotes the number of pixel points in Ω_d, L(g) denotes the pixel value of a pixel point g in Ω_d, c denotes a constant coefficient, α and β denote control parameters greater than 0, I(d) and I(g) denote the pixel values of the pixel points corresponding to d and g in the aerial photograph I, and k denotes the number of operations;
S812: the reflection component image S is acquired from the multiplicative Retinex relation I = L · S, that is

S(x, y) = I(x, y) / L(x, y)

where (x, y) denotes the coordinates of a pixel point, and I(x, y), L(x, y) and S(x, y) denote the pixel values at coordinates (x, y) in the aerial photograph I, the illumination component image L and the reflection component image S, respectively.
In the process of acquiring the illumination component image L and the reflection component image S, the conventional Retinex algorithm does not consider, for each pixel point, the influence of its surrounding pixel points on the result, so the acquired illumination and reflection component images are not accurate enough.
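A compact sketch of the decomposition follows, with a Gaussian surround standing in for the patent's window-based equation for L (S811), which is solved per pixel and is not reproduced here; the reflection component then follows the multiplicative relation of S812.

```python
import cv2
import numpy as np

def decompose(photo_bgr, sigma=30):
    """Estimate the illumination component L (stand-in: Gaussian surround)
    and recover the reflection component as S = I / L per channel (S812)."""
    img = photo_bgr.astype(np.float32) + 1.0     # avoid division by zero
    L = cv2.GaussianBlur(img, (0, 0), sigma)     # smooth illumination estimate
    S = img / L                                  # multiplicative model I = L * S
    return L, S
```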
Preferably, S82 comprises:
S821: smoothing the illumination component image L to obtain a smoothed illumination component image smL;
S822: dividing smL in the following way.
First round of division: divide smL into D sub-images with the same number of pixel points and store all sub-images obtained by the division in a set P_1; respectively calculate the judgment coefficient of each sub-image in P_1; store the sub-images of P_1 whose judgment coefficient is larger than a set judgment coefficient threshold in a set Q_1, and store the sub-images of P_1 whose judgment coefficient is smaller than or equal to the threshold in the set cutLSet.
n-th round of division, n ≥ 2: divide each sub-image of the set Q_{n-1} obtained in round n-1 into D sub-images with the same number of pixel points and store all sub-images obtained by the division in a set P_n; respectively calculate the judgment coefficient of each sub-image in P_n; store the sub-images of P_n whose judgment coefficient is larger than the threshold in a set Q_n, and store the sub-images of P_n whose judgment coefficient is smaller than or equal to the threshold in the set cutLSet.
Termination: if the number of elements in Q_n is smaller than a set number threshold, the division of smL is finished, and the sub-images contained in the current set cutLSet are taken as the division result.
In the embodiment of the invention, when the illumination component image L is divided into sub-images, smoothing is performed first and the division then operates on the smoothed result, which prevents pixel points with abrupt value changes from degrading the efficiency of the division. The invention needs the overall illumination distribution value of each sub-image, so a single abrupt pixel point has very little influence on the sub-image as a whole, yet a very large influence on the division efficiency: pixel points with abrupt values greatly increase the number of division rounds. The division keeps the differences between pixel points within a sub-image as small as possible and the differences between different sub-images as large as possible, making the illumination distribution values more representative.
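The round-by-round division of S822 can be sketched as follows, with D = 4 (a 2x2 split), rectangular blocks, and an illustrative judgment coefficient combining pixel-value variance and gradient-magnitude variance; the thresholds and the minimum block size are assumptions.

```python
import numpy as np

def judgment_coeff(block):
    """Illustrative judgment coefficient: pixel-value variance plus
    gradient-magnitude variance of the block."""
    gy, gx = np.gradient(block.astype(np.float32))
    return float(block.var() + np.hypot(gx, gy).var())

def split4(r, c, h, w):
    """Split a block into a 2x2 grid of sub-blocks (D = 4)."""
    hh, ww = h // 2, w // 2
    return [(r, c, hh, ww), (r, c + ww, hh, w - ww),
            (r + hh, c, h - hh, ww), (r + hh, c + ww, h - hh, w - ww)]

def divide(sml, threshold, min_count=4, min_size=4):
    """Blocks with judgment coefficient <= threshold go to cutLSet; the rest
    are split again until the set of remaining blocks is small enough."""
    cut_l_set = []
    pending = [(0, 0, sml.shape[0], sml.shape[1])]
    while pending:
        nxt = []
        for (r, c, h, w) in pending:
            if h < 2 * min_size or w < 2 * min_size:
                cut_l_set.append((r, c, h, w))   # too small to split again
                continue
            for (rr, cc, hh, ww) in split4(r, c, h, w):
                if judgment_coeff(sml[rr:rr + hh, cc:cc + ww]) > threshold:
                    nxt.append((rr, cc, hh, ww))
                else:
                    cut_l_set.append((rr, cc, hh, ww))
        if len(nxt) < min_count:                 # termination test on |Q_n|
            cut_l_set.extend(nxt)
            break
        pending = nxt
    return cut_l_set
```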
Preferably, S821 comprises smoothing the illumination component image L with a weighted-average formula of the form

smL(h) = Σ_{m ∈ N_h} w(h, m) · L(m) / Σ_{m ∈ N_h} w(h, m)

in which the weight w(h, m) decreases with both the distance and the pixel-value difference between the two points, where smL(h) denotes the pixel value of pixel point h in smL; N_h denotes the set of pixel points in a neighborhood of preset size around the pixel point corresponding to h in L; d(h, m) denotes the length of the line connecting the pixel point corresponding to h in L with the pixel point m; L(h) and L(m) denote the pixel values, in L, of the pixel point corresponding to h and of the pixel point m; σ_d denotes the variance of the distances between the pixel points in N_h and the pixel point corresponding to h in L; and σ_r denotes the variance of the differences between the pixel value of the pixel point corresponding to h in L and the pixel values of the pixel points in N_h.
While smoothing each pixel point, the embodiment of the invention also considers its relation to the surrounding pixel points in terms of pixel value and distance, so that pixel-value transitions in the smoothed image are more natural and the corresponding detail information is retained. Processing directly with a Gaussian filter or the like easily loses detail information and affects the accuracy of the sub-image division result.
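A direct (unvectorized) sketch of this smoothing, in which the two variances are taken from the neighborhood itself as the text describes; the Gaussian-style form of the weights is an assumption, since the original formula is given only as an image.

```python
import numpy as np

def smooth(L, radius=2):
    """Edge-preserving smoothing of the illumination component image L:
    each weight combines spatial distance and pixel-value difference."""
    Lf = L.astype(np.float64)
    H, W = Lf.shape
    out = np.empty_like(Lf)
    pad = np.pad(Lf, radius, mode="edge")
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    dist = np.hypot(yy, xx)                  # distances to the centre pixel
    sd = dist.var() + 1e-12                  # variance of distances in N_h
    for i in range(H):
        for j in range(W):
            win = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            diff = win - Lf[i, j]            # value differences to the centre
            sr = diff.var() + 1e-12          # variance of value differences
            w = np.exp(-dist**2 / (2 * sd)) * np.exp(-diff**2 / (2 * sr))
            out[i, j] = (w * win).sum() / w.sum()
    return out
```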
Preferably, the judgment coefficient of a sub-image is calculated by a formula that combines the variance of its pixel values with the variance of its gradient magnitudes, in which J denotes the judgment coefficient, U denotes the set of pixel points in the sub-image, smL(u) denotes the pixel value of a pixel point u in smL, grad(u) denotes the gradient magnitude of the pixel point u in smL, |U| denotes the total number of pixel points contained in U, ε_1 denotes a variance reference value for the pixel values, ε_2 denotes a variance reference value for the gradient magnitudes, and γ denotes a preset scaling factor.
In the embodiment of the invention, the judgment coefficient considers not only the pixel values but also the gradient magnitudes; considering both makes the differences between pixel points in the resulting sub-images smaller, which improves the accuracy of representing a whole sub-image by a single illumination distribution value.
Preferably, S83 includes:
s831: converting the sub-image to an HSV color space;
s832: acquiring an image V of a brightness component corresponding to the sub-image in an HSV color space;
s833: respectively counting the occurrence frequency of each pixel value in the image V;
s834: and taking the pixel value with the highest occurrence frequency as the illumination distribution value of the sub-image.
Since a single value is used to represent the whole sub-image, the invention takes the most frequent pixel value as its illumination distribution value.
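This step is concrete enough to sketch almost literally (S831-S834): convert to HSV, take the V (brightness) channel, and return the modal pixel value. Only the 8-bit input assumption is added.

```python
import cv2
import numpy as np

def illumination_distribution_value(sub_bgr):
    """Most frequent brightness value of the sub-image's V channel."""
    hsv = cv2.cvtColor(sub_bgr, cv2.COLOR_BGR2HSV)
    v = hsv[:, :, 2]
    hist = np.bincount(v.ravel(), minlength=256)
    return int(hist.argmax())
```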
Preferably, S84 comprises:
s841: acquiring a characteristic image DS based on the reflection component image S;
s842: dividing the characteristic image DS into a plurality of sub-images;
s843: the division result of the division processing on the feature image DS is applied to the reflection component image S, and a set cutSSet of sub-images is obtained.
Specifically, the embodiment of the invention does not divide the reflection component image S directly; it obtains a feature image from S and performs the division on that feature image. This arrangement improves the accuracy of the division result while keeping the division fast. In the feature image DS the pixel values are computed comprehensively from several aspects, so a pixel value of DS expresses richer information than the pixel values of the original reflection component image S.
When the feature image is divided, the same procedure as for the smoothed illumination component image may be used, or an existing image division method may be adopted.
In step S843, for example, let DSQ be the set of pixel points of a sub-image Q obtained from the feature image DS; the set SDSQ of the pixel points corresponding to DSQ in S is acquired, and the pixel points in SDSQ form a sub-image of S.
Preferably, S841 comprises obtaining, for the reflection component image S, the pixel value DS(T) of each pixel point T in the feature image DS by a weighted fusion of the form

DS(T) = ω_1 · H(T) + ω_2 · V(T) + ω_3 · Lb(T)

where ω_1, ω_2 and ω_3 denote preset weight coefficients; H(T), V(T) and Lb(T) denote the pixel values of the pixel point T in the images H, V and Lb; H is the hue component image of the reflection component image S in the HSV color space; V is the brightness component image of the reflection component image S in the HSV color space; and Lb is the luminance component image of the reflection component image S in the Lab color space.
Specifically, the pixel values of the feature image are obtained by weighted fusion of the hue, brightness and luminance components, so that a single pixel point of the feature image expresses richer information.
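A minimal sketch of the fusion in S841 follows; the weight values are assumptions, and OpenCV's 8-bit HSV/Lab scalings are used as-is.

```python
import cv2
import numpy as np

def feature_image(S_bgr, w=(0.3, 0.4, 0.3)):
    """DS = w1*H + w2*V + w3*Lab_L, fused per pixel."""
    hsv = cv2.cvtColor(S_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    lab = cv2.cvtColor(S_bgr, cv2.COLOR_BGR2Lab).astype(np.float32)
    h, v, l = hsv[:, :, 0], hsv[:, :, 2], lab[:, :, 0]
    return w[0] * h + w[1] * v + w[2] * l
```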
Preferably, S85 comprises optimizing each sub-image in cutSSet through the preset model, which computes the pixel value opt_b(x, y) of the optimized sub-image b at coordinates (x, y) from the pixel value S(x, y) at coordinates (x, y) in the reflection component image S, the illumination distribution value A_b associated with the sub-image b in which the pixel point lies, and the permeability coefficient t_b of the sub-image b.
The permeability coefficient t_b is computed from the sub-image b by a second formula in which w_1 and w_2 denote weight parameters, together with the average values of the red, green and blue components of the pixel points of b in the RGB color space, the variance of the dark channel values of the pixel points of b, and a preset control coefficient κ.
The value of A_b is obtained as follows: obtain the set P_b of pixel points in the illumination component image L corresponding to the pixel points of the sub-image b; obtain the set of sub-images in cutLSet that contain pixel points of P_b; and take the average of the illumination distribution values of the sub-images in that set as the value of A_b.
During the optimization, the pixel points of one sub-image are all processed with the same parameters; therefore, for the pixel points of a sub-image, all points other than the first one processed reuse the parameters obtained for that first point, which effectively improves the efficiency of the optimization. Specifically, when the pixel points of the same sub-image are optimized, A_b and t_b only need to be calculated for the first pixel point processed; no calculation is needed for the other pixel points.
Modeling the dynamic flight path planning problem means studying a grid-based three-dimensional space division method and combining environmental terrain information to establish a three-dimensional terrain flight environment model; analyzing the performance constraints of the unmanned aerial vehicle itself while also considering external constraints such as terrain threats (obstacles), atmospheric threats (gusts and thick fog), sudden threats (birds) and no-fly zones (high-voltage towers) to establish a mathematical model of the external environment constraint conditions; and constructing a flight path evaluation function from the shortest flight path length, the smallest flight path threat and the lowest flight height, thereby realizing the modeling of the dynamic flight path planning problem.
The precision, accuracy and optimization speed of the algorithm are what the dynamic flight path planning problem demands, so designing an algorithm that can efficiently solve dynamic multi-constraint conditions is the key point of the research. The invention designs the algorithm by combining reinforcement learning with the differential evolution algorithm: the design studies the action space, state space and reward function of the reinforcement learning algorithm and establishes the relation between the reinforcement learning decision controller and the mutation strategies and control parameters of the differential evolution algorithm, so that when solving different dynamic optimization problems the algorithm can adaptively select parameters and mutation strategies in real time.
Dynamic flight path planning with the reinforcement learning differential evolution algorithm studies a processing strategy for dynamic multi-constraint conditions, constructs a suitable flight path encoding, establishes the performance of the reinforcement-learning-based differential evolution algorithm on the dynamic flight path planning problem, and designs a smoothing algorithm for the discrete flight path points, realizing dynamic planning of the plant protection unmanned aerial vehicle's flight path under dynamic multi-constraint conditions.
Modeling the dynamic flight path planning problem requires acquiring the terrain information of the operation area, including the number of mountains, their heights, the operation area and the area outline. A flight environment model is established with the grid-based three-dimensional space division method; performance constraints of the plant protection unmanned aerial vehicle itself, such as maximum flight range, minimum flight height, maximum turning angle, maximum dive angle and minimum step length, are considered; terrain threats, atmospheric threats, sudden threats, no-fly zones and other external environment constraints present in a hilly mountain citrus planting base are analyzed, and a multi-constraint condition equation is established. A flight path evaluation function is then constructed from the shortest flight path length, the lowest flight height and the smallest flight path threat, completing the modeling of the flight path planning problem under dynamic multi-constraint conditions.
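The flight path evaluation function described here (shortest length, lowest height, smallest threat) can be sketched as a weighted sum; the weights and the threat function are illustrative placeholders.

```python
import numpy as np

def track_cost(points, threat_fn, w=(0.5, 0.3, 0.2)):
    """Weighted evaluation of a candidate flight path given as (x, y, z) rows."""
    pts = np.asarray(points, dtype=float)
    length = np.linalg.norm(np.diff(pts, axis=0), axis=1).sum()   # total track length
    height = pts[:, 2].mean()                                     # mean flight height
    threat = float(sum(threat_fn(p) for p in pts))                # accumulated threat
    return w[0] * length + w[1] * height + w[2] * threat
```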
The method aims to solve the problems that a differential evolution algorithm is difficult to select a variable strategy when solving different optimization problems, the algorithm performance is further improved and the like. The method comprises the steps of analyzing a single-target optimization problem in a series of continuous domains by using fitness terrain analysis methods such as information entropy roughness and fitness distance correlation to obtain fitness terrain features corresponding to the optimization problem, establishing a relation between the fitness terrain features and a differential evolution algorithm variation strategy by using a random forest, achieving an improved differential evolution algorithm, and adaptively selecting the variation strategy according to the fitness terrain features of the problem when different optimization problems are solved.
A fitness landscape analysis method is adopted to analyze boundary-constrained single-objective optimization problems, the relationship between the fitness landscape features and the optimization problem is studied, and the complexity of the optimization problem is judged from the fitness landscape analysis features. By analyzing the local fitness landscape of the optimization problem, a differential evolution algorithm based on the local fitness landscape is realized.
The application of the reinforcement learning differential evolution algorithm to dynamic track planning comprises the following core steps: a constraint handling method combining adaptive relaxation variables with feasibility criteria is adopted to process the dynamic constraint equations, which simplifies the constraint conditions, increases the number of feasible solutions and accelerates the solving speed of the algorithm; adaptive weight factors are used to convert the shortest track length, the lowest flight height and the smallest track threat into three mutually conflicting objective functions; the flight environment model is introduced and the population of the reinforcement learning differential evolution algorithm is encoded to solve the dynamic track planning problem; a new track smoothing algorithm based on splicing fifth-order PH (Pythagorean hodograph) curves is proposed to smooth the track points; the algorithm performance is verified through numerical simulation experiments; and finally the algorithm is embedded into an independently developed plant protection unmanned aerial vehicle flight control system and verified experimentally in a real environment.
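As an illustration of combining relaxation variables with feasibility criteria, the sketch below applies the widely used feasibility rules (a feasible solution beats an infeasible one; between infeasible solutions the smaller violation wins) together with a relaxation margin eps that shrinks over the generations; the linear shrink schedule is an assumption, not the patented rule:

```python
def violation(constraints, x, eps):
    """Total constraint violation of x, relaxed by margin eps.
    Each constraint g is satisfied when g(x) <= 0."""
    return sum(max(0.0, g(x) - eps) for g in constraints)

def better(x, y, fx, fy, constraints, eps):
    """Feasibility-criteria comparison: prefer (relaxed-)feasible solutions,
    then smaller violation, then smaller objective value (minimization)."""
    vx, vy = violation(constraints, x, eps), violation(constraints, y, eps)
    if vx == 0.0 and vy == 0.0:
        return fx <= fy
    if (vx == 0.0) != (vy == 0.0):
        return vx == 0.0
    return vx <= vy

def relaxation(gen, max_gen, eps0=1.0):
    """Adaptive relaxation: start loose to admit more candidate solutions,
    then tighten to 0 by the final generation (linear schedule assumed)."""
    return eps0 * max(0.0, 1.0 - gen / max_gen)
```

Starting with a loose margin is what "increases the number of feasible solutions" early in the search; as eps reaches zero, only genuinely feasible tracks survive the selection.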
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (6)

1. An unmanned aerial vehicle dynamic flight path planning method based on a reinforcement learning difference algorithm, characterized by comprising the following steps: S1: acquiring the terrain environment in which the unmanned aerial vehicle needs to fly;
S2: establishing a flight path planning model according to the acquired environmental data and the performance constraints of the unmanned aerial vehicle, representing the environment as an artificial potential field in which an attractive potential field is established with the target point as its center and repulsive potential fields are established with obstacles and threats as their centers;
S3: when the flight path planning model is established, adding a function structure body for correcting positioning errors, calculating the current resultant force acting on the unmanned aerial vehicle from the artificial potential field, and making the unmanned aerial vehicle advance under the action of that resultant force;
S4: designing a reinforcement learning difference algorithm based on the flight path planning model;
S5: optimizing the reinforcement learning difference algorithm, embedding the optimized algorithm into the unmanned aerial vehicle intelligent system, and solving the algorithm optimized on the basis of the reinforcement learning difference algorithm to complete the flight path planning of the unmanned aerial vehicle;
the design of the S4 mesoscale chemical learning differential evolution algorithm comprises the following steps:
s31: combining reinforcement learning and a differential evolution algorithm, and adopting a Q learning algorithm or a deep Q learning algorithm as an intelligent agent to carry out intelligent decision;
s32: analyzing the optimization problem by using the dispersion measurement, the autocorrelation roughness, the terrain information roughness and the adaptability cloud, and using the adaptability terrain feature information of the optimization problem as the state space of the reinforcement learning intelligent agent;
s33: selecting a control parameter and a variation strategy of a differential evolution algorithm as an action space of the intelligent agent, and designing population evolution efficiency as reward of the intelligent agent;
s34: and finally, the intelligent agent obtains the local information of the optimization problem through the state space, executes the corresponding operation of the action space according to the state space information, calculates the reward obtained after the corresponding action operation is executed, and returns the reward to the intelligent agent.
2. The unmanned aerial vehicle dynamic track planning method based on the reinforcement learning difference algorithm according to claim 1, wherein adding the function structure body for positioning error correction in S3 comprises the following steps:
S21: setting an unmanned aerial vehicle track planning area consisting of 1 departure point, 1 destination, R horizontal correction points and L vertical correction points;
S22: constructing the unmanned aerial vehicle track planning area containing the resulting 2 + R + L points; the unmanned aerial vehicle needs to be positioned in real time while flying through this space, and the positioning error comprises a vertical error and a horizontal error, each of which increases by δ units for every 1 m the unmanned aerial vehicle flies; both errors must be smaller than θ units when the unmanned aerial vehicle reaches the target point so that the unmanned aerial vehicle can fly according to the planned track;
S23: the unmanned aerial vehicle needs to correct the positioning error during flight; correction points for error correction exist in the track planning area, and when the unmanned aerial vehicle reaches a correction point it can carry out error correction according to the error correction type of that point; the positions for correcting vertical and horizontal errors are determined before track planning according to the terrain; provided the vertical and horizontal errors are corrected in time, the unmanned aerial vehicle can fly along the preset route, correcting its errors at several correction points before finally reaching the destination.
3. The unmanned aerial vehicle dynamic track planning method based on the reinforcement learning difference algorithm as claimed in claim 1, wherein the resultant force calculation in S3 determines the motion direction of the unmanned aerial vehicle according to the following formulas:

$$F_{att}(X) = k\,(X_g - X)$$

$$F = F_{att}(X) + F_{rep}(X)$$

wherein $F_{att}(X)$ denotes the attraction of the target to the unmanned aerial vehicle, $X_g$ is the coordinate vector of the target, and $X$ is the coordinate vector of the current position of the unmanned aerial vehicle; $k$ is a coefficient with a value between 0 and 1; $F_{rep}(X)$ denotes the repulsion of the no-fly zone on the unmanned aerial vehicle and is calculated with the existing repulsive field function; and the resultant force $F$ of the attraction and the repulsion gives the direction of motion of the unmanned aerial vehicle.
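For a numeric illustration of the claimed resultant-force rule, the sketch below assumes the textbook repulsive field (inverse-distance form with influence radius rho0) as the "existing repulsive field function" the claim refers to:

```python
import math

def attraction(x, x_goal, k=0.5):
    """F_att = k * (X_g - X), with k between 0 and 1."""
    return [k * (g - xi) for g, xi in zip(x_goal, x)]

def repulsion(x, obstacle, eta=100.0, rho0=50.0):
    """Textbook repulsive force (an assumption standing in for the claim's
    'existing' function): nonzero only within influence distance rho0."""
    diff = [xi - oi for xi, oi in zip(x, obstacle)]
    rho = math.hypot(*diff)
    if rho >= rho0 or rho == 0.0:
        return [0.0] * len(x)
    mag = eta * (1.0 / rho - 1.0 / rho0) / rho ** 2
    return [mag * d / rho for d in diff]

def resultant(x, x_goal, obstacles):
    """F = F_att + sum of F_rep; its direction is the UAV's motion direction."""
    f = attraction(x, x_goal)
    for ob in obstacles:
        f = [a + b for a, b in zip(f, repulsion(x, ob))]
    return f

print(resultant((0, 0, 60), (300, 150, 60), [(40, 20, 60)]))
```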
4. The unmanned aerial vehicle dynamic track planning method based on the reinforcement learning difference algorithm as claimed in claim 1, wherein in S5, solving the algorithm optimized on the basis of the reinforcement learning difference algorithm completes both the track planning of the unmanned aerial vehicle and the constraint-condition obstacle avoidance along the track.
5. The unmanned aerial vehicle dynamic track planning method based on the reinforcement learning difference algorithm of claim 4, wherein the constraint condition obstacle avoidance comprises the following steps:
s61: inputting the initial position of the unmanned aerial vehicle as the current position
Figure 328490DEST_PATH_IMAGE007
The central positions of the m no-fly zones,
Figure 366853DEST_PATH_IMAGE008
and a target position G assigned by the drone;
s62: taking two variables G1 and G2, respectively representing a target position in the calculation process and a final target position, and initializing G1= G2= G; opening up two storage spaces A and B and using the current position of the unmanned aerial vehicle
Figure 173135DEST_PATH_IMAGE009
Storing in A; initializationThe iteration number num =0;
s63: determining the motion direction of the unmanned aerial vehicle, setting the motion step length of the unmanned aerial vehicle to be L, and enabling the unmanned aerial vehicle to move from the current position
Figure 347764DEST_PATH_IMAGE011
Moving according to the movement step length L in the determined movement direction, and updating the current position with the moved position
Figure 214089DEST_PATH_IMAGE013
And storing the position of the unmanned aerial vehicle in A, wherein the iteration number num = num +1;
s64: judging whether num > N is true, if yes, setting num =0 and performing step S65, otherwise, returning to step S63; wherein N is a preset total number of iterations;
s65: judging the current position
Figure 474169DEST_PATH_IMAGE014
Whether the distance d from G1 satisfies d<
Figure 349721DEST_PATH_IMAGE015
In which
Figure 113278DEST_PATH_IMAGE016
Is a preset distance threshold;
s66: judging whether the last M position points stored in A are all in a preset circular area, if so, indicating that the position points are in a balance position or a local minimum point currently, and performing jump-out processing; if not, continuing to step S63;
s67: solving a straight line expression between two points stored in the last step A;
s68: judging whether the straight line intersects with each circular no-fly zone, if not, returning to the step S63, otherwise, assigning the last stored position of A to G1, emptying A, and then performing the step S63;
s69: storing all the positions in A into B, judging whether G1 is equal to G2, if not, making order
Figure 681662DEST_PATH_IMAGE018
= G1, G1= G2, and then proceeds to step S63;
s610: and the position points stored in the B are the obstacle avoidance tracks of the unmanned aerial vehicle.
6. The unmanned aerial vehicle dynamic track planning method based on the reinforcement learning difference algorithm as claimed in claim 1, wherein establishing the track planning model in S2 further comprises the following steps:
S71: acquiring image data of the target area, including surface terrain data and crop planting data;
S72: obtaining an initial route of the unmanned aerial vehicle based on the image data of the target area;
S73: extracting first actual geographic coordinates of the initial route based on the inflection point positions on the initial route, and adjusting the first actual geographic coordinates based on the elevation values of the surface terrain data to obtain first elevation coordinates;
S74: adjusting the initial route based on the first elevation coordinates to obtain a terrain route;
S75: dividing the initial route into sections at a preset distance, and extracting second actual geographic coordinates of the endpoints of each section point by point;
S76: adjusting the second actual geographic coordinates based on the crop planting data to obtain second elevation coordinates, and adjusting the initial route based on the second elevation coordinates to obtain a crop planting route;
S77: establishing the flight path planning model based on the terrain route and the crop planting route.