CN115290096B - Unmanned aerial vehicle dynamic flight path planning method based on reinforcement learning differential algorithm

Unmanned aerial vehicle dynamic flight path planning method based on reinforcement learning differential algorithm

Info

Publication number
CN115290096B
CN115290096B (application CN202211195962.8A)
Authority
CN
China
Prior art keywords
aerial vehicle
unmanned aerial
algorithm
reinforcement learning
track planning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211195962.8A
Other languages
Chinese (zh)
Other versions
CN115290096A (en)
Inventor
谭志平
唐宇
黄明浩
黄文轩
邢诗曼
黄华盛
郭琪伟
方明伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Polytechnic Normal University
Original Assignee
Guangdong Polytechnic Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Polytechnic Normal University
Priority to CN202211195962.8A
Publication of CN115290096A
Application granted
Publication of CN115290096B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01C - MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20 - Instruments for performing navigational calculations
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 - Computer-aided design [CAD]
    • G06F30/20 - Design optimisation, verification or simulation
    • G06F30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00 - Details relating to CAD techniques
    • G06F2111/04 - Constraint-based CAD
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00 - Details relating to CAD techniques
    • G06F2111/10 - Numerical modelling
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Automation & Control Theory (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention relates to the technical field of unmanned aerial vehicle dynamic flight path planning, and discloses an unmanned aerial vehicle dynamic flight path planning method based on a reinforcement learning differential algorithm, which comprises the following steps. S1: acquire the terrain environment in which the unmanned aerial vehicle needs to fly. S2: establish a flight path planning model according to the acquired environmental data and the performance constraints of the unmanned aerial vehicle; represent the environment as an artificial potential field, with an attractive potential field centered on the target point and repulsive potential fields centered on obstacles and threats. S3: when establishing the flight path planning model, add a function structure for correcting positioning errors; calculate the current resultant force on the unmanned aerial vehicle from the artificial potential field, and make the unmanned aerial vehicle advance under the action of that resultant force. S4: design a reinforcement learning differential algorithm based on the flight path planning model. S5: optimize the reinforcement learning differential algorithm, implant the optimized algorithm into the intelligent system of the unmanned aerial vehicle, and solve with it to complete the flight path planning of the unmanned aerial vehicle.

Description

Unmanned aerial vehicle dynamic flight path planning method based on reinforcement learning differential algorithm
Technical Field
The invention relates to the technical field of unmanned aerial vehicle dynamic flight path planning, and in particular to an unmanned aerial vehicle dynamic flight path planning method based on a reinforcement learning differential algorithm.
Background
Citrus in the hilly south is planted mainly in hill and mountain areas, which are characterized by high planting density, small plot scale, high dispersion, variable topographic relief and many steep slopes and sharp bends. Traditional manual plant protection is therefore very difficult, and autonomous operation with a plant protection unmanned aerial vehicle offers clear advantages.
However, the complex terrain makes the hilly climate unstable, with frequent environmental disturbances such as gusts, heavy fog and rainstorms; flight under manual remote control, or autonomous flight along a fixed route, can hardly meet the flight path planning requirements of a plant protection unmanned aerial vehicle in the complex environment of hilly and mountainous areas. Studying a dynamic flight path planning algorithm suited to the planting characteristics of hilly and mountainous areas, and realizing dynamic planning and autonomous operation of the plant protection unmanned aerial vehicle's flight path in complex environments, is therefore a key link in improving the efficiency of unmanned aerial vehicle plant protection for citrus in the hilly south.
As the core of a flight path planning system, searching for an optimal flight path with a planning algorithm has long been a popular research subject. The flight path planning problem of a plant protection unmanned aerial vehicle in the complex environment of hilly and mountainous areas is a high-dimensional, multi-constraint, strongly coupled dynamic optimization problem, and an NP-hard one. In solving a dynamic multi-constraint optimization problem, the most difficult task is to maintain the diversity of solutions, which requires the algorithm to have very fast convergence and high calculation accuracy. Traditional evolutionary algorithms are better suited to static flight path planning; they handle dynamic multi-constraint planning under complex conditions inefficiently, generally suffering from slow convergence and a tendency to fall into local optima, so their performance is unsatisfactory. Real-time operation further requires very high planning speed and calculation precision.
To date, few scholars have studied dynamic flight path planning. Hidalgo et al. combined an RRT algorithm with a GPU to achieve autonomous real-time planning of unmanned aerial vehicle flight paths in several simulated scene environments; the algorithm's efficiency in various scenes was verified by numerical simulation, but because it relies on GPU computation its hardware requirements are very high. Cai et al. used an optimization algorithm based on cognitive behavior to plan unmanned aerial vehicle flight paths in real time in a 3-dimensional environment; the method first designs the route with a three-level function model, grading the flight path objective function into high, medium and low levels, and then optimizes with a cognitive behavior optimization algorithm. Experiments showed it to be superior to the particle swarm algorithm and the RRT algorithm, but the route is difficult to grade in a real flight environment. Wan et al. used the DeepLabV3+ deep learning model to segment fruit tree canopy images and extracted the route from the canopy barycenters of the segmented binary images with 95% accuracy, but the method only applies to crops with canopies and thus has certain limitations.
In summary, algorithms for dynamic flight path planning remain few, and conventional planning algorithms and intelligent optimization algorithms generally converge slowly and fall into local optima when solving complex dynamic planning problems. It is therefore necessary to design an algorithm that can efficiently handle the dynamic multi-constraint flight path planning problem, and for this purpose an unmanned aerial vehicle dynamic flight path planning method based on a reinforcement learning differential algorithm is provided.
Disclosure of Invention
The invention aims to disclose an unmanned aerial vehicle dynamic flight path planning method based on a reinforcement learning differential algorithm, solving the problem of how to efficiently handle dynamic multi-constraint flight path planning.
In order to achieve the purpose, the invention adopts the following technical scheme:
An unmanned aerial vehicle dynamic flight path planning method based on a reinforcement learning differential algorithm comprises the following steps:
S1: acquiring the terrain environment in which the unmanned aerial vehicle needs to fly;
S2: establishing a flight path planning model according to the acquired environmental data and the performance constraints of the unmanned aerial vehicle, representing the environment as an artificial potential field, establishing an attractive potential field centered on the target point, and establishing repulsive potential fields centered on obstacles and threats;
S3: when the flight path planning model is established, adding a function structure for correcting positioning errors, calculating the current resultant force on the unmanned aerial vehicle from the artificial potential field, and making the unmanned aerial vehicle advance under the action of the resultant force;
S4: designing a reinforcement learning differential algorithm based on the flight path planning model;
S5: optimizing the reinforcement learning differential algorithm, implanting the optimized reinforcement learning differential algorithm into the unmanned aerial vehicle's intelligent system, and solving with the algorithm optimized on the basis of the reinforcement learning differential algorithm to complete the flight path planning of the unmanned aerial vehicle.
Preferably, adding the function structure for positioning error correction in S3 comprises the following steps:
S21: setting an unmanned aerial vehicle flight path planning area consisting of 1 departure point, 1 destination, R horizontal correction points and L vertical correction points;
S22: constructing an unmanned aerial vehicle flight path planning area containing 2 + R + L points; the unmanned aerial vehicle needs real-time positioning while flying through this space, and the positioning error comprises a vertical error and a horizontal error, each of which increases by δ units for every 1 m flown; both errors must be smaller than θ units when the unmanned aerial vehicle reaches the target point, so that it can fly according to the planned flight path;
S23: the unmanned aerial vehicle needs to correct its positioning error during flight; correction points that can be used for error correction exist in the flight path planning area. When the unmanned aerial vehicle reaches a correction point, the error is corrected according to that point's error correction type, and the positions at which vertical and horizontal errors can be corrected are determined from the terrain before flight path planning. Provided both errors are corrected in time, the unmanned aerial vehicle can fly along the preset route and, after error correction at several correction points, finally reach the destination.
Preferably, the design of the reinforcement learning differential evolution algorithm in S4 comprises the following steps:
S31: combining reinforcement learning with a differential evolution algorithm, and adopting a Q-learning algorithm or a deep Q-learning algorithm as the agent for intelligent decision-making;
S32: analyzing the optimization problem with the dispersion metric, autocorrelation ruggedness, landscape information ruggedness and fitness cloud, and taking the fitness landscape feature information of the optimization problem as the state space of the reinforcement learning agent;
S33: selecting the control parameters and mutation strategies of the differential evolution algorithm as the action space of the agent, and designing the population evolution efficiency as the agent's reward;
S34: finally, the agent obtains local information about the optimization problem through the state space, executes the corresponding action-space operation according to the state information, calculates the reward obtained after the action is executed, and returns the reward to the agent.
Preferably, the calculation of the resultant force in S2 determines the direction of motion of the drone according to the following formula:

F_att = k (X_g - X)

where F_att denotes the attraction of the target to the drone, X_g is the coordinate vector of the target, X is the coordinate vector of the drone's current position, and k is a coefficient with a value between 0 and 1. F_rep denotes the repulsion of the no-fly zones on the drone; in this scheme an existing repulsive field function is adopted to compute F_rep. The resultant force F = F_att + F_rep of the attraction and the repulsion gives the drone's direction of motion.
Preferably, in step S5 the solution is performed with the algorithm optimized on the basis of the reinforcement learning differential algorithm, completing the flight path planning of the unmanned aerial vehicle and the obstacle avoidance under the constraint conditions on the flight path.
Preferably, the constraint-condition obstacle avoidance comprises the following steps:
S61: inputting the initial position of the unmanned aerial vehicle as the current position X_c, the center positions O_1, ..., O_m of the m no-fly zones, and the target position G assigned to the drone;
S62: taking two variables G1 and G2 to represent, respectively, the target position during the calculation and the final target position, and initializing G1 = G2 = G; opening up two storage spaces A and B, and storing the drone's current position X_c in A; initializing the iteration count num = 0;
S63: determining the direction of motion of the drone, setting its motion step length to L, moving the drone from the current position X_c by the step length L in the determined direction, updating the current position X_c with the position after the move, storing the drone's position in A, and setting the iteration count num = num + 1;
S64: judging whether num > N holds; if so, setting num = 0 and performing step S65, otherwise returning to step S63, where N is a preset total number of iterations;
S65: judging whether the distance d between the current position X_c and G1 satisfies d < d_0, where d_0 is a preset distance threshold;
S66: judging whether the last M position points stored in A all lie within a preset circular area; if so, the drone is currently at an equilibrium position or a local minimum point, and jump-out processing is performed; if not, continuing with step S63;
S67: solving the straight-line expression between the last two points stored in A;
S68: judging whether the straight line intersects any circular no-fly zone; if not, returning to step S63; otherwise assigning the last stored position of A to G1, emptying A, and then performing step S63;
S69: storing all the positions in A into B and judging whether G1 equals G2; if not, setting X_c = G1 and G1 = G2, and then proceeding to step S63;
S610: the position points stored in B form the obstacle avoidance track of the unmanned aerial vehicle.
Preferably, the establishment of the flight path planning model in S2 further comprises the following steps:
S71: acquiring image data of the target area, including surface topography data and crop planting data;
S72: obtaining an initial route of the unmanned aerial vehicle based on the image data of the target area;
S73: extracting first actual geographic coordinates of the initial route based on the inflection-point positions on the initial route, and adjusting the first actual geographic coordinates based on the elevation values of the surface topography data to obtain first elevation coordinates;
S74: adjusting the initial route based on the first elevation coordinates to obtain a terrain route;
S75: dividing the initial route into segments of a preset distance, and extracting, point by point, second actual geographic coordinates of the endpoint of each segment;
S76: adjusting the second actual geographic coordinates based on the crop planting data to obtain second elevation coordinates, and adjusting the initial route based on the second elevation coordinates to obtain a crop planting route;
S77: establishing the flight path planning model based on the terrain route and the crop planting route.
Compared with the prior art, the unmanned aerial vehicle dynamic flight path planning method based on the reinforcement learning differential algorithm has the following beneficial effects:
1. Aiming at the insufficient diversity of solutions under complex dynamic multi-constraint conditions, the method provides a constraint handling approach combining an adaptive relaxation variable method with feasibility criteria; this avoids the reduction in solution diversity, shortens the time the algorithm spends searching for the optimal solution, lowers the difficulty of optimization and improves the efficiency of the algorithm.
2. The method obtains landscape information about the optimization problem with fitness landscape analysis methods such as the dispersion metric, autocorrelation ruggedness, landscape information ruggedness and fitness cloud, and uses it as the state space of the agent. By combining deep reinforcement learning with differential evolution, the differential evolution algorithm can adaptively select the optimal mutation strategy through the agent's decisions while solving bound-constrained continuous-domain optimization problems, find the optimal solution quickly and efficiently in real time, and so realize dynamic planning of the flight path.
Drawings
The invention is further illustrated by means of the accompanying drawing; the embodiment in the drawing does not constitute any limitation of the invention, and a person skilled in the art may derive further drawings from the following figure without inventive effort.
Fig. 1 is a schematic flow diagram of an unmanned aerial vehicle dynamic flight path planning method based on a reinforcement learning differential algorithm according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
As shown in fig. 1, the present invention provides a method for dynamic flight path planning of an unmanned aerial vehicle based on a reinforcement learning differential algorithm, comprising:
s1: acquiring a terrain environment in which the unmanned aerial vehicle needs to fly;
S2: establishing a flight path planning model according to the acquired environment data and the performance constraints of the unmanned aerial vehicle, representing the environment as an artificial potential field, establishing an attractive potential field centered on the target point, and establishing repulsive potential fields centered on obstacles and threats;
S3: when the flight path planning model is established, adding a function structure for correcting positioning errors, calculating the current resultant force on the unmanned aerial vehicle from the artificial potential field, and making the unmanned aerial vehicle advance under the action of the resultant force;
S4: designing a reinforcement learning differential algorithm based on the flight path planning model. The differential calculation is an operation performed on differences. Reinforcement learning, also called evaluative learning, is one of the paradigms and methodologies of machine learning; it describes and solves the problem of an agent maximizing its return, or achieving a specific goal, through a learning strategy while interacting with its environment.
A common model for reinforcement learning is the standard Markov decision process. Under given conditions, reinforcement learning can be divided into model-based and model-free reinforcement learning, and into active and passive reinforcement learning. Variants of reinforcement learning include inverse reinforcement learning, hierarchical reinforcement learning and reinforcement learning for partially observable systems. Algorithms for solving the reinforcement learning problem fall into two classes: policy search algorithms and value function algorithms. Deep learning models can be used within reinforcement learning, forming deep reinforcement learning.
Reinforcement learning does not require any data to be given in advance; instead it obtains learning information and updates its model parameters by receiving the environment's feedback on its actions.
The reinforcement learning problem is also discussed in information theory, game theory, automatic control and other fields, where it is used to explain equilibrium states under bounded rationality, design recommendation systems and build robot interaction systems.
S5: optimizing the reinforcement learning differential algorithm, implanting the optimized reinforcement learning differential algorithm into the intelligent system of the unmanned aerial vehicle, and solving with the algorithm optimized on the basis of the reinforcement learning differential algorithm to complete the flight path planning of the unmanned aerial vehicle.
Preferably, adding the function structure for positioning error correction in S3 comprises the following steps:
S21: setting an unmanned aerial vehicle flight path planning area consisting of 1 departure point, 1 destination, R horizontal correction points and L vertical correction points;
S22: constructing an unmanned aerial vehicle flight path planning area containing 2 + R + L points; the unmanned aerial vehicle needs real-time positioning while flying through this space, and the positioning error comprises a vertical error and a horizontal error, each of which increases by δ units for every 1 m flown; both errors must be smaller than θ units when the unmanned aerial vehicle reaches the target point, so that it can fly according to the planned flight path;
S23: the unmanned aerial vehicle needs to correct its positioning error during flight; correction points that can be used for error correction exist in the flight path planning area. When the unmanned aerial vehicle reaches a correction point, the error is corrected according to that point's error correction type, and the positions at which vertical and horizontal errors can be corrected are determined from the terrain before flight path planning. Provided both errors are corrected in time, the unmanned aerial vehicle can fly along the preset route and, after error correction at several correction points, finally reach the destination, as illustrated by the sketch below.
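The error model of S21-S23 can be made concrete with a short sketch. The following Python snippet is a minimal illustration of the accumulation-and-correction rule only; the data types, function names and numeric values are assumptions, not the patent's implementation.

```python
from dataclasses import dataclass

@dataclass
class CorrectionPoint:
    x: float
    y: float
    z: float
    kind: str  # "horizontal" or "vertical" error correction type

def fly_leg(v_err, h_err, dist_m, delta):
    """S22: both errors grow by delta units for every metre flown."""
    return v_err + delta * dist_m, h_err + delta * dist_m

def correct(v_err, h_err, point):
    """S23: at a correction point, the error of the matching type is reset."""
    return (0.0, h_err) if point.kind == "vertical" else (v_err, 0.0)

def track_feasible(leg_lengths, points, delta, theta):
    """Check that both errors are below theta on arrival at the target.
    leg_lengths has one more entry than points (the final leg ends at the target)."""
    v = h = 0.0
    for dist, pt in zip(leg_lengths, points + [None]):
        v, h = fly_leg(v, h, dist, delta)
        if pt is not None:
            v, h = correct(v, h, pt)
    return v < theta and h < theta

# example: two correction points, delta = 0.001 units/m, theta = 0.3 units
pts = [CorrectionPoint(100, 0, 50, "vertical"), CorrectionPoint(200, 0, 50, "horizontal")]
print(track_feasible([120.0, 110.0, 90.0], pts, delta=0.001, theta=0.3))  # True
```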
Preferably, the design of the reinforcement learning differential evolution algorithm in S4 comprises the following steps:
S31: combining reinforcement learning with a differential evolution algorithm, and adopting a Q-learning algorithm or a deep Q-learning algorithm as the agent for intelligent decision-making;
S32: analyzing the optimization problem with the dispersion metric, autocorrelation ruggedness, landscape information ruggedness and fitness cloud, and taking the fitness landscape feature information of the optimization problem as the state space of the reinforcement learning agent;
S33: selecting the control parameters and mutation strategies of the differential evolution algorithm as the action space of the agent, and designing the population evolution efficiency as the agent's reward;
S34: finally, the agent obtains local information about the optimization problem through the state space, executes the corresponding action-space operation according to the state information, calculates the reward obtained after the action is executed, and returns it to the agent. The agent is then trained and tested continuously on the dynamic optimization problem test sets of the IEEE Congress on Evolutionary Computation (CEC) competition series, so that the reinforcement learning differential evolution algorithm can quickly and efficiently find the optimal solution in real time as the constraint conditions change, realizing dynamic planning of the flight path.
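A hedged sketch of the S31-S34 loop follows: a tabular Q-learning agent whose state is a discretized fitness-landscape feature vector, whose actions are differential evolution mutation strategies with control parameters, and whose reward is the population evolution efficiency. The strategy list, the discretization and all parameter values are illustrative assumptions.

```python
import random

# Action space (S33): candidate DE mutation strategies with (F, CR) control parameters.
ACTIONS = [("DE/rand/1", 0.5, 0.9),
           ("DE/best/1", 0.8, 0.7),
           ("DE/current-to-best/1", 0.6, 0.8)]

def landscape_state(dispersion, ruggedness, n_bins=4):
    """State space (S32): discretize fitness-landscape features (here only a
    dispersion metric and an autocorrelation ruggedness, both scaled to [0, 1))
    into a single table index."""
    d = min(int(dispersion * n_bins), n_bins - 1)
    r = min(int(ruggedness * n_bins), n_bins - 1)
    return d * n_bins + r

class QAgent:
    def __init__(self, n_states, alpha=0.1, gamma=0.9, eps=0.1):
        self.q = [[0.0] * len(ACTIONS) for _ in range(n_states)]
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, state):
        if random.random() < self.eps:       # epsilon-greedy exploration
            return random.randrange(len(ACTIONS))
        row = self.q[state]
        return row.index(max(row))

    def learn(self, s, a, reward, s_next):
        """S34: the reward is the population evolution efficiency observed
        after running one generation with the chosen strategy."""
        target = reward + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])
```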
Preferably, in S2 the direction of motion of the drone is determined according to the following equation:

F_att = k (X_g - X)

where F_att denotes the attraction of the target to the drone, X_g is the coordinate vector of the target, X is the coordinate vector of the drone's current position, and k is a coefficient with a value between 0 and 1. F_rep denotes the repulsion of the no-fly zones on the drone; in this scheme an existing repulsive field function is adopted to compute F_rep. The resultant force F = F_att + F_rep of the attraction and the repulsion gives the drone's direction of motion.
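Read this way, the resultant-force step can be sketched in a few lines. The attraction follows F_att = k(X_g - X) as above; the repulsive field is passed in as a callable, because the text only says an existing repulsive field function is used, and the stand-in below (an inverse-square falloff inside an influence radius) is an assumption.

```python
import numpy as np

def attraction(x, x_goal, k=0.5):
    """F_att = k * (X_g - X), with 0 < k <= 1."""
    return k * (np.asarray(x_goal, float) - np.asarray(x, float))

def resultant(x, x_goal, repulsion, k=0.5):
    """F = F_att + F_rep; F gives the drone's direction of motion."""
    return attraction(x, x_goal, k) + repulsion(np.asarray(x, float))

# illustrative stand-in for the existing repulsive field of one no-fly zone
def make_repulsion(centre, eta=1.0, rho0=5.0):
    centre = np.asarray(centre, float)
    def rep(x):
        diff = x - centre
        rho = np.linalg.norm(diff)
        if rho <= 0.0 or rho >= rho0:      # outside the influence radius
            return np.zeros_like(x)
        return eta * (1.0 / rho - 1.0 / rho0) / rho**2 * diff / rho
    return rep

F = resultant([0.0, 0.0], [10.0, 0.0], make_repulsion([5.0, 0.5]))
```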
Preferably, in step S5 the algorithm optimized on the basis of the reinforcement learning differential algorithm is used for solving, completing the flight path planning of the unmanned aerial vehicle and the obstacle avoidance under the constraint conditions on the flight path.
The constraint-condition obstacle avoidance method comprises the following steps:
S61: inputting the initial position of the unmanned aerial vehicle as the current position X_c, the center positions O_1, ..., O_m of the m no-fly zones, and the target position G assigned to the drone;
S62: taking two variables G1 and G2 to represent, respectively, the target position during the calculation and the final target position, and initializing G1 = G2 = G; opening up two storage spaces A and B, and storing the drone's current position X_c in A; initializing the iteration count num = 0;
S63: determining the direction of motion of the drone, setting its motion step length to L, moving the drone from the current position X_c by the step length L in the determined direction, updating the current position X_c with the position after the move, storing the drone's position in A, and setting the iteration count num = num + 1;
S64: judging whether num > N holds; if so, setting num = 0 and performing step S65, otherwise returning to step S63, where N is a preset total number of iterations;
S65: judging whether the distance d between the current position X_c and G1 satisfies d < d_0, where d_0 is a preset distance threshold;
S66: judging whether the last M position points stored in A all lie within a preset circular area; if so, the drone is currently at an equilibrium position or a local minimum point, and jump-out processing is performed; if not, continuing with step S63;
S67: solving the straight-line expression between the last two points stored in A;
S68: judging whether the straight line intersects any circular no-fly zone; if not, returning to step S63; otherwise assigning the last stored position of A to G1, emptying A, and then performing step S63;
S69: storing all the positions in A into B and judging whether G1 equals G2; if not, setting X_c = G1 and G1 = G2, and then proceeding to step S63;
S610: the position points stored in B form the obstacle avoidance track of the unmanned aerial vehicle, as summarized in the sketch below.
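The main loop of S61-S610 can be summarized in a sketch. The version below keeps the two storage spaces A and B, the step-length move along the resultant-force direction, the periodic distance check and the stalled-position (local minimum) test, but omits the S67-S68 straight-line intersection test that reassigns G1, so G1 never differs from G2 here; all thresholds are illustrative.

```python
import numpy as np

def direction(x, goal, zones, k=0.5, eta=1.0, rho0=5.0):
    """Resultant APF direction: attraction to the goal plus repulsion
    from circular no-fly zones given as (centre, radius) pairs."""
    f = k * (goal - x)
    for centre, radius in zones:
        diff = x - np.asarray(centre, float)
        rho = np.linalg.norm(diff) - radius
        if 0.0 < rho < rho0:
            f += eta * (1.0 / rho - 1.0 / rho0) / rho**2 * diff / np.linalg.norm(diff)
    return f

def plan(x0, goal, zones, step=0.2, n_check=25, m_stall=10,
         d0=0.3, stall_radius=0.05, max_iter=20000):
    g1 = g2 = np.asarray(goal, float)                  # S62
    x = np.asarray(x0, float)
    A, B, num = [x.copy()], [], 0
    for _ in range(max_iter):
        d = direction(x, g1, zones)                    # S63: move one step L
        x = x + step * d / (np.linalg.norm(d) + 1e-12)
        A.append(x.copy()); num += 1
        if num < n_check:                              # S64: check every N steps
            continue
        num = 0
        if np.linalg.norm(x - g1) < d0:                # S65: sub-goal reached
            B.extend(A)                                # S69
            if np.array_equal(g1, g2):
                return B                               # S610: finished track
            x, g1, A = g1.copy(), g2.copy(), [x.copy()]
            continue
        last = np.array(A[-m_stall:])
        if np.linalg.norm(last - last.mean(axis=0), axis=1).max() < stall_radius:
            # S66: equilibrium / local minimum -> simple jump-out perturbation
            x = x + step * np.random.uniform(-1.0, 1.0, size=x.shape)
    return B + A
```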
Preferably, in S2, the establishment of the flight path planning model further comprises the following steps:
S71: acquiring image data of the target area, including surface topography data and crop planting data;
S72: obtaining an initial route of the unmanned aerial vehicle based on the image data of the target area;
S73: extracting first actual geographic coordinates of the initial route based on the inflection-point positions on the initial route, and adjusting the first actual geographic coordinates based on the elevation values of the surface topography data to obtain first elevation coordinates;
S74: adjusting the initial route based on the first elevation coordinates to obtain a terrain route;
S75: dividing the initial route into segments of a preset distance, and extracting, point by point, second actual geographic coordinates of the endpoint of each segment;
S76: adjusting the second actual geographic coordinates based on the crop planting data to obtain second elevation coordinates, and adjusting the initial route based on the second elevation coordinates to obtain a crop planting route;
S77: establishing the flight path planning model based on the terrain route and the crop planting route, as in the sketch below.
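As a minimal sketch of S73-S75: the elevation lookup is modelled as a callable dem(x, y) and the clearance value is an assumption; the patent itself only specifies adjusting coordinates by the elevation values and segmenting the route at a preset distance.

```python
import numpy as np

def adjust_to_terrain(waypoints_xy, dem, clearance=5.0):
    """S73-S74: lift each inflection-point coordinate to the local surface
    elevation plus a safety clearance, yielding the terrain route."""
    return [(x, y, dem(x, y) + clearance) for x, y in waypoints_xy]

def densify(waypoints_xy, spacing):
    """S75: divide the initial route into segments of a preset distance and
    collect the endpoint of each segment point by point."""
    pts = [np.asarray(p, float) for p in waypoints_xy]
    out = [pts[0]]
    for a, b in zip(pts, pts[1:]):
        n = max(1, int(np.linalg.norm(b - a) // spacing))
        for i in range(1, n + 1):
            out.append(a + (b - a) * i / n)
    return out

# example with a synthetic sloped surface as the DEM
route = adjust_to_terrain([(0.0, 0.0), (100.0, 50.0)], dem=lambda x, y: 0.01 * x)
```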
Preferably, in S71 the crop planting data comprise the crop type and the planted area, and are obtained by:
S711: dividing the target area into a plurality of sub-areas;
S712: respectively acquiring aerial photographs of each sub-area;
S713: performing image recognition processing on the aerial photographs, and obtaining the crop types contained in each sub-area and the area occupied by each crop type.
Specifically, an aerial photography unmanned aerial vehicle can be controlled in manual flight mode to acquire the aerial photographs of each sub-area. Because the endurance of the unmanned aerial vehicle is limited, it is difficult to photograph the whole target area directly; the invention therefore divides the target area and then acquires the aerial photographs of each sub-area separately.
Preferably, in S713, the image recognition processing of the aerial photographs comprises:
carrying out enhancement processing on the aerial photograph to obtain an enhanced image;
and inputting the enhanced image into a pre-trained neural network model for image recognition, obtaining the crop types contained in the enhanced image and calculating the area occupied by each crop type, as sketched below.
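One hedged reading of this output step, assuming the pre-trained model is a semantic segmentation network (the patent does not fix the architecture): crop types are read off the predicted class mask, and each crop's occupied area is its pixel count times the ground area per pixel.

```python
import numpy as np

def classify_and_measure(enhanced_image, segment, m2_per_pixel):
    """`segment` is any pre-trained model (assumed) mapping an image to an
    HxW class-id mask; returns {class_id: occupied area in square metres}."""
    mask = segment(enhanced_image)
    ids, counts = np.unique(mask, return_counts=True)
    return {int(i): float(c) * m2_per_pixel for i, c in zip(ids, counts)}
```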
Preferably, enhancing the aerial photograph to obtain an enhanced image comprises:
performing illumination optimization processing on the aerial photo to obtain a first image;
carrying out noise reduction processing on the first image to obtain a second image;
and extracting the region of interest of the second image to obtain an enhanced image.
During aerial photography the camera's field of view can be affected by cloud cover, making the illumination distribution unbalanced, and it is also affected by air quality. By performing illumination optimization on the aerial photograph, the invention effectively reduces the influence of the illumination distribution problem on the final identification of crop type and occupied area, thereby improving the safety of the flight path planning of the invention.
Preferably, performing the illumination optimization processing on the aerial photograph to obtain the first image comprises:
S81: decomposing the aerial photograph with an improved Retinex model into an illumination component image L and a reflection component image S;
S82: dividing the illumination component image L into a plurality of sub-images, and storing all sub-images obtained by the division in a set cutLSet;
S83: respectively acquiring the illumination distribution value of each sub-image in the set cutLSet;
S84: dividing the reflection component image S into a plurality of sub-images, and storing all sub-images obtained by the division in a set cutSSet;
S85: respectively optimizing each sub-image in the set cutSSet through a preset model to obtain the first image.
The existing Retinex algorithm generally processes the obtained reflection component image directly to obtain the illumination optimization result, but this ignores the information carried by the illumination component, so the final result is not accurate enough. The invention therefore, after obtaining the illumination component image and the reflection component image, obtains illumination distribution values by dividing the illumination component image into blocks and feeds them into the illumination optimization of the reflection component image S, further improving the accuracy of the result.
When acquiring the illumination distribution values, the invention speeds up their acquisition by dividing the illumination component image L. Likewise, when optimizing the reflection component image S, the division avoids computing the parameters of the processing formula separately for every pixel point, reducing the amount of parameter calculation and accelerating the computation while preserving accuracy.
Preferably, S81 comprises:
S811: the pixel value L(d) of each pixel point d in the illumination component image L is obtained by solving an equation in L(d) defined over a local window, in which Ω_d denotes the set of pixel points within a window of preset size centered on d in the illumination component image L, |Ω_d| denotes the number of pixel points in Ω_d, L(g) denotes the pixel value of a pixel point g in Ω_d, c denotes a constant coefficient, α and β denote control parameters greater than 0, I(d) and I(g) denote the pixel values of the pixel points corresponding to d and g in the aerial photograph I, and k denotes the number of operations;
S812: the reflection component image S is acquired from the multiplicative Retinex relation I = L · S, that is

S(x, y) = I(x, y) / L(x, y)

where (x, y) denotes the coordinates of a pixel point, and I(x, y), L(x, y) and S(x, y) denote the pixel values at coordinates (x, y) in the aerial photograph I, the illumination component image L and the reflection component image S, respectively.
In the process of acquiring the illumination component image L and the reflection component image S, the conventional Retinex algorithm does not consider, for each pixel point, the influence of its surrounding pixel points on the result, so the acquired illumination and reflection component images are not accurate enough.
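A compact sketch of the decomposition follows, with a Gaussian surround standing in for the patent's window-based equation for L (S811), which is solved per pixel and is not reproduced here; the reflection component then follows the multiplicative relation of S812.

```python
import cv2
import numpy as np

def decompose(photo_bgr, sigma=30):
    """Estimate the illumination component L (stand-in: Gaussian surround)
    and recover the reflection component as S = I / L per channel (S812)."""
    img = photo_bgr.astype(np.float32) + 1.0     # avoid division by zero
    L = cv2.GaussianBlur(img, (0, 0), sigma)     # smooth illumination estimate
    S = img / L                                  # multiplicative model I = L * S
    return L, S
```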
Preferably, S82 comprises:
S821: smoothing the illumination component image L to obtain a smoothed illumination component image smL;
S822: dividing smL in the following way.
First round of division: divide smL into D sub-images with the same number of pixel points and store all sub-images obtained by the division in a set P_1; respectively calculate the judgment coefficient of each sub-image in P_1; store the sub-images of P_1 whose judgment coefficient is larger than a set judgment coefficient threshold in a set Q_1, and store the sub-images of P_1 whose judgment coefficient is smaller than or equal to the threshold in the set cutLSet.
n-th round of division, n ≥ 2: divide each sub-image of the set Q_{n-1} obtained in round n-1 into D sub-images with the same number of pixel points and store all sub-images obtained by the division in a set P_n; respectively calculate the judgment coefficient of each sub-image in P_n; store the sub-images of P_n whose judgment coefficient is larger than the threshold in a set Q_n, and store the sub-images of P_n whose judgment coefficient is smaller than or equal to the threshold in the set cutLSet.
Termination: if the number of elements in Q_n is smaller than a set number threshold, the division of smL is finished, and the sub-images contained in the current set cutLSet are taken as the division result.
In the embodiment of the invention, when the illumination component image L is divided into sub-images, smoothing is performed first and the division then operates on the smoothed result, which prevents pixel points with abrupt value changes from degrading the efficiency of the division. The invention needs the overall illumination distribution value of each sub-image, so a single abrupt pixel point has very little influence on the sub-image as a whole, yet a very large influence on the division efficiency: pixel points with abrupt values greatly increase the number of division rounds. The division keeps the differences between pixel points within a sub-image as small as possible and the differences between different sub-images as large as possible, making the illumination distribution values more representative.
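The round-by-round division of S822 can be sketched as follows, with D = 4 (a 2x2 split), rectangular blocks, and an illustrative judgment coefficient combining pixel-value variance and gradient-magnitude variance; the thresholds and the minimum block size are assumptions.

```python
import numpy as np

def judgment_coeff(block):
    """Illustrative judgment coefficient: pixel-value variance plus
    gradient-magnitude variance of the block."""
    gy, gx = np.gradient(block.astype(np.float32))
    return float(block.var() + np.hypot(gx, gy).var())

def split4(r, c, h, w):
    """Split a block into a 2x2 grid of sub-blocks (D = 4)."""
    hh, ww = h // 2, w // 2
    return [(r, c, hh, ww), (r, c + ww, hh, w - ww),
            (r + hh, c, h - hh, ww), (r + hh, c + ww, h - hh, w - ww)]

def divide(sml, threshold, min_count=4, min_size=4):
    """Blocks with judgment coefficient <= threshold go to cutLSet; the rest
    are split again until the set of remaining blocks is small enough."""
    cut_l_set = []
    pending = [(0, 0, sml.shape[0], sml.shape[1])]
    while pending:
        nxt = []
        for (r, c, h, w) in pending:
            if h < 2 * min_size or w < 2 * min_size:
                cut_l_set.append((r, c, h, w))   # too small to split again
                continue
            for (rr, cc, hh, ww) in split4(r, c, h, w):
                if judgment_coeff(sml[rr:rr + hh, cc:cc + ww]) > threshold:
                    nxt.append((rr, cc, hh, ww))
                else:
                    cut_l_set.append((rr, cc, hh, ww))
        if len(nxt) < min_count:                 # termination test on |Q_n|
            cut_l_set.extend(nxt)
            break
        pending = nxt
    return cut_l_set
```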
Preferably, S821 comprises smoothing the illumination component image L with a weighted-average formula of the form

smL(h) = Σ_{m ∈ N_h} w(h, m) · L(m) / Σ_{m ∈ N_h} w(h, m)

in which the weight w(h, m) decreases with both the distance and the pixel-value difference between the two points, where smL(h) denotes the pixel value of pixel point h in smL; N_h denotes the set of pixel points in a neighborhood of preset size around the pixel point corresponding to h in L; d(h, m) denotes the length of the line connecting the pixel point corresponding to h in L with the pixel point m; L(h) and L(m) denote the pixel values, in L, of the pixel point corresponding to h and of the pixel point m; σ_d denotes the variance of the distances between the pixel points in N_h and the pixel point corresponding to h in L; and σ_r denotes the variance of the differences between the pixel value of the pixel point corresponding to h in L and the pixel values of the pixel points in N_h.
While smoothing each pixel point, the embodiment of the invention also considers its relation to the surrounding pixel points in terms of pixel value and distance, so that pixel-value transitions in the smoothed image are more natural and the corresponding detail information is retained. Processing directly with a Gaussian filter or the like easily loses detail information and affects the accuracy of the sub-image division result.
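A direct (unvectorized) sketch of this smoothing, in which the two variances are taken from the neighborhood itself as the text describes; the Gaussian-style form of the weights is an assumption, since the original formula is given only as an image.

```python
import numpy as np

def smooth(L, radius=2):
    """Edge-preserving smoothing of the illumination component image L:
    each weight combines spatial distance and pixel-value difference."""
    Lf = L.astype(np.float64)
    H, W = Lf.shape
    out = np.empty_like(Lf)
    pad = np.pad(Lf, radius, mode="edge")
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    dist = np.hypot(yy, xx)                  # distances to the centre pixel
    sd = dist.var() + 1e-12                  # variance of distances in N_h
    for i in range(H):
        for j in range(W):
            win = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            diff = win - Lf[i, j]            # value differences to the centre
            sr = diff.var() + 1e-12          # variance of value differences
            w = np.exp(-dist**2 / (2 * sd)) * np.exp(-diff**2 / (2 * sr))
            out[i, j] = (w * win).sum() / w.sum()
    return out
```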
Preferably, the judgment coefficient of a sub-image is calculated by a formula that combines the variance of its pixel values with the variance of its gradient magnitudes, in which J denotes the judgment coefficient, U denotes the set of pixel points in the sub-image, smL(u) denotes the pixel value of a pixel point u in smL, grad(u) denotes the gradient magnitude of the pixel point u in smL, |U| denotes the total number of pixel points contained in U, ε_1 denotes a variance reference value for the pixel values, ε_2 denotes a variance reference value for the gradient magnitudes, and γ denotes a preset scaling factor.
In the embodiment of the invention, the judgment coefficient considers not only the pixel values but also the gradient magnitudes; considering both makes the differences between pixel points in the resulting sub-images smaller, which improves the accuracy of representing a whole sub-image by a single illumination distribution value.
Preferably, S83 includes:
s831: converting the sub-image to an HSV color space;
s832: acquiring an image V of a brightness component corresponding to the sub-image in an HSV color space;
s833: respectively counting the occurrence frequency of each pixel value in the image V;
s834: and taking the pixel value with the highest occurrence frequency as the illumination distribution value of the sub-image.
Since a single value is used to represent the whole sub-image, the invention takes the most frequent pixel value as its illumination distribution value.
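This step is concrete enough to sketch almost literally (S831-S834): convert to HSV, take the V (brightness) channel, and return the modal pixel value. Only the 8-bit input assumption is added.

```python
import cv2
import numpy as np

def illumination_distribution_value(sub_bgr):
    """Most frequent brightness value of the sub-image's V channel."""
    hsv = cv2.cvtColor(sub_bgr, cv2.COLOR_BGR2HSV)
    v = hsv[:, :, 2]
    hist = np.bincount(v.ravel(), minlength=256)
    return int(hist.argmax())
```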
Preferably, S84 comprises:
s841: acquiring a characteristic image DS based on the reflection component image S;
s842: dividing the characteristic image DS into a plurality of sub-images;
s843: the division result of the division processing on the feature image DS is applied to the reflection component image S, and a set cutSSet of sub-images is obtained.
Specifically, the embodiment of the invention does not divide the reflection component image S directly; it obtains a feature image from S and performs the division on that feature image. This arrangement improves the accuracy of the division result while keeping the division fast. In the feature image DS the pixel values are computed comprehensively from several aspects, so a pixel value of DS expresses richer information than the pixel values of the original reflection component image S.
When the feature image is divided, the same procedure as for the smoothed illumination component image may be used, or an existing image division method may be adopted.
In step S843, for example, let DSQ be the set of pixel points of a sub-image Q obtained from the feature image DS; the set SDSQ of the pixel points corresponding to DSQ in S is acquired, and the pixel points in SDSQ form a sub-image of S.
Preferably, S841 comprises obtaining, for the reflection component image S, the pixel value DS(T) of each pixel point T in the feature image DS by a weighted fusion of the form

DS(T) = ω_1 · H(T) + ω_2 · V(T) + ω_3 · Lb(T)

where ω_1, ω_2 and ω_3 denote preset weight coefficients; H(T), V(T) and Lb(T) denote the pixel values of the pixel point T in the images H, V and Lb; H is the hue component image of the reflection component image S in the HSV color space; V is the brightness component image of the reflection component image S in the HSV color space; and Lb is the luminance component image of the reflection component image S in the Lab color space.
Specifically, the pixel values of the feature image are obtained by weighted fusion of the hue, brightness and luminance components, so that a single pixel point of the feature image expresses richer information.
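A minimal sketch of the fusion in S841 follows; the weight values are assumptions, and OpenCV's 8-bit HSV/Lab scalings are used as-is.

```python
import cv2
import numpy as np

def feature_image(S_bgr, w=(0.3, 0.4, 0.3)):
    """DS = w1*H + w2*V + w3*Lab_L, fused per pixel."""
    hsv = cv2.cvtColor(S_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    lab = cv2.cvtColor(S_bgr, cv2.COLOR_BGR2Lab).astype(np.float32)
    h, v, l = hsv[:, :, 0], hsv[:, :, 2], lab[:, :, 0]
    return w[0] * h + w[1] * v + w[2] * l
```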
Preferably, S85 comprises optimizing each sub-image in cutSSet through the preset model, which computes the pixel value opt_b(x, y) of the optimized sub-image b at coordinates (x, y) from the pixel value S(x, y) at coordinates (x, y) in the reflection component image S, the illumination distribution value A_b associated with the sub-image b in which the pixel point lies, and the permeability coefficient t_b of the sub-image b.
The permeability coefficient t_b is computed from the sub-image b by a second formula in which w_1 and w_2 denote weight parameters, together with the average values of the red, green and blue components of the pixel points of b in the RGB color space, the variance of the dark channel values of the pixel points of b, and a preset control coefficient κ.
The value of A_b is obtained as follows: obtain the set P_b of pixel points in the illumination component image L corresponding to the pixel points of the sub-image b; obtain the set of sub-images in cutLSet that contain pixel points of P_b; and take the average of the illumination distribution values of the sub-images in that set as the value of A_b.
During the optimization, the pixel points of one sub-image are all processed with the same parameters; therefore, for the pixel points of a sub-image, all points other than the first one processed reuse the parameters obtained for that first point, which effectively improves the efficiency of the optimization. Specifically, when the pixel points of the same sub-image are optimized, A_b and t_b only need to be calculated for the first pixel point processed; no calculation is needed for the other pixel points.
Modeling the dynamic flight path planning problem means studying a grid-based three-dimensional space division method and combining environmental terrain information to establish a three-dimensional terrain flight environment model; analyzing the performance constraints of the unmanned aerial vehicle itself while also considering external constraints such as terrain threats (obstacles), atmospheric threats (gusts and thick fog), sudden threats (birds) and no-fly zones (high-voltage towers) to establish a mathematical model of the external environment constraint conditions; and constructing a flight path evaluation function from the shortest flight path length, the smallest flight path threat and the lowest flight height, thereby realizing the modeling of the dynamic flight path planning problem.
The precision, accuracy and optimization speed of the algorithm are what the dynamic flight path planning problem demands, so designing an algorithm that can efficiently solve dynamic multi-constraint conditions is the key point of the research. The invention designs the algorithm by combining reinforcement learning with the differential evolution algorithm: the design studies the action space, state space and reward function of the reinforcement learning algorithm and establishes the relation between the reinforcement learning decision controller and the mutation strategies and control parameters of the differential evolution algorithm, so that when solving different dynamic optimization problems the algorithm can adaptively select parameters and mutation strategies in real time.
Dynamic flight path planning with the reinforcement learning differential evolution algorithm studies a processing strategy for dynamic multi-constraint conditions, constructs a suitable flight path encoding, establishes the performance of the reinforcement-learning-based differential evolution algorithm on the dynamic flight path planning problem, and designs a smoothing algorithm for the discrete flight path points, realizing dynamic planning of the plant protection unmanned aerial vehicle's flight path under dynamic multi-constraint conditions.
Modeling the dynamic flight path planning problem requires acquiring the terrain information of the operation area, including the number of mountains, their heights, the operation area and the area outline. A flight environment model is established with the grid-based three-dimensional space division method; performance constraints of the plant protection unmanned aerial vehicle itself, such as maximum flight range, minimum flight height, maximum turning angle, maximum dive angle and minimum step length, are considered; terrain threats, atmospheric threats, sudden threats, no-fly zones and other external environment constraints present in a hilly mountain citrus planting base are analyzed, and a multi-constraint condition equation is established. A flight path evaluation function is then constructed from the shortest flight path length, the lowest flight height and the smallest flight path threat, completing the modeling of the flight path planning problem under dynamic multi-constraint conditions.
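The flight path evaluation function described here (shortest length, lowest height, smallest threat) can be sketched as a weighted sum; the weights and the threat function are illustrative placeholders.

```python
import numpy as np

def track_cost(points, threat_fn, w=(0.5, 0.3, 0.2)):
    """Weighted evaluation of a candidate flight path given as (x, y, z) rows."""
    pts = np.asarray(points, dtype=float)
    length = np.linalg.norm(np.diff(pts, axis=0), axis=1).sum()   # total track length
    height = pts[:, 2].mean()                                     # mean flight height
    threat = float(sum(threat_fn(p) for p in pts))                # accumulated threat
    return w[0] * length + w[1] * height + w[2] * threat
```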
The method aims to solve the problems that a differential evolution algorithm is difficult to select a variable strategy when solving different optimization problems, the algorithm performance is further improved and the like. The method comprises the steps of analyzing a single-target optimization problem in a series of continuous domains by using fitness terrain analysis methods such as information entropy roughness and fitness distance correlation to obtain fitness terrain features corresponding to the optimization problem, establishing a relation between the fitness terrain features and a differential evolution algorithm variation strategy by using a random forest, achieving an improved differential evolution algorithm, and adaptively selecting the variation strategy according to the fitness terrain features of the problem when different optimization problems are solved.
A fitness landscape analysis method is adopted to analyze boundary-constrained single-objective optimization problems, the relationship between the fitness landscape features and the optimization problem is studied, and the complexity of the optimization problem is judged from the fitness landscape analysis features. By analyzing the local fitness landscape of the optimization problem, a differential evolution algorithm based on the local fitness landscape is realized.
The application of the reinforcement learning differential evolution algorithm to dynamic track planning comprises the following core steps: a constraint handling method combining adaptive relaxation variables with feasibility criteria is adopted to process the dynamic constraint equations, which simplifies the constraint conditions, increases the number of feasible solutions and accelerates the solving speed of the algorithm; adaptive weight factors are used to convert the shortest track length, the lowest flight height and the smallest track threat into three mutually conflicting objective functions; the flight environment model is introduced and the population of the reinforcement learning differential evolution algorithm is encoded to solve the dynamic track planning problem; a new track smoothing algorithm based on splicing fifth-order PH (Pythagorean hodograph) curves is proposed to smooth the track points; the algorithm performance is verified through numerical simulation experiments; and finally the algorithm is embedded into an independently developed plant protection unmanned aerial vehicle flight control system and verified experimentally in a real environment.
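As an illustration of combining relaxation variables with feasibility criteria, the sketch below applies the widely used feasibility rules (a feasible solution beats an infeasible one; between infeasible solutions the smaller violation wins) together with a relaxation margin eps that shrinks over the generations; the linear shrink schedule is an assumption, not the patented rule:

```python
def violation(constraints, x, eps):
    """Total constraint violation of x, relaxed by margin eps.
    Each constraint g is satisfied when g(x) <= 0."""
    return sum(max(0.0, g(x) - eps) for g in constraints)

def better(x, y, fx, fy, constraints, eps):
    """Feasibility-criteria comparison: prefer (relaxed-)feasible solutions,
    then smaller violation, then smaller objective value (minimization)."""
    vx, vy = violation(constraints, x, eps), violation(constraints, y, eps)
    if vx == 0.0 and vy == 0.0:
        return fx <= fy
    if (vx == 0.0) != (vy == 0.0):
        return vx == 0.0
    return vx <= vy

def relaxation(gen, max_gen, eps0=1.0):
    """Adaptive relaxation: start loose to admit more candidate solutions,
    then tighten to 0 by the final generation (linear schedule assumed)."""
    return eps0 * max(0.0, 1.0 - gen / max_gen)
```

Starting with a loose margin is what "increases the number of feasible solutions" early in the search; as eps reaches zero, only genuinely feasible tracks survive the selection.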
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (6)

1. An unmanned aerial vehicle dynamic flight path planning method based on a reinforcement learning difference algorithm, characterized by comprising the following steps: S1: acquiring the terrain environment in which the unmanned aerial vehicle needs to fly;
S2: establishing a flight path planning model according to the acquired environmental data and the performance constraints of the unmanned aerial vehicle, representing the environment as an artificial potential field in which an attractive potential field is established with the target point as its center and repulsive potential fields are established with obstacles and threats as their centers;
S3: when the flight path planning model is established, adding a function structure body for correcting positioning errors, calculating the current resultant force acting on the unmanned aerial vehicle from the artificial potential field, and making the unmanned aerial vehicle advance under the action of that resultant force;
S4: designing a reinforcement learning difference algorithm based on the flight path planning model;
S5: optimizing the reinforcement learning difference algorithm, embedding the optimized algorithm into the unmanned aerial vehicle intelligent system, and solving the algorithm optimized on the basis of the reinforcement learning difference algorithm to complete the flight path planning of the unmanned aerial vehicle;
the design of the S4 mesoscale chemical learning differential evolution algorithm comprises the following steps:
s31: combining reinforcement learning and a differential evolution algorithm, and adopting a Q learning algorithm or a deep Q learning algorithm as an intelligent agent to carry out intelligent decision;
s32: analyzing the optimization problem by using the dispersion measurement, the autocorrelation roughness, the terrain information roughness and the adaptability cloud, and using the adaptability terrain feature information of the optimization problem as the state space of the reinforcement learning intelligent agent;
s33: selecting a control parameter and a variation strategy of a differential evolution algorithm as an action space of the intelligent agent, and designing population evolution efficiency as reward of the intelligent agent;
s34: and finally, the intelligent agent obtains the local information of the optimization problem through the state space, executes the corresponding operation of the action space according to the state space information, calculates the reward obtained after the corresponding action operation is executed, and returns the reward to the intelligent agent.
2. The unmanned aerial vehicle dynamic track planning method based on the reinforcement learning difference algorithm according to claim 1, wherein adding the function structure body for positioning error correction in S3 comprises the following steps:
S21: setting an unmanned aerial vehicle track planning area consisting of 1 departure point, 1 destination, R horizontal correction points and L vertical correction points;
S22: constructing the unmanned aerial vehicle track planning area containing the resulting 2 + R + L points; the unmanned aerial vehicle needs to be positioned in real time while flying through this space, and the positioning error comprises a vertical error and a horizontal error, each of which increases by δ units for every 1 m the unmanned aerial vehicle flies; both errors must be smaller than θ units when the unmanned aerial vehicle reaches the target point so that the unmanned aerial vehicle can fly according to the planned track;
S23: the unmanned aerial vehicle needs to correct the positioning error during flight; correction points for error correction exist in the track planning area, and when the unmanned aerial vehicle reaches a correction point it can carry out error correction according to the error correction type of that point; the positions for correcting vertical and horizontal errors are determined before track planning according to the terrain; provided the vertical and horizontal errors are corrected in time, the unmanned aerial vehicle can fly along the preset route, correcting its errors at several correction points before finally reaching the destination.
3. The unmanned aerial vehicle dynamic track planning method based on the reinforcement learning difference algorithm as claimed in claim 1, wherein the resultant force calculation in S3 determines the motion direction of the unmanned aerial vehicle according to the following formulas:

$$F_{att}(X) = k\,(X_g - X)$$

$$F = F_{att}(X) + F_{rep}(X)$$

wherein $F_{att}(X)$ denotes the attraction of the target to the unmanned aerial vehicle, $X_g$ is the coordinate vector of the target, and $X$ is the coordinate vector of the current position of the unmanned aerial vehicle; $k$ is a coefficient with a value between 0 and 1; $F_{rep}(X)$ denotes the repulsion of the no-fly zone on the unmanned aerial vehicle and is calculated with the existing repulsive field function; and the resultant force $F$ of the attraction and the repulsion gives the direction of motion of the unmanned aerial vehicle.
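For a numeric illustration of the claimed resultant-force rule, the sketch below assumes the textbook repulsive field (inverse-distance form with influence radius rho0) as the "existing repulsive field function" the claim refers to:

```python
import math

def attraction(x, x_goal, k=0.5):
    """F_att = k * (X_g - X), with k between 0 and 1."""
    return [k * (g - xi) for g, xi in zip(x_goal, x)]

def repulsion(x, obstacle, eta=100.0, rho0=50.0):
    """Textbook repulsive force (an assumption standing in for the claim's
    'existing' function): nonzero only within influence distance rho0."""
    diff = [xi - oi for xi, oi in zip(x, obstacle)]
    rho = math.hypot(*diff)
    if rho >= rho0 or rho == 0.0:
        return [0.0] * len(x)
    mag = eta * (1.0 / rho - 1.0 / rho0) / rho ** 2
    return [mag * d / rho for d in diff]

def resultant(x, x_goal, obstacles):
    """F = F_att + sum of F_rep; its direction is the UAV's motion direction."""
    f = attraction(x, x_goal)
    for ob in obstacles:
        f = [a + b for a, b in zip(f, repulsion(x, ob))]
    return f

print(resultant((0, 0, 60), (300, 150, 60), [(40, 20, 60)]))
```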
4. The unmanned aerial vehicle dynamic track planning method based on the reinforcement learning difference algorithm as claimed in claim 1, wherein in S5, solving the algorithm optimized on the basis of the reinforcement learning difference algorithm completes both the track planning of the unmanned aerial vehicle and the constraint-condition obstacle avoidance along the track.
5. The unmanned aerial vehicle dynamic track planning method based on the reinforcement learning difference algorithm of claim 4, wherein the constraint condition obstacle avoidance comprises the following steps:
s61: inputting the initial position of the unmanned aerial vehicle as the current position
Figure 328490DEST_PATH_IMAGE007
The central positions of the m no-fly zones,
Figure 366853DEST_PATH_IMAGE008
and a target position G assigned by the drone;
s62: taking two variables G1 and G2, respectively representing a target position in the calculation process and a final target position, and initializing G1= G2= G; opening up two storage spaces A and B and using the current position of the unmanned aerial vehicle
Figure 173135DEST_PATH_IMAGE009
Storing in A; initializationThe iteration number num =0;
s63: determining the motion direction of the unmanned aerial vehicle, setting the motion step length of the unmanned aerial vehicle to be L, and enabling the unmanned aerial vehicle to move from the current position
Figure 347764DEST_PATH_IMAGE011
Moving according to the movement step length L in the determined movement direction, and updating the current position with the moved position
Figure 214089DEST_PATH_IMAGE013
And storing the position of the unmanned aerial vehicle in A, wherein the iteration number num = num +1;
s64: judging whether num > N is true, if yes, setting num =0 and performing step S65, otherwise, returning to step S63; wherein N is a preset total number of iterations;
s65: judging the current position
Figure 474169DEST_PATH_IMAGE014
Whether the distance d from G1 satisfies d<
Figure 349721DEST_PATH_IMAGE015
In which
Figure 113278DEST_PATH_IMAGE016
Is a preset distance threshold;
s66: judging whether the last M position points stored in A are all in a preset circular area, if so, indicating that the position points are in a balance position or a local minimum point currently, and performing jump-out processing; if not, continuing to step S63;
s67: solving a straight line expression between two points stored in the last step A;
s68: judging whether the straight line intersects with each circular no-fly zone, if not, returning to the step S63, otherwise, assigning the last stored position of A to G1, emptying A, and then performing the step S63;
s69: storing all the positions in A into B, judging whether G1 is equal to G2, if not, making order
Figure 681662DEST_PATH_IMAGE018
= G1, G1= G2, and then proceeds to step S63;
s610: and the position points stored in the B are the obstacle avoidance tracks of the unmanned aerial vehicle.
6. The unmanned aerial vehicle dynamic track planning method based on the reinforcement learning difference algorithm as claimed in claim 1, wherein establishing the track planning model in S2 further comprises the following steps:
S71: acquiring image data of the target area, including surface terrain data and crop planting data;
S72: obtaining an initial route of the unmanned aerial vehicle based on the image data of the target area;
S73: extracting first actual geographic coordinates of the initial route based on the inflection point positions on the initial route, and adjusting the first actual geographic coordinates based on the elevation values of the surface terrain data to obtain first elevation coordinates;
S74: adjusting the initial route based on the first elevation coordinates to obtain a terrain route;
S75: dividing the initial route into sections at a preset distance, and extracting second actual geographic coordinates of the endpoints of each section point by point;
S76: adjusting the second actual geographic coordinates based on the crop planting data to obtain second elevation coordinates, and adjusting the initial route based on the second elevation coordinates to obtain a crop planting route;
S77: establishing the flight path planning model based on the terrain route and the crop planting route.