CN115574826A

CN115574826A - National park unmanned aerial vehicle patrol path optimization method based on reinforcement learning

Info

Publication number: CN115574826A
Application number: CN202211572414.2A
Authority: CN
Inventors: 郭强辉; 殷虹娇; 张鹏; 王永峰; 宋尚源; 刘兆泽; 高琳
Original assignee: Beijing Deepiot Technology Co ltd; Nankai University
Current assignee: Beijing Deepiot Technology Co ltd; Nankai University
Priority date: 2022-12-08
Filing date: 2022-12-08
Publication date: 2023-01-06
Anticipated expiration: 2042-12-08
Also published as: CN115574826B

Abstract

The invention discloses a national park Unmanned Aerial Vehicle (UAV) patrol path optimization method based on reinforcement learning, which comprises the steps of taking an unmanned aerial vehicle flight path as an optimization target, adding constraint conditions of unmanned aerial vehicle traversal path points, unmanned aerial vehicle electric quantity limitation and path point task execution energy consumption, and establishing an UAV path planning model with a self-service charging function; then respectively corresponding the unmanned aerial vehicle, the path points, the charging base station, the energy, the battery capacity, the flight path energy consumption and the path point task energy consumption in the unmanned aerial vehicle path planning model to a CVRP problem model; the unmanned aerial vehicle patrol route planning problem which needs to consider side energy consumption constraint and point energy consumption constraint originally is reduced into a CVRP problem which takes the route length as an optimization target and takes the customer demand and the vehicle load as constraints by using a feedforward weighting method; and finally, solving the reduced CVRP problem by using a multi-decoder attention model.

Description

National park unmanned aerial vehicle patrol path optimization method based on reinforcement learning

Technical Field

The invention belongs to the technical field of computer intelligent calculation and unmanned aerial vehicle flight control, and particularly relates to a national park unmanned aerial vehicle patrol route optimization method based on reinforcement learning.

Background

The field patrol monitoring is the most important ecological monitoring and daily supervision means in national parks and natural conservation places, and a patrol guard collects data in the aspects of wild species population, habitat, phenology and the like through patrol monitoring, can timely discover ecological environment problems, inhibit illegal activities and the like, realizes effective protection on the national parks and the natural conservation places, and provides decision basis for natural resource supervision. However, national parks and natural protection lands have large areas, wide ranges and complex terrains, people and vehicles in most regions are difficult to reach, and the traditional manual patrol mode has low efficiency, wastes time and labor. Therefore, in recent years, unmanned aerial vehicles are increasingly used for patrol monitoring work of various natural protection places.

The unmanned aerial vehicle technology is an unmanned aerial vehicle remote sensing technology which is realized by fusing an aircraft technology, a communication technology, a GPS (global positioning system), a differential positioning technology and an image technology, and automatic acquisition and transmission of monitoring data are realized by carrying sensing equipment such as a high-definition camera and an intelligent sensor and combining a wireless communication network. The existing unmanned aerial vehicle used for patrol monitoring of national parks and natural conservation places has the challenges of short endurance, high requirement on flight control personnel, difficult storage and transportation of airplanes, high application integration difficulty and the like, and is difficult to meet the application requirements of normalized monitoring.

The automatic airport of unmanned aerial vehicle is the ground automation facility of assisting unmanned aerial vehicle full flow operation, for unmanned aerial vehicle provides all-weather protection, through automatic opening and shutting, go up and down, get and unload structural design, let unmanned aerial vehicle take off, descend, deposit and battery management all can accomplish automatically, need not artificial intervention. The unmanned aerial vehicle is stored in the automatic airport, and when flight demands exist, the unmanned aerial vehicle takes off from the airport autonomously, and automatically lands in the automatic airport after the task is finished, so that charging is carried out in the automatic airport, preparation is made for the next task, and full-automatic operation is realized.

For realizing the normalized development of unmanned aerial vehicle in national park and the ecological monitoring work of nature protected area, satisfy the field and patrol and protect the monitoring management demand, this patent carries out path planning, electric quantity state control, commander's dispatch to unmanned aerial vehicle based on the automatic airport of unmanned aerial vehicle, and very big degree promotes unmanned aerial vehicle and patrols and protects monitoring efficiency.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a national park unmanned aerial vehicle patrol route optimization method based on reinforcement learning.

The invention is realized by the following technical scheme:

a national park unmanned aerial vehicle patrol path optimization method based on reinforcement learning comprises the following steps:

step 1: inputting three-dimensional terrain data to generate a bounded three-dimensional region

According to the performance and patrol requirement of the airborne camera of the unmanned aerial vehicle, a path point set is set above the area in the air

The unmanned aerial vehicle is required to complete the visual coverage task after traversing all path points;

and 2, step: taking the flight path of the unmanned aerial vehicle as an optimization target, adding constraint conditions of traversal path points of the unmanned aerial vehicle, electric quantity limitation of the unmanned aerial vehicle and task execution energy consumption of the path points, and establishing an unmanned aerial vehicle path planning model with a self-service charging function;

and step 3: respectively corresponding unmanned aerial vehicle, path points, charging base stations, energy, battery capacity, flight path energy consumption and path point task energy consumption in the established unmanned aerial vehicle path planning model with the self-service charging function to vehicles, customers, warehouses, goods, the maximum cargo capacity of the vehicles, the path length and customer requirements in the CVRP problem model; defining new path point task energy consumption by using a feedforward weighting method, so that the new path point task energy consumption comprises the task energy consumption of a path point and the average edge energy consumption reaching the path point; corresponding the obtained new path point task energy consumption to the client requirement of the CVRP problem model, and further reducing the unmanned aerial vehicle patrol path planning problem into a CVRP problem which takes the path length as an optimization target and takes the client requirement and the vehicle cargo load as constraints;

and 4, step 4: the CVRP problem reduced in step 3 is solved using a multi-decoder attention model.

In the above technical solution, in step 2, an unmanned aerial vehicle path planning model with a self-service charging function is established, and the specific steps are as follows:

step 2.1: defining flight path decision variables for dronesx _ij ；

x _ij =1, representing unmanned aerial vehicle from a waypointiFly to the waypointj；

x _ij =0, meaning that the drone is not following a waypointiFly to the waypointj；

Defining an objective function:

（1）

wherein the content of the first and second substances,

is flight path energy consumption and represents the path point of the unmanned planeiAnd a waypointjEnergy consumption is needed;

the flight path decision variables are to form a complete and feasible one-time traversal path, and the constraints are as follows:

（2）

（3）

step 2.2: aiming at the self-service charging function of the unmanned aerial vehicle, the route planning with the charging base station is adjusted, the energy consumption of the unmanned aerial vehicle is measured according to the flight path, and the maximum endurance of the unmanned aerial vehicle is recorded asQDefining the energy loss variable

The charging base station is the starting point of the unmanned aerial vehicle and is recorded as

；

Remaining range of the drone during performance of the mission not exceeding maximum range

Is expressed as follows:

（4）

（5）

wherein the content of the first and second substances,

is a path point

Task energy consumption, representing the point of unmanned aerial vehicle completing path

The required energy consumption of the patrol task is reduced,

representing points of a path

Points of other paths

To the path point

The decision variables of the edges of (a) are,

indicating unmanned aerial vehicle slave waypoints

Performing a mission to fly to a waypoint

The residual energy after the reaction;

when the unmanned aerial vehicle leaves the charging base station, the electric quantity is full, and the formula is as follows:

（6）

indicating that the unmanned aerial vehicle leaves the charging base station to reach the waypoint

The residual energy of the waste water is the energy,

indicating that the unmanned aerial vehicle flies to a waypoint from a charging base station

The decision-making variables of (a) are,

is a path point

Energy consumption required by the patrol task.

In the above technical scheme, in step 3, firstly, under the condition that the edge energy consumption constraint between the path point and the path point is not considered, a deep reinforcement learning method is used to independently solve the CVRP problem corresponding to the unmanned aerial vehicle patrol path for multiple times, and the number of the solution times is recorded as

And training the neural network in the deep reinforcement learning model again every time of solving, and using the neural network trained every time for predicting the CVRP problem corresponding to the original unmanned aerial vehicle patrol problem

The secondary solution is obtained

Grouping different solutions to form a solution set

Solution set

Therein comprises

Planting a patrol path scheme;

redefining new task point energy consumption variables on the basis of the known solution set

；

（7）

Wherein the content of the first and second substances,

representing points of a path

To the path point

Is in the solution set

The number of occurrences in (1) is equivalent to the weighted average of the path energy consumption required for reaching any path point, and the weight is

Then the solution set is obtained by optimizing the path length of the reference total patrol task

。

In the above technical solution, the solving process of step 4 includes the following steps:

step 4.1: firstly, according to the scale of input information, several groups of data sets with identical path point quantity are produced, and said data sets are equipped with

Group data set, first

The information in the group dataset comprises a randomly generated starting point

And the position of the path point

And randomly generated waypoint task energy consumption

Wherein

；

Step 4.2: using generated

Training the multi-decoder attention model in a block data set, where the parameters of the encoder and decoder are

The model is trained by a strategy gradient algorithm with baseline, and parameters of the optimized model are continuously updated circularly to obtain a trained attention model of the multi-decoder;

step 4.3: after the training of the model parameters is finished, inputting the data of the task planning problem of the original unmanned aerial vehicle as a reduced CVRP problem example into the trained model, and taking the output sequence of the model at the moment as a path point access scheme of the unmanned aerial vehicle patrol problem.

In the above technical solution, in step 4.3, the data of the original unmanned aerial vehicle mission planning problem includes a starting point

、

A path point

And information of energy consumption of each path point task, wherein the energy consumption of the path point task refers to the energy consumption of the new path point task defined in the step 2.

The invention has the advantages and beneficial effects that:

the base station is introduced to provide real-time charging service for the working unmanned aerial vehicle, and the unmanned aerial vehicle can access the base station to perform charging for multiple times when executing tasks. Under the system, a constraint formula is constructed by taking the optimized unmanned aerial vehicle task path length as a target, a multi-unmanned aerial vehicle path planning model is established, and the problem is converted into a combined optimization problem. A known combined optimization solver is utilized, a feedforward weighting method is designed to calculate the path energy consumption constraint, and the problem is further converted into a vehicle path problem (CVRP) with capacity limitation. In addition, the deep reinforcement learning method based on the multi-decoder attention model can stably output a high-quality solution of a visual coverage problem for a specific scene, has generalization capability for solving the reduced unmanned aerial vehicle path planning problem, has strong adaptability to a training data set, and can guarantee an efficient training network for path planning under different scenes to obtain the high-quality solution. Based on a trained learning model, the result can be quickly obtained by only calling neural network prediction after the unmanned aerial vehicle path problem example is reduced, the solving speed is higher than the efficiency of the traditional search algorithm, and the decision requirement of the unmanned aerial vehicle quick scheduling planning can be met.

Drawings

FIG. 1 is a flow chart of the national park unmanned aerial vehicle patrol route optimization method based on reinforcement learning.

FIG. 2 is a flow chart of a solution of a multi-decoder attention model to an example problem.

For a person skilled in the art, other relevant figures can be obtained from the above figures without inventive effort.

Detailed Description

In order to make the technical solution of the present invention better understood, the technical solution of the present invention is further described below with reference to specific examples.

A national park unmanned aerial vehicle patrol path optimization method based on reinforcement learning is disclosed, referring to the attached figure 1, and comprises the following steps:

According to the unmanned plane on-board shootingHead-like performance and patrol requirements set a set of waypoints in the air above the area

Obtaining initial data

And the unmanned aerial vehicle is required to complete the visual coverage task after traversing all path points.

Step 2: and establishing a constraint formula, taking the flight path of the unmanned aerial vehicle as an optimization target, adding constraint conditions of traversal path points of the unmanned aerial vehicle, electric quantity limitation of the unmanned aerial vehicle and energy consumption of task execution of the path points, and establishing an unmanned aerial vehicle path planning model with a self-service charging function without considering uncontrollable factors such as wind power, visibility and unmanned aerial vehicle faults. The method comprises the following specific steps.

Step 2.1: defining flight path decision variables for an unmanned aerial vehiclex _ij ；

Defining an objective function:

（1）

wherein the content of the first and second substances,

is flight path energy consumption and represents the path point of the unmanned planeiAnd a waypointjThe energy consumption generated between the unmanned aerial vehicle and the unmanned aerial vehicle is in direct proportion to the distance between the path points, and the aim of the task is to optimize the flight path of the unmanned aerial vehicle and minimize the flight path on the premise of completing the task aim. Meanwhile, the flight path decision variables need to ensure that a complete and feasible one-time traversal path can be formed, and the specific constraints are as follows：

（2）

（3）

Charging base station is the departure point of the unmanned aerial vehicle and is recorded

。

First, the drone consumes energy as it moves between waypoints and the remaining range of the drone during the mission should not exceed the maximum range

Is given by the following equation:

（4）

（5）

wherein, the first and the second end of the pipe are connected with each other,

is a path point

The energy consumption required by the patrol task is reduced,

representing points of a path

Points of other routes

To the path point

The decision variables of the edges of (a) are,

representing unmanned aerial vehicle slave waypoints

Performing a mission to fly to a waypoint

The remaining energy (i.e., electricity).

Secondly, when unmanned aerial vehicle leaves charging base station, the electric quantity is full, and the formula is expressed as follows:

（6）

The residual energy of the waste water is the energy,

indicating that the drone is flying from the charging base stationTo the path point

The decision variable(s) of (a),

is a path point

Energy consumption required by the patrol task.

In conclusion, an unmanned aerial vehicle path planning model with a self-service charging function is established, and the model comprises an objective function (1) and constraint formulas (2), (3), (4), (5) and (6). The solution of this model is a combinatorial optimization problem promptly, that is to say, the unmanned aerial vehicle patrols the route planning problem and transforms for a combinatorial optimization problem.

And step 3: referring to table 1, the unmanned aerial vehicle, the waypoints, the charging base station, the energy (i.e., the electric quantity), the battery capacity, the flight path energy consumption, and the waypoint task energy consumption in the unmanned aerial vehicle path planning model with the self-service charging function, which are established as above, are respectively corresponding to the maximum cargo capacity, the path length, and the customer demand of the vehicle, the customer, the warehouse, the goods, and the vehicle in the CVRP problem (the capacity-limited vehicle path solving problem) model, and then the unmanned aerial vehicle path planning model is converted into the capacity-limited vehicle path solving problem (CVRP).

Table 1: correspondence between unmanned aerial vehicle path planning and CVRP problem model

The energy consumption of the unmanned aerial vehicle comprises the side energy consumption from the path point to the path point and the point energy consumption required by the path point to complete the patrol task, but in the CVRP problem model, the side energy consumption is only used as an optimization target for planning the vehicle path, and only the point energy consumption is used as a constraint condition of the vehicle path. Therefore, the invention uses a feedforward weighting method to enable point energy consumption to replace 'point plus edge energy consumption', and then add edge energy consumption into the constraint condition, so that the problem of unmanned aerial vehicle patrol route planning which originally needs to consider edge energy consumption constraint and point energy consumption constraint is reduced to a CVRP problem which takes the route length as an optimization target and takes customer requirements and vehicle cargo as constraints. The specific treatment method is as follows.

Firstly, under the condition of not considering the limit energy consumption constraint, a deep reinforcement learning method is used for independently solving the CVRP problem corresponding to the unmanned aerial vehicle patrol path for multiple times, and the solving times are recorded as

And (2) training the neural network in the deep reinforcement learning model again (or independently) every time of solving, using the neural network trained every time for predicting the CVRP problem corresponding to the original unmanned aerial vehicle patrol problem, wherein the generation and extraction of the training set are random, so that the method has the advantages of high reliability, high accuracy and low cost

Total of sub-training

The neural networks are different, and the prediction results of the neural networks are different, so that the neural networks can obtain

Grouping different solutions to form a solution set

Solution set

Therein comprises

And a patrol path scheme is adopted.

Redefining new path point task energy consumption based on known solution set

(i.e., waypoints)

Energy consumption required for completing the patrol task):

（7）

wherein the content of the first and second substances,

representing points of a path

To the path point

Is in the solution set

The number of occurrences in (1) is equivalent to weighted average of the path energy consumption required for reaching any path point, and the weight is

Then refer to the solution set optimized by the total patrol task path length

。

The obtained new path point task energy consumption

Customer requirements for the CVRP problem model so that new waypoint tasks consume energy

The task energy consumption of a path point and the average side energy consumption for reaching the path point are included, and the patrol path problem which originally needs to consider side energy consumption constraint and point energy consumption constraint is reduced to a CVRP problem which takes path length as an optimization target and takes customer demand and vehicle cargo as constraints.

The data of the unmanned aerial vehicle path planning problem comprises a starting point

Information and

a path point

And information of task energy consumption of each path point (the path point task energy consumption refers to new path point task energy consumption defined in the step 2), and the information is reduced to information of warehouse, client demand and the like in the CVRP problem according to the step 3 and is used as input information of the model. The encoder structure of the model is based on a transformer model, a plurality of decoders with the same structure and independent parameters are used in a decoder part, the difference degree of construction solutions between the decoders is measured by Kullback-Leibler divergence (abbreviated as 'KL divergence') between probability distributions calculated by different decoders, and in addition, each decoder increases the masking of nodes when calculating attention weights and is used for realizing task path constraint in the CVRP problem. The model is trained by a policy gradient algorithm with baseline and a plurality of data sets which are randomly generated and have the same scale with the problem to be solved. Referring to fig. 2, the specific solving process is as follows.

Step 4.1: firstly, groups with the same path point number (namely, the same path point number) are generated according to the scale of input information

) Assuming common data sets of

Group data set, in order

For the example of a group dataset, the information therein includes a randomly generated starting point

And the position of the path point

And randomly generated waypoint task energy consumption

Wherein

。

Step 4.2: using generated

The model is trained by a policy gradient algorithm with baseline, model parameters are continuously updated and optimized in a circulating mode, the training target is the model parameters for optimizing the shortest path length of a client access scheme and KL divergence of decoder parameters, and the model parameters are recorded

The total length of the task path is obtained for the solution under the model parameters, and is recorded

And (4) carrying out parameter training for the KL divergence of the decoder parameters under the model parameters according to the following algorithm to obtain the trained attention model of the multi-decoder.

The reinforcement learning algorithm with baseline is as follows:

1, inputting

Group dataset, significance level

Training period

；

2, initializing model parameters

；

3, recording baseline parameters

；

4, current number of training times

；

5, combining the optimization objectives according to the current

Group dataset and parameters

Calculating the task path length and KL divergence of the output result of the model

Optimizing direction

；

6 according to the optimization direction

Updating parameters using Adam function

；

7, using t test parameters

And Baseline parameters

If the significance is less than

Update, update

；

8, if

，

Returning to the step 5; otherwise, turning to the next step;

9, training is finished, and the obtained parameters are

The multi-decoder attention model of (1).

Step 4.3: after the training of the model parameters is finished, the data (including the starting point) of the original unmanned aerial vehicle mission planning problem is processed

、

A path point

And information of task energy consumption of each path point) as a reduced CVRP problem instance, inputting the trained model, and taking an output sequence of the model at the moment as a path of the unmanned aerial vehicle patrol problemA point access scheme.

The invention being thus described by way of example, it should be understood that any simple alterations, modifications or other equivalent alterations as would be within the skill of the art without the exercise of inventive faculty, are within the scope of the invention.

Claims

1. A national park Unmanned Aerial Vehicle (UAV) patrol path optimization method based on reinforcement learning is characterized by comprising the following steps of:

step 2: taking the flight path of the unmanned aerial vehicle as an optimization target, adding constraint conditions of traversal path points of the unmanned aerial vehicle, electric quantity limitation of the unmanned aerial vehicle and energy consumption of task execution of the path points, and establishing an unmanned aerial vehicle path planning model with a self-service charging function;

and 3, step 3: respectively corresponding unmanned aerial vehicle, path points, charging base stations, energy, battery capacity, flight path energy consumption and path point task energy consumption in the established unmanned aerial vehicle path planning model with the self-service charging function to vehicles, customers, warehouses, goods, the maximum cargo capacity of the vehicles, the path length and customer requirements in the CVRP problem model; defining new path point task energy consumption by using a feedforward weighting method, so that the new path point task energy consumption comprises the task energy consumption of a path point and the average edge energy consumption reaching the path point; corresponding the obtained new path point task energy consumption to the client requirement of the CVRP problem model, and further reducing the unmanned aerial vehicle patrol path planning problem into a CVRP problem which takes the path length as an optimization target and takes the client requirement and the vehicle cargo load as constraints;

2. The reinforcement learning-based national park unmanned aerial vehicle patrol route optimization method according to claim 1, wherein: in step 2, an unmanned aerial vehicle path planning model with a self-service charging function is established, and the method specifically comprises the following steps:

x _ij =1, representing unmanned aerial vehicle from a waypointiFly to waypointj；

x _ij =0, meaning that the drone is not following a waypointiFly to waypointj；

Defining an objective function:

（1）

wherein the content of the first and second substances,

the flight path decision variables need to form a complete and feasible one-time traversal path, and the constraints are as follows:

（2）

（3）

step 2.2: for no oneThe self-service charging function of the unmanned aerial vehicle is adjusted to adjust the route planning with the charging base station, the energy consumption of the unmanned aerial vehicle is measured according to the flight path, and the maximum endurance of the unmanned aerial vehicle is recorded asQDefining the energy loss variable

；

The remaining range of the drone during execution of the mission does not exceed the maximum range

Is expressed as follows:

（4）

（5）

wherein the content of the first and second substances,

is a path point

Task energy consumption, representing the point of path completed by the unmanned aerial vehicle

The energy consumption required by the patrol task is reduced,

representing points of a path

Points of other routes

To the path point

The decision variables of the edges of (a) are,

indicating unmanned aerial vehicle slave waypoints

Performing a mission to fly to a waypoint

The residual energy after the reaction;

when unmanned aerial vehicle leaves charging base station, the electric quantity is full, and the formula is as follows:

（6）

The residual energy of the waste water is the energy,

The decision-making variables of (a) are,

is a path point

Energy consumption required by the patrol task.

3. The reinforcement learning-based national park unmanned aerial vehicle patrol route optimization method according to claim 2, wherein: in step 3, firstly, under the condition of not considering the edge energy consumption constraint between the path points, a deep reinforcement learning method is used for independently solving the CVRP problem corresponding to the unmanned aerial vehicle patrol path for multiple times, and the solving times are recorded as

And (3) retraining the neural network in the deep reinforcement learning model every time of solving, and using the neural network trained every time to predict the CVRP problem corresponding to the original unmanned aerial vehicle patrol problem

The secondary solution is obtained

Grouping different solutions to form solution sets

Solution set

Therein comprises

A patrol path scheme is planted;

in a known solution setOn the basis, redefining new task point energy consumption variables

；

（7）

representing points of a path

To the path point

Is in the solution set

。

4. The reinforcement learning-based national park unmanned aerial vehicle patrol route optimization method according to claim 1, wherein: the solving process of the step 4 comprises the following steps:

step 4.1: firstly, according to the scale of input information, several groups of data sets with same path point quantity are generated, and said method is characterized by that

Group data set, first

And the position of the path point

And randomly generated waypoint task energy consumption

Wherein

；

Step 4.2: using generated

5. The reinforcement learning-based national park unmanned aerial vehicle patrol route optimization method according to claim 4, wherein: in step 4.3, the data of the original unmanned aerial vehicle mission planning problem comprises a starting point

、

A path point