CN113433967A - Chargeable unmanned aerial vehicle path planning method and system - Google Patents


Info

Publication number
CN113433967A
CN113433967A (application CN202110631925.6A)
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
energy consumption
cluster head
sensor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110631925.6A
Other languages
Chinese (zh)
Other versions
CN113433967B (English)
Inventor
Wang Li (王莉)
Fei Aiguo (费爱国)
Xu Lianming (徐连明)
Wang Xuefu (王雪夫)
Fu Weiqi (付玮琦)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202110631925.6A priority Critical patent/CN113433967B/en
Publication of CN113433967A publication Critical patent/CN113433967A/en
Application granted granted Critical
Publication of CN113433967B publication Critical patent/CN113433967B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10Simultaneous control of position or course in three dimensions
    • G05D1/101Simultaneous control of position or course in three dimensions specially adapted for aircraft
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Navigation (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention provides a chargeable unmanned aerial vehicle path planning method and system, comprising the following steps: acquiring an unmanned aerial vehicle and a sensor set, and determining a system model, a channel model and an unmanned aerial vehicle energy consumption model; establishing a total energy consumption objective function for the unmanned aerial vehicle based on the system model, the channel model and the unmanned aerial vehicle energy consumption model; and, based on a sensor clustering mechanism and a cluster head selection mechanism, jointly optimizing the unmanned aerial vehicle flight path strategy and charging strategy in the total energy consumption objective function according to a DQN algorithm, so as to minimize the energy the unmanned aerial vehicle consumes in collecting data from all sensor cluster head nodes. For the scenario in which the unmanned aerial vehicle must return to recharge, the method adopts deep reinforcement learning: through continuous interaction between the unmanned aerial vehicle and the environment, the flight path and the charging strategy are jointly optimized.

Description

Chargeable unmanned aerial vehicle path planning method and system
Technical Field
The invention relates to the technical field of unmanned aerial vehicles, in particular to a chargeable unmanned aerial vehicle path planning method and system.
Background
Unmanned aerial vehicle technology has advanced significantly over the past few years and has found widespread use in many areas.
In the communication field, an unmanned aerial vehicle can serve as an aerial base station for wireless coverage, or as a mobile relay providing reliable communication links for remote users; other applications include unmanned-aerial-vehicle-assisted wireless networks and intelligent, secure unmanned aerial vehicle networks for future 6G systems. Unmanned aerial vehicles also play an important role in emergency scenarios. Under disaster conditions and in complex terrain, using an unmanned aerial vehicle for on-site data acquisition and information extraction provides first-hand data for decision makers and minimizes losses. With the continuous development of hardware technology, the cost of unmanned aerial vehicles keeps falling while the functions they carry keep growing. For these reasons, unmanned-aerial-vehicle-assisted sensor acquisition systems are attracting wide attention. On the one hand, the high mobility of the unmanned aerial vehicle extends the acquisition range; on the other hand, because the unmanned aerial vehicle flies at altitude, it has a high probability of establishing a line-of-sight communication link with the ground sensors. Direct data communication between the unmanned aerial vehicle and the sensors improves the efficiency and reliability of data acquisition.
To further improve the acquisition efficiency of the unmanned aerial vehicle, the ground sensors can be clustered: a cluster head collects the data of the other sensors in its cluster, and the unmanned aerial vehicle collects data only from the cluster heads. In particular, because the energy of the unmanned aerial vehicle is limited and the area to be covered is large, it may be impossible to acquire all the data in the area during a single flight. It is therefore critical to design a reasonable flight strategy under the limited energy of the unmanned aerial vehicle. At present, much research on unmanned-aerial-vehicle-assisted sensor acquisition systems does not take the chargeable nature of the unmanned aerial vehicle into account and optimizes the flight path under that assumption. In practical systems, however, the unmanned aerial vehicle can commonly return to a charging point, recharge, and then resume its task.
Disclosure of Invention
The invention provides a chargeable unmanned aerial vehicle path planning method and system, which are used for overcoming the defects in the prior art.
In a first aspect, the present invention provides a method for planning a path of a rechargeable unmanned aerial vehicle, including:
acquiring an unmanned aerial vehicle and a sensor set, and determining a system model, a channel model and an unmanned aerial vehicle energy consumption model;
establishing a total energy consumption objective function of the unmanned aerial vehicle based on the system model, the channel model and the unmanned aerial vehicle energy consumption model;
based on a sensor clustering mechanism and a cluster head selection mechanism, performing joint optimization of the unmanned aerial vehicle flight path strategy and charging strategy in the total energy consumption objective function according to a DQN algorithm, so as to minimize the energy the unmanned aerial vehicle consumes in collecting data from all sensor cluster head nodes.
In one embodiment, the acquiring the drone and the set of sensors, determining a system model, a channel model, and a drone energy consumption model includes:
dividing the sensor set into a plurality of clusters, acquiring all cluster head sets, node sets in each cluster and the total number of sensors in any cluster, determining that the distance between any two sensor nodes in any cluster is not greater than the maximum communication range of the cluster heads, and determining the data carrying capacity of any sensor in any cluster;
respectively acquiring a cluster head selection strategy and an unmanned aerial vehicle acquisition strategy based on binary variables, determining the total number of cluster head nodes acquired by the unmanned aerial vehicle, and acquiring a flight path set of the unmanned aerial vehicle and the path length of each section based on the total number of the cluster head nodes;
obtaining channel gain between the unmanned aerial vehicle and the sensor based on a free space path loss model, and obtaining a data transmission rate between the unmanned aerial vehicle and the cluster head according to the channel gain;
obtaining a given total thrust, an implied speed under the given total thrust, an average unmanned aerial vehicle motion speed and an unmanned aerial vehicle pitch angle, and obtaining unmanned aerial vehicle forward minimum power based on the given total thrust, the implied speed under the given total thrust, the average unmanned aerial vehicle motion speed and the unmanned aerial vehicle pitch angle, wherein the given total thrust is obtained based on an unmanned aerial vehicle total mass, a gravity constant and a total resistance, and the implied speed is obtained based on the given total thrust, the unmanned aerial vehicle pitch angle, an unmanned aerial vehicle rotor number and an unmanned aerial vehicle rotor radius;
obtaining unmanned aerial vehicle hovering minimum power based on the given total thrust, the number of unmanned aerial vehicle rotors and the unmanned aerial vehicle rotor radius;
determining energy efficiency, and dividing the minimum advancing power and the minimum hovering power of the unmanned aerial vehicle by the energy efficiency respectively to obtain the minimum advancing power and the minimum hovering power of the unmanned aerial vehicle in unit time respectively;
and acquiring the total propulsion time of the unmanned aerial vehicle, and respectively multiplying the total propulsion time of the unmanned aerial vehicle by the minimum forward power of the unmanned aerial vehicle in unit time and the minimum hovering power of the unmanned aerial vehicle in unit time to respectively obtain total propulsion energy consumption and total hovering energy consumption.
In one embodiment, the establishing of the total energy consumption objective function of the drone based on the system model, the channel model, and the drone energy consumption model includes:
establishing a minimum objective function based on the total propulsion energy consumption and the total hovering energy consumption, and simultaneously satisfying a first constraint condition and a second constraint condition;
the first constraint condition is that all the sensors in each cluster need to cover the coverage range of the cluster head node in the current cluster;
the second constraint condition is that the residual energy of the unmanned aerial vehicle at any moment is greater than 0, both while the unmanned aerial vehicle is collecting data from the cluster head sensors and while it is returning to a charging point to recharge.
In one embodiment, the performing, based on a sensor clustering mechanism and a cluster head selection mechanism, joint optimization of the flight path strategy and charging strategy of the unmanned aerial vehicle in the total energy consumption objective function according to a DQN algorithm, so as to minimize the energy the unmanned aerial vehicle consumes in collecting data from all sensor cluster head nodes, includes:
selecting to obtain the optimal clustering quantity by adopting a preset clustering algorithm;
performing iterative optimization on the clustering process and cluster head selection based on the optimal clustering quantity to obtain the minimum clustering quantity meeting the communication range constraint of each cluster head;
and determining the state, action and reward of the unmanned aerial vehicle based on the minimum clustering number, and solving to obtain the minimum energy the unmanned aerial vehicle consumes in collecting data from all sensor cluster head nodes.
In an embodiment, the selecting to obtain the optimal clustering number by using a preset clustering algorithm includes:
clustering the sensor set through a k-means algorithm;
and obtaining the distance from each in-cluster sensor to the cluster head based on the Euclidean distance, and judging whether the distance from each in-cluster sensor to the cluster head is smaller than the maximum communication range of the cluster head or not to obtain the optimal clustering quantity.
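The two steps above, clustering with k-means and then checking every in-cluster Euclidean distance against the cluster head's communication range, can be sketched as follows. This is a minimal illustration, not the patent's exact procedure: the function names are invented, the centroid stands in for the cluster head, and the candidate cluster count is simply grown until the range constraint holds.

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each sensor to the nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return centroids, labels

def min_feasible_clusters(points, d_thre):
    """Smallest K whose clusters keep every sensor within d_thre of its
    cluster centre (the centroid stands in for the cluster head here)."""
    for k in range(1, len(points) + 1):
        centroids, labels = kmeans(points, k)
        if all(math.dist(p, centroids[lab]) <= d_thre
               for p, lab in zip(points, labels)):
            return k, centroids, labels
```

With K equal to the number of sensors every cluster is a single point, so the search always terminates.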
In an embodiment, the iteratively optimizing the clustering process and the cluster head selection based on the optimal cluster number to obtain the minimum cluster number satisfying the communication range constraint of each cluster head includes:
determining the data carrying capacity of any sensor in any cluster and the average distance between any sensor in any cluster and other nodes in the same cluster;
multiplying the carried data quantity and the average distance to obtain the centrality of any sensor in any cluster;
and determining the cluster head selection strategy of each cluster by taking the node whose centrality reaches the maximum value as the cluster head node.
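A minimal sketch of this centrality rule as the text defines it: the centrality of a node is its carried data volume multiplied by its average distance to the other nodes in the cluster, and the node with the largest centrality becomes the cluster head. Function and variable names are illustrative.

```python
import math

def select_cluster_head(nodes, data_volume):
    """Return the index of the cluster head: the node maximizing
    (carried data volume) x (average distance to the other in-cluster nodes)."""
    if len(nodes) == 1:
        return 0  # a singleton cluster is its own head
    def centrality(i):
        others = [j for j in range(len(nodes)) if j != i]
        avg_dist = sum(math.dist(nodes[i], nodes[j]) for j in others) / len(others)
        return data_volume[i] * avg_dist
    return max(range(len(nodes)), key=centrality)
```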
In one embodiment, the determining of the state, action and reward of the unmanned aerial vehicle based on the minimum clustering number, and the solving for the minimum energy the unmanned aerial vehicle consumes in collecting data from all sensor cluster head nodes, include:
determining that the state of the unmanned aerial vehicle at any moment comprises unmanned aerial vehicle position information, a sensor acquired condition and unmanned aerial vehicle residual capacity, wherein the action of the unmanned aerial vehicle at any moment comprises a data acquisition action of flying to any cluster head at a constant speed and a flying-back starting point charging action, and the reward of the unmanned aerial vehicle at any moment comprises electric quantity reward, flight time reward and acquisition reward;
building an experience playback library based on the DQN algorithm, and putting a data set into the experience playback library for storage, wherein the data set comprises the state, the action, the reward and the state of the next moment;
initializing an estimated value network output and a target value network output, and initializing the experience playback library;
randomly extracting data from the experience playback library for learning, updating a target Q value, minimizing a loss function according to a gradient descent method, and updating corresponding neural network weight and a learning rate of Q value updating to reduce a difference between the estimated value network output and the target value network output until a preset convergence condition is met;
and outputting the flight path strategy and charging strategy of the unmanned aerial vehicle that minimize the energy consumed in collecting data from all sensor cluster head nodes.
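The loop described in this embodiment — epsilon-greedy actions, an experience replay library, and a periodically synchronized target value function — can be sketched with a tabular stand-in for the two neural networks. This is an assumption-laden simplification: the class and method names are invented, and a Q-table replaces the patent's estimate/target networks, so the update below is the tabular analogue of the gradient-descent step on the loss.

```python
import random
from collections import deque, defaultdict

class ReplayQAgent:
    """Tabular stand-in for the DQN training loop of the patent."""

    def __init__(self, actions, gamma=0.9, lr=0.1, eps=0.2, capacity=1000):
        self.actions = actions
        self.gamma, self.lr, self.eps = gamma, lr, eps
        self.q = defaultdict(float)           # estimate value "network"
        self.q_target = defaultdict(float)    # target value "network"
        self.replay = deque(maxlen=capacity)  # experience replay library

    def act(self, state):
        # Epsilon-greedy: explore with probability eps, otherwise exploit.
        if random.random() < self.eps:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def store(self, state, action, reward, next_state, done):
        self.replay.append((state, action, reward, next_state, done))

    def learn(self, batch_size=32):
        # Randomly extract data from the replay library and update Q values.
        batch = random.sample(list(self.replay),
                              min(batch_size, len(self.replay)))
        for s, a, r, s2, done in batch:
            target = r if done else r + self.gamma * max(
                self.q_target[(s2, b)] for b in self.actions)
            self.q[(s, a)] += self.lr * (target - self.q[(s, a)])

    def sync_target(self):
        # Periodically copy estimate values into the target, as DQN does.
        self.q_target = defaultdict(float, self.q)
```

In the patent's setting the state would bundle the unmanned aerial vehicle position, the collected-sensor indicators and the residual battery, and the action set would be "fly to cluster head k" plus "return to the charging point".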
In a second aspect, the present invention further provides a rechargeable unmanned aerial vehicle path planning system, including:
the acquisition module is used for acquiring the unmanned aerial vehicle and the sensor set and determining a system model, a channel model and an unmanned aerial vehicle energy consumption model;
the establishing module is used for establishing a total energy consumption objective function of the unmanned aerial vehicle based on the system model, the channel model and the unmanned aerial vehicle energy consumption model;
and the optimization module is used for performing joint optimization of the unmanned aerial vehicle flight path strategy and charging strategy in the total energy consumption objective function according to a DQN algorithm, based on a sensor clustering mechanism and a cluster head selection mechanism, so as to minimize the energy the unmanned aerial vehicle consumes in collecting data from all sensor cluster head nodes.
In a third aspect, the present invention further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the steps of the method for planning a route of a rechargeable drone according to any one of the above descriptions.
In a fourth aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method for path planning for a chargeable drone according to any one of the above.
According to the chargeable unmanned aerial vehicle path planning method and system provided by the invention, a deep reinforcement learning method is adopted for a scene that the unmanned aerial vehicle needs to return to charge, and the flight path and the charging strategy are jointly optimized through continuous interaction of the unmanned aerial vehicle and the environment.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a rechargeable unmanned aerial vehicle path planning method provided by the present invention;
FIG. 2 is a diagram of a single UAV chargeable scenario provided by the present invention;
FIG. 3 is a flow chart of the clustering number selection and cluster head selection mechanism algorithm provided by the present invention;
fig. 4 is a flow chart of an unmanned aerial vehicle path planning algorithm in a chargeable acquisition scenario provided by the present invention;
FIG. 5 is a logic diagram of a path planning algorithm for an unmanned aerial vehicle in a rechargeable acquisition scenario according to the present invention;
FIG. 6 is a schematic diagram of a neural network model provided by the present invention;
FIG. 7 is a diagram of a sensor pre-clustering distribution and clustering under a chargeable scenario provided by the present invention;
FIG. 8 is a diagram of the variation of reward and loss functions in a rechargeable scenario according to the present invention;
fig. 9 is a schematic diagram of an actual flight trajectory and charging of the unmanned aerial vehicle in the chargeable scene provided by the present invention;
FIG. 10 is a graph illustrating the impact of different learning rate settings on the loss function in a chargeable scenario provided by the present invention;
FIG. 11 is a graph of the variation of different schemes with the total number of sensors in a chargeable scenario provided by the present invention;
FIG. 12 is a comparison of the strategy schemes provided by the present invention;
fig. 13 is a schematic structural diagram of a rechargeable unmanned aerial vehicle path planning system provided by the present invention;
fig. 14 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Aiming at the problem that no method for systematically planning the path of an unmanned aerial vehicle charging scene exists in the prior art, the invention provides a chargeable unmanned aerial vehicle path planning method, and fig. 1 is a schematic flow chart of the chargeable unmanned aerial vehicle path planning method provided by the invention, and as shown in fig. 1, the method comprises the following steps:
s1, acquiring an unmanned aerial vehicle and a sensor set, and determining a system model, a channel model and an unmanned aerial vehicle energy consumption model;
s2, establishing a total energy consumption objective function of the unmanned aerial vehicle based on the system model, the channel model and the unmanned aerial vehicle energy consumption model;
and S3, based on a sensor clustering mechanism and a cluster head selection mechanism, performing joint optimization of the unmanned aerial vehicle flight path strategy and charging strategy in the total energy consumption objective function according to a DQN algorithm, so as to minimize the energy the unmanned aerial vehicle consumes in collecting data from all sensor cluster head nodes.
It should be noted that the invention mainly addresses the path planning problem of a single unmanned aerial vehicle that can return to recharge in a sensor data acquisition scenario. Because the number of sensors is large, the sensors are clustered: the sensors within a cluster transmit their data to the cluster head node, and the unmanned aerial vehicle only needs to collect the data of the cluster head nodes. Furthermore, in a practical scenario the unmanned aerial vehicle can return to a charging point, recharge fully, and then continue the acquisition task; traditional mathematical methods are difficult to apply in such a dynamic environment. Deep reinforcement learning therefore provides an effective method for optimizing the charging strategy in a dynamically changing environment.
Specifically, the data of a single unmanned aerial vehicle and of multiple sensors in the flight area are obtained, and a complete system model, the channel model between the unmanned aerial vehicle and the sensors, and the energy consumption model of the unmanned aerial vehicle are analyzed and established. On the basis of these models, the target problem is analyzed, the target optimization problem is formulated, and a concrete objective function is established. The single-unmanned-aerial-vehicle path planning problem in the charging scenario is then solved in three steps, namely a clustering process, a cluster head selection process and a deep reinforcement learning process; the objective function is optimized to obtain the optimal flight path strategy and charging strategy, yielding the minimum energy the unmanned aerial vehicle consumes in collecting data from all sensor cluster head nodes.
For the scenario in which the unmanned aerial vehicle must return to recharge, the method adopts deep reinforcement learning: through continuous interaction between the unmanned aerial vehicle and the environment, the flight path and the charging strategy are jointly optimized.
Based on the above embodiment, step S1 in the method includes:
dividing the sensor set into a plurality of clusters, acquiring all cluster head sets, node sets in each cluster and the total number of sensors in any cluster, determining that the distance between any two sensor nodes in any cluster is not greater than the maximum communication range of the cluster heads, and determining the data carrying capacity of any sensor in any cluster;
respectively acquiring a cluster head selection strategy and an unmanned aerial vehicle acquisition strategy based on binary variables, determining the total number of cluster head nodes acquired by the unmanned aerial vehicle, and acquiring a flight path set of the unmanned aerial vehicle and the path length of each section based on the total number of the cluster head nodes;
obtaining channel gain between the unmanned aerial vehicle and the sensor based on a free space path loss model, and obtaining a data transmission rate between the unmanned aerial vehicle and the cluster head according to the channel gain;
obtaining a given total thrust, an implied speed under the given total thrust, an average unmanned aerial vehicle motion speed and an unmanned aerial vehicle pitch angle, and obtaining unmanned aerial vehicle forward minimum power based on the given total thrust, the implied speed under the given total thrust, the average unmanned aerial vehicle motion speed and the unmanned aerial vehicle pitch angle, wherein the given total thrust is obtained based on an unmanned aerial vehicle total mass, a gravity constant and a total resistance, and the implied speed is obtained based on the given total thrust, the unmanned aerial vehicle pitch angle, an unmanned aerial vehicle rotor number and an unmanned aerial vehicle rotor radius;
obtaining unmanned aerial vehicle hovering minimum power based on the given total thrust, the number of unmanned aerial vehicle rotors and the unmanned aerial vehicle rotor radius;
determining energy efficiency, and dividing the minimum advancing power and the minimum hovering power of the unmanned aerial vehicle by the energy efficiency respectively to obtain the minimum advancing power and the minimum hovering power of the unmanned aerial vehicle in unit time respectively;
and acquiring the total propulsion time of the unmanned aerial vehicle, and respectively multiplying the total propulsion time of the unmanned aerial vehicle by the minimum forward power of the unmanned aerial vehicle in unit time and the minimum hovering power of the unmanned aerial vehicle in unit time to respectively obtain total propulsion energy consumption and total hovering energy consumption.
Specifically, the invention considers the problem of an unmanned aerial vehicle acquiring data in a charging scenario; the scene composition is the same as in the non-charging scenario, and as shown in fig. 2 it consists of one unmanned aerial vehicle and N sensors. The coordinate of the n-th sensor is s_n = {x_n, y_n}, and its transmission power is denoted p_n. The coordinate of the unmanned aerial vehicle is u = {x, y, h}; the flying height of the unmanned aerial vehicle is assumed fixed, so its two-dimensional route is optimized, and during acquisition the unmanned aerial vehicle flies at the constant speed u.
As shown in FIG. 2, all sensors are assumed to be divided into K clusters, denoted by the set C = {C_1, C_2, ..., C_K}, and the set of all cluster heads is denoted H = {H_1, H_2, ..., H_K}. The starting point of the unmanned aerial vehicle is denoted H_0. Excluding the starting point, the set of nodes in the k-th cluster is denoted S_k, and the total number of sensors in the k-th cluster is N_k. The unmanned aerial vehicle collects data only from the cluster heads, and no cluster head is collected twice.
The distance between any two sensor nodes i and j in the k-th cluster is denoted d_{k,i,j}. Since all sensors in a cluster must transmit their information to the cluster head, as shown in fig. 2, it must be ensured that the distance between each sensor and the cluster head is less than the maximum communication range d_thre of the cluster head. The sensors carry data of different sizes; D_{k,i} denotes the amount of data carried by the i-th sensor in the k-th cluster.
A binary variable λ_{k,i} indicates whether the i-th node in the k-th cluster is selected as cluster head: λ_{k,i} = 1 if it is, and 0 otherwise. The cluster head selection strategy can then be expressed as Λ = {λ_{k,i} | k = 1, ..., K; i ∈ S_k}.
A second binary variable α_k indicates whether the unmanned aerial vehicle has acquired the data of cluster head H_k: when α_k = 1 the unmanned aerial vehicle performs data acquisition at that cluster head, and otherwise α_k = 0. The acquisition strategy can be defined as A = {α_1, ..., α_K}.
Assuming that M cluster head nodes are collected by the unmanned aerial vehicle in one flight, a possible flight sequence is m = [H_(1), H_(2), ..., H_(M)]. Since the unmanned aerial vehicle departs from the starting point, completes the acquisition and returns to the starting point, its full flight path divides into M + 1 segments, represented by the set L = {l_1, ..., l_{M+1}}, where l_i is the length of the i-th segment.
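The segmentation of the flight path described above, a closed tour from the starting point through the M visited cluster heads and back, can be sketched as follows (names are illustrative):

```python
import math

def path_segments(start, flight_sequence):
    """Segment lengths l_1 .. l_{M+1} of the closed tour
    start -> H(1) -> ... -> H(M) -> start."""
    stops = [start] + flight_sequence + [start]
    return [math.dist(stops[i], stops[i + 1]) for i in range(len(stops) - 1)]
```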
Further, the channel model between the unmanned aerial vehicle and the sensors is analyzed. An uplink scenario is considered, in which the unmanned aerial vehicle flies directly above a sensor while acquiring its data. Assuming the channel from the unmanned aerial vehicle to the ground is a line-of-sight link, the channel gain between the unmanned aerial vehicle and a sensor follows the Free Space Path Loss (FSPL) model:
L_fspl = 20 log(4π f_c h / c)  (1)
where f_c and c denote the carrier frequency and the speed of light, respectively. The data transmission rate R_k between the unmanned aerial vehicle and cluster head H_k can then be expressed as
R_k = B log_2(1 + p_k / (L_fspl σ²))  (2)
where B denotes the bandwidth, p_k the transmit power of cluster head H_k, σ² the noise power at the unmanned aerial vehicle receiver, and L_fspl is taken in linear (non-dB) form.
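Equations (1) and (2) can be evaluated as below. This is one common reading, stated as an assumption: eq. (1) yields the path loss in dB, which is converted to a linear attenuation before being applied inside the Shannon-capacity expression of eq. (2); function names are invented.

```python
import math

def fspl_db(f_c, h):
    """Free-space path loss in dB at carrier f_c (Hz) over distance h (m), eq. (1)."""
    c = 3e8  # speed of light, m/s
    return 20 * math.log10(4 * math.pi * f_c * h / c)

def rate_bps(bandwidth, p_tx, fspl_in_db, noise_power):
    """Shannon rate of eq. (2), applying the FSPL as a linear power attenuation."""
    gain = 10 ** (-fspl_in_db / 10)  # dB loss -> linear channel gain
    return bandwidth * math.log2(1 + p_tx * gain / noise_power)
```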
Next, the invention considers a quad-rotor drone energy consumption model, in which the drone's energy consumption consists mainly of two parts: the energy spent transmitting data, and the energy spent maintaining the drone's own motion state, such as propulsion and hovering. The transmission energy loss of the drone is far smaller than the energy loss in flight, so the transmission loss can be neglected. In addition, to simplify the model, the drone is assumed to always fly at a constant speed, and energy losses caused by sudden stops and turns are not considered. During constant-speed propulsion, the drone's energy is mainly used to overcome gravity, environmental factors, and the drag generated by moving forward through the air. Denoting the minimum power required for level flight as $P_{\min}$, this value can be derived from the following equations:
$P_{\min} = F\,(u\sin\theta + v_i) \quad (3)$

$F = Mg + F_{\mathrm{drag}} \quad (4)$

$\theta = \arctan\!\left(\dfrac{F_{\mathrm{drag}}}{Mg}\right) \quad (5)$

$v_i = \dfrac{F}{2w\rho\pi r^2\sqrt{(u\cos\theta)^2 + (u\sin\theta + v_i)^2}} \quad (6)$
where $F$ is the total thrust, $v_i$ is the induced speed at the given thrust $F$, $u$ is the average speed of motion of the drone relative to the ground, and $\theta$ is the drone pitch angle. In equation (4), $M$ represents the total mass of the drone, including the fuselage, battery, and other accessories, $g$ is the gravitational constant, and $F_{\mathrm{drag}}$ is the total drag determined by the wind speed, the air density $\rho$, and the drag coefficient. Since the air density differs at different altitudes, the drone experiences different drag when flying at different heights; to simplify the analysis, the drone is assumed here to always fly at a fixed altitude, and the effect of altitude on the energy consumption model is ignored. The pitch angle of the drone in flight is obtained from equation (5). The induced speed $v_i$ is computed from the implicit equation (6), where $w$ and $r$ represent the number of rotors and the rotor radius of the drone, respectively. When the drone has sufficient energy, its energy consumption per unit time in flight is expressed as:
$e_{\mathrm{fly}} = \dfrac{P_{\min}}{\eta} \quad (7)$

where $\eta$ is the energy efficiency. In addition, the minimum energy consumption per unit time when the drone hovers, denoted $e_{\mathrm{hov}}$, is expressed as

$e_{\mathrm{hov}} = \dfrac{Mg\,v_h}{\eta} \quad (8)$

where

$v_h = \sqrt{\dfrac{Mg}{2w\rho\pi r^2}}$
Thus, the total propulsion energy consumption may be expressed as

$E_{\mathrm{prop}} = e_{\mathrm{fly}}\,T_{\mathrm{fly}} \quad (9)$

where $e_{\mathrm{fly}}$ is the per-unit-time flight energy of equation (7) and $T_{\mathrm{fly}}$ denotes the total propulsion time of the drone, i.e., the ratio of the total distance flown to the flight speed:

$T_{\mathrm{fly}} = \dfrac{\sum_i l_i}{u} \quad (10)$
The total hovering energy consumption may be expressed as:

$E_{\mathrm{hov}} = e_{\mathrm{hov}}\,T_{\mathrm{hov}} \quad (11)$

where $T_{\mathrm{hov}}$ represents the total time the drone hovers. Since the remaining nodes in each cluster transmit their data to the cluster head, $T_{\mathrm{hov}}$ is the ratio of the total amount of information to be collected from all clusters to the transmission rate:

$T_{\mathrm{hov}} = \sum_{k=1}^{K}\dfrac{\sum_i D_{k,i}}{R_k} \quad (12)$
Therefore, the residual energy of the drone at time $t$ is the initial energy $e_{\mathrm{init}}$ minus the total energy consumption (propulsion plus hovering), expressed as

$e_t = e_{\mathrm{init}} - E_{\mathrm{prop}}(t) - E_{\mathrm{hov}}(t) \quad (13)$
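The energy model above can be sketched numerically as follows (a hedged illustration, not the patent's implementation: the rotor radius r = 0.2 m is an assumed value not given in the patent, the thrust magnitude treats equation (4) as a vector sum of weight and drag, and the implicit induced-speed equation (6) is solved by fixed-point iteration):

```python
import math

# Parameters from the patent's simulation section, except r (assumed).
M, g = 1.57, 9.8          # mass (kg), gravitational constant (m/s^2)
F_drag = 9.6998           # total drag (N)
rho = 1.225               # air density (kg/m^3)
w, r = 4, 0.2             # rotor count and ASSUMED rotor radius (m)
u = 20.0                  # cruise speed (m/s)
eta = 0.7                 # energy efficiency

F = math.hypot(M * g, F_drag)        # thrust magnitude (vector sum reading of eq. (4))
theta = math.atan2(F_drag, M * g)    # pitch angle, equation (5)

def induced_speed(F, u, theta, tol=1e-9):
    """Solve the implicit equation (6) for the induced speed v_i
    by fixed-point iteration, starting from the hover value."""
    v = math.sqrt(F / (2 * w * rho * math.pi * r * r))
    for _ in range(200):
        v_new = F / (2 * w * rho * math.pi * r * r *
                     math.hypot(u * math.cos(theta), u * math.sin(theta) + v))
        if abs(v_new - v) < tol:
            break
        v = v_new
    return v

v_i = induced_speed(F, u, theta)
p_fly = F * (u * math.sin(theta) + v_i) / eta    # cruise power per eqs. (3), (7)
v_h = math.sqrt(M * g / (2 * w * rho * math.pi * r * r))
p_hov = M * g * v_h / eta                        # hover power per eq. (8)

def residual_energy(e_init, t_fly, t_hov):
    """Residual energy, equation (13): initial energy minus
    propulsion and hovering consumption."""
    return e_init - p_fly * t_fly - p_hov * t_hov
```

The fixed-point iteration converges quickly here because at cruise speed the horizontal airflow dominates the induced flow, making the mapping a contraction.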
In the rechargeable drone scenario, the drone can return to the departure point and recharge at a charging pile when its energy is insufficient, and then continue collecting. In this scenario, therefore, the drone visits every cluster head node without visiting any cluster head repeatedly, but its flight path and the timing and number of its recharges need to be optimized.
A binary variable $c_k$ is defined to indicate whether the drone returns to the starting point to recharge after collecting cluster head node $k$: $c_k = 1$ indicates that the drone returns to the starting point to recharge, and $c_k = 0$ indicates that it does not. For convenience of solution, it is assumed that the drone's energy is fully restored after charging, and the specific charging process of the drone is not considered. The set of all charging decisions is defined as

$c = \{c_1, c_2, \dots, c_K\} \quad (14)$

Therefore, the quantities to be optimized in this problem include both the flight path strategy $m$ and the charging strategy $c$ of the drone.
By establishing the system model, the channel model, and the drone energy consumption model, the invention comprehensively covers the data acquisition scope and the related interaction processes in the drone charging scenario.
Based on any of the above embodiments, the step S2 in the method includes:
establishing a minimum objective function based on the total propulsion energy consumption and the total hovering energy consumption, and simultaneously satisfying a first constraint condition and a second constraint condition;
the first constraint condition is that all the sensors in each cluster must lie within the coverage range of that cluster's head node;

the second constraint condition is that the residual energy of the drone at any moment is greater than 0, so that the drone always retains enough energy to return to the charging point for recharging while collecting the cluster head sensors.
Specifically, after the model is established, the flight path strategy and the charging strategy of a single drone in the charging scenario must be jointly optimized. Since charging allows the drone to collect all clustered cluster head nodes in the area, and unlike the uncharged scenario of the previous chapter, which takes a weighted sum as its target, this scenario takes minimizing the energy consumed by the drone in collecting all nodes and returning to the starting point as its objective. The optimization is solved with the following objective function:
$\min_{m,c}\; E_{\mathrm{prop}} + E_{\mathrm{hov}} \quad (15)$

$\text{s.t.}\quad d_{k,i} \le d_{\max},\ \forall k,\ \forall i \quad (15a)$

$e_t > 0,\ \forall t \in [0, T] \quad (15b)$
the first term of the objective function is the propulsion energy consumption of the unmanned aerial vehicle, and the second term is the hovering energy consumption of the unmanned aerial vehicle. The first constraint condition (15a) indicates that all sensors in each cluster need to be in the coverage range of the cluster head node of the cluster to ensure that other sensors can transmit data to the cluster head, the second constraint condition (15b) is an energy constraint and indicates that the residual energy of the unmanned aerial vehicle at any moment must be greater than 0, and T is the total time for the unmanned aerial vehicle to execute tasks.
In conclusion, the invention jointly optimizes the flight strategy and the charging strategy of the unmanned aerial vehicle so as to minimize the energy consumption when the unmanned aerial vehicle finishes collecting all the sensor data.
Based on any of the above embodiments, the step S3 in the method includes:
selecting to obtain the optimal clustering quantity by adopting a preset clustering algorithm;
performing iterative optimization on the clustering process and cluster head selection based on the optimal clustering quantity to obtain the minimum clustering quantity meeting the communication range constraint of each cluster head;
and determining the state, action, and reward of the unmanned aerial vehicle based on the minimum clustering number, and solving to minimize the energy consumed by the unmanned aerial vehicle in collecting all sensor cluster head nodes.
Wherein, the selecting to obtain the optimal clustering quantity by adopting a preset clustering algorithm comprises the following steps:
clustering the sensor set through a k-means algorithm;
and obtaining the distance from each in-cluster sensor to the cluster head based on the Euclidean distance, and judging whether the distance from each in-cluster sensor to the cluster head is smaller than the maximum communication range of the cluster head or not to obtain the optimal clustering quantity.
The iterative optimization of the clustering process and cluster head selection based on the optimal clustering number to obtain the minimum clustering number meeting the communication range constraint of each cluster head comprises the following steps:
determining the data carrying capacity of any sensor in any cluster and the average distance between any sensor in any cluster and other nodes in the same cluster;
dividing the carried data quantity by the average distance to obtain the centrality of any sensor in any cluster;
and determining the cluster head selection strategy of each cluster to obtain the node corresponding to the centrality reaching the maximum value as a cluster head node.
Wherein, based on minimum clustering quantity, confirm unmanned aerial vehicle's state, action and reward, solve and obtain minimum unmanned aerial vehicle gathers the energy consumption who accomplishes all sensor cluster head nodes, include:
determining that the state of the unmanned aerial vehicle at any moment comprises unmanned aerial vehicle position information, a sensor acquired condition and unmanned aerial vehicle residual capacity, wherein the action of the unmanned aerial vehicle at any moment comprises a data acquisition action of flying to any cluster head at a constant speed and a flying-back starting point charging action, and the reward of the unmanned aerial vehicle at any moment comprises electric quantity reward, flight time reward and acquisition reward;
building an experience playback library based on the DQN algorithm, and putting a data set into the experience playback library for storage, wherein the data set comprises the state, the action, the reward and the state of the next moment;
initializing an estimated value network output and a target value network output, and initializing the experience playback library;
randomly extracting data from the experience playback library for learning, updating a target Q value, minimizing a loss function according to a gradient descent method, and updating corresponding neural network weight and a learning rate of Q value updating to reduce a difference between the estimated value network output and the target value network output until a preset convergence condition is met;
and outputting the flight path strategy and the charging strategy of the unmanned aerial vehicle that minimize the energy consumed by the unmanned aerial vehicle in collecting all sensor cluster head nodes.
Specifically, the single-drone path planning problem in the charging scenario is solved in three steps: a clustering process, a cluster head selection process, and a deep reinforcement learning process. The specific procedure is as follows:
1. clustering step
In this step, all the ground sensors are clustered by the k-means algorithm. In this scenario, the Euclidean distance is used as the clustering criterion to cluster all the sensors; cluster heads are then selected from the clustered sensors, and the clustering process and the cluster head selection process are iteratively optimized by checking whether the distance from each in-cluster sensor to its cluster head lies within the cluster head's communication range, thereby selecting the optimal cluster number K.
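The clustering step can be sketched as follows (an illustrative NumPy implementation: here the cluster centroid stands in for the eventual cluster head when checking the communication-range constraint, whereas the patent iterates this check against the actual cluster head selected in the next step):

```python
import numpy as np

def kmeans(points, k, iters=100, seed=0):
    """Plain k-means on 2-D sensor positions using Euclidean distance."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # assign each sensor to its nearest center
        labels = np.argmin(
            np.linalg.norm(points[:, None] - centers[None], axis=2), axis=1)
        new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

def minimal_cluster_count(points, d_max, k_max=20):
    """Smallest K for which every sensor lies within the communication
    range d_max of its cluster centroid; returns (K, labels, centers)."""
    for k in range(1, k_max + 1):
        labels, centers = kmeans(points, k)
        dists = np.linalg.norm(points - centers[labels], axis=1)
        if dists.max() <= d_max:
            return k, labels, centers
    return None
```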
2. Cluster head selection process
After the sensors are clustered, the cluster head of each cluster must be selected. The cluster head collects the data of all the other sensor nodes in its cluster and transmits all of it to the drone. Because the cluster head plays a non-negligible role in managing communication between the other in-cluster sensors and the drone, it must be selected carefully to achieve the best overall performance.
Therefore, the centrality of all nodes in the same cluster is fully considered. The centrality of a node represents its importance within the cluster. Borrowing the node-centrality concept from social networks, a multi-dimensional sensor centrality index is defined that accounts for both a node's data volume and its ability to connect with the other nodes in the cluster. On the one hand, the larger a node's data volume, the more important the node is within the cluster; on the other hand, the smaller a node's average distance to the other nodes in the same cluster, the stronger its ability to reach them. Such nodes are therefore well suited to act as the cluster head of each cluster. Specifically, the centrality $\psi_{k,i}$ of the $i$-th node in the $k$-th cluster is defined as:
$\psi_{k,i} = \dfrac{D_{k,i}}{\bar d_{k,i}} \quad (16)$

where $D_{k,i}$ represents the amount of data carried by the $i$-th node in the $k$-th cluster, and $\bar d_{k,i}$ represents the average distance between the $i$-th node in the $k$-th cluster and the other nodes in the same cluster:

$\bar d_{k,i} = \dfrac{1}{|\mathcal{S}_k| - 1}\sum_{j \ne i} d_{i,j} \quad (17)$

where $\mathcal{S}_k$ denotes the set of sensors in the $k$-th cluster and $d_{i,j}$ the Euclidean distance between nodes $i$ and $j$.
It can be seen that the centrality is proportional to the node's data volume and inversely proportional to its average distance to the other nodes in the same cluster. Cluster head selection is therefore based on the maximum centrality index, and the cluster head selection policy of each cluster is:

$H_k = \arg\max_i \psi_{k,i} \quad (18)$
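The centrality-based selection of equations (16)–(18) can be sketched for a single cluster as follows (an illustrative implementation; dividing the data amount by the average distance follows the stated proportionalities):

```python
import numpy as np

def select_cluster_head(positions, data_amounts):
    """Pick one cluster's head by the centrality index of eqs. (16)-(18):
    psi = carried data amount / average distance to the other in-cluster
    nodes; the node maximizing psi becomes the cluster head."""
    positions = np.asarray(positions, dtype=float)
    n = len(positions)
    diff = positions[:, None, :] - positions[None, :, :]
    d = np.linalg.norm(diff, axis=2)            # pairwise distances
    avg_d = d.sum(axis=1) / (n - 1)             # equation (17)
    psi = np.asarray(data_amounts) / avg_d      # equation (16)
    return int(np.argmax(psi))                  # equation (18)
```

For example, a node that both carries the most data and sits nearest the cluster's geometric middle will dominate the index.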
after determining the clustering algorithm and the selection mechanism of the cluster head, the clustering process and the cluster head selection process are iteratively optimized, so as to select the minimum clustering number which can meet the communication range constraint of each cluster head, and the specific process is shown in fig. 3.
3. Path planning algorithm based on deep reinforcement learning
Based on the selected cluster head nodes, the flight strategy and the charging strategy of the unmanned aerial vehicle need to be designed reasonably so as to minimize energy consumption when the unmanned aerial vehicle collects all the cluster head node data. In the scene, a deep reinforcement learning intelligent algorithm is adopted, the charging time and the charging times of the unmanned aerial vehicle are adaptively adjusted through the interaction between the unmanned aerial vehicle and the surrounding environment, and the specific flow design of the algorithm is as follows:
regard unmanned aerial vehicle as an agent, and define its state, action, reward:
1) State $s_t$ of the drone at time $t$: this state mainly includes three aspects, namely the drone's position information $u_t = \{x_t, y_t, H\}$, the collected status of the sensors $g_t$, and the drone's remaining energy $e_t$, i.e., $s_t = [u_t, g_t, e_t]$, where $g_t = \{g_t^1, \dots, g_t^K\}$ is the set of collected statuses of all sensor cluster heads.
2) Action $a_t$ at time $t$: the drone has two types of actions, namely the data acquisition action $a_t^k$ of flying at constant speed to the $k$-th cluster head, and the charging action $a_t^c$ of flying back to the starting point, where the charging action leaves the drone fully charged;
3) Reward $r_t$ at time $t$: the drone's reward consists of three parts, $r_t = r_t^{\mathrm{ele}} + r_t^{\mathrm{fly}} + r_t^{\mathrm{col}}$, namely the electricity reward $r_t^{\mathrm{ele}}$, the flight-time reward $r_t^{\mathrm{fly}}$, and the collection reward $r_t^{\mathrm{col}}$. The electricity reward $r_t^{\mathrm{ele}}$ ensures that the drone's energy stays above 0 at every moment; it is set as a negative penalty whenever the residual energy drops to 0 or below, and 0 otherwise (equation (19)). The flight-time reward $r_t^{\mathrm{fly}}$ keeps the drone's flight path as short as possible so as to minimize its energy consumption; it is set as $r_t^{\mathrm{fly}} = \alpha\,t^{\mathrm{fly}}$ (equation (20)), where $\alpha$ is a negative coefficient. The collection reward $r_t^{\mathrm{col}}$ prevents the drone from repeatedly collecting sensor cluster head nodes that have already been collected; it is set as a negative penalty when the chosen cluster head has already been collected (equation (21)).
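The three-part reward can be sketched as follows (a hedged illustration: the constants `alpha`, `energy_penalty`, and `repeat_penalty` are placeholders, not the patent's values, which appear only in the figure-rendered equations (19)–(21)):

```python
def reward(e_t, flight_time, target_collected, alpha=-0.01,
           energy_penalty=-100.0, repeat_penalty=-50.0):
    """Sketch of r_t = r_ele + r_fly + r_col for one step.
    All penalty magnitudes here are illustrative placeholders."""
    r_ele = energy_penalty if e_t <= 0 else 0.0          # keep residual energy > 0
    r_fly = alpha * flight_time                          # alpha is a negative coefficient
    r_col = repeat_penalty if target_collected else 0.0  # discourage repeat collection
    return r_ele + r_fly + r_col
```

The shaping follows directly from the descriptions above: only the energy and repeat terms are event-triggered penalties, while the flight-time term accrues on every step.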
The Q-learning algorithm needs to build a Q table covering all possible state-action pairs; however, when there are too many states or actions to process, building the Q table is very time-consuming. Therefore, in the deep reinforcement learning algorithm, the conventional Q table is fitted by a neural network. Specifically, the invention adopts the DQN (Deep Q-Network) algorithm. First an experience replay buffer is constructed, and the data tuples $(s_t, a_t, r_t, s_{t+1})$ are stored in it. In the subsequent learning process, data are randomly sampled from the replay buffer for learning, and the target Q value is updated. The Q value is updated according to

$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \lambda\left[r_t + \gamma\max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t)\right] \quad (22)$

where $\lambda$ is the learning rate and $\gamma \in [0, 1]$ is the discount factor. DQN is an off-policy, offline learning method. Through training, DQN continuously reduces the difference between the Q estimate output by the evaluation network and the target Q value output by the target network, i.e., it minimizes the loss function
$\mathrm{Loss}(\theta) = \mathbb{E}\left[\left(Q_{\mathrm{target}} - Q(s_t, a_t; \theta)\right)^2\right] \quad (23)$

$Q_{\mathrm{target}} = r + \gamma\max_{a'} Q(s', a'; \theta') \quad (24)$
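For a single stored transition, the target of equation (24) and the squared TD error of equation (23) can be sketched as follows (illustrative; in the patent the expectation is taken over a sampled mini-batch and minimized by gradient descent on the evaluation network's parameters θ):

```python
import numpy as np

def q_target(r, q_next, gamma=0.9, done=False):
    """Target Q value of equation (24): r + gamma * max_a' Q(s', a'; theta'),
    with the max taken over the target network's outputs for the next state."""
    return r if done else r + gamma * float(np.max(q_next))

def dqn_loss(q_pred, r, q_next, action, gamma=0.9):
    """Squared TD error of equation (23) for one transition: the gap
    between the target and the evaluation network's Q for the taken action."""
    target = q_target(r, q_next, gamma)
    return (target - q_pred[action]) ** 2
```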
In order to accelerate the DQN learning process and improve its learning effect, the algorithm also optimizes the DQN learning rate $\lambda$. Most current literature either adopts a permanently fixed learning rate or halves it in stages as the iteration count grows, and both update modes hamper the learning process. Since learning proceeds from poor to good, the learning effect improves as the episode count increases, and in that case the value of the learning rate, i.e., the search step size, should shrink gradually so that the optimal solution can be found; a permanently fixed learning rate is therefore unfavorable to learning. The stepwise-halving method does consider the episode's influence on the learning rate, but it applies the same halved rate to all episodes within a stage and so ignores the differences between them, and its learning effect is still not good enough. For these reasons, the invention adopts an exponential learning-rate update that fully accounts for the influence of the episode index on the search step size. The specific update is:

$\lambda_t = \lambda_{\max}\left(\dfrac{\lambda_{\min}}{\lambda_{\max}}\right)^{t/\Phi} \quad (25)$
where $\lambda_t$ represents the learning rate during the $t$-th episode, $\lambda_{\min}$ and $\lambda_{\max}$ are the minimum and maximum values of the learning rate, respectively, and $\Phi$ is the total number of episodes.
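One plausible reading of this exponential schedule (the exact form of equation (25) is rendered only as a figure in the source, so the geometric interpolation below is an assumption) decays the rate from λ_max at the first episode to λ_min at the last:

```python
def learning_rate(t, total_episodes, lr_min=1e-4, lr_max=1e-2):
    """Assumed exponential schedule: decay geometrically from lr_max
    at episode 0 toward lr_min at episode total_episodes."""
    frac = t / total_episodes
    return lr_max * (lr_min / lr_max) ** frac
```

The defaults mirror the simulation section's exponential-schedule bounds (maximum 0.01, minimum 0.0001); unlike stepwise halving, the step size shrinks a little on every episode.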
In summary, based on the sensor clustering mechanism and the cluster head selection mechanism, a method is provided that jointly optimizes the flight strategy and the charging strategy of the drone using the DQN algorithm, with the drone's states, actions, and reward settings analyzed in detail, so as to minimize the energy consumed by the drone in collecting all sensor cluster head nodes. The specific flow of the algorithm is shown in fig. 4, and a logic diagram is shown in fig. 5.
Based on any of the above embodiments, the technical effects of the present invention are described in a specific implementation scenario.
The scheme provided by the invention is simulated in Python, with the deep learning framework built on the PyTorch machine learning library, and the performance of the method is verified by simulation results. In the simulation setting, 100 sensors are randomly distributed in a 1000 m × 1000 m area; the sensor transmit power is 0.1 W, the carrier frequency is 2 GHz, the bandwidth is 1 MHz, and the noise power $\sigma^2$ is −110 dBm. Each sensor carries 1–10 Mb of data, the drone's initial energy is 150000 J, its flight altitude is 100 m, its flight speed is 20 m/s, its mass $M$ is 1.57 kg, the number of rotors is 4, the total drag $F_{\mathrm{drag}}$ is 9.6998 N, the air density $\rho$ is 1.225 kg/m³, and the energy efficiency $\eta$ is 0.7.
Specifically, as shown in fig. 6, in the neural network model used by the invention, the evaluation network and the target network share the same structure. The evaluation network updates its parameters in real time, the drone selects actions according to the evaluation network, and the target network's parameters are periodically updated from the evaluation network. Two structurally identical networks are used in order to reduce the correlation between the current Q value and the target Q value and to improve the stability of the algorithm. As shown in fig. 6, a vector consisting of the drone's position, residual energy, and cluster head collection status is input; after two hidden layers with the corresponding activation functions and normalization operations, the Q value of each action is output. The drone selects actions in an ε-greedy manner, i.e., a random action with probability ε and the action with the largest Q value with probability 1 − ε.
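The ε-greedy selection described above can be sketched as follows (an illustrative helper, not the patent's code):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """Select an action over the Q values of one state: a random action
    with probability epsilon, otherwise the action with the largest Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```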
Fig. 7 shows the sensor distribution and the clustering result: fig. 7(a) shows the random distribution of the sensors before clustering, and fig. 7(b) shows the sensor clusters after clustering, distinguished by color, together with the positions of the cluster centroids.
Fig. 8 shows the convergence of the DQN algorithm described in this section, where fig. 8(a) shows the change in reward and fig. 8(b) shows the change in the loss function. It can be seen from fig. 8(a) that as the episode count increases, the reward rises overall and finally converges at about 1000 episodes. This shows that during the first 1000 iterations the agent is in the learning phase; after 1000 episodes, the agent gradually finds the optimal flight strategy and charging strategy, and the reward gradually stabilizes. A small number of "glitches" still appear in the reward curve during this process because, under the ε-greedy strategy, although the action with the largest Q value is selected with high probability, a random action is selected with a small probability, and executing such random actions produces a small number of "glitches" in the reward. Fig. 8(b) shows the loss function as a function of the episode count: after about 1000 episodes the loss function gradually approaches 0, indicating that the neural network has learned the flight strategy and the charging strategy well.
Fig. 9 shows the final flight route map of the drone. As can be seen from the figure, under the clustering condition, the unmanned aerial vehicle firstly performs data acquisition on the cluster head nodes on the left side of the area, then returns to the starting point for charging in the midway due to insufficient residual energy, and performs data acquisition on the cluster head nodes on the right side of the area again, and the DQN algorithm realizes the joint optimization of the flight strategy and the charging strategy of the unmanned aerial vehicle.
Fig. 10 shows the impact of different learning-rate settings on the loss function. The fixed learning rate is set to 0.05; the stepwise-updated learning rate starts at 0.1 and halves every 2000 episodes; the exponential learning rate has a maximum of 0.01 and a minimum of 0.0001. As the figure shows, the exponential learning-rate update adopted in this scenario converges more quickly to a loss value closer to 0 than the fixed and stepwise updates. This is because DQN's learning depends on the episode count: as episodes accumulate, the agent gets closer to the optimal solution, so the step size used in the search for the optimal solution should shrink. Compared with the other two update methods, the exponential update strategy accelerates the learning process; moreover, compared with the stepwise strategy, it decreases faster when the episode index is small and is smoother when the episode index is large.
Fig. 11 is a comparison graph comparing the total energy consumption of the drone caused by different schemes as the number of sensors changes. It can be seen that as the number of sensors increases, the energy consumed by the intelligent agent shows an upward trend, mainly because as the number of nodes increases, the data volume increases, the number of clusters thereof may also increase, and the flight distance of the unmanned aerial vehicle also increases. The figure also compares the results of the DQN scheme with the results of the other three schemes, namely Sarsa algorithm flight strategy, genetic algorithm flight strategy and random flight strategy, and compared with the three schemes, the proposed DQN algorithm can achieve lower energy consumption.
Further, the greedy parameters of different evaluation strategies in the Sarsa algorithm are compared here. As shown in fig. 12, the Sarsa algorithm's action strategy is kept consistent, i.e., a random action is selected with probability 10% and the action with the largest Q value with probability 90%. Comparing the effect of the DQN algorithm and of different evaluation-strategy parameters in the Sarsa algorithm on the drone's total energy consumption shows that the DQN algorithm outperforms the Sarsa algorithm in all cases. In addition, the red line represents selecting a random action 0% of the time and the largest-Q action 100% of the time; the green line represents 5% random and 95% largest-Q; the orange line represents 10% random and 90% largest-Q. It can be seen that when the number of sensors is small, so that the state space and action space are also small, the red curve is slightly better, while as the number of sensors increases, the orange curve performs better. The reason is that Sarsa is a "conservative" strategy: when the state space is small, varying the greedy parameter of the evaluation strategy makes it more likely to find a better solution, whereas when the state space is larger, choosing the same probabilities as the action policy can be closer to optimal.
The rechargeable unmanned aerial vehicle path planning system provided by the invention is described below, and the rechargeable unmanned aerial vehicle path planning system described below and the rechargeable unmanned aerial vehicle path planning method described above can be referred to correspondingly.
Fig. 13 is a schematic structural diagram of the rechargeable unmanned aerial vehicle path planning system provided by the present invention, as shown in fig. 13, including: an obtaining module 1301, an establishing module 1302, and an optimizing module 1303, wherein:
the obtaining module 1301 is used for obtaining an unmanned aerial vehicle and a sensor set, and determining a system model, a channel model and an unmanned aerial vehicle energy consumption model; the establishing module 1302 is configured to establish a total energy consumption objective function of the drone based on the system model, the channel model, and the drone energy consumption model; and the optimization module 1303 is used for performing joint optimization on the unmanned aerial vehicle flight path strategy and the charging strategy in the total energy consumption objective function of the unmanned aerial vehicle according to the DQN algorithm based on a sensor clustering mechanism and a cluster head selection mechanism, so that the energy consumption of all sensor cluster head nodes is completed by the acquisition of the unmanned aerial vehicle.
In the method, for the scenario in which the drone must return to recharge, a deep reinforcement learning approach is adopted in which the drone continuously interacts with the environment, thereby realizing the joint optimization of the flight path and the charging strategy.
Fig. 14 illustrates a physical structure diagram of an electronic device, and as shown in fig. 14, the electronic device may include: a processor (processor)1410, a communication interface (communication interface)1420, a memory (memory)1430 and a communication bus 1440, wherein the processor 1410, the communication interface 1420 and the memory 1430 communicate with each other via the communication bus 1440. The processor 1410 may invoke logic instructions in the memory 1430 to perform a chargeable drone path planning method, the method comprising: acquiring an unmanned aerial vehicle and a sensor set, and determining a system model, a channel model and an unmanned aerial vehicle energy consumption model; establishing a total energy consumption objective function of the unmanned aerial vehicle based on the system model, the channel model and the unmanned aerial vehicle energy consumption model; based on a sensor clustering mechanism and a cluster head selection mechanism, performing combined optimization on an unmanned aerial vehicle flight path strategy and a charging strategy in the total energy consumption objective function of the unmanned aerial vehicle according to a DQN algorithm, and realizing the purpose of minimizing the energy consumption of all sensor cluster head nodes after the unmanned aerial vehicle collects.
In addition, the logic instructions in the memory 1430 may be implemented in software functional units and stored in a computer readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the method for path planning for a rechargeable drone provided by the above methods, the method comprising: acquiring an unmanned aerial vehicle and a sensor set, and determining a system model, a channel model and an unmanned aerial vehicle energy consumption model; establishing a total energy consumption objective function of the unmanned aerial vehicle based on the system model, the channel model and the unmanned aerial vehicle energy consumption model; based on a sensor clustering mechanism and a cluster head selection mechanism, performing combined optimization on an unmanned aerial vehicle flight path strategy and a charging strategy in the total energy consumption objective function of the unmanned aerial vehicle according to a DQN algorithm, and realizing the purpose of minimizing the energy consumption of all sensor cluster head nodes after the unmanned aerial vehicle collects.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the chargeable unmanned aerial vehicle path planning method provided in the above aspects, the method comprising: acquiring an unmanned aerial vehicle and a sensor set, and determining a system model, a channel model and an unmanned aerial vehicle energy consumption model; establishing a total energy consumption objective function of the unmanned aerial vehicle based on the system model, the channel model and the unmanned aerial vehicle energy consumption model; and, based on a sensor clustering mechanism and a cluster head selection mechanism, jointly optimizing the unmanned aerial vehicle flight path strategy and charging strategy in the total energy consumption objective function according to a DQN algorithm, so as to minimize the energy consumption of the unmanned aerial vehicle for collecting data from all sensor cluster head nodes.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on this understanding, the above technical solutions, or the part thereof that contributes over the prior art, may be embodied in the form of a software product stored in a computer-readable storage medium.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the present invention in its essence.

Claims (10)

1. A rechargeable unmanned aerial vehicle path planning method is characterized by comprising the following steps:
acquiring an unmanned aerial vehicle and a sensor set, and determining a system model, a channel model and an unmanned aerial vehicle energy consumption model;
establishing a total energy consumption objective function of the unmanned aerial vehicle based on the system model, the channel model and the unmanned aerial vehicle energy consumption model;
based on a sensor clustering mechanism and a cluster head selection mechanism, jointly optimizing an unmanned aerial vehicle flight path strategy and a charging strategy in the total energy consumption objective function of the unmanned aerial vehicle according to a DQN algorithm, so as to minimize the energy consumption of the unmanned aerial vehicle for collecting data from all sensor cluster head nodes.
2. The method of claim 1, wherein the obtaining a set of drones and sensors, determining a system model, a channel model, and a drone energy consumption model, comprises:
dividing the sensor set into a plurality of clusters, acquiring all cluster head sets, node sets in each cluster and the total number of sensors in any cluster, determining that the distance between any two sensor nodes in any cluster is not greater than the maximum communication range of the cluster heads, and determining the data carrying capacity of any sensor in any cluster;
respectively acquiring a cluster head selection strategy and an unmanned aerial vehicle acquisition strategy based on binary variables, determining the total number of cluster head nodes acquired by the unmanned aerial vehicle, and acquiring a flight path set of the unmanned aerial vehicle and the path length of each section based on the total number of the cluster head nodes;
obtaining channel gain between the unmanned aerial vehicle and the sensor based on a free space path loss model, and obtaining a data transmission rate between the unmanned aerial vehicle and the cluster head according to the channel gain;
obtaining a given total thrust, an implied speed under the given total thrust, an average unmanned aerial vehicle motion speed and an unmanned aerial vehicle pitch angle, and obtaining unmanned aerial vehicle forward minimum power based on the given total thrust, the implied speed under the given total thrust, the average unmanned aerial vehicle motion speed and the unmanned aerial vehicle pitch angle, wherein the given total thrust is obtained based on an unmanned aerial vehicle total mass, a gravity constant and a total resistance, and the implied speed is obtained based on the given total thrust, the unmanned aerial vehicle pitch angle, an unmanned aerial vehicle rotor number and an unmanned aerial vehicle rotor radius;
obtaining unmanned aerial vehicle hovering minimum power based on the given total thrust, the number of unmanned aerial vehicle rotors and the unmanned aerial vehicle rotor radius;
determining energy efficiency, and dividing the minimum advancing power and the minimum hovering power of the unmanned aerial vehicle by the energy efficiency respectively to obtain the minimum advancing power and the minimum hovering power of the unmanned aerial vehicle in unit time respectively;
and acquiring the total propulsion time of the unmanned aerial vehicle, and respectively multiplying the total propulsion time of the unmanned aerial vehicle by the minimum forward power of the unmanned aerial vehicle in unit time and the minimum hovering power of the unmanned aerial vehicle in unit time to respectively obtain total propulsion energy consumption and total hovering energy consumption.
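Claim 2 names the quantities of the channel and energy models (free-space path loss; given total thrust, the implied induced speed, average motion speed, pitch angle, rotor number and radius; energy efficiency; flight and hovering durations) without giving closed-form expressions. A standard rotary-wing momentum-theory model built from exactly those quantities can be sketched as follows; the air density constant, the fixed-point iteration for the induced speed, and the reference channel gain `g0` are assumptions of this sketch, not formulas taken from the patent.

```python
import math

RHO = 1.225  # air density in kg/m^3 (assumed value)

def induced_speed(T, v, theta, n, r, iters=100):
    """The 'implied' (induced) speed under given total thrust T, pitch
    angle theta, rotor number n and rotor radius r, solved by a
    fixed-point iteration of rotor momentum theory."""
    A = n * math.pi * r ** 2            # total rotor disc area
    vi = math.sqrt(T / (2 * RHO * A))   # hover induced speed as a start
    for _ in range(iters):
        vi = T / (2 * RHO * A * math.hypot(v * math.cos(theta),
                                           v * math.sin(theta) + vi))
    return vi

def forward_min_power(T, v, theta, n, r):
    """Minimum forward power from the given total thrust, average
    speed, pitch angle and induced speed (claim 2)."""
    return T * (v * math.sin(theta) + induced_speed(T, v, theta, n, r))

def hover_min_power(T, n, r):
    """Minimum hovering power from the given total thrust, rotor
    number and rotor radius (claim 2)."""
    return T ** 1.5 / math.sqrt(2 * RHO * n * math.pi * r ** 2)

def propulsion_energy(T, v, theta, n, r, eta, t_fly, t_hover):
    """Divide both minimum powers by the energy efficiency eta, then
    multiply by the flight and hovering durations (last steps of
    claim 2), giving total propulsion and hovering energy."""
    return (forward_min_power(T, v, theta, n, r) / eta * t_fly,
            hover_min_power(T, n, r) / eta * t_hover)

def data_rate(P_tx, d, B, g0, noise):
    """Free-space path loss channel: gain g0 / d^2 (g0 = assumed
    reference gain at 1 m), giving the UAV-to-cluster-head rate."""
    return B * math.log2(1 + P_tx * g0 / (d ** 2 * noise))
```

With a roughly 2 kg airframe (T ≈ 20 N), four rotors of radius 0.2 m, this gives a hover power near 80 W, a plausible order of magnitude for a small multirotor; at zero speed the forward-power expression reduces to the hover expression, a useful sanity check.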
3. The method of claim 2, wherein the establishing a total energy consumption objective function for the drone based on the system model, the channel model, and the drone energy consumption model comprises:
establishing a minimum objective function based on the total propulsion energy consumption and the total hovering energy consumption, and simultaneously satisfying a first constraint condition and a second constraint condition;
the first constraint condition is that all sensors in each cluster must lie within the coverage range of the cluster head node of that cluster;
the second constraint condition is that the residual energy of the unmanned aerial vehicle at any moment is greater than 0, both while the unmanned aerial vehicle is collecting data from the cluster head sensors and while it is returning to a charging point for charging.
4. The chargeable unmanned aerial vehicle path planning method according to claim 1, wherein the jointly optimizing, based on a sensor clustering mechanism and a cluster head selection mechanism, an unmanned aerial vehicle flight path strategy and a charging strategy in the total energy consumption objective function of the unmanned aerial vehicle according to a DQN algorithm, so as to minimize the energy consumption of the unmanned aerial vehicle for collecting data from all sensor cluster head nodes, comprises:
selecting to obtain the optimal clustering quantity by adopting a preset clustering algorithm;
performing iterative optimization on the clustering process and cluster head selection based on the optimal clustering quantity to obtain the minimum clustering quantity meeting the communication range constraint of each cluster head;
and determining the state, action and reward of the unmanned aerial vehicle based on the minimum clustering number, and solving to obtain the minimum energy consumption for the unmanned aerial vehicle to collect data from all sensor cluster head nodes.
5. The method according to claim 4, wherein the selecting an optimal clustering number by using a preset clustering algorithm comprises:
clustering the sensor set through a k-means algorithm;
and obtaining the distance from each in-cluster sensor to the cluster head based on the Euclidean distance, and judging whether the distance from each in-cluster sensor to the cluster head is smaller than the maximum communication range of the cluster head or not to obtain the optimal clustering quantity.
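The procedure of claim 5 (cluster with k-means, then accept the smallest cluster count whose in-cluster Euclidean distances all fall below the cluster head's maximum communication range) can be sketched as follows. Using the cluster center as a stand-in for the eventual cluster head position is an assumption of this sketch, since the head is only selected later (claim 6).

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means over 2-D sensor coordinates (claim 5)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:  # assign each sensor to its nearest center
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = []
        for i, cl in enumerate(clusters):
            if cl:  # recompute center as the cluster mean
                new_centers.append(tuple(sum(c) / len(cl) for c in zip(*cl)))
            else:   # keep an empty cluster's previous center
                new_centers.append(centers[i])
        centers = new_centers
    return clusters, centers

def smallest_feasible_k(points, r_max):
    """Smallest cluster count for which every in-cluster sensor lies
    within the maximum communication range r_max of its cluster head
    (center used as a proxy for the head position)."""
    for k in range(1, len(points) + 1):
        clusters, centers = kmeans(points, k)
        if all(math.dist(p, centers[i]) < r_max
               for i, cl in enumerate(clusters) for p in cl):
            return k, clusters
    return len(points), [[p] for p in points]
```

For two well-separated groups of sensors and a small communication range, a single cluster fails the range check and two clusters pass, so the search returns k = 2.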
6. The method of claim 4, wherein the iteratively optimizing clustering processes and cluster head selection based on the optimal number of clusters to obtain a minimum number of clusters that satisfies a communication range constraint of each cluster head comprises:
determining the data carrying capacity of any sensor in any cluster and the average distance between any sensor in any cluster and other nodes in the same cluster;
multiplying the carried data quantity and the average distance to obtain the centrality of any sensor in any cluster;
and determining the cluster head selection strategy of each cluster to obtain the node corresponding to the centrality reaching the maximum value as a cluster head node.
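A minimal sketch of the claim 6 cluster head selection: a node's centrality is its carried data amount multiplied by its average Euclidean distance to the other nodes of the same cluster, and the node with the maximum centrality becomes the cluster head. The `(position, data_amount)` tuple layout is an assumption of this sketch.

```python
import math

def select_cluster_head(cluster):
    """cluster: list of (position, data_amount) tuples.
    Returns the index of the node whose centrality (data amount times
    average distance to the other in-cluster nodes, claim 6) is
    maximal; that node is chosen as the cluster head."""
    def centrality(i):
        pos_i, data_i = cluster[i]
        others = [p for j, (p, _) in enumerate(cluster) if j != i]
        avg_d = sum(math.dist(pos_i, p) for p in others) / len(others)
        return data_i * avg_d
    return max(range(len(cluster)), key=centrality)
```

For example, in a three-node cluster where one node carries twice the data of the others, that node's centrality dominates and it is selected as head.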
7. The chargeable unmanned aerial vehicle path planning method according to claim 4, wherein the determining the state, action and reward of the unmanned aerial vehicle based on the minimum clustering number, and solving to obtain the minimum energy consumption for the unmanned aerial vehicle to collect data from all sensor cluster head nodes, comprises:
determining that the state of the unmanned aerial vehicle at any moment comprises unmanned aerial vehicle position information, a sensor acquired condition and unmanned aerial vehicle residual capacity, wherein the action of the unmanned aerial vehicle at any moment comprises a data acquisition action of flying to any cluster head at a constant speed and a flying-back starting point charging action, and the reward of the unmanned aerial vehicle at any moment comprises electric quantity reward, flight time reward and acquisition reward;
building an experience playback library based on the DQN algorithm, and putting a data set into the experience playback library for storage, wherein the data set comprises the state, the action, the reward and the state of the next moment;
initializing an estimated value network output and a target value network output, and initializing the experience playback library;
randomly extracting data from the experience playback library for learning, updating the target Q value, minimizing a loss function by gradient descent, and updating the corresponding neural network weights and the learning rate of the Q-value update so as to reduce the difference between the estimated value network output and the target value network output, until a preset convergence condition is met;
and outputting the flight path strategy and the charging strategy of the unmanned aerial vehicle that minimize the energy consumption for the unmanned aerial vehicle to collect data from all sensor cluster head nodes.
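The training loop of claim 7 (an experience playback library of (state, action, reward, next state) tuples, random minibatch sampling, a target Q value, and a periodically synchronised target network) can be sketched in a dependency-free form. A Q-table stands in here for the patent's neural-network approximator, so the gradient-descent step on the loss becomes the equivalent tabular step toward the target; the toy environment, hyper-parameters, and state encoding in the usage note are all assumptions of this sketch.

```python
import random
from collections import deque, defaultdict

class ReplayDQN:
    """DQN-style learner with an experience replay buffer and a
    separate target value function (claim 7), with a Q-table in
    place of the neural network."""
    def __init__(self, actions, lr=0.1, gamma=0.9, buf_size=1000,
                 batch=16, sync_every=50, seed=0):
        self.q = defaultdict(float)         # estimated value "network"
        self.q_target = defaultdict(float)  # target value "network"
        self.actions, self.lr, self.gamma = actions, lr, gamma
        self.buffer = deque(maxlen=buf_size)  # experience playback library
        self.batch, self.sync_every, self.steps = batch, sync_every, 0
        self.rng = random.Random(seed)

    def act(self, state, eps=0.1):
        """Epsilon-greedy action selection on the estimated Q."""
        if self.rng.random() < eps:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def store(self, s, a, r, s2):
        """Put a (state, action, reward, next state) tuple in the library."""
        self.buffer.append((s, a, r, s2))

    def learn(self):
        """Randomly sample a minibatch, update the target Q value, and
        move the estimate toward it; periodically sync the target."""
        if len(self.buffer) < self.batch:
            return
        for s, a, r, s2 in self.rng.sample(self.buffer, self.batch):
            target = r + self.gamma * max(self.q_target[(s2, b)]
                                          for b in self.actions)
            self.q[(s, a)] += self.lr * (target - self.q[(s, a)])
        self.steps += 1
        if self.steps % self.sync_every == 0:
            self.q_target = self.q.copy()  # reduce estimate/target gap
```

In the patent's setting, the state would be the triple named in claim 7 (UAV position, collected-sensor flags, residual energy), the actions the fly-to-cluster-head and return-to-charge moves, and the reward the combination of the electric quantity, flight time and acquisition terms. On a toy three-state corridor where moving right eventually earns a reward, the learner prefers the rewarded action after a few hundred episodes.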
8. A rechargeable unmanned aerial vehicle path planning system, comprising:
the acquisition module is used for acquiring the unmanned aerial vehicle and the sensor set and determining a system model, a channel model and an unmanned aerial vehicle energy consumption model;
the establishing module is used for establishing a total energy consumption objective function of the unmanned aerial vehicle based on the system model, the channel model and the unmanned aerial vehicle energy consumption model;
and the optimization module is used for jointly optimizing, based on a sensor clustering mechanism and a cluster head selection mechanism, the unmanned aerial vehicle flight path strategy and the charging strategy in the total energy consumption objective function of the unmanned aerial vehicle according to a DQN algorithm, so as to minimize the energy consumption of the unmanned aerial vehicle for collecting data from all sensor cluster head nodes.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor when executing the computer program implements the steps of the method for path planning for a chargeable drone of any of claims 1 to 7.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the method for path planning for a rechargeable drone according to any one of claims 1 to 7.
CN202110631925.6A 2021-06-07 2021-06-07 Chargeable unmanned aerial vehicle path planning method and system Active CN113433967B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110631925.6A CN113433967B (en) 2021-06-07 2021-06-07 Chargeable unmanned aerial vehicle path planning method and system


Publications (2)

Publication Number Publication Date
CN113433967A true CN113433967A (en) 2021-09-24
CN113433967B CN113433967B (en) 2022-11-25

Family

ID=77803861

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110631925.6A Active CN113433967B (en) 2021-06-07 2021-06-07 Chargeable unmanned aerial vehicle path planning method and system

Country Status (1)

Country Link
CN (1) CN113433967B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113991785A (en) * 2021-11-01 2022-01-28 南京理工大学 Unmanned aerial vehicle-assisted sensor node charging method
CN114071632A (en) * 2022-01-14 2022-02-18 中国人民解放军军事科学院国防科技创新研究院 Unmanned aerial vehicle clustering method based on improved cluster head selection weight
CN114217630A (en) * 2021-11-02 2022-03-22 武汉大学 Dynamic space-time unmanned aerial vehicle charging method based on attention mechanism
CN114237281A (en) * 2021-11-26 2022-03-25 国网北京市电力公司 Control method and device for unmanned aerial vehicle inspection and inspection system
CN114764251A (en) * 2022-05-13 2022-07-19 电子科技大学 Energy-saving method for multi-agent collaborative search based on energy consumption model
CN115037638A (en) * 2022-06-14 2022-09-09 北京邮电大学 Unmanned aerial vehicle network data acquisition and transmission control method with low energy consumption and high timeliness
CN115426213A (en) * 2022-09-01 2022-12-02 厦门立林科技有限公司 Voice panel device awakening method, voice panel device, system and medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110856134A (en) * 2019-10-16 2020-02-28 东南大学 Large-scale wireless sensor network data collection method based on unmanned aerial vehicle
US10795380B1 (en) * 2020-01-27 2020-10-06 safeXai, Inc. System and method for event-based vehicle operation
CN111752304A (en) * 2020-06-23 2020-10-09 深圳清华大学研究院 Unmanned aerial vehicle data acquisition method and related equipment
CN111787506A (en) * 2020-07-20 2020-10-16 中南大学 Trusted data collection method based on unmanned aerial vehicle in wireless sensor network
CN112469100A (en) * 2020-06-10 2021-03-09 广州大学 Hierarchical routing algorithm based on rechargeable multi-base-station wireless heterogeneous sensor network
CN112902969A (en) * 2021-02-03 2021-06-04 重庆大学 Path planning method for unmanned aerial vehicle in data collection process
CN113190039A (en) * 2021-04-27 2021-07-30 大连理工大学 Unmanned aerial vehicle acquisition path planning method based on hierarchical deep reinforcement learning
CN113938830A (en) * 2021-09-24 2022-01-14 北京邮电大学 Unmanned aerial vehicle base station deployment method and device


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
PENGJU XIAO et al.: "Implementation for UAVs Aided Edge Sensing System in Wireless Emergency Communications", 2019 11th International Conference on Wireless Communications and Signal Processing (WCSP) *
FU Shu et al.: "Intelligent UAV path planning for data collection in the Internet of Things", Journal on Communications *



Similar Documents

Publication Publication Date Title
CN113433967B (en) Chargeable unmanned aerial vehicle path planning method and system
CN110488861B (en) Unmanned aerial vehicle track optimization method and device based on deep reinforcement learning and unmanned aerial vehicle
CN113190039B (en) Unmanned aerial vehicle acquisition path planning method based on layered deep reinforcement learning
Bouhamed et al. A UAV-assisted data collection for wireless sensor networks: Autonomous navigation and scheduling
CN114172942B (en) Collaborative task allocation and track optimization method for multi-unmanned aerial vehicle auxiliary Internet of things
CN113485409B (en) Geographic fairness-oriented unmanned aerial vehicle path planning and distribution method and system
CN113395654A (en) Method for task unloading and resource allocation of multiple unmanned aerial vehicles of edge computing system
CN114142908B (en) Multi-unmanned aerial vehicle communication resource allocation method for coverage reconnaissance task
CN115357031B (en) Ship path planning method and system based on improved ant colony algorithm
CN112752357B (en) Online unmanned aerial vehicle auxiliary data collection method and device based on energy harvesting technology
CN115435787B (en) Unmanned aerial vehicle three-dimensional path planning method and system based on improved butterfly algorithm
CN113406965A (en) Unmanned aerial vehicle energy consumption optimization method based on reinforcement learning
CN115499921A (en) Three-dimensional trajectory design and resource scheduling optimization method for complex unmanned aerial vehicle network
CN116700343A (en) Unmanned aerial vehicle path planning method, unmanned aerial vehicle path planning equipment and storage medium
Champasak et al. Grid-based many-objective optimiser for aircraft conceptual design with multiple aircraft configurations
Seong et al. Multi-UAV trajectory optimizer: A sustainable system for wireless data harvesting with deep reinforcement learning
CN113867934A (en) Multi-node task unloading scheduling method assisted by unmanned aerial vehicle
CN116321237A (en) Unmanned aerial vehicle auxiliary internet of vehicles data collection method based on deep reinforcement learning
CN115016540A (en) Multi-unmanned aerial vehicle disaster situation detection method and system
CN114217630B (en) Dynamic time-space unmanned aerial vehicle charging method based on attention mechanism
Lahmeri et al. Machine learning for UAV-based networks
CN114337875A (en) Unmanned aerial vehicle group flight trajectory optimization method facing multi-radiation source tracking
CN112867023A (en) Method for minimizing perception data acquisition delay through dynamic scheduling of unmanned terminal
CN112383893A (en) Time-sharing-based wireless power transmission method for rechargeable sensor network
CN113923675B (en) Aerial base station deployment method for improving communication performance of ground user

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant