CN113255218A - Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network - Google Patents


Info

Publication number
CN113255218A
CN113255218A (application CN202110582074.0A)
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
network
state
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110582074.0A
Other languages
Chinese (zh)
Other versions
CN113255218B (en)
Inventor
胡杰
李雨婷
于秦
杨鲲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202110582074.0A priority Critical patent/CN113255218B/en
Publication of CN113255218A publication Critical patent/CN113255218A/en
Application granted granted Critical
Publication of CN113255218B publication Critical patent/CN113255218B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00: Computer-aided design [CAD]
    • G06F 30/20: Design optimisation, verification or simulation
    • G06F 30/27: Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W 16/00: Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W 16/22: Traffic simulation tools or models
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W 28/00: Network traffic management; Network resource management
    • H04W 28/02: Traffic management, e.g. flow control or congestion control
    • H04W 28/0226: Traffic management, e.g. flow control or congestion control based on location or mobility
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W 28/00: Network traffic management; Network resource management
    • H04W 28/02: Traffic management, e.g. flow control or congestion control
    • H04W 28/06: Optimizing the usage of the radio link, e.g. header compression, information sizing, discarding information
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W 4/00: Services specially adapted for wireless communication networks; Facilities therefor
    • H04W 4/30: Services specially adapted for particular environments, situations or purposes
    • H04W 4/38: Services specially adapted for particular environments, situations or purposes for collecting sensor information
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2111/00: Details relating to CAD techniques
    • G06F 2111/02: CAD in a network environment, e.g. collaborative CAD or distributed simulation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2111/00: Details relating to CAD techniques
    • G06F 2111/04: Constraint-based CAD
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2111/00: Details relating to CAD techniques
    • G06F 2111/08: Probabilistic or stochastic CAD
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses an unmanned aerial vehicle autonomous navigation and resource scheduling method for a wireless self-powered communication network, comprising the following steps: S1, determining a network model, a communication mode and a channel model; S2, modeling downlink wireless power transmission and uplink wireless information transmission, and determining the optimization target expression and its constraints; S3, analyzing the optimization problem and modeling it as a Markov process; S4, determining a network communication protocol and an unmanned aerial vehicle flight decision model; S5, defining the neural network input state, the unmanned aerial vehicle output action and the reward function; and S6, solving the optimization problem according to a deep reinforcement learning algorithm. By jointly designing three parts, namely the flight path of the unmanned aerial vehicle in the wireless self-powered communication network, the selection of ground devices, and the communication mode with the ground devices, the invention supplies energy to multiple ground devices while also maximizing the average data volume of those devices in the wireless self-powered communication network.

Description

Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network
Technical Field
The invention belongs to the technical field of unmanned aerial vehicle energy supply communication networks, and particularly relates to an unmanned aerial vehicle autonomous navigation and resource scheduling method of a wireless self-powered communication network.
Background
Wireless Sensor Networks (WSNs) may be used to collect information about the surrounding environment. Generally, the power of devices in a wireless sensor network is limited, and when a device's power is exhausted the sensor must be recharged manually or through a conventional ground communication network, so the charging efficiency is low. Radio Frequency (RF) based Energy Harvesting (EH) is a promising solution for extending the useful life of energy-limited sensor devices. Wireless Power Transfer (WPT) through RF radiation can provide a convenient, reliable energy supply for low-power Internet-of-Things devices; it can operate over longer ranges and can charge multiple wireless devices simultaneously, even while they are moving. Wireless Powered Communication Networks (WPCNs) have therefore been proposed: they integrate Wireless Power Transmission (WPT) and Wireless Information Transmission (WIT), providing a feasible solution for energy-constrained Internet-of-Things devices.
By virtue of its high mobility and low cost, an Unmanned Aerial Vehicle (UAV) can support better communication links between air and ground terminals, owing to less signal blocking and fewer shadowing effects. Compared with a conventional fixed base station, it can provide a higher line-of-sight (LoS) channel probability and better connectivity by greatly shortening its distance to the user. As an aerial base station, the UAV can be used to overcome the user unfairness caused by the "doubly near-far" problem in conventional fixed-base-station wireless energy supply networks, and can improve the data rate by flexibly reducing the signal propagation distance between the UAV and the ground devices.
However, current techniques assume that the positions of the ground devices are known, and do not consider the energy transmission and data collection tasks of the unmanned aerial vehicle in an unknown environment.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an unmanned aerial vehicle autonomous navigation and resource scheduling method of a wireless self-powered communication network, which realizes energy supply to a plurality of ground devices and also maximizes the average data volume of the plurality of devices in the wireless self-powered communication network by jointly designing three parts, namely a flight track of an unmanned aerial vehicle, selection of ground devices and a communication mode with the ground devices in the wireless self-powered communication network.
The purpose of the invention is realized by the following technical scheme: the unmanned aerial vehicle autonomous navigation and resource scheduling method of the wireless self-powered communication network comprises the following steps:
s1, determining a network model, a communication mode and a channel model;
s2, modeling downlink wireless power transmission and uplink wireless information transmission, and determining an optimized target expression and constraint conditions thereof;
s3, analyzing the optimization problem, and modeling the optimization problem as a Markov process;
s4, determining a network communication protocol and an unmanned aerial vehicle flight decision model;
s5, defining a neural network input state, an unmanned aerial vehicle output action and a reward function;
and S6, solving the optimization problem according to a deep reinforcement learning algorithm.
Furthermore, the network model consists of an unmanned aerial vehicle and a plurality of ground passive devices;
the communication mode is as follows: the unmanned aerial vehicle transmits energy to the ground passive device through the radio frequency link, and the ground passive device transmits data to the unmanned aerial vehicle through the harvested energy;
the channel model is a Los channel.
Further, the step S2 specifically includes the following sub-steps:
s21, determining the energy harvested by the ground passive equipment for the downlink wireless power transmission;
s22, for uplink wireless information transmission, when the unmanned aerial vehicle selects a certain ground passive device for communication, determining an uplink transmission data volume;
and S23, determining an optimization target expression and a constraint condition thereof.
Further, the step S5 specifically includes the following sub-steps:
S51, determining a network state set: the network state is defined as S = {e_i(t), ζ_i, q(t), h_i(t)}, where e_i(t) denotes the battery power level of the i-th passive device within the coverage area at time t, ζ_i denotes the accumulated uploaded data volume of passive device i, q(t) denotes the position of the unmanned aerial vehicle at time t, and h_i(t) denotes the channel gain between passive device i and the unmanned aerial vehicle at time t;

S52, determining the output unmanned aerial vehicle action set A = {i, ρ(t), α(t), v_UAV(t)}, where ρ(t) denotes the communication mode of the drone, with ρ(t) = 1 the downlink transmission mode and ρ(t) = 0 the uplink transmission mode; α(t) denotes the unmanned aerial vehicle steering angle; v_UAV(t) denotes the flight speed of the drone;

S53, determining a reward mechanism: the reward function is defined as r = r_data + r_penalty, where

r_data = (1/I) Σ_{i=1}^{I} [ζ_i(t+1) - ζ_i(t)]

represents the change in the network average data volume; whenever any of the constraints is not satisfied, the corresponding penalty r_penalty is applied; I denotes the total number of passive devices.
Further, the step S6 specifically includes the following sub-steps:
S61, initializing network parameters: initialize the value Q for all states and actions, initialize all parameters ω of the current neural network, set the target neural network parameters ω' = ω, and empty the experience replay set D;

S62, initialize s_t as the current state and obtain the feature vector φ(s_t) of the current state;

S63, use φ(s_t) as the input of the current neural network to obtain the values Q for all actions, and select the corresponding action a_t from the current values Q with the ε-greedy method;

S64, execute the current action a_t in state s_t to obtain the new state s_{t+1}, the feature vector φ(s_{t+1}) corresponding to the new state, and the reward r_t of the current state; store the tuple {φ(s_t), a_t, r_t, φ(s_{t+1})} in the experience replay set D;

S65, let t = t + 1 and s_t = s_{t+1}; judge whether the new state s_{t+1} is a terminal state; if not, return to step S63; if so, further judge whether the number of completed iteration rounds exceeds the preset total, and if so end the iteration, otherwise return to step S63;

S66, sample m samples {φ(s_j), a_j, r_j, φ(s_{j+1})}, j = 1, ..., m, from the experience replay set D, and compute the current target state-action value y_j according to

y_j = r_j, if s_{j+1} is a terminal state; otherwise y_j = r_j + γ max_{a_{j+1}} Q'(φ(s_{j+1}), a_{j+1}; ω'),

where Q'(s_{j+1}, a_{j+1}; ω') denotes the value of the next state, computed by the target neural network;

S67, compute the mean squared error loss

L(ω) = (1/m) Σ_{j=1}^{m} (y_j - Q(φ(s_j), a_j; ω))²,

and update all parameters ω of the current neural network by gradient back-propagation so as to minimize the loss; y_j denotes the value computed in step S66 for state s_j, and Q(φ(s_j), a_j; ω) denotes the value directly output by the current neural network;

S68, if t mod (target neural network parameter update frequency) == 1, update the target neural network parameters ω' = ω; otherwise do not update them;

S69, update the coordinates of the unmanned aerial vehicle, compute the battery power level of each passive device, the accumulated uploaded data volume of each passive device, and the channel gain between each passive device and the unmanned aerial vehicle.
The invention has the following beneficial effects: by jointly designing three parts, namely the flight path of the unmanned aerial vehicle in the wireless self-powered communication network, the selection of ground devices, and the communication mode with the ground devices, the invention maximizes the average uplink transmission data volume of the system users; the optimization is solved by a deep reinforcement learning algorithm, with the system state input to a neural network that outputs the optimal action of the unmanned aerial vehicle. The invention fully considers the fact that the unmanned aerial vehicle has no prior knowledge of the positions of the ground devices, supplies energy to multiple ground devices, and at the same time maximizes the average data volume of the multiple devices in the wireless self-powered communication network.
Drawings
Fig. 1 is a flow chart of the autonomous navigation and resource scheduling method of the unmanned aerial vehicle of the present invention;
fig. 2 is a schematic diagram of a wireless self-powered communication network model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a deep reinforcement learning algorithm model according to the present invention.
Detailed Description
The technical scheme of the invention is further explained by combining the attached drawings.
As shown in fig. 1, the unmanned aerial vehicle autonomous navigation and resource scheduling method of the wireless self-powered communication network of the present invention includes the following steps:
s1, determining a network model, a communication mode and a channel model;
the network model consists of an unmanned aerial vehicle and a plurality of ground passive devices; suppose that there is an unmanned aerial vehicle as the aerial base station in the WPCN network, and there are I passive (sensor) devices on the ground, which are recorded as
Figure BDA0003086322640000041
Figure BDA0003086322640000042
Representing a two-dimensional space. The drone is destined to collect data for I passive devices in the area. In order to simplify the network model, the flying height of the unmanned aerial vehicle is assumed to be unchanged and fixed as H. The position of the unmanned aerial vehicle at time t is denoted as q (t) ═ x (t), y (t), and the flying speed is vUAV(t) the carrier signal transmission power of the unmanned aerial vehicle is PUAVChannel noise power of σ2At time t, the distance between the unmanned aerial vehicle and each passive device is
Figure BDA0003086322640000043
Where | · | | represents the euclidean distance between a pair of vectors, wiThe location of the ith passive device is indicated. The energy conversion efficiency coefficient of the passive equipment is eta, and the signal transmitting power is Ptr. A model of a communication network based on drones is shown in fig. 2.
The communication mode is as follows: the unmanned aerial vehicle transmits energy to the ground passive devices through a radio frequency link, and the ground passive devices send data to the unmanned aerial vehicle using the harvested energy. The drone serves both as a transmitter of energy and as a receiver of information. The ground passive devices adopt a "harvest-then-transmit" protocol: after harvesting enough energy from the downlink radio frequency link of the drone, a device transmits data to the drone through the uplink. The total working time of the drone is T, and at each time t the drone determines a communication mode, denoted ρ(t) ∈ {0, 1}. Here ρ(t) = 1 denotes the downlink transmission mode, in which the drone broadcasts energy to the ground passive devices; ρ(t) = 0 denotes the uplink transmission mode, in which the drone selects one passive device to receive its uploaded data information, and only one device is allowed to upload at a time.
The channel model is a line-of-sight (LoS) channel. At time t, the two-dimensional coordinates of the drone are q(t) = (x(t), y(t)). Suppose there is a LoS channel between the drone and the ground passive devices, with a path-loss exponent of 2. The channel gain between passive device i and the unmanned aerial vehicle at time t is

h_i(t) = β_0 / d_i²(t) = β_0 / (||q(t) - w_i||² + H²),

where β_0 denotes the channel gain at a reference distance of 1 meter.
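As a concrete reading of the LoS model above, the gain can be computed directly from the UAV and device positions. This is a minimal sketch; the function and parameter names are illustrative, not from the patent:

```python
def channel_gain(q, w, H, beta0):
    """LoS channel gain h_i(t) = beta0 / d_i(t)^2, with path-loss exponent 2.

    q: UAV horizontal position (x(t), y(t)); w: device position w_i;
    H: fixed flight altitude; beta0: gain at the 1 m reference distance.
    """
    d_squared = (q[0] - w[0]) ** 2 + (q[1] - w[1]) ** 2 + H ** 2  # squared 3-D distance
    return beta0 / d_squared
```

With the UAV hovering directly above a device at altitude H, the gain reduces to beta0 / H².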
S2, modeling downlink wireless power transmission and uplink wireless information transmission, and determining an optimized target expression and constraint conditions thereof; the method specifically comprises the following steps:
S21, for downlink wireless power transmission, determine the energy harvested by the ground passive devices, which yields the energy constraint. Assuming the unmanned aerial vehicle is in downlink transmission mode, the corresponding power received by passive device i at time t is

P_i(t) = η P_UAV h_i(t),

where P_UAV denotes the transmission power of the unmanned aerial vehicle and η is the energy conversion efficiency coefficient of the passive device. Assuming the unmanned aerial vehicle remains in downlink communication mode throughout a period T_D, the battery energy on passive device i is

E_i = Σ_{t ∈ T_D} η P_UAV h_i(t).
Compare the remaining battery power of each passive device with an energy threshold: if it is greater than the threshold, the power level of the passive device is defined as 1, otherwise as 0. The battery power of all passive devices is thus discretized into two levels, e_i(t) ∈ {0, 1}.
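The downlink harvesting model and the two-level battery discretization above can be sketched as follows (the threshold value and names are illustrative):

```python
def harvested_power(P_uav, eta, h):
    """Power harvested by device i in downlink mode: eta * P_UAV * h_i(t)."""
    return eta * P_uav * h

def battery_level(energy, threshold):
    """Discretize battery energy into e_i(t) in {0, 1}: 1 if above the threshold."""
    return 1 if energy > threshold else 0
```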
S22, for uplink wireless information transmission, when the unmanned aerial vehicle selects a ground passive device for communication, determine the uplink transmission data volume, which yields the quality-of-service constraint. Suppose the unmanned aerial vehicle is in uplink transmission mode and selects passive device i to transmit data; the throughput of passive device i at time t is

R_i(t) = B log₂(1 + P_tr h_i(t) / σ²),

where B is the system bandwidth, P_tr is the transmission power of the passive device, and γ_0 = P_tr β_0 / σ² is the reference signal-to-noise ratio (SNR). Assuming passive device i is selected to send data to the unmanned aerial vehicle throughout a period T_U, its accumulated uploaded data volume is

ζ_i = Σ_{t ∈ T_U} R_i(t).
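The uplink rate and its accumulation over a period can be sketched as a discrete-time sum (names and the slot length `dt` are illustrative):

```python
import math

def uplink_rate(B, P_tr, h, sigma_squared):
    """Uplink throughput R_i(t) = B * log2(1 + P_tr * h_i(t) / sigma^2)."""
    return B * math.log2(1.0 + P_tr * h / sigma_squared)

def accumulate(rates, dt):
    """Accumulated data zeta_i over an uplink period, as a discrete-time sum."""
    return sum(r * dt for r in rates)
```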
S23, determine the optimization target expression and its constraints. The target problem of maximizing the system average data volume is:

P1: max_{q(t), i, ρ(t), v_UAV(t)} (1/I) Σ_{i=1}^{I} ζ_i

subject to

v̄ = (1/τ) ∫₀^τ v_UAV(t) dt (average flight speed constraint),
q(0) = q(T),
ζ_i ≥ ζ_QoS, ∀ i ∈ {1, ..., I},
ρ(t) ∈ {0, 1}, e_i(t) ∈ {0, 1}, ∀ t,

where P1 denotes the optimization problem, i.e. maximizing the average throughput of all devices by adjusting the drone position, speed and communication mode; v̄ denotes the average flight speed of the unmanned aerial vehicle, and τ denotes the current flight time; q(0) denotes the position of the drone at time t = 0, q(T) its position at time t = T, where T is the pre-specified flight time, and q(0) = q(T) expresses that the drone must return to its home position at time T. ζ_QoS denotes the QoS constraint, i.e. the minimum data volume uploaded by each sensor, which also requires the drone to traverse all sensors.
S3, analyzing the optimization problem and modeling it as a Markov process. The Markov process is defined by the 4-tuple <S, A, R, P>, where S is the set of states, A is the set of all possible actions, R is the reward obtained when an action is taken, and P denotes the transition probability from one state to another. Specifically, the drone, acting as the agent, observes the environment and obtains the state s_t ∈ S. At time t the drone selects an action a_t ∈ A and then, based on the observation and the next state s_{t+1}, obtains the reward r_t ∈ R.
S4, determining a network communication protocol and an unmanned aerial vehicle flight decision model; in order to solve the problem that the unmanned aerial vehicle does not have a priori knowledge of the position of the passive device, a coverage area is defined for the unmanned aerial vehicle, and only the passive device in the coverage area can communicate with the unmanned aerial vehicle. When the drone is in WPT mode, the drone broadcasts energy to all passive devices in the coverage area. At the end of the time slot, the passive device receiving the energy will send a short beacon status message to the drone, including battery power, channel information and accumulated data volume. In the next time slot, the drone will determine the next action, i.e. steering angle, passive device selection and communication mode, based on the received status information of some passive devices. In the flight process, the coverage area of the unmanned aerial vehicle can change, and the unmanned aerial vehicle can automatically navigate to the optimal position to receive more passive equipment state information, improve the average data volume to the greatest extent, and reasonably plan the flight path while meeting the energy constraint of the passive equipment.
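The short beacon status message described above might carry fields like the following. This is a hypothetical sketch; the patent does not specify a wire format:

```python
from dataclasses import dataclass

@dataclass
class Beacon:
    """Status message a covered passive device sends at the end of a slot."""
    device_id: int
    battery_level: int    # e_i(t) in {0, 1}
    channel_gain: float   # h_i(t)
    uploaded_data: float  # accumulated zeta_i
```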
S5, defining a neural network input state, an unmanned aerial vehicle output action and a reward function; the method is realized by the following steps:
S51, determining a network state set: the network state is defined as S = {e_i(t), ζ_i, q(t), h_i(t)}, where e_i(t) denotes the battery power level of the i-th passive device within the coverage area at time t, ζ_i denotes the accumulated uploaded data volume of passive device i, q(t) denotes the position of the unmanned aerial vehicle at time t, and h_i(t) denotes the channel gain between passive device i and the unmanned aerial vehicle at time t;

S52, determining the output unmanned aerial vehicle action set A = {i, ρ(t), α(t), v_UAV(t)}, where ρ(t) denotes the communication mode of the drone, with ρ(t) = 1 the downlink transmission mode and ρ(t) = 0 the uplink transmission mode; α(t) denotes the steering angle of the drone, α(t) ∈ {0°, 45°, 90°, 135°, 180°, 225°, 270°, 315°}; v_UAV(t) denotes the flight speed of the drone, v_UAV(t) ∈ {0 m/s, 5 m/s, 10 m/s};
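The discrete action set A = {i, ρ(t), α(t), v_UAV(t)} can be enumerated as a Cartesian product, which is the form a DQN with one output per action requires. A sketch; the enumeration order is arbitrary:

```python
import itertools

ANGLES = [0, 45, 90, 135, 180, 225, 270, 315]  # steering angles alpha(t), degrees
SPEEDS = [0, 5, 10]                            # flight speeds v_UAV(t), m/s
MODES = [0, 1]                                 # rho(t): 0 uplink, 1 downlink

def build_action_space(num_devices):
    """All (device i, mode rho, angle alpha, speed v) combinations."""
    return list(itertools.product(range(num_devices), MODES, ANGLES, SPEEDS))
```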
S53, determining a reward mechanism: the reward function is defined as r = r_data + r_penalty, where

r_data = (1/I) Σ_{i=1}^{I} [ζ_i(t+1) - ζ_i(t)]

represents the change in the network average data volume; whenever any of the constraints is not satisfied, the corresponding penalty r_penalty is applied; I denotes the total number of passive devices.
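The reward r = r_data + r_penalty can be sketched as follows. The penalty magnitude is an assumption, not given in the patent:

```python
def reward(zeta_before, zeta_after, constraint_violated, penalty=-1.0):
    """r_data: change in the network-average uploaded data over I devices;
    r_penalty: applied whenever any constraint is violated (magnitude assumed)."""
    I = len(zeta_before)
    r_data = (sum(zeta_after) - sum(zeta_before)) / I
    r_penalty = penalty if constraint_violated else 0.0
    return r_data + r_penalty
```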
S6, solving the optimization problem according to a deep reinforcement learning algorithm;
As shown in fig. 3, the deep reinforcement learning algorithm obtains the best policy π that maximizes the long-term expected cumulative reward. The expected cumulative reward of each state-action pair output by the neural network can be defined as

Q^π(s, a) = E[ Σ_{k=0}^{∞} γ^k r_{t+k} | s_t = s, a_t = a ],

where γ denotes the discount factor. By selecting the best action

a* = argmax_{a ∈ A} Q(s, a),

the optimal action-value function can be obtained through the update

Q(s, a) ← Q(s, a) + λ ( r + γ max_{a'} Q(s', a') - Q(s, a) ),

where λ denotes the learning rate.
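The action-value update above, in tabular form, reads as follows. This is a didactic sketch; the patent approximates Q with a neural network instead of a table:

```python
def q_update(Q, s, a, r, s_next, actions, lam, gamma):
    """Q(s,a) <- Q(s,a) + lam * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + lam * (r + gamma * best_next - old)
    return Q[(s, a)]
```

Repeated calls with the same transition move Q(s, a) toward the bootstrapped target r + γ max_a' Q(s', a').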
The deep reinforcement learning algorithm comprises two neural networks, wherein one neural network is a current neural network and used for calculating a value Q in a current state, and the other neural network is a target neural network and used for calculating a value Q in a next state.
Input: the number of iteration rounds F, the state feature dimension n, the action set A, the discount factor γ, the exploration rate ε, the learning rate λ, the Q-network structure, the mini-batch size m for gradient descent, and the target Q-network parameter update frequency.
The method specifically comprises the following steps:
S61, initializing network parameters: initialize the value Q for all states and actions, initialize all parameters ω of the current neural network, set the target neural network parameters ω' = ω, and empty the experience replay set D;

S62, initialize s_t as the current state and obtain the feature vector φ(s_t) of the current state;

S63, use φ(s_t) as the input of the current neural network to obtain the values Q for all actions, and select the corresponding action a_t from the current values Q with the ε-greedy method;

S64, execute the current action a_t in state s_t to obtain the new state s_{t+1}, the feature vector φ(s_{t+1}) corresponding to the new state, and the reward r_t of the current state; store the tuple {φ(s_t), a_t, r_t, φ(s_{t+1})} in the experience replay set D;

S65, let t = t + 1 and s_t = s_{t+1}; judge whether the new state s_{t+1} is a terminal state; if not, return to step S63; if so, further judge whether the number of completed iteration rounds exceeds the preset total, and if so end the iteration, otherwise return to step S63;

S66, sample m samples {φ(s_j), a_j, r_j, φ(s_{j+1})}, j = 1, ..., m, from the experience replay set D, and compute the current target state-action value y_j according to

y_j = r_j, if s_{j+1} is a terminal state; otherwise y_j = r_j + γ max_{a_{j+1}} Q'(φ(s_{j+1}), a_{j+1}; ω'),

where Q'(s_{j+1}, a_{j+1}; ω') denotes the value of the next state, computed by the target neural network rather than the current neural network; this avoids training the network against its own output and keeps the coupling from becoming too strong.

y_j denotes the Q value computed by the above formula; it is obtained by calculation rather than output directly by a neural network. The aforementioned value Q, by contrast, is obtained by feeding a state directly into the Q-network. The aim of the invention is to train the neural network so that the value Q it outputs approximates the value y_j computed by the formula, minimizing the mean squared error between the two, so that the neural network ultimately reproduces the target value Q.

S67, compute the mean squared error loss

L(ω) = (1/m) Σ_{j=1}^{m} (y_j - Q(φ(s_j), a_j; ω))²,

and update all parameters ω of the current neural network by gradient back-propagation so as to minimize the loss; y_j denotes the value computed in step S66 for state s_j, and Q(φ(s_j), a_j; ω) denotes the value directly output by the current neural network;

S68, if t mod (target neural network parameter update frequency) == 1, update the target neural network parameters ω' = ω (that is, the target network parameters are updated once per update-frequency interval); otherwise do not update them;

S69, update the coordinates of the unmanned aerial vehicle, compute the battery power level of each passive device, the accumulated uploaded data volume of each passive device, and the channel gain between each passive device and the unmanned aerial vehicle.
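Steps S61-S68 can be condensed into the following sketch. A linear model stands in for the Q-network so the example stays self-contained; the replay set D, the ε-greedy choice, the target computation y_j, and the periodic sync ω' = ω mirror the steps above. All sizes and hyper-parameters are illustrative:

```python
import random
from collections import deque

import numpy as np

class LinearQ:
    """Linear stand-in for the Q-network: Q(s, .) = W s + b."""
    def __init__(self, n_state, n_action, rng):
        self.W = rng.normal(0.0, 0.1, (n_action, n_state))
        self.b = np.zeros(n_action)

    def q_values(self, s):
        return self.W @ s + self.b

    def copy_from(self, other):  # S68: target update omega' = omega
        self.W = other.W.copy()
        self.b = other.b.copy()

def epsilon_greedy(net, s, n_action, eps, rng):
    """S63: explore with probability eps, otherwise pick argmax_a Q(s, a)."""
    if rng.random() < eps:
        return int(rng.integers(n_action))
    return int(np.argmax(net.q_values(s)))

def train_step(online, target, batch, gamma, lr):
    """S66-S67: y_j = r_j (terminal) or r_j + gamma * max_a' Q'(s_{j+1}, a'; omega'),
    then one gradient step on the squared error (y_j - Q(s_j, a_j; omega))^2."""
    for s, a, r, s_next, done in batch:
        y = r if done else r + gamma * float(np.max(target.q_values(s_next)))
        err = float(online.q_values(s)[a]) - y  # d(loss)/dQ up to a factor of 2
        online.W[a] -= lr * err * s             # gradient step on the linear model
        online.b[a] -= lr * err

# Minimal usage mirroring S61-S67
buffer = deque(maxlen=10_000)                   # S61/S64: experience replay set D
rng = np.random.default_rng(0)
net, tgt = LinearQ(4, 6, rng), LinearQ(4, 6, rng)
tgt.copy_from(net)                              # S61: omega' = omega
s = rng.normal(size=4)                          # phi(s_t)
a = epsilon_greedy(net, s, 6, 0.1, rng)         # S63
buffer.append((s, a, 1.0, s, True))             # S64
train_step(net, tgt, random.sample(list(buffer), 1), 0.9, 0.05)  # S66-S67
```

Replacing `LinearQ` with a multi-layer network trained by back-propagation recovers the structure of fig. 3 without changing the surrounding loop.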
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to help the reader understand the principles of the invention, and that the scope of the invention is not limited to the specifically recited embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from its spirit, and such changes and combinations fall within the scope of the invention.

Claims (5)

1. An unmanned aerial vehicle autonomous navigation and resource scheduling method for a wireless self-powered communication network, characterized by comprising the following steps:
S1, determining the network model, the communication mode and the channel model;
S2, modeling the downlink wireless power transmission and the uplink wireless information transmission, and determining the optimization objective expression and its constraints;
S3, analyzing the optimization problem and modeling it as a Markov process;
S4, determining the network communication protocol and the UAV flight decision model;
S5, defining the neural network input state, the UAV output actions and the reward function;
S6, solving the optimization problem by a deep reinforcement learning algorithm.
2. The unmanned aerial vehicle autonomous navigation and resource scheduling method of a wireless self-powered communication network according to claim 1, characterized in that the network model consists of one unmanned aerial vehicle and a plurality of ground passive devices;
the communication mode is: the UAV transmits energy to the ground passive devices through radio-frequency links, and the ground passive devices send data to the UAV using the harvested energy;
the channel model is a LoS channel.
3. The unmanned aerial vehicle autonomous navigation and resource scheduling method of a wireless self-powered communication network according to claim 1, characterized in that step S2 specifically comprises the following sub-steps:
S21, for the downlink wireless power transmission, determining the energy harvested by the ground passive devices;
S22, for the uplink wireless information transmission, when the UAV selects a certain ground passive device for communication, determining the amount of uplink transmission data;
S23, determining the optimization objective expression and its constraints.
4. The unmanned aerial vehicle autonomous navigation and resource scheduling method of a wireless self-powered communication network according to claim 1, characterized in that step S5 is specifically implemented as follows:
S51, determining the network state set: the network state is defined as S = {e_i(t), ζ_i, q(t), h_i(t)}, where e_i(t) denotes the battery power level of the i-th passive device within the coverage area at time t, ζ_i denotes the accumulated data volume uploaded by passive device i, q(t) denotes the position of the UAV at time t, and h_i(t) denotes the channel gain between passive device i and the UAV at time t;
S52, determining the output UAV action set A = {i, ρ(t), α(t), v_UAV(t)}, where ρ(t) denotes the communication mode of the UAV, ρ(t) = 1 denoting the downlink transmission mode and ρ(t) = 0 the uplink transmission mode; α(t) denotes the steering angle of the UAV; and v_UAV(t) denotes the flight speed of the UAV;
S53, determining the reward mechanism: the reward function is defined as r = r_data + r_penalty, where

r_data = (1/I) Σ_{i=1}^{I} [ζ_i(t) − ζ_i(t−1)]

denotes the change in the average data volume of the network; once any of the constraints is not satisfied, the corresponding penalty r_penalty is applied; I denotes the total number of passive devices.
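The reward of step S53 can be sketched as follows. The penalty magnitude and the form of the constraint check are illustrative assumptions; the claim only states that a penalty r_penalty is applied whenever any constraint is violated, and that r_data reflects the change in the network's average data volume.

```python
# Sketch of the reward r = r_data + r_penalty from step S53.
# PENALTY is an assumed value; the patent does not specify it.
PENALTY = -1.0

def reward(zeta_prev, zeta_curr, constraints_ok):
    """zeta_prev / zeta_curr: accumulated uploaded data volumes of the
    I passive devices at the previous and current time step.
    r_data is the change in the network-average data volume; the
    penalty fires if any constraint is violated."""
    I = len(zeta_curr)
    r_data = sum(zeta_curr) / I - sum(zeta_prev) / I
    r_penalty = 0.0 if constraints_ok else PENALTY
    return r_data + r_penalty
```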
5. The unmanned aerial vehicle autonomous navigation and resource scheduling method of a wireless self-powered communication network according to claim 1, characterized in that step S6 specifically comprises the following sub-steps:
S61, initializing the network parameters: initializing the value Q for all state-action pairs, initializing all parameters ω of the current neural network, setting the target neural network parameters ω′ = ω, and emptying the experience replay set D;
S62, initializing s_t as the current state and obtaining the feature vector φ(s_t) of the current state;
S63, using φ(s_t) as input to the neural network to obtain the values Q corresponding to all actions in the current state, and selecting the corresponding action a_t by the ε-greedy method;
S64, executing the current action a_t in state s_t to obtain the new state s_{t+1}, the feature vector φ(s_{t+1}) corresponding to the new state, and the reward r_t of the current state, and storing the quadruple {φ(s_t), a_t, r_t, φ(s_{t+1})} in the experience replay set D;
S65, letting t = t + 1 so that s_t = s_{t+1}, and judging whether the new state s_{t+1} is a flight-termination state; if not, returning to step S63; if so, further judging whether the iteration round t + 1 is greater than T, ending the iteration if it is, and returning to step S63 otherwise;
S66, sampling m samples {φ(s_j), a_j, r_j, φ(s_{j+1})}, j = 1, ..., m, from the experience replay set D, and calculating the current target state-action value y_j according to the following formula:

y_j = r_j, if s_{j+1} is a termination state; y_j = r_j + γ·max_{a′} Q′(φ(s_{j+1}), a′; ω′), otherwise;

where Q′(s_{j+1}, a_{j+1}; ω′) represents the value of the next state, which is calculated by the target neural network;
S67, calculating the mean square error loss function

L(ω) = (1/m) Σ_{j=1}^{m} (y_j − Q(φ(s_j), a_j; ω))²

and updating all parameters ω of the neural network through gradient back-propagation so as to minimize the mean square error loss function; y_j denotes the value calculated in state s_j by the formula of S66, and Q(φ(s_j), a_j; ω) denotes the value directly output by the current neural network in state s_j;
S68, if t mod (target neural network parameter update frequency) = 1, updating the target neural network parameters ω′ = ω; otherwise, not updating the target neural network parameters;
S69, updating the UAV coordinates, calculating the battery power levels of the passive devices, the accumulated data volume uploaded by each passive device, and the channel gains between the passive devices and the UAV.
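The experience replay set D and the ε-greedy action selection used in steps S61 to S65 can be sketched as follows. The buffer capacity and the value of ε are illustrative assumptions; the claim fixes neither.

```python
import random
from collections import deque

# Illustrative values, not specified in the patent.
EPSILON, CAPACITY = 0.1, 10000

def epsilon_greedy(q_vals, epsilon=EPSILON):
    """S63: explore with probability epsilon, otherwise pick the
    action with the largest Q value."""
    if random.random() < epsilon:
        return random.randrange(len(q_vals))
    return max(range(len(q_vals)), key=lambda a: q_vals[a])

class ReplayBuffer:
    """S61/S64: bounded experience replay set D of quadruples
    (phi(s_t), a_t, r_t, phi(s_{t+1})); oldest entries are evicted."""
    def __init__(self, capacity=CAPACITY):
        self.data = deque(maxlen=capacity)
    def push(self, phi_s, a, r, phi_s1):
        self.data.append((phi_s, a, r, phi_s1))
    def sample(self, m):
        # S66: draw a minibatch of m stored transitions
        return random.sample(self.data, m)
```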
CN202110582074.0A 2021-05-27 2021-05-27 Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network Active CN113255218B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110582074.0A CN113255218B (en) 2021-05-27 2021-05-27 Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network


Publications (2)

Publication Number Publication Date
CN113255218A true CN113255218A (en) 2021-08-13
CN113255218B CN113255218B (en) 2022-05-31

Family

ID=77184662

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110582074.0A Active CN113255218B (en) 2021-05-27 2021-05-27 Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network

Country Status (1)

Country Link
CN (1) CN113255218B (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110428115A (en) * 2019-08-13 2019-11-08 南京理工大学 Maximization system benefit method under dynamic environment based on deeply study
CN110488861A (en) * 2019-07-30 2019-11-22 北京邮电大学 Unmanned plane track optimizing method, device and unmanned plane based on deeply study
WO2020079702A1 (en) * 2018-10-18 2020-04-23 Telefonaktiebolaget Lm Ericsson (Publ) Formation flight of unmanned aerial vehicles
CN111786713A (en) * 2020-06-04 2020-10-16 大连理工大学 A UAV network hovering position optimization method based on multi-agent deep reinforcement learning
CN112468205A (en) * 2020-01-09 2021-03-09 电子科技大学中山学院 Backscatter secure communication method suitable for unmanned aerial vehicle
CN112711271A (en) * 2020-12-16 2021-04-27 中山大学 Autonomous navigation unmanned aerial vehicle power optimization method based on deep reinforcement learning
CN112817327A (en) * 2020-12-30 2021-05-18 北京航空航天大学 Multi-unmanned aerial vehicle collaborative search method under communication constraint


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JIE HU 等: "Joint Trajectory and Scheduling Design for UAV Aided Secure Backscatter Communications", 《IEEE WIRELESS COMMUNICATIONS LETTERS》, vol. 9, no. 12, 12 April 2020 (2020-04-12), pages 2168 - 2172, XP011824554, DOI: 10.1109/LWC.2020.3016174 *
KAI LI 等: "Deep Reinforcement Learning for Real-Time Trajectory Planning in UAV Networks", 《2020 INTERNATIONAL WIRELESS COMMUNICATIONS AND MOBILE COMPUTING (IWCMC)》, 27 July 2020 (2020-07-27), pages 958 - 963 *
WU YUNDI: "Research on Optimization of Information and Energy Transmission in UAV Communication Systems", 《China Master's Theses Full-text Database, Engineering Science and Technology II》, no. 8, 15 August 2019 (2019-08-15), pages 031 - 66 *
YANG KUN 等: "Wireless Data-and-Energy Integrated Communication Networks and the Design of a Joint Data-Energy Access Control Protocol", 《Journal of Jilin Normal University (Natural Science Edition)》, vol. 40, no. 1, 16 January 2019 (2019-01-16), pages 106 - 114 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114061589A (en) * 2021-11-16 2022-02-18 中山大学 Edge-side-coordinated multi-unmanned aerial vehicle autonomous navigation method
CN114061589B (en) * 2021-11-16 2023-05-26 中山大学 Multi-unmanned aerial vehicle autonomous navigation method with cooperative end edges
CN114881287A (en) * 2022-04-06 2022-08-09 南京航空航天大学 Energy optimization method for industrial wireless chargeable sensor network
CN115766769A (en) * 2022-10-25 2023-03-07 西北工业大学 Wireless sensor network deployment method based on deep reinforcement learning
CN115470894A (en) * 2022-10-31 2022-12-13 中国人民解放军国防科技大学 Time-sharing call method and device for UAV knowledge model based on reinforcement learning
CN116113025A (en) * 2023-02-16 2023-05-12 中国科学院上海微系统与信息技术研究所 A trajectory design and power allocation method in UAV collaborative communication network
CN116502547A (en) * 2023-06-29 2023-07-28 深圳大学 Multi-unmanned aerial vehicle wireless energy transmission method based on graph reinforcement learning
CN116502547B (en) * 2023-06-29 2024-06-04 深圳大学 Multi-unmanned aerial vehicle wireless energy transmission method based on graph reinforcement learning

Also Published As

Publication number Publication date
CN113255218B (en) 2022-05-31

Similar Documents

Publication Publication Date Title
CN113255218B (en) Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network
Zhan et al. Energy minimization for cellular-connected UAV: From optimization to deep reinforcement learning
CN109743210B (en) Unmanned aerial vehicle network multi-user access control method based on deep reinforcement learning
CN110730028B (en) Unmanned aerial vehicle-assisted backscatter communication device and resource allocation control method
WO2020015214A1 (en) Optimization method for wireless information and energy transmission based on unmanned aerial vehicle
CN110380776B (en) Internet of things system data collection method based on unmanned aerial vehicle
CN115696211A (en) Unmanned aerial vehicle track self-adaptive optimization method based on information age
CN111526592B (en) Non-cooperative multi-agent power control method used in wireless interference channel
CN115494732B (en) A UAV trajectory design and power allocation method based on proximal strategy optimization
CN114942653B (en) Method, device and electronic device for determining flight strategy of unmanned swarm
CN113776531B (en) Multi-unmanned aerial vehicle autonomous navigation and task allocation algorithm of wireless self-powered communication network
Li et al. Deep reinforcement learning for real-time trajectory planning in UAV networks
CN117062182A (en) DRL-based wireless chargeable unmanned aerial vehicle data uploading path optimization method
CN117915375A (en) DDQN-based unmanned aerial vehicle track optimization method in data acquisition scene
CN117768987A (en) Unmanned aerial vehicle track planning and video transmission method based on cellular network
CN111182469B (en) An Energy Harvesting Network Time Allocation and UAV Trajectory Optimization Method
Ni et al. Optimal transmission control and learning-based trajectory design for UAV-assisted detection and communication
CN117376985B (en) Energy efficiency optimization method for multi-unmanned aerial vehicle auxiliary MEC task unloading under rice channel
CN118101034A (en) Optimization method of UAV-assisted communication system based on dynamic prediction of user location
CN116882270A (en) Multi-unmanned aerial vehicle wireless charging and edge computing combined optimization method and system based on deep reinforcement learning
CN116489610A (en) UAV-assisted wearable Internet of Things device charging and data processing method and system
CN116009590A (en) Distributed trajectory planning method, system, equipment and medium for UAV network
Mondal et al. Joint Trajectory, User-Association, and Power Control for Green UAV-Assisted Data Collection using Deep Reinforcement Learning
Chen et al. Proximal Policy Optimization-Based Anti-Jamming UAV-Assisted Data Collection
CN112087767B (en) HAP-UAV access network power control method based on minimized distortion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant