CN113255218A - Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network - Google Patents
Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network
- Publication number
- CN113255218A (application number CN202110582074.0A)
- Authority
- CN
- China
- Prior art keywords
- unmanned aerial
- aerial vehicle
- network
- neural network
- state
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W16/00—Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
- H04W16/22—Traffic simulation tools or models
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W28/00—Network traffic management; Network resource management
- H04W28/02—Traffic management, e.g. flow control or congestion control
- H04W28/0226—Traffic management, e.g. flow control or congestion control based on location or mobility
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W28/00—Network traffic management; Network resource management
- H04W28/02—Traffic management, e.g. flow control or congestion control
- H04W28/06—Optimizing the usage of the radio link, e.g. header compression, information sizing, discarding information
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/30—Services specially adapted for particular environments, situations or purposes
- H04W4/38—Services specially adapted for particular environments, situations or purposes for collecting sensor information
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2111/00—Details relating to CAD techniques
- G06F2111/02—CAD in a network environment, e.g. collaborative CAD or distributed simulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2111/00—Details relating to CAD techniques
- G06F2111/04—Constraint-based CAD
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2111/00—Details relating to CAD techniques
- G06F2111/08—Probabilistic or stochastic CAD
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses an unmanned aerial vehicle autonomous navigation and resource scheduling method for a wireless self-powered communication network, which comprises the following steps: S1, determining a network model, a communication mode and a channel model; S2, modeling downlink wireless power transmission and uplink wireless information transmission, and determining an optimization target expression and its constraint conditions; S3, analyzing the optimization problem and modeling it as a Markov process; S4, determining a network communication protocol and an unmanned aerial vehicle flight decision model; S5, defining the neural network input state, the unmanned aerial vehicle output action and the reward function; and S6, solving the optimization problem with a deep reinforcement learning algorithm. By jointly designing three elements, namely the flight path of the unmanned aerial vehicle in the wireless self-powered communication network, the selection of ground devices, and the communication mode with those devices, the invention supplies energy to multiple ground devices while also maximizing the average data volume of those devices in the wireless self-powered communication network.
Description
Technical Field
The invention belongs to the technical field of unmanned aerial vehicle-powered communication networks, and particularly relates to an unmanned aerial vehicle autonomous navigation and resource scheduling method for a wireless self-powered communication network.
Background
Wireless Sensor Networks (WSNs) can be used to collect information about the surrounding environment. In general, the devices in a wireless sensor network have limited power, and when a device's power is exhausted the sensor is recharged manually or through a conventional terrestrial communication network, so the charging efficiency is low. Radio Frequency (RF) based Energy Harvesting (EH) is regarded as a promising solution for extending the useful life of such energy-limited sensor devices. Wireless Power Transfer (WPT) through RF radiation can provide a convenient, reliable energy supply for low-power Internet of Things devices; it can also operate over longer ranges and charge multiple wireless devices simultaneously, even while they are moving. Wireless Powered Communication Networks (WPCNs) have therefore been proposed. They integrate Wireless Power Transfer (WPT) and Wireless Information Transmission (WIT), providing a feasible solution for energy-constrained Internet of Things devices.
By virtue of their high mobility and low cost, Unmanned Aerial Vehicles (UAVs) can support better communication links between air and ground terminals, since signal blocking and shadowing effects are reduced. Compared with a conventional fixed base station, a UAV can provide a higher line-of-sight (LoS) channel probability and better connectivity by greatly shortening its distance to the user. As an aerial base station, the UAV can be used to overcome the user unfairness caused by the "doubly near-far" problem in conventional fixed-base-station wireless-powered networks, and it can improve the data rate by flexibly reducing the signal propagation distance between the UAV and the ground devices.
However, existing techniques assume that the positions of the ground devices are known in advance and do not address the UAV's energy transmission and data collection tasks in an unknown environment.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an unmanned aerial vehicle autonomous navigation and resource scheduling method for a wireless self-powered communication network. By jointly designing three elements, namely the flight trajectory of the unmanned aerial vehicle in the wireless self-powered communication network, the selection of ground devices, and the communication mode with those devices, the method supplies energy to multiple ground devices while maximizing the average data volume of the devices in the wireless self-powered communication network.
The purpose of the invention is realized by the following technical scheme: the unmanned aerial vehicle autonomous navigation and resource scheduling method of the wireless self-powered communication network comprises the following steps:
s1, determining a network model, a communication mode and a channel model;
s2, modeling downlink wireless power transmission and uplink wireless information transmission, and determining an optimized target expression and constraint conditions thereof;
s3, analyzing the optimization problem, and modeling the optimization problem as a Markov process;
s4, determining a network communication protocol and an unmanned aerial vehicle flight decision model;
s5, defining a neural network input state, an unmanned aerial vehicle output action and a reward function;
and S6, solving the optimization problem according to the deep reinforcement learning algorithm.
Furthermore, the network model consists of an unmanned aerial vehicle and a plurality of ground passive devices;
the communication mode is as follows: the unmanned aerial vehicle transmits energy to the ground passive device through the radio frequency link, and the ground passive device transmits data to the unmanned aerial vehicle through the harvested energy;
the channel model is a LoS channel.
Further, the step S2 specifically includes the following sub-steps:
s21, determining the energy harvested by the ground passive equipment for the downlink wireless power transmission;
s22, for uplink wireless information transmission, when the unmanned aerial vehicle selects a certain ground passive device for communication, determining an uplink transmission data volume;
and S23, determining an optimization target expression and a constraint condition thereof.
Further, the step S5 specifically includes the following sub-steps:
S51, determining a network state set: the network state is defined as S = {e_i(t), ζ_i, q(t), h_i(t)}, where e_i(t) represents the battery power level of the i-th passive device within the coverage area at time t, ζ_i represents the accumulated uploaded data volume of passive device i, q(t) represents the position of the unmanned aerial vehicle at time t, and h_i(t) represents the channel gain between passive device i and the unmanned aerial vehicle at time t;
S52, determining the output unmanned aerial vehicle action set A: A = {i, ρ(t), α(t), v_UAV(t)}, where i is the index of the selected passive device, ρ(t) represents the communication mode of the unmanned aerial vehicle, with ρ(t) = 1 representing the downlink transmission mode and ρ(t) = 0 representing the uplink transmission mode, α(t) represents the unmanned aerial vehicle steering angle, and v_UAV(t) represents the flight speed of the unmanned aerial vehicle;
S53, determining a reward mechanism: the reward function is defined as r = r_data + r_penalty, where r_data represents the variation of the average data volume over the network; whenever any constraint in the constraint conditions is not satisfied, the corresponding penalty r_penalty is applied; I denotes the total number of passive devices.
Further, the step S6 specifically includes the following sub-steps:
S61, initializing network parameters: initialize the values Q corresponding to all states and actions, initialize all parameters ω of the current neural network, set the target neural network parameters ω' = ω, and empty the experience replay set D;
S62, initialize s_t as the current state and obtain the feature vector φ(s_t) of the current state;
S63, use φ(s_t) as the input of the current neural network to obtain the values Q corresponding to all actions, and select the corresponding action a_t from the current values Q by an ε-greedy method;
S64, in state s_t, perform the current action a_t to obtain the new state s_{t+1}, the feature vector φ(s_{t+1}) corresponding to the new state and the reward r_t of the current state, and store the quadruple {φ(s_t), a_t, r_t, φ(s_{t+1})} into the experience replay set D;
S65, let t = t + 1 and s_t = s_{t+1}; judge whether the new state s_{t+1} is an end-of-flight state; if not, return to step S63; if so, continue to judge whether the iteration round count exceeds T; if it does, end the iteration, otherwise return to step S63;
S66, sample m samples {φ(s_j), a_j, r_j, φ(s_{j+1})}, j = 1, ..., m, from the experience replay set D, and calculate the current target state-action value y_j = r_j + γ max_{a_{j+1}} Q'(s_{j+1}, a_{j+1}; ω');
Q'(s_{j+1}, a_{j+1}; ω') represents the value of the next state, calculated by the target neural network;
S67, calculate the mean square error loss function L(ω) = (1/m) Σ_{j=1}^{m} (y_j − Q(φ(s_j), a_j; ω))², and update all parameters ω of the current neural network by gradient backpropagation so that the mean square error loss function is minimized; y_j denotes the value calculated for state s_j by the formula in S66, and Q(φ(s_j), a_j; ω) denotes the value output directly by the current neural network for state s_j;
S68, if t modulo the target neural network parameter update frequency equals 1, update the target neural network parameters ω' = ω; otherwise, do not update the target neural network parameters;
S69, update the coordinates of the unmanned aerial vehicle, calculate the battery power level of each passive device, accumulate the uploaded data volume of each passive device, and obtain the channel gain between each passive device and the unmanned aerial vehicle.
The invention has the beneficial effects that: by jointly designing three elements, namely the flight path of the unmanned aerial vehicle in the wireless self-powered communication network, the selection of ground devices, and the communication mode with those devices, the invention maximizes the average uplink transmission data volume of the system users. The optimization problem is solved with a deep reinforcement learning algorithm, and the system state is fed into a neural network that outputs the optimal action of the unmanned aerial vehicle. The invention fully accounts for the fact that the unmanned aerial vehicle has no prior knowledge of the positions of the ground devices; it supplies energy to multiple ground devices while also maximizing the average data volume of the devices in the wireless self-powered communication network.
Drawings
Fig. 1 is a flow chart of the autonomous navigation and resource scheduling method of the unmanned aerial vehicle of the present invention;
fig. 2 is a schematic diagram of a wireless self-powered communication network model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a deep reinforcement learning algorithm model according to the present invention.
Detailed Description
The technical scheme of the invention is further explained by combining the attached drawings.
As shown in fig. 1, the unmanned aerial vehicle autonomous navigation and resource scheduling method of the wireless self-powered communication network of the present invention comprises the following steps:
s1, determining a network model, a communication mode and a channel model;
the network model consists of an unmanned aerial vehicle and a plurality of ground passive devices; suppose that there is an unmanned aerial vehicle as the aerial base station in the WPCN network, and there are I passive (sensor) devices on the ground, which are recorded as Representing a two-dimensional space. The drone is destined to collect data for I passive devices in the area. In order to simplify the network model, the flying height of the unmanned aerial vehicle is assumed to be unchanged and fixed as H. The position of the unmanned aerial vehicle at time t is denoted as q (t) ═ x (t), y (t), and the flying speed is vUAV(t) the carrier signal transmission power of the unmanned aerial vehicle is PUAVChannel noise power of σ2At time t, the distance between the unmanned aerial vehicle and each passive device isWhere | · | | represents the euclidean distance between a pair of vectors, wiThe location of the ith passive device is indicated. The energy conversion efficiency coefficient of the passive equipment is eta, and the signal transmitting power is Ptr. A model of a communication network based on drones is shown in fig. 2.
The communication mode is as follows: the unmanned aerial vehicle transmits energy to the ground passive devices through a radio frequency link, and the ground passive devices transmit data to the unmanned aerial vehicle using the harvested energy; the unmanned aerial vehicle thus serves as both a transmitter of energy and a receiver of information. The ground passive devices adopt a "harvest-then-transmit" protocol, i.e. only after harvesting enough energy from the downlink radio frequency link of the unmanned aerial vehicle do they transmit data to it over the uplink. The total working time of the unmanned aerial vehicle is T, and at each time t the unmanned aerial vehicle determines a communication mode, denoted ρ(t) ∈ {0, 1}. Here ρ(t) = 1 represents the downlink transmission mode, in which the unmanned aerial vehicle broadcasts energy to the ground passive devices, and ρ(t) = 0 represents the uplink transmission mode, in which the unmanned aerial vehicle selects one passive device to receive its uploaded data information, with only one device allowed to upload at a time.
The channel model is a LoS channel. At time t the two-dimensional coordinates of the unmanned aerial vehicle are q(t) = (x(t), y(t)). Suppose there is a LoS channel between the unmanned aerial vehicle and the ground passive devices with a path loss exponent of 2. The channel gain between passive device i and the unmanned aerial vehicle at time t is then h_i(t) = β_0 / (||q(t) − w_i||² + H²), where β_0 represents the channel gain at a reference distance of 1 meter.
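For illustration only, a minimal Python sketch of this LoS channel model follows; the values of β_0 and H and the example positions are placeholders, not parameters from the patent:

```python
import numpy as np

BETA_0 = 1e-3   # placeholder channel gain at the 1 m reference distance
H = 10.0        # placeholder fixed UAV flight altitude in meters

def channel_gain(q_t, w_i, beta0=BETA_0, height=H):
    """LoS channel gain h_i(t) = beta0 / (||q(t) - w_i||^2 + H^2), path-loss exponent 2."""
    d_sq = np.sum((np.asarray(q_t, float) - np.asarray(w_i, float)) ** 2) + height ** 2
    return beta0 / d_sq

# Example: UAV above (0, 0), device at (30, 40) -> horizontal distance 50 m
h_i = channel_gain([0.0, 0.0], [30.0, 40.0])
```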
S2, modeling downlink wireless power transmission and uplink wireless information transmission, and determining an optimized target expression and constraint conditions thereof; the method specifically comprises the following steps:
S21, for the downlink wireless power transmission, determine the energy harvested by the ground passive devices to obtain an energy constraint condition. Assuming the unmanned aerial vehicle is in the downlink transmission mode, the corresponding power received by passive device i at time t is P_i^EH(t) = η P_UAV h_i(t), where P_UAV represents the transmit power of the unmanned aerial vehicle and η is the energy conversion efficiency coefficient of the passive device. Suppose that up to the current time the unmanned aerial vehicle has always been in the downlink communication mode; the battery energy on passive device i is then the harvested energy accumulated over that period. The remaining battery energy of each passive device is compared with an energy threshold: if it is greater than the threshold, the device's power level is defined as 1, otherwise as 0, so the battery power of all passive devices is discretized into high and low levels e_i(t) ∈ {0, 1}.
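A short sketch of the energy-harvesting and battery-discretization logic described in S21, assuming a per-slot harvested energy of η·P_UAV·h_i(t)·δ and an arbitrary energy threshold; all numeric values are illustrative:

```python
def harvested_energy(h_i_t, p_uav, eta, slot_duration):
    """Energy (joules) harvested by a passive device in one downlink WPT slot."""
    return eta * p_uav * h_i_t * slot_duration

def battery_level(battery_joules, energy_threshold):
    """Discretize the remaining battery energy into e_i(t) in {0, 1}."""
    return 1 if battery_joules > energy_threshold else 0

# Example with illustrative numbers: 20 consecutive downlink slots
battery = 0.0
for _ in range(20):
    battery += harvested_energy(h_i_t=4e-7, p_uav=1.0, eta=0.6, slot_duration=0.1)
e_i = battery_level(battery, energy_threshold=1e-6)
```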
S22, for the uplink wireless information transmission, when the unmanned aerial vehicle selects a ground passive device for communication, determine the uplink transmission data volume to obtain a quality-of-service constraint condition. Assume the unmanned aerial vehicle is in the uplink transmission mode and selects passive device i to transmit data to it; the throughput of passive device i at time t is then R_i(t) = B log₂(1 + P_tr h_i(t) / σ²), where B is the system bandwidth, P_tr is the transmit power of the passive device, and γ_0 = P_tr β_0 / σ² is the reference signal-to-noise ratio (SNR). Suppose that up to the current time passive device i has always been selected to send data to the unmanned aerial vehicle; the accumulated data volume uploaded by passive device i is then ζ_i, the sum of its throughput over the corresponding time slots.
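Correspondingly, a sketch of the uplink throughput and accumulated upload of S22, using the Shannon-rate form stated above; bandwidth, powers and slot length are placeholder values:

```python
import math

def uplink_rate(h_i_t, p_tr, noise_power, bandwidth):
    """Achievable uplink rate of the selected passive device at time t (bits/s)."""
    return bandwidth * math.log2(1.0 + p_tr * h_i_t / noise_power)

def accumulate_upload(zeta_i, h_i_t, p_tr, noise_power, bandwidth, slot_duration):
    """Accumulated data volume zeta_i after one uplink (WIT) slot."""
    return zeta_i + uplink_rate(h_i_t, p_tr, noise_power, bandwidth) * slot_duration

# Example with illustrative numbers only
zeta = accumulate_upload(zeta_i=0.0, h_i_t=4e-7, p_tr=0.01,
                         noise_power=1e-10, bandwidth=1e6, slot_duration=0.1)
```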
S23, determining an optimized target expression and a constraint condition thereof, wherein the target problem of the maximization of the average data volume of the system is as follows:
q(0)=q(T)
where P1 denotes the optimization problem P1, i.e. maximizing the average throughput of all devices by adjusting the UAV position, speed and communication mode; v̄ represents the average flight speed of the unmanned aerial vehicle and τ represents the current flight time of the unmanned aerial vehicle; q(0) represents the position of the unmanned aerial vehicle at time t = 0 and q(T) its position at time t = T, where T is the flight time specified in advance, so q(0) = q(T) expresses that the unmanned aerial vehicle must return to its starting position at time T; ζ_QoS represents the QoS constraint, i.e. the minimum amount of data to be uploaded by each sensor, which also implies that the unmanned aerial vehicle needs to traverse all sensors.
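For clarity, a small sketch evaluating the P1 objective (the average accumulated data volume) and the two constraints described above, the return condition q(0) = q(T) and the per-device QoS minimum; the tolerance and the sample numbers are illustrative assumptions:

```python
import numpy as np

def average_data_volume(zeta):
    """P1 objective: average accumulated uploaded data over all I devices."""
    return float(np.mean(zeta))

def constraints_satisfied(trajectory, zeta, zeta_qos, pos_tol=1.0):
    """Check the return condition q(0) = q(T) and the per-device QoS minimum."""
    start, end = np.asarray(trajectory[0], float), np.asarray(trajectory[-1], float)
    returns_home = np.linalg.norm(start - end) <= pos_tol
    qos_met = bool(np.all(np.asarray(zeta, float) >= zeta_qos))
    return returns_home and qos_met

# Example with illustrative numbers: three devices, square flight path
traj = [(0, 0), (50, 0), (50, 50), (0, 50), (0, 0)]
zeta = [2.3e6, 1.9e6, 2.8e6]            # accumulated bits per device
objective = average_data_volume(zeta)    # quantity that P1 maximizes
feasible = constraints_satisfied(traj, zeta, zeta_qos=1.0e6)
```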
S3, analyzing the optimization problem and modeling it as a Markov process; the Markov process is defined by the 4-tuple ⟨S, A, R, P⟩, where S is the set of states, A is the set of all possible actions, R is the reward obtained when an action is taken, and P represents the transition probability from one state to another. Specifically, the unmanned aerial vehicle, acting as the agent, observes the environment and obtains the state s_t ∈ S. At time t the unmanned aerial vehicle selects an action a_t ∈ A and then, according to the observation and the next state s_{t+1}, obtains a reward r_t ∈ R.
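The transitions of this Markov process are exactly what the experience replay set D later stores; a minimal sketch of such a transition record and replay memory follows (the capacity value is an assumption):

```python
import random
from collections import deque, namedtuple

# One observed step of the Markov process: (s_t, a_t, r_t, s_{t+1}, done)
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"])

class ReplayBuffer:
    """Fixed-capacity experience replay set D (capacity is an assumed value)."""
    def __init__(self, capacity=10000):
        self.memory = deque(maxlen=capacity)

    def push(self, *args):
        self.memory.append(Transition(*args))

    def sample(self, batch_size):
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)
```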
S4, determining a network communication protocol and an unmanned aerial vehicle flight decision model; in order to solve the problem that the unmanned aerial vehicle does not have a priori knowledge of the position of the passive device, a coverage area is defined for the unmanned aerial vehicle, and only the passive device in the coverage area can communicate with the unmanned aerial vehicle. When the drone is in WPT mode, the drone broadcasts energy to all passive devices in the coverage area. At the end of the time slot, the passive device receiving the energy will send a short beacon status message to the drone, including battery power, channel information and accumulated data volume. In the next time slot, the drone will determine the next action, i.e. steering angle, passive device selection and communication mode, based on the received status information of some passive devices. In the flight process, the coverage area of the unmanned aerial vehicle can change, and the unmanned aerial vehicle can automatically navigate to the optimal position to receive more passive equipment state information, improve the average data volume to the greatest extent, and reasonably plan the flight path while meeting the energy constraint of the passive equipment.
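A schematic sketch of one decision slot under this protocol, with the coverage test and a simple kinematic position update; the coverage radius and the interpretation of the steering angle as an absolute heading are assumptions made only for illustration:

```python
import numpy as np

COVERAGE_RADIUS = 60.0   # placeholder coverage radius on the ground plane (assumption)

def covered_devices(q_t, device_positions, radius=COVERAGE_RADIUS):
    """Indices of the passive devices currently inside the UAV's coverage area."""
    dists = np.linalg.norm(np.asarray(device_positions, float) - np.asarray(q_t, float), axis=1)
    return np.flatnonzero(dists <= radius)

def uav_step(q_t, action, slot_duration=1.0):
    """One decision slot: action = (i, rho, alpha_deg, v).

    i         -- index of the selected passive device (used when rho == 0)
    rho       -- 1: broadcast energy (WPT) to covered devices, 0: receive upload (WIT) from device i
    alpha_deg -- steering angle, treated here as an absolute heading (assumption)
    v         -- flight speed in m/s
    """
    i, rho, alpha_deg, v = action
    alpha = np.deg2rad(alpha_deg)
    q_next = np.asarray(q_t, float) + v * slot_duration * np.array([np.cos(alpha), np.sin(alpha)])
    mode = "WPT broadcast" if rho == 1 else f"WIT upload from device {i}"
    return q_next, mode
```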
S5, defining a neural network input state, an unmanned aerial vehicle output action and a reward function; the method is realized by the following steps:
S51, determining a network state set: the network state is defined as S = {e_i(t), ζ_i, q(t), h_i(t)}, where e_i(t) represents the battery power level of the i-th passive device within the coverage area at time t, ζ_i represents the accumulated uploaded data volume of passive device i, q(t) represents the position of the unmanned aerial vehicle at time t, and h_i(t) represents the channel gain between passive device i and the unmanned aerial vehicle at time t;
S52, determining the output unmanned aerial vehicle action set A: A = {i, ρ(t), α(t), v_UAV(t)}, where i is the index of the selected passive device, ρ(t) represents the communication mode of the unmanned aerial vehicle, with ρ(t) = 1 representing the downlink transmission mode and ρ(t) = 0 representing the uplink transmission mode; α(t) represents the steering angle of the unmanned aerial vehicle, α(t) ∈ {0°, 45°, 90°, 135°, 180°, 225°, 270°, 325°}; and v_UAV(t) represents the flight speed of the unmanned aerial vehicle, v_UAV(t) ∈ {0 m/s, 5 m/s, 10 m/s};
S53, determining a reward mechanism: the reward function is defined as r = r_data + r_penalty, where r_data represents the variation of the average data volume over the network; whenever any constraint in the constraint conditions is not satisfied, the corresponding penalty r_penalty is applied; I denotes the total number of passive devices.
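The following sketch shows one possible encoding of the state set of S51, the discrete action set of S52 and the reward of S53 for input to the neural network; the number of devices and the penalty magnitude are placeholders, not values fixed by the patent:

```python
import itertools
import numpy as np

NUM_DEVICES = 5          # placeholder I (total number of passive devices)
PENALTY = -1.0           # placeholder magnitude of r_penalty

STEER_ANGLES = [0, 45, 90, 135, 180, 225, 270, 325]  # degrees, as listed in S52
SPEEDS = [0, 5, 10]                                   # m/s, as listed in S52
MODES = [0, 1]                                        # 0: uplink WIT, 1: downlink WPT

# Discrete action set A = {(i, rho(t), alpha(t), v_UAV(t))}
ACTIONS = list(itertools.product(range(NUM_DEVICES), MODES, STEER_ANGLES, SPEEDS))

def encode_state(e, zeta, q, h):
    """Flatten S = {e_i(t), zeta_i, q(t), h_i(t)} into the feature vector phi(s_t)."""
    return np.concatenate([np.asarray(e, float), np.asarray(zeta, float),
                           np.asarray(q, float), np.asarray(h, float)])

def reward(delta_avg_data, constraint_violated):
    """r = r_data + r_penalty: change in average data volume plus a penalty on violation."""
    return delta_avg_data + (PENALTY if constraint_violated else 0.0)
```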
S6, solving the optimization problem according to a deep reinforcement learning algorithm;
As shown in fig. 3, the deep reinforcement learning algorithm obtains the best policy π that maximizes the long-term expected cumulative reward. The expected cumulative reward for each state-action pair output by the neural network may be defined as Q(s, a) = E[Σ_k γ^k r_{t+k} | s_t = s, a_t = a], where γ represents the discount factor. By selecting the best action a* = argmax_a Q(s, a), an optimal action-value function can be obtained through the update Q(s, a) ← Q(s, a) + λ[r + γ max_{a'} Q(s', a') − Q(s, a)], where λ indicates the learning rate.
The deep reinforcement learning algorithm comprises two neural networks: a current neural network, used to calculate the Q value in the current state, and a target neural network, used to calculate the Q value in the next state.
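A minimal PyTorch sketch of this two-network arrangement, with a current Q-network and a structurally identical target network initialized from it; the layer widths and the state/action dimensions are placeholders consistent with the earlier sketches:

```python
import copy
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state feature vector phi(s) to one Q value per discrete action."""
    def __init__(self, state_dim, num_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state):
        return self.net(state)

# Placeholders: 5 devices -> e(5) + zeta(5) + q(2) + h(5) = 17 features; 5*2*8*3 = 240 actions
q_net = QNetwork(state_dim=17, num_actions=240)       # current network, parameters omega
target_net = copy.deepcopy(q_net)                     # target network, omega' = omega
target_net.eval()
```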
Input: number of iteration rounds F, state feature dimension n, action set A, discount factor γ, exploration rate ε, learning rate λ, Q-network structure, mini-batch size m for batch gradient descent, and target Q-network parameter update frequency.
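These inputs can be collected into a small configuration record; all numeric values below are illustrative assumptions rather than parameters disclosed by the patent:

```python
from dataclasses import dataclass

@dataclass
class DQNConfig:
    """Hyperparameters of the deep Q-learning procedure (illustrative values)."""
    num_episodes: int = 500        # iteration round number F
    state_dim: int = 17            # state feature dimension n
    num_actions: int = 240         # size of the discretized action set A
    gamma: float = 0.95            # discount factor
    epsilon: float = 0.1           # exploration rate
    learning_rate: float = 1e-3    # learning rate
    batch_size: int = 64           # mini-batch size m
    target_update_freq: int = 100  # target Q-network parameter update frequency

cfg = DQNConfig()
```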
The method specifically comprises the following steps:
S61, initializing network parameters: initialize the values Q corresponding to all states and actions, initialize all parameters ω of the current neural network, set the target neural network parameters ω' = ω, and empty the experience replay set D;
S62, initialize s_t as the current state and obtain the feature vector φ(s_t) of the current state;
S63, use φ(s_t) as the input of the current neural network to obtain the values Q corresponding to all actions, and select the corresponding action a_t from the current values Q by an ε-greedy method;
S64, in state s_t, perform the current action a_t to obtain the new state s_{t+1}, the feature vector φ(s_{t+1}) corresponding to the new state and the reward r_t of the current state, and store the quadruple {φ(s_t), a_t, r_t, φ(s_{t+1})} into the experience replay set D;
S65, let t = t + 1 and s_t = s_{t+1}; judge whether the new state s_{t+1} is an end-of-flight state; if not, return to step S63; if so, continue to judge whether the iteration round count exceeds T; if it does, end the iteration, otherwise return to step S63;
S66, sample m samples {φ(s_j), a_j, r_j, φ(s_{j+1})}, j = 1, ..., m, from the experience replay set D, and calculate the current target state-action value y_j = r_j + γ max_{a_{j+1}} Q'(s_{j+1}, a_{j+1}; ω');
Q'(s_{j+1}, a_{j+1}; ω') represents the value of the next state and is calculated by the target neural network rather than the current neural network; this avoids generating the training targets with the very network being trained and thus avoids an excessively strong coupling.
y_j represents the target Q value computed by the above formula; it is obtained by calculation rather than being output directly by a neural network, and it plays the role of a "true" Q value for the state fed into the Q network. The aim of the invention is to train the neural network so that the Q value it outputs approximates the calculated target value y_j, minimizing the mean square error loss between the two, so that the neural network ultimately fits the target Q value well.
S67, calculate the mean square error loss function L(ω) = (1/m) Σ_{j=1}^{m} (y_j − Q(φ(s_j), a_j; ω))², and update all parameters ω of the current neural network by gradient backpropagation so that the mean square error loss function is minimized; y_j denotes the value calculated for state s_j by the formula in S66, and Q(φ(s_j), a_j; ω) denotes the value output directly by the current neural network for state s_j;
S68, if t modulo the target neural network parameter update frequency equals 1, update the target neural network parameters ω' = ω (that is, the target network parameters are refreshed once every update-frequency time steps); otherwise, do not update the target neural network parameters;
S69, update the coordinates of the unmanned aerial vehicle, calculate the battery power level of each passive device, accumulate the uploaded data volume of each passive device, and obtain the channel gain between each passive device and the unmanned aerial vehicle.
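Putting steps S61 to S69 together, the following condensed PyTorch sketch shows the overall deep Q-learning loop; the environment object env with its reset/step interface and every hyperparameter value are assumptions made for illustration and are not specified by the patent:

```python
import random
from collections import deque

import torch
import torch.nn as nn

GAMMA, EPSILON, LR = 0.95, 0.1, 1e-3            # discount factor, exploration rate, learning rate
BATCH, UPDATE_FREQ, EPISODES = 64, 100, 500      # placeholder hyperparameters

def train(env, q_net, target_net, num_actions):
    """Deep Q-learning loop following steps S61-S69 (environment interface assumed)."""
    optimizer = torch.optim.Adam(q_net.parameters(), lr=LR)
    replay = deque(maxlen=10000)                         # S61: experience replay set D
    loss_fn = nn.MSELoss()
    step = 0
    for _ in range(EPISODES):
        state = torch.as_tensor(env.reset(), dtype=torch.float32)   # S62: phi(s_t)
        done = False
        while not done:
            # S63: epsilon-greedy action selection from the current Q values
            if random.random() < EPSILON:
                action = random.randrange(num_actions)
            else:
                with torch.no_grad():
                    action = int(q_net(state).argmax())
            # S64: execute a_t, observe r_t and s_{t+1}, store the transition in D
            next_obs, reward, done = env.step(action)
            next_state = torch.as_tensor(next_obs, dtype=torch.float32)
            replay.append((state, action, reward, next_state, done))
            state = next_state
            step += 1
            if len(replay) >= BATCH:
                # S66: sample a mini-batch and form the target y_j from the target network
                batch = random.sample(replay, BATCH)
                s, a, r, s2, d = [torch.stack(x) if torch.is_tensor(x[0])
                                  else torch.as_tensor(x, dtype=torch.float32)
                                  for x in zip(*batch)]
                with torch.no_grad():
                    y = r + GAMMA * target_net(s2).max(dim=1).values * (1.0 - d)
                # S67: mean square error loss and gradient backpropagation on omega
                q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
                loss = loss_fn(q, y)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            # S68: periodically copy omega into the target network (omega' = omega)
            if step % UPDATE_FREQ == 0:
                target_net.load_state_dict(q_net.state_dict())
```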
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to assist the reader in understanding the principles of the invention and are to be construed as being without limitation to such specifically recited embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.
Claims (5)
1. The unmanned aerial vehicle autonomous navigation and resource scheduling method of the wireless self-powered communication network is characterized by comprising the following steps:
s1, determining a network model, a communication mode and a channel model;
s2, modeling downlink wireless power transmission and uplink wireless information transmission, and determining an optimized target expression and constraint conditions thereof;
s3, analyzing the optimization problem, and modeling the optimization problem as a Markov process;
s4, determining a network communication protocol and an unmanned aerial vehicle flight decision model;
s5, defining a neural network input state, an unmanned aerial vehicle output action and a reward function;
and S6, solving the optimization problem according to the deep reinforcement learning algorithm.
2. The unmanned aerial vehicle autonomous navigation and resource scheduling method of a wireless self-powered communication network of claim 1, wherein the network model consists of one unmanned aerial vehicle and a plurality of ground passive devices;
the communication mode is as follows: the unmanned aerial vehicle transmits energy to the ground passive device through the radio frequency link, and the ground passive device transmits data to the unmanned aerial vehicle through the harvested energy;
the channel model is a LoS channel.
3. The unmanned aerial vehicle autonomous navigation and resource scheduling method of a wireless self-powered communication network of claim 1, wherein the step S2 specifically comprises the following sub-steps:
s21, determining the energy harvested by the ground passive equipment for the downlink wireless power transmission;
s22, for uplink wireless information transmission, when the unmanned aerial vehicle selects a certain ground passive device for communication, determining an uplink transmission data volume;
and S23, determining an optimization target expression and a constraint condition thereof.
4. The unmanned aerial vehicle autonomous navigation and resource scheduling method of a wireless self-powered communication network of claim 1, wherein the step S5 is implemented by:
S51, determining a network state set: the network state is defined as S = {e_i(t), ζ_i, q(t), h_i(t)}, where e_i(t) represents the battery power level of the i-th passive device within the coverage area at time t, ζ_i represents the accumulated uploaded data volume of passive device i, q(t) represents the position of the unmanned aerial vehicle at time t, and h_i(t) represents the channel gain between passive device i and the unmanned aerial vehicle at time t;
S52, determining the output unmanned aerial vehicle action set A: A = {i, ρ(t), α(t), v_UAV(t)}, where i is the index of the selected passive device, ρ(t) represents the communication mode of the unmanned aerial vehicle, with ρ(t) = 1 representing the downlink transmission mode and ρ(t) = 0 representing the uplink transmission mode, α(t) represents the steering angle of the unmanned aerial vehicle, and v_UAV(t) represents the flight speed of the unmanned aerial vehicle;
S53, determining a reward mechanism: the reward function is defined as r = r_data + r_penalty, where r_data represents the variation of the average data volume over the network; whenever any constraint in the constraint conditions is not satisfied, the corresponding penalty r_penalty is applied; I denotes the total number of passive devices.
5. The unmanned aerial vehicle autonomous navigation and resource scheduling method of a wireless self-powered communication network of claim 1, wherein the step S6 specifically comprises the following sub-steps:
S61, initializing network parameters: initialize the values Q corresponding to all states and actions, initialize all parameters ω of the current neural network, set the target neural network parameters ω' = ω, and empty the experience replay set D;
S62, initialize s_t as the current state and obtain the feature vector φ(s_t) of the current state;
S63, use φ(s_t) as the input of the current neural network to obtain the values Q corresponding to all actions, and select the corresponding action a_t from the current values Q by an ε-greedy method;
S64, in state s_t, perform the current action a_t to obtain the new state s_{t+1}, the feature vector φ(s_{t+1}) corresponding to the new state and the reward r_t of the current state, and store the quadruple {φ(s_t), a_t, r_t, φ(s_{t+1})} into the experience replay set D;
S65, let t = t + 1 and s_t = s_{t+1}; judge whether the new state s_{t+1} is an end-of-flight state; if not, return to step S63; if so, continue to judge whether the iteration round count exceeds T; if it does, end the iteration, otherwise return to step S63;
S66, sample m samples {φ(s_j), a_j, r_j, φ(s_{j+1})}, j = 1, ..., m, from the experience replay set D, and calculate the current target state-action value y_j = r_j + γ max_{a_{j+1}} Q'(s_{j+1}, a_{j+1}; ω');
Q'(s_{j+1}, a_{j+1}; ω') represents the value of the next state, calculated by the target neural network;
S67, calculate the mean square error loss function L(ω) = (1/m) Σ_{j=1}^{m} (y_j − Q(φ(s_j), a_j; ω))², and update all parameters ω of the current neural network by gradient backpropagation so that the mean square error loss function is minimized; y_j denotes the value calculated for state s_j by the formula in S66, and Q(φ(s_j), a_j; ω) denotes the value output directly by the current neural network for state s_j;
S68, if t modulo the target neural network parameter update frequency equals 1, update the target neural network parameters ω' = ω; otherwise, do not update the target neural network parameters;
S69, update the coordinates of the unmanned aerial vehicle, calculate the battery power level of each passive device, accumulate the uploaded data volume of each passive device, and obtain the channel gain between each passive device and the unmanned aerial vehicle.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110582074.0A CN113255218B (en) | 2021-05-27 | 2021-05-27 | Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110582074.0A CN113255218B (en) | 2021-05-27 | 2021-05-27 | Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113255218A true CN113255218A (en) | 2021-08-13 |
CN113255218B CN113255218B (en) | 2022-05-31 |
Family
ID=77184662
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110582074.0A Active CN113255218B (en) | 2021-05-27 | 2021-05-27 | Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113255218B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114061589A (en) * | 2021-11-16 | 2022-02-18 | Sun Yat-sen University | Edge-side-coordinated multi-unmanned aerial vehicle autonomous navigation method
CN115470894A (en) * | 2022-10-31 | 2022-12-13 | National University of Defense Technology | Unmanned aerial vehicle knowledge model time-sharing calling method and device based on reinforcement learning
CN115766769A (en) * | 2022-10-25 | 2023-03-07 | Northwestern Polytechnical University | Wireless sensor network deployment method based on deep reinforcement learning
CN116502547A (en) * | 2023-06-29 | 2023-07-28 | Shenzhen University | Multi-unmanned aerial vehicle wireless energy transmission method based on graph reinforcement learning
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110428115A (en) * | 2019-08-13 | 2019-11-08 | Nanjing University of Science and Technology | Maximization system benefit method under dynamic environment based on deeply study
CN110488861A (en) * | 2019-07-30 | 2019-11-22 | Beijing University of Posts and Telecommunications | Unmanned plane track optimizing method, device and unmanned plane based on deeply study
WO2020079702A1 (en) * | 2018-10-18 | 2020-04-23 | Telefonaktiebolaget Lm Ericsson (Publ) | Formation flight of unmanned aerial vehicles |
CN111786713A (en) * | 2020-06-04 | 2020-10-16 | Dalian University of Technology | Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning
CN112468205A (en) * | 2020-01-09 | 2021-03-09 | Zhongshan Institute, University of Electronic Science and Technology of China | Backscatter secure communication method suitable for unmanned aerial vehicle
CN112711271A (en) * | 2020-12-16 | 2021-04-27 | Sun Yat-sen University | Autonomous navigation unmanned aerial vehicle power optimization method based on deep reinforcement learning
CN112817327A (en) * | 2020-12-30 | 2021-05-18 | Beihang University | Multi-unmanned aerial vehicle collaborative search method under communication constraint
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020079702A1 (en) * | 2018-10-18 | 2020-04-23 | Telefonaktiebolaget Lm Ericsson (Publ) | Formation flight of unmanned aerial vehicles |
CN110488861A (en) * | 2019-07-30 | 2019-11-22 | Beijing University of Posts and Telecommunications | Unmanned plane track optimizing method, device and unmanned plane based on deeply study
CN110428115A (en) * | 2019-08-13 | 2019-11-08 | Nanjing University of Science and Technology | Maximization system benefit method under dynamic environment based on deeply study
CN112468205A (en) * | 2020-01-09 | 2021-03-09 | Zhongshan Institute, University of Electronic Science and Technology of China | Backscatter secure communication method suitable for unmanned aerial vehicle
CN111786713A (en) * | 2020-06-04 | 2020-10-16 | Dalian University of Technology | Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning
CN112711271A (en) * | 2020-12-16 | 2021-04-27 | Sun Yat-sen University | Autonomous navigation unmanned aerial vehicle power optimization method based on deep reinforcement learning
CN112817327A (en) * | 2020-12-30 | 2021-05-18 | Beihang University | Multi-unmanned aerial vehicle collaborative search method under communication constraint
Non-Patent Citations (4)
Title |
---|
JIE HU et al.: "Joint Trajectory and Scheduling Design for UAV Aided Secure Backscatter Communications", IEEE Wireless Communications Letters, vol. 9, no. 12, 12 April 2020 (2020-04-12), pages 2168-2172, XP011824554, DOI: 10.1109/LWC.2020.3016174 *
KAI LI et al.: "Deep Reinforcement Learning for Real-Time Trajectory Planning in UAV Networks", 2020 International Wireless Communications and Mobile Computing (IWCMC), 27 July 2020 (2020-07-27), pages 958-963 *
WU YUNDI: "Research on Optimization of Information and Energy Transmission in UAV Communication Systems", China Masters' Theses Full-text Database, Engineering Science and Technology II, no. 8, 15 August 2019 (2019-08-15), pages 031-66 *
YANG KUN et al.: "Integrated Wireless Data and Energy Communication Networks and Joint Data-Energy Access Control Protocol Design", Journal of Jilin Normal University (Natural Science Edition), vol. 40, no. 1, 16 January 2019 (2019-01-16), pages 106-114 *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114061589A (en) * | 2021-11-16 | 2022-02-18 | Sun Yat-sen University | Edge-side-coordinated multi-unmanned aerial vehicle autonomous navigation method
CN114061589B (en) * | 2021-11-16 | 2023-05-26 | Sun Yat-sen University | Multi-unmanned aerial vehicle autonomous navigation method with cooperative end edges
CN115766769A (en) * | 2022-10-25 | 2023-03-07 | Northwestern Polytechnical University | Wireless sensor network deployment method based on deep reinforcement learning
CN115470894A (en) * | 2022-10-31 | 2022-12-13 | National University of Defense Technology | Unmanned aerial vehicle knowledge model time-sharing calling method and device based on reinforcement learning
CN116502547A (en) * | 2023-06-29 | 2023-07-28 | Shenzhen University | Multi-unmanned aerial vehicle wireless energy transmission method based on graph reinforcement learning
CN116502547B (en) * | 2023-06-29 | 2024-06-04 | Shenzhen University | Multi-unmanned aerial vehicle wireless energy transmission method based on graph reinforcement learning
Also Published As
Publication number | Publication date |
---|---|
CN113255218B (en) | 2022-05-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113255218B (en) | Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network | |
Zhan et al. | Energy minimization for cellular-connected UAV: From optimization to deep reinforcement learning | |
CN108880662B (en) | Wireless information and energy transmission optimization method based on unmanned aerial vehicle | |
CN110730028B (en) | Unmanned aerial vehicle-assisted backscatter communication device and resource allocation control method | |
CN114389679B (en) | Multi-antenna unmanned aerial vehicle sensing and transmission optimization method based on information age minimization | |
CN111988762A (en) | Energy efficiency maximum resource allocation method based on unmanned aerial vehicle D2D communication network | |
CN109743210A (en) | Unmanned plane network multi-user connection control method based on deeply study | |
CN115494732B (en) | Unmanned aerial vehicle track design and power distribution method based on near-end strategy optimization | |
Li et al. | Deep reinforcement learning for real-time trajectory planning in UAV networks | |
CN113776531B (en) | Multi-unmanned aerial vehicle autonomous navigation and task allocation algorithm of wireless self-powered communication network | |
Ni et al. | Optimal transmission control and learning-based trajectory design for UAV-assisted detection and communication | |
CN117119489A (en) | Deployment and resource optimization method of wireless energy supply network based on multi-unmanned aerial vehicle assistance | |
Cui et al. | Joint trajectory and power optimization for energy efficient UAV communication using deep reinforcement learning | |
Shi et al. | Age of information optimization with heterogeneous uavs based on deep reinforcement learning | |
Zhang et al. | Multi-objective optimization for UAV-enabled wireless powered IoT networks: an LSTM-based deep reinforcement learning approach | |
CN108337024B (en) | Large-scale MIMO system energy efficiency optimization method based on energy collection | |
Ouamri et al. | Joint Energy Efficiency and Throughput Optimization for UAV-WPT Integrated Ground Network using DDPG | |
Wei et al. | An energy efficient cooperation design for multi-UAVs enabled wireless powered communication networks | |
CN115412156B (en) | Urban monitoring-oriented satellite energy-carrying Internet of things resource optimal allocation method | |
Lyu et al. | Resource Allocation in UAV‐Assisted Wireless Powered Communication Networks for Urban Monitoring | |
Zhou et al. | Model-based machine learning for energy-efficient UAV placement | |
Chen et al. | Deep Reinforcement Learning Assisted UAV Trajectory and Resource Optimization for NOMA Networks | |
Lee et al. | Multi-Agent Deep Reinforcement Learning-Based Multi-UAV Path Planning for Wireless Data Collection and Energy Transfer | |
Khodaparast et al. | Deep reinforcement learning based data collection in IoT networks | |
CN113821909B (en) | Design method and device of space-sky wireless energy-carrying communication system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |