CN113255218A - Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network - Google Patents

Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network

Info

Publication number
CN113255218A
CN113255218A (application CN202110582074.0A; granted publication CN113255218B)
Authority
CN
China
Prior art keywords
unmanned aerial vehicle
network
neural network
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110582074.0A
Other languages
Chinese (zh)
Other versions
CN113255218B (en)
Inventor
胡杰
李雨婷
于秦
杨鲲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202110582074.0A
Publication of CN113255218A
Application granted
Publication of CN113255218B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 - Computer-aided design [CAD]
    • G06F 30/20 - Design optimisation, verification or simulation
    • G06F 30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04W - WIRELESS COMMUNICATION NETWORKS
    • H04W 16/00 - Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W 16/22 - Traffic simulation tools or models
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04W - WIRELESS COMMUNICATION NETWORKS
    • H04W 28/00 - Network traffic management; Network resource management
    • H04W 28/02 - Traffic management, e.g. flow control or congestion control
    • H04W 28/0226 - Traffic management, e.g. flow control or congestion control based on location or mobility
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04W - WIRELESS COMMUNICATION NETWORKS
    • H04W 28/00 - Network traffic management; Network resource management
    • H04W 28/02 - Traffic management, e.g. flow control or congestion control
    • H04W 28/06 - Optimizing the usage of the radio link, e.g. header compression, information sizing, discarding information
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04W - WIRELESS COMMUNICATION NETWORKS
    • H04W 4/00 - Services specially adapted for wireless communication networks; Facilities therefor
    • H04W 4/30 - Services specially adapted for particular environments, situations or purposes
    • H04W 4/38 - Services specially adapted for particular environments, situations or purposes for collecting sensor information
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2111/00 - Details relating to CAD techniques
    • G06F 2111/02 - CAD in a network environment, e.g. collaborative CAD or distributed simulation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2111/00 - Details relating to CAD techniques
    • G06F 2111/04 - Constraint-based CAD
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2111/00 - Details relating to CAD techniques
    • G06F 2111/08 - Probabilistic or stochastic CAD
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 - Road transport of goods or passengers
    • Y02T 10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses an unmanned aerial vehicle (UAV) autonomous navigation and resource scheduling method for a wireless self-powered communication network, comprising the following steps: S1, determining the network model, the communication mode, and the channel model; S2, modeling the downlink wireless power transfer and the uplink wireless information transmission, and determining the optimization objective expression and its constraints; S3, analyzing the optimization problem and modeling it as a Markov decision process; S4, determining the network communication protocol and the UAV flight decision model; S5, defining the neural network input state, the UAV output actions, and the reward function; and S6, solving the optimization problem with a deep reinforcement learning algorithm. By jointly designing three elements of the wireless self-powered communication network, namely the UAV flight path, the selection of ground devices, and the communication mode with those devices, the invention supplies energy to multiple ground devices while also maximizing the average data volume of the multiple ground devices in the network.

Description

Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network
Technical Field
The invention belongs to the technical field of UAV-powered communication networks, and particularly relates to an unmanned aerial vehicle autonomous navigation and resource scheduling method for a wireless self-powered communication network.
Background
Wireless Sensor Networks (WSNs) may be used to collect information about the surrounding environment. The devices in a wireless sensor network generally have limited power, and when a device's power is exhausted it must be recharged manually or through a conventional terrestrial communication network, which is inefficient. Radio Frequency (RF) based Energy Harvesting (EH) is a promising solution for extending the useful life of energy-limited sensor devices. Wireless Power Transfer (WPT) through RF radiation can provide a convenient, reliable energy supply for low-power Internet of Things devices; it can operate over longer ranges and can charge multiple wireless devices simultaneously, even while they are moving. Wireless Powered Communication Networks (WPCNs) have therefore been proposed: they integrate Wireless Power Transfer (WPT) with Wireless Information Transmission (WIT), providing a feasible solution for energy-constrained Internet of Things devices.
Unmanned Aerial Vehicles (UAVs), by virtue of their high mobility and low cost, can support better communication links between air and ground terminals because they suffer less signal blocking and shadowing. Compared with a conventional fixed base station, a UAV can greatly shorten its distance to the user, providing a higher line-of-sight (LoS) channel probability and better connectivity. Serving as an aerial base station, a UAV can overcome the user unfairness caused by the "doubly near-far" problem in conventional fixed-base-station wireless energy supply networks, and can improve the data rate by flexibly reducing the signal propagation distance between itself and the ground devices.
However, current techniques assume that the positions of the ground devices are known; they do not consider a UAV performing energy transmission and data collection tasks in an unknown environment.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an unmanned aerial vehicle autonomous navigation and resource scheduling method for a wireless self-powered communication network. By jointly designing three elements, namely the UAV flight trajectory, the selection of ground devices, and the communication mode with those devices, the method supplies energy to multiple ground devices while also maximizing the average data volume of the multiple devices in the wireless self-powered communication network.
The purpose of the invention is realized by the following technical scheme: the unmanned aerial vehicle autonomous navigation and resource scheduling method for a wireless self-powered communication network comprises the following steps:
S1, determining the network model, the communication mode, and the channel model;
S2, modeling the downlink wireless power transfer and the uplink wireless information transmission, and determining the optimization objective expression and its constraints;
S3, analyzing the optimization problem and modeling it as a Markov decision process;
S4, determining the network communication protocol and the UAV flight decision model;
S5, defining the neural network input state, the UAV output actions, and the reward function;
and S6, solving the optimization problem with a deep reinforcement learning algorithm.
Furthermore, the network model consists of one unmanned aerial vehicle and a plurality of ground passive devices;
the communication mode is as follows: the UAV transmits energy to the ground passive devices through the radio-frequency link, and the ground passive devices transmit data to the UAV using the harvested energy;
the channel model is a LoS (line-of-sight) channel.
Further, the step S2 specifically includes the following sub-steps:
S21, for downlink wireless power transfer, determining the energy harvested by the ground passive devices;
S22, for uplink wireless information transmission, determining the uplink transmission data volume when the UAV selects a certain ground passive device for communication;
and S23, determining the optimization objective expression and its constraints.
Further, the step S5 specifically includes the following sub-steps:
S51, determining the network state set: the network state is defined as $S = \{e_i(t), \zeta_i, q(t), h_i(t)\}$, where $e_i(t)$ denotes the battery power level of the $i$-th passive device within the coverage area at time $t$, $\zeta_i$ denotes the accumulated uploaded data volume of passive device $i$, $q(t)$ denotes the position of the UAV at time $t$, and $h_i(t)$ denotes the channel gain between passive device $i$ and the UAV at time $t$;
S52, determining the output UAV action set $A = \{i, \rho(t), \alpha(t), v_{UAV}(t)\}$, where $\rho(t)$ denotes the communication mode of the UAV, with $\rho(t) = 1$ the downlink transmission mode and $\rho(t) = 0$ the uplink transmission mode; $\alpha(t)$ denotes the UAV steering angle; and $v_{UAV}(t)$ denotes the UAV flight speed;
S53, determining the reward mechanism: the reward function is defined as $r = r_{data} + r_{penalty}$, where
$$r_{data} = \frac{1}{I}\sum_{i=1}^{I}\zeta_i(t) - \frac{1}{I}\sum_{i=1}^{I}\zeta_i(t-1)$$
denotes the variation of the network-average data volume, the corresponding penalty $r_{penalty}$ is applied whenever any of the constraints is violated, and $I$ denotes the total number of passive devices.
Further, the step S6 specifically includes the following sub-steps:
S61, initializing network parameters: initialize the value Q of every state-action pair, randomly initialize all parameters $\omega$ of the current neural network, set the target-network parameters $\omega' = \omega$, and empty the experience replay set $D$;
S62, initialize $s_t$ as the current state and obtain its feature vector $\phi(s_t)$;
S63, feed $\phi(s_t)$ into the current neural network to obtain the Q values of all actions, and select the action $a_t$ from these Q values with the $\epsilon$-greedy method;
S64, execute the current action $a_t$ in state $s_t$ to obtain the new state $s_{t+1}$, its feature vector $\phi(s_{t+1})$, and the reward $r_t$ of the current state; store the quadruple $\{\phi(s_t), a_t, r_t, \phi(s_{t+1})\}$ in the experience replay set $D$;
S65, let $t = t + 1$, so that $s_t = s_{t+1}$; judge whether the new state $s_{t+1}$ is the terminal state of the flight; if not, return to step S63; if so, judge further whether the number of iteration rounds exceeds $T$: if so, end the iteration, otherwise return to step S63;
S66, sample $m$ samples $\{\phi(s_j), a_j, r_j, \phi(s_{j+1})\}$, $j = 1, \dots, m$, from the experience replay set $D$ and compute the current target state-action value $y_j$ according to
$$y_j = \begin{cases} r_j, & s_{j+1} \text{ is terminal}, \\ r_j + \gamma \max_{a_{j+1}} Q'(\phi(s_{j+1}), a_{j+1}; \omega'), & \text{otherwise}, \end{cases}$$
where $Q'(s_{j+1}, a_{j+1}; \omega')$ denotes the value of the next state, computed by the target neural network;
S67, compute the mean square error loss function
$$L(\omega) = \frac{1}{m}\sum_{j=1}^{m}\left(y_j - Q(\phi(s_j), a_j; \omega)\right)^2$$
and update all parameters $\omega$ of the current neural network by gradient backpropagation so as to minimize it; $y_j$ is the value computed by the formula of S66, and $Q(\phi(s_j), a_j; \omega)$ is the value output directly by the current neural network in state $s_j$;
S68, if $t \bmod C = 1$, where $C$ is the target-network parameter update frequency, update the target-network parameters $\omega' = \omega$; otherwise do not update them;
S69, update the UAV coordinates, compute the battery power level of each passive device, accumulate each passive device's uploaded data volume, and obtain the channel gain between each passive device and the UAV.
The invention has the beneficial effects that: by jointly designing the UAV flight path, the selection of ground devices, and the communication mode with those devices in the wireless self-powered communication network, the invention maximizes the average uplink transmission data volume of the system users; the optimization is solved with a deep reinforcement learning algorithm, in which the system state is fed into a neural network that outputs the optimal UAV action. The invention fully accounts for the fact that the UAV has no prior knowledge of the ground devices' positions, supplies energy to multiple ground devices, and simultaneously maximizes the average data volume of the multiple devices in the wireless self-powered communication network.
Drawings
Fig. 1 is a flow chart of the autonomous navigation and resource scheduling method of the unmanned aerial vehicle of the present invention;
fig. 2 is a schematic diagram of a wireless self-powered communication network model according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the deep reinforcement learning algorithm model of the present invention.
Detailed Description
The technical scheme of the invention is further explained by combining the attached drawings.
As shown in fig. 1, the unmanned aerial vehicle autonomous navigation and resource scheduling method of the wireless self-powered communication network of the present invention includes the following steps:
s1, determining a network model, a communication mode and a channel model;
the network model consists of an unmanned aerial vehicle and a plurality of ground passive devices; suppose that there is an unmanned aerial vehicle as the aerial base station in the WPCN network, and there are I passive (sensor) devices on the ground, which are recorded as
Figure BDA0003086322640000041
Figure BDA0003086322640000042
Representing a two-dimensional space. The drone is destined to collect data for I passive devices in the area. In order to simplify the network model, the flying height of the unmanned aerial vehicle is assumed to be unchanged and fixed as H. The position of the unmanned aerial vehicle at time t is denoted as q (t) ═ x (t), y (t), and the flying speed is vUAV(t) the carrier signal transmission power of the unmanned aerial vehicle is PUAVChannel noise power of σ2At time t, the distance between the unmanned aerial vehicle and each passive device is
Figure BDA0003086322640000043
Where | · | | represents the euclidean distance between a pair of vectors, wiThe location of the ith passive device is indicated. The energy conversion efficiency coefficient of the passive equipment is eta, and the signal transmitting power is Ptr. A model of a communication network based on drones is shown in fig. 2.
The communication mode is as follows: the UAV transmits energy to the ground passive devices through the radio-frequency link, and the ground passive devices transmit data to the UAV using the harvested energy; the UAV thus serves as both the transmitter of energy and the receiver of information. The ground passive devices adopt a "harvest-then-transmit" protocol: only after harvesting enough energy from the UAV's downlink radio-frequency link do they transmit data to the UAV over the uplink. The total working time of the UAV is $T$, and at each time $t$ the UAV determines the communication mode, denoted $\rho(t) \in \{0, 1\}$. Here $\rho(t) = 1$ denotes the downlink transmission mode, in which the UAV broadcasts energy to the ground passive devices; $\rho(t) = 0$ denotes the uplink transmission mode, in which the UAV selects one passive device to receive its uploaded data, only one device being allowed to upload at a time.
The channel model is a LoS channel. At time $t$, the two-dimensional coordinates of the UAV are $q(t) = (x(t), y(t))$. Suppose there is a LoS channel between the UAV and the ground passive devices, with path-loss exponent 2. The channel gain between passive device $i$ and the UAV at time $t$ is then
$$h_i(t) = \frac{\beta_0}{d_i^2(t)} = \frac{\beta_0}{H^2 + \|q(t) - w_i\|^2},$$
where $\beta_0$ denotes the channel gain at a reference distance of 1 meter.
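As a concrete illustration, the distance and LoS channel-gain expressions above can be written as the following minimal Python sketch (the numeric values of H and β0 are assumed placeholders, not taken from the patent):

```python
import numpy as np

def channel_gain(q_uav, w_i, H=10.0, beta0=1e-3):
    """LoS channel gain between the UAV at horizontal position q_uav
    (fixed altitude H) and ground device i at w_i, with path-loss
    exponent 2: h_i(t) = beta0 / (H^2 + ||q(t) - w_i||^2)."""
    d_sq = H**2 + np.sum((np.asarray(q_uav) - np.asarray(w_i))**2)
    return beta0 / d_sq

# Example: UAV above the origin, device 50 m away on the ground plane.
h = channel_gain([0.0, 0.0], [30.0, 40.0])
```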
S2, modeling the downlink wireless power transfer and the uplink wireless information transmission, and determining the optimization objective expression and its constraints; specifically:
S21, for downlink wireless power transfer, determining the energy harvested by the ground passive devices to obtain the energy constraint. Assuming the UAV is in the downlink transmission mode, the power received by passive device $i$ at time $t$ is
$$P_i^{EH}(t) = \eta P_{UAV} h_i(t),$$
where $P_{UAV}$ denotes the transmit power of the UAV and $\eta$ is the energy conversion efficiency coefficient of the passive device. Suppose the UAV stays in the downlink communication mode over a set of time slots $\mathcal{T}_d$; the battery energy accumulated on passive device $i$ is then
$$E_i = \sum_{t \in \mathcal{T}_d} \eta P_{UAV} h_i(t).$$
The remaining battery energy of each passive device is compared with an energy threshold: if it exceeds the threshold, the device's power level is defined as 1, otherwise as 0, so that the battery power of every passive device is discretized into the two levels $e_i(t) \in \{0, 1\}$.
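A sketch of this downlink energy-harvesting step and the battery-level discretization follows (η, P_UAV, and the energy threshold are assumed placeholder values):

```python
def harvested_power(h_i, p_uav=1.0, eta=0.6):
    """Power received by passive device i in one downlink (WPT) slot:
    P_i^EH(t) = eta * P_UAV * h_i(t)."""
    return eta * p_uav * h_i

def battery_level(remaining_energy, threshold=1e-4):
    """Discretize the remaining battery energy into e_i(t) in {0, 1}:
    1 if the energy exceeds the threshold, otherwise 0."""
    return 1 if remaining_energy > threshold else 0
```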
S22, for uplink wireless information transmission, determining the uplink transmission data volume when the UAV selects a certain ground passive device for communication, to obtain the quality-of-service constraint. Assume the UAV is in the uplink transmission mode and selects passive device $i$ to transmit data; the throughput of passive device $i$ at time $t$ is
$$R_i(t) = B \log_2\!\left(1 + \frac{\gamma_0}{H^2 + \|q(t) - w_i\|^2}\right),$$
where $B$ is the system bandwidth, $P_{tr}$ is the transmit power of the passive device, and $\gamma_0 = P_{tr}\beta_0 / \sigma^2$ is the reference signal-to-noise ratio (SNR). Suppose passive device $i$ is selected to send data to the UAV over a set of time slots $\mathcal{T}_u$; the accumulated data volume uploaded by passive device $i$ is then
$$\zeta_i = \sum_{t \in \mathcal{T}_u} R_i(t).$$
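The uplink rate and the accumulated upload can be sketched in the same style (the bandwidth, transmit power, and noise power are assumed values):

```python
import numpy as np

def uplink_rate(h_i, bandwidth=1e6, p_tr=0.01, sigma2=1e-10):
    """Throughput of the selected passive device i in one uplink slot:
    R_i(t) = B * log2(1 + P_tr * h_i(t) / sigma^2)."""
    return bandwidth * np.log2(1.0 + p_tr * h_i / sigma2)

# The accumulated upload zeta_i of device i is the sum of uplink_rate(h_i(t))
# over the slots in which the UAV selected device i.
```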
S23, determining the optimization objective expression and its constraints. The problem of maximizing the system average data volume is
$$\text{(P1):}\quad \max_{i,\,\rho(t),\,\alpha(t),\,v_{UAV}(t)}\ \frac{1}{I}\sum_{i=1}^{I}\zeta_i$$
subject to
$$\bar{v} = \frac{1}{\tau}\sum_{t=1}^{\tau} v_{UAV}(t) \le v_{max},$$
$$q(0) = q(T),$$
$$\rho(t) \in \{0, 1\},\ \forall t,$$
$$e_i(t) \in \{0, 1\},\ \forall i, t,$$
$$\zeta_i \ge \zeta_{QoS},\ \forall i,$$
where P1 denotes the optimization problem, i.e., maximizing the average throughput of all devices by adjusting the UAV position, speed, and communication mode; $\bar{v}$ denotes the average flight speed of the UAV and $\tau$ the elapsed flight time; $q(0)$ denotes the position of the UAV at time $t = 0$ and $q(T)$ its position at time $t = T$, where $T$ is the flight time specified in advance, so that $q(0) = q(T)$ expresses the requirement that the UAV return to its home position at time $T$; and $\zeta_{QoS}$ is the quality-of-service (QoS) constraint, i.e., the minimum data volume each sensor must upload, which also requires the UAV to traverse all sensors.
S3, analyzing the optimization problem and modeling it as a Markov decision process. The Markov decision process is defined by the 4-tuple $\langle S, A, R, P \rangle$, where $S$ is the set of states, $A$ is the set of all possible actions, $R$ is the reward received when an action is taken, and $P$ denotes the transition probability from one state to another. Specifically, the UAV, acting as the agent, observes the environment and obtains the state $s_t \in S$; it selects the action $a_t \in A$ at time $t$ and then, according to its observation and the next state $s_{t+1}$, obtains the return $r_t \in R$.
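One step of this agent-environment interaction can be recorded as a simple tuple; a minimal sketch:

```python
from collections import namedtuple

# One interaction of the Markov decision process <S, A, R, P>: the UAV
# observes state s_t, takes action a_t, receives reward r_t, and moves to
# s_{t+1} according to the (unknown) transition probability P.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])
```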
S4, determining the network communication protocol and the UAV flight decision model. To handle the fact that the UAV has no prior knowledge of the passive devices' positions, a coverage area is defined for the UAV, and only passive devices inside the coverage area can communicate with it. When the UAV is in the WPT mode, it broadcasts energy to all passive devices in the coverage area. At the end of the time slot, each passive device that received energy sends a short beacon status message to the UAV, containing its battery power, channel information, and accumulated data volume. In the next time slot, the UAV decides its next action, i.e., steering angle, passive device selection, and communication mode, based on the received status information. As the UAV flies, its coverage area changes; the UAV can thus autonomously navigate to the best position to receive more status information from the passive devices, maximize the average data volume, and plan a reasonable flight path while satisfying the passive devices' energy constraints.
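The beacon status message described above might be laid out as follows (a hypothetical format; the patent does not specify the exact fields):

```python
from dataclasses import dataclass

@dataclass
class BeaconStatus:
    """Status beacon a covered passive device returns to the UAV at the
    end of a WPT slot; the field names are illustrative only."""
    device_id: int
    battery_level: int    # discretized e_i(t) in {0, 1}
    channel_gain: float   # h_i(t)
    uploaded_bits: float  # accumulated data volume zeta_i
```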
S5, defining a neural network input state, an unmanned aerial vehicle output action and a reward function; the method is realized by the following steps:
S51, determining the network state set: the network state is defined as $S = \{e_i(t), \zeta_i, q(t), h_i(t)\}$, where $e_i(t)$ denotes the battery power level of the $i$-th passive device within the coverage area at time $t$, $\zeta_i$ denotes the accumulated uploaded data volume of passive device $i$, $q(t)$ denotes the position of the UAV at time $t$, and $h_i(t)$ denotes the channel gain between passive device $i$ and the UAV at time $t$;
S52, determining the output UAV action set $A = \{i, \rho(t), \alpha(t), v_{UAV}(t)\}$, where $\rho(t)$ denotes the communication mode of the UAV, with $\rho(t) = 1$ the downlink transmission mode and $\rho(t) = 0$ the uplink transmission mode; $\alpha(t)$ denotes the UAV steering angle, $\alpha(t) \in \{0°, 45°, 90°, 135°, 180°, 225°, 270°, 315°\}$; and $v_{UAV}(t)$ denotes the UAV flight speed, $v_{UAV}(t) \in \{0\,\text{m/s}, 5\,\text{m/s}, 10\,\text{m/s}\}$;
S53, determining the reward mechanism: the reward function is defined as $r = r_{data} + r_{penalty}$, where
$$r_{data} = \frac{1}{I}\sum_{i=1}^{I}\zeta_i(t) - \frac{1}{I}\sum_{i=1}^{I}\zeta_i(t-1)$$
denotes the variation of the network-average data volume, the corresponding penalty $r_{penalty}$ is applied whenever any of the constraints is violated, and $I$ denotes the total number of passive devices.
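Assembling S51-S53 into code gives the following sketch of the state vector, the discrete action set, and the reward (the penalty magnitude is an assumed placeholder):

```python
import numpy as np

STEER_ANGLES = [0, 45, 90, 135, 180, 225, 270, 315]  # alpha(t), degrees
SPEEDS = [0.0, 5.0, 10.0]                            # v_UAV(t), m/s

def build_state(e, zeta, q, h):
    """Flatten S = {e_i(t), zeta_i, q(t), h_i(t)} into one input vector."""
    return np.concatenate([e, zeta, q, h]).astype(np.float32)

def reward(avg_data_prev, avg_data_now, num_violations, penalty=-1.0):
    """r = r_data + r_penalty: the change in the network-average data
    volume, plus a penalty for every violated constraint."""
    return (avg_data_now - avg_data_prev) + penalty * num_violations
```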
S6, solving the optimization problem with a deep reinforcement learning algorithm.
As shown in fig. 3, the deep reinforcement learning algorithm obtains the best policy $\pi$ that maximizes the long-term expected cumulative reward. The expected cumulative reward of each state-action pair output by the neural network can be defined as
$$Q_{\pi}(s_t, a_t) = \mathbb{E}_{\pi}\!\left[\sum_{k=0}^{\infty} \gamma^k r_{t+k} \,\middle|\, s_t, a_t\right],$$
where $\gamma$ denotes the discount factor. By selecting the best action
$$a_t^* = \arg\max_{a \in A} Q(s_t, a),$$
the optimal action-value function can be obtained from the update
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \lambda\left[r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t)\right],$$
where $\lambda$ denotes the learning rate.
The deep reinforcement learning algorithm comprises two neural networks: the current neural network, used to compute the Q value of the current state, and the target neural network, used to compute the Q value of the next state.
Input: number of iteration rounds $F$, state feature dimension $n$, action set $A$, discount factor $\gamma$, exploration rate $\epsilon$, learning rate $\lambda$, Q-network structure, mini-batch size $m$ for batch gradient descent, and target Q-network parameter update frequency $C$.
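A sketch of the Q-network structure and the current/target network pair, written with PyTorch (the layer widths and the example dimensions are arbitrary assumptions):

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state feature vector phi(s) to one Q value per discrete action."""
    def __init__(self, state_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, phi_s: torch.Tensor) -> torch.Tensor:
        return self.net(phi_s)

current_net = QNetwork(state_dim=16, num_actions=8)   # n = 16, |A| = 8 as examples
target_net = QNetwork(state_dim=16, num_actions=8)
target_net.load_state_dict(current_net.state_dict())  # omega' = omega
```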
The method specifically comprises the following steps:
S61, initializing network parameters: initialize the value Q of every state-action pair, randomly initialize all parameters $\omega$ of the current neural network, set the target-network parameters $\omega' = \omega$, and empty the experience replay set $D$;
S62, initialize $s_t$ as the current state and obtain its feature vector $\phi(s_t)$;
S63, feed $\phi(s_t)$ into the current neural network to obtain the Q values of all actions, and select the action $a_t$ from these Q values with the $\epsilon$-greedy method;
S64, execute the current action $a_t$ in state $s_t$ to obtain the new state $s_{t+1}$, its feature vector $\phi(s_{t+1})$, and the reward $r_t$ of the current state; store the quadruple $\{\phi(s_t), a_t, r_t, \phi(s_{t+1})\}$ in the experience replay set $D$;
S65, let $t = t + 1$, so that $s_t = s_{t+1}$; judge whether the new state $s_{t+1}$ is the terminal state of the flight; if not, return to step S63; if so, judge further whether the number of iteration rounds exceeds $T$: if so, end the iteration, otherwise return to step S63;
S66, sample $m$ samples $\{\phi(s_j), a_j, r_j, \phi(s_{j+1})\}$, $j = 1, \dots, m$, from the experience replay set $D$ and compute the current target state-action value $y_j$ according to
$$y_j = \begin{cases} r_j, & s_{j+1} \text{ is terminal}, \\ r_j + \gamma \max_{a_{j+1}} Q'(\phi(s_{j+1}), a_{j+1}; \omega'), & \text{otherwise}, \end{cases}$$
where $Q'(s_{j+1}, a_{j+1}; \omega')$ denotes the value of the next state, computed by the target neural network rather than the current one; this avoids training the current network against its own estimates and prevents an excessively strong coupling.
Here $y_j$ is the target Q value obtained by calculation from the formula above, not output directly by a neural network, whereas the Q value mentioned earlier is obtained by feeding the state directly into the Q network. The invention trains the current neural network so that its output Q value approximates the computed target $y_j$; minimizing the mean square error loss between the two ultimately makes the network fit the target Q value.
S67, compute the mean square error loss function
$$L(\omega) = \frac{1}{m}\sum_{j=1}^{m}\left(y_j - Q(\phi(s_j), a_j; \omega)\right)^2$$
and update all parameters $\omega$ of the current neural network by gradient backpropagation so as to minimize it; $y_j$ is the value computed by the formula of S66, and $Q(\phi(s_j), a_j; \omega)$ is the value output directly by the current neural network in state $s_j$;
S68, if $t \bmod C = 1$, where $C$ is the target-network parameter update frequency, update the target-network parameters $\omega' = \omega$ (that is, the target-network parameters are refreshed once every $C$ time steps); otherwise do not update them;
S69, update the UAV coordinates, compute the battery power level of each passive device, accumulate each passive device's uploaded data volume, and obtain the channel gain between each passive device and the UAV.
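Condensing steps S61-S68 into one procedure gives the sketch below. It reuses current_net and target_net from the earlier sketch, stores an explicit terminal flag with each quadruple so the two-case target formula of S66 can be applied, and uses placeholder hyperparameter values throughout:

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn.functional as F

GAMMA, EPSILON, BATCH, SYNC_EVERY = 0.95, 0.1, 32, 100
replay = deque(maxlen=10_000)  # experience replay set D
optimizer = torch.optim.Adam(current_net.parameters(), lr=1e-3)

def select_action(phi_s, num_actions):
    """Step S63: epsilon-greedy selection over the current network's Q values."""
    if random.random() < EPSILON:
        return random.randrange(num_actions)
    with torch.no_grad():
        return current_net(torch.as_tensor(phi_s)).argmax().item()

def train_step(t):
    """Steps S66-S68: sample a mini-batch, build targets, minimize the MSE
    loss, and periodically synchronize the target network."""
    if len(replay) < BATCH:
        return
    batch = random.sample(replay, BATCH)                       # step S66
    s = torch.as_tensor(np.stack([b[0] for b in batch]))
    a = torch.as_tensor([b[1] for b in batch], dtype=torch.int64)
    r = torch.as_tensor([b[2] for b in batch], dtype=torch.float32)
    s2 = torch.as_tensor(np.stack([b[3] for b in batch]))
    done = torch.as_tensor([b[4] for b in batch], dtype=torch.float32)
    with torch.no_grad():  # target y_j comes from the frozen target network
        y = r + GAMMA * target_net(s2).max(dim=1).values * (1.0 - done)
    q = current_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q, y)                                    # step S67
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if t % SYNC_EVERY == 1:                                    # step S68
        target_net.load_state_dict(current_net.state_dict())
```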
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to help the reader understand the principles of the invention, and that the scope of the invention is not limited to the specifically recited embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from its spirit, and these changes and combinations remain within the scope of the invention.

Claims (5)

1. An unmanned aerial vehicle autonomous navigation and resource scheduling method for a wireless self-powered communication network, characterized by comprising the following steps:
S1, determining the network model, the communication mode, and the channel model;
S2, modeling the downlink wireless power transfer and the uplink wireless information transmission, and determining the optimization objective expression and its constraints;
S3, analyzing the optimization problem and modeling it as a Markov decision process;
S4, determining the network communication protocol and the UAV flight decision model;
S5, defining the neural network input state, the UAV output actions, and the reward function;
and S6, solving the optimization problem with a deep reinforcement learning algorithm.
2. The unmanned aerial vehicle autonomous navigation and resource scheduling method of a wireless self-powered communication network of claim 1, wherein the network model consists of one unmanned aerial vehicle and a plurality of ground passive devices;
the communication mode is as follows: the UAV transmits energy to the ground passive devices through the radio-frequency link, and the ground passive devices transmit data to the UAV using the harvested energy;
the channel model is a LoS channel.
3. The unmanned aerial vehicle autonomous navigation and resource scheduling method of a wireless self-powered communication network of claim 1, wherein the step S2 specifically comprises the following sub-steps:
S21, for downlink wireless power transfer, determining the energy harvested by the ground passive devices;
S22, for uplink wireless information transmission, determining the uplink transmission data volume when the UAV selects a certain ground passive device for communication;
and S23, determining the optimization objective expression and its constraints.
4. The unmanned aerial vehicle autonomous navigation and resource scheduling method of a wireless self-powered communication network of claim 1, wherein the step S5 is implemented as follows:
S51, determining the network state set: the network state is defined as $S = \{e_i(t), \zeta_i, q(t), h_i(t)\}$, where $e_i(t)$ denotes the battery power level of the $i$-th passive device within the coverage area at time $t$, $\zeta_i$ denotes the accumulated uploaded data volume of passive device $i$, $q(t)$ denotes the position of the UAV at time $t$, and $h_i(t)$ denotes the channel gain between passive device $i$ and the UAV at time $t$;
S52, determining the output UAV action set $A = \{i, \rho(t), \alpha(t), v_{UAV}(t)\}$, where $\rho(t)$ denotes the communication mode of the UAV, with $\rho(t) = 1$ the downlink transmission mode and $\rho(t) = 0$ the uplink transmission mode; $\alpha(t)$ denotes the UAV steering angle; and $v_{UAV}(t)$ denotes the UAV flight speed;
S53, determining the reward mechanism: the reward function is defined as $r = r_{data} + r_{penalty}$, where
$$r_{data} = \frac{1}{I}\sum_{i=1}^{I}\zeta_i(t) - \frac{1}{I}\sum_{i=1}^{I}\zeta_i(t-1)$$
denotes the variation of the network-average data volume, the corresponding penalty $r_{penalty}$ is applied whenever any of the constraints is violated, and $I$ denotes the total number of passive devices.
5. The unmanned aerial vehicle autonomous navigation and resource scheduling method of a wireless self-powered communication network of claim 1, wherein the step S6 specifically comprises the following sub-steps:
S61, initializing network parameters: initialize the value Q of every state-action pair, randomly initialize all parameters $\omega$ of the current neural network, set the target-network parameters $\omega' = \omega$, and empty the experience replay set $D$;
S62, initialize $s_t$ as the current state and obtain its feature vector $\phi(s_t)$;
S63, feed $\phi(s_t)$ into the current neural network to obtain the Q values of all actions, and select the action $a_t$ from these Q values with the $\epsilon$-greedy method;
S64, execute the current action $a_t$ in state $s_t$ to obtain the new state $s_{t+1}$, its feature vector $\phi(s_{t+1})$, and the reward $r_t$ of the current state; store the quadruple $\{\phi(s_t), a_t, r_t, \phi(s_{t+1})\}$ in the experience replay set $D$;
S65, let $t = t + 1$, so that $s_t = s_{t+1}$; judge whether the new state $s_{t+1}$ is the terminal state of the flight; if not, return to step S63; if so, judge further whether the number of iteration rounds exceeds $T$: if so, end the iteration, otherwise return to step S63;
S66, sample $m$ samples $\{\phi(s_j), a_j, r_j, \phi(s_{j+1})\}$, $j = 1, \dots, m$, from the experience replay set $D$ and compute the current target state-action value $y_j$ according to
$$y_j = \begin{cases} r_j, & s_{j+1} \text{ is terminal}, \\ r_j + \gamma \max_{a_{j+1}} Q'(\phi(s_{j+1}), a_{j+1}; \omega'), & \text{otherwise}, \end{cases}$$
where $Q'(s_{j+1}, a_{j+1}; \omega')$ denotes the value of the next state, computed by the target neural network;
S67, compute the mean square error loss function
$$L(\omega) = \frac{1}{m}\sum_{j=1}^{m}\left(y_j - Q(\phi(s_j), a_j; \omega)\right)^2$$
and update all parameters $\omega$ of the current neural network by gradient backpropagation so as to minimize it; $y_j$ is the value computed by the formula of S66, and $Q(\phi(s_j), a_j; \omega)$ is the value output directly by the current neural network in state $s_j$;
S68, if $t \bmod C = 1$, where $C$ is the target-network parameter update frequency, update the target-network parameters $\omega' = \omega$; otherwise do not update them;
S69, update the UAV coordinates, compute the battery power level of each passive device, accumulate each passive device's uploaded data volume, and obtain the channel gain between each passive device and the UAV.
CN202110582074.0A 2021-05-27 2021-05-27 Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network Active CN113255218B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110582074.0A CN113255218B (en) 2021-05-27 2021-05-27 Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110582074.0A CN113255218B (en) 2021-05-27 2021-05-27 Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network

Publications (2)

Publication Number Publication Date
CN113255218A 2021-08-13
CN113255218B CN113255218B (en) 2022-05-31

Family

ID=77184662

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110582074.0A Active CN113255218B (en) 2021-05-27 2021-05-27 Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network

Country Status (1)

Country Link
CN (1) CN113255218B (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020079702A1 (en) * 2018-10-18 2020-04-23 Telefonaktiebolaget Lm Ericsson (Publ) Formation flight of unmanned aerial vehicles
CN110488861A (en) * 2019-07-30 2019-11-22 北京邮电大学 Unmanned plane track optimizing method, device and unmanned plane based on deeply study
CN110428115A (en) * 2019-08-13 2019-11-08 南京理工大学 Maximization system benefit method under dynamic environment based on deeply study
CN112468205A (en) * 2020-01-09 2021-03-09 电子科技大学中山学院 Backscatter secure communication method suitable for unmanned aerial vehicle
CN111786713A (en) * 2020-06-04 2020-10-16 大连理工大学 Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning
CN112711271A (en) * 2020-12-16 2021-04-27 中山大学 Autonomous navigation unmanned aerial vehicle power optimization method based on deep reinforcement learning
CN112817327A (en) * 2020-12-30 2021-05-18 北京航空航天大学 Multi-unmanned aerial vehicle collaborative search method under communication constraint

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JIE HU 等: "Joint Trajectory and Scheduling Design for UAV Aided Secure Backscatter Communications", 《IEEE WIRELESS COMMUNICATIONS LETTERS》, vol. 9, no. 12, 12 April 2020 (2020-04-12), pages 2168 - 2172, XP011824554, DOI: 10.1109/LWC.2020.3016174 *
KAI LI 等: "Deep Reinforcement Learning for Real-Time Trajectory Planning in UAV Networks", 《2020 INTERNATIONAL WIRELESS COMMUNICATIONS AND MOBILE COMPUTING (IWCMC)》, 27 July 2020 (2020-07-27), pages 958 - 963 *
伍芸荻: "Research on Optimization of Information and Energy Transmission in UAV Communication Systems", China Master's Theses Full-text Database, Engineering Science and Technology II, no. 8, 15 August 2019 (2019-08-15), pages 031-66 *
杨鲲 等: "Wireless Data and Energy Integrated Communication Networks and the Design of Their Joint Data-Energy Access Control Protocol", Journal of Jilin Normal University (Natural Science Edition), vol. 40, no. 1, 16 January 2019 (2019-01-16), pages 106-114 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114061589A (en) * 2021-11-16 2022-02-18 中山大学 Edge-side-coordinated multi-unmanned aerial vehicle autonomous navigation method
CN114061589B (en) * 2021-11-16 2023-05-26 中山大学 Multi-unmanned aerial vehicle autonomous navigation method with cooperative end edges
CN115766769A (en) * 2022-10-25 2023-03-07 西北工业大学 Wireless sensor network deployment method based on deep reinforcement learning
CN115470894A (en) * 2022-10-31 2022-12-13 中国人民解放军国防科技大学 Unmanned aerial vehicle knowledge model time-sharing calling method and device based on reinforcement learning
CN116502547A (en) * 2023-06-29 2023-07-28 深圳大学 Multi-unmanned aerial vehicle wireless energy transmission method based on graph reinforcement learning
CN116502547B (en) * 2023-06-29 2024-06-04 深圳大学 Multi-unmanned aerial vehicle wireless energy transmission method based on graph reinforcement learning

Also Published As

Publication number Publication date
CN113255218B (en) 2022-05-31

Similar Documents

Publication Publication Date Title
CN113255218B (en) Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network
Zhan et al. Energy minimization for cellular-connected UAV: From optimization to deep reinforcement learning
CN108880662B (en) Wireless information and energy transmission optimization method based on unmanned aerial vehicle
CN110730028B (en) Unmanned aerial vehicle-assisted backscatter communication device and resource allocation control method
CN114389679B (en) Multi-antenna unmanned aerial vehicle sensing and transmission optimization method based on information age minimization
CN111988762A (en) Energy efficiency maximum resource allocation method based on unmanned aerial vehicle D2D communication network
CN109743210A (en) Unmanned plane network multi-user connection control method based on deeply study
CN115494732B (en) Unmanned aerial vehicle track design and power distribution method based on near-end strategy optimization
Li et al. Deep reinforcement learning for real-time trajectory planning in UAV networks
CN113776531B (en) Multi-unmanned aerial vehicle autonomous navigation and task allocation algorithm of wireless self-powered communication network
Ni et al. Optimal transmission control and learning-based trajectory design for UAV-assisted detection and communication
CN117119489A (en) Deployment and resource optimization method of wireless energy supply network based on multi-unmanned aerial vehicle assistance
Cui et al. Joint trajectory and power optimization for energy efficient UAV communication using deep reinforcement learning
Shi et al. Age of information optimization with heterogeneous uavs based on deep reinforcement learning
Zhang et al. Multi-objective optimization for UAV-enabled wireless powered IoT networks: an LSTM-based deep reinforcement learning approach
CN108337024B (en) Large-scale MIMO system energy efficiency optimization method based on energy collection
Ouamri et al. Joint Energy Efficiency and Throughput Optimization for UAV-WPT Integrated Ground Network using DDPG
Wei et al. An energy efficient cooperation design for multi-UAVs enabled wireless powered communication networks
CN115412156B (en) Urban monitoring-oriented satellite energy-carrying Internet of things resource optimal allocation method
Lyu et al. Resource Allocation in UAV‐Assisted Wireless Powered Communication Networks for Urban Monitoring
Zhou et al. Model-based machine learning for energy-efficient UAV placement
Chen et al. Deep Reinforcement Learning Assisted UAV Trajectory and Resource Optimization for NOMA Networks
Lee et al. Multi-Agent Deep Reinforcement Learning-Based Multi-UAV Path Planning for Wireless Data Collection and Energy Transfer
Khodaparast et al. Deep reinforcement learning based data collection in IoT networks
CN113821909B (en) Design method and device of space-sky wireless energy-carrying communication system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant