CN113255218A - Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network - Google Patents


Info

Publication number
CN113255218A
CN113255218A (application CN202110582074.0A)
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
network
state
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110582074.0A
Other languages
Chinese (zh)
Other versions
CN113255218B (en)
Inventor
胡杰
李雨婷
于秦
杨鲲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202110582074.0A priority Critical patent/CN113255218B/en
Publication of CN113255218A publication Critical patent/CN113255218A/en
Application granted granted Critical
Publication of CN113255218B publication Critical patent/CN113255218B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00: Computer-aided design [CAD]
    • G06F 30/20: Design optimisation, verification or simulation
    • G06F 30/27: Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W 16/00: Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W 16/22: Traffic simulation tools or models
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W 28/00: Network traffic management; Network resource management
    • H04W 28/02: Traffic management, e.g. flow control or congestion control
    • H04W 28/0226: Traffic management, e.g. flow control or congestion control based on location or mobility
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W 28/00: Network traffic management; Network resource management
    • H04W 28/02: Traffic management, e.g. flow control or congestion control
    • H04W 28/06: Optimizing the usage of the radio link, e.g. header compression, information sizing, discarding information
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W 4/00: Services specially adapted for wireless communication networks; Facilities therefor
    • H04W 4/30: Services specially adapted for particular environments, situations or purposes
    • H04W 4/38: Services specially adapted for particular environments, situations or purposes for collecting sensor information
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2111/00: Details relating to CAD techniques
    • G06F 2111/02: CAD in a network environment, e.g. collaborative CAD or distributed simulation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2111/00: Details relating to CAD techniques
    • G06F 2111/04: Constraint-based CAD
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2111/00: Details relating to CAD techniques
    • G06F 2111/08: Probabilistic or stochastic CAD
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses an unmanned aerial vehicle autonomous navigation and resource scheduling method for a wireless self-powered communication network, comprising the following steps: S1, determining a network model, a communication mode and a channel model; S2, modeling downlink wireless power transmission and uplink wireless information transmission, and determining the optimization target expression and its constraints; S3, analyzing the optimization problem and modeling it as a Markov process; S4, determining a network communication protocol and an unmanned aerial vehicle flight decision model; S5, defining the neural network input state, the unmanned aerial vehicle output action and the reward function; and S6, solving the optimization problem according to a deep reinforcement learning algorithm. By jointly designing three parts, namely the flight path of the unmanned aerial vehicle in the wireless self-powered communication network, the selection of ground devices, and the communication mode with the ground devices, the invention supplies energy to multiple ground devices while also maximizing the average data volume of those devices in the wireless self-powered communication network.

Description

Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network
Technical Field
The invention belongs to the technical field of unmanned aerial vehicle energy supply communication networks, and particularly relates to an unmanned aerial vehicle autonomous navigation and resource scheduling method of a wireless self-powered communication network.
Background
Wireless Sensor Networks (WSNs) may be used to collect information about the surrounding environment. Generally, the power of devices in a wireless sensor network is limited, and when a device's power is exhausted the sensor must be recharged manually or through a conventional ground communication network, so the charging efficiency is low. Radio Frequency (RF) based Energy Harvesting (EH) is a promising solution for extending the useful life of energy-limited sensor devices. Wireless Power Transfer (WPT) through RF radiation can provide a convenient, reliable energy supply for low-power Internet-of-Things devices; it can operate over longer ranges and can charge multiple wireless devices simultaneously, even while they are moving. Wireless Powered Communication Networks (WPCNs) have therefore been proposed: they integrate Wireless Power Transmission (WPT) and Wireless Information Transmission (WIT), providing a feasible solution for energy-constrained Internet-of-Things devices.
By virtue of its high mobility and low cost, an Unmanned Aerial Vehicle (UAV) can support better communication links between air and ground terminals, owing to less signal blocking and fewer shadowing effects. Compared with a conventional fixed base station, it can provide a higher line-of-sight (LoS) channel probability and better connectivity by greatly shortening its distance to the user. As an aerial base station, the UAV can be used to overcome the user unfairness caused by the "doubly near-far" problem in conventional fixed-base-station wireless energy supply networks, and can improve the data rate by flexibly reducing the signal propagation distance between the UAV and the ground devices.
However, current techniques assume that the positions of the ground devices are known, and do not consider the energy transmission and data collection tasks of the unmanned aerial vehicle in an unknown environment.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an unmanned aerial vehicle autonomous navigation and resource scheduling method of a wireless self-powered communication network, which realizes energy supply to a plurality of ground devices and also maximizes the average data volume of the plurality of devices in the wireless self-powered communication network by jointly designing three parts, namely a flight track of an unmanned aerial vehicle, selection of ground devices and a communication mode with the ground devices in the wireless self-powered communication network.
The purpose of the invention is realized by the following technical scheme: the unmanned aerial vehicle autonomous navigation and resource scheduling method of the wireless self-powered communication network comprises the following steps:
s1, determining a network model, a communication mode and a channel model;
s2, modeling downlink wireless power transmission and uplink wireless information transmission, and determining an optimized target expression and constraint conditions thereof;
s3, analyzing the optimization problem, and modeling the optimization problem as a Markov process;
s4, determining a network communication protocol and an unmanned aerial vehicle flight decision model;
s5, defining a neural network input state, an unmanned aerial vehicle output action and a reward function;
and S6, solving the optimization problem according to a deep reinforcement learning algorithm.
Furthermore, the network model consists of an unmanned aerial vehicle and a plurality of ground passive devices;
the communication mode is as follows: the unmanned aerial vehicle transmits energy to the ground passive device through the radio frequency link, and the ground passive device transmits data to the unmanned aerial vehicle through the harvested energy;
the channel model is a Los channel.
Further, the step S2 specifically includes the following sub-steps:
s21, determining the energy harvested by the ground passive equipment for the downlink wireless power transmission;
s22, for uplink wireless information transmission, when the unmanned aerial vehicle selects a certain ground passive device for communication, determining an uplink transmission data volume;
and S23, determining an optimization target expression and a constraint condition thereof.
Further, the step S5 specifically includes the following sub-steps:
S51, determining a network state set: the network state is defined as S = {e_i(t), ζ_i, q(t), h_i(t)}, where e_i(t) denotes the battery power level of the i-th passive device within the coverage area at time t, ζ_i denotes the accumulated uploaded data volume of passive device i, q(t) denotes the position of the unmanned aerial vehicle at time t, and h_i(t) denotes the channel gain between passive device i and the unmanned aerial vehicle at time t;

S52, determining the output unmanned aerial vehicle action set A = {i, ρ(t), α(t), v_UAV(t)}, where ρ(t) denotes the communication mode of the drone, with ρ(t) = 1 the downlink transmission mode and ρ(t) = 0 the uplink transmission mode; α(t) denotes the unmanned aerial vehicle steering angle; v_UAV(t) denotes the flight speed of the drone;

S53, determining a reward mechanism: the reward function is defined as r = r_data + r_penalty, where

r_data = (1/I) Σ_{i=1}^{I} [ζ_i(t+1) - ζ_i(t)]

represents the change in the network average data volume; whenever any of the constraints is not satisfied, the corresponding penalty r_penalty is applied; I denotes the total number of passive devices.
Further, the step S6 specifically includes the following sub-steps:
S61, initializing network parameters: initialize the value Q for all states and actions, initialize all parameters ω of the current neural network, set the target neural network parameters ω' = ω, and empty the experience replay set D;

S62, initialize s_t as the current state and obtain the feature vector φ(s_t) of the current state;

S63, use φ(s_t) as the input of the current neural network to obtain the values Q for all actions, and select the corresponding action a_t from the current values Q with the ε-greedy method;

S64, execute the current action a_t in state s_t to obtain the new state s_{t+1}, the feature vector φ(s_{t+1}) corresponding to the new state, and the reward r_t of the current state; store the tuple {φ(s_t), a_t, r_t, φ(s_{t+1})} in the experience replay set D;

S65, let t = t + 1 and s_t = s_{t+1}; judge whether the new state s_{t+1} is a terminal state; if not, return to step S63; if so, further judge whether the number of completed iteration rounds exceeds the preset total, and if so end the iteration, otherwise return to step S63;

S66, sample m samples {φ(s_j), a_j, r_j, φ(s_{j+1})}, j = 1, ..., m, from the experience replay set D, and compute the current target state-action value y_j according to

y_j = r_j, if s_{j+1} is a terminal state; otherwise y_j = r_j + γ max_{a_{j+1}} Q'(φ(s_{j+1}), a_{j+1}; ω'),

where Q'(s_{j+1}, a_{j+1}; ω') denotes the value of the next state, computed by the target neural network;

S67, compute the mean squared error loss

L(ω) = (1/m) Σ_{j=1}^{m} (y_j - Q(φ(s_j), a_j; ω))²,

and update all parameters ω of the current neural network by gradient back-propagation so as to minimize the loss; y_j denotes the value computed in step S66 for state s_j, and Q(φ(s_j), a_j; ω) denotes the value directly output by the current neural network;

S68, if t mod (target neural network parameter update frequency) == 1, update the target neural network parameters ω' = ω; otherwise do not update them;

S69, update the coordinates of the unmanned aerial vehicle, compute the battery power level of each passive device, the accumulated uploaded data volume of each passive device, and the channel gain between each passive device and the unmanned aerial vehicle.
The invention has the following beneficial effects: by jointly designing three parts, namely the flight path of the unmanned aerial vehicle in the wireless self-powered communication network, the selection of ground devices, and the communication mode with the ground devices, the invention maximizes the average uplink transmission data volume of the system users; the optimization is solved by a deep reinforcement learning algorithm, with the system state input to a neural network that outputs the optimal action of the unmanned aerial vehicle. The invention fully considers the fact that the unmanned aerial vehicle has no prior knowledge of the positions of the ground devices, supplies energy to multiple ground devices, and at the same time maximizes the average data volume of the multiple devices in the wireless self-powered communication network.
Drawings
Fig. 1 is a flow chart of the autonomous navigation and resource scheduling method of the unmanned aerial vehicle of the present invention;
fig. 2 is a schematic diagram of a wireless self-powered communication network model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a deep reinforcement learning algorithm model according to the present invention.
Detailed Description
The technical scheme of the invention is further explained by combining the attached drawings.
As shown in fig. 1, the unmanned aerial vehicle autonomous navigation and resource scheduling method of the wireless self-powered communication network of the present invention includes the following steps:
s1, determining a network model, a communication mode and a channel model;
the network model consists of an unmanned aerial vehicle and a plurality of ground passive devices; suppose that there is an unmanned aerial vehicle as the aerial base station in the WPCN network, and there are I passive (sensor) devices on the ground, which are recorded as
Figure BDA0003086322640000041
Figure BDA0003086322640000042
Representing a two-dimensional space. The drone is destined to collect data for I passive devices in the area. In order to simplify the network model, the flying height of the unmanned aerial vehicle is assumed to be unchanged and fixed as H. The position of the unmanned aerial vehicle at time t is denoted as q (t) ═ x (t), y (t), and the flying speed is vUAV(t) the carrier signal transmission power of the unmanned aerial vehicle is PUAVChannel noise power of σ2At time t, the distance between the unmanned aerial vehicle and each passive device is
Figure BDA0003086322640000043
Where | · | | represents the euclidean distance between a pair of vectors, wiThe location of the ith passive device is indicated. The energy conversion efficiency coefficient of the passive equipment is eta, and the signal transmitting power is Ptr. A model of a communication network based on drones is shown in fig. 2.
The communication mode is as follows: the unmanned aerial vehicle transmits energy to the ground passive devices through a radio frequency link, and the ground passive devices send data to the unmanned aerial vehicle using the harvested energy. The drone serves both as a transmitter of energy and as a receiver of information. The ground passive devices adopt a "harvest-then-transmit" protocol: after harvesting enough energy from the downlink radio frequency link of the drone, a device transmits data to the drone through the uplink. The total working time of the drone is T, and at each time t the drone determines a communication mode, denoted ρ(t) ∈ {0, 1}. Here ρ(t) = 1 denotes the downlink transmission mode, in which the drone broadcasts energy to the ground passive devices; ρ(t) = 0 denotes the uplink transmission mode, in which the drone selects one passive device to receive its uploaded data information, and only one device is allowed to upload at a time.
The channel model is a line-of-sight (LoS) channel. At time t, the two-dimensional coordinates of the drone are q(t) = (x(t), y(t)). Suppose there is a LoS channel between the drone and the ground passive devices, with a path-loss exponent of 2. The channel gain between passive device i and the unmanned aerial vehicle at time t is

h_i(t) = β_0 / d_i²(t) = β_0 / (||q(t) - w_i||² + H²),

where β_0 denotes the channel gain at a reference distance of 1 meter.
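As a concrete reading of the LoS model above, the gain can be computed directly from the UAV and device positions. This is a minimal sketch; the function and parameter names are illustrative, not from the patent:

```python
def channel_gain(q, w, H, beta0):
    """LoS channel gain h_i(t) = beta0 / d_i(t)^2, with path-loss exponent 2.

    q: UAV horizontal position (x(t), y(t)); w: device position w_i;
    H: fixed flight altitude; beta0: gain at the 1 m reference distance.
    """
    d_squared = (q[0] - w[0]) ** 2 + (q[1] - w[1]) ** 2 + H ** 2  # squared 3-D distance
    return beta0 / d_squared
```

With the UAV hovering directly above a device at altitude H, the gain reduces to beta0 / H².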
S2, modeling downlink wireless power transmission and uplink wireless information transmission, and determining an optimized target expression and constraint conditions thereof; the method specifically comprises the following steps:
S21, for downlink wireless power transmission, determine the energy harvested by the ground passive devices, which yields the energy constraint. Assuming the unmanned aerial vehicle is in downlink transmission mode, the corresponding power received by passive device i at time t is

P_i(t) = η P_UAV h_i(t),

where P_UAV denotes the transmission power of the unmanned aerial vehicle and η is the energy conversion efficiency coefficient of the passive device. Assuming the unmanned aerial vehicle remains in downlink communication mode throughout a period T_D, the battery energy on passive device i is

E_i = Σ_{t ∈ T_D} η P_UAV h_i(t).
Compare the remaining battery power of each passive device with an energy threshold: if it is greater than the threshold, the power level of the passive device is defined as 1, otherwise as 0. The battery power of all passive devices is thus discretized into two levels, e_i(t) ∈ {0, 1}.
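The downlink harvesting model and the two-level battery discretization above can be sketched as follows (the threshold value and names are illustrative):

```python
def harvested_power(P_uav, eta, h):
    """Power harvested by device i in downlink mode: eta * P_UAV * h_i(t)."""
    return eta * P_uav * h

def battery_level(energy, threshold):
    """Discretize battery energy into e_i(t) in {0, 1}: 1 if above the threshold."""
    return 1 if energy > threshold else 0
```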
S22, for uplink wireless information transmission, when the unmanned aerial vehicle selects a ground passive device for communication, determine the uplink transmission data volume, which yields the quality-of-service constraint. Suppose the unmanned aerial vehicle is in uplink transmission mode and selects passive device i to transmit data; the throughput of passive device i at time t is

R_i(t) = B log₂(1 + P_tr h_i(t) / σ²),

where B is the system bandwidth, P_tr is the transmission power of the passive device, and γ_0 = P_tr β_0 / σ² is the reference signal-to-noise ratio (SNR). Assuming passive device i is selected to send data to the unmanned aerial vehicle throughout a period T_U, its accumulated uploaded data volume is

ζ_i = Σ_{t ∈ T_U} R_i(t).
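The uplink rate and its accumulation over a period can be sketched as a discrete-time sum (names and the slot length `dt` are illustrative):

```python
import math

def uplink_rate(B, P_tr, h, sigma_squared):
    """Uplink throughput R_i(t) = B * log2(1 + P_tr * h_i(t) / sigma^2)."""
    return B * math.log2(1.0 + P_tr * h / sigma_squared)

def accumulate(rates, dt):
    """Accumulated data zeta_i over an uplink period, as a discrete-time sum."""
    return sum(r * dt for r in rates)
```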
S23, determine the optimization target expression and its constraints. The target problem of maximizing the system average data volume is:

P1: max_{q(t), i, ρ(t), v_UAV(t)} (1/I) Σ_{i=1}^{I} ζ_i

subject to

v̄ = (1/τ) ∫₀^τ v_UAV(t) dt (average flight speed constraint),
q(0) = q(T),
ζ_i ≥ ζ_QoS, ∀ i ∈ {1, ..., I},
ρ(t) ∈ {0, 1}, e_i(t) ∈ {0, 1}, ∀ t,

where P1 denotes the optimization problem, i.e. maximizing the average throughput of all devices by adjusting the drone position, speed and communication mode; v̄ denotes the average flight speed of the unmanned aerial vehicle, and τ denotes the current flight time; q(0) denotes the position of the drone at time t = 0, q(T) its position at time t = T, where T is the pre-specified flight time, and q(0) = q(T) expresses that the drone must return to its home position at time T. ζ_QoS denotes the QoS constraint, i.e. the minimum data volume uploaded by each sensor, which also requires the drone to traverse all sensors.
S3, analyzing the optimization problem and modeling it as a Markov process. The Markov process is defined by the 4-tuple <S, A, R, P>, where S is the set of states, A is the set of all possible actions, R is the reward obtained when an action is taken, and P denotes the transition probability from one state to another. Specifically, the drone, acting as the agent, observes the environment and obtains the state s_t ∈ S. At time t the drone selects an action a_t ∈ A and then, based on the observation and the next state s_{t+1}, obtains the reward r_t ∈ R.
S4, determining a network communication protocol and an unmanned aerial vehicle flight decision model; in order to solve the problem that the unmanned aerial vehicle does not have a priori knowledge of the position of the passive device, a coverage area is defined for the unmanned aerial vehicle, and only the passive device in the coverage area can communicate with the unmanned aerial vehicle. When the drone is in WPT mode, the drone broadcasts energy to all passive devices in the coverage area. At the end of the time slot, the passive device receiving the energy will send a short beacon status message to the drone, including battery power, channel information and accumulated data volume. In the next time slot, the drone will determine the next action, i.e. steering angle, passive device selection and communication mode, based on the received status information of some passive devices. In the flight process, the coverage area of the unmanned aerial vehicle can change, and the unmanned aerial vehicle can automatically navigate to the optimal position to receive more passive equipment state information, improve the average data volume to the greatest extent, and reasonably plan the flight path while meeting the energy constraint of the passive equipment.
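The short beacon status message described above might carry fields like the following. This is a hypothetical sketch; the patent does not specify a wire format:

```python
from dataclasses import dataclass

@dataclass
class Beacon:
    """Status message a covered passive device sends at the end of a slot."""
    device_id: int
    battery_level: int    # e_i(t) in {0, 1}
    channel_gain: float   # h_i(t)
    uploaded_data: float  # accumulated zeta_i
```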
S5, defining a neural network input state, an unmanned aerial vehicle output action and a reward function; the method is realized by the following steps:
S51, determining a network state set: the network state is defined as S = {e_i(t), ζ_i, q(t), h_i(t)}, where e_i(t) denotes the battery power level of the i-th passive device within the coverage area at time t, ζ_i denotes the accumulated uploaded data volume of passive device i, q(t) denotes the position of the unmanned aerial vehicle at time t, and h_i(t) denotes the channel gain between passive device i and the unmanned aerial vehicle at time t;

S52, determining the output unmanned aerial vehicle action set A = {i, ρ(t), α(t), v_UAV(t)}, where ρ(t) denotes the communication mode of the drone, with ρ(t) = 1 the downlink transmission mode and ρ(t) = 0 the uplink transmission mode; α(t) denotes the steering angle of the drone, α(t) ∈ {0°, 45°, 90°, 135°, 180°, 225°, 270°, 315°}; v_UAV(t) denotes the flight speed of the drone, v_UAV(t) ∈ {0 m/s, 5 m/s, 10 m/s};
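The discrete action set A = {i, ρ(t), α(t), v_UAV(t)} can be enumerated as a Cartesian product, which is the form a DQN with one output per action requires. A sketch; the enumeration order is arbitrary:

```python
import itertools

ANGLES = [0, 45, 90, 135, 180, 225, 270, 315]  # steering angles alpha(t), degrees
SPEEDS = [0, 5, 10]                            # flight speeds v_UAV(t), m/s
MODES = [0, 1]                                 # rho(t): 0 uplink, 1 downlink

def build_action_space(num_devices):
    """All (device i, mode rho, angle alpha, speed v) combinations."""
    return list(itertools.product(range(num_devices), MODES, ANGLES, SPEEDS))
```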
S53, determining a reward mechanism: the reward function is defined as r = r_data + r_penalty, where

r_data = (1/I) Σ_{i=1}^{I} [ζ_i(t+1) - ζ_i(t)]

represents the change in the network average data volume; whenever any of the constraints is not satisfied, the corresponding penalty r_penalty is applied; I denotes the total number of passive devices.
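The reward r = r_data + r_penalty can be sketched as follows. The penalty magnitude is an assumption, not given in the patent:

```python
def reward(zeta_before, zeta_after, constraint_violated, penalty=-1.0):
    """r_data: change in the network-average uploaded data over I devices;
    r_penalty: applied whenever any constraint is violated (magnitude assumed)."""
    I = len(zeta_before)
    r_data = (sum(zeta_after) - sum(zeta_before)) / I
    r_penalty = penalty if constraint_violated else 0.0
    return r_data + r_penalty
```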
S6, solving the optimization problem according to a deep reinforcement learning algorithm;
As shown in fig. 3, the deep reinforcement learning algorithm obtains the best policy π that maximizes the long-term expected cumulative reward. The expected cumulative reward of each state-action pair output by the neural network can be defined as

Q^π(s, a) = E[ Σ_{k=0}^{∞} γ^k r_{t+k} | s_t = s, a_t = a ],

where γ denotes the discount factor. By selecting the best action

a* = argmax_{a ∈ A} Q(s, a),

the optimal action-value function can be obtained through the update

Q(s, a) ← Q(s, a) + λ ( r + γ max_{a'} Q(s', a') - Q(s, a) ),

where λ denotes the learning rate.
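The action-value update above, in tabular form, reads as follows. This is a didactic sketch; the patent approximates Q with a neural network instead of a table:

```python
def q_update(Q, s, a, r, s_next, actions, lam, gamma):
    """Q(s,a) <- Q(s,a) + lam * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + lam * (r + gamma * best_next - old)
    return Q[(s, a)]
```

Repeated calls with the same transition move Q(s, a) toward the bootstrapped target r + γ max_a' Q(s', a').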
The deep reinforcement learning algorithm comprises two neural networks, wherein one neural network is a current neural network and used for calculating a value Q in a current state, and the other neural network is a target neural network and used for calculating a value Q in a next state.
Input: the number of iteration rounds F, the state feature dimension n, the action set A, the discount factor γ, the exploration rate ε, the learning rate λ, the Q-network structure, the mini-batch size m for gradient descent, and the target Q-network parameter update frequency.
The method specifically comprises the following steps:
S61, initializing network parameters: initialize the value Q for all states and actions, initialize all parameters ω of the current neural network, set the target neural network parameters ω' = ω, and empty the experience replay set D;

S62, initialize s_t as the current state and obtain the feature vector φ(s_t) of the current state;

S63, use φ(s_t) as the input of the current neural network to obtain the values Q for all actions, and select the corresponding action a_t from the current values Q with the ε-greedy method;

S64, execute the current action a_t in state s_t to obtain the new state s_{t+1}, the feature vector φ(s_{t+1}) corresponding to the new state, and the reward r_t of the current state; store the tuple {φ(s_t), a_t, r_t, φ(s_{t+1})} in the experience replay set D;

S65, let t = t + 1 and s_t = s_{t+1}; judge whether the new state s_{t+1} is a terminal state; if not, return to step S63; if so, further judge whether the number of completed iteration rounds exceeds the preset total, and if so end the iteration, otherwise return to step S63;

S66, sample m samples {φ(s_j), a_j, r_j, φ(s_{j+1})}, j = 1, ..., m, from the experience replay set D, and compute the current target state-action value y_j according to

y_j = r_j, if s_{j+1} is a terminal state; otherwise y_j = r_j + γ max_{a_{j+1}} Q'(φ(s_{j+1}), a_{j+1}; ω'),

where Q'(s_{j+1}, a_{j+1}; ω') denotes the value of the next state, computed by the target neural network rather than the current neural network; this avoids training the network against its own output and keeps the coupling from becoming too strong.

y_j denotes the Q value computed by the above formula; it is obtained by calculation rather than output directly by a neural network. The aforementioned value Q, by contrast, is obtained by feeding a state directly into the Q-network. The aim of the invention is to train the neural network so that the value Q it outputs approximates the value y_j computed by the formula, minimizing the mean squared error between the two, so that the neural network ultimately reproduces the target value Q.

S67, compute the mean squared error loss

L(ω) = (1/m) Σ_{j=1}^{m} (y_j - Q(φ(s_j), a_j; ω))²,

and update all parameters ω of the current neural network by gradient back-propagation so as to minimize the loss; y_j denotes the value computed in step S66 for state s_j, and Q(φ(s_j), a_j; ω) denotes the value directly output by the current neural network;

S68, if t mod (target neural network parameter update frequency) == 1, update the target neural network parameters ω' = ω (that is, the target network parameters are updated once per update-frequency interval); otherwise do not update them;

S69, update the coordinates of the unmanned aerial vehicle, compute the battery power level of each passive device, the accumulated uploaded data volume of each passive device, and the channel gain between each passive device and the unmanned aerial vehicle.
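Steps S61-S68 can be condensed into the following sketch. A linear model stands in for the Q-network so the example stays self-contained; the replay set D, the ε-greedy choice, the target computation y_j, and the periodic sync ω' = ω mirror the steps above. All sizes and hyper-parameters are illustrative:

```python
import random
from collections import deque

import numpy as np

class LinearQ:
    """Linear stand-in for the Q-network: Q(s, .) = W s + b."""
    def __init__(self, n_state, n_action, rng):
        self.W = rng.normal(0.0, 0.1, (n_action, n_state))
        self.b = np.zeros(n_action)

    def q_values(self, s):
        return self.W @ s + self.b

    def copy_from(self, other):  # S68: target update omega' = omega
        self.W = other.W.copy()
        self.b = other.b.copy()

def epsilon_greedy(net, s, n_action, eps, rng):
    """S63: explore with probability eps, otherwise pick argmax_a Q(s, a)."""
    if rng.random() < eps:
        return int(rng.integers(n_action))
    return int(np.argmax(net.q_values(s)))

def train_step(online, target, batch, gamma, lr):
    """S66-S67: y_j = r_j (terminal) or r_j + gamma * max_a' Q'(s_{j+1}, a'; omega'),
    then one gradient step on the squared error (y_j - Q(s_j, a_j; omega))^2."""
    for s, a, r, s_next, done in batch:
        y = r if done else r + gamma * float(np.max(target.q_values(s_next)))
        err = float(online.q_values(s)[a]) - y  # d(loss)/dQ up to a factor of 2
        online.W[a] -= lr * err * s             # gradient step on the linear model
        online.b[a] -= lr * err

# Minimal usage mirroring S61-S67
buffer = deque(maxlen=10_000)                   # S61/S64: experience replay set D
rng = np.random.default_rng(0)
net, tgt = LinearQ(4, 6, rng), LinearQ(4, 6, rng)
tgt.copy_from(net)                              # S61: omega' = omega
s = rng.normal(size=4)                          # phi(s_t)
a = epsilon_greedy(net, s, 6, 0.1, rng)         # S63
buffer.append((s, a, 1.0, s, True))             # S64
train_step(net, tgt, random.sample(list(buffer), 1), 0.9, 0.05)  # S66-S67
```

Replacing `LinearQ` with a multi-layer network trained by back-propagation recovers the structure of fig. 3 without changing the surrounding loop.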
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to help the reader understand the principles of the invention, and that the scope of the invention is not limited to the specifically recited embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from its spirit, and such changes and combinations fall within the scope of the invention.

Claims (5)

1. An unmanned aerial vehicle autonomous navigation and resource scheduling method for a wireless self-powered communication network, characterized by comprising the following steps:
S1, determining the network model, the communication mode and the channel model;
S2, modeling the downlink wireless power transmission and the uplink wireless information transmission, and determining the optimization objective expression and its constraints;
S3, analyzing the optimization problem and modeling it as a Markov process;
S4, determining the network communication protocol and the UAV flight decision model;
S5, defining the neural network input state, the UAV output actions and the reward function;
S6, solving the optimization problem by a deep reinforcement learning algorithm.
2. The unmanned aerial vehicle autonomous navigation and resource scheduling method of a wireless self-powered communication network according to claim 1, characterized in that the network model consists of one unmanned aerial vehicle and a plurality of ground passive devices;
the communication mode is: the UAV transmits energy to the ground passive devices through radio-frequency links, and the ground passive devices send data to the UAV using the harvested energy;
the channel model is a LoS channel.
3. The unmanned aerial vehicle autonomous navigation and resource scheduling method of a wireless self-powered communication network according to claim 1, characterized in that step S2 specifically comprises the following sub-steps:
S21, for the downlink wireless power transmission, determining the energy harvested by the ground passive devices;
S22, for the uplink wireless information transmission, when the UAV selects a certain ground passive device for communication, determining the amount of uplink transmission data;
S23, determining the optimization objective expression and its constraints.
4. The unmanned aerial vehicle autonomous navigation and resource scheduling method of a wireless self-powered communication network according to claim 1, characterized in that step S5 is specifically implemented as follows:
S51, determining the network state set: the network state is defined as S = {e_i(t), ζ_i, q(t), h_i(t)}, where e_i(t) denotes the battery power level of the i-th passive device within the coverage area at time t, ζ_i denotes the accumulated data volume uploaded by passive device i, q(t) denotes the position of the UAV at time t, and h_i(t) denotes the channel gain between passive device i and the UAV at time t;
S52, determining the output UAV action set A = {i, ρ(t), α(t), v_UAV(t)}, where ρ(t) denotes the communication mode of the UAV, ρ(t) = 1 denoting the downlink transmission mode and ρ(t) = 0 the uplink transmission mode; α(t) denotes the steering angle of the UAV; and v_UAV(t) denotes the flight speed of the UAV;
S53, determining the reward mechanism: the reward function is defined as r = r_data + r_penalty, where

r_data = (1/I) Σ_{i=1}^{I} [ζ_i(t) − ζ_i(t−1)]

denotes the change in the average data volume of the network; once any of the constraints is not satisfied, the corresponding penalty r_penalty is applied; I denotes the total number of passive devices.
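The reward of step S53 can be sketched as follows. The penalty magnitude and the form of the constraint check are illustrative assumptions; the claim only states that a penalty r_penalty is applied whenever any constraint is violated, and that r_data reflects the change in the network's average data volume.

```python
# Sketch of the reward r = r_data + r_penalty from step S53.
# PENALTY is an assumed value; the patent does not specify it.
PENALTY = -1.0

def reward(zeta_prev, zeta_curr, constraints_ok):
    """zeta_prev / zeta_curr: accumulated uploaded data volumes of the
    I passive devices at the previous and current time step.
    r_data is the change in the network-average data volume; the
    penalty fires if any constraint is violated."""
    I = len(zeta_curr)
    r_data = sum(zeta_curr) / I - sum(zeta_prev) / I
    r_penalty = 0.0 if constraints_ok else PENALTY
    return r_data + r_penalty
```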
5. The unmanned aerial vehicle autonomous navigation and resource scheduling method of a wireless self-powered communication network according to claim 1, characterized in that step S6 specifically comprises the following sub-steps:
S61, initializing the network parameters: initializing the value Q for all state-action pairs, initializing all parameters ω of the current neural network, setting the target neural network parameters ω′ = ω, and emptying the experience replay set D;
S62, initializing s_t as the current state and obtaining the feature vector φ(s_t) of the current state;
S63, using φ(s_t) as input to the neural network to obtain the values Q corresponding to all actions in the current state, and selecting the corresponding action a_t by the ε-greedy method;
S64, executing the current action a_t in state s_t to obtain the new state s_{t+1}, the feature vector φ(s_{t+1}) corresponding to the new state, and the reward r_t of the current state, and storing the quadruple {φ(s_t), a_t, r_t, φ(s_{t+1})} in the experience replay set D;
S65, letting t = t + 1 so that s_t = s_{t+1}, and judging whether the new state s_{t+1} is a flight-termination state; if not, returning to step S63; if so, further judging whether the iteration round t + 1 is greater than T, ending the iteration if it is, and returning to step S63 otherwise;
S66, sampling m samples {φ(s_j), a_j, r_j, φ(s_{j+1})}, j = 1, ..., m, from the experience replay set D, and calculating the current target state-action value y_j according to the following formula:

y_j = r_j, if s_{j+1} is a termination state; y_j = r_j + γ·max_{a′} Q′(φ(s_{j+1}), a′; ω′), otherwise;

where Q′(s_{j+1}, a_{j+1}; ω′) represents the value of the next state, which is calculated by the target neural network;
S67, calculating the mean square error loss function

L(ω) = (1/m) Σ_{j=1}^{m} (y_j − Q(φ(s_j), a_j; ω))²

and updating all parameters ω of the neural network through gradient back-propagation so as to minimize the mean square error loss function; y_j denotes the value calculated in state s_j by the formula of S66, and Q(φ(s_j), a_j; ω) denotes the value directly output by the current neural network in state s_j;
S68, if t mod (target neural network parameter update frequency) = 1, updating the target neural network parameters ω′ = ω; otherwise, not updating the target neural network parameters;
S69, updating the UAV coordinates, calculating the battery power levels of the passive devices, the accumulated data volume uploaded by each passive device, and the channel gains between the passive devices and the UAV.
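The experience replay set D and the ε-greedy action selection used in steps S61 to S65 can be sketched as follows. The buffer capacity and the value of ε are illustrative assumptions; the claim fixes neither.

```python
import random
from collections import deque

# Illustrative values, not specified in the patent.
EPSILON, CAPACITY = 0.1, 10000

def epsilon_greedy(q_vals, epsilon=EPSILON):
    """S63: explore with probability epsilon, otherwise pick the
    action with the largest Q value."""
    if random.random() < epsilon:
        return random.randrange(len(q_vals))
    return max(range(len(q_vals)), key=lambda a: q_vals[a])

class ReplayBuffer:
    """S61/S64: bounded experience replay set D of quadruples
    (phi(s_t), a_t, r_t, phi(s_{t+1})); oldest entries are evicted."""
    def __init__(self, capacity=CAPACITY):
        self.data = deque(maxlen=capacity)
    def push(self, phi_s, a, r, phi_s1):
        self.data.append((phi_s, a, r, phi_s1))
    def sample(self, m):
        # S66: draw a minibatch of m stored transitions
        return random.sample(self.data, m)
```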
CN202110582074.0A 2021-05-27 2021-05-27 Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network Active CN113255218B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110582074.0A CN113255218B (en) 2021-05-27 2021-05-27 Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network


Publications (2)

Publication Number Publication Date
CN113255218A true CN113255218A (en) 2021-08-13
CN113255218B CN113255218B (en) 2022-05-31

Family

ID=77184662

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110582074.0A Active CN113255218B (en) 2021-05-27 2021-05-27 Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network

Country Status (1)

Country Link
CN (1) CN113255218B (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110428115A (en) * 2019-08-13 2019-11-08 南京理工大学 Maximization system benefit method under dynamic environment based on deeply study
CN110488861A (en) * 2019-07-30 2019-11-22 北京邮电大学 Unmanned plane track optimizing method, device and unmanned plane based on deeply study
WO2020079702A1 (en) * 2018-10-18 2020-04-23 Telefonaktiebolaget Lm Ericsson (Publ) Formation flight of unmanned aerial vehicles
CN111786713A (en) * 2020-06-04 2020-10-16 大连理工大学 A UAV network hovering position optimization method based on multi-agent deep reinforcement learning
CN112468205A (en) * 2020-01-09 2021-03-09 电子科技大学中山学院 Backscatter secure communication method suitable for unmanned aerial vehicle
CN112711271A (en) * 2020-12-16 2021-04-27 中山大学 Autonomous navigation unmanned aerial vehicle power optimization method based on deep reinforcement learning
CN112817327A (en) * 2020-12-30 2021-05-18 北京航空航天大学 Multi-unmanned aerial vehicle collaborative search method under communication constraint


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JIE HU 等: "Joint Trajectory and Scheduling Design for UAV Aided Secure Backscatter Communications", 《IEEE WIRELESS COMMUNICATIONS LETTERS》, vol. 9, no. 12, 12 April 2020 (2020-04-12), pages 2168 - 2172, XP011824554, DOI: 10.1109/LWC.2020.3016174 *
KAI LI 等: "Deep Reinforcement Learning for Real-Time Trajectory Planning in UAV Networks", 《2020 INTERNATIONAL WIRELESS COMMUNICATIONS AND MOBILE COMPUTING (IWCMC)》, 27 July 2020 (2020-07-27), pages 958 - 963 *
WU YUNDI: "Research on Optimization of Information and Energy Transmission in UAV Communication Systems", 《China Master's Theses Full-text Database, Engineering Science and Technology II》, no. 8, 15 August 2019 (2019-08-15), pages 031 - 66 *
YANG KUN 等: "Wireless Data-and-Energy Integrated Communication Networks and the Design of a Joint Data-Energy Access Control Protocol", 《Journal of Jilin Normal University (Natural Science Edition)》, vol. 40, no. 1, 16 January 2019 (2019-01-16), pages 106 - 114 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114061589A (en) * 2021-11-16 2022-02-18 中山大学 Edge-side-coordinated multi-unmanned aerial vehicle autonomous navigation method
CN114061589B (en) * 2021-11-16 2023-05-26 中山大学 Multi-unmanned aerial vehicle autonomous navigation method with cooperative end edges
CN114881287A (en) * 2022-04-06 2022-08-09 南京航空航天大学 Energy optimization method for industrial wireless chargeable sensor network
CN115766769A (en) * 2022-10-25 2023-03-07 西北工业大学 Wireless sensor network deployment method based on deep reinforcement learning
CN115470894A (en) * 2022-10-31 2022-12-13 中国人民解放军国防科技大学 Time-sharing call method and device for UAV knowledge model based on reinforcement learning
CN116113025A (en) * 2023-02-16 2023-05-12 中国科学院上海微系统与信息技术研究所 A trajectory design and power allocation method in UAV collaborative communication network
CN116502547A (en) * 2023-06-29 2023-07-28 深圳大学 Multi-unmanned aerial vehicle wireless energy transmission method based on graph reinforcement learning
CN116502547B (en) * 2023-06-29 2024-06-04 深圳大学 Multi-unmanned aerial vehicle wireless energy transmission method based on graph reinforcement learning

Also Published As

Publication number Publication date
CN113255218B (en) 2022-05-31

Similar Documents

Publication Publication Date Title
CN113255218B (en) Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network
Zhan et al. Energy minimization for cellular-connected UAV: From optimization to deep reinforcement learning
CN109743210B (en) Unmanned aerial vehicle network multi-user access control method based on deep reinforcement learning
CN110730028B (en) Unmanned aerial vehicle-assisted backscatter communication device and resource allocation control method
WO2020015214A1 (en) Optimization method for wireless information and energy transmission based on unmanned aerial vehicle
CN110380776B (en) Internet of things system data collection method based on unmanned aerial vehicle
CN115696211A (en) Unmanned aerial vehicle track self-adaptive optimization method based on information age
CN111526592B (en) Non-cooperative multi-agent power control method used in wireless interference channel
CN115494732B (en) A UAV trajectory design and power allocation method based on proximal strategy optimization
CN114942653B (en) Method, device and electronic device for determining flight strategy of unmanned swarm
CN113776531B (en) Multi-unmanned aerial vehicle autonomous navigation and task allocation algorithm of wireless self-powered communication network
Li et al. Deep reinforcement learning for real-time trajectory planning in UAV networks
CN117062182A (en) DRL-based wireless chargeable unmanned aerial vehicle data uploading path optimization method
CN117915375A (en) DDQN-based unmanned aerial vehicle track optimization method in data acquisition scene
CN117768987A (en) Unmanned aerial vehicle track planning and video transmission method based on cellular network
CN111182469B (en) An Energy Harvesting Network Time Allocation and UAV Trajectory Optimization Method
Ni et al. Optimal transmission control and learning-based trajectory design for UAV-assisted detection and communication
CN117376985B (en) Energy efficiency optimization method for multi-unmanned aerial vehicle auxiliary MEC task unloading under rice channel
CN118101034A (en) Optimization method of UAV-assisted communication system based on dynamic prediction of user location
CN116882270A (en) Multi-unmanned aerial vehicle wireless charging and edge computing combined optimization method and system based on deep reinforcement learning
CN116489610A (en) UAV-assisted wearable Internet of Things device charging and data processing method and system
CN116009590A (en) Distributed trajectory planning method, system, equipment and medium for UAV network
Mondal et al. Joint Trajectory, User-Association, and Power Control for Green UAV-Assisted Data Collection using Deep Reinforcement Learning
Chen et al. Proximal Policy Optimization-Based Anti-Jamming UAV-Assisted Data Collection
CN112087767B (en) HAP-UAV access network power control method based on minimized distortion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant