CN113255218A - Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network - Google Patents
Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network
- Publication number: CN113255218A
- Application number: CN202110582074.0A
- Authority: CN (China)
- Prior art keywords: unmanned aerial vehicle, network, state, neural network
- Legal status: Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W16/00—Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
- H04W16/22—Traffic simulation tools or models
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W28/00—Network traffic management; Network resource management
- H04W28/02—Traffic management, e.g. flow control or congestion control
- H04W28/0226—Traffic management, e.g. flow control or congestion control based on location or mobility
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W28/00—Network traffic management; Network resource management
- H04W28/02—Traffic management, e.g. flow control or congestion control
- H04W28/06—Optimizing the usage of the radio link, e.g. header compression, information sizing, discarding information
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/30—Services specially adapted for particular environments, situations or purposes
- H04W4/38—Services specially adapted for particular environments, situations or purposes for collecting sensor information
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2111/00—Details relating to CAD techniques
- G06F2111/02—CAD in a network environment, e.g. collaborative CAD or distributed simulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2111/00—Details relating to CAD techniques
- G06F2111/04—Constraint-based CAD
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2111/00—Details relating to CAD techniques
- G06F2111/08—Probabilistic or stochastic CAD
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses an unmanned aerial vehicle autonomous navigation and resource scheduling method for a wireless self-powered communication network, comprising the following steps: S1, determining a network model, a communication mode and a channel model; S2, modeling downlink wireless power transmission and uplink wireless information transmission, and determining the optimization target expression and its constraint conditions; S3, analyzing the optimization problem and modeling it as a Markov process; S4, determining a network communication protocol and an unmanned aerial vehicle flight decision model; S5, defining the neural network input state, the unmanned aerial vehicle output action and the reward function; and S6, solving the optimization problem with a deep reinforcement learning algorithm. By jointly designing three parts, namely the flight path of the unmanned aerial vehicle in the wireless self-powered communication network, the selection of ground devices, and the communication mode with those devices, the invention supplies energy to multiple ground devices while also maximizing the average data volume of those devices in the wireless self-powered communication network.
Description
Technical Field
The invention belongs to the technical field of unmanned aerial vehicle energy supply communication networks, and particularly relates to an unmanned aerial vehicle autonomous navigation and resource scheduling method of a wireless self-powered communication network.
Background
Wireless sensor networks (WSNs) can be used to collect information about the surrounding environment. The devices in a wireless sensor network generally have limited power, and when their power is exhausted the sensors must be recharged manually or through a conventional terrestrial communication network, which is inefficient. Radio frequency (RF) based energy harvesting (EH) is regarded as a promising solution for extending the useful life of energy-limited sensor devices. Wireless power transfer (WPT) via RF radiation can provide a convenient, reliable energy supply for low-power Internet of Things devices; it operates over longer ranges and can charge multiple wireless devices simultaneously, even while they are moving. Wireless powered communication networks (WPCNs), which integrate wireless power transmission (WPT) and wireless information transmission (WIT), have therefore been proposed and offer a feasible solution for energy-constrained Internet of Things devices.
Unmanned aerial vehicles (UAVs), by virtue of their high mobility and low cost, can support better communication links between air and ground terminals because they suffer less signal blockage and fewer shadowing effects. Compared with a conventional fixed base station, a UAV can greatly shorten its distance to the user, providing a higher line-of-sight (LoS) channel probability and better connectivity. Acting as an aerial base station, the UAV can overcome the user unfairness caused by the "doubly near-far" problem in conventional fixed-base-station wireless energy supply networks, and it can improve the data rate by flexibly reducing the signal propagation distance between itself and the ground devices.
However, existing techniques assume the positions of the ground devices are known and do not consider the UAV performing energy transmission and data collection tasks in an unknown environment.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing an unmanned aerial vehicle autonomous navigation and resource scheduling method for a wireless self-powered communication network. By jointly designing the flight trajectory of the unmanned aerial vehicle, the selection of ground devices, and the communication mode with those devices, the method supplies energy to multiple ground devices while also maximizing the average data volume of those devices in the wireless self-powered communication network.
The purpose of the invention is realized by the following technical scheme: the unmanned aerial vehicle autonomous navigation and resource scheduling method of the wireless self-powered communication network comprises the following steps:
s1, determining a network model, a communication mode and a channel model;
s2, modeling downlink wireless power transmission and uplink wireless information transmission, and determining an optimized target expression and constraint conditions thereof;
s3, analyzing the optimization problem, and modeling the optimization problem as a Markov process;
s4, determining a network communication protocol and an unmanned aerial vehicle flight decision model;
s5, defining a neural network input state, an unmanned aerial vehicle output action and a reward function;
and S6, solving the optimization problem with a deep reinforcement learning algorithm.
Furthermore, the network model consists of an unmanned aerial vehicle and a plurality of ground passive devices;
the communication mode is as follows: the unmanned aerial vehicle transmits energy to the ground passive device through the radio frequency link, and the ground passive device transmits data to the unmanned aerial vehicle through the harvested energy;
the channel model is a Los channel.
Further, the step S2 specifically includes the following sub-steps:
s21, determining the energy harvested by the ground passive equipment for the downlink wireless power transmission;
s22, for uplink wireless information transmission, when the unmanned aerial vehicle selects a certain ground passive device for communication, determining an uplink transmission data volume;
and S23, determining an optimization target expression and a constraint condition thereof.
Further, the step S5 specifically includes the following sub-steps:
S51, determining the network state set: the network state is defined as S = {e_i(t), ζ_i, q(t), h_i(t)}, where e_i(t) represents the battery power level of the i-th passive device within the coverage area at time t, ζ_i represents the accumulated data volume uploaded by passive device i, q(t) represents the position of the unmanned aerial vehicle at time t, and h_i(t) represents the channel gain between passive device i and the unmanned aerial vehicle at time t;
S52, determining the output unmanned aerial vehicle action set A = {i, ρ(t), α(t), v_UAV(t)}, where i is the selected passive device and ρ(t) represents the communication mode of the drone, with ρ(t) = 1 representing the downlink transmission mode and ρ(t) = 0 representing the uplink transmission mode; α(t) represents the steering angle of the drone; and v_UAV(t) represents the flight speed of the drone;
S53, determining the reward mechanism: the reward function is defined as r = r_data + r_penalty, where r_data represents the change in the average data volume of the network over the I passive devices, I denoting the total number of passive devices; whenever any constraint among the constraint conditions is not satisfied, the corresponding penalty r_penalty is applied.
Further, the step S6 specifically includes the following sub-steps:
S61, initializing network parameters: initializing the value Q corresponding to every state and action, initializing all parameters ω of the current neural network, setting the target neural network parameters ω' = ω, and emptying the experience replay set D;
S62, initializing s_t as the current state and obtaining its feature vector φ(s_t);
S63, using φ(s_t) as the input of the current neural network to obtain the Q values of all actions, and selecting the corresponding action a_t from the current Q values by an ε-greedy method;
S64, executing the current action a_t in state s_t to obtain the new state s_{t+1}, the feature vector φ(s_{t+1}) corresponding to the new state, and the reward r_t of the current state, and storing the quadruple {φ(s_t), a_t, r_t, φ(s_{t+1})} in the experience replay set D;
S65, letting t = t + 1 and s_t = s_{t+1}, and judging whether the new state s_{t+1} is a terminal state of the flight; if not, returning to step S63; if yes, further judging whether t + 1 is larger than the total time T; if yes, ending the iteration, otherwise returning to step S63;
S66, sampling m samples {φ(s_j), a_j, r_j, φ(s_{j+1})}, j = 1, ..., m, from the experience replay set D, and calculating the current target state-action value y_j = r_j + γ max_{a_{j+1}} Q'(s_{j+1}, a_{j+1}; ω'), where Q'(s_{j+1}, a_{j+1}; ω') represents the value of the next state and is calculated by the target neural network;
S67, calculating the mean square error loss function L(ω) = (1/m) Σ_{j=1}^{m} (y_j − Q(φ(s_j), a_j; ω))² and updating all parameters ω of the current neural network by gradient back-propagation so as to minimize this loss; y_j denotes the value calculated by the formula of S66 in state s_j, and Q(φ(s_j), a_j; ω) denotes the value directly output by the current neural network in state s_j;
S68, if t modulo the target neural network parameter update frequency equals 1, updating the target neural network parameters ω' = ω; otherwise, leaving the target neural network parameters unchanged;
S69, updating the coordinates of the unmanned aerial vehicle, calculating the battery power levels of the passive devices, accumulating the data volumes uploaded by the passive devices, and obtaining the channel gains between the passive devices and the unmanned aerial vehicle.
The invention has the following beneficial effects: by jointly designing the flight path of the unmanned aerial vehicle, the selection of ground devices, and the communication mode with those devices in the wireless self-powered communication network, the invention maximizes the average uplink data volume of the system users; the optimization is solved by a deep reinforcement learning algorithm, with the system state fed into a neural network that outputs the optimal action of the unmanned aerial vehicle. The invention fully accounts for the fact that the unmanned aerial vehicle has no prior knowledge of the positions of the ground devices, supplies energy to multiple ground devices, and at the same time maximizes the average data volume of those devices in the wireless self-powered communication network.
Drawings
Fig. 1 is a flow chart of the autonomous navigation and resource scheduling method of the unmanned aerial vehicle of the present invention;
fig. 2 is a schematic diagram of a wireless self-powered communication network model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a deep reinforcement learning algorithm model according to the present invention.
Detailed Description
The technical scheme of the invention is further explained by combining the attached drawings.
As shown in fig. 1, the unmanned aerial vehicle autonomous navigation and resource scheduling method of the wireless self-powered communication network of the present invention comprises the following steps:
s1, determining a network model, a communication mode and a channel model;
the network model consists of an unmanned aerial vehicle and a plurality of ground passive devices; suppose that there is an unmanned aerial vehicle as the aerial base station in the WPCN network, and there are I passive (sensor) devices on the ground, which are recorded as Representing a two-dimensional space. The drone is destined to collect data for I passive devices in the area. In order to simplify the network model, the flying height of the unmanned aerial vehicle is assumed to be unchanged and fixed as H. The position of the unmanned aerial vehicle at time t is denoted as q (t) ═ x (t), y (t), and the flying speed is vUAV(t) the carrier signal transmission power of the unmanned aerial vehicle is PUAVChannel noise power of σ2At time t, the distance between the unmanned aerial vehicle and each passive device isWhere | · | | represents the euclidean distance between a pair of vectors, wiThe location of the ith passive device is indicated. The energy conversion efficiency coefficient of the passive equipment is eta, and the signal transmitting power is Ptr. A model of a communication network based on drones is shown in fig. 2.
The communication mode is as follows: the unmanned aerial vehicle transmits energy to the ground passive devices through a radio frequency link, and the ground passive devices transmit data to the unmanned aerial vehicle using the harvested energy; the drone serves both as a transmitter of energy and as a receiver of information. The ground passive devices adopt a "harvest then transmit" protocol: only after harvesting enough energy from the downlink radio frequency link of the unmanned aerial vehicle do they transmit data to it over the uplink. The total working time of the unmanned aerial vehicle is T, and at each time t the unmanned aerial vehicle determines the communication mode, denoted ρ(t) ∈ {0, 1}. Here ρ(t) = 1 represents the downlink transmission mode, in which the unmanned aerial vehicle broadcasts energy to the ground passive devices; ρ(t) = 0 represents the uplink transmission mode, in which the drone selects one passive device to receive its uploaded data, with only one device allowed to upload at a time.
The channel model is a LoS channel. At time t, the two-dimensional coordinates of the drone are q(t) = (x(t), y(t)). Suppose the channel between the drone and a ground passive device is LoS with path loss exponent 2; the channel gain between passive device i and the unmanned aerial vehicle at time t is then h_i(t) = β₀ / (H² + ||q(t) − w_i||²), where β₀ represents the channel gain at a reference distance of 1 meter.
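To make the channel model concrete, the following minimal Python sketch evaluates h_i(t) from the reconstructed formula above; the values of H and β₀ and the example coordinates are placeholder assumptions, not figures from the patent.

```python
import numpy as np

H = 10.0       # fixed flying height in meters (assumed value)
BETA_0 = 1e-3  # channel gain at the 1 m reference distance (assumed value)

def channel_gain(q, w_i):
    """LoS channel gain h_i(t) = beta_0 / (H^2 + ||q(t) - w_i||^2),
    with path-loss exponent 2, as reconstructed above."""
    horizontal_dist_sq = float(np.sum((np.asarray(q) - np.asarray(w_i)) ** 2))
    return BETA_0 / (H ** 2 + horizontal_dist_sq)

# Example: UAV at (5, 5), passive device located at w_i = (20, 12).
print(channel_gain((5.0, 5.0), (20.0, 12.0)))
```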
S2, modeling downlink wireless power transmission and uplink wireless information transmission, and determining an optimized target expression and constraint conditions thereof; the method specifically comprises the following steps:
S21, for the downlink wireless power transmission, determining the energy harvested by the ground passive devices to obtain the energy constraint condition. Assuming the unmanned aerial vehicle is in the downlink transmission mode, the power received by passive device i at time t is P_i(t) = η P_UAV h_i(t), where P_UAV represents the transmit power of the unmanned aerial vehicle and η is the energy conversion efficiency coefficient of the passive device. Assuming the unmanned aerial vehicle stays in the downlink communication mode during the period considered, the battery energy on passive device i accumulates as the sum of this harvested power over time. The remaining battery energy of each passive device is compared with an energy threshold: if it exceeds the threshold its power level is defined as 1, otherwise as 0, so the battery power of every passive device is discretized into the high/low level e_i(t) ∈ {0, 1}.
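A short sketch of this downlink model, assuming placeholder values for η, P_UAV, the slot length and the energy threshold (the patent does not fix these numbers):

```python
ETA = 0.8           # energy conversion efficiency coefficient eta (assumed)
P_UAV = 1.0         # UAV transmit power in watts (assumed)
SLOT = 1.0          # slot duration in seconds (assumed)
E_THRESHOLD = 1e-6  # battery energy threshold in joules (assumed)

def harvest(battery_energy, h_i, rho):
    """Accumulate the received power eta * P_UAV * h_i(t) over downlink
    (rho = 1) slots; in uplink slots (rho = 0) nothing is harvested."""
    return battery_energy + rho * ETA * P_UAV * h_i * SLOT

def power_level(battery_energy):
    """Discretize the battery into the high/low level e_i(t) in {0, 1}."""
    return 1 if battery_energy > E_THRESHOLD else 0
```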
S22, for the uplink wireless information transmission, when the unmanned aerial vehicle selects a ground passive device for communication, determining the uplink transmission data volume to obtain the quality-of-service constraint condition. Suppose the unmanned aerial vehicle is in the uplink transmission mode and selects passive device i to transmit data to it; the throughput of passive device i at time t is R_i(t) = B log₂(1 + P_tr h_i(t) / σ²), where B is the system bandwidth, P_tr is the transmit power of the passive device, and γ₀ = P_tr β₀ / σ² is the reference signal-to-noise ratio (SNR). Assuming passive device i remains selected to send data to the unmanned aerial vehicle during the period considered, the accumulated data volume ζ_i uploaded by passive device i is the sum of R_i(t) over those time slots.
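Correspondingly, a sketch of the uplink model, with B, P_tr and σ² as assumed placeholders; channel_gain refers to the earlier sketch:

```python
import numpy as np

B = 1e6         # system bandwidth in Hz (assumed)
P_TR = 1e-3     # passive-device transmit power in watts (assumed)
SIGMA2 = 1e-10  # channel noise power in watts (assumed)

def uplink_rate(h_i):
    """Throughput R_i(t) = B * log2(1 + P_tr * h_i(t) / sigma^2)."""
    return B * np.log2(1.0 + P_TR * h_i / SIGMA2)

def accumulate_data(zeta_i, h_i, selected, rho):
    """Add R_i(t) to zeta_i only when the UAV is in uplink mode (rho = 0)
    and device i is the one selected to upload."""
    return zeta_i + (1 - rho) * (1 if selected else 0) * uplink_rate(h_i)
```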
S23, determining the optimization target expression and its constraint conditions. The target problem of maximizing the average data volume of the system is (P1): maximize, over the drone position, speed and communication mode, the average accumulated data volume (1/I) Σ_{i=1}^{I} ζ_i, subject to a limit on the average flight speed, the constraint q(0) = q(T), and the QoS constraint ζ_i ≥ ζ_QoS for every device. Here the average flight speed of the unmanned aerial vehicle is taken over τ, the current flight time of the unmanned aerial vehicle; q(0) represents the position of the drone at time t = 0 and q(T) its position at time t = T, with T the pre-specified flight time, so q(0) = q(T) expresses that the drone must return to its home position at time T. ζ_QoS is the constraint representing the QoS criterion, i.e. the minimum data volume uploaded by each sensor, which also forces the drone to traverse all sensors.
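Since the original expression of P1 was lost in extraction, the following LaTeX block is a hedged reconstruction from the surrounding description; the bound v_max on the average speed is an assumed name.

```latex
% Hedged reconstruction of (P1); v_max and the integral form of the
% average-speed constraint are assumptions, the rest follows the text.
\begin{aligned}
(\mathrm{P1}):\quad \max_{\{q(t),\,v_{UAV}(t),\,\rho(t),\,i\}}\quad
  & \frac{1}{I}\sum_{i=1}^{I}\zeta_i \\
\text{s.t.}\quad
  & \bar{v} = \frac{1}{\tau}\int_{0}^{\tau} v_{UAV}(t)\,\mathrm{d}t \le v_{\max},\\
  & q(0) = q(T),\\
  & \zeta_i \ge \zeta_{QoS},\qquad i = 1,\dots,I.
\end{aligned}
```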
S3, analyzing the optimization problem and modeling it as a Markov process; the Markov process is defined by the 4-tuple ⟨S, A, R, P⟩, where S is the set of states, A is the set of all possible actions, R is the reward obtained when an action is taken, and P represents the transition probability from one state to another. Specifically, the drone, acting as the agent, observes the environment and obtains the state s_t ∈ S. At time t the drone selects an action a_t ∈ A and then, according to the observation and the next state s_{t+1}, obtains the reward r_t ∈ R.
S4, determining the network communication protocol and the unmanned aerial vehicle flight decision model. To handle the fact that the unmanned aerial vehicle has no prior knowledge of the positions of the passive devices, a coverage area is defined for it, and only passive devices inside the coverage area can communicate with it. When the drone is in WPT mode, it broadcasts energy to all passive devices in the coverage area. At the end of the time slot, each passive device that received energy sends a short beacon status message to the drone containing its battery power, channel information and accumulated data volume. In the next time slot, the drone determines its next action, namely the steering angle, the passive device selection and the communication mode, based on the received status information. During flight the coverage area changes, and the drone autonomously navigates to the position where it can receive more passive-device status information, maximize the average data volume, and plan a reasonable flight path while satisfying the energy constraints of the passive devices.
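As a small illustration of this protocol, the beacon status message could be represented as below; the field names and types are assumptions, since the patent only names the three quantities carried.

```python
from dataclasses import dataclass

@dataclass
class BeaconStatus:
    device_id: int        # index i of the reporting passive device
    battery_level: int    # discretized e_i(t) in {0, 1}
    channel_gain: float   # h_i(t) for the current slot
    uploaded_data: float  # accumulated data volume zeta_i

# At the end of a WPT slot, every covered device reports such a message;
# the UAV assembles the reports into the next neural-network input state.
```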
S5, defining a neural network input state, an unmanned aerial vehicle output action and a reward function; the method is realized by the following steps:
S51, determining the network state set: the network state is defined as S = {e_i(t), ζ_i, q(t), h_i(t)}, where e_i(t) represents the battery power level of the i-th passive device within the coverage area at time t, ζ_i represents the accumulated data volume uploaded by passive device i, q(t) represents the position of the unmanned aerial vehicle at time t, and h_i(t) represents the channel gain between passive device i and the unmanned aerial vehicle at time t;
S52, determining the output unmanned aerial vehicle action set A = {i, ρ(t), α(t), v_UAV(t)}, where i is the selected passive device and ρ(t) represents the communication mode of the drone, with ρ(t) = 1 representing the downlink transmission mode and ρ(t) = 0 the uplink transmission mode; α(t) represents the steering angle of the drone, α(t) ∈ {0°, 45°, 90°, 135°, 180°, 225°, 270°, 315°}; and v_UAV(t) represents the flight speed of the drone, v_UAV(t) ∈ {0 m/s, 5 m/s, 10 m/s};
S53, determining the reward mechanism: the reward function is defined as r = r_data + r_penalty, where r_data represents the change in the average data volume of the network over the I passive devices, I denoting the total number of passive devices; whenever any constraint among the constraint conditions is not satisfied, the corresponding penalty r_penalty is applied.
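A minimal sketch of this reward, assuming a fixed penalty value (the patent does not state the magnitude of r_penalty):

```python
R_PENALTY = -1.0  # penalty magnitude is an assumption, not from the patent

def reward(avg_data_now, avg_data_prev, constraint_violated):
    """r = r_data + r_penalty, where r_data is the change in the
    network average data volume (1/I) * sum_i zeta_i between slots."""
    r_data = avg_data_now - avg_data_prev
    r_pen = R_PENALTY if constraint_violated else 0.0
    return r_data + r_pen
```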
S6, solving the optimization problem according to a deep reinforcement learning algorithm;
The deep reinforcement learning algorithm is shown in fig. 3; it obtains the best policy π that maximizes the long-term expected cumulative reward. The expected cumulative reward of each state-action pair output by the neural network may be defined as Q^π(s, a) = E[Σ_k γ^k r_{t+k} | s_t = s, a_t = a], where γ represents the discount factor. By selecting the best action a* = argmax_a Q(s, a), the optimal action-value function can be obtained through the update Q(s_t, a_t) ← Q(s_t, a_t) + δ [r_t + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t)], where δ denotes the learning rate.
The deep reinforcement learning algorithm comprises two neural networks: a current neural network used to calculate the Q value of the current state, and a target neural network used to calculate the Q value of the next state.
Input: the number of iteration rounds F, the state feature dimension n, the action set A, the discount factor γ, the exploration rate ε, the learning rate δ, the Q-network structure, the mini-batch size m for gradient descent, and the target Q-network parameter update frequency.
The method specifically comprises the following steps:
S61, initializing network parameters: initializing the value Q corresponding to every state and action, initializing all parameters ω of the current neural network, setting the target neural network parameters ω' = ω, and emptying the experience replay set D;
S62, initializing s_t as the current state and obtaining its feature vector φ(s_t);
S63, using φ(s_t) as the input of the current neural network to obtain the Q values of all actions, and selecting the corresponding action a_t from the current Q values by an ε-greedy method;
S64, executing the current action a_t in state s_t to obtain the new state s_{t+1}, the feature vector φ(s_{t+1}) corresponding to the new state, and the reward r_t of the current state, and storing the quadruple {φ(s_t), a_t, r_t, φ(s_{t+1})} in the experience replay set D;
S65, letting t = t + 1 and s_t = s_{t+1}, and judging whether the new state s_{t+1} is a terminal state of the flight; if not, returning to step S63; if yes, further judging whether t + 1 is larger than the total time T; if yes, ending the iteration, otherwise returning to step S63;
S66, sampling m samples {φ(s_j), a_j, r_j, φ(s_{j+1})}, j = 1, ..., m, from the experience replay set D, and calculating the current target state-action value y_j = r_j + γ max_{a_{j+1}} Q'(s_{j+1}, a_{j+1}; ω'). Q'(s_{j+1}, a_{j+1}; ω') represents the value of the next state and is calculated by the target neural network rather than the current neural network; this avoids training the current neural network against its own outputs and prevents too strong a coupling.
y_j represents the Q value calculated by the above formula; it is obtained by this calculation rather than output directly by a neural network, and it serves as the equivalent of the true Q value, whereas the network's Q value is obtained by directly feeding a state into the Q network. The aim of the invention is to train the neural network so that the Q value it outputs approximates the value y_j obtained from the above formula, minimizing the mean square error loss between the two, so that the neural network ultimately fits the target Q value well.
S67, calculating the mean square error loss function L(ω) = (1/m) Σ_{j=1}^{m} (y_j − Q(φ(s_j), a_j; ω))² and updating all parameters ω of the current neural network by gradient back-propagation so as to minimize this loss; y_j denotes the value calculated by the formula of S66 in state s_j, and Q(φ(s_j), a_j; ω) denotes the value directly output by the current neural network in state s_j;
S68, if t modulo the target neural network parameter update frequency equals 1, updating the target neural network parameters ω' = ω (that is, the target neural network parameters are updated once every update-frequency time steps); otherwise, leaving the target neural network parameters unchanged;
S69, updating the coordinates of the unmanned aerial vehicle, calculating the battery power levels of the passive devices, accumulating the data volumes uploaded by the passive devices, and obtaining the channel gains between the passive devices and the unmanned aerial vehicle.
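Steps S61 to S68 amount to standard DQN training with experience replay and a periodically synchronized target network. The following PyTorch sketch illustrates them; the layer sizes, hyperparameters, and the terminal-state handling via a done flag are illustrative assumptions rather than details fixed by the patent.

```python
import random
from collections import deque

import torch
import torch.nn as nn

N_STATE, N_ACTION = 8, 4                 # feature dimension n and |A| (assumed)
GAMMA, LR, BATCH, SYNC_EVERY = 0.9, 1e-3, 32, 100

def make_qnet():
    return nn.Sequential(nn.Linear(N_STATE, 64), nn.ReLU(),
                         nn.Linear(64, N_ACTION))

qnet, target = make_qnet(), make_qnet()
target.load_state_dict(qnet.state_dict())         # S61: omega' = omega
optimizer = torch.optim.Adam(qnet.parameters(), lr=LR)
replay = deque(maxlen=10_000)                     # experience replay set D

def select_action(phi_s, epsilon):
    """S63: epsilon-greedy selection over the current network's Q values."""
    if random.random() < epsilon:
        return random.randrange(N_ACTION)
    with torch.no_grad():
        return int(qnet(torch.tensor(phi_s).float()).argmax())

def train_step(step):
    """S66-S68: sample a mini-batch, regress on the target values, sync."""
    if len(replay) < BATCH:
        return
    batch = random.sample(replay, BATCH)          # S66: sample m transitions
    s, a, r, s2, done = map(torch.tensor, zip(*batch))
    q = qnet(s.float()).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():                         # y_j from the target network
        y = r.float() + GAMMA * target(s2.float()).max(1).values * (1 - done.float())
    loss = nn.functional.mse_loss(q, y)           # S67: mean square error loss
    optimizer.zero_grad()
    loss.backward()                               # gradient back-propagation
    optimizer.step()
    if step % SYNC_EVERY == 0:                    # S68: periodic omega' <- omega
        target.load_state_dict(qnet.state_dict())
```

Transitions {φ(s_t), a_t, r_t, φ(s_{t+1})} produced by steps S62 to S65 would be appended to replay before each call to train_step.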
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to assist the reader in understanding the principles of the invention and are to be construed as being without limitation to such specifically recited embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.
Claims (5)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110582074.0A CN113255218B (en) | 2021-05-27 | 2021-05-27 | Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110582074.0A CN113255218B (en) | 2021-05-27 | 2021-05-27 | Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113255218A true CN113255218A (en) | 2021-08-13 |
CN113255218B CN113255218B (en) | 2022-05-31 |
Family
ID=77184662
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110582074.0A Active CN113255218B (en) | 2021-05-27 | 2021-05-27 | Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113255218B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114061589A (en) * | 2021-11-16 | 2022-02-18 | 中山大学 | Edge-side-coordinated multi-unmanned aerial vehicle autonomous navigation method |
CN114881287A (en) * | 2022-04-06 | 2022-08-09 | 南京航空航天大学 | Energy optimization method for industrial wireless chargeable sensor network |
CN115470894A (en) * | 2022-10-31 | 2022-12-13 | 中国人民解放军国防科技大学 | Time-sharing call method and device for UAV knowledge model based on reinforcement learning |
CN115766769A (en) * | 2022-10-25 | 2023-03-07 | 西北工业大学 | Wireless sensor network deployment method based on deep reinforcement learning |
CN116113025A (en) * | 2023-02-16 | 2023-05-12 | 中国科学院上海微系统与信息技术研究所 | A trajectory design and power allocation method in UAV collaborative communication network |
CN116502547A (en) * | 2023-06-29 | 2023-07-28 | 深圳大学 | Multi-unmanned aerial vehicle wireless energy transmission method based on graph reinforcement learning |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110428115A (en) * | 2019-08-13 | 2019-11-08 | 南京理工大学 | Maximization system benefit method under dynamic environment based on deeply study |
CN110488861A (en) * | 2019-07-30 | 2019-11-22 | 北京邮电大学 | Unmanned plane track optimizing method, device and unmanned plane based on deeply study |
WO2020079702A1 (en) * | 2018-10-18 | 2020-04-23 | Telefonaktiebolaget Lm Ericsson (Publ) | Formation flight of unmanned aerial vehicles |
CN111786713A (en) * | 2020-06-04 | 2020-10-16 | 大连理工大学 | A UAV network hovering position optimization method based on multi-agent deep reinforcement learning |
CN112468205A (en) * | 2020-01-09 | 2021-03-09 | 电子科技大学中山学院 | Backscatter secure communication method suitable for unmanned aerial vehicle |
CN112711271A (en) * | 2020-12-16 | 2021-04-27 | 中山大学 | Autonomous navigation unmanned aerial vehicle power optimization method based on deep reinforcement learning |
CN112817327A (en) * | 2020-12-30 | 2021-05-18 | 北京航空航天大学 | Multi-unmanned aerial vehicle collaborative search method under communication constraint |
- 2021-05-27: Application CN202110582074.0A filed; granted as patent CN113255218B (status: Active)
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020079702A1 (en) * | 2018-10-18 | 2020-04-23 | Telefonaktiebolaget Lm Ericsson (Publ) | Formation flight of unmanned aerial vehicles |
CN110488861A (en) * | 2019-07-30 | 2019-11-22 | 北京邮电大学 | Unmanned plane track optimizing method, device and unmanned plane based on deeply study |
CN110428115A (en) * | 2019-08-13 | 2019-11-08 | 南京理工大学 | Maximization system benefit method under dynamic environment based on deeply study |
CN112468205A (en) * | 2020-01-09 | 2021-03-09 | 电子科技大学中山学院 | Backscatter secure communication method suitable for unmanned aerial vehicle |
CN111786713A (en) * | 2020-06-04 | 2020-10-16 | 大连理工大学 | A UAV network hovering position optimization method based on multi-agent deep reinforcement learning |
CN112711271A (en) * | 2020-12-16 | 2021-04-27 | 中山大学 | Autonomous navigation unmanned aerial vehicle power optimization method based on deep reinforcement learning |
CN112817327A (en) * | 2020-12-30 | 2021-05-18 | 北京航空航天大学 | Multi-unmanned aerial vehicle collaborative search method under communication constraint |
Non-Patent Citations (4)
Title |
---|
JIE HU 等: "Joint Trajectory and Scheduling Design for UAV Aided Secure Backscatter Communications", 《IEEE WIRELESS COMMUNICATIONS LETTERS》, vol. 9, no. 12, 12 April 2020 (2020-04-12), pages 2168 - 2172, XP011824554, DOI: 10.1109/LWC.2020.3016174 * |
KAI LI 等: "Deep Reinforcement Learning for Real-Time Trajectory Planning in UAV Networks", 《2020 INTERNATIONAL WIRELESS COMMUNICATIONS AND MOBILE COMPUTING (IWCMC)》, 27 July 2020 (2020-07-27), pages 958 - 963 * |
WU, YUNDI: "Research on Optimization of Information and Energy Transmission in UAV Communication Systems", 《China Master's Theses Full-text Database, Engineering Science and Technology II》, no. 8, 15 August 2019 (2019-08-15), pages 031-66 *
YANG, KUN et al.: "Wireless Data and Energy Integrated Communication Networks and Joint Data-Energy Access Control Protocol Design", 《Journal of Jilin Normal University (Natural Science Edition)》, vol. 40, no. 1, 16 January 2019 (2019-01-16), pages 106-114 *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114061589A (en) * | 2021-11-16 | 2022-02-18 | 中山大学 | Edge-side-coordinated multi-unmanned aerial vehicle autonomous navigation method |
CN114061589B (en) * | 2021-11-16 | 2023-05-26 | 中山大学 | Multi-unmanned aerial vehicle autonomous navigation method with cooperative end edges |
CN114881287A (en) * | 2022-04-06 | 2022-08-09 | 南京航空航天大学 | Energy optimization method for industrial wireless chargeable sensor network |
CN115766769A (en) * | 2022-10-25 | 2023-03-07 | 西北工业大学 | Wireless sensor network deployment method based on deep reinforcement learning |
CN115470894A (en) * | 2022-10-31 | 2022-12-13 | 中国人民解放军国防科技大学 | Time-sharing call method and device for UAV knowledge model based on reinforcement learning |
CN116113025A (en) * | 2023-02-16 | 2023-05-12 | 中国科学院上海微系统与信息技术研究所 | A trajectory design and power allocation method in UAV collaborative communication network |
CN116502547A (en) * | 2023-06-29 | 2023-07-28 | 深圳大学 | Multi-unmanned aerial vehicle wireless energy transmission method based on graph reinforcement learning |
CN116502547B (en) * | 2023-06-29 | 2024-06-04 | 深圳大学 | Multi-unmanned aerial vehicle wireless energy transmission method based on graph reinforcement learning |
Also Published As
Publication number | Publication date |
---|---|
CN113255218B (en) | 2022-05-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113255218B (en) | Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network | |
Zhan et al. | Energy minimization for cellular-connected UAV: From optimization to deep reinforcement learning | |
CN109743210B (en) | Unmanned aerial vehicle network multi-user access control method based on deep reinforcement learning | |
CN110730028B (en) | Unmanned aerial vehicle-assisted backscatter communication device and resource allocation control method | |
WO2020015214A1 (en) | Optimization method for wireless information and energy transmission based on unmanned aerial vehicle | |
CN110380776B (en) | Internet of things system data collection method based on unmanned aerial vehicle | |
CN115696211A (en) | Unmanned aerial vehicle track self-adaptive optimization method based on information age | |
CN111526592B (en) | Non-cooperative multi-agent power control method used in wireless interference channel | |
CN115494732B (en) | A UAV trajectory design and power allocation method based on proximal strategy optimization | |
CN114942653B (en) | Method, device and electronic device for determining flight strategy of unmanned swarm | |
CN113776531B (en) | Multi-unmanned aerial vehicle autonomous navigation and task allocation algorithm of wireless self-powered communication network | |
Li et al. | Deep reinforcement learning for real-time trajectory planning in UAV networks | |
CN117062182A (en) | DRL-based wireless chargeable unmanned aerial vehicle data uploading path optimization method | |
CN117915375A (en) | DDQN-based unmanned aerial vehicle track optimization method in data acquisition scene | |
CN117768987A (en) | Unmanned aerial vehicle track planning and video transmission method based on cellular network | |
CN111182469B (en) | An Energy Harvesting Network Time Allocation and UAV Trajectory Optimization Method | |
Ni et al. | Optimal transmission control and learning-based trajectory design for UAV-assisted detection and communication | |
CN117376985B (en) | Energy efficiency optimization method for multi-unmanned aerial vehicle auxiliary MEC task unloading under rice channel | |
CN118101034A (en) | Optimization method of UAV-assisted communication system based on dynamic prediction of user location | |
CN116882270A (en) | Multi-unmanned aerial vehicle wireless charging and edge computing combined optimization method and system based on deep reinforcement learning | |
CN116489610A (en) | UAV-assisted wearable Internet of Things device charging and data processing method and system | |
CN116009590A (en) | Distributed trajectory planning method, system, equipment and medium for UAV network | |
Mondal et al. | Joint Trajectory, User-Association, and Power Control for Green UAV-Assisted Data Collection using Deep Reinforcement Learning | |
Chen et al. | Proximal Policy Optimization-Based Anti-Jamming UAV-Assisted Data Collection | |
CN112087767B (en) | HAP-UAV access network power control method based on minimized distortion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||