CN117062182A - DRL-based wireless chargeable unmanned aerial vehicle data uploading path optimization method - Google Patents

Info

Publication number
CN117062182A
Authority
CN
China
Prior art keywords
task
task machine
machine
mathematical model
time slot
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311129223.3A
Other languages
Chinese (zh)
Inventor
贾兆红
张博文
王辛迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University
Original Assignee
Anhui University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University
Priority to CN202311129223.3A
Publication of CN117062182A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W40/00 Communication routing or communication path finding
    • H04W40/02 Communication route or path selection, e.g. power-based or shortest path routing
    • H04W40/20 Communication route or path selection, e.g. power-based or shortest path routing based on geographic position or location
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/14 Relay systems
    • H04B7/15 Active relay systems
    • H04B7/185 Space-based or airborne stations; Stations for satellite systems
    • H04B7/18502 Airborne stations
    • H04B7/18504 Aircraft used as relay or high altitude atmospheric platform
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/02 Topology update or discovery
    • H04L45/08 Learning-based routing, e.g. using neural networks or artificial intelligence
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W40/00 Communication routing or communication path finding
    • H04W40/02 Communication route or path selection, e.g. power-based or shortest path routing
    • H04W40/04 Communication route or path selection, e.g. power-based or shortest path routing based on wireless node resources
    • H04W40/10 Communication route or path selection, e.g. power-based or shortest path routing based on wireless node resources based on available power or energy

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • Astronomy & Astrophysics (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • General Physics & Mathematics (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention relates to a DRL-based data uploading path optimization method for a wirelessly chargeable unmanned aerial vehicle, comprising the following steps: constructing an Internet of Things communication and wireless charging scene system; establishing a first mathematical model for the on-board battery consumption of the task machine; establishing a second mathematical model for the data uploading channel; establishing an energy supplementing mathematical model, namely a third mathematical model, for the wireless charging process; establishing a fourth mathematical model for the path optimization target; determining a state set S, an action set A and a reward function $r_t$; and performing offline learning with an improved Double DQN algorithm to obtain an optimal path strategy $\pi^*$. The invention provides a more convenient charging service for the task machine that assists the base station in uploading data in the Internet of Things communication system. Under high real-time data requirements, this charging method significantly improves the endurance of the task machine and achieves high data uploading efficiency for the unmanned-aerial-vehicle-assisted Internet of Things communication system.

Description

DRL-based wireless chargeable unmanned aerial vehicle data uploading path optimization method
Technical Field
The invention relates to the technical fields of artificial intelligence, power and communication, in particular to a wireless chargeable unmanned aerial vehicle data uploading path optimization method based on DRL.
Background
Owing to their highly flexible deployment, unmanned aerial vehicles (UAVs) have been widely applied in recent years in emerging fields such as the Internet of Things to compensate for the shortcomings of existing base station communication. A UAV can connect a mobile communication access point with ground users and provide emergency data service, ensuring communication service quality for mobile users in large-scale scenarios. However, because of the energy constraint of the on-board battery, UAVs face a serious energy shortage while providing service.
The inefficiency caused by the on-board energy constraint shows up as follows: as time passes and tasks proceed, the energy of an individual UAV is continuously consumed, so it must return to a charging station during task execution and cannot meet highly real-time data uploading demands. With the emergence of new energy supply technologies, battery replenishment has advanced significantly. In particular, wireless power transfer decouples the location of the energy source from the sensing location and transfers energy from energy-rich regions to energy-poor regions, allowing a UAV to harvest energy efficiently while it performs data transmission tasks.
A UAV carrying a high-gain radio frequency antenna can act as a mobile charger and provide charging service to the UAV that performs data uploading tasks, which makes it an effective energy solution. Communication in a task-machine-assisted Internet of Things system involves both user data uploading and on-demand charging, so an effective UAV path planning strategy is needed to ensure that an individual UAV executes its data uploading task efficiently.
Researchers have studied the optimization of UAV data uploading paths using, for example, ant colony algorithms, genetic algorithms and reinforcement learning algorithms. However, most of this work optimizes only flight tasks supported by the UAV's on-board energy and focuses on the energy utilization of a single flight without charging, ignoring the use of energy supplementing technology to further support efficient task execution. There is therefore an urgent need for a data uploading path optimization method for a wirelessly chargeable UAV that meets the real-time requirements of data uploading from Internet of Things devices while using wireless energy supplementing to extend the UAV's single-flight endurance. Such a method has important research significance and application value for optimizing the flight path of the task machine.
Disclosure of Invention
The invention aims to solve the low efficiency caused by UAV energy shortage in existing task scenarios in which a task UAV assists the communication of an Internet of Things system. It provides a DRL-based data uploading path optimization method for a wirelessly chargeable unmanned aerial vehicle that significantly improves the endurance of the task UAV and achieves high data uploading efficiency for the UAV-assisted Internet of Things communication system under high real-time data requirements.
To achieve the above purpose, the invention adopts the following technical scheme: a DRL-based data uploading path optimization method for a wirelessly chargeable unmanned aerial vehicle, comprising the following sequential steps:
(1) Constructing an Internet of Things communication and wireless charging scene system, which comprises a task machine, a mobile charger and M mobile Internet of Things devices;
(2) Establishing a first mathematical model for the on-board battery consumption of the task machine;
(3) Establishing a second mathematical model for the data uploading channel;
(4) Establishing an energy supplementing mathematical model, namely a third mathematical model, for the wireless charging process;
(5) Establishing a fourth mathematical model for the path optimization target according to the Internet of Things communication and wireless charging scene system, the first mathematical model, the second mathematical model and the third mathematical model;
(6) Determining a state set S, an action set A and a reward function $r_t$ according to the Internet of Things communication and wireless charging scene system, the first mathematical model, the second mathematical model, the third mathematical model and the fourth mathematical model;
(7) According to the state set S, the action set A and the reward function $r_t$, performing offline learning with an improved Double DQN algorithm to obtain the optimal path strategy $\pi^*$.
Step (1) specifically refers to the following: the Internet of Things communication and wireless charging scene system is denoted as the system. The system space is divided into N × N grids, each a square cell with side length c, where N is the number of grids in both the transverse and the longitudinal direction. A task machine is deployed in the scene to execute uploading tasks, a mobile charger provides wireless charging service, and each of the M mobile Internet of Things devices has a data volume $d_m$ waiting to be uploaded;
the current time slot of the system is denoted t, t = 0, τ, 2τ, …, T, where τ is the length of a single time slot and T is the time of the system termination state. The position of mobile Internet of Things device m at time slot t is represented by $(x_m^t, y_m^t, h_m)$, where $m \in \{1, 2, \dots, M\}$, $x_m^t$ is the abscissa of mobile Internet of Things device m in grid space, $y_m^t$ is its ordinate in grid space, and $h_m$ is its fixed height. The remaining amount of data to be uploaded by mobile Internet of Things device m at time slot t is denoted $d_m^t$.
The mobile charger, as the energy supply end, starts from a stop point at the beginning of a task, moves along a pre-deployed flight path at a pre-deployed flight speed $v_k$, and provides energy to the task machine. Its real-time position is represented by $(x_k^t, y_k^t, h_k)$, where $x_k^t$ is the abscissa of the mobile charger in grid space, $y_k^t$ is its ordinate in grid space, and $h_k$ is its fixed flying height. When the task starts, the task machine departs from the stop point and flies at a constant speed $v_u$. Its real-time position is represented by $(x_u^t, y_u^t, h_u)$, where $x_u^t$ is the abscissa of the task machine in grid space, $y_u^t$ is its ordinate in grid space, and $h_u$ is its fixed flying height.
Step (2) specifically refers to the following: the maximum battery capacity of the task machine is $b_{\max}$, $b_u^t$ denotes its remaining power at time slot t, and the task machine consumes a constant energy value $e_f$ each time it executes a flight action. Without considering charging, the battery level of the task machine at time slot t+1 is modeled by the first mathematical model:

$$b_u^{t+1} = b_u^t - e_f$$

where $b_u^{t+1}$ represents the remaining power at time slot t+1.
Step (3) specifically refers to the following: the task machine establishes a communication connection with a mobile Internet of Things device and data uploading begins. The flying height of the task machine is sufficient to guarantee line-of-sight wireless transmission between the mobile Internet of Things device and the task machine, so the channel gain between them is:

$$g_{m,u}^t = \frac{\rho_0}{\left(d_{m,u}^t\right)^2}$$

where $\rho_0$ is the channel gain at a reference distance of 1 m and $d_{m,u}^t$ is the Euclidean distance between the task machine and mobile Internet of Things device m at time slot t:

$$d_{m,u}^t = \sqrt{\left(x_u^t - x_m^t\right)^2 + \left(y_u^t - y_m^t\right)^2 + h_u^2}$$

where $x_u^t - x_m^t$ is the distance between the task machine and mobile Internet of Things device m along the abscissa, $y_u^t - y_m^t$ is their distance along the ordinate, and $h_u$ is the fixed flying height of the task machine. The data transmission at time slot t is modeled by the second mathematical model:

$$R_{m,u}^t = W \log_2\left(1 + \frac{P_{IoT}\, g_{m,u}^t}{\sigma^2}\right)$$

where $R_{m,u}^t$ is the transmission rate of the data transmission link established between the task machine and mobile Internet of Things device m in time slot t, W is the signal bandwidth, $P_{IoT}$ is the transmit power of the Internet of Things device, and $\sigma^2$ is the noise power.
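The channel model of step (3) can be sketched in a few lines of Python. This is a minimal illustration, assuming the common line-of-sight form in which the gain falls off with the squared distance and the rate follows a Shannon-type expression; the function name `uplink_rate` and its argument names are hypothetical and not taken from the patent.

```python
import math

def uplink_rate(uav_xy, device_xy, h_u, rho_0, bandwidth_hz, p_iot_w, noise_w):
    """Uplink rate (bit/s) from one IoT device to the task machine.

    Assumed model: gain = rho_0 / d^2 (line of sight) and
    rate = W * log2(1 + P_IoT * gain / sigma^2).
    """
    dx = uav_xy[0] - device_xy[0]
    dy = uav_xy[1] - device_xy[1]
    # Distance uses the fixed flying height h_u as the vertical offset.
    dist = math.sqrt(dx * dx + dy * dy + h_u * h_u)
    gain = rho_0 / dist ** 2
    snr = p_iot_w * gain / noise_w
    return bandwidth_hz * math.log2(1.0 + snr)
```

With this form, the rate depends only on the horizontal offset between the task machine and the device, so hovering directly above a device gives the highest rate for a fixed flying height.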
Step (4) specifically refers to the following: the mobile charger carries a high-gain radio frequency antenna for transmitting wireless power and provides energy supply service to the task machine along a fixed deployment trajectory. During task execution, the distance between the task machine and the mobile charger varies from slot to slot, and the wireless charging power drops sharply as this distance increases. Assuming full energy conversion efficiency in the wireless transmission link, the power $P_r^t$ received at the task machine is calculated with the Friis free-space propagation model:

$$P_r^t = P_t\, G_t\, G_r \left(\frac{\lambda}{4\pi\, d_{k,u}^t}\right)^2$$

where $P_t$ is the power of the transmitting end, $G_t$ and $G_r$ are the antenna gains at the transmitting and receiving ends, $\lambda$ is the transmission wavelength, and $d_{k,u}^t$ is the Euclidean distance between the transmitting end and the receiving end:

$$d_{k,u}^t = \sqrt{\left(x_u^t - x_k^t\right)^2 + \left(y_u^t - y_k^t\right)^2 + h_{k,u}^2}$$

where $x_u^t - x_k^t$ is the distance between the task machine and the mobile charger in the transverse direction at time slot t, $y_u^t - y_k^t$ is their distance in the longitudinal direction at time slot t, and $h_{k,u}$ is the constant height difference between the task machine and the mobile charger. The energy supplementing process during wireless charging is modeled by the third mathematical model:

$$e_c^t = P_r^t\, \tau$$

where $e_c^t$ is the energy value received by the task machine in a single slot and $\tau$ is the length of a single slot.
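The charging model of step (4) can be sketched as below. The Friis free-space formula is named in the text; treating the harvested energy as received power multiplied by the slot length, and combining it with the per-action flight cost of the first mathematical model into a clamped battery update, are assumptions made for illustration. All function names are hypothetical.

```python
import math

def friis_received_power(p_t, g_t, g_r, wavelength, dist):
    """Friis free-space received power, assuming full conversion efficiency."""
    return p_t * g_t * g_r * (wavelength / (4.0 * math.pi * dist)) ** 2

def charger_distance(uav_xy, charger_xy, h_ku):
    """Euclidean distance using the constant height difference h_ku."""
    dx = uav_xy[0] - charger_xy[0]
    dy = uav_xy[1] - charger_xy[1]
    return math.sqrt(dx * dx + dy * dy + h_ku * h_ku)

def harvested_energy(p_r, tau):
    """Energy received in one slot (assumed: power times slot length)."""
    return p_r * tau

def next_battery(b_t, e_flight, e_charge, b_max):
    """Assumed combined update: flight cost out, harvested energy in,
    clamped to the physical range [0, b_max]."""
    return max(0.0, min(b_max, b_t - e_flight + e_charge))
```

Because the received power scales with the inverse square of the distance, doubling the separation between the task machine and the mobile charger cuts the harvested energy per slot to a quarter.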
Step (5) specifically refers to the following: the variables in the dynamic scene of the task machine path planning optimization problem are the position information of the mobile Internet of Things devices, the mobile charger and the task machine, the device data uploading queue, and the battery state of the task machine. The optimization goal is to find a path strategy that helps the task machine make optimal decisions balancing energy consumption against uploaded data volume, maximizing data uploading efficiency over a single flight by optimizing the movement trajectory. The main factors considered are the data uploading amount and the task energy consumption. The expression of the fourth mathematical model is as follows:
where $R_{m,u}^t$ is the transmission rate of the data transmission link established between the task machine and mobile Internet of Things device m in time slot t; $e_f$ is the constant energy value consumed each time the task machine executes a flight action; t = 0, τ, 2τ, …, T is the current time slot of the system, τ is the length of a single time slot, and T is the time of the system termination state; and $m \in \{1, 2, \dots, M\}$.
The step (6) specifically comprises the following steps:
(6a) Determining a state set S:
the expression of the state set S is as follows:

$$S = \{\, s_t \mid s_t = [L_m(t),\, L_u(t),\, B_u(t)] \,\}$$

where $s_t$ is the system state at time slot t, composed of three parts: $L_m(t)$, $L_u(t)$ and $B_u(t)$. $L_m(t)$ represents the state of each mobile Internet of Things device in the data uploading task and guides the task machine toward the data uploading target. It contains the positions and remaining data amounts of all mobile Internet of Things devices. Let the two-dimensional horizontal coordinates at time slot t of the mobile Internet of Things device with the last serial number M in the uploading task region be $(x_M^t, y_M^t)$, and let its remaining amount of data to be uploaded be $d_M^t$; then $L_m(t)$ is:

$$L_m(t) = [x_1^t, y_1^t, d_1^t, \dots, x_M^t, y_M^t, d_M^t]$$
$L_u(t)$ simulates the field of view of the task machine: it represents the data transmission rates available at different positions within the effective transmission distance of the task machine, together with the position of the task machine. According to the system characteristics, the field of view is centered on the task machine and extends n grids in each direction. Positions outside the task area are shown as black grids, and their matrix values are set to -1 to mark them as outside the task area. Data uploading rates at different positions are represented according to their distance from the task machine: the uploading rate is largest at the task machine's own position, whose matrix element is set to 50; grids at a distance of one grid are set to 20; grids at a distance of two grids are set to 5; and the matrix values of the remaining white grids are set to 0;
a field-of-view matrix $Y \in \mathbb{R}^{(2n+1)\times(2n+1)}$ is then constructed. Let $y_{i,j}^t$ be the element in row i, column j of the field-of-view matrix, where $i, j \in (0, 2n+1]$; let the spatial position corresponding to row i, column j of Y have abscissa $x_i$ and ordinate $y_j$ in the overall grid space; and let $l_{i,j}^t$ be the grid distance at time slot t between that position and the task machine. The value of $y_{i,j}^t$ is:

$$y_{i,j}^t = \begin{cases} -1, & (x_i, y_j) \text{ outside the task area} \\ 50, & l_{i,j}^t = 0 \\ 20, & l_{i,j}^t = 1 \\ 5, & l_{i,j}^t = 2 \\ 0, & \text{otherwise} \end{cases}$$
the resulting field-of-view matrix Y is a square matrix of $(2n+1)^2$ elements. Its elements are flattened and combined with the abscissa $x_u^t$ and the ordinate $y_u^t$ of the task machine in grid space to form a one-dimensional column vector $L_u(t)$:

$$L_u(t) = [y_{1,1}^t, y_{1,2}^t, \dots, y_{2n+1,2n+1}^t, x_u^t, y_u^t]^{\mathsf{T}}$$

$B_u(t)$ contains the current battery level $b_u^t$ of the task machine, the abscissa $x_k^t$ and the ordinate $y_k^t$ of the mobile charger in grid space, and the Euclidean distance $d_{k,u}^t$ between the task machine and the mobile charger:

$$B_u(t) = [b_u^t, x_k^t, y_k^t, d_{k,u}^t]$$
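The field-of-view part of the state in (6a) can be sketched with NumPy as follows. The values -1, 50, 20, 5 and 0 come from the text; measuring the grid distance as the Chebyshev distance (so that diagonal neighbours count as one grid away), using integer grid coordinates in the range 0 to N-1, and 0-based array indices are assumptions. The names `view_matrix` and `task_machine_state` are hypothetical.

```python
import numpy as np

# Values from the text: own cell 50, one grid away 20, two grids away 5.
RATE_BY_DISTANCE = {0: 50.0, 1: 20.0, 2: 5.0}

def view_matrix(x_u, y_u, n, grid_size):
    """Build the (2n+1) x (2n+1) view matrix centred on the task machine."""
    side = 2 * n + 1
    view = np.zeros((side, side))
    for i in range(side):
        for j in range(side):
            # Row index maps to the abscissa, column index to the ordinate.
            x = x_u + (i - n)
            y = y_u + (j - n)
            if not (0 <= x < grid_size and 0 <= y < grid_size):
                view[i, j] = -1.0  # outside the task area
                continue
            # Assumption: grid distance is the Chebyshev distance.
            grid_dist = max(abs(i - n), abs(j - n))
            view[i, j] = RATE_BY_DISTANCE.get(grid_dist, 0.0)
    return view

def task_machine_state(x_u, y_u, n, grid_size):
    """Flatten the view matrix and append the task machine coordinates."""
    flat = view_matrix(x_u, y_u, n, grid_size).ravel()
    return np.concatenate([flat, [x_u, y_u]])
```

For example, `task_machine_state(5, 5, 3, 10)` yields a vector of 7 × 7 + 2 = 51 entries, which would be concatenated with the device and battery parts to form the full state.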
(6b) Determining an action set A:
the movement of the task machine in the grid space in the system is selected according to the movement direction, and the expression of the action space is as follows:
where $a_t$ is the action performed by the task machine at time slot t. During flight the task machine moves at the fixed speed $v_u$ in one of four flight directions. While flying, it obtains energy wirelessly from the transmitting end of the mobile charger, so uploading scene data and obtaining energy proceed in parallel;
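A grid move for the four-direction action set in (6b) might look like the sketch below. The numbering of the directions and the choice to keep the task machine inside the grid by clamping at the boundary are assumptions; the patent text only states that four flight directions are used.

```python
# Assumed encoding of the four flight directions as (dx, dy) grid offsets.
ACTIONS = {
    0: (0, 1),   # one cell in +y
    1: (0, -1),  # one cell in -y
    2: (-1, 0),  # one cell in -x
    3: (1, 0),   # one cell in +x
}

def step_position(x, y, action, grid_size):
    """Apply one action; clamp so the task machine stays in the N x N grid."""
    dx, dy = ACTIONS[action]
    new_x = min(max(x + dx, 0), grid_size - 1)
    new_y = min(max(y + dy, 0), grid_size - 1)
    return new_x, new_y
```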
(6c) Determining a reward function $r_t$:
For the task machine, the reward mechanism reflects how different behavior decisions affect each part of the system. The instant reward obtained by the task machine at time slot t is denoted $r_t$:

$$r_t = \mu_1 r_t^{d} + \mu_2 r_t^{m} + \mu_3 r_t^{c}$$

where $\mu_1$, $\mu_2$, $\mu_3$ are weighting factors that adjust the balance among $r_t^{d}$, $r_t^{m}$ and $r_t^{c}$. $r_t^{d}$ takes maximizing the data throughput of the whole data uploading process as the core target and gives each behavior a positive reward according to the data throughput of a single communication:

$$r_t^{d} = \sum_{m=1}^{M} D_m^t$$

where $D_m^t$ is the amount of data uploaded to the task machine by user m at time t;
$r_t^{m}$ aims to maximize the data uploading efficiency of the whole data uploading process: it assigns a negative reward to every movement step of the task machine, which pushes the task machine to improve its path selection, reduces unnecessary energy loss, and promotes convergence to the optimal path;
$r_t^{c}$ aims to maximize the endurance of the task machine while it executes the data uploading task: it gives a positive reward to the wireless charging that results from the task machine's movement decision. Its expression is as follows:
where $b_u^t$ is the remaining capacity of the task machine at time slot t, $e_c^t$ is the energy value received by the task machine in a single time slot, $b_{th}$ is the threshold for judging whether the task machine has entered a low-energy state, and $\beta_1$ and $\beta_2$ are constant coefficients.
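The three-part reward in (6c) can be sketched as a weighted sum. The exact expressions are not recoverable from this text, so the sketch rests on stated assumptions: the data reward is the total data uploaded in the slot, the movement reward is a constant per-step penalty, and the charging reward scales the harvested energy by β1 at normal battery levels and by β2 once the battery falls below the threshold. The function and argument names are hypothetical.

```python
def step_reward(uploaded, battery, harvested, mu, step_penalty,
                b_th, beta_1, beta_2):
    """Weighted sum of data, movement and charging rewards (assumed forms)."""
    r_data = sum(uploaded)            # data uploaded by all devices this slot
    r_move = step_penalty             # assumed constant negative value
    # Assumption: charging is weighted differently in the low-energy state.
    coeff = beta_2 if battery < b_th else beta_1
    r_charge = coeff * harvested
    return mu[0] * r_data + mu[1] * r_move + mu[2] * r_charge
```

Choosing β2 larger than β1 under this assumed form would make the task machine value proximity to the mobile charger more strongly when its battery is low.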
The step (7) specifically comprises the following steps:
(7a) Initialize the estimated value network parameters $\theta_1$ and the target value network parameters $\theta_2$, with $\theta_2 = \theta_1$; initialize an experience replay pool with capacity D; initialize the network learning rate $\alpha$ and the discount factor $\gamma$;
(7b) Given the current state-action pair $(s_t, a_t)$, where $s_t$ is the system state at time slot t and $a_t$ is the action performed by the task machine at time slot t, the estimated value network outputs the estimate of the Q value, $Q_{predicted}(s_t, a_t; \theta_1)$, where $\theta_1$ denotes the estimated value network parameters. The action for the next state is selected as

$$a_{t+1} = \arg\max_{a} Q_{predicted}(s_{t+1}, a; \theta_1)$$

where $s_{t+1}$ is the state input to the target value network and $a_{t+1}$ is the action input to the target value network. The Q value of the next state-action pair $(s_{t+1}, a_{t+1})$ under the target value network is then $Q_{target}(s_{t+1}, a_{t+1}; \theta_2)$, and the target value of the Double DQN algorithm is defined as:

$$y_t = r_t + \gamma\, Q_{target}\left(s_{t+1}, \arg\max_{a} Q_{predicted}(s_{t+1}, a; \theta_1); \theta_2\right)$$
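The target computation in (7b) can be illustrated with NumPy on a batch of precomputed Q values. This follows the standard Double DQN rule, in which the estimated value network picks the next action and the target value network evaluates it; the terminal-state mask `done` is an addition not mentioned in the text, and the function name is hypothetical.

```python
import numpy as np

def double_dqn_targets(rewards, q_est_next, q_tgt_next, gamma, done):
    """Compute y = r + gamma * Q_target(s', argmax_a Q_estimated(s', a)).

    q_est_next, q_tgt_next: arrays of shape (batch, n_actions) holding the
    two networks' Q values for the next states.
    done: 1.0 where the next state is terminal (assumed extension), else 0.0.
    """
    best_actions = np.argmax(q_est_next, axis=1)        # selection
    rows = np.arange(len(best_actions))
    evaluated = q_tgt_next[rows, best_actions]          # evaluation
    return rewards + gamma * (1.0 - done) * evaluated
```

Separating selection from evaluation is what distinguishes Double DQN from plain DQN, where the target network's own maximum would be used and tends to overestimate Q values.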
(7c) In the current state $s_t$, select action $a_t$ according to the improved Double DQN algorithm and execute it, transition to the new state $s_{t+1}$, compute the single-step reward value with the reward function $r_t$, and store the transition $(s_t, a_t, r_t, s_{t+1})$ in the experience replay pool;
steps (7b) to (7c) are executed in a loop until the number of transitions stored in the experience replay pool equals D, and then step (7d) is entered;
the Double DQN algorithm has two neural networks, the estimated value network and the target value network. Given a training step length step, training is performed once every step steps;
the total reward value in each training round is defined as R; the reward value obtained by executing a single flight path is:

$$R = \sum_{t=0}^{T} r_t$$

the improved Double DQN algorithm improves the greedy coefficient ε used for action selection so that the action with the maximum reward value is secured at each step. When the ε-greedy strategy is executed, the currently optimal action is selected with probability ε and other actions are explored with the remaining probability 1-ε. The value of ε varies as follows:
where episode is the current number of flight rounds and K is the number of flight rounds at which ε reaches 1;
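The action selection rule described above can be sketched as follows. Note the convention in this text: ε is the probability of exploiting the current best action, so it grows toward 1 as training proceeds. The exact schedule is not recoverable here, so a linear ramp from 0 to 1 over K rounds is used purely as an assumption; exploration draws uniformly from all actions, which is also an assumption.

```python
import random

def epsilon_for(episode, k):
    """Assumed linear schedule: epsilon rises from 0 and reaches 1 at round k."""
    return min(1.0, episode / k)

def select_action(q_values, epsilon, rng=random):
    """Exploit the best action with probability epsilon, else explore."""
    if rng.random() < epsilon:
        return max(range(len(q_values)), key=lambda a: q_values[a])
    return rng.randrange(len(q_values))
```

Early rounds therefore act almost randomly, and from round K onward the task machine always follows the action with the highest estimated Q value.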
z experiences are randomly extracted from the experience replay pool of capacity D to form the offline learning training data set; the maximum number of task rounds is F;
(7d) Randomly extract z = 32 transitions from the experience replay pool; the k-th state transition in the mini-batch is denoted $(s_k, a_k, r_k, s_{k+1})$, k = 1, 2, 3, …, z;
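A fixed-capacity experience replay pool with uniform mini-batch sampling, as used in (7c) and (7d), can be sketched with the standard library. Discarding the oldest transition once the pool is full is an assumption; the class and method names are hypothetical.

```python
import random
from collections import deque

class ReplayPool:
    """Experience replay pool of capacity D storing (s, a, r, s_next)."""

    def __init__(self, capacity):
        # Assumption: the oldest transition is dropped when the pool is full.
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def is_full(self):
        return len(self.buffer) == self.buffer.maxlen

    def sample(self, z=32):
        """Draw z distinct transitions uniformly at random."""
        return random.sample(list(self.buffer), z)
```

In the flow described above, transitions would be stored until `is_full()` returns true, after which each training step calls `sample(32)`.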
(7e) Using the transitions $(s_k, a_k, r_k, s_{k+1})$, k = 1, 2, 3, …, z, from step (7d), calculate the target Q value $y_k$ and the loss value $L(\theta_1)$. Let

$$y_k = r_k + \gamma\, Q_{target}\left(s_{k+1}, \arg\max_{a} Q_{predicted}(s_{k+1}, a; \theta_1); \theta_2\right)$$

The loss function is:

$$L(\theta_1) = \frac{1}{z} \sum_{k=1}^{z} \left(y_k - Q_{predicted}(s_k, a_k; \theta_1)\right)^2$$

(7f) Minimize the loss function $L(\theta_1)$ by gradient descent:

$$\theta_1 \leftarrow \theta_1 - \alpha\, \frac{\partial L(\theta_1)}{\partial \theta_1}$$

where $\partial$ is the partial derivative symbol; $\frac{\partial L(\theta_1)}{\partial \theta_1}$ is the partial derivative of the error function $L(\theta_1)$ with respect to $\theta_1$, that is, the partial derivative with respect to the estimated value network parameters $\theta_1$ of the squared difference between the Q value computed by the estimated value network for inputs $s_t$, $a_t$ and the Q value of the target value network; the update above adjusts the estimated value network parameters; and every step steps the target value network parameters $\theta_2$ are replaced by $\theta_1$;
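The loss and gradient step of (7e) and (7f) can be made concrete with a deliberately small model. The sketch below replaces the neural network with a linear Q function, Q(s, a; θ) = θ[a] · s, so that the gradient of the mean squared error can be written by hand; this substitution, the averaging over the mini-batch, and all function names are assumptions for illustration only.

```python
import numpy as np

def q_all(theta, states):
    """Linear stand-in for the estimated value network: shape (z, n_actions)."""
    return states @ theta.T

def loss_and_grad(theta, states, actions, targets):
    """Mean squared error between targets y_k and Q(s_k, a_k), with gradient."""
    z = len(actions)
    q_sa = q_all(theta, states)[np.arange(z), actions]
    err = targets - q_sa
    loss = np.mean(err ** 2)
    grad = np.zeros_like(theta)
    # d/d theta[a_k] of (y_k - theta[a_k] . s_k)^2 / z = -2/z * err_k * s_k
    np.add.at(grad, actions, (-2.0 / z) * err[:, None] * states)
    return loss, grad

def train_step(theta_1, states, actions, targets, alpha):
    """One gradient descent update of the estimated value parameters."""
    loss, grad = loss_and_grad(theta_1, states, actions, targets)
    return theta_1 - alpha * grad, loss

def sync_target(theta_1):
    """Hard update: copy theta_1 into theta_2 every `step` training steps."""
    return theta_1.copy()
```

In a full implementation a deep learning framework would compute the gradient automatically; the hand-written version only shows which quantity the update in (7f) descends.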
One pass through steps (7d) to (7f) completes a single training. Steps (7b) to (7f) are executed repeatedly during each round of the flight task; training ends after F rounds, which completes the learning process of the Double DQN algorithm;
(7g) After the training algorithm finishes, save the optimal path strategy $\pi^*$ and record the reward value, the number of flight steps and the data upload amount. Over the F training rounds, the estimated value network parameters $\theta_1$ and the target value network parameters $\theta_2$ are updated in the direction that maximizes the total reward value R, finally yielding the optimal path strategy $\pi^*$.
According to the above technical scheme, the beneficial effects of the invention are as follows. First, compared with the prior art, the invention uses far-field wireless charging technology and combines a highly mobile unmanned aerial vehicle with a high-gain radio frequency antenna to serve as a mobile charger, providing a more convenient charging service for the task machine that assists the base station in uploading data in the Internet of Things communication system. Under high real-time data requirements, this charging method significantly improves the endurance of the task machine and achieves high data uploading efficiency for the unmanned-aerial-vehicle-assisted Internet of Things communication system. Second, to handle the high dimensionality of the problem's state space, the path optimization method uses Double DQN, which has a dual deep neural network structure, and improves its exploration coefficient.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a grid scene representation in an implementation of the present invention;
FIG. 3 is a block diagram of a model of the improved Double DQN algorithm of the present invention;
FIG. 4 is a graph of the convergence of the reward value during training in the present invention;
FIG. 5 is a graph of the number of steps per single flight task during training in the present invention;
FIG. 6 is a graph of the data upload amount per single flight task during training in the present invention.
Detailed Description
As shown in FIG. 1, a DRL-based data uploading path optimization method for a wirelessly chargeable unmanned aerial vehicle comprises the following sequential steps:
(1) Constructing an Internet of Things communication and wireless charging scene system, which comprises a task machine, a mobile charger and M mobile Internet of Things devices;
(2) Establishing a first mathematical model for the on-board battery consumption of the task machine;
(3) Establishing a second mathematical model for the data uploading channel;
(4) Establishing an energy supplementing mathematical model, namely a third mathematical model, for the wireless charging process;
(5) Establishing a fourth mathematical model for the path optimization target according to the Internet of Things communication and wireless charging scene system, the first mathematical model, the second mathematical model and the third mathematical model;
(6) Determining a state set S, an action set A and a reward function $r_t$ according to the Internet of Things communication and wireless charging scene system, the first mathematical model, the second mathematical model, the third mathematical model and the fourth mathematical model;
(7) According to the state set S, the action set A and the reward function $r_t$, performing offline learning with an improved Double DQN algorithm to obtain the optimal path strategy $\pi^*$.
Step (1) specifically refers to the following: the Internet of Things communication and wireless charging scene system is denoted as the system. The system space is divided into N × N grids, each a square cell with side length c, where N is the number of grids in both the transverse and the longitudinal direction. A task machine is deployed in the scene to execute uploading tasks, a mobile charger provides wireless charging service, and each of the M mobile Internet of Things devices has a data volume $d_m$ waiting to be uploaded;
the current time slot of the system is denoted t, t = 0, τ, 2τ, …, T, where τ is the length of a single time slot and T is the time of the system termination state. The position of mobile Internet of Things device m at time slot t is represented by $(x_m^t, y_m^t, h_m)$, where $m \in \{1, 2, \dots, M\}$, $x_m^t$ is the abscissa of mobile Internet of Things device m in grid space, $y_m^t$ is its ordinate in grid space, and $h_m$ is its fixed height. The remaining amount of data to be uploaded by mobile Internet of Things device m at time slot t is denoted $d_m^t$.
The mobile charger, as the energy supply end, starts from a stop point at the beginning of a task, moves along a pre-deployed flight path at a pre-deployed flight speed $v_k$, and provides energy to the task machine. Its real-time position is represented by $(x_k^t, y_k^t, h_k)$, where $x_k^t$ is the abscissa of the mobile charger in grid space, $y_k^t$ is its ordinate in grid space, and $h_k$ is its fixed flying height. When the task starts, the task machine departs from the stop point and flies at a constant speed $v_u$. Its real-time position is represented by $(x_u^t, y_u^t, h_u)$, where $x_u^t$ is the abscissa of the task machine in grid space, $y_u^t$ is its ordinate in grid space, and $h_u$ is its fixed flying height. Here, $v_k$ = 10 m/s, $h_k$ = 40 m, $v_u$ = 10 m/s, and $h_u$ = 41 m.
Step (2) specifically refers to the following: the maximum battery capacity of the task machine is b_max, and its remaining battery level is defined for each time slot t. The task machine consumes a constant amount of energy each time it executes a flight action. Without considering charging, a mathematical model of the battery level of the task machine at time slot t+1 is established and recorded as the first mathematical model, whose expression is as follows:
The model gives the remaining battery level at time slot t+1.
Here, b_max = 150 kJ.
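The first mathematical model can be illustrated with a minimal sketch, assuming that each flight action subtracts a constant energy from the remaining battery level and that the level is clamped at zero. The names `B_MAX_J`, `E_FLY_J`, `battery_after_action` and `actions_until_empty` are hypothetical, and the value of `E_FLY_J` is a placeholder chosen only for illustration; only b_max = 150 kJ comes from the embodiment.

```python
B_MAX_J = 150_000.0   # b_max = 150 kJ, from the embodiment
E_FLY_J = 500.0       # hypothetical per-action flight energy (placeholder value)


def battery_after_action(b_t: float, e_fly: float = E_FLY_J) -> float:
    """Battery level at slot t+1 after one flight action, ignoring charging.

    Assumption: the level is clamped at zero instead of going negative.
    """
    return max(b_t - e_fly, 0.0)


def actions_until_empty(b_start: float = B_MAX_J, e_fly: float = E_FLY_J) -> int:
    """Count how many flight actions the battery supports without charging."""
    steps = 0
    b = b_start
    while b >= e_fly:
        b = battery_after_action(b, e_fly)
        steps += 1
    return steps
```

With these placeholder numbers the battery supports a fixed number of actions, which is why the later steps let the task machine trade flight energy against wireless charging.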
Step (3) specifically refers to the following: the task machine establishes a communication connection with a mobile IoT device and starts data uploading. The flying height of the task machine is assumed to be sufficient to guarantee line-of-sight wireless transmission between the mobile IoT device and the task machine, so the channel gain between the mobile IoT device and the task machine is expressed as follows:
where ρ_0 is the channel gain at the reference distance of 1 m, and the Euclidean distance between the task machine and mobile IoT device m at time slot t is expressed as follows:
where the distance is built from the offset between the task machine and mobile IoT device m along the abscissa, their offset along the ordinate, and h_u, the fixed flying height of the task machine. A mathematical model of the data transmission at time slot t is established and recorded as the second mathematical model:
where the modeled quantity is the transmission rate of the data transmission link established between the task machine and mobile IoT device m at time slot t, W is the signal bandwidth, P_IoT is the transmit power of the IoT device, and σ² is the noise power.
Here, ρ_0 = -50 dB, W = 1 MHz, P_IoT = 0.1 W, and σ² = -100 dB.
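The channel model of step (3) can be sketched numerically as follows. This is a minimal sketch, assuming the usual free-space line-of-sight gain ρ_0/d² and a Shannon-type rate W·log2(1 + P_IoT·g/σ²). It also assumes that -50 dB and -100 dB convert to linear values (the noise value being read as a power in watts) and that the height term in the distance equals h_u = 41 m. All function and constant names are hypothetical.

```python
import math

RHO_0 = 10 ** (-50 / 10)       # rho_0 = -50 dB as a linear gain at 1 m
BANDWIDTH_HZ = 1e6             # W = 1 MHz
P_IOT_W = 0.1                  # P_IoT = 0.1 W
NOISE_W = 10 ** (-100 / 10)    # sigma^2 = -100 dB, read as linear watts (assumption)
H_U_M = 41.0                   # fixed flying height of the task machine


def distance_m(dx_m: float, dy_m: float, h_m: float = H_U_M) -> float:
    """Euclidean distance from horizontal offsets and the height term."""
    return math.sqrt(dx_m ** 2 + dy_m ** 2 + h_m ** 2)


def channel_gain(d_m: float) -> float:
    """Line-of-sight gain, assumed to fall off with the square of the distance."""
    return RHO_0 / d_m ** 2


def upload_rate_bps(dx_m: float, dy_m: float) -> float:
    """Assumed Shannon-type rate of the device-to-task-machine link."""
    g = channel_gain(distance_m(dx_m, dy_m))
    return BANDWIDTH_HZ * math.log2(1.0 + P_IOT_W * g / NOISE_W)
```

Under these assumptions the rate is highest when the task machine hovers directly above a device and decreases as the horizontal offset grows, which matches the ranking used later in the field-of-view matrix.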
Step (4) specifically refers to the following: the mobile charger carries a high-gain radio frequency antenna for transmitting wireless power and provides the energy supply service for the task machine along a fixed deployed trajectory. During task execution, the distance between the task machine and the mobile charger differs from time slot to time slot, and the wireless charging power drops sharply as this distance increases. Assuming lossless energy conversion in the wireless transmission link, the power obtained at the task machine is calculated with the Friis free-space propagation model and is expressed by the following formula:
where P_t is the power of the transmitting end, G_t and G_r are the antenna gains at the transmitting and receiving ends, λ is the transmission wavelength, and the Euclidean distance between the transmitting end and the receiving end is expressed by the following formula:
where the distance is built from the offset between the task machine and the mobile charger in the transverse direction at time slot t, their offset in the longitudinal direction at time slot t, and h_k,u, the constant height difference between the task machine and the mobile charger. An energy replenishment model for the wireless charging process is established and recorded as the third mathematical model, whose expression is as follows:
where the modeled quantity is the energy received by the task machine in a single time slot and τ is the length of a single time slot.
Here, P_t = 20 W, G_t = 25 dBi, G_r = 25 dBi, and λ = 1 m.
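The charging model of step (4) can be sketched in the same way. This is a minimal sketch, assuming the textbook Friis form P_r = P_t·G_t·G_r·(λ/(4πd))² with the dBi gains converted to linear values, and assuming the third model is simply received power multiplied by the slot length. The slot length is passed in as a parameter because its value is not stated, and all names are hypothetical.

```python
import math

P_T_W = 20.0               # P_t = 20 W
G_T = 10 ** (25 / 10)      # G_t = 25 dBi as a linear gain
G_R = 10 ** (25 / 10)      # G_r = 25 dBi as a linear gain
WAVELENGTH_M = 1.0         # lambda = 1 m
H_KU_M = 1.0               # height difference |h_u - h_k| = 41 m - 40 m


def charger_distance_m(dx_m: float, dy_m: float, dh_m: float = H_KU_M) -> float:
    """Euclidean distance between the task machine and the mobile charger."""
    return math.sqrt(dx_m ** 2 + dy_m ** 2 + dh_m ** 2)


def received_power_w(d_m: float) -> float:
    """Friis free-space model (far-field formula, no near-field correction)."""
    return P_T_W * G_T * G_R * (WAVELENGTH_M / (4.0 * math.pi * d_m)) ** 2


def harvested_energy_j(d_m: float, tau_s: float) -> float:
    """Energy received in one slot of length tau_s, lossless conversion assumed."""
    return received_power_w(d_m) * tau_s
```

Because the received power scales with 1/d², doubling the separation cuts the harvested energy to a quarter, which is the sharp drop the text refers to. The Friis formula is a far-field approximation, so values computed at very short range with these gains should not be taken literally.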
Step (5) specifically refers to the following: for the path planning optimization problem of the task machine, the variables in the dynamic scene are the position information of the mobile IoT devices, the mobile charger and the task machine, the data uploading queues of the devices, and the battery state of the task machine. The optimization goal is to find a path strategy that helps the task machine make the best trade-off between energy consumption and the amount of data uploaded, maximizing the data uploading efficiency of a single flight by optimizing the flight trajectory. The main factors considered are the amount of data uploaded and the energy consumed by the task. The expression of the fourth mathematical model is as follows:
where the model uses the transmission rate of the data transmission link established between the task machine and mobile IoT device m at time slot t and the constant energy consumed by the task machine for each flight action; the current time slot of the system is denoted t, with t = 0, τ, 2τ, …, T, where τ is the length of a single time slot and T is the time of the system termination state; and m ∈ {1, 2, …, M}.
The step (6) specifically comprises the following steps:
(6a) Determining a state set S:
the expression of the state set S is as follows:
where s_t is the system state at time slot t and consists of three parts: L_m(t), L_u(t), and B_u(t). L_m(t) represents the state of each mobile IoT device in the data uploading task and guides the task toward the data uploading goal. L_m(t) contains the positions and remaining data amounts of all mobile IoT devices, up to and including the two-dimensional horizontal coordinates at time slot t of the device with the last serial number M in the task area and its remaining amount of data to be uploaded. L_m(t) is then expressed as follows:
L_u(t) simulates the field of view of the task machine to represent the data transmission rates available at different positions within the effective transmission distance of the task machine, together with the position of the task machine. According to the system characteristics, the field of view is centered on the task machine and extends n grids in each direction. Positions outside the task area are shown as black grids, and the corresponding matrix values are set to -1 to mark them as lying outside the task area. The data uploading rate at each position is represented according to its distance from the task machine: the uploading rate is highest at the position of the task machine, whose matrix element is set to 50; the matrix value at a distance of one grid is set to 20; the matrix value at a distance of two grids is set to 5; and the matrix values of the remaining white grids are set to 0;
a field-of-view matrix Y ∈ R^((2n+1)×(2n+1)) is then constructed. Each matrix element is indexed by row i and column j, where i, j ∈ (0, 2n+1]; the element in row i and column j of Y corresponds to the position with abscissa x_i and ordinate y_j in the overall grid space, and its value depends on the grid distance between that position and the task machine at time slot t. The values of the field-of-view matrix elements are expressed as follows:
the resulting field-of-view matrix Y is a square matrix with (2n+1)² elements. The elements of Y are flattened and combined with the abscissa and ordinate of the task machine in the grid space to form the one-dimensional column vector L_u(t), which is expressed as follows:
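The construction of L_u(t) described above can be sketched as follows. This is a minimal sketch under stated assumptions: grid coordinates run from 0 to N - 1, the "grid distance" is taken to be the Chebyshev distance max(|dx|, |dy|), and rows of the matrix follow the abscissa offset. None of these conventions is stated in the source, and the names `view_matrix` and `l_u` are hypothetical. The values 50, 20, 5, 0 and -1 are the ones given in the text.

```python
from typing import List

RATE_VALUES = {0: 50, 1: 20, 2: 5}   # matrix value by grid distance, from the text
OUTSIDE = -1                          # marks cells outside the N x N task area


def view_matrix(ux: int, uy: int, n: int, grid_n: int) -> List[List[int]]:
    """Build the (2n+1) x (2n+1) field-of-view matrix centred on the task machine.

    Assumptions: coordinates run from 0 to grid_n - 1 and the grid distance
    is the Chebyshev distance max(|dx|, |dy|).
    """
    rows = []
    for di in range(-n, n + 1):
        row = []
        for dj in range(-n, n + 1):
            x, y = ux + di, uy + dj
            if not (0 <= x < grid_n and 0 <= y < grid_n):
                row.append(OUTSIDE)
            else:
                row.append(RATE_VALUES.get(max(abs(di), abs(dj)), 0))
        rows.append(row)
    return rows


def l_u(ux: int, uy: int, n: int, grid_n: int) -> List[int]:
    """Flatten the view matrix and append the task machine coordinates."""
    flat = [value for row in view_matrix(ux, uy, n, grid_n) for value in row]
    return flat + [ux, uy]
```

For example, `l_u(7, 7, 3, 15)` has (2·3+1)² + 2 = 51 entries; when the task machine sits in a corner of the area, a large part of the matrix is filled with -1.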
B_u(t) contains the current battery level of the task machine, the abscissa and ordinate of the mobile charger in the grid space, and the Euclidean distance between the task machine and the mobile charger. B_u(t) is expressed as follows:
(6b) Determining an action set A:
The movement of the task machine in the grid space in the system is selected according to the movement direction, and the expression of the action space is as follows:
where a_t is the action executed by the task machine at time slot t; during flight the task machine moves at the fixed speed v_u in one of four flight directions. During flight the task machine obtains energy from the transmitting end of the mobile charger by wireless charging, so uploading scene data and obtaining energy proceed in parallel;
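A discrete action space with four flight directions can be sketched as below. This is only an illustration: the numeric encoding of the directions, the one-grid step per action, and the rule that a move leaving the task area is ignored are all assumptions made here, and `ACTIONS` and `move` are hypothetical names.

```python
from typing import Tuple

# Assumed encoding: 0 = up, 1 = down, 2 = left, 3 = right (one grid per action).
ACTIONS = {0: (0, 1), 1: (0, -1), 2: (-1, 0), 3: (1, 0)}


def move(x: int, y: int, action: int, grid_n: int = 15) -> Tuple[int, int]:
    """Apply one flight action on an N x N grid with coordinates 0..N-1."""
    dx, dy = ACTIONS[action]
    nx, ny = x + dx, y + dy
    if 0 <= nx < grid_n and 0 <= ny < grid_n:
        return nx, ny
    # Assumption: a move that would leave the task area keeps the position.
    return x, y
```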
(6c) Determining the reward function r_t:
For the task machine, the reward mechanism reflects how different behavior decisions affect each part of the system. The instant reward obtained by the task machine at time slot t is denoted r_t and is expressed as follows:
where μ_1, μ_2, and μ_3 are weighting factors that balance the three reward components; here, μ_1 = 1, μ_2 = 60, and μ_3 = 5. The first component takes maximizing the data throughput of the whole data uploading process as its core goal and gives each behavior a positive reward according to the data throughput of the single communication step; its expression is as follows:
where the expression uses the amount of data uploaded to the task machine by user m at time t;
the second component aims to maximize the data uploading efficiency of the whole data uploading process. It assigns a negative reward to every movement step of the task machine, which pushes the task machine to improve its path selection, reduces unnecessary energy loss, and promotes convergence to the optimal path;
the third component aims to maximize the endurance of the task machine while it executes the data uploading task. It gives a positive reward for the wireless charging that results from the movement decisions of the task machine; its expression is as follows:
where the expression uses the remaining battery level of the task machine at time slot t and the energy received by the task machine in a single time slot, b_th is the threshold that determines whether the task machine has entered the low-energy state, and β_1 and β_2 are constant coefficients.
The step (7) specifically comprises the following steps:
(7a) Initialize the neural network parameters θ_1 of the estimated value network and θ_2 of the target value network, and let θ_2 = θ_1. Initialize the experience replay pool with capacity D. Initialize the network learning rate α and the discount factor γ. Here, α = 0.0001 and γ = 0.95.
(7b) Based on the current state-action pair (s_t, a_t), where s_t is the system state at time slot t and a_t is the action executed by the task machine at time slot t, the estimated value network outputs the estimate of the Q value, Q_predicted(s_t, a_t; θ_1), where θ_1 denotes the neural network parameters of the estimated value network. For the next state, the action a_{t+1} is selected; s_{t+1} and a_{t+1} are the state value and the action value input to the target value network, which then determines the Q value of the next state-action pair (s_{t+1}, a_{t+1}). The target value of the Double DQN algorithm is defined as:
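The target computation can be sketched with plain lists of Q values. This is a minimal sketch of the standard Double DQN target, in which the estimated value network (parameters θ_1) picks the next action and the target value network (parameters θ_2) evaluates it; whether the source uses exactly this split is an assumption. The function name and the `done` flag for terminal states are hypothetical; γ = 0.95 comes from the embodiment.

```python
from typing import Sequence

GAMMA = 0.95  # discount factor from the embodiment


def double_dqn_target(
    reward: float,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    done: bool = False,
    gamma: float = GAMMA,
) -> float:
    """Standard Double DQN target for one transition.

    q_online_next: Q(s_{t+1}, a; theta_1) for every action a.
    q_target_next: Q(s_{t+1}, a; theta_2) for every action a.
    """
    if done:
        return reward  # assumption: no bootstrap from a terminal state
    # The estimated value network selects the action ...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ... and the target value network evaluates it.
    return reward + gamma * q_target_next[best_action]
```

Decoupling selection from evaluation is what distinguishes this target from the plain DQN target, which would take the maximum over `q_target_next` directly and tends to overestimate.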
(7c) In the current state s_t, execute the action a_t selected by the improved Double DQN algorithm and transition to the new state s_{t+1}. Calculate the single-step reward value according to the reward function r_t, and store the transition (s_t, a_t, r_t, s_{t+1}) in the experience replay pool;
steps (7b) to (7c) are executed in a loop until the number of transitions stored in the experience replay pool equals D, after which step (7d) is entered;
the Double DQN algorithm has two neural networks, namely the estimated value network and the target value network, and, given a training step length step, training is carried out once every step steps;
the total reward value of each training round is defined as R, and the reward value obtained by executing a single flight path is expressed as:
here, K = 8000, D = 55000, z = 32, F = 16000, and step = 25.
As shown in FIG. 3, the improved Double DQN algorithm modifies the greedy coefficient ε used for action selection, so that the action that obtains the maximum reward value is chosen more reliably at each step as training proceeds. Under the ε-greedy strategy, the agent selects the currently optimal action with probability ε and explores the other actions with the remaining probability 1 - ε. The value of the coefficient ε varies as follows:
where episode is the current number of flight rounds and K is the number of flight rounds at which ε reaches 1;
z experiences are randomly drawn from the experience replay pool of capacity D to form the offline learning training data set, and the maximum number of task rounds is F;
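The ε-greedy selection with a growing greedy coefficient can be sketched as follows. The shape of the schedule is an assumption: a linear ramp from 0 to 1 over K rounds is used here purely for illustration, since the text only states that ε increases with the number of flight rounds and reaches 1 at round K. The names `epsilon` and `select_action` are hypothetical.

```python
import random
from typing import Sequence

K = 8000  # round at which epsilon reaches 1, from the embodiment


def epsilon(episode: int, k: int = K) -> float:
    """Assumed linear ramp from 0 to 1 over k rounds, then held at 1."""
    return min(1.0, episode / k)


def select_action(q_values: Sequence[float], eps: float, rng=random) -> int:
    """Exploit with probability eps, otherwise explore.

    Note the convention of the text: eps is the probability of the greedy
    choice, so exploration happens with probability 1 - eps. For simplicity
    the exploratory draw here is uniform over all actions.
    """
    if rng.random() < eps:
        return max(range(len(q_values)), key=lambda a: q_values[a])
    return rng.randrange(len(q_values))
```

With this convention, early rounds are dominated by exploration, which fills the replay pool with varied samples, and after round K the agent acts purely greedily.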
(7d) Randomly draw z transitions from the experience replay pool, with z = 32. The k-th state transition in the mini-batch is denoted (s_k, a_k, r_k, s_{k+1}), k = 1, 2, 3, …, z;
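The experience replay pool used in steps (7c) and (7d) can be sketched with the standard library. This is a minimal sketch: the class name `ReplayPool` and its methods are hypothetical, and dropping the oldest transition once the capacity D is reached is an assumption, since the text does not say how a full pool is handled.

```python
import random
from collections import deque


class ReplayPool:
    """Fixed-capacity store of (s, a, r, s_next) transitions."""

    def __init__(self, capacity: int = 55000):  # D = 55000 in the embodiment
        # Assumption: once full, the oldest transition is discarded.
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next) -> None:
        self.buffer.append((s, a, r, s_next))

    def is_full(self) -> bool:
        return len(self.buffer) == self.buffer.maxlen

    def sample(self, z: int = 32):  # z = 32 in the embodiment
        """Draw z distinct transitions uniformly at random."""
        return random.sample(list(self.buffer), z)
```

In the loop described above, training would start only once `is_full()` returns true, and each training update would call `sample()` to obtain a mini-batch.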
(7e) From the transitions (s_k, a_k, r_k, s_{k+1}), k = 1, 2, 3, …, z, of step (7d), calculate the target Q value and the loss value L(θ_1). The loss function is expressed as follows:
(7f) Minimize the loss function L(θ_1) by gradient descent, expressed as follows:
where ∂ is the partial derivative symbol. The gradient of the loss function L(θ_1) with respect to θ_1 is the partial derivative, with respect to the estimated value network parameters θ_1, of the squared difference between the Q value computed by the estimated value network for the inputs s_t and a_t and the Q value of the target value network. This gradient is used to update the estimated value network parameters, and every step steps the target value network parameters θ_2 are replaced by θ_1.
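For orientation, steps (7e) and (7f) take the following shape in the standard Double DQN formulation. This is a sketch under that assumption: the symbol y_k for the target value and the averaging over the mini-batch are choices made here, not taken from the source.

```latex
\begin{aligned}
y_k &= r_k + \gamma\, Q\!\left(s_{k+1},\ \arg\max_{a} Q(s_{k+1}, a;\ \theta_1);\ \theta_2\right), \\
L(\theta_1) &= \frac{1}{z}\sum_{k=1}^{z}\bigl(y_k - Q(s_k, a_k;\ \theta_1)\bigr)^2, \\
\theta_1 &\leftarrow \theta_1 - \alpha\, \frac{\partial L(\theta_1)}{\partial \theta_1}.
\end{aligned}
```

Here α is the learning rate, γ is the discount factor, and z is the mini-batch size, as defined in steps (7a) and (7d).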
One pass through steps (7d) to (7f) completes a single training update. Steps (7b) to (7f) are repeated during each round of the flight task; training ends after F rounds, which completes the learning process of the Double DQN algorithm;
(7g) After training is finished, save the optimal path strategy π* and record the reward value, the number of flight steps, and the amount of data uploaded. Over the F training rounds, the estimated value network parameters θ_1 and the target value network parameters θ_2 are updated in the direction that maximizes the total reward value R, and the optimal path strategy π* is finally found.
The invention concerns a communication system in which a task unmanned aerial vehicle (task machine for short) assists data uploading. To improve the task execution efficiency of the task machine, the invention combines far-field wireless charging with the flexibility and easy deployment of unmanned aerial vehicles: a charging unmanned aerial vehicle (mobile charger for short) carrying a high-gain radio frequency antenna provides the charging service for the task machine. For this communication and wireless charging system, the improved deep reinforcement learning algorithm Double DQN is used to obtain a flight path optimization strategy that balances the data uploading task against the need to replenish battery power, thereby improving the task execution efficiency of the task machine.
According to the characteristics of the path optimization problem, the invention designs the state set S, the action set A, and the reward function r_t. In addition, the actual field of view of the task machine is simulated, and the neural network inputs L_m(t), L_u(t), and B_u(t) are designed by combining the effective information of the system. The greedy coefficient ε is designed to increase with the number of flight rounds, which increases the number of useful samples collected in the early stage. Mutually independent estimated value and target value networks are used, and during learning the two networks update their parameters step by step to reduce the estimation error, which effectively reduces the impact of the overestimation problem and improves the accuracy of the algorithm.
As shown in FIG. 2, the IoT communication and wireless charging scene system is referred to as the system. The system space is divided into N × N grids, each of which is a square unit with side length c, where N is the number of grids in both the transverse and longitudinal directions. An unmanned aerial vehicle is deployed in the scene to execute the uploading task, a mobile charger provides the wireless charging service, and each of the M mobile IoT devices m has a data volume d_m waiting to be uploaded. In this embodiment, N = 15, c = 10 m, M = 10, and d_m = 1000 kB.
In FIG. 4, the abscissa is the number of task rounds and the ordinate is the reward value of a single flight round. As FIG. 4 shows, the reward value gradually increases and then stabilizes as the number of task rounds grows. The training effect is best at about 14000 rounds, where the updated parameters θ_1 and θ_2 of the estimated value network and the target value network yield the flight path strategy π* with the maximum task execution efficiency.
In FIG. 5, the abscissa is the number of task rounds and the ordinate is the number of flight steps of the task machine in a single flight round. As FIG. 5 shows, during rounds 0 to 10000 the number of flight steps per round rises gradually while oscillating: the initial on-board battery level of the task machine is not enough for it to complete all data uploading, so the task machine learns to approach the charger at suitable times to replenish its battery. During rounds 10000 to 16000 the number of flight steps per round gradually falls and finally converges to 72 steps per flight. Within the set of optimized strategies already found, the task machine keeps learning toward fewer flight steps and finally finds the path flight strategy π* with the minimum energy consumption.
In FIG. 6, the abscissa is the number of task rounds and the ordinate is the total amount of data uploaded in a single flight round. As FIG. 6 shows, the total amount of data uploaded gradually increases with the number of task rounds and stabilizes at the maximum value of 10000 kB at about 12000 rounds, which yields a strategy that completes the data uploading task efficiently.
In summary, compared with the prior art, the invention uses far-field wireless charging technology and a highly mobile unmanned aerial vehicle carrying a high-gain radio frequency antenna as the mobile charger, which provides a more convenient charging service for the task machine that assists the base station in data uploading in the IoT communication system. Where real-time data requirements are high, this more convenient charging method significantly improves the endurance of the task machine and achieves high data uploading efficiency for the unmanned-aerial-vehicle-assisted IoT communication system.

Claims (8)

1. A DRL-based wireless chargeable unmanned aerial vehicle data upload path optimization method, characterized in that the method comprises the following steps in sequence:
(1) Constructing an Internet of things communication and wireless charging scene system, which comprises a task machine, a mobile charger and M mobile Internet of things devices;
(2) Establishing a first mathematical model for the on-board battery consumption of the task machine;
(3) Establishing a second mathematical model for the data uploading channel;
(4) Establishing an energy supplementing mathematical model, namely a third mathematical model, when the wireless charging process is executed;
(5) Establishing a fourth mathematical model for the path optimization target according to the Internet of things communication and wireless charging scene system, the first mathematical model, the second mathematical model and the third mathematical model;
(6) Determining a state set S, an action set A and a reward function r_t according to the Internet of things communication and wireless charging scene system, the first mathematical model, the second mathematical model, the third mathematical model and the fourth mathematical model;
(7) According to the state set S, the action set A and the reward function r_t, performing offline learning by utilizing an improved Double DQN algorithm, so that an optimal path strategy π* is obtained.
2. The DRL-based wireless chargeable unmanned aerial vehicle data upload path optimization method of claim 1, wherein step (1) specifically refers to the following: the Internet of things communication and wireless charging scene system is referred to as the system; the system space is divided into N × N grids, each of which is a square unit with side length c, where N is the number of grids in both the transverse and longitudinal directions; a task machine is deployed in the scene to execute the uploading task, a mobile charger provides the wireless charging service, and each of the M mobile Internet of things devices m has a data volume d_m waiting to be uploaded;
the current time slot of the system is denoted t, with t = 0, τ, 2τ, …, T, where τ is the length of a single time slot and T is the time of the system termination state; the position of mobile Internet of things device m at time slot t, where m ∈ {1, 2, …, M}, is given by its abscissa and ordinate in the grid space together with h_m, the fixed height of mobile Internet of things device m; each mobile Internet of things device m also has a remaining amount of data to be uploaded at time slot t;
the mobile charger, acting as the energy supply end, departs from a stop point at the beginning of the task, moves along a pre-deployed flight path at a pre-deployed flight speed v_k, and provides energy to the task machine; its real-time position is given by its abscissa and ordinate in the grid space together with h_k, the fixed flying height of the mobile charger; when the task starts, the task machine departs from the stop point and flies at a constant speed v_u; its real-time position is given by its abscissa and ordinate in the grid space together with h_u, the fixed flying height of the task machine.
3. The DRL-based wireless chargeable unmanned aerial vehicle data upload path optimization method of claim 1, wherein step (2) specifically refers to the following: the maximum battery capacity of the task machine is b_max, and its remaining battery level is defined for each time slot t; the task machine consumes a constant amount of energy each time it executes a flight action; without considering charging, a mathematical model of the battery level of the task machine at time slot t+1 is established and recorded as the first mathematical model, whose expression is as follows:
the model gives the remaining battery level at time slot t+1.
4. The DRL-based wireless chargeable unmanned aerial vehicle data upload path optimization method of claim 1, wherein step (3) specifically refers to the following: the task machine establishes a communication connection with a mobile Internet of things device and starts data uploading; the flying height of the task machine is sufficient to guarantee line-of-sight wireless transmission between the mobile Internet of things device and the task machine, so the channel gain between the mobile Internet of things device and the task machine is expressed as follows:
where ρ_0 is the channel gain at the reference distance of 1 m, and the Euclidean distance between the task machine and mobile Internet of things device m at time slot t is expressed as follows:
where the distance is built from the offset between the task machine and mobile Internet of things device m along the abscissa, their offset along the ordinate, and h_u, the fixed flying height of the task machine; a mathematical model of the data transmission at time slot t is established and recorded as the second mathematical model:
where the modeled quantity is the transmission rate of the data transmission link established between the task machine and mobile Internet of things device m at time slot t, W is the signal bandwidth, P_IoT is the transmit power of the Internet of things device, and σ² is the noise power.
5. The DRL-based wireless chargeable unmanned aerial vehicle data upload path optimization method of claim 1, wherein step (4) specifically refers to the following: the mobile charger carries a high-gain radio frequency antenna for transmitting wireless power and provides the energy supply service for the task machine along a fixed deployed trajectory; during task execution, the distance between the task machine and the mobile charger differs from time slot to time slot, and the wireless charging power drops sharply as this distance increases; assuming lossless energy conversion in the wireless transmission link, the power obtained at the task machine is calculated with the Friis free-space propagation model and is expressed by the following formula:
where P_t is the power of the transmitting end, G_t and G_r are the antenna gains at the transmitting and receiving ends, λ is the transmission wavelength, and the Euclidean distance between the transmitting end and the receiving end is expressed by the following formula:
where the distance is built from the offset between the task machine and the mobile charger in the transverse direction at time slot t, their offset in the longitudinal direction at time slot t, and h_k,u, the constant height difference between the task machine and the mobile charger; an energy replenishment model for the wireless charging process is established and recorded as the third mathematical model, whose expression is as follows:
where the modeled quantity is the energy received by the task machine in a single time slot and τ is the length of a single time slot.
6. The DRL-based wireless chargeable unmanned aerial vehicle data upload path optimization method of claim 1, wherein step (5) specifically refers to the following: for the path planning optimization problem of the task machine, the variables in the dynamic scene are the position information of the mobile Internet of things devices, the mobile charger and the task machine, the data uploading queues of the devices, and the battery state of the task machine; the optimization goal is to find a path strategy that helps the task machine make the best trade-off between energy consumption and the amount of data uploaded, maximizing the data uploading efficiency of a single flight by optimizing the flight trajectory; the main factors considered are the amount of data uploaded and the energy consumed by the task; the expression of the fourth mathematical model is as follows:
where the model uses the transmission rate of the data transmission link established between the task machine and mobile Internet of things device m at time slot t and the constant energy consumed by the task machine for each flight action; the current time slot of the system is denoted t, with t = 0, τ, 2τ, …, T, where τ is the length of a single time slot and T is the time of the system termination state; and m ∈ {1, 2, …, M}.
7. The DRL-based wireless chargeable unmanned aerial vehicle data upload path optimization method of claim 1, wherein: the step (6) specifically comprises the following steps:
(6a) Determining a state set S:
the expression of the state set S is as follows:
where s_t is the system state at time slot t and consists of three parts: L_m(t), L_u(t), and B_u(t); L_m(t) represents the state of each mobile Internet of things device in the data uploading task and guides the task toward the data uploading goal; L_m(t) contains the positions and remaining data amounts of all mobile Internet of things devices, up to and including the two-dimensional horizontal coordinates at time slot t of the device with the last serial number M in the task area and its remaining amount of data to be uploaded; L_m(t) is then expressed as follows:
L_u(t) simulates the field of view of the task machine to represent the data transmission rates available at different positions within the effective transmission distance of the task machine, together with the position of the task machine; according to the system characteristics, the field of view is centered on the task machine and extends n grids in each direction; positions outside the task area are shown as black grids, and the corresponding matrix values are set to -1 to mark them as lying outside the task area; the data uploading rate at each position is represented according to its distance from the task machine: the uploading rate is highest at the position of the task machine, whose matrix element is set to 50; the matrix value at a distance of one grid is set to 20; the matrix value at a distance of two grids is set to 5; and the matrix values of the remaining white grids are set to 0;
a field-of-view matrix Y ∈ R^((2n+1)×(2n+1)) is then constructed; each matrix element is indexed by row i and column j, where i, j ∈ (0, 2n+1]; the element in row i and column j of Y corresponds to the position with abscissa x_i and ordinate y_j in the overall grid space, and its value depends on the grid distance between that position and the task machine at time slot t; the values of the field-of-view matrix elements are expressed as follows:
the resulting field-of-view matrix Y is a square matrix with (2n+1)² elements; the elements of Y are flattened and combined with the abscissa and ordinate of the task machine in the grid space to form the one-dimensional column vector L_u(t), which is expressed as follows:
B_u(t) contains the current battery level of the task machine, the abscissa and ordinate of the mobile charger in the grid space, and the Euclidean distance between the task machine and the mobile charger; B_u(t) is expressed as follows:
(6b) Determining an action set A:
The task machine moves in the grid space of the system by selecting a movement direction, and the action space is expressed as follows:
wherein a_t is the action performed by the task machine at time slot t; during flight the task machine moves at a fixed speed v_u in one of four flight directions. During flight the task machine obtains energy from the transmitting end of the mobile charger by wireless charging, so that uploading scene data and obtaining energy proceed in parallel;
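A minimal sketch of a four-direction action set on the grid is shown below. The integer encoding of the directions, the one-cell-per-time-slot move, and the rule that a move leaving the grid keeps the task machine in place are all assumptions for illustration; the text only states that there are four flight directions at a fixed speed.

```python
# Hypothetical encoding of the four flight directions as grid offsets.
ACTIONS = {
    0: (0, 1),    # up
    1: (0, -1),   # down
    2: (-1, 0),   # left
    3: (1, 0),    # right
}

def apply_action(position, action, width, height):
    """Move one grid cell in the chosen direction.

    ASSUMPTION: a move that would leave the width x height grid is ignored
    and the task machine stays where it is.
    """
    dx, dy = ACTIONS[action]
    x, y = position[0] + dx, position[1] + dy
    if 0 <= x < width and 0 <= y < height:
        return (x, y)
    return position
```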
(6c) Determining the reward function r_t:
For the task machine, the influence of different behavior decisions on each part of the system is reflected through a reward mechanism; the instant reward obtained by the task machine at time slot t is denoted r_t and is expressed as follows:
wherein μ_1, μ_2 and μ_3 are weighting factors that balance the three reward terms. The first term takes maximizing the data throughput of the whole data uploading process as its core objective and gives each action a positive reward according to the data throughput in a single communication process; it is expressed as follows:
wherein the expression uses the amount of data uploaded to the task machine by user m at time t;
The second term aims at maximizing the data uploading efficiency of the whole data uploading process: a negative reward is given to each movement step of the task machine, urging the task machine to improve its path selection, reducing unnecessary energy loss and promoting convergence to the optimal path;
The third term aims at maximizing the endurance of the task machine while it executes the data uploading task: a positive reward is given for the wireless charging that results from the movement decisions of the task machine, and it is expressed as follows:
wherein the expression uses the remaining battery capacity of the task machine at time slot t and the energy received by the task machine within a single time slot; b_th is the threshold for determining whether the task machine has entered a low-energy state; and β_1 and β_2 are constant coefficients.
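The reward expressions themselves are not legible in this text, so the sketch below is only one plausible reading of the three terms, not the claimed formulas. It assumes a weighted sum of the three terms, a throughput reward equal to the total data uploaded in the slot, a constant movement penalty of -1, and a charging reward that scales the received energy by β_1 below the threshold b_th and by β_2 otherwise. Every function name and functional form here is hypothetical.

```python
def throughput_reward(uploaded_per_user):
    # ASSUMPTION: positive reward equals total data uploaded in this time slot.
    return sum(uploaded_per_user)

def movement_reward():
    # ASSUMPTION: a constant penalty for every movement step.
    return -1.0

def charging_reward(battery, received_energy, b_th, beta1, beta2):
    # ASSUMPTION: received energy is weighted by beta1 when the battery is
    # below the low-energy threshold b_th, and by beta2 otherwise.
    coefficient = beta1 if battery < b_th else beta2
    return coefficient * received_energy

def instant_reward(uploaded_per_user, battery, received_energy,
                   mu, b_th, beta1, beta2):
    """ASSUMPTION: r_t is the mu-weighted sum of the three reward terms."""
    mu1, mu2, mu3 = mu
    return (mu1 * throughput_reward(uploaded_per_user)
            + mu2 * movement_reward()
            + mu3 * charging_reward(battery, received_energy,
                                    b_th, beta1, beta2))
```

Under these assumptions, choosing β_1 larger than β_2 makes charging more attractive once the battery drops below b_th, which matches the stated goal of extending endurance.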
8. The DRL-based wireless chargeable unmanned aerial vehicle data upload path optimization method of claim 1, wherein: the step (7) specifically comprises the following steps:
(7a) Initializing the estimated value network neural parameter θ_1 and the target value network neural parameter θ_2, and letting θ_2 = θ_1; initializing an experience playback pool with capacity D; initializing the network learning rate α and the attenuation (discount) coefficient γ;
(7b) Based on the current state-action pair (s_t, a_t), where s_t is the system state at time slot t and a_t is the action performed by the task machine at time slot t, the estimated value network outputs the estimated Q value Q_predicted(s_t, a_t; θ_1), where θ_1 is the estimated value network neural parameter. An action a_{t+1} is selected for the next state, where s_{t+1} is the state value input to the target value network and a_{t+1} is the action value input to the target value network; the Q value of the next state-action pair (s_{t+1}, a_{t+1}) of the target value network is then determined. The target value for the Double DQN algorithm is defined as:
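The claimed target formula is not legible here. For orientation, the sketch below shows the target as it is usually written for Double DQN: the estimated value network (θ_1) picks the next action and the target value network (θ_2) evaluates it. Whether the patent uses exactly this split is an assumption, and the callables `q_estimate` and `q_target` and the `done` flag are hypothetical names introduced for the example.

```python
def double_dqn_target(reward, next_state, q_estimate, q_target,
                      actions, gamma, done=False):
    """Standard Double DQN target (assumed, not taken from the claim).

    q_estimate(s, a) stands in for Q(s, a; theta_1),
    q_target(s, a)   stands in for Q(s, a; theta_2).
    """
    if done:
        # No bootstrap term when the episode has ended.
        return reward
    # Action selection uses the estimated value network ...
    best_action = max(actions, key=lambda a: q_estimate(next_state, a))
    # ... while its value is read from the target value network.
    return reward + gamma * q_target(next_state, best_action)
```

Decoupling selection from evaluation is what distinguishes this target from plain DQN, where the target network's own maximum is used and Q values tend to be overestimated.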
(7c) In the current state s_t, executing action a_t, with the action selected according to the improved Double DQN algorithm, and transitioning to the new state s_{t+1}; calculating the single-step reward value according to the reward function r_t, and storing the transition (s_t, a_t, r_t, s_{t+1}) in the experience playback pool;
Steps (7b) to (7c) are executed cyclically until the number of memories stored in the experience playback pool equals D, whereupon step (7d) is entered;
The Double DQN algorithm has two neural network structures, namely an estimated value network and a target value network; a training step length step is given, and training is carried out once every step steps;
The total reward value in each training round is defined as R, and the reward value obtained by executing a single flight path is expressed as:
The improved Double DQN algorithm improves the greedy coefficient ε used for action selection so that, following the optimization measure, the action obtaining the maximum reward value is ensured at each step: during execution of the ε-greedy strategy, the current optimal action is selected with probability ε, and other actions are explored with the remaining probability 1-ε. The numerical variation of the coefficient ε is expressed as follows:
wherein episode is the current number of flight rounds, and K is the number of flight rounds at which ε reaches 1;
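Note that ε here is the probability of exploiting the current best action, the reverse of the more common convention. The schedule formula is not legible in this text; the sketch below assumes a linear ramp from 0 to 1 over K flight rounds, which is consistent with "ε = 1 at round K" but is not confirmed by the source. The function names are hypothetical.

```python
import random

def epsilon_for(episode, K):
    """ASSUMPTION: epsilon grows linearly with the round number and is capped at 1."""
    return min(1.0, episode / K)

def select_action(q_values, epsilon, rng=random):
    """Exploit with probability epsilon, explore with probability 1 - epsilon."""
    if rng.random() < epsilon:
        # Current optimal action: the index with the largest Q value.
        return max(range(len(q_values)), key=lambda a: q_values[a])
    # Exploration: a uniformly random action (it may coincide with the best one).
    return rng.randrange(len(q_values))
```

Early rounds are therefore dominated by exploration, and from round K onward the policy is purely greedy.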
z experiences are randomly extracted from the experience playback pool of capacity D to form an offline learning training data set; the maximum number of task rounds is F;
(7d) Randomly extracting z memories from the experience playback pool, with z = 32; the k-th state transition in the mini-batch is denoted (s_k, a_k, r_k, s_{k+1}), k = 1, 2, 3, …, z;
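The experience playback pool described in steps (7a), (7c) and (7d) can be sketched as a fixed-capacity buffer with uniform random sampling. The class name `ReplayPool` and the choice to overwrite the oldest transition once the pool is full are assumptions; the text only specifies the capacity D and the mini-batch size z = 32.

```python
import random
from collections import deque

class ReplayPool:
    """Fixed-capacity store of (s, a, r, s_next) transitions."""

    def __init__(self, capacity):
        # ASSUMPTION: once full, the oldest transition is discarded first.
        self.memory = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.memory.append((state, action, reward, next_state))

    def is_full(self):
        # Training starts only when the pool holds D transitions.
        return len(self.memory) == self.memory.maxlen

    def sample(self, z=32):
        # Uniform sampling without replacement of z stored transitions.
        return random.sample(list(self.memory), z)
```

A typical use would be to call `store` after every step of (7c) and begin calling `sample` once `is_full()` returns true.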
(7e) From the transitions (s_k, a_k, r_k, s_{k+1}), k = 1, 2, 3, …, z, obtained in step (7d), calculating the target Q value and the loss value L(θ_1); the loss function is expressed as follows:
(7f) Minimizing the loss function L(θ_1) by gradient descent, expressed as follows:
wherein the partial derivative of the error function L(θ_1) with respect to θ_1 is the partial derivative, with respect to the estimated value network neural parameter θ_1, of the squared difference between the Q value calculated by the estimated value network for inputs s_t and a_t and the Q value of the target value network; this gradient is used to update the estimated value network neural parameter, and every step steps the target value network neural parameter θ_2 is replaced by θ_1;
When steps (7d) to (7f) have been executed once, a single training pass is complete; steps (7b) to (7f) are executed repeatedly during each round of the flight task, and training ends after F rounds, which completes the learning process of the Double DQN algorithm;
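To make steps (7e) and (7f) concrete, the sketch below performs one gradient-descent update on a mean squared error loss and the periodic copy of θ_1 into θ_2. A linear Q function stands in for the neural network so that the gradient can be written by hand; that substitution, the mean over the mini-batch, and all function names are assumptions for illustration. The targets are treated as constants during the update, as is usual for DQN-style training.

```python
def q_value(theta, state, action):
    """Linear stand-in for the network: Q(s, a; theta) = theta[action] . s"""
    return sum(w * x for w, x in zip(theta[action], state))

def loss(theta1, batch, targets):
    """Mean squared difference between target values and estimated Q values."""
    errors = [(y - q_value(theta1, s, a)) ** 2
              for (s, a, _, _), y in zip(batch, targets)]
    return sum(errors) / len(batch)

def gradient_step(theta1, batch, targets, alpha):
    """One gradient-descent update of theta1 with learning rate alpha."""
    z = len(batch)
    grad = [[0.0] * len(row) for row in theta1]
    for (s, a, _, _), y in zip(batch, targets):
        err = y - q_value(theta1, s, a)
        for i, x in enumerate(s):
            # d/d(theta1[a][i]) of (y - Q)^2 is -2 * (y - Q) * s_i.
            grad[a][i] += -2.0 * err * x / z
    return [[w - alpha * g for w, g in zip(row, grow)]
            for row, grow in zip(theta1, grad)]

def maybe_sync(theta1, theta2, step_count, step):
    """Every `step` training steps, replace theta2 by a copy of theta1."""
    if step_count % step == 0:
        return [row[:] for row in theta1]
    return theta2
```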
(7g) After the training algorithm has finished, saving the optimal path strategy π* and recording the reward value, the number of flight steps and the data upload amount: over the F training rounds, the estimated value network neural parameter θ_1 and the target value network neural parameter θ_2 are updated in the direction that maximizes the total reward value R, and the optimal path strategy π* is finally found.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311129223.3A CN117062182A (en) 2023-09-04 2023-09-04 DRL-based wireless chargeable unmanned aerial vehicle data uploading path optimization method

Publications (1)

Publication Number Publication Date
CN117062182A (en) 2023-11-14

Family

ID=88655344

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311129223.3A Pending CN117062182A (en) 2023-09-04 2023-09-04 DRL-based wireless chargeable unmanned aerial vehicle data uploading path optimization method

Country Status (1)

Country Link
CN (1) CN117062182A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117295096A (en) * 2023-11-24 2023-12-26 武汉市豪迈电力自动化技术有限责任公司 Smart electric meter data transmission method and system based on 5G short sharing
CN117295096B (en) * 2023-11-24 2024-02-09 武汉市豪迈电力自动化技术有限责任公司 Smart electric meter data transmission method and system based on 5G short sharing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination