CN114268986A - Unmanned aerial vehicle computing unloading and charging service efficiency optimization method - Google Patents
Unmanned aerial vehicle computing unloading and charging service efficiency optimization method
- Publication number
- CN114268986A (application CN202111529547.7A)
- Authority
- CN
- China
- Prior art keywords
- unmanned aerial vehicle
- internet of things
- service
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses an Internet-of-Things-oriented method for optimizing the efficiency of unmanned aerial vehicle computation offloading and charging services, comprising the following steps: constructing a reinforcement learning network in advance; sensing the current environment state; determining the action of the unmanned aerial vehicle through the reinforcement learning network based on the sensed state data; and, based on that action, determining the flight trajectory of the unmanned aerial vehicle and the computation offloading and charging service decisions for the Internet of Things sensors. In the invention, the unmanned aerial vehicle serves both as a mobile edge server providing computation offloading services and as a wireless charging device that charges the Internet of Things devices. By modeling the positions of the Internet of Things sensors, the position of the unmanned aerial vehicle, the residual energy of the sensors, their task information, and the channel states between the unmanned aerial vehicle and the sensors, a deep reinforcement learning algorithm performs trajectory planning and user service decisions, so that the computational complexity is low and the computing capability and working time of the Internet of Things system are effectively improved.
Description
Technical Field
The invention relates to the fields of mobile edge computing and wireless energy transfer, in particular to neural-network-based deep reinforcement learning algorithms, and specifically to a method for optimizing the efficiency of unmanned aerial vehicle computation offloading and charging services.
Background
In recent years, with the development of Internet of Things technology, emerging Internet of Things applications have appeared one after another. However, many Internet of Things devices have limited or even no computing power while facing high computation demands, and their working time is constrained by finite battery energy. By deploying computing resources near the Internet of Things devices, mobile edge computing can effectively meet their traffic demands through task offloading and reduce task delay. Thanks to its high mobility, an unmanned aerial vehicle can act as a mobile edge server to realize task offloading for Internet of Things devices, and it can also act as a wireless charging device to charge them. However, the energy supply of the unmanned aerial vehicle itself is limited, so designing its flight path planning and user service strategy is crucial to maximizing system efficiency.
Disclosure of Invention
In view of the above problems, the invention provides an unmanned aerial vehicle computation offloading and charging service efficiency optimization method that maximizes the long-term reward of the system under the energy constraint of the unmanned aerial vehicle; by appropriately designing the reward function, the flight path and the user service mechanism of the unmanned aerial vehicle are reasonably planned, thereby maximizing system efficiency.
The unmanned aerial vehicle computation offloading and charging service efficiency optimization method of the invention comprises the following specific steps:
step 1: and constructing a deep reinforcement learning network based on the DQN.
Step 2: and (3) sensing the current environment state information, and determining executable actions of the unmanned aerial vehicle through the deep reinforcement learning network constructed in the step 1.
And step 3: based on the current environment state sensing information, the values of all actions in the state are predicted through a deep reinforcement learning network, the action with the maximum value is selected, and then the flight target position of the unmanned aerial vehicle, the unmanned aerial vehicle service target and the service type decision of the Internet of things equipment are determined, wherein the service type decision comprises the calculation of unloading service and charging service, and meanwhile, the energy consumption of the unmanned aerial vehicle is calculated.
And 4, step 4: and calculating a reward function, and iteratively updating the DQN deep reinforcement learning network.
The invention has the following advantages:
In the unmanned aerial vehicle computation offloading and charging service efficiency optimization method, the long-term reward of the system is maximized under the energy constraint of the unmanned aerial vehicle, and the flight path and user service mechanism are reasonably planned by appropriately designing the reward function. The service process is modeled as a Markov decision process, an intelligent decision method based on deep reinforcement learning is designed, and the flight path, user allocation, and service type of the unmanned aerial vehicle are jointly optimized, so that the unmanned aerial vehicle can make decisions quickly from the environment state information, overcoming the high computational cost of traditional algorithms and reducing the decision cost.
Drawings
Fig. 1 is a flowchart of the unmanned aerial vehicle computation offloading and charging service efficiency optimization method of the present invention;
fig. 2 is a diagram of the system model used in the method;
fig. 3 is a simulation plot of the cumulative reward as a function of the number of iterations in an embodiment of the method.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
The invention discloses an unmanned aerial vehicle computation offloading and charging service efficiency optimization method which, as shown in Fig. 1, comprises the following steps:
step 1: and constructing a deep reinforcement learning network based on the DQN, and initializing parameters of the DQN neural network.
(1) And (3) constructing a DQN algorithm, namely a neural network structure including a real network and an estimation network, wherein the neural network structure and the real network structure are the same but have different parameters, the input is the current environmental state and the action taken by the unmanned aerial vehicle, and the output is the value of the environmental state-action pair.
(2) And constructing a memory base to store data generated by interaction between the unmanned aerial vehicle and the environment, wherein the data comprises environment state information data, unmanned aerial vehicle action data and the like generated by each interaction, and randomly extracting a part of data in the memory base for updating when updating the estimated network parameters in the DQN algorithm every time so as to break the correlation and unsteady distribution problems among the data.
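As a minimal illustration of the memory bank described above, the following Python sketch (class and method names are our own, not from the patent) stores interaction tuples in a fixed-size buffer and draws uniform random mini-batches:

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size memory bank of (state, action, reward, next_state) transitions.

    Uniform random sampling breaks the temporal correlation between
    consecutive interaction samples, as described in the text.
    """
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniformly sample a mini-batch for one update of the estimation network
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

Training only begins once the buffer holds enough transitions, after which each gradient step draws an independent batch rather than the most recent trajectory.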
Step 2: and (3) sensing the current environment state information, and determining the action of the unmanned aerial vehicle through the deep reinforcement learning network constructed in the step (1), wherein the action comprises unmanned aerial vehicle flight target position selection, service target selection and service type selection.
Sensing current position information of each Internet of things device, current position information of the unmanned aerial vehicle, current residual power information of each Internet of things device, current residual power information of the unmanned aerial vehicle, calculation task size information (namely the number of data bits required to be processed) of each Internet of things device, and channel state information between the unmanned aerial vehicle and each Internet of things device by using an unmanned aerial vehicle airborne sensor as state information. The environmental state information data is expressed in the form of the following matrix:
wherein S (t) represents environmental status information at time t,respectively representing the positions of N pieces of Internet of things equipment at the time t; lV(t) represents the position of the drone at time t;respectively representing the residual electric quantity of N pieces of Internet of things equipment at the time t; eV(t) represents the remaining capacity of the unmanned aerial vehicle at the moment t;respectively representing the task sizes of N pieces of Internet of things equipment at t moment, wherein the unit is bit;respectively represent the channel gain between N thing networking device and the unmanned aerial vehicle at moment t.
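Assuming the quantities above are simply concatenated into the network input (the flattening order and function name here are our own convention), the state can be packed as follows; with N = 32 devices the length works out to 64 + 2 + 32 + 1 + 32 + 32 = 163, matching the 163-unit input layer reported in the embodiment:

```python
import numpy as np

def build_state(dev_pos, uav_pos, dev_energy, uav_energy, task_bits, gains):
    """Flatten the per-device quantities into one observation vector S(t).

    dev_pos: (N, 2) device coordinates; uav_pos: (2,); dev_energy, task_bits
    and gains are length-N arrays; uav_energy is a scalar.
    """
    return np.concatenate([
        np.asarray(dev_pos, dtype=np.float32).ravel(),   # l_1 .. l_N
        np.asarray(uav_pos, dtype=np.float32).ravel(),   # l_V
        np.asarray(dev_energy, dtype=np.float32).ravel(),  # E_1 .. E_N
        np.asarray([uav_energy], dtype=np.float32),      # E_V
        np.asarray(task_bits, dtype=np.float32).ravel(), # D_1 .. D_N
        np.asarray(gains, dtype=np.float32).ravel(),     # g_1 .. g_N
    ]).astype(np.float32)
```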
Based on the perception state data, determining the action of the unmanned aerial vehicle through a deep reinforcement learning network, and the method comprises the following steps: unmanned aerial vehicle flight target position selection, service target selection and service type selection.
The action data is expressed in matrix form:
A(t)=[an(t),am(t),aT(t)]
where A(t) denotes the action executed by the unmanned aerial vehicle at time t; a_n(t) ∈ {1, 2, …, N} is the Internet of Things device selected for service at time t; a_m(t) ∈ {1, 2, …, M} is the position to which the unmanned aerial vehicle flies at time t, i.e., one of M preset access points; and a_T(t) ∈ {0, 1} is the service type selected at time t, where a_T(t) = 0 means the unmanned aerial vehicle provides computation offloading service and a_T(t) = 1 means it provides charging service. The unmanned aerial vehicle therefore has N × M × 2 executable actions in total.
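Since a DQN head outputs one value per discrete action, the triple A(t) must be flattened into a single action index. One possible encoding (a zero-based convention of our own, not from the patent) is:

```python
def encode_action(n, m, t, N, M):
    """Map (service target n, flight position m, service type t) to a flat index.

    n in [0, N): IoT device index; m in [0, M): access-point index;
    t in {0, 1}: 0 = computation offloading, 1 = charging.
    With a_n, a_m and a_T as defined above, the flat space has N*M*2 entries.
    """
    return (n * M + m) * 2 + t

def decode_action(idx, N, M):
    """Inverse of encode_action: recover (n, m, t) from the greedy argmax index."""
    t = idx % 2
    m = (idx // 2) % M
    n = idx // (2 * M)
    return n, m, t
```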
Step 3: based on the sensed current environment state information, predict the value of every action in that state through the deep reinforcement learning network and select the action with the largest value, thereby determining the flight target position of the unmanned aerial vehicle, the service target, and the service type decision (computation offloading service or charging service) for the Internet of Things device, while also calculating the energy consumption of the unmanned aerial vehicle.
The energy consumption of the unmanned aerial vehicle comprises the following 4 components:
(1) Flight energy consumption: only horizontal flight of the unmanned aerial vehicle is considered. The unmanned aerial vehicle flies at a constant horizontal speed V_f at altitude h with flight power P_f. The flight energy consumption at time t is determined by the current position and the next position; with the position state of the unmanned aerial vehicle in the three-dimensional coordinate system at time t denoted l_V(t) = [x_V(t), y_V(t), h], the flight energy consumption is:
e_f(t) = P_f ||l_V(t+1) − l_V(t)|| / V_f
(2) Hovering energy consumption: the energy consumed while the unmanned aerial vehicle hovers at time t to provide computation offloading or charging service to an Internet of Things device, with constant hovering power P_h. When the unmanned aerial vehicle provides computation offloading service, the channel is treated as a line-of-sight channel. With the position state of Internet of Things device n denoted l_n(t) and d_n(t) the distance between the unmanned aerial vehicle and device n, the channel gain and the data transmission rate between them are respectively:
g_n(t) = p_0 / d_n(t)²
r_n(t) = B log₂(1 + P_n g_n(t) / σ²)
where B is the channel bandwidth, p_0 is the channel power gain at a reference distance of 1 meter, P_n is the fixed transmission power of Internet of Things device n, and σ² is the Gaussian white noise power. The hovering energy consumption when the unmanned aerial vehicle provides computation offloading service is:
e_h-compute(t) = P_h D_n(t) / r_n(t)
where D_n(t) denotes the task size, in bits, of Internet of Things device n at time t. When the unmanned aerial vehicle provides charging service, the received charging power is:
P_c,n(t) = β_0 P_0 g_n(t)
where P_0 is the transmission power of the unmanned aerial vehicle when providing charging service and β_0 ∈ (0, 1) is the energy conversion efficiency.
Considering that the full battery capacity of an Internet of Things device is E_b, the hovering energy consumption when the unmanned aerial vehicle provides charging service is:
e_h-charge(t) = P_h (E_b − E_n(t)) / P_c,n(t)
where E_n(t) is the residual energy of Internet of Things device n at time t.
(3) Computation energy consumption: at time t, with effective capacitance coefficient γ_c, C CPU cycles required to process 1 bit of data, and CPU frequency f_c, the computation energy consumption of the unmanned aerial vehicle is:
e_compute(t) = γ_c C D_n(t) f_c²
(4) The charging energy consumption of the unmanned aerial vehicle (transmission power P_0 over the charging duration) is:
e_charge(t) = P_0 (E_b − E_n(t)) / P_c,n(t)
Therefore, the total energy consumption of the unmanned aerial vehicle is:
W(t) = e_f(t) + (1 − a_T(t))(e_h-compute(t) + e_compute(t)) + a_T(t)(e_h-charge(t) + e_charge(t))
The residual energy of the unmanned aerial vehicle is then E_V(t+1) = E_V(t) − W(t). The residual energy, as part of the environment state sensed by the unmanned aerial vehicle, is strongly correlated with the reward function and helps the deep reinforcement learning network fit the state-action value more accurately.
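The energy bookkeeping above can be sketched in Python as follows. The formulas follow the standard models the text describes (flight energy = power × distance/speed, CPU energy = γ_c·C·D·f_c²); all function and parameter names are our own illustrative choices:

```python
import math

def flight_energy(p_f, v_f, pos_now, pos_next):
    """e_f = P_f * (distance / V_f): constant-power horizontal flight at speed V_f."""
    return p_f * math.dist(pos_now, pos_next) / v_f

def offload_hover_energy(p_h, task_bits, rate):
    """Hover at power P_h while the device uploads D_n(t) bits at rate r_n(t)."""
    return p_h * task_bits / rate

def compute_energy(gamma_c, cycles_per_bit, task_bits, f_c):
    """CPU model: e = gamma_c * C * D * f_c^2 (effective capacitance gamma_c)."""
    return gamma_c * cycles_per_bit * task_bits * f_c ** 2

def total_energy(e_f, a_t, e_h_compute, e_compute, e_h_charge, e_charge):
    """W(t) from the text: offloading terms when a_T = 0, charging terms when a_T = 1."""
    return e_f + (1 - a_t) * (e_h_compute + e_compute) + a_t * (e_h_charge + e_charge)
```

Subtracting `total_energy(...)` from the current battery level gives the E_V(t+1) fed back into the next state.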
Step 4: calculate the reward function and iteratively update the DQN deep reinforcement learning network.
The unmanned aerial vehicle obtains a positive reward for processing the computation tasks offloaded by the Internet of Things devices:
The unmanned aerial vehicle also obtains a positive reward for charging the Internet of Things devices:
Combining the two positive rewards, the reward function can be defined as:
where r_penalty denotes the penalty term applied when an Internet of Things device runs out of energy while the unmanned aerial vehicle is working. In addition, if the residual energy of the unmanned aerial vehicle falls below a threshold b in any decision period, the next state is a terminal state and the unmanned aerial vehicle ends its service and returns. If the unmanned aerial vehicle runs out of energy on the return trip, the task is not completed and a sufficiently large penalty term r_penalty is appended to the reward function; for example, when the number of Internet of Things devices is N, r_penalty can be set to a value greater than or equal to 2N. When an Internet of Things device runs out of energy during the unmanned aerial vehicle's service, a penalty term with a preset value is likewise appended to the reward function.
At each time t the unmanned aerial vehicle is in some environment state S(t); every executable action a has a state-action value (Q value), and the current decision selects the action with the largest Q value, namely
A(t) ← argmax_a Q(S(t), a)
After determining the action, the unmanned aerial vehicle executes it, enters the next state S(t+1), and obtains a reward R(t+1); meanwhile, the Q value corresponding to the action is updated in order to update the neural network:
Q(S(t), A(t)) ← Q(S(t), A(t)) + α[R(t+1) + γ max_a Q(S(t+1), a) − Q(S(t), A(t))]
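The tabular form of this update rule can be sketched as below; in the DQN itself the same target R + γ·max_a Q(S′, a) is fitted by the estimation network instead of stored in a table, and all names here are illustrative:

```python
def q_update(q, state, action, reward, next_state, alpha, gamma, terminal=False):
    """One step of Q(S,A) <- Q(S,A) + alpha*[R + gamma*max_a Q(S',a) - Q(S,A)].

    q is a dict mapping (state, action) -> value; unseen pairs default to 0.
    """
    next_vals = [v for (s, a), v in q.items() if s == next_state]
    best_next = 0.0 if (terminal or not next_vals) else max(next_vals)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]
```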
As the neural network trains, the state-action value Q gradually converges and the unmanned aerial vehicle selects the action with the largest Q value in each state. If the average reward over 100 consecutive episodes (100 runs) exceeds a preset value (chosen according to actual requirements), the optimal policy for the unmanned aerial vehicle flight trajectory and user service has been obtained.
The embodiments of the present invention are described in further detail below with reference to the method flowchart and the system model diagram.
During system operation, the DQN neural network parameters and the environment state are first initialized. Actions are selected with an ε-greedy strategy and executed: the unmanned aerial vehicle flies to the target position and completes the service for the selected target. After each decision, the unmanned aerial vehicle enters the next state and obtains a reward, and the state transition data are stored in the memory bank. Once the memory bank is full, data are sampled from it to train the neural network to fit the state-action value function. When the neural network finally converges, the DQN network can guide the unmanned aerial vehicle's decisions to obtain the optimal system efficiency. The overall algorithm flow is given below:
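The action-selection step of this loop — explore with probability ε, otherwise take the argmax of the network's predicted state-action values — can be sketched as (function name ours):

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a uniformly random action index with probability epsilon,
    otherwise return the index of the largest predicted Q value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```

During training ε is typically annealed toward a small value, which explains the residual episode-to-episode fluctuation noted in the simulation results.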
example (b):
In this example, a 500 m × 500 m area is considered, and the unmanned aerial vehicle provides computation offloading and charging services for the Internet of Things devices in the area from an altitude of 5 m, as shown in Fig. 2. 32 coordinates are randomly generated in the area as the Internet of Things device positions (which also serve as part of the access points), and another 32 coordinates are generated as additional access points. The task information of the Internet of Things devices is randomly generated, their energy is randomly generated in the range 0.05-0.2 J, and only the energy consumed by data transmission is considered during a task. The specific parameter values of the system model are listed in Table 1 below:
TABLE 1 System model parameters
The system model simulation environment is implemented in Python and developed on the OpenAI gym module; the learning algorithm is the DQN implementation in the open-source reinforcement learning library Stable Baselines (based on OpenAI Baselines). The discount factor γ is 0.99, the learning rate is 0.0001, the memory bank capacity is 10000, the total number of iterations is 2.5 × 10^5, the network structure is 163 × 256 × 256 × 128, and the remaining parameters take default values.
Fig. 3 shows the system performance (i.e., the reward) as a function of the number of iterations in this example. In the initial stage the algorithm explores to accumulate experience and the obtained reward is small; as the number of iterations increases the reward gradually grows and system performance improves. Because an ε-greedy policy is used during training, some fluctuation remains between episodes, but the algorithm converges overall. Under the network structure used, the number of parameters is 140672 and the computation amount is 280064 operations (140032 multiplications and 140032 additions), so the model is small and the computation is fast.
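The stated parameter and operation counts can be checked directly from the 163 × 256 × 256 × 128 layer sizes: each layer pair contributes weights (one multiplication each per forward pass) plus biases. A small helper (our own) makes the arithmetic explicit:

```python
def mlp_counts(layers):
    """Return (parameters, multiplications) for a fully-connected network.

    Each consecutive layer pair (i, o) contributes i*o weights + o biases to
    the parameter count, and i*o multiplications per forward pass.
    """
    weights = sum(i * o for i, o in zip(layers, layers[1:]))
    biases = sum(layers[1:])
    return weights + biases, weights
```

For [163, 256, 256, 128] this gives 140032 weights + 640 biases = 140672 parameters and 140032 multiplications, matching the figures quoted above.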
Claims (8)
1. An unmanned aerial vehicle computation offloading and charging service efficiency optimization method, characterized by comprising the following specific steps:
step 1: constructing a DQN-based deep reinforcement learning network;
step 2: sensing the current environment state information and determining the executable actions of the unmanned aerial vehicle through the deep reinforcement learning network constructed in step 1;
step 3: predicting the value of every action in the current state through the deep reinforcement learning network based on the sensed current environment state information, selecting the action with the largest value, and thereby determining the flight target position of the unmanned aerial vehicle, the service target, and the service type decision for the Internet of Things device, the service type decision comprising computation offloading service and charging service, while also calculating the energy consumption of the unmanned aerial vehicle;
step 4: calculating the reward function and iteratively updating the DQN deep reinforcement learning network.
2. The unmanned aerial vehicle computation offloading and charging service efficiency optimization method according to claim 1, characterized in that: in step 1, a DQN algorithm is constructed whose neural networks comprise a target network and an estimation network of identical structure; the input of both is the current environment state and the action taken by the unmanned aerial vehicle, and the output is the value of the environment state-action pair; meanwhile, a memory bank is constructed to store the data generated by the interaction between the unmanned aerial vehicle and the environment, including the environment state information and unmanned aerial vehicle action data generated by each interaction.
3. The unmanned aerial vehicle computation offloading and charging service efficiency optimization method according to claim 1, characterized in that: in step 2, the current environment state information comprises, as the state information, the current position of each Internet of Things device, the current position of the unmanned aerial vehicle, the current residual energy of each Internet of Things device, the current residual energy of the unmanned aerial vehicle, the computation task size of each Internet of Things device, and the channel state between the unmanned aerial vehicle and each Internet of Things device;
the executable actions of the unmanned aerial vehicle comprise: flight target position selection, service target selection, and service type selection.
4. The unmanned aerial vehicle computation offloading and charging service efficiency optimization method according to claim 3, characterized in that: in step 2, the environment state information is expressed as the following matrix:
S(t) = [l_1(t), …, l_N(t), l_V(t), E_1(t), …, E_N(t), E_V(t), D_1(t), …, D_N(t), g_1(t), …, g_N(t)]
where S(t) denotes the environment state information at time t; l_1(t), …, l_N(t) denote the positions of the N Internet of Things devices at time t; l_V(t) denotes the position of the unmanned aerial vehicle at time t; E_1(t), …, E_N(t) denote the residual energy of the N Internet of Things devices at time t; E_V(t) denotes the residual energy of the unmanned aerial vehicle at time t; D_1(t), …, D_N(t) denote the task sizes (in bits) of the N Internet of Things devices at time t; and g_1(t), …, g_N(t) denote the channel gains between the N Internet of Things devices and the unmanned aerial vehicle at time t;
the executable action data of the unmanned aerial vehicle are expressed in matrix form:
A(t) = [a_n(t), a_m(t), a_T(t)]
where A(t) denotes the action executed by the unmanned aerial vehicle at time t; a_n(t) ∈ {1, 2, …, N} is the Internet of Things device selected for service at time t; a_m(t) ∈ {1, 2, …, M} is the position to which the unmanned aerial vehicle flies at time t, i.e., one of M preset access points; and a_T(t) ∈ {0, 1} is the service type selected at time t, where a_T(t) = 0 means the unmanned aerial vehicle provides computation offloading service and a_T(t) = 1 means it provides charging service.
5. The unmanned aerial vehicle computation offloading and charging service efficiency optimization method according to claim 1, characterized in that: in step 3, the energy consumption of the unmanned aerial vehicle comprises the following 4 components:
(1) flight energy consumption: only horizontal flight of the unmanned aerial vehicle is considered; the unmanned aerial vehicle flies at a constant horizontal speed V_f at altitude h with flight power P_f; the flight energy consumption at time t is determined by the current position and the next position, and with the position state of the unmanned aerial vehicle in the three-dimensional coordinate system at time t denoted l_V(t) = [x_V(t), y_V(t), h], the flight energy consumption is:
e_f(t) = P_f ||l_V(t+1) − l_V(t)|| / V_f
(2) hovering energy consumption: the energy consumed while the unmanned aerial vehicle hovers at time t to provide computation offloading or charging service to an Internet of Things device, with constant hovering power P_h; when the unmanned aerial vehicle provides computation offloading service, the channel is treated as a line-of-sight channel, and with the position state of Internet of Things device n denoted l_n(t) and d_n(t) the distance between the unmanned aerial vehicle and device n, the channel gain and the data transmission rate between them are respectively:
g_n(t) = p_0 / d_n(t)²
r_n(t) = B log₂(1 + P_n g_n(t) / σ²)
where B is the channel bandwidth, p_0 is the channel power gain at a reference distance of 1 meter, P_n is the fixed transmission power of Internet of Things device n, and σ² is the Gaussian white noise power; the hovering energy consumption when the unmanned aerial vehicle provides computation offloading service is:
e_h-compute(t) = P_h D_n(t) / r_n(t)
where D_n(t) denotes the task size, in bits, of Internet of Things device n at time t; when the unmanned aerial vehicle provides charging service, the received charging power is:
P_c,n(t) = β_0 P_0 g_n(t)
where P_0 is the transmission power of the unmanned aerial vehicle when providing charging service and β_0 ∈ (0, 1) is the energy conversion efficiency;
considering that the full battery capacity of an Internet of Things device is E_b and E_n(t) is the residual energy of Internet of Things device n at time t, the hovering energy consumption when the unmanned aerial vehicle provides charging service is:
e_h-charge(t) = P_h (E_b − E_n(t)) / P_c,n(t)
(3) computation energy consumption: at time t, with effective capacitance coefficient γ_c, C CPU cycles required to process 1 bit of data, and CPU frequency f_c, the computation energy consumption of the unmanned aerial vehicle is:
e_compute(t) = γ_c C D_n(t) f_c²
(4) the charging energy consumption of the unmanned aerial vehicle is:
e_charge(t) = P_0 (E_b − E_n(t)) / P_c,n(t)
therefore, the total energy consumption of the unmanned aerial vehicle is:
W(t) = e_f(t) + (1 − a_T(t))(e_h-compute(t) + e_compute(t)) + a_T(t)(e_h-charge(t) + e_charge(t))
and the residual energy of the unmanned aerial vehicle is: E_V(t+1) = E_V(t) − W(t).
6. The unmanned aerial vehicle computation offloading and charging service efficiency optimization method according to claim 1, characterized in that: in step 4, the unmanned aerial vehicle obtains a positive reward both for processing the computation tasks offloaded by the Internet of Things devices and for charging the Internet of Things devices, and the reward function is defined accordingly:
where the first term is the positive reward obtained by the unmanned aerial vehicle for processing the computation tasks offloaded by the Internet of Things devices, the second term is the positive reward obtained for charging the Internet of Things devices, a_T(t) denotes the service type selected by the unmanned aerial vehicle at time t, and r_penalty denotes the penalty term applied when an Internet of Things device runs out of energy while the unmanned aerial vehicle is working.
7. The unmanned aerial vehicle computation offloading and charging service efficiency optimization method according to any one of claims 1 to 6, characterized in that: in step 4, the reward function is applied as follows: if the residual energy of the unmanned aerial vehicle falls below the threshold b in any decision period, the next state is a terminal state and the unmanned aerial vehicle ends its service and returns; if the unmanned aerial vehicle runs out of energy on the return trip, the task is not completed and a penalty term r_penalty is appended to the reward function; when an Internet of Things device runs out of energy during the unmanned aerial vehicle's service, a penalty term with a preset value is appended to the reward function.
8. The method of optimizing computation offloading and charging service efficiency for an unmanned aerial vehicle as claimed in any one of claims 1 to 6, wherein in step 4 the DQN deep reinforcement learning network is iteratively updated as follows:
After an action is determined, the unmanned aerial vehicle executes it, enters the next state, obtains a reward value, and updates the value corresponding to that action. As the neural network is trained, the values of all state-action pairs gradually converge; the unmanned aerial vehicle then selects, in each state, the action with the maximum Q value, finally yielding the optimal policy for the unmanned aerial vehicle's flight trajectory and user service.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111529547.7A CN114268986A (en) | 2021-12-14 | 2021-12-14 | Unmanned aerial vehicle computing unloading and charging service efficiency optimization method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114268986A true CN114268986A (en) | 2022-04-01 |
Family
ID=80827104
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111529547.7A Pending CN114268986A (en) | 2021-12-14 | 2021-12-14 | Unmanned aerial vehicle computing unloading and charging service efficiency optimization method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114268986A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115314135A (en) * | 2022-08-09 | 2022-11-08 | 电子科技大学 | Communication perception integrated waveform design method for unmanned aerial vehicle cooperation |
CN115314135B (en) * | 2022-08-09 | 2024-06-11 | 电子科技大学 | Unmanned aerial vehicle cooperative communication perception integrated waveform design method |
CN115713222A (en) * | 2023-01-09 | 2023-02-24 | 南京邮电大学 | Utility-driven unmanned aerial vehicle sensing network charging scheduling method |
CN117856474A (en) * | 2024-03-08 | 2024-04-09 | 广州国曜科技有限公司 | Control method and device of wireless power transmission system |
CN117856474B (en) * | 2024-03-08 | 2024-05-14 | 广州国曜科技有限公司 | Control method and device of wireless power transmission system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110428115A (en) * | 2019-08-13 | 2019-11-08 | 南京理工大学 | Maximization system benefit method under dynamic environment based on deeply study |
CN110488861A (en) * | 2019-07-30 | 2019-11-22 | 北京邮电大学 | Unmanned plane track optimizing method, device and unmanned plane based on deeply study |
CN111786713A (en) * | 2020-06-04 | 2020-10-16 | 大连理工大学 | Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning |
US20210165405A1 (en) * | 2019-12-03 | 2021-06-03 | University-Industry Cooperation Group Of Kyung Hee University | Multiple unmanned aerial vehicles navigation optimization method and multiple unmanned aerial vehicles system using the same |
CN113190039A (en) * | 2021-04-27 | 2021-07-30 | 大连理工大学 | Unmanned aerial vehicle acquisition path planning method based on hierarchical deep reinforcement learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11663920B2 (en) | Method and device of path optimization for UAV, and storage medium thereof | |
WO2021017227A1 (en) | Path optimization method and device for unmanned aerial vehicle, and storage medium | |
CN111556461B (en) | Vehicle-mounted edge network task distribution and unloading method based on deep Q network | |
CN111754000A (en) | Quality-aware edge intelligent federal learning method and system | |
CN114268986A (en) | Unmanned aerial vehicle computing unloading and charging service efficiency optimization method | |
CN113395654A (en) | Method for task unloading and resource allocation of multiple unmanned aerial vehicles of edge computing system | |
CN111176820A (en) | Deep neural network-based edge computing task allocation method and device | |
CN115827108B (en) | Unmanned aerial vehicle edge calculation unloading method based on multi-target deep reinforcement learning | |
CN113032904A (en) | Model construction method, task allocation method, device, equipment and medium | |
CN114169234A (en) | Scheduling optimization method and system for unmanned aerial vehicle-assisted mobile edge calculation | |
CN116451934B (en) | Multi-unmanned aerial vehicle edge calculation path optimization and dependent task scheduling optimization method and system | |
CN113132943A (en) | Task unloading scheduling and resource allocation method for vehicle-side cooperation in Internet of vehicles | |
CN113377131B (en) | Method for acquiring unmanned aerial vehicle collected data track by using reinforcement learning | |
CN116489712B (en) | Mobile edge computing task unloading method based on deep reinforcement learning | |
CN113660681A (en) | Multi-agent resource optimization method applied to unmanned aerial vehicle cluster auxiliary transmission | |
Ebrahim et al. | A deep learning approach for task offloading in multi-UAV aided mobile edge computing | |
CN113507717A (en) | Unmanned aerial vehicle track optimization method and system based on vehicle track prediction | |
CN112804103A (en) | Intelligent calculation migration method for joint resource allocation and control in block chain enabled Internet of things | |
CN113987692B (en) | Deep neural network partitioning method for unmanned aerial vehicle and edge computing server | |
CN114840021A (en) | Trajectory planning method, device, equipment and medium for data collection of unmanned aerial vehicle | |
CN111488208B (en) | Bian Yun collaborative computing node scheduling optimization method based on variable-step-size bat algorithm | |
CN116774584A (en) | Unmanned aerial vehicle differentiated service track optimization method based on multi-agent deep reinforcement learning | |
CN116546421A (en) | Unmanned aerial vehicle position deployment and minimum energy consumption AWAQ algorithm based on edge calculation | |
CN111930435A (en) | Task unloading decision method based on PD-BPSO technology | |
CN114942799B (en) | Workflow scheduling method based on reinforcement learning in cloud edge environment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||