Disclosure of Invention
The invention aims to overcome the defect that the prior art cannot plan the continuous actions of the unmanned aerial vehicle or obtain an accurate scheduling strategy, and provides a scheduling optimization method and a scheduling optimization system for unmanned aerial vehicle-assisted mobile edge computing.
In order to solve the technical problems, the technical scheme of the invention is as follows:
the invention provides a scheduling optimization method for unmanned aerial vehicle-assisted mobile edge computing, which comprises the following steps:
S1: constructing an offloading model of the mobile edge computing system, wherein the model comprises an unmanned aerial vehicle and a plurality of user equipments;
S2: obtaining the energy consumption of the computing tasks according to the offloading model of the mobile edge computing system;
S3: establishing an optimization problem jointly considering the unmanned aerial vehicle trajectory and the user equipment scheduling, with the objective of minimizing the average energy consumption of the user equipments;
S4: converting the optimization problem into a Markov decision process, and defining a state space, an action space and a return function of the offloading model of the mobile edge computing system;
S5: constructing a deep neural network based on the SAC algorithm, and training the deep neural network by using the state space, the action space and the return function to obtain a trained deep neural network;
S6: carrying out scheduling optimization by using the trained deep neural network to obtain an optimal scheduling strategy, namely the flight trajectory of the unmanned aerial vehicle and the selection strategy of the user equipments.
The SAC algorithm is an off-policy stochastic policy algorithm based on the maximum entropy reinforcement learning framework and the Actor-Critic architecture. Its main characteristic is entropy regularization: entropy is a measure of the randomness of a policy, and increasing the entropy brings more exploration of policies. By training the policy to balance the expected return against the entropy, the learning speed of the network can be accelerated while the policy is prevented from converging to a local optimal solution. The purpose of the Actor network is to obtain both the maximum expected return and the maximum entropy, i.e. to explore other strategies in the strategy space while successfully completing the task. The combination of off-policy network updates with the Actor-Critic architecture achieves good performance on continuous-control benchmark tasks, with more stable and better convergence.
Preferably, in step S1, the offloading model of the mobile edge computing system is specifically:
the offloading model of the mobile edge computing system comprises a single unmanned aerial vehicle and N user equipments, wherein the unmanned aerial vehicle simultaneously serves at most K user equipments, and each user equipment chooses either to compute its computing task locally or to offload the computing task to the unmanned aerial vehicle for computation; the length and the width of the flight area of the unmanned aerial vehicle are set to X_max and Y_max respectively, the unmanned aerial vehicle flies at a constant speed v(t) at a fixed height h, the antenna emission angle is θ, and the maximum flying speed is v_max; the flight time of the unmanned aerial vehicle is T time slots, the length of each time slot is τ, and the time for completing a computing task at any moment cannot exceed the maximum time delay T_max;
Let the coordinates of the unmanned aerial vehicle be [X(t), Y(t), h] and the coordinates of the user equipment be [x_i(t), y_i(t), 0], i ∈ {1, 2, …, N}; let the flight distance and the horizontal direction angle of the unmanned aerial vehicle at time t be d(t) and θ_h(t) respectively, then X(t) = X(t-1) + d(t)cos(θ_h(t)) and Y(t) = Y(t-1) + d(t)sin(θ_h(t)); the maximum coverage of the unmanned aerial vehicle is R_max = h·tan(θ), and the flying speed is v(t) = d(t)/τ;
Defining the computing task at time t as:
I_i(t) = {D_i(t), F_i(t)}
where D_i(t) represents the amount of data to be transmitted when the computing task at time t is offloaded, and F_i(t) represents the amount of computation required to complete the computing task at time t;
Define α_i(t) ∈ {0, 1} as the selection strategy of the user equipment: α_i(t) = 0 indicates that the computing task at time t is computed locally, and α_i(t) = 1 indicates that the computing task at time t is offloaded.
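For illustration only, the model notation and unmanned aerial vehicle kinematics above can be sketched in Python as follows; all numeric parameter values and the helper names (uav_step, random_task) are assumptions for illustration and are not prescribed by the invention.

```python
import math
import random

# Illustrative model parameters (assumed values, not prescribed by the invention).
X_MAX, Y_MAX = 1000.0, 1000.0   # flight area length X_max and width Y_max (m)
H = 100.0                        # fixed flight height h (m)
THETA = math.pi / 4              # antenna emission angle theta
V_MAX = 20.0                     # maximum flying speed v_max (m/s)
TAU = 1.0                        # slot length tau (s)
N = 40                           # number of user equipments
K = 3                            # UAV serves at most K user equipments per slot

R_MAX = H * math.tan(THETA)      # maximum coverage radius R_max = h*tan(theta)

def uav_step(x, y, d, theta_h):
    """Update the UAV position from flight distance d(t) and horizontal angle theta_h(t)."""
    d = min(d, V_MAX * TAU)                  # flight distance limited by the maximum speed
    x_new = x + d * math.cos(theta_h)        # X(t) = X(t-1) + d(t)*cos(theta_h(t))
    y_new = y + d * math.sin(theta_h)        # Y(t) = Y(t-1) + d(t)*sin(theta_h(t))
    if not (0.0 <= x_new <= X_MAX and 0.0 <= y_new <= Y_MAX):
        return x, y                          # cancel the action if it would leave the flight area
    return x_new, y_new

def random_task():
    """One computing task I_i(t) = {D_i(t), F_i(t)} with assumed value ranges."""
    D = random.uniform(1e5, 1e6)   # data amount D_i(t) in bits
    F = random.uniform(1e8, 1e9)   # required computation F_i(t) in CPU cycles
    return D, F
```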
Preferably, in step S2, obtaining the energy consumption of the computing task according to the offloading model of the mobile edge computing system comprises:
when the user equipment selects to offload the computation, i.e. α_i(t) = 1, the distance between the user equipment and the unmanned aerial vehicle on the horizontal plane at this moment is:
each user equipment is equipped with a single antenna, and in order to avoid interference between user equipments, a frequency-division multiple access protocol is adopted for offloading; since the flying height of the unmanned aerial vehicle is fixed, a free-space channel model is adopted, and the uplink rate during offloading computation is:
where B represents the average bandwidth of the communication channel, P_Tr represents the transmission power for user equipment data offloading, and ρ represents a transmission power coefficient;
the time overhead for the user equipment to transmit the computing task is:
the time overhead for the unmanned aerial vehicle to process the computing task is:
where f_U(t) represents the computing capability of the unmanned aerial vehicle;
the total time overhead when the user equipment selects to offload the computation is:
the energy consumption when the user equipment selects to offload the computation is:
where the corresponding term denotes the energy consumption of the ith user equipment when it selects to offload the computation.
Preferably, in step S2, obtaining the energy consumption of the computing task according to the offloading model of the mobile edge computing system further comprises:
when the user equipment selects local computation, i.e. α_i(t) = 0;
the time overhead for the user equipment to process the computing task is:
where the corresponding term represents the computing capability of the user equipment;
the power consumption of the user equipment is set to
the energy consumption when the user equipment selects local computation is:
where k_i is a first constant and v_i is a second constant.
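Under the power model stated above (power proportional to k_i and to a power v_i of the computing frequency), the local-computation cost can be sketched as follows; the symbol f_i for the computing capability of the user equipment and the numeric values are assumptions for illustration.

```python
def local_cost(F, f_i, k_i=1e-27, v_i=3):
    """Local computation time and energy for a task needing F CPU cycles.

    Assumes the common model: power = k_i * f_i**v_i and time = F / f_i,
    so energy = k_i * f_i**(v_i - 1) * F.  k_i, v_i and f_i are illustrative.
    """
    t_loc = F / f_i                  # time overhead of processing the task locally
    p_loc = k_i * f_i ** v_i         # power consumption of the user equipment
    e_loc = p_loc * t_loc            # energy = power * time
    return t_loc, e_loc

# Example: a task of 5e8 cycles on a 1 GHz device.
print(local_cost(5e8, 1e9))
```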
Preferably, in step S3, with the objective of minimizing the average energy consumption of the user equipments, an optimization problem jointly considering the unmanned aerial vehicle trajectory and the user equipment scheduling is established, specifically:
defining a set of flight actions
and a set of user equipment scheduling strategies
then the optimization problem P is expressed as:
where E_i(t) represents the energy consumption of the user equipment: when α_i(t) = 1 it is the energy consumption of offloading computation, and when α_i(t) = 0 it is the energy consumption of local computation; a further constraint indicates that the unmanned aerial vehicle serves at most K user equipments simultaneously; and α_i(t)S_i(t) ≤ R_max indicates that a user equipment selecting to offload computation must lie within the maximum coverage of the unmanned aerial vehicle.
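One possible way to evaluate the objective of problem P (the average energy consumption of the user equipments over the flight time) and its two constraints for a candidate schedule is sketched below; the array layout and the helper name are assumptions for illustration.

```python
def average_energy(alpha, E_off, E_loc, S, R_max, K):
    """Average user-equipment energy for one candidate schedule.

    alpha[t][i] in {0, 1}: selection strategy; E_off / E_loc[t][i]: offloading / local energy;
    S[t][i]: horizontal UAV-to-UE distance.  Returns None if a constraint is violated.
    """
    T, N = len(alpha), len(alpha[0])
    total = 0.0
    for t in range(T):
        if sum(alpha[t]) > K:                    # the UAV serves at most K UEs per slot
            return None
        for i in range(N):
            if alpha[t][i] and S[t][i] > R_max:  # alpha_i(t) * S_i(t) <= R_max
                return None
            total += E_off[t][i] if alpha[t][i] else E_loc[t][i]
    return total / (T * N)                       # average energy over slots and UEs
```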
Preferably, in step S4, the state space and the action space of the designed offloading model of the mobile edge computing system are specifically:
in the offloading model of the mobile edge computing system, the unmanned aerial vehicle and the user equipments are treated as an agent; in each time slot, the agent observes the current state s(t) from the environment, the current state s(t) corresponds to a current action a(t), and the unmanned aerial vehicle executes the current action a(t) in the action space, interacts with the environment, and returns the current reward r(t) and a new state s(t+1);
for the state space, in each time slot the position of each user equipment is fixed, and only the position information of the unmanned aerial vehicle needs to be considered; since the unmanned aerial vehicle needs to arrive at a specific destination at the end of each flight cycle, the distance between the unmanned aerial vehicle and the specific destination is denoted d'(t), and the current state expression in the state space is s(t) = {X(t), Y(t), h, d'(t)};
for the action space, the position coordinates [X(t+1), Y(t+1), h] of the unmanned aerial vehicle at the next moment are calculated according to the flight distance d(t) and the horizontal direction angle θ_h(t) of the unmanned aerial vehicle; together with the selection strategy of the user equipments, the current action expression in the action space is a(t) = {θ_h(t), d(t), α_i(t)}.
Preferably, in step S4, the designed reward function of the offloading model of the mobile edge computing system is specifically:
the reward function is used to evaluate the quality of the action taken by the agent in the current state, and is specifically:
r(t) = R_energy + R_des + P_out + P_speed
where r(t) represents the current reward, R_energy represents the return associated with the optimization problem, R_des represents the reward for the unmanned aerial vehicle flying back to the specific destination, with R_des = k/d'(t), k being the reward coefficient; P_out represents the penalty for the unmanned aerial vehicle flying out of the flight area, and P_speed represents the penalty for the unmanned aerial vehicle flying overspeed.
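The per-slot reward r(t) = R_energy + R_des + P_out + P_speed could be computed along the lines of the sketch below; the sign conventions, penalty magnitudes and the weight values are assumptions for illustration and are not prescribed by the invention.

```python
def reward(energy, d_des, out_of_area, overspeed, k=10.0,
           energy_weight=1.0, out_penalty=-50.0, speed_penalty=-50.0):
    """Per-slot reward: energy return + destination return + penalties (assumed values)."""
    r_energy = -energy_weight * energy            # lower UE energy gives a larger return
    r_des = k / max(d_des, 1e-6)                  # R_des = k / d'(t), larger when nearer the destination
    p_out = out_penalty if out_of_area else 0.0   # penalty for leaving the flight area
    p_speed = speed_penalty if overspeed else 0.0 # penalty for exceeding v_max
    return r_energy + r_des + p_out + p_speed
```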
Preferably, in step S5, the constructed deep neural network comprises an experience buffer, an Actor network, a first Critic network, a second Critic network, a first Critic target network and a second Critic target network;
in each time slot, the input of the Actor network is the current state s(t), and the corresponding current action a(t) is output, giving the current scheduling strategy π_φ; the inputs of the first Critic network and the second Critic network are both the current state s(t) and the current action a(t), and each outputs a Q value; after the unmanned aerial vehicle executes the current action a(t), a new state s(t+1) is generated and the current reward r(t) is obtained, and then [s(t), a(t), r(t), s(t+1)] is stored in the experience buffer; the first Critic target network and the second Critic target network serve as copies of the first Critic network and the second Critic network respectively, an objective function is set, and the smaller of the two Q values is selected to calculate a target value for updating the network parameters of the first Critic network and the second Critic network; at the end of the time slot, the network parameters of the Actor network and the Critic networks are updated in real time according to the current scheduling strategy, and random samples are drawn from the experience buffer to update the network parameters of the Critic target networks;
the loss function for an Actor network is:
the loss function of the first Critic network and the second Critic network is:
the objective function of the first and second Critic target networks is:
where φ represents the network parameters of the Actor network; θ_i represents the network parameters of the ith Critic network, which outputs the Q value of the ith Critic network (θ_1 and the first Critic network when i = 1, θ_2 and the second Critic network when i = 2); the new action used in the losses is computed according to the current scheduling strategy π_φ; the target value is computed as above, with α representing the entropy regularization coefficient; and the Q value of the ith Critic target network refers to the first Critic target network when i = 1 and to the second Critic target network when i = 2.
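The loss functions referenced above follow the standard SAC formulation; written out with the symbols defined here (the target value y(t), the Q notation Q_θi and the sampled action ã(t) are notation added for illustration, not the patent's exact formulas), they take the usual form:

```latex
% Standard SAC losses (a sketch under the symbol definitions above).
J_\pi(\phi) = \mathbb{E}\Big[\alpha \log \pi_\phi\big(\tilde a(t)\mid s(t)\big)
              - \min_{i=1,2} Q_{\theta_i}\big(s(t),\tilde a(t)\big)\Big],
\qquad \tilde a(t) \sim \pi_\phi(\cdot\mid s(t))

J_Q(\theta_i) = \mathbb{E}\Big[\tfrac12\big(Q_{\theta_i}(s(t),a(t)) - y(t)\big)^2\Big],\quad i=1,2

y(t) = r(t) + \gamma\Big(\min_{i=1,2} Q_{\theta_i'}\big(s(t{+}1),\tilde a(t{+}1)\big)
       - \alpha \log \pi_\phi\big(\tilde a(t{+}1)\mid s(t{+}1)\big)\Big)
```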
Preferably, the optimal scheduling strategy expression of the constructed deep neural network is as follows:
where π represents the optimal scheduling strategy, α represents the entropy regularization coefficient, π_φ represents the scheduling strategy, and γ represents the discount factor; H represents the entropy, calculated as H(π_φ(·|s(t))) = E[-log π_φ(·|s(t))].
The invention also provides a scheduling optimization system for unmanned aerial vehicle-assisted mobile edge computing, which comprises:
the model building module is used for building an offloading model of the mobile edge computing system, wherein the model comprises an unmanned aerial vehicle and a plurality of user equipments;
the energy consumption calculation module is used for obtaining the energy consumption of the computing tasks according to the offloading model of the mobile edge computing system;
the optimization problem establishing module is used for establishing an optimization problem jointly considering the unmanned aerial vehicle trajectory and the user equipment scheduling, with the objective of minimizing the average energy consumption of the user equipments;
the optimization problem transformation module is used for transforming the optimization problem into a Markov decision process and defining a state space, an action space and a return function of the offloading model of the mobile edge computing system;
the network construction and training module is used for constructing a deep neural network based on a deep reinforcement learning algorithm, and training the deep neural network by using the state space, the action space and the return function to obtain a trained deep neural network;
and the scheduling optimization module performs scheduling optimization by using the trained deep neural network to obtain an optimal scheduling strategy, namely the flight trajectory of the unmanned aerial vehicle and the selection strategy of the user equipments.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the offloading model of the mobile edge computing system constructed by the invention comprises an unmanned aerial vehicle and a plurality of user equipments, and an optimization problem jointly considering the unmanned aerial vehicle trajectory and the user equipment scheduling is established based on the energy consumption for completing the computing tasks, with the objective of minimizing the average energy consumption of the user equipments; the optimization problem is non-convex and difficult to solve by traditional methods, so it is converted into a Markov decision process, and a state space, an action space and a return function of the offloading model of the mobile edge computing system are defined; the deep neural network constructed based on the SAC algorithm is trained by using the state space, the action space and the return function, and the trained deep neural network can be used for scheduling optimization to obtain an optimal scheduling strategy; the continuous actions of the unmanned aerial vehicle can thus be planned, a reasonable and accurate flight trajectory and user equipment selection strategy are obtained, the complexity is low, the convergence is strong, and the average computing energy consumption of the user equipments is reduced.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product;
it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
The embodiment provides a scheduling optimization method for unmanned aerial vehicle-assisted mobile edge computing, as shown in fig. 1, including:
S1: constructing an offloading model of the mobile edge computing system, wherein the model comprises an unmanned aerial vehicle and a plurality of user equipments;
S2: obtaining the energy consumption of the computing tasks according to the offloading model of the mobile edge computing system;
S3: establishing an optimization problem jointly considering the unmanned aerial vehicle trajectory and the user equipment scheduling, with the objective of minimizing the average energy consumption of the user equipments;
S4: converting the optimization problem into a Markov decision process, and defining a state space, an action space and a return function of the offloading model of the mobile edge computing system;
S5: constructing a deep neural network based on the SAC algorithm, and training the deep neural network by using the state space, the action space and the return function to obtain a trained deep neural network;
S6: carrying out scheduling optimization by using the trained deep neural network to obtain an optimal scheduling strategy, namely the flight trajectory of the unmanned aerial vehicle and the selection strategy of the user equipments.
In a specific implementation process, an offloading model of the mobile edge computing system is constructed, in which a single unmanned aerial vehicle carrying a mobile edge computing (MEC) server flies within a specified area to provide edge computing for the user equipments; the energy consumption for completing each computing task is calculated according to the offloading model of the mobile edge computing system; an optimization problem jointly considering the unmanned aerial vehicle trajectory and the user equipment scheduling is established with the objective of minimizing the average energy consumption of the user equipments; the problem of planning the unmanned aerial vehicle flight trajectory and of each user equipment selecting offloading computation or local computation is converted into a Markov decision process, and a state space, an action space and a return function of the offloading model of the mobile edge computing system are defined; a deep neural network is constructed based on the SAC algorithm and trained by using the state space, the action space and the return function, and scheduling optimization is then performed with the trained deep neural network to obtain an optimal scheduling strategy, namely the flight trajectory of the unmanned aerial vehicle and the selection strategy of the user equipments, thereby solving the non-convex optimization problem, planning the continuous actions of the unmanned aerial vehicle, obtaining a reasonable and accurate flight trajectory and user equipment selection strategy, and reducing the average computing energy consumption of the user equipments.
Example 2
The embodiment provides a scheduling optimization method for unmanned aerial vehicle-assisted mobile edge computing, which comprises the following steps:
S1: constructing an offloading model of the mobile edge computing system, wherein the model comprises an unmanned aerial vehicle and a plurality of user equipments;
as shown in fig. 2, the offloading model of the mobile edge computing system includes a single unmanned aerial vehicle and N user equipments, the unmanned aerial vehicle simultaneously serving at most K user equipments, and each user equipment choosing either to compute its computing task locally or to offload the computing task to the unmanned aerial vehicle for computation; the length and the width of the flight area of the unmanned aerial vehicle are set to X_max and Y_max respectively, the unmanned aerial vehicle flies at a constant speed v(t) at a fixed height h, the antenna emission angle is θ, and the maximum flying speed is v_max; the flight time of the unmanned aerial vehicle is T time slots, the length of each time slot is τ, and the time for completing a computing task at any moment cannot exceed the maximum time delay T_max;
Let the coordinates of the unmanned aerial vehicle be [X(t), Y(t), h] and the coordinates of the user equipment be [x_i(t), y_i(t), 0], i ∈ {1, 2, …, N}; let the flight distance and the horizontal direction angle of the unmanned aerial vehicle at time t be d(t) and θ_h(t) respectively, then X(t) = X(t-1) + d(t)cos(θ_h(t)) and Y(t) = Y(t-1) + d(t)sin(θ_h(t)); the maximum coverage of the unmanned aerial vehicle is R_max = h·tan(θ), and the flying speed is v(t) = d(t)/τ;
Defining the computing task at time t as:
I_i(t) = {D_i(t), F_i(t)}
where D_i(t) represents the amount of data to be transmitted when the computing task at time t is offloaded, and F_i(t) represents the amount of computation required to complete the computing task at time t;
Define α_i(t) ∈ {0, 1} as the selection strategy of the user equipment: α_i(t) = 0 indicates that the computing task at time t is computed locally, and α_i(t) = 1 indicates that the computing task at time t is offloaded.
S2: obtaining the energy consumption of the computing tasks according to the offloading model of the mobile edge computing system;
when the user equipment selects to offload the computation, i.e. α_i(t) = 1, the distance between the user equipment and the unmanned aerial vehicle on the horizontal plane at this moment is:
each user equipment is equipped with a single antenna, and in order to avoid interference between user equipments, a frequency-division multiple access protocol is adopted for offloading; since the flying height of the unmanned aerial vehicle is fixed, a free-space channel model is adopted, and the uplink rate during offloading computation is:
where B represents the average bandwidth of the communication channel, P_Tr represents the transmission power for user equipment data offloading, and ρ represents a transmission power coefficient;
the time overhead for the user equipment to transmit the computing task is:
the time overhead for the unmanned aerial vehicle to process the computing task is:
where f_U(t) represents the computing capability of the unmanned aerial vehicle;
the total time overhead when the user equipment selects to offload the computation is:
the energy consumption when the user equipment selects to offload the computation is:
where the corresponding term denotes the energy consumption of the ith user equipment when it selects to offload the computation.
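The quantities involved in the offloading case (horizontal distance, uplink rate, transmission and processing time, offloading energy) can be sketched as follows under a free-space channel assumption; the exact rate expression, the noise power sigma2 and all numeric values are illustrative assumptions and not the invention's formulas.

```python
import math

def offload_cost(x_uav, y_uav, h, x_i, y_i, D, F, f_uav,
                 B=1e6, P_tr=0.1, rho=1e-4, sigma2=1e-13):
    """Offloading time and UE energy for one task (free-space channel sketch).

    Assumes an uplink rate of the form B*log2(1 + P_tr*rho / (sigma2*(S^2 + h^2)));
    B, P_tr, rho and sigma2 are illustrative values.
    """
    S = math.hypot(x_uav - x_i, y_uav - y_i)        # horizontal UAV-to-UE distance S_i(t)
    rate = B * math.log2(1 + P_tr * rho / (sigma2 * (S**2 + h**2)))
    t_tr = D / rate                                 # time to transmit the task data D_i(t)
    t_proc = F / f_uav                              # time for the UAV to process F_i(t) cycles
    t_off = t_tr + t_proc                           # total time overhead when offloading
    e_off = P_tr * t_tr                             # UE energy: only the transmission consumes UE power
    return t_off, e_off
```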
when the user equipment selects local computation, i.e. α_i(t) = 0;
the time overhead for the user equipment to process the computing task is:
where the corresponding term represents the computing capability of the user equipment;
the power consumption of the user equipment is set to
the energy consumption when the user equipment selects local computation is:
where k_i is a first constant and v_i is a second constant; in this embodiment, v_i is 3.
S3: establishing an optimization problem jointly considering the unmanned aerial vehicle trajectory and the user equipment scheduling, with the objective of minimizing the average energy consumption of the user equipments;
defining a set of flight actions
and a set of user equipment scheduling strategies
the optimization problem P is then expressed as:
where E_i(t) represents the energy consumption of the user equipment: when α_i(t) = 1 it is the energy consumption of offloading computation, and when α_i(t) = 0 it is the energy consumption of local computation; a further constraint indicates that the unmanned aerial vehicle serves at most K user equipments simultaneously; and α_i(t)S_i(t) ≤ R_max indicates that a user equipment selecting to offload computation must lie within the maximum coverage of the unmanned aerial vehicle.
S4: converting the optimization problem into a Markov decision process, and defining a state space, an action space and a return function of the offloading model of the mobile edge computing system;
in the offloading model of the mobile edge computing system, the unmanned aerial vehicle and the user equipments are treated as an agent; in each time slot, the agent observes the current state s(t) from the environment, the current state s(t) corresponds to a current action a(t), and the unmanned aerial vehicle executes the current action a(t) in the action space, interacts with the environment, and returns the current reward r(t) and a new state s(t+1);
for the state space, in each time slot the position of each user equipment is fixed, and only the position information of the unmanned aerial vehicle needs to be considered; since the unmanned aerial vehicle needs to arrive at a specific destination at the end of each flight cycle, the distance between the unmanned aerial vehicle and the specific destination is denoted d'(t), and the current state expression in the state space is s(t) = {X(t), Y(t), h, d'(t)}; the state space of this embodiment is 4-dimensional;
for the action space, the position coordinates [X(t+1), Y(t+1), h] of the unmanned aerial vehicle at the next moment are calculated according to the flight distance d(t) and the horizontal direction angle θ_h(t) of the unmanned aerial vehicle; together with the selection strategy of the user equipments, the current action expression in the action space is a(t) = {θ_h(t), d(t), α_i(t)}; the action space of this embodiment is (N+2)-dimensional.
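The 4-dimensional state and the (N+2)-dimensional action described above can be packed into vectors as in the sketch below; the destination coordinates and the thresholding of α_i(t) to {0, 1} are assumptions about one possible encoding.

```python
import math

def build_state(x, y, h, x_des, y_des):
    """State s(t) = {X(t), Y(t), h, d'(t)}: UAV position plus distance to the destination."""
    d_des = math.hypot(x_des - x, y_des - y)
    return [x, y, h, d_des]                              # 4-dimensional state

def split_action(a, N):
    """Action a(t) = {theta_h(t), d(t), alpha_1(t), ..., alpha_N(t)}: (N+2)-dimensional."""
    theta_h, d = a[0], a[1]
    alpha = [1 if v > 0.5 else 0 for v in a[2:2 + N]]    # threshold continuous outputs to {0, 1}
    return theta_h, d, alpha
```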
The reward function is used for evaluating the quality of the action taken by the agent in the current state, and specifically comprises the following steps:
r(t)=Rerengy+Rdes+Pout+Pspeed
wherein R (t) represents the current reward, RerengyRepresenting the return of the optimization problem, RdesIndicating a return of the drone to a particular destination, RdesK/d' (t), k being the reward factor; poutRepresents a penalty of the unmanned aerial vehicle flying out of the flight area, PspeedAnd the penalty of flying overspeed of the unmanned aerial vehicle is represented.
S5: constructing a deep neural network based on a SAC algorithm, and training the deep neural network by using a state space, an action space and a return function to obtain a trained deep neural network;
the SAC algorithm is an off-line random strategy algorithm based on a maximum entropy reinforcement learning framework and an Actor-Critic network, and is mainly characterized by entropy regularization, wherein entropy is a measure of strategy randomness, and the increase of entropy can bring more strategy exploration, and the expected return and the entropy value are balanced through training strategies, so that the network learning speed can be accelerated, and meanwhile, the strategy convergence to a local optimal solution is avoided; the purpose of the Actor network is to obtain the maximum return expectation and the maximum entropy, i.e. explore other strategies in the strategy space while successfully completing the task; the combination of the network update in an offline mode and the Actor-Critic network achieves good performance on a continuous control reference task, and is more stable and better in convergence.
The constructed deep neural network comprises an experience buffer area, an Actor network, a first Critic network, a second Critic network, a first Critic target network and a second Critic target network;
in each time slot, the input of the Actor network is the current state s (t), and the corresponding current action a (t) is output to obtain the current scheduling strategy piφ(ii) a The input of the first Critic network and the input of the second Critic network are both the current state s (t) and the current action a (t), and Q values are respectively output; after the unmanned plane executes the current action a (t), a new state s (t +1) is generated, and the current return r (t) is obtained, and then [ s (t), a (t), r (t), s (t +1) ]]Storing in an experience buffer; the first Critic target network and the second Critic target network are respectively used as copies of the first Critic network and the second Critic network, and target functions are setSelecting the smaller Q value of the two Q values to calculate a target value for updating the network parameters of the first Critic network and the second Critic network; when the time slot is finished, updating network parameters of the Actor network and the Critic network in real time according to the current scheduling strategy, and randomly sampling from an experience buffer area to update the network parameters of the Critic target network;
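One possible realization of the Actor, the two Critic networks and the two Critic target networks is sketched below with PyTorch; the hidden sizes, the squashed-Gaussian actor, the deep-copy initialization of the target networks and the value N = 40 (as in the later simulation example) are common SAC choices assumed for illustration.

```python
import copy
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Outputs a squashed-Gaussian policy pi_phi(a|s) over the (N+2)-dimensional action."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, action_dim)
        self.log_std = nn.Linear(hidden, action_dim)

    def forward(self, s):
        h = self.net(s)
        mu, log_std = self.mu(h), self.log_std(h).clamp(-20, 2)
        dist = torch.distributions.Normal(mu, log_std.exp())
        u = dist.rsample()                               # reparameterized sample
        a = torch.tanh(u)                                # squash the action to [-1, 1]
        log_prob = (dist.log_prob(u) - torch.log(1 - a.pow(2) + 1e-6)).sum(-1)
        return a, log_prob

class Critic(nn.Module):
    """Q network: maps a state-action pair to a scalar Q value."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1)).squeeze(-1)

# Two Critic networks plus their target copies, as described above.
state_dim, action_dim = 4, 40 + 2
actor = Actor(state_dim, action_dim)
critic1, critic2 = Critic(state_dim, action_dim), Critic(state_dim, action_dim)
target1, target2 = copy.deepcopy(critic1), copy.deepcopy(critic2)
```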
the loss function for an Actor network is:
the loss function of the first Critic network and the second Critic network is:
the objective function of the first and second Critic target networks is:
where φ represents the network parameters of the Actor network; θ_i represents the network parameters of the ith Critic network, which outputs the Q value of the ith Critic network (θ_1 and the first Critic network when i = 1, θ_2 and the second Critic network when i = 2); the new action used in the losses is computed according to the current scheduling strategy π_φ; the target value is computed as above, with α representing the entropy regularization coefficient; and the Q value of the ith Critic target network refers to the first Critic target network when i = 1 and to the second Critic target network when i = 2;
S6: carrying out scheduling optimization by using the trained deep neural network to obtain an optimal scheduling strategy, namely the flight trajectory of the unmanned aerial vehicle and the selection strategy of the user equipments.
The optimal scheduling strategy expression is as follows:
where π represents the optimal scheduling strategy, α represents the entropy regularization coefficient, π_φ represents the scheduling strategy, and γ represents the discount factor; H represents the entropy, calculated as H(π_φ(·|s(t))) = E[-log π_φ(·|s(t))].
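Given the symbol definitions above, the expression is presumably the standard maximum-entropy objective of SAC, reproduced here as a sketch rather than as the patent's exact formula:

```latex
% Standard maximum-entropy objective (assumed form).
\pi = \arg\max_{\pi_\phi}\;
      \mathbb{E}\Big[\sum_{t} \gamma^{t}\,
      \big( r(t) + \alpha\, H\big(\pi_\phi(\cdot \mid s(t))\big) \big)\Big],
\qquad
H\big(\pi_\phi(\cdot\mid s(t))\big) = \mathbb{E}\big[-\log \pi_\phi(\cdot\mid s(t))\big]
```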
In a specific implementation process, each episode of the deep neural network constructed based on the SAC algorithm lasts from the moment the unmanned aerial vehicle departs from the starting point until it arrives at the destination or the maximum time T expires; before each episode starts, the starting position and the end position of the unmanned aerial vehicle are initialized, and the number of user equipments, namely the value of N, is randomly initialized. In the initial stage, the scheduling strategy is far from the optimal scheduling strategy, so the entropy regularization coefficient α is set to 1, making the agent explore more actions in the initial stage to prevent falling into a local optimal solution; α is updated while the network parameters are updated, and as the number of iterations increases the algorithm gradually converges to the optimal solution. As shown in fig. 3, in each time slot, the agent outputs an action a(t), namely the flight direction and distance of the unmanned aerial vehicle, according to the observed state information s(t), and the user equipment selects local computation or offloading computation; if the flight distance of the unmanned aerial vehicle is greater than the maximum distance d_max, d(t) is set to d_max; and if the next position of the unmanned aerial vehicle exceeds the specified area, the flight action is cancelled. The corresponding current reward r(t) and the state s(t+1) at the next moment are obtained according to the current action, and [s(t), a(t), r(t), s(t+1)] is stored in the experience buffer; at the end of each time slot, K groups of experiences are randomly sampled from the experience buffer to update the network parameters. The SAC algorithm comprises a parameterized Actor network that outputs the strategy π_φ(·|s(t)), namely the state information s(t) is input to the Actor network and the corresponding action a(t) ~ π_φ(·|s(t)) is output; in addition, there are two parameterized Critic networks, also called Q networks: the state information s(t) input to the Actor network and the corresponding obtained action a(t) are jointly input into the first Critic network and the second Critic network, which output their respective Q values, and the smaller of the two is selected to evaluate the performance of the Actor network and prevent overestimation. Here φ and θ_i respectively represent the parameters of the Actor network and the Critic networks. Similar to other DRL algorithms, the SAC algorithm also sets an experience buffer for training the deep neural network parameters, and also sets target networks and soft updates. The target networks are copies of the first and second Critic networks respectively, with target Q values, and θ'_i represents the parameters of the first and second Critic target networks. "Soft" update means that the parameters of the target networks are updated by slowly tracking the trained network parameters, i.e. φ' ← τφ + (1-τ)φ' and θ'_i ← τθ_i + (1-τ)θ'_i, where τ ≤ 1. The difference is that the actions used for updating the Actor network and the Critic networks come from the current policy and are not sampled from the experience buffer.
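The per-slot update and the soft target update described above could be organized as in the following sketch, continuing the PyTorch modules defined earlier; the mini-batch layout, the optimizer container and all hyper-parameters (including the fixed α, whereas the embodiment initializes α to 1 and updates it during training) are assumptions for illustration.

```python
import torch

def soft_update(target, source, tau=0.005):
    """theta'_i <- tau*theta_i + (1 - tau)*theta'_i for every tracked parameter."""
    for tp, sp in zip(target.parameters(), source.parameters()):
        tp.data.mul_(1 - tau).add_(tau * sp.data)

def sac_update(batch, actor, critics, targets, optimizers, gamma=0.99, alpha=0.2):
    """One gradient step on a sampled mini-batch [s, a, r, s'] (illustrative sketch)."""
    s, a, r, s2 = (torch.as_tensor(x, dtype=torch.float32) for x in batch)
    with torch.no_grad():
        a2, logp2 = actor(s2)                                   # new action from the current policy
        q_next = torch.min(targets[0](s2, a2), targets[1](s2, a2))
        y = r + gamma * (q_next - alpha * logp2)                # target value
    for critic, opt in zip(critics, optimizers["critic"]):
        loss_q = ((critic(s, a) - y) ** 2).mean()               # Critic loss
        opt.zero_grad(); loss_q.backward(); opt.step()
    a_new, logp = actor(s)
    loss_pi = (alpha * logp - torch.min(critics[0](s, a_new),   # Actor loss: return plus entropy
                                        critics[1](s, a_new))).mean()
    optimizers["actor"].zero_grad(); loss_pi.backward(); optimizers["actor"].step()
    for tgt, crt in zip(targets, critics):
        soft_update(tgt, crt)                                    # soft update of the target Critics
```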
As shown in fig. 4, scheduling optimization is performed taking a single unmanned aerial vehicle serving 40 user equipments as an example, and the figure shows the trajectory of the unmanned aerial vehicle under different scheduling optimization methods; trajectory 1 is the flight trajectory obtained by the method of this embodiment, which jointly optimizes the unmanned aerial vehicle trajectory and the user equipment scheduling, trajectory 2 is the flight trajectory of the unmanned aerial vehicle when only the user equipment scheduling is optimized, and trajectory 3 is the flight trajectory of the unmanned aerial vehicle under random user equipment scheduling; the triangles represent trajectory 1, the diamonds represent trajectory 2, the squares represent trajectory 3, and trajectory 2 coincides with trajectory 3. As shown in fig. 5, the average energy consumption of the user equipments is compared for the three scheduling methods: the triangles represent the average energy consumption of the user equipments when jointly optimizing the unmanned aerial vehicle trajectory and the user equipment scheduling as provided by this embodiment, the circles represent the average energy consumption when only the user equipment scheduling is optimized, and the squares represent the average energy consumption under random user equipment scheduling; reflecting the different roles of different user equipments in reality, the sizes of the computing tasks are randomly generated in this embodiment, and the maximum number of user equipments served by the unmanned aerial vehicle is K = 3; as can be seen from the figures, the average energy consumption of the user equipments under the scheduling optimization method that jointly optimizes the unmanned aerial vehicle trajectory and the user equipment scheduling is much smaller than under the method that only optimizes the user equipment scheduling and under the random user equipment scheduling method.
Example 3
The present embodiment provides a scheduling optimization system for unmanned aerial vehicle-assisted mobile edge computing, as shown in fig. 6, including:
the model building module is used for building an offloading model of the mobile edge computing system, wherein the model comprises an unmanned aerial vehicle and a plurality of user equipments;
the energy consumption calculation module is used for obtaining the energy consumption of the computing tasks according to the offloading model of the mobile edge computing system;
the optimization problem establishing module is used for establishing an optimization problem jointly considering the unmanned aerial vehicle trajectory and the user equipment scheduling, with the objective of minimizing the average energy consumption of the user equipments;
the optimization problem transformation module is used for transforming the optimization problem into a Markov decision process and defining a state space, an action space and a return function of the offloading model of the mobile edge computing system;
the network construction and training module is used for constructing a deep neural network based on a deep reinforcement learning algorithm, and training the deep neural network by using the state space, the action space and the return function to obtain a trained deep neural network;
and the scheduling optimization module performs scheduling optimization by using the trained deep neural network to obtain an optimal scheduling strategy, namely the flight trajectory of the unmanned aerial vehicle and the selection strategy of the user equipments.
The terms describing positional relationships in the drawings are for illustrative purposes only and are not to be construed as limiting the patent;
it should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.