CN114169234A - A scheduling optimization method and system for UAV-assisted mobile edge computing - Google Patents

A scheduling optimization method and system for UAV-assisted mobile edge computing

Info

Publication number
CN114169234A
Authority
CN
China
Prior art keywords
network
user equipment
uav
critic
mobile edge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111449863.3A
Other languages
Chinese (zh)
Other versions
CN114169234B (en)
Inventor
张广驰
何梓楠
崔苗
刘圣海
王日明
王昆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202111449863.3A priority Critical patent/CN114169234B/en
Publication of CN114169234A publication Critical patent/CN114169234A/en
Application granted granted Critical
Publication of CN114169234B publication Critical patent/CN114169234B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00: Computer-aided design [CAD]
    • G06F 30/20: Design optimisation, verification or simulation
    • G06F 30/27: Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061: Partitioning or combining of resources
    • G06F 9/5072: Grid computing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 7/00: Computing arrangements based on specific mathematical models
    • G06N 7/01: Probabilistic graphical models, e.g. probabilistic networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Pure & Applied Mathematics (AREA)
  • Algebra (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a scheduling optimization method and system for UAV-assisted mobile edge computing, relating to the technical field of UAV mobile edge computing. The method includes: constructing an offloading model of a mobile edge computing system consisting of a UAV and several user equipments, and calculating the energy consumption for completing each computing task; with the goal of minimizing the average energy consumption of the user equipments, establishing an optimization problem of joint UAV trajectory and user equipment scheduling, converting it into a Markov decision process, and defining the state space, action space and reward function of the offloading model, which are used to train a deep neural network built on the SAC algorithm. The trained deep neural network can then perform scheduling optimization and obtain the optimal scheduling policy: it can plan the continuous actions of the UAV and yield a reasonable, accurate flight trajectory and user equipment selection policy, with low complexity and strong convergence, reducing the average computing energy consumption of the user equipments.


Description

Scheduling optimization method and system for unmanned aerial vehicle-assisted mobile edge computing
Technical Field
The invention relates to the technical field of unmanned aerial vehicle (UAV) mobile edge computing, and in particular to a scheduling optimization method and system for UAV-assisted mobile edge computing.
Background
The rapid development of the Internet of Things has driven the spread of computation-intensive intelligent devices, such as autonomous driving and virtual reality, making daily life more convenient. Although today's mobile devices are equipped with powerful hardware, completing the computing tasks of mobile applications under low-latency requirements still consumes significant power. In recent years, Mobile Edge Computing (MEC) has been proposed to overcome this drawback: the computing tasks of the user equipment are transferred to the network edge, which greatly reduces the energy consumption of the devices. Mounting an MEC server on a UAV has recently been widely discussed in industry and academia, since the coverage and mobility of the UAV make it possible to meet tighter delay requirements, provide more flexible computing services and reduce cost. UAV-mounted MEC raises the following problems: (1) how to select the proper device association, i.e., whether a user device offloads or locally processes its computing task, so as to reduce the long-term energy consumption of all user devices as much as possible; (2) how the user devices handle different tasks and how the flight trajectory of the UAV, i.e., its flight direction and distance, is controlled in real time, especially when the UAV must reach a specific end point. Many researchers have already studied the combined use of MEC and UAVs. Because the strategy space of the UAV, namely its optimal trajectory, is continuous, traditional exhaustive search is difficult to apply. Some works propose quantized dynamic programming algorithms to solve the MEC resource allocation problem, but because the flight choices of the UAV are almost infinite, the complexity of such algorithms is very high. Others discretize the UAV trajectory into a sequence of positions, converting the continuous space into a discrete finite space so that the problem becomes tractable, approximating the trajectory with discrete variables and optimizing it with traditional convex optimization methods; however, this reduces the control precision of the UAV and cannot obtain the optimal control strategy.
The prior art discloses a mobile edge computing method, device and system for the Internet of Things, which comprises the following steps: allocating UAVs to the IoT devices based on the current simulated positions of the UAVs and the actual positions of the IoT devices in the target IoT area; simulating the offloading of the IoT devices' tasks to the allocated UAVs, and simulating each UAV scheduling the received tasks based on a deep reinforcement learning algorithm; iteratively updating the current simulated position of each UAV with a differential evolution algorithm, and repeating the above operations until the number of iterations reaches a preset threshold; determining the optimal coordinate position of each UAV based on the UAV allocated to each IoT device in each run, the task scheduling result of the UAV and the current simulated position of the UAV; and triggering each UAV to move to its optimal coordinate position and to schedule the tasks of the corresponding IoT devices. In this scheme the tasks of the IoT devices are offloaded and distributed to the UAVs, and the tasks are scheduled while balancing the load of the UAVs. However, only the load balance of the UAVs is considered; the UAV trajectory and the user equipment scheduling strategy are not jointly considered, so the trajectory planning is unreasonable and the computing energy consumption is high.
Disclosure of Invention
The invention aims to overcome the defect that the prior art cannot plan the continuous actions of the UAV or obtain an accurate scheduling strategy, and provides a scheduling optimization method and a scheduling optimization system for UAV-assisted mobile edge computing.
In order to solve the technical problems, the technical scheme of the invention is as follows:
the invention provides a scheduling optimization method for UAV-assisted mobile edge computing, which comprises the following steps:
S1: constructing an offloading model of the mobile edge computing system, wherein the model comprises a UAV and a plurality of user equipments;
S2: obtaining the energy consumption of the computing tasks according to the offloading model of the mobile edge computing system;
S3: establishing an optimization problem of joint UAV trajectory and user equipment scheduling, with the objective of minimizing the average energy consumption of the user equipments;
S4: converting the optimization problem into a Markov decision process, and defining the state space, action space and reward function of the offloading model of the mobile edge computing system;
S5: constructing a deep neural network based on the SAC algorithm, and training the deep neural network using the state space, action space and reward function to obtain a trained deep neural network;
S6: performing scheduling optimization with the trained deep neural network to obtain the optimal scheduling policy, i.e., the UAV flight trajectory and the user equipment selection policy.
The SAC algorithm is an off-policy stochastic policy algorithm based on the maximum entropy reinforcement learning framework and an Actor-Critic architecture. Its main feature is entropy regularization: entropy is a measure of the randomness of a policy, and increasing it encourages more exploration of the policy space. By training the policy to balance the expected return against the entropy, network learning can be accelerated while avoiding convergence to a locally optimal policy. The goal of the Actor network is to obtain the maximum expected return together with the maximum entropy, i.e., to explore other policies in the policy space while still completing the task successfully. The combination of off-policy updates with the Actor-Critic architecture achieves good performance on continuous control benchmark tasks, with greater stability and better convergence.
Preferably, in step S1, the offloading model of the mobile edge computing system is specifically:
the offloading model of the mobile edge computing system comprises a single UAV and N user equipments; the UAV serves at most K user equipments simultaneously, and each user equipment chooses either to compute its task locally or to offload it to the UAV for computation. The length and width of the flight area of the UAV are set to X_max and Y_max respectively; the UAV flies at a fixed altitude h with constant speed v(t), the antenna beam angle is θ, and the maximum flight speed is v_max. The flight time of the UAV is T time slots, each of length τ, and the time to complete a computing task at any moment cannot exceed the maximum delay T_max.
Let the coordinates of the UAV be [X(t), Y(t), h] and the coordinates of the user equipments be [x_i(t), y_i(t), 0], i ∈ {1, 2, …, N}. Let the flight distance and the horizontal direction angle of the UAV at time t be d(t) and θ_h(t) respectively; then X(t) = X(t−1) + d(t)·cos(θ_h(t)) and Y(t) = Y(t−1) + d(t)·sin(θ_h(t)). The maximum coverage radius of the UAV is R_max = h·tan(θ), and the flight speed is
v(t) = d(t)/τ
The computing task at time t is defined as:
I_i(t) = {D_i(t), F_i(t)}
where D_i(t) denotes the amount of data transmitted when the computing task at time t is offloaded, and F_i(t) denotes the computing capacity required to complete the computing task at time t.
α_i(t) ∈ {0, 1} denotes the selection policy of the user equipment: α_i(t) = 0 indicates that the computing task at time t is computed locally, and α_i(t) = 1 indicates that it is offloaded.
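For illustration, the offloading model of this step can be sketched in code as follows. This is a minimal sketch under assumed parameter values: the class name UavMecModel, the random task ranges and all numeric defaults are not taken from the patent; only the position update, the coverage radius R_max = h·tan(θ) and the task definition I_i(t) = {D_i(t), F_i(t)} follow the model above.

```python
# Illustrative sketch (not the patent's implementation) of the offloading model in step S1.
import math
import numpy as np

class UavMecModel:
    def __init__(self, n_ue=40, k_max=3, x_max=1000.0, y_max=1000.0,
                 h=100.0, theta=math.radians(60), v_max=50.0, tau=1.0, t_max=1.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_ue, self.k_max = n_ue, k_max
        self.x_max, self.y_max = x_max, y_max
        self.h, self.v_max, self.tau, self.t_max = h, v_max, tau, t_max
        self.r_max = h * math.tan(theta)               # maximum coverage radius R_max = h*tan(theta)
        self.ue_pos = self.rng.uniform([0, 0], [x_max, y_max], size=(n_ue, 2))
        self.uav_pos = np.array([0.0, 0.0])            # [X(t), Y(t)]; the altitude h is fixed

    def move(self, d, theta_h):
        """Apply X(t)=X(t-1)+d*cos(theta_h), Y(t)=Y(t-1)+d*sin(theta_h)."""
        d = min(d, self.v_max * self.tau)              # flight speed v(t)=d(t)/tau must not exceed v_max
        self.uav_pos = self.uav_pos + d * np.array([math.cos(theta_h), math.sin(theta_h)])
        return self.uav_pos

    def sample_tasks(self):
        """Randomly generate tasks I_i(t) = {D_i(t), F_i(t)} for each user equipment."""
        data_bits = self.rng.uniform(1e5, 1e6, self.n_ue)    # D_i(t): data to transmit if offloaded
        cpu_cycles = self.rng.uniform(1e8, 1e9, self.n_ue)   # F_i(t): CPU cycles required
        return data_bits, cpu_cycles
```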
Preferably, in step S2, obtaining the energy consumed by the computing task according to the offloading model of the mobile edge computing system comprises:
the user equipment chooses to offload, i.e., α_i(t) = 1; the horizontal-plane distance between this user equipment and the UAV is:
S_i(t) = √((X(t) − x_i(t))² + (Y(t) − y_i(t))²)
Each user equipment is equipped with a single antenna, and a frequency division multiple access protocol is adopted for offloading to avoid interference between user equipments. Since the UAV flies at a fixed altitude and a free-space channel model is adopted, the uplink rate during offloading is:
R_i(t) = B·log₂(1 + ρ·P_Tr / (h² + S_i²(t)))
where B denotes the average bandwidth of the communication channel, P_Tr denotes the transmission power used by the user equipment for data offloading, and ρ denotes the transmission power coefficient.
The time overhead for the user equipment to transmit the computing task is:
T_i^tr(t) = D_i(t) / R_i(t)
The time overhead for the UAV to process the computing task is:
T_i^UAV(t) = F_i(t) / f_U(t)
where f_U(t) denotes the computing capacity of the UAV.
The total time overhead when the user equipment chooses to offload is:
T_i^off(t) = T_i^tr(t) + T_i^UAV(t)
The energy consumption when the user equipment chooses to offload is:
E_i^off(t) = P_Tr · T_i^tr(t)
where E_i^off(t) denotes the energy consumed by the i-th user equipment when it chooses to offload.
Preferably, in step S2, obtaining the energy consumed by the computing task according to the offloading model of the mobile edge computing system further comprises:
the user equipment chooses local computing, i.e., α_i(t) = 0;
the time overhead for the user equipment to process the computing task is:
T_i^loc(t) = F_i(t) / f_i^loc(t)
where f_i^loc(t) denotes the computing capacity of the user equipment.
Setting the power consumption of the user equipment to P_i^loc(t) = k_i·(f_i^loc(t))^(v_i), the energy consumption when the user equipment chooses local computing is:
E_i^loc(t) = P_i^loc(t)·T_i^loc(t) = k_i·F_i(t)·(f_i^loc(t))^(v_i − 1)
where k_i is a first constant and v_i is a second constant.
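The two energy branches can be collected into a single helper, sketched below. The constants (B, P_Tr, ρ, f_U, f_i^loc, k_i, v_i) are illustrative assumptions, and the uplink-rate expression follows the free-space form reconstructed above.

```python
# Hedged sketch of the per-task energy model from step S2; all constants are assumed values.
import math

def task_energy(alpha, d_bits, f_cycles, s_i, h,
                B=1e6, p_tr=0.1, rho=1e9, f_uav=1e10, f_loc=1e9, k_i=1e-27, v_i=3):
    """Return (energy, completion_time) for one user equipment in one time slot."""
    if alpha == 1:                                                 # offload to the UAV
        rate = B * math.log2(1 + rho * p_tr / (h ** 2 + s_i ** 2))  # uplink rate R_i(t)
        t_tr = d_bits / rate                                       # transmission time T_i^tr(t)
        t_uav = f_cycles / f_uav                                   # UAV processing time T_i^UAV(t)
        return p_tr * t_tr, t_tr + t_uav                           # E_i^off(t), T_i^off(t)
    else:                                                          # compute locally
        t_loc = f_cycles / f_loc                                   # T_i^loc(t)
        p_loc = k_i * f_loc ** v_i                                 # P_i^loc(t) = k_i * f_loc^{v_i}
        return p_loc * t_loc, t_loc                                # E_i^loc(t), T_i^loc(t)
```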
Preferably, in step S3, the optimization problem of joint UAV trajectory and user equipment scheduling, established with the objective of minimizing the average energy consumption of the user equipments, is specifically:
define the flight action set A = {θ_h(t), d(t), ∀t} and the user equipment scheduling policy set Φ = {α_i(t), ∀i, t}; the optimization problem P is then expressed as:
P: min_{A,Φ} (1/(N·T)) · Σ_{t=1}^{T} Σ_{i=1}^{N} E_i(t)
s.t. 0 ≤ X(t) ≤ X_max, ∀t
0 ≤ Y(t) ≤ Y_max, ∀t
v(t) ≤ v_max, ∀t
α_i(t)·T_i^off(t) + (1 − α_i(t))·T_i^loc(t) ≤ T_max, ∀i, t
Σ_{i=1}^{N} α_i(t) ≤ K, ∀t
α_i(t)·S_i(t) ≤ R_max, ∀i, t
α_i(t) ∈ {0, 1}, ∀i, t
[X(T), Y(T)] = [X_des, Y_des]
where E_i(t) denotes the energy consumption of the user equipment: when α_i(t) = 1, E_i(t) = E_i^off(t); when α_i(t) = 0, E_i(t) = E_i^loc(t). The constraint Σ_{i=1}^{N} α_i(t) ≤ K restricts the UAV to serve at most K user equipments simultaneously, α_i(t)·S_i(t) ≤ R_max requires that a user equipment choosing to offload lies within the maximum coverage of the UAV, and [X_des, Y_des] denotes the specified destination the UAV must reach at the end of the flight period.
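For a given candidate trajectory and schedule, the objective and the per-slot scheduling constraints of problem P can be evaluated as in the sketch below. The helper names and the exact constraint set mirror the reconstruction above and should be read as assumptions; the energies are those returned by task_energy() in the earlier sketch.

```python
# Illustrative evaluation of the objective and per-slot constraints of problem P.

def average_energy(energy):
    """energy[t][i] = E_i(t); returns the objective (1/(N*T)) * sum_t sum_i E_i(t)."""
    T = len(energy)
    N = len(energy[0])
    return sum(sum(row) for row in energy) / (N * T)

def slot_feasible(alphas, dists, times, k_max, r_max, t_max):
    """Check the scheduling constraints of P for one time slot."""
    if sum(alphas) > k_max:                      # at most K user equipments offload simultaneously
        return False
    for a, s_i, t_i in zip(alphas, dists, times):
        if a == 1 and s_i > r_max:               # offloading UE must lie within the coverage R_max
            return False
        if t_i > t_max:                          # completion time cannot exceed T_max
            return False
    return True
```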
Preferably, in step S4, the state space and action space of the designed offloading model of the mobile edge computing system are specifically:
in the offloading model of the mobile edge computing system, the UAV together with the user equipments is treated as an agent. In each time slot, the agent observes the current state s(t) from the environment; the current state s(t) corresponds to a current action a(t); the UAV executes the current action a(t) in the action space, interacts with the environment, and obtains the current reward r(t) and the new state s(t+1).
For the state space: in each time slot the positions of the user equipments are fixed, so only the position information of the UAV needs to be considered; and since the UAV must arrive at a specific destination at the end of each flight cycle, let d′(t) denote the distance between the UAV and this destination. The current state in the state space is then expressed as s(t) = {X(t), Y(t), h, d′(t)}.
For the action space: from the flight distance d(t) and the horizontal direction angle θ_h(t) of the UAV, the position coordinates [X(t+1), Y(t+1), h] of the UAV at the next moment and the selection policy of the user equipments are determined; in the action space the current action is therefore expressed as a(t) = {θ_h(t), d(t), α_i(t)}.
Preferably, in step S4, the designed reward function of the offloading model of the mobile edge computing system is specifically:
the reward function is used to evaluate the quality of the action taken by the agent in the current state, and is specifically:
r(t) = R_energy + R_des + P_out + P_speed
where r(t) denotes the current reward, R_energy denotes the reward associated with the optimization objective, R_des denotes the reward for the UAV flying back to the specific destination, with R_des = k/d′(t) and k being the reward coefficient; P_out denotes the penalty for the UAV flying out of the flight area, and P_speed denotes the penalty for the UAV exceeding the maximum flight speed.
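A hedged sketch of this reward follows. The text only names the four terms and gives R_des = k/d′(t), so the sign convention for R_energy and the penalty magnitudes used below are assumptions.

```python
# Sketch of r(t) = R_energy + R_des + P_out + P_speed; kappa, p_out and p_speed are assumed values.

def reward(avg_energy, d_dest, out_of_area, overspeed, kappa=10.0, p_out=-5.0, p_speed=-5.0):
    r_energy = -avg_energy                    # lower average UE energy -> higher reward (assumed sign)
    r_des = kappa / max(d_dest, 1e-6)         # R_des = k / d'(t): grows as the UAV approaches the destination
    r = r_energy + r_des
    if out_of_area:
        r += p_out                            # penalty for leaving the flight area
    if overspeed:
        r += p_speed                          # penalty for exceeding v_max
    return r
```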
Preferably, in step S5, the constructed deep neural network comprises an experience buffer, an Actor network, a first Critic network, a second Critic network, a first Critic target network and a second Critic target network.
In each time slot, the input of the Actor network is the current state s(t) and its output is the corresponding current action a(t), giving the current scheduling policy π_φ. The inputs of the first Critic network and the second Critic network are both the current state s(t) and the current action a(t), and each outputs a Q value. After the UAV executes the current action a(t), a new state s(t+1) is generated and the current reward r(t) is obtained; the tuple [s(t), a(t), r(t), s(t+1)] is stored in the experience buffer. The first Critic target network and the second Critic target network serve as copies of the first and second Critic networks respectively; an objective function is set, and the smaller of the two target Q values is selected to compute the target value used to update the network parameters of the first and second Critic networks. When the time slot ends, the network parameters of the Actor network and the Critic networks are updated in real time according to the current scheduling policy, and random samples drawn from the experience buffer are used to update the network parameters of the Critic target networks.
The loss function of the Actor network is:
J_π(φ) = E[ α·log π_φ(ã(t)|s(t)) − min_{i=1,2} Q_{θ_i}(s(t), ã(t)) ]
The loss function of the first and second Critic networks is:
J_Q(θ_i) = E[ ½·(Q_{θ_i}(s(t), a(t)) − y(t))² ]
The objective function of the first and second Critic target networks is:
y(t) = r(t) + γ·( min_{i=1,2} Q_{θ'_i}(s(t+1), ã(t+1)) − α·log π_φ(ã(t+1)|s(t+1)) )
where φ denotes the network parameters of the Actor network; θ_i denotes the network parameters of the i-th Critic network and Q_{θ_i} denotes its Q value: when i = 1, θ_1 and Q_{θ_1} are the network parameters and Q value of the first Critic network, and when i = 2, θ_2 and Q_{θ_2} are those of the second Critic network; ã denotes the new action computed according to the current scheduling policy π_φ; y(t) denotes the target value; α denotes the entropy regularization coefficient; γ denotes the discount factor; Q_{θ'_i} denotes the Q value of the i-th Critic target network: when i = 1, Q_{θ'_1} is the Q value of the first Critic target network, and Q_{θ'_2} is the Q value of the second Critic target network.
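A compact way to see how these pieces fit together is the PyTorch sketch below. It is an illustrative reconstruction rather than the patent's implementation: the network widths, the tanh-squashed Gaussian policy parameterisation and the fixed α are assumptions; only the overall structure (an Actor, two Critics with target copies, the entropy-regularised actor loss and the min-of-two-Qs target) follows the description above.

```python
# Hedged PyTorch sketch of the Actor / twin-Critic losses of the SAC-based network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Actor(nn.Module):
    def __init__(self, s_dim, a_dim, hidden=256):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(s_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, a_dim)
        self.log_std = nn.Linear(hidden, a_dim)

    def sample(self, s):
        h = self.body(s)
        dist = torch.distributions.Normal(self.mu(h), self.log_std(h).clamp(-20, 2).exp())
        u = dist.rsample()                                   # reparameterised sample
        a = torch.tanh(u)                                    # squash to a bounded action
        logp = (dist.log_prob(u) - torch.log(1 - a.pow(2) + 1e-6)).sum(-1, keepdim=True)
        return a, logp

class Critic(nn.Module):                                     # Q_theta(s, a)
    def __init__(self, s_dim, a_dim, hidden=256):
        super().__init__()
        self.q = nn.Sequential(nn.Linear(s_dim + a_dim, hidden), nn.ReLU(),
                               nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, s, a):
        return self.q(torch.cat([s, a], dim=-1))

def sac_losses(actor, q1, q2, q1_targ, q2_targ, batch, alpha=0.2, gamma=0.99):
    s, a, r, s2 = batch                                      # tensors sampled from the experience buffer
    with torch.no_grad():                                    # target value y(t)
        a2, logp2 = actor.sample(s2)
        q_min = torch.min(q1_targ(s2, a2), q2_targ(s2, a2))
        y = r + gamma * (q_min - alpha * logp2)
    critic_loss = F.mse_loss(q1(s, a), y) + F.mse_loss(q2(s, a), y)
    a_new, logp = actor.sample(s)                            # actor loss uses the smaller of the two Q values
    actor_loss = (alpha * logp - torch.min(q1(s, a_new), q2(s, a_new))).mean()
    return actor_loss, critic_loss
```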
Preferably, the optimal scheduling policy of the constructed deep neural network is expressed as:
π* = argmax_{π_φ} Σ_t E[ γ^t·( r(t) + α·H(π_φ(·|s(t))) ) ]
where π* denotes the optimal scheduling policy, α denotes the entropy regularization coefficient, π_φ denotes the scheduling policy, and γ denotes the discount factor; H denotes the entropy, computed as H(π_φ(·|s(t))) = E[−log π_φ(·|s(t))].
The invention also provides a scheduling optimization system for UAV-assisted mobile edge computing, which comprises:
a model building module, used for constructing an offloading model of the mobile edge computing system, wherein the model comprises a UAV and a plurality of user equipments;
an energy consumption calculation module, used for obtaining the energy consumption of the computing tasks according to the offloading model of the mobile edge computing system;
an optimization problem establishing module, used for establishing an optimization problem of joint UAV trajectory and user equipment scheduling with the objective of minimizing the average energy consumption of the user equipments;
an optimization problem transformation module, used for converting the optimization problem into a Markov decision process and defining the state space, action space and reward function of the offloading model of the mobile edge computing system;
a network construction and training module, used for constructing a deep neural network based on a deep reinforcement learning algorithm and training it with the state space, action space and reward function to obtain a trained deep neural network;
and a scheduling optimization module, which performs scheduling optimization with the trained deep neural network to obtain the optimal scheduling policy, i.e., the UAV flight trajectory and the user equipment selection policy.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the mobile edge computing system unloading model constructed by the invention comprises an unmanned aerial vehicle and a plurality of user equipment, and the optimization problem of combining unmanned aerial vehicle track and user equipment scheduling is established based on the energy consumption for completing the computing task and taking the average energy consumption minimization of the user equipment as a target; the optimization problem is non-convex, is difficult to solve by a traditional method, is converted into a Markov decision process, and defines a state space, an action space and a return function of an unloading model of the mobile edge computing system; the deep neural network constructed based on the SAC algorithm is trained by utilizing the state space, the action space and the return function, the trained deep neural network can be used for scheduling optimization, an optimal scheduling strategy is obtained, the continuous action of the unmanned aerial vehicle can be planned, a reasonable and accurate flight track and a selection strategy of user equipment are obtained, the complexity is low, the convergence is strong, and the average calculation energy consumption of the user equipment is reduced.
Drawings
Fig. 1 is a flowchart of a scheduling optimization method for unmanned aerial vehicle-assisted mobile edge computing according to embodiment 1;
Fig. 2 is a schematic diagram of the offloading model of the mobile edge computing system described in embodiment 2;
Fig. 3 is a schematic structural diagram of the constructed deep neural network described in embodiment 2;
Fig. 4 is a schematic diagram comparing the UAV trajectories obtained by different scheduling optimization methods described in embodiment 2;
Fig. 5 is a schematic diagram comparing the average energy consumption of the user equipments under different scheduling optimization methods described in embodiment 2;
fig. 6 is a schematic diagram of a scheduling optimization system for unmanned aerial vehicle-assisted mobile edge computation according to embodiment 3.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product;
it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
The embodiment provides a scheduling optimization method for UAV-assisted mobile edge computing, as shown in fig. 1, comprising:
S1: constructing an offloading model of the mobile edge computing system, wherein the model comprises a UAV and a plurality of user equipments;
S2: obtaining the energy consumption of the computing tasks according to the offloading model of the mobile edge computing system;
S3: establishing an optimization problem of joint UAV trajectory and user equipment scheduling, with the objective of minimizing the average energy consumption of the user equipments;
S4: converting the optimization problem into a Markov decision process, and defining the state space, action space and reward function of the offloading model of the mobile edge computing system;
S5: constructing a deep neural network based on the SAC algorithm, and training the deep neural network using the state space, action space and reward function to obtain a trained deep neural network;
S6: performing scheduling optimization with the trained deep neural network to obtain the optimal scheduling policy, i.e., the UAV flight trajectory and the user equipment selection policy.
In a specific implementation process, an offloading model of the mobile edge computing system is constructed in which a single UAV carries an MEC server and flies within a specified area to provide edge computing for the user equipments; the energy consumption for completing each computing task is calculated from the offloading model; an optimization problem of joint UAV trajectory and user equipment scheduling is established with the objective of minimizing the average energy consumption of the user equipments; the problem of determining the UAV flight trajectory and each user equipment's choice between offloading and local computing is converted into a Markov decision process, and the state space, action space and reward function of the offloading model are defined; a deep neural network is constructed based on the SAC algorithm and trained with the state space, action space and reward function, and scheduling optimization with the trained deep neural network yields the optimal scheduling policy, i.e., the UAV flight trajectory and the user equipment selection policy. In this way the non-convex optimization problem is solved, the continuous actions of the UAV can be planned, a reasonable and accurate flight trajectory and user equipment selection policy are obtained, and the average computing energy consumption of the user equipments is reduced.
Example 2
The embodiment provides a scheduling optimization method for unmanned aerial vehicle-assisted mobile edge computing, which comprises the following steps:
s1: constructing an unloading model of the mobile edge computing system, wherein the model comprises an unmanned aerial vehicle and a plurality of user devices;
as shown in fig. 2, the offloading model of the mobile edge computing system comprises a single UAV and N user equipments; the UAV serves at most K user equipments simultaneously, and each user equipment chooses either to compute its task locally or to offload it to the UAV for computation. The length and width of the flight area of the UAV are set to X_max and Y_max respectively; the UAV flies at a fixed altitude h with constant speed v(t), the antenna beam angle is θ, and the maximum flight speed is v_max. The flight time of the UAV is T time slots, each of length τ, and the time to complete a computing task at any moment cannot exceed the maximum delay T_max.
Let the coordinates of the UAV be [X(t), Y(t), h] and the coordinates of the user equipments be [x_i(t), y_i(t), 0], i ∈ {1, 2, …, N}. Let the flight distance and the horizontal direction angle of the UAV at time t be d(t) and θ_h(t) respectively; then X(t) = X(t−1) + d(t)·cos(θ_h(t)) and Y(t) = Y(t−1) + d(t)·sin(θ_h(t)). The maximum coverage radius of the UAV is R_max = h·tan(θ), and the flight speed is
v(t) = d(t)/τ
The computing task at time t is defined as:
I_i(t) = {D_i(t), F_i(t)}
where D_i(t) denotes the amount of data transmitted when the computing task at time t is offloaded, and F_i(t) denotes the computing capacity required to complete the computing task at time t.
α_i(t) ∈ {0, 1} denotes the selection policy of the user equipment: α_i(t) = 0 indicates that the computing task at time t is computed locally, and α_i(t) = 1 indicates that it is offloaded.
S2: obtaining the energy consumption of the calculation task according to the unloading model of the mobile edge calculation system;
The user equipment chooses to offload, i.e., α_i(t) = 1; the horizontal-plane distance between this user equipment and the UAV is:
S_i(t) = √((X(t) − x_i(t))² + (Y(t) − y_i(t))²)
Each user equipment is equipped with a single antenna, and a frequency division multiple access protocol is adopted for offloading to avoid interference between user equipments. Since the UAV flies at a fixed altitude and a free-space channel model is adopted, the uplink rate during offloading is:
R_i(t) = B·log₂(1 + ρ·P_Tr / (h² + S_i²(t)))
where B denotes the average bandwidth of the communication channel, P_Tr denotes the transmission power used by the user equipment for data offloading, and ρ denotes the transmission power coefficient.
The time overhead for the user equipment to transmit the computing task is:
T_i^tr(t) = D_i(t) / R_i(t)
The time overhead for the UAV to process the computing task is:
T_i^UAV(t) = F_i(t) / f_U(t)
where f_U(t) denotes the computing capacity of the UAV.
The total time overhead when the user equipment chooses to offload is:
T_i^off(t) = T_i^tr(t) + T_i^UAV(t)
The energy consumption when the user equipment chooses to offload is:
E_i^off(t) = P_Tr · T_i^tr(t)
where E_i^off(t) denotes the energy consumed by the i-th user equipment when it chooses to offload.
The user equipment chooses local computing, i.e., α_i(t) = 0;
the time overhead for the user equipment to process the computing task is:
T_i^loc(t) = F_i(t) / f_i^loc(t)
where f_i^loc(t) denotes the computing capacity of the user equipment.
Setting the power consumption of the user equipment to P_i^loc(t) = k_i·(f_i^loc(t))^(v_i), the energy consumption when the user equipment chooses local computing is:
E_i^loc(t) = P_i^loc(t)·T_i^loc(t) = k_i·F_i(t)·(f_i^loc(t))^(v_i − 1)
where k_i is a first constant and v_i is a second constant. In this embodiment, v_i is 3.
S3: establishing an optimization problem combining unmanned aerial vehicle track and user equipment scheduling by taking average energy consumption minimization of the user equipment as a target;
Define the flight action set A = {θ_h(t), d(t), ∀t} and the user equipment scheduling policy set Φ = {α_i(t), ∀i, t}; the optimization problem P is then expressed as:
P: min_{A,Φ} (1/(N·T)) · Σ_{t=1}^{T} Σ_{i=1}^{N} E_i(t)
s.t. 0 ≤ X(t) ≤ X_max, ∀t
0 ≤ Y(t) ≤ Y_max, ∀t
v(t) ≤ v_max, ∀t
α_i(t)·T_i^off(t) + (1 − α_i(t))·T_i^loc(t) ≤ T_max, ∀i, t
Σ_{i=1}^{N} α_i(t) ≤ K, ∀t
α_i(t)·S_i(t) ≤ R_max, ∀i, t
α_i(t) ∈ {0, 1}, ∀i, t
[X(T), Y(T)] = [X_des, Y_des]
where E_i(t) denotes the energy consumption of the user equipment: when α_i(t) = 1, E_i(t) = E_i^off(t); when α_i(t) = 0, E_i(t) = E_i^loc(t). The constraint Σ_{i=1}^{N} α_i(t) ≤ K restricts the UAV to serve at most K user equipments simultaneously, α_i(t)·S_i(t) ≤ R_max requires that a user equipment choosing to offload lies within the maximum coverage of the UAV, and [X_des, Y_des] denotes the specified destination the UAV must reach at the end of the flight period.
S4: converting the optimization problem into a Markov decision process, and defining a state space, an action space and a return function of an unloading model of the mobile edge computing system;
in the offloading model of the mobile edge computing system, the UAV together with the user equipments is treated as an agent. In each time slot, the agent observes the current state s(t) from the environment; the current state s(t) corresponds to a current action a(t); the UAV executes the current action a(t) in the action space, interacts with the environment, and obtains the current reward r(t) and the new state s(t+1).
For the state space: in each time slot the positions of the user equipments are fixed, so only the position information of the UAV needs to be considered; and since the UAV must arrive at a specific destination at the end of each flight cycle, let d′(t) denote the distance between the UAV and this destination. The current state is then expressed as s(t) = {X(t), Y(t), h, d′(t)}; the state space of this embodiment is 4-dimensional.
For the action space: from the flight distance d(t) and the horizontal direction angle θ_h(t) of the UAV, the position coordinates [X(t+1), Y(t+1), h] of the UAV at the next moment and the selection policy of the user equipments are determined; the current action is therefore expressed as a(t) = {θ_h(t), d(t), α_i(t)}; the action space of this embodiment is (N+2)-dimensional.
The reward function is used to evaluate the quality of the action taken by the agent in the current state, and is specifically:
r(t) = R_energy + R_des + P_out + P_speed
where r(t) denotes the current reward, R_energy denotes the reward associated with the optimization objective, R_des denotes the reward for the UAV flying back to the specific destination, with R_des = k/d′(t) and k being the reward coefficient; P_out denotes the penalty for the UAV flying out of the flight area, and P_speed denotes the penalty for the UAV exceeding the maximum flight speed.
S5: constructing a deep neural network based on a SAC algorithm, and training the deep neural network by using a state space, an action space and a return function to obtain a trained deep neural network;
the SAC algorithm is an off-line random strategy algorithm based on a maximum entropy reinforcement learning framework and an Actor-Critic network, and is mainly characterized by entropy regularization, wherein entropy is a measure of strategy randomness, and the increase of entropy can bring more strategy exploration, and the expected return and the entropy value are balanced through training strategies, so that the network learning speed can be accelerated, and meanwhile, the strategy convergence to a local optimal solution is avoided; the purpose of the Actor network is to obtain the maximum return expectation and the maximum entropy, i.e. explore other strategies in the strategy space while successfully completing the task; the combination of the network update in an offline mode and the Actor-Critic network achieves good performance on a continuous control reference task, and is more stable and better in convergence.
The constructed deep neural network comprises an experience buffer, an Actor network, a first Critic network, a second Critic network, a first Critic target network and a second Critic target network.
In each time slot, the input of the Actor network is the current state s(t) and its output is the corresponding current action a(t), giving the current scheduling policy π_φ. The inputs of the first Critic network and the second Critic network are both the current state s(t) and the current action a(t), and each outputs a Q value. After the UAV executes the current action a(t), a new state s(t+1) is generated and the current reward r(t) is obtained; the tuple [s(t), a(t), r(t), s(t+1)] is stored in the experience buffer. The first Critic target network and the second Critic target network serve as copies of the first and second Critic networks respectively; an objective function is set, and the smaller of the two target Q values is selected to compute the target value used to update the network parameters of the first and second Critic networks. When the time slot ends, the network parameters of the Actor network and the Critic networks are updated in real time according to the current scheduling policy, and random samples drawn from the experience buffer are used to update the network parameters of the Critic target networks.
The loss function of the Actor network is:
J_π(φ) = E[ α·log π_φ(ã(t)|s(t)) − min_{i=1,2} Q_{θ_i}(s(t), ã(t)) ]
The loss function of the first and second Critic networks is:
J_Q(θ_i) = E[ ½·(Q_{θ_i}(s(t), a(t)) − y(t))² ]
The objective function of the first and second Critic target networks is:
y(t) = r(t) + γ·( min_{i=1,2} Q_{θ'_i}(s(t+1), ã(t+1)) − α·log π_φ(ã(t+1)|s(t+1)) )
where φ denotes the network parameters of the Actor network; θ_i denotes the network parameters of the i-th Critic network and Q_{θ_i} denotes its Q value: when i = 1, θ_1 and Q_{θ_1} are the network parameters and Q value of the first Critic network, and when i = 2, θ_2 and Q_{θ_2} are those of the second Critic network; ã denotes the new action computed according to the current scheduling policy π_φ; y(t) denotes the target value; α denotes the entropy regularization coefficient; γ denotes the discount factor; Q_{θ'_i} denotes the Q value of the i-th Critic target network: when i = 1, Q_{θ'_1} is the Q value of the first Critic target network, and Q_{θ'_2} is the Q value of the second Critic target network;
s6: and carrying out scheduling optimization by using the trained deep neural network to obtain an optimal scheduling strategy, namely a selection strategy of the flight path of the unmanned aerial vehicle and the user equipment.
The optimal scheduling policy is expressed as:
π* = argmax_{π_φ} Σ_t E[ γ^t·( r(t) + α·H(π_φ(·|s(t))) ) ]
where π* denotes the optimal scheduling policy, α denotes the entropy regularization coefficient, π_φ denotes the scheduling policy, and γ denotes the discount factor; H denotes the entropy, computed as H(π_φ(·|s(t))) = E[−log π_φ(·|s(t))].
In a specific implementation process, each episode of the deep neural network constructed on the basis of the SAC algorithm lasts until the UAV, starting from its start point, reaches the destination or the maximum time T elapses. Before each episode starts, the start and end positions of the UAV are initialized, and the number of user equipments, i.e., the value of N, is initialized randomly. In the initial stage the scheduling policy is far from the optimal one, so the entropy regularization coefficient α is set to 1 so that the agent explores more actions at the beginning and avoids falling into a locally optimal solution; α is updated together with the network parameters, and as the number of iterations increases the algorithm gradually converges to the optimal solution. As shown in fig. 3, in each time slot the agent outputs an action a(t), namely the flight direction and distance of the UAV and each user equipment's choice of local computing or offloading, according to the observed state information s(t). If the flight distance of the UAV exceeds the maximum distance d_max, d(t) is set to d_max; and if the next position of the UAV would leave the specified area, the flight action is cancelled. The corresponding current reward r(t) and the state s(t+1) at the next moment are obtained from the current action, [s(t), a(t), r(t), s(t+1)] is stored in the experience buffer, and at the end of each time slot K groups of experiences are randomly sampled from the experience buffer to update the network parameters. The SAC algorithm contains a parameterized Actor network, which outputs the policy π_φ(·|s(t)): the state information s(t) is input to the Actor network and the corresponding action a(t) ~ π_φ(·|s(t)) is output. In addition, there are two parameterized Critic networks, also called Q networks: the state information s(t) and the corresponding action a(t) are jointly input to the first and second Critic networks, which output the Q values Q_{θ_1}(s(t), a(t)) and Q_{θ_2}(s(t), a(t)); the smaller of the two, min(Q_{θ_1}, Q_{θ_2}), is used to evaluate the performance of the Actor network and prevent overestimation. Here φ and θ_i denote the parameters of the Actor network and the Critic networks respectively. As in other DRL algorithms, the SAC algorithm uses an experience buffer for training the deep neural network parameters, and also uses target networks with soft updates. The target networks are copies of the first and second Critic networks respectively; Q_{θ'_i} denotes the target Q value and θ'_i denotes the parameters of the first and second Critic target networks. A "soft" update means that the parameters of the target networks are updated by slowly tracking the trained network parameters, i.e., φ′ ← τφ + (1 − τ)φ′ and θ′_i ← τθ_i + (1 − τ)θ′_i, where τ ≤ 1. The difference is that the actions used for updating the Actor network and the Critic networks come from the current policy rather than being sampled from the experience buffer.
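The training procedure described above can be sketched as follows, reusing the Actor/Critic modules and sac_losses() from the earlier sketch. The environment interface (env.reset()/env.step()), the to_tensors() batching helper and all hyper-parameter values are assumptions introduced for illustration; only the overall flow (episode rollout, clipping the flight distance, replay storage, sampled mini-batch updates and the soft target update) mirrors the text.

```python
# Hedged sketch of the training loop; env and to_tensors() are assumed helpers.
import copy
import random
import torch

def soft_update(target, source, tau=0.005):
    for p_t, p in zip(target.parameters(), source.parameters()):
        p_t.data.mul_(1 - tau).add_(tau * p.data)            # theta' <- tau*theta + (1 - tau)*theta'

def train(env, actor, q1, q2, episodes=1000, batch_size=128, d_max=50.0):
    q1_targ, q2_targ = copy.deepcopy(q1), copy.deepcopy(q2)
    opt_a = torch.optim.Adam(actor.parameters(), lr=3e-4)
    opt_c = torch.optim.Adam(list(q1.parameters()) + list(q2.parameters()), lr=3e-4)
    buffer = []                                              # experience buffer of [s, a, r, s'] tuples
    for _ in range(episodes):
        s, done = env.reset(), False                         # random start position, destination and N
        while not done:
            with torch.no_grad():
                a, _ = actor.sample(torch.as_tensor(s, dtype=torch.float32))
            a = a.numpy()
            a[1] = min(a[1], d_max)                          # clip the flight distance to d_max
            s2, r, done = env.step(a)                        # env cancels moves that leave the area
            buffer.append((s, a, r, s2))
            s = s2
        if len(buffer) >= batch_size:                        # sample a mini-batch and update the networks
            batch = to_tensors(random.sample(buffer, batch_size))   # to_tensors(): assumed helper
            actor_loss, critic_loss = sac_losses(actor, q1, q2, q1_targ, q2_targ, batch)
            opt_a.zero_grad(); actor_loss.backward(); opt_a.step()   # actor step first, then clear
            opt_c.zero_grad(); critic_loss.backward(); opt_c.step()  # leaked gradients before critic step
            soft_update(q1_targ, q1); soft_update(q2_targ, q2)
```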
As shown in fig. 4, scheduling optimization is carried out taking a single UAV serving 40 user equipments as an example, and the figure shows the UAV trajectories under different scheduling optimization methods: trajectory 1 is the flight trajectory obtained by the proposed joint optimization of the UAV trajectory and the user equipment scheduling, trajectory 2 is the flight trajectory obtained when only the user equipment scheduling is optimized, and trajectory 3 is the flight trajectory under random user equipment scheduling; the triangles mark trajectory 1, the diamonds mark trajectory 2, and the squares mark trajectory 3, with trajectory 2 coinciding with trajectory 3. Fig. 5 compares the average energy consumption of the user equipments under the three scheduling methods: the triangles denote the average energy consumption under the joint optimization of UAV trajectory and user equipment scheduling provided by this embodiment, the circles denote the average energy consumption when only the user equipment scheduling is optimized, and the squares denote the average energy consumption under random user equipment scheduling. Since different user equipments have different functions in practice, the sizes of the computing tasks are generated randomly in this embodiment, and the maximum number of user equipments served by the UAV is K = 3. As can be seen from the figures, the average energy consumption of the user equipments under the scheduling optimization method that jointly optimizes the UAV trajectory and the user equipment scheduling is much smaller than that of the method that only optimizes the user equipment scheduling and of the method with random user equipment scheduling.
Example 3
The present embodiment provides a scheduling optimization system for UAV-assisted mobile edge computing, as shown in fig. 6, comprising:
a model building module, used for constructing an offloading model of the mobile edge computing system, wherein the model comprises a UAV and a plurality of user equipments;
an energy consumption calculation module, used for obtaining the energy consumption of the computing tasks according to the offloading model of the mobile edge computing system;
an optimization problem establishing module, used for establishing an optimization problem of joint UAV trajectory and user equipment scheduling with the objective of minimizing the average energy consumption of the user equipments;
an optimization problem transformation module, used for converting the optimization problem into a Markov decision process and defining the state space, action space and reward function of the offloading model of the mobile edge computing system;
a network construction and training module, used for constructing a deep neural network based on a deep reinforcement learning algorithm and training it with the state space, action space and reward function to obtain a trained deep neural network;
and a scheduling optimization module, which performs scheduling optimization with the trained deep neural network to obtain the optimal scheduling policy, i.e., the UAV flight trajectory and the user equipment selection policy.
The terms describing positional relationships in the drawings are for illustrative purposes only and are not to be construed as limiting the patent;
it should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims (10)

1. A scheduling optimization method for UAV-assisted mobile edge computing, characterized by comprising:
S1: constructing an offloading model of a mobile edge computing system, the model comprising a UAV and several user equipments;
S2: obtaining the energy consumption of the computing tasks according to the offloading model of the mobile edge computing system;
S3: establishing an optimization problem of joint UAV trajectory and user equipment scheduling, with the objective of minimizing the average energy consumption of the user equipments;
S4: converting the optimization problem into a Markov decision process, and defining the state space, action space and reward function of the offloading model of the mobile edge computing system;
S5: constructing a deep neural network based on the SAC algorithm, and training the deep neural network using the state space, action space and reward function to obtain a trained deep neural network;
S6: performing scheduling optimization with the trained deep neural network to obtain the optimal scheduling policy, i.e., the UAV flight trajectory and the user equipment selection policy.
2. The scheduling optimization method for UAV-assisted mobile edge computing according to claim 1, characterized in that in step S1 the constructed offloading model of the mobile edge computing system is specifically:
the offloading model of the mobile edge computing system comprises a single UAV and N user equipments; the UAV serves at most K user equipments simultaneously, and each user equipment chooses either to compute its task locally or to offload it to the UAV for computation; the length and width of the flight area of the UAV are set to X_max and Y_max respectively; the UAV flies at a fixed altitude h with constant speed v(t), the antenna beam angle is θ, and the maximum flight speed is v_max; the flight time of the UAV is T time slots, each of length τ, and the time to complete a computing task at any moment cannot exceed the maximum delay T_max;
denote the coordinates of the UAV as [X(t), Y(t), h] and the coordinates of the user equipments as [x_i(t), y_i(t), 0], i ∈ {1, 2, …, N}; let the flight distance and the horizontal direction angle of the UAV at time t be d(t) and θ_h(t) respectively, then X(t) = X(t−1) + d(t)·cos(θ_h(t)) and Y(t) = Y(t−1) + d(t)·sin(θ_h(t)); the maximum coverage radius of the UAV is R_max = h·tan(θ), and the flight speed is
v(t) = d(t)/τ
the computing task at time t is defined as:
I_i(t) = {D_i(t), F_i(t)}
where D_i(t) denotes the amount of data transmitted when the computing task at time t is offloaded, and F_i(t) denotes the computing capacity required to complete the computing task at time t;
α_i(t) ∈ {0, 1} denotes the selection policy of the user equipment: α_i(t) = 0 indicates that the computing task at time t is computed locally, and α_i(t) = 1 indicates that it is offloaded.
3. The scheduling optimization method for UAV-assisted mobile edge computing according to claim 2, characterized in that in step S2, obtaining the energy consumed by computing tasks according to the offloading model of the mobile edge computing system comprises:
when the user equipment chooses offloaded computing, i.e. α_i(t) = 1, the horizontal distance between that user equipment and the UAV is given by a formula presented as an image in the original (Figure FDA0003385009620000021);
the uplink rate during offloading is then given by Figure FDA0003385009620000022, where B denotes the average bandwidth of the communication channel, P_Tr denotes the transmission power for user equipment data offloading, and ρ denotes the transmission power coefficient;
the time overhead for the user equipment to transmit the computing task is given by Figure FDA0003385009620000023;
the time overhead for the UAV to process the computing task is given by Figure FDA0003385009620000024, where f_U(t) denotes the computing capacity of the UAV;
the total time overhead when the user equipment chooses offloaded computing is given by Figure FDA0003385009620000025;
the energy consumption when the user equipment chooses offloaded computing is given by Figure FDA0003385009620000026, in which the quantity of Figure FDA0003385009620000027 denotes the energy consumption of the i-th user equipment that chooses offloaded computing.
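The offloading formulas of claim 3 are only available as images in this text rendering. The sketch below follows a form that is common in the UAV-MEC literature (log-distance uplink rate, transmit-energy accounting), so every expression and symbol name in it is an assumption rather than the patent's exact definition.

import math

def offload_cost(x_uav, y_uav, h, x_i, y_i, D_i, F_i, B, P_tr, rho, f_uav):
    """Illustrative offloaded-computing delay and energy, assuming a standard UAV-MEC model.

    D_i : data volume of the task, F_i : CPU cycles required,
    B   : average channel bandwidth, P_tr : UE transmit power, rho : power/channel coefficient,
    f_uav : UAV computing capacity. All formula forms below are assumptions.
    """
    s_i = math.hypot(x_uav - x_i, y_uav - y_i)                    # horizontal UAV-UE distance
    rate = B * math.log2(1.0 + P_tr * rho / (h ** 2 + s_i ** 2))  # assumed uplink rate
    t_tx = D_i / rate                                             # task transmission time
    t_uav = F_i / f_uav                                           # UAV processing time
    t_off = t_tx + t_uav                                          # total offloading delay
    e_off = P_tr * t_tx                                           # assumed: UE spends only transmit energy
    return t_off, e_off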
4. The scheduling optimization method for UAV-assisted mobile edge computing according to claim 3, characterized in that in step S2, obtaining the energy consumed by computing tasks according to the offloading model of the mobile edge computing system further comprises:
when the user equipment chooses local computing, i.e. α_i(t) = 0, the time overhead for the user equipment to process the computing task is given by a formula presented as an image in the original (Figure FDA0003385009620000031), in which the quantity of Figure FDA00033850096200000318 denotes the computing capacity of the user equipment;
with the power consumption of the user equipment set as in Figure FDA0003385009620000032, the energy consumption when the user equipment chooses local computing is given by Figure FDA0003385009620000033, where k_i is a first constant and v_i is a second constant.
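The local-computing formulas are likewise images here. The sketch below uses the usual CMOS-style power model E = k·f^v·t, which matches the "first constant k_i, second constant v_i" wording of claim 4, but the exact form and exponent convention in the patent are assumptions.

def local_cost(F_i, f_local, k_i, v_i):
    """Illustrative local-computing delay and energy under an assumed CMOS-style power model.

    F_i      : CPU cycles required by the task
    f_local  : computing capacity (CPU frequency) of the user equipment
    k_i, v_i : the 'first constant' and 'second constant' of claim 4 (roles assumed)
    """
    t_local = F_i / f_local             # local processing time
    p_local = k_i * (f_local ** v_i)    # assumed power-consumption model
    e_local = p_local * t_local         # local energy consumption
    return t_local, e_local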
5. The scheduling optimization method for UAV-assisted mobile edge computing according to claim 4, characterized in that in step S3, the optimization problem of joint UAV trajectory and user equipment scheduling, established with the goal of minimizing the average energy consumption of the user equipments, is specifically:
a flight action set and a user equipment scheduling policy set are defined (Figures FDA0003385009620000034 and FDA0003385009620000035), and the optimization problem P is expressed by an objective and constraints presented as images in the original (Figures FDA0003385009620000036 to FDA00033850096200000314);
here E_i(t) denotes the energy consumption of the user equipment, equal to the offloaded-computing energy (Figure FDA00033850096200000315) when α_i(t) = 1 and to the local-computing energy (Figure FDA00033850096200000316) when α_i(t) = 0; the constraint of Figure FDA00033850096200000317 restricts the UAV to serving at most K user equipments simultaneously, and the constraint α_i(t)S_i(t) ≤ R_max restricts user equipments that choose offloaded computing to lie within the maximum coverage of the UAV.
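Because the objective and most constraints of problem P are image-only, the LaTeX block below gives one plausible reconstruction consistent with the surrounding text: an average-energy objective with service-count, coverage, delay, speed and flight-area constraints. It is a hedged reading, not the patent's exact formulation.

\begin{aligned}
\mathrm{P}:\ \min_{\{d(t),\,\theta_h(t),\,\alpha_i(t)\}}\ &
\frac{1}{NT}\sum_{t=1}^{T}\sum_{i=1}^{N} E_i(t) \\
\text{s.t.}\quad
& \textstyle\sum_{i=1}^{N}\alpha_i(t) \le K &&\text{(at most $K$ UEs served per slot)}\\
& \alpha_i(t)\,S_i(t) \le R_{\max} &&\text{(offloading UEs inside UAV coverage)}\\
& T_i(t) \le T_{\max} &&\text{(maximum task delay)}\\
& 0 \le d(t)/\tau \le v_{\max} &&\text{(maximum flight speed)}\\
& 0 \le X(t) \le X_{\max},\quad 0 \le Y(t) \le Y_{\max} &&\text{(flight area)}\\
& \alpha_i(t)\in\{0,1\}
\end{aligned}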
6. The scheduling optimization method for UAV-assisted mobile edge computing according to claim 5, characterized in that in step S4 the state space and action space designed for the offloading model of the mobile edge computing system are specifically:
in the offloading model of the mobile edge computing system, the UAV together with the user equipments acts as an agent; in each time slot the agent observes the current state s(t) from the environment, the current state s(t) corresponds to a current action a(t), the UAV executes the current action a(t) from the action space and interacts with the environment, and the environment returns the current reward r(t) and the new state s(t+1);
for the state space, the position of each user equipment is fixed within a time slot, so only the position information of the UAV needs to be considered; in addition, at the end of each flight cycle the UAV must reach a specified destination, and the distance between the UAV and that destination is denoted d'(t); the current state is therefore s(t) = {X(t), Y(t), h, d'(t)};
for the action space, the position of the UAV at the next moment, [X(t+1), Y(t+1), h], is computed from the flight distance d(t) and the horizontal heading angle θ_h(t), together with the selection policy of the user equipments; the current action is therefore a(t) = {θ_h(t), d(t), α_i(t)}.
7. The scheduling optimization method for UAV-assisted mobile edge computing according to claim 6, characterized in that in step S4 the reward function designed for the offloading model of the mobile edge computing system is specifically:
the reward function evaluates how good the action taken by the agent in the current state is, namely
r(t) = R_energy + R_des + P_out + P_speed
where r(t) denotes the current reward, R_energy denotes the reward associated with the optimization objective, R_des denotes the reward for the UAV flying back to the specified destination, with R_des = k/d'(t) and k a reward factor, P_out denotes the penalty for the UAV flying out of the flight area, and P_speed denotes the penalty for the UAV exceeding the maximum flight speed.
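A minimal sketch of the MDP interface described in claims 6 and 7 follows. Only the state layout s(t) = {X(t), Y(t), h, d'(t)}, the action layout a(t) = {θ_h(t), d(t), α_i(t)} and the reward decomposition r = R_energy + R_des + P_out + P_speed come from the claims; the penalty magnitudes and the scaling of R_energy are placeholder assumptions.

import math

def build_state(x, y, h, dest):
    """State of claim 6: s(t) = {X(t), Y(t), h, d'(t)}, with d'(t) the distance to the fixed destination."""
    d_dest = math.hypot(x - dest[0], y - dest[1])
    return [x, y, h, d_dest]

def reward(avg_energy, d_dest, out_of_area, over_speed, k=1.0):
    """Reward of claim 7: r(t) = R_energy + R_des + P_out + P_speed.

    Assumptions: R_energy is the negative average UE energy, the two penalties are
    fixed negative constants, and k is the reward factor in R_des = k / d'(t).
    """
    r_energy = -avg_energy                      # lower energy consumption -> higher reward (assumed sign)
    r_des = k / max(d_dest, 1e-6)               # reward for approaching the destination
    p_out = -10.0 if out_of_area else 0.0       # penalty for leaving the flight area (magnitude assumed)
    p_speed = -10.0 if over_speed else 0.0      # penalty for exceeding v_max (magnitude assumed)
    return r_energy + r_des + p_out + p_speed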
8. The scheduling optimization method for UAV-assisted mobile edge computing according to claim 7, characterized in that in step S5 the constructed deep neural network comprises an experience buffer, an Actor network, a first Critic network, a second Critic network, a first Critic target network and a second Critic target network;
in each time slot, the Actor network takes the current state s(t) as input and outputs the corresponding current action a(t), yielding the current scheduling policy π_φ; the first and second Critic networks both take the current state s(t) and the current action a(t) as input and each output a Q value; after the UAV executes the current action a(t), the new state s(t+1) is generated and the current reward r(t) is obtained, and the tuple [s(t), a(t), r(t), s(t+1)] is stored in the experience buffer; the first and second Critic target networks are copies of the first and second Critic networks respectively, an objective function is set, and the smaller of the two Q values is selected to compute the target value used to update the network parameters of the first and second Critic networks; at the end of the time slot, the network parameters of the Actor network and the Critic networks are updated in real time according to the current scheduling policy, and the network parameters of the Critic target networks are updated using random samples drawn from the experience buffer;
the loss function of the Actor network, the loss function of the first and second Critic networks, and the objective function of the first and second Critic target networks are given by formulas presented as images in the original (Figures FDA0003385009620000051, FDA0003385009620000052 and FDA0003385009620000053);
in these formulas, φ denotes the network parameters of the Actor network and θ_i denotes the network parameters of the i-th Critic network, whose Q value is given by Figure FDA0003385009620000054; for i = 1, θ_1 denotes the parameters of the first Critic network and its Q value is Figure FDA0003385009620000055; for i = 2, θ_2 denotes the parameters of the second Critic network and its Q value is Figure FDA0003385009620000056; Figure FDA0003385009620000057 denotes the new action computed from the current scheduling policy π_φ; Figure FDA0003385009620000058 denotes the target value and α denotes the entropy regularization coefficient; Figure FDA0003385009620000059 denotes the Q value of the i-th Critic target network, equal to Figure FDA00033850096200000510 for the first Critic target network (i = 1) and to Figure FDA00033850096200000511 for the second Critic target network.
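The loss and target expressions of claim 8 are image-only in this rendering. For orientation, the standard soft actor-critic forms they are most likely instances of are reproduced below (clipped double Q, entropy coefficient α, target networks θ_i'); treat this as the textbook SAC formulation rather than a verbatim quotation of the patent.

\begin{aligned}
J_\pi(\phi) &= \mathbb{E}\Big[\alpha \log \pi_\phi\big(\tilde a(t)\mid s(t)\big)
              - \min_{i=1,2} Q_{\theta_i}\big(s(t),\tilde a(t)\big)\Big],
              \qquad \tilde a(t)\sim\pi_\phi(\cdot\mid s(t)) \\
J_Q(\theta_i) &= \mathbb{E}\Big[\tfrac{1}{2}\big(Q_{\theta_i}(s(t),a(t)) - y(t)\big)^{2}\Big],\qquad i=1,2 \\
y(t) &= r(t) + \gamma\Big(\min_{i=1,2} Q_{\theta_i'}\big(s(t{+}1),\tilde a(t{+}1)\big)
        - \alpha \log \pi_\phi\big(\tilde a(t{+}1)\mid s(t{+}1)\big)\Big)
\end{aligned}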
9. The scheduling optimization method for UAV-assisted mobile edge computing according to claim 8, characterized in that the optimal scheduling policy expression of the constructed deep neural network is given by a formula presented as an image in the original (Figure FDA00033850096200000512), where π* denotes the optimal scheduling policy, α the entropy regularization coefficient, π_φ the scheduling policy and γ the discount factor; H denotes the entropy, computed as H(π_φ(·|s(t))) = E[-log π_φ(·|s(t))].
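Claim 9's optimal-policy expression is likewise an image. The maximum-entropy objective that its symbol list (π*, α, γ, H) points to in standard SAC is shown below as a plausible reconstruction, not a quotation of the patent.

\pi^{*} = \arg\max_{\pi_\phi}\;
\mathbb{E}_{\pi_\phi}\!\left[\sum_{t} \gamma^{t}
\Big( r(t) + \alpha\, H\big(\pi_\phi(\cdot\mid s(t))\big) \Big)\right],
\qquad
H\big(\pi_\phi(\cdot\mid s(t))\big) = \mathbb{E}\big[-\log \pi_\phi(\cdot\mid s(t))\big]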
10. A scheduling optimization system for UAV-assisted mobile edge computing, characterized in that it comprises:
a model construction module, configured to construct the offloading model of the mobile edge computing system, the model comprising one UAV and several user equipments;
an energy consumption calculation module, configured to obtain the energy consumption of computing tasks according to the offloading model of the mobile edge computing system;
an optimization problem establishment module, configured to establish the optimization problem of joint UAV trajectory and user equipment scheduling with the goal of minimizing the average energy consumption of the user equipments;
an optimization problem conversion module, configured to convert the optimization problem into a Markov decision process and define the state space, action space and reward function of the offloading model of the mobile edge computing system;
a network construction and training module, configured to construct a deep neural network based on the SAC algorithm and train it using the state space, action space and reward function to obtain a trained deep neural network;
a scheduling optimization module, configured to perform scheduling optimization using the trained deep neural network to obtain the optimal scheduling policy, namely the UAV flight trajectory and the user equipment selection policy.
CN202111449863.3A 2021-11-30 2021-11-30 Scheduling optimization method and system for UAV-assisted mobile edge computing Active CN114169234B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111449863.3A CN114169234B (en) 2021-11-30 2021-11-30 Scheduling optimization method and system for UAV-assisted mobile edge computing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111449863.3A CN114169234B (en) 2021-11-30 2021-11-30 Scheduling optimization method and system for UAV-assisted mobile edge computing

Publications (2)

Publication Number Publication Date
CN114169234A true CN114169234A (en) 2022-03-11
CN114169234B CN114169234B (en) 2024-10-25

Family

ID=80481862

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111449863.3A Active CN114169234B (en) 2021-11-30 2021-11-30 Scheduling optimization method and system for UAV-assisted mobile edge computing

Country Status (1)

Country Link
CN (1) CN114169234B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114698125A (en) * 2022-06-02 2022-07-01 北京建筑大学 Compute offload optimization method, device and system for mobile edge computing network
CN114840021A (en) * 2022-04-28 2022-08-02 中国人民解放军国防科技大学 Trajectory planning method, device, equipment and medium for data collection by unmanned aerial vehicle
CN114896072A (en) * 2022-06-02 2022-08-12 深圳市芯中芯科技有限公司 Unmanned aerial vehicle-assisted mobile edge calculation optimization method based on deep reinforcement learning
CN115334165A (en) * 2022-07-11 2022-11-11 西安交通大学 Underwater multi-unmanned platform scheduling method and system based on deep reinforcement learning
CN115361089A (en) * 2022-07-08 2022-11-18 国网江苏省电力有限公司电力科学研究院 Data security communication method, system and device of power Internet of things and storage medium
CN116017472A (en) * 2022-12-07 2023-04-25 中南大学 UAV trajectory planning and resource allocation method for emergency network
CN116451934A (en) * 2023-03-16 2023-07-18 中国人民解放军国防科技大学 Multi-unmanned aerial vehicle edge calculation path optimization and dependent task scheduling optimization method and system
CN117553803A (en) * 2024-01-09 2024-02-13 大连海事大学 A multi-UAV intelligent path planning method based on deep reinforcement learning
CN117970952A (en) * 2024-03-28 2024-05-03 中国人民解放军海军航空大学 Unmanned aerial vehicle maneuver strategy offline modeling method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021034050A (en) * 2019-08-21 2021-03-01 哈爾浜工程大学 AUV action plan and motion control method based on reinforcement learning
CN112911648A (en) * 2021-01-20 2021-06-04 长春工程学院 Air-ground combined mobile edge calculation unloading optimization method
CN113346944A (en) * 2021-06-28 2021-09-03 上海交通大学 Time delay minimization calculation task unloading method and system in air-space-ground integrated network
CN113395654A (en) * 2021-06-11 2021-09-14 广东工业大学 Method for task unloading and resource allocation of multiple unmanned aerial vehicles of edge computing system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021034050A (en) * 2019-08-21 2021-03-01 哈爾浜工程大学 AUV action plan and motion control method based on reinforcement learning
CN112911648A (en) * 2021-01-20 2021-06-04 长春工程学院 Air-ground combined mobile edge calculation unloading optimization method
CN113395654A (en) * 2021-06-11 2021-09-14 广东工业大学 Method for task unloading and resource allocation of multiple unmanned aerial vehicles of edge computing system
CN113346944A (en) * 2021-06-28 2021-09-03 上海交通大学 Time delay minimization calculation task unloading method and system in air-space-ground integrated network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
宋强; 王昆仑; 黄祚继; 董丹丹; 张蕊: "A brief discussion on the application of low-altitude UAVs in river channel management" (浅议低空无人机在河道管理中应用研究), 治淮 (Zhihuai), no. 05, 15 May 2019 (2019-05-15) *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114840021A (en) * 2022-04-28 2022-08-02 中国人民解放军国防科技大学 Trajectory planning method, device, equipment and medium for data collection by unmanned aerial vehicle
CN114896072A (en) * 2022-06-02 2022-08-12 深圳市芯中芯科技有限公司 Unmanned aerial vehicle-assisted mobile edge calculation optimization method based on deep reinforcement learning
CN114698125A (en) * 2022-06-02 2022-07-01 北京建筑大学 Compute offload optimization method, device and system for mobile edge computing network
CN115361089A (en) * 2022-07-08 2022-11-18 国网江苏省电力有限公司电力科学研究院 Data security communication method, system and device of power Internet of things and storage medium
CN115334165A (en) * 2022-07-11 2022-11-11 西安交通大学 Underwater multi-unmanned platform scheduling method and system based on deep reinforcement learning
CN115334165B (en) * 2022-07-11 2023-10-17 西安交通大学 An underwater multi-unmanned platform scheduling method and system based on deep reinforcement learning
CN116017472B (en) * 2022-12-07 2024-04-19 中南大学 UAV trajectory planning and resource allocation methods for emergency networks
CN116017472A (en) * 2022-12-07 2023-04-25 中南大学 UAV trajectory planning and resource allocation method for emergency network
CN116451934A (en) * 2023-03-16 2023-07-18 中国人民解放军国防科技大学 Multi-unmanned aerial vehicle edge calculation path optimization and dependent task scheduling optimization method and system
CN116451934B (en) * 2023-03-16 2024-02-06 中国人民解放军国防科技大学 Multi-UAV edge computing path optimization and dependent task scheduling optimization methods and systems
CN117553803A (en) * 2024-01-09 2024-02-13 大连海事大学 A multi-UAV intelligent path planning method based on deep reinforcement learning
CN117553803B (en) * 2024-01-09 2024-03-19 大连海事大学 Multi-unmanned aerial vehicle intelligent path planning method based on deep reinforcement learning
CN117970952A (en) * 2024-03-28 2024-05-03 中国人民解放军海军航空大学 Unmanned aerial vehicle maneuver strategy offline modeling method
CN117970952B (en) * 2024-03-28 2024-06-04 中国人民解放军海军航空大学 Offline modeling method for UAV maneuvering strategy

Also Published As

Publication number Publication date
CN114169234B (en) 2024-10-25

Similar Documents

Publication Publication Date Title
CN114169234A (en) A scheduling optimization method and system for UAV-assisted mobile edge computing
CN111787509B (en) Reinforcement learning-based UAV task offloading method and system in edge computing
CN113395654A (en) Method for task unloading and resource allocation of multiple unmanned aerial vehicles of edge computing system
CN116451934B (en) Multi-UAV edge computing path optimization and dependent task scheduling optimization methods and systems
CN113660681A (en) Multi-agent resource optimization method applied to unmanned aerial vehicle cluster auxiliary transmission
CN117055619A (en) Unmanned aerial vehicle scheduling method based on multi-agent reinforcement learning
CN117041129A (en) Low-orbit satellite network flow routing method based on multi-agent reinforcement learning
Wang et al. Curriculum reinforcement learning-based computation offloading approach in space-air-ground integrated network
CN116634498A (en) Multi-level offloading method for edge computing of low-orbit satellite constellation network based on reinforcement learning
Song et al. Energy-efficient trajectory optimization with wireless charging in UAV-assisted MEC based on multi-objective reinforcement learning
Xie et al. Computation offloading and resource allocation in leo satellite-terrestrial integrated networks with system state delay
CN114513814A (en) Edge network computing resource dynamic optimization method based on unmanned aerial vehicle auxiliary node
Tan et al. Communication-assisted multi-agent reinforcement learning improves task-offloading in UAV-aided edge-computing networks
CN117580105B (en) An optimization method for unmanned aerial vehicle task offloading for power grid inspection
CN118574156A (en) Unmanned plane assisted unmanned ship task unloading method based on deep reinforcement learning
Gao et al. Multi-UAV assisted offloading optimization: A game combined reinforcement learning approach
CN117236561A (en) SAC-based multi-unmanned aerial vehicle auxiliary mobile edge computing method, device and storage medium
CN116489610A (en) UAV-assisted wearable Internet of Things device charging and data processing method and system
CN116578354A (en) Method and device for unloading edge calculation tasks of electric power inspection unmanned aerial vehicle
CN116828539A (en) Joint computing transfer and UAV trajectory optimization method based on deep reinforcement learning
CN116704823A (en) Intelligent trajectory planning and synaesthesia resource allocation method for UAV based on reinforcement learning
CN116774584A (en) Unmanned aerial vehicle differentiated service track optimization method based on multi-agent deep reinforcement learning
CN114727323A (en) Unmanned aerial vehicle base station control method and device and model training method and device
CN116669069B (en) A joint dynamic decision-making method for cell association, trajectory planning and offloading scheduling
Linpei et al. Energy-efficient computation offloading assisted by RIS-based UAV

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant