CN114169234A - Scheduling optimization method and system for unmanned aerial vehicle-assisted mobile edge calculation - Google Patents

Scheduling optimization method and system for unmanned aerial vehicle-assisted mobile edge calculation

Info

Publication number
CN114169234A
CN114169234A (application CN202111449863.3A)
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
network
user equipment
mobile edge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111449863.3A
Other languages
Chinese (zh)
Other versions
CN114169234B (en)
Inventor
张广驰
何梓楠
崔苗
刘圣海
王日明
王昆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN202111449863.3A
Priority claimed from CN202111449863.3A
Publication of CN114169234A
Application granted
Publication of CN114169234B
Legal status: Active
Anticipated expiration legal status

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00: Computer-aided design [CAD]
    • G06F 30/20: Design optimisation, verification or simulation
    • G06F 30/27: Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061: Partitioning or combining of resources
    • G06F 9/5072: Grid computing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 7/00: Computing arrangements based on specific mathematical models
    • G06N 7/01: Probabilistic graphical models, e.g. probabilistic networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Pure & Applied Mathematics (AREA)
  • Algebra (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a scheduling optimization method and system for unmanned aerial vehicle-assisted mobile edge computing, relating to the technical field of unmanned aerial vehicle mobile edge computing. The method constructs an unloading model of a mobile edge computing system comprising an unmanned aerial vehicle and a plurality of user equipment, and computes the energy consumed to complete each computing task. Taking minimization of the average energy consumption of the user equipment as the objective, an optimization problem jointly considering the unmanned aerial vehicle trajectory and user equipment scheduling is established and converted into a Markov decision process; the state space, action space and return function of the unloading model of the mobile edge computing system are defined and used to train a deep neural network constructed based on the SAC algorithm. Scheduling optimization can then be carried out with the trained deep neural network to obtain an optimal scheduling strategy, so that the continuous actions of the unmanned aerial vehicle can be planned and a reasonable, accurate flight trajectory and user equipment selection strategy obtained, with low complexity, strong convergence, and reduced average computation energy consumption of the user equipment.

Description

Scheduling optimization method and system for unmanned aerial vehicle-assisted mobile edge calculation
Technical Field
The invention relates to the technical field of unmanned aerial vehicle mobile edge computing, and in particular to a scheduling optimization method and system for unmanned aerial vehicle-assisted mobile edge computing.
Background
The rapid development of the Internet of Things has promoted the popularization of computation-intensive intelligent devices, such as autonomous driving and virtual reality, making life more convenient. At this stage, although mobile devices are equipped with powerful hardware, significant power consumption is still required to complete the computing tasks of mobile applications while meeting low-latency requirements. In recent years, Mobile Edge Computing (MEC) has been proposed to overcome this drawback: the computing tasks of the user equipment are transferred to the network edge for computation, which greatly reduces the energy consumption of the equipment. Recently, mounting MEC servers on unmanned aerial vehicles has been widely discussed in industry and academia, using the coverage capability and mobility of the unmanned aerial vehicle to achieve lower delay, provide more flexible computing services and reduce cost. Carrying MEC on an unmanned aerial vehicle raises the following problems: (1) how to select a proper device association, i.e. whether a user equipment offloads or locally processes its computing task, so as to reduce the long-term energy consumption of all user equipment as much as possible; (2) how the user equipment handles different computing tasks and how the flight trajectory of the unmanned aerial vehicle, i.e. its flight direction and distance, is controlled in real time, especially when the unmanned aerial vehicle must reach a specific end point. Many scholars have already carried out relevant research on methods combining MEC with unmanned aerial vehicles. Because the strategy space of the unmanned aerial vehicle, namely the optimal trajectory, is a continuous space, the traditional exhaustive search method is difficult to apply. Some scholars propose a quantized dynamic programming algorithm to solve the resource allocation problem of the MEC, but since the flight choices of the unmanned aerial vehicle are almost infinite, the complexity of the algorithm is very high. Others discretize the unmanned aerial vehicle trajectory into a sequence of positions, converting the continuous space into a discrete finite space so that the problem becomes tractable: the trajectory is approximated by discrete variables and optimized by traditional convex optimization methods. However, this approach reduces the control precision of the unmanned aerial vehicle, and the optimal control strategy cannot be obtained.
The prior art discloses a mobile edge computing method, device and system for the Internet of Things, which comprises the following steps: allocating unmanned aerial vehicles to the Internet of Things devices based on the current simulated positions of the unmanned aerial vehicles and the actual positions of the Internet of Things devices in the target Internet of Things area; simulating the offloading of the tasks of the Internet of Things devices to the allocated unmanned aerial vehicles, and simulating each unmanned aerial vehicle scheduling the received tasks based on a deep reinforcement learning algorithm; iteratively updating the current simulated position of each unmanned aerial vehicle with a differential evolution algorithm and repeating the above operations until the number of iterative updates reaches a preset threshold; determining the optimal coordinate position of each unmanned aerial vehicle based on the unmanned aerial vehicles allocated to the Internet of Things devices in each run, the task scheduling results of the unmanned aerial vehicles and the current simulated positions of the unmanned aerial vehicles; and triggering each unmanned aerial vehicle to move to its optimal coordinate position and to schedule the tasks of the corresponding Internet of Things devices. In this method, the tasks of the Internet of Things devices are offloaded and allocated to the unmanned aerial vehicles, and the tasks are scheduled under the condition of balancing the load of the unmanned aerial vehicles; only the load balance of the unmanned aerial vehicles is considered, the unmanned aerial vehicle trajectory and the user equipment scheduling strategy are not jointly considered, the trajectory planning is unreasonable, and the computation energy consumption is high.
Disclosure of Invention
The invention aims to overcome the defect that the prior art cannot plan the continuous actions of the unmanned aerial vehicle or obtain an accurate scheduling strategy, and provides a scheduling optimization method and a scheduling optimization system for unmanned aerial vehicle-assisted mobile edge computing.
In order to solve the technical problems, the technical scheme of the invention is as follows:
the invention provides a scheduling optimization method for unmanned aerial vehicle-assisted mobile edge calculation, which comprises the following steps:
s1: constructing an unloading model of the mobile edge computing system, wherein the model comprises an unmanned aerial vehicle and a plurality of user devices;
s2: obtaining the energy consumption of the calculation task according to the unloading model of the mobile edge calculation system;
s3: establishing an optimization problem combining unmanned aerial vehicle track and user equipment scheduling by taking average energy consumption minimization of the user equipment as a target;
s4: converting the optimization problem into a Markov decision process, and defining a state space, an action space and a return function of an unloading model of the mobile edge computing system;
s5: constructing a deep neural network based on a SAC algorithm, and training the deep neural network by using a state space, an action space and a return function to obtain a trained deep neural network;
s6: and carrying out scheduling optimization by using the trained deep neural network to obtain an optimal scheduling strategy, namely a selection strategy of the flight path of the unmanned aerial vehicle and the user equipment.
The SAC algorithm is an off-policy stochastic policy algorithm based on the maximum-entropy reinforcement learning framework and an Actor-Critic architecture. Its main characteristic is entropy regularization: entropy is a measure of policy randomness, and increasing entropy brings more policy exploration. By training the policy to balance the expected return against the entropy value, network learning can be accelerated while the policy is prevented from converging to a local optimum. The purpose of the Actor network is to obtain the maximum expected return and the maximum entropy, i.e. to explore other strategies in the policy space while still completing the task successfully. The combination of off-policy network updates with the Actor-Critic architecture achieves good performance on continuous-control benchmark tasks, and is more stable with better convergence.
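As an illustration only (not part of the patent text), the following minimal Python sketch shows the maximum-entropy return that SAC optimizes, J = E[ sum_t gamma^t (r(t) + alpha * H(pi(.|s(t)))) ], evaluated on one sampled trajectory; all numerical values are hypothetical.

# Illustrative sketch of the entropy-regularized return used by SAC; numbers are hypothetical.
import numpy as np

def entropy(action_probs):
    """Shannon entropy H(pi(.|s)) of a policy's action distribution."""
    p = np.clip(np.asarray(action_probs, dtype=float), 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def max_entropy_return(rewards, policy_probs, gamma=0.99, alpha=0.2):
    """Discounted return augmented with the per-step policy entropy bonus."""
    total = 0.0
    for t, (r, probs) in enumerate(zip(rewards, policy_probs)):
        total += gamma ** t * (r + alpha * entropy(probs))
    return total

# Hypothetical 3-step trajectory: a larger alpha rewards more exploratory policies.
rewards = [1.0, 0.5, 2.0]
policy_probs = [[0.7, 0.3], [0.5, 0.5], [0.9, 0.1]]
print(max_entropy_return(rewards, policy_probs))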
Preferably, in step S1, the constructed unloading model of the mobile edge computing system is specifically as follows:
The unloading model of the mobile edge computing system comprises a single unmanned aerial vehicle and N user devices; the unmanned aerial vehicle serves at most K user devices simultaneously, and each user device chooses either to compute its computing task locally or to offload it to the unmanned aerial vehicle for computation. The length and width of the flight area of the unmanned aerial vehicle are set to X_max and Y_max respectively; the unmanned aerial vehicle flies at a constant speed v(t) at a fixed height h, the antenna emission angle is θ, and the maximum flying speed is v_max. The flight time of the unmanned aerial vehicle is T time slots, the length of each time slot is τ, and the time to complete a computing task at any moment cannot exceed the maximum delay T_max.
Let the coordinates of the unmanned aerial vehicle be [X(t), Y(t), h] and the coordinates of the i-th user device be [x_i(t), y_i(t), 0], i ∈ {1, 2, …, N}. Let the flight distance and the horizontal direction angle of the unmanned aerial vehicle at time t be d(t) and θ_h(t) respectively; then X(t) = X(t-1) + d(t)·cos(θ_h(t)) and Y(t) = Y(t-1) + d(t)·sin(θ_h(t)). The maximum coverage radius of the unmanned aerial vehicle is R_max = h·tan(θ), and the flying speed is v(t) = d(t)/τ.
The computing task at time t is defined as:
I_i(t) = {D_i(t), F_i(t)}
where D_i(t) represents the amount of data transmitted when the computing task at time t is offloaded, and F_i(t) represents the computing resources required to complete the computing task at time t.
α_i(t) ∈ {0, 1} is defined as the selection policy of the user equipment: α_i(t) = 0 indicates that the computing task at time t is computed locally, and α_i(t) = 1 indicates that the computing task at time t is offloaded.
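For illustration only (outside the patent text), the following Python sketch follows the mobility and coverage relations above, X(t) = X(t-1) + d·cos(θ_h), Y(t) = Y(t-1) + d·sin(θ_h) and R_max = h·tan(θ); clamping to the flight area and all parameter values are assumptions.

# Illustrative sketch of the UAV mobility/coverage model; parameter values are hypothetical.
import math
from dataclasses import dataclass

@dataclass
class UavState:
    x: float
    y: float
    h: float        # fixed flight height
    theta: float    # antenna emission angle (rad)

    @property
    def coverage_radius(self) -> float:
        # R_max = h * tan(theta)
        return self.h * math.tan(self.theta)

    def step(self, d: float, theta_h: float, x_max: float, y_max: float) -> None:
        """Apply one slot's flight action (distance d, horizontal angle theta_h)."""
        nx = self.x + d * math.cos(theta_h)
        ny = self.y + d * math.sin(theta_h)
        # simplification: keep the UAV inside the X_max x Y_max flight area
        self.x = min(max(nx, 0.0), x_max)
        self.y = min(max(ny, 0.0), y_max)

    def covers(self, ux: float, uy: float) -> bool:
        """True if a user device at (ux, uy, 0) lies within the horizontal coverage radius."""
        return math.hypot(self.x - ux, self.y - uy) <= self.coverage_radius

uav = UavState(x=0.0, y=0.0, h=100.0, theta=math.pi / 4)
uav.step(d=20.0, theta_h=math.pi / 3, x_max=500.0, y_max=500.0)
print(uav.covers(50.0, 50.0))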
Preferably, in step S2, obtaining the energy consumed by the computing task according to the unloading model of the mobile edge computing system includes:
The user equipment selects offload computation, i.e. α_i(t) = 1. At this moment, the distance between this user equipment and the unmanned aerial vehicle in the horizontal plane is:
S_i(t) = sqrt( (X(t) - x_i(t))^2 + (Y(t) - y_i(t))^2 )
The user equipment is provided with a single antenna, and a frequency-division multiple access offloading protocol is adopted to avoid interference between user equipment. Since the flight height of the unmanned aerial vehicle is fixed and a free-space channel model is adopted, the uplink rate during offload computation is:
R_i(t) = B·log2( 1 + ρ·P_Tr / (h^2 + S_i(t)^2) )
where B represents the average bandwidth of the communication channel, P_Tr represents the transmission power used for data offloading by the user equipment, and ρ represents the transmission power coefficient;
the time overhead for the user equipment to transmit the computing task is:
T_i^tr(t) = D_i(t) / R_i(t)
the time overhead for the unmanned aerial vehicle to process the computing task is:
T_i^UAV(t) = F_i(t) / f_U(t)
where f_U(t) represents the computing capability of the unmanned aerial vehicle;
the total time overhead when the user equipment selects offload computation is:
T_i^off(t) = T_i^tr(t) + T_i^UAV(t)
the energy consumption when the user equipment selects offload computation is:
E_i^off(t) = P_Tr · T_i^tr(t)
where E_i^off(t) denotes the energy consumption of the i-th user equipment when it selects offload computation.
Preferably, in step S2, obtaining the energy consumed by the computing task according to the unloading model of the mobile edge computing system further includes:
The user equipment selects local computation, i.e. α_i(t) = 0.
The time overhead for the user equipment to process the computing task is:
T_i^loc(t) = F_i(t) / f_i(t)
where f_i(t) represents the computing capability of the user equipment.
The power consumption of the user equipment is set to p_i(t) = k_i·f_i(t)^{v_i}, so the energy consumption when the user equipment selects local computation is:
E_i^loc(t) = k_i·f_i(t)^{v_i}·T_i^loc(t) = k_i·f_i(t)^{v_i - 1}·F_i(t)
where k_i is a first constant and v_i is a second constant.
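Purely as an illustration (outside the patent text), the following sketch evaluates the two energy terms above for one task, assuming the reconstructed formulas and hypothetical parameter values.

# Illustrative sketch of the per-task energy model; all parameter values are hypothetical.
import math

def offload_energy(d_i, s_i, h, bandwidth, p_tr, rho):
    """Energy spent by the user equipment when the task is offloaded (alpha_i = 1)."""
    rate = bandwidth * math.log2(1.0 + rho * p_tr / (h ** 2 + s_i ** 2))  # uplink rate R_i(t)
    t_transmit = d_i / rate                                               # D_i / R_i
    return p_tr * t_transmit                                              # E = P_Tr * T_tr

def local_energy(f_cycles, f_local, k_i, v_i=3.0):
    """Energy spent by the user equipment when the task is computed locally (alpha_i = 0)."""
    t_local = f_cycles / f_local              # F_i / f_i
    return k_i * (f_local ** v_i) * t_local   # power k_i * f^v_i times processing time

# Hypothetical task: 1 Mbit of data, 1e8 CPU cycles.
e_off = offload_energy(d_i=1e6, s_i=120.0, h=100.0, bandwidth=1e6, p_tr=0.1, rho=1e6)
e_loc = local_energy(f_cycles=1e8, f_local=1e9, k_i=1e-27)
print(f"offload: {e_off:.4f} J, local: {e_loc:.4f} J")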
Preferably, in step S3, with the objective of minimizing the average energy consumption of the user equipment, an optimization problem jointly considering the unmanned aerial vehicle trajectory and user equipment scheduling is established, specifically:
Define the set of flight actions A = {d(t), θ_h(t), ∀t} and the set of user equipment scheduling policies α = {α_i(t), ∀i, ∀t}. The optimization problem P is then expressed as:
P: min_{A, α} (1/(N·T)) Σ_{t=1}^{T} Σ_{i=1}^{N} E_i(t)
s.t. 0 ≤ X(t) ≤ X_max, ∀t
     0 ≤ Y(t) ≤ Y_max, ∀t
     v(t) ≤ v_max, ∀t
     α_i(t)·T_i^off(t) + (1 - α_i(t))·T_i^loc(t) ≤ T_max, ∀i, ∀t
     α_i(t) ∈ {0, 1}, ∀i, ∀t
     Σ_{i=1}^{N} α_i(t) ≤ K, ∀t
     α_i(t)·S_i(t) ≤ R_max, ∀i, ∀t
where E_i(t) represents the energy consumption of the user equipment: when α_i(t) = 1, E_i(t) = E_i^off(t), and when α_i(t) = 0, E_i(t) = E_i^loc(t). The constraint Σ_{i=1}^{N} α_i(t) ≤ K states that the unmanned aerial vehicle serves at most K user devices simultaneously, and α_i(t)·S_i(t) ≤ R_max states that a user equipment selecting offload computation must lie within the maximum coverage of the unmanned aerial vehicle.
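As a sketch only (not from the patent), the objective of problem P, i.e. the average energy consumption over all N user devices and T time slots, could be evaluated from per-slot decisions as follows; the input values are hypothetical.

# Illustrative sketch: average user-equipment energy, the quantity minimized in problem P.
def average_energy(energy_matrix):
    """energy_matrix[t][i] = E_i(t), the energy of user device i in slot t."""
    T = len(energy_matrix)
    N = len(energy_matrix[0])
    return sum(sum(row) for row in energy_matrix) / (N * T)

# Hypothetical 2 slots x 3 devices.
print(average_energy([[0.10, 0.04, 0.08],
                      [0.05, 0.09, 0.07]]))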
Preferably, in step S4, the state space and the action space of the designed unloading model of the mobile edge computing system are specifically:
In the unloading model of the mobile edge computing system, the unmanned aerial vehicle and the user equipment are treated as a single agent. In each time slot, the agent observes the current state s(t) from the environment; the current state s(t) corresponds to a current action a(t); the unmanned aerial vehicle executes the current action a(t) from the action space, interacts with the environment, and obtains the current return r(t) and a new state s(t+1).
For the state space: in each time slot the positions of the user equipment are fixed, so only the position information of the unmanned aerial vehicle needs to be considered; since the unmanned aerial vehicle must reach a specific destination at the end of each flight cycle, let the distance between the unmanned aerial vehicle and that destination be d'(t). The current state in the state space is then expressed as s(t) = {X(t), Y(t), h, d'(t)}.
For the action space: from the flight distance d(t) of the unmanned aerial vehicle and the horizontal direction angle θ_h(t), the position coordinates [X(t+1), Y(t+1), h] of the unmanned aerial vehicle at the next moment and the selection policy of the user equipment are determined. The current action in the action space is then expressed as a(t) = {θ_h(t), d(t), α_i(t)}.
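For illustration only (not part of the patent), the following sketch packs the state s(t) and action a(t) described above into arrays; the 4-dimensional state and (N+2)-dimensional action follow the text of embodiment 2, while the value scaling is an assumption.

# Illustrative sketch of the state/action encoding; scaling choices are hypothetical.
import math
import numpy as np

def build_state(x, y, h, dest_x, dest_y):
    """s(t) = {X(t), Y(t), h, d'(t)} with d'(t) the distance to the destination."""
    d_dest = math.hypot(x - dest_x, y - dest_y)
    return np.array([x, y, h, d_dest], dtype=np.float32)

def unpack_action(a, d_max):
    """a(t) = {theta_h(t), d(t), alpha_1(t), ..., alpha_N(t)} from a raw vector in [-1, 1]."""
    theta_h = (a[0] + 1.0) * math.pi                 # map to [0, 2*pi)
    d = np.clip((a[1] + 1.0) / 2.0, 0.0, 1.0) * d_max
    alphas = (a[2:] > 0).astype(int)                 # binary offload decisions
    return theta_h, d, alphas

s = build_state(x=10.0, y=20.0, h=100.0, dest_x=400.0, dest_y=400.0)
theta_h, d, alphas = unpack_action(np.array([0.3, -0.2, 0.7, -0.5, 0.1]), d_max=30.0)
print(s.shape, theta_h, d, alphas)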
Preferably, in step S4, the designed return function of the unloading model of the mobile edge computing system is specifically:
The return function is used to evaluate the quality of the action taken by the agent in the current state, and is specifically:
r(t) = R_energy + R_des + P_out + P_speed
where r(t) represents the current return, R_energy represents the return associated with the optimization problem, R_des represents the return for the unmanned aerial vehicle flying back to the specific destination, with R_des = k/d'(t), k being the reward factor; P_out represents the penalty for the unmanned aerial vehicle flying out of the flight area, and P_speed represents the penalty for the unmanned aerial vehicle exceeding the flying speed limit.
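As an illustrative sketch only (outside the patent text), the composite return above could be computed as follows; the penalty magnitudes and the mapping from the energy objective to R_energy (here simply the negative average energy) are assumptions.

# Illustrative return sketch; penalty magnitudes and the energy-to-reward mapping are hypothetical.
def reward(avg_energy, dist_to_dest, out_of_area, over_speed, k=10.0):
    r_energy = -avg_energy                  # lower energy consumption gives a larger return
    r_des = k / max(dist_to_dest, 1e-6)     # R_des = k / d'(t)
    p_out = -50.0 if out_of_area else 0.0   # penalty for leaving the flight area
    p_speed = -20.0 if over_speed else 0.0  # penalty for exceeding v_max
    return r_energy + r_des + p_out + p_speed

print(reward(avg_energy=0.08, dist_to_dest=35.0, out_of_area=False, over_speed=False))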
Preferably, in step S5, the constructed deep neural network comprises an experience buffer, an Actor network, a first Critic network, a second Critic network, a first Critic target network and a second Critic target network;
in each time slot, the input of the Actor network is the current state s(t), and the corresponding current action a(t) is output, giving the current scheduling policy π_φ; the inputs of the first Critic network and the second Critic network are both the current state s(t) and the current action a(t), and each outputs a Q value; after the unmanned aerial vehicle executes the current action a(t), a new state s(t+1) is generated and the current return r(t) is obtained, and [s(t), a(t), r(t), s(t+1)] is stored in the experience buffer; the first Critic target network and the second Critic target network serve as copies of the first Critic network and the second Critic network respectively, an objective function is set, and the smaller of the two target Q values is selected to compute the target value used to update the network parameters of the first Critic network and the second Critic network; when the time slot ends, the network parameters of the Actor network and the Critic networks are updated in real time according to the current scheduling policy, and random samples are drawn from the experience buffer to update the network parameters of the Critic target networks;
the loss function of the Actor network is:
J_π(φ) = E[ α·log π_φ(a'(t)|s(t)) - min_{i=1,2} Q_{θ_i}(s(t), a'(t)) ]
the loss function of the first Critic network and the second Critic network is:
J_Q(θ_i) = E[ (Q_{θ_i}(s(t), a(t)) - y(t))^2 / 2 ], i = 1, 2
the objective function (target value) of the first Critic target network and the second Critic target network is:
y(t) = r(t) + γ·( min_{i=1,2} Q_{θ'_i}(s(t+1), a'(t+1)) - α·log π_φ(a'(t+1)|s(t+1)) )
where φ represents the network parameters of the Actor network; θ_i represents the network parameters of the i-th Critic network and Q_{θ_i} its Q value, with θ_1, Q_{θ_1} for the first Critic network and θ_2, Q_{θ_2} for the second Critic network; a'(t) represents the new action computed according to the current scheduling policy π_φ; y(t) represents the target value, α represents the entropy regularization coefficient, and γ represents the discount factor; Q_{θ'_i} represents the Q value of the i-th Critic target network, with Q_{θ'_1} for the first Critic target network and Q_{θ'_2} for the second Critic target network.
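Outside the patent text, a minimal PyTorch-style sketch of the loss terms and soft update described above follows; the network classes, the actor's sample() interface, the done flag and all hyperparameters are assumptions, not the patent's reference implementation.

# Illustrative SAC update sketch (assumed interfaces and hyperparameters).
import torch
import torch.nn as nn

def critic_target(reward, next_state, done, actor, critic1_tgt, critic2_tgt,
                  gamma=0.99, alpha=0.2):
    """y(t) = r + gamma * (min_i Q_theta'_i(s', a') - alpha * log pi(a'|s'))."""
    with torch.no_grad():
        next_action, next_logp = actor.sample(next_state)   # a'(t+1), log pi(a'|s')
        q_min = torch.min(critic1_tgt(next_state, next_action),
                          critic2_tgt(next_state, next_action))
        return reward + gamma * (1.0 - done) * (q_min - alpha * next_logp)

def critic_loss(critic, state, action, target):
    """J_Q(theta_i) = E[(Q_theta_i(s, a) - y)^2 / 2]."""
    return 0.5 * ((critic(state, action) - target) ** 2).mean()

def actor_loss(actor, critic1, critic2, state, alpha=0.2):
    """J_pi(phi) = E[alpha * log pi(a|s) - min_i Q_theta_i(s, a)]."""
    action, logp = actor.sample(state)
    q_min = torch.min(critic1(state, action), critic2(state, action))
    return (alpha * logp - q_min).mean()

@torch.no_grad()
def soft_update(target: nn.Module, source: nn.Module, tau: float = 0.005):
    """theta' <- tau * theta + (1 - tau) * theta'."""
    for tp, sp in zip(target.parameters(), source.parameters()):
        tp.data.mul_(1.0 - tau).add_(tau * sp.data)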
Preferably, the optimal scheduling policy expression of the constructed deep neural network is:
π* = argmax_{π_φ} E[ Σ_t γ^t ( r(t) + α·H(π_φ(·|s(t))) ) ]
where π* represents the optimal scheduling policy, α represents the entropy regularization coefficient, π_φ represents the scheduling policy, and γ represents the discount factor; H represents the entropy, computed as H(π_φ(·|s(t))) = E[-log π_φ(·|s(t))].
The invention also provides a dispatching optimization system for unmanned aerial vehicle assisted mobile edge calculation, which comprises:
the model building module is used for building an unloading model of the mobile edge computing system, and the model comprises an unmanned aerial vehicle and a plurality of user equipment;
the energy consumption calculation module is used for obtaining the energy consumption of the calculation task according to the unloading model of the mobile edge calculation system;
the optimization problem establishing module is used for establishing an optimization problem combining the unmanned aerial vehicle track and the user equipment scheduling by taking the average energy consumption minimization of the user equipment as a target;
the optimization problem transformation module is used for transforming the optimization problem into a Markov decision process and defining a state space, an action space and a return function of an unloading model of the mobile edge computing system;
the network construction training module is used for constructing a deep neural network based on a deep reinforcement learning algorithm, and training the deep neural network by using a state space, an action space and a return function to obtain a trained deep neural network;
and the scheduling optimization module performs scheduling optimization by using the trained deep neural network to obtain an optimal scheduling strategy, namely a selection strategy of the flight path of the unmanned aerial vehicle and the user equipment.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the mobile edge computing system unloading model constructed by the invention comprises an unmanned aerial vehicle and a plurality of user equipment, and the optimization problem of combining unmanned aerial vehicle track and user equipment scheduling is established based on the energy consumption for completing the computing task and taking the average energy consumption minimization of the user equipment as a target; the optimization problem is non-convex, is difficult to solve by a traditional method, is converted into a Markov decision process, and defines a state space, an action space and a return function of an unloading model of the mobile edge computing system; the deep neural network constructed based on the SAC algorithm is trained by utilizing the state space, the action space and the return function, the trained deep neural network can be used for scheduling optimization, an optimal scheduling strategy is obtained, the continuous action of the unmanned aerial vehicle can be planned, a reasonable and accurate flight track and a selection strategy of user equipment are obtained, the complexity is low, the convergence is strong, and the average calculation energy consumption of the user equipment is reduced.
Drawings
Fig. 1 is a flowchart of a scheduling optimization method for unmanned aerial vehicle-assisted mobile edge computing according to embodiment 1;
FIG. 2 is a schematic diagram of the unloading model of the mobile edge computing system described in embodiment 2;
FIG. 3 is a schematic structural diagram of the constructed deep neural network described in embodiment 2;
fig. 4 is a schematic diagram comparing the trajectories of the unmanned aerial vehicle under the different scheduling optimization methods described in embodiment 2;
fig. 5 is a schematic diagram comparing the average energy consumption of the user equipment under the different scheduling optimization methods described in embodiment 2;
fig. 6 is a schematic diagram of a scheduling optimization system for unmanned aerial vehicle-assisted mobile edge computation according to embodiment 3.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product;
it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
The embodiment provides a scheduling optimization method for unmanned aerial vehicle-assisted mobile edge computing, as shown in fig. 1, including:
s1: constructing an unloading model of the mobile edge computing system, wherein the model comprises an unmanned aerial vehicle and a plurality of user devices;
s2: obtaining the energy consumption of the calculation task according to the unloading model of the mobile edge calculation system;
s3: establishing an optimization problem combining unmanned aerial vehicle track and user equipment scheduling by taking average energy consumption minimization of the user equipment as a target;
s4: converting the optimization problem into a Markov decision process, and defining a state space, an action space and a return function of an unloading model of the mobile edge computing system;
s5: constructing a deep neural network based on a SAC algorithm, and training the deep neural network by using a state space, an action space and a return function to obtain a trained deep neural network;
s6: and carrying out scheduling optimization by using the trained deep neural network to obtain an optimal scheduling strategy, namely a selection strategy of the flight path of the unmanned aerial vehicle and the user equipment.
In a specific implementation process, an unloading model of the mobile edge computing system is constructed in which a single unmanned aerial vehicle carries an MEC server and flies within a designated area to provide edge computing for the user equipment; the energy consumed to complete each computing task is calculated according to the unloading model of the mobile edge computing system; an optimization problem jointly considering the unmanned aerial vehicle trajectory and user equipment scheduling is established with the objective of minimizing the average energy consumption of the user equipment; the problem of the unmanned aerial vehicle flight trajectory and of the user equipment choosing offload computation or local computation is converted into a Markov decision process, and the state space, action space and return function of the unloading model of the mobile edge computing system are defined; a deep neural network is constructed based on the SAC algorithm and trained with the state space, action space and return function, and scheduling optimization is carried out with the trained deep neural network to obtain an optimal scheduling strategy, namely the selection strategy of the unmanned aerial vehicle flight trajectory and of the user equipment. In this way the non-convex optimization problem is solved, the continuous actions of the unmanned aerial vehicle can be planned, a reasonable and accurate flight trajectory and user equipment selection strategy are obtained, and the average computation energy consumption of the user equipment is reduced.
Example 2
The embodiment provides a scheduling optimization method for unmanned aerial vehicle-assisted mobile edge computing, which comprises the following steps:
s1: constructing an unloading model of the mobile edge computing system, wherein the model comprises an unmanned aerial vehicle and a plurality of user devices;
As shown in fig. 2, the unloading model of the mobile edge computing system comprises a single unmanned aerial vehicle and N user devices; the unmanned aerial vehicle serves at most K user devices simultaneously, and each user device chooses either to compute its computing task locally or to offload it to the unmanned aerial vehicle for computation. The length and width of the flight area of the unmanned aerial vehicle are set to X_max and Y_max respectively; the unmanned aerial vehicle flies at a constant speed v(t) at a fixed height h, the antenna emission angle is θ, and the maximum flying speed is v_max. The flight time of the unmanned aerial vehicle is T time slots, the length of each time slot is τ, and the time to complete a computing task at any moment cannot exceed the maximum delay T_max.
Let the coordinates of the unmanned aerial vehicle be [X(t), Y(t), h] and the coordinates of the i-th user device be [x_i(t), y_i(t), 0], i ∈ {1, 2, …, N}. Let the flight distance and the horizontal direction angle of the unmanned aerial vehicle at time t be d(t) and θ_h(t) respectively; then X(t) = X(t-1) + d(t)·cos(θ_h(t)) and Y(t) = Y(t-1) + d(t)·sin(θ_h(t)). The maximum coverage radius of the unmanned aerial vehicle is R_max = h·tan(θ), and the flying speed is v(t) = d(t)/τ.
The computing task at time t is defined as:
I_i(t) = {D_i(t), F_i(t)}
where D_i(t) represents the amount of data transmitted when the computing task at time t is offloaded, and F_i(t) represents the computing resources required to complete the computing task at time t.
α_i(t) ∈ {0, 1} is defined as the selection policy of the user equipment: α_i(t) = 0 indicates that the computing task at time t is computed locally, and α_i(t) = 1 indicates that the computing task at time t is offloaded.
S2: obtaining the energy consumption of the calculation task according to the unloading model of the mobile edge calculation system;
The user equipment selects offload computation, i.e. α_i(t) = 1. At this moment, the distance between this user equipment and the unmanned aerial vehicle in the horizontal plane is:
S_i(t) = sqrt( (X(t) - x_i(t))^2 + (Y(t) - y_i(t))^2 )
The user equipment is provided with a single antenna, and a frequency-division multiple access offloading protocol is adopted to avoid interference between user equipment. Since the flight height of the unmanned aerial vehicle is fixed and a free-space channel model is adopted, the uplink rate during offload computation is:
R_i(t) = B·log2( 1 + ρ·P_Tr / (h^2 + S_i(t)^2) )
where B represents the average bandwidth of the communication channel, P_Tr represents the transmission power used for data offloading by the user equipment, and ρ represents the transmission power coefficient.
The time overhead for the user equipment to transmit the computing task is:
T_i^tr(t) = D_i(t) / R_i(t)
The time overhead for the unmanned aerial vehicle to process the computing task is:
T_i^UAV(t) = F_i(t) / f_U(t)
where f_U(t) represents the computing capability of the unmanned aerial vehicle.
The total time overhead when the user equipment selects offload computation is:
T_i^off(t) = T_i^tr(t) + T_i^UAV(t)
The energy consumption when the user equipment selects offload computation is:
E_i^off(t) = P_Tr · T_i^tr(t)
where E_i^off(t) denotes the energy consumption of the i-th user equipment when it selects offload computation.
The user equipment selects local computation, i.e. α_i(t) = 0.
The time overhead for the user equipment to process the computing task is:
T_i^loc(t) = F_i(t) / f_i(t)
where f_i(t) represents the computing capability of the user equipment.
The power consumption of the user equipment is set to p_i(t) = k_i·f_i(t)^{v_i}, so the energy consumption when the user equipment selects local computation is:
E_i^loc(t) = k_i·f_i(t)^{v_i}·T_i^loc(t) = k_i·f_i(t)^{v_i - 1}·F_i(t)
where k_i is a first constant and v_i is a second constant. In this embodiment, v_i is 3.
S3: establishing an optimization problem combining unmanned aerial vehicle track and user equipment scheduling by taking average energy consumption minimization of the user equipment as a target;
Define the set of flight actions A = {d(t), θ_h(t), ∀t} and the set of user equipment scheduling policies α = {α_i(t), ∀i, ∀t}. The optimization problem P is then expressed as:
P: min_{A, α} (1/(N·T)) Σ_{t=1}^{T} Σ_{i=1}^{N} E_i(t)
s.t. 0 ≤ X(t) ≤ X_max, ∀t
     0 ≤ Y(t) ≤ Y_max, ∀t
     v(t) ≤ v_max, ∀t
     α_i(t)·T_i^off(t) + (1 - α_i(t))·T_i^loc(t) ≤ T_max, ∀i, ∀t
     α_i(t) ∈ {0, 1}, ∀i, ∀t
     Σ_{i=1}^{N} α_i(t) ≤ K, ∀t
     α_i(t)·S_i(t) ≤ R_max, ∀i, ∀t
where E_i(t) represents the energy consumption of the user equipment: when α_i(t) = 1, E_i(t) = E_i^off(t), and when α_i(t) = 0, E_i(t) = E_i^loc(t). The constraint Σ_{i=1}^{N} α_i(t) ≤ K states that the unmanned aerial vehicle serves at most K user devices simultaneously, and α_i(t)·S_i(t) ≤ R_max states that a user equipment selecting offload computation must lie within the maximum coverage of the unmanned aerial vehicle.
S4: converting the optimization problem into a Markov decision process, and defining a state space, an action space and a return function of an unloading model of the mobile edge computing system;
In the unloading model of the mobile edge computing system, the unmanned aerial vehicle and the user equipment are treated as a single agent. In each time slot, the agent observes the current state s(t) from the environment; the current state s(t) corresponds to a current action a(t); the unmanned aerial vehicle executes the current action a(t) from the action space, interacts with the environment, and obtains the current return r(t) and a new state s(t+1).
For the state space: in each time slot the positions of the user equipment are fixed, so only the position information of the unmanned aerial vehicle needs to be considered; since the unmanned aerial vehicle must reach a specific destination at the end of each flight cycle, let the distance between the unmanned aerial vehicle and that destination be d'(t). The current state is then expressed as s(t) = {X(t), Y(t), h, d'(t)}; the state space of this embodiment is 4-dimensional.
For the action space: from the flight distance d(t) of the unmanned aerial vehicle and the horizontal direction angle θ_h(t), the position coordinates [X(t+1), Y(t+1), h] of the unmanned aerial vehicle at the next moment and the selection policy of the user equipment are determined. The current action is then expressed as a(t) = {θ_h(t), d(t), α_i(t)}; the action space of this embodiment is (N+2)-dimensional.
The return function is used to evaluate the quality of the action taken by the agent in the current state, and is specifically:
r(t) = R_energy + R_des + P_out + P_speed
where r(t) represents the current return, R_energy represents the return associated with the optimization problem, R_des represents the return for the unmanned aerial vehicle flying back to the specific destination, with R_des = k/d'(t), k being the reward factor; P_out represents the penalty for the unmanned aerial vehicle flying out of the flight area, and P_speed represents the penalty for the unmanned aerial vehicle exceeding the flying speed limit.
S5: constructing a deep neural network based on a SAC algorithm, and training the deep neural network by using a state space, an action space and a return function to obtain a trained deep neural network;
the SAC algorithm is an off-line random strategy algorithm based on a maximum entropy reinforcement learning framework and an Actor-Critic network, and is mainly characterized by entropy regularization, wherein entropy is a measure of strategy randomness, and the increase of entropy can bring more strategy exploration, and the expected return and the entropy value are balanced through training strategies, so that the network learning speed can be accelerated, and meanwhile, the strategy convergence to a local optimal solution is avoided; the purpose of the Actor network is to obtain the maximum return expectation and the maximum entropy, i.e. explore other strategies in the strategy space while successfully completing the task; the combination of the network update in an offline mode and the Actor-Critic network achieves good performance on a continuous control reference task, and is more stable and better in convergence.
The constructed deep neural network comprises an experience buffer, an Actor network, a first Critic network, a second Critic network, a first Critic target network and a second Critic target network.
In each time slot, the input of the Actor network is the current state s(t), and the corresponding current action a(t) is output, giving the current scheduling policy π_φ; the inputs of the first Critic network and the second Critic network are both the current state s(t) and the current action a(t), and each outputs a Q value; after the unmanned aerial vehicle executes the current action a(t), a new state s(t+1) is generated and the current return r(t) is obtained, and [s(t), a(t), r(t), s(t+1)] is stored in the experience buffer; the first Critic target network and the second Critic target network serve as copies of the first Critic network and the second Critic network respectively, an objective function is set, and the smaller of the two target Q values is selected to compute the target value used to update the network parameters of the first Critic network and the second Critic network; when the time slot ends, the network parameters of the Actor network and the Critic networks are updated in real time according to the current scheduling policy, and random samples are drawn from the experience buffer to update the network parameters of the Critic target networks.
The loss function of the Actor network is:
J_π(φ) = E[ α·log π_φ(a'(t)|s(t)) - min_{i=1,2} Q_{θ_i}(s(t), a'(t)) ]
The loss function of the first Critic network and the second Critic network is:
J_Q(θ_i) = E[ (Q_{θ_i}(s(t), a(t)) - y(t))^2 / 2 ], i = 1, 2
The objective function (target value) of the first Critic target network and the second Critic target network is:
y(t) = r(t) + γ·( min_{i=1,2} Q_{θ'_i}(s(t+1), a'(t+1)) - α·log π_φ(a'(t+1)|s(t+1)) )
where φ represents the network parameters of the Actor network; θ_i represents the network parameters of the i-th Critic network and Q_{θ_i} its Q value, with θ_1, Q_{θ_1} for the first Critic network and θ_2, Q_{θ_2} for the second Critic network; a'(t) represents the new action computed according to the current scheduling policy π_φ; y(t) represents the target value, α represents the entropy regularization coefficient, and γ represents the discount factor; Q_{θ'_i} represents the Q value of the i-th Critic target network, with Q_{θ'_1} for the first Critic target network and Q_{θ'_2} for the second Critic target network.
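For illustration only (not part of the patent), a sketch of the experience buffer used above to store [s(t), a(t), r(t), s(t+1)] tuples and draw random minibatches follows; the capacity and batch size are hypothetical.

# Illustrative experience buffer sketch; capacity and batch size are hypothetical.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        """Save one transition [s(t), a(t), r(t), s(t+1)]."""
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size=64):
        """Draw a random minibatch for updating the Critic networks."""
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states = zip(*batch)
        return states, actions, rewards, next_states

    def __len__(self):
        return len(self.buffer)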
s6: and carrying out scheduling optimization by using the trained deep neural network to obtain an optimal scheduling strategy, namely a selection strategy of the flight path of the unmanned aerial vehicle and the user equipment.
The optimal scheduling strategy expression is:
π* = argmax_{π_φ} E[ Σ_t γ^t ( r(t) + α·H(π_φ(·|s(t))) ) ]
where π* represents the optimal scheduling strategy, α represents the entropy regularization coefficient, π_φ represents the scheduling policy, and γ represents the discount factor; H represents the entropy, computed as H(π_φ(·|s(t))) = E[-log π_φ(·|s(t))].
In a specific implementation process, for the deep neural network constructed based on the SAC algorithm, each episode runs from the unmanned aerial vehicle leaving the starting point until it reaches the destination or the maximum time T elapses. Before each episode starts, the start and end positions of the unmanned aerial vehicle are initialized, and the number of user devices, i.e. the value of N, is initialized at random. In the initial stage the scheduling policy is far from the optimal scheduling policy, so the entropy regularization coefficient α is set to 1 so that the agent explores more actions at the start and avoids falling into a local optimum; α is updated together with the network parameters, and as the number of iterations increases the algorithm gradually converges to the optimal solution. As shown in fig. 3, in each time slot the agent outputs an action a(t) according to the observed state information s(t), namely the flight direction and distance of the unmanned aerial vehicle, and the user equipment selects local computation or offload computation. If the flight distance of the unmanned aerial vehicle exceeds the maximum distance d_max, d(t) is set to d_max; if the next position of the unmanned aerial vehicle would leave the specified area, the flight action is cancelled. The corresponding current return r(t) and the state s(t+1) at the next moment are obtained from the current action, [s(t), a(t), r(t), s(t+1)] is stored in the experience buffer, and at the end of each time slot K groups of experiences are randomly sampled from the experience buffer to update the network parameters. The SAC algorithm contains a parameterized Actor network that outputs the policy π_φ(·|s(t)): the state information s(t) is input to the Actor network and the corresponding action a(t) ~ π_φ(·|s(t)) is output. In addition, there are two parameterized Critic networks, also called Q networks: the state information s(t) input to the Actor network and the corresponding action a(t) are jointly input into the first Critic network and the second Critic network, which output the Q values Q_{θ_1}(s(t), a(t)) and Q_{θ_2}(s(t), a(t)); the smaller of the two is selected to evaluate the performance of the Actor network and prevent overestimation, where φ and θ_i denote the parameters of the Actor network and the Critic networks respectively. Like other DRL algorithms, the SAC algorithm sets up an experience buffer for training the deep neural network parameters and also uses target networks with soft updates. The target networks are copies of the first and second Critic networks, Q_{θ'_i} denotes the target Q value, and θ'_i denotes the parameters of the first and second Critic target networks. "Soft" update means that the parameters of the target networks are updated by slowly tracking the trained network parameters, i.e. φ' ← τ·φ + (1-τ)·φ' and θ'_i ← τ·θ_i + (1-τ)·θ'_i, where τ ≤ 1. The difference is that the actions used to update the Actor network and the Critic networks come from the current policy and are not sampled from the experience buffer.
As shown in fig. 4, scheduling optimization is performed taking a single unmanned aerial vehicle serving 40 user devices as an example, and the figure shows the trajectory of the unmanned aerial vehicle under different scheduling optimization methods. Trajectory 1 is the flight trajectory of the unmanned aerial vehicle under the joint optimization of the unmanned aerial vehicle trajectory and user equipment scheduling provided by this embodiment; trajectory 2 is the flight trajectory when only the user equipment scheduling is optimized, and trajectory 3 is the flight trajectory under random user equipment scheduling. The triangles represent trajectory 1, the diamonds represent trajectory 2, the squares represent trajectory 3, and trajectory 2 coincides with trajectory 3. As shown in fig. 5, a comparison of the average energy consumption of the user equipment under the three scheduling methods is given: the triangles represent the average energy consumption under the joint optimization of the unmanned aerial vehicle trajectory and user equipment scheduling provided by this embodiment, the circles represent the average energy consumption when only user equipment scheduling is optimized, and the squares represent the average energy consumption under random user equipment scheduling. Because different user equipment have different functions in practice, the computing task sizes in this embodiment are generated at random, and the maximum number of user devices served by the unmanned aerial vehicle is K = 3. As can be seen from the figure, the average energy consumption of the user equipment under the scheduling optimization method that jointly optimizes the unmanned aerial vehicle trajectory and user equipment scheduling is much smaller than under the method that only optimizes user equipment scheduling and the method with random user equipment scheduling.
Example 3
The present embodiment provides a scheduling optimization system for unmanned aerial vehicle-assisted mobile edge computation, as shown in fig. 6, including:
the model building module is used for building an unloading model of the mobile edge computing system, and the model comprises an unmanned aerial vehicle and a plurality of user equipment;
the energy consumption calculation module is used for obtaining the energy consumption of the calculation task according to the unloading model of the mobile edge calculation system;
the optimization problem establishing module is used for establishing an optimization problem combining the unmanned aerial vehicle track and the user equipment scheduling by taking the average energy consumption minimization of the user equipment as a target;
the optimization problem transformation module is used for transforming the optimization problem into a Markov decision process and defining a state space, an action space and a return function of an unloading model of the mobile edge computing system;
the network construction training module is used for constructing a deep neural network based on a deep reinforcement learning algorithm, and training the deep neural network by using a state space, an action space and a return function to obtain a trained deep neural network;
and the scheduling optimization module performs scheduling optimization by using the trained deep neural network to obtain an optimal scheduling strategy, namely a selection strategy of the flight path of the unmanned aerial vehicle and the user equipment.
The terms describing positional relationships in the drawings are for illustrative purposes only and are not to be construed as limiting the patent;
it should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims (10)

1. A scheduling optimization method for unmanned aerial vehicle-assisted mobile edge computing is characterized by comprising the following steps:
s1: constructing an unloading model of the mobile edge computing system, wherein the model comprises an unmanned aerial vehicle and a plurality of user devices;
s2: obtaining the energy consumption of the calculation task according to the unloading model of the mobile edge calculation system;
s3: establishing an optimization problem combining unmanned aerial vehicle track and user equipment scheduling by taking average energy consumption minimization of the user equipment as a target;
s4: converting the optimization problem into a Markov decision process, and defining a state space, an action space and a return function of an unloading model of the mobile edge computing system;
s5: constructing a deep neural network based on a SAC algorithm, and training the deep neural network by using a state space, an action space and a return function to obtain a trained deep neural network;
s6: and carrying out scheduling optimization by using the trained deep neural network to obtain an optimal scheduling strategy, namely a selection strategy of the flight path of the unmanned aerial vehicle and the user equipment.
2. The scheduling optimization method for unmanned aerial vehicle-assisted mobile edge computing according to claim 1, wherein in step S1, the unloading model of the mobile edge computing system is specifically constructed as follows:
the unloading model of the mobile edge computing system comprises a single unmanned aerial vehicle and N user devices, wherein the unmanned aerial vehicle serves at most K user devices at the same time, and each user device chooses either to compute its computing task locally or to offload it to the unmanned aerial vehicle for computation; the length and width of the flight area of the unmanned aerial vehicle are set to X_max and Y_max respectively; the unmanned aerial vehicle flies at a constant speed v(t) at a fixed height h, the antenna emission angle is θ, and the maximum flying speed is v_max; the flight time of the unmanned aerial vehicle is T time slots, the length of each time slot is τ, and the time to complete a computing task at any moment cannot exceed the maximum delay T_max;
the coordinates of the unmanned aerial vehicle are expressed as [X(t), Y(t), h] and the coordinates of the user equipment as [x_i(t), y_i(t), 0], i ∈ {1, 2, …, N}; when the flight distance and the horizontal direction angle of the unmanned aerial vehicle at time t are set to d(t) and θ_h(t) respectively, X(t) = X(t-1) + d(t)·cos(θ_h(t)) and Y(t) = Y(t-1) + d(t)·sin(θ_h(t)); the maximum coverage radius of the unmanned aerial vehicle is R_max = h·tan(θ), and the flying speed is v(t) = d(t)/τ;
the computing task at time t is defined as:
I_i(t) = {D_i(t), F_i(t)}
where D_i(t) represents the amount of data transmitted when the computing task at time t is offloaded, and F_i(t) represents the computing resources required to complete the computing task at time t;
α_i(t) ∈ {0, 1} is defined as the selection policy of the user equipment: α_i(t) = 0 indicates that the computing task at time t is computed locally, and α_i(t) = 1 indicates that the computing task at time t is offloaded.
3. The method for scheduling optimization of unmanned aerial vehicle-assisted mobile edge computing according to claim 2, wherein in step S2, obtaining the energy consumed by the computing task according to the unloading model of the mobile edge computing system includes:
user equipment selection to offload computation, i.e. alphai(t) ═ 1; this moment the distance on this user equipment and unmanned aerial vehicle's the horizontal plane does:
Figure FDA0003385009620000021
the uplink rate at the time of offloading the computation is:
Figure FDA0003385009620000022
wherein B represents the average bandwidth of the communication channel, PTrRepresenting the transmission power of the user equipment data unloading, and rho represents a transmission power coefficient;
the time overhead for the user equipment to transmit the calculation task is as follows:
Figure FDA0003385009620000023
the time overhead of processing the calculation task by the unmanned aerial vehicle is as follows:
Figure FDA0003385009620000024
in the formula (f)U(t) represents the computational power of the drone;
the total time overhead when the user equipment chooses to offload the computation is:
T_i^off(t) = T_i^Tr(t) + T_i^U(t);
the energy consumption when the user equipment selects to offload the computation is:
E_i^off(t) = P_Tr·T_i^Tr(t)
where E_i^off(t) denotes the energy consumed by the i-th user equipment when it chooses to offload the computation.
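A minimal Python sketch of the offloading branch (α_i(t) = 1) follows, assuming the distance, uplink-rate, delay and energy expressions reconstructed above; in particular the log-distance channel form of r_i(t) is an inference from the claim text, and the function name offload_cost is hypothetical.

```python
# Offloading branch: transmit time D_i/r_i, UAV processing time F_i/f_u,
# and UE energy P_tr * T_tr, under the assumed channel model.
import math

def offload_cost(xu, yu, h, xi, yi, D_i, F_i, B, P_tr, rho, f_u):
    """Return (total_delay, ue_energy) when user i offloads its task."""
    s = math.hypot(xu - xi, yu - yi)                            # horizontal UAV-UE distance S_i(t)
    rate = B * math.log2(1.0 + rho * P_tr / (s**2 + h**2))      # assumed LoS uplink rate r_i(t)
    t_tr = D_i / rate                                           # time to upload the task data
    t_uav = F_i / f_u                                           # time for the UAV to process it
    energy = P_tr * t_tr                                        # UE energy spent transmitting
    return t_tr + t_uav, energy
```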
4. The method for scheduling optimization of unmanned aerial vehicle-assisted mobile edge computing according to claim 3, wherein in step S2, obtaining the energy consumed by the computing task according to the unloading model of the mobile edge computing system further comprises:
when the user equipment selects local computation, i.e. α_i(t) = 0, the time overhead for the user equipment to process the computing task is:
T_i^loc(t) = F_i(t) / f_i(t)
where f_i(t) represents the computing capability of the user equipment;
the power consumption of the user equipment is set to P_i^loc(t) = k_i·(f_i(t))^{v_i}, and the energy consumption when the user equipment selects local computation is:
E_i^loc(t) = P_i^loc(t)·T_i^loc(t) = k_i·(f_i(t))^{v_i}·T_i^loc(t)
where k_i is a first constant and v_i is a second constant.
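For the local branch (α_i(t) = 0), a corresponding sketch under the same assumptions, using the power model k_i·(f_i(t))^{v_i} implied by the constants named in claim 4; the function name local_cost is hypothetical.

```python
# Local-computation branch: delay F_i/f_i and energy = power * time,
# with the DVFS-style power model P = k_i * f_i ** v_i.
def local_cost(F_i, f_i, k_i, v_i):
    """Return (delay, energy) when user i computes the task locally."""
    t_loc = F_i / f_i               # local processing time
    power = k_i * (f_i ** v_i)      # local CPU power consumption
    return t_loc, power * t_loc     # energy consumed by the UE
```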
5. The scheduling optimization method for unmanned aerial vehicle-assisted mobile edge computing according to claim 4, wherein in step S3, aiming at minimizing average energy consumption of user equipment, an optimization problem combining unmanned aerial vehicle trajectory and user equipment scheduling is established, specifically:
defining the set of flight actions {d(t), θ_h(t), ∀t} and the user equipment scheduling policy set {α_i(t), ∀i ∈ {1, 2, …, N}, ∀t};
the optimization problem P is then expressed as:
P: min_{d(t), θ_h(t), α_i(t)} (1/(N·T))·Σ_{t=1}^{T} Σ_{i=1}^{N} E_i(t)
s.t. α_i(t) ∈ {0, 1}, ∀i, ∀t
Σ_{i=1}^{N} α_i(t) ≤ K, ∀t
α_i(t)·S_i(t) ≤ R_max, ∀i, ∀t
0 ≤ v(t) ≤ v_max, ∀t
0 ≤ X(t) ≤ X_max, 0 ≤ Y(t) ≤ Y_max, ∀t
α_i(t)·T_i^off(t) + (1 − α_i(t))·T_i^loc(t) ≤ T_max, ∀i, ∀t
where E_i(t) represents the energy consumption of user equipment i: when α_i(t) = 1, E_i(t) = E_i^off(t), and when α_i(t) = 0, E_i(t) = E_i^loc(t); Σ_{i=1}^{N} α_i(t) ≤ K constrains the unmanned aerial vehicle to serve at most K user equipments simultaneously, and α_i(t)·S_i(t) ≤ R_max constrains every user equipment that selects to offload its computation to lie within the maximum coverage area of the unmanned aerial vehicle.
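To make the objective and the two explicitly named constraints of problem P concrete, a small sketch follows; the record layout (energy[t][i], per-slot lists alpha and S) and the helper names are assumptions.

```python
# Average UE energy over N users and T slots, plus the per-slot feasibility
# checks named in claim 5 (at most K served, offloading UEs inside coverage).
def average_energy(energy):              # energy[t][i] = E_i(t)
    T, N = len(energy), len(energy[0])
    return sum(sum(row) for row in energy) / (N * T)

def slot_feasible(alpha, S, K, r_max):   # alpha[i] in {0, 1}, S[i] = S_i(t)
    if sum(alpha) > K:                                    # serve at most K UEs
        return False
    return all(a * s <= r_max for a, s in zip(alpha, S))  # coverage constraint
```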
6. The scheduling optimization method for unmanned aerial vehicle-assisted mobile edge computing according to claim 5, wherein in step S4, the state space and the action space designed for the unloading model of the mobile edge computing system are specifically:
in the unloading model of the mobile edge computing system, the unmanned aerial vehicle and the user equipments are treated together as an agent; in each time slot, the agent observes the current state s(t) from the environment, the current state s(t) corresponds to a current action a(t), and the unmanned aerial vehicle executes the current action a(t) from the action space, interacts with the environment, and obtains the current reward r(t) and a new state s(t+1);
for the state space, in each time slot the position of each user equipment is fixed, so only the position information of the unmanned aerial vehicle needs to be considered; since the unmanned aerial vehicle must arrive at a specific destination by the end of each flight cycle, the distance between the unmanned aerial vehicle and that destination is denoted d'(t), and the current state in the state space is s(t) = {X(t), Y(t), h, d'(t)};
for the action space, according to the flight distance d(t) of the unmanned aerial vehicle and the horizontal direction angle θ_h(t), the position coordinates [X(t+1), Y(t+1), h] of the unmanned aerial vehicle at the next moment and the selection policy of the user equipment are calculated; the current action in the action space is a(t) = {θ_h(t), d(t), α_i(t)}.
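A sketch of the state and action encodings of claim 6, with the vector layout and helper names (make_state, split_action) chosen for illustration only.

```python
# State: UAV position plus the remaining distance d'(t) to the destination.
# Action: heading angle, flight distance, and per-user offloading decisions.
import math

def make_state(x, y, h, dest_x, dest_y):
    d_dest = math.hypot(dest_x - x, dest_y - y)   # d'(t): distance to destination
    return [x, y, h, d_dest]                      # s(t) = {X(t), Y(t), h, d'(t)}

def split_action(a, n_users):
    theta_h, d = a[0], a[1]                       # heading angle and flight distance
    alpha = [1 if v > 0.5 else 0 for v in a[2:2 + n_users]]  # alpha_i(t) decisions
    return theta_h, d, alpha
```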
7. The method for scheduling optimization of unmanned aerial vehicle-assisted mobile edge computing according to claim 6, wherein in step S4, the designed reward function of the offload model of the mobile edge computing system is specifically:
the reward function is used for evaluating the quality of the action taken by the agent in the current state, and is specifically:
r(t) = R_energy + R_des + P_out + P_speed
where r(t) represents the current reward; R_energy represents the return associated with the optimization objective; R_des represents the reward for the unmanned aerial vehicle flying back to the specific destination, with R_des = k/d'(t) and k being the reward factor; P_out represents the penalty for the unmanned aerial vehicle flying out of the flight area; and P_speed represents the penalty for the unmanned aerial vehicle flying overspeed.
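A sketch of the reward of claim 7; the claim fixes the structure r(t) = R_energy + R_des + P_out + P_speed and R_des = k/d'(t), while the scaling of R_energy and the penalty magnitudes below are assumptions.

```python
# Reward assembled from the four terms of claim 7; default constants are illustrative.
def reward(energy_term, d_dest, out_of_area, overspeed,
           k=1.0, p_out=-10.0, p_speed=-10.0):
    r_energy = -energy_term                 # lower UE energy -> higher return
    r_des = k / max(d_dest, 1e-6)           # pull the UAV toward its destination
    return (r_energy + r_des
            + (p_out if out_of_area else 0.0)
            + (p_speed if overspeed else 0.0))
```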
8. The method for scheduling optimization of unmanned aerial vehicle-assisted mobile edge computing according to claim 7, wherein in step S5, the constructed deep neural network includes an experience buffer, an Actor network, a first Critic network, a second Critic network, a first Critic target network, and a second Critic target network;
in each time slot, the input of the Actor network is the current state s(t), and it outputs the corresponding current action a(t), giving the current scheduling policy π_φ; the inputs of the first Critic network and the second Critic network are both the current state s(t) and the current action a(t), and each outputs a Q value; after the unmanned aerial vehicle executes the current action a(t), a new state s(t+1) is generated and the current reward r(t) is obtained, and the tuple [s(t), a(t), r(t), s(t+1)] is stored in the experience buffer; the first Critic target network and the second Critic target network serve as copies of the first Critic network and the second Critic network respectively; an objective function is set, and the smaller of the two Q values is selected to calculate a target value for updating the network parameters of the first Critic network and the second Critic network; when the time slot ends, the network parameters of the Actor network and the Critic networks are updated in real time according to the current scheduling policy, and random samples are drawn from the experience buffer to update the network parameters of the Critic target networks;
the loss function of the Actor network is:
J_π(φ) = E[ α·log π_φ(a(t)|s(t)) − min_{i=1,2} Q_{θ_i}(s(t), a(t)) ]
the loss function of the first Critic network and the second Critic network is:
J_Q(θ_i) = E[ ½·( Q_{θ_i}(s(t), a(t)) − y(t) )² ], i ∈ {1, 2}
the objective function of the first Critic target network and the second Critic target network is:
y(t) = r(t) + γ·( min_{i=1,2} Q_{θ̄_i}(s(t+1), ã(t+1)) − α·log π_φ(ã(t+1)|s(t+1)) )
where φ represents the network parameters of the Actor network, θ_i represents the network parameters of the i-th Critic network, and Q_{θ_i}(s(t), a(t)) represents the Q value of the i-th Critic network; when i = 1, θ_1 and Q_{θ_1}(s(t), a(t)) are the network parameters and the Q value of the first Critic network, and when i = 2, θ_2 and Q_{θ_2}(s(t), a(t)) are the network parameters and the Q value of the second Critic network; ã(t+1) represents the new action computed according to the current scheduling policy π_φ; y(t) represents the target value and α represents the entropy regularization coefficient; Q_{θ̄_i}(s(t+1), ã(t+1)) represents the Q value of the i-th Critic target network, with i = 1 giving the Q value of the first Critic target network and i = 2 giving the Q value of the second Critic target network.
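A minimal PyTorch sketch of the twin-critic update described in claim 8: the target value is built from the smaller of the two Critic target Q values minus the entropy term, and the Actor is updated against the smaller of the two current Q values. Network sizes, the actor_sample interface (returning a reparameterized action and its log-probability), and the hyperparameters are assumptions; only the loss structure mirrors the claim.

```python
# Twin-critic SAC-style losses; shapes are assumed to be [batch, 1] throughout.
import torch
import torch.nn as nn

class Critic(nn.Module):
    def __init__(self, s_dim, a_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim + a_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def critic_targets(critic1_t, critic2_t, actor_sample, s_next, r, done,
                   gamma=0.99, alpha=0.2):
    """y = r + gamma * (min(Q1', Q2')(s', a') - alpha * log pi(a'|s'))."""
    with torch.no_grad():
        a_next, logp_next = actor_sample(s_next)        # a' ~ pi_phi(.|s') and its log-prob
        q_min = torch.min(critic1_t(s_next, a_next),
                          critic2_t(s_next, a_next))    # smaller of the two target Q values
        return r + gamma * (1.0 - done) * (q_min - alpha * logp_next)

def critic_loss(critic, s, a, y):
    # J_Q = E[ 1/2 * (Q(s, a) - y)^2 ]
    return 0.5 * ((critic(s, a) - y) ** 2).mean()

def actor_loss(critic1, critic2, actor_sample, s, alpha=0.2):
    """J_pi = E[alpha * log pi(a|s) - min(Q1, Q2)(s, a)], a reparameterized."""
    a, logp = actor_sample(s)
    q_min = torch.min(critic1(s, a), critic2(s, a))
    return (alpha * logp - q_min).mean()
```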
9. The unmanned aerial vehicle-assisted mobile edge computing scheduling optimization method according to claim 8, wherein the optimal scheduling strategy expression of the constructed deep neural network is as follows:
π = arg max_{π_φ} E[ Σ_{t=1}^{T} γ^t·( r(t) + α·H(π_φ(·|s(t))) ) ]
where π represents the optimal scheduling policy, α represents the entropy regularization coefficient, π_φ represents a scheduling policy, and γ represents the discount factor; H represents the entropy, calculated as H(π_φ(·|s(t))) = E[ −log π_φ(·|s(t)) ].
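The entropy-regularized return of claim 9 can be accumulated as in the following sketch, where the per-step entropies are assumed to be estimated as −log π_φ of the sampled actions.

```python
# Discounted, entropy-augmented return: sum_t gamma^t * (r(t) + alpha * H_t).
def entropy_regularized_return(rewards, entropies, gamma=0.99, alpha=0.2):
    total, discount = 0.0, 1.0
    for r, h in zip(rewards, entropies):
        total += discount * (r + alpha * h)
        discount *= gamma
    return total
```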
10. An unmanned aerial vehicle-assisted mobile edge computing scheduling optimization system, comprising:
the model building module is used for building an unloading model of the mobile edge computing system, and the model comprises an unmanned aerial vehicle and a plurality of user equipment;
the energy consumption calculation module is used for obtaining the energy consumption of the calculation task according to the unloading model of the mobile edge calculation system;
the optimization problem establishing module is used for establishing an optimization problem combining the unmanned aerial vehicle track and the user equipment scheduling by taking the average energy consumption minimization of the user equipment as a target;
the optimization problem transformation module is used for transforming the optimization problem into a Markov decision process and defining a state space, an action space and a return function of an unloading model of the mobile edge computing system;
the network construction training module is used for constructing a deep neural network based on a SAC algorithm, and training the deep neural network by utilizing a state space, an action space and a return function to obtain a trained deep neural network;
and the scheduling optimization module, which performs scheduling optimization by using the trained deep neural network to obtain the optimal scheduling strategy, namely the flight trajectory of the unmanned aerial vehicle and the offloading selection strategy of the user equipment.
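Read as a pipeline, the six modules of claim 10 compose as in the following sketch; all names are hypothetical placeholders for the modules listed above.

```python
# One pass through the six modules; each argument is a callable standing in
# for the corresponding module of the system.
def run_pipeline(build_model, compute_energy, build_problem,
                 to_mdp, train_sac, optimize_schedule):
    model = build_model()               # model building module
    energy = compute_energy(model)      # energy consumption calculation module
    problem = build_problem(energy)     # optimization problem establishing module
    mdp = to_mdp(problem)               # optimization problem transformation module
    policy = train_sac(mdp)             # network construction training module
    return optimize_schedule(policy)    # scheduling optimization module
```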
CN202111449863.3A 2021-11-30 Scheduling optimization method and system for unmanned aerial vehicle auxiliary mobile edge calculation Active CN114169234B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111449863.3A CN114169234B (en) 2021-11-30 Scheduling optimization method and system for unmanned aerial vehicle auxiliary mobile edge calculation

Publications (2)

Publication Number Publication Date
CN114169234A true CN114169234A (en) 2022-03-11
CN114169234B CN114169234B (en) 2024-10-25

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021034050A (en) * 2019-08-21 2021-03-01 哈爾浜工程大学 Auv action plan and operation control method based on reinforcement learning
CN112911648A (en) * 2021-01-20 2021-06-04 长春工程学院 Air-ground combined mobile edge calculation unloading optimization method
CN113395654A (en) * 2021-06-11 2021-09-14 广东工业大学 Method for task unloading and resource allocation of multiple unmanned aerial vehicles of edge computing system
CN113346944A (en) * 2021-06-28 2021-09-03 上海交通大学 Time delay minimization calculation task unloading method and system in air-space-ground integrated network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
宋强; 王昆仑; 黄祚继; 董丹丹; 张蕊: "A brief discussion on the application of low-altitude unmanned aerial vehicles in river channel management", Zhihuai (治淮), no. 05, 15 May 2019 (2019-05-15) *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114840021A (en) * 2022-04-28 2022-08-02 中国人民解放军国防科技大学 Trajectory planning method, device, equipment and medium for data collection of unmanned aerial vehicle
CN114698125A (en) * 2022-06-02 2022-07-01 北京建筑大学 Method, device and system for optimizing computation offload of mobile edge computing network
CN115361089A (en) * 2022-07-08 2022-11-18 国网江苏省电力有限公司电力科学研究院 Data security communication method, system and device of power Internet of things and storage medium
CN115334165B (en) * 2022-07-11 2023-10-17 西安交通大学 Underwater multi-unmanned platform scheduling method and system based on deep reinforcement learning
CN115334165A (en) * 2022-07-11 2022-11-11 西安交通大学 Underwater multi-unmanned platform scheduling method and system based on deep reinforcement learning
CN116017472A (en) * 2022-12-07 2023-04-25 中南大学 Unmanned aerial vehicle track planning and resource allocation method for emergency network
CN116017472B (en) * 2022-12-07 2024-04-19 中南大学 Unmanned aerial vehicle track planning and resource allocation method for emergency network
CN116451934A (en) * 2023-03-16 2023-07-18 中国人民解放军国防科技大学 Multi-unmanned aerial vehicle edge calculation path optimization and dependent task scheduling optimization method and system
CN116451934B (en) * 2023-03-16 2024-02-06 中国人民解放军国防科技大学 Multi-unmanned aerial vehicle edge calculation path optimization and dependent task scheduling optimization method and system
CN117553803A (en) * 2024-01-09 2024-02-13 大连海事大学 Multi-unmanned aerial vehicle intelligent path planning method based on deep reinforcement learning
CN117553803B (en) * 2024-01-09 2024-03-19 大连海事大学 Multi-unmanned aerial vehicle intelligent path planning method based on deep reinforcement learning
CN117970952A (en) * 2024-03-28 2024-05-03 中国人民解放军海军航空大学 Unmanned aerial vehicle maneuver strategy offline modeling method
CN117970952B (en) * 2024-03-28 2024-06-04 中国人民解放军海军航空大学 Unmanned aerial vehicle maneuver strategy offline modeling method

Similar Documents

Publication Publication Date Title
CN113346944B (en) Time delay minimization calculation task unloading method and system in air-space-ground integrated network
CN112911648A (en) Air-ground combined mobile edge calculation unloading optimization method
CN113254188B (en) Scheduling optimization method and device, electronic equipment and storage medium
CN115827108B (en) Unmanned aerial vehicle edge calculation unloading method based on multi-target deep reinforcement learning
CN115640131A (en) Unmanned aerial vehicle auxiliary computing migration method based on depth certainty strategy gradient
WO2022242468A1 (en) Task offloading method and apparatus, scheduling optimization method and apparatus, electronic device, and storage medium
CN117055619A (en) Unmanned aerial vehicle scheduling method based on multi-agent reinforcement learning
CN113660681A (en) Multi-agent resource optimization method applied to unmanned aerial vehicle cluster auxiliary transmission
CN115633320B (en) Multi-unmanned aerial vehicle assisted data acquisition and return method, system, equipment and medium
CN117499867A (en) Method for realizing high-energy-efficiency calculation and unloading through strategy gradient algorithm in multi-unmanned plane auxiliary movement edge calculation
CN116893861A (en) Multi-agent cooperative dependency task unloading method based on space-ground cooperative edge calculation
CN111629443A (en) Optimization method and system for dynamic spectrum slicing frame in super 5G vehicle networking
CN116546559B (en) Distributed multi-target space-ground combined track planning and unloading scheduling method and system
Yang et al. Dynamic trajectory and offloading control of UAV-enabled MEC under user mobility
CN117580105B (en) Unmanned aerial vehicle task unloading optimization method for power grid inspection
CN116723548A (en) Unmanned aerial vehicle auxiliary calculation unloading method based on deep reinforcement learning
Wang et al. Curriculum reinforcement learning-based computation offloading approach in space-air-ground integrated network
Gao et al. Multi-UAV assisted offloading optimization: A game combined reinforcement learning approach
CN117236561A (en) SAC-based multi-unmanned aerial vehicle auxiliary mobile edge computing method, device and storage medium
CN116009590B (en) Unmanned aerial vehicle network distributed track planning method, system, equipment and medium
CN116882270A (en) Multi-unmanned aerial vehicle wireless charging and edge computing combined optimization method and system based on deep reinforcement learning
CN116578354A (en) Method and device for unloading edge calculation tasks of electric power inspection unmanned aerial vehicle
CN114169234B (en) Scheduling optimization method and system for unmanned aerial vehicle auxiliary mobile edge calculation
CN114169234A (en) Scheduling optimization method and system for unmanned aerial vehicle-assisted mobile edge calculation
Tan et al. Communication-assisted multi-agent reinforcement learning improves task-offloading in UAV-aided edge-computing networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant