CN113157344A - DRL-based energy consumption perception task unloading method in mobile edge computing environment - Google Patents

DRL-based energy consumption perception task unloading method in mobile edge computing environment

Info

Publication number
CN113157344A
CN113157344A (application CN202110481249.9A)
Authority
CN
China
Prior art keywords
task
enb
tasks
drl
energy consumption
Prior art date
Legal status
Granted
Application number
CN202110481249.9A
Other languages
Chinese (zh)
Other versions
CN113157344B (en)
Inventor
胡海洋
胡宇航
李忠金
魏泽丰
Current Assignee
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202110481249.9A priority Critical patent/CN113157344B/en
Publication of CN113157344A publication Critical patent/CN113157344A/en
Application granted granted Critical
Publication of CN113157344B publication Critical patent/CN113157344B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G06F9/44594 Unloading
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5072 Grid computing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a DRL-based energy-consumption-aware task offloading method in a mobile edge computing environment. The invention designs the state space, action space and reward function of the task offloading problem in a multi-eNB MEC environment. An actor-critic framework is adopted as the basic structure of the whole DRL-E2D algorithm, i.e. it mainly comprises two neural networks, an actor and a critic. The state observed by the MD in the environment is used as the input of the actor, and the action output by the actor, together with the state, is used as the input of the critic network. The invention combines the relevant knowledge of deep reinforcement learning and incorporates the deadline constraint into the reward function, so that the MD can, according to the system state, make the optimal decision of offloading tasks to a plurality of eNBs under the constraint on task duration.

Description

DRL-based energy consumption perception task unloading method in mobile edge computing environment
Technical Field
The invention belongs to the technical field of mobile edge computing, and relates to an energy-consumption-aware task offloading decision method in mobile edge computing, in particular to a DRL-based, model-free task offloading decision method under a deadline constraint.
Background
With the development of wireless networks, more and more mobile applications are emerging and gaining enormous popularity. These applications cover a wide range of fields, such as traffic monitoring, smart home, real-time vision processing and target tracking, and often require computation-intensive resources to achieve a high quality of experience (QoE); although the performance of Mobile Devices (MDs) keeps improving, running all applications on a single MD still results in high energy consumption and delay. Mobile Edge Computing (MEC) has become a promising technology to address this problem: compared with traditional cloud computing systems that use a remote public cloud, it provides computing power within the wireless access network. The advent of MEC allows an MD to offload its computation-intensive tasks to nearby eNodeBs (eNBs) to enhance its computing power.
Task or computation offloading in the MEC environment has been studied extensively. Conventional offloading schemes are model-based, i.e. it is generally assumed that the mobile signals between the MD and the eNBs can be well modeled. However, the MEC environment is very complex and user mobility is highly dynamic, making mobility models difficult to build and predict. With the emergence of Deep Reinforcement Learning (DRL), more and more researchers apply it to task offloading in MEC. DRL has three advantages: 1) it is a model-free optimization method and does not require any model-based mathematical knowledge; 2) it can solve optimization problems in highly dynamic, time-varying systems; 3) it can handle problems with large state and action spaces. These features indicate that DRL is an ideal method for accomplishing task offloading in MEC.
However, applying DRL technology to MEC task offloading must consider and solve the following problems. First, the MEC task offloading problem with a high density of eNBs is a large discrete action space problem; for example, with 5 eNBs in the MEC and 20 tasks for the MD to offload, there are 5 million possible offloading actions. In this case, DRL based on the deep Q-network (DQN) does not work well, because it can only handle small action space problems. Second, task offloading is a discrete control problem, so continuous control methods such as the deep deterministic policy gradient (DDPG) cannot be applied directly. Moreover, all of the above methods treat the task processing time as an average performance requirement and do not consider the deadline of a task, which is unreasonable. Thus, the reward functions of current task offloading schemes focus mainly on average-based performance metrics and fail to meet the deadline constraints of tasks. The invention provides a DRL-based energy-consumption-aware task offloading method (DRL-E2D) in a mobile edge computing environment, which learns the optimal decision from an unknown environment based on deep reinforcement learning, so that the MD maximizes the task offloading utility while satisfying task deadline constraints.
Disclosure of Invention
In view of the deficiencies of the prior art, the invention provides a DRL-based energy-consumption-aware task offloading method in a mobile edge computing environment.
The general idea of the inventive method is:
the task unloading architecture of the multi-eNB MEC environment mainly comprises an MD and a plurality of eNBs. The MD may generate a certain number of tasks in each time period, and each task may be offloaded to any eNB through the wireless network for execution. Therefore, a reasonable offloading scheme is very important, which directly affects the execution time of the task and the energy consumption of the MD. Aiming at the condition that the deadline constraint of tasks is not considered by the reward function of most of the current task unloading schemes, the invention combines the deadline constraint with the utility of MD to finish the tasks, considers the energy consumption of MD and the task discarding penalty, and designs a combined reward function for processing the optimization problem.
The invention adopts DRL-E2D algorithm to solve the problems, firstly, the state space, the action space and the reward function of the task unloading problem under the multi-eNB MEC environment are designed. An actor-critic framework is adopted as the basic structure of the whole DRL-E2D algorithm, namely two neural networks of actor and critic are mainly included. At the same time, the state observed by MD under the environment is used as the input of the operator, and the action and state of the operator output are used as the critical network input. In order to deal with the problem of dimensionality disaster of a high-dimensional discrete motion space, an embedding layer is added into an actor network and a critic network, the embedding layer is used for converting continuous motion under the space into discrete motion, and a KNN algorithm with low complexity is adopted to extract a nearest neighbor motion value.
The method comprises the following specific steps:
step (1), constructing a task unloading scene under a multi-eNB MEC environment;
step (2), constructing a joint reward function for the task offloading scenario under the deadline constraint in the multi-eNB MEC environment:
Max: R(τ) = U(τ) - P(τ) - E(τ)   (a)
subject to constraints (b)-(e): the constraint on the number of offloaded tasks, the link transmission capacity constraint, the time constraint on task offloading, and the computation capacity constraint, as detailed in step (2.4) below (their exact expressions appear only as formula images in the original publication);
step (3), in the task offloading scenario of the multi-eNB MEC environment, constructing an actor-critic deep reinforcement learning network framework;
step (4), adopting the actor-critic deep reinforcement learning network framework to optimize the joint reward function for task offloading in step (2) and obtain the optimal task offloading solution;
it is a further object of the present invention to provide a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the above-mentioned method.
It is a further object of the present invention to provide a computing device comprising a memory having stored therein executable code and a processor that, when executing the executable code, implements the method described above.
The invention has the following beneficial effects: the invention is applicable to multi-eNB environments with a high-dimensional discrete action space in mobile edge computing, such as traffic monitoring, smart home, real-time vision processing, AI applications and other application scenarios, and aims to optimize the long-term energy consumption of the MD so as to save the battery capacity of the MD. The invention combines the relevant knowledge of deep reinforcement learning and incorporates the deadline constraint into the reward function, so that the MD can, according to the system state, make the optimal decision of offloading tasks to a plurality of eNBs under the constraint on task duration.
The notation used in the present description is as follows:
n: the number of eNBs in the MEC;
T_slot: the duration of each time period;
W: the workload of a task;
D: the data size of a task;
λ: the rate at which tasks arrive at the MD;
T_DL: the deadline constraint of a task;
z(τ): the number of tasks arriving at the MD in time period τ;
η_i(τ): the data transmission rate from the MD to eNB_i in time period τ;
L_i(τ): the task queue processed by eNB_i in time period τ;
α_i(τ): the number of tasks offloaded to eNB_i in time period τ;
β_i(τ): the tasks processed and completed in time period τ;
c_i(τ): the computation capacity of the MD or eNB_i;
d_i(τ): the number of tasks dropped by each eNB and by the MD in time period τ;
T_i^tx(τ): the data transmission time for offloading a task from the MD to eNB_i in time period τ;
T_i^ex(τ): the execution time of a task on the MD or on eNB_i;
c_0(τ): the computation capacity of the MD in time period τ, determined by its own hardware;
E(τ): the total energy consumed by the MD in time period τ;
U(τ): the total utility in time period τ;
P(τ): the penalty incurred by all dropped tasks in time period τ;
R(τ): the total reward in time period τ.
drawings
FIG. 1 is an architecture for task offloading in a multi-eNB MEC environment;
FIG. 2 is an architectural diagram of DRL-E2D;
FIGS. 3(1)-(3) show the convergence comparison between the DRL-E2D of the present invention and the conventional DQN algorithm when the number of eNBs is 1, 3 and 5, respectively;
FIGS. 4(1)-(3) respectively show the reward, energy consumption and loss cost obtained by the LB, Remote, DRL-E2D, DQN and MD algorithms for different numbers of eNBs;
FIGS. 5(1)-(3) respectively show the reward, energy consumption and loss cost obtained by the LB, Remote, DRL-E2D, DQN and MD algorithms for different task workloads W;
FIGS. 6(1)-(3) respectively show the reward, energy consumption and loss cost obtained by the LB, Remote, DRL-E2D, DQN and MD algorithms for different data sizes D.
Detailed Description
The invention is further analyzed with reference to the following figures.
FIG. 2 is an architectural diagram of DRL-E2D. The DRL-based energy consumption perception task unloading method under the mobile edge computing environment comprises the following steps:
step (1), constructing a task unloading scene under a multi-eNB MEC environment; FIG. 1 is an architecture for task offloading in a multi-eNB MEC environment;
the overall architecture of a task unloading scene under a multi-eNB MEC environment mainly comprises a single MD and n base station eNBs; the MD is used for sending the designated tasks to each base station for unloading and simultaneously executing the tasks locally;
(1.1) dividing the system time into equally spaced time periods, assuming that z (τ) tasks arrive at MD at the beginning of each time period, they are considered as an independent and identically distributed sequence, and each arriving task has constant data D and execution workload W;
(1.2) define the data transmission rate η_i(τ) from the MD to the i-th base station eNB_i in time period τ:
η_i(τ) = B_i·log2[1 + SNR_i(τ)]   (1)
where B_i represents the bandwidth that eNB_i allocates to the MD, SNR_i(τ) = p^tx·g_i(τ)/σ² denotes the signal-to-noise ratio, p^tx represents the transmission power of the MD, σ² represents the white Gaussian noise power, and g_i(τ) represents the channel gain, defined as g_i(τ) = g_0·d_i(τ)^(-θ), where g_0 and θ denote the path loss constant and the path loss exponent, respectively, and d_i(τ) denotes the path distance between eNB_i and the MD in time period τ;
(1.4) define the task processing queue L_i(τ+1) of eNB_i for the (τ+1)-th time period:
L_i(τ+1) = max{L_i(τ) - β_i(τ), 0} + α_i(τ)   (2)
where α_i(τ) denotes all tasks offloaded to eNB_i, and β_i(τ) denotes the tasks processed and completed by eNB_i in the τ-th time period;
define the task processing queue L_0(τ+1) of the MD for the (τ+1)-th time period:
L_0(τ+1) = max{L_0(τ) - β_0(τ), 0} + α_0(τ)   (3)
where α_0(τ) denotes the tasks kept locally on the MD, and β_0(τ) denotes the tasks processed and completed by the MD in the τ-th time period;
(1.5) since a task can be executed either on the MD or on an eNB, its execution time and energy consumption are defined separately for the two cases;
(1.5.1) for the case where the task is executed locally on the MD, its execution time and energy consumption are defined in terms of the task workload, the computation capacity of the MD and the power consumption of its M-core CPU (the exact expressions appear only as formula images in the original publication); the CPU power consumption involves a constant related to the chip architecture, the operating frequency F(τ) of the M-core CPU and the number of cores M; c_0(τ) represents the computation capacity of the MD, denoted c_0(τ) = M·F(τ); W represents the workload of the task;
(1.5.2) for the case where the MD offloads a task to eNB_i, the data transmission time and the execution time need to be considered separately; the data transmission time is defined as:
T_i^tx(τ) = D/η_i(τ)
meanwhile, the energy consumed by the data transmission can be defined as:
E_i^tx(τ) = p^tx·T_i^tx(τ)
where D represents the data size of the task and p^tx represents the transmission power of the MD;
when eNB_i receives the task, the task is put into its own task processing queue Q_i(τ) according to a first-come, first-served rule; the task execution time is defined as:
T_i^ex(τ) = W/c_i(τ)
where W represents the workload of the task and c_i(τ) denotes the computation capacity of eNB_i;
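A small numerical sketch of the per-task quantities defined in (1.2)-(1.5.2) follows; the formulas mirror the definitions above as reconstructed, while the concrete parameter values are assumptions chosen only for illustration:

```python
import math

# Illustrative parameter values (assumed for this sketch, not taken from the patent's experiments).
B_i = 100e6          # bandwidth allocated by eNB_i to the MD [Hz]
p_tx = 0.5           # transmission power of the MD [W]
sigma2 = 1e-13       # white Gaussian noise power [W]
g_i = 1e-9           # channel gain g_i(tau)
D = 10e6 * 8         # task data size [bits] (10 MB)
W = 25e9             # task workload [CPU cycles], treating "GHz*s" as cycles
c_i = 10e9           # computation capacity of eNB_i [cycles/s]

snr = p_tx * g_i / sigma2
eta_i = B_i * math.log2(1 + snr)        # data rate eta_i(tau), Eq. (1)

T_tx = D / eta_i                        # transmission time of one task
T_ex = W / c_i                          # execution time of one task on eNB_i
E_tx = p_tx * T_tx                      # transmission energy: power x time

print(f"rate={eta_i / 1e6:.1f} Mbit/s, T_tx={T_tx:.3f} s, T_ex={T_ex:.3f} s, E_tx={E_tx:.3f} J")
```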
step (2), constructing a joint reward function for the task offloading scenario under the deadline constraint in the multi-eNB MEC environment, specifically as follows:
(2.1) define the total energy consumption E(τ) of the MD for executing tasks locally and offloading tasks to the eNBs in each time period as the sum of the local execution energy and the transmission energy, i.e.
E(τ) = E_0^ex(τ) + Σ_{i=1..n} E_i^tx(τ)
where E_0^ex(τ) denotes the energy consumed by the MD locally executing the β_0(τ) tasks, and E_i^tx(τ) denotes the total transmission energy consumed by offloading the α_i(τ) tasks to eNB_i;
(2.2) considering the task deadline constraint, define the total utility U(τ) of the MD and all base stations as the utility accumulated over all tasks completed within their deadline by the MD and by every eNB in time period τ (the exact expressions appear only as formula images in the original publication), where n represents the number of eNBs in the MEC; T(t_j) represents the waiting or execution time of the j-th task t_j and T_DL represents the deadline of the task; β_0(τ) represents the number of tasks processed by the MD in time period τ, α_i(τ) denotes the number of tasks processed and completed by the i-th base station eNB_i in time period τ, and u represents the utility obtained by the MD for successfully completing a task;
(2.3) if a task misses its deadline, the task is considered to have timed out and will be discarded by the system, thus generating a loss; the loss function P(τ) is defined in terms of the numbers of discarded tasks (the exact expression appears only as a formula image in the original publication), where d_0(τ) represents the number of tasks dropped by the MD and d_i(τ) denotes the number of tasks dropped by eNB_i;
and (2.4) based on steps (2.1) to (2.3), define the optimization problem model for task offloading in this scenario:
Max: R(τ) = U(τ) - P(τ) - E(τ)   (a)
subject to constraints (b)-(e), whose exact expressions appear only as formula images in the original publication:
formula (a) represents the optimization objective, i.e. the reward function R(τ): maximize the total utility U(τ) obtained from completed tasks while minimizing the loss function P(τ) and the energy consumption E(τ);
formula (b) represents the constraint on the number of offloaded tasks, where z(τ) represents the number of tasks arriving at the MD in time period τ;
formula (c) represents the link transmission capacity constraint between the MD and each eNB, where η_i(τ) represents the data transmission rate from the MD to eNB_i in time period τ;
formula (d) represents the time constraint on task offloading, where T_slot represents the duration of each time period;
formula (e) represents the computation capacity constraint of each base station and of the MD, where β_i(τ) represents the tasks processed and completed in time period τ and c_i(τ) denotes the computation capacity of eNB_i;
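As an illustration of the reward in formula (a), the following sketch computes R(τ) = U(τ) - P(τ) - E(τ) for one period, assuming a utility u per task completed before its deadline and a fixed per-task drop penalty (the penalty constant and the helper names are assumptions, not part of the patent):

```python
def period_reward(completion_times, deadline, energy_j, dropped,
                  utility_per_task=1.0, drop_penalty=1.0):
    """R(tau) = U(tau) - P(tau) - E(tau) for one time period.

    completion_times : waiting+execution time of every task finished this period (MD and all eNBs)
    deadline         : T_DL, the per-task deadline
    energy_j         : E(tau), local execution energy plus all transmission energy [J]
    dropped          : total number of tasks dropped by the MD and the eNBs this period
    """
    utility = utility_per_task * sum(1 for t in completion_times if t <= deadline)   # U(tau)
    penalty = drop_penalty * dropped                                                  # P(tau)
    return utility - penalty - energy_j                                               # R(tau)

# Example: 4 tasks finished, one of them late; 2 tasks dropped; 0.8 J consumed.
r = period_reward([0.5, 1.2, 2.9, 3.4], deadline=3.0, energy_j=0.8, dropped=2)
```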
step (3), in the task offloading scenario of the multi-eNB MEC environment, constructing an actor-critic deep reinforcement learning network framework;
the actor-critic deep reinforcement learning network framework takes as the input state s_τ the task processing queues of all eNB_i and of the MD in time period τ, the data transmission rates from the MD to all eNB_i, and the total number of tasks arriving at the MD; the task offloading solution together with the computation capacity of the MD forms the action space, the task offloading solution is the output, and the objective reward function of formula (a) is the reward;
the state of time period τ is s_τ = [L_0(τ), L_1(τ), ..., L_i(τ), ..., L_n(τ), η_1(τ), ..., η_i(τ), ..., η_n(τ), z(τ)]
where L_0(τ) represents the task processing queue of the MD, L_i(τ) denotes the task processing queue on eNB_i, i = 1, 2, ..., n; η_i(τ) denotes the data transmission rate between the MD and eNB_i, and z(τ) represents the total number of tasks arriving at the MD;
the vector form of each action in the action space is a_τ = [a_0(τ), ..., a_i(τ), ..., a_n(τ), c_0(τ)], i.e. each action contains the number of tasks a_0(τ) kept locally on the MD, the number of tasks a_i(τ) offloaded to each eNB_i, and the computation capacity c_0(τ) of the MD;
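For illustration, the state and action vectors above can be assembled as follows (the NumPy layout and helper names are assumptions made for this sketch):

```python
import numpy as np

def build_state(L_md, L_enb, eta, z):
    """s_tau = [L_0, L_1..L_n, eta_1..eta_n, z] for n eNBs."""
    return np.concatenate(([L_md], L_enb, eta, [z])).astype(np.float32)

def build_action(a_local, a_enb, c_md):
    """a_tau = [a_0, a_1..a_n, c_0]: tasks kept locally, tasks per eNB, MD capacity."""
    return np.concatenate(([a_local], a_enb, [c_md])).astype(np.float32)

# Example with n = 3 eNBs and z(tau) = 6 arriving tasks.
s = build_state(L_md=2, L_enb=[1, 0, 4], eta=[12e6, 8e6, 20e6], z=6)
a = build_action(a_local=2, a_enb=[1, 0, 3], c_md=8e9)
```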
The actor-critic deep reinforcement learning network framework adopts an actor network and a critic network;
the actor network adopts a [100, n+1] structure with the ReLU activation function; the last layer is the action layer, which outputs n+1 probability values for the different actions; the actor network policy function is π(s_τ | θ^μ), which gives the action value obtained for state s_τ, where θ^μ is the actor network weight parameter;
the critic network structure is the same as that of the actor network; the critic network evaluation function is Q(s_τ, a_τ | θ^Q), which gives the expected action value obtained after taking action a_τ in state s_τ, where θ^Q is the critic network weight parameter;
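A minimal PyTorch-style sketch of such an actor and critic is given below; the hidden size of 100 follows the [100, n+1] structure above, while the choice of PyTorch and the single Q-value critic head are assumptions made for illustration:

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """pi(s | theta_mu): maps the observed state to a proto-action vector."""
    def __init__(self, state_dim, action_dim, hidden=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),   # action layer: one value per action dimension
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Q(s, a | theta_Q): scores a (state, action) pair."""
    def __init__(self, state_dim, action_dim, hidden=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```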
step (4), the actor-critic deep reinforcement learning network framework is adopted to optimize the joint reward function for task offloading in step (2), and the optimal task offloading solution is obtained;
(4.1) randomly initialize the weights θ^μ and θ^Q of the actor network and the critic network, copy these weights to the target actor network and the target critic network respectively, set the capacity of the experience replay pool to D, D > 0, and empty the experience replay pool;
θ^{μ'} ← θ^μ, θ^{Q'} ← θ^Q
where θ^{μ'} and θ^{Q'} represent the weights of the target actor network and the target critic network, respectively;
(4.2) initializing an MD system environment and distributing tasks to the MD to obtain an initial state value under the current round; the method comprises the following specific steps:
4.2.1 initializing MD system environment and generating a random noise generator N;
4.2.2 allocate z(τ) tasks to the MD, where τ = 0 denotes the initial time period;
4.2.3 obtain the initial state value observed by the MD from the system environment; since no task is running and the MD has not yet offloaded any task to an eNB, the local state of the MD at τ = 0 is:
s_τ = [L_0(τ), η_0(τ), z(τ)]   (5)
(4.3) run the actor-critic deep reinforcement learning network framework to obtain the optimal-value action for the state in each time period; the specific steps are as follows:
4.3.1 the actor network outputs a prototype action according to the current time period state s_τ; the prototype action enters the embedding layer for mapping, and the KNN algorithm is used to extract the k nearest-neighbor action values; the specific steps are as follows:
4.3.1.1 input state s_τ into the actor network; the actor network obtains an output π(s_τ | θ^μ) according to its policy π, and, to increase exploration randomness, exploration noise N_τ is added to obtain the prototype action a_p, i.e. a_p = π(s_τ | θ^μ) + N_τ;
4.3.1.2 in order to map the action value a_p in the continuous space to action values a_p' in the discrete space, an embedding layer is arranged between the actor and the critic; the obtained a_p is input into the embedding layer, which outputs d mapped values a_p'; from the d mapped action values a_p', the KNN algorithm extracts the k-nearest-neighbor action set A_k, measured by the Euclidean distance between actions, i.e. A_k = knn(a_p'); k may be chosen as 10;
4.3.2 the critic network takes all the nearest-neighbor action values obtained in step 4.3.1.2 and screens them to obtain the optimal-value action; after the MD executes the optimal-value action, the current transition is saved to the experience replay pool; the specific steps are as follows:
4.3.2.1 input the actions in A_k into the current critic network respectively; according to its evaluation function Q(s_τ, a | θ^Q), the critic outputs the value of each candidate action in A_k under the current state, and the action a_x with the maximum value is selected as the predicted action of the MD, i.e. a_τ = argmax_{a∈A_k} Q(s_τ, a | θ^Q);
4.3.2.2 the MD executes the task offloading decision according to action a_τ, obtains the return r_τ according to the result of the action execution, and observes the new state s_{τ+1}, forming a new sample [s_τ, a_τ, r_τ, s_{τ+1}] which is stored in the experience replay pool;
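Steps 4.3.1.1-4.3.2.1 (prototype action with exploration noise, KNN mapping into the discrete space, and critic-based selection of the best neighbour) can be sketched as below; the brute-force neighbour search, the toy action table and the dummy actor/critic are illustrative assumptions rather than the patented implementation:

```python
import numpy as np

def knn_actions(proto_action, discrete_actions, k=10):
    """Return the k discrete actions closest (Euclidean) to the continuous proto-action."""
    dists = np.linalg.norm(discrete_actions - proto_action, axis=1)
    idx = np.argsort(dists)[:k]
    return discrete_actions[idx]

def select_action(actor, critic, state, discrete_actions, noise_std=0.1, k=10):
    """Proto-action from the actor plus noise, then the critic picks the best of k neighbours."""
    proto = actor(state) + np.random.normal(0.0, noise_std, size=discrete_actions.shape[1])
    candidates = knn_actions(proto, discrete_actions, k)
    q_values = [critic(state, a) for a in candidates]      # Q(s, a | theta_Q) for each neighbour
    return candidates[int(np.argmax(q_values))]

# Toy usage: an enumerated discrete action table and dummy actor/critic callables.
actions = np.array([[a0, a1, c] for a0 in range(3) for a1 in range(3) for c in (1.0, 2.0)])
dummy_actor = lambda s: np.array([1.0, 1.0, 1.5])
dummy_critic = lambda s, a: -np.sum((a - 1.0) ** 2)
best = select_action(dummy_actor, dummy_critic, state=np.zeros(4), discrete_actions=actions)
```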
4.3.3 updating network parameters; the method comprises the following specific steps:
1) randomly sample m samples [s_τ, a_τ, r_τ, s_{τ+1}] from the experience replay pool and send them to the current actor network, the current critic network, the target actor network and the target critic network;
2) the target actor network outputs action a'_{τ+1} from the next time period state s_{τ+1}; the target critic network obtains the current target expected value y_τ from the state s_{τ+1} and the action a'_{τ+1} output by the target actor network, i.e. y_τ = r_τ + γ·Q'(s_{τ+1}, a'_{τ+1} | θ^{Q'}), and the current target expected value y_τ is delivered to the mean square error loss function, where Q'(· | θ^{Q'}) denotes the target critic network and γ denotes the discount factor;
3) the current critic network outputs the evaluation function Q(s_τ, a_τ | θ^Q) from the state s_τ, the action a_τ and the reward r_τ, which enters the sampled policy gradient and the mean square error loss function;
4) update all weights θ^Q and θ^μ of the actor network and the critic network through back-propagation of the neural networks; the mean square error loss function is L(θ^Q) = (1/m)·Σ_τ (y_τ - Q(s_τ, a_τ | θ^Q))², and the sampled policy gradient is ∇_{θ^μ}J ≈ (1/m)·Σ_τ ∇_a Q(s_τ, a | θ^Q)|_{a=π(s_τ)} · ∇_{θ^μ}π(s_τ | θ^μ);
5) update the network parameters of the target actor network and the target critic network, namely:
θ^{Q'} ← σθ^Q + (1-σ)θ^{Q'}
θ^{μ'} ← σθ^μ + (1-σ)θ^{μ'}
where σ is the network update weight, set to 0.1;
6) the actor network obtains the next time period state s_{τ+1} from the experience replay pool, and steps 1) to 6) are repeated up to the maximum time period;
4.3.4 repeat steps 4.3.1-4.3.3 until the maximum number of rounds is reached to obtain stable model parameters.
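A compact sketch of the updates in 4.3.3, in the spirit of DDPG as described above, is given below; the optimizer handling, the tensor shapes (r of shape [m, 1]) and the Actor/Critic modules from the earlier sketch are assumptions:

```python
import torch
import torch.nn.functional as F

def update(actor, critic, target_actor, target_critic, batch,
           actor_opt, critic_opt, gamma=0.99, sigma=0.1):
    """One update step: critic MSE loss, sampled policy gradient, soft target update."""
    s, a, r, s_next = batch                                    # tensors of shape [m, ...]

    # Step 2): target value y = r + gamma * Q'(s', pi'(s'))
    with torch.no_grad():
        y = r + gamma * target_critic(s_next, target_actor(s_next))

    # Steps 3)-4): critic update via the mean square error loss
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Step 4): actor update via the sampled policy gradient (maximize Q => minimize -Q)
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Step 5): soft update, theta' <- sigma*theta + (1-sigma)*theta'
    # (targets are typically initialised as deep copies of the online networks)
    for tgt, src in ((target_actor, actor), (target_critic, critic)):
        for p_t, p in zip(tgt.parameters(), src.parameters()):
            p_t.data.mul_(1.0 - sigma).add_(sigma * p.data)
```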
In order to verify the feasibility of the method, it is compared experimentally with three conventional algorithms, LB, Remote and Local, and with the reinforcement learning network DQN.
In this experiment, the computation capacity of each eNB is c_i(τ) = 10 GHz, the transmission power of the MD is set to a fixed value (given only as a formula image in the original publication), the total working time is 1000 s, the size of each time period is T_slot = 1 s, each task has the same workload W = 25 GHz·s and data size D = 10 MB, and the deadline T_DL of each time period τ is set to 3 s. When a task is completed within the deadline, the MD obtains utility u = 1; the bandwidth of the wireless network is B = 100 MHz, the white Gaussian noise is σ² = -174 dBm/Hz, the path loss constant is set to a fixed value (given only as a formula image in the original), the path loss exponent is θ = 4, and the distance d of each eNB from the MD is 1000.
The number of CPU cores of the MD is M = 4 and the operating frequency of each core is 2.0 GHz, so the computation capacity of the MD is c_0(τ) = M·F(τ) = 8 GHz.
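These experimental settings can be gathered into a single configuration object, as sketched below; the dictionary keys are assumptions, and the MD transmission power and path loss constant are omitted because they are given only as formula images in the original:

```python
EXPERIMENT_CONFIG = {
    "enb_capacity_hz": 10e9,        # c_i(tau) = 10 GHz per eNB
    "total_time_s": 1000,           # total working time
    "t_slot_s": 1.0,                # duration of each time period
    "workload_ghz_s": 25.0,         # W, identical for every task
    "data_size_mb": 10.0,           # D
    "deadline_s": 3.0,              # T_DL
    "utility_per_task": 1.0,        # u, gained when a task meets its deadline
    "bandwidth_hz": 100e6,          # B
    "noise_dbm_per_hz": -174,       # sigma^2
    "path_loss_exponent": 4,        # theta
    "enb_distance": 1000,           # d, distance of each eNB from the MD
    "md_cores": 4,                  # M
    "core_freq_hz": 2.0e9,          # F(tau)
}
EXPERIMENT_CONFIG["md_capacity_hz"] = (
    EXPERIMENT_CONFIG["md_cores"] * EXPERIMENT_CONFIG["core_freq_hz"]   # c_0 = M * F = 8 GHz
)
```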
The performance of each algorithm under different conditions is compared using three indicators: the reward, the energy consumption, and the loss cost generated by the MD discarding tasks.
1. Convergence comparison
Since the invention applies the KNN algorithm to map actions from the continuous space to the discrete space, the influence of different values of k in the KNN algorithm on convergence is considered in the experiments, where k = 1 extracts only one action from the prototype action, and k = 1% extracts the 1% of discrete actions nearest to the prototype action. In fig. 3(1)-(3), where the number of eNBs is 1, 3 and 5 respectively, a convergence comparison experiment is performed between the DRL-E2D provided by the present invention and the conventional DQN algorithm, with the upper limit of the number of rounds set to 250.
From fig. 3(2)-(3) it can be seen that DRL-E2D performs better with k = 1% than with k = 1, because a larger k is more beneficial for the neural network to infer a better next action based on its own policy; it can also be seen that, regardless of the number of eNBs, the convergence performance of DQN is consistently worse than that of DRL-E2D over the same number of rounds.
2. Effect of the number of eNBs
Fig. 4(1) shows that as the number of eNBs increases, the rewards obtained by LB, Remote, DRL-E2D and DQN increase, because these algorithms can benefit from offloading tasks to the eNBs; furthermore, with more eNBs, the MD gains more reward by completing more tasks and consuming less energy. Fig. 4(2) shows that the energy consumption of DRL-E2D remains essentially constant regardless of the number of eNBs, since the MD tends to give up tasks rather than execute them in order to obtain the maximum reward. Fig. 4(3) shows that as the number of eNBs increases, the loss of the remaining algorithms, with the exception of the Local algorithm, decreases accordingly.
3. Effect of task workload W
Fig. 5(1) shows that as W increases, the rewards obtained by all algorithms gradually decrease, because for a fixed task arrival rate λ, a larger W requires more computing resources, resulting in higher energy consumption, fewer completed tasks and lower reward; however, DRL-E2D consistently performs better than the other algorithms, indicating that it adapts better to changes in W. Fig. 5(2) shows that Remote has the lowest energy consumption, independent of W. The energy consumption of LB, DRL-E2D and DQN increases with W, since a larger W requires more computing resources and time, resulting in higher energy consumption. Fig. 5(3) shows that as W increases, the loss of all algorithms increases, with the loss of Local varying most drastically, since it discards more tasks than the other algorithms.
4. Influence of data size D
Fig. 6(1) shows that, except for Local, the rewards obtained by the remaining algorithms increase as D increases, because DRL-E2D, LB, Remote and DQN apply task offloading policies; thus, with a larger D, the MD spends more energy offloading tasks to the eNBs. Local does not employ task offloading, so its reward is independent of D.
Similarly, the energy consumption and the loss cost of the MD in fig. 6(2)-(3) also gradually increase as D increases. In this case, since the Remote algorithm offloads all tasks to the eNBs, the number of discarded tasks is larger and its loss cost changes faster.
In conclusion, the DRL-E2D algorithm provided by the invention performs well under various conditions.

Claims (10)

1. The DRL-based energy consumption perception task unloading method under the mobile edge computing environment is characterized by comprising the following steps of:
step (1), constructing a task unloading scene under a multi-eNB MEC environment;
step (2), constructing a joint reward function for the task offloading scenario under the deadline constraint in the multi-eNB MEC environment, specifically as follows:
(2.1) define the total energy consumption E(τ) of the MD for executing tasks locally and offloading tasks to the eNBs in each time period as the sum of the local execution energy and the transmission energy, i.e.
E(τ) = E_0^ex(τ) + Σ_{i=1..n} E_i^tx(τ)
where E_0^ex(τ) denotes the energy consumed by the MD locally executing the β_0(τ) tasks, and E_i^tx(τ) denotes the total transmission energy consumed by offloading the α_i(τ) tasks to eNB_i;
(2.2) considering the task deadline constraint, define the total utility U(τ) of the MD and all base stations as the utility accumulated over all tasks completed within their deadline by the MD and by every eNB in time period τ (the exact expressions appear only as formula images in the original publication), where n represents the number of eNBs in the MEC; T(t_j) represents the waiting or execution time of the j-th task t_j and T_DL represents the deadline of the task; β_0(τ) represents the number of tasks processed by the MD in time period τ, α_i(τ) denotes the number of tasks processed and completed by the i-th base station eNB_i in time period τ, and u represents the utility obtained by the MD for successfully completing a task;
(2.3) if a task misses its deadline, the task is considered to have timed out and will be discarded by the system, thus generating a loss; the loss function P(τ) is defined in terms of the numbers of discarded tasks (the exact expression appears only as a formula image in the original publication), where d_0(τ) represents the number of tasks dropped by the MD and d_i(τ) denotes the number of tasks dropped by eNB_i;
and (2.4) based on steps (2.1) to (2.3), define the optimization problem model for task offloading in this scenario:
Max: R(τ) = U(τ) - P(τ) - E(τ)   (a)
subject to constraints (b)-(e): the constraint on the number of offloaded tasks, the link transmission capacity constraint between the MD and each eNB, the time constraint on task offloading, and the computation capacity constraint of each base station and of the MD (their exact expressions appear only as formula images in the original publication);
step (3), in the task offloading scenario of the multi-eNB MEC environment, take as the input state s_τ the task processing queues of all eNB_i and of the MD in time period τ, the data transmission rates from the MD to all eNB_i, and the total number of tasks arriving at the MD; take the task offloading solution together with the computation capacity of the MD as the action space, take the task offloading solution as the output and the objective reward function of formula (a) as the reward, and construct the actor-critic deep reinforcement learning network framework;
the state of time period τ is s_τ = [L_0(τ), L_1(τ), ..., L_i(τ), ..., L_n(τ), η_1(τ), ..., η_i(τ), ..., η_n(τ), z(τ)];
where L_0(τ) represents the task processing queue of the MD, L_i(τ) denotes the task processing queue on eNB_i, i = 1, 2, ..., n; η_i(τ) denotes the data transmission rate between the MD and eNB_i, and z(τ) represents the total number of tasks arriving at the MD;
the vector form of each action in the action space is a_τ = [a_0(τ), ..., a_i(τ), ..., a_n(τ), c_0(τ)], i.e. each action contains the number of tasks a_0(τ) kept locally on the MD, the number of tasks a_i(τ) offloaded to each eNB_i, and the computation capacity c_0(τ) of the MD;
And step (4), adopting the actor-critic deep reinforcement learning network framework to optimize the joint reward function for task offloading in step (2) and obtain the optimal task offloading solution.
2. The method for energy consumption aware task offloading based on DRL in a mobile edge computing environment according to claim 1, wherein the step (1) is specifically as follows:
the overall architecture of the task offloading scenario in the multi-eNB MEC environment mainly comprises a single MD and n base stations (eNBs); the MD sends designated tasks to each base station for offloaded execution while also executing tasks locally;
(1.1) divide the system time into equally spaced time periods; assume that z(τ) tasks arrive at the MD at the beginning of each time period, that they form an independent and identically distributed sequence, and that each arriving task has a constant data size D and execution workload W;
(1.2) define the data transmission rate η_i(τ) from the MD to the i-th base station eNB_i in time period τ:
η_i(τ) = B_i·log2[1 + SNR_i(τ)]   (1)
where B_i represents the bandwidth that eNB_i allocates to the MD, SNR_i(τ) = p^tx·g_i(τ)/σ² denotes the signal-to-noise ratio, p^tx represents the transmission power of the MD, σ² represents the white Gaussian noise power, and g_i(τ) represents the channel gain, defined as g_i(τ) = g_0·d_i(τ)^(-θ), where g_0 and θ denote the path loss constant and the path loss exponent, respectively, and d_i(τ) denotes the path distance between eNB_i and the MD in time period τ;
(1.4) define the task processing queue L_i(τ+1) of eNB_i for the (τ+1)-th time period:
L_i(τ+1) = max{L_i(τ) - β_i(τ), 0} + α_i(τ)   (2)
where α_i(τ) denotes all tasks offloaded to eNB_i, and β_i(τ) denotes the tasks processed and completed by eNB_i in the τ-th time period;
define the task processing queue L_0(τ+1) of the MD for the (τ+1)-th time period:
L_0(τ+1) = max{L_0(τ) - β_0(τ), 0} + α_0(τ)   (3)
where α_0(τ) denotes the tasks kept locally on the MD, and β_0(τ) denotes the tasks processed and completed by the MD in the τ-th time period;
(1.5) since a task can be executed either on the MD or on an eNB, its execution time and energy consumption are defined separately for the two cases.
3. The method for energy consumption aware task offloading based on DRL in mobile edge computing environment according to claim 2, wherein the step (1.5) is specifically as follows:
(1.5.1) for the case where the task is executed locally on the MD, its execution time and energy consumption are defined in terms of the task workload, the computation capacity of the MD and the power consumption of its M-core CPU (the exact expressions appear only as formula images in the original publication); the CPU power consumption involves a constant related to the chip architecture, the operating frequency F(τ) of the M-core CPU and the number of cores M; c_0(τ) represents the computation capacity of the MD, denoted c_0(τ) = M·F(τ); W represents the workload of the task;
(1.5.2) for the case where the MD offloads a task to eNB_i, the data transmission time and the execution time need to be considered separately; the data transmission time is defined as:
T_i^tx(τ) = D/η_i(τ)   (6)
meanwhile, the energy consumed by the data transmission can be defined as:
E_i^tx(τ) = p^tx·T_i^tx(τ)
where D represents the data size of the task and p^tx represents the transmission power of the MD;
when eNB_i receives the task, the task is put into its own task processing queue Q_i(τ) according to a first-come, first-served rule; the task execution time is defined as:
T_i^ex(τ) = W/c_i(τ)
where W represents the workload of the task and c_i(τ) denotes the computation capacity of eNB_i.
4. The method for energy consumption aware task offloading based on DRL in a mobile edge computing environment according to claim 1, wherein the fourth step is as follows:
(4.1) randomly initialize the weights θ^μ and θ^Q of the actor network and the critic network, copy these weights to the target actor network and the target critic network respectively, set the capacity of the experience replay pool to D, D > 0, and empty the experience replay pool;
θ^{μ'} ← θ^μ, θ^{Q'} ← θ^Q
where θ^{μ'} and θ^{Q'} represent the weights of the target actor network and the target critic network, respectively;
(4.2) initializing an MD system environment and distributing tasks to the MD to obtain an initial state value under the current round;
and (4.3) run the actor-critic deep reinforcement learning network framework to obtain the optimal-value action for the state in each time period.
5. The method for DRL-based energy consumption aware task offloading in a mobile edge computing environment according to claim 4, wherein the step (4.2) is specifically as follows:
4.2.1 initializing MD system environment and generating a random noise generator N;
4.2.2 allocate z(τ) tasks to the MD, where τ = 0 denotes the initial time period;
4.2.3 obtain the initial state value observed by the MD from the system environment; since no task is running and the MD has not yet offloaded any task to an eNB, the local state of the MD at τ = 0 is:
s_τ = [L_0(τ), η_0(τ), z(τ)]   (5).
6. the method for DRL-based energy consumption aware task offloading in a mobile edge computing environment according to claim 4, wherein the step (4.3) is specifically as follows:
4.3.1 the actor network outputs a prototype action according to the current time period state s_τ; the prototype action enters the embedding layer for mapping, and the KNN algorithm is used to extract the k nearest-neighbor action values;
4.3.2 the critic network takes all the nearest-neighbor action values obtained in step 4.3.1 and screens them to obtain the optimal-value action; after the MD executes the optimal-value action, the current transition is saved to the experience replay pool;
4.3.3 update the network parameters;
4.3.4 repeat steps 4.3.1-4.3.3 until the maximum number of rounds is reached to obtain stable model parameters.
7. The method for DRL-based energy consumption aware task offloading in a mobile edge computing environment according to claim 6, wherein the step (4.3.1) is specifically as follows:
4.3.1.1 input state s_τ into the actor network; the actor network obtains an output π(s_τ | θ^μ) according to its policy π, and, to increase exploration randomness, exploration noise N_τ is added to obtain the prototype action a_p, i.e. a_p = π(s_τ | θ^μ) + N_τ;
4.3.1.2 in order to map the action value a_p in the continuous space to action values a_p' in the discrete space, an embedding layer is arranged between the actor and the critic; the obtained a_p is input into the embedding layer, which outputs d mapped values a_p'; from the d mapped action values a_p', the KNN algorithm extracts the k-nearest-neighbor action set A_k, measured by the Euclidean distance between actions, i.e. A_k = knn(a_p').
8. The method for energy consumption aware task offloading based on DRL in mobile edge computing environment according to claim 1, wherein the step (4.3.2) is specifically as follows:
4.3.2.1 input the actions in A_k into the current critic network respectively; according to its evaluation function Q(s_τ, a | θ^Q), the critic outputs the value of each candidate action in A_k under the current state, and the action a_x with the maximum value is selected as the predicted action of the MD, i.e. a_τ = argmax_{a∈A_k} Q(s_τ, a | θ^Q);
4.3.2.2 the MD executes the task offloading decision according to action a_τ, obtains the return r_τ according to the result of the action execution, and observes the new state s_{τ+1}, forming a new sample [s_τ, a_τ, r_τ, s_{τ+1}] which is stored in the experience replay pool.
9. A computer-readable storage medium, having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any of claims 1-8.
10. A computing device comprising a memory having stored therein executable code and a processor that, when executing the executable code, implements the method of any of claims 1-8.
CN202110481249.9A 2021-04-30 2021-04-30 DRL-based energy consumption perception task unloading method in mobile edge computing environment Active CN113157344B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110481249.9A CN113157344B (en) 2021-04-30 2021-04-30 DRL-based energy consumption perception task unloading method in mobile edge computing environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110481249.9A CN113157344B (en) 2021-04-30 2021-04-30 DRL-based energy consumption perception task unloading method in mobile edge computing environment

Publications (2)

Publication Number Publication Date
CN113157344A true CN113157344A (en) 2021-07-23
CN113157344B CN113157344B (en) 2022-06-14

Family

ID=76872731

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110481249.9A Active CN113157344B (en) 2021-04-30 2021-04-30 DRL-based energy consumption perception task unloading method in mobile edge computing environment

Country Status (1)

Country Link
CN (1) CN113157344B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113590229A (en) * 2021-08-12 2021-11-02 中山大学 Industrial Internet of things graph task unloading method and system based on deep reinforcement learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109362064A (en) * 2018-09-14 2019-02-19 重庆邮电大学 The task buffer allocation strategy based on MEC in mobile edge calculations network
CN110262845A (en) * 2019-04-30 2019-09-20 北京邮电大学 The enabled distributed computing task discharging method of block chain and system
CN112015481A (en) * 2020-06-04 2020-12-01 湖南大学 Multi-Agent reinforcement learning-based mobile edge calculation unloading algorithm
CN112261674A (en) * 2020-09-30 2021-01-22 北京邮电大学 Performance optimization method of Internet of things scene based on mobile edge calculation and block chain collaborative enabling

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109362064A (en) * 2018-09-14 2019-02-19 重庆邮电大学 The task buffer allocation strategy based on MEC in mobile edge calculations network
CN110262845A (en) * 2019-04-30 2019-09-20 北京邮电大学 The enabled distributed computing task discharging method of block chain and system
CN112015481A (en) * 2020-06-04 2020-12-01 湖南大学 Multi-Agent reinforcement learning-based mobile edge calculation unloading algorithm
CN112261674A (en) * 2020-09-30 2021-01-22 北京邮电大学 Performance optimization method of Internet of things scene based on mobile edge calculation and block chain collaborative enabling

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
胡海洋等 (Hu Haiyang et al.): "Multi-objective optimization method for task scheduling in a mobile cloud computing environment" (移动云计算环境下任务调度的多目标优化方法), 《计算机研究与发展》 (Journal of Computer Research and Development), 15 September 2017 (2017-09-15) *
詹文翰 (Zhan Wenhan): "Research on computation offloading scheduling and resource management strategy optimization in mobile edge networks" (移动边缘网络计算卸载调度与资源管理策略优化研究), 《CNKI中国学术文献网络出版总库博士论文》 (CNKI doctoral dissertation database), 15 July 2020 (2020-07-15) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113590229A (en) * 2021-08-12 2021-11-02 中山大学 Industrial Internet of things graph task unloading method and system based on deep reinforcement learning
CN113590229B (en) * 2021-08-12 2023-11-10 中山大学 Industrial Internet of things graph task unloading method and system based on deep reinforcement learning

Also Published As

Publication number Publication date
CN113157344B (en) 2022-06-14

Similar Documents

Publication Publication Date Title
CN113242568B (en) Task unloading and resource allocation method in uncertain network environment
CN113950066B (en) Single server part calculation unloading method, system and equipment under mobile edge environment
CN113543176B (en) Unloading decision method of mobile edge computing system based on intelligent reflecting surface assistance
CN110928654B (en) Distributed online task unloading scheduling method in edge computing system
CN112380008B (en) Multi-user fine-grained task unloading scheduling method for mobile edge computing application
CN111556461A (en) Vehicle-mounted edge network task distribution and unloading method based on deep Q network
US11784931B2 (en) Network burst load evacuation method for edge servers
CN111367657A (en) Computing resource collaborative cooperation method based on deep reinforcement learning
CN111813506A (en) Resource sensing calculation migration method, device and medium based on particle swarm algorithm
CN112214301B (en) Smart city-oriented dynamic calculation migration method and device based on user preference
CN114285853A (en) Task unloading method based on end edge cloud cooperation in equipment-intensive industrial Internet of things
CN116489712B (en) Mobile edge computing task unloading method based on deep reinforcement learning
EP4024212A1 (en) Method for scheduling interference workloads on edge network resources
CN113485826A (en) Load balancing method and system for edge server
CN113573363A (en) MEC calculation unloading and resource allocation method based on deep reinforcement learning
CN113590279A (en) Task scheduling and resource allocation method for multi-core edge computing server
CN116112488A (en) Fine-grained task unloading and resource allocation method for MEC network
CN113157344B (en) DRL-based energy consumption perception task unloading method in mobile edge computing environment
CN113946423A (en) Multi-task edge computing scheduling optimization method based on graph attention network
CN116009990B (en) Cloud edge collaborative element reinforcement learning computing unloading method based on wide attention mechanism
CN117579701A (en) Mobile edge network computing and unloading method and system
CN116193516A (en) Cost optimization method for efficient federation learning in Internet of things scene
CN110768827A (en) Task unloading method based on group intelligent algorithm
Yao et al. Performance Optimization in Serverless Edge Computing Environment using DRL-Based Function Offloading
CN114520772B (en) 5G slice resource scheduling method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant