CN113157344A - DRL-based energy consumption perception task unloading method in mobile edge computing environment - Google Patents
- Publication number
- CN113157344A (application CN202110481249.9A)
- Authority
- CN
- China
- Prior art keywords
- task
- enb
- tasks
- drl
- energy consumption
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/445—Program loading or initiating
- G06F9/44594—Unloading
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5072—Grid computing
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a DRL-based energy-consumption-aware task offloading method in a mobile edge computing environment. The invention designs the state space, action space and reward function of the task offloading problem in a multi-eNB MEC environment. An actor-critic framework is adopted as the basic structure of the DRL-E2D algorithm, comprising two neural networks, the actor and the critic. The state observed by the MD in the environment is the input of the actor, and the action output by the actor, together with the state, is the input of the critic network. The invention combines deep reinforcement learning with a deadline constraint built into the reward function, so that the MD can, according to the system state, make the optimal decision for offloading tasks to multiple eNBs while respecting the task-duration limit.
Description
Technical Field
The invention belongs to the technical field of mobile edge computing and relates to an energy-consumption-aware task offloading decision method in mobile edge computing, in particular to a DRL-based, model-free task offloading decision method under deadline constraints.
Background
With the development of wireless networks, more and more mobile applications are emerging and gaining tremendous popularity. These applications cover a wide range of fields, such as traffic monitoring, smart home, real-time vision processing and target tracking, and often require computationally intensive resources to achieve a high quality of experience (QoE). Despite the increasing performance of mobile devices (MDs), running all applications on a single MD results in high energy consumption and delay. Mobile edge computing (MEC) has become a promising technology to address this problem: compared with traditional cloud computing systems that use a remote public cloud, it provides computing power within the wireless access network. The advent of MEC allows an MD to offload its computation-intensive tasks to near-end eNodeBs (eNBs) to enhance its computational power. Task (or computation) offloading in the MEC environment has been extensively studied. Conventional offloading schemes are model-based, i.e. it is generally assumed that the mobile signals between the MD and the eNBs are well modeled. However, the MEC environment is very complex and user mobility is highly dynamic, making mobility models difficult to build and predict. With the emergence of deep reinforcement learning (DRL), more and more researchers apply it to MEC task offloading. DRL has three advantages: 1) it is a model-free optimization method and needs no model-based mathematical assumptions; 2) it can solve optimization problems in highly dynamic time-varying systems; 3) it can handle large state and action spaces. These features indicate that DRL is an ideal method for accomplishing task offloading in MEC.
However, applying DRL to MEC task offloading must consider and solve the following problems. First, the MEC task offloading problem with high-density eNBs is a large discrete-action-space problem: for example, with 5 eNBs available to an MD offloading 20 tasks, there are millions of possible offloading actions. In this case, DRL based on the deep Q-network (DQN) does not work well, because DQN can only handle small action spaces. Second, task offloading is a discrete control problem, so continuous control methods such as the deep deterministic policy gradient (DDPG) cannot be applied directly. Moreover, the above methods all treat task processing time as an average performance requirement and do not consider the deadline of a task, which is unreasonable: the reward functions of current task offloading schemes focus primarily on average-based performance metrics and fail to meet the deadline constraints of tasks. The invention therefore provides a DRL-based energy-consumption-aware task offloading method (DRL-E2D) for the mobile edge computing environment, which learns the optimal decision from an unknown environment based on deep reinforcement learning, so that the MD maximizes the task offloading utility while meeting task deadline constraints.
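To make the scale of the discrete action space concrete, the following sketch counts joint offloading actions under one simple assumption: each of z distinguishable tasks may be sent to any of n eNBs or kept local, giving (n+1)^z joint actions. The exact count quoted in the original example depends on how tasks are modeled, so this is an illustration, not the patent's formula.

```python
# Illustrative only: combinatorial growth of the offloading action space.
# With n eNBs plus local execution, each of z tasks has n + 1 possible
# destinations, so there are (n + 1) ** z joint offloading actions.

def action_space_size(n_enbs: int, n_tasks: int) -> int:
    """Number of joint offloading decisions for n_tasks distinguishable tasks."""
    return (n_enbs + 1) ** n_tasks

# 5 eNBs and 20 tasks already yield an astronomically large discrete space,
# far beyond what a plain DQN enumerating one Q-value per action can handle.
print(action_space_size(5, 20))
```

Even under the most conservative counting, the space grows exponentially in the number of tasks, which is why the method resorts to a continuous proto-action plus nearest-neighbor lookup instead of enumerating actions.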
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a DRL-based energy-consumption-aware task offloading method in a mobile edge computing environment.
The general idea of the inventive method is:
the task offloading architecture of the multi-eNB MEC environment mainly comprises an MD and a plurality of eNBs. The MD may generate a certain number of tasks in each time period, and each task may be offloaded over the wireless network to any eNB for execution. A reasonable offloading scheme is therefore very important, as it directly affects the execution time of the tasks and the energy consumption of the MD. Since the reward functions of most current task offloading schemes do not consider the deadline constraints of tasks, the invention combines the deadline constraint with the utility the MD obtains from completing tasks, takes the energy consumption of the MD and the task-drop penalty into account, and designs a joint reward function for the optimization problem.
The invention adopts the DRL-E2D algorithm to solve this problem. First, the state space, action space and reward function of the task offloading problem in the multi-eNB MEC environment are designed. An actor-critic framework is adopted as the basic structure of the DRL-E2D algorithm, comprising two neural networks, the actor and the critic. The state observed by the MD in the environment is the input of the actor, and the action output by the actor, together with the state, is the input of the critic network. To deal with the curse of dimensionality in the high-dimensional discrete action space, an embedding layer is added between the actor network and the critic network; the embedding layer maps continuous actions to discrete actions, and a low-complexity KNN algorithm extracts the nearest-neighbor action values.
The method comprises the following specific steps:
step (1), constructing a task offloading scenario in a multi-eNB MEC environment;
step (2), constructing a joint reward function for the task offloading scenario under the deadline constraint in the multi-eNB MEC environment:
Max:R(τ)=U(τ)-P(τ)-E(τ) (a)
step (3), constructing an actor-critic deep reinforcement learning network framework for the task offloading scenario in the multi-eNB MEC environment;
step (4), adopting the actor-critic deep reinforcement learning network framework to optimize the joint reward function of task offloading in step (2), obtaining the optimal task offloading solution;
it is a further object of the present invention to provide a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the above-mentioned method.
It is a further object of the present invention to provide a computing device comprising a memory having stored therein executable code and a processor that, when executing the executable code, implements the method described above.
The invention has the beneficial effects that: the invention applies to multi-eNB environments with high-dimensional discrete action spaces in mobile edge computing, covering application scenarios such as traffic monitoring, smart home, real-time vision processing and AI applications, and aims to optimize the long-term energy consumption of the MD so as to save its battery capacity. The invention combines deep reinforcement learning with a deadline constraint built into the reward function, so that the MD can, according to the system state, make the optimal decision for offloading tasks to multiple eNBs within the task-duration limit.
n: the number of eNBs in the MEC;
T_slot: the duration of each time period;
W: the workload of a task;
D: the data size of a task;
λ: the rate at which tasks arrive at the MD;
T_DL: the deadline constraint of a task;
z(τ): the number of tasks arriving at the MD in period τ;
η_i(τ): the data transmission rate from the MD to eNB_i in period τ;
L_i(τ): the task queue processed by eNB_i in period τ;
α_i(τ): the number of tasks offloaded to eNB_i in period τ;
β_i(τ): the number of tasks completed in period τ;
c_i(τ): the computation capacity of the MD (i = 0) or of eNB_i;
d_i(τ): the number of tasks dropped by each eNB and the MD in period τ;
c_0(τ): the computation capacity of the MD in period τ, determined by its own hardware;
E(τ): the total energy consumed by the MD in period τ;
U(τ): the overall utility in period τ;
P(τ): the penalty incurred by all dropped tasks in period τ;
R(τ): the total reward in period τ.
Drawings
FIG. 1 is an architecture for task offloading in a multi-eNB MEC environment;
FIG. 2 is an architectural diagram of DRL-E2D;
FIGS. 3(1)-(3) show the convergence comparison between the DRL-E2D of the present invention and the conventional DQN algorithm when the number of eNBs is 1, 3 and 5, respectively;
FIGS. 4(1)-(3) respectively show the reward, energy consumption and loss cost obtained by the LB, Remote, DRL-E2D, DQN and MD algorithms for different numbers of eNBs;
FIGS. 5(1)-(3) respectively show the reward, energy consumption and loss cost obtained by the LB, Remote, DRL-E2D, DQN and MD algorithms for different task workloads W;
FIGS. 6(1)-(3) respectively show the reward, energy consumption and loss cost obtained by the LB, Remote, DRL-E2D, DQN and MD algorithms for different data sizes D.
Detailed Description
The invention is further analyzed with reference to the following figures.
FIG. 2 is an architectural diagram of DRL-E2D. The DRL-based energy consumption perception task unloading method under the mobile edge computing environment comprises the following steps:
step (1), constructing a task unloading scene under a multi-eNB MEC environment; FIG. 1 is an architecture for task offloading in a multi-eNB MEC environment;
the overall architecture of the task offloading scenario in a multi-eNB MEC environment mainly comprises a single MD and n base stations (eNBs); the MD offloads designated tasks to the base stations while executing the remaining tasks locally;
(1.1) divide the system time into equally spaced time periods; assuming z(τ) tasks arrive at the MD at the beginning of each period, they are treated as an independent and identically distributed sequence, and each arriving task has a constant data size D and execution workload W;
(1.2) define the data transmission rate η_i(τ) from the MD to the ith base station eNB_i in time period τ:

η_i(τ) = B_i log2[1 + SNR_i(τ)]   (1)

where B_i is the bandwidth eNB_i allocates to the MD, SNR_i(τ) = p·g_i(τ)/σ² is the signal-to-noise ratio, p is the transmission power of the MD, σ² is the white Gaussian noise power, and g_i(τ) = ϑ·d_i(τ)^(−θ) is the channel gain, where ϑ and θ denote the path loss constant and the path loss exponent, respectively, and d_i(τ) is the path distance between eNB_i and the MD in time period τ;
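The Shannon-rate model of Eq. (1) can be sketched directly; the parameter values below (path-loss constant, noise power) are illustrative placeholders, not the experiment's settings.

```python
import math

# Sketch of Eq. (1): eta_i = B_i * log2(1 + SNR_i), with
# SNR_i = p * g_i / sigma^2 and channel gain g_i = theta0 * d_i ** (-theta).
# All numeric defaults here are assumptions for illustration.

def channel_gain(d: float, theta0: float = 1e-3, theta: float = 4.0) -> float:
    """g_i(tau) = theta0 * d^(-theta): path-loss constant times distance^-exponent."""
    return theta0 * d ** (-theta)

def tx_rate(bandwidth_hz: float, p_watt: float, d_m: float, noise_watt: float) -> float:
    """eta_i(tau) in bit/s from the Shannon capacity formula."""
    snr = p_watt * channel_gain(d_m) / noise_watt
    return bandwidth_hz * math.log2(1.0 + snr)

print(tx_rate(1e6, 0.5, 10.0, 1e-9))
```

As expected from Eq. (1), the achievable rate falls as the MD-to-eNB distance grows, which is what makes the offloading decision distance-sensitive.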
(1.4) define the task processing queue L_i(τ+1) of eNB_i in the (τ+1)th time slot:

L_i(τ+1) = max{L_i(τ) − β_i(τ), 0} + α_i(τ)   (2)

where α_i(τ) denotes all tasks offloaded to eNB_i and β_i(τ) denotes the tasks eNB_i completes in time period τ;
similarly define the task processing queue L_0(τ+1) of the MD in the (τ+1)th time slot:

L_0(τ+1) = max{L_0(τ) − β_0(τ), 0} + α_0(τ)   (3)

where α_0(τ) is the number of tasks kept local to the MD and β_0(τ) is the number of tasks the MD completes in time period τ;
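The queue recursions (2) and (3) share one update rule, which can be sketched as:

```python
# Sketch of the queue recursions (2)-(3):
# L(tau+1) = max{L(tau) - beta(tau), 0} + alpha(tau)

def next_queue(backlog: int, completed: int, arrived: int) -> int:
    """One-step task-queue update for an eNB (or, with i = 0, the MD itself)."""
    return max(backlog - completed, 0) + arrived

print(next_queue(7, 3, 2))  # backlog 7, 3 tasks finished, 2 newly offloaded -> 6
```

The max{·, 0} clamp matters when a node completes more tasks than it had queued, e.g. after most of its backlog was dropped.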
(1.5) since a task can be executed either on the MD or on an eNB, its execution time and energy consumption are defined for each case separately;
(1.5.1) for a task executed locally on the MD, its execution time and energy consumption are defined as:

T_0(τ) = W / c_0(τ),  E_0(τ) = κ·F(τ)²·W

where c_0(τ) denotes the computing power of an MD with an M-core CPU, defined as c_0(τ) = M·F(τ); κ is a constant related to the chip architecture; F(τ) is the working frequency of each of the M CPU cores; and W is the workload of the task;
(1.5.2) for a task the MD offloads to eNB_i, the data transmission time and the execution time are considered separately; the data transmission time is:

T_i^tx(τ) = D / η_i(τ)

and the energy consumed by data transmission is:

E_i^tx(τ) = p·D / η_i(τ)

when eNB_i receives the task, the task is put into its task processing queue Q_i(τ) according to the first-come-first-served rule; the task execution time is:

T_i^exe(τ) = W / c_i(τ)

where W is the workload of the task and c_i(τ) is the computation capacity of eNB_i;
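The timing and energy formulas of step (1.5) can be sketched as follows. The local-energy expression uses an assumed standard κ·F²·W dynamic-power model (the original formula is an image lost in this text), and units are chosen so that workload is in GHz·s, capacity in GHz, data in MB and rate in Mbit/s.

```python
# Hedged sketch of step (1.5): times and energies for local vs offloaded
# execution. Local compute power is c0 = M * F; transmission uses the rate
# eta_i of Eq. (1). The kappa value is a placeholder.

def local_exec_time(workload_ghz_s: float, m_cores: int, f_ghz: float) -> float:
    """T_0 = W / c_0 with c_0 = M * F (GHz)."""
    return workload_ghz_s / (m_cores * f_ghz)

def local_energy(workload_ghz_s: float, f_ghz: float, kappa: float = 1e-2) -> float:
    """Assumed E_0 = kappa * F^2 * W (kappa: chip-architecture constant)."""
    return kappa * f_ghz ** 2 * workload_ghz_s

def offload_time(data_mb: float, rate_mbps: float,
                 workload_ghz_s: float, c_enb_ghz: float) -> float:
    """Transmission time D/eta_i plus eNB execution time W/c_i."""
    return data_mb * 8.0 / rate_mbps + workload_ghz_s / c_enb_ghz

def tx_energy(data_mb: float, rate_mbps: float, p_watt: float) -> float:
    """E_tx = p * D / eta_i: transmit power times transmission time."""
    return p_watt * data_mb * 8.0 / rate_mbps

print(local_exec_time(25.0, 4, 2.0), offload_time(10.0, 80.0, 25.0, 10.0))
```

With the experiment's nominal values (W = 25 GHz·s, M = 4, F = 2 GHz, c_i = 10 GHz), local execution alone already exceeds 3 s, which illustrates why offloading under a 3 s deadline is attractive.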
step (2), constructing a joint reward function of a task unloading scene under the constraint of deadline under the environment of multiple eNB MECs, specifically as follows:
(2.1) define the total energy consumption E(τ) of the MD in each time period, for executing tasks locally and for offloading tasks to the eNBs, as:

E(τ) = β_0(τ)·E_0(τ) + Σ_{i=1}^{n} α_i(τ)·p·D/η_i(τ)

where β_0(τ)·E_0(τ) is the energy consumed by the β_0(τ) tasks the MD executes locally, and α_i(τ)·p·D/η_i(τ) is the total transmission energy for offloading α_i(τ) tasks to eNB_i;
(2.2) taking the task deadline constraint into account, define the total utility U(τ) of the MD and all base stations as:

U(τ) = u · Σ_{i=0}^{n} Σ_{j=1}^{β_i(τ)} 1{T(t_j) ≤ T_DL}

where n is the number of eNBs in the MEC; T(t_j) is the waiting-plus-execution time of the jth task t_j; T_DL is the deadline of the task; β_0(τ) is the number of tasks the MD completes in period τ; β_i(τ) is the number of tasks the ith base station eNB_i completes in period τ; and u is the utility the MD obtains for each successfully completed task;
(2.3) if a task misses its deadline, it is considered timed out and is discarded by the system, generating a loss; the loss function is defined as:

P(τ) = p_loss · ( d_0(τ) + Σ_{i=1}^{n} d_i(τ) )

where d_0(τ) is the number of tasks dropped by the MD, d_i(τ) is the number of tasks dropped by eNB_i, and p_loss is the per-task drop penalty;
and (2.4) according to steps (2.1)-(2.3), define the optimization problem model of task offloading in this scenario:

Max: R(τ) = U(τ) − P(τ) − E(τ)   (a)
s.t.  α_0(τ) + Σ_{i=1}^{n} α_i(τ) = z(τ)   (b)
      α_i(τ)·D ≤ η_i(τ)·T_slot, i = 1, …, n   (c)
      α_i(τ)·D/η_i(τ) ≤ T_slot, i = 1, …, n   (d)
      β_i(τ)·W ≤ c_i(τ)·T_slot, i = 0, 1, …, n   (e)

where formula (a) is the optimization objective (reward function R(τ)), i.e. maximizing the total utility U(τ) of completed tasks while minimizing the loss function P(τ) and the energy consumption E(τ);
formula (b) is the constraint on the number of offloaded tasks, z(τ) being the number of tasks arriving at the MD in period τ;
formula (c) is the link transmission capacity constraint between the MD and each eNB, η_i(τ) being the data transmission rate from the MD to eNB_i in period τ;
formula (d) is the time constraint on task offloading, T_slot being the duration of each time period;
formula (e) is the computation capability constraint of each base station and of the MD, β_i(τ) being the tasks completed in period τ and c_i(τ) the computation capacity;
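Taken together, the constraints amount to a feasibility check on an offloading vector. The sketch below assumes the per-slot forms reconstructed above and, as a simplification, treats α_i as both the tasks a node accepts and the tasks it must process within the slot:

```python
# Sketch of constraints (b)-(e) as a feasibility test for one slot.
# alpha[0] is the local task count, alpha[1:] the per-eNB counts;
# eta[i] and c[i] are per-node rate and compute capacity (eta[0] unused).

def feasible(alpha, z, d, w, eta, c, t_slot):
    """True iff the offloading vector satisfies (b)-(e) under this model."""
    if sum(alpha) != z:                          # (b): every arrival is assigned
        return False
    for i in range(1, len(alpha)):
        if alpha[i] * d > eta[i] * t_slot:       # (c)/(d): link carries its share
            return False
    for i in range(len(alpha)):
        if alpha[i] * w > c[i] * t_slot:         # (e): node can finish its share
            return False
    return True

print(feasible([1, 1], 2, 1.0, 1.0, [0.0, 10.0], [2.0, 2.0], 1.0))
```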
step (3), constructing an actor-critic deep reinforcement learning network framework for the task offloading scenario in the multi-eNB MEC environment;
the actor-critic deep reinforcement learning network framework takes as the input state s_τ the task processing queues of all eNB_i and of the MD in period τ, the data transmission rates from the MD to all eNB_i, and the total number of tasks arriving at the MD; the action space consists of the task offloading solution together with the computation capacity of the MD; the task offloading solution is the output, and the objective reward function of formula (a) is the reward;
state of time period τ: s_τ = [L_0(τ), L_1(τ), ..., L_i(τ), ..., L_n(τ), η_1(τ), ..., η_i(τ), ..., η_n(τ), z(τ)]
where L_0(τ) is the task processing queue of the MD and L_i(τ) the task processing queue on eNB_i, i = 1, 2, ……, n; η_i(τ) is the data transmission rate between the MD and eNB_i, and z(τ) is the total number of tasks arriving at the MD;
the vector form of each action in the action space is a_τ = [a_0(τ), ..., a_i(τ), ..., a_n(τ), c_0(τ)], i.e. each action contains the number a_0(τ) of tasks the MD keeps locally, the tasks a_i(τ) offloaded to each eNB_i, and the MD computation capacity c_0(τ);
the actor-critic deep reinforcement learning network framework adopts an actor network and a critic network;
the actor network adopts a [100, n+1] structure whose activation function is ReLU; the last layer is the action layer, outputting n+1 values for the different action components; the actor network policy function μ(s_τ|θ^μ) gives the action value obtained in state s_τ; θ^μ is the actor network weight parameter;
the critic network structure is the same as that of the actor network; the critic network evaluation function Q(s_τ, a_τ|θ^Q) gives the expected value obtained after taking action a_τ in state s_τ; θ^Q is the critic network weight parameter;
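The two networks described above can be sketched with a minimal NumPy forward pass: one hidden layer of 100 ReLU units as stated in the text, the actor mapping state to n+1 action components and the critic mapping (state, action) to a scalar Q-value. Initialization, biases and the number of eNBs below are assumptions.

```python
import numpy as np

# Minimal numpy sketch of the actor/critic MLPs: hidden width 100 (per the
# text), ReLU activation; the actor outputs n + 1 action components, the
# critic a scalar Q(s, a). Weight init is an illustrative choice.

rng = np.random.default_rng(0)

def mlp(in_dim, out_dim, hidden=100):
    return {"w1": rng.normal(0, 0.1, (in_dim, hidden)), "b1": np.zeros(hidden),
            "w2": rng.normal(0, 0.1, (hidden, out_dim)), "b2": np.zeros(out_dim)}

def forward(net, x):
    h = np.maximum(x @ net["w1"] + net["b1"], 0.0)   # ReLU hidden layer
    return h @ net["w2"] + net["b2"]

n = 5                        # number of eNBs (illustrative)
state_dim = 2 * n + 2        # queues L_0..L_n, rates eta_1..eta_n, z(tau)
actor = mlp(state_dim, n + 1)          # mu(s | theta_mu): proto-action
critic = mlp(state_dim + n + 1, 1)     # Q(s, a | theta_Q): scalar value

s = rng.normal(size=state_dim)
a = forward(actor, s)                          # continuous proto-action
q = forward(critic, np.concatenate([s, a]))    # critic's value of (s, a)
print(a.shape, q.shape)
```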
step (4), adopting the actor-critic deep reinforcement learning network framework to optimize the joint reward function of task offloading in step (2), obtaining the optimal task offloading solution;
(4.1) randomly initialize the weights θ^μ and θ^Q of the actor network and the critic network, copy them to the target actor network and the target critic network respectively, set the experience replay pool capacity to D, D > 0, and empty the replay pool:

θ^{μ′} ← θ^μ, θ^{Q′} ← θ^Q;

where θ^{μ′} and θ^{Q′} denote the weights of the target actor network and the target critic network, respectively;
(4.2) initializing an MD system environment and distributing tasks to the MD to obtain an initial state value under the current round; the method comprises the following specific steps:
4.2.1 initialize the MD system environment and create a random noise generator N;
4.2.2 allocate z(τ) tasks to the MD, where τ = 0 denotes the initial time period;
4.2.3 obtain the initial state value observed by the MD from the system environment; since no task is running and the MD has not yet offloaded any task to an eNB, when τ = 0 the MD local state is:

s_τ = [L_0(τ), η_0(τ), z(τ)]   (5)
(4.3) run the actor-critic deep reinforcement learning network framework to obtain the optimal-value action for the state in each time period; the specific steps are as follows:
4.3.1 the actor network outputs a prototype action according to the current period state s_τ, the prototype action enters the embedding layer for mapping, and the KNN algorithm extracts the k nearest-neighbor actions; specifically:
4.3.1.1 input the state s_τ into the actor network; based on the input, the actor network applies its policy μ to obtain the output μ(s_τ|θ^μ) and, to increase exploration, adds the search noise N_τ to obtain the prototype action a_p, i.e. a_p = μ(s_τ|θ^μ) + N_τ;
4.3.1.2 to map the action value a_p in continuous space to action values in discrete space, an embedding layer is placed between the actor and the critic; a_p is input into the embedding layer, which outputs d mapped discrete action values; from these the KNN algorithm extracts the set A_k of k nearest neighbors, measured by the Euclidean distance between actions, i.e. A_k = knn(a_p); k may be set to 10;
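The nearest-neighbor lookup of step 4.3.1.2 can be sketched as follows; the table of candidate discrete actions is illustrative (real tables enumerate offloading vectors).

```python
import numpy as np

# Sketch of A_k = knn(a_p): map the actor's continuous proto-action to the
# k nearest discrete actions by Euclidean distance over an action table.

def knn_actions(proto, discrete_actions, k=3):
    """Return the k discrete actions closest to the proto-action."""
    dists = np.linalg.norm(discrete_actions - proto, axis=1)
    return discrete_actions[np.argsort(dists)[:k]]

actions = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [2, 2]], dtype=float)
print(knn_actions(np.array([0.9, 0.2]), actions, k=2))
```

The critic then scores only these k candidates rather than the full discrete space, which is what keeps the method tractable for high-dimensional action spaces.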
4.3.2 the critic network takes all the nearest-neighbor actions obtained in step 4.3.1 and screens them to obtain the optimal-value action; after executing the optimal-value action, the MD saves the current transition to the experience replay pool; specifically:
4.3.2.1 input the actions of A_k into the current critic network; according to the evaluation function Q(s_τ, a|θ^Q), the critic outputs the value of each candidate action in the current state and selects the action with the maximum value as the predicted action of the MD, i.e. a_τ = argmax_{a∈A_k} Q(s_τ, a|θ^Q);
4.3.2.2 the MD executes the task offloading decision according to action a_τ, obtains the return r_τ from the action execution result, observes the new state s_{τ+1}, forms the new sample vector [s_τ, a_τ, r_τ, s_{τ+1}] and stores it in the experience replay pool;
4.3.3 update the network parameters; specifically:
1) randomly sample m samples [s_τ, a_τ, r_τ, s_{τ+1}] from the experience replay pool and feed them to the current actor network, the current critic network, the target actor network and the target critic network;
2) the target actor network outputs the action a′_{τ+1} from the next-period state s_{τ+1}; the target critic network obtains the current target expected value y_τ = r_τ + γ·Q′(s_{τ+1}, a′_{τ+1}|θ^{Q′}) from the state s_{τ+1} and the action a′_{τ+1} output by the target actor network, where γ is the discount factor; y_τ is passed to the mean-square-error loss function;
3) the current critic network outputs the evaluation Q(s_τ, a_τ|θ^Q) from the state s_τ, action a_τ and reward r_τ, giving the sampled policy gradient and the mean-square-error loss L = (1/m)·Σ (y_τ − Q(s_τ, a_τ|θ^Q))²;
4) update all the weights θ^Q and θ^μ of the actor network and the critic network through back-propagation of the neural networks;
5) update the network parameters of the target actor network and the target critic network, namely:

θ^{Q′} ← σθ^Q + (1 − σ)θ^{Q′}
θ^{μ′} ← σθ^μ + (1 − σ)θ^{μ′}

where σ is the network update weight, set to 0.1;
6) the actor network obtains the next-period state s_{τ+1} from the experience replay pool; repeat steps 1) to 6) up to the maximum time period;
4.3.4 repeat steps 4.3.1-4.3.3 until the maximum number of rounds is reached to obtain stable model parameters.
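The soft target-network update of step 5) is a one-liner per parameter; the sketch below uses a toy scalar weight purely for illustration.

```python
# Sketch of step 5): theta' <- sigma * theta + (1 - sigma) * theta',
# with sigma = 0.1 as in the text. Applied to every weight tensor of the
# target actor and target critic after each learning step.

def soft_update(target: dict, online: dict, sigma: float = 0.1) -> None:
    """Move each target weight a fraction sigma toward the online weight."""
    for k in target:
        target[k] = sigma * online[k] + (1.0 - sigma) * target[k]

tgt = {"w": 0.0}
onl = {"w": 1.0}
soft_update(tgt, onl)
print(tgt["w"])
```

Keeping σ small makes the targets in y_τ drift slowly, which stabilizes the critic's regression against its own moving target.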
In order to verify the feasibility of the method, it is compared experimentally with three traditional algorithms (LB, Remote, Local) and the reinforcement learning network DQN.
In this experiment, the computation capacity of each eNB is c_i(τ) = 10 GHz and the transmission power of the MD is fixed; the total working time is 1000 s and the length of each time period is T_slot = 1 s; each task has the same workload W = 25 GHz·s and data size D = 10 MB, and the deadline is T_DL = 3 s; when a task is completed by its deadline, the MD obtains the utility u = 1; the bandwidth of the wireless network is B = 100 MHz, the white Gaussian noise is σ² = −174 dBm/Hz, the path loss constant ϑ is fixed, the path loss exponent is θ = 4, and the distance d of each eNB from the MD is 1000.
The MD has M = 4 CPU cores, each with a working frequency of 2.0 GHz, so the computation power of the MD is c_0(τ) = M·F(τ) = 8 GHz.
The performance of each algorithm is compared under different conditions using three indexes: the reward, the energy consumption, and the loss cost generated by the MD dropping tasks.
1. Convergence comparison
Since the invention applies the KNN algorithm to map actions from continuous space to discrete space, the experiments consider the influence of different k values in the KNN algorithm on convergence, where k = 1 extracts only one action from the prototype action and k = 1% extracts 1% of the action space. FIGS. 3(1)-(3) show the convergence comparison between the proposed DRL-E2D and the conventional DQN algorithm when the number of eNBs is 1, 3 and 5 respectively, with the upper limit of the number of episodes set to 250.
From FIGS. 3(2)-(3) it can be seen that DRL-E2D with k = 1% performs better than with k = 1, because a larger k is more beneficial for the neural network to infer a better next action based on its own policy; it can also be seen that, regardless of the number of eNBs, the convergence performance of DQN is consistently worse than that of DRL-E2D within the same number of episodes.
2. Effect of the number of eNBs
Fig. 4(1) shows that as the number of eNBs increases, the rewards gained by LB, Remote, DRL-E2D and DQN increase, because these algorithms can benefit from offloading tasks to the eNBs; furthermore, with more eNBs, the MD gains more reward by completing more tasks and consuming less energy. Fig. 4(2) shows that the energy consumption of DRL-E2D remains constant regardless of the number of eNBs, since to obtain the maximum reward the MD tends to give up tasks rather than execute them at excessive energy cost. Fig. 4(3) shows that as the number of eNBs increases, the loss cost of all algorithms except Local decreases accordingly.
3. Effect of task workload W
Fig. 5(1) shows that as W increases, the rewards earned by all algorithms gradually decrease, because for a fixed task arrival rate λ a larger W requires more computing resources, resulting in higher energy consumption, fewer completed tasks and lower rewards; DRL-E2D nevertheless consistently outperforms the other algorithms, indicating that it adapts better to changes in W. Fig. 5(2) shows that Remote has the lowest energy consumption, independent of W, while the energy consumption of LB, DRL-E2D and DQN increases with W, since a larger W requires more computing resources and time. Fig. 5(3) shows that as W increases the loss cost of all algorithms increases, with the variation of Local being the most drastic, since it drops more tasks than the other algorithms.
4. Influence of data size D
Fig. 6(1) shows that, except for Local, the rewards earned by the algorithms change with D, because DRL-E2D, LB, Remote and DQN apply task offloading policies; with a larger D the MD spends more energy offloading tasks to the eNBs. Local does not employ task offloading, so its reward is independent of D.
Similarly, the energy consumption and loss cost of the MD in Figs. 6(2)-(3) also gradually increase with D. In this case, since the Remote algorithm offloads all tasks to the eNBs, it drops more tasks and its loss cost grows fastest.
In conclusion, the DRL-E2D algorithm provided by the invention performs well under various conditions.
Claims (10)
1. The DRL-based energy-consumption-aware task offloading method in a mobile edge computing environment, characterized by comprising the following steps:
step (1), constructing a task offloading scenario in a multi-eNB MEC environment;
step (2), constructing a joint reward function for the task offloading scenario under the deadline constraint in the multi-eNB MEC environment, specifically as follows:
(2.1) define the total energy consumption E(τ) of the MD in each time period, for executing tasks locally and for offloading tasks to the eNBs, as:

E(τ) = β_0(τ)·E_0(τ) + Σ_{i=1}^{n} α_i(τ)·p·D/η_i(τ)

where β_0(τ)·E_0(τ) is the energy consumed by the β_0(τ) tasks the MD executes locally, and α_i(τ)·p·D/η_i(τ) is the total transmission energy for offloading α_i(τ) tasks to eNB_i;
(2.2) taking the task deadline constraint into account, define the total utility U(τ) of the MD and all base stations as:

U(τ) = u · Σ_{i=0}^{n} Σ_{j=1}^{β_i(τ)} 1{T(t_j) ≤ T_DL}

where n is the number of eNBs in the MEC; T(t_j) is the waiting-plus-execution time of the jth task t_j; T_DL is the deadline of the task; β_0(τ) is the number of tasks the MD completes in period τ; β_i(τ) is the number of tasks the ith base station eNB_i completes in period τ; and u is the utility the MD obtains for each successfully completed task;
(2.3) if a task misses its deadline, it is considered timed out and is discarded by the system, generating a loss; the loss function is defined as:

P(τ) = p_loss · ( d_0(τ) + Σ_{i=1}^{n} d_i(τ) )

where d_0(τ) is the number of tasks dropped by the MD, d_i(τ) is the number of tasks dropped by eNB_i, and p_loss is the per-task drop penalty;
and (2.4) defining an optimization problem model of task unloading under the scene according to the steps (2.1) to (2.3):
Max:R(τ)=U(τ)-P(τ)-E(τ) (a)
step (3) all eNBs under the task unloading scene under the multi-eNB MEC environmentiAnd a task processing queue of the time period tau of the MD, and the MD sends to all the eNBsiThe data transmission rate and the total number of tasks reached by MD are input states sτConstructing an operator-critical deep reinforcement learning network framework by taking a task unloading solution and the calculation capacity of the MD as an action space, taking the task unloading solution as an output and taking a target reward function of a formula (a) as a reward;
the state of time period τ is sτ=[L0(τ),L1(τ),...,Li(τ),...,Ln(τ),η1(τ),...,ηi(τ),...,ηn(τ),z(τ)];
where L0(τ) represents the task processing queue on the MD, Li(τ) represents the task processing queue on eNBi, i = 1, 2, ..., n; ηi(τ) represents the data transmission rate between the MD and eNBi; z(τ) represents the total number of tasks arriving at the MD;
the vector form of each action in the action space is aτ=[a0(τ),...,ai(τ),...,an(τ),c0(τ)], i.e., each action contains the number of tasks the MD keeps locally a0(τ), the number of tasks offloaded to each eNBi ai(τ), and the computation capacity of the MD c0(τ);
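For concreteness, the state and action vectors defined above can be assembled as flat lists; this is a sketch under the claim's definitions, where n eNBs yield a state of length 2n+2 and an action of length n+2:

```python
def build_state(L0, L, eta, z):
    """s_tau = [L0, L1..Ln, eta1..etan, z]; len(L) == len(eta) == n."""
    return [L0] + list(L) + list(eta) + [z]

def build_action(a0, a, c0):
    """a_tau = [a0, a1..an, c0]: tasks kept locally, tasks offloaded
    to each eNB_i, and the MD computation capacity."""
    return [a0] + list(a) + [c0]
```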
And (4), adopting the actor-critic deep reinforcement learning network framework to optimize the joint reward function for task offloading in step (2), obtaining the optimal task offloading solution.
2. The method for DRL-based energy consumption aware task offloading in a mobile edge computing environment according to claim 1, wherein step (1) is specifically as follows:
the overall architecture of the task offloading scenario in the multi-eNB MEC environment mainly comprises a single MD and n base stations eNB; the MD sends designated tasks to each base station for offloading while also executing tasks locally;
(1.1) dividing the system time into time periods of equal length; assuming z(τ) tasks arrive at the MD at the beginning of each time period, the arrivals are regarded as an independent and identically distributed sequence, and each arriving task has constant data size D and execution workload W;
(1.2) defining the data transmission rate ηi(τ) from the MD to the i-th base station eNBi in time period τ:
ηi(τ)=Bilog2[1+SNRi(τ)] (1)
where Bi represents the bandwidth eNBi allocates to the MD; SNRi(τ)=p^tx·gi(τ)/σ² represents the signal-to-noise ratio, in which p^tx represents the transmission power of the MD and σ² represents the Gaussian white noise power; gi(τ) represents the channel gain, defined as gi(τ)=θ·di(τ)^(−ϑ), where θ and ϑ denote the path loss constant and the path loss exponent, respectively, and di(τ) represents the path distance between eNBi and the MD in time period τ;
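Formula (1) is the standard Shannon-capacity rate. Below is a sketch of its evaluation, assuming the power-law channel-gain model described in step (1.2):

```python
import math

def channel_gain(theta, d, vartheta):
    """g_i(tau) = theta * d^(-vartheta): path-loss constant times
    distance raised to the negative path-loss exponent."""
    return theta * d ** (-vartheta)

def data_rate(B, p_tx, g, sigma2):
    """eta_i(tau) = B_i * log2(1 + SNR_i(tau)), SNR = p_tx * g / sigma^2."""
    snr = p_tx * g / sigma2
    return B * math.log2(1 + snr)
```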
(1.4) defining the task processing queue Li(τ+1) of eNBi in the (τ+1)-th time period:
Li(τ+1)=max{Li(τ)-βi(τ),0}+αi(τ) (2)
where αi(τ) represents all tasks offloaded to eNBi in time period τ, and βi(τ) represents the tasks eNBi processes and completes in time period τ;
defining the task processing queue L0(τ+1) of the MD in the (τ+1)-th time period:
L0(τ+1)=max{L0(τ)-β0(τ),0}+α0(τ) (3)
where α0(τ) is the tasks the MD keeps locally, and β0(τ) is the tasks the MD processes and completes in the τ-th time period;
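Formulas (2) and (3) share the same backlog recursion: completed tasks are removed (never driving the queue below zero), then new arrivals are added. A minimal sketch:

```python
def queue_update(L, beta, alpha):
    """L(tau+1) = max{L(tau) - beta(tau), 0} + alpha(tau):
    drain completed tasks, clamp at zero, then enqueue new arrivals."""
    return max(L - beta, 0) + alpha
```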
3. The method for DRL-based energy consumption aware task offloading in a mobile edge computing environment according to claim 2, wherein step (1.5) is specifically as follows:
(1.5.1) for the case where a task is executed locally on the MD, its execution time and energy consumption are defined as:
T0^ex(τ)=W/c0(τ) (4)
E0^ex(τ)=p0·T0^ex(τ) (5)
where p0 represents the computing power of the MD with an M-core CPU, defined as p0=κ·M·f(τ)³, in which κ represents a constant related to the chip architecture; f(τ) represents the operating frequency of the M-core CPU; M represents the number of cores; c0(τ) represents the computation capacity of the MD, denoted c0(τ)=M·f(τ); W represents the workload of the task;
(1.5.2) for the case where the MD offloads a task to eNBi, the data transmission time and the execution time need to be considered separately; the data transmission time is defined as:
Ti^tx(τ)=D/ηi(τ) (6)
meanwhile, the energy consumed by data transmission is defined as:
Ei^tx(τ)=p^tx·Ti^tx(τ) (7)
when eNBiReceiving a taskThen, the task is put into the self task processing queue Q according to the rule of first-come-first-obtainedi(τ); defining the task execution time as follows:
Ti^ex(τ)=W/ci(τ) (8)
where W represents the workload of a task and ci(τ) represents the computation capacity of eNBi.
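The per-task delay and energy terms of step (1.5) can be collected into small helpers. This is a minimal sketch assuming transmission energy is the MD's transmission power multiplied by the transmission time:

```python
def tx_time(D, eta):
    """T_i^tx = D / eta_i: task data size over the transmission rate."""
    return D / eta

def tx_energy(p_tx, D, eta):
    """E_i^tx = p_tx * T_i^tx: transmission power times transmission time
    (assumed energy model; the claim body omits the formula)."""
    return p_tx * tx_time(D, eta)

def exec_time(W, c):
    """T^ex = W / c: task workload over computation capacity."""
    return W / c
```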
4. The method for DRL-based energy consumption aware task offloading in a mobile edge computing environment according to claim 1, wherein step (4) is specifically as follows:
(4.1) randomly initializing the weights θ^μ and θ^Q of the actor network and the critic network, copying the weights to the target actor network and the target critic network respectively, setting the experience replay pool capacity to D, D > 0, and emptying the experience replay pool;
θ^μ′←θ^μ, θ^Q′←θ^Q;
where θ^μ′ and θ^Q′ represent the weights of the target actor network and the target critic network, respectively;
(4.2) initializing the MD system environment and distributing tasks to the MD to obtain the initial state value of the current episode;
and (4.3) running the actor-critic deep reinforcement learning network framework to obtain the optimal-value action for the state in each time period.
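Step (4.1) hard-copies the online actor/critic weights into their targets. During training, actor-critic methods of this family typically also soften subsequent target updates with a factor (a standard DDPG-style detail the claim does not spell out). A sketch with weights represented as plain dicts:

```python
def hard_copy(weights):
    """theta' <- theta: initialize a target network from the online network."""
    return dict(weights)

def soft_update(target, online, tau_soft):
    """theta' <- tau_soft * theta + (1 - tau_soft) * theta'
    (assumed standard soft-update rule; not stated in the claim)."""
    return {k: tau_soft * online[k] + (1 - tau_soft) * target[k]
            for k in target}
```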
5. The method for DRL-based energy consumption aware task offloading in a mobile edge computing environment according to claim 4, wherein the step (4.2) is specifically as follows:
4.2.1 initializing the MD system environment and generating a random noise generator N;
4.2.2 allocating z(τ) tasks to the MD, where τ = 0 denotes the initial time period;
4.2.3 obtaining the initial state value observed by the MD from the system environment, i.e., the local state of the MD before any task runs; since the MD has not yet offloaded any task to an eNB at this point, when τ = 0 the local state of the MD is:
sτ=[L0(τ),η0(τ),z(τ)] (9).
6. The method for DRL-based energy consumption aware task offloading in a mobile edge computing environment according to claim 4, wherein step (4.3) is specifically as follows:
4.3.1 the actor network outputs a prototype action according to the current time period state sτ; the prototype action enters the embedding layer for mapping, and the k nearest-neighbor value actions are extracted with the KNN algorithm;
4.3.2 the critic network obtains all the nearest-neighbor value actions produced in step 4.3.1 and screens them to obtain the optimal-value action; after executing the optimal-value action, the MD saves the current transition to the experience replay pool;
4.3.3 updating network parameters;
4.3.4 repeating steps 4.3.1 to 4.3.3 until the maximum number of episodes is reached, obtaining stable model parameters.
7. The method for DRL-based energy consumption aware task offloading in a mobile edge computing environment according to claim 6, wherein the step (4.3.1) is specifically as follows:
4.3.1.1 inputting the state sτ into the actor network; the actor network obtains the output according to the policy π, and, to increase exploration randomness, the exploration noise Nτ is added to obtain the prototype action ap, i.e.
ap=π(sτ|θ^μ)+Nτ
4.3.1.2 to map the action value ap in continuous space to action values ap′ in discrete space, an embedding layer is arranged between the actor and the critic; the obtained ap is input into the embedding layer, which outputs d mapped values ap′; from the d mapped action values ap′, the KNN algorithm extracts the k-nearest-neighbor value set Ak, measured by the Euclidean distance between actions, i.e., Ak=knn(ap′).
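The mapping in step 4.3.1.2 follows the Wolpertinger pattern: a continuous prototype action is snapped to its k nearest discrete actions by Euclidean distance. A minimal sketch over an explicit discrete action set:

```python
def knn_actions(proto, discrete_actions, k):
    """A_k = knn(a_p'): the k discrete actions closest to the prototype
    action by Euclidean distance in the action space."""
    def sq_dist(a):
        # squared Euclidean distance; the ordering equals that of the
        # true distance, so the square root is unnecessary
        return sum((x - y) ** 2 for x, y in zip(a, proto))
    return sorted(discrete_actions, key=sq_dist)[:k]
```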
8. The method for DRL-based energy consumption aware task offloading in a mobile edge computing environment according to claim 6, wherein step (4.3.2) is specifically as follows:
4.3.2.1 inputting the actions in Ak into the current critic network respectively; according to the value function Q, the critic outputs the value of each action in Ak in the current state, and the action ax with the maximum value is selected as the predicted action of the MD, i.e.
aτ=argmax_{a∈Ak} Q(sτ,a|θ^Q)
4.3.2.2 the MD executes the task offloading decision according to action aτ, obtains the reward rτ from the execution result, observes the new state sτ+1, forms the new sample vector [sτ,aτ,rτ,sτ+1], and stores it in the experience replay pool.
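Steps 4.3.2.1 and 4.3.2.2 screen the candidate set with the critic and then store the transition. A sketch with the critic abstracted as a callable Q(s, a) and a bounded experience replay pool (names are illustrative):

```python
import random
from collections import deque

def best_action(q_fn, state, candidates):
    """argmax over A_k of Q(s, a): the critic screens the KNN candidates."""
    return max(candidates, key=lambda a: q_fn(state, a))

class ReplayPool:
    """Experience replay pool of capacity D; oldest samples are evicted."""
    def __init__(self, capacity):
        self.pool = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        # one transition [s_tau, a_tau, r_tau, s_tau+1]
        self.pool.append((s, a, r, s_next))

    def sample(self, batch_size):
        # uniform minibatch for network updates
        return random.sample(self.pool, batch_size)
```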
9. A computer-readable storage medium, having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any of claims 1-8.
10. A computing device comprising a memory having stored therein executable code and a processor that, when executing the executable code, implements the method of any of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110481249.9A CN113157344B (en) | 2021-04-30 | 2021-04-30 | DRL-based energy consumption perception task unloading method in mobile edge computing environment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113157344A true CN113157344A (en) | 2021-07-23 |
CN113157344B CN113157344B (en) | 2022-06-14 |
Family
ID=76872731
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110481249.9A Active CN113157344B (en) | 2021-04-30 | 2021-04-30 | DRL-based energy consumption perception task unloading method in mobile edge computing environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113157344B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109362064A (en) * | 2018-09-14 | 2019-02-19 | 重庆邮电大学 | The task buffer allocation strategy based on MEC in mobile edge calculations network |
CN110262845A (en) * | 2019-04-30 | 2019-09-20 | 北京邮电大学 | The enabled distributed computing task discharging method of block chain and system |
CN112015481A (en) * | 2020-06-04 | 2020-12-01 | 湖南大学 | Multi-Agent reinforcement learning-based mobile edge calculation unloading algorithm |
CN112261674A (en) * | 2020-09-30 | 2021-01-22 | 北京邮电大学 | Performance optimization method of Internet of things scene based on mobile edge calculation and block chain collaborative enabling |
Non-Patent Citations (2)
Title |
---|
HU Haiyang et al., "Multi-objective optimization of task scheduling in a mobile cloud computing environment", Journal of Computer Research and Development, 15 September 2017 (2017-09-15) *
ZHAN Wenhan, "Research on computation offloading scheduling and resource management strategy optimization in mobile edge networks", CNKI China Doctoral Dissertations Full-text Database, 15 July 2020 (2020-07-15) *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113590229A (en) * | 2021-08-12 | 2021-11-02 | 中山大学 | Industrial Internet of things graph task unloading method and system based on deep reinforcement learning |
CN113590229B (en) * | 2021-08-12 | 2023-11-10 | 中山大学 | Industrial Internet of things graph task unloading method and system based on deep reinforcement learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113242568B (en) | Task unloading and resource allocation method in uncertain network environment | |
CN113950066B (en) | Single server part calculation unloading method, system and equipment under mobile edge environment | |
CN113543176B (en) | Unloading decision method of mobile edge computing system based on intelligent reflecting surface assistance | |
CN110928654B (en) | Distributed online task unloading scheduling method in edge computing system | |
CN112380008B (en) | Multi-user fine-grained task unloading scheduling method for mobile edge computing application | |
CN111556461A (en) | Vehicle-mounted edge network task distribution and unloading method based on deep Q network | |
US11784931B2 (en) | Network burst load evacuation method for edge servers | |
CN111367657A (en) | Computing resource collaborative cooperation method based on deep reinforcement learning | |
CN111813506A (en) | Resource sensing calculation migration method, device and medium based on particle swarm algorithm | |
CN112214301B (en) | Smart city-oriented dynamic calculation migration method and device based on user preference | |
CN114285853A (en) | Task unloading method based on end edge cloud cooperation in equipment-intensive industrial Internet of things | |
CN116489712B (en) | Mobile edge computing task unloading method based on deep reinforcement learning | |
EP4024212A1 (en) | Method for scheduling interference workloads on edge network resources | |
CN113485826A (en) | Load balancing method and system for edge server | |
CN113573363A (en) | MEC calculation unloading and resource allocation method based on deep reinforcement learning | |
CN113590279A (en) | Task scheduling and resource allocation method for multi-core edge computing server | |
CN116112488A (en) | Fine-grained task unloading and resource allocation method for MEC network | |
CN113157344B (en) | DRL-based energy consumption perception task unloading method in mobile edge computing environment | |
CN113946423A (en) | Multi-task edge computing scheduling optimization method based on graph attention network | |
CN116009990B (en) | Cloud edge collaborative element reinforcement learning computing unloading method based on wide attention mechanism | |
CN117579701A (en) | Mobile edge network computing and unloading method and system | |
CN116193516A (en) | Cost optimization method for efficient federation learning in Internet of things scene | |
CN110768827A (en) | Task unloading method based on group intelligent algorithm | |
Yao et al. | Performance Optimization in Serverless Edge Computing Environment using DRL-Based Function Offloading | |
CN114520772B (en) | 5G slice resource scheduling method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||