CN116418808A - Joint computing offloading and resource allocation method and device for MEC


Info

Publication number
CN116418808A
Authority
CN
China
Prior art keywords
network
computing
task
edge server
edge
Prior art date
Legal status
Pending
Application number
CN202111639639.0A
Other languages
Chinese (zh)
Inventor
赵英宏
Current Assignee
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date
Filing date
Publication date
Application filed by China Telecom Corp Ltd
Priority to CN202111639639.0A
Publication of CN116418808A

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00: Network arrangements or protocols for supporting network services or applications
    • H04L 67/01: Protocols
    • H04L 67/10: Protocols in which an application is distributed across nodes in the network
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00: Computer-aided design [CAD]
    • G06F 30/20: Design optimisation, verification or simulation
    • G06F 30/27: Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027: Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061: Partitioning or combining of resources
    • G06F 9/5072: Grid computing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/08: Configuration management of networks or network elements
    • H04L 41/0803: Configuration setting
    • H04L 41/0823: Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
    • H04L 41/083: Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for increasing network speed
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00: Network arrangements or protocols for supporting network services or applications
    • H04L 67/01: Protocols
    • H04L 67/10: Protocols in which an application is distributed across nodes in the network
    • H04L 67/1001: Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L 67/1004: Server selection for load balancing
    • H04L 67/101: Server selection for load balancing based on network conditions
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2209/00: Indexing scheme relating to G06F9/00
    • G06F 2209/50: Indexing scheme relating to G06F9/50
    • G06F 2209/502: Proximity
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00: Reducing energy consumption in communication networks
    • Y02D 30/70: Reducing energy consumption in communication networks in wireless communication networks

Abstract

The invention discloses a joint computing offloading and resource allocation method and device for MEC. Taking minimization of the overall network delay as the optimization target, an improved TD3 algorithm (ITD3) based on a deep reinforcement learning model is designed to form a joint scheduling policy for task offloading decisions and edge-server resource allocation, thereby ensuring high stability and reliability of the user service scheduling process, enabling flexible service scheduling and on-demand resource allocation, and improving the quality of user service.

Description

Joint computing offloading and resource allocation method and device for MEC
Technical Field
The present invention relates to the field of emerging information technologies, and in particular to a joint computing offloading and resource allocation method and apparatus for multi-access edge computing (MEC).
Background
To meet the low-latency requirements of 5G application scenarios, multi-access edge computing (MEC) has become a trend. MEC provides computing services at the network edge: in an MEC system, the MEC server is closer to internet-of-things devices than a traditional cloud server, and the distributed structure of MEC servers keeps data transmission free from congestion, greatly reducing transmission delay. Compared with cloud computing, MEC can therefore support delay-critical services and various internet-of-things applications. Transferring computing tasks to edge servers with relatively rich resources improves computing quality of service (QoS) and enhances the ability of terminal devices to obtain the resources they request. Compared with traditional cloud servers, however, MEC servers have fewer resources and a more dynamic environment, so devices must contend for the server's limited computing resources. For such resource-constrained systems, resource allocation and scheduling (e.g., server selection, offloading-rate allocation, and local computing power) is therefore very important. To achieve efficient utilization of computing resources and meet the computing needs of devices, an intelligent computing offloading policy is needed; computational offloading has thus attracted the attention of many researchers.
Disclosure of Invention
The embodiments of the invention provide a joint computing offloading and resource allocation method and apparatus for MEC, used to ensure high stability and reliability of the user service scheduling process, enable flexible service scheduling and on-demand resource allocation, and improve the quality of user service.
The joint computing offloading and resource allocation method for MEC provided by an embodiment of the invention comprises the following steps:
abstractly modeling the network state according to task characteristics obtained in real time and the resource loads of all edge servers in the mobile edge network, and establishing a deep reinforcement learning model, the mobile edge network consisting of several user equipments, several edge servers, and a central controller;
after acquiring network state information in the mobile edge network at regular intervals, training the deep reinforcement learning model with an ITD3 algorithm;
when a computing task uploaded by the user equipment is received, making offloading and resource allocation decisions with the ITD3-trained deep reinforcement learning model according to the most recently updated network state information, taking minimization of the computing task's processing delay as the optimization target, to obtain a decision result;
and sending the offloading result of the computing task to the edge server accessed by the computing task, and sending the resource scheduling decision to the edge server allocated to execute the computing task.
In one embodiment, the network state information includes network resource group information for each edge server;
the network resource group information of edge server $s$ is recorded as $\{F_s, f_s(t), Q_s, q_s(t), B_s(t)\}$, where $F_s$ and $f_s(t)$ are respectively the total computing resources of edge server $s$ and its computing-resource load in time slot $t$, $Q_s$ and $q_s(t)$ are respectively its total storage resources and its storage-resource load in time slot $t$, and $B_s(t)$ is its communication resource in time slot $t$.
In one embodiment, the state space, action space, and reward function of the deep reinforcement learning model are set as follows:
the state space is $s(t)=\{f(t), q(t), g(t), W_k(t), D_k(t), T_k(t)\}$, where $f(t)=\{f_1(t),f_2(t),\dots,f_S(t)\}$ represents the computing-resource allocation of each edge server at time $t$, with $0 \le f_s(t) \le F_s$; $q(t)=\{q_1(t),q_2(t),\dots,q_S(t)\}$ represents the storage-resource allocation of each edge server at time $t$, with $0 \le q_s(t) \le Q_s$; $g(t)=\{g_{i,j}(t) \mid i,j \in S, i \neq j\}$ represents the channel gain of the communication links between edge servers at time $t$, with $g_{i,j}(t)=h_{i,j}(t)\,d_{i,j}^{-\zeta}$, where $h_{i,j}(t)$ is the channel coefficient of channel $(i,j)$ at time $t$, $d_{i,j}$ is the Euclidean distance between edge server $i$ and edge server $j$, and $\zeta$ is the path-loss factor; $(W_k(t), D_k(t), T_k(t))$ are respectively the workload, data size, and deadline of computing task $k(t)$;
the action space is $a(t)=\{\alpha_{k,s}(t), f_s(t)\}$, where $\alpha_{k,s}(t)$ is a binary offloading decision variable, $\alpha_{k,s}(t)=1$ indicating that computing task $k(t)$ is offloaded to edge server $s$, and $f_s(t)$ is the computing resource allocated by edge server $s$ for executing computing tasks in time slot $t$;
the reward function is $r(t) = -\big(J(t) + C_0 \sum_{s \in S}(1-\alpha_{k,s}(t))\big)$, where $J(t)$, the objective function of the optimization problem, is the maximum processing delay over all computing tasks, $C_0 \sum_{s \in S}(1-\alpha_{k,s}(t))$ is the penalty function, $C_0$ is the penalty factor, and $\sum_{s \in S}(1-\alpha_{k,s}(t))$ represents the number of rejected tasks.
In one embodiment, the processing delay of the computing task to be minimized is calculated as follows:
$\tau_{k,s}(t) = \dfrac{D_k(t)}{R_{k,s}} + \dfrac{W_k(t)}{f_s}$
where $D_k(t)$ is the input data of computing task $k$, $R_{k,s}$ is the link rate, $W_k(t)$ is the workload, and $f_s$ is the computing resource allocated by edge server $s$ to computing task $k$.
In one embodiment, the constraint condition of the processing delay minimization calculation formula of the calculation task is as follows:
C 1 :
Figure BDA0003443078410000037
C 2 :
Figure BDA0003443078410000038
C 3 :
Figure BDA0003443078410000039
C 4 :
Figure BDA00034430784100000310
C 5 :
Figure BDA00034430784100000311
C 6 :
Figure BDA00034430784100000312
wherein constraint C1 defines the computing task as binary offload; constraint C2 defines deadline constraints for each of the computing tasks; constraint C3 defines that the sum of computing resources occupied by the executing computing task is not greater than the computing resources of the edge server; constraint C4 defines that the sum of storage resources occupied by all tasks offloaded to the edge server is not greater than the storage resources of the edge server; constraint C5 defines that a computing task can be offloaded to at most one edge server; constraint C6 defines the constant positive of the allocated computing resources.
In one embodiment, training the deep reinforcement learning model with the ITD3 algorithm specifically includes:
constructing a main network $N$ and a target network $N'$, each comprising two critic sub-networks $Q_{\theta_1}, Q_{\theta_2}$ and an actor sub-network $\pi_\phi$;
randomly assigning the parameters $\theta_1, \theta_2$ and the actor parameters $\phi$ to the main network $N$ and the target network $N'$ in the initial state, and initializing an experience replay pool $\mathcal{B}$;
when the training execution count $t$ is judged to be greater than a set value, changing the initial state $s$ to the best state observed so far;
selecting an action according to the deterministic policy plus exploration noise, $a \leftarrow \pi_\phi(s) + \epsilon$ with $\epsilon \sim \mathcal{N}(0,\sigma)$;
after executing the action and obtaining the next state, storing the sample $(s, a, r, s')$ in the experience replay pool $\mathcal{B}$;
randomly sampling $N$ samples $(s, a, r, s')$ from the experience replay pool for review, and updating the action according to the secondary update formula $\tilde{a} \leftarrow \pi_{\phi'}(s') + \epsilon$ with $\epsilon \sim \mathrm{clip}(\mathcal{N}(0,\tilde{\sigma}), -c, c)$;
selecting the smaller of the Q-values of the two critic sub-networks to compute the target $y \leftarrow r + \gamma \min_{i=1,2} Q_{\theta'_i}(s', \tilde{a})$;
updating the parameters of the critic sub-networks by gradient descent, $\theta_i \leftarrow \arg\min_{\theta_i} N^{-1} \sum \big(y - Q_{\theta_i}(s,a)\big)^2$;
after the execution count $t$ reaches a set threshold $d$, updating the parameters of the actor sub-network according to the deterministic policy gradient $\nabla_\phi J(\phi) = N^{-1} \sum \nabla_a Q_{\theta_1}(s,a)\big|_{a=\pi_\phi(s)} \nabla_\phi \pi_\phi(s)$;
updating the parameters of the target network towards those of the main network with the soft update policy $\theta'_i \leftarrow \tau \theta_i + (1-\tau)\theta'_i$, $\phi' \leftarrow \tau \phi + (1-\tau)\phi'$;
and repeating the training process until the parameters converge.
In one embodiment, the decision result includes a node map for computing-task offloading to edge servers (task ID → node IP) and a data map for resource allocation (node IP → {computing resource: X, storage resource: Y, …}).
In another aspect, an embodiment of the present invention further provides a server, comprising:
a modeling unit, configured to abstractly model the network state according to task characteristics acquired in real time and the resource loads of all edge servers in the mobile edge network, and establish a deep reinforcement learning model, the mobile edge network consisting of several user equipments, several edge servers, and a central controller;
a training unit, configured to train the deep reinforcement learning model with an ITD3 algorithm after acquiring network state information in the mobile edge network at regular intervals;
a processing unit, configured to, when a computing task uploaded by the user equipment is received, make offloading and resource allocation decisions with the ITD3-trained deep reinforcement learning model according to the most recently updated network state information, taking minimization of the computing task's processing delay as the optimization target, to obtain a decision result;
and a sending unit, configured to send the offloading result of the computing task to the edge server accessed by the computing task, and send the resource scheduling decision to the edge server allocated to execute the computing task.
In another aspect, an embodiment of the present invention further provides an electronic device, comprising at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the joint computing offloading and resource allocation method for MEC provided by the embodiments of the present invention.
In another aspect, an embodiment of the present invention further provides a computer storage medium storing a computer program for executing the joint computing offloading and resource allocation method for MEC provided by the embodiments of the present invention.
The invention has the following beneficial effects:
The joint computing offloading and resource allocation method and apparatus for MEC provided by the embodiments of the invention take minimization of the overall network delay as the optimization target and, by designing an improved TD3 algorithm (ITD3) based on a deep reinforcement learning model, form a joint scheduling policy for task offloading decisions and edge-server resource allocation, thereby ensuring high stability and reliability of the user service scheduling process, enabling flexible service scheduling and on-demand resource allocation, and improving the quality of user service.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings needed in the description of the embodiments are briefly introduced below. It is apparent that the drawings in the following description are only some embodiments of the present disclosure, and that a person of ordinary skill in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a method for joint computing offloading and resource allocation of MECs provided by an embodiment of the present invention;
fig. 2 is a network topology diagram of a multi-access edge computing network according to an embodiment of the present invention;
FIG. 3 is a flowchart of an ITD3 algorithm provided by an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a joint computing offloading and resource allocation apparatus of an MEC according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are some embodiments of the present disclosure, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without inventive effort, based on the embodiments in this disclosure are intended to be within the scope of this disclosure.
The term "and/or" in the embodiments of the present disclosure describes an association relationship of association objects, which indicates that three relationships may exist, for example, a and/or B may indicate: a exists alone, A and B exist together, and B exists alone. The character "/" generally indicates that the context-dependent object is an "or" relationship.
The application scenario described in the embodiments of the present disclosure is for more clearly describing the technical solution of the embodiments of the present disclosure, and does not constitute a limitation on the technical solution provided by the embodiments of the present disclosure, and as a person of ordinary skill in the art can know that, with the appearance of a new application scenario, the technical solution provided by the embodiments of the present disclosure is equally applicable to similar technical problems. In the description of the present disclosure, unless otherwise indicated, the meaning of "a plurality" is two or more.
For computing offloading, academia has proposed many methods for designing computing offloading policies. Most prior work adopts optimization-based or game-based methods to solve the computing offloading problem, such as Lyapunov optimization, the branch-and-bound algorithm and its refinements (e.g., row/column generation, Benders decomposition), bio-inspired heuristics (e.g., multi-swarm particle swarm optimization, simulated annealing), and combinations of several conventional optimization algorithms. However, since the joint computing offloading and resource allocation problem is a mixed-integer nonlinear programming (MINLP) problem and therefore NP-hard, conventional optimization algorithms can obtain only approximately optimal solutions, and it is difficult to bound mathematically the gap between the feasible solution obtained and the optimal solution. In addition, these conventional algorithms all require prior knowledge of environmental statistics, which is unavailable in the dynamic MEC systems of real scenarios. Moreover, they do not consider the influence of past service scheduling decisions on subsequent ones; limited by the algorithms themselves, the correlation between service scheduling policies is neglected, so each scheduling decision is made independently and the current decision is hard to guarantee optimal in the long term.
To address the problems of conventional algorithms, some researchers model the computing offloading problem as a Markov decision process (MDP) and solve it with reinforcement learning (RL) or deep reinforcement learning (DRL). Many scholars at home and abroad use classical deep reinforcement learning models such as D3QN (Dueling Double DQN, value-based), TRPO (Trust Region Policy Optimization, policy-based), and DDPG (Deep Deterministic Policy Gradient, actor-critic) to solve the joint computing offloading and resource allocation problem, and these methods achieve good performance without prior knowledge of environmental statistics. However, they model either a purely discrete or a purely continuous action space, which limits the optimization of offloading decisions to a restricted action space. In reality, the action space of the offloading problem is often a continuous-discrete mixture, and each device must jointly decide continuous and discrete actions to complete the offloading process. For example, a device must decide not only whether to offload a task and which server to select, but also the offloading rate or local computing power, so as to balance time and energy consumption. When the action space becomes large, these methods may therefore fail to discretize continuous actions finely, while relaxing discrete actions into a continuous set significantly increases the complexity of the action space.
In view of these technical problems in the prior art, the invention provides a joint computing offloading and resource allocation method for MEC. Taking minimization of the overall network delay as the optimization target, an improved TD3 algorithm (ITD3) based on a deep reinforcement learning model is designed to form a joint scheduling policy for task offloading decisions and edge-server resource allocation, ensuring high stability and reliability of the user service scheduling process, enabling flexible service scheduling and on-demand resource allocation, and improving the quality of user service.
Specifically, the joint computing offloading and resource allocation method for MEC provided by the embodiment of the invention, as shown in fig. 1, comprises the following steps:
s1, abstract modeling is carried out on the network state according to the task characteristics obtained in real time and the resource loads of all edge servers in the mobile edge network, and a deep reinforcement learning model is built. In a mobile edge network composed of user equipment, an edge server and a central controller, each user equipment generates a calculation task according to requirements and sends the calculation task to the edge server for processing. And the central controller performs abstract modeling on the network state according to the real-time task characteristics and the resource load.
Specifically, the mobile edge network is an end-edge-cloud network composed of several user equipments UE, several edge servers (MEC servers) and one central controller (core network), and operates in a time slot structure, and its basic topology is shown in fig. 2. The total of K user devices is represented by k= {1,2, …, and K }, the user devices do not have the capability of processing the computing task and only take charge of uploading the computing task to the edge server connected with the computing task nearby. For user equipment k, its computation task generated in time slot t is denoted as k (t), and is defined by tuple (W k (t),D k (t),T k (t)) wherein W k (t) is the workload, D k (T) is the input data size, T k And (t) is a deadline. One or more ofThe computing tasks are inseparable, and each user device can upload one computing task to one edge server at most, regardless of the situation of redundancy offloading. The edge servers are S in total, denoted by s= {1,2, …, S }, each edge server has a certain network resource, and can process computing tasks from multiple user devices at the same time.
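As a minimal illustration of the task tuple (this sketch is not part of the embodiment itself; the class and field names are assumptions), a computing task can be represented as follows:

```python
from dataclasses import dataclass

@dataclass
class ComputeTask:
    """Computing task k(t) described by the tuple (W_k(t), D_k(t), T_k(t))."""
    workload: float    # W_k(t): required computation, e.g. CPU megacycles
    data_size: float   # D_k(t): input data size, e.g. MB
    deadline: float    # T_k(t): completion deadline, e.g. seconds

# Example: an inseparable task uploaded by one user device in slot t
task = ComputeTask(workload=500.0, data_size=2.0, deadline=0.5)
```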
S2: After acquiring network state information in the mobile edge network at regular intervals, train the deep reinforcement learning model with the ITD3 algorithm. Specifically, at fixed intervals, the central controller gathers and caches the latest network state information, ready to be invoked when a computing task arrives.
Specifically, the network state information includes the network resource group information of each edge server. For edge server $s$, its network resource group information is recorded as $\{F_s, f_s(t), Q_s, q_s(t), B_s(t)\}$, where $F_s$ and $f_s(t)$ are respectively the total computing resources of edge server $s$ (embodied as CPU clock frequency, in MHz) and its computing-resource load in time slot $t$; $Q_s$ and $q_s(t)$ are respectively its total storage resources (embodied as hard-disk capacity, in GB) and its storage-resource load in time slot $t$; and $B_s(t)$ is its communication resource in time slot $t$ (embodied as bandwidth, in Mbps).
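A minimal sketch of how such a per-server resource group could be held in memory (field names and helper methods are illustrative assumptions, not defined by the embodiment):

```python
from dataclasses import dataclass

@dataclass
class ServerResourceGroup:
    """Resource group {F_s, f_s(t), Q_s, q_s(t), B_s(t)} of edge server s."""
    F_total: float   # F_s: total computing resources (CPU frequency, MHz)
    f_load: float    # f_s(t): computing-resource load in slot t (MHz)
    Q_total: float   # Q_s: total storage resources (hard-disk capacity, GB)
    q_load: float    # q_s(t): storage-resource load in slot t (GB)
    B: float         # B_s(t): communication resource (bandwidth, Mbps)

    def free_cpu(self) -> float:
        return self.F_total - self.f_load

    def free_storage(self) -> float:
        return self.Q_total - self.q_load
```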
Alternatively, software-defined networking (SDN) technology may be enabled in the core cellular network to enable flexible routing and communication between edge servers. SDN concentrates control logic in a centralized entity called the SDN controller, simplifying network management. The SDN controller can install appropriate forwarding rules in the routing tables of all forwarding devices (e.g., the flow tables of an OpenFlow switch), obtain the indication information required by network application policies, and acquire a global view of the network. Some monitoring information related to network performance is thus known via the SDN controller.
Specifically, the improved twin delayed deep deterministic policy gradient (ITD3) algorithm is an improvement of the twin delayed deep deterministic policy gradient (TD3) algorithm and is a deep reinforcement learning algorithm. Reinforcement learning (RL) is suited to solving real-time, associative, complex decision problems and mainly involves three key elements: state, action, and reward. The agent interacts with the environment in a discrete time domain; at each time slot $t$, the agent takes an action according to a policy $\mu$, namely $\mu: s_t \to a_t$. After the agent completes the action according to the policy, the environment returns a reward value $r_t$, and the system state $s_t$ transitions to state $s_{t+1}$. Generally, the action-value function (i.e., Q function) $Q^{\mu}(s,a)$ represents the expected discounted accumulative reward (EDAR) of executing policy $\mu$ from initial state $s$ and initial action $a$.
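In standard notation (with a discount factor $\gamma$, which the text does not name explicitly), the EDAR represented by the Q function can be written as:

$Q^{\mu}(s,a) = \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r_{t} \,\Big|\, s_0 = s,\ a_0 = a,\ a_t = \mu(s_t)\Big]$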
Therefore, before solving the computing offloading problem with the deep reinforcement learning model, the state space, action space, and reward function of the Markov decision process need to be designed. Given the main characteristics of the mobile edge network, they are set as follows.
Specifically, the state space consists of the resource loads, the task information, and the channel state, i.e., $s(t)=\{f(t), q(t), g(t), W_k(t), D_k(t), T_k(t)\}$, where $f(t)=\{f_1(t),f_2(t),\dots,f_S(t)\}$ represents the computing-resource allocation of each edge server at time $t$, with $0 \le f_s(t) \le F_s$; $q(t)=\{q_1(t),q_2(t),\dots,q_S(t)\}$ represents the storage-resource allocation of each edge server at time $t$, with $0 \le q_s(t) \le Q_s$; $g(t)=\{g_{i,j}(t) \mid i,j \in S, i \neq j\}$ represents the channel gains of the communication links between edge servers at time $t$, i.e., $g_{i,j}(t)=h_{i,j}(t)\,d_{i,j}^{-\zeta}$, where $h_{i,j}(t)$ is the channel coefficient of channel $(i,j)$ at time $t$, $d_{i,j}$ is the Euclidean distance between edge servers $i$ and $j$, and $\zeta$ is the path-loss factor, defaulting to 3.5; $(W_k(t), D_k(t), T_k(t))$ is the basic information of computing task $k$, namely the workload, data size, and deadline of computing task $k(t)$.
In particular, the action space consists of the offloading decision and the resource allocation, i.e., $a(t)=\{\alpha_{k,s}(t), f_s(t)\}$, where $\alpha_{k,s}(t)$ is a binary offloading decision variable: $\alpha_{k,s}(t)=1$ means that computing task $k(t)$ is offloaded to edge server $s$. $f_s(t)$ is the computing resource allocated by edge server $s$ for executing computing tasks in time slot $t$.
Specifically, the reward function consists of an objective function and a penalty function, i.e., $r(t) = -\big(J(t) + C_0 \sum_{s \in S}(1-\alpha_{k,s}(t))\big)$, where $J(t)$, the objective function of the optimization problem, is the maximum processing delay over all computing tasks; $C_0 \sum_{s \in S}(1-\alpha_{k,s}(t))$ is the penalty function, $C_0$ is the penalty factor, and $\sum_{s \in S}(1-\alpha_{k,s}(t))$ represents the number of rejected tasks. The lower the delay and the fewer the rejected tasks, the larger the reward function.
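A direct transcription of this reward into Python might look as follows; the exact sign convention, the default penalty factor, and the interpretation of the rejected-task count as tasks offloaded nowhere are assumptions consistent with the statement that lower delay and fewer rejections yield a larger reward:

```python
def reward(delays, alpha, C0=10.0):
    """Reward = -(objective + penalty).

    delays: processing delays tau_k(t) of the current tasks;
    alpha:  binary offload decisions alpha_{k,s}(t), one row per task;
    C0:     penalty factor (the value 10.0 is an arbitrary assumption).
    """
    objective = max(delays)                         # max delay over tasks
    rejected = sum(1 - sum(row) for row in alpha)   # tasks offloaded nowhere
    return -(objective + C0 * rejected)

# Example: two tasks, the second one rejected (all-zero decision row)
r = reward(delays=[0.12, 0.40], alpha=[[0, 1, 0], [0, 0, 0]])
```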
The TD3 algorithm is an improved version of the DDPG algorithm that eliminates the overestimation problem of the vanilla DDPG algorithm through three techniques: 1. learning two Q functions and using the one with the smaller Q-value when updating model parameters, avoiding overestimation of the Q-value; 2. delaying updates of the target networks and the policy, updating once every d rounds, to avoid accumulated errors caused by parameter updates; 3. adding noise to the target action, smoothing the Q function and reducing the error of policy estimation.
Compared with the TD3 algorithm, the ITD3 algorithm applies the idea of a greedy algorithm to the step at the beginning of each training episode: every set number of epochs (for example, 150), the initial state is set to the best state among all previously observed states. This helps the original TD3 algorithm quickly find near-optimal solutions in a high-dimensional state space, accelerates convergence without affecting performance, and improves the deployability of the TD3 algorithm in practical scenarios.
As shown in fig. 3, the main steps of the ITD3 algorithm are as follows:
1. Construct a main network $N$ and a target network $N'$, each comprising two critic sub-networks $Q_{\theta_1}, Q_{\theta_2}$ and an actor sub-network $\pi_\phi$;
2. Randomly assign the parameters $\theta_1, \theta_2$ and the actor parameters $\phi$ to each network in the initial state, and initialize the experience replay pool $\mathcal{B}$;
3. Every 150 executions, change the initial state $s$ to the best state observed so far; otherwise take no action;
4. Select an action according to the deterministic policy plus exploration noise: $a \leftarrow \pi_\phi(s) + \epsilon$ with $\epsilon \sim \mathcal{N}(0,\sigma)$;
5. After executing the action and obtaining the next state, store the sample $(s, a, r, s')$ in the experience replay pool $\mathcal{B}$;
6. After a period of time, randomly sample $N$ samples $(s, a, r, s')$ from the experience replay pool for review, and update the action according to the secondary update formula $\tilde{a} \leftarrow \pi_{\phi'}(s') + \epsilon$ with $\epsilon \sim \mathrm{clip}(\mathcal{N}(0,\tilde{\sigma}), -c, c)$, which keeps the value smooth in the region around the target action to reduce error as much as possible (a regularization process);
7. Select the smaller of the Q-values of the two critic sub-networks to compute the target $y \leftarrow r + \gamma \min_{i=1,2} Q_{\theta'_i}(s', \tilde{a})$;
8. Update the critic network parameters by gradient descent: $\theta_i \leftarrow \arg\min_{\theta_i} N^{-1} \sum \big(y - Q_{\theta_i}(s,a)\big)^2$;
9. When the execution count $t$ reaches the threshold $d$ (i.e., $t \bmod d = 0$), update the actor network parameters according to the deterministic policy gradient: $\nabla_\phi J(\phi) = N^{-1} \sum \nabla_a Q_{\theta_1}(s,a)\big|_{a=\pi_\phi(s)} \nabla_\phi \pi_\phi(s)$;
10. Update the target network parameters towards the main network parameters with the soft update policy: $\theta'_i \leftarrow \tau \theta_i + (1-\tau)\theta'_i$, $\phi' \leftarrow \tau \phi + (1-\tau)\phi'$;
11. Repeat steps 3-10 until the algorithm converges.
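A condensed PyTorch sketch of steps 1-10 is given below. It covers the twin critics, the clipped double-Q target, target-policy smoothing, the delayed actor update, the soft target update, and the ITD3 greedy restart; the network sizes, learning rates, the absence of episode-termination handling, and all other hyperparameters are illustrative assumptions rather than values fixed by the embodiment:

```python
import copy
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Small fully connected network used for both actor and critics."""
    def __init__(self, in_dim, out_dim, squash=False):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 256), nn.ReLU(),
                                 nn.Linear(256, out_dim))
        self.squash = squash  # tanh-bound actor outputs to [-1, 1]

    def forward(self, x):
        out = self.net(x)
        return torch.tanh(out) if self.squash else out

class ITD3Sketch:
    def __init__(self, s_dim, a_dim, gamma=0.99, tau=0.005,
                 policy_noise=0.2, noise_clip=0.5, policy_delay=2):
        # Main network N: actor pi_phi and twin critics Q_theta1, Q_theta2.
        self.actor = MLP(s_dim, a_dim, squash=True)
        self.q1, self.q2 = MLP(s_dim + a_dim, 1), MLP(s_dim + a_dim, 1)
        # Target network N' starts as a copy of the main network N.
        self.actor_t = copy.deepcopy(self.actor)
        self.q1_t, self.q2_t = copy.deepcopy(self.q1), copy.deepcopy(self.q2)
        self.opt_a = torch.optim.Adam(self.actor.parameters(), 3e-4)
        self.opt_c = torch.optim.Adam(list(self.q1.parameters()) +
                                      list(self.q2.parameters()), 3e-4)
        self.gamma, self.tau = gamma, tau
        self.policy_noise, self.noise_clip = policy_noise, noise_clip
        self.policy_delay, self.step = policy_delay, 0
        self.best_state, self.best_reward = None, -float("inf")

    def observe(self, state, episode_reward):
        # Track the best state seen so far for the greedy restart (step 3).
        if episode_reward > self.best_reward:
            self.best_state, self.best_reward = state, episode_reward

    def initial_state(self, env_reset_state, episode, restart_every=150):
        # ITD3 greedy restart: periodically start from the best state seen.
        if episode % restart_every == 0 and self.best_state is not None:
            return self.best_state
        return env_reset_state

    def update(self, batch):
        s, a, r, s2 = batch  # tensors from the replay pool; r has shape (N, 1)
        with torch.no_grad():
            # Step 6: target policy smoothing, a~ = pi'(s') + clipped noise.
            noise = (torch.randn_like(a) * self.policy_noise
                     ).clamp(-self.noise_clip, self.noise_clip)
            a2 = (self.actor_t(s2) + noise).clamp(-1.0, 1.0)
            # Step 7: clipped double-Q target uses the smaller critic value.
            q_min = torch.min(self.q1_t(torch.cat([s2, a2], 1)),
                              self.q2_t(torch.cat([s2, a2], 1)))
            y = r + self.gamma * q_min
        # Step 8: gradient descent on both critics.
        sa = torch.cat([s, a], 1)
        critic_loss = (((self.q1(sa) - y) ** 2).mean() +
                       ((self.q2(sa) - y) ** 2).mean())
        self.opt_c.zero_grad()
        critic_loss.backward()
        self.opt_c.step()
        # Steps 9-10: delayed actor update and soft target update.
        self.step += 1
        if self.step % self.policy_delay == 0:
            actor_loss = -self.q1(torch.cat([s, self.actor(s)], 1)).mean()
            self.opt_a.zero_grad()
            actor_loss.backward()
            self.opt_a.step()
            for net, net_t in ((self.actor, self.actor_t),
                               (self.q1, self.q1_t), (self.q2, self.q2_t)):
                for p, p_t in zip(net.parameters(), net_t.parameters()):
                    p_t.data.mul_(1 - self.tau).add_(self.tau * p.data)
```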
S3: When a computing task uploaded by the user equipment is received, take minimization of the computing task's processing delay as the optimization target and, according to the most recently updated network state information, make offloading and resource allocation decisions with the deep reinforcement learning model trained by the ITD3 algorithm, obtaining a decision result.
Specifically, the network state information used when a computing offload request arrives should be guaranteed to be the same as the network state information most recently updated by the global controller.
Specifically, the task processing delay comprises the transmission delay and the execution delay of the task, and is calculated as:
$\tau_{k,s}(t) = \tau_{up} + \tau_{exec} + \tau_{wait} + \tau_{down}$
where $\tau_{k,s}(t)$ is the total delay from the generation of computing task $k$ at the user equipment in time slot $t$ to the completion of its execution at edge server $s$; $\tau_{up}$ is the delay of uploading the input data; $\tau_{exec}$ is the processing delay of edge server $s$ executing computing task $k$; $\tau_{wait}$ is the waiting delay of computing task $k$; and $\tau_{down}$ is the result-download delay. Since each edge server is simultaneously allocated different resources to process multiple computing tasks, the waiting delay is 0; the download delay is negligible due to the good downlink quality.
The task processing delay is then calculated as:
$\tau_{k,s}(t) = \dfrac{D_k(t)}{R_{k,s}} + \dfrac{W_k(t)}{f_s}$
where $D_k(t)$ is the input data of computing task $k$ (in MB), $R_{k,s}$ is the link rate (in Mbps), $W_k(t)$ is the workload, and $f_s$ is the computing resource (in MHz) allocated by edge server $s$ to computing task $k$.
According to the Shannon formula, the link rate can be written as:
$R_{k,s} = B \log_2\left(1 + \dfrac{P_k\, g_{k,s}(t)}{\sigma^2 + I_{k,s}(t)}\right)$
where $B$ is the bandwidth, $g_{k,s}(t)$ is the channel gain, $P_k$ is the transmit power, $\sigma^2$ is the noise power, and $I_{k,s}(t)$ is the mutual interference between channels.
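The two formulas combine directly; the following sketch evaluates them with illustrative numbers (the unit handling, e.g. converting 2 MB of input to 16 Mb, is an assumption):

```python
import math

def link_rate(B, P_k, gain, noise_power, interference):
    """Shannon rate R_{k,s} = B * log2(1 + P_k * g / (sigma^2 + I))."""
    return B * math.log2(1.0 + P_k * gain / (noise_power + interference))

def processing_delay(D_k, W_k, R_ks, f_s):
    """tau_{k,s} = D_k / R_{k,s} (upload) + W_k / f_s (execution)."""
    return D_k / R_ks + W_k / f_s

# Example: 20 MHz bandwidth link, 2 MB (= 16 Mb) of input data,
# 500-megacycle workload on 1000 MHz of allocated CPU.
R = link_rate(B=20.0, P_k=0.5, gain=8.0, noise_power=0.01, interference=0.09)
tau = processing_delay(D_k=16.0, W_k=500.0, R_ks=R, f_s=1000.0)
```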
Thus, the optimization objective of the computing offloading algorithm is to minimize the long-term consumption of the whole network, defined as the delay of executing the computing tasks:
$\min_{\alpha, f} \ \max_{k \in K} \ \tau_{k,s}(t)$
The constraints on the computing offloading problem are as follows:
C1: $\alpha_{k,s}(t) \in \{0,1\}, \ \forall k \in K, \forall s \in S$
C2: $\tau_{k,s}(t) \le T_k(t), \ \forall k \in K$
C3: $\sum_{k \in K} \alpha_{k,s}(t)\, f_s(t) \le F_s, \ \forall s \in S$
C4: $\sum_{k \in K} \alpha_{k,s}(t)\, D_k(t) \le Q_s, \ \forall s \in S$
C5: $\sum_{s \in S} \alpha_{k,s}(t) \le 1, \ \forall k \in K$
C6: $f_s(t) > 0, \ \forall s \in S$
where constraint C1 defines binary task offloading; constraint C2 defines the deadline constraint of each computing task; constraint C3 defines that the sum of computing resources occupied by executing tasks is not greater than the computing resources of the server; constraint C4 defines that the sum of storage resources occupied by all tasks offloaded to the server is not greater than the storage resources of the server; constraint C5 defines that a task can be offloaded to at most one edge server; and constraint C6 defines that the allocated computing resources are strictly positive.
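For illustration, constraints C1-C6 can be checked for a candidate decision as in the sketch below; it assumes a per-task allocation matrix f_alloc (the embodiment writes the allocation per server, so the per-task form here is an assumption made for checkability):

```python
def feasible(alpha, f_alloc, delays, deadlines, data_sizes, F, Q):
    """Check constraints C1-C6 for one time slot.

    alpha:      K x S matrix of 0/1 offloading decisions;
    f_alloc:    K x S matrix of allocated computing resources (MHz);
    delays:     per-task delays tau_{k,s}(t); deadlines: per-task T_k(t);
    data_sizes: per-task D_k(t); F, Q: per-server capacities F_s, Q_s.
    """
    K, S = len(alpha), len(F)
    for k in range(K):
        if any(a not in (0, 1) for a in alpha[k]):           # C1: binary
            return False
        if sum(alpha[k]) > 1:                                # C5: one server
            return False
        if delays[k] > deadlines[k]:                         # C2: deadline
            return False
    for s in range(S):
        cpu = sum(alpha[k][s] * f_alloc[k][s] for k in range(K))
        sto = sum(alpha[k][s] * data_sizes[k] for k in range(K))
        if cpu > F[s] or sto > Q[s]:                         # C3, C4
            return False
    if any(f_alloc[k][s] <= 0 for k in range(K) for s in range(S)
           if alpha[k][s]):                                  # C6: positive
        return False
    return True
```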
Specifically, the decision result is a data structure carrying the results of computing offloading and resource allocation, comprising the node map for computing offloading (task ID → node IP) and the data map for resource allocation (node IP → {computing resource: X, storage resource: Y, …}).
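A hypothetical decision result in this two-map shape (all IDs, IPs, and values are invented for illustration):

```python
decision = {
    "node_map": {                      # task ID -> node IP
        "task-001": "10.0.0.12",
        "task-002": "10.0.0.7",
    },
    "resource_map": {                  # node IP -> allocated resources
        "10.0.0.12": {"computing_MHz": 800, "storage_GB": 2},
        "10.0.0.7":  {"computing_MHz": 500, "storage_GB": 1},
    },
}
```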
S4: Send the offloading result of the computing task to the edge server accessed by the computing task, and send the resource scheduling decision to the edge server allocated to execute the computing task.
Specifically, the edge server that receives a computing task nearby determines from the node map the address of the edge server to which the computing task is offloaded, and sends the task main information to that edge server address. The task main information includes the task ID, the task type (which determines the algorithm to be executed), the task arrival node (used to return result data after computation completes), and the task offloading node (the destination IP address).
The edge server responsible for executing the computing task dynamically allocates the computing and storage resources required to execute the corresponding algorithm according to its own resource load, establishes a transmission link, and receives the task's input data. The corresponding algorithm is the artificial intelligence inference algorithm that must be invoked to execute the computing task, and the mapping between tasks and algorithms must be statically configured before the network starts. In fact, the delay of invoking the algorithm in different network environments can serve as auxiliary data, providing a more accurate decision reference for the computing offloading and resource allocation policies. The edge server that loads and executes the computing task invokes the specified algorithm to process it, and returns the output result to the computing task's access server through the communication link, completing the offloading of the computing task.
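The task main information could be serialized as a simple message such as the following (field names and values are hypothetical, not specified by the embodiment):

```python
task_main_info = {
    "task_id": "task-001",
    "task_type": "image_classification",  # selects the inference algorithm
    "arrival_node": "10.0.0.3",           # where the result data is returned
    "offload_node": "10.0.0.12",          # destination IP from the node map
}
```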
Compared with the prior art, the joint computing offloading and resource allocation method provided by the embodiment of the invention has the following advantages:
In more complex scenarios, it is difficult for traditional computing offloading algorithms to combine high performance with low complexity; they consider only the offloading result of a single time slot and ignore the correlation between time slots. The computing offloading problem itself has the two defining features of reinforcement learning (reward maximization and the Markov decision property), so it can be solved with reinforcement learning. The deep reinforcement learning model is used to solve the joint computing offloading and resource allocation problem, and pre-training on different network scenarios makes the model ready to use. Moreover, in the MEC network, each edge server can make decisions independently, including the target server for task offloading and resource allocation scheduling decisions based on time-varying wireless channels, to minimize the overall cost of the network.
TD3, as an improved version of the DDPG algorithm, improves the policy network and the value network on the basis of DDPG, mitigating the problem of Q-value overestimation; the twin-critic structure is inherited from Double DQN to reduce overestimation error; when computing the Q-value used to update the TD error, noise is added to the action so that the Q-value estimation function fitted by the critic is smoother; and ITD3 additionally sets the best solution observed every 150 epochs as the initial state, accelerating the convergence of the algorithm. Combined with the continuous state space and continuous action space of the mobile edge network scenario, training the deep reinforcement learning model with the ITD3 algorithm allows convenient exploration and achieves faster convergence and higher solution accuracy than DDPG.
By completing the design of the state space, action space, and reward function of the deep reinforcement learning model, models suited to different mobile edge network scenarios can be obtained by training in those scenarios, without being restricted to a specific application scenario. Good performance is achieved without the prior knowledge of environmental statistics that traditional computing offloading and resource allocation schemes require; fine discretization of continuous actions is handled well without significantly increasing the complexity of the action space; and the method is easy to deploy in real networks, achieving network load balancing.
Based on the same inventive concept, an embodiment of the invention also provides an apparatus for joint computing offloading and resource allocation of MEC. Since the principle by which the apparatus solves the problem is similar to that of the joint computing offloading and resource allocation method of MEC, the implementation of the apparatus can refer to the implementation of the method, and repetition is omitted.
The apparatus for joint computing offloading and resource allocation of MEC provided by the embodiment of the invention, as shown in fig. 4, comprises:
a modeling unit 1, configured to abstractly model the network state according to task characteristics acquired in real time and the resource loads of all edge servers in the mobile edge network, and establish a deep reinforcement learning model, the mobile edge network consisting of several user equipments, several edge servers, and a central controller;
a training unit 2, configured to train the deep reinforcement learning model with the ITD3 algorithm after acquiring network state information in the mobile edge network at regular intervals;
a processing unit 3, configured to, when a computing task uploaded by the user equipment is received, make offloading and resource allocation decisions with the ITD3-trained deep reinforcement learning model according to the most recently updated network state information, taking minimization of the computing task's processing delay as the optimization target, to obtain a decision result;
and a sending unit 4, configured to send the offloading result of the computing task to the edge server accessed by the computing task, and send the resource scheduling decision to the edge server allocated to execute the computing task.
Having described a method and apparatus for joint computing offloading and resource allocation of MECs according to an exemplary embodiment of the present invention, an electronic device according to another exemplary embodiment of the present invention is described next.
Those skilled in the art will appreciate that various aspects of the invention may be implemented as a system, method, or program product. Accordingly, aspects of the invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may be referred to herein generally as a "circuit," "module," or "system."
In some possible embodiments, an electronic device according to the invention may comprise at least one processor, and at least one computer storage medium. Wherein the computer storage medium stores program code which, when executed by a processor, causes the processor to perform the steps in the method of joint computing offload and resource allocation of MECs according to various exemplary embodiments of the invention described hereinabove.
An electronic device 600 according to this embodiment of the invention is described below with reference to fig. 5. The electronic device 600 shown in fig. 5 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.
As shown in fig. 5, the electronic device 600 is embodied in the form of a general-purpose electronic device. Components of electronic device 600 may include, but are not limited to: the at least one processor 601, the at least one computer storage medium 602, and a bus 603 that connects the various system components, including the computer storage medium 602 and the processor 601.
Bus 603 represents one or more of several types of bus structures, including a computer storage media bus or computer storage media controller, a peripheral bus, a processor, or a local bus using any of a variety of bus architectures.
The computer storage media 602 may include readable media in the form of volatile computer storage media, such as random access computer storage media (RAM) 621 and/or cache storage media 622, and may further include read only computer storage media (ROM) 623.
The computer storage media 602 can also include a program/utility 625 with a set (at least one) of program modules 624, such program modules 624 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The electronic device 600 may also communicate with one or more external devices 604 (e.g., keyboard, pointing device, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any device (e.g., router, modem, etc.) that enables the electronic device 600 to communicate with one or more other electronic devices. Such communication may occur through an input/output (I/O) interface 605. Also, the electronic device 600 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through a network adapter 606. As shown, the network adapter 606 communicates with other modules for the electronic device 600 over the bus 603. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 600, including, but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
In some possible embodiments, aspects of a method for joint computing offload and resource allocation of MECs provided by the present invention may also be implemented in the form of a program product comprising program code for causing a computer device to perform the steps in the method for joint computing offload and resource allocation of MECs according to various exemplary embodiments of the invention as described herein above when the program product is run on a computer device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, a random access computer storage medium (RAM), a read-only computer storage medium (ROM), an erasable programmable read-only computer storage medium (EPROM or flash memory), an optical fiber, a portable compact disc read-only computer storage medium (CD-ROM), an optical computer storage medium, a magnetic computer storage medium, or any suitable combination of the foregoing.
The program product of the joint computing offload and resource allocation of the MEC of embodiments of the invention may employ a portable compact disc read-only computer storage medium (CD-ROM) and include program code and may run on an electronic device. However, the program product of the present invention is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the consumer electronic device, partly on the consumer electronic device, as a stand-alone software package, partly on the consumer electronic device, partly on the remote electronic device, or entirely on the remote electronic device or server. In the case of remote electronic devices, the remote electronic device may be connected to the consumer electronic device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external electronic device (e.g., connected through the internet using an internet service provider).
It should be noted that although several modules of the apparatus are mentioned in the detailed description above, this division is merely exemplary and not mandatory. Indeed, the features and functions of two or more modules described above may be embodied in one module in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module described above may be further divided into a plurality of modules to be embodied.
Furthermore, although the operations of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
It will be apparent to those skilled in the art that embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk computer storage media, CD-ROM, optical computer storage media, and the like) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the disclosure. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable computer storage medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable computer storage medium produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims (10)

1. A method for joint computation offloading and resource allocation for MEC, comprising:
performing abstract modeling of the network state according to task characteristics acquired in real time and the resource loads of all edge servers in the mobile edge network, and establishing a deep reinforcement learning model, the mobile edge network consisting of a plurality of user equipments, a plurality of edge servers and a central controller;
acquiring network state information from the mobile edge network at regular intervals, and training the deep reinforcement learning model with an ITD3 algorithm;
when a computing task uploaded by the user equipment is received, making offloading and resource allocation decisions with the ITD3-trained deep reinforcement learning model, according to the most recently updated network state information and with minimization of the processing delay of the computing task as the optimization objective, to obtain a decision result; and
sending the offloading result of the computing task to the edge server accessed by the computing task, and sending the resource scheduling decision to the edge server allocated to execute the computing task.
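
Read end to end, claim 1 is a periodic train-then-decide loop at the central controller. The sketch below is illustrative only: `EdgeNetwork`-like and `ITD3Agent`-like objects and every method on them are hypothetical stand-ins, not interfaces disclosed in the patent.

```python
import time

def controller_loop(network, agent, refresh_s=5.0):
    """Hypothetical central-controller loop for the method of claim 1."""
    next_refresh = time.monotonic()
    while True:
        # periodically refresh network state and run an ITD3 training round on it
        if time.monotonic() >= next_refresh:
            agent.train(network.collect_state())
            next_refresh = time.monotonic() + refresh_s
        # decide offloading and resource allocation for each newly arrived task
        for task in network.poll_tasks():
            decision = agent.decide(network.latest_state(), task)
            network.notify_access_server(task, decision.offload_target)
            network.schedule_resources(decision.offload_target, decision.resources)
```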
2. The method of claim 1, wherein the network state information includes network resource group information for each of the edge servers; the network resource group information of the edge server $s$ is recorded as

$$\mathcal{G}_s(t) = \left( F_s,\; f_s^{\mathrm{load}}(t),\; Q_s,\; q_s^{\mathrm{load}}(t),\; b_s(t) \right)$$

wherein $F_s$ and $f_s^{\mathrm{load}}(t)$ are the total amount of computing resources of the edge server $s$ and its computing resource load at time slot $t$, respectively; $Q_s$ and $q_s^{\mathrm{load}}(t)$ are the total amount of storage resources of the edge server $s$ and its storage resource load at time slot $t$, respectively; and $b_s(t)$ is the communication resource of the edge server $s$ at time slot $t$.
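
For concreteness, the per-server resource group of claim 2 could be carried as a small record; the field names below are assumptions for illustration, not the patent's notation.

```python
from dataclasses import dataclass

@dataclass
class ResourceGroup:
    """Network resource group G_s(t) of one edge server, as in claim 2."""
    total_compute: float   # F_s, e.g. CPU cycles/s
    compute_load: float    # compute in use at slot t
    total_storage: float   # Q_s, e.g. MB
    storage_load: float    # storage in use at slot t
    bandwidth: float       # b_s(t), communication resource at slot t

    def free_compute(self) -> float:
        return max(self.total_compute - self.compute_load, 0.0)

    def free_storage(self) -> float:
        return max(self.total_storage - self.storage_load, 0.0)
```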
3. The method of claim 2, wherein the state space, action space, and reward function of the established deep reinforcement learning model are set as follows:

the state space is

$$s(t) = \left\{ f(t),\; q(t),\; g(t),\; \left( W_k(t), D_k(t), T_k(t) \right) \right\}$$

wherein $f(t) = \{f_1(t), f_2(t), \ldots, f_S(t)\}$ represents the computing resource allocation of each of the edge servers at time $t$; $q(t) = \{q_1(t), q_2(t), \ldots, q_S(t)\}$ represents the storage resource allocation of each of the edge servers at time $t$; $g(t) = \{g_{i,j}(t) \mid i, j \in S, i \neq j\}$ represents the channel gain of the communication link between the edge servers at time $t$, with $g_{i,j}(t) = |h_{i,j}(t)|^2 \, d_{i,j}^{-\zeta}$, where $h_{i,j}(t)$ is the channel coefficient of channel $(i, j)$ at time $t$, $d_{i,j}$ is the Euclidean distance between edge server $i$ and edge server $j$, and $\zeta$ is the path loss factor; and $(W_k(t), D_k(t), T_k(t))$ are the workload, data size, and deadline of the computing task $k(t)$, respectively;

the action space is

$$a(t) = \left\{ \alpha_{k,s}(t),\; f_s(t) \mid s \in S \right\}$$

wherein $\alpha_{k,s}(t)$ is a binary offloading decision variable, $\alpha_{k,s}(t) = 1$ meaning that the computing task $k(t)$ is offloaded to the edge server $s$; and $f_s(t)$ is the computing resource allocated by the edge server $s$ for performing the computing task at time slot $t$;

the reward function is

$$r(t) = \Phi(t) + \Psi(t)$$

wherein $\Phi(t)$ is the objective function of the optimization problem, $\Psi(t) = -C_0 \sum_{s \in S} \left( 1 - \alpha_{k,s}(t) \right)$ is the penalty function, $C_0$ is the penalty factor, and $\sum_{s \in S} (1 - \alpha_{k,s}(t))$ represents the number of rejected tasks.
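
A minimal sketch of assembling the state vector and reward of claim 3, assuming the objective term $\Phi(t)$ is the negative processing delay; all names are illustrative.

```python
import numpy as np

def build_state(f, q, g, task):
    """Flatten s(t) = {f(t), q(t), g(t), (W_k, D_k, T_k)} into one vector."""
    w_k, d_k, t_k = task                      # workload, data size, deadline
    return np.concatenate([np.asarray(f), np.asarray(q),
                           np.asarray(g).ravel(), [w_k, d_k, t_k]])

def reward(delay, alpha_row, c0=10.0):
    """r(t) = Phi(t) + Psi(t), with Phi(t) taken as the negative delay and
    Psi(t) = -C0 * sum_s(1 - alpha_{k,s}(t)) reproduced literally from claim 3."""
    return -delay - c0 * float(np.sum(1 - np.asarray(alpha_row)))
```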
4. The method of claim 1, wherein the processing delay minimization formula for the computing task is as follows:

$$\min \;\; \tau_k(t) = \frac{D_k(t)}{R_{k,s}} + \frac{W_k(t)}{f_s}$$

wherein $D_k(t)$ is the input data of the computing task $k$, $R_{k,s}$ is the link rate, $W_k(t)$ is the workload, and $f_s$ is the computing resource allocated by the edge server $s$ for the computing task $k$.
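
The delay of claim 4 is transmission time plus computation time, so a one-line function covers it (units are whatever the inputs use, e.g. bits and bits/s, cycles and cycles/s):

```python
def processing_delay(data_bits: float, link_rate: float,
                     workload_cycles: float, alloc_compute: float) -> float:
    """tau_k(t) = D_k(t) / R_{k,s} + W_k(t) / f_s, as in claim 4."""
    return data_bits / link_rate + workload_cycles / alloc_compute

# e.g. 5 Mb over a 10 Mb/s link plus 1e9 cycles on 2 GHz: 0.5 s + 0.5 s = 1.0 s
assert processing_delay(5e6, 10e6, 1e9, 2e9) == 1.0
```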
5. The method of claim 4, wherein the constraints of the processing delay minimization formula are as follows:

$$\begin{aligned}
C_1 &: \;\; \alpha_{k,s}(t) \in \{0, 1\}, \quad \forall k,\; \forall s \in S \\
C_2 &: \;\; \tau_k(t) \le T_k(t), \quad \forall k \\
C_3 &: \;\; \sum_{k} \alpha_{k,s}(t)\, f_s(t) \le F_s, \quad \forall s \in S \\
C_4 &: \;\; \sum_{k} \alpha_{k,s}(t)\, D_k(t) \le Q_s, \quad \forall s \in S \\
C_5 &: \;\; \sum_{s \in S} \alpha_{k,s}(t) \le 1, \quad \forall k \\
C_6 &: \;\; f_s(t) > 0, \quad \forall s \in S
\end{aligned}$$

wherein constraint $C_1$ defines the computing task as binary offloading; constraint $C_2$ defines the deadline constraint of each computing task; constraint $C_3$ defines that the sum of computing resources occupied by executing computing tasks is not greater than the computing resources of the edge server; constraint $C_4$ defines that the sum of storage resources occupied by all tasks offloaded to the edge server is not greater than the storage resources of the edge server; constraint $C_5$ defines that a computing task can be offloaded to at most one edge server; and constraint $C_6$ defines that the allocated computing resources are always positive.
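
A sketch of checking these constraints for a single (task, server) decision; all parameter names are illustrative, and C5 needs the full row of offload indicators for the task:

```python
def feasible(alpha, delay, deadline, alloc_compute, free_compute,
             task_data, free_storage):
    """True iff one (task, server) decision satisfies C1-C4 and C6."""
    if alpha not in (0, 1):                      # C1: binary offloading
        return False
    if alpha == 0:
        return True                              # task not placed on this server
    return (delay <= deadline                    # C2: deadline
            and alloc_compute <= free_compute    # C3: compute capacity
            and task_data <= free_storage        # C4: storage capacity
            and alloc_compute > 0)               # C6: positive allocation

def at_most_one_server(alpha_row):
    """C5: a task is offloaded to at most one edge server."""
    return sum(alpha_row) <= 1
```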
6. The method of claim 4 or 5, wherein training the deep reinforcement learning model using the ITD3 algorithm specifically comprises:

constructing a main network $N$ and a target network $N'$, each comprising two critic sub-networks $Q_{\theta_1}, Q_{\theta_2}$ and an actor sub-network $\pi_\phi$;

randomly initializing the parameters $\theta_1, \theta_2, \phi$ of the main network $N$ and the target network $N'$, and initializing an experience replay pool $\mathcal{B}$;

when the number of training executions $t$ is judged to be larger than a set value, replacing the initial state $s$ with the best state observed so far;

selecting an action by adding noise to the deterministic policy, $a = \pi_\phi(s) + \epsilon$ with $\epsilon \sim \mathcal{N}(0, \sigma)$;

executing the action to obtain the next state, and storing the sample $(s, a, r, s')$ in the experience replay pool $\mathcal{B}$;

randomly sampling $N$ samples $(s, a, r, s')$ from the experience replay pool $\mathcal{B}$, and updating the target action according to the smoothing formula $\tilde{a} = \pi_{\phi'}(s') + \epsilon$, $\epsilon \sim \mathrm{clip}(\mathcal{N}(0, \tilde{\sigma}), -c, c)$;

selecting the smaller of the two critic sub-networks' Q values to calculate the objective function $y = r + \gamma \min_{i=1,2} Q_{\theta_i'}(s', \tilde{a})$;

updating the parameters of the critic sub-networks according to the gradient descent formula $\theta_i \leftarrow \arg\min_{\theta_i} N^{-1} \sum \left( y - Q_{\theta_i}(s, a) \right)^2$;

after determining that the number of executions $t$ reaches a set threshold $d$, updating the parameters of the actor sub-network according to the deterministic policy gradient formula $\nabla_\phi J(\phi) = N^{-1} \sum \nabla_a Q_{\theta_1}(s, a) \big|_{a = \pi_\phi(s)} \nabla_\phi \pi_\phi(s)$;

softly updating the parameters of the target network toward the parameters of the main network, $\theta_i' \leftarrow \tau \theta_i + (1 - \tau) \theta_i'$ and $\phi' \leftarrow \tau \phi + (1 - \tau) \phi'$;

and repeating the training process until the parameters converge.
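
The training steps of claim 6 follow the TD3 pattern: twin critics, smoothed target actions, the smaller target Q value, a delayed actor update, and soft target updates. Below is a compact PyTorch sketch of one such update step; it assumes actions squashed to [-1, 1] with tanh, omits the terminal-state mask and ITD3's restart-from-best-state step (which would live in the environment loop), and every hyperparameter value is a conventional default rather than one taken from the patent.

```python
import copy
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, in_dim, out_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim))

    def forward(self, x):
        return self.net(x)

class TD3Style:
    def __init__(self, s_dim, a_dim, gamma=0.99, tau=0.005,
                 policy_noise=0.2, noise_clip=0.5, policy_delay=2):
        self.actor = MLP(s_dim, a_dim)
        self.q1, self.q2 = MLP(s_dim + a_dim, 1), MLP(s_dim + a_dim, 1)
        self.actor_t = copy.deepcopy(self.actor)            # target networks
        self.q1_t, self.q2_t = copy.deepcopy(self.q1), copy.deepcopy(self.q2)
        self.opt_a = torch.optim.Adam(self.actor.parameters(), lr=3e-4)
        self.opt_q = torch.optim.Adam(
            list(self.q1.parameters()) + list(self.q2.parameters()), lr=3e-4)
        self.gamma, self.tau = gamma, tau
        self.policy_noise, self.noise_clip = policy_noise, noise_clip
        self.policy_delay, self.steps = policy_delay, 0

    def update(self, s, a, r, s2):
        self.steps += 1
        with torch.no_grad():
            # target-action smoothing: clipped Gaussian noise on pi'(s')
            eps = (torch.randn_like(a) * self.policy_noise
                   ).clamp(-self.noise_clip, self.noise_clip)
            a2 = (torch.tanh(self.actor_t(s2)) + eps).clamp(-1.0, 1.0)
            # clipped double-Q target: take the smaller critic value
            q_min = torch.min(self.q1_t(torch.cat([s2, a2], 1)),
                              self.q2_t(torch.cat([s2, a2], 1)))
            y = r + self.gamma * q_min
        sa = torch.cat([s, a], 1)
        q_loss = ((self.q1(sa) - y) ** 2).mean() + ((self.q2(sa) - y) ** 2).mean()
        self.opt_q.zero_grad(); q_loss.backward(); self.opt_q.step()

        if self.steps % self.policy_delay == 0:              # delayed actor update
            a_pi = torch.tanh(self.actor(s))
            actor_loss = -self.q1(torch.cat([s, a_pi], 1)).mean()
            self.opt_a.zero_grad(); actor_loss.backward(); self.opt_a.step()
            # soft update of all target networks
            for net, net_t in ((self.actor, self.actor_t),
                               (self.q1, self.q1_t), (self.q2, self.q2_t)):
                for p, p_t in zip(net.parameters(), net_t.parameters()):
                    p_t.data.mul_(1 - self.tau).add_(self.tau * p.data)
```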
7. The method of any of claims 1-5, wherein the decision result comprises a node map of the computing task's offloading to the edge server (task ID → node IP) and a data map of the resource allocation (node IP → {computing resource: X, storage resource: Y, …}).
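
The two maps of claim 7 could be plain dictionaries; the IPs and quantities below are invented for illustration:

```python
# node map: which edge server each task was offloaded to (task ID -> node IP)
offload_map = {"task-42": "10.0.0.7"}

# data map: what each chosen node must reserve for its tasks
allocation_map = {"10.0.0.7": {"computing resource": 2.0e9,   # cycles/s
                               "storage resource": 512}}      # MB
```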
8. A server, comprising:
a modeling unit configured to perform abstract modeling of the network state according to task characteristics acquired in real time and the resource loads of all edge servers in the mobile edge network, and to establish a deep reinforcement learning model, the mobile edge network consisting of a plurality of user equipments, a plurality of edge servers and a central controller;
a training unit configured to acquire network state information from the mobile edge network at regular intervals and to train the deep reinforcement learning model with an ITD3 algorithm;
a processing unit configured to, when a computing task uploaded by the user equipment is received, make offloading and resource allocation decisions with the ITD3-trained deep reinforcement learning model, according to the most recently updated network state information and with minimization of the processing delay of the computing task as the optimization objective, to obtain a decision result; and
a sending unit configured to send the offloading result of the computing task to the edge server accessed by the computing task, and to send the resource scheduling decision to the edge server allocated to execute the computing task.
9. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions which, when executed by the at least one processor, enable the at least one processor to perform the method according to any one of claims 1-7.
10. A computer storage medium, characterized in that it stores a computer program for executing the method according to any one of claims 1-7.
CN202111639639.0A 2021-12-29 2021-12-29 Combined computing unloading and resource allocation method and device for MEC Pending CN116418808A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111639639.0A CN116418808A (en) 2021-12-29 2021-12-29 Combined computing unloading and resource allocation method and device for MEC

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111639639.0A CN116418808A (en) 2021-12-29 2021-12-29 Combined computing unloading and resource allocation method and device for MEC

Publications (1)

Publication Number Publication Date
CN116418808A true CN116418808A (en) 2023-07-11

Family

ID=87056480

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111639639.0A Pending CN116418808A (en) 2021-12-29 2021-12-29 Combined computing unloading and resource allocation method and device for MEC

Country Status (1)

Country Link
CN (1) CN116418808A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116684925A (en) * 2023-07-24 2023-09-01 南京图策信息科技有限公司 Unmanned aerial vehicle-mounted intelligent reflecting surface safe movement edge calculation method
CN116684925B (en) * 2023-07-24 2023-11-14 南京图策信息科技有限公司 Unmanned aerial vehicle-mounted intelligent reflecting surface safe movement edge calculation method

Similar Documents

Publication Publication Date Title
CN111953758B (en) Edge network computing unloading and task migration method and device
Li et al. SSLB: self-similarity-based load balancing for large-scale fog computing
CN109818786B (en) Method for optimally selecting distributed multi-resource combined path capable of sensing application of cloud data center
CN111538587B (en) Service function chain reconfiguration method based on load balancing
CN112822050B (en) Method and apparatus for deploying network slices
CN113064671A (en) Multi-agent-based edge cloud extensible task unloading method
Misra et al. Multiarmed-bandit-based decentralized computation offloading in fog-enabled IoT
US10983828B2 (en) Method, apparatus and computer program product for scheduling dedicated processing resources
EP4024212B1 (en) Method for scheduling inference workloads on edge network resources
Ni et al. A PSO based multi-domain virtual network embedding approach
Huang et al. Toward decentralized and collaborative deep learning inference for intelligent iot devices
Ren et al. Vehicular network edge intelligent management: A deep deterministic policy gradient approach for service offloading decision
Qi et al. Vehicular edge computing via deep reinforcement learning
Mason et al. Using distributed reinforcement learning for resource orchestration in a network slicing scenario
Jain et al. Qos-aware task offloading in fog environment using multi-agent deep reinforcement learning
Liu et al. A distributed dependency-aware offloading scheme for vehicular edge computing based on policy gradient
Hu et al. Dynamic task offloading in MEC-enabled IoT networks: A hybrid DDPG-D3QN approach
CN113490279B (en) Network slice configuration method and device
CN116418808A (en) Combined computing unloading and resource allocation method and device for MEC
Li et al. Optimal service selection and placement based on popularity and server load in multi-access edge computing
CN115484205B (en) Deterministic network routing and queue scheduling method and device
CN115514769B (en) Satellite elastic Internet resource scheduling method, system, computer equipment and medium
Afrasiabi et al. Reinforcement learning-based optimization framework for application component migration in nfv cloud-fog environments
CN115225512A (en) Multi-domain service chain active reconstruction mechanism based on node load prediction
Narayana et al. A research on various scheduling strategies in fog computing environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination