CN112788605B - Edge computing resource scheduling method and system based on double-delay depth certainty strategy - Google Patents


Info

Publication number
CN112788605B
CN112788605B CN202011560881.4A CN202011560881A
Authority
CN
China
Prior art keywords
edge
server
distribution
network
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011560881.4A
Other languages
Chinese (zh)
Other versions
CN112788605A (en)
Inventor
李林峰
肖林松
范律
陈永
余伟峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Willfar Information Technology Co Ltd
Original Assignee
Willfar Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Willfar Information Technology Co Ltd filed Critical Willfar Information Technology Co Ltd
Priority to CN202011560881.4A priority Critical patent/CN112788605B/en
Publication of CN112788605A publication Critical patent/CN112788605A/en
Application granted granted Critical
Publication of CN112788605B publication Critical patent/CN112788605B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W52/00Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/02Power saving arrangements
    • H04W52/0209Power saving arrangements in terminal devices
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W16/00Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/02Resource partitioning among network components, e.g. reuse partitioning
    • H04W16/04Traffic adaptive resource partitioning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W72/00Local resource management
    • H04W72/50Allocation or scheduling criteria for wireless resources
    • H04W72/53Allocation or scheduling criteria for wireless resources based on regulatory allocation policies
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to a method and a system for scheduling edge computing resources based on a double-delay deep deterministic policy. In the edge computing resource scheduling method, the edge computing system comprises an edge server and a plurality of edge gateways communicatively connected to the edge server, and the method comprises the following steps: the edge server acquires the independent task information sets of all the edge gateways; based on the independent task information sets, the edge distribution network outputs the corresponding optimal server allocation frequency and optimal scheduling sequence for each edge gateway using a double-delay deep deterministic policy gradient algorithm; and the optimal server allocation frequency and the optimal scheduling sequence are sent to the edge gateways to perform scheduling. When system resources are limited and tight, both energy consumption and delay can be greatly reduced, thereby improving the user experience and the utilization of energy and network resources.

Description

Edge computing resource scheduling method and system based on double-delay depth certainty strategy
Technical Field
The invention relates to the field of edge computing, in particular to a method and a system for scheduling edge computing resources based on a double-delay depth certainty strategy.
Background
Fifth generation mobile communication technology (5G) faces new challenges of explosive data traffic growth and large-scale device connectivity. New 5G network services such as virtual reality, augmented reality, unmanned vehicles and smart grids place higher demands on delay, while computation-intensive applications consume a large amount of energy; these problems cannot be solved by the user equipment alone, so edge computing has emerged to address them. Edge computing deploys computing and storage resources at the edge of the mobile network to meet the stringent latency requirements of some applications. The edge gateway can offload its computation tasks wholly or partially to the MEC server through a wireless channel for computation, thereby reducing delay and energy consumption and obtaining a good user experience. Traditional optimization algorithms are feasible for solving the MEC computation offloading and resource allocation problem, but they are not well suited to MEC systems with high real-time requirements. Reinforcement learning algorithms are well suited to solving resource allocation problems, such as MEC server resource allocation.
The problem of minimizing system consumption in edge computing can be solved by finding the optimal offload decision and the resource allocation for the offloaded computation. However, the offload decision vector X is drawn from a feasible set of binary variables and the objective function is non-convex. In addition, as the number of tasks increases, the difficulty of solving the system consumption minimization problem grows exponentially; it is thus a non-convex problem that generalizes the knapsack problem and is NP-hard.
Thus, the existing edge computing system field has shortcomings and needs to be improved and enhanced.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide an edge computing resource scheduling method and system based on a double-delay deep deterministic policy that solve the problems of delay and energy optimization in a 5G heterogeneous network, improve the utilization of computing resources, and reduce task delay through effective offload resource scheduling and a server resource allocation method.
In order to achieve the purpose, the invention adopts the following technical scheme:
an edge computing resource scheduling method based on a double-delay deep deterministic strategy, wherein an edge computing system comprises an edge server and a plurality of edge gateways in communication connection with the edge server, an edge distribution network and an edge distribution target network are constructed in the edge server, and the method comprises the following steps:
the edge server acquires independent task information sets of all the edge gateways;
based on the independent task information set, the edge distribution network respectively outputs corresponding optimal server distribution frequency and optimal scheduling sequence for all the edge gateways by using a double-delay depth deterministic strategy gradient algorithm;
sending the optimal server distribution frequency and the optimal scheduling sequence to the edge gateway to execute scheduling;
the edge distribution target network carries out real-time training based on the acquired independent task information set; and updating the network parameters of the edge distribution network in sections according to the target network parameters of the edge distribution target network.
Preferably, the resource scheduling method based on the dual-delay depth deterministic policy edge computing is characterized in that the independent task information set comprises a plurality of independent task information, and the independent task information at least comprises data volume and CPU (central processing unit) cycle volume required for processing the task; the required CPU cycle amount comprises a cycle amount required by a server and a cycle amount required by an edge gateway;
the implementation step of the edge distribution network using a double-delay depth deterministic policy gradient algorithm to respectively output the corresponding optimal server distribution frequency and optimal scheduling sequence for all the edge gateways is the same as the real-time training step of the edge distribution target network, and specifically comprises the following steps:
s31, based on the independent task information set, solving the pre-distribution frequency distributed by the edge server for each edge gateway through pre-classification;
s32, the edge distribution target network classifies all independent tasks based on the required period quantity of the server and the required period quantity of the edge gateway of each independent task, and stores the tasks into an unloading task set and a local task set respectively;
s33, the edge distribution target network uses a double-delay depth certainty strategy gradient algorithm to carry out server distribution frequency on independent tasks in an unloading task set;
and S34, performing one iteration every time steps S32-S33 are executed, outputting the optimal server allocation frequency after a preset number of iterations, and determining the target network parameters of the edge allocation target network.
Preferably, in the method for scheduling resource based on edge computing of dual-delay depth deterministic policy, the output criteria of the optimal server allocation frequency are: when the iterative computation finally has a convergence result, outputting the server distribution frequency of the iterative convergence as the optimal server distribution frequency; otherwise, outputting the pre-distribution frequency as the optimal server distribution frequency.
Preferably, in the method for scheduling edge computing resources based on the double-delay depth deterministic policy, the network parameters of the edge distribution network are updated in segments according to the target network parameters of the edge distribution target network, and specifically, the method includes: when the edge distribution target network is trained in real time, in step S34, every iteration of the set number of times, the current target network parameter is divided into update sections according to a predetermined step length based on the network parameter before training to obtain update parameters, and the update parameters are used as the network parameters of the edge distribution network for updating.
Preferably, in the edge computing resource scheduling method based on the double-delay deep deterministic policy, the set number of times is 20 to 80.
Preferably, in the edge computing resource scheduling method based on the double-delay deep deterministic policy, in step S31, the specific steps for obtaining the pre-allocation frequency include:
s311, respectively calculating the dominant frequency proportion of the equipment CPU frequency of each edge gateway to the sum of the equipment CPU frequencies of all the edge gateways; distributing CPU frequency for the edge gateway in the edge server according to the main frequency proportion;
s312, calculating local execution time delay of each independent task according to the independent task information, and respectively calculating a relative time delay ratio of the local execution time delay of each edge gateway to the sum of the local execution time delays of all the edge gateways;
s313, respectively calculating the distribution weight of each edge gateway according to the main frequency proportion and the relative time delay proportion;
and S314, respectively calculating the pre-distribution frequency of each edge gateway in the edge server according to the distribution weight and the service CPU frequency.
Preferably, in the edge computing resource scheduling method based on the double-delay deep deterministic policy, step S32 specifically includes:
s321, classifying the independent task information of all the edge gateways according to unloading time and server execution time, adding the independent task information of which the unloading time is less than the server execution time to a first array, and arranging all the independent task information in the first array according to the ascending order of the unloading time; adding the independent task information with the unloading time larger than or equal to the execution time of the server to a second array, and arranging all the independent task information in the second array in a descending order according to the execution time of the server;
s322, obtaining the server execution time and the unloading time of each independent task information in the first array to obtain the server processing time of each independent task information; acquiring the local execution time of each independent task information in the second array;
s323, obtaining a time difference value between the total server processing time of all the independent task information in the first array and the total local execution time of all the independent task information in the second array;
s324, determining all independent task information listed in an array with longer time according to the time difference value to form a third array; taking the processed first array as an unloading task set, and taking the processed second array as a local task set;
s325, respectively calculating server processing time and local execution time of each independent task information in the third array, putting the independent task information of which the server processing time is greater than the local execution time into the local task pre-allocation set, and putting the independent task information of which the server processing time is less than or equal to the local execution time into the unloading task pre-allocation set;
s326, after the independent task information in the third array is distributed, an unloading task set and a local task set are obtained, and an unloading decision vector is obtained according to the final unloading task set.
Preferably, in the edge computing resource scheduling method based on the double-delay deep deterministic policy, in step S34, the predetermined number of times is 100 to 200.
Preferably, in the edge computing resource scheduling method based on the double-delay deep deterministic policy, the edge distribution network consists of a value network and an action network; the edge distribution target network consists of a value target network and an action target network.
An edge computing system comprises an edge server and a plurality of edge gateways which are in communication connection with the edge server, wherein the edge server and the edge gateways work by using the edge computing resource scheduling method based on the double-delay deep deterministic strategy.
Compared with the prior art, the edge computing resource scheduling method and system based on the double-delay depth certainty strategy provided by the invention have the following beneficial effects:
the edge computing resource scheduling method provided by the invention can firstly fix the server frequency distributed to the edge computing gateway when the system resource is limited and tense, then solve the task unloading sequence and the unloading decision which can reach the minimum completion time, finally obtain the optimal server distribution frequency and the optimal scheduling sequence, greatly reduce the energy consumption and delay, and further improve the user experience and the utilization rate of energy and network resources.
Drawings
FIG. 1 is a flow chart of a resource scheduling method provided by the present invention;
FIG. 2 is a block diagram of an edge computing system provided by the present invention;
FIG. 3 is a flow chart of a real-time training and specific output method provided by the present invention;
FIG. 4 is a flow chart of a real-time training and specific output method implementation provided by the present invention;
FIG. 5 is a flow chart of the steps of the pre-sorting server allocating frequencies provided by the present invention;
FIG. 6 is a flow chart of an embodiment of the pre-sorting server frequency allocation step provided by the present invention;
FIG. 7 is a flowchart of the offload task set and offload decision vector acquisition steps provided by the present invention;
FIG. 8 is a flowchart illustrating an exemplary offloading task set and offloading decision vector obtaining procedure provided by the present invention;
FIG. 9 is a flow chart of the network parameter update procedure provided by the present invention;
FIG. 10 is a schematic diagram of a value network architecture provided by the present invention;
FIG. 11 is a schematic diagram of an action network architecture provided by the present invention;
fig. 12 is a graph of iterative rewards for optimizing network parameters provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer and clearer, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
It is to be understood by those skilled in the art that the foregoing general description and the following detailed description are exemplary and explanatory specific embodiments of the invention, and are not intended to limit the invention.
The terms "comprises," "comprising," or any other variation thereof, herein are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps, but may include other steps not expressly listed or inherent to such process or method. Also, without further limitation, one or more devices or subsystems, elements, structures or components recited as "comprising" an item do not exclude the presence of other such devices, subsystems, elements, structures or components. The appearances of the phrases "in one embodiment," "in another embodiment," and similar language throughout this specification may, but do not necessarily, all refer to the same embodiment.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
Referring to fig. 1-2, the present invention provides an edge computing resource scheduling method based on a dual-delay deep deterministic policy, an edge computing system includes an edge server and a plurality of edge gateways communicatively connected to the edge server, an edge distribution network and an edge distribution target network are constructed in the edge server, including the steps of:
the edge server acquires independent task information sets of all the edge gateways; the independent task information at least comprises the data volume and the CPU cycle volume required to process the task. Preferably, in a specific implementation, all edge gateways are first written as the set U = {U_1, U_2, ..., U_K}, and each edge gateway U_i is abstracted into a task set containing two features, G_i = {T_{i,j} | 1 <= j <= N, 1 <= i <= K}, with T_{i,j} = (D_{i,j}, C_{i,j}), where D_{i,j} is the data size of the task of edge gateway U_i in bits, and C_{i,j} is the number of CPU cycles edge gateway U_i requires to process each unit of data, in cycles/bit. The CPU frequency of edge gateway U_i is f_{i,user} in Hz, the CPU frequency the edge server allocates to edge gateway U_i is f_{i,ser} in Hz, the transmission power of edge gateway U_i is p, and the target value is initialized as tc_best = 100.
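As an illustration only (the class names and numeric values below are ours, not the patent's), the task model above can be set up as follows: each gateway U_i holds a set of independent tasks T_{i,j} = (D_{i,j}, C_{i,j}) together with its device CPU frequency and transmission power.

```python
# Illustrative sketch of the task model described above; names are our own.
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    data_bits: float        # D_ij, task data size in bits
    cycles_per_bit: float   # C_ij, CPU cycles needed per bit of data


@dataclass
class EdgeGateway:
    f_user: float           # device CPU frequency f_{i,user} in Hz
    p_tx: float             # transmission power p in W
    tasks: List[Task]


def make_system(num_gateways: int, tasks_per_gw: int) -> List[EdgeGateway]:
    """Build a toy system U = {U_1, ..., U_K} with identical placeholder tasks."""
    return [
        EdgeGateway(
            f_user=1e9,   # 1 GHz device CPU (illustrative value)
            p_tx=0.1,     # 100 mW transmit power (illustrative value)
            tasks=[Task(data_bits=1e6, cycles_per_bit=100.0)
                   for _ in range(tasks_per_gw)],
        )
        for _ in range(num_gateways)
    ]
```
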
Based on the independent task information set, the edge distribution network respectively outputs corresponding optimal server distribution frequency and optimal scheduling sequence for all the edge gateways by using a double-delay depth deterministic strategy gradient algorithm;
sending the optimal server distribution frequency and the optimal scheduling sequence to the edge gateway to execute scheduling;
the edge distribution target network carries out real-time training based on the acquired independent task information set; and updating the network parameters of the edge distribution network in sections according to the target network parameters of the edge distribution target network. Further, the real-time training process of the edge distribution target network and the network parameter segmentation updating process are not limited to be executed in the last step, and may be executed after all the independent task information sets of the edge gateways are received from the edge server, or may be executed at any time after the edge distribution network outputs the optimal server distribution frequency and the optimal scheduling sequence.
Specifically, referring to fig. 3, in the implementation of the edge scheduling method provided by the present invention, the execution principle is as follows:
1. Generate the edge gateway description set U = {U_1, U_2, ..., U_K} and the task description set G = {T_{i,j} | 1 <= j <= N, 1 <= i <= K}, T_{i,j} = (D_{i,j}, C_{i,j}), where D_{i,j} represents the data size of the jth task of edge gateway U_i in bits, and C_{i,j} indicates the number of CPU cycles required to process the task, in cycles/bit.
2. Initialize the current value networks Q_1, Q_2 and the weight parameter w of the current action network P, and synchronize the parameter w' of the target value networks Q_1', Q_2' and the target action network P' as w' = w; initialize the default data structure for experience replay (SumTree), with the priority p_V of the V leaf nodes of the SumTree set to 0, step = 0, epoch = 0.
3. Solve the server allocation frequencies by pre-classification: f_{ser,base} = f_{ser,best} = {f_{1,ser}, ..., f_{K,ser}}.
4. Solve the offload decision vector X with the offload scheduling method based on the number of CPU cycles required to process each task; classify all tasks according to the offload decision vector X, put the offloaded tasks and the locally executed tasks into the sets S and L respectively, and compute the initial state s = (tc, ac).
5. For the server allocation frequencies f_{ser,best} = {f_{1,ser}, ..., f_{K,ser}} of all edge gateways in set U, solve using the TD3 (Twin Delayed Deep Deterministic policy gradient) algorithm.
6. Repeat steps 4-5 M times and output f_{ser,best} = {f_{1,ser}, ..., f_{K,ser}} and the optimal target value Val_best.
The invention greatly reduces task execution delay and energy consumption when task resources in the edge computing network are limited, and improves the utilization of the server resources of the edge computing system and the battery life of the edge gateways.
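The six-step execution principle above can be sketched as the following outer loop; `preclassify`, `offload_schedule`, and `td3_allocate` are placeholder callables standing in for steps 3, 4, and 5, and are our own names, not the patent's code.

```python
# Hedged skeleton of the outer loop (steps 3-6); the three callables are placeholders.
def optimize(preclassify, offload_schedule, td3_allocate, M=100):
    """Outer loop of the scheduling method; returns best frequencies and target value."""
    f_ser = preclassify()                    # step 3: pre-classified allocation frequencies
    best_f, best_val = f_ser, float("inf")   # initialized target value (tc_best in the text)
    for _ in range(M):                       # step 6: repeat steps 4-5 M times
        X, val = offload_schedule(f_ser)     # step 4: offload decision vector + target value
        f_ser = td3_allocate(f_ser, X)       # step 5: TD3 refines the allocation frequencies
        if val < best_val:                   # keep the best target value seen so far
            best_val, best_f = val, f_ser
    return best_f, best_val
```

A trivial usage with dummy callables: `optimize(lambda: [1.0, 2.0], lambda f: (None, sum(f)), lambda f, X: f, M=3)` runs three iterations and returns the unchanged frequencies with their cost.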
As a preferred solution, in this embodiment, the step of updating the network parameter of the edge distribution network in segments according to the target network parameter of the edge distribution target network specifically includes: when the edge distribution target network is trained in real time, in step S34, every iteration of the set number of times, the current target network parameter is divided into update sections according to a predetermined step length based on the network parameter before training to obtain update parameters, and the update parameters are used as the network parameters of the edge distribution network for updating. For example, one of the current network parameters of the edge distribution network is 10, and the corresponding parameter value of the edge distribution target network in the trained network parameters is 11, at this time, the network parameters of the edge distribution network are optimized, but instead of directly changing the corresponding parameter value to 11, an update interval is defined between 10 and 11 according to a predetermined step length, for example, 0.2 is a predetermined length, and is increased by 0.2 each time the parameter value is updated. Preferably, the set number of times is 20 to 80. More preferably, the set number of times is 50 times. The predetermined step size is preferably 5-20% of the difference between the previous and subsequent values of the network parameter to be updated. Preferably 10%. By using the segmented updating method provided by the invention to update the parameters of the network, the time for configuring the optimal network parameters can be shortened, and the time can be shortened by half in the actual operation.
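A minimal sketch of the segmented update described above, using the 10-to-11 example with a fixed step of 0.2: the parameter is moved toward the trained target value one fixed segment at a time instead of being overwritten. The function and its name are ours, not the patent's.

```python
# Segmented (stepwise) network-parameter update: move one fixed segment toward
# the target value per update, never overshooting the target.
def segmented_update(current: float, target: float, step: float) -> float:
    """Move `current` one segment of size `step` toward `target`."""
    if abs(target - current) <= step:
        return target                 # final segment: land exactly on the target
    return current + step if target > current else current - step
```

For example, with `current=10`, `target=11`, and `step=0.2` (10% of the gap, matching the preferred 5-20% range), five successive updates walk the parameter from 10 up to 11.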
As a preferred solution, in this embodiment, referring to fig. 3 to 4, the independent task information set has a plurality of independent task information, where the independent task information at least includes a data amount and a CPU cycle amount required for processing the task; the required CPU cycle amount comprises a cycle amount required by a server and a cycle amount required by an edge gateway;
the implementation step of the edge distribution network using a double-delay depth deterministic policy gradient algorithm to respectively output the corresponding optimal server distribution frequency and optimal scheduling sequence for all the edge gateways is the same as the real-time training step of the edge distribution target network, and specifically comprises the following steps:
s31, based on the independent task information set, solving the pre-distribution frequency distributed by the edge server for each edge gateway through pre-classification;
further, referring to fig. 5-6, in an implementation, in the step S31, the step of specifically obtaining the pre-allocated frequency includes:
s311, respectively calculating a main frequency proportion of the equipment CPU frequency of each edge gateway to the sum of the equipment CPU frequencies of all the edge gateways; distributing CPU frequency for the edge gateway in the edge server according to the main frequency proportion;
s312, calculating local execution time delay of each independent task according to the independent task information, and respectively calculating a relative time delay ratio of the local execution time delay of each edge gateway to the sum of the local execution time delays of all the edge gateways;
s313, respectively calculating the distribution weight of each edge gateway according to the main frequency proportion and the relative time delay proportion;
and S314, respectively calculating the pre-distribution frequency of each edge gateway in the edge server according to the distribution weight and the service CPU frequency.
The operation principle of step S31 is specifically as follows. The execution time of task T_{i,j} of edge gateway U_i at the edge server is expressed as

t_{(i,j),ser} = D_{i,j} * C_{i,j} / f_{i,ser}  (1)

The local execution time of task T_{i,j} is expressed as

t_{(i,j),L} = D_{i,j} * C_{i,j} / f_{i,user}  (2)

The offload transmission rate of task T_{i,j} is:

r_i = w * log2(1 + p * g_0 * (L_0 / L_i)^theta / (N_0 * w))  (3)

where w is the transmission bandwidth, g_0 is a path loss constant, L_0 is a relative distance, L_i is the actual distance between the edge gateway and the edge server, theta is the path loss exponent, N_0 is the noise power spectral density, and p denotes the transmission power with which the edge gateway offloads task T_{i,j} to the edge server.

The offload transfer time of task T_{i,j} is

t_{(i,j),S} = D_{i,j} / r_i  (4)

The offload transmission energy consumption of task T_{i,j} is e_{(i,j),S}:

e_{(i,j),S} = p * t_{(i,j),S} = p * D_{i,j} / r_i  (5)

The local execution energy consumption of task T_{i,j} is e_{(i,j),L}:

e_{(i,j),L} = delta_L * C_{i,j}  (6)

where delta_L is the energy the edge gateway consumes per CPU cycle, in joules per cycle.
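As an illustrative aid (not part of the patent), the per-task delay and energy model of equations (1)-(6) can be written out in Python. Parameter names mirror the text; the rate in equation (3) is taken in its usual Shannon-capacity form under our reading of the garbled source, and all numeric inputs in the usage example are our own.

```python
import math

# Per-task delay/energy model following equations (1)-(6) of the description.
def server_exec_time(D, C, f_ser):
    """Eq (1): execution time at the edge server (D bits * C cycles/bit / Hz)."""
    return D * C / f_ser

def local_exec_time(D, C, f_user):
    """Eq (2): local execution time on the edge gateway."""
    return D * C / f_user

def offload_rate(w, p, g0, L0, Li, theta, N0):
    """Eq (3): offload transmission rate (assumed Shannon-rate form)."""
    return w * math.log2(1 + p * g0 * (L0 / Li) ** theta / (N0 * w))

def offload_time(D, r):
    """Eq (4): offload transfer time, data size over rate."""
    return D / r

def offload_energy(p, D, r):
    """Eq (5): offload transmission energy, transmit power times transfer time."""
    return p * D / r

def local_energy(delta_L, C):
    """Eq (6): local execution energy, as given in the text."""
    return delta_L * C
```

For instance, a 1 Mbit task at 100 cycles/bit runs in 0.01 s on a 10 GHz server share versus 0.1 s on a 1 GHz gateway CPU.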
Please refer to fig. 6, which shows the specific steps of solving the server allocation frequencies by pre-classification:
1) Compute the ratio f_{i,ratio} of the resources of edge gateway U_i to the total resources of the system:

f_{i,ratio} = f_{i,user} / (sum over k of f_{k,user})  (7)

2) Compute the relative proportion t_{i,ratio} of the local execution delay of edge gateway U_i to the total system delay:

t_{i,ratio} = t_{i,L} / (sum over k of t_{k,L})  (8)

where t_{i,L} is the total local execution delay of the tasks of edge gateway U_i.

3) Compute the frequency assignment weight eta_i of edge gateway U_i from the main frequency proportion and the relative delay proportion:

eta_i = (f_{i,ratio} + t_{i,ratio}) / 2  (9)

4) Compute the allocation frequency f_{i,base} of edge gateway U_i:

f_{i,base} = eta_i * F  (10)

where F is the total CPU frequency of the edge server.
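A minimal Python sketch of the pre-classification allocation in equations (7)-(10). The exact combination used for the weight eta_i in equation (9) is our assumption (the average of the two ratios), chosen so that the weights sum to one and the allocated frequencies sum to the total server frequency F.

```python
def preallocate_frequencies(f_user, t_local, F):
    """Pre-classification frequency allocation, equations (7)-(10).

    f_user  : list of device CPU frequencies f_{i,user} per gateway
    t_local : list of total local execution delays t_{i,L} per gateway
    F       : total server CPU frequency to split among gateways
    The eta_i form is an assumption; it averages the two ratios so allocations sum to F.
    """
    f_sum = sum(f_user)
    t_sum = sum(t_local)
    f_ratio = [f / f_sum for f in f_user]    # eq (7): main-frequency proportion
    t_ratio = [t / t_sum for t in t_local]   # eq (8): relative delay proportion
    eta = [(fr + tr) / 2                     # eq (9): assumed averaged weight
           for fr, tr in zip(f_ratio, t_ratio)]
    return [e * F for e in eta]              # eq (10): f_{i,base} = eta_i * F
```

With two equal-frequency gateways whose local delays are 1 and 3, a 10 Hz budget splits as 3.75 and 6.25, favoring the slower gateway.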
S32, the edge distribution target network classifies all independent tasks based on the required period quantity of the server and the required period quantity of the edge gateway of each independent task, and stores the tasks into an unloading task set and a local task set respectively;
further, referring to fig. 7-9, in an implementation, the step S32 specifically includes:
s321, classifying the independent task information of all the edge gateways according to unloading time and server execution time, adding the independent task information of which the unloading time is less than the server execution time to a first array, and arranging all the independent task information in the first array according to the ascending order of the unloading time; adding the independent task information with the unloading time larger than or equal to the execution time of the server to a second array, and arranging all the independent task information in the second array in a descending order according to the execution time of the server;
s322, obtaining the server execution time and the unloading time of each independent task information in the first array, and obtaining the server processing time of each independent task information; acquiring the local execution time of each independent task information in the second array;
s323, obtaining a time difference value between the total server processing time of all the independent task information in the first array and the total local execution time of all the independent task information in the second array;
s324, determining all independent task information listed in the array with longer time according to the time difference value to form a third array; taking the processed first array as an unloading task set, and taking the processed second array as a local task set;
s325, respectively calculating server processing time and local execution time of each independent task information in the third array, putting the independent task information of which the server processing time is greater than the local execution time into the local task pre-allocation set, and putting the independent task information of which the server processing time is less than or equal to the local execution time into the unloading task pre-allocation set;
and S326, after the independent task information in the third array is distributed, obtaining an unloading task set and a local task set, and obtaining an unloading decision vector according to the final unloading task set.
In practical application, the operation principle is as follows: the unloading decision vector is solved by an unloading scheduling method based on the number of CPU cycles required to process each task. The steps are as follows:
Input: the task set G_i of edge gateway U_i, the CPU frequency f_i,user of edge gateway U_i, and the CPU frequency f_i,ser allocated by the server to edge gateway U_i.
Output: the unloading task set S_i = {S_i,1, S_i,2, ..., S_i,Ns}, the local task set L_i = {L_i,1, L_i,2, ..., L_i,Nl}, and the unloading decision vector X_i = {x_i,1, x_i,2, ..., x_i,K}.
1) Arrange all tasks in G_i in descending order of the number of CPU cycles required to process each task, obtaining a new task order.
2) Set up the local and unloading arrays and let the initial index value h = 1. According to equations (11) and (12), separately calculate the completion time of the h-th task in the new order when it is put into the local set L_i and when it is put into the unloading set S_i.
3) If the completion time when put into the local set is the smaller one, the task is put into the local set L_i, its unloading decision variable x_i,h is set to 0, h = h + 1, k0 = k0 + 1, and step i) is repeatedly executed until it exits into step 4). Otherwise, the task is put into the unloading set S_i, its unloading decision variable x_i,h is set to 1, h = h + 1, and step ii) is repeatedly executed until it exits into step 4).
i) Repeatedly execute this step until it exits into step 4): compare the completion times of the current task when put into the local set L_i and into the unloading set S_i. Calculate the completion time of task L_i,k0 according to equation (13) and the completion time of task S_i,k1 according to equation (14). If the local completion time is the smaller one, the task's unloading decision variable x_i,h is set to 0, the task is put into the local set L_i, and h = h + 1; otherwise, the task is put into the unloading set S_i, h = h + 1, and step 4) is performed.
ii) Repeatedly execute this step until it exits into step 4): compare the completion times of the current task when put into the local set L_i and into the unloading set S_i. Calculate the completion time of task L_i,k0 according to equation (13) and the completion time of task S_i,k1 according to equation (14). If the unloading completion time is the smaller one, the task's unloading decision variable x_i,h is set to 1, the task is put into the unloading set S_i, and h = h + 1; otherwise, the task is put into the unloading set S_i, h = h + 1, and step 4) is performed.
Equations (13) and (14), together with the definitions accompanying them, give the two completion times compared above.
4) If the completion time when put into the local set is the smaller one, the task's unloading decision variable x_i,h is set to 0 and the task is put into the local set L_i; otherwise the unloading decision variable x_i,h is set to 1 and the task is put into the unloading set S_i. Then h = h + 1, and this step is repeated until h reaches N.
Classify all tasks in the unloading set S_i by comparing the unloading transmission time with the edge server execution time. Tasks whose unloading transmission time is less than the edge server execution time are added to array P_i, and P_i is arranged in ascending order of the unloading transmission time of its tasks. Tasks whose unloading transmission time is greater than or equal to the edge server execution time are added to array Q_i, and Q_i is arranged in descending order of the edge server execution time of its tasks. Appending array Q_i to array P_i yields the new task order σ_i = [P_i Q_i].
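The P/Q ordering of the unloading set described above (transmission-bound tasks first in ascending transmission time, then server-bound tasks in descending server execution time) can be sketched as follows; the task tuples are illustrative, not the patent's data layout.

```python
def order_offload_set(tasks):
    # Each task is (name, transmit_time, server_exec_time); illustrative only.
    p = sorted((t for t in tasks if t[1] < t[2]), key=lambda t: t[1])
    q = sorted((t for t in tasks if t[1] >= t[2]),
               key=lambda t: t[2], reverse=True)
    return p + q  # sigma_i = [P_i Q_i]
```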
S33, the edge distribution target network solves the server allocation frequency for the independent tasks in the unloading task set by using the double-delay deep deterministic policy gradient algorithm;
and S34, performing one iteration every time steps S32-S33 are executed, outputting the optimal server allocation frequency after a preset number of iterations, and determining the target network parameters of the edge allocation target network. Preferably, the predetermined number of times is 100-200, and more preferably 150. As a preferred solution, in this embodiment, the output standard of the optimal server allocation frequency is: when the iterative computation finally has a convergence result, outputting the server distribution frequency of the iterative convergence as the optimal server distribution frequency; otherwise, outputting the pre-distribution frequency as the optimal server distribution frequency.
Specifically, in the implementation of steps S33 and S34, the operation principle is as follows: according to the unloading task set and the unloading decision vector obtained in step S32, the server resource allocation f_ser,best = {f_1,ser, ..., f_K,ser} of all edge gateways U = {U_1, U_2, ..., U_K} is solved by using the TD3 algorithm. The solving steps are as follows:
Input: iteration step T, maximum cycle number M, soft update step τ_c, soft update weight ratio τ_ratio, sample sampling weight coefficient β, attenuation factor γ, exploration rate ε, current value networks Q_1 and Q_2, target value networks Q_1' and Q_2', current action network P and target action network P', the number m of samples for batch gradient descent, and the number V of SumTree leaf nodes.
Output: server resource allocation f_ser,best = {f_1,ser, ..., f_K,ser}.
1) The objective of the joint task scheduling and server resource allocation problem is to minimize the energy consumption and completion time of all tasks. The mathematical model of the optimization problem is represented by equations (16) to (21) and is denoted as the original problem P1, where equation (16) is the objective function and equations (17) to (21) are the constraints.
In the model, Ns represents the number of unloading tasks, Nl represents the number of locally executed tasks, the first term of the objective is the completion time of all sorted unloading tasks, and the energy term is the total power consumption of the edge server executing all tasks. The completion time of the j-th sorted unloading task consists of the server processing time of the j-th unloading task in set S_i and the transmission time of the 1st to j-th unloading tasks in S_i, the latter calculated by equation (15).
2) Initialize and normalize a state s = (tc, ac), where tc is the system consumption of the whole system in the current state, obtained from equation (16), and ac is the available computing capacity of the MEC server. The system state s is normalized using tc_μ and tc_σ, the mean and variance of the system consumption, and ac_μ and ac_σ, the mean and variance of the server's remaining frequency.
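The normalization formulas are rendered as images in the source; assuming they encode the usual standardization (value minus mean, divided by the spread), a minimal sketch:

```python
def normalize_state(tc, ac, tc_mu, tc_sigma, ac_mu, ac_sigma):
    # z-score style normalization of the state s = (tc, ac); this is an
    # assumed reading of the formula images, not the patent's exact form.
    return ((tc - tc_mu) / tc_sigma, (ac - ac_mu) / ac_sigma)
```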
3) With probability ε, generate a random action a = {(f_1,ser, ..., f_i,ser, ..., f_K,ser) | 0 ≤ f_i,ser ≤ 2f_i,base, 0 ≤ i ≤ K} and obtain a new system state s' = (tc, ac); or, with probability 1 - ε, input the state s = (tc, ac) into the target action network P' to obtain the predicted action a = {(f_1,ser, ..., f_i,ser, ..., f_K,ser) | 0 ≤ f_i,ser ≤ 2f_i,base, 0 ≤ i ≤ K}. Then step = step + 1.
Here ε_max is the convergence probability, ε_min is the minimum random probability, ε_const is the random rate constant, and step is the number of predictions made by the neural network.
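Step 3) is a standard ε-greedy choice between a random allocation and the target actor's prediction. A sketch follows; the exponential decay of ε from ε_max toward ε_min is an assumption, since the decay formula is an image in the source.

```python
import math, random

def epsilon(step, eps_max, eps_min, eps_const):
    # Assumed decay schedule: start near eps_max, settle at eps_min.
    return eps_min + (eps_max - eps_min) * math.exp(-step / eps_const)

def select_action(state, actor, step, K, f_base,
                  eps_max=0.9, eps_min=0.05, eps_const=200.0):
    if random.random() < epsilon(step, eps_max, eps_min, eps_const):
        # random allocation in [0, 2 * f_base[i]] for each gateway
        return [random.uniform(0.0, 2.0 * f_base[i]) for i in range(K)]
    return actor(state)  # predicted action from the target actor P'
```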
The predicted action a is calculated from the target action network P' as follows:
i) Add Gaussian white noise with mean 0 and variance σ to the output layer of the target action network P' and clip the result to [0, 1], as in equation (24), where q_k is the output value of the k-th neuron of the output layer of the target action network P'.
ii) Normalize (q_1, ..., q_k, ..., q_K) according to equation (25).
iii) Adjust the output values q_k to the proper action interval to obtain the action a:
a = (q_1, ..., q_k, ..., q_K) * 2f_i,base, 1 ≤ k ≤ K, k ∈ N* (26)
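Steps i) to iii) above can be sketched as follows. The concrete normalization of equation (25) is an image in the source, so dividing by the vector sum is one assumed reading; the noise parameter is treated as a standard deviation here although the text calls σ a variance.

```python
import random

def shape_action(q, f_base, sigma=0.1):
    # i) add zero-mean Gaussian noise and clip each component to [0, 1]
    noisy = [min(1.0, max(0.0, v + random.gauss(0.0, sigma))) for v in q]
    # ii) normalize -- assumed reading of eq. (25): divide by the sum
    total = sum(noisy) or 1.0
    norm = [v / total for v in noisy]
    # iii) scale to the action interval [0, 2 * f_base[k]]
    return [v * 2.0 * f for v, f in zip(norm, f_base)]
```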
4) Calculate the next state s' = (tc, ac) from the action a. If ac < 0, the flag bit end = True; otherwise end = False. Calculate the reward r according to equation (29), store Sam = (s, s', r, q, end) into the SumTree in sequence, and perform the state iteration s = s'. The cumulative reward of the current round (epoch) is also calculated.
5) If tc < tc_best, then tc_best = tc and f_ser,best = a.
6) Judge whether step > V holds, i.e., whether the experience pool is full; if yes, enter the next step; if not, return to step 2).
7) Extracting m samples from SumTree to train the neural network in the following way:
i) Let i = 1 and j = 1. Sum all leaf nodes in the SumTree to obtain the priority of the root node, whose value is L_1,1. The SumTree has Floor = 1 + log2(V) layers in total.
ii) Divide the root node priority L_1,1 into equal intervals, randomly select one number from each interval, and obtain t = [t_1, ..., t_i, ..., t_y].
iii) According to t_i, start the search from the topmost root node.
iv) Let left be the priority of the left child node and right the priority of the right child node. If left > t_i, enter the left child node; otherwise enter the right child node and set t_i = t_i - left. Then j = j + 1. Repeat this step until j > Floor; at that point the sample stored in the leaf node that t_i has reached is Sam_i.
v) Repeat the above steps until Sam = [Sam_1, ..., Sam_m], m samples in total, have been selected.
vi) Update the priority of each sample; the sample priority p_y is updated as:
p_y = loss_m + 0.0001, y ∈ V (30)
where loss_m is the value loss of sample m, and the constant 0.0001 prevents L_1,1 = 0 after summation.
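A minimal sum-tree matching the stratified sampling and priority update described above; the array layout and the power-of-two leaf count are implementation assumptions, not the patent's exact data structure.

```python
import random

class SumTree:
    def __init__(self, V):
        # V leaf nodes (assumed a power of two); tree[1] is the root L_1,1
        self.V = V
        self.tree = [0.0] * (2 * V)
        self.data = [None] * V
        self.write = 0

    def add(self, priority, sample):
        idx = self.write % self.V  # overwrite the oldest entry when full
        self.data[idx] = sample
        self.update(idx, priority)
        self.write += 1

    def update(self, idx, priority):
        i = idx + self.V
        self.tree[i] = priority
        i //= 2
        while i >= 1:  # propagate the new sum up to the root
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def sample(self, m):
        # Split the root priority into m intervals, draw one number per
        # interval, and descend: go left if the left child's priority
        # exceeds t, else subtract it and go right (steps i-v above).
        total, out = self.tree[1], []
        for j in range(m):
            t = random.uniform(j * total / m, (j + 1) * total / m)
            i = 1
            while i < self.V:
                left = 2 * i
                if self.tree[left] > t:
                    i = left
                else:
                    t -= self.tree[left]
                    i = left + 1
            out.append(self.data[i - self.V])
        return out
```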
8) Using the m extracted samples, update by back propagation all parameters of the current value networks Q_1 and Q_2, the target value networks Q_1' and Q_2', the current action network P, and the target action network P'. The network updating method is as follows:
i) Input the sample Sam = (s, s', r, q, end) into the target action network P' to obtain the output layer vector q = (q_1, ..., q_k, ..., q_K); add Gaussian white noise with mean 0 and variance σ and clip the result, the adding and clipping being as shown in equation (24). Then normalize and adjust to the proper interval to obtain the predicted action a, the normalization and adjustment being as shown in equations (25) and (26).
ii) Input the system state (s', a) into the target value network Q_1' and the target value network Q_2' to obtain the target value vector vq = (vq_1, vq_2), and substitute it to obtain the expected value vq_exp. Here γ is the attenuation coefficient, where γ_max is the maximum value of the attenuation coefficient, γ_min is the minimum value of the attenuation coefficient, and γ_const is the attenuation coefficient constant.
iii) Input the system state (s, a) into the current value networks Q_1 and Q_2 and, combined with the expected value vq_exp, calculate the value loss; then perform gradient descent and back propagation to update the weight coefficients of the value networks Q_1 and Q_2. In the value loss, vq_l,j is the value obtained by inputting the system state (s, a) into the value networks Q_1 and Q_2, and ws_j is the sample weight corresponding to the j-th sample, in whose calculation m is the number of extracted samples, p_j is the priority of the j-th sample, and β is the sample sampling weight coefficient; β_start is the initial value of the sampling weight coefficient and β_const is the sample weight coefficient constant.
iv) Calculate the average value loss over the m samples, and update the sample priority p_y by equation (30).
9) Judge whether step % τ_c = 0 holds; if yes, enter step 10), otherwise enter step 11).
10) Input the system state s into the current action network P to obtain the predicted action a, substitute (s, a) into the current value network Q_1 to obtain the action loss loss_a, back-propagate loss_a, and update the current value network Q_1 and the current action network P. The weights of the target networks are updated from the corresponding current networks in a soft updating mode:
w' = w'(1 - τ_ratio) + w * τ_ratio (37)
where w' is the weight of the target network and w is the weight of the current network.
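Equation (37) is the familiar Polyak (soft) target update; a one-line sketch over flat weight lists:

```python
def soft_update(target_w, current_w, tau_ratio):
    # w' = w' * (1 - tau_ratio) + w * tau_ratio, applied element-wise
    return [wt * (1.0 - tau_ratio) + w * tau_ratio
            for wt, w in zip(target_w, current_w)]
```

With a small τ_ratio the target network trails the current network slowly, which stabilizes the TD targets.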
11) Judge whether end = True or step % T = 0 holds; if yes, enter the next step, otherwise return to step 2).
Judge whether epoch < M holds; if yes, return to step S32, otherwise output tc_best and f_ser,best. The specific reward behavior can be seen in fig. 12: the cumulative reward keeps increasing, and the reward obtained by each predicted action improves overall.
As shown in fig. 10 and 11, in the present embodiment, the edge distribution network preferably includes a value network and an action network, and the edge distribution target network is composed of a value target network and an action target network. Further, there are preferably two value networks and one action network, which together form one edge distribution network; correspondingly, there are preferably two value target networks and one action target network, which form the edge distribution target network. The edge distribution target network is jointly trained in real time, and when the network parameters of the real-time edge distribution network are updated, the three networks (two value networks and one action network) are updated synchronously. Specifically, the value networks are mainly used for supervising the operation of the action network, and the action network is mainly used for outputting the optimal server allocation frequency and the optimal scheduling sequence. The action network contains a layer of K neurons respectively connected to the edge gateways, used for receiving the independent task information sets uploaded by the edge gateways and for sending the optimal scheduling sequence to the corresponding edge gateways; in this embodiment, this layer has as many neurons as the edge computing system has edge gateways.
Specifically, the following description takes an edge computing system as an example. Fig. 2 is a schematic diagram of the edge computing scenario model, which includes an edge server, K mobile edge gateways (K = 2), and 7 independent tasks (N = 7). Let the set of computing tasks be given; each task T_i,j has an amount of data D_i,j to be processed, each task T_i,j requires C_i,j CPU cycles per unit of data, the maximum transmission power of each task is p_max = 100 mW, and the transmission distances from the edge gateways to the edge server are L = {L_1, L_2}.
S1-1 initializes the task set; D_i,j and C_i,j of each task T_i,j are shown in Table 1. To solve for the optimal solution, assume that the transmission powers corresponding to the two edge gateways are p = (64.248, 59.039) mW, the energy consumption per CPU cycle of the edge gateway is δ_L = 1.6541×10^-9 W/Hz, the CPU frequencies of the edge gateways are f_user = (0.5, 1) GHz, the distances from the edge gateways U = {U_1, U_2} to the edge server are L = (154.881, 171.518) m, the CPU frequency of the edge server is f_ser = 2 GHz, and each edge gateway has a transmission bandwidth of 5 MHz.
TABLE 1 Parameter table for the individual tasks
The system parameters are shown in table 2.
TABLE 2 Execution time and energy consumption list of tasks
S1-2 initializing value network Q 1 ,Q 2 ,Q 1 ',Q' 2 And a weight parameter of the action network P, P'. Initializing a default data structure for empirical playback of SumTree, the priority p of the V (V64) leaf nodes of SumTree V 1, epoch is 0. The neural network structure is shown in fig. 6 and fig. 7.
S1-3, solving the server pre-classification distribution frequency:
Calculate for each task in G = (G_1, G_2) the local execution time, the task transmission time, the task transmission energy consumption e_(i,j),S and the local execution energy consumption e_(i,j),L; the calculation results are shown in Table 3.
TABLE 3 Execution time and energy consumption of tasks
From equation (7), the relative proportion of the local resources of edge gateways U = {U_1, U_2} to the total resources of the system is f_i,ratio = (0.016, 0.327).
From equation (8), the relative proportion of the local execution time delay to the total time delay of the system is t_i,ratio = (0.063, 0.936).
From equation (9), the frequency assignment weights of the edge gateways U = {U_1, U_2} are η_i = (0.576, 0.424).
From equation (10), the distributed frequencies of the edge gateways are f_i,base = (1.15×10^9, 8.49×10^8).
S1-4, solving the unloading decision vector by the unloading scheduling method based on the number of CPU cycles required to process each task:
s2-1 according to G i The CPU period number required by the middle task to process the task is arranged in a descending order for all the tasks to obtain a new task order
Figure BDA0002859343270000192
TABLE 4 ordering table for CPU period number needed by task
G 1 T 1,6 T 1,4 T 1,5 T 1,1 T 1,2 T 1,3 T 1,7
G 2 T 2,7 T 2,2 T 2,5 T 2,6 T 2,4 T 2,1 T 2,3
S2-2 Set up the local and unloading arrays and let the initial index value h = 1. According to equations (1), (2) and f_i,base, separately compute the completion time of the h-th task in the new order when it is put into the local set L_i and when it is put into the unloading set S_i.
S2-3 If the completion time when put into the local set is the smaller one, the task is put into the local set L_i, its unloading decision variable x_i,h is set to 0, h = h + 1, k0 = k0 + 1, and step S3-1 is repeatedly executed until it exits into step S2-4. Otherwise, the task is put into the unloading set S_i, its unloading decision variable x_i,h is set to 1, h = h + 1, and step S3-2 is repeatedly executed until it exits into step S2-4.
S3-1 Repeatedly execute this step until it exits into step S2-4: compare the completion times of the current task when put into the local set L_i and into the unloading set S_i. Calculate the completion time of task L_i,k0 according to equation (13) and the completion time of task S_i,k1 according to equation (14). If the local completion time is the smaller one, the task's unloading decision variable x_i,h is set to 0, the task is put into the local set L_i, and h = h + 1; otherwise, the task is put into the unloading set S_i, h = h + 1, and step S2-4 is performed.
S3-2 Repeatedly execute this step until it exits into step S2-4: compare the completion times of the current task when put into the local set L_i and into the unloading set S_i. Calculate the completion time of task L_i,k0 according to equation (13) and the completion time of task S_i,k1 according to equation (14). If the unloading completion time is the smaller one, the task's unloading decision variable x_i,h is set to 1, the task is put into the unloading set S_i, and h = h + 1; otherwise, the task is put into the unloading set S_i, h = h + 1, and step S2-4 is performed.
S2-4 If the completion time when put into the local set is the smaller one, the task's unloading decision variable x_i,h is set to 0 and the task is put into the local set L_i; otherwise the unloading decision variable x_i,h is set to 1 and the task is put into the unloading set S_i. Then h = h + 1, and this step is repeated until h reaches N.
At this time, the task distribution in set S_i and set L_i is shown in Table 5.
TABLE 5 Distribution of tasks in set S and set L
S2-5 Classify all tasks in the unloading set S_i by comparing the unloading transmission time with the edge server execution time. Tasks whose unloading transmission time is less than the edge server execution time are added to array P_i, and P_i is arranged in ascending order of the unloading transmission time of its tasks. Tasks whose unloading transmission time is greater than or equal to the edge server execution time are added to array Q_i, and Q_i is arranged in descending order of the edge server execution time of its tasks. Appending array Q_i to array P_i yields the new task order σ_i = [P_i Q_i]. At this time, the task distribution in set P_i and set Q_i is shown in Table 6:
TABLE 6 Task distribution in sets P_i and Q_i
P_1: T_1,1, T_1,3, T_1,2
Q_1: T_1,5, T_1,4, T_1,6
P_2: T_2,5, T_2,6, T_2,4, T_2,1
Q_2: T_2,2
S1-5, according to the unloading task set and the unloading decision vector obtained in step S1-4, solve the server resource allocation f_ser,best = {f_1,ser, ..., f_K,ser} of all edge gateways U = {U_1, U_2, ..., U_K} by using the TD3 algorithm:
S4-1 constructs an optimization problem P1.
S4-2 randomly generates a number ε_0 in (0, 1). If ε_0 < ε, a random action is generated; otherwise the state s is input into the target action network P' to obtain the predicted value q, which is normalized and adjusted to the proper interval to obtain the action a. step = step + 1.
At this time ε_0 = 0.15 and ε = 0.2, so ε_0 < ε, and the generated random action is a = (1.046×10^9, 9.5308×10^8).
S4-3 calculates the next normalized state s' = (0.0071, 0) according to action a; end = False; the reward r = 0.31; (s, s', r, q, end) is stored into the SumTree; the state iteration s = s' is performed; the target value tc = 0.0286.
S4-4 judges whether tc < tc_best holds; if yes, tc_best = tc and f_ser,best = f_ser. If not, the process proceeds directly to S4-5.
S4-5 judges whether step > V is satisfied, if not, returns to step S4-2, and if so, proceeds to step S4-6.
S4-6 extracts m samples from the SumTree to train the neural networks Q_1, Q_2 and P, and updates the priority of each sample.
S4-7 determines whether step % C = 0 holds; if yes, soft-update the neural networks Q_1', Q_2' and P'; if not, proceed directly to S4-8.
S4-8 determines whether end = True or step % T = 0 holds; if yes, epoch = epoch + 1. If not, the process returns to step S4-2.
S1-6 judges whether epoch < M holds; if yes, the process returns to step S1-4; otherwise tc_best and f_ser,best are output.
The final optimization results are shown in the following table:
System execution delay: 0.011825
System consumption: 0.021417
Server allocation frequencies: [1.03294728e+09, 9.67052723e+08]
In summary, unlike conventional optimization algorithms, reinforcement learning can create its own learning experience through a trial-and-feedback mechanism to accomplish the optimization objective, and the deep learning component can learn the characteristics of historical data, so that efficiency after training is greatly improved over conventional optimization algorithms. The joint task scheduling and resource allocation method is an unloading iterative algorithm combining scheduling optimization with reinforcement learning: 1. Fix the server frequency allocated to each edge gateway, then solve the task unloading sequence and unloading decision that reach the minimum completion time. 2. With the unloading sequence obtained in the previous step held fixed, solve the optimal server distribution frequency corresponding to each unloading task in the sequence. The two steps are iterated repeatedly to finally obtain the optimal server distribution frequency and the optimal scheduling sequence.
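The two-step iteration summarized above is an alternating (block-coordinate) optimization; a schematic skeleton, with both solvers as placeholder callables rather than the patent's concrete algorithms:

```python
def joint_optimize(solve_schedule, solve_frequency, f_init, max_iters=150):
    # Step 1: with the server frequency fixed, solve the unloading
    #         sequence and decision (scheduling optimization).
    # Step 2: with the sequence fixed, solve the server allocation
    #         frequency (TD3 in the patent; a placeholder here).
    f = f_init
    schedule = None
    for _ in range(max_iters):
        schedule = solve_schedule(f)
        f = solve_frequency(schedule)
    return schedule, f
```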
The invention also provides an edge computing system, which comprises an edge server and a plurality of edge gateways in communication connection with the edge server, wherein the edge server and the edge gateways work by using the edge computing resource scheduling method based on the double-delay depth certainty strategy. The joint task scheduling and resource allocation method is an unloading iterative algorithm combining scheduling optimization and reinforcement learning: 1. the server frequency allocated to the edge gateway is fixed, and then the task unloading sequence and the unloading decision which can reach the minimum completion time are solved. 2. And solving the optimal server distribution frequency corresponding to each unloading task in the unloading sequence under the condition that the unloading sequence obtained in the last step is fixed and unchanged. And repeating the two steps of iteration to finally obtain the optimal server distribution frequency and the optimal scheduling sequence. The result of obtaining the optimal server distribution frequency is faster, and the method can be suitable for more complex systems.
It should be understood that equivalents and modifications of the technical solution and inventive concept thereof may occur to those skilled in the art, and all such modifications and alterations should fall within the scope of the appended claims.

Claims (10)

1. An edge computing resource scheduling method based on a double-delay deep deterministic strategy is characterized in that an edge computing system comprises an edge server and a plurality of edge gateways in communication connection with the edge server, an edge distribution network and an edge distribution target network are built in the edge server, and the method comprises the following steps:
the edge server acquires independent task information sets of all the edge gateways;
based on the independent task information set, the edge distribution network respectively outputs corresponding optimal server distribution frequency and optimal scheduling sequence for all the edge gateways by using a double-delay depth deterministic strategy gradient algorithm;
sending the optimal server distribution frequency and the optimal scheduling sequence to the edge gateway to execute scheduling;
the edge distribution target network carries out real-time training based on the acquired independent task information set; and updating the network parameters of the edge distribution network in sections according to the target network parameters of the edge distribution target network.
2. The method for scheduling resources based on the edge computing of the double-delay depth deterministic strategy according to claim 1, wherein the independent task information set comprises a plurality of independent task information, and the independent task information at least comprises data volume and CPU cycle volume required for processing the task; the required CPU cycle amount comprises a cycle amount required by a server and a cycle amount required by an edge gateway;
the edge distribution network uses a double-delay depth deterministic policy gradient algorithm to respectively output the corresponding optimal server distribution frequency and optimal scheduling sequence for all the edge gateways, and the execution steps are the same as the real-time training steps of the edge distribution target network, and specifically include:
s31, based on the independent task information set, solving the pre-distribution frequency distributed by the edge server for each edge gateway through pre-classification;
s32, the edge distribution target network classifies all the independent tasks based on the period quantity required by the server and the period quantity required by the edge gateway of each independent task, and stores the independent tasks into an unloading task set and a local task set respectively;
s33, the edge distribution target network solves the server allocation frequency for the independent tasks in the unloading task set by using the double-delay deep deterministic policy gradient algorithm;
and S34, performing one iteration every time steps S32-S33 are performed, outputting the optimal server distribution frequency after a preset number of iterations, and determining the target network parameters of the edge distribution target network.
3. The method for scheduling resource based on edge computing of double-delay deep deterministic policy according to claim 2, wherein the output criteria of the optimal server allocation frequency are: when the iterative computation finally has a convergence result, outputting the server distribution frequency of the iterative convergence as the optimal server distribution frequency; otherwise, outputting the pre-distribution frequency as the optimal server distribution frequency.
4. The method for scheduling edge computing resources based on the dual-delay deep deterministic policy according to claim 2, wherein the network parameters of the edge distribution network are updated in segments according to the target network parameters of the edge distribution target network, specifically: when the edge distribution target network is trained in real time, in step S34, every iteration of the set number of times, the current target network parameter is divided into update sections according to a predetermined step length based on the network parameter before training to obtain update parameters, and the update parameters are used as the network parameters of the edge distribution network for updating.
5. The method according to claim 4, wherein the set number of times is 20-80.
6. The dual-delay deep deterministic policy based edge computing resource scheduling method as claimed in claim 2, wherein in step S31, the pre-allocated frequency is obtained by the following specific steps:
S311, for each edge gateway, calculating the dominant-frequency ratio of that gateway's device CPU frequency to the sum of the device CPU frequencies of all edge gateways, and allocating CPU frequency on the edge server to the gateway according to the dominant-frequency ratio;
S312, calculating the local execution delay of each independent task from the independent task information, and calculating, for each edge gateway, the relative-delay ratio of its local execution delay to the sum of the local execution delays of all edge gateways;
S313, calculating the allocation weight of each edge gateway from its dominant-frequency ratio and relative-delay ratio;
and S314, calculating the pre-allocated frequency of each edge gateway on the edge server from its allocation weight and the server CPU frequency.
7. The dual-delay deep deterministic policy based edge computing resource scheduling method according to claim 2, wherein step S32 specifically comprises:
S321, classifying the independent task information of all edge gateways by offloading time and server execution time: adding task information whose offloading time is less than its server execution time to a first array, and sorting the first array in ascending order of offloading time; adding task information whose offloading time is greater than or equal to its server execution time to a second array, and sorting the second array in descending order of server execution time;
S322, obtaining the server execution time and offloading time of each piece of task information in the first array to obtain its server processing time, and obtaining the local execution time of each piece of task information in the second array;
S323, obtaining the time difference between the total server processing time of all task information in the first array and the total local execution time of all task information in the second array;
S324, according to the time difference, taking all task information in the array with the longer total time to form a third array, the processed first array serving as the offloading task set and the processed second array as the local task set;
S325, for each piece of task information in the third array, calculating its server processing time and local execution time, placing task information whose server processing time is greater than its local execution time into the local task pre-allocation set, and placing task information whose server processing time is less than or equal to its local execution time into the offloading task pre-allocation set;
and S326, after all task information in the third array has been assigned, obtaining the final offloading task set and local task set, and deriving the offloading decision vector from the final offloading task set.
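The S321-S326 classification above can be sketched as follows. The task field names (`offload_t`, `server_t`, `local_t`) are hypothetical, server processing time is taken as offloading time plus server execution time per S322, and the reading that the *entire* longer-total-time array becomes the third array (S324) is an interpretation of the claim text, not a statement of the patented implementation.

```python
def classify(tasks):
    """Split tasks into an offloading set and a local set per S321-S326.
    tasks: list of dicts with keys offload_t, server_t, local_t."""
    # S321: provisional split, then sort each array as claimed.
    first = sorted([t for t in tasks if t["offload_t"] < t["server_t"]],
                   key=lambda t: t["offload_t"])
    second = sorted([t for t in tasks if t["offload_t"] >= t["server_t"]],
                    key=lambda t: t["server_t"], reverse=True)
    # S322-S323: total server processing time vs total local execution time.
    server_total = sum(t["offload_t"] + t["server_t"] for t in first)
    local_total = sum(t["local_t"] for t in second)
    # S324: the longer side becomes the third array to be re-examined.
    if server_total > local_total:
        third, first = first, []
    else:
        third, second = second, []
    offload, local = list(first), list(second)
    # S325: reassign each third-array task by comparing its two costs.
    for t in third:
        server_proc = t["offload_t"] + t["server_t"]
        (local if server_proc > t["local_t"] else offload).append(t)
    # S326: offloading decision vector (1 = offload, 0 = execute locally).
    decision = [1 if t in offload else 0 for t in tasks]
    return offload, local, decision
```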
8. The method as claimed in claim 2, wherein in step S34, the preset number of iterations is 100 to 200.
9. The dual-delay deep deterministic policy based edge computing resource scheduling method of claim 1, wherein the edge distribution network consists of a value network and an action network; the edge distribution target network is composed of a value target network and an action target network.
10. An edge computing system comprising an edge server and a plurality of edge gateways communicatively coupled to the edge server, wherein the edge server and the plurality of edge gateways operate using the dual-delay deep deterministic policy based edge computing resource scheduling method of any one of claims 1-9.
CN202011560881.4A 2020-12-25 2020-12-25 Edge computing resource scheduling method and system based on double-delay depth certainty strategy Active CN112788605B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011560881.4A CN112788605B (en) 2020-12-25 2020-12-25 Edge computing resource scheduling method and system based on double-delay depth certainty strategy


Publications (2)

Publication Number Publication Date
CN112788605A CN112788605A (en) 2021-05-11
CN112788605B true CN112788605B (en) 2022-07-26

Family

ID=75752423

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011560881.4A Active CN112788605B (en) 2020-12-25 2020-12-25 Edge computing resource scheduling method and system based on double-delay depth certainty strategy

Country Status (1)

Country Link
CN (1) CN112788605B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113326126B (en) * 2021-05-28 2024-04-05 湘潭大学 Task processing method, task scheduling method, device and computer equipment
CN113747554B (en) * 2021-08-11 2022-08-19 中标慧安信息技术股份有限公司 Method and device for task scheduling and resource allocation of edge computing network
CN114444240B (en) * 2022-01-28 2022-09-09 暨南大学 Delay and service life optimization method for cyber-physical system
CN115190033B (en) * 2022-05-22 2024-02-20 重庆科技学院 Cloud edge fusion network task unloading method based on reinforcement learning
CN115421930B (en) * 2022-11-07 2023-03-24 山东海量信息技术研究院 Task processing method, system, device, equipment and computer readable storage medium
CN117539647B (en) * 2024-01-09 2024-04-12 四川华鲲振宇智能科技有限责任公司 Task scheduling planning method and system based on edge computing gateway node attribute

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107995660A (en) * 2017-12-18 2018-05-04 重庆邮电大学 Support Joint Task scheduling and the resource allocation methods of D2D- Edge Servers unloading
CN108920280A (en) * 2018-07-13 2018-11-30 哈尔滨工业大学 A kind of mobile edge calculations task discharging method under single user scene
CN109240818A (en) * 2018-09-04 2019-01-18 中南大学 Task discharging method based on user experience in a kind of edge calculations network
CN109302709A (en) * 2018-09-14 2019-02-01 重庆邮电大学 The unloading of car networking task and resource allocation policy towards mobile edge calculations
CN109614817A (en) * 2018-11-20 2019-04-12 南京邮电大学 Distributed cryptograph index slice search method under a kind of cloud environment
CN109767117A (en) * 2019-01-11 2019-05-17 中南林业科技大学 The power distribution method of Joint Task scheduling in mobile edge calculations
CN109951821A (en) * 2019-02-26 2019-06-28 重庆邮电大学 Minimum energy consumption of vehicles task based on mobile edge calculations unloads scheme
CN110543336A (en) * 2019-08-30 2019-12-06 北京邮电大学 Edge calculation task unloading method and device based on non-orthogonal multiple access technology
CN111556089A (en) * 2020-03-16 2020-08-18 西安电子科技大学 Resource joint optimization method based on enabling block chain mobile edge computing system
CN112039950A (en) * 2020-08-03 2020-12-04 威胜信息技术股份有限公司 Edge computing network task scheduling and resource allocation method and edge computing system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11249978B2 (en) * 2018-11-29 2022-02-15 Kyndryl, Inc. Multiple parameter based composite rule data validation


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Improved edge detection algorithm based on the Canny operator; Wang Wenhao et al.; China Sciencepaper; 2017-04-23 (Issue 08); full text *

Also Published As

Publication number Publication date
CN112788605A (en) 2021-05-11

Similar Documents

Publication Publication Date Title
CN112788605B (en) Edge computing resource scheduling method and system based on double-delay depth certainty strategy
CN112039950B (en) Edge computing network task scheduling and resource allocation method and edge computing system
CN112367353B (en) Mobile edge computing unloading method based on multi-agent reinforcement learning
CN111800828B (en) Mobile edge computing resource allocation method for ultra-dense network
CN112512056B (en) Multi-objective optimization calculation unloading method in mobile edge calculation network
CN111901862B (en) User clustering and power distribution method, device and medium based on deep Q network
CN108920280B (en) Mobile edge computing task unloading method under single-user scene
CN113543176B (en) Unloading decision method of mobile edge computing system based on intelligent reflecting surface assistance
CN113296845B (en) Multi-cell task unloading algorithm based on deep reinforcement learning in edge computing environment
CN111628855B (en) Industrial 5G dynamic multi-priority multi-access method based on deep reinforcement learning
CN112118287B (en) Network resource optimization scheduling decision method based on alternative direction multiplier algorithm and mobile edge calculation
CN114662661B (en) Method for accelerating multi-outlet DNN reasoning of heterogeneous processor under edge computing
CN110968426B (en) Edge cloud collaborative k-means clustering model optimization method based on online learning
CN114567895A (en) Method for realizing intelligent cooperation strategy of MEC server cluster
CN114585006B (en) Edge computing task unloading and resource allocation method based on deep learning
CN114650228B (en) Federal learning scheduling method based on calculation unloading in heterogeneous network
CN112287990A (en) Model optimization method of edge cloud collaborative support vector machine based on online learning
CN113590279A (en) Task scheduling and resource allocation method for multi-core edge computing server
CN113573363A (en) MEC calculation unloading and resource allocation method based on deep reinforcement learning
CN116321293A (en) Edge computing unloading and resource allocation method based on multi-agent reinforcement learning
CN114936708A (en) Fault diagnosis optimization method based on edge cloud collaborative task unloading and electronic equipment
CN117459112A (en) Mobile edge caching method and equipment in LEO satellite network based on graph rolling network
CN114449536B (en) 5G ultra-dense network multi-user access selection method based on deep reinforcement learning
CN113157344B (en) DRL-based energy consumption perception task unloading method in mobile edge computing environment
CN110768827A (en) Task unloading method based on group intelligent algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant