CN115174681A - Method, equipment and storage medium for scheduling edge computing service request - Google Patents


Info

Publication number
CN115174681A
Authority
CN
China
Prior art keywords
service request
network
edge server
reward
edge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210685149.2A
Other languages
Chinese (zh)
Other versions
CN115174681B (en)
Inventor
李兵
赵玉琦
姜德纶
王健
李段腾川
Current Assignee
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202210685149.2A priority Critical patent/CN115174681B/en
Publication of CN115174681A publication Critical patent/CN115174681A/en
Application granted granted Critical
Publication of CN115174681B publication Critical patent/CN115174681B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04L — TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 — Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/14 — Network analysis or design
    • H04L 41/147 — Network analysis or design for predicting network behaviour

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to a method, a device, and a storage medium for scheduling edge computing service requests, comprising the following steps: deciding, via a pointer network, the execution order of a plurality of service requests queued in an edge server; and optimizing the pointer network according to the edge server resource utilization, the service request running time, and the service request waiting time. The pointer network comprises an actor network and a critic network: the actor network decides the execution order of the service requests, while the critic network predicts subsequent decisions based on the decisions made by the actor network and assists the parameter updates of the actor network with its predicted values. The method effectively improves the resource utilization of the edge server, shortens the time needed to finish executing the service request sequence, and reduces the average waiting time of requests.

Description

Edge computing service request scheduling method, device and storage medium
Technical Field
The present invention relates to the field of edge computing application technologies, and in particular, to a method, a device, and a storage medium for scheduling an edge computing service request.
Background
In recent years, with the rapid development of the internet, cloud computing technology has been widely applied across industries. In some scenarios, however, the traditional cloud computing approach exposes its disadvantages. For example, in the rapidly developing internet of things (IoT), the traditional cloud computing approach offloads tasks by transmitting data to the cloud for processing and returning the result to the user terminal, so many services that are highly sensitive to communication delay cannot be responded to in time.
With the development of Docker and Kubernetes, micro-services can be deployed in a more flexible and convenient manner. Under the edge computing network architecture shown in fig. 2, deploying micro-services at the network edge greatly shortens task delay and effectively addresses delay sensitivity. However, the resources of an edge server are still very limited compared to the cloud, and how to fully utilize the service resources in the network edge environment and satisfy as many service requests as possible is one of the main problems faced by edge computing.
For an edge server, multiple users usually request its services at the same time, and once the query rate (queries per second, QPS) is sufficiently high, the limited resources of the edge server often make it unable to satisfy a large number of service requests simultaneously. The related art applies micro-service request scheduling strategies that queue service requests after they reach the edge server and arrange the execution order of the waiting requests so as to guarantee quality of service. These strategies, however, have the following problems:
First, only a single index is considered, such as the running time required by a task or the request waiting time, and a joint consideration of multiple indexes is lacking.
Second, heuristic algorithms are adopted for strategy optimization; since a heuristic algorithm needs long iteration to obtain a good result, it does not meet the fast-response requirement of edge computing.
Disclosure of Invention
The embodiment of the invention provides a method and a device for scheduling edge computing service requests, which are used for solving the problems in the related art.
In one aspect, an embodiment of the present invention provides a method for scheduling an edge computing service request, where the method includes:
deciding, by a pointer network, the execution order of the plurality of service requests queued in the edge server;
optimizing the pointer network according to the utilization condition of the edge server resources, the service request running time and the service request waiting time;
the pointer network comprises an actor network and a critic network, wherein the actor network is used for deciding the execution sequence of service requests, and the critic network is used for predicting subsequent decisions according to the decisions made by the actor network and assisting parameter updating of the actor network based on predicted values.
In some embodiments, the optimizing the actor network based on edge server resource utilization, service request runtime, and service request latency includes:
defining a reward function for reinforcement learning according to the resource utilization condition of the edge server, the service request running time and the service request waiting time, and training the actor network based on the reward function;
and taking the predicted value of the critic network as the value of the baseline function in the training process of the actor network so as to optimize the parameters of the actor network.
In some embodiments, the defining of the reinforcement learning reward function according to the edge server resource utilization, the service request running time, and the service request waiting time comprises the following steps: determining the reward function reward based on reward = α·reward_1 + β·reward_2 + γ·reward_3, wherein

reward_1 = (1/m) Σ_{j=1}^{m} [use(C_j) + use(O_j) + use(B_j) + use(M_j)] / 4

reward_2 = (1/m) Σ_{j=1}^{m} T_map_j

reward_3 = (1/N) Σ_{i=1}^{N} W_i

α, β, γ are weighting coefficients, use(·) denotes average resource utilization, C_j is the CPU capacity of the edge server, O_j is the I/O capacity of the edge server, B_j is the bandwidth capacity of the edge server, M_j is the memory capacity of the edge server, m is the total number of edge servers, N is the total number of service requests, W_i is the waiting time of service request i, and T_map_j is the total time required for edge server j to run all of its service requests.
In some embodiments, said training of the actor network based on the reward function comprises the following steps:

defining the policy gradient used for training, the policy gradient being computed as

∇_θ J(θ|Q) = E_{C~p_θ(·|Q)}[(reward(C_Q|Q) − b(Q)) · ∇_θ log p_θ(C_Q|Q)]

where θ is the parameter set of the actor network, ∇_θ denotes the gradient with respect to θ, J(θ|Q) is the optimization objective, E_{C~p_θ(·|Q)} denotes the mathematical expectation over all policies given the known service request set Q, p_θ denotes the policy distribution, reward(C_Q|Q) is the reward function value obtained when policy C_Q is taken for the known service request set Q, and b(Q) is a baseline function independent of policy C_Q, used to estimate the value of the reward so as to reduce the variance of the gradient.
In some embodiments, the method comprises the steps of: training the critic network by stochastic gradient descent, the stochastic gradient descent objective being:

l(θ_v) = (1/B) Σ_{k=1}^{B} ‖b_{θ_v}(Q_k) − reward(C_{Q_k}|Q_k)‖²

where b_{θ_v}(Q_k) is the predicted reward value, reward(C_{Q_k}|Q_k) is the reward value actually obtained by the actor network's decision, l(θ_v) is the stochastic gradient descent loss, θ_v are the critic network parameters, and B is the batch size.
In some embodiments, the actor network comprises an encoder and a decoder, each comprising a recurrent neural network composed of a plurality of long short-term memory networks; the actor network's decision on the execution order of the service requests comprises the following steps:
taking the queued service request sequence as an input sequence, converting it into a first intermediate vector, and inputting the vector into the encoder of the actor network to obtain the state of each hidden layer of the encoder;

inputting the states of the hidden layers of the encoder into the decoder to obtain the state of each hidden layer of the decoder, and obtaining a second intermediate vector from the decoder hidden layer states through the attention mechanism of the pointer network;

acquiring, based on the second intermediate vector, the probability that the decoder selects each service request at a given hidden layer as the output of that layer;

and selecting, at each hidden layer, the service request with the highest probability as the output of that layer, the order of executing the service requests defined by the outputs of all hidden layers being the output sequence of the edge server.
In some embodiments, before said deciding an execution order of the plurality of service requests queued in the edge server, further comprising:
for each service request, acquiring a set of edge servers capable of receiving the service request, and randomly selecting one edge server from the set of edge servers as an edge server for processing the service request.
In some embodiments, for each service request, obtaining a set of edge servers capable of receiving the service request includes:
according to π_i = {s_j | r_j ≥ ‖p_i − p_j‖_2, s_j ∈ S}, obtaining the set of edge servers that can receive the service request, where π_i is the set of all edge servers that can receive the service request, p_i is the coordinates of the service request, p_j is the coordinates of the edge server, r_j is the coverage radius of the edge server, s_j is the j-th edge server, and S is the set of all edge servers.
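A minimal sketch of this coverage test (the function and variable names are illustrative, not from the patent):

```python
import math

def coverage_set(request_pos, servers):
    # servers: iterable of (server_id, (x, y), coverage_radius);
    # server s_j can receive the request when its radius r_j is at
    # least the Euclidean distance ||p_i - p_j||_2 to the request
    return [sid for sid, pos, radius in servers
            if radius >= math.dist(request_pos, pos)]
```

The returned list is the set π_i from which a target server is later chosen at random.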
In a second aspect, an embodiment of the present invention further provides a computer-readable storage medium, where at least one instruction is stored in the storage medium, and the instruction is loaded and executed by a processor to implement the method according to any one of claims 1 to 8.
In a third aspect, an embodiment of the present invention further provides an apparatus, where the apparatus includes: at least one processor; and a memory coupled to the at least one processor, the memory containing instructions stored therein that, when loaded and executed by the processor, perform the method of any of claims 1-8.

The technical solution provided by the invention has the following beneficial effects:
the embodiment of the invention provides a method and a device for scheduling edge computing service requests, which are used for remarkably improving the service quality, effectively improving the resource utilization rate of an edge server, shortening the time required by the completion of the execution of a service request sequence and reducing the average waiting time of requests by extracting the main characteristics of task operation and combining and considering three indexes of resource utilization rate, operation time and waiting time in the service request process.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flowchart of a method for scheduling an edge computing service request according to an embodiment of the present invention;
FIG. 2 is a block diagram of an edge computing network architecture according to an embodiment of the present invention;
fig. 3 is a flowchart illustrating an overall implementation of a scheduling policy according to an embodiment of the present invention;
fig. 4 is a flowchart illustrating a method for scheduling an edge computing service request according to an embodiment of the present invention;
FIG. 5 is a data comparison chart of experiment result 1 provided in the embodiment of the present invention;
fig. 6 is a data comparison graph of experimental result 2 provided in the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
As shown in fig. 1, an embodiment of the present invention provides a method for scheduling an edge computing service request, including the steps of:

S100, deciding, by a pointer network, the execution order of the plurality of service requests queued in the edge server;

S200, optimizing the pointer network according to the edge server resource utilization, the service request running time, and the service request waiting time;

wherein the pointer network comprises an actor network and a critic network: the actor network is used for deciding the execution order of the service requests, and the critic network is used for predicting subsequent decisions according to the decisions made by the actor network and assisting the parameter updates of the actor network based on the predicted values.
It should be noted that, the edge server resource utilization condition may include CPU capacity, I/O capacity, bandwidth capacity, memory capacity, and the like of the edge server; the service request runtime may refer to the total time required for the edge server to perform all queued service requests; the service request latency may refer to the time interval from the arrival of the service request at the edge server to the completion of the processing.
It can be understood that the embodiment of the invention is different from the traditional heuristic algorithm in long-time iteration, and the pointer network based on artificial intelligence can make a quick decision and meet the requirement of time delay sensitivity in the edge computing environment. Meanwhile, in the optimization of the pointer network, a plurality of optimization targets including the resource utilization rate of the edge server, the service request running time and the service request waiting time are fully considered, optimization is carried out from a plurality of dimensions, and the service quality is improved.
In some embodiments, S200 includes the steps of:
s210, defining a reward function of reinforcement learning according to the utilization condition of the edge server resources, the service request running time and the service request waiting time, and training the actor network based on the reward function;
and S220, taking the predicted value of the critic network as the value of a baseline function in the training process of the actor network so as to optimize the parameters of the actor network.
It can be appreciated that the actor network can be trained continuously while making intelligent decisions, thereby avoiding the problem of reduced model effect due to fluctuations in service request data over time.
In the embodiment of the invention, the actor network is trained according to a plurality of optimization targets, the critic network is introduced to carry out auxiliary optimization on the actor network, and the predicted value of the critic network is used as a baseline function value to influence the parameter update of the actor network.
Further, S210 includes the steps of: determining the reward function reward based on reward = α·reward_1 + β·reward_2 + γ·reward_3, wherein

reward_1 = (1/m) Σ_{j=1}^{m} [use(C_j) + use(O_j) + use(B_j) + use(M_j)] / 4

reward_2 = (1/m) Σ_{j=1}^{m} T_map_j

reward_3 = (1/N) Σ_{i=1}^{N} W_i

α, β, γ are weighting coefficients, use(·) denotes average resource utilization, C_j is the CPU capacity of the edge server, O_j is the I/O capacity of the edge server, B_j is the bandwidth capacity of the edge server, M_j is the memory capacity of the edge server, m is the total number of edge servers, N is the total number of service requests, W_i is the waiting time of service request i, and T_map_j is the total time required for edge server j to run all of its service requests.
Further, the training of the actor network based on the reward function in S210 includes the steps of:

defining the policy gradient of the training, the policy gradient being computed as

∇_θ J(θ|Q) = E_{C~p_θ(·|Q)}[(reward(C_Q|Q) − b(Q)) · ∇_θ log p_θ(C_Q|Q)]

where θ is the parameter set of the actor network, ∇_θ denotes the gradient with respect to θ, J(θ|Q) is the optimization objective, E_{C~p_θ(·|Q)} denotes the mathematical expectation over all policies given the known service request set Q, p_θ denotes the policy distribution, reward(C_Q|Q) is the reward function value obtained when policy C_Q is taken given the service request set Q, and b(Q) is a baseline function independent of policy C_Q, used to estimate the value of the reward so as to reduce the variance of the gradient.
Further, when predicting subsequent decisions according to the decisions made by the actor network in S220, the critic network is trained by stochastic gradient descent, the objective being:

l(θ_v) = (1/B) Σ_{k=1}^{B} ‖b_{θ_v}(Q_k) − reward(C_{Q_k}|Q_k)‖²

where b_{θ_v}(Q_k) is the predicted reward value, reward(C_{Q_k}|Q_k) is the reward value actually decided by the actor network, l(θ_v) is the stochastic gradient descent loss, θ_v are the critic network parameters, and B is the batch size.
As shown in fig. 3, in some embodiments, the actor network includes an encoder and a decoder, and the encoder and the decoder each include a recurrent neural network composed of a plurality of long-short term memory networks, and the actor network makes a decision on the execution order of a plurality of service requests queued in the edge server, including the steps of:
s110, taking the service request sequence which is queued as an input sequence and converting the service request sequence into a first intermediate vector to be input into an encoder of the actor network to obtain the state of each hidden layer corresponding to the encoder;
s120, inputting the state of each hidden layer of the encoder into a decoder to obtain the state of each hidden layer of the decoder, and obtaining a second intermediate vector from the state of each hidden layer of the decoder through the attention mechanism of the actor network;
s130, acquiring the probability that the decoder selects each service request in a certain hidden layer as the output of the hidden layer based on the second intermediate vector;
and S140, each hidden layer selects a service request with the highest probability as the output of the layer, and defines the sequence of executing the service requests according to the output of all the hidden layers as the output sequence of the edge server.
It should be noted that in S110, an embedding operation may be performed on the input sequence to convert it into the first intermediate vector; the embedding compresses the data dimension so that it matches the network model. Furthermore, the number of encoder and decoder hidden layers depends on the input sequence length.
Preferably, the second intermediate vector may be represented as:

u_i^j = v^T · tanh(W_1·e_j + W_2·d_i), j = 1, ..., n

where e_j denotes the state of the j-th hidden layer of the encoder, d_i denotes the state of the i-th hidden layer of the decoder, and v^T, W_1, W_2 are all parameters to be trained by the pointer network.

Preferably, in S120, a softmax operation may be performed on u_i to obtain the probability that the decoder selects each service request at the i-th hidden layer as the output of that layer: p(C_i|C_1,...,C_{i−1},Q) = softmax(u_i), where p(C_i|C_1,...,C_{i−1},Q) denotes the probability vector of the input sequence at the i-th hidden layer of the decoder, with dimension equal to the input sequence length.
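The attention-and-softmax computation can be sketched as a single decoding step with NumPy (illustrative only; the dimensions, parameter values, and function name are assumptions):

```python
import numpy as np

def pointer_probs(enc_states, dec_state, v, W1, W2):
    # u_i^j = v^T tanh(W1 @ e_j + W2 @ d_i) for each encoder hidden
    # state e_j, then softmax over j: the probability that the decoder
    # picks input position j as the output of this hidden layer
    scores = np.array([v @ np.tanh(W1 @ e + W2 @ dec_state)
                       for e in enc_states])
    exp = np.exp(scores - scores.max())   # numerically stable softmax
    return exp / exp.sum()
```

The resulting vector has one entry per queued service request and sums to one, matching the dimension claim above.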
In some embodiments, before the decision, for each service request, a set of edge servers capable of receiving the service request is obtained, and one edge server is randomly selected from the set of edge servers as an edge server for processing the service request.
Preferably, obtaining the set of edge servers capable of receiving the service request comprises the following steps: according to π_i = {s_j | r_j ≥ ‖p_i − p_j‖_2, s_j ∈ S}, obtaining the set of edge servers that can receive the service request, where π_i is the set of all edge servers that can receive the service request, p_i is the coordinates of the service request, p_j is the coordinates of edge server s_j, r_j is the coverage radius of edge server s_j, s_j is the j-th edge server, and S is the set of all edge servers.
As shown in fig. 4, in a specific embodiment, an edge computing service request scheduling method includes the steps of:
step S1, edge server information and service request data under a real environment are obtained, data preprocessing is respectively carried out on the characteristics of the edge server information and the service request data, and preprocessed edge server information and service request data representation are obtained.
And S2, on the basis of the step S1, distributing the service request to different edge servers for processing according to the coverage area of the edge servers and the service request initiating position.
And S3, on the basis of the step S2, when a plurality of service requests are queued inside the edge server, making a decision on the execution sequence of the service requests by using a pointer network model.
And S4, on the basis of the step S3, training the pointer network model by using a reinforcement learning mode while making a decision on the pointer network model, so that a better decision effect is achieved.
Further, step S1 includes:
s1.1, analyzing edge server information in a real scene, and enabling a single edge server S j Represented as a quadruple: s j =(C j ,O j ,B j ,M j ) Wherein, C j Representing CPU capacity of edge servers, O j Indicating the I/O capacity of the edge server, B j Representing the bandwidth capacity of the edge server, M j The memory capacity of the edge server is represented, and the edge server set S is defined as: s = { S = 1 ,s 2 ,...,s m }。
S1.2, analyzing the service request data in the real scene and representing a single service request q_i as a seven-tuple: q_i = (c_i, o_i, b_i, m_i, T_i, t_i, π_i), where the first four dimensions (c_i, o_i, b_i, m_i) respectively denote the CPU, I/O, bandwidth, and memory required to execute request q_i, T_i denotes the timestamp at which q_i is initiated, t_i denotes the time required to run q_i, and π_i denotes the set of all edge servers that can receive the service request; the service request set Q is defined as: Q = {q_1, q_2, ..., q_N}.
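For illustration only (class and field names are assumptions, not the patent's data model), the quadruple and seven-tuple representations could be modeled as:

```python
from dataclasses import dataclass, field

@dataclass
class EdgeServer:
    # s_j = (C_j, O_j, B_j, M_j)
    cpu: float
    io: float
    bandwidth: float
    memory: float

@dataclass
class ServiceRequest:
    # q_i = (c_i, o_i, b_i, m_i, T_i, t_i, pi_i)
    cpu: float
    io: float
    bandwidth: float
    memory: float
    arrival_ts: float              # T_i: timestamp of request initiation
    run_time: float                # t_i: time needed to run the request
    candidates: list = field(default_factory=list)  # pi_i: reachable servers
```

The candidates field is filled in step S2.1 from the coverage relation.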
Further, step S2 includes:
s2.1, on the basis of step S1.2, pi i The statistical result of the coverage relation of the edge server to the service request is pi i Expressed as: pi i ={s j |r j ≥||p i -p j || 2 ,s j E.g., S }, wherein p i Coordinates, p, representing the service request j Representing edge servers s j Coordinate of (a), r j Representing edge servers s j The radius of coverage of.
S2.2, for each service request q i From it pi i One edge server is randomly selected from the set to serve as a target edge server for processing the service request, and a plurality of micro service requests distributed to the same edge server are queued inside the edge server.
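The random dispatch and per-server queuing of S2.2 can be sketched as follows (the names and the seeded generator are illustrative assumptions):

```python
import random
from collections import defaultdict

def assign_requests(requests, seed=None):
    # requests: list of (request_id, candidate_server_ids), where the
    # candidates are the coverage set pi_i of the request; each request
    # is sent to a uniformly random server from its set, and requests
    # dispatched to the same server queue up together in arrival order
    rng = random.Random(seed)
    queues = defaultdict(list)
    for rid, candidates in requests:
        queues[rng.choice(candidates)].append(rid)
    return dict(queues)
```

Each resulting queue is the input sequence that the pointer network later reorders in step S3.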
Further, as shown in fig. 3, the pointer network model is divided into two parts, an actor network and a critic network, wherein the actor network is used for deciding the execution sequence of the service request and consists of an encoder and a decoder, wherein the encoder and the decoder are respectively a Recurrent Neural Network (RNN), and the critic network is used for assisting the actor network to train and consists of a recurrent neural network and a Deep Neural Network (DNN). The step S3 comprises the following steps:
step S3.1, the sequence of service requests being queued is input into the pointer network model as input data.
Step S3.2, based on step S3.1, the input data is finally decided by the actor network to obtain an execution sequence of the service request, and the specific decision process is as follows:
s3.2_1, an Embedding operation is carried out on the input sequence to convert the input sequence into an intermediate vector representation form.
S3.2_2, the intermediate vector passes through a Recurrent Neural Network (RNN) consisting of a plurality of Long Short Term Memory (LSTM) networks to obtain the state of each hidden layer of the corresponding encoder, and the number of the hidden layers of the encoder and the decoder depends on the length of an input sequence.
S3.2_3, the encoder states are taken as the input of the decoder, and the attention mechanism of the pointer network model yields:

u_i^j = v^T · tanh(W_1·e_j + W_2·d_i), j = 1, ..., n

where e_j denotes the state of the j-th hidden layer of the encoder and d_i denotes the state of the i-th hidden layer of the decoder; a softmax operation is then performed on these scores to obtain: p(C_i|C_1,...,C_{i−1},Q) = softmax(u_i), where p(C_i|C_1,...,C_{i−1},Q) denotes the probability vector of the input sequence at the i-th hidden layer of the decoder, whose dimension equals the input sequence length, i.e., the probability that the decoder picks each service request as the output of this layer at the i-th hidden layer.
And S3.2_4, each hidden layer selects a service request with the highest probability as the output of the layer, and as the number of the hidden layers of the decoder is equal to the length of the input sequence, a new sequence with the length equal to that of the input sequence can be finally formed as an output sequence.
S3.2_5 the edge server processes the service requests in the order of execution defined by this output sequence.
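The greedy selection in S3.2_4 can be sketched as below. Note that the mask over already-chosen requests is an assumption: the patent text only states that each hidden layer picks the highest-probability request, and masking is the standard way to guarantee the output is a permutation of the input sequence:

```python
def greedy_decode(score_fn, n):
    # score_fn(i) returns one score per input position for decoder
    # hidden layer i; each layer greedily picks the best not-yet-chosen
    # request, so after n layers the output is a permutation of 0..n-1
    order, remaining = [], set(range(n))
    for i in range(n):
        scores = score_fn(i)
        pick = max(remaining, key=lambda j: scores[j])
        order.append(pick)
        remaining.remove(pick)
    return order
```

The returned index order is the execution order in which the edge server processes its queued requests.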
Further, in step S4, the pointer network model is trained in a reinforcement learning manner while making decisions, specifically including the following substeps:

S4.1, the optimization target is first expressed functionally so that it covers the three aspects of resource utilization, running time, and average waiting time:

S4.1_1: the resource utilization optimization objective is expressed as:

reward_1 = (1/m) Σ_{j=1}^{m} [use(C_j) + use(O_j) + use(B_j) + use(M_j)] / 4

where the resource utilization characterizes the average usage efficiency of the edge servers while processing service requests, and use(C_j), use(O_j), use(B_j), use(M_j) respectively denote the average CPU, I/O, bandwidth, and memory utilization of the j-th edge server when processing the whole input sequence.

S4.1_2: the running time optimization objective is expressed as:

reward_2 = (1/m) Σ_{j=1}^{m} T_map_j

where the running time T_map_j denotes the total time required for edge server j to execute all of its queued service requests.

S4.1_3: the waiting time optimization objective is expressed as:

reward_3 = (1/N) Σ_{i=1}^{N} W_i

where the waiting time denotes the time from the arrival of a service request at the edge server to the completion of its processing, with W_i denoting the waiting time of service request i.
S4.2, on the basis of step S4.1, the reinforcement learning reward function is obtained by weighted averaging: reward = α·reward_1 + β·reward_2 + γ·reward_3.
S4.3, the actor network is trained on the basis of step S4.2, and a policy-gradient-based reinforcement learning method is selected to optimize the pointer network parameters. The parameters of the pointer network are denoted θ, and reward(C_Q|Q) denotes the reward function value obtained when policy C_Q is taken for the known service request set Q. The expectation of the reward function value is defined as follows:

J(θ|Q) = E_{C~p_θ(·|Q)}[reward(C_Q|Q)]

where J(θ|Q) represents the optimization target of the pointer network: minimizing the expectation of the reward. The policy gradient is implemented as:

∇_θ J(θ|Q) = E_{C~p_θ(·|Q)}[(reward(C_Q|Q) − b(Q)) · ∇_θ log p_θ(C_Q|Q)]

where b(Q) is a baseline function independent of the policy C_Q taken; its effect is to reduce the variance of the gradient by estimating the value of the reward. In the embodiment of the invention, the predicted value of the critic network is selected as the value of b(Q) and used to assist the training of the actor network.
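As a numerical illustration of the baselined estimator (a minimal Monte-Carlo sketch over a batch of sampled decisions; the function name and the batched form are assumptions, not the patent's implementation):

```python
import numpy as np

def pg_estimate(logp_grads, rewards, baseline):
    # Monte-Carlo estimate of the policy gradient with a baseline:
    # (1/B) * sum_k (reward_k - b(Q)) * grad_theta log p_theta(C_k | Q).
    # Subtracting the baseline shifts each sample's weight without
    # changing the expectation, which reduces the estimator's variance.
    adv = np.asarray(rewards, dtype=float) - baseline
    grads = np.asarray(logp_grads, dtype=float)
    return (adv[:, None] * grads).mean(axis=0)
```

Samples whose reward beats the baseline push the parameters one way; worse-than-baseline samples push the other way.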
And S4.4, training the critic network on the basis of the step S4.3, predicting the result of the critic network on the basis of the decision made by the actor network, and taking the predicted value as the value of the baseline function in the actor network training process so as to assist the actor network in updating the parameters of the network. Preferably, the training is performed in a random gradient descent mode:
l(θ_v) = ( b_{θ_v}(Q) − reward(C_Q | Q) )²

wherein b_{θ_v}(Q) is the reward value predicted by the critic network, and reward(C_Q | Q) is the actual reward value of the decision made by the actor network.
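A minimal sketch of this critic update, with a simple linear model standing in for the critic network (the real critic is a neural network; all names and data below are illustrative assumptions):

```python
# Sketch of the critic update from step S4.4: the critic regresses toward
# the actual reward by minimizing l = (prediction - actual_reward)^2 with
# stochastic gradient descent.  A linear model stands in for the network.
def critic_sgd_step(theta_v, features, actual_reward, lr=0.1):
    """One SGD step on l(theta_v) = (prediction - actual_reward)^2."""
    prediction = sum(w * x for w, x in zip(theta_v, features))
    error = prediction - actual_reward
    # d l / d theta_v[k] = 2 * error * features[k]
    return [w - lr * 2.0 * error * x for w, x in zip(theta_v, features)]

theta_v = [0.0, 0.0]
for _ in range(100):
    theta_v = critic_sgd_step(theta_v, features=[1.0, 0.5], actual_reward=0.9)
prediction = theta_v[0] + 0.5 * theta_v[1]
print(round(prediction, 3))  # converges toward the observed reward 0.9
```

Because the loss is the squared error between predicted and actual reward, repeated SGD steps drive the critic's prediction toward the observed rewards, which is what makes it a useful baseline b(Q) for the actor.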
The embodiment of the invention has the following advantages:
1) Data characteristics of edge servers and service requests in real environments are extracted, making the model general and closer to actual conditions.
2) Multiple optimization objectives are fully considered, including resource utilization, running time, and waiting time; optimizing across multiple dimensions improves service quality.
3) Unlike traditional heuristic algorithms that require lengthy iteration, the artificial-intelligence-based pointer network model makes fast decisions, meeting the delay-sensitivity requirements of edge computing environments.
4) The model can continue to be trained while making intelligent decisions, mitigating the degradation in model performance caused by fluctuation of service request data over time.
In a specific embodiment, the edge-computing-oriented service request scheduling method is used to make decisions and its effect is verified. First, an edge computing simulation environment is built, and edge server data and service request data are generated from the real geographical position information provided by the EUA dataset. Then, the Google cluster trace dataset is used as the simulation data for service requests, 400,000 records in total. The pointer network model is trained with 300,000 records as the training set and 100,000 records as the test set.
Applying the pointer network model to the actual scheduling process, two groups of experiments are designed. The first group fixes 5 edge servers and varies the number of service requests over 300-350, 350-400, 400-450, and 450-500; the second group fixes the number of service requests at 500 and varies the number of edge servers over 5, 7, 9, 11, 13, and 15.
According to the coverage range of each edge server, the service requests within the coverage range are evenly and randomly distributed among the edge servers capable of processing them.
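The coverage-based dispatch described above (an edge server s_j is eligible for a request at p_i when r_j ≥ ||p_i − p_j||_2, and one eligible server is then chosen at random) can be sketched as follows; server names and coordinates are illustrative:

```python
import math
import random

# Sketch of coverage-based dispatch: a request at point p_i may be served by
# edge server s_j if the Euclidean distance ||p_i - p_j||_2 is within the
# server's coverage radius r_j; one eligible server is picked at random.
def eligible_servers(p_i, servers):
    """servers: list of (name, (x, y), radius); returns names covering p_i."""
    return [name for name, p_j, r_j in servers
            if r_j >= math.dist(p_i, p_j)]

def dispatch(p_i, servers, rng=random):
    candidates = eligible_servers(p_i, servers)
    return rng.choice(candidates) if candidates else None

servers = [("s1", (0.0, 0.0), 5.0), ("s2", (10.0, 0.0), 3.0)]
print(eligible_servers((2.0, 1.0), servers))  # only s1 covers this point
```

A request outside every coverage radius yields no candidate, so `dispatch` returns `None`; in a full system such requests would be queued or forwarded elsewhere.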
Within each edge server, the scheduling policy is applied to schedule the queued service request sequence; three indexes are calculated, namely the resource utilization of the edge server, the service request running time, and the average service request waiting time, and finally the effectiveness of the experimental scheme is verified.
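For a single server with all requests arriving at time zero, two of these indexes can be computed directly from the execution order: the waiting time W_i of each request is its completion time, and the server's running time is the makespan. A minimal sketch with illustrative processing times:

```python
# Sketch of the running-time and average-waiting-time metrics for one server:
# requests are processed back-to-back in the given order; all arrive at t=0,
# so W_i equals the completion time of request i.  Data are illustrative.
def schedule_metrics(processing_times):
    """Returns (total running time, average waiting time) for the given order."""
    clock = 0.0
    completions = []
    for p in processing_times:
        clock += p
        completions.append(clock)  # W_i = completion time when arrival is t=0
    return clock, sum(completions) / len(completions)

makespan, avg_wait = schedule_metrics([3.0, 1.0, 2.0])
print(makespan, avg_wait)  # makespan 6.0, average wait (3+4+6)/3
print(schedule_metrics([1.0, 2.0, 3.0])[1])  # (1+3+6)/3, a lower average
```

This also shows why the execution order matters: reordering the same three requests changes the average waiting time while the total running time stays fixed.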
When verifying the effectiveness of the experimental scheme, the proposed scheduling policy (named RLPNet) is compared with several baseline scheduling policies. The baselines include: the first-come-first-served scheduling algorithm (FCFS), the highest-response-ratio-next scheduling algorithm (HRRN), an online Q-learning-based reinforcement learning scheduling algorithm (OnPQ), and an online delay-sensitive task scheduling algorithm (OnDisc).
As shown in fig. 4 and 5, the experimental results show that the scheduling policy RLPNet proposed by the present invention outperforms the other four comparison methods in resource utilization, running time, and average waiting time.
In addition, an embodiment of the present invention further provides a computer-readable storage medium, where at least one instruction is stored, and the instruction is loaded and executed by a processor to implement all the method steps in the method embodiment of the present invention.
In addition, an embodiment of the present invention further provides an apparatus, where the apparatus includes: at least one processor; and a memory coupled to the at least one processor, the memory containing instructions stored therein which when loaded and executed by the processor implement all of the method steps in a method embodiment of the invention.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable storage media, which may include computer readable storage media (or non-transitory media) and communication media (or transitory media).
The above embodiments are only specific embodiments of the present invention, but the scope of the embodiments of the present invention is not limited thereto, and any person skilled in the art can easily think of various equivalent modifications or substitutions within the technical scope of the embodiments of the present invention, and these modifications or substitutions should be covered by the scope of the embodiments of the present invention. Therefore, the protection scope of the embodiments of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. An edge computing service request scheduling method, comprising the steps of:
deciding, by a pointer network, the execution order of a plurality of service requests queued in an edge server;
optimizing the pointer network according to the utilization condition of the edge server resources, the service request running time and the service request waiting time;
the pointer network comprises an actor network and a critic network, wherein the actor network is used for deciding the execution order of service requests, and the critic network is used for making predictions according to the decisions made by the actor network and for assisting parameter updating of the actor network based on the predicted values.
2. The method as claimed in claim 1, wherein the optimizing the pointer network according to the utilization condition of the edge server resources, the service request running time and the service request waiting time comprises:
defining a reward function of reinforcement learning according to the resource utilization condition of the edge server, the service request running time and the service request waiting time, and training the actor network based on the reward function;
and taking the predicted value of the critic network as the value of a baseline function in the training process of the actor network so as to optimize the parameters of the actor network.
3. The method for scheduling edge computing service request according to claim 2, wherein the step of defining the reinforcement learning reward function according to the edge server resource utilization, the service request running time and the service request waiting time comprises the steps of:
based on reward = α·reward_1 + β·reward_2 + γ·reward_3, determining the reward function reward, wherein reward_1, reward_2 and reward_3 are respectively the resource-utilization, running-time and waiting-time terms [their formulas appear as images in the original publication]; α, β, γ are weighting coefficients, C_j is the CPU capacity of the edge server, O_j is the I/O capacity of the edge server, B_j is the bandwidth capacity of the edge server, M_j is the memory capacity of the edge server, m is the total number of edge servers, W_i is the waiting time of service request i, and T_map_j is the total time required for edge server j to run all service requests.
4. The method of claim 2, wherein the training the actor network based on the reward function comprises:
defining a policy gradient for training, the policy gradient being executed as:

J(θ | Q) = E_{C ~ p_θ(·|Q)} [ reward(C_Q | Q) ]

∇_θ J(θ | Q) = E_{C ~ p_θ(·|Q)} [ (reward(C_Q | Q) − b(Q)) · ∇_θ log p_θ(C_Q | Q) ]

wherein θ is a parameter of the actor network, ∇_θ denotes the gradient with respect to θ, J(θ | Q) is the optimization objective, E_{C ~ p_θ(·|Q)} denotes the mathematical expectation over all policies given the known service request set Q, p_θ denotes the policy distribution, reward(C_Q | Q) is the value of the reward function when policy C_Q is taken for the known service request set Q, and b(Q) is a baseline function independent of policy C_Q, used to estimate the value of reward so as to reduce the variance of the gradient.
5. The method for scheduling edge computing service requests according to claim 2, comprising the step of: training the critic network by stochastic gradient descent, the stochastic gradient descent being as follows:

l(θ_v) = ( b_{θ_v}(Q) − reward(C_Q | Q) )²

wherein b_{θ_v}(Q) is the predicted reward value, reward(C_Q | Q) is the reward value actually decided by the actor network, l(θ_v) is the stochastic gradient loss, and θ_v is the critic network parameter.
6. The method as claimed in claim 1, wherein the actor network comprises an encoder and a decoder, and the encoder and the decoder each comprise a recurrent neural network composed of a plurality of long-short term memory networks, and the actor network, when deciding the execution sequence of the service requests, comprises the steps of:
taking the queued service request sequence as an input sequence, converting it into a first intermediate vector, and inputting it into the encoder of the actor network to obtain the state of each hidden layer of the encoder;
inputting the state of each hidden layer of the encoder into a decoder to obtain the state of each hidden layer of the decoder, and obtaining a second intermediate vector from the state of each hidden layer of the decoder through the attention mechanism of the pointer network;
based on the second intermediate vector, acquiring the probability that a decoder selects each service request in a certain hidden layer as the output of the layer;
each hidden layer selects the service request with the highest probability as the output of that layer, and the sequence of service request executions defined by the outputs of all hidden layers is taken as the output sequence of the edge server.
7. The method as claimed in claim 1, wherein before said deciding the execution order of the plurality of service requests queued in the edge server, further comprising:
for each service request, acquiring a set of edge servers capable of receiving the service request, and randomly selecting one edge server from the set of edge servers as an edge server for processing the service request.
8. The method for scheduling edge computing service requests according to claim 7, wherein the step of obtaining, for each service request, a set of edge servers capable of receiving the service request comprises the steps of:
according to π_i = { s_j | r_j ≥ ||p_i − p_j||_2 , s_j ∈ S }, obtaining the set of edge servers capable of receiving the service request, wherein π_i is the set of all edge servers capable of receiving service request i, p_i is the coordinate of the service request, p_j is the coordinate of the edge server, r_j is the coverage radius of the edge server, s_j is the j-th edge server, and S is the set of all edge servers.
9. A computer-readable storage medium having stored therein at least one instruction which is loaded and executed by a processor to implement the method of any one of claims 1-8.
10. An apparatus, characterized in that the apparatus comprises: at least one processor; and a memory coupled to the at least one processor, the memory containing instructions stored therein that when loaded and executed by the processor, perform the method of any of claims 1-8.
CN202210685149.2A 2022-06-14 2022-06-14 Method, equipment and storage medium for scheduling edge computing service request Active CN115174681B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210685149.2A CN115174681B (en) 2022-06-14 2022-06-14 Method, equipment and storage medium for scheduling edge computing service request


Publications (2)

Publication Number Publication Date
CN115174681A true CN115174681A (en) 2022-10-11
CN115174681B CN115174681B (en) 2023-12-15

Family

ID=83486015

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210685149.2A Active CN115174681B (en) 2022-06-14 2022-06-14 Method, equipment and storage medium for scheduling edge computing service request

Country Status (1)

Country Link
CN (1) CN115174681B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109976909A (en) * 2019-03-18 2019-07-05 中南大学 Low delay method for scheduling task in edge calculations network based on study
CN112637806A (en) * 2020-12-15 2021-04-09 合肥工业大学 Transformer substation monitoring system based on deep reinforcement learning and resource scheduling method thereof
CN113778648A (en) * 2021-08-31 2021-12-10 重庆理工大学 Task scheduling method based on deep reinforcement learning in hierarchical edge computing environment
CN113822456A (en) * 2020-06-18 2021-12-21 复旦大学 Service combination optimization deployment method based on deep reinforcement learning in cloud and mist mixed environment
US20210400277A1 (en) * 2021-09-01 2021-12-23 Intel Corporation Method and system of video coding with reinforcement learning render-aware bitrate control
CN114328291A (en) * 2021-12-18 2022-04-12 中国科学院深圳先进技术研究院 Industrial Internet edge service cache decision method and system
CN114500405A (en) * 2021-12-27 2022-05-13 天翼云科技有限公司 Resource allocation and acquisition method and device for multi-type service application


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIALE DENG: "Microservice Pre-Deployment Based on Mobility Prediction and Service Composition in Edge", IEEE *
QI WANG: "Deep reinforcement learning for transportation network combinatorial optimization: A survey", ELSEVIER *

Also Published As

Publication number Publication date
CN115174681B (en) 2023-12-15

Similar Documents

Publication Publication Date Title
KR102076257B1 (en) Calculation Graphs Processing
CN113950066A (en) Single server part calculation unloading method, system and equipment under mobile edge environment
WO2018068421A1 (en) Method and device for optimizing neural network
CN109983480A (en) Use cluster loss training neural network
WO2022063247A1 (en) Neural architecture search method and apparatus
CN113141317B (en) Streaming media server load balancing method, system, computer equipment and terminal
CN113434212A (en) Cache auxiliary task cooperative unloading and resource allocation method based on meta reinforcement learning
CN111176820A (en) Deep neural network-based edge computing task allocation method and device
CN112436992B (en) Virtual network mapping method and device based on graph convolution network
CN112764936A (en) Edge calculation server information processing method and device based on deep reinforcement learning
CN111049903A (en) Edge network load distribution algorithm based on application perception prediction
CN116263701A (en) Computing power network task scheduling method and device, computer equipment and storage medium
CN112860402A (en) Dynamic batch processing task scheduling method and system for deep learning inference service
CN113867843A (en) Mobile edge computing task unloading method based on deep reinforcement learning
CN116579418A (en) Privacy data protection method for model segmentation optimization under federal edge learning environment
CN114650321A (en) Task scheduling method for edge computing and edge computing terminal
CN116915869A (en) Cloud edge cooperation-based time delay sensitive intelligent service quick response method
CN117436485A (en) Multi-exit point end-edge-cloud cooperative system and method based on trade-off time delay and precision
CN116954866A (en) Edge cloud task scheduling method and system based on deep reinforcement learning
CN116700931A (en) Multi-target edge task scheduling method, device, equipment, medium and product
CN115174681B (en) Method, equipment and storage medium for scheduling edge computing service request
CN115345306A (en) Deep neural network scheduling method and scheduler
CN115220818A (en) Real-time dependency task unloading method based on deep reinforcement learning
CN115309521A (en) Marine unmanned equipment-oriented deep reinforcement learning task scheduling method and device
CN113762972A (en) Data storage control method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant