CN114281718A - Industrial Internet edge service cache decision method and system - Google Patents


Info

Publication number
CN114281718A
CN114281718A
Authority
CN
China
Prior art keywords: network, service, model, actor, edge
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111556973.XA
Other languages
Chinese (zh)
Inventor
叶可江 (Ye Kejiang)
唐璐婕 (Tang Lujie)
须成忠 (Xu Chengzhong)
Current Assignee
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN202111556973.XA priority Critical patent/CN114281718A/en
Publication of CN114281718A publication Critical patent/CN114281718A/en
Pending legal-status Critical Current

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to the technical field of the industrial Internet, and in particular to an industrial Internet edge service cache decision method and system. The industrial Internet edge service cache decision method is a service caching method based on deep meta reinforcement learning: it combines the perception capability of deep learning, the decision-making capability of reinforcement learning, and the fast environment-learning capability of meta learning, and the fused framework can quickly and flexibly obtain the optimal caching strategy in a dynamic environment. The corresponding system achieves the same technical effect.

Description

Industrial Internet edge service cache decision method and system
Technical Field
The invention relates to the technical field of the industrial Internet, and in particular to an industrial Internet edge service cache decision method and system.
Background
By integrating advanced technologies such as 5G communication and artificial intelligence, the industrial Internet brings sensors and controllers with sensing and control capabilities into the industrial production process, thereby optimizing production, reducing cost, and improving productivity. In the traditional cloud computing mode, because deployment is centralized, computing nodes are usually far from the intelligent terminals, making it difficult to meet the high real-time and low-latency requirements of industrial sites. Edge computing sinks computing, storage, and network resources to the edge of the industrial network, so that device requests can be answered more promptly, satisfying key requirements of the industrial Internet environment such as intelligent access, real-time communication, and privacy protection, and enabling intelligent, green communication.
Edge service caching is a key issue in practical applications of edge computing. Because the resources (e.g., bandwidth resources, computing resources, etc.) that each network edge server provides to users are fixed and limited, an edge node can host and run only a few kinds of service requests at the same time. A reasonable mobile edge caching strategy can effectively improve the performance of the whole network and the quality of service.
In recent years, a great deal of research has targeted the edge computing caching problem. Some scholars propose caching decision strategies based on content popularity: content is cached according to users' preference for specific content, and content with a higher request probability is cached in the base station preferentially, which can effectively improve users' quality of experience. Others adopt learning-based edge caching strategies, using machine learning or deep learning to learn and predict, from a large amount of historical user data, users' preferences or the trend of content popularity in the network, and adjusting the caching strategy according to the learning result.
The prior art mainly studies the mobile edge computing caching problem. Most work focuses on improving caching strategies from legacy networks to fit the new characteristics of mobile edge computing networks. Some work explores new caching schemes, such as strategies based on user preferences, on learning, or on multi-edge-node collaboration. However, content popularity, user preferences, and so on change constantly over time and are unpredictable. Meanwhile, in rapidly changing industrial application scenarios, every time the environment changes, the service caching decision has to be readjusted through recomputation; otherwise, higher service delay and cost are incurred. Although intelligent algorithms such as deep learning and reinforcement learning have achieved good results in edge caching decisions, challenges remain, such as slow learning and the invalidation of the original network parameters when the model's environment changes. These are the defects of the prior art.
Disclosure of Invention
In order to solve at least one technical problem, embodiments of the present invention provide an industrial Internet edge service cache decision method and system, which solve for the optimal edge service caching policy through deep meta reinforcement learning, so as to minimize service access delay.
According to an embodiment of the present invention, a method for deciding an industrial internet edge service cache is provided, which includes the following steps:
S1, performing mathematical modeling on an industrial Internet system based on service access delay, and establishing a system model;
S2, establishing the optimization objective of minimizing service access delay based on the system model;
S3, constructing a deep meta reinforcement learning framework capable of realizing the optimization objective, based on the deep deterministic policy gradient algorithm;
the industrial Internet system comprises a plurality of devices, a plurality of edge servers and a cloud server; the device is connected to a cloud server through the edge server; and a plurality of edge servers are in communication connection with each other.
The invention also provides an industrial Internet edge service cache decision system adopting any one of the above methods, which comprises: a mathematical modeling module, a target establishing module, and a service cache decision module;
the mathematical modeling module performs mathematical modeling on the industrial Internet system based on service access delay to establish a system model;
the target establishing module establishes an optimization target for achieving the minimized service access time delay based on a system model;
the service cache decision module constructs a deep meta reinforcement learning framework capable of realizing the optimization objective, based on the deep deterministic policy gradient algorithm;
the industrial Internet system also comprises a plurality of devices, a plurality of edge servers and a cloud server; the device is connected to a cloud server through the edge server; and a plurality of edge servers are in communication connection with each other.
The industrial Internet edge service cache decision method and system, via the service caching method based on deep meta reinforcement learning, combine the perception capability of deep learning, the decision-making capability of reinforcement learning, and the fast environment-learning capability of meta learning; the fused framework can quickly and flexibly obtain the optimal caching strategy in a dynamic environment.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow chart of an industrial Internet edge service cache decision method according to the present invention;
FIG. 2 is a schematic diagram of an industrial Internet edge service cache decision method according to the present invention;
FIG. 3 is a schematic diagram of a side cloud cooperative service structure of an industrial Internet edge service caching decision system according to the present invention;
fig. 4 is a schematic diagram of a deep meta reinforcement learning framework adopted in the industrial internet edge service cache decision method of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
As shown in fig. 3, to facilitate modeling (building a mathematical model) of the industrial Internet system, the present invention is implemented in an industrial Internet scenario comprising a large number of industrial and sensor devices, multiple edge servers, and a cloud server. We assume that all edge servers within a region can communicate with one another through various network connections, such as a local area network. In this way, an industrial device can offload tasks to any edge server in the reachable area, rather than relying only on the nearest directly connected one. The edge server creates the operating environment for a particular service through virtualization technology (e.g., virtual machines or containers).
Due to the particularity of the application tasks such as video stream analysis and augmented reality, the execution of the tasks requires not only the allocation of user input data and computing resources, but also the pre-caching of a large amount of data. The edge server can only use the services cached on it to run the corresponding tasks. Minimizing service access time is closely related to edge service caching.
Before constructing the mathematical model, we define the service access delay as consisting of the service communication delay and the service execution delay. The service communication delay comprises the time for the device to send a request to the nearest edge server and, after the request is completed, for the service result to be sent back to the device. The service execution delay is the completion time of running the service on the edge server or the cloud server.
Referring to fig. 1 to 3, according to an embodiment of the present invention, a method for deciding an industrial Internet edge service cache is provided, which includes the following steps:
S1, performing mathematical modeling on the industrial Internet system based on service access delay (including service communication delay and service execution delay), and establishing a system model.
In a specific implementation, to facilitate modeling, time is discretized into time slots of equal length, T = {1, 2, ..., |T|}.

The edge server set is defined as S = {1, 2, ..., |S|}, where s = (slat_s, slon_s, scr_s, sc_s, sr_s) denote, respectively, the latitude, longitude, and coverage of edge server s, together with its computing and storage capacities. sc_{|S|+1} represents the computing power of the cloud server.
The service set requested by the various industrial and sensor devices is K = {1, 2, ..., |K|}, where k = (kc_k, kr_k) denote, respectively, the computing resources and storage resources required by service k. The device requests arriving in time slot t are denoted U_t = {1, 2, ..., |U_t|}, where u = (ulat_u, ulon_u, uk_u, uw_u) denote, respectively, the latitude and longitude of the device, the type of service requested by the industrial device, and the size of the input data.
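As a non-authoritative sketch, the modeling entities above can be written as plain data structures (the Python class and field names are illustrative, not from the patent):

```python
from dataclasses import dataclass

@dataclass
class EdgeServer:
    lat: float       # slat_s: latitude
    lon: float       # slon_s: longitude
    coverage: float  # scr_s: coverage radius
    compute: float   # sc_s: computing capacity
    storage: float   # sr_s: storage capacity

@dataclass
class Service:
    compute_req: float  # kc_k: computing resources required by service k
    storage_req: float  # kr_k: storage resources required by service k

@dataclass
class Request:
    lat: float         # ulat_u: device latitude
    lon: float         # ulon_u: device longitude
    service_id: int    # uk_u: type of service requested
    input_size: float  # uw_u: size of the input data
```

The cloud server can then be represented as one extra compute capacity value sc_{|S|+1} alongside the list of `EdgeServer` instances.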
Service caching is performed on the edge servers. A binary variable x_{k,s}(t) ∈ {0,1} indicates whether service k is cached on edge server s at time t; x_{k,s}(t) = 1 means that service k is cached on edge server s. We define γ_t = {x_{k,s}(t) | k ∈ K, s ∈ S} as the service caching decision in time slot t. Furthermore, since the storage capacity of an edge server is limited, the total size of the services placed on edge server s cannot exceed its storage capacity in any time slot t:

Σ_{k∈K} x_{k,s}(t) · kr_k ≤ sr_s, ∀s ∈ S.

The computing power allocated to each cached service in edge server s is:

sc_s / Σ_{k∈K} x_{k,s}(t).
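The storage constraint and the even compute split described above can be sketched as follows (the function names and argument layout are our own illustration):

```python
def cache_feasible(x_s, storage_reqs, storage_cap):
    """Check the constraint sum_k x_{k,s}(t) * kr_k <= sr_s for one server s.

    x_s: list of 0/1 cache indicators for each service k on server s.
    storage_reqs: list of kr_k values; storage_cap: sr_s.
    """
    used = sum(x * kr for x, kr in zip(x_s, storage_reqs))
    return used <= storage_cap

def compute_share(x_s, compute_cap):
    """Computing power allocated per cached service: sc_s / sum_k x_{k,s}(t)."""
    n_cached = sum(x_s)
    return compute_cap / n_cached if n_cached else 0.0
```

For example, a decision caching services 1 and 3 with sizes 2 and 3 on a server with capacity 6 is feasible, and each cached service gets half of the server's compute.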
S2, establishing the optimization objective of minimizing service access delay based on the system model.

In a specific implementation, the optimization objective based on the system model is set as:

min_{γ_1,...,γ_|T|} (1/|T|) Σ_{t∈T} (1/|U_t|) Σ_{u∈U_t} D_u(t)

s.t. Σ_{k∈K} x_{k,s}(t) · kr_k ≤ sr_s, ∀s ∈ S, t ∈ T,

x_{k,s}(t) ∈ {0,1}, ∀k ∈ K, s ∈ S, t ∈ T,

where D_u(t) denotes the access delay of request u in time slot t.
S3, constructing a deep meta reinforcement learning framework capable of realizing the optimization objective, based on the deep deterministic policy gradient algorithm;
the industrial Internet system comprises a plurality of devices, a plurality of edge servers and a cloud server; the device is connected to a cloud server through the edge server; and a plurality of edge servers are in communication connection with each other.
Further, in step S1, the service access latency includes a service communication latency and a service execution latency.
Specifically, the service access delay generated by the industrial device requesting the service includes the following three cases:
If the industrial device is not within the coverage of any edge server, or the requested service is not cached on any edge server, the service is executed entirely on the cloud server, and the access delay of the service is:

D_u(t) = d_u^{tr,c} + d_k^{exe,c},

where d_u^{tr,c} represents the time required for transmitting the data to the cloud server and returning the result from the cloud server, and d_k^{exe,c} represents the time required to run the requested service k on the cloud server.
If the industrial device is within the coverage of an edge server, and the edge server s nearest to the device receives the device's request for service k, then, if that edge server has cached the service, the task is executed on edge server s. The access delay of the service is:

D_u(t) = d_u^{tr,s} + d_k^{exe,s},

where d_u^{tr,s} is the communication delay between the device and edge server s, and d_k^{exe,s} = kc_k / (sc_s / Σ_{k'∈K} x_{k',s}(t)) is the time to run service k with the computing power allocated to it on s.
If the industrial device is within the coverage of an edge server, and the edge server s nearest to the device receives the user's request for service k, but that edge server has not cached the service, the task is offloaded to a neighboring edge server w that has cached service k for execution. The access delay of the service is:

D_u(t) = d_u^{tr,s} + uw_u / B_{s,w} + d_k^{exe,w},

where uw_u / B_{s,w} is the time to transfer the input data from s to w, and B_{s,w} represents the bandwidth between edge server s and edge server w.
The service communication delay comprises the time when the equipment sends a request to the nearest edge server and the time when the edge server sends a service result after the request is completed back to the equipment;
the service execution latency includes a completion time of running the service on the edge server or the cloud server.
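A minimal sketch of the three access-delay cases described above (the case labels and argument names are ours; the delay components are taken as given inputs):

```python
def access_delay(case, comm_delay, exec_delay, input_size=0.0, bandwidth=None):
    """Service access delay for the three cases in the text.

    case: "cloud", "local_edge", or "neighbor_edge" (labels are ours).
    comm_delay: communication delay to the serving node and back.
    exec_delay: execution delay on the serving node.
    bandwidth: B_{s,w}, only used when forwarding to a neighbor edge server.
    """
    if case == "neighbor_edge":
        # extra hop: forward the input data of size uw_u over bandwidth B_{s,w}
        return comm_delay + input_size / bandwidth + exec_delay
    return comm_delay + exec_delay  # cloud or nearest edge server
```

In each case the total is communication delay plus execution delay; only the neighbor-edge case adds the inter-server transfer term uw_u / B_{s,w}.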
Further, the device comprises an industrial device and a sensor device.
Further, the deep meta reinforcement learning framework in step S3 includes an inner model, which outputs the caching decision that minimizes service access delay, and an outer model, which improves the inner model's adaptability to different environments.
In order to minimize service access delay, the inner model provided by the invention uses an edge service caching algorithm designed on the basis of the deep deterministic policy gradient (DDPG) algorithm to make service caching decisions. Deep reinforcement learning models typically require the problem to be defined as a Markov decision process, comprising agent, environment, states, actions, and rewards. The basic idea is as follows: the agent observes the environment at time t to obtain the current state; it then takes a corresponding action to interact with the environment in that state; after the environment accepts the action, it gives a reward and enters the next state. The agent maximizes its total reward by continuously interacting with the environment. We therefore describe the service caching optimization problem as a Markov decision process. It consists of three parts, the state space S, the action space A, and the reward function R, defined as follows:
State space: s_t ∈ S represents the state observed from the mobile edge computing system in time slot t. s_t = (U_t, B_t, γ_{t-1}) comprises, respectively, the device requests arriving in time slot t, the network conditions, and the edge service caching decision of the previous slot.
Action space: in each time slot t, the edge server needs to make a service caching decision based on the current state, a_t = γ_t. The size of the action space of a_t is 2^{K×S}.
Reward function: the optimization goal of edge service caching is to minimize the average service access delay, so the designed reward is the negative average access delay of the requests served in slot t:

R_t(s_t, a_t) = −(1/|U_t|) Σ_{u∈U_t} D_u(t).
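The reward above — the negative average service access delay of the requests in a slot — can be sketched as:

```python
def reward(delays):
    """R_t(s_t, a_t): negative average access delay over the requests in slot t.

    delays: list of D_u(t) values for the requests u in U_t.
    """
    return -sum(delays) / len(delays) if delays else 0.0
```

Maximizing the cumulative reward is then equivalent to minimizing the average service access delay over time.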
further, in the DDPG algorithm of the inner layer model, an Actor network and a Critic network both use a deep neural network to establish an approximate function; the inner model generates the determined behavior directly from the policies of the Actor network without sampling in terms of probability distributions of behavior.
The inner layer model adds a noise function on the basis of deterministic behavior in the learning phase of the deep neural network. To enable exploration within a small range around deterministic behavior.
In addition, the inner layer model backups a set of parameters for calculating expected values of behavior values for the Actor network and the Critic network respectively; to more stably promote the strategic guidance level of Critic. The Actor network and Critic network corresponding to the other set of main parameters are used for generating actual interactive behaviors and calculating corresponding strategy gradients, and the set of main parameters are updated once every learning. The purpose of this two-parameter setting is to reduce the occurrence of misconvergence due to the guidance of the approximation data.
Wherein the primary parameters are updated based on a learning process of the deep neural network; the backup parameters are updated less frequently than the primary parameters.
When the Actor network uses the main parameters, it forms the Actor Online policy network, which is responsible for iteratively updating the policy network parameter θ and, given the current state s_t, generating a specific exploratory or non-exploratory action a_t.

When the Actor network uses the backup parameters, it forms the Actor Target policy network, which, from the subsequent state s_{t+1} given by the environment, generates the action a_{t+1} used for value prediction; its network parameter θ* is periodically copied from θ.

When the Critic network uses the main parameters, it forms the Critic Online Q network, which is responsible for iteratively updating the value network parameter ω and computing the action value Q(s_t, a_t; ω) of the state s_t and the generated action a_t.

When the Critic network uses the backup parameters, it forms the Critic Target Q network, which, from the subsequent state s_{t+1} and action a_{t+1}, computes the value Q(s_{t+1}, a_{t+1}; ω*) used in the target value for Q(s_t, a_t; ω); its network parameter ω* is periodically copied from ω.
In a specific implementation, the network parameters of the DDPG algorithm use soft updates, i.e., each update changes the target parameters only partially, where τ is the update coefficient and usually takes a small value:
ω*←τω+(1-τ)ω*
θ*←τθ+(1-τ)θ*
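The soft update rule above can be sketched element-wise over flattened parameter lists (a simplification; practical implementations update network tensors in place):

```python
def soft_update(target_params, online_params, tau=0.01):
    """w* <- tau * w + (1 - tau) * w*, applied element-wise.

    target_params: current target-network parameters (w* or theta*).
    online_params: current online-network parameters (w or theta).
    Returns the new target parameter list.
    """
    return [tau * w + (1.0 - tau) * w_star
            for w, w_star in zip(online_params, target_params)]
```

With a small τ, the target parameters track the online parameters slowly, which is what stabilizes the target values.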
Meanwhile, in order to add some randomness and improve coverage during learning, DDPG adds a noise N to the selected action a_t; the action that finally interacts with the environment is:

a_t = π_θ(s_t) + N.
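A sketch of this exploration step, using Gaussian noise as one possible choice of N (the patent does not specify the noise distribution) and clipping to [0, 1] so the output can still be thresholded into a binary caching decision:

```python
import random

def explore_action(det_action, sigma=0.1, seed=None):
    """a_t = pi_theta(s_t) + N: add Gaussian noise to the deterministic
    Actor output, then clip each entry to [0, 1] so it can be
    thresholded into a binary caching indicator x_{k,s}(t)."""
    rng = random.Random(seed)
    return [min(1.0, max(0.0, a + rng.gauss(0.0, sigma))) for a in det_action]
```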
For the Critic Online Q network, the loss function is defined as the mean squared error against the target value:

J(ω) = (1/n) Σ_j (y_j − Q(s_j, a_j; ω))², with y_j = R_j + γ · Q(s_{j+1}, π_{θ*}(s_{j+1}); ω*).

For the Actor Online policy network, the loss function is defined as the negative mean action value:

J(θ) = −(1/n) Σ_j Q(s_j, π_θ(s_j); ω).
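The target value and the two losses can be sketched numerically for a mini-batch (pure-Python stand-ins for the network outputs; in practice these are computed on tensors):

```python
def critic_target(r, q_next, gamma=0.9):
    """Target value y_j = R_j + gamma * Q(s_{j+1}, a_{j+1}; omega*)."""
    return r + gamma * q_next

def critic_loss(targets, q_values):
    """J(omega): mean squared error between targets y_j and online Q estimates."""
    n = len(targets)
    return sum((y - q) ** 2 for y, q in zip(targets, q_values)) / n

def actor_loss(q_values):
    """J(theta): negative mean Q of the actor's own actions, so that
    gradient descent on J(theta) performs gradient ascent on Q."""
    return -sum(q_values) / len(q_values)
```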
the edge service caching algorithm flow based on the DDPG algorithm is as follows:
Algorithm input: the Actor Online policy network, Actor Target policy network, Critic Online Q network, and Critic Target Q network, with parameters θ, θ*, ω, ω* respectively; the meta-policy parameters θ̂ and ω̂; the discount factor γ, soft update coefficient τ, experience pool D, mini-batch size n for gradient descent, target Q network parameter update frequency C, and maximum number of iterations M; and the random noise function N.

Algorithm output: the optimal edge service caching decision γ_t.
Step 1: initialize the network parameters θ = θ̂, ω = ω̂, θ* = θ, ω* = ω, and initialize the experience pool D.
Step 2: for epoch from 1 to M:
Step 3: initialize the state s_t.
Step 4: for time slot t from 1 to T, iterate.
Step 5: the Actor Online policy network obtains an action a_t = π_θ(s_t) + N based on the state s_t.
Step 6: execute a_t, observe the obtained reward R_t(s_t, a_t), and obtain the new state s_{t+1}.
Step 7: store the experience [s_t, a_t, R_t(s_t, a_t), s_{t+1}] into the experience pool D.
Step 8: randomly sample n experiences [s_j, a_j, R_j(s_j, a_j), s_{j+1}], j = 1, 2, ..., n, from the experience pool D.
Step 9: compute the current target Q value y_j = R_j + γ · Q(s_{j+1}, π_{θ*}(s_{j+1}); ω*).
Step 10: compute the loss function J(ω) and update the Critic Online Q network parameter ω through gradient back-propagation in the neural network.
Step 11: compute the loss function J(θ) and update the Actor Online policy network parameter θ through gradient back-propagation in the neural network.
Step 12: if t mod C == 0, update the Critic Target Q network and Actor Target policy network parameters:
ω*←τω+(1-τ)ω*
θ*←τθ+(1-τ)θ*
Step 13: if t ≤ T, enter the next time slot and return to step 5.
Step 14: if epoch == M, the iteration ends. Output the optimal edge service caching decision γ_t.
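Steps 7 and 8 rely on the experience pool D; a minimal sketch (the bounded capacity and uniform sampling are standard replay-buffer choices, not details specified in the patent):

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience pool D: stores (s, a, r, s') transitions and samples
    mini-batches of size n uniformly at random. A bounded deque evicts
    the oldest experience once the capacity is reached."""

    def __init__(self, capacity=10000):
        self.buf = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, n):
        return random.sample(list(self.buf), min(n, len(self.buf)))

    def __len__(self):
        return len(self.buf)
```

Breaking the temporal correlation of consecutive transitions through this random sampling is what makes the gradient updates in steps 10-11 behave like supervised learning on i.i.d. batches.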
Further, the outer model feeds caching decisions and execution results from different environments into the inner model for training; in each training round, the inner model randomly selects training samples from one environment for learning, and after learning randomly selects another environment for the next iteration, thereby training the meta caching policy and improving the inner model's adaptability to new environments.

The goal of this sampling scheme is to prevent the trained parameters from fitting too closely to the optimal solution of any particular environment. The parameters trained in this way serve as the initial parameters of the inner model.
The meta cache strategy algorithm flow is as follows:
Algorithm input: the environment set E.

Algorithm output: the meta-policy parameters θ̂ and ω̂.
step 1: initializing an Actor Online strategy network parameter theta, a criticc Online Q network parameter omega and an experience pool D.
Step 2: for epoch from 1 to M:
Step 3: randomly select an environment and initialize the state s_t.
Step 4: for time slot t from 1 to T, iterate.
Step 5: the Actor Online policy network obtains an action a_t based on the state s_t.
Step 6: execute a_t, obtain the reward R_t(s_t, a_t) in that environment, and obtain the new state s_{t+1}.
Step 7: store the experience [s_t, a_t, R_t(s_t, a_t), s_{t+1}] into the experience pool D.
Step 8: randomly sample n experiences [s_j, a_j, R_j(s_j, a_j), s_{j+1}] from the experience pool D and compute the Q value.
Step 9: update the Critic Online Q network parameter ω and the Actor Online policy network parameter θ.
Step 10: if t ≤ T, enter the next time slot and return to step 5.
Step 11: if epoch == M, end the iteration and output the meta-policy parameters θ̂ and ω̂.
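The outer loop above — randomly picking an environment each epoch and running the inner learner over its time slots — can be sketched with a pluggable inner update (`envs` and `train_step` are hypothetical stand-ins for the patent's environments and DDPG update, not part of the original):

```python
import random

def meta_train(envs, train_step, epochs=3, slots=5, seed=0):
    """Outer-loop sketch of the meta caching policy: each epoch picks a
    random environment (step 3) and runs the inner learner's update over
    T time slots (steps 4-9)."""
    rng = random.Random(seed)
    visited = []
    for _ in range(epochs):
        env = rng.choice(envs)       # step 3: randomly select an environment
        visited.append(env)
        for _ in range(slots):       # step 4: iterate over the time slots
            train_step(env)          # steps 5-9: one inner update in env
    return visited
```

Because each epoch draws a fresh environment, the parameters left at the end are pulled toward a point that adapts quickly to any of the environments rather than over-fitting one.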
The invention also provides an industrial Internet edge service cache decision system adopting any one of the above methods, which comprises: a mathematical modeling module, a target establishing module, and a service cache decision module;
the mathematical modeling module is used for carrying out mathematical modeling on the industrial Internet system based on service access delay (including service communication delay and service execution delay) and establishing a system model;
the target establishing module establishes an optimization target for achieving the minimized service access time delay based on a system model;
the service cache decision module constructs a deep meta reinforcement learning framework capable of realizing the optimization objective, based on the deep deterministic policy gradient algorithm;
the industrial Internet system also comprises a plurality of devices, a plurality of edge servers and a cloud server; the device is connected to a cloud server through the edge server; and a plurality of edge servers are in communication connection with each other.
Further, the service cache decision module constructs the deep meta reinforcement learning framework by fusing deep neural networks with a deterministic policy (DDPG, deep deterministic policy gradient); the deep meta reinforcement learning framework includes an inner model, which outputs the caching decision that minimizes service access delay, and an outer model, which improves the inner model's adaptability to different environments.
Further, the inner model keeps a backup set of parameters for each of the Actor network and the Critic network, used to calculate the expected action values; the main parameters are updated based on the learning process of the deep neural network, and the backup parameters are updated less frequently than the main parameters;
generating an Actor Online policy network when the Actor network uses the primary parameter;
generating an Actor Target policy network when the Actor network uses the backup parameters;
generating a Critic Online Q network when the Critic network uses the primary parameters;
generating a Critic Target Q network when the Critic network uses the backup parameters;
the outer layer model inputs cache decisions and execution results in different environments into the inner layer model for training; and in each training, the inner layer model randomly selects a training sample in one environment for learning, and randomly selects another environment for iteration after learning so as to improve the capability of the inner layer model in the applicable environment.
According to the method, the service caching strategy is trained by an edge-cloud collaborative deep meta reinforcement learning framework: the outer model trains the meta caching strategy and improves the environmental adaptability of the inner model, while the inner model uses a DDPG-based edge service caching algorithm to make caching decisions and minimize service caching delay. This effectively solves the technical problem that, when the edge environment changes, the original parameters become completely invalid, so that a large amount of training data is needed to retrain from scratch and learning efficiency is low.
Meanwhile, the invention provides an industrial Internet edge service cache decision method based on deep meta reinforcement learning. By combining the perception capability of deep learning, the decision-making capability of reinforcement learning and the fast environmental learning capability of meta learning, the optimal cache strategy can be quickly and flexibly obtained from a dynamic environment.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as within the protection scope of the present invention.

Claims (10)

1. An industrial Internet edge service cache decision method is characterized by comprising the following steps:
s1, performing mathematical modeling on an industrial Internet system based on service access time delay, and establishing a system model;
s2, establishing an optimization target for achieving the minimized service access time delay based on a system model;
s3, constructing a depth element reinforcement learning framework capable of realizing the optimization target based on a depth certainty strategy gradient algorithm;
the industrial Internet system comprises a plurality of devices, a plurality of edge servers and a cloud server; the device is connected to a cloud server through the edge server; and a plurality of edge servers are in communication connection with each other.
2. The method according to claim 1, wherein in step S1 the service access latency comprises service communication latency and service execution latency;
the service communication latency comprises the time for a device to send a request to the nearest edge server and the time for the edge server to send the service result back to the device after the request is completed;
the service execution latency comprises the completion time of running the service on the edge server or the cloud server.
3. The method of claim 2, wherein the devices comprise industrial devices and sensor devices.
4. The method of claim 3, wherein the deep meta reinforcement learning framework comprises an inner model that outputs caching decisions minimizing service access latency, and an outer model for improving the environment adaptability of the inner model.
5. The method according to claim 4, wherein the Actor network and the Critic network of the inner model each use a deep neural network as an approximation function; the inner model generates deterministic behavior directly from the policy of the Actor network; and, during the learning phase of the deep neural network, the inner model adds a noise function to the deterministic behavior.
6. The method according to claim 5, wherein, for each of the Actor network and the Critic network, the inner model maintains a backup set of parameters used to calculate the expected action value; the main parameters are updated based on the learning process of the deep neural network; and the backup parameters are updated less frequently than the main parameters;
an Actor Online policy network is generated when the Actor network uses the main parameters;
an Actor Target policy network is generated when the Actor network uses the backup parameters;
a Critic Online Q network is generated when the Critic network uses the main parameters;
and a Critic Target Q network is generated when the Critic network uses the backup parameters.
7. The method of claim 6, wherein the outer model feeds caching decisions and execution results from different environments into the inner model for training; in each training round, the inner model randomly selects a training sample from one environment for learning, and after learning randomly selects another environment for the next iteration, so as to improve the environment adaptability of the inner model.
8. An industrial Internet edge service caching decision system using the method of any one of claims 1 to 7, comprising a mathematical modeling module, an objective establishment module, and a service caching decision module; characterized in that the mathematical modeling module performs mathematical modeling of the industrial Internet system based on service access latency to establish a system model;
the objective establishment module establishes, based on the system model, an optimization objective of minimizing service access latency;
the service caching decision module constructs, based on the deep deterministic policy gradient algorithm, a deep meta reinforcement learning framework capable of achieving the optimization objective;
the industrial Internet system comprises a plurality of devices, a plurality of edge servers, and a cloud server; the devices are connected to the cloud server through the edge servers; and the plurality of edge servers are communicatively connected with one another.
9. The system of claim 8, wherein the deep meta reinforcement learning framework comprises an inner model that outputs caching decisions minimizing service access latency, and an outer model for improving the environment adaptability of the inner model.
10. The system according to claim 9, wherein, for each of the Actor network and the Critic network, the inner model maintains a backup set of parameters used to calculate the expected action value; the main parameters are updated based on the learning process of the deep neural network; the backup parameters are updated less frequently than the main parameters;
an Actor Online policy network is generated when the Actor network uses the main parameters;
an Actor Target policy network is generated when the Actor network uses the backup parameters;
a Critic Online Q network is generated when the Critic network uses the main parameters;
a Critic Target Q network is generated when the Critic network uses the backup parameters;
the outer model feeds caching decisions and execution results from different environments into the inner model for training; in each training round, the inner model randomly selects a training sample from one environment for learning, and after learning randomly selects another environment for the next iteration, so as to improve the environment adaptability of the inner model.
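The outer-model training loop described in claims 7 and 10 can be sketched as follows. This is an assumed, simplified structure for illustration only: each iteration randomly selects one environment, draws a training sample (a caching decision and its execution result) from it, performs one inner-model learning step, and then moves on to another randomly chosen environment. The names `meta_train`, `inner_update`, and the `environments` mapping are hypothetical, not from the patent.

```python
import random

def meta_train(inner_update, environments, iterations=100, seed=0):
    """Outer meta-training loop sketch.

    inner_update(sample) -- performs one inner-model (e.g. DDPG) learning
        step on a sample drawn from the selected environment.
    environments -- mapping from environment id to a zero-argument callable
        that returns one training sample (cache decision, execution result).
    """
    rng = random.Random(seed)
    visits = {env_id: 0 for env_id in environments}
    for _ in range(iterations):
        env_id = rng.choice(list(environments))  # randomly pick an environment
        sample = environments[env_id]()          # draw one training sample
        inner_update(sample)                     # inner-model learning step
        visits[env_id] += 1                      # then iterate in another env
    return visits
```

By alternating environments between iterations, the meta caching strategy is exposed to many edge conditions, which is what the claims describe as improving the environment adaptability of the inner model.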
CN202111556973.XA 2021-12-18 2021-12-18 Industrial Internet edge service cache decision method and system Pending CN114281718A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111556973.XA CN114281718A (en) 2021-12-18 2021-12-18 Industrial Internet edge service cache decision method and system

Publications (1)

Publication Number Publication Date
CN114281718A true CN114281718A (en) 2022-04-05

Family

ID=80873384

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111556973.XA Pending CN114281718A (en) 2021-12-18 2021-12-18 Industrial Internet edge service cache decision method and system

Country Status (1)

Country Link
CN (1) CN114281718A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114860337A (en) * 2022-05-17 2022-08-05 华东师范大学 Computing unloading method based on meta reinforcement learning algorithm
CN114860337B (en) * 2022-05-17 2023-07-25 华东师范大学 Computing unloading method based on meta reinforcement learning algorithm
CN115344510A (en) * 2022-10-18 2022-11-15 南京邮电大学 High-dimensional video cache selection method based on deep reinforcement learning
CN115633380A (en) * 2022-11-16 2023-01-20 合肥工业大学智能制造技术研究院 Multi-edge service cache scheduling method and system considering dynamic topology
CN115633380B (en) * 2022-11-16 2023-03-17 合肥工业大学智能制造技术研究院 Multi-edge service cache scheduling method and system considering dynamic topology

Similar Documents

Publication Publication Date Title
Elgendy et al. Joint computation offloading and task caching for multi-user and multi-task MEC systems: reinforcement learning-based algorithms
Tang et al. Migration modeling and learning algorithms for containers in fog computing
CN114281718A (en) Industrial Internet edge service cache decision method and system
Peng et al. Joint optimization of service chain caching and task offloading in mobile edge computing
CN112422644B (en) Method and system for unloading computing tasks, electronic device and storage medium
Li et al. Energy-aware task offloading with deadline constraint in mobile edge computing
Shan et al. “DRL+ FL”: An intelligent resource allocation model based on deep reinforcement learning for mobile edge computing
CN113064671A (en) Multi-agent-based edge cloud extensible task unloading method
CN113568727A (en) Mobile edge calculation task allocation method based on deep reinforcement learning
Sun et al. Reinforcement learning based computation migration for vehicular cloud computing
Chen et al. Cache-assisted collaborative task offloading and resource allocation strategy: A metareinforcement learning approach
Qi et al. Vehicular edge computing via deep reinforcement learning
Wang et al. Online service migration in mobile edge with incomplete system information: A deep recurrent actor-critic learning approach
CN113973113B (en) Distributed service migration method for mobile edge computing
Lakew et al. Adaptive partial offloading and resource harmonization in wireless edge computing-assisted ioe networks
Huang et al. Reinforcement learning for cost-effective IoT service caching at the edge
Henna et al. Distributed and collaborative high-speed inference deep learning for mobile edge with topological dependencies
CN116489708B (en) Meta universe oriented cloud edge end collaborative mobile edge computing task unloading method
CN117459112A (en) Mobile edge caching method and equipment in LEO satellite network based on graph rolling network
CN112312299A (en) Service unloading method, device and system
Li et al. Efficient data offloading using markovian decision on state reward action in edge computing
CN115052262A (en) Potential game-based vehicle networking computing unloading and power optimization method
Tan et al. Toward a task offloading framework based on cyber digital twins in mobile edge computing
Ramya et al. Lightweight Unified Collaborated Relinquish Edge Intelligent Gateway Architecture with Joint Optimization
Li et al. Handoff Control and Resource Allocation for RAN Slicing in IoT Based on DTN: An Improved Algorithm Based on Actor-Critic Framework

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination