CN114328291A - Industrial Internet edge service cache decision method and system

Industrial Internet edge service cache decision method and system

Info

Publication number
CN114328291A
CN114328291A
Authority
CN
China
Prior art keywords
service
decision
cache
edge
reinforcement learning
Prior art date
Legal status
Pending
Application number
CN202111556974.4A
Other languages
Chinese (zh)
Inventor
Ye Kejiang
Tang Lujie
Xu Chengzhong
Current Assignee
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN202111556974.4A
Publication of CN114328291A

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to the technical field of the industrial Internet, and in particular to a method and a system for industrial Internet edge service cache decisions. The industrial Internet edge service cache decision method in the embodiments of the invention computes an optimal solution for the edge caching strategy mathematical model through an algorithm constructed with a distributed deep reinforcement learning method, and can solve the optimization problem of the system's mathematical model. Starting from the establishment of a network mathematical model and the determination of an optimization target, and combining reinforcement learning with deep learning, a machine learns and predicts user preferences and the changing trend of content popularity in the network from a large volume of historical user data, and the service caching strategy is adjusted according to the learning results. An optimal service caching decision can thus be given effectively. The corresponding system has the same technical effect.

Description

Industrial Internet edge service cache decision method and system
Technical Field
The invention relates to the technical field of the industrial Internet, and in particular to a method and a system for industrial Internet edge service cache decisions.
Background
With the increasing number of industrial devices accessing the Internet, relying solely on the traditional cloud computing model makes it difficult to meet the latency and cost requirements of industrial applications. Edge computing, as a new computing paradigm, can relieve the physical resource bottleneck of intelligent devices. In an edge computing system, traffic load and quality of service can be improved through service caching. However, flexibly configuring edge service caches within the limited edge storage capacity to improve system performance is extremely challenging.
The prior art is mainly directed at the caching problem in mobile edge computing. Most work focuses on improving caching strategies from traditional networks based on the new characteristics of mobile edge computing networks. A portion of the work explores new caching schemes, such as caching strategies based on user preferences, on learning, or on multi-edge-node collaboration. However, content popularity, user preferences and the like change continuously over time and are hard to predict. Moreover, the service caching problem is an integer linear programming problem that cannot be solved in polynomial time, so traditional optimization methods can hardly produce an effective service caching result. The prior art therefore has shortcomings.
Disclosure of Invention
In order to solve at least one technical problem, embodiments of the present invention provide a method and a system for deciding an edge service cache of an industrial internet, which solve an optimal edge service cache policy through a distributed deep reinforcement learning algorithm, so as to achieve the purpose of minimizing service access delay and energy consumption.
According to an embodiment of the present invention, a method for deciding an industrial internet edge service cache is provided, which includes the following steps:
s1, performing mathematical modeling on an industrial Internet system based on the fact that a task corresponding to a service can be executed only when corresponding service data are cached in a server; the cloud server of the system model caches data required by all services;
s2, establishing a mathematical model for service access time delay in the opposite side cloud coordination system;
s3, performing mathematical modeling on the energy consumption of the industrial Internet system according to the power of data transmission between the edge servers and the cloud server and the computing power of the edge servers and the cloud server;
s4, establishing an optimization target for achieving the minimized service access time delay and the minimized energy consumption based on the system model, the time delay model and the energy consumption model;
and S5, constructing an algorithm capable of realizing the optimization target based on a distributed deep reinforcement learning method.
The invention also provides an industrial Internet edge service cache decision system adopting the method, which comprises: a mathematical modeling module and a service cache decision module;
the mathematical modeling module performs mathematical modeling on the industrial Internet system based on the premise that a task corresponding to a service can be executed only when the corresponding service data is cached in a server; the cloud server of the system model caches the data required by all services;
establishing a mathematical model for service access delay in the edge cloud cooperative system;
performing mathematical modeling on the energy consumption of the industrial Internet system according to the power of data transmission between the edge servers and the cloud servers and the computing power of the edge servers and the cloud servers;
establishing an optimization target for achieving the minimized service access delay and the minimized energy consumption based on the system model, the delay model and the energy consumption model;
and the service cache decision module constructs an algorithm capable of realizing the optimization goal based on a distributed deep reinforcement learning method.
According to the industrial Internet edge service cache decision method and system, an optimal solution can be computed for the edge caching strategy mathematical model through the algorithm constructed with the distributed deep reinforcement learning method, and the system's optimization problem can be solved. By determining the mathematical model of the industrial Internet system and the optimization target, and by combining reinforcement learning with deep learning, a machine learns and predicts user preferences and the changing trend of content popularity in the network from a large volume of historical user data, and the service caching strategy is adjusted according to the learning results. An optimal service caching decision can thus be given effectively.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow chart of an industrial Internet edge service cache decision method according to the present invention;
FIG. 2 is a schematic diagram of an industrial Internet edge service cache decision method according to the present invention;
fig. 3 is a schematic diagram of the edge-cloud collaborative service structure of the industrial Internet edge service caching decision system according to the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
As shown in FIG. 3, to facilitate modeling the industrial Internet system (building a mathematical model), the invention discretizes time into evenly distributed time slices $\mathcal{T} = \{1, 2, \dots, T\}$, where each time slice has length $\Delta t$. Consider an industrial Internet system consisting of $N$ edge servers, denoted $\mathcal{N} = \{1, 2, \dots, N\}$. Each edge server may provide data analysis and processing services for sensor devices and industrial devices. The computing and storage resources of an edge server are limited compared with the cloud server; the computing and storage capacities of edge server $n$ are denoted $F_n$ and $M_n$, respectively. Let $F_{cloud}$ represent the computing power of the cloud server.
Referring to fig. 1 to 3, according to an embodiment of the present invention, a method for deciding an industrial internet edge service cache is provided, which includes the following steps:
s1, performing mathematical modeling on an industrial Internet system based on the fact that a task corresponding to a service can be executed only when corresponding service data are cached in a server; the cloud server of the system model caches data required by all services.
When mathematical modeling is carried out, before a certain type of task can be executed on an edge server, the corresponding service must first be placed there. A service is an abstraction of an application; to run a particular service, an edge server must cache the relevant data, including the software and databases needed by the application. A task corresponding to a service can be executed only if the corresponding service data is cached on the server. In the modeling process of the invention, the cloud server is assumed to cache the data required by all services.
In the industrial Internet system, the set of generated services is represented as $\mathcal{L} = \{1, 2, \dots, L\}$. The invention assumes that different services have different data volumes and require different computing and storage resources to process, denoted $f_l$ and $m_l$ respectively, where $l \in \mathcal{L}$. Each edge server may cache one or more services. The invention defines a binary variable $x_{l,n}(t) \in \{0, 1\}$ indicating whether service $l$ is cached on edge server $n$; the service caching policy is $\Upsilon_t = \{x_{l,n}(t) \mid l \in \mathcal{L}, n \in \mathcal{N}\}$. If service $l$ is cached on edge server $n$ at time $t$, then $x_{l,n}(t) = 1$; otherwise $x_{l,n}(t) = 0$. Let $p_{l,n}$ represent the computing power allocated to service $l$ by edge server $n$. Since the service cache is limited by the edge server's storage space and computing capacity:

$$\sum_{l \in \mathcal{L}} x_{l,n}(t)\, m_l \le M_n, \quad \forall n \in \mathcal{N}$$

$$\sum_{l \in \mathcal{L}} x_{l,n}(t)\, p_{l,n} \le F_n, \quad \forall n \in \mathcal{N}$$
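As a hedged illustration of these two constraints, the check below verifies that a candidate caching decision respects each edge server's storage and compute limits; the function name and array layout are assumptions made for this sketch:

```python
import numpy as np

def is_feasible(x, m, p, M, F):
    """Check the storage and compute constraints of a caching decision.

    x: (L, N) binary matrix with x[l, n] = x_{l,n}(t)
    m: (L,)   storage requirement m_l of each service
    p: (L, N) computing power p_{l,n} allocated to service l by server n
    M: (N,)   storage capacity M_n of each edge server
    F: (N,)   computing capacity F_n of each edge server
    """
    storage_ok = np.all(x.T @ m <= M)               # sum_l x_{l,n} m_l <= M_n
    compute_ok = np.all((x * p).sum(axis=0) <= F)   # sum_l x_{l,n} p_{l,n} <= F_n
    return bool(storage_ok and compute_ok)
```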
In order to analyze the access delay of a service, assume that at time $t$ the number of requests from industrial devices for service $l$ arriving at edge server $n$ is $\lambda_{l,n}(t)$. Owing to the diversity of the data collected by devices and the dynamic nature of the requested services, $\lambda_{l,n}(t)$ varies dynamically. It should be noted that when a requested service is not cached on the edge server, the workload must be scheduled, i.e., the arriving service request is dispatched to an edge server that caches the service, or to the cloud server, for execution. Let $\mu_{l,n}(t) \in [0, 1]$ represent the proportion of the load of service $l$ executed on edge server $n$, and let $\mu_{l,c}(t)$ represent the proportion of service $l$ executed on the cloud server. The values of $\mu_{l,n}$ and $\mu_{l,c}$ can be set using many different strategies, but the following condition is generally required:
$$\sum_{n \in \mathcal{N}} \mu_{l,n}(t) + \mu_{l,c}(t) = 1, \quad \forall l \in \mathcal{L}$$

At time $t$, the total number of requests for service $l$ in the industrial Internet system is represented as:

$$\lambda_l(t) = \sum_{n \in \mathcal{N}} \lambda_{l,n}(t)$$

Thus, the total workload and data size for edge server $n$ to run service $l$ are:

$$F_{l,n}(t) = \mu_{l,n}(t)\, f_l\, \lambda_l(t), \qquad M_{l,n}(t) = \mu_{l,n}(t)\, m_l\, \lambda_l(t);$$
The final mathematical model embodies a heterogeneous edge-cloud collaborative offloading framework, as shown in fig. 3, which includes a large number of industrial and sensor devices, a plurality of edge servers, and a cloud server; the industrial and sensor devices communicate with the edge servers through wireless channels, and the edge servers connect to the remote cloud through wired links. Device tasks may be offloaded to an edge server or the cloud server for execution. When an edge server has not cached the requested service or cannot provide enough computing power, the corresponding task can be offloaded to a nearby edge server that has cached the service, or to the cloud server, for execution. Cooperation between edge nodes makes full use of the resource capacity of heterogeneous edge servers and alleviates the mismatch between the resource capacity of a single edge node and its workload.
And S2, establishing a mathematical model for service access delay in the edge-cloud collaborative system.
In particular implementation, the computation latency for executing service l on edge server n is represented as:
$$D^{comp}_{l,n}(t) = \frac{F_{l,n}(t)}{p_{l,n}}$$

The computation delay for executing service $l$ on the cloud server is represented as:

$$D^{comp}_{l,c}(t) = \frac{\mu_{l,c}(t)\, f_l\, \lambda_l(t)}{F_{cloud}}$$

Because the industrial devices are close to the edge servers, the transmission delay between a device and its edge server is ignored, and only the workload transmission delay between adjacent edge servers is considered. Let the data transmission rate between edge servers be $r_e$; the data transmission delay between edge servers is then expressed as:

$$D^{trans}_{l,n}(t) = \frac{M_{l,n}(t)}{r_e}$$

If the task is offloaded to the remote cloud server for execution, let the data transmission rate of the core network be $r_c$; the transmission delay of a task offloaded to the cloud for execution is represented as:

$$D^{trans}_{l,c}(t) = \frac{\mu_{l,c}(t)\, m_l\, \lambda_l(t)}{r_c}$$

The access delay for running service $l$ includes the transmission delay and the computation delay of the service:

$$D_l(t) = \sum_{n \in \mathcal{N}} \left( D^{trans}_{l,n}(t) + D^{comp}_{l,n}(t) \right) + D^{trans}_{l,c}(t) + D^{comp}_{l,c}(t)$$
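Assuming the delay decomposition reconstructed above, the access delay of one service can be sketched in Python as follows; all parameter names are chosen for this example:

```python
def access_delay(mu, mu_c, f_l, m_l, lam_l, p_ln, r_e, r_c, f_cloud):
    """Access delay D_l(t) of service l under the model above.

    mu:    list of mu_{l,n}(t), load share of service l per edge server
    mu_c:  mu_{l,c}(t), load share executed on the cloud
    f_l, m_l: compute and data requirements of service l
    lam_l: total number of requests lambda_l(t)
    p_ln:  list of computing power p_{l,n} allocated on each edge server
    r_e, r_c: edge-to-edge and core-network transmission rates
    f_cloud: computing power of the cloud server
    """
    d = 0.0
    for mu_n, p in zip(mu, p_ln):
        work = mu_n * f_l * lam_l    # F_{l,n}(t)
        data = mu_n * m_l * lam_l    # M_{l,n}(t)
        if work > 0:
            d += data / r_e + work / p   # D^trans_{l,n} + D^comp_{l,n}
    # core-network transmission plus cloud computation
    d += (mu_c * m_l * lam_l) / r_c + (mu_c * f_l * lam_l) / f_cloud
    return d
```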
and S3, performing mathematical modeling on the energy consumption of the industrial Internet system according to the power of data transmission between the edge servers and the cloud server and the computing power of the edge servers and the cloud server.
In particular, if
Let $P_e^{trans}$ denote the power of data transmission between edge servers and $P_e^{comp}$ the computation power of an edge server. The energy consumed to run service $l$ on edge server $n$ at time $t$ is:

$$E_{l,n}(t) = P_e^{trans}\, D^{trans}_{l,n}(t) + P_e^{comp}\, D^{comp}_{l,n}(t)$$

Let $P_c^{trans}$ denote the power of core-network data transmission between the edge servers and the cloud server, and $P_c^{comp}$ the computation power of the cloud server. The energy consumed to run service $l$ on the cloud server at time $t$ is:

$$E_{l,c}(t) = P_c^{trans}\, D^{trans}_{l,c}(t) + P_c^{comp}\, D^{comp}_{l,c}(t)$$

Therefore, at time $t$, the total energy consumption of running service $l$ is represented as:

$$E_l(t) = \sum_{n \in \mathcal{N}} E_{l,n}(t) + E_{l,c}(t)$$
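Mirroring the delay sketch, the energy model can be written as one function; the power parameters follow the reconstructed formulas above, and all names are assumptions:

```python
def total_energy(mu, mu_c, f_l, m_l, lam_l, p_ln, r_e, r_c, f_cloud,
                 p_trans_e, p_comp_e, p_trans_c, p_comp_c):
    """Total energy E_l(t) of running service l.

    p_trans_e, p_comp_e: edge transmission and computation power
    p_trans_c, p_comp_c: core-network transmission and cloud computation power
    (other arguments as in the access_delay sketch above)
    """
    e = 0.0
    for mu_n, p in zip(mu, p_ln):
        d_trans = (mu_n * m_l * lam_l) / r_e          # D^trans_{l,n}(t)
        d_comp = (mu_n * f_l * lam_l) / p if p > 0 else 0.0
        e += p_trans_e * d_trans + p_comp_e * d_comp  # E_{l,n}(t)
    d_trans_c = (mu_c * m_l * lam_l) / r_c            # D^trans_{l,c}(t)
    d_comp_c = (mu_c * f_l * lam_l) / f_cloud         # D^comp_{l,c}(t)
    e += p_trans_c * d_trans_c + p_comp_c * d_comp_c  # E_{l,c}(t)
    return e
```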
s4, establishing an optimization target for achieving the minimized service access time delay and the minimized energy consumption based on the system model, the time delay model and the energy consumption model;
In a specific implementation, the optimization target of the invention is to solve for the optimal service caching strategy so as to minimize service access delay and energy consumption, where $\beta$ denotes the weight given to energy consumption. The optimization objective is as follows:
$$\min_{\Upsilon_t,\, \mu} \; \sum_{l \in \mathcal{L}} \big( D_l(t) + \beta\, E_l(t) \big)$$

$$\text{s.t.}\quad x_{l,n}(t) \in \{0, 1\}, \quad \forall l \in \mathcal{L},\ n \in \mathcal{N}$$

$$\sum_{l \in \mathcal{L}} x_{l,n}(t)\, m_l \le M_n, \quad \forall n \in \mathcal{N}$$

$$\sum_{l \in \mathcal{L}} x_{l,n}(t)\, p_{l,n} \le F_n, \quad \forall n \in \mathcal{N}$$

$$\mu_{l,n}(t) \in [0, 1], \quad \forall l \in \mathcal{L},\ n \in \mathcal{N}$$

$$\sum_{n \in \mathcal{N}} \mu_{l,n}(t) + \mu_{l,c}(t) = 1, \quad \forall l \in \mathcal{L}$$
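To see why this problem resists traditional optimization, note that exhaustive search over the binary caching variables alone grows as 2^(L*N). The hedged brute-force baseline below, for a toy instance, makes the combinatorial blow-up concrete (cost_of is an assumed callable that evaluates the objective of a caching matrix):

```python
from itertools import product

def best_cache_by_enumeration(L, N, cost_of):
    """Enumerate all 2^(L*N) binary caching decisions of a toy instance.

    cost_of: assumed callable mapping an L x N caching matrix to
    sum_l D_l(t) + beta * E_l(t), returning float('inf') if infeasible.
    Tractable only for tiny L and N, which is why the invention turns
    to distributed deep reinforcement learning instead.
    """
    best_x, best_cost = None, float("inf")
    for bits in product((0, 1), repeat=L * N):
        x = [list(bits[l * N:(l + 1) * N]) for l in range(L)]
        c = cost_of(x)
        if c < best_cost:
            best_x, best_cost = x, c
    return best_x, best_cost
```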
and S5, constructing an algorithm capable of realizing an optimization target based on the distributed deep reinforcement learning method.
Further, step S5 includes the following steps:
and S51, combining the multiple parallel deep neural networks DNN with a reinforcement learning algorithm Q-learning to construct a parallel deep reinforcement learning algorithm for service cache decision.
Specifically, in order to minimize service access delay and energy consumption, a parallel deep reinforcement learning algorithm is designed that combines multiple parallel deep neural networks DNN with the reinforcement learning algorithm Q-learning to make service caching decisions.
Deep reinforcement learning generally formulates a problem as a Markov decision process, focusing on how an intelligent agent interacts with its environment and takes different actions so as to maximize the accumulated reward. Its main components are the agent, the environment, states, actions and rewards. Accordingly, the invention describes the service cache optimization problem as a Markov decision process composed of three parts, a state space S, an action space A and a reward function R, defined as follows:
State space: $s_{n,t} \in S$ represents the state of edge server $n$ at time slot $t$:
$$s_{n,t} = \big( M_n,\ F_n,\ \{\lambda_{l,n}(t)\}_{l \in \mathcal{L}},\ \Upsilon_{t-1} \big)$$

whose components respectively represent the storage and computing capacities of edge server $n$, the arriving service requests, and the edge service caching policy.
Action space: in each time slot $t$, the edge server needs to make a service caching decision based on the current state: $a_t = \Upsilon_t$.
The reward function: the optimization goal of edge service caching is to minimize service access delay and energy consumption. Thus, the designed reward function is:

$$R_t = -\sum_{l \in \mathcal{L}} \big( D_l(t) + \beta\, E_l(t) \big)$$
The action value function is defined as $Q(s_t, a_t)$, and the Q value is updated as:
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \Big[ R_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \Big]$$

where $\alpha \in (0, 1)$ represents the learning rate and $\gamma \in [0, 1]$ is the reward decay factor.
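This update rule is standard Q-learning; in code, a tabular sketch (with small enumerated state and action spaces, which the invention replaces by DNN function approximation) reads:

```python
import numpy as np

n_states, n_actions = 16, 4          # toy sizes for illustration
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.9              # learning rate, reward decay factor

def q_update(s, a, reward, s_next):
    # Q(s,a) <- Q(s,a) + alpha * [R + gamma * max_a' Q(s',a') - Q(s,a)]
    Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])
```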
m parallel neural network units are provided to act in conjunction with Q-learning. The neural networks execute their actions in parallel, and each unit comprises two DNNs with the same structure but different parameters: one is a main neural network used to predict the Q estimate, holding the latest network parameters $\theta$; the other is a target neural network used to predict the Q target value, whose parameters $\theta^*$ are a copy of the main network's parameters from some time earlier and remain unchanged for a period. After the main neural network has learned a certain number of times, the parameters of the target neural network are updated. Each neural network unit gives its selected action according to the Q value computed with a greedy algorithm. The loss function is defined as:

$$Loss(\theta) = \Big( R_t + \gamma \max_{a'} Q(s_{t+1}, a'; \theta^*) - Q(s_t, a_t; \theta) \Big)^2$$
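The main/target-network arrangement and the loss above follow the usual DQN recipe; a minimal PyTorch sketch, with network sizes and names assumed for this example, is:

```python
import copy
import torch
import torch.nn as nn

state_dim, n_actions = 8, 4                       # toy sizes
main_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                         nn.Linear(64, n_actions))
target_net = copy.deepcopy(main_net)              # theta* starts as a copy of theta
for param in target_net.parameters():
    param.requires_grad_(False)                   # target network is held fixed

def dqn_loss(s, a, r, s_next, gamma=0.9):
    # Q(s_t, a_t; theta) from the main network
    q_sa = main_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    # R_t + gamma * max_a' Q(s_{t+1}, a'; theta*) from the target network
    with torch.no_grad():
        target = r + gamma * target_net(s_next).max(dim=1).values
    return ((target - q_sa) ** 2).mean()

# After every C learning steps, copy the main parameters into the target:
# target_net.load_state_dict(main_net.state_dict())
```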
s52, selecting an action to execute according to the current state by a greedy strategy in a training stage to obtain a reward and a next state, and converting and storing the obtained states into an experience pool; when the experience pool D stores a large enough capacity, a certain number of state transitions are extracted from the experience pool to train the network parameters.
In the training phase, according to the current state stSelecting an action a with an epsilon-greedy strategytExecuting the action receives a reward RtAnd the next state st+1Converting the obtained state [ s ]t,at,Rt,st+1]And storing the state transition into an experience pool D, and randomly extracting a certain number of state transitions from the experience pool to train the network parameter theta when the storage capacity of the experience pool D is large enough.
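The experience pool D can be realized as a bounded FIFO buffer with uniform random sampling; a minimal sketch:

```python
import random
from collections import deque

class ExperiencePool:
    """Experience pool D storing transitions [s_t, a_t, R_t, s_{t+1}]."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions drop out first

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, n):
        # uniform random minibatch of n state transitions
        return random.sample(self.buffer, n)

    def __len__(self):
        return len(self.buffer)
```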
The algorithm flow of the training phase of the distributed deep reinforcement learning based edge service caching algorithm is as follows.

Algorithm input: environment information $\psi$, reward decay factor $\gamma$, learning rate $\alpha$, exploration-exploitation balance parameter $\epsilon$, experience pool D, number of iteration rounds M, and target-update step count C.

Algorithm output: m DNN network parameters. The specific training steps are:

Step 1: initialize the m parallel DNNs and the experience pool D.

Step 2: for epoch from 1 to M:

Step 3: initialize the state $s_t$ for the m parallel DNNs.

Step 4: for time slot t from 1 to T, iterate:

Step 5: select m actions $a^i_t$, $i = 1, 2, 3, \dots, m$, with an $\epsilon$-greedy strategy.

Step 6: execute each $a^i_t$ in the m DNNs, observe the obtained rewards $R^i_t$, and obtain the new state $s_{t+1}$.

Step 7: store $[s_t, a^i_t, R^i_t, s_{t+1}]$ in the experience pool D.

Step 8: randomly draw n samples $[s_j, a_j, R_j(s_j, a_j), s_{j+1}]$, $j = 1, 2, \dots, n$, from the experience pool D.

Step 9: compute the loss function Loss and update the m main neural network parameters $\theta$ by back-propagating gradients through the neural networks.

Step 10: if t mod C == 0, copy the m main neural network parameters $\theta$ to the target networks $\theta^*$.

Step 11: if t <= T, enter the next time slot and return to step 5.

Step 12: if epoch == M, end the iteration and output the m DNN network parameters.
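Tying the steps together, a compact training-loop sketch follows; the environment object env and the helper dqn_loss_from_batch are assumptions (the latter would wrap the loss sketch given earlier), so this is an outline of the flow rather than the patented implementation:

```python
import random

def train(dnns, env, pool, epochs, T, C, eps=0.1, batch=32):
    """Training phase of the distributed edge-caching algorithm (sketch).

    dnns: list of m (main_net, target_net, optimizer) triples
    env:  assumed environment with reset(), step(a) -> (s_next, reward),
          random_action() and greedy_action(net, s)
    """
    for _ in range(epochs):                        # Step 2
        s = env.reset()                            # Step 3
        for t in range(1, T + 1):                  # Step 4
            for main, target, opt in dnns:
                # Step 5: epsilon-greedy action selection
                if random.random() < eps:
                    a = env.random_action()
                else:
                    a = env.greedy_action(main, s)
                s_next, reward = env.step(a)       # Step 6
                pool.store(s, a, reward, s_next)   # Step 7
                if len(pool) >= batch:             # Steps 8-9
                    # dqn_loss_from_batch is an assumed helper wrapping
                    # the DQN loss sketch shown earlier
                    loss = dqn_loss_from_batch(main, target,
                                               pool.sample(batch))
                    opt.zero_grad()
                    loss.backward()
                    opt.step()
                if t % C == 0:                     # Step 10
                    target.load_state_dict(main.state_dict())
                s = s_next
    return [main for main, _, _ in dnns]           # Step 12: m trained DNNs
```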
And S53, generating a plurality of cache decisions through a plurality of parallel deep neural networks in a decision stage, storing the cache decisions into an action set, calculating corresponding rewards obtained by each cache decision, and taking the cache decision with the highest reward as an output action.
In the decision phase, the optimal service caching strategy needs to be obtained. First, the invention generates m cache decisions $a_i$ through the m parallel DNNs and stores them in the action set A, and calculates the reward $R(s, a_i)$ and the value $Q(s, a_i)$ obtained by executing each action $a_i$. The action $a = \arg\max_{A} Q(s_t, A)$ is then selected from the action set A and output as the edge service caching policy.
The algorithm flow of the decision phase of the distributed deep reinforcement learning based edge service caching algorithm is as follows.

Algorithm input: the m DNN network parameters $\theta$.

Algorithm output: the service caching policy a. The specific steps are:

Step 1: for i from 1 to m:

Step 2: generate action $a_i$ with the i-th DNN and add it to the action set A ($A = \{a_1, a_2, a_3, \dots, a_i\}$).

Step 3: execute $a_i$, obtain the reward $R(s, a_i)$, and calculate $Q(s, a_i)$.

Step 4: if i == m, select the action $a = \arg\max_{A} Q(s_t, A)$ from the action set A.

Step 5: output the action a as the service caching policy.
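The decision phase then reduces to one forward pass per DNN plus an argmax over the resulting action set; a sketch under the same assumptions as the training sketch:

```python
def decide(dnns, s, env):
    """Decision phase: output the caching action with the highest Q value.

    dnns: the m trained main networks; env.greedy_action and env.q_value
    are assumed helpers for proposing and scoring candidate actions.
    """
    actions = [env.greedy_action(net, s) for net in dnns]  # Steps 1-2: set A
    best = max(actions, key=lambda a: env.q_value(s, a))   # Steps 3-4: argmax
    return best                                            # Step 5: policy a
```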
Further, in step S1, the edge servers provide data analysis and processing services for the sensor devices and industrial devices; edge servers have limited computing and storage resources relative to the cloud server.
Further, in step S1, when the requested service is not cached on the nearest edge server, the service is executed on the cloud server or on another edge server that caches the service.
Further, in step S5, the service optimization caching problem is described as a markov decision process; the Markov decision process consists of three parts, namely a state space S, an action space A and a reward function R.
Further, in step S51, the actions of the deep neural networks DNN are executed in parallel, and each deep neural network unit includes two neural networks having the same structure but different parameters: one is a main neural network used to predict the Q estimate of the reinforcement learning algorithm Q-learning, holding the latest network parameters; the other is a target neural network used to predict the Q target value of the reinforcement learning algorithm Q-learning, whose parameters are a copy of the main network's parameters from some time earlier and remain unchanged for a period of time.
The invention also provides an industrial internet edge service cache decision system adopting any one of the methods, which comprises the following steps: the system comprises a mathematical modeling module and a service cache decision module;
the mathematical modeling module performs mathematical modeling on the industrial Internet system based on the task that the corresponding service can be executed only when corresponding service data is cached in the server; the cloud server of the system model caches data required by all services;
establishing a mathematical model for service access delay in the edge cloud cooperative system;
performing mathematical modeling on the energy consumption of the industrial Internet system according to the power of data transmission between the edge servers and the cloud servers and the computing power of the edge servers and the cloud servers;
establishing an optimization target for achieving the minimized service access delay and the minimized energy consumption based on the system model, the delay model and the energy consumption model;
the service cache decision module constructs an algorithm capable of realizing an optimization target based on a distributed deep reinforcement learning method.
Further, the service caching decision module further includes: the system comprises an algorithm construction unit, a training unit and a decision unit;
the algorithm construction unit combines a plurality of parallel deep neural networks DNN and a reinforcement learning algorithm Q-learning to construct a parallel deep reinforcement learning algorithm and perform service cache decision;
the training unit selects an action to execute according to the current state by a greedy strategy in a training stage to obtain a reward and a next state, and the obtained state is converted and stored into an experience pool; when the storage capacity of the experience pool D is large enough, extracting a certain number of state transitions from the experience pool to train network parameters;
the decision unit generates a plurality of cache decisions through a plurality of parallel deep neural networks in a decision stage and stores the cache decisions into an action set, calculates corresponding rewards obtained by each cache decision and takes the cache decision with the largest reward as an output action.
Further, the algorithm construction unit describes the service optimization caching problem as a Markov decision process; the Markov decision process consists of three parts, namely a state space S, an action space A and a reward function R.
Further, in the parallel deep reinforcement learning algorithm constructed by the algorithm construction unit, the action execution of the deep neural network DNN is parallel execution, and the deep neural network DNN includes two neural network structures having the same structure but different parameters; one is a main neural network used for predicting the Q estimation value of the reinforcement learning algorithm Q-learning, and has the latest network parameters; the other is a target neural network for predicting the actual value of Q of the reinforcement learning algorithm Q-learning, and the used parameters are parameters before a period of time and are kept unchanged for a period of time.
The method computes the optimal solution of the edge caching strategy mathematical model based on the combination of multiple parallel deep learning networks and a reinforcement learning algorithm, and can solve the technical problem of inaccurate caching strategy prediction in the prior art. Reinforcement learning is combined with deep learning: a machine learns and predicts user preferences and the changing trend of content popularity in the network from a large volume of historical user data, and the service caching strategy is adjusted according to the learning results.
Although much research already exists on the edge computing service caching problem, little of it applies deep reinforcement learning to edge service caching, and even less targets the industrial Internet. In addition, compared with traditional deep reinforcement learning methods, the designed distributed method, which uses multiple parallel DNNs for service caching decisions, performs better at minimizing service access delay and energy consumption.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims (10)

1. An industrial Internet edge service cache decision method is characterized by comprising the following steps:
s1, performing mathematical modeling on an industrial Internet system based on the fact that a task corresponding to a service can be executed only when corresponding service data are cached in a server; the cloud server of the system model caches data required by all services;
s2, establishing a mathematical model for service access time delay in the opposite side cloud coordination system;
s3, performing mathematical modeling on the energy consumption of the industrial Internet system according to the power of data transmission between the edge servers and the cloud server and the computing power of the edge servers and the cloud server;
s4, establishing an optimization target for achieving the minimized service access time delay and the minimized energy consumption based on the system model, the time delay model and the energy consumption model;
and S5, constructing an algorithm capable of realizing the optimization target based on a distributed deep reinforcement learning method.
2. The method according to claim 1, wherein the step S5 further comprises the steps of:
s51, combining a plurality of parallel deep neural networks DNN with a reinforcement learning algorithm Q-learning to construct a parallel deep reinforcement learning algorithm for service cache decision;
s52, selecting an action to execute according to the current state by a greedy strategy in a training stage to obtain a reward and a next state, and converting and storing the obtained states into an experience pool; when the storage capacity of the experience pool D is large enough, extracting a certain number of state transitions from the experience pool to train network parameters;
and S53, generating a plurality of cache decisions through a plurality of parallel deep neural networks in a decision stage, storing the cache decisions into an action set, calculating corresponding rewards obtained by each cache decision, and taking the cache decision with the highest reward as an output action.
3. The method of claim 2, wherein in the step S1, the edge server provides data analysis and processing services for the sensor device and the industrial device; the edge server has limited computing and storage resources relative to the cloud server.
4. The method according to claim 3, wherein in step S1, when the requested service is not cached on the nearest edge server, the service is executed on a cloud server or on another edge server that caches the service.
5. The method according to claim 4, wherein in step S5, the service optimization caching problem is described as a Markov decision process; the Markov decision process consists of three parts, namely a state space S, an action space A and a reward function R.
6. The method according to claim 5, characterized in that in step S51, the actions of the deep neural networks DNN are executed in parallel, and each deep neural network unit comprises two neural networks having the same structure but different parameters; one is a main neural network used to predict the Q estimate of the reinforcement learning algorithm Q-learning, holding the latest network parameters; the other is a target neural network used to predict the Q target value of the reinforcement learning algorithm Q-learning, whose parameters are a copy of the main network's parameters from some time earlier and remain unchanged for a period of time.
7. An industrial internet edge service caching decision system using the method of any one of claims 1 to 6, characterized by comprising: a mathematical modeling module and a service cache decision module;
the mathematical modeling module performs mathematical modeling on the industrial Internet system based on the task that the corresponding service can be executed only when corresponding service data is cached in the server; the cloud server of the system model caches data required by all services;
establishing a mathematical model for service access delay in the edge cloud cooperative system;
performing mathematical modeling on the energy consumption of the industrial Internet system according to the power of data transmission between the edge servers and the cloud servers and the computing power of the edge servers and the cloud servers;
establishing an optimization target for achieving the minimized service access delay and the minimized energy consumption based on the system model, the delay model and the energy consumption model;
and the service cache decision module constructs an algorithm capable of realizing the optimization goal based on a distributed deep reinforcement learning method.
8. The system of claim 7, wherein the service caching decision module further comprises: the system comprises an algorithm construction unit, a training unit and a decision unit;
the algorithm construction unit combines a plurality of parallel deep neural networks DNN and a reinforcement learning algorithm Q-learning to construct a parallel deep reinforcement learning algorithm for service cache decision;
the training unit selects an action to execute according to the current state by a greedy strategy in a training stage to obtain a reward and a next state, and the obtained state is converted and stored into an experience pool; when the storage capacity of the experience pool D is large enough, extracting a certain number of state transitions from the experience pool to train network parameters;
the decision unit generates a plurality of cache decisions through a plurality of parallel deep neural networks in a decision stage and stores the cache decisions into an action set, and calculates corresponding rewards obtained by each cache decision, and the cache decision with the largest reward is used as an output action.
9. The system of claim 8, wherein the algorithm building unit describes the service optimization caching problem as a markov decision process; the Markov decision process consists of three parts, namely a state space S, an action space A and a reward function R.
10. The system according to claim 9, wherein in the parallel deep reinforcement learning algorithm constructed by the algorithm construction unit, the actions of the deep neural networks DNN are executed in parallel, and each deep neural network unit comprises two neural networks with the same structure but different parameters; one is a main neural network used to predict the Q estimate of the reinforcement learning algorithm Q-learning, holding the latest network parameters; the other is a target neural network used to predict the Q target value of the reinforcement learning algorithm Q-learning, whose parameters are a copy of the main network's parameters from some time earlier and remain unchanged for a period of time.
CN202111556974.4A 2021-12-18 2021-12-18 Industrial Internet edge service cache decision method and system Pending CN114328291A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111556974.4A CN114328291A (en) 2021-12-18 2021-12-18 Industrial Internet edge service cache decision method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111556974.4A CN114328291A (en) 2021-12-18 2021-12-18 Industrial Internet edge service cache decision method and system

Publications (1)

Publication Number Publication Date
CN114328291A 2022-04-12

Family

ID=81053229

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111556974.4A Pending CN114328291A (en) 2021-12-18 2021-12-18 Industrial Internet edge service cache decision method and system

Country Status (1)

Country Link
CN (1) CN114328291A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114900556A (en) * 2022-05-24 2022-08-12 杭州师范大学钱江学院 Cloud interconnection method and system based on service preference learning in multi-cloud heterogeneous environment
CN114900556B (en) * 2022-05-24 2023-04-11 杭州师范大学钱江学院 Cloud interconnection method and system based on service preference learning in multi-cloud heterogeneous environment
CN115174681A (en) * 2022-06-14 2022-10-11 武汉大学 Method, equipment and storage medium for scheduling edge computing service request
CN115174681B (en) * 2022-06-14 2023-12-15 武汉大学 Method, equipment and storage medium for scheduling edge computing service request
CN115633380A (en) * 2022-11-16 2023-01-20 合肥工业大学智能制造技术研究院 Multi-edge service cache scheduling method and system considering dynamic topology
CN115633380B (en) * 2022-11-16 2023-03-17 合肥工业大学智能制造技术研究院 Multi-edge service cache scheduling method and system considering dynamic topology
CN115866678A (en) * 2023-02-20 2023-03-28 中国传媒大学 Mobile edge cache resource optimization method based on network energy consumption hotspot detection

Similar Documents

Sundararaj Optimal task assignment in mobile cloud computing by queue based ant-bee algorithm
He et al. Green resource allocation based on deep reinforcement learning in content-centric IoT
CN114328291A (en) Industrial Internet edge service cache decision method and system
Sun et al. Cooperative computation offloading for multi-access edge computing in 6G mobile networks via soft actor critic
Sun et al. Autonomous resource slicing for virtualized vehicular networks with D2D communications based on deep reinforcement learning
CN112134916A (en) Cloud edge collaborative computing migration method based on deep reinforcement learning
Peng et al. Joint optimization of service chain caching and task offloading in mobile edge computing
Zhou et al. Learning from peers: Deep transfer reinforcement learning for joint radio and cache resource allocation in 5G RAN slicing
CN113822456A (en) Service combination optimization deployment method based on deep reinforcement learning in cloud and mist mixed environment
Xie et al. Workflow scheduling in serverless edge computing for the industrial internet of things: A learning approach
Ren et al. Multi-objective optimization for task offloading based on network calculus in fog environments
KR20230007941A (en) Edge computational task offloading scheme using reinforcement learning for IIoT scenario
CN115714820A (en) Distributed micro-service scheduling optimization method
Li et al. DQN-enabled content caching and quantum ant colony-based computation offloading in MEC
Dong et al. Content caching-enhanced computation offloading in mobile edge service networks
Huang et al. Reinforcement learning for cost-effective IoT service caching at the edge
Ullah et al. Optimizing task offloading and resource allocation in edge-cloud networks: a DRL approach
Li et al. Efficient service selection approach for mobile devices in mobile cloud
CN116760722A (en) Storage auxiliary MEC task unloading system and resource scheduling method
CN113766540B (en) Low-delay network content transmission method, device, electronic equipment and medium
CN114500561B (en) Power Internet of things network resource allocation decision-making method, system, equipment and medium
Borzemski et al. Adaptive and intelligent request distribution for content delivery networks
CN116418808A (en) Combined computing unloading and resource allocation method and device for MEC
Zhang et al. Deep Reinforcement Learning Based Joint Caching and Resources Allocation for Cooperative MEC
Liu et al. Optimized min-min dynamic task scheduling algorithm in grid computing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination