CN114281718A - Industrial Internet edge service cache decision method and system - Google Patents


Info

Publication number
CN114281718A
CN114281718A
Authority
CN
China
Prior art keywords: network, service, model, actor, edge
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111556973.XA
Other languages
Chinese (zh)
Inventor
叶可江 (Ye Kejiang)
唐璐婕 (Tang Lujie)
须成忠 (Xu Chengzhong)
Current Assignee
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN202111556973.XA priority Critical patent/CN114281718A/en
Publication of CN114281718A publication Critical patent/CN114281718A/en
Pending legal-status Critical Current

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to the technical field of the industrial Internet, and in particular to an industrial Internet edge service cache decision method and system. The industrial Internet edge service cache decision method is a service caching method based on deep meta reinforcement learning: it combines the perception capability of deep learning, the decision-making capability of reinforcement learning, and the fast environment-learning capability of meta learning, and the fused framework can quickly and flexibly obtain the optimal caching strategy in a dynamic environment. The corresponding system achieves the same technical effect.

Description

Industrial Internet edge service cache decision method and system
Technical Field
The invention relates to the technical field of the industrial Internet, and in particular to an industrial Internet edge service cache decision method and system.
Background
By integrating advanced technologies such as 5G communication and artificial intelligence, the industrial Internet brings sensors and controllers with sensing and control capabilities into the industrial production process, thereby optimizing production, reducing cost, and improving productivity. In the traditional cloud computing mode, because deployment is centralized, computing nodes are usually far from the intelligent terminals, making it difficult to meet the high real-time and low-latency requirements of industrial sites. Edge computing sinks computing, storage, and network resources to the edge of the industrial network, so that device requests can be answered more promptly, satisfying key requirements of the industrial Internet environment such as intelligent access, real-time communication, and privacy protection, and enabling intelligent, green communication.
Edge service caching is a key issue in practical applications of edge computing. Because the resources (e.g., bandwidth resources, computing resources, etc.) that each network edge server provides to users are fixed and limited, an edge node can host and run only a few kinds of service requests at the same time. A reasonable mobile edge caching strategy can effectively improve the performance of the whole network and the quality of service.
In recent years, a great deal of research has targeted the edge computing caching problem. Some scholars propose caching decision strategies based on content popularity: content is cached according to users' preference for specific content, and content with a higher request probability is cached in the base station preferentially, which can effectively improve users' quality of experience. Others adopt learning-based edge caching strategies, using machine learning or deep learning to learn and predict, from a large amount of historical user data, users' preferences or the trend of content popularity in the network, and adjusting the caching strategy according to the learning result.
The prior art mainly studies the mobile edge computing caching problem. Most work focuses on improving caching strategies from legacy networks to fit the new characteristics of mobile edge computing networks. Some work explores new caching schemes, such as strategies based on user preferences, on learning, or on multi-edge-node collaboration. However, content popularity, user preferences, and so on change constantly over time and are unpredictable. Meanwhile, in rapidly changing industrial application scenarios, every time the environment changes, the service caching decision has to be readjusted through recomputation; otherwise, higher service delay and cost are incurred. Although intelligent algorithms such as deep learning and reinforcement learning have achieved good results in edge caching decisions, challenges remain, such as slow learning and the invalidation of the original network parameters when the model's environment changes. These are the defects of the prior art.
Disclosure of Invention
In order to solve at least one technical problem, embodiments of the present invention provide an industrial Internet edge service cache decision method and system, which solve for the optimal edge service caching policy through deep meta reinforcement learning, so as to minimize service access delay.
According to an embodiment of the present invention, a method for deciding an industrial internet edge service cache is provided, which includes the following steps:
S1, performing mathematical modeling on an industrial Internet system based on service access delay, and establishing a system model;
S2, establishing the optimization objective of minimizing service access delay based on the system model;
S3, constructing a deep meta reinforcement learning framework capable of realizing the optimization objective, based on the deep deterministic policy gradient algorithm;
the industrial Internet system comprises a plurality of devices, a plurality of edge servers and a cloud server; the device is connected to a cloud server through the edge server; and a plurality of edge servers are in communication connection with each other.
The invention also provides an industrial Internet edge service cache decision system adopting any one of the above methods, which comprises: a mathematical modeling module, a target establishing module, and a service cache decision module;
the mathematical modeling module performs mathematical modeling on the industrial Internet system based on service access delay to establish a system model;
the target establishing module establishes an optimization target for achieving the minimized service access time delay based on a system model;
the service cache decision module constructs a deep meta reinforcement learning framework capable of realizing the optimization objective, based on the deep deterministic policy gradient algorithm;
the industrial Internet system also comprises a plurality of devices, a plurality of edge servers and a cloud server; the device is connected to a cloud server through the edge server; and a plurality of edge servers are in communication connection with each other.
The industrial Internet edge service cache decision method and system, via the service caching method based on deep meta reinforcement learning, combine the perception capability of deep learning, the decision-making capability of reinforcement learning, and the fast environment-learning capability of meta learning; the fused framework can quickly and flexibly obtain the optimal caching strategy in a dynamic environment.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow chart of an industrial Internet edge service cache decision method according to the present invention;
FIG. 2 is a schematic diagram of an industrial Internet edge service cache decision method according to the present invention;
FIG. 3 is a schematic diagram of a side cloud cooperative service structure of an industrial Internet edge service caching decision system according to the present invention;
fig. 4 is a schematic diagram of a deep meta reinforcement learning framework adopted in the industrial internet edge service cache decision method of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
As shown in fig. 3, to facilitate modeling (building a mathematical model) of the industrial Internet system, the present invention is implemented in an industrial Internet scenario comprising a large number of industrial and sensor devices, multiple edge servers, and a cloud server. We assume that all edge servers within a region can communicate with one another through various network connections, such as a local area network. In this way, an industrial device can offload tasks to any edge server in the reachable area, rather than relying only on the nearest directly connected one. The edge server creates the operating environment for a particular service through virtualization technology (e.g., virtual machines or containers).
Due to the particularity of the application tasks such as video stream analysis and augmented reality, the execution of the tasks requires not only the allocation of user input data and computing resources, but also the pre-caching of a large amount of data. The edge server can only use the services cached on it to run the corresponding tasks. Minimizing service access time is closely related to edge service caching.
Before constructing the mathematical model, we define the service access delay as consisting of the service communication delay and the service execution delay. The service communication delay comprises the time for the device to send a request to the nearest edge server and, after the request is completed, for the service result to be sent back to the device. The service execution delay is the completion time of running the service on the edge server or the cloud server.
Referring to fig. 1 to 3, according to an embodiment of the present invention, a method for deciding an industrial Internet edge service cache is provided, which includes the following steps:
S1, performing mathematical modeling on the industrial Internet system based on service access delay (including service communication delay and service execution delay), and establishing a system model.
In a specific implementation, to facilitate modeling, time is discretized into time slots of equal length, T = {1, 2, ..., |T|}.

The edge server set is defined as S = {1, 2, ..., |S|}, where s = (slat_s, slon_s, scr_s, sc_s, sr_s) denote, respectively, the latitude, longitude, and coverage of edge server s, together with its computing and storage capacities. sc_{|S|+1} represents the computing power of the cloud server.
The service set requested by the various industrial and sensor devices is K = {1, 2, ..., |K|}, where k = (kc_k, kr_k) denote, respectively, the computing resources and storage resources required by service k. The device requests arriving in time slot t are denoted U_t = {1, 2, ..., |U_t|}, where u = (ulat_u, ulon_u, uk_u, uw_u) denote, respectively, the latitude and longitude of the device, the type of service requested by the industrial device, and the size of the input data.
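As a non-authoritative sketch, the modeling entities above can be written as plain data structures (the Python class and field names are illustrative, not from the patent):

```python
from dataclasses import dataclass

@dataclass
class EdgeServer:
    lat: float       # slat_s: latitude
    lon: float       # slon_s: longitude
    coverage: float  # scr_s: coverage radius
    compute: float   # sc_s: computing capacity
    storage: float   # sr_s: storage capacity

@dataclass
class Service:
    compute_req: float  # kc_k: computing resources required by service k
    storage_req: float  # kr_k: storage resources required by service k

@dataclass
class Request:
    lat: float         # ulat_u: device latitude
    lon: float         # ulon_u: device longitude
    service_id: int    # uk_u: type of service requested
    input_size: float  # uw_u: size of the input data
```

The cloud server can then be represented as one extra compute capacity value sc_{|S|+1} alongside the list of `EdgeServer` instances.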
Service caching is performed on the edge servers. A binary variable x_{k,s}(t) ∈ {0,1} indicates whether service k is cached on edge server s at time t; x_{k,s}(t) = 1 means that service k is cached on edge server s. We define γ_t = {x_{k,s}(t) | k ∈ K, s ∈ S} as the service caching decision in time slot t. Furthermore, since the storage capacity of an edge server is limited, the total size of the services placed on edge server s cannot exceed its storage capacity in any time slot t:

Σ_{k∈K} x_{k,s}(t) · kr_k ≤ sr_s, ∀s ∈ S.

The computing power allocated to each cached service in edge server s is:

sc_s / Σ_{k∈K} x_{k,s}(t).
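The storage constraint and the even compute split described above can be sketched as follows (the function names and argument layout are our own illustration):

```python
def cache_feasible(x_s, storage_reqs, storage_cap):
    """Check the constraint sum_k x_{k,s}(t) * kr_k <= sr_s for one server s.

    x_s: list of 0/1 cache indicators for each service k on server s.
    storage_reqs: list of kr_k values; storage_cap: sr_s.
    """
    used = sum(x * kr for x, kr in zip(x_s, storage_reqs))
    return used <= storage_cap

def compute_share(x_s, compute_cap):
    """Computing power allocated per cached service: sc_s / sum_k x_{k,s}(t)."""
    n_cached = sum(x_s)
    return compute_cap / n_cached if n_cached else 0.0
```

For example, a decision caching services 1 and 3 with sizes 2 and 3 on a server with capacity 6 is feasible, and each cached service gets half of the server's compute.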
S2, establishing the optimization objective of minimizing service access delay based on the system model.

In a specific implementation, the optimization objective based on the system model is set as:

min_{γ_1,...,γ_|T|} (1/|T|) Σ_{t∈T} (1/|U_t|) Σ_{u∈U_t} D_u(t)

s.t. Σ_{k∈K} x_{k,s}(t) · kr_k ≤ sr_s, ∀s ∈ S, t ∈ T,

x_{k,s}(t) ∈ {0,1}, ∀k ∈ K, s ∈ S, t ∈ T,

where D_u(t) denotes the access delay of request u in time slot t.
S3, constructing a deep meta reinforcement learning framework capable of realizing the optimization objective, based on the deep deterministic policy gradient algorithm;
the industrial Internet system comprises a plurality of devices, a plurality of edge servers and a cloud server; the device is connected to a cloud server through the edge server; and a plurality of edge servers are in communication connection with each other.
Further, in step S1, the service access latency includes a service communication latency and a service execution latency.
Specifically, the service access delay generated by the industrial device requesting the service includes the following three cases:
If the industrial device is not within the coverage of any edge server, or the requested service is not cached on any edge server, the service is executed entirely on the cloud server, and the access delay of the service is:

D_u(t) = d_u^{tr,c} + d_k^{exe,c},

where d_u^{tr,c} represents the time required for transmitting the data to the cloud server and returning the result from the cloud server, and d_k^{exe,c} represents the time required to run the requested service k on the cloud server.
If the industrial device is within the coverage of an edge server, and the edge server s nearest to the device receives the device's request for service k, then, if that edge server has cached the service, the task is executed on edge server s. The access delay of the service is:

D_u(t) = d_u^{tr,s} + d_k^{exe,s},

where d_u^{tr,s} is the communication delay between the device and edge server s, and d_k^{exe,s} = kc_k / (sc_s / Σ_{k'∈K} x_{k',s}(t)) is the time to run service k with the computing power allocated to it on s.
If the industrial device is within the coverage of an edge server, and the edge server s nearest to the device receives the user's request for service k, but that edge server has not cached the service, the task is offloaded to a neighboring edge server w that has cached service k for execution. The access delay of the service is:

D_u(t) = d_u^{tr,s} + uw_u / B_{s,w} + d_k^{exe,w},

where uw_u / B_{s,w} is the time to transfer the input data from s to w, and B_{s,w} represents the bandwidth between edge server s and edge server w.
The service communication delay comprises the time when the equipment sends a request to the nearest edge server and the time when the edge server sends a service result after the request is completed back to the equipment;
the service execution latency includes a completion time of running the service on the edge server or the cloud server.
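A minimal sketch of the three access-delay cases described above (the case labels and argument names are ours; the delay components are taken as given inputs):

```python
def access_delay(case, comm_delay, exec_delay, input_size=0.0, bandwidth=None):
    """Service access delay for the three cases in the text.

    case: "cloud", "local_edge", or "neighbor_edge" (labels are ours).
    comm_delay: communication delay to the serving node and back.
    exec_delay: execution delay on the serving node.
    bandwidth: B_{s,w}, only used when forwarding to a neighbor edge server.
    """
    if case == "neighbor_edge":
        # extra hop: forward the input data of size uw_u over bandwidth B_{s,w}
        return comm_delay + input_size / bandwidth + exec_delay
    return comm_delay + exec_delay  # cloud or nearest edge server
```

In each case the total is communication delay plus execution delay; only the neighbor-edge case adds the inter-server transfer term uw_u / B_{s,w}.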
Further, the device comprises an industrial device and a sensor device.
Further, the deep meta reinforcement learning framework in step S3 includes an inner model, which outputs the caching decision that minimizes service access delay, and an outer model, which improves the inner model's adaptability to different environments.
In order to minimize service access delay, the inner model provided by the invention uses an edge service caching algorithm designed on the basis of the deep deterministic policy gradient (DDPG) algorithm to make service caching decisions. Deep reinforcement learning models typically require the problem to be defined as a Markov decision process, comprising agent, environment, states, actions, and rewards. The basic idea is as follows: the agent observes the environment at time t to obtain the current state; it then takes a corresponding action to interact with the environment in that state; after the environment accepts the action, it gives a reward and enters the next state. The agent maximizes its total reward by continuously interacting with the environment. We therefore describe the service caching optimization problem as a Markov decision process. It consists of three parts, the state space S, the action space A, and the reward function R, defined as follows:
State space: s_t ∈ S represents the state observed from the mobile edge computing system in time slot t. s_t = (U_t, B_t, γ_{t-1}) comprises, respectively, the device requests arriving in time slot t, the network conditions, and the edge service caching decision of the previous slot.
Action space: in each time slot t, the edge server needs to make a service caching decision based on the current state, a_t = γ_t. The size of the action space of a_t is 2^{K×S}.
Reward function: the optimization goal of edge service caching is to minimize the average service access delay, so the designed reward is the negative average access delay of the requests served in slot t:

R_t(s_t, a_t) = −(1/|U_t|) Σ_{u∈U_t} D_u(t).
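The reward above — the negative average service access delay of the requests in a slot — can be sketched as:

```python
def reward(delays):
    """R_t(s_t, a_t): negative average access delay over the requests in slot t.

    delays: list of D_u(t) values for the requests u in U_t.
    """
    return -sum(delays) / len(delays) if delays else 0.0
```

Maximizing the cumulative reward is then equivalent to minimizing the average service access delay over time.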
further, in the DDPG algorithm of the inner layer model, an Actor network and a Critic network both use a deep neural network to establish an approximate function; the inner model generates the determined behavior directly from the policies of the Actor network without sampling in terms of probability distributions of behavior.
The inner layer model adds a noise function on the basis of deterministic behavior in the learning phase of the deep neural network. To enable exploration within a small range around deterministic behavior.
In addition, the inner layer model backups a set of parameters for calculating expected values of behavior values for the Actor network and the Critic network respectively; to more stably promote the strategic guidance level of Critic. The Actor network and Critic network corresponding to the other set of main parameters are used for generating actual interactive behaviors and calculating corresponding strategy gradients, and the set of main parameters are updated once every learning. The purpose of this two-parameter setting is to reduce the occurrence of misconvergence due to the guidance of the approximation data.
Wherein the primary parameters are updated based on a learning process of the deep neural network; the backup parameters are updated less frequently than the primary parameters.
When the Actor network uses the main parameters, it forms the Actor Online policy network, which is responsible for iteratively updating the policy network parameter θ and, given the current state s_t, generating a specific exploratory or non-exploratory action a_t.

When the Actor network uses the backup parameters, it forms the Actor Target policy network, which, from the subsequent state s_{t+1} given by the environment, generates the action a_{t+1} used for value prediction; its network parameter θ* is periodically copied from θ.

When the Critic network uses the main parameters, it forms the Critic Online Q network, which is responsible for iteratively updating the value network parameter ω and computing the action value Q(s_t, a_t; ω) of the state s_t and the generated action a_t.

When the Critic network uses the backup parameters, it forms the Critic Target Q network, which, from the subsequent state s_{t+1} and action a_{t+1}, computes the value Q(s_{t+1}, a_{t+1}; ω*) used in the target value for Q(s_t, a_t; ω); its network parameter ω* is periodically copied from ω.
In a specific implementation, the network parameters of the DDPG algorithm use soft updates, i.e., each update changes the target parameters only partially, where τ is the update coefficient and usually takes a small value:
ω*←τω+(1-τ)ω*
θ*←τθ+(1-τ)θ*
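The soft update rule above can be sketched element-wise over flattened parameter lists (a simplification; practical implementations update network tensors in place):

```python
def soft_update(target_params, online_params, tau=0.01):
    """w* <- tau * w + (1 - tau) * w*, applied element-wise.

    target_params: current target-network parameters (w* or theta*).
    online_params: current online-network parameters (w or theta).
    Returns the new target parameter list.
    """
    return [tau * w + (1.0 - tau) * w_star
            for w, w_star in zip(online_params, target_params)]
```

With a small τ, the target parameters track the online parameters slowly, which is what stabilizes the target values.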
Meanwhile, in order to add some randomness and improve coverage during learning, DDPG adds a noise N to the selected action a_t; the action that finally interacts with the environment is:

a_t = π_θ(s_t) + N.
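A sketch of this exploration step, using Gaussian noise as one possible choice of N (the patent does not specify the noise distribution) and clipping to [0, 1] so the output can still be thresholded into a binary caching decision:

```python
import random

def explore_action(det_action, sigma=0.1, seed=None):
    """a_t = pi_theta(s_t) + N: add Gaussian noise to the deterministic
    Actor output, then clip each entry to [0, 1] so it can be
    thresholded into a binary caching indicator x_{k,s}(t)."""
    rng = random.Random(seed)
    return [min(1.0, max(0.0, a + rng.gauss(0.0, sigma))) for a in det_action]
```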
For the Critic Online Q network, the loss function is defined as the mean squared error against the target value:

J(ω) = (1/n) Σ_j (y_j − Q(s_j, a_j; ω))², with y_j = R_j + γ · Q(s_{j+1}, π_{θ*}(s_{j+1}); ω*).

For the Actor Online policy network, the loss function is defined as the negative mean action value:

J(θ) = −(1/n) Σ_j Q(s_j, π_θ(s_j); ω).
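The target value and the two losses can be sketched numerically for a mini-batch (pure-Python stand-ins for the network outputs; in practice these are computed on tensors):

```python
def critic_target(r, q_next, gamma=0.9):
    """Target value y_j = R_j + gamma * Q(s_{j+1}, a_{j+1}; omega*)."""
    return r + gamma * q_next

def critic_loss(targets, q_values):
    """J(omega): mean squared error between targets y_j and online Q estimates."""
    n = len(targets)
    return sum((y - q) ** 2 for y, q in zip(targets, q_values)) / n

def actor_loss(q_values):
    """J(theta): negative mean Q of the actor's own actions, so that
    gradient descent on J(theta) performs gradient ascent on Q."""
    return -sum(q_values) / len(q_values)
```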
the edge service caching algorithm flow based on the DDPG algorithm is as follows:
Algorithm input: the Actor Online policy network, Actor Target policy network, Critic Online Q network, and Critic Target Q network, with parameters θ, θ*, ω, ω* respectively; the meta-policy parameters θ̂ and ω̂; the discount factor γ, soft update coefficient τ, experience pool D, mini-batch size n for gradient descent, target Q network parameter update frequency C, and maximum number of iterations M; and the random noise function N.

Algorithm output: the optimal edge service caching decision γ_t.
Step 1: initialize the network parameters θ = θ̂, ω = ω̂, θ* = θ, ω* = ω, and initialize the experience pool D.
Step 2: for epoch from 1 to M:
Step 3: initialize the state s_t.
Step 4: for time slot t from 1 to T, iterate.
Step 5: the Actor Online policy network obtains an action a_t = π_θ(s_t) + N based on the state s_t.
Step 6: execute a_t, observe the obtained reward R_t(s_t, a_t), and obtain the new state s_{t+1}.
Step 7: store the experience [s_t, a_t, R_t(s_t, a_t), s_{t+1}] into the experience pool D.
Step 8: randomly sample n experiences [s_j, a_j, R_j(s_j, a_j), s_{j+1}], j = 1, 2, ..., n, from the experience pool D.
Step 9: compute the current target Q value y_j = R_j + γ · Q(s_{j+1}, π_{θ*}(s_{j+1}); ω*).
Step 10: compute the loss function J(ω) and update the Critic Online Q network parameter ω through gradient back-propagation in the neural network.
Step 11: compute the loss function J(θ) and update the Actor Online policy network parameter θ through gradient back-propagation in the neural network.
Step 12: if t mod C == 0, update the Critic Target Q network and Actor Target policy network parameters:
ω*←τω+(1-τ)ω*
θ*←τθ+(1-τ)θ*
Step 13: if t ≤ T, enter the next time slot and return to step 5.
Step 14: if epoch == M, the iteration ends. Output the optimal edge service caching decision γ_t.
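Steps 7 and 8 rely on the experience pool D; a minimal sketch (the bounded capacity and uniform sampling are standard replay-buffer choices, not details specified in the patent):

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience pool D: stores (s, a, r, s') transitions and samples
    mini-batches of size n uniformly at random. A bounded deque evicts
    the oldest experience once the capacity is reached."""

    def __init__(self, capacity=10000):
        self.buf = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, n):
        return random.sample(list(self.buf), min(n, len(self.buf)))

    def __len__(self):
        return len(self.buf)
```

Breaking the temporal correlation of consecutive transitions through this random sampling is what makes the gradient updates in steps 10-11 behave like supervised learning on i.i.d. batches.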
Further, the outer model feeds caching decisions and execution results from different environments into the inner model for training; in each training round, the inner model randomly selects training samples from one environment for learning, and after learning randomly selects another environment for the next iteration, thereby training the meta caching policy and improving the inner model's adaptability to new environments.

The goal of this sampling scheme is to prevent the trained parameters from fitting too closely to the optimal solution of any particular environment. The parameters trained in this way serve as the initial parameters of the inner model.
The meta cache strategy algorithm flow is as follows:
Algorithm input: the environment set E.

Algorithm output: the meta-policy parameters θ̂ and ω̂.
step 1: initializing an Actor Online strategy network parameter theta, a criticc Online Q network parameter omega and an experience pool D.
Step 2: for epoch from 1 to M:
Step 3: randomly select an environment and initialize the state s_t.
Step 4: for time slot t from 1 to T, iterate.
Step 5: the Actor Online policy network obtains an action a_t based on the state s_t.
Step 6: execute a_t, obtain the reward R_t(s_t, a_t) in that environment, and obtain the new state s_{t+1}.
Step 7: store the experience [s_t, a_t, R_t(s_t, a_t), s_{t+1}] into the experience pool D.
Step 8: randomly sample n experiences [s_j, a_j, R_j(s_j, a_j), s_{j+1}] from the experience pool D and compute the Q value.
Step 9: update the Critic Online Q network parameter ω and the Actor Online policy network parameter θ.
Step 10: if t ≤ T, enter the next time slot and return to step 5.
Step 11: if epoch == M, end the iteration and output the meta-policy parameters θ̂ and ω̂.
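The outer loop above — randomly picking an environment each epoch and running the inner learner over its time slots — can be sketched with a pluggable inner update (`envs` and `train_step` are hypothetical stand-ins for the patent's environments and DDPG update, not part of the original):

```python
import random

def meta_train(envs, train_step, epochs=3, slots=5, seed=0):
    """Outer-loop sketch of the meta caching policy: each epoch picks a
    random environment (step 3) and runs the inner learner's update over
    T time slots (steps 4-9)."""
    rng = random.Random(seed)
    visited = []
    for _ in range(epochs):
        env = rng.choice(envs)       # step 3: randomly select an environment
        visited.append(env)
        for _ in range(slots):       # step 4: iterate over the time slots
            train_step(env)          # steps 5-9: one inner update in env
    return visited
```

Because each epoch draws a fresh environment, the parameters left at the end are pulled toward a point that adapts quickly to any of the environments rather than over-fitting one.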
The invention also provides an industrial Internet edge service cache decision system adopting any one of the above methods, which comprises: a mathematical modeling module, a target establishing module, and a service cache decision module;
the mathematical modeling module is used for carrying out mathematical modeling on the industrial Internet system based on service access delay (including service communication delay and service execution delay) and establishing a system model;
the target establishing module establishes an optimization target for achieving the minimized service access time delay based on a system model;
the service cache decision module constructs a deep meta reinforcement learning framework capable of realizing the optimization objective, based on the deep deterministic policy gradient algorithm;
the industrial Internet system also comprises a plurality of devices, a plurality of edge servers and a cloud server; the device is connected to a cloud server through the edge server; and a plurality of edge servers are in communication connection with each other.
Further, the service cache decision module constructs the deep meta reinforcement learning framework by fusing deep neural networks with a deterministic policy (DDPG, deep deterministic policy gradient); the deep meta reinforcement learning framework includes an inner model, which outputs the caching decision that minimizes service access delay, and an outer model, which improves the inner model's adaptability to different environments.
Further, the inner model keeps a backup set of parameters for each of the Actor network and the Critic network, used to calculate the expected action values; the main parameters are updated based on the learning process of the deep neural network, and the backup parameters are updated less frequently than the main parameters;
generating an Actor Online policy network when the Actor network uses the primary parameter;
generating an Actor Target policy network when the Actor network uses the backup parameters;
generating a Critic Online Q network when the Critic network uses the primary parameters;
generating a Critic Target Q network when the Critic network uses the backup parameters;
the outer layer model inputs cache decisions and execution results in different environments into the inner layer model for training; and in each training, the inner layer model randomly selects a training sample in one environment for learning, and randomly selects another environment for iteration after learning so as to improve the capability of the inner layer model in the applicable environment.
According to the method, the service caching strategy is trained by an edge-cloud collaborative deep meta reinforcement learning framework: the outer model trains the meta caching strategy and improves the environmental adaptability of the inner model, while the inner model uses a DDPG-based edge service caching algorithm to make caching decisions and minimize service caching delay. This effectively solves the technical problem that, when the edge environment changes, the original parameters become completely invalid, so that a large amount of training data is needed to retrain from scratch and learning efficiency is low.
Meanwhile, the invention provides an industrial Internet edge service cache decision method based on deep meta reinforcement learning. By combining the perception capability of deep learning, the decision-making capability of reinforcement learning and the fast environmental learning capability of meta learning, the optimal cache strategy can be quickly and flexibly obtained from a dynamic environment.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as within the protection scope of the present invention.

Claims (10)

1. An industrial Internet edge service cache decision method is characterized by comprising the following steps:
s1, performing mathematical modeling on an industrial Internet system based on service access time delay, and establishing a system model;
s2, establishing an optimization target for achieving the minimized service access time delay based on a system model;
s3, constructing a depth element reinforcement learning framework capable of realizing the optimization target based on a depth certainty strategy gradient algorithm;
the industrial Internet system comprises a plurality of devices, a plurality of edge servers and a cloud server; the device is connected to a cloud server through the edge server; and a plurality of edge servers are in communication connection with each other.
2. The method according to claim 1, wherein in step S1 the service access latency comprises service communication latency and service execution latency;
the service communication latency comprises the time for a device to send a request to the nearest edge server and the time for the edge server to send the service result back to the device after the request is completed;
the service execution latency comprises the completion time of running the service on the edge server or the cloud server.
3. The method of claim 2, wherein the devices comprise industrial devices and sensor devices.
4. The method of claim 3, wherein the deep meta reinforcement learning framework comprises an inner model that outputs caching decisions minimizing service access latency, and an outer model for improving the environment adaptability of the inner model.
5. The method according to claim 4, wherein the Actor network and the Critic network of the inner model each use a deep neural network as an approximation function; the inner model generates deterministic behavior directly from the policy of the Actor network; and, during the learning phase of the deep neural network, the inner model adds a noise function to the deterministic behavior.
6. The method according to claim 5, wherein, for each of the Actor network and the Critic network, the inner model maintains a backup set of parameters used to calculate the expected action value; the main parameters are updated based on the learning process of the deep neural network; and the backup parameters are updated less frequently than the main parameters;
an Actor Online policy network is generated when the Actor network uses the main parameters;
an Actor Target policy network is generated when the Actor network uses the backup parameters;
a Critic Online Q network is generated when the Critic network uses the main parameters;
and a Critic Target Q network is generated when the Critic network uses the backup parameters.
7. The method of claim 6, wherein the outer model feeds caching decisions and execution results from different environments into the inner model for training; in each training round, the inner model randomly selects a training sample from one environment for learning, and after learning randomly selects another environment for the next iteration, so as to improve the environment adaptability of the inner model.
8. An industrial Internet edge service caching decision system using the method of any one of claims 1 to 7, comprising a mathematical modeling module, an objective establishment module, and a service caching decision module; characterized in that the mathematical modeling module performs mathematical modeling of the industrial Internet system based on service access latency to establish a system model;
the objective establishment module establishes, based on the system model, an optimization objective of minimizing service access latency;
the service caching decision module constructs, based on the deep deterministic policy gradient algorithm, a deep meta reinforcement learning framework capable of achieving the optimization objective;
the industrial Internet system comprises a plurality of devices, a plurality of edge servers, and a cloud server; the devices are connected to the cloud server through the edge servers; and the plurality of edge servers are communicatively connected with one another.
9. The system of claim 8, wherein the deep meta reinforcement learning framework comprises an inner model that outputs caching decisions minimizing service access latency, and an outer model for improving the environment adaptability of the inner model.
10. The system according to claim 9, wherein, for each of the Actor network and the Critic network, the inner model maintains a backup set of parameters used to calculate the expected action value; the main parameters are updated based on the learning process of the deep neural network; the backup parameters are updated less frequently than the main parameters;
an Actor Online policy network is generated when the Actor network uses the main parameters;
an Actor Target policy network is generated when the Actor network uses the backup parameters;
a Critic Online Q network is generated when the Critic network uses the main parameters;
a Critic Target Q network is generated when the Critic network uses the backup parameters;
the outer model feeds caching decisions and execution results from different environments into the inner model for training; in each training round, the inner model randomly selects a training sample from one environment for learning, and after learning randomly selects another environment for the next iteration, so as to improve the environment adaptability of the inner model.
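The outer-model training loop described in claims 7 and 10 can be sketched as follows. This is an assumed, simplified structure for illustration only: each iteration randomly selects one environment, draws a training sample (a caching decision and its execution result) from it, performs one inner-model learning step, and then moves on to another randomly chosen environment. The names `meta_train`, `inner_update`, and the `environments` mapping are hypothetical, not from the patent.

```python
import random

def meta_train(inner_update, environments, iterations=100, seed=0):
    """Outer meta-training loop sketch.

    inner_update(sample) -- performs one inner-model (e.g. DDPG) learning
        step on a sample drawn from the selected environment.
    environments -- mapping from environment id to a zero-argument callable
        that returns one training sample (cache decision, execution result).
    """
    rng = random.Random(seed)
    visits = {env_id: 0 for env_id in environments}
    for _ in range(iterations):
        env_id = rng.choice(list(environments))  # randomly pick an environment
        sample = environments[env_id]()          # draw one training sample
        inner_update(sample)                     # inner-model learning step
        visits[env_id] += 1                      # then iterate in another env
    return visits
```

By alternating environments between iterations, the meta caching strategy is exposed to many edge conditions, which is what the claims describe as improving the environment adaptability of the inner model.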
CN202111556973.XA 2021-12-18 2021-12-18 Industrial Internet edge service cache decision method and system Pending CN114281718A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111556973.XA CN114281718A (en) 2021-12-18 2021-12-18 Industrial Internet edge service cache decision method and system

Publications (1)

Publication Number Publication Date
CN114281718A true CN114281718A (en) 2022-04-05

Family

ID=80873384

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111556973.XA Pending CN114281718A (en) 2021-12-18 2021-12-18 Industrial Internet edge service cache decision method and system

Country Status (1)

Country Link
CN (1) CN114281718A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114860337A (en) * 2022-05-17 2022-08-05 华东师范大学 Computing unloading method based on meta reinforcement learning algorithm
CN114860337B (en) * 2022-05-17 2023-07-25 华东师范大学 Computing unloading method based on meta reinforcement learning algorithm
CN115344510A (en) * 2022-10-18 2022-11-15 南京邮电大学 High-dimensional video cache selection method based on deep reinforcement learning
CN115633380A (en) * 2022-11-16 2023-01-20 合肥工业大学智能制造技术研究院 Multi-edge service cache scheduling method and system considering dynamic topology
CN115633380B (en) * 2022-11-16 2023-03-17 合肥工业大学智能制造技术研究院 Multi-edge service cache scheduling method and system considering dynamic topology

Similar Documents

Publication Publication Date Title
Elgendy et al. Joint computation offloading and task caching for multi-user and multi-task MEC systems: reinforcement learning-based algorithms
Tang et al. Migration modeling and learning algorithms for containers in fog computing
CN114281718A (en) Industrial Internet edge service cache decision method and system
Peng et al. Joint optimization of service chain caching and task offloading in mobile edge computing
CN112422644B (en) Method and system for unloading computing tasks, electronic device and storage medium
Li et al. Energy-aware task offloading with deadline constraint in mobile edge computing
Shan et al. “DRL+ FL”: An intelligent resource allocation model based on deep reinforcement learning for mobile edge computing
CN113064671A (en) Multi-agent-based edge cloud extensible task unloading method
CN113568727A (en) Mobile edge calculation task allocation method based on deep reinforcement learning
Sun et al. Reinforcement learning based computation migration for vehicular cloud computing
Chen et al. Cache-assisted collaborative task offloading and resource allocation strategy: A metareinforcement learning approach
Qi et al. Vehicular edge computing via deep reinforcement learning
Wang et al. Online service migration in mobile edge with incomplete system information: A deep recurrent actor-critic learning approach
CN113973113B (en) Distributed service migration method for mobile edge computing
Lakew et al. Adaptive partial offloading and resource harmonization in wireless edge computing-assisted ioe networks
Huang et al. Reinforcement learning for cost-effective IoT service caching at the edge
Henna et al. Distributed and collaborative high-speed inference deep learning for mobile edge with topological dependencies
CN116489708B (en) Meta universe oriented cloud edge end collaborative mobile edge computing task unloading method
CN117459112A (en) Mobile edge caching method and equipment in LEO satellite network based on graph rolling network
CN112312299A (en) Service unloading method, device and system
Li et al. Efficient data offloading using markovian decision on state reward action in edge computing
CN115052262A (en) Potential game-based vehicle networking computing unloading and power optimization method
Tan et al. Toward a task offloading framework based on cyber digital twins in mobile edge computing
Ramya et al. Lightweight Unified Collaborated Relinquish Edge Intelligent Gateway Architecture with Joint Optimization
Li et al. Handoff Control and Resource Allocation for RAN Slicing in IoT Based on DTN: An Improved Algorithm Based on Actor-Critic Framework

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination