CN109660598B - Cache replacement method and system for transient data of Internet of things - Google Patents


Info

Publication number
CN109660598B
Authority
CN
China
Prior art keywords
data
cache
state
cache node
edge
Prior art date
Legal status
Active
Application number
CN201811370683.4A
Other languages
Chinese (zh)
Other versions
CN109660598A (en)
Inventor
曹洋
褚磊
竺浩
江涛
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN201811370683.4A
Publication of CN109660598A
Application granted
Publication of CN109660598B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/568 Storing data temporarily at an intermediate stage, e.g. caching
    • H04L67/5682 Policies or rules for updating, deleting or replacing the stored data

Abstract

The invention discloses a cache replacement method and system for transient data of the Internet of things. A deep reinforcement learning method is used to learn the caching policy for transient data: the popularity trend of the data is mined from the data request history of the edge cache node and combined with the transient data information as the input of the deep reinforcement learning; the instant reward is set to the negative of the comprehensive communication cost; and, using a critic network to learn the value function and an actor network to learn the policy function, the caching policy is learned adaptively through continual cache replacement operations. This solves the problem of low caching efficiency for transient data in edge cache nodes with limited storage resources. The data freshness of Internet of things data and the communication resource consumption are both incorporated into the comprehensive communication cost, and the long-term comprehensive cost of acquiring Internet of things data is minimized, so that network traffic can be offloaded to the network edge and latency reduced, alleviating to some extent the problems of high latency and high communication resource consumption in the transmission of massive transient Internet of things data.

Description

Cache replacement method and system for transient data of Internet of things
Technical Field
The invention belongs to the field of wireless communication, and particularly relates to a cache replacement method and system for transient data of an Internet of things.
Background
With the rapid development and wide application of the Internet of things (IoT) in fields such as intelligent transportation, smart grid, smart home, and industrial automation, massive Internet of things data traffic puts enormous pressure on current communication networks. A common remedy is to add an edge caching mechanism to the Internet of things: hot data is cached in the idle storage resources of network edge cache nodes, so a requester can obtain data directly from the corresponding edge cache node instead of from the data source, avoiding a large amount of unnecessary end-to-end communication. Edge caching in an Internet of things system can offload network traffic, reduce network latency, and provide better service quality and user experience. Because the storage capacity of an edge cache node is generally limited, an efficient cache replacement policy raises the cache hit rate, uses the cache space more efficiently, and offloads more network traffic. Many applications in Internet of things systems also have timeliness requirements on data: only data within a certain validity period is usable, so the freshness of cached data is an important criterion for cache replacement. An edge cache replacement policy in an Internet of things system therefore needs to consider both the popularity and the freshness of cached data, so as to better meet the caching requirements of Internet of things scenarios.
Traditional cache replacement methods, such as first-in first-out, least recently used, and least frequently used, have low caching efficiency because they consider neither the popularity trend of the content nor the distribution of user requests. Existing cache replacement policies for edge caching include the following. Bastug et al. predict data popularity by collaborative filtering over user-data correlations; the caching policy greedily caches hot data at the beginning until the cache of the edge cache node is full, and then performs cache replacement according to the predicted popularity. Blasco et al. solve the data placement problem in small base stations with a knapsack formulation, estimating data popularity from the request rate at which data is received. Song et al. consider data popularity jointly with the data caching process. Tanzil et al. estimate data popularity with a neural network and compute the placement position and size of the cache by mixed-integer linear programming, but the popularity prediction stage of this method needs keywords and classification information of video content, which does not apply to general IoT data.
These methods focus mainly on edge caching of non-transient data, with the edge cache node deciding on cache replacement by estimating the popularity of the data. On the one hand, data popularity and user requests are assumed to follow a specific distribution (such as a Poisson distribution), so these methods cannot adapt to scenarios where data popularity and user requests change rapidly; on the other hand, they consider only the caching of non-transient data and ignore the timeliness problem of transient data.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to solve the technical problems of high latency and high communication resource consumption in the transmission of massive transient Internet of things data, and of low caching efficiency for transient data in edge cache nodes with limited storage resources.
To achieve the above object, in a first aspect, an embodiment of the present invention provides a cache replacement method for transient data of an internet of things, where a cache space of a current edge cache node is full, the method includes the following steps:
S1, an edge cache node receives a new transient data item request sent by a user;
S2, judging whether the requested transient data item content is in the cache of the edge cache node; if so, going to step S3, otherwise going to step S6;
S3, judging whether the requested transient data item is fresh data or expired data; if fresh, going to step S4; if expired, going to step S5;
S4, directly reading the data from the cache region of the edge cache node and forwarding it to the user;
S5, the edge cache node forwards the user request to a data source, reads new data from the data source, replaces the expired data in the cache region of the edge cache node with the new data, and forwards the new data to the user;
S6, the edge cache node forwards the user request to a data source, reads new data from the data source, selects the data to be replaced in the cache region of the edge cache node by using deep reinforcement learning, replaces the selected data with the new data, and forwards the new data to the user.
Specifically, in step S2, if f_k ∈ F_k, the requested data item content is in the cache of the edge cache node; if f_k ∉ F_k, the requested data item content is not in the cache of the edge cache node, where f_k is the data content unique identifier (CID) corresponding to request k, and F_k is the set of CIDs of the data items cached in the edge cache node when request k arrives.
Specifically, in step S3, if t_age(p(f_k)) ≤ T_life(p(f_k)), the requested data item is fresh data; if t_age(p(f_k)) > T_life(p(f_k)), the requested data item is expired data, where f_k is the CID corresponding to request k, p(·) is the mapping function from a requested CID to the cached data item, T_life(·) is the effective lifecycle of a data item, and t_age(·) is the age of a data item.
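For illustration, this freshness test can be sketched in Python; the DataItem record and its field names are illustrative assumptions, not part of the claimed method:

import time

class DataItem:
    """Illustrative record for a transient data item with the two fields above."""
    def __init__(self, cid, t_gen, t_life):
        self.cid = cid        # unique content identifier CID
        self.t_gen = t_gen    # generation time t_gen(d)
        self.t_life = t_life  # effective lifecycle T_life(d)

    def age(self, now=None):
        # t_age(d) = t - t_gen(d)
        now = time.time() if now is None else now
        return now - self.t_gen

    def is_fresh(self, now=None):
        # fresh iff t_age(p(f_k)) <= T_life(p(f_k))
        return self.age(now) <= self.t_life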
Specifically, selecting the data to be replaced in the cache region of the edge cache node by using deep reinforcement learning in step S6 specifically comprises:
1) at time n, observing the state information of the edge cache node to obtain the state s_n at time n;
2) selecting a caching action a_n according to the caching policy π(a_n|s_n) and executing the caching action;
3) after executing caching action a_n, calculating the instant reward r_n; the edge cache node state changes from s_n to s_{n+1};
4) feeding the instant reward r_n back to the edge cache node, taking the state transition ⟨s_n, a_n, r_n, s_{n+1}⟩ as a training sample for training the actor-critic networks of the deep reinforcement learning, and repeating the above process.
In particular, the instant reward r_n is calculated as follows:
r_n = −Σ_{k∈Req_n} C(d_k)
where Req_n is the set of all data requests received by the edge cache node between the execution of caching action a_n and the execution of the next caching action a_{n+1}, and C(d_k) is the comprehensive cost of obtaining data d_k.
In particular, the comprehensive cost C(d_k) is calculated as follows:
C(d_k) = α·c(d_k) + (1−α)·l(d_k)
c(d_k) = c_1, if f_k ∈ F_k and t_age(p(f_k)) ≤ T_life(p(f_k)); c_2, otherwise
l(d_k) = t_age(p(f_k))/T_life(p(f_k)), if f_k ∈ F_k and t_age(p(f_k)) ≤ T_life(p(f_k)); 0, otherwise
where α ∈ [0, 1] is a trade-off coefficient, c(d_k) is the communication cost, l(d_k) is the data timeliness cost, c_1 is the communication overhead of obtaining data directly from the edge cache node, c_2 is the communication overhead of obtaining data from the data source, c_1 < c_2, and both c_1 and c_2 are positive constants; f_k is the CID corresponding to request k, F_k is the set of CIDs of the data items cached in the edge cache node when request k arrives, p(·) is the mapping function from a requested CID to the cached data item, T_life(·) is the effective lifecycle of a data item, and t_age(·) is the age of a data item.
Specifically, the actor network parameter θ in the deep reinforcement learning is updated by gradient ascent as:
θ ← θ + λ·∇_θ log π(a_n|s_n; θ)·A(s_n, a_n)
A(s_n, a_n) = r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v)
where λ is the learning rate of the actor network, ∇_θ is the gradient operator, the policy π(a_n|s_n; θ) is the probability of selecting cache replacement action a_n in state s_n, A(s_n, a_n) is the advantage function, γ ∈ [0, 1] is the discount factor, and V(s; θ_v) is the state-value function;
the critic network parameter θ_v in the deep reinforcement learning is updated by gradient descent as:
θ_v ← θ_v − λ′·∇_{θ_v}(r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v))²
where λ′ is the learning rate of the critic network.
Alternatively, the actor network parameter θ in the deep reinforcement learning is updated by gradient ascent with an entropy regularization term as:
θ ← θ + λ·[∇_θ log π(a_n|s_n; θ)·A(s_n, a_n) + β·∇_θ H(π(·|s_n; θ))]
A(s_n, a_n) = r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v)
H(π(·|s_n; θ)) = −Σ_{a∈A} π(a|s_n; θ)·log π(a|s_n; θ)
where λ is the learning rate of the actor network, ∇_θ is the gradient operator, the policy π(a_n|s_n; θ) is the probability of selecting cache replacement action a_n in state s_n, A(s_n, a_n) is the advantage function, γ ∈ [0, 1] is the discount factor, V(s; θ_v) is the state-value function, H(π(·|s_n; θ)) is the policy entropy of the action space output by policy π_θ in state s_n, and β is the exploration coefficient;
the critic network parameter θ_v in the deep reinforcement learning is updated by gradient descent as:
θ_v ← θ_v − λ′·∇_{θ_v}(r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v))²
where λ′ is the learning rate of the critic network.
In a second aspect, an embodiment of the present invention provides a cache replacement system for transient data of an internet of things, where a cache space of a current edge cache node is full, and the system includes: the device comprises a state judgment module, a reading module, a request forwarding module and a cache replacement module;
the state judgment module is configured to judge the state of the transient data requested by a user, where the states include: state one: the requested transient data item content is in the cache of the edge cache node and the requested transient data item is fresh data; state two: the requested transient data item content is in the cache of the edge cache node and the requested transient data item is expired data; state three: the requested transient data item content is not in the cache of the edge cache node;
the reading module is used for directly reading the data from the cache region of the edge cache node and forwarding the data to the user when the judgment result of the state judgment module is state one;
the request forwarding module is used for the edge cache node to forward the user request to the data source and read new data from the data source when the judgment result of the state judgment module is state two or state three;
the cache replacement module is configured to, when the judgment result of the state judgment module is state two, replace the expired data in the cache region of the edge cache node with the new data read by the request forwarding module and forward the new data to the user; and, when the judgment result of the state judgment module is state three, select the data to be replaced in the cache region of the edge cache node by using deep reinforcement learning, replace the selected data with the new data read by the request forwarding module, and forward the new data to the user.
In a third aspect, an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements the cache replacement method described in the first aspect.
Generally, compared with the prior art, the above technical solution conceived by the present invention has the following beneficial effects:
1. The invention considers the effective lifecycle of transient data and incorporates the data freshness of Internet of things data and the communication resource consumption into the comprehensive communication cost, thereby setting the target of the transient data caching policy of the Internet of things: minimizing the long-term comprehensive cost of acquiring Internet of things data. By caching transient data in the edge cache nodes of the network, network traffic can be offloaded to the network edge and latency reduced, alleviating to some extent the problems of high latency and high communication resource consumption in the transmission of massive transient Internet of things data.
2. The invention uses a deep reinforcement learning method to learn the caching policy of transient data. Specifically, the cache replacement problem is modeled as a Markov process; the popularity trend of the data is mined from the data request history of the edge cache node and combined with the lifecycle and freshness information of the transient data as the environment state input of the deep reinforcement learning. The instant reward is set to the negative of the comprehensive communication cost. Using a critic network to learn the value function and an actor network to learn the policy function, the caching policy is learned adaptively through continual cache replacement operations and the long-term reward is maximized, so that the long-term comprehensive cost of acquiring transient data in the Internet of things is minimized and the problem of low caching efficiency for transient data in edge cache nodes with limited storage resources is solved.
Drawings
Fig. 1 is a flowchart of a cache replacement method for transient data of an internet of things according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
First, some terms used in the present invention are explained.
The edge cache node refers to a network node which is close to the user side and has caching capacity in the Internet of things.
Data freshness equals the ratio of the remaining effective time of the data to its effective lifecycle; the larger the ratio, the more timely the data and the higher its freshness.
Data timeliness refers to the time interval between the generation of data at the data source and its acquisition at the user end; the shorter the acquisition interval, the better the timeliness.
Data popularity represents how popular the data is, i.e., the number of times it is requested within a certain period; the more requests, the higher the popularity.
Transient data refers to data with timeliness requirements, i.e., data that is valid only for a limited time.
The concept of the invention is as follows. First, to study the comprehensive cost of transient data caching, the comprehensive cost of acquiring Internet of things data is divided into two parts: the communication cost (including bandwidth consumption, latency, and so on) and the data timeliness cost. The goal of cache replacement in the edge cache nodes of the Internet of things is to minimize the long-term comprehensive cost of acquiring data, i.e., to consider the communication cost and the data timeliness cost jointly. The cache replacement problem is then modeled as a Markov process, on which a deep reinforcement learning (DRL) based caching policy is built: according to the data request history over a period of time and the current cache state of the Internet of things, the caching policy is learned automatically, minimizing the long-term comprehensive cost of acquiring Internet of things data.
As shown in fig. 1, a cache replacement method for transient data of an internet of things, where a cache space of a current edge cache node is full, includes the following steps:
S1, an edge cache node receives a new transient data item request sent by a user;
S2, judging whether the requested transient data item content is in the cache of the edge cache node; if so, going to step S3, otherwise going to step S6;
S3, judging whether the requested transient data item is fresh data or expired data; if fresh, going to step S4; if expired, going to step S5;
S4, directly reading the data from the cache region of the edge cache node and forwarding it to the user;
S5, the edge cache node forwards the user request to a data source, reads new data from the data source, replaces the expired data in the cache region of the edge cache node with the new data, and forwards the new data to the user;
S6, the edge cache node forwards the user request to a data source, reads new data from the data source, selects the data to be replaced in the cache region of the edge cache node by using deep reinforcement learning, replaces the selected data with the new data, and forwards the new data to the user.
S1, the edge cache node receives a new transient data item request sent by a user.
A data item d in the Internet of things is uniquely identified by a CID (Content ID), and each data item comprises two fields: the generation time t_gen(d) and the effective lifecycle T_life(d). At time t, the age of data item d is denoted t_age(d) = t − t_gen(d). If the age of data item d is less than its effective lifecycle, i.e., t_age(d) < T_life(d), the data item d is said to be fresh data, within its effective lifecycle; otherwise, the data item d is said to be expired data, which is stale.
A data item request from an Internet of things user side is recorded as k, the CID corresponding to the requested data item content is f_k, and the arrival time of the request is t_k. At time t_k, the set of data items cached in the edge cache node of the Internet of things is recorded as D_k = {d_k,1, d_k,2, …, d_k,I}, and the corresponding set of cached CIDs is F_k = {f_k,1, f_k,2, …, f_k,I}, where I is the maximum number of data items the edge cache node can cache. A mapping function p: F_k → D_k links the CID information of the requested content to the cached data items.
When a data item request k arrives, the edge cache node of the Internet of things first checks whether a data item with CID f_k is in the cache and whether the cached item is fresh. Three cases are considered:
Case 1: f_k ∈ F_k and t_age(p(f_k)) ≤ T_life(p(f_k)), i.e., the requested data item content is in the cache and the cached item is fresh, meeting the timeliness requirement. The edge cache node therefore directly returns the cached data item p(f_k) to the data requester.
Case 2: f_k ∈ F_k and t_age(p(f_k)) > T_life(p(f_k)), i.e., the requested data item content is in the cache but the cached item has expired. The edge cache node therefore obtains new data from the data source, returns it to the data requester, and replaces the expired data in the cache with the newly obtained data.
Case 3: f_k ∉ F_k, i.e., the requested data item content is not in the cache. The edge cache node therefore obtains new data from the data source and returns it to the data requester; at the same time, deep reinforcement learning is used to select the data to be replaced in the cache region of the edge cache node, and the selected data is replaced with the new data.
From the analysis of these three cases, the data item d_k returned after the user sends request k can be expressed as:
d_k = p(f_k), if f_k ∈ F_k and t_age(p(f_k)) ≤ T_life(p(f_k)); otherwise, d_k is the fresh data newly obtained from the data source
where f_k is the CID corresponding to request k, p(·) is the mapping function from a requested CID to the cached data item, F_k is the set of CIDs of the data items cached in the edge cache node when request k arrives, t_age(d) is the age of data item d, and T_life(d) is the effective lifecycle of data item d.
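These three cases can be sketched as one request handler, reusing the illustrative DataItem record from the earlier sketch; cache, fetch_from_source, and select_victim are assumed stand-ins, with select_victim playing the role of the deep reinforcement learning selection of step S6:

def handle_request(cid, cache, fetch_from_source, select_victim, now):
    """Serve one request k with CID f_k = cid; cache maps CID -> DataItem."""
    item = cache.get(cid)
    if item is not None and item.is_fresh(now):       # case 1: hit and fresh
        return item
    new_item = fetch_from_source(cid)                 # cases 2 and 3
    if item is not None:                              # case 2: hit but expired
        cache[cid] = new_item                         # replace the expired entry
    else:                                             # case 3: miss, cache assumed full
        victim_cid = select_victim(cache, new_item)   # DRL picks the item to evict
        if victim_cid is not None:                    # None plays the role of a_0
            del cache[victim_cid]
            cache[cid] = new_item
    return new_item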
S2, judging whether the requested transient data item content is in the cache of the edge cache node; if so, going to step S3, otherwise going to step S6.
f_k ∈ F_k indicates that the requested data item content is in the cache of the edge cache node; f_k ∉ F_k indicates that the requested data item content is not in the cache of the edge cache node.
S3, judging whether the requested transient data item is fresh data or expired data; if fresh, going to step S4; if expired, going to step S5.
t_age(p(f_k)) ≤ T_life(p(f_k)) indicates that the requested data item is fresh data; t_age(p(f_k)) > T_life(p(f_k)) indicates that the requested data item is expired data.
For the cache replacement policy of the edge cache node: initially, the edge cache node greedily caches all arriving data items until the cache space is full. When the cache is full, if a newly arrived request corresponds to case 1, no cache replacement is needed; if it corresponds to case 2, since the cached data corresponding to the current request is known to be expired, the expired data is directly replaced with the newly obtained data; if it corresponds to case 3, when the new data arrives, the cache replacement method must decide whether to replace a cached data item in the cache region with the new data item and, if so, which cached data item to replace.
Specifically, for data item d_k, the caching action given by the caching policy is recorded as a_k, and the action space is A = {a_0, a_1, …, a_I}. a_k = a_0 indicates no cache replacement; a_k = a_i (1 ≤ i ≤ I) indicates that the fresh data item d_k obtained from the data source replaces the cached item at position i of the cache region. After the cache region executes caching action a_k, D_k and F_k become D_{k+1} and F_{k+1}.
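Applying a caching action from the action space A = {a_0, a_1, …, a_I} to a position-indexed cache region could look as follows; the list representation of the cache region is an assumption for illustration:

def apply_action(cache_slots, action, new_item):
    # cache_slots: list of I cached data items; slot i is cache_slots[i - 1]
    # action: integer in {0, 1, ..., I}; 0 encodes a_0, "no replacement"
    if action == 0:
        return cache_slots              # a_0: the new item is not cached
    cache_slots[action - 1] = new_item  # a_i: overwrite slot i with d_k
    return cache_slots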
S6, the edge cache node forwards the user request to a data source, reads new data from the data source, selects the data to be replaced in the cache region of the edge cache node by using deep reinforcement learning, replaces the selected data with the new data, and forwards the new data to the user.
Step S6 corresponds to case 3, in which deep reinforcement learning is used to select the data to be replaced in the cache region of the edge cache node. To this end, the comprehensive cost C(d_k) of obtaining transient data item d_k is defined.
The comprehensive cost C(d_k) of obtaining a transient data item d_k is divided into two parts: the communication cost c(d_k) and the data timeliness cost l(d_k).
The communication cost c(d_k) is calculated as follows:
c(d_k) = c_1, if f_k ∈ F_k and t_age(p(f_k)) ≤ T_life(p(f_k)); c_2, otherwise
where c_1 is the communication overhead of obtaining data directly from the edge cache node, c_2 is the communication overhead of obtaining data from the data source, c_1 < c_2, and both c_1 and c_2 are positive constants.
The data timeliness cost l(d_k) is calculated as follows:
l(d_k) = t_age(p(f_k))/T_life(p(f_k)), if f_k ∈ F_k and t_age(p(f_k)) ≤ T_life(p(f_k)); 0, otherwise
The comprehensive cost C(d_k) is calculated as follows:
C(d_k) = α·c(d_k) + (1−α)·l(d_k)
where α ∈ [0, 1] is a trade-off coefficient weighing the importance of the two costs: a larger α means the user cares more about communication overhead; otherwise, the user cares more about data timeliness.
To optimize the comprehensive cost of acquiring Internet of things data, the cache replacement problem is modeled as a Markov process. Cases 1 and 2 follow the deterministic rules on the current cache described above, so only the cache replacement action in case 3 needs to be optimized.
The Markov process problem can be defined by {S, A, M(s_{n+1}|s_n, a_n), R(s_n, a_n)}, where S is the state set of the edge cache node of the Internet of things system, and s_n is the state of the edge cache node at time n; A is the action set of the cache replacement policy, and a_n is the caching action at time n; M(s_{n+1}|s_n, a_n) is the state transition probability matrix, giving the probability that the state of the edge cache node transitions from s_n to s_{n+1} after action a_n is executed; R(s_n, a_n) is the instant reward function, the reward fed back by the system after action a_n is executed in state s_n. The whole cache replacement process can thus be expressed as:
1) At time n, the edge cache node observes the system state information to obtain the system state s_n ∈ S at time n.
2) The edge cache node selects a caching action a_n according to the caching policy π(a_n|s_n) and executes it.
3) After caching action a_n is executed, the system returns an instant reward r_n = R(s_n, a_n), and the system state transitions from s_n to s_{n+1}.
4) The instant reward r_n is fed back to the edge cache node, and the current transition ⟨s_n, a_n, r_n, s_{n+1}⟩ is added to the experience pool of the deep reinforcement learning as a training sample for the actor-critic networks; the above process is then repeated.
The caching policy π(a_n|s_n) denotes the probability of selecting cache replacement action a_n in state s_n, and can be abbreviated as π.
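A sketch of one pass of steps 1) to 4); env, agent, and experience_pool are assumed stand-ins for the edge cache environment, the caching policy agent, and the experience pool:

def interaction_step(env, agent, experience_pool):
    # 1) observe the edge cache node state s_n
    s_n = env.observe()
    # 2) sample a caching action a_n ~ pi(a_n | s_n) and execute it
    a_n = agent.select_action(s_n)
    # 3) executing a_n yields the instant reward r_n; the state becomes s_{n+1}
    r_n, s_next = env.step(a_n)
    # 4) store the transition <s_n, a_n, r_n, s_{n+1}> for actor-critic training
    experience_pool.append((s_n, a_n, r_n, s_next))
    return s_next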
The long-term cumulative reward of the whole process is recorded as R_n = Σ_{j≥0} γ^j·r_{n+j}, where γ ∈ [0, 1] is a discount coefficient that determines how much future rewards affect the caching decision at the current stage. The aim of the invention is therefore to find the optimal caching policy π* that maximizes the expected long-term cumulative reward in all states:
π* = argmax_π E[R_n | π]
where E[R_n | π] is the expectation of the long-term cumulative reward R_n under caching policy π.
To measure the quality of the caching policy π, the value function V^π(s_n) = E[R_n | s_n; π] is defined, which is the expected long-term cumulative reward R_n of caching policy π in state s_n. The optimal value function of state s_n is V*(s_n) = max_π V^π(s_n).
By the Markov property, substituting V^π(s_n) into the Bellman equation yields:
V^π(s_n) = Σ_{a_n} π(a_n|s_n) Σ_{s_{n+1}, r_n} p(s_{n+1}, r_n | s_n, a_n)·[r_n + γ·V^π(s_{n+1})]
where p(s_{n+1}, r_n | s_n, a_n) is the probability of transitioning to state s_{n+1} and receiving instant reward r_n after executing action a_n in state s_n, and γ ∈ [0, 1] is the discount coefficient. This formula provides an iterative method for computing the value function of the Markov process.
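For intuition only, the iterative computation implied by this equation can be written for a toy tabular case with a known transition model (which, as noted next, the edge caching setting does not actually have):

import numpy as np

def policy_evaluation(P, R, pi, gamma=0.9, iters=100):
    # P[s, a, s2]: transition probabilities; R[s, a]: expected instant reward;
    # pi[s, a]: caching policy. Returns V^pi by repeated Bellman backups:
    # V(s) = sum_a pi(a|s) * (R(s,a) + gamma * sum_s2 P(s2|s,a) * V(s2))
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        V = np.einsum('sa,sa->s', pi, R + gamma * np.einsum('sat,t->sa', P, V))
    return V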
Given the state transition probability matrix M(s_{n+1}|s_n, a_n) of the Markov process, the optimal policy could be solved by dynamic programming. In the edge cache replacement process, however, the state transition probability matrix is unknown, so the invention uses a deep reinforcement learning method to mine information from historical data and uses a deep neural network to learn and fit the state-value function, thereby obtaining the optimal cache replacement policy.
The invention uses a deep reinforcement learning method to adaptively learn an efficient caching policy. The Actor-Critic deep reinforcement learning method is taken as an example to introduce the technical details of the invention, although the invention is not limited to the Actor-Critic method. Specifically:
Input: while the edge cache node continuously processes user requests, case 3 occurs, i.e., the requested data item content is not in the cache, and the edge cache node obtains the requested data item d_n from the data source by forwarding the request. At this point, the caching policy agent observes the current environment state
s_n = (x_{d_n}, x_1, x_2, …, x_I)
as the input of the deep neural network, where x_{d_n} is the feature vector of the currently requested data item d_n, and x_i is the feature vector of the i-th data item in the cache region of the edge cache node. Each feature vector records, for the corresponding data item content, the number of requests in each of the past J groups of requests, which reflects the popularity of the data item; the effective lifecycle of the data item content; and the freshness of the data item content, equal to the ratio of its remaining effective time to its effective lifecycle. Besides the data request information, the input information may also include scene information, edge network information, and the like.
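Assembling the state s_n can be sketched as follows, assuming the request history is kept as J dictionaries of per-CID request counts; the exact feature layout is an illustrative assumption:

import numpy as np

def feature_vector(cid, request_groups, lifecycle, age):
    # popularity trend: request counts of this CID over the past J request groups
    counts = [group.get(cid, 0) for group in request_groups]
    # freshness: remaining effective time divided by the effective lifecycle
    freshness = max(0.0, (lifecycle - age) / lifecycle)
    return np.array(counts + [lifecycle, freshness], dtype=np.float32)

def build_state(requested, cached_items, request_groups):
    # requested and cached_items entries are (cid, lifecycle, age) triples;
    # s_n = (x_{d_n}, x_1, ..., x_I): the requested item first, then each slot
    vecs = [feature_vector(cid, request_groups, life, age)
            for (cid, life, age) in [requested] + list(cached_items)]
    return np.concatenate(vecs)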
Policy: the invention characterizes the replacement policy by an actor network π_θ(a_n|s_n), which can also be written as π(a_n|s_n; θ) and abbreviated as π_θ; the actor network parameter is θ. The input of the actor network is the state information s_n of the edge cache node at time n, and the output of the actor network is the probability of selecting each caching action. The deep reinforcement learning finally selects a cache replacement action according to the policy π_θ(a_n|s_n), i.e., a replacement action a_n over the cache region positions is drawn according to the output probabilities of the caching actions. For example, suppose the edge cache node has 3 cache positions (indices 1, 2, 3, with 0 indicating no replacement) and the actor network outputs the probabilities (0.1, 0.1, 0.7, 0.1); the replacement position is then selected according to these probabilities, so the output is most likely a_2, and cache position 2 is replaced. The goal of the actor network is to output cache replacement actions according to the learned caching policy so as to maximize the expected long-term cumulative reward E[R_n | π_θ].
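The sampling step of this example, with the same probability vector:

import numpy as np

probs = np.array([0.1, 0.1, 0.7, 0.1])          # actor output over a_0..a_3
action = np.random.choice(len(probs), p=probs)  # most likely 2, i.e. a_2
# action == 0 means no replacement; action == i replaces cache position i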
State-value function: the invention characterizes the state-value function by a critic network V(s_n; θ_v); the critic network parameter is θ_v. The input of the critic network is the state information s_n of the edge cache node at time n, and the output of the critic network is the value of that state under the current policy π_θ. The goal of the critic network is to estimate as accurately as possible the value of state s_n under policy π_θ.
Policy training: every time a cache replacement action a_n is executed, the system feeds back an instant reward r_n to the caching policy agent:
r_n = −Σ_{k∈Req_n} C(d_k)
where Req_n is the set of all data requests received by the edge cache node between the execution of caching action a_n and the execution of the next caching action a_{n+1}, and C(d_k) is the comprehensive cost of obtaining data d_k. Since the goal is to minimize the long-term cost of acquiring data, the cost carries a negative sign.
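A small helper can accumulate the per-request comprehensive costs between two replacement actions and emit r_n; the class and method names are illustrative:

class RewardTracker:
    """Accumulates comprehensive costs C(d_k) over the requests in Req_n."""
    def __init__(self):
        self._costs = []

    def record(self, cost):
        # called once for each request k served between a_n and a_{n+1}
        self._costs.append(cost)

    def pop_reward(self):
        # r_n = - sum_{k in Req_n} C(d_k), then reset for the next interval
        r_n = -sum(self._costs)
        self._costs = []
        return r_n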
The expectation of the long-term cumulative reward can be estimated from the Bellman equation and the state transition trajectories of the Markov process, so the network parameters are learned by gradient updates. The gradient of the expected total reward with respect to the actor network parameter θ can be calculated as:
∇_θ J(θ) = E[∇_θ log π(a_n|s_n; θ)·A(s_n, a_n)]
To maximize the expected total reward, the actor network parameter θ is updated by gradient ascent as:
θ ← θ + λ·∇_θ log π(a_n|s_n; θ)·A(s_n, a_n)
A(s_n, a_n) = r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v)
where λ is the learning rate of the actor network, which can be adjusted according to the actual situation; ∇_θ is the gradient operator; and the advantage function A(s_n, a_n) measures how good it is to select action a_n in state s_n.
The critic network can be trained by the temporal-difference method, with the loss function set to the squared error between the critic network output V(s_n; θ_v) and the target value r_n + γ·V(s_{n+1}; θ_v). The critic network parameter θ_v is updated by gradient descent as:
θ_v ← θ_v − λ′·∇_{θ_v}(r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v))²
where λ′ is the learning rate of the critic network, which can be adjusted according to the actual situation, and ∇_{θ_v} is the gradient operator.
This addresses the "exploration-exploitation dilemma" in reinforcement learning: "exploitation" means taking the best action learned so far, while "exploration" attempts to fully explore the action space by taking currently non-optimal actions. To prevent the learned policy from falling into a local optimum, the technical scheme adds the entropy of the policy (the action probability distribution output by the actor network) to the update of the actor network parameter θ in the form of a regularization term, thereby encouraging the "exploration" process:
θ ← θ + λ·[∇_θ log π(a_n|s_n; θ)·A(s_n, a_n) + β·∇_θ H(π(·|s_n; θ))]
where H(π(·|s_n; θ)) is the policy entropy of the action space output by policy π_θ in state s_n, β is the exploration coefficient, and ∇_θ is the gradient operator. The entropy is calculated as:
H(π(·|s_n; θ)) = −Σ_{a∈A} π(a|s_n; θ)·log π(a|s_n; θ)
With gradient ascent, θ is updated in the direction of increasing entropy, which encourages the "exploration" process.
The caching policy supports both online learning and offline learning. With online learning, the policy can be deployed directly on the edge cache node, which periodically updates the network parameters from the Internet of things request data it processes and thus learns the caching policy. With offline learning, the caching policy is pre-trained in advance, then deployed on the edge cache node and kept unchanged.
The invention provides a cache replacement system for transient data of the Internet of things, which comprises: the device comprises a state judgment module, a reading module, a request forwarding module and a cache replacement module;
the state judgment module is configured to judge the state of the transient data requested by a user, where the states include: state one: the requested transient data item content is in the cache of the edge cache node and the requested transient data item is fresh data; state two: the requested transient data item content is in the cache of the edge cache node and the requested transient data item is expired data; state three: the requested transient data item content is not in the cache of the edge cache node;
the reading module is used for directly reading the data from the cache region of the edge cache node and forwarding the data to the user when the judgment result of the state judgment module is state one;
the request forwarding module is used for the edge cache node to forward the user request to the data source and read new data from the data source when the judgment result of the state judgment module is state two or state three;
the cache replacement module is configured to, when the judgment result of the state judgment module is state two, replace the expired data in the cache region of the edge cache node with the new data read by the request forwarding module and forward the new data to the user; and, when the judgment result of the state judgment module is state three, select the data to be replaced in the cache region of the edge cache node by using deep reinforcement learning, replace the selected data with the new data read by the request forwarding module, and forward the new data to the user.
The system further comprises a training module, which collects, every time a cache replacement action is executed, the cache replacement action, the instant reward brought by the replacement action, and the states of the edge cache node before and after the replacement, and trains the network parameters of the deep reinforcement learning in the cache replacement module based on these samples.
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (7)

1. A cache replacement method for transient data of the Internet of things is characterized in that the cache space of a current edge cache node is full, and the method comprises the following steps:
S1, an edge cache node receives a new transient data item request sent by a user;
S2, judging whether the requested transient data item content is in the cache of the edge cache node; if so, going to step S3, otherwise going to step S6;
S3, judging whether the requested transient data item is fresh data or expired data; if fresh, going to step S4; if expired, going to step S5;
S4, directly reading the data from the cache region of the edge cache node and forwarding it to the user;
S5, the edge cache node forwards the user request to a data source, reads new data from the data source, replaces the expired data in the cache region of the edge cache node with the new data, and forwards the new data to the user;
S6, the edge cache node forwards the user request to a data source, reads new data from the data source, selects the data to be replaced in the cache region of the edge cache node by using deep reinforcement learning, replaces the selected data with the new data, and forwards the new data to the user;
in step S6, selecting the data to be replaced in the cache region of the edge cache node using deep reinforcement learning specifically comprises:
1) at time n, observing the state information of the edge cache node to obtain the state s_n at time n;
2) selecting a caching action a_n according to the caching policy π(a_n|s_n) and executing the caching action;
3) after executing caching action a_n, calculating the instant reward r_n; the edge cache node state changes from s_n to s_{n+1};
4) feeding the instant reward r_n back to the edge cache node, taking the state transition ⟨s_n, a_n, r_n, s_{n+1}⟩ as a training sample for training the actor-critic networks of the deep reinforcement learning, and repeating the above process;
the actor network parameter θ in the deep reinforcement learning is updated by gradient ascent as:
θ ← θ + λ·∇_θ log π(a_n|s_n; θ)·A(s_n, a_n)
A(s_n, a_n) = r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v)
where λ is the learning rate of the actor network, ∇_θ is the gradient operator, the policy π(a_n|s_n; θ) is the probability of selecting cache replacement action a_n in state s_n, A(s_n, a_n) is the advantage function, γ ∈ [0, 1] is the discount factor, and V(s; θ_v) is the state-value function; the critic network parameter θ_v in the deep reinforcement learning is updated by gradient descent as:
θ_v ← θ_v − λ′·∇_{θ_v}(r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v))²
where λ′ is the learning rate of the critic network;
or, the actor network parameter θ in the deep reinforcement learning is updated by gradient ascent as:
θ ← θ + λ·[∇_θ log π(a_n|s_n; θ)·A(s_n, a_n) + β·∇_θ H(π(·|s_n; θ))]
A(s_n, a_n) = r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v)
H(π(·|s_n; θ)) = −Σ_{a∈A} π(a|s_n; θ)·log π(a|s_n; θ)
where λ is the learning rate of the actor network, ∇_θ is the gradient operator, the policy π(a_n|s_n; θ) is the probability of selecting cache replacement action a_n in state s_n, A(s_n, a_n) is the advantage function, γ ∈ [0, 1] is the discount factor, V(s; θ_v) is the state-value function, H(π(·|s_n; θ)) is the policy entropy of the action space output by policy π_θ in state s_n, and β is the exploration coefficient; the critic network parameter θ_v in the deep reinforcement learning is updated by gradient descent as:
θ_v ← θ_v − λ′·∇_{θ_v}(r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v))²
where λ′ is the learning rate of the critic network.
2. The cache replacement method according to claim 1, wherein in step S2, if f_k ∈ F_k, the requested data item content is in the cache of the edge cache node, and if f_k ∉ F_k, the requested data item content is not in the cache of the edge cache node, where f_k is the data content unique identifier CID corresponding to request k, and F_k is the set of CIDs of the data items cached in the edge cache node when request k arrives.
3. The cache replacement method according to claim 1, wherein in step S3, if t_age(p(f_k)) ≤ T_life(p(f_k)), the requested data item is fresh data, and if t_age(p(f_k)) > T_life(p(f_k)), the requested data item is expired data, where f_k is the CID corresponding to request k, p(·) is the mapping function from a requested CID to the cached data item, T_life(·) is the effective lifecycle of a data item, and t_age(·) is the age of a data item.
4. The cache replacement method according to claim 1, wherein the instant reward r_n is calculated as follows:
r_n = −Σ_{k∈Req_n} C(d_k)
where Req_n is the set of all data requests received by the edge cache node between the execution of caching action a_n and the execution of the next caching action a_{n+1}, and C(d_k) is the comprehensive cost of obtaining data d_k.
5. The cache replacement method according to claim 4, wherein the comprehensive cost C(d_k) is calculated as follows:
C(d_k) = α·c(d_k) + (1−α)·l(d_k)
c(d_k) = c_1, if f_k ∈ F_k and t_age(p(f_k)) ≤ T_life(p(f_k)); c_2, otherwise
l(d_k) = t_age(p(f_k))/T_life(p(f_k)), if f_k ∈ F_k and t_age(p(f_k)) ≤ T_life(p(f_k)); 0, otherwise
where α ∈ [0, 1] is a trade-off coefficient, c(d_k) is the communication cost, l(d_k) is the data timeliness cost, c_1 is the communication overhead of obtaining data directly from the edge cache node, c_2 is the communication overhead of obtaining data from the data source, c_1 < c_2, and both c_1 and c_2 are positive constants; f_k is the CID corresponding to request k, F_k is the set of CIDs of the data items cached in the edge cache node when request k arrives, p(·) is the mapping function from a requested CID to the cached data item, T_life(·) is the effective lifecycle of a data item, and t_age(·) is the age of a data item.
6. A cache replacement system for transient data of the Internet of things is characterized in that the cache space of a current edge cache node is full, and the system comprises: the device comprises a state judgment module, a reading module, a request forwarding module and a cache replacement module;
the state judgment module is configured to judge the state of the transient data requested by a user, where the states include: state one: the requested transient data item content is in the cache of the edge cache node and the requested transient data item is fresh data; state two: the requested transient data item content is in the cache of the edge cache node and the requested transient data item is expired data; state three: the requested transient data item content is not in the cache of the edge cache node;
the reading module is used for directly reading the data from the cache region of the edge cache node and forwarding the data to the user when the judgment result of the state judgment module is state one;
the request forwarding module is used for the edge cache node to forward the user request to the data source and read new data from the data source when the judgment result of the state judgment module is state two or state three;
the cache replacement module is configured to, when the judgment result of the state judgment module is state two, replace the expired data in the cache region of the edge cache node with the new data read by the request forwarding module and forward the new data to the user; and, when the judgment result of the state judgment module is state three, select the data to be replaced in the cache region of the edge cache node by using deep reinforcement learning, replace the selected data with the new data read by the request forwarding module, and forward the new data to the user;
the selecting of the data to be replaced in the cache region of the edge cache node by using the deep reinforcement learning specifically includes:
1) at time n, observing the state information of the edge cache node to obtain the state s_n at time n;
2) selecting a caching action a_n according to the caching policy π(a_n|s_n) and executing the caching action;
3) after executing caching action a_n, calculating the instant reward r_n; the edge cache node state changes from s_n to s_{n+1};
4) feeding the instant reward r_n back to the edge cache node, taking the state transition ⟨s_n, a_n, r_n, s_{n+1}⟩ as a training sample for training the actor-critic networks of the deep reinforcement learning, and repeating the above process;
the actor network parameter θ in the deep reinforcement learning is updated by gradient ascent as:
θ ← θ + λ·∇_θ log π(a_n|s_n; θ)·A(s_n, a_n)
A(s_n, a_n) = r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v)
where λ is the learning rate of the actor network, ∇_θ is the gradient operator, the policy π(a_n|s_n; θ) is the probability of selecting cache replacement action a_n in state s_n, A(s_n, a_n) is the advantage function, γ ∈ [0, 1] is the discount factor, and V(s; θ_v) is the state-value function; the critic network parameter θ_v in the deep reinforcement learning is updated by gradient descent as:
θ_v ← θ_v − λ′·∇_{θ_v}(r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v))²
where λ′ is the learning rate of the critic network;
or, the actor network parameter θ in the deep reinforcement learning is updated by gradient ascent as:
θ ← θ + λ·[∇_θ log π(a_n|s_n; θ)·A(s_n, a_n) + β·∇_θ H(π(·|s_n; θ))]
A(s_n, a_n) = r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v)
H(π(·|s_n; θ)) = −Σ_{a∈A} π(a|s_n; θ)·log π(a|s_n; θ)
where λ is the learning rate of the actor network, ∇_θ is the gradient operator, the policy π(a_n|s_n; θ) is the probability of selecting cache replacement action a_n in state s_n, A(s_n, a_n) is the advantage function, γ ∈ [0, 1] is the discount factor, V(s; θ_v) is the state-value function, H(π(·|s_n; θ)) is the policy entropy of the action space output by policy π_θ in state s_n, and β is the exploration coefficient; the critic network parameter θ_v in the deep reinforcement learning is updated by gradient descent as:
θ_v ← θ_v − λ′·∇_{θ_v}(r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v))²
where λ′ is the learning rate of the critic network.
7. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, implements the cache replacement method of any one of claims 1 to 5.
CN201811370683.4A 2018-11-17 2018-11-17 Cache replacement method and system for transient data of Internet of things Active CN109660598B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811370683.4A CN109660598B (en) 2018-11-17 2018-11-17 Cache replacement method and system for transient data of Internet of things


Publications (2)

Publication Number Publication Date
CN109660598A CN109660598A (en) 2019-04-19
CN109660598B (en) 2020-05-19

Family

ID=66111253

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811370683.4A Active CN109660598B (en) 2018-11-17 2018-11-17 Cache replacement method and system for transient data of Internet of things

Country Status (1)

Country Link
CN (1) CN109660598B (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant