CN109660598A - A kind of buffer replacing method and system of Internet of Things Temporal Data - Google Patents
- Publication number: CN109660598A
- Application number: CN201811370683.4A
- Authority
- CN
- China
- Prior art keywords
- data
- caching
- request
- edge
- data item
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/12—Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/568—Storing data temporarily at an intermediate stage, e.g. caching
- H04L67/5682—Policies or rules for updating, deleting or replacing the stored data
Abstract
The invention discloses a cache replacement method and system for temporal data in the Internet of Things. A deep reinforcement learning method is used to learn a caching policy for temporal data: popularity trends are mined from the data request history of an edge caching node and combined with the temporal attributes of the data to form the input of the deep reinforcement learning model; the immediate reward is set to the negative of the comprehensive cost; a critic network learns the value function while an actor network learns the policy function; and by continually performing cache replacement operations, the caching policy is learned adaptively. This solves the problem of low caching efficiency for temporal data at edge caching nodes with limited storage resources. The comprehensive cost accounts for both the timeliness of IoT data and communication resource consumption; minimizing the long-term comprehensive cost of obtaining IoT data offloads network traffic to the network edge and reduces latency, thereby alleviating, to a certain extent, the large delay and heavy communication resource consumption involved in transmitting massive IoT temporal data.
Description
Technical field
The invention belongs to the field of wireless communication, and more particularly relates to a cache replacement method and system for temporal data in the Internet of Things.
Background art
With the rapid development and wide application of the Internet of Things (IoT) in fields such as intelligent transportation, smart grids, smart homes, and industrial automation, massive IoT data flows put enormous pressure on today's communication networks. A common approach to this problem is to add an edge caching mechanism to the IoT: the idle storage resources of network edge caching nodes are used to cache hot data, so that requesters can obtain data directly from the corresponding edge caching node instead of from the data source, avoiding a large amount of unnecessary end-to-end communication. Edge caching in an IoT system can offload network traffic, reduce network latency, and provide better quality of service and user experience. Because the storage capacity of edge caching nodes is generally limited, an efficient cache replacement policy can improve the cache hit rate, use the cache space more efficiently, and offload more network traffic. Many IoT applications also place timeliness requirements on data: only data within its validity period is usable, so the freshness of cached data is another important criterion when performing cache replacement. An edge cache replacement policy for IoT systems therefore needs to consider both the popularity and the freshness of cached data in order to better meet the caching requirements of IoT scenarios.
Traditional cache replacement methods, such as first-in-first-out (FIFO), least recently used (LRU), and least frequently used (LFU), do not consider content popularity trends or the distribution of user requests, and their caching efficiency is low. Existing edge cache replacement policies include the following. E. Bastug et al. propose using user-data correlation to predict data popularity via collaborative filtering; their caching policy greedily caches hot data at startup until the cache of the edge caching node is exhausted, and then performs cache replacement according to the predicted popularity. P. Blasco et al. solve the data placement problem in small base stations with a knapsack method, where data popularity is estimated from the observed request rate. J. Song et al. introduce a multi-armed bandit (MAB) method that considers data popularity jointly with the data caching process. S. Tanzil et al. estimate data popularity by constructing a neural network and compute cache placement locations and sizes using mixed integer linear programming; however, the popularity prediction stage of this method requires keyword and category information of video content and is therefore not applicable to general IoT data.
The above methods focus mainly on edge caching of non-temporal data, where the edge caching node estimates data popularity to decide whether and how to perform cache replacement. On the one hand, because they assume that data popularity and user requests obey a specific distribution (e.g., a Poisson distribution), they cannot adapt to scenarios in which data popularity and the request distribution change rapidly; on the other hand, they focus only on caching non-temporal data and do not account for the timeliness of temporal data.
Summary of the invention
In view of the drawbacks of the prior art, the object of the invention is to solve the technical problems of large delay and heavy communication resource consumption in the transmission of massive IoT temporal data, and of low caching efficiency for temporal data at edge caching nodes with limited storage resources.
To achieve the above object, in a first aspect, an embodiment of the invention provides a cache replacement method for IoT temporal data, applied when the cache space of the current edge caching node is full. The method includes the following steps:
S1. The edge caching node receives a new temporal data item request issued by a user.
S2. Judge whether the requested temporal data item content is in the cache of the edge caching node; if so, go to step S3; otherwise, go to step S6.
S3. Judge whether the requested temporal data item is fresh or stale; if fresh, go to step S4; if stale, go to step S5.
S4. Read the data directly from the buffer of the edge caching node and forward it to the user.
S5. The edge caching node forwards the user request to the data source, reads the new data from the data source, replaces the stale data in its buffer with the new data, and forwards the new data to the user.
S6. The edge caching node forwards the user request to the data source, reads the new data from the data source, selects the data to be replaced in its buffer using deep reinforcement learning, replaces the selected data with the new data, and forwards the new data to the user.
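The branching in steps S2-S3 can be sketched as a small classifier; the function and variable names here are illustrative stand-ins, not identifiers from the patent:

```python
from enum import Enum, auto

class Outcome(Enum):
    HIT_FRESH = auto()   # step S4: serve directly from the cache
    HIT_STALE = auto()   # step S5: refresh from source, replace in place
    MISS = auto()        # step S6: fetch from source, DRL picks the victim

def classify_request(cid, cache, now):
    # Steps S2-S3: `cache` maps a CID to (generation time, lifetime).
    if cid not in cache:
        return Outcome.MISS
    t_gen, t_life = cache[cid]
    if now - t_gen <= t_life:      # age within effective lifetime
        return Outcome.HIT_FRESH
    return Outcome.HIT_STALE
```

Only the `MISS` branch invokes the learned policy; the two hit branches are deterministic rules.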
Specifically, in step S2, if f_k ∈ F_k, the requested data item content is in the cache of the edge caching node; if f_k ∉ F_k, it is not. Here f_k is the unique content identifier (CID) corresponding to data item request k, and F_k is the set of CIDs of the data items cached in the edge caching node when request k arrives.
Specifically, in step S3, if t_age(p(f_k)) ≤ T_life(p(f_k)), the requested data item is fresh; if t_age(p(f_k)) > T_life(p(f_k)), it is stale. Here f_k is the CID corresponding to request k, p(·) is the mapping function from a requested CID to a cached data item, T_life(·) is the effective lifetime of a data item, and t_age(·) is the age of a data item.
Specifically, selecting the data to be replaced in the buffer of the edge caching node using deep reinforcement learning in step S6 includes:
1) At time n, observe the status information of the edge caching node to obtain the state s_n.
2) Select a caching action a_n according to the caching policy π(a_n | s_n) and execute it.
3) After executing caching action a_n, compute the immediate reward r_n; the state of the edge caching node changes from s_n to s_n+1.
4) Feed the immediate reward r_n back to the edge caching node, and use the state transition <s_n, a_n, r_n, s_n+1> as a training sample for the actor-critic network of the deep reinforcement learning model. Repeat the above process.
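The observe/act/reward/store loop in steps 1)-4) can be sketched generically; the three callables are placeholders for the edge-cache environment and the actor network, which the patent does not define at this level:

```python
def interaction_loop(observe, select_action, execute, n_rounds):
    # Steps 1)-4): observe s_n, pick a_n ~ pi(.|s_n), execute it,
    # collect (r_n, s_n+1), and store the transition
    # <s_n, a_n, r_n, s_n+1> as a training sample.
    samples = []
    s = observe()
    for _ in range(n_rounds):
        a = select_action(s)
        r, s_next = execute(s, a)
        samples.append((s, a, r, s_next))
        s = s_next
    return samples
```

The stored transitions are what the actor-critic training described later consumes.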
Specifically, the immediate reward r_n is calculated as follows:

r_n = − Σ_{k ∈ Req_n} C(d_k)

wherein Req_n denotes the set of all data requests received by the edge caching node between the execution of caching action a_n and the execution of the next caching action a_n+1, and C(d_k) is the comprehensive cost of obtaining data item d_k.
Specifically, the comprehensive cost C(d_k) is calculated as follows:

C(d_k) = α·c(d_k) + (1−α)·l(d_k)

wherein α ∈ [0,1] is a trade-off coefficient, c(d_k) is the communication cost, and l(d_k) is the data age cost. The communication cost is c_1 when the data is obtained directly from the edge caching node and c_2 when it is obtained from the data source, where c_1 < c_2 and c_1, c_2 are positive constants. f_k is the CID corresponding to request k, F_k is the set of CIDs of the data items cached in the edge caching node when request k arrives, p(·) is the mapping function from a requested CID to a cached data item, T_life(·) is the effective lifetime of a data item, and t_age(·) is the age of a data item.
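A minimal numeric sketch of these cost quantities. The function names and the default values c1=1, c2=5 are illustrative, and the normalized form of the age cost is an assumption, since the formula images for c(d_k) and l(d_k) are not reproduced in this text:

```python
def communication_cost(hit_fresh, c1=1.0, c2=5.0):
    # c(d_k): c1 if served directly from the edge cache,
    # c2 if fetched from the data source (c1 < c2).
    return c1 if hit_fresh else c2

def age_cost(age, lifetime):
    # l(d_k): the returned item's age normalized by its effective
    # lifetime -- one plausible reading of the missing formula.
    return age / lifetime

def total_cost(hit_fresh, age, lifetime, alpha=0.5):
    # C(d_k) = alpha * c(d_k) + (1 - alpha) * l(d_k)
    return (alpha * communication_cost(hit_fresh)
            + (1 - alpha) * age_cost(age, lifetime))

def immediate_reward(costs):
    # r_n is the negative of the summed costs over Req_n.
    return -sum(costs)
```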
Specifically, the actor network parameters θ of the deep reinforcement learning model are updated by gradient ascent as:

θ ← θ + λ·∇_θ log π(a_n | s_n; θ)·A(s_n, a_n)

wherein λ is the learning rate of the actor network, ∇ is the gradient operator, the policy π(a_n | s_n; θ) is the probability of selecting cache replacement action a_n in state s_n, A(s_n, a_n) = r_n + γ·V(s_n+1; θ_v) − V(s_n; θ_v) is the advantage function, γ ∈ [0,1] is the discount factor, and V(s; θ_v) is the state-value function.
The critic network parameters θ_v of the deep reinforcement learning model are updated by gradient descent as:

θ_v ← θ_v − λ′·∇_θv (r_n + γ·V(s_n+1; θ_v) − V(s_n; θ_v))²

wherein λ′ is the learning rate of the critic network.
Specifically, the actor network parameters θ may alternatively be updated by gradient ascent with an entropy regularization term as:

θ ← θ + λ·[∇_θ log π(a_n | s_n; θ)·A(s_n, a_n) + β·∇_θ H(π(· | s_n; θ))]

wherein λ is the learning rate of the actor network, ∇ is the gradient operator, π(a_n | s_n; θ) is the probability of selecting cache replacement action a_n in state s_n, A(s_n, a_n) = r_n + γ·V(s_n+1; θ_v) − V(s_n; θ_v) is the advantage function, γ ∈ [0,1] is the discount factor, V(s; θ_v) is the state-value function, H(π(· | s_n; θ)) is the entropy of the action distribution output by policy π_θ in state s_n, and β is the exploration coefficient.
The critic network parameters θ_v are updated by gradient descent as:

θ_v ← θ_v − λ′·∇_θv (r_n + γ·V(s_n+1; θ_v) − V(s_n; θ_v))²

wherein λ′ is the learning rate of the critic network.
In a second aspect, an embodiment of the invention provides a cache replacement system for IoT temporal data, applied when the cache space of the current edge caching node is full. The system includes a state judgment module, a read module, a request forwarding module, and a cache replacement module.
The state judgment module judges the state of the temporal data requested by the user, where the state is one of: State 1, the requested temporal data item content is in the cache of the edge caching node and is fresh; State 2, the requested temporal data item content is in the cache of the edge caching node and is stale; State 3, the requested temporal data item content is not in the cache of the edge caching node.
The read module, when the judgment result is State 1, reads the data directly from the buffer of the edge caching node and forwards it to the user.
The request forwarding module, when the judgment result is State 2 or State 3, forwards the user request to the data source and reads the new data from the data source.
The cache replacement module, when the judgment result is State 2, replaces the stale data in the buffer of the edge caching node with the new data read by the request forwarding module and forwards the new data to the user; when the judgment result is State 3, it selects the data to be replaced in the buffer of the edge caching node using deep reinforcement learning, replaces the selected data with the new data read by the request forwarding module, and forwards the new data to the user.
In a third aspect, an embodiment of the invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the cache replacement method described in the first aspect above.
In general, compared with the prior art, the above technical solutions conceived by the invention have the following beneficial effects:
1. The invention considers the effective lifetime of temporal data and incorporates both the timeliness of IoT data and communication resource consumption into a comprehensive cost, giving the caching policy for IoT temporal data an explicit target: minimizing the long-term comprehensive cost of obtaining IoT data. By caching temporal data at the edge caching nodes of the network, network traffic can be offloaded to the network edge and latency reduced, alleviating to a certain extent the large delay and heavy communication resource consumption involved in transmitting massive IoT temporal data.
2. The invention learns the caching policy for temporal data by deep reinforcement learning. The cache replacement problem is modeled as a Markov decision process; popularity trend information is mined from the data request history of the edge caching node and combined with the lifetime and timeliness information of the temporal data to form the environment state input of the deep reinforcement learning model. The immediate reward is set to the negative of the comprehensive cost. A critic network learns the value function and an actor network learns the policy function; by continually performing cache replacement operations, the caching policy is learned adaptively and the long-term reward is maximized, thereby minimizing the long-term comprehensive cost of obtaining temporal data in the IoT and solving the problem of low caching efficiency for temporal data at edge caching nodes with limited storage resources.
Brief description of the drawings
Fig. 1 is a flowchart of a cache replacement method for IoT temporal data provided by an embodiment of the invention.
Specific embodiment
In order to make the objectives, technical solutions, and advantages of the present invention clearer, with reference to the accompanying drawings and embodiments, right
The present invention is further elaborated.It should be appreciated that the specific embodiments described herein are merely illustrative of the present invention, and
It is not used in the restriction present invention.
First, some terms used in the invention are explained.
An edge caching node is a network node with caching capability located close to the user side of the IoT.
Data freshness is the ratio of a data item's remaining effective time to its effective lifetime; the larger the ratio, the more timely the data and the higher its freshness.
Data age is the time interval between the generation of the data at the data source and its acquisition by the user side; the shorter the interval, the stronger the timeliness.
Data popularity measures how popular a data item is, i.e., the number of times it is requested within a certain period; the higher the number, the higher the popularity.
Temporal data is data with timeliness requirements.
The idea of the invention is as follows. First, to study the overall cost of caching temporal data, the comprehensive cost of obtaining IoT data is divided into two parts: a communication cost (covering bandwidth consumption, delay, etc.) and a data timeliness cost. The goal of caching and cache replacement at IoT edge caching nodes is to minimize the long-term comprehensive cost of obtaining data, i.e., to consider the communication cost and the data timeliness cost simultaneously. Then, the cache replacement problem is modeled as a Markov decision process, and a caching policy based on deep reinforcement learning (DRL) is constructed: the caching policy is learned automatically from the recent data request history of the IoT and the current cache state, minimizing the long-term comprehensive cost of obtaining IoT data.
As shown in Fig. 1, a cache replacement method for IoT temporal data, applied when the cache space of the current edge caching node is full, includes the following steps:
S1. The edge caching node receives a new temporal data item request issued by a user.
S2. Judge whether the requested temporal data item content is in the cache of the edge caching node; if so, go to step S3; otherwise, go to step S6.
S3. Judge whether the requested temporal data item is fresh or stale; if fresh, go to step S4; if stale, go to step S5.
S4. Read the data directly from the buffer of the edge caching node and forward it to the user.
S5. The edge caching node forwards the user request to the data source, reads the new data from the data source, replaces the stale data in its buffer with the new data, and forwards the new data to the user.
S6. The edge caching node forwards the user request to the data source, reads the new data from the data source, selects the data to be replaced in its buffer using deep reinforcement learning, replaces the selected data with the new data, and forwards the new data to the user.
Step S1. The edge caching node receives a new temporal data item request issued by a user.
A data item d in the IoT is uniquely identified by a CID (Content ID), and each data item contains two fields: a generation time t_gen(d) and an effective lifetime T_life(d). The age of data item d at time t is t_age(d) = t − t_gen(d). If the age of data item d is smaller than its effective lifetime, i.e., t_age(d) < T_life(d), data item d is said to be fresh (within its validity period); otherwise, data item d is said to be stale (past its validity period).
A data item request from an IoT user side is denoted k, the CID of the requested data item content is f_k, and the arrival time of the request is t_k. At time t_k, the set of data items cached at the IoT edge caching node is denoted D_k, and the set of CIDs of the cached data items is F_k, where I denotes the maximum number of data items the edge caching node can cache. A mapping function p: F_k → D_k associates the CID of the requested content with the corresponding cached data item.
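The data item model just described can be sketched directly; the class and field names are illustrative, not from the patent:

```python
from dataclasses import dataclass

@dataclass
class DataItem:
    # An IoT data item: a unique content ID plus a generation
    # time t_gen and an effective lifetime T_life.
    cid: str
    t_gen: float
    t_life: float

    def age(self, t):
        # t_age(d) = t - t_gen(d)
        return t - self.t_gen

    def is_fresh(self, t):
        # fresh while t_age(d) < T_life(d), per the text above
        return self.age(t) < self.t_life
```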
After data item request k arrives, the IoT edge caching node first checks whether the cache contains a fresh cached item whose CID is f_k. Three cases are considered:
Case 1: f_k ∈ F_k and t_age(p(f_k)) ≤ T_life(p(f_k)), i.e., the requested data item content is in the cache and the cached item is fresh, meeting the timeliness requirement. The edge caching node therefore directly returns the cached data item p(f_k) to the data requester.
Case 2: f_k ∈ F_k and t_age(p(f_k)) > T_life(p(f_k)), i.e., the requested data item content is in the cache but the cached item has expired. The edge caching node therefore obtains new data from the data source, returns it to the data requester, and replaces the stale data in the cache with the newly obtained data.
Case 3: f_k ∉ F_k, i.e., the requested data item content is not in the cache. The edge caching node therefore obtains new data from the data source and returns it to the data requester, while using deep reinforcement learning to select the data to be replaced in its buffer and replacing the selected data with the new data.
From the analysis of the above three cases, after the user issues request k, the returned data item d_k can be expressed as:

d_k = p(f_k), if f_k ∈ F_k and t_age(p(f_k)) ≤ T_life(p(f_k)); otherwise, d_k is the new data item obtained from the data source.

Wherein f_k is the CID corresponding to request k, p(·) is the mapping function from a requested CID to a cached data item, F_k is the set of CIDs of the data items cached at the edge caching node when request k arrives, t_age(d) is the age of data item d, and T_life(d) is its effective lifetime.
Step S2. Judge whether the requested temporal data item content is in the cache of the edge caching node; if so, go to step S3; otherwise, go to step S6.
f_k ∈ F_k indicates that the requested data item content is in the cache of the edge caching node; f_k ∉ F_k indicates that it is not.
Step S3. Judge whether the requested temporal data item is fresh or stale; if fresh, go to step S4; if stale, go to step S5.
t_age(p(f_k)) ≤ T_life(p(f_k)) indicates that the requested data item is fresh; t_age(p(f_k)) > T_life(p(f_k)) indicates that it is stale.
Regarding the cache replacement policy of the edge caching node: initially, the edge caching node greedily caches every arriving data item until its cache space is full. Once the cache is full, a newly arrived request is handled as follows. In Case 1, no cache replacement is needed. In Case 2, the cached data corresponding to the current request is known to be expired, and the stale data is directly replaced with the newly obtained data. In Case 3, new data arrives and the cache replacement method must decide whether to replace a cached item in the buffer with the new data item and, if so, which cached item to replace.
Specifically, the caching action given by the caching policy for data item d_k is denoted a_k, and the action space is A = {a_0, a_1, ..., a_I}. a_k = a_0 means no cache replacement; a_k = a_i (1 ≤ i ≤ I) means replacing the cached item at position i of the buffer with the fresh data item d_k obtained from the data source. After the buffer executes cache replacement action a_k, D_k and F_k become D_k+1 and F_k+1.
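Applying an action from A = {a_0, ..., a_I} to the buffer can be sketched as follows; the list stands in for the cache of the edge caching node:

```python
def apply_action(buffer, action, new_item):
    # Action 0 (a_0) leaves the buffer unchanged; action i
    # (1 <= i <= I) overwrites buffer slot i with the freshly
    # fetched data item, as described above.
    if action != 0:
        buffer[action - 1] = new_item
    return buffer
```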
Step S6. The edge caching node forwards the user request to the data source, reads the new data from the data source, selects the data to be replaced in its buffer using deep reinforcement learning, replaces the selected data with the new data, and forwards the new data to the user.
Step S6 corresponds to Case 3, where deep reinforcement learning selects the data to be replaced in the buffer of the edge caching node. First, define the comprehensive cost C(d_k) of obtaining temporal data item d_k.
The comprehensive cost C(d_k) of obtaining temporal data item d_k is divided into two parts: a communication cost c(d_k) and a data age cost l(d_k).
The communication cost c(d_k) is calculated as follows:

c(d_k) = c_1 if the data is obtained directly from the edge caching node, and c(d_k) = c_2 if the data is obtained from the data source,

wherein c_1 is the communication overhead of obtaining the data directly from the edge caching node, c_2 is the communication overhead of obtaining the data from the data source, c_1 < c_2, and c_1, c_2 are positive constants.
The data age cost l(d_k) is calculated from the age of the returned data item relative to its effective lifetime, i.e., l(d_k) = t_age(p(f_k))/T_life(p(f_k)) when the request is served from the cache, and l(d_k) = 0 when the data is newly obtained from the data source.
The comprehensive cost C(d_k) is calculated as follows:

C(d_k) = α·c(d_k) + (1−α)·l(d_k)

wherein α ∈ [0,1] is a trade-off coefficient weighing the importance of the two costs: a larger α indicates that the user cares more about the communication cost, and a smaller α indicates that the user cares more about the data age.
To optimize the comprehensive cost of obtaining IoT data, the cache replacement problem is modeled as a Markov decision process. Cases 1 and 2 above are handled by deterministic rules, so only the cache replacement action in Case 3 needs to be optimized.
The Markov decision process is defined by the tuple {S, A, M(s_n+1 | s_n, a_n), R(s_n, a_n)}, where S is the state set of the edge caching node of the IoT system and s_n is the state of the edge caching node at time n; A is the action set of the cache replacement policy and a_n is the caching action at time n; M(s_n+1 | s_n, a_n) is the state transition probability matrix giving the probability that the state of the edge caching node transfers from s_n to s_n+1 after executing action a_n; and R(s_n, a_n) is the immediate reward function, i.e., the reward fed back by the system after executing action a_n in state s_n. The whole cache replacement process can therefore be expressed as:
1) At time n, the edge caching node observes the system status information and obtains the system state s_n ∈ S.
2) The edge caching node selects a caching action a_n according to the caching policy π(a_n | s_n) and executes it.
3) After executing caching action a_n, the system returns an immediate reward r_n = R(s_n, a_n), and the system state transfers from s_n to s_n+1.
4) The immediate reward r_n is fed back to the edge caching node, and the state transition <s_n, a_n, r_n, s_n+1> is added as a training sample to the experience pool of the deep reinforcement learning model for training the actor-critic network. The above process is repeated.
Wherein the caching policy π(a_n | s_n), abbreviated π, is the probability of selecting cache replacement action a_n in state s_n.
The long-term cumulative reward of the whole process is denoted R_n = Σ_{t=0..∞} γ^t·r_{n+t}, where γ ∈ [0,1] is the discount factor, which determines how strongly future rewards influence the current caching decision. The object of the invention is therefore to find the optimal caching policy π* that maximizes the expected long-term cumulative reward over all states, π* = argmax_π E[R_n | π], where E[R_n | π] is the expectation of the long-term cumulative reward R_n under caching policy π.
To measure the quality of a caching policy π, the value function V_π(s_n) = E[R_n | s_n; π] is defined as the expectation of the long-term cumulative reward R_n of caching policy π in state s_n. The optimal value function in state s_n can be expressed as V*(s_n) = max_π V_π(s_n).
By the Markov property, substituting V_π(s_n) into the Bellman equation gives:

V_π(s_n) = Σ_{a_n} π(a_n | s_n) · Σ_{s_n+1, r_n} p(s_n+1, r_n | s_n, a_n)·[r_n + γ·V_π(s_n+1)]

wherein p(s_n+1, r_n | s_n, a_n) is the probability that, after executing action a_n in state s_n, the system transfers to state s_n+1 and returns immediate reward r_n, and γ ∈ [0,1] is the discount factor. This equation gives the iterative calculation of the value function in a Markov decision process.
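The iterative calculation of the value function can be illustrated with a toy tabular backup (a sketch, assuming a fully known transition matrix P and reward table R, which in the real edge-cache setting are unknown and motivate the DRL approach below):

```python
def bellman_backup(V, P, R, policy, gamma=0.9):
    # One Bellman expectation sweep:
    # V(s) <- sum_a pi(a|s) * (R[s][a] + gamma * sum_s' P[s][a][s'] * V(s'))
    newV = {}
    for s in V:
        v = 0.0
        for a, pa in policy[s].items():
            exp_next = sum(p * V[s2] for s2, p in P[s][a].items())
            v += pa * (R[s][a] + gamma * exp_next)
        newV[s] = v
    return newV
```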
If the state transition probability matrix M(s_n+1 | s_n, a_n) of the Markov decision process were given, the optimal policy could be solved by dynamic programming. In the edge cache replacement process, however, the state transition probability matrix is unknown. The invention therefore uses deep reinforcement learning to mine information intelligently from historical data and fit the state-value function with a deep neural network, thereby obtaining an optimal cache replacement policy.
The invention uses deep reinforcement learning to learn an efficient caching policy adaptively. The technical details of the invention are introduced below taking actor-critic deep reinforcement learning as an example, although the invention is not limited to the actor-critic method. Specifically:
Input: while the edge caching node continually handles user requests, whenever Case 3 occurs, i.e., the requested data item content is not in the cache, the edge caching node forwards the request and obtains the requested data item d_n from the data source. At this point, the caching policy agent observes the current environment state s_n as the input of the deep neural network. The state consists of the feature vector of the currently requested data item d_n together with the feature vectors of the data items in the buffer of the edge caching node. Each feature vector records the number of requests for the data item's content in each of the past J groups of requests, which reflects the popularity of the data item, together with the effective lifetime of the data item's content and its freshness, where the freshness equals the ratio of the content's remaining effective time to its effective lifetime. In addition to these data request features, the input may also include scenario information, edge network information, and the like.
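The popularity part of the feature vector, per-CID request counts over the last J request groups, can be sketched as follows; the flat-list history and fixed group size are illustrative assumptions:

```python
from collections import Counter

def popularity_features(history, cids, group_size):
    # Split the request history (a list of requested CIDs) into
    # consecutive groups and count, for each CID, how often it was
    # requested in each group -- the popularity signal of the state.
    groups = [history[i:i + group_size]
              for i in range(0, len(history), group_size)]
    feats = {}
    for cid in cids:
        feats[cid] = [Counter(g)[cid] for g in groups]
    return feats
```

The per-item lifetime and freshness fields would be appended to these counts to complete each feature vector.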
Policy: the invention characterizes the replacement policy π_θ(a_n | s_n), also written π(a_n | s_n; θ) and abbreviated π_θ, with an actor network whose parameters are θ. The input of the actor network is the status information s_n of the edge caching node at time n, and its output is the probability of selecting each caching action. The deep reinforcement learning model finally selects a cache replacement action a_n to execute according to the policy π_θ(a_n | s_n). For example, suppose the edge caching node has three buffer positions (labeled 1, 2, 3, with 0 meaning no replacement) and the actor network outputs the probabilities (0.1, 0.1, 0.7, 0.1); selecting the position to replace according to this distribution will most likely output a_2, i.e., replace buffer position 2. The target of the actor network is to output the corresponding cache replacement action according to the learned caching policy so as to maximize the expected long-term cumulative reward E[R_n | π_θ].
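Sampling an action from the actor network's output distribution can be sketched as follows (a softmax over raw network outputs is an assumption; the patent only specifies that the actor outputs per-action probabilities):

```python
import math
import random

def sample_action(logits, rng=None):
    # Convert raw actor outputs to a softmax distribution over cache
    # actions and sample one index, as in the (0.1, 0.1, 0.7, 0.1)
    # example above, where position 2 is the most likely pick.
    rng = rng or random.Random(0)
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i, probs
    return len(probs) - 1, probs
```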
State-value function: the invention characterizes the state-value function V(s_n; θ_v) with a critic network whose parameters are θ_v. The input of the critic network is the status information s_n of the edge caching node at time n, and its output is the value of that state under the current policy π_θ. The target of the critic network is to estimate as accurately as possible the value of state s_n under policy π_θ.
Policy training: each time a cache replacement action a_n is executed, the system feeds back an immediate reward r_n to the cache policy intelligent decision module:

r_n = -Σ_{d_k ∈ Req_n} C(d_k)

where Req_n denotes the set of all data requests received by the edge caching node between the execution of caching action a_n and the execution of the next caching action a_{n+1}, and C(d_k) is the total cost of obtaining data d_k. Since the objective is to minimize the long-term total cost of obtaining data, a negative sign is placed before the cost.
Based on the Bellman equation and the state transition trajectory of the Markov process, the expectation of the long-term total reward can be estimated, and the network parameters are then learned by gradient updates. The gradient of the expected total reward with respect to the actor network parameters θ can be computed as:

∇_θ E[R_n|π_θ] = E[∇_θ log π(a_n|s_n; θ) · A(s_n, a_n)]

To maximize the expected total reward, the actor network parameters θ are updated by gradient ascent:

θ ← θ + λ · ∇_θ log π(a_n|s_n; θ) · A(s_n, a_n)

where λ is the learning rate of the actor network, which can be adjusted according to the actual situation; ∇_θ denotes the gradient operator; and the advantage function A(s_n, a_n) = r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v) measures how good it is to select action a_n in state s_n.
The critic network can be trained by the temporal-difference method, with the loss function set to the squared error between the critic network's output V(s_n; θ_v) and the target value r_n + γ·V(s_{n+1}; θ_v). The critic network parameters θ_v are updated by gradient descent:

θ_v ← θ_v − λ' · ∇_{θ_v} (r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v))²

where λ' is the learning rate of the critic network, which can be adjusted according to the actual situation, and ∇_{θ_v} denotes the gradient operator.
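The two updates above can be sketched with a minimal linear actor-critic standing in for the deep networks. The dimensions, learning rates, and toy state are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
D, A = 4, 3                      # state dimension, number of caching actions
theta = np.zeros((D, A))         # actor parameters
theta_v = np.zeros(D)            # critic parameters (linear value function)
lam, lam_v, gamma = 0.01, 0.05, 0.9

def policy(s, theta):
    """Softmax policy pi(a|s; theta) over caching actions."""
    z = s @ theta
    e = np.exp(z - z.max())
    return e / e.sum()

def train_step(s, a, r, s_next):
    global theta, theta_v
    # Advantage A(s_n, a_n) = r_n + gamma * V(s_{n+1}) - V(s_n)
    adv = r + gamma * (s_next @ theta_v) - s @ theta_v
    # Critic: semi-gradient descent on the squared TD error
    theta_v += lam_v * adv * s
    # Actor: gradient ascent along grad_theta log pi(a|s) * advantage
    p = policy(s, theta)
    grad_log = np.outer(s, -p)   # grad of log-softmax, all actions
    grad_log[:, a] += s          # plus the selected action's term
    theta += lam * adv * grad_log

s, s_next = rng.normal(size=D), rng.normal(size=D)
train_step(s, a=1, r=-2.0, s_next=s_next)
```

One transition <s_n, a_n, r_n, s_{n+1}> drives both updates, matching the training-sample format given in claim 4.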
To address the exploration-exploitation dilemma in reinforcement learning, where "exploitation" means taking the best action learned so far and "exploration" means fully exploring the action space by trying currently non-optimal actions, and to prevent the learned policy from falling into a local optimum, this technical solution adds the entropy of the policy (the action probability distribution output by the actor network) to the update of the actor network parameters θ as a regularization term, thereby encouraging the exploration process:

θ ← θ + λ · ∇_θ log π(a_n|s_n; θ) · A(s_n, a_n) + β · ∇_θ H(π(·|s_n; θ))

where H(π(·|s_n; θ)) is the entropy of the policy's action distribution in state s_n, β denotes the exploration coefficient, and ∇_θ denotes the gradient operator. Specifically:

H(π(·|s_n; θ)) = −Σ_a π(a|s_n; θ) · log π(a|s_n; θ)

By gradient ascent, θ is updated in the direction of increasing entropy, which encourages exploration. The exploration coefficient β is a positive number that balances the degree of exploration and exploitation; a larger β encourages more exploration, and it can be adjusted as needed in a specific implementation.
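The entropy term can be computed directly from the actor network's output distribution; a short sketch:

```python
import numpy as np

def policy_entropy(p):
    """H(pi(.|s_n; theta)) = -sum_a pi(a|s_n; theta) * log pi(a|s_n; theta)."""
    p = np.asarray(p)
    nz = p[p > 0]                      # convention: 0 * log 0 = 0
    return -np.sum(nz * np.log(nz))

# The near-deterministic distribution from the earlier example has lower
# entropy than the uniform distribution, whose entropy is log(4).
h_peaked = policy_entropy([0.1, 0.1, 0.7, 0.1])
h_uniform = policy_entropy([0.25, 0.25, 0.25, 0.25])
```

Pushing θ toward higher entropy therefore flattens the action distribution, keeping the policy from collapsing onto a single cache slot too early.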
The cache policy supports two modes: online learning and offline learning. In online learning, the policy is deployed directly to the edge caching node, learns the cache policy from the Internet of Things request data handled by that node, and periodically updates its network parameters. In offline learning, the cache policy is first pre-trained offline, then deployed to the edge caching node and kept unchanged.
The present invention further provides a cache replacement system for Internet of Things transient data. The system includes: a state judgment module, a read module, a request forwarding module, and a cache replacement module.

The state judgment module judges the state of the transient data requested by the user, where the states are: state one, the requested transient data item's content is in the cache of the edge caching node and the requested transient data item is fresh data; state two, the requested transient data item's content is in the cache of the edge caching node and the requested transient data item is expired data; state three, the requested transient data item's content is not in the cache of the edge caching node.

The read module, when the judgment result of the state judgment module is state one, reads the data directly from the buffer of the edge caching node and forwards the data to the user.

The request forwarding module, when the judgment result of the state judgment module is state two or state three, has the edge caching node forward the user's request to the data source and read the new data from the data source.

The cache replacement module, when the judgment result of the state judgment module is state two, replaces the expired data in the buffer of the edge caching node with the new data read by the request forwarding module, and forwards the new data to the user; when the judgment result of the state judgment module is state three, it selects the data to be replaced in the buffer of the edge caching node using deep reinforcement learning, replaces the data to be replaced with the new data read by the request forwarding module, and forwards the new data to the user.
The system further includes a training module which, each time a cache replacement action is executed, collects the cache replacement action, the immediate reward brought by the replacement action, and the states of the edge caching node before and after the replacement, and trains the network parameters of the deep reinforcement learning in the cache replacement module based on these parameters.
The above are only preferred specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any changes or substitutions that can be easily conceived by those familiar with the art within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. A cache replacement method for Internet of Things transient data, wherein the cache space of the current edge caching node is full, the method comprising the following steps:
S1. the edge caching node receives a new transient data item request issued by a user;
S2. judging whether the content of the requested transient data item is in the cache of the edge caching node; if so, proceeding to step S3; otherwise, proceeding to step S6;
S3. judging whether the requested transient data item is fresh data or expired data; if fresh data, proceeding to step S4; if expired data, proceeding to step S5;
S4. reading the data directly from the buffer of the edge caching node and forwarding the data to the user;
S5. the edge caching node forwarding the user's request to the data source, reading new data from the data source, replacing the expired data in the buffer of the edge caching node with the new data, and forwarding the new data to the user;
S6. the edge caching node forwarding the user's request to the data source, reading new data from the data source, selecting the data to be replaced in the buffer of the edge caching node using deep reinforcement learning, replacing the data to be replaced with the new data, and forwarding the new data to the user.
2. The cache replacement method according to claim 1, wherein in step S2, if f_k ∈ F_k, the requested data content is in the cache of the edge caching node; if f_k ∉ F_k, the requested data item content is not in the cache of the edge caching node; where f_k is the unique content identifier (CID) corresponding to data item request k, and F_k is the set of CIDs corresponding to the data items cached in the edge caching node when request k arrives.
3. The cache replacement method according to claim 1, wherein in step S3, if t_age(p(f_k)) ≤ T_life(p(f_k)), the requested data item is fresh data; if t_age(p(f_k)) > T_life(p(f_k)), the requested data item is expired data; where f_k is the CID corresponding to data item request k, p(·) is the mapping function from the requested content's CID to the data item, T_life(·) is the effective lifetime of the data item, and t_age(·) denotes the age of the data item.
4. The cache replacement method according to claim 1, wherein the selecting, using deep reinforcement learning, of the data to be replaced in the buffer of the edge caching node in step S6 specifically comprises:
1) at time n, observing the status information of the edge caching node to obtain the state s_n at time n;
2) selecting a caching action a_n according to the cache policy π(a_n|s_n) and executing the caching action;
3) after executing caching action a_n, computing the immediate reward r_n, the status information of the edge caching node changing from s_n to s_{n+1};
4) feeding the immediate reward r_n back to the edge caching node, and using this state transition <s_n, a_n, r_n, s_{n+1}> as a training sample for training the actor-critic networks of the deep reinforcement learning; the above process being repeated.
5. The cache replacement method according to claim 4, wherein the immediate reward r_n is computed as follows:
r_n = -Σ_{d_k ∈ Req_n} C(d_k)
where Req_n denotes the set of all data requests received by the edge caching node between the execution of caching action a_n and the execution of the next caching action a_{n+1}, and C(d_k) is the total cost of obtaining data d_k.
6. The cache replacement method according to claim 5, wherein the total cost C(d_k) is computed as follows:
C(d_k) = α·c(d_k) + (1−α)·l(d_k)
where α ∈ [0,1] denotes a trade-off coefficient, c(d_k) denotes the communication cost, l(d_k) denotes the data age cost, c_1 denotes the communication overhead of obtaining the data directly from the edge caching node, c_2 denotes the communication overhead of obtaining the data from the data source, c_1 < c_2, and c_1, c_2 are positive constants; f_k is the CID corresponding to data item request k, F_k is the set of CIDs corresponding to the data items cached in the edge caching node when request k arrives, p(·) is the mapping function from the requested content's CID to the data item, T_life(·) is the effective lifetime of the data item, and t_age(·) denotes the age of the data item.
7. The cache replacement method according to claim 4, wherein the actor network parameters θ in the deep reinforcement learning are updated by gradient ascent as follows:
θ ← θ + λ · ∇_θ log π(a_n|s_n; θ) · A(s_n, a_n)
where λ is the learning rate of the actor network, ∇_θ denotes the gradient operator, the policy π(a_n|s_n; θ) denotes the probability of selecting cache replacement action a_n in state s_n, A(s_n, a_n) = r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v) is the advantage function, γ ∈ [0,1] denotes the discount factor, and V(·; θ_v) denotes the state-value function;
the critic network parameters θ_v in the deep reinforcement learning are updated by gradient descent as follows:
θ_v ← θ_v − λ' · ∇_{θ_v} (r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v))²
where λ' is the learning rate of the critic network.
8. The cache replacement method according to claim 4, wherein the actor network parameters θ in the deep reinforcement learning are updated by gradient ascent as follows:
θ ← θ + λ · ∇_θ log π(a_n|s_n; θ) · A(s_n, a_n) + β · ∇_θ H(π(·|s_n; θ))
where λ is the learning rate of the actor network, ∇_θ denotes the gradient operator, the policy π(a_n|s_n; θ) denotes the probability of selecting cache replacement action a_n in state s_n, A(s_n, a_n) = r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v) is the advantage function, γ ∈ [0,1] denotes the discount factor, V(·; θ_v) denotes the state-value function, H(π(·|s_n; θ)) is the entropy of the policy's action distribution in state s_n, and β denotes the exploration coefficient;
the critic network parameters θ_v in the deep reinforcement learning are updated by gradient descent as follows:
θ_v ← θ_v − λ' · ∇_{θ_v} (r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v))²
where λ' is the learning rate of the critic network.
9. A cache replacement system for Internet of Things transient data, wherein the cache space of the current edge caching node is full, the system comprising: a state judgment module, a read module, a request forwarding module, and a cache replacement module;
the state judgment module is configured to judge the state of the transient data requested by a user, the states comprising: state one, the requested transient data item's content is in the cache of the edge caching node and the requested transient data item is fresh data; state two, the requested transient data item's content is in the cache of the edge caching node and the requested transient data item is expired data; state three, the requested transient data item's content is not in the cache of the edge caching node;
the read module is configured to, when the judgment result of the state judgment module is state one, read the data directly from the buffer of the edge caching node and forward the data to the user;
the request forwarding module is configured to, when the judgment result of the state judgment module is state two or state three, have the edge caching node forward the user's request to the data source and read new data from the data source;
the cache replacement module is configured to, when the judgment result of the state judgment module is state two, replace the expired data in the buffer of the edge caching node with the new data read by the request forwarding module and forward the new data to the user; and when the judgment result of the state judgment module is state three, select the data to be replaced in the buffer of the edge caching node using deep reinforcement learning, replace the data to be replaced with the new data read by the request forwarding module, and forward the new data to the user.
10. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when executed by a processor, the computer program implements the cache replacement method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811370683.4A CN109660598B (en) | 2018-11-17 | 2018-11-17 | Cache replacement method and system for transient data of Internet of things |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811370683.4A CN109660598B (en) | 2018-11-17 | 2018-11-17 | Cache replacement method and system for transient data of Internet of things |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109660598A true CN109660598A (en) | 2019-04-19 |
CN109660598B CN109660598B (en) | 2020-05-19 |
Family
ID=66111253
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811370683.4A Active CN109660598B (en) | 2018-11-17 | 2018-11-17 | Cache replacement method and system for transient data of Internet of things |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109660598B (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110456647A (en) * | 2019-07-02 | 2019-11-15 | 珠海格力电器股份有限公司 | A kind of intelligent home furnishing control method and intelligent home control device |
CN111277666A (en) * | 2020-02-21 | 2020-06-12 | 南京邮电大学 | Online collaborative caching method based on freshness |
CN111292001A (en) * | 2020-02-24 | 2020-06-16 | 清华大学深圳国际研究生院 | Joint decision method and device based on reinforcement learning |
CN113038616A (en) * | 2021-03-16 | 2021-06-25 | 电子科技大学 | Frequency spectrum resource management and allocation method based on federal learning |
CN113055721A (en) * | 2019-12-27 | 2021-06-29 | 中国移动通信集团山东有限公司 | Video content distribution method and device, storage medium and computer equipment |
CN113115362A (en) * | 2021-04-16 | 2021-07-13 | 三峡大学 | Cooperative edge caching method and device |
CN113115368A (en) * | 2021-04-02 | 2021-07-13 | 南京邮电大学 | Base station cache replacement method, system and storage medium based on deep reinforcement learning |
CN113395333A (en) * | 2021-05-31 | 2021-09-14 | 电子科技大学 | Multi-edge base station joint cache replacement method based on intelligent agent depth reinforcement learning |
CN113438315A (en) * | 2021-07-02 | 2021-09-24 | 中山大学 | Internet of things information freshness optimization method based on dual-network deep reinforcement learning |
CN113630742A (en) * | 2020-08-05 | 2021-11-09 | 北京航空航天大学 | Mobile edge cache replacement method adopting request rate and dynamic property of information source issued content |
CN113676513A (en) * | 2021-07-15 | 2021-11-19 | 东北大学 | Deep reinforcement learning-driven intra-network cache optimization method |
WO2021253168A1 (en) * | 2020-06-15 | 2021-12-23 | Alibaba Group Holding Limited | Managing data stored in a cache using a reinforcement learning agent |
CN114170560A (en) * | 2022-02-08 | 2022-03-11 | 深圳大学 | Multi-device edge video analysis system based on deep reinforcement learning |
CN115914388A (en) * | 2022-12-14 | 2023-04-04 | 广东信通通信有限公司 | Resource data fresh-keeping method based on monitoring data acquisition |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090070533A1 (en) * | 2007-09-07 | 2009-03-12 | Edgecast Networks, Inc. | Content network global replacement policy |
CN102291447A (en) * | 2011-08-05 | 2011-12-21 | 中国电信股份有限公司 | Content distribution network load scheduling method and system |
CN106452919A (en) * | 2016-11-24 | 2017-02-22 | 济南浪潮高新科技投资发展有限公司 | Fog node optimization method based on fussy theory |
CN106888270A (en) * | 2017-03-30 | 2017-06-23 | 网宿科技股份有限公司 | Return the method and system of source routing scheduling |
CN107479829A (en) * | 2017-08-03 | 2017-12-15 | 杭州铭师堂教育科技发展有限公司 | A kind of Redis cluster mass datas based on message queue quickly clear up system and method |
-
2018
- 2018-11-17 CN CN201811370683.4A patent/CN109660598B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090070533A1 (en) * | 2007-09-07 | 2009-03-12 | Edgecast Networks, Inc. | Content network global replacement policy |
CN102291447A (en) * | 2011-08-05 | 2011-12-21 | 中国电信股份有限公司 | Content distribution network load scheduling method and system |
CN106452919A (en) * | 2016-11-24 | 2017-02-22 | 济南浪潮高新科技投资发展有限公司 | Fog node optimization method based on fussy theory |
CN106888270A (en) * | 2017-03-30 | 2017-06-23 | 网宿科技股份有限公司 | Return the method and system of source routing scheduling |
CN107479829A (en) * | 2017-08-03 | 2017-12-15 | 杭州铭师堂教育科技发展有限公司 | A kind of Redis cluster mass datas based on message queue quickly clear up system and method |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110456647A (en) * | 2019-07-02 | 2019-11-15 | 珠海格力电器股份有限公司 | A kind of intelligent home furnishing control method and intelligent home control device |
CN113055721B (en) * | 2019-12-27 | 2022-12-09 | 中国移动通信集团山东有限公司 | Video content distribution method and device, storage medium and computer equipment |
CN113055721A (en) * | 2019-12-27 | 2021-06-29 | 中国移动通信集团山东有限公司 | Video content distribution method and device, storage medium and computer equipment |
WO2021164378A1 (en) * | 2020-02-21 | 2021-08-26 | 南京邮电大学 | Freshness based online cooperative caching method |
CN111277666A (en) * | 2020-02-21 | 2020-06-12 | 南京邮电大学 | Online collaborative caching method based on freshness |
CN111277666B (en) * | 2020-02-21 | 2021-06-01 | 南京邮电大学 | Online collaborative caching method based on freshness |
CN111292001A (en) * | 2020-02-24 | 2020-06-16 | 清华大学深圳国际研究生院 | Joint decision method and device based on reinforcement learning |
CN111292001B (en) * | 2020-02-24 | 2023-06-02 | 清华大学深圳国际研究生院 | Combined decision method and device based on reinforcement learning |
CN115398877B (en) * | 2020-06-15 | 2024-03-26 | 阿里巴巴集团控股有限公司 | Managing data stored in a cache using reinforcement learning agents |
WO2021253168A1 (en) * | 2020-06-15 | 2021-12-23 | Alibaba Group Holding Limited | Managing data stored in a cache using a reinforcement learning agent |
CN115398877A (en) * | 2020-06-15 | 2022-11-25 | 阿里巴巴集团控股有限公司 | Managing data stored in a cache using reinforcement learning agents |
CN113630742A (en) * | 2020-08-05 | 2021-11-09 | 北京航空航天大学 | Mobile edge cache replacement method adopting request rate and dynamic property of information source issued content |
CN113038616A (en) * | 2021-03-16 | 2021-06-25 | 电子科技大学 | Frequency spectrum resource management and allocation method based on federal learning |
CN113115368A (en) * | 2021-04-02 | 2021-07-13 | 南京邮电大学 | Base station cache replacement method, system and storage medium based on deep reinforcement learning |
CN113115362A (en) * | 2021-04-16 | 2021-07-13 | 三峡大学 | Cooperative edge caching method and device |
CN113395333B (en) * | 2021-05-31 | 2022-03-25 | 电子科技大学 | Multi-edge base station joint cache replacement method based on intelligent agent depth reinforcement learning |
CN113395333A (en) * | 2021-05-31 | 2021-09-14 | 电子科技大学 | Multi-edge base station joint cache replacement method based on intelligent agent depth reinforcement learning |
CN113438315A (en) * | 2021-07-02 | 2021-09-24 | 中山大学 | Internet of things information freshness optimization method based on dual-network deep reinforcement learning |
CN113676513B (en) * | 2021-07-15 | 2022-07-01 | 东北大学 | Intra-network cache optimization method driven by deep reinforcement learning |
CN113676513A (en) * | 2021-07-15 | 2021-11-19 | 东北大学 | Deep reinforcement learning-driven intra-network cache optimization method |
CN114170560A (en) * | 2022-02-08 | 2022-03-11 | 深圳大学 | Multi-device edge video analysis system based on deep reinforcement learning |
CN115914388A (en) * | 2022-12-14 | 2023-04-04 | 广东信通通信有限公司 | Resource data fresh-keeping method based on monitoring data acquisition |
Also Published As
Publication number | Publication date |
---|---|
CN109660598B (en) | 2020-05-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109660598A (en) | A kind of buffer replacing method and system of Internet of Things Temporal Data | |
Zhu et al. | Caching transient data for Internet of Things: A deep reinforcement learning approach | |
CN109639760B (en) | It is a kind of based on deeply study D2D network in cache policy method | |
Huang et al. | FedParking: A federated learning based parking space estimation with parked vehicle assisted edge computing | |
Tong et al. | Adaptive computation offloading and resource allocation strategy in a mobile edge computing environment | |
Zhang et al. | Joint optimization of cooperative edge caching and radio resource allocation in 5G-enabled massive IoT networks | |
CN114143891A (en) | FDQL-based multi-dimensional resource collaborative optimization method in mobile edge network | |
CN112866015B (en) | Intelligent energy-saving control method based on data center network flow prediction and learning | |
Archi et al. | Applications of Deep Reinforcement Learning in Wireless Networks-A Recent Review | |
Hribar et al. | Utilising correlated information to improve the sustainability of internet of things devices | |
CN110290510A (en) | Support the edge cooperation caching method under the hierarchical wireless networks of D2D communication | |
Shaghluf et al. | Spectrum and energy efficiency of cooperative spectrum prediction in cognitive radio networks | |
CN116346837A (en) | Internet of things edge collaborative caching method based on deep reinforcement learning | |
Samikwa et al. | Adaptive early exit of computation for energy-efficient and low-latency machine learning over iot networks | |
Somesula et al. | Cooperative cache update using multi-agent recurrent deep reinforcement learning for mobile edge networks | |
Jiang et al. | A reinforcement learning-based computing offloading and resource allocation scheme in F-RAN | |
Xiong et al. | Distributed caching in converged networks: A deep reinforcement learning approach | |
Zhang et al. | Dual-timescale resource allocation for collaborative service caching and computation offloading in IoT systems | |
CN114554495A (en) | Federal learning-oriented user scheduling and resource allocation method | |
Shui et al. | Cell-free networking for integrated data and energy transfer: Digital twin based double parameterized dqn for energy sustainability | |
Sun et al. | Knowledge-driven deep learning paradigms for wireless network optimization in 6G | |
CN117473616A (en) | Railway BIM data edge caching method based on multi-agent reinforcement learning | |
Su et al. | Outage performance analysis and resource allocation algorithm for energy harvesting D2D communication system | |
CN116009990B (en) | Cloud edge collaborative element reinforcement learning computing unloading method based on wide attention mechanism | |
He | [Retracted] Application of Neural Network Sample Training Algorithm in Regional Economic Management |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |