CN109660598A - A kind of buffer replacing method and system of Internet of Things Temporal Data - Google Patents
- Publication number: CN109660598A
- Application number: CN201811370683.4A
- Authority
- CN
- China
- Prior art keywords
- data
- caching
- request
- edge
- data item
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/12—Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/568—Storing data temporarily at an intermediate stage, e.g. caching
- H04L67/5682—Policies or rules for updating, deleting or replacing the stored data
Abstract
The invention discloses a cache replacement method and system for temporal data in the Internet of Things. A deep reinforcement learning method is used to learn a caching policy for temporal data: popularity trends are mined from the data request history of an edge caching node and combined with the temporal attributes of the data to form the input of the deep reinforcement learning model; the immediate reward is set to the negative of the comprehensive cost; a critic network learns the value function while an actor network learns the policy function; and by continually performing cache replacement operations, the caching policy is learned adaptively. This solves the problem of low caching efficiency for temporal data at edge caching nodes with limited storage resources. The comprehensive cost accounts for both the timeliness of IoT data and communication resource consumption; minimizing the long-term comprehensive cost of obtaining IoT data offloads network traffic to the network edge and reduces latency, thereby alleviating, to a certain extent, the large delay and heavy communication resource consumption involved in transmitting massive IoT temporal data.
Description
Technical field
The invention belongs to the field of wireless communication, and more particularly relates to a cache replacement method and system for temporal data in the Internet of Things.
Background art
With the rapid development and wide application of the Internet of Things (IoT) in fields such as intelligent transportation, smart grids, smart homes, and industrial automation, massive IoT data flows put enormous pressure on today's communication networks. A common approach to this problem is to add an edge caching mechanism to the IoT: the idle storage resources of network edge caching nodes are used to cache hot data, so that requesters can obtain data directly from the corresponding edge caching node instead of from the data source, avoiding a large amount of unnecessary end-to-end communication. Edge caching in an IoT system can offload network traffic, reduce network latency, and provide better quality of service and user experience. Because the storage capacity of edge caching nodes is generally limited, an efficient cache replacement policy can improve the cache hit rate, use the cache space more efficiently, and offload more network traffic. Many IoT applications also place timeliness requirements on data: only data within its validity period is usable, so the freshness of cached data is another important criterion when performing cache replacement. An edge cache replacement policy for IoT systems therefore needs to consider both the popularity and the freshness of cached data in order to better meet the caching requirements of IoT scenarios.
Traditional cache replacement methods, such as first-in-first-out (FIFO), least recently used (LRU), and least frequently used (LFU), do not consider content popularity trends or the distribution of user requests, and their caching efficiency is low. Existing edge cache replacement policies include the following. E. Bastug et al. propose using user-data correlation to predict data popularity via collaborative filtering; their caching policy greedily caches hot data at startup until the cache of the edge caching node is exhausted, and then performs cache replacement according to the predicted popularity. P. Blasco et al. solve the data placement problem in small base stations with a knapsack method, where data popularity is estimated from the observed request rate. J. Song et al. introduce a multi-armed bandit (MAB) method that considers data popularity jointly with the data caching process. S. Tanzil et al. estimate data popularity by constructing a neural network and compute cache placement locations and sizes using mixed integer linear programming; however, the popularity prediction stage of this method requires keyword and category information of video content and is therefore not applicable to general IoT data.
The above methods focus mainly on edge caching of non-temporal data, where the edge caching node estimates data popularity to decide whether and how to perform cache replacement. On the one hand, because they assume that data popularity and user requests obey a specific distribution (e.g., a Poisson distribution), they cannot adapt to scenarios in which data popularity and the request distribution change rapidly; on the other hand, they focus only on caching non-temporal data and do not account for the timeliness of temporal data.
Summary of the invention
In view of the drawbacks of the prior art, the object of the invention is to solve the technical problems of large delay and heavy communication resource consumption in the transmission of massive IoT temporal data, and of low caching efficiency for temporal data at edge caching nodes with limited storage resources.
To achieve the above object, in a first aspect, an embodiment of the invention provides a cache replacement method for IoT temporal data, applied when the cache space of the current edge caching node is full. The method includes the following steps:
S1. The edge caching node receives a new temporal data item request issued by a user.
S2. Judge whether the requested temporal data item content is in the cache of the edge caching node; if so, go to step S3; otherwise, go to step S6.
S3. Judge whether the requested temporal data item is fresh or stale; if fresh, go to step S4; if stale, go to step S5.
S4. Read the data directly from the buffer of the edge caching node and forward it to the user.
S5. The edge caching node forwards the user request to the data source, reads the new data from the data source, replaces the stale data in its buffer with the new data, and forwards the new data to the user.
S6. The edge caching node forwards the user request to the data source, reads the new data from the data source, selects the data to be replaced in its buffer using deep reinforcement learning, replaces the selected data with the new data, and forwards the new data to the user.
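The branching in steps S2-S3 can be sketched as a small classifier; the function and variable names here are illustrative stand-ins, not identifiers from the patent:

```python
from enum import Enum, auto

class Outcome(Enum):
    HIT_FRESH = auto()   # step S4: serve directly from the cache
    HIT_STALE = auto()   # step S5: refresh from source, replace in place
    MISS = auto()        # step S6: fetch from source, DRL picks the victim

def classify_request(cid, cache, now):
    # Steps S2-S3: `cache` maps a CID to (generation time, lifetime).
    if cid not in cache:
        return Outcome.MISS
    t_gen, t_life = cache[cid]
    if now - t_gen <= t_life:      # age within effective lifetime
        return Outcome.HIT_FRESH
    return Outcome.HIT_STALE
```

Only the `MISS` branch invokes the learned policy; the two hit branches are deterministic rules.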
Specifically, in step S2, if f_k ∈ F_k, the requested data item content is in the cache of the edge caching node; if f_k ∉ F_k, it is not. Here f_k is the unique content identifier (CID) corresponding to data item request k, and F_k is the set of CIDs of the data items cached in the edge caching node when request k arrives.
Specifically, in step S3, if t_age(p(f_k)) ≤ T_life(p(f_k)), the requested data item is fresh; if t_age(p(f_k)) > T_life(p(f_k)), it is stale. Here f_k is the CID corresponding to request k, p(·) is the mapping function from a requested CID to a cached data item, T_life(·) is the effective lifetime of a data item, and t_age(·) is the age of a data item.
Specifically, selecting the data to be replaced in the buffer of the edge caching node using deep reinforcement learning in step S6 includes:
1) At time n, observe the status information of the edge caching node to obtain the state s_n.
2) Select a caching action a_n according to the caching policy π(a_n | s_n) and execute it.
3) After executing caching action a_n, compute the immediate reward r_n; the state of the edge caching node changes from s_n to s_n+1.
4) Feed the immediate reward r_n back to the edge caching node, and use the state transition <s_n, a_n, r_n, s_n+1> as a training sample for the actor-critic network of the deep reinforcement learning model. Repeat the above process.
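The observe/act/reward/store loop in steps 1)-4) can be sketched generically; the three callables are placeholders for the edge-cache environment and the actor network, which the patent does not define at this level:

```python
def interaction_loop(observe, select_action, execute, n_rounds):
    # Steps 1)-4): observe s_n, pick a_n ~ pi(.|s_n), execute it,
    # collect (r_n, s_n+1), and store the transition
    # <s_n, a_n, r_n, s_n+1> as a training sample.
    samples = []
    s = observe()
    for _ in range(n_rounds):
        a = select_action(s)
        r, s_next = execute(s, a)
        samples.append((s, a, r, s_next))
        s = s_next
    return samples
```

The stored transitions are what the actor-critic training described later consumes.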
Specifically, the immediate reward r_n is calculated as follows:

r_n = − Σ_{k ∈ Req_n} C(d_k)

wherein Req_n denotes the set of all data requests received by the edge caching node between the execution of caching action a_n and the execution of the next caching action a_n+1, and C(d_k) is the comprehensive cost of obtaining data item d_k.
Specifically, the comprehensive cost C(d_k) is calculated as follows:

C(d_k) = α·c(d_k) + (1−α)·l(d_k)

wherein α ∈ [0,1] is a trade-off coefficient, c(d_k) is the communication cost, and l(d_k) is the data age cost. The communication cost is c_1 when the data is obtained directly from the edge caching node and c_2 when it is obtained from the data source, where c_1 < c_2 and c_1, c_2 are positive constants. f_k is the CID corresponding to request k, F_k is the set of CIDs of the data items cached in the edge caching node when request k arrives, p(·) is the mapping function from a requested CID to a cached data item, T_life(·) is the effective lifetime of a data item, and t_age(·) is the age of a data item.
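A minimal numeric sketch of these cost quantities. The function names and the default values c1=1, c2=5 are illustrative, and the normalized form of the age cost is an assumption, since the formula images for c(d_k) and l(d_k) are not reproduced in this text:

```python
def communication_cost(hit_fresh, c1=1.0, c2=5.0):
    # c(d_k): c1 if served directly from the edge cache,
    # c2 if fetched from the data source (c1 < c2).
    return c1 if hit_fresh else c2

def age_cost(age, lifetime):
    # l(d_k): the returned item's age normalized by its effective
    # lifetime -- one plausible reading of the missing formula.
    return age / lifetime

def total_cost(hit_fresh, age, lifetime, alpha=0.5):
    # C(d_k) = alpha * c(d_k) + (1 - alpha) * l(d_k)
    return (alpha * communication_cost(hit_fresh)
            + (1 - alpha) * age_cost(age, lifetime))

def immediate_reward(costs):
    # r_n is the negative of the summed costs over Req_n.
    return -sum(costs)
```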
Specifically, the actor network parameters θ of the deep reinforcement learning model are updated by gradient ascent as:

θ ← θ + λ·∇_θ log π(a_n | s_n; θ)·A(s_n, a_n)

wherein λ is the learning rate of the actor network, ∇ is the gradient operator, the policy π(a_n | s_n; θ) is the probability of selecting cache replacement action a_n in state s_n, A(s_n, a_n) = r_n + γ·V(s_n+1; θ_v) − V(s_n; θ_v) is the advantage function, γ ∈ [0,1] is the discount factor, and V(s; θ_v) is the state-value function.
The critic network parameters θ_v of the deep reinforcement learning model are updated by gradient descent as:

θ_v ← θ_v − λ′·∇_θv (r_n + γ·V(s_n+1; θ_v) − V(s_n; θ_v))²

wherein λ′ is the learning rate of the critic network.
Specifically, the actor network parameters θ may alternatively be updated by gradient ascent with an entropy regularization term as:

θ ← θ + λ·[∇_θ log π(a_n | s_n; θ)·A(s_n, a_n) + β·∇_θ H(π(· | s_n; θ))]

wherein λ is the learning rate of the actor network, ∇ is the gradient operator, π(a_n | s_n; θ) is the probability of selecting cache replacement action a_n in state s_n, A(s_n, a_n) = r_n + γ·V(s_n+1; θ_v) − V(s_n; θ_v) is the advantage function, γ ∈ [0,1] is the discount factor, V(s; θ_v) is the state-value function, H(π(· | s_n; θ)) is the entropy of the action distribution output by policy π_θ in state s_n, and β is the exploration coefficient.
The critic network parameters θ_v are updated by gradient descent as:

θ_v ← θ_v − λ′·∇_θv (r_n + γ·V(s_n+1; θ_v) − V(s_n; θ_v))²

wherein λ′ is the learning rate of the critic network.
In a second aspect, an embodiment of the invention provides a cache replacement system for IoT temporal data, applied when the cache space of the current edge caching node is full. The system includes a state judgment module, a read module, a request forwarding module, and a cache replacement module.
The state judgment module judges the state of the temporal data requested by the user, where the state is one of: State 1, the requested temporal data item content is in the cache of the edge caching node and is fresh; State 2, the requested temporal data item content is in the cache of the edge caching node and is stale; State 3, the requested temporal data item content is not in the cache of the edge caching node.
The read module, when the judgment result is State 1, reads the data directly from the buffer of the edge caching node and forwards it to the user.
The request forwarding module, when the judgment result is State 2 or State 3, forwards the user request to the data source and reads the new data from the data source.
The cache replacement module, when the judgment result is State 2, replaces the stale data in the buffer of the edge caching node with the new data read by the request forwarding module and forwards the new data to the user; when the judgment result is State 3, it selects the data to be replaced in the buffer of the edge caching node using deep reinforcement learning, replaces the selected data with the new data read by the request forwarding module, and forwards the new data to the user.
In a third aspect, an embodiment of the invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the cache replacement method described in the first aspect above.
In general, compared with the prior art, the above technical solutions conceived by the invention have the following beneficial effects:
1. The invention considers the effective lifetime of temporal data and incorporates both the timeliness of IoT data and communication resource consumption into a comprehensive cost, giving the caching policy for IoT temporal data an explicit target: minimizing the long-term comprehensive cost of obtaining IoT data. By caching temporal data at the edge caching nodes of the network, network traffic can be offloaded to the network edge and latency reduced, alleviating to a certain extent the large delay and heavy communication resource consumption involved in transmitting massive IoT temporal data.
2. The invention learns the caching policy for temporal data by deep reinforcement learning. The cache replacement problem is modeled as a Markov decision process; popularity trend information is mined from the data request history of the edge caching node and combined with the lifetime and timeliness information of the temporal data to form the environment state input of the deep reinforcement learning model. The immediate reward is set to the negative of the comprehensive cost. A critic network learns the value function and an actor network learns the policy function; by continually performing cache replacement operations, the caching policy is learned adaptively and the long-term reward is maximized, thereby minimizing the long-term comprehensive cost of obtaining temporal data in the IoT and solving the problem of low caching efficiency for temporal data at edge caching nodes with limited storage resources.
Brief description of the drawings
Fig. 1 is a flowchart of a cache replacement method for IoT temporal data provided by an embodiment of the invention.
Specific embodiment
In order to make the objectives, technical solutions, and advantages of the present invention clearer, with reference to the accompanying drawings and embodiments, right
The present invention is further elaborated.It should be appreciated that the specific embodiments described herein are merely illustrative of the present invention, and
It is not used in the restriction present invention.
First, some terms used in the invention are explained.
An edge caching node is a network node with caching capability located close to the user side of the IoT.
Data freshness is the ratio of a data item's remaining effective time to its effective lifetime; the larger the ratio, the more timely the data and the higher its freshness.
Data age is the time interval between the generation of the data at the data source and its acquisition by the user side; the shorter the interval, the stronger the timeliness.
Data popularity measures how popular a data item is, i.e., the number of times it is requested within a certain period; the higher the number, the higher the popularity.
Temporal data is data with timeliness requirements.
The idea of the invention is as follows. First, to study the overall cost of caching temporal data, the comprehensive cost of obtaining IoT data is divided into two parts: a communication cost (covering bandwidth consumption, delay, etc.) and a data timeliness cost. The goal of caching and cache replacement at IoT edge caching nodes is to minimize the long-term comprehensive cost of obtaining data, i.e., to consider the communication cost and the data timeliness cost simultaneously. Then, the cache replacement problem is modeled as a Markov decision process, and a caching policy based on deep reinforcement learning (DRL) is constructed: the caching policy is learned automatically from the recent data request history of the IoT and the current cache state, minimizing the long-term comprehensive cost of obtaining IoT data.
As shown in Fig. 1, a cache replacement method for IoT temporal data, applied when the cache space of the current edge caching node is full, includes the following steps:
S1. The edge caching node receives a new temporal data item request issued by a user.
S2. Judge whether the requested temporal data item content is in the cache of the edge caching node; if so, go to step S3; otherwise, go to step S6.
S3. Judge whether the requested temporal data item is fresh or stale; if fresh, go to step S4; if stale, go to step S5.
S4. Read the data directly from the buffer of the edge caching node and forward it to the user.
S5. The edge caching node forwards the user request to the data source, reads the new data from the data source, replaces the stale data in its buffer with the new data, and forwards the new data to the user.
S6. The edge caching node forwards the user request to the data source, reads the new data from the data source, selects the data to be replaced in its buffer using deep reinforcement learning, replaces the selected data with the new data, and forwards the new data to the user.
Step S1. The edge caching node receives a new temporal data item request issued by a user.
A data item d in the IoT is uniquely identified by a CID (Content ID), and each data item contains two fields: a generation time t_gen(d) and an effective lifetime T_life(d). The age of data item d at time t is t_age(d) = t − t_gen(d). If the age of data item d is smaller than its effective lifetime, i.e., t_age(d) < T_life(d), data item d is said to be fresh (within its validity period); otherwise, data item d is said to be stale (past its validity period).
A data item request from an IoT user side is denoted k, the CID of the requested data item content is f_k, and the arrival time of the request is t_k. At time t_k, the set of data items cached at the IoT edge caching node is denoted D_k, and the set of CIDs of the cached data items is F_k, where I denotes the maximum number of data items the edge caching node can cache. A mapping function p: F_k → D_k associates the CID of the requested content with the corresponding cached data item.
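The data item model just described can be sketched directly; the class and field names are illustrative, not from the patent:

```python
from dataclasses import dataclass

@dataclass
class DataItem:
    # An IoT data item: a unique content ID plus a generation
    # time t_gen and an effective lifetime T_life.
    cid: str
    t_gen: float
    t_life: float

    def age(self, t):
        # t_age(d) = t - t_gen(d)
        return t - self.t_gen

    def is_fresh(self, t):
        # fresh while t_age(d) < T_life(d), per the text above
        return self.age(t) < self.t_life
```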
After data item request k arrives, the IoT edge caching node first checks whether the cache contains a fresh cached item whose CID is f_k. Three cases are considered:
Case 1: f_k ∈ F_k and t_age(p(f_k)) ≤ T_life(p(f_k)), i.e., the requested data item content is in the cache and the cached item is fresh, meeting the timeliness requirement. The edge caching node therefore directly returns the cached data item p(f_k) to the data requester.
Case 2: f_k ∈ F_k and t_age(p(f_k)) > T_life(p(f_k)), i.e., the requested data item content is in the cache but the cached item has expired. The edge caching node therefore obtains new data from the data source, returns it to the data requester, and replaces the stale data in the cache with the newly obtained data.
Case 3: f_k ∉ F_k, i.e., the requested data item content is not in the cache. The edge caching node therefore obtains new data from the data source and returns it to the data requester, while using deep reinforcement learning to select the data to be replaced in its buffer and replacing the selected data with the new data.
From the analysis of the above three cases, after the user issues request k, the returned data item d_k can be expressed as:

d_k = p(f_k), if f_k ∈ F_k and t_age(p(f_k)) ≤ T_life(p(f_k)); otherwise, d_k is the new data item obtained from the data source.

Wherein f_k is the CID corresponding to request k, p(·) is the mapping function from a requested CID to a cached data item, F_k is the set of CIDs of the data items cached at the edge caching node when request k arrives, t_age(d) is the age of data item d, and T_life(d) is its effective lifetime.
Step S2. Judge whether the requested temporal data item content is in the cache of the edge caching node; if so, go to step S3; otherwise, go to step S6.
f_k ∈ F_k indicates that the requested data item content is in the cache of the edge caching node; f_k ∉ F_k indicates that it is not.
Step S3. Judge whether the requested temporal data item is fresh or stale; if fresh, go to step S4; if stale, go to step S5.
t_age(p(f_k)) ≤ T_life(p(f_k)) indicates that the requested data item is fresh; t_age(p(f_k)) > T_life(p(f_k)) indicates that it is stale.
Regarding the cache replacement policy of the edge caching node: initially, the edge caching node greedily caches every arriving data item until its cache space is full. Once the cache is full, a newly arrived request is handled as follows. In Case 1, no cache replacement is needed. In Case 2, the cached data corresponding to the current request is known to be expired, and the stale data is directly replaced with the newly obtained data. In Case 3, new data arrives and the cache replacement method must decide whether to replace a cached item in the buffer with the new data item and, if so, which cached item to replace.
Specifically, the caching action given by the caching policy for data item d_k is denoted a_k, and the action space is A = {a_0, a_1, ..., a_I}. a_k = a_0 means no cache replacement; a_k = a_i (1 ≤ i ≤ I) means replacing the cached item at position i of the buffer with the fresh data item d_k obtained from the data source. After the buffer executes cache replacement action a_k, D_k and F_k become D_k+1 and F_k+1.
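Applying an action from A = {a_0, ..., a_I} to the buffer can be sketched as follows; the list stands in for the cache of the edge caching node:

```python
def apply_action(buffer, action, new_item):
    # Action 0 (a_0) leaves the buffer unchanged; action i
    # (1 <= i <= I) overwrites buffer slot i with the freshly
    # fetched data item, as described above.
    if action != 0:
        buffer[action - 1] = new_item
    return buffer
```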
Step S6. The edge caching node forwards the user request to the data source, reads the new data from the data source, selects the data to be replaced in its buffer using deep reinforcement learning, replaces the selected data with the new data, and forwards the new data to the user.
Step S6 corresponds to Case 3, where deep reinforcement learning selects the data to be replaced in the buffer of the edge caching node. First, define the comprehensive cost C(d_k) of obtaining temporal data item d_k.
The comprehensive cost C(d_k) of obtaining temporal data item d_k is divided into two parts: a communication cost c(d_k) and a data age cost l(d_k).
The communication cost c(d_k) is calculated as follows:

c(d_k) = c_1 if the data is obtained directly from the edge caching node, and c(d_k) = c_2 if the data is obtained from the data source,

wherein c_1 is the communication overhead of obtaining the data directly from the edge caching node, c_2 is the communication overhead of obtaining the data from the data source, c_1 < c_2, and c_1, c_2 are positive constants.
The data age cost l(d_k) is calculated from the age of the returned data item relative to its effective lifetime, i.e., l(d_k) = t_age(p(f_k))/T_life(p(f_k)) when the request is served from the cache, and l(d_k) = 0 when the data is newly obtained from the data source.
The comprehensive cost C(d_k) is calculated as follows:

C(d_k) = α·c(d_k) + (1−α)·l(d_k)

wherein α ∈ [0,1] is a trade-off coefficient weighing the importance of the two costs: a larger α indicates that the user cares more about the communication cost, and a smaller α indicates that the user cares more about the data age.
To optimize the comprehensive cost of obtaining IoT data, the cache replacement problem is modeled as a Markov decision process. Cases 1 and 2 above are handled by deterministic rules, so only the cache replacement action in Case 3 needs to be optimized.
The Markov decision process is defined by the tuple {S, A, M(s_n+1 | s_n, a_n), R(s_n, a_n)}, where S is the state set of the edge caching node of the IoT system and s_n is the state of the edge caching node at time n; A is the action set of the cache replacement policy and a_n is the caching action at time n; M(s_n+1 | s_n, a_n) is the state transition probability matrix giving the probability that the state of the edge caching node transfers from s_n to s_n+1 after executing action a_n; and R(s_n, a_n) is the immediate reward function, i.e., the reward fed back by the system after executing action a_n in state s_n. The whole cache replacement process can therefore be expressed as:
1) At time n, the edge caching node observes the system status information and obtains the system state s_n ∈ S.
2) The edge caching node selects a caching action a_n according to the caching policy π(a_n | s_n) and executes it.
3) After executing caching action a_n, the system returns an immediate reward r_n = R(s_n, a_n), and the system state transfers from s_n to s_n+1.
4) The immediate reward r_n is fed back to the edge caching node, and the state transition <s_n, a_n, r_n, s_n+1> is added as a training sample to the experience pool of the deep reinforcement learning model for training the actor-critic network. The above process is repeated.
Wherein the caching policy π(a_n | s_n), abbreviated π, is the probability of selecting cache replacement action a_n in state s_n.
The long-term cumulative reward of the whole process is denoted R_n = Σ_{t=0..∞} γ^t·r_{n+t}, where γ ∈ [0,1] is the discount factor, which determines how strongly future rewards influence the current caching decision. The object of the invention is therefore to find the optimal caching policy π* that maximizes the expected long-term cumulative reward over all states, π* = argmax_π E[R_n | π], where E[R_n | π] is the expectation of the long-term cumulative reward R_n under caching policy π.
To measure the quality of a caching policy π, the value function V_π(s_n) = E[R_n | s_n; π] is defined as the expectation of the long-term cumulative reward R_n of caching policy π in state s_n. The optimal value function in state s_n can be expressed as V*(s_n) = max_π V_π(s_n).
By the Markov property, substituting V_π(s_n) into the Bellman equation gives:

V_π(s_n) = Σ_{a_n} π(a_n | s_n) · Σ_{s_n+1, r_n} p(s_n+1, r_n | s_n, a_n)·[r_n + γ·V_π(s_n+1)]

wherein p(s_n+1, r_n | s_n, a_n) is the probability that, after executing action a_n in state s_n, the system transfers to state s_n+1 and returns immediate reward r_n, and γ ∈ [0,1] is the discount factor. This equation gives the iterative calculation of the value function in a Markov decision process.
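The iterative calculation of the value function can be illustrated with a toy tabular backup (a sketch, assuming a fully known transition matrix P and reward table R, which in the real edge-cache setting are unknown and motivate the DRL approach below):

```python
def bellman_backup(V, P, R, policy, gamma=0.9):
    # One Bellman expectation sweep:
    # V(s) <- sum_a pi(a|s) * (R[s][a] + gamma * sum_s' P[s][a][s'] * V(s'))
    newV = {}
    for s in V:
        v = 0.0
        for a, pa in policy[s].items():
            exp_next = sum(p * V[s2] for s2, p in P[s][a].items())
            v += pa * (R[s][a] + gamma * exp_next)
        newV[s] = v
    return newV
```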
If the state transition probability matrix M(s_n+1 | s_n, a_n) of the Markov decision process were given, the optimal policy could be solved by dynamic programming. In the edge cache replacement process, however, the state transition probability matrix is unknown. The invention therefore uses deep reinforcement learning to mine information intelligently from historical data and fit the state-value function with a deep neural network, thereby obtaining an optimal cache replacement policy.
The invention uses deep reinforcement learning to learn an efficient caching policy adaptively. The technical details of the invention are introduced below taking actor-critic deep reinforcement learning as an example, although the invention is not limited to the actor-critic method. Specifically:
Input: while the edge caching node continually handles user requests, whenever Case 3 occurs, i.e., the requested data item content is not in the cache, the edge caching node forwards the request and obtains the requested data item d_n from the data source. At this point, the caching policy agent observes the current environment state s_n as the input of the deep neural network. The state consists of the feature vector of the currently requested data item d_n together with the feature vectors of the data items in the buffer of the edge caching node. Each feature vector records the number of requests for the data item's content in each of the past J groups of requests, which reflects the popularity of the data item, together with the effective lifetime of the data item's content and its freshness, where the freshness equals the ratio of the content's remaining effective time to its effective lifetime. In addition to these data request features, the input may also include scenario information, edge network information, and the like.
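The popularity part of the feature vector, per-CID request counts over the last J request groups, can be sketched as follows; the flat-list history and fixed group size are illustrative assumptions:

```python
from collections import Counter

def popularity_features(history, cids, group_size):
    # Split the request history (a list of requested CIDs) into
    # consecutive groups and count, for each CID, how often it was
    # requested in each group -- the popularity signal of the state.
    groups = [history[i:i + group_size]
              for i in range(0, len(history), group_size)]
    feats = {}
    for cid in cids:
        feats[cid] = [Counter(g)[cid] for g in groups]
    return feats
```

The per-item lifetime and freshness fields would be appended to these counts to complete each feature vector.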
Policy: the invention characterizes the replacement policy π_θ(a_n | s_n), also written π(a_n | s_n; θ) and abbreviated π_θ, with an actor network whose parameters are θ. The input of the actor network is the status information s_n of the edge caching node at time n, and its output is the probability of selecting each caching action. The deep reinforcement learning model finally selects a cache replacement action a_n to execute according to the policy π_θ(a_n | s_n). For example, suppose the edge caching node has three buffer positions (labeled 1, 2, 3, with 0 meaning no replacement) and the actor network outputs the probabilities (0.1, 0.1, 0.7, 0.1); selecting the position to replace according to this distribution will most likely output a_2, i.e., replace buffer position 2. The target of the actor network is to output the corresponding cache replacement action according to the learned caching policy so as to maximize the expected long-term cumulative reward E[R_n | π_θ].
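Sampling an action from the actor network's output distribution can be sketched as follows (a softmax over raw network outputs is an assumption; the patent only specifies that the actor outputs per-action probabilities):

```python
import math
import random

def sample_action(logits, rng=None):
    # Convert raw actor outputs to a softmax distribution over cache
    # actions and sample one index, as in the (0.1, 0.1, 0.7, 0.1)
    # example above, where position 2 is the most likely pick.
    rng = rng or random.Random(0)
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i, probs
    return len(probs) - 1, probs
```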
State-value function: the invention characterizes the state-value function V(s_n; θ_v) with a critic network whose parameters are θ_v. The input of the critic network is the status information s_n of the edge caching node at time n, and its output is the value of that state under the current policy π_θ. The target of the critic network is to estimate as accurately as possible the value of state s_n under policy π_θ.
Policy training: each time a cache replacement action a_n is executed, the system feeds back an immediate reward r_n to the cache policy intelligent decision module:

r_n = -Σ_{d_k ∈ Req_n} C(d_k)

where Req_n denotes the set of all data requests received by the edge caching node between the execution of caching action a_n and the execution of the next caching action a_{n+1}, and C(d_k) is the total cost of obtaining data d_k. Since the objective is to minimize the long-term total cost of obtaining data, a negative sign is placed before the cost.
Based on the Bellman equation and the state transition trajectory of the Markov process, the expectation of the long-term total reward can be estimated, and the network parameters are then learned by gradient updates. The gradient of the expected total reward with respect to the actor network parameters θ can be computed as:

∇_θ E[R_n|π_θ] = E[∇_θ log π(a_n|s_n; θ) · A(s_n, a_n)]

To maximize the expected total reward, the actor network parameters θ are updated by gradient ascent:

θ ← θ + λ · ∇_θ log π(a_n|s_n; θ) · A(s_n, a_n)

where λ is the learning rate of the actor network, which can be adjusted according to the actual situation; ∇_θ denotes the gradient operator; and the advantage function A(s_n, a_n) = r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v) measures how good it is to select action a_n in state s_n.
The critic network can be trained by the temporal-difference method, with the loss function set to the squared error between the critic network's output V(s_n; θ_v) and the target value r_n + γ·V(s_{n+1}; θ_v). The critic network parameters θ_v are updated by gradient descent:

θ_v ← θ_v − λ' · ∇_{θ_v} (r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v))²

where λ' is the learning rate of the critic network, which can be adjusted according to the actual situation, and ∇_{θ_v} denotes the gradient operator.
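The two updates above can be sketched with a minimal linear actor-critic standing in for the deep networks. The dimensions, learning rates, and toy state are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
D, A = 4, 3                      # state dimension, number of caching actions
theta = np.zeros((D, A))         # actor parameters
theta_v = np.zeros(D)            # critic parameters (linear value function)
lam, lam_v, gamma = 0.01, 0.05, 0.9

def policy(s, theta):
    """Softmax policy pi(a|s; theta) over caching actions."""
    z = s @ theta
    e = np.exp(z - z.max())
    return e / e.sum()

def train_step(s, a, r, s_next):
    global theta, theta_v
    # Advantage A(s_n, a_n) = r_n + gamma * V(s_{n+1}) - V(s_n)
    adv = r + gamma * (s_next @ theta_v) - s @ theta_v
    # Critic: semi-gradient descent on the squared TD error
    theta_v += lam_v * adv * s
    # Actor: gradient ascent along grad_theta log pi(a|s) * advantage
    p = policy(s, theta)
    grad_log = np.outer(s, -p)   # grad of log-softmax, all actions
    grad_log[:, a] += s          # plus the selected action's term
    theta += lam * adv * grad_log

s, s_next = rng.normal(size=D), rng.normal(size=D)
train_step(s, a=1, r=-2.0, s_next=s_next)
```

One transition <s_n, a_n, r_n, s_{n+1}> drives both updates, matching the training-sample format given in claim 4.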
To address the exploration-exploitation dilemma in reinforcement learning, where "exploitation" means taking the best action learned so far and "exploration" means fully exploring the action space by trying currently non-optimal actions, and to prevent the learned policy from falling into a local optimum, this technical solution adds the entropy of the policy (the action probability distribution output by the actor network) to the update of the actor network parameters θ as a regularization term, thereby encouraging the exploration process:

θ ← θ + λ · ∇_θ log π(a_n|s_n; θ) · A(s_n, a_n) + β · ∇_θ H(π(·|s_n; θ))

where H(π(·|s_n; θ)) is the entropy of the policy's action distribution in state s_n, β denotes the exploration coefficient, and ∇_θ denotes the gradient operator. Specifically:

H(π(·|s_n; θ)) = −Σ_a π(a|s_n; θ) · log π(a|s_n; θ)

By gradient ascent, θ is updated in the direction of increasing entropy, which encourages exploration. The exploration coefficient β is a positive number that balances the degree of exploration and exploitation; a larger β encourages more exploration, and it can be adjusted as needed in a specific implementation.
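The entropy term can be computed directly from the actor network's output distribution; a short sketch:

```python
import numpy as np

def policy_entropy(p):
    """H(pi(.|s_n; theta)) = -sum_a pi(a|s_n; theta) * log pi(a|s_n; theta)."""
    p = np.asarray(p)
    nz = p[p > 0]                      # convention: 0 * log 0 = 0
    return -np.sum(nz * np.log(nz))

# The near-deterministic distribution from the earlier example has lower
# entropy than the uniform distribution, whose entropy is log(4).
h_peaked = policy_entropy([0.1, 0.1, 0.7, 0.1])
h_uniform = policy_entropy([0.25, 0.25, 0.25, 0.25])
```

Pushing θ toward higher entropy therefore flattens the action distribution, keeping the policy from collapsing onto a single cache slot too early.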
The cache policy supports two modes: online learning and offline learning. In online learning, the policy is deployed directly to the edge caching node, learns the cache policy from the Internet of Things request data handled by that node, and periodically updates its network parameters. In offline learning, the cache policy is first pre-trained offline, then deployed to the edge caching node and kept unchanged.
The present invention further provides a cache replacement system for Internet of Things transient data. The system includes: a state judgment module, a read module, a request forwarding module, and a cache replacement module.

The state judgment module judges the state of the transient data requested by the user, where the states are: state one, the requested transient data item's content is in the cache of the edge caching node and the requested transient data item is fresh data; state two, the requested transient data item's content is in the cache of the edge caching node and the requested transient data item is expired data; state three, the requested transient data item's content is not in the cache of the edge caching node.

The read module, when the judgment result of the state judgment module is state one, reads the data directly from the buffer of the edge caching node and forwards the data to the user.

The request forwarding module, when the judgment result of the state judgment module is state two or state three, has the edge caching node forward the user's request to the data source and read the new data from the data source.

The cache replacement module, when the judgment result of the state judgment module is state two, replaces the expired data in the buffer of the edge caching node with the new data read by the request forwarding module, and forwards the new data to the user; when the judgment result of the state judgment module is state three, it selects the data to be replaced in the buffer of the edge caching node using deep reinforcement learning, replaces the data to be replaced with the new data read by the request forwarding module, and forwards the new data to the user.
The system further includes a training module which, each time a cache replacement action is executed, collects the cache replacement action, the immediate reward brought by the replacement action, and the states of the edge caching node before and after the replacement, and trains the network parameters of the deep reinforcement learning in the cache replacement module based on these parameters.
The above are only preferred specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any changes or substitutions that can be easily conceived by those familiar with the art within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. A cache replacement method for Internet of Things transient data, wherein the cache space of the current edge caching node is full, the method comprising the following steps:
S1. the edge caching node receives a new transient data item request issued by a user;
S2. judging whether the content of the requested transient data item is in the cache of the edge caching node; if so, proceeding to step S3; otherwise, proceeding to step S6;
S3. judging whether the requested transient data item is fresh data or expired data; if fresh data, proceeding to step S4; if expired data, proceeding to step S5;
S4. reading the data directly from the buffer of the edge caching node and forwarding the data to the user;
S5. the edge caching node forwarding the user's request to the data source, reading new data from the data source, replacing the expired data in the buffer of the edge caching node with the new data, and forwarding the new data to the user;
S6. the edge caching node forwarding the user's request to the data source, reading new data from the data source, selecting the data to be replaced in the buffer of the edge caching node using deep reinforcement learning, replacing the data to be replaced with the new data, and forwarding the new data to the user.
2. The cache replacement method according to claim 1, wherein in step S2, if f_k ∈ F_k, the requested data content is in the cache of the edge caching node; if f_k ∉ F_k, the requested data item content is not in the cache of the edge caching node; where f_k is the unique content identifier (CID) corresponding to data item request k, and F_k is the set of CIDs corresponding to the data items cached in the edge caching node when request k arrives.
3. The cache replacement method according to claim 1, wherein in step S3, if t_age(p(f_k)) ≤ T_life(p(f_k)), the requested data item is fresh data; if t_age(p(f_k)) > T_life(p(f_k)), the requested data item is expired data; where f_k is the CID corresponding to data item request k, p(·) is the mapping function from the requested content's CID to the data item, T_life(·) is the effective lifetime of the data item, and t_age(·) denotes the age of the data item.
4. The cache replacement method according to claim 1, wherein the selecting, using deep reinforcement learning, of the data to be replaced in the buffer of the edge caching node in step S6 specifically comprises:
1) at time n, observing the status information of the edge caching node to obtain the state s_n at time n;
2) selecting a caching action a_n according to the cache policy π(a_n|s_n) and executing the caching action;
3) after executing caching action a_n, computing the immediate reward r_n, the status information of the edge caching node changing from s_n to s_{n+1};
4) feeding the immediate reward r_n back to the edge caching node, and using this state transition <s_n, a_n, r_n, s_{n+1}> as a training sample for training the actor-critic networks of the deep reinforcement learning; the above process being repeated.
5. The cache replacement method according to claim 4, wherein the immediate reward r_n is computed as follows:
r_n = -Σ_{d_k ∈ Req_n} C(d_k)
where Req_n denotes the set of all data requests received by the edge caching node between the execution of caching action a_n and the execution of the next caching action a_{n+1}, and C(d_k) is the total cost of obtaining data d_k.
6. The cache replacement method according to claim 5, wherein the total cost C(d_k) is computed as follows:
C(d_k) = α·c(d_k) + (1−α)·l(d_k)
where α ∈ [0,1] denotes a trade-off coefficient, c(d_k) denotes the communication cost, l(d_k) denotes the data age cost, c_1 denotes the communication overhead of obtaining the data directly from the edge caching node, c_2 denotes the communication overhead of obtaining the data from the data source, c_1 < c_2, and c_1, c_2 are positive constants; f_k is the CID corresponding to data item request k, F_k is the set of CIDs corresponding to the data items cached in the edge caching node when request k arrives, p(·) is the mapping function from the requested content's CID to the data item, T_life(·) is the effective lifetime of the data item, and t_age(·) denotes the age of the data item.
7. The cache replacement method according to claim 4, wherein the actor network parameters θ in the deep reinforcement learning are updated by gradient ascent as follows:
θ ← θ + λ · ∇_θ log π(a_n|s_n; θ) · A(s_n, a_n)
where λ is the learning rate of the actor network, ∇_θ denotes the gradient operator, the policy π(a_n|s_n; θ) denotes the probability of selecting cache replacement action a_n in state s_n, A(s_n, a_n) = r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v) is the advantage function, γ ∈ [0,1] denotes the discount factor, and V(·; θ_v) denotes the state-value function;
the critic network parameters θ_v in the deep reinforcement learning are updated by gradient descent as follows:
θ_v ← θ_v − λ' · ∇_{θ_v} (r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v))²
where λ' is the learning rate of the critic network.
8. The cache replacement method according to claim 4, wherein the actor network parameters θ in the deep reinforcement learning are updated by gradient ascent as follows:
θ ← θ + λ · ∇_θ log π(a_n|s_n; θ) · A(s_n, a_n) + β · ∇_θ H(π(·|s_n; θ))
where λ is the learning rate of the actor network, ∇_θ denotes the gradient operator, the policy π(a_n|s_n; θ) denotes the probability of selecting cache replacement action a_n in state s_n, A(s_n, a_n) = r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v) is the advantage function, γ ∈ [0,1] denotes the discount factor, V(·; θ_v) denotes the state-value function, H(π(·|s_n; θ)) is the entropy of the policy's action distribution in state s_n, and β denotes the exploration coefficient;
the critic network parameters θ_v in the deep reinforcement learning are updated by gradient descent as follows:
θ_v ← θ_v − λ' · ∇_{θ_v} (r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v))²
where λ' is the learning rate of the critic network.
9. A cache replacement system for Internet of Things transient data, wherein the cache space of the current edge caching node is full, the system comprising: a state judgment module, a read module, a request forwarding module, and a cache replacement module;
the state judgment module is configured to judge the state of the transient data requested by a user, the states comprising: state one, the requested transient data item's content is in the cache of the edge caching node and the requested transient data item is fresh data; state two, the requested transient data item's content is in the cache of the edge caching node and the requested transient data item is expired data; state three, the requested transient data item's content is not in the cache of the edge caching node;
the read module is configured to, when the judgment result of the state judgment module is state one, read the data directly from the buffer of the edge caching node and forward the data to the user;
the request forwarding module is configured to, when the judgment result of the state judgment module is state two or state three, have the edge caching node forward the user's request to the data source and read new data from the data source;
the cache replacement module is configured to, when the judgment result of the state judgment module is state two, replace the expired data in the buffer of the edge caching node with the new data read by the request forwarding module and forward the new data to the user; and when the judgment result of the state judgment module is state three, select the data to be replaced in the buffer of the edge caching node using deep reinforcement learning, replace the data to be replaced with the new data read by the request forwarding module, and forward the new data to the user.
10. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when executed by a processor, the computer program implements the cache replacement method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811370683.4A CN109660598B (en) | 2018-11-17 | 2018-11-17 | Cache replacement method and system for transient data of Internet of things |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811370683.4A CN109660598B (en) | 2018-11-17 | 2018-11-17 | Cache replacement method and system for transient data of Internet of things |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109660598A true CN109660598A (en) | 2019-04-19 |
CN109660598B CN109660598B (en) | 2020-05-19 |
Family
ID=66111253
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811370683.4A Active CN109660598B (en) | 2018-11-17 | 2018-11-17 | Cache replacement method and system for transient data of Internet of things |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109660598B (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110456647A (en) * | 2019-07-02 | 2019-11-15 | 珠海格力电器股份有限公司 | A kind of intelligent home furnishing control method and intelligent home control device |
CN111277666A (en) * | 2020-02-21 | 2020-06-12 | 南京邮电大学 | Online collaborative caching method based on freshness |
CN111292001A (en) * | 2020-02-24 | 2020-06-16 | 清华大学深圳国际研究生院 | Joint decision method and device based on reinforcement learning |
CN113038616A (en) * | 2021-03-16 | 2021-06-25 | 电子科技大学 | Frequency spectrum resource management and allocation method based on federal learning |
CN113055721A (en) * | 2019-12-27 | 2021-06-29 | 中国移动通信集团山东有限公司 | Video content distribution method and device, storage medium and computer equipment |
CN113115362A (en) * | 2021-04-16 | 2021-07-13 | 三峡大学 | Cooperative edge caching method and device |
CN113115368A (en) * | 2021-04-02 | 2021-07-13 | 南京邮电大学 | Base station cache replacement method, system and storage medium based on deep reinforcement learning |
CN113395333A (en) * | 2021-05-31 | 2021-09-14 | 电子科技大学 | Multi-edge base station joint cache replacement method based on intelligent agent depth reinforcement learning |
CN113438315A (en) * | 2021-07-02 | 2021-09-24 | 中山大学 | Internet of things information freshness optimization method based on dual-network deep reinforcement learning |
CN113630742A (en) * | 2020-08-05 | 2021-11-09 | 北京航空航天大学 | Mobile edge cache replacement method adopting request rate and dynamic property of information source issued content |
CN113676513A (en) * | 2021-07-15 | 2021-11-19 | 东北大学 | Deep reinforcement learning-driven intra-network cache optimization method |
WO2021253168A1 (en) * | 2020-06-15 | 2021-12-23 | Alibaba Group Holding Limited | Managing data stored in a cache using a reinforcement learning agent |
CN114170560A (en) * | 2022-02-08 | 2022-03-11 | 深圳大学 | Multi-device edge video analysis system based on deep reinforcement learning |
CN115914388A (en) * | 2022-12-14 | 2023-04-04 | 广东信通通信有限公司 | Resource data fresh-keeping method based on monitoring data acquisition |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090070533A1 (en) * | 2007-09-07 | 2009-03-12 | Edgecast Networks, Inc. | Content network global replacement policy |
CN102291447A (en) * | 2011-08-05 | 2011-12-21 | 中国电信股份有限公司 | Content distribution network load scheduling method and system |
CN106452919A (en) * | 2016-11-24 | 2017-02-22 | 济南浪潮高新科技投资发展有限公司 | Fog node optimization method based on fussy theory |
CN106888270A (en) * | 2017-03-30 | 2017-06-23 | 网宿科技股份有限公司 | Return the method and system of source routing scheduling |
CN107479829A (en) * | 2017-08-03 | 2017-12-15 | 杭州铭师堂教育科技发展有限公司 | A kind of Redis cluster mass datas based on message queue quickly clear up system and method |
-
2018
- 2018-11-17 CN CN201811370683.4A patent/CN109660598B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090070533A1 (en) * | 2007-09-07 | 2009-03-12 | Edgecast Networks, Inc. | Content network global replacement policy |
CN102291447A (en) * | 2011-08-05 | 2011-12-21 | 中国电信股份有限公司 | Content distribution network load scheduling method and system |
CN106452919A (en) * | 2016-11-24 | 2017-02-22 | 济南浪潮高新科技投资发展有限公司 | Fog node optimization method based on fussy theory |
CN106888270A (en) * | 2017-03-30 | 2017-06-23 | 网宿科技股份有限公司 | Return the method and system of source routing scheduling |
CN107479829A (en) * | 2017-08-03 | 2017-12-15 | 杭州铭师堂教育科技发展有限公司 | A kind of Redis cluster mass datas based on message queue quickly clear up system and method |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110456647A (en) * | 2019-07-02 | 2019-11-15 | 珠海格力电器股份有限公司 | A kind of intelligent home furnishing control method and intelligent home control device |
CN113055721B (en) * | 2019-12-27 | 2022-12-09 | 中国移动通信集团山东有限公司 | Video content distribution method and device, storage medium and computer equipment |
CN113055721A (en) * | 2019-12-27 | 2021-06-29 | 中国移动通信集团山东有限公司 | Video content distribution method and device, storage medium and computer equipment |
WO2021164378A1 (en) * | 2020-02-21 | 2021-08-26 | 南京邮电大学 | Freshness based online cooperative caching method |
CN111277666A (en) * | 2020-02-21 | 2020-06-12 | 南京邮电大学 | Online collaborative caching method based on freshness |
CN111277666B (en) * | 2020-02-21 | 2021-06-01 | 南京邮电大学 | Online collaborative caching method based on freshness |
CN111292001A (en) * | 2020-02-24 | 2020-06-16 | 清华大学深圳国际研究生院 | Joint decision method and device based on reinforcement learning |
CN111292001B (en) * | 2020-02-24 | 2023-06-02 | 清华大学深圳国际研究生院 | Combined decision method and device based on reinforcement learning |
CN115398877B (en) * | 2020-06-15 | 2024-03-26 | 阿里巴巴集团控股有限公司 | Managing data stored in a cache using reinforcement learning agents |
WO2021253168A1 (en) * | 2020-06-15 | 2021-12-23 | Alibaba Group Holding Limited | Managing data stored in a cache using a reinforcement learning agent |
CN115398877A (en) * | 2020-06-15 | 2022-11-25 | 阿里巴巴集团控股有限公司 | Managing data stored in a cache using reinforcement learning agents |
CN113630742A (en) * | 2020-08-05 | 2021-11-09 | 北京航空航天大学 | Mobile edge cache replacement method adopting request rate and dynamic property of information source issued content |
CN113038616A (en) * | 2021-03-16 | 2021-06-25 | 电子科技大学 | Frequency spectrum resource management and allocation method based on federal learning |
CN113115368A (en) * | 2021-04-02 | 2021-07-13 | 南京邮电大学 | Base station cache replacement method, system and storage medium based on deep reinforcement learning |
CN113115362A (en) * | 2021-04-16 | 2021-07-13 | 三峡大学 | Cooperative edge caching method and device |
CN113395333B (en) * | 2021-05-31 | 2022-03-25 | 电子科技大学 | Multi-edge base station joint cache replacement method based on intelligent agent depth reinforcement learning |
CN113395333A (en) * | 2021-05-31 | 2021-09-14 | 电子科技大学 | Multi-edge base station joint cache replacement method based on intelligent agent depth reinforcement learning |
CN113438315A (en) * | 2021-07-02 | 2021-09-24 | 中山大学 | Internet of things information freshness optimization method based on dual-network deep reinforcement learning |
CN113676513B (en) * | 2021-07-15 | 2022-07-01 | 东北大学 | Intra-network cache optimization method driven by deep reinforcement learning |
CN113676513A (en) * | 2021-07-15 | 2021-11-19 | 东北大学 | Deep reinforcement learning-driven intra-network cache optimization method |
CN114170560A (en) * | 2022-02-08 | 2022-03-11 | 深圳大学 | Multi-device edge video analysis system based on deep reinforcement learning |
CN115914388A (en) * | 2022-12-14 | 2023-04-04 | 广东信通通信有限公司 | Resource data fresh-keeping method based on monitoring data acquisition |
Also Published As
Publication number | Publication date |
---|---|
CN109660598B (en) | 2020-05-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109660598A (en) | A kind of buffer replacing method and system of Internet of Things Temporal Data | |
Zhu et al. | Caching transient data for Internet of Things: A deep reinforcement learning approach | |
CN109639760B (en) | It is a kind of based on deeply study D2D network in cache policy method | |
Huang et al. | FedParking: A federated learning based parking space estimation with parked vehicle assisted edge computing | |
Tong et al. | Adaptive computation offloading and resource allocation strategy in a mobile edge computing environment | |
Zhang et al. | Joint optimization of cooperative edge caching and radio resource allocation in 5G-enabled massive IoT networks | |
CN114143891A (en) | FDQL-based multi-dimensional resource collaborative optimization method in mobile edge network | |
CN112866015B (en) | Intelligent energy-saving control method based on data center network flow prediction and learning | |
Archi et al. | Applications of Deep Reinforcement Learning in Wireless Networks-A Recent Review | |
Hribar et al. | Utilising correlated information to improve the sustainability of internet of things devices | |
CN110290510A (en) | Support the edge cooperation caching method under the hierarchical wireless networks of D2D communication | |
Shaghluf et al. | Spectrum and energy efficiency of cooperative spectrum prediction in cognitive radio networks | |
CN116346837A (en) | Internet of things edge collaborative caching method based on deep reinforcement learning | |
Samikwa et al. | Adaptive early exit of computation for energy-efficient and low-latency machine learning over iot networks | |
Somesula et al. | Cooperative cache update using multi-agent recurrent deep reinforcement learning for mobile edge networks | |
Jiang et al. | A reinforcement learning-based computing offloading and resource allocation scheme in F-RAN | |
Xiong et al. | Distributed caching in converged networks: A deep reinforcement learning approach | |
Zhang et al. | Dual-timescale resource allocation for collaborative service caching and computation offloading in IoT systems | |
CN114554495A (en) | Federal learning-oriented user scheduling and resource allocation method | |
Shui et al. | Cell-free networking for integrated data and energy transfer: Digital twin based double parameterized dqn for energy sustainability | |
Sun et al. | Knowledge-driven deep learning paradigms for wireless network optimization in 6G | |
CN117473616A (en) | Railway BIM data edge caching method based on multi-agent reinforcement learning | |
Su et al. | Outage performance analysis and resource allocation algorithm for energy harvesting D2D communication system | |
CN116009990B (en) | Cloud edge collaborative element reinforcement learning computing unloading method based on wide attention mechanism | |
He | [Retracted] Application of Neural Network Sample Training Algorithm in Regional Economic Management |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |