CN109660598B - Cache replacement method and system for transient data of Internet of things - Google Patents


Info

Publication number
CN109660598B
Authority
CN
China
Prior art keywords
data
cache
state
cache node
edge
Prior art date
Legal status
Active
Application number
CN201811370683.4A
Other languages
Chinese (zh)
Other versions
CN109660598A (en)
Inventor
曹洋
褚磊
竺浩
江涛
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN201811370683.4A
Publication of CN109660598A
Application granted
Publication of CN109660598B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/568 Storing data temporarily at an intermediate stage, e.g. caching
    • H04L67/5682 Policies or rules for updating, deleting or replacing the stored data

Abstract

The invention discloses a cache replacement method and system for transient data of the Internet of things. A deep reinforcement learning method is used to learn the caching policy for transient data: the popularity trend of the data is mined from the data request history of the edge cache node and combined with the transient data information as the input of the deep reinforcement learning; the instant reward is set to the negative of the comprehensive communication cost; and, using a critic network to learn the value function and an actor network to learn the policy function, the caching policy is learned adaptively through continual cache replacement operations. This solves the problem of low caching efficiency for transient data in edge cache nodes with limited storage resources. The data freshness of Internet of things data and the communication resource consumption are both incorporated into the comprehensive communication cost, and the long-term comprehensive cost of acquiring Internet of things data is minimized, so that network traffic can be offloaded to the network edge and latency reduced, alleviating to some extent the problems of high latency and high communication resource consumption in the transmission of massive transient Internet of things data.

Description

Cache replacement method and system for transient data of Internet of things
Technical Field
The invention belongs to the field of wireless communication, and particularly relates to a cache replacement method and system for transient data of an Internet of things.
Background
With the rapid development and wide application of the Internet of things (IoT) in fields such as intelligent transportation, smart grid, smart home, and industrial automation, massive Internet of things data traffic puts enormous pressure on current communication networks. A common remedy is to add an edge caching mechanism to the Internet of things: hot data is cached in the idle storage resources of network edge cache nodes, so a requester can obtain data directly from the corresponding edge cache node instead of from the data source, avoiding a large amount of unnecessary end-to-end communication. Edge caching in an Internet of things system can offload network traffic, reduce network latency, and provide better service quality and user experience. Because the storage capacity of an edge cache node is generally limited, an efficient cache replacement policy raises the cache hit rate, uses the cache space more efficiently, and offloads more network traffic. Many applications in Internet of things systems also have timeliness requirements on data: only data within a certain validity period is usable, so the freshness of cached data is an important criterion for cache replacement. An edge cache replacement policy in an Internet of things system therefore needs to consider both the popularity and the freshness of cached data, so as to better meet the caching requirements of Internet of things scenarios.
Traditional cache replacement methods, such as first-in first-out, least recently used, and least frequently used, have low caching efficiency because they consider neither the popularity trend of the content nor the distribution of user requests. Existing cache replacement policies for edge caching include the following. Bastug et al. predict data popularity by collaborative filtering over user-data correlations; the caching policy greedily caches hot data at the beginning until the cache of the edge cache node is full, and then performs cache replacement according to the predicted popularity. Blasco et al. solve the data placement problem in small base stations with a knapsack formulation, estimating data popularity from the request rate at which data is received. Song et al. consider data popularity jointly with the data caching process. Tanzil et al. estimate data popularity with a neural network and compute the placement position and size of the cache by mixed-integer linear programming, but the popularity prediction stage of this method needs keywords and classification information of video content, which does not apply to general IoT data.
These methods focus mainly on edge caching of non-transient data, with the edge cache node deciding on cache replacement by estimating the popularity of the data. On the one hand, data popularity and user requests are assumed to follow a specific distribution (such as a Poisson distribution), so these methods cannot adapt to scenarios where data popularity and user requests change rapidly; on the other hand, they consider only the caching of non-transient data and ignore the timeliness problem of transient data.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to solve the technical problems of high latency and high communication resource consumption in the transmission of massive transient Internet of things data, and of low caching efficiency for transient data in edge cache nodes with limited storage resources.
To achieve the above object, in a first aspect, an embodiment of the present invention provides a cache replacement method for transient data of an internet of things, where a cache space of a current edge cache node is full, the method includes the following steps:
S1, an edge cache node receives a new transient data item request sent by a user;
S2, judging whether the requested transient data item content is in the cache of the edge cache node; if so, going to step S3, otherwise going to step S6;
S3, judging whether the requested transient data item is fresh data or expired data; if fresh, going to step S4; if expired, going to step S5;
S4, directly reading the data from the cache region of the edge cache node and forwarding it to the user;
S5, the edge cache node forwards the user request to a data source, reads new data from the data source, replaces the expired data in the cache region of the edge cache node with the new data, and forwards the new data to the user;
S6, the edge cache node forwards the user request to a data source, reads new data from the data source, selects the data to be replaced in the cache region of the edge cache node by using deep reinforcement learning, replaces the selected data with the new data, and forwards the new data to the user.
Specifically, in step S2, if f_k ∈ F_k, the requested data item content is in the cache of the edge cache node; if f_k ∉ F_k, the requested data item content is not in the cache of the edge cache node, where f_k is the data content unique identifier (CID) corresponding to request k, and F_k is the set of CIDs of the data items cached in the edge cache node when request k arrives.
Specifically, in step S3, if t_age(p(f_k)) ≤ T_life(p(f_k)), the requested data item is fresh data; if t_age(p(f_k)) > T_life(p(f_k)), the requested data item is expired data, where f_k is the CID corresponding to request k, p(·) is the mapping function from a requested CID to the cached data item, T_life(·) is the effective lifecycle of a data item, and t_age(·) is the age of a data item.
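For illustration, this freshness test can be sketched in Python; the DataItem record and its field names are illustrative assumptions, not part of the claimed method:

import time

class DataItem:
    """Illustrative record for a transient data item with the two fields above."""
    def __init__(self, cid, t_gen, t_life):
        self.cid = cid        # unique content identifier CID
        self.t_gen = t_gen    # generation time t_gen(d)
        self.t_life = t_life  # effective lifecycle T_life(d)

    def age(self, now=None):
        # t_age(d) = t - t_gen(d)
        now = time.time() if now is None else now
        return now - self.t_gen

    def is_fresh(self, now=None):
        # fresh iff t_age(p(f_k)) <= T_life(p(f_k))
        return self.age(now) <= self.t_life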
Specifically, selecting the data to be replaced in the cache region of the edge cache node by using deep reinforcement learning in step S6 specifically comprises:
1) at time n, observing the state information of the edge cache node to obtain the state s_n at time n;
2) selecting a caching action a_n according to the caching policy π(a_n|s_n) and executing the caching action;
3) after executing caching action a_n, calculating the instant reward r_n; the edge cache node state changes from s_n to s_{n+1};
4) feeding the instant reward r_n back to the edge cache node, taking the state transition ⟨s_n, a_n, r_n, s_{n+1}⟩ as a training sample for training the actor-critic networks of the deep reinforcement learning, and repeating the above process.
In particular, the instant reward r_n is calculated as follows:
r_n = −Σ_{k∈Req_n} C(d_k)
where Req_n is the set of all data requests received by the edge cache node between the execution of caching action a_n and the execution of the next caching action a_{n+1}, and C(d_k) is the comprehensive cost of obtaining data d_k.
In particular, the comprehensive cost C(d_k) is calculated as follows:
C(d_k) = α·c(d_k) + (1−α)·l(d_k)
c(d_k) = c_1, if f_k ∈ F_k and t_age(p(f_k)) ≤ T_life(p(f_k)); c_2, otherwise
l(d_k) = t_age(p(f_k))/T_life(p(f_k)), if f_k ∈ F_k and t_age(p(f_k)) ≤ T_life(p(f_k)); 0, otherwise
where α ∈ [0, 1] is a trade-off coefficient, c(d_k) is the communication cost, l(d_k) is the data timeliness cost, c_1 is the communication overhead of obtaining data directly from the edge cache node, c_2 is the communication overhead of obtaining data from the data source, c_1 < c_2, and both c_1 and c_2 are positive constants; f_k is the CID corresponding to request k, F_k is the set of CIDs of the data items cached in the edge cache node when request k arrives, p(·) is the mapping function from a requested CID to the cached data item, T_life(·) is the effective lifecycle of a data item, and t_age(·) is the age of a data item.
Specifically, the actor network parameter θ in the deep reinforcement learning is updated by gradient ascent as:
θ ← θ + λ·∇_θ log π(a_n|s_n; θ)·A(s_n, a_n)
A(s_n, a_n) = r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v)
where λ is the learning rate of the actor network, ∇_θ is the gradient operator, the policy π(a_n|s_n; θ) is the probability of selecting cache replacement action a_n in state s_n, A(s_n, a_n) is the advantage function, γ ∈ [0, 1] is the discount factor, and V(s; θ_v) is the state-value function;
the critic network parameter θ_v in the deep reinforcement learning is updated by gradient descent as:
θ_v ← θ_v − λ′·∇_{θ_v}(r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v))²
where λ′ is the learning rate of the critic network.
Alternatively, the actor network parameter θ in the deep reinforcement learning is updated by gradient ascent with an entropy regularization term as:
θ ← θ + λ·[∇_θ log π(a_n|s_n; θ)·A(s_n, a_n) + β·∇_θ H(π(·|s_n; θ))]
A(s_n, a_n) = r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v)
H(π(·|s_n; θ)) = −Σ_{a∈A} π(a|s_n; θ)·log π(a|s_n; θ)
where λ is the learning rate of the actor network, ∇_θ is the gradient operator, the policy π(a_n|s_n; θ) is the probability of selecting cache replacement action a_n in state s_n, A(s_n, a_n) is the advantage function, γ ∈ [0, 1] is the discount factor, V(s; θ_v) is the state-value function, H(π(·|s_n; θ)) is the policy entropy of the action space output by policy π_θ in state s_n, and β is the exploration coefficient;
the critic network parameter θ_v in the deep reinforcement learning is updated by gradient descent as:
θ_v ← θ_v − λ′·∇_{θ_v}(r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v))²
where λ′ is the learning rate of the critic network.
In a second aspect, an embodiment of the present invention provides a cache replacement system for transient data of an internet of things, where a cache space of a current edge cache node is full, and the system includes: the device comprises a state judgment module, a reading module, a request forwarding module and a cache replacement module;
the state judgment module is configured to judge the state of the transient data requested by a user, where the states include: state one: the requested transient data item content is in the cache of the edge cache node and the requested transient data item is fresh data; state two: the requested transient data item content is in the cache of the edge cache node and the requested transient data item is expired data; state three: the requested transient data item content is not in the cache of the edge cache node;
the reading module is used for directly reading the data from the cache region of the edge cache node and forwarding the data to the user when the judgment result of the state judgment module is state one;
the request forwarding module is used for the edge cache node to forward the user request to the data source and read new data from the data source when the judgment result of the state judgment module is state two or state three;
the cache replacement module is configured to, when the judgment result of the state judgment module is state two, replace the expired data in the cache region of the edge cache node with the new data read by the request forwarding module and forward the new data to the user; and, when the judgment result of the state judgment module is state three, select the data to be replaced in the cache region of the edge cache node by using deep reinforcement learning, replace the selected data with the new data read by the request forwarding module, and forward the new data to the user.
In a third aspect, an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements the cache replacement method described in the first aspect.
Generally, compared with the prior art, the above technical solution conceived by the present invention has the following beneficial effects:
1. The invention considers the effective lifecycle of transient data and incorporates the data freshness of Internet of things data and the communication resource consumption into the comprehensive communication cost, thereby setting the target of the transient data caching policy of the Internet of things: minimizing the long-term comprehensive cost of acquiring Internet of things data. By caching transient data in the edge cache nodes of the network, network traffic can be offloaded to the network edge and latency reduced, alleviating to some extent the problems of high latency and high communication resource consumption in the transmission of massive transient Internet of things data.
2. The invention uses a deep reinforcement learning method to learn the caching policy of transient data. Specifically, the cache replacement problem is modeled as a Markov process; the popularity trend of the data is mined from the data request history of the edge cache node and combined with the lifecycle and freshness information of the transient data as the environment state input of the deep reinforcement learning. The instant reward is set to the negative of the comprehensive communication cost. Using a critic network to learn the value function and an actor network to learn the policy function, the caching policy is learned adaptively through continual cache replacement operations and the long-term reward is maximized, so that the long-term comprehensive cost of acquiring transient data in the Internet of things is minimized and the problem of low caching efficiency for transient data in edge cache nodes with limited storage resources is solved.
Drawings
Fig. 1 is a flowchart of a cache replacement method for transient data of an internet of things according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
First, some terms used in the present invention are explained.
The edge cache node refers to a network node which is close to the user side and has caching capacity in the Internet of things.
Data freshness equals the ratio of the remaining effective time of the data to its effective lifecycle; the larger the ratio, the more timely the data and the higher its freshness.
Data timeliness refers to the time interval between the generation of data at the data source and its acquisition at the user end; the shorter the acquisition interval, the better the timeliness.
Data popularity represents how popular the data is, i.e., the number of times it is requested within a certain period; the more requests, the higher the popularity.
Transient data refers to data with timeliness requirements, i.e., data that is valid only for a limited time.
The concept of the invention is as follows. First, to study the comprehensive cost of transient data caching, the comprehensive cost of acquiring Internet of things data is divided into two parts: the communication cost (including bandwidth consumption, latency, and so on) and the data timeliness cost. The goal of cache replacement in the edge cache nodes of the Internet of things is to minimize the long-term comprehensive cost of acquiring data, i.e., to consider the communication cost and the data timeliness cost jointly. The cache replacement problem is then modeled as a Markov process, on which a deep reinforcement learning (DRL) based caching policy is built: according to the data request history over a period of time and the current cache state of the Internet of things, the caching policy is learned automatically, minimizing the long-term comprehensive cost of acquiring Internet of things data.
As shown in fig. 1, a cache replacement method for transient data of an internet of things, where a cache space of a current edge cache node is full, includes the following steps:
S1, an edge cache node receives a new transient data item request sent by a user;
S2, judging whether the requested transient data item content is in the cache of the edge cache node; if so, going to step S3, otherwise going to step S6;
S3, judging whether the requested transient data item is fresh data or expired data; if fresh, going to step S4; if expired, going to step S5;
S4, directly reading the data from the cache region of the edge cache node and forwarding it to the user;
S5, the edge cache node forwards the user request to a data source, reads new data from the data source, replaces the expired data in the cache region of the edge cache node with the new data, and forwards the new data to the user;
S6, the edge cache node forwards the user request to a data source, reads new data from the data source, selects the data to be replaced in the cache region of the edge cache node by using deep reinforcement learning, replaces the selected data with the new data, and forwards the new data to the user.
S1, the edge cache node receives a new transient data item request sent by a user.
A data item d in the Internet of things is uniquely identified by a CID (Content ID), and each data item comprises two fields: the generation time t_gen(d) and the effective lifecycle T_life(d). At time t, the age of data item d is denoted t_age(d) = t − t_gen(d). If the age of data item d is less than its effective lifecycle, i.e., t_age(d) < T_life(d), the data item d is said to be fresh data, within its effective lifecycle; otherwise, the data item d is said to be expired data, which is stale.
A data item request from an Internet of things user side is recorded as k, the CID corresponding to the requested data item content is f_k, and the arrival time of the request is t_k. At time t_k, the set of data items cached in the edge cache node of the Internet of things is recorded as D_k = {d_k,1, d_k,2, …, d_k,I}, and the corresponding set of cached CIDs is F_k = {f_k,1, f_k,2, …, f_k,I}, where I is the maximum number of data items the edge cache node can cache. A mapping function p: F_k → D_k links the CID information of the requested content to the cached data items.
When a data item request k arrives, the edge cache node of the Internet of things first checks whether a data item with CID f_k is in the cache and whether the cached item is fresh. Three cases are considered:
Case 1: f_k ∈ F_k and t_age(p(f_k)) ≤ T_life(p(f_k)), i.e., the requested data item content is in the cache and the cached item is fresh, meeting the timeliness requirement. The edge cache node therefore directly returns the cached data item p(f_k) to the data requester.
Case 2: f_k ∈ F_k and t_age(p(f_k)) > T_life(p(f_k)), i.e., the requested data item content is in the cache but the cached item has expired. The edge cache node therefore obtains new data from the data source, returns it to the data requester, and replaces the expired data in the cache with the newly obtained data.
Case 3: f_k ∉ F_k, i.e., the requested data item content is not in the cache. The edge cache node therefore obtains new data from the data source and returns it to the data requester; at the same time, deep reinforcement learning is used to select the data to be replaced in the cache region of the edge cache node, and the selected data is replaced with the new data.
From the analysis of these three cases, the data item d_k returned after the user sends request k can be expressed as:
d_k = p(f_k), if f_k ∈ F_k and t_age(p(f_k)) ≤ T_life(p(f_k)); otherwise, d_k is the fresh data newly obtained from the data source
where f_k is the CID corresponding to request k, p(·) is the mapping function from a requested CID to the cached data item, F_k is the set of CIDs of the data items cached in the edge cache node when request k arrives, t_age(d) is the age of data item d, and T_life(d) is the effective lifecycle of data item d.
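These three cases can be sketched as one request handler, reusing the illustrative DataItem record from the earlier sketch; cache, fetch_from_source, and select_victim are assumed stand-ins, with select_victim playing the role of the deep reinforcement learning selection of step S6:

def handle_request(cid, cache, fetch_from_source, select_victim, now):
    """Serve one request k with CID f_k = cid; cache maps CID -> DataItem."""
    item = cache.get(cid)
    if item is not None and item.is_fresh(now):       # case 1: hit and fresh
        return item
    new_item = fetch_from_source(cid)                 # cases 2 and 3
    if item is not None:                              # case 2: hit but expired
        cache[cid] = new_item                         # replace the expired entry
    else:                                             # case 3: miss, cache assumed full
        victim_cid = select_victim(cache, new_item)   # DRL picks the item to evict
        if victim_cid is not None:                    # None plays the role of a_0
            del cache[victim_cid]
            cache[cid] = new_item
    return new_item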
S2, judging whether the requested transient data item content is in the cache of the edge cache node; if so, going to step S3, otherwise going to step S6.
f_k ∈ F_k indicates that the requested data item content is in the cache of the edge cache node; f_k ∉ F_k indicates that the requested data item content is not in the cache of the edge cache node.
S3, judging whether the requested transient data item is fresh data or expired data; if fresh, going to step S4; if expired, going to step S5.
t_age(p(f_k)) ≤ T_life(p(f_k)) indicates that the requested data item is fresh data; t_age(p(f_k)) > T_life(p(f_k)) indicates that the requested data item is expired data.
For the cache replacement policy of the edge cache node: initially, the edge cache node greedily caches all arriving data items until the cache space is full. When the cache is full, if a newly arrived request corresponds to case 1, no cache replacement is needed; if it corresponds to case 2, since the cached data corresponding to the current request is known to be expired, the expired data is directly replaced with the newly obtained data; if it corresponds to case 3, when the new data arrives, the cache replacement method must decide whether to replace a cached data item in the cache region with the new data item and, if so, which cached data item to replace.
Specifically, for data item d_k, the caching action given by the caching policy is recorded as a_k, and the action space is A = {a_0, a_1, …, a_I}. a_k = a_0 indicates no cache replacement; a_k = a_i (1 ≤ i ≤ I) indicates that the fresh data item d_k obtained from the data source replaces the cached item at position i of the cache region. After the cache region executes caching action a_k, D_k and F_k become D_{k+1} and F_{k+1}.
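Applying a caching action from the action space A = {a_0, a_1, …, a_I} to a position-indexed cache region could look as follows; the list representation of the cache region is an assumption for illustration:

def apply_action(cache_slots, action, new_item):
    # cache_slots: list of I cached data items; slot i is cache_slots[i - 1]
    # action: integer in {0, 1, ..., I}; 0 encodes a_0, "no replacement"
    if action == 0:
        return cache_slots              # a_0: the new item is not cached
    cache_slots[action - 1] = new_item  # a_i: overwrite slot i with d_k
    return cache_slots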
S6, the edge cache node forwards the user request to a data source, reads new data from the data source, selects the data to be replaced in the cache region of the edge cache node by using deep reinforcement learning, replaces the selected data with the new data, and forwards the new data to the user.
Step S6 corresponds to case 3, in which deep reinforcement learning is used to select the data to be replaced in the cache region of the edge cache node. To this end, the comprehensive cost C(d_k) of obtaining transient data item d_k is defined.
The comprehensive cost C(d_k) of obtaining a transient data item d_k is divided into two parts: the communication cost c(d_k) and the data timeliness cost l(d_k).
The communication cost c(d_k) is calculated as follows:
c(d_k) = c_1, if f_k ∈ F_k and t_age(p(f_k)) ≤ T_life(p(f_k)); c_2, otherwise
where c_1 is the communication overhead of obtaining data directly from the edge cache node, c_2 is the communication overhead of obtaining data from the data source, c_1 < c_2, and both c_1 and c_2 are positive constants.
The data timeliness cost l(d_k) is calculated as follows:
l(d_k) = t_age(p(f_k))/T_life(p(f_k)), if f_k ∈ F_k and t_age(p(f_k)) ≤ T_life(p(f_k)); 0, otherwise
The comprehensive cost C(d_k) is calculated as follows:
C(d_k) = α·c(d_k) + (1−α)·l(d_k)
where α ∈ [0, 1] is a trade-off coefficient weighing the importance of the two costs: a larger α means the user cares more about communication overhead; otherwise, the user cares more about data timeliness.
To optimize the comprehensive cost of acquiring Internet of things data, the cache replacement problem is modeled as a Markov process. Cases 1 and 2 follow the deterministic rules on the current cache described above, so only the cache replacement action in case 3 needs to be optimized.
The Markov process problem can be defined by {S, A, M(s_{n+1}|s_n, a_n), R(s_n, a_n)}, where S is the state set of the edge cache node of the Internet of things system, and s_n is the state of the edge cache node at time n; A is the action set of the cache replacement policy, and a_n is the caching action at time n; M(s_{n+1}|s_n, a_n) is the state transition probability matrix, giving the probability that the state of the edge cache node transitions from s_n to s_{n+1} after action a_n is executed; R(s_n, a_n) is the instant reward function, the reward fed back by the system after action a_n is executed in state s_n. The whole cache replacement process can thus be expressed as:
1) At time n, the edge cache node observes the system state information to obtain the system state s_n ∈ S at time n.
2) The edge cache node selects a caching action a_n according to the caching policy π(a_n|s_n) and executes it.
3) After caching action a_n is executed, the system returns an instant reward r_n = R(s_n, a_n), and the system state transitions from s_n to s_{n+1}.
4) The instant reward r_n is fed back to the edge cache node, and the current transition ⟨s_n, a_n, r_n, s_{n+1}⟩ is added to the experience pool of the deep reinforcement learning as a training sample for the actor-critic networks; the above process is then repeated.
The caching policy π(a_n|s_n) denotes the probability of selecting cache replacement action a_n in state s_n, and can be abbreviated as π.
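A sketch of one pass of steps 1) to 4); env, agent, and experience_pool are assumed stand-ins for the edge cache environment, the caching policy agent, and the experience pool:

def interaction_step(env, agent, experience_pool):
    # 1) observe the edge cache node state s_n
    s_n = env.observe()
    # 2) sample a caching action a_n ~ pi(a_n | s_n) and execute it
    a_n = agent.select_action(s_n)
    # 3) executing a_n yields the instant reward r_n; the state becomes s_{n+1}
    r_n, s_next = env.step(a_n)
    # 4) store the transition <s_n, a_n, r_n, s_{n+1}> for actor-critic training
    experience_pool.append((s_n, a_n, r_n, s_next))
    return s_next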
The long-term cumulative reward of the whole process is recorded as R_n = Σ_{j≥0} γ^j·r_{n+j}, where γ ∈ [0, 1] is a discount coefficient that determines how much future rewards affect the caching decision at the current stage. The aim of the invention is therefore to find the optimal caching policy π* that maximizes the expected long-term cumulative reward in all states:
π* = argmax_π E[R_n | π]
where E[R_n | π] is the expectation of the long-term cumulative reward R_n under caching policy π.
To measure the quality of the caching policy π, the value function V^π(s_n) = E[R_n | s_n; π] is defined, which is the expected long-term cumulative reward R_n of caching policy π in state s_n. The optimal value function of state s_n is V*(s_n) = max_π V^π(s_n).
By the Markov property, substituting V^π(s_n) into the Bellman equation yields:
V^π(s_n) = Σ_{a_n} π(a_n|s_n) Σ_{s_{n+1}, r_n} p(s_{n+1}, r_n | s_n, a_n)·[r_n + γ·V^π(s_{n+1})]
where p(s_{n+1}, r_n | s_n, a_n) is the probability of transitioning to state s_{n+1} and receiving instant reward r_n after executing action a_n in state s_n, and γ ∈ [0, 1] is the discount coefficient. This formula provides an iterative method for computing the value function of the Markov process.
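For intuition only, the iterative computation implied by this equation can be written for a toy tabular case with a known transition model (which, as noted next, the edge caching setting does not actually have):

import numpy as np

def policy_evaluation(P, R, pi, gamma=0.9, iters=100):
    # P[s, a, s2]: transition probabilities; R[s, a]: expected instant reward;
    # pi[s, a]: caching policy. Returns V^pi by repeated Bellman backups:
    # V(s) = sum_a pi(a|s) * (R(s,a) + gamma * sum_s2 P(s2|s,a) * V(s2))
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        V = np.einsum('sa,sa->s', pi, R + gamma * np.einsum('sat,t->sa', P, V))
    return V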
Given the state transition probability matrix M(s_{n+1}|s_n, a_n) of the Markov process, the optimal policy could be solved by dynamic programming. In the edge cache replacement process, however, the state transition probability matrix is unknown, so the invention uses a deep reinforcement learning method to mine information from historical data and uses a deep neural network to learn and fit the state-value function, thereby obtaining the optimal cache replacement policy.
The invention uses a deep reinforcement learning method to adaptively learn an efficient caching policy. The Actor-Critic deep reinforcement learning method is taken as an example to introduce the technical details of the invention, although the invention is not limited to the Actor-Critic method. Specifically:
Input: while the edge cache node continuously processes user requests, case 3 occurs, i.e., the requested data item content is not in the cache, and the edge cache node obtains the requested data item d_n from the data source by forwarding the request. At this point, the caching policy agent observes the current environment state
s_n = (x_{d_n}, x_1, x_2, …, x_I)
as the input of the deep neural network, where x_{d_n} is the feature vector of the currently requested data item d_n, and x_i is the feature vector of the i-th data item in the cache region of the edge cache node. Each feature vector records, for the corresponding data item content, the number of requests in each of the past J groups of requests, which reflects the popularity of the data item; the effective lifecycle of the data item content; and the freshness of the data item content, equal to the ratio of its remaining effective time to its effective lifecycle. Besides the data request information, the input information may also include scene information, edge network information, and the like.
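Assembling the state s_n can be sketched as follows, assuming the request history is kept as J dictionaries of per-CID request counts; the exact feature layout is an illustrative assumption:

import numpy as np

def feature_vector(cid, request_groups, lifecycle, age):
    # popularity trend: request counts of this CID over the past J request groups
    counts = [group.get(cid, 0) for group in request_groups]
    # freshness: remaining effective time divided by the effective lifecycle
    freshness = max(0.0, (lifecycle - age) / lifecycle)
    return np.array(counts + [lifecycle, freshness], dtype=np.float32)

def build_state(requested, cached_items, request_groups):
    # requested and cached_items entries are (cid, lifecycle, age) triples;
    # s_n = (x_{d_n}, x_1, ..., x_I): the requested item first, then each slot
    vecs = [feature_vector(cid, request_groups, life, age)
            for (cid, life, age) in [requested] + list(cached_items)]
    return np.concatenate(vecs)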
Policy: the invention characterizes the replacement policy by an actor network π_θ(a_n|s_n), which can also be written as π(a_n|s_n; θ) and abbreviated as π_θ; the actor network parameter is θ. The input of the actor network is the state information s_n of the edge cache node at time n, and the output of the actor network is the probability of selecting each caching action. The deep reinforcement learning finally selects a cache replacement action according to the policy π_θ(a_n|s_n), i.e., a replacement action a_n over the cache region positions is drawn according to the output probabilities of the caching actions. For example, suppose the edge cache node has 3 cache positions (indices 1, 2, 3, with 0 indicating no replacement) and the actor network outputs the probabilities (0.1, 0.1, 0.7, 0.1); the replacement position is then selected according to these probabilities, so the output is most likely a_2, and cache position 2 is replaced. The goal of the actor network is to output cache replacement actions according to the learned caching policy so as to maximize the expected long-term cumulative reward E[R_n | π_θ].
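The sampling step of this example, with the same probability vector:

import numpy as np

probs = np.array([0.1, 0.1, 0.7, 0.1])          # actor output over a_0..a_3
action = np.random.choice(len(probs), p=probs)  # most likely 2, i.e. a_2
# action == 0 means no replacement; action == i replaces cache position i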
State-value function: the invention characterizes the state-value function by a critic network V(s_n; θ_v); the critic network parameter is θ_v. The input of the critic network is the state information s_n of the edge cache node at time n, and the output of the critic network is the value of that state under the current policy π_θ. The goal of the critic network is to estimate as accurately as possible the value of state s_n under policy π_θ.
Policy training: every time a cache replacement action a_n is executed, the system feeds back an instant reward r_n to the caching policy agent:
r_n = −Σ_{k∈Req_n} C(d_k)
where Req_n is the set of all data requests received by the edge cache node between the execution of caching action a_n and the execution of the next caching action a_{n+1}, and C(d_k) is the comprehensive cost of obtaining data d_k. Since the goal is to minimize the long-term cost of acquiring data, the cost carries a negative sign.
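A small helper can accumulate the per-request comprehensive costs between two replacement actions and emit r_n; the class and method names are illustrative:

class RewardTracker:
    """Accumulates comprehensive costs C(d_k) over the requests in Req_n."""
    def __init__(self):
        self._costs = []

    def record(self, cost):
        # called once for each request k served between a_n and a_{n+1}
        self._costs.append(cost)

    def pop_reward(self):
        # r_n = - sum_{k in Req_n} C(d_k), then reset for the next interval
        r_n = -sum(self._costs)
        self._costs = []
        return r_n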
The expectation of the long-term cumulative reward can be estimated from the Bellman equation and the state transition trajectories of the Markov process, so the network parameters are learned by gradient updates. The gradient of the expected total reward with respect to the actor network parameter θ can be calculated as:
∇_θ J(θ) = E[∇_θ log π(a_n|s_n; θ)·A(s_n, a_n)]
To maximize the expected total reward, the actor network parameter θ is updated by gradient ascent as:
θ ← θ + λ·∇_θ log π(a_n|s_n; θ)·A(s_n, a_n)
A(s_n, a_n) = r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v)
where λ is the learning rate of the actor network, which can be adjusted according to the actual situation; ∇_θ is the gradient operator; and the advantage function A(s_n, a_n) measures how good it is to select action a_n in state s_n.
The critic network can be trained by the temporal-difference method, with the loss function set to the squared error between the critic network output V(s_n; θ_v) and the target value r_n + γ·V(s_{n+1}; θ_v). The critic network parameter θ_v is updated by gradient descent as:
θ_v ← θ_v − λ′·∇_{θ_v}(r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v))²
where λ′ is the learning rate of the critic network, which can be adjusted according to the actual situation, and ∇_{θ_v} is the gradient operator.
This addresses the "exploration-exploitation dilemma" in reinforcement learning: "exploitation" means taking the best action learned so far, while "exploration" attempts to fully explore the action space by taking currently non-optimal actions. To prevent the learned policy from falling into a local optimum, the technical scheme adds the entropy of the policy (the action probability distribution output by the actor network) to the update of the actor network parameter θ in the form of a regularization term, thereby encouraging the "exploration" process:
θ ← θ + λ·[∇_θ log π(a_n|s_n; θ)·A(s_n, a_n) + β·∇_θ H(π(·|s_n; θ))]
where H(π(·|s_n; θ)) is the policy entropy of the action space output by policy π_θ in state s_n, β is the exploration coefficient, and ∇_θ is the gradient operator. The entropy is calculated as:
H(π(·|s_n; θ)) = −Σ_{a∈A} π(a|s_n; θ)·log π(a|s_n; θ)
With gradient ascent, θ is updated in the direction of increasing entropy, which encourages the "exploration" process.
The caching policy supports both online learning and offline learning. With online learning, the policy can be deployed directly on the edge cache node, which periodically updates the network parameters from the Internet of things request data it processes and thus learns the caching policy. With offline learning, the caching policy is pre-trained in advance, then deployed on the edge cache node and kept unchanged.
The invention provides a cache replacement system for transient data of the Internet of things, which comprises: the device comprises a state judgment module, a reading module, a request forwarding module and a cache replacement module;
the state judgment module is configured to judge the state of the transient data requested by a user, where the states include: state one: the requested transient data item content is in the cache of the edge cache node and the requested transient data item is fresh data; state two: the requested transient data item content is in the cache of the edge cache node and the requested transient data item is expired data; state three: the requested transient data item content is not in the cache of the edge cache node;
the reading module is used for directly reading the data from the cache region of the edge cache node and forwarding the data to the user when the judgment result of the state judgment module is state one;
the request forwarding module is used for the edge cache node to forward the user request to the data source and read new data from the data source when the judgment result of the state judgment module is state two or state three;
the cache replacement module is configured to, when the judgment result of the state judgment module is state two, replace the expired data in the cache region of the edge cache node with the new data read by the request forwarding module and forward the new data to the user; and, when the judgment result of the state judgment module is state three, select the data to be replaced in the cache region of the edge cache node by using deep reinforcement learning, replace the selected data with the new data read by the request forwarding module, and forward the new data to the user.
The system further comprises a training module, which collects, every time a cache replacement action is executed, the cache replacement action, the instant reward brought by the replacement action, and the states of the edge cache node before and after the replacement, and trains the network parameters of the deep reinforcement learning in the cache replacement module based on these samples.
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (7)

1. A cache replacement method for transient data of the Internet of things is characterized in that the cache space of a current edge cache node is full, and the method comprises the following steps:
S1, an edge cache node receives a new transient data item request sent by a user;
S2, judging whether the requested transient data item content is in the cache of the edge cache node; if so, going to step S3, otherwise going to step S6;
S3, judging whether the requested transient data item is fresh data or expired data; if fresh, going to step S4; if expired, going to step S5;
S4, directly reading the data from the cache region of the edge cache node and forwarding it to the user;
S5, the edge cache node forwards the user request to a data source, reads new data from the data source, replaces the expired data in the cache region of the edge cache node with the new data, and forwards the new data to the user;
S6, the edge cache node forwards the user request to a data source, reads new data from the data source, selects the data to be replaced in the cache region of the edge cache node by using deep reinforcement learning, replaces the selected data with the new data, and forwards the new data to the user;
in step S6, selecting the data to be replaced in the cache region of the edge cache node using deep reinforcement learning specifically comprises:
1) at time n, observing the state information of the edge cache node to obtain the state s_n at time n;
2) selecting a caching action a_n according to the caching policy π(a_n|s_n) and executing the caching action;
3) after executing caching action a_n, calculating the instant reward r_n; the edge cache node state changes from s_n to s_{n+1};
4) feeding the instant reward r_n back to the edge cache node, taking the state transition ⟨s_n, a_n, r_n, s_{n+1}⟩ as a training sample for training the actor-critic networks of the deep reinforcement learning, and repeating the above process;
the actor network parameter θ in the deep reinforcement learning is updated by gradient ascent as:
θ ← θ + λ·∇_θ log π(a_n|s_n; θ)·A(s_n, a_n)
A(s_n, a_n) = r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v)
where λ is the learning rate of the actor network, ∇_θ is the gradient operator, the policy π(a_n|s_n; θ) is the probability of selecting cache replacement action a_n in state s_n, A(s_n, a_n) is the advantage function, γ ∈ [0, 1] is the discount factor, and V(s; θ_v) is the state-value function; the critic network parameter θ_v in the deep reinforcement learning is updated by gradient descent as:
θ_v ← θ_v − λ′·∇_{θ_v}(r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v))²
where λ′ is the learning rate of the critic network;
or, the actor network parameter θ in the deep reinforcement learning is updated by gradient ascent as:
θ ← θ + λ·[∇_θ log π(a_n|s_n; θ)·A(s_n, a_n) + β·∇_θ H(π(·|s_n; θ))]
A(s_n, a_n) = r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v)
H(π(·|s_n; θ)) = −Σ_{a∈A} π(a|s_n; θ)·log π(a|s_n; θ)
where λ is the learning rate of the actor network, ∇_θ is the gradient operator, the policy π(a_n|s_n; θ) is the probability of selecting cache replacement action a_n in state s_n, A(s_n, a_n) is the advantage function, γ ∈ [0, 1] is the discount factor, V(s; θ_v) is the state-value function, H(π(·|s_n; θ)) is the policy entropy of the action space output by policy π_θ in state s_n, and β is the exploration coefficient; the critic network parameter θ_v in the deep reinforcement learning is updated by gradient descent as:
θ_v ← θ_v − λ′·∇_{θ_v}(r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v))²
where λ′ is the learning rate of the critic network.
2. The cache replacement method according to claim 1, wherein in step S2, if f_k ∈ F_k, the requested data item content is in the cache of the edge cache node, and if f_k ∉ F_k, the requested data item content is not in the cache of the edge cache node, where f_k is the data content unique identifier CID corresponding to request k, and F_k is the set of CIDs of the data items cached in the edge cache node when request k arrives.
3. The cache replacement method according to claim 1, wherein in step S3, if t_age(p(f_k)) ≤ T_life(p(f_k)), the requested data item is fresh data, and if t_age(p(f_k)) > T_life(p(f_k)), the requested data item is expired data, where f_k is the CID corresponding to request k, p(·) is the mapping function from a requested CID to the cached data item, T_life(·) is the effective lifecycle of a data item, and t_age(·) is the age of a data item.
4. The cache replacement method according to claim 1, wherein the instant reward r_n is calculated as follows:
r_n = −Σ_{k∈Req_n} C(d_k)
where Req_n is the set of all data requests received by the edge cache node between the execution of caching action a_n and the execution of the next caching action a_{n+1}, and C(d_k) is the comprehensive cost of obtaining data d_k.
5. The cache replacement method according to claim 4, wherein the comprehensive cost C(d_k) is calculated as follows:
C(d_k) = α·c(d_k) + (1−α)·l(d_k)
c(d_k) = c_1, if f_k ∈ F_k and t_age(p(f_k)) ≤ T_life(p(f_k)); c_2, otherwise
l(d_k) = t_age(p(f_k))/T_life(p(f_k)), if f_k ∈ F_k and t_age(p(f_k)) ≤ T_life(p(f_k)); 0, otherwise
where α ∈ [0, 1] is a trade-off coefficient, c(d_k) is the communication cost, l(d_k) is the data timeliness cost, c_1 is the communication overhead of obtaining data directly from the edge cache node, c_2 is the communication overhead of obtaining data from the data source, c_1 < c_2, and both c_1 and c_2 are positive constants; f_k is the CID corresponding to request k, F_k is the set of CIDs of the data items cached in the edge cache node when request k arrives, p(·) is the mapping function from a requested CID to the cached data item, T_life(·) is the effective lifecycle of a data item, and t_age(·) is the age of a data item.
6. A cache replacement system for transient data of the Internet of things is characterized in that the cache space of a current edge cache node is full, and the system comprises: the device comprises a state judgment module, a reading module, a request forwarding module and a cache replacement module;
the state judgment module is configured to judge the state of the transient data requested by a user, where the states include: state one: the requested transient data item content is in the cache of the edge cache node and the requested transient data item is fresh data; state two: the requested transient data item content is in the cache of the edge cache node and the requested transient data item is expired data; state three: the requested transient data item content is not in the cache of the edge cache node;
the reading module is used for directly reading the data from the cache region of the edge cache node and forwarding the data to the user when the judgment result of the state judgment module is state one;
the request forwarding module is used for the edge cache node to forward the user request to the data source and read new data from the data source when the judgment result of the state judgment module is state two or state three;
the cache replacement module is configured to, when the judgment result of the state judgment module is state two, replace the expired data in the cache region of the edge cache node with the new data read by the request forwarding module and forward the new data to the user; and, when the judgment result of the state judgment module is state three, select the data to be replaced in the cache region of the edge cache node by using deep reinforcement learning, replace the selected data with the new data read by the request forwarding module, and forward the new data to the user;
the selecting of the data to be replaced in the cache region of the edge cache node by using the deep reinforcement learning specifically includes:
1) at time n, observing the state information of the edge cache node to obtain the state s_n at time n;
2) selecting a caching action a_n according to the caching policy π(a_n|s_n) and executing the caching action;
3) after executing caching action a_n, calculating the instant reward r_n; the edge cache node state changes from s_n to s_{n+1};
4) feeding the instant reward r_n back to the edge cache node, taking the state transition ⟨s_n, a_n, r_n, s_{n+1}⟩ as a training sample for training the actor-critic networks of the deep reinforcement learning, and repeating the above process;
the actor network parameter θ in the deep reinforcement learning is updated by gradient ascent as:
θ ← θ + λ·∇_θ log π(a_n|s_n; θ)·A(s_n, a_n)
A(s_n, a_n) = r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v)
where λ is the learning rate of the actor network, ∇_θ is the gradient operator, the policy π(a_n|s_n; θ) is the probability of selecting cache replacement action a_n in state s_n, A(s_n, a_n) is the advantage function, γ ∈ [0, 1] is the discount factor, and V(s; θ_v) is the state-value function; the critic network parameter θ_v in the deep reinforcement learning is updated by gradient descent as:
θ_v ← θ_v − λ′·∇_{θ_v}(r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v))²
where λ′ is the learning rate of the critic network;
or, the actor network parameter θ in the deep reinforcement learning is updated by gradient ascent as:
θ ← θ + λ·[∇_θ log π(a_n|s_n; θ)·A(s_n, a_n) + β·∇_θ H(π(·|s_n; θ))]
A(s_n, a_n) = r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v)
H(π(·|s_n; θ)) = −Σ_{a∈A} π(a|s_n; θ)·log π(a|s_n; θ)
where λ is the learning rate of the actor network, ∇_θ is the gradient operator, the policy π(a_n|s_n; θ) is the probability of selecting cache replacement action a_n in state s_n, A(s_n, a_n) is the advantage function, γ ∈ [0, 1] is the discount factor, V(s; θ_v) is the state-value function, H(π(·|s_n; θ)) is the policy entropy of the action space output by policy π_θ in state s_n, and β is the exploration coefficient; the critic network parameter θ_v in the deep reinforcement learning is updated by gradient descent as:
θ_v ← θ_v − λ′·∇_{θ_v}(r_n + γ·V(s_{n+1}; θ_v) − V(s_n; θ_v))²
where λ′ is the learning rate of the critic network.
7. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, implements the cache replacement method of any one of claims 1 to 5.
CN201811370683.4A 2018-11-17 2018-11-17 Cache replacement method and system for transient data of Internet of things Active CN109660598B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811370683.4A CN109660598B (en) 2018-11-17 2018-11-17 Cache replacement method and system for transient data of Internet of things


Publications (2)

Publication Number Publication Date
CN109660598A CN109660598A (en) 2019-04-19
CN109660598B (en) 2020-05-19

Family

ID=66111253

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811370683.4A Active CN109660598B (en) 2018-11-17 2018-11-17 Cache replacement method and system for transient data of Internet of things

Country Status (1)

Country Link
CN (1) CN109660598B (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant