CN109086224B - Caching method for capturing hot data by adaptive classification of reuse distance - Google Patents


Info

Publication number
CN109086224B
Authority
CN
China
Prior art keywords
stack
data
cache
reuse distance
hit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810748365.0A
Other languages
Chinese (zh)
Other versions
CN109086224A (en)
Inventor
邓玉辉
艾亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan University
Original Assignee
Jinan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan University
Priority to CN201810748365.0A
Publication of CN109086224A
Application granted
Publication of CN109086224B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention discloses a caching method that captures hot data by adaptively classifying reuse distances, aimed at the cache-policy problem in the field of computer storage. Most conventional cache policies consider only two characteristics reflected by the data, recency or frequency, and cannot capture the deeper pattern features of data accesses; in addition, such algorithms cannot adaptively adjust to changes in the captured data characteristics, which may result in a low data hit rate and poor stability of the policy. To address this, the proposed caching method mines the characteristics of the data access pattern in depth by exploiting the reuse-distance property of the data together with the historical metadata of replaced cache entries, and handles hot-data identification and cache-time allocation adaptively, thereby improving the hit rate of the algorithm.

Description

Caching method for capturing hot data by adaptive classification of reuse distance
Technical Field
The invention relates to the technical field of storage systems, and in particular to a caching method that captures hot data by adaptively classifying reuse distances (Reuse-distance).
Background
Caching technology is not limited to the CPU cache. Since modern computer systems adopt a memory hierarchy, the CPU register file, caches, main memory, SSD and HDD storage devices, and even remote secondary storage such as a distributed file system can each be understood as a cache for the next, larger and slower, storage layer, so every storage device faces a caching problem. The cache replacement algorithm is the core of caching research and bears directly on computer storage systems, operating systems, file systems, databases, web servers, and various other applications such as hot/cold data identification and data compression.
The design of conventional cache replacement algorithms mainly exploits the cache locality principle (i.e., the tendency of a program to access data and code in a local area) and the frequency of data accesses. However, algorithms designed using only the recency or frequency reflected by the data, such as LRU, LFU, 2Q, and MQ, although simple, easy to implement, and widely applicable, have three shortcomings: first, recency and frequency alone cannot fully reflect the pattern characteristics of data access; second, data access patterns are dynamic, and these algorithms cannot adaptively adjust to changes in the captured data characteristics; finally, improvements based on these two features have largely matured (the LRU algorithm was already in wide use in the early 1980s), so major breakthroughs are difficult.
More recent, superior cache replacement algorithms such as LIRS, ARC, and CAR address some of these issues: the LIRS algorithm predicts the future access sequence of data by exploiting its reuse-distance property, and the ARC algorithm is adaptive, adjusting itself according to changes in the data access pattern. Despite these advantages, problems remain, such as the inability to capture hot data sets larger than the cache and susceptibility to cache pollution.
Therefore, in today's large-scale I/O data-stream environments, designing a novel caching scheme that effectively identifies hot data, raises the low data hit rate of conventional caching techniques, and keeps the memory and computation overhead of the storage system low is important for improving the storage performance of a computer.
Disclosure of Invention
The invention aims to overcome the above defects in the prior art and to provide a caching method for capturing hot data by adaptive classification of reuse distance.
The purpose of the invention can be achieved by adopting the following technical scheme:
a caching method for capturing hot data by self-adaptive classification reuse distance comprises the steps of cache region division, load data stream data type identification and cache allocation of different types of data, wherein:
the cache region is divided as follows:
(a) The whole cache is provided with two types of cache stacks, a real cache stack (Real Cache) and a ghost cache stack (Ghost Cache). The real cache stack stores the real data together with its metadata, while the ghost cache stack stores only metadata. The data structure of the real cache stack is an LRU (Least Recently Used) stack, and the ghost cache stack is a first-in first-out queue;
(b) The whole cache is divided into three regions according to the characteristics of recency, frequency, and reuse distance: a potential hot data region R, a short reuse distance hot data region S, and a long reuse distance hot data region L;
(c) The potential hot data region R stores data that has recently been accessed exactly once; according to the size of the reuse distance, it is further divided into a short reuse distance stack R1 and a long reuse distance stack R2, where R1 is a real cache stack and R2 is a ghost cache stack. The short reuse distance hot data region S and the long reuse distance hot data region L are both hot data regions, storing data accessed at least twice recently, and each has one real cache stack and one ghost cache stack. That is, S is divided into an S1 stack and an S2 stack, where the S1 stack stores the short reuse distance hot data and is the real cache stack, and the S2 stack stores the cold data evicted from the S1 stack and is the ghost cache stack. Similarly, L is divided into an L1 stack and an L2 stack with the same structure as S: the L1 stack stores the long reuse distance hot data and is the real cache stack, and the L2 stack stores the cold data evicted from the L1 stack and is the ghost cache stack.
(d) In all of the above partitions, assuming that the cache holds C data items, the region sizes must satisfy the following constraints:
(1) |R1| ≤ mC, m ∈ (0.1, 1);
(2) |R1| + |R2| ≤ C, |S1| + |S2| ≤ C, |L1| + |L2| ≤ C;
(3) |R1| + |S1| + |L1| ≤ C;
(4) |R| + |S| + |L| ≤ 3C;
Note that the |·| symbol denotes the number of data items in a cache region. In this embodiment the upper limit of the R1 stack size is set to 0.2C, but in practical applications m = 0.2 does not limit the technical solution of the present invention.
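For illustration only, the following Python sketch models this layout under the stated constraints; all names here (CRDCache, fetch, and so on) are our own and are not identifiers from the patented disclosure. Each real stack is an OrderedDict used as an LRU stack, and each ghost stack is a deque used as a first-in first-out queue of metadata (keys only). Later sketches add methods to this class.

    from collections import OrderedDict, deque

    class CRDCache:
        """Illustrative layout sketch: three regions R, S, L, each split into a
        real cache stack (data + metadata, managed as an LRU stack) and a
        ghost cache stack (metadata only, managed first-in first-out)."""

        def __init__(self, c, m=0.2):
            self.C = c                  # real cache capacity in data items
            self.r1_limit = int(m * c)  # |R1| <= mC with m in (0.1, 1)
            self.R1 = OrderedDict()     # real: data seen once, short reuse distance
            self.S1 = OrderedDict()     # real: hot data, short reuse distance
            self.L1 = OrderedDict()     # real: hot data, long reuse distance
            self.R2 = deque()           # ghost: keys evicted from R1
            self.S2 = deque()           # ghost: keys evicted from S1
            self.L2 = deque()           # ghost: keys evicted from L1
            self.p = 0                  # matching value for R1's share of the cache
            self.q = 0                  # matching value for S1's share of the cache

    def fetch(key):
        """Stand-in for reading a block from the backing store (an assumption)."""
        return object()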
The identification of the data types in the load data stream and the cache allocation for the different data types comprise the following steps:
S1. Initially, the real cache stacks and the ghost cache stacks are all empty. For each read or write request in the load data stream, exactly one of three cases occurs: A. a real cache stack hit; B. a ghost cache stack hit; C. a miss in both the real and the ghost cache stacks. When A holds, execute step S2; when B holds, execute step S3; when C holds, execute step S4;
S2. On a real cache stack hit, if the hit occurs in the real cache stack of the potential hot data region, migrate the data to the MRU (Most Recently Used) position of the real cache stack of the short reuse distance hot data region; if the hit occurs in the real cache stack of the short or the long reuse distance hot data region, move the data to the MRU position of that stack;
Further, when a real cache stack hit occurs, the data has by then been accessed at least twice recently, so it belongs in the hot data S1 stack or L1 stack. Specifically, on an R1 stack hit, the data is migrated from the R1 stack to the MRU position of the S1 stack; on an S1 or L1 stack hit, the data is already in a hot data stack and only needs to be moved to the MRU position of its own stack;
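A minimal sketch of this hit path as a method of the hypothetical CRDCache above (no fetch is needed, since the data is already cached); the MRU position corresponds to the end of each OrderedDict:

    def on_real_hit(self, key):
        if key in self.R1:
            # Second recent access: promote from R1 into the short reuse
            # distance hot stack S1, at the MRU position.
            self.S1[key] = self.R1.pop(key)
        elif key in self.S1:
            self.S1.move_to_end(key)    # already hot: refresh MRU position
        elif key in self.L1:
            self.L1.move_to_end(key)    # already hot: refresh MRU position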
S3. On a ghost cache stack hit, three steps are executed: 1. adjust the matching value of each real cache size according to the current trend of data accesses; 2. if the real cache is full, the algorithm decides according to the matching values which real cache page to evict; 3. if the hit occurs in the ghost cache stack of the potential hot data region, store the data at the MRU position of the real cache stack of the long reuse distance hot data region; if the hit occurs in the ghost cache stack of the short reuse distance hot data region, store the data at the MRU position of the real cache stack of the short reuse distance hot data region; if the hit occurs in the ghost cache stack of the long reuse distance hot data region, store the data at the MRU position of the real cache stack of the long reuse distance hot data region;
Further, when a ghost cache stack hit occurs, three steps are performed: 1) adjust the matching value of each real cache size according to the current trend of data accesses; 2) if the real cache is full, the algorithm decides according to the matching values which real cache page to evict; 3) finally, on an R2 stack hit, the data is inserted at the MRU end of the L1 stack; on an S2 stack hit, the data is inserted at the MRU end of the S1 stack; on an L2 stack hit, the data is inserted at the MRU end of the L1 stack. In every case the hit ghost cache page is deleted.
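A sketch of this ghost-hit path in the same hypothetical CRDCache. The helpers adjust_matching and evict_one_real_page are sketched later, together with the adjustment formulas and the eviction rule, and fetch is the backing-store stub from the first sketch; none of these names come from the patent:

    def on_ghost_hit(self, key):
        # 1) Adapt the matching values p and q to the current access trend.
        self.adjust_matching(key)
        # 2) If the real cache is full, evict a real page per the matching values.
        if len(self.R1) + len(self.S1) + len(self.L1) >= self.C:
            self.evict_one_real_page()
        # 3) Re-admit the data at the MRU end of the target real stack and
        #    delete the hit ghost page: R2 and L2 hits go to L1, S2 hits to S1.
        if key in self.R2:
            self.R2.remove(key)
            self.L1[key] = fetch(key)
        elif key in self.S2:
            self.S2.remove(key)
            self.S1[key] = fetch(key)
        elif key in self.L2:
            self.L2.remove(key)
            self.L1[key] = fetch(key)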
S4. When both the real cache stacks and the ghost cache stacks miss, if the potential hot data cache region is not full, store the data at the MRU position of the R1 stack; otherwise one of the following three cases occurs: case 1, the potential hot data region R, the short reuse distance hot data region S, or the long reuse distance hot data region L is full; case 2, the real cache is full; case 3, neither of the above occurs, in which case the algorithm directly migrates the data at the LRU end of the real cache stack of the potential hot data region to the MRU end of the ghost cache stack of the potential hot data region, and then stores the new data at the LRU end of the real cache stack of the potential hot data region.
Further, on a miss in both the real and the ghost cache stacks, if the R1 stack is not full, the data is stored at the MRU position of the R1 stack; otherwise one of three cases occurs. In case 1, the potential hot data region R, the short reuse distance hot data region S, or the long reuse distance hot data region L is full: the LRU page of the full ghost cache is deleted, the eviction function is called to select a real cache page to evict, and finally the data is stored at the MRU position of the R1 stack. In case 2, the real cache of the whole cache is full: the eviction function is called directly to select a real cache page to delete, and the data is then stored at the MRU position of the R1 stack. In case 3, neither of the above occurs: the data at the LRU end of the R1 stack is migrated to the MRU end of the R2 stack, and the new data is then stored at the LRU end of the R1 stack.
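The miss path of step S4 might look as follows in the same sketch. The region-full test applies constraint (2) from the division rules, and placing the brand-new item at R1's LRU end in case 3 follows the text above; fetch remains the hypothetical backing-store stub:

    def on_miss(self, key):
        if len(self.R1) < self.r1_limit:            # potential region not full
            self.R1[key] = fetch(key)               # MRU position of R1
            return
        pairs = ((self.R1, self.R2), (self.S1, self.S2), (self.L1, self.L2))
        for real, ghost in pairs:                   # case 1: some region is full
            if len(real) + len(ghost) >= self.C:
                if ghost:
                    ghost.popleft()                 # drop LRU page of the full ghost
                self.evict_one_real_page()
                self.R1[key] = fetch(key)           # MRU position of R1
                return
        if len(self.R1) + len(self.S1) + len(self.L1) >= self.C:
            self.evict_one_real_page()              # case 2: real cache full
            self.R1[key] = fetch(key)
            return
        old, _ = self.R1.popitem(last=False)        # case 3: demote R1's LRU item
        self.R2.append(old)                         # to the MRU end of ghost R2
        self.R1[key] = fetch(key)
        self.R1.move_to_end(key, last=False)        # new item sits at R1's LRU end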
Further, the process in step S3 of adjusting the matching value of each real cache size according to the current trend of data accesses is specifically as follows:
Two tunable parameters p and q are designed, where parameter p denotes the cache-size matching value obtained by the real cache stack of the potential hot data region under the current load, and parameter q denotes the cache-size matching value obtained by the real cache stack of the short reuse distance hot data region under the current load;
For parameter p, the value is increased on a ghost cache stack hit in the potential hot data region, and decreased on a ghost cache stack hit in the short or the long reuse distance hot data region;
For parameter q, the value is increased on a ghost cache stack hit in the short reuse distance hot data region, and decreased on a ghost cache stack hit in the long reuse distance hot data region.
Further, the processing by which steps S3 and S4 select a real cache page for eviction according to the values of the tunable parameters p and q is specifically as follows:
The algorithm compares the two parameter values in turn with the size of the real cache stack of the short reuse distance hot data region and the size of the real cache stack of the long reuse distance hot data region, and then evicts a cache page from a real cache stack whose size exceeds its corresponding matching value.
Further, the processing for case 1 of a miss in step S4, when the potential hot data region, the short reuse distance hot data region, or the long reuse distance hot data region is full, is specifically as follows:
The LRU page of the full ghost cache is deleted, a real cache page is selected for eviction according to the values of the tunable parameters p and q, and finally the data is stored at the MRU position of the real cache stack of the potential hot data region.
Further, the processing for case 2 of a miss in step S4, when the real cache is full, is specifically as follows:
A real cache page is selected for deletion according to the values of the tunable parameters p and q, and the data is then stored at the MRU position of the real cache stack of the potential hot data region.
compared with the prior art, the invention has the following advantages and effects:
(1) The performance of the disclosed caching method is clearly superior to the LRU algorithm and exceeds the existing LIRS and ARC algorithms in many scenarios. At the same time, the hit rate of the caching method remains stable across different cache sizes.
(2) The disclosed caching method handles the identification of hot data and the allocation of cache time to different data types adaptively, so its hit rate remains stable, its implementation is comparatively simple, and it requires little manual parameter tuning and policy maintenance.
(3) The disclosed caching method effectively mitigates the problems caused by cache pollution while keeping the storage and computation overhead within an acceptable range, and it can recognize the access characteristics of data with long reuse distances that common algorithms fail to identify.
Drawings
Fig. 1 is a structural diagram of the algorithm of the caching method for capturing hot data by adaptive classification of reuse distance according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Examples
This embodiment implements a caching method that captures hot data by adaptively classifying reuse distances. Conventional caching strategies mainly exploit the cache locality principle (i.e., the tendency of a program to access data and code in a local area) and the frequency of data accesses. However, schemes that use only the recency or frequency reflected by the data, although simple, easy to implement, and widely applicable, cannot comprehensively reflect the pattern characteristics of data access by considering those two indexes alone; moreover, data access patterns are dynamic, so such schemes cannot adaptively adjust to changes in the captured data characteristics. These factors explain the low data hit rate and limited general applicability of traditional caching strategies. In view of these problems and drawbacks, this embodiment provides a caching method based on adaptive classification of reuse distances. The method mines the characteristics of the data access pattern in depth by exploiting the reuse-distance property of the data and the historical metadata of cache replacements, and performs hot data identification and cache-time allocation adaptively, thereby improving the hit rate of cached data.
Referring to Fig. 1, a caching method for capturing hot data by adaptive classification of reuse distance comprises cache region division, identification of the data types in the load data stream, and cache allocation for the different data types, wherein:
dividing a cache region:
(a) The whole cache is provided with two types of cache stacks, a real cache stack (Real Cache) and a ghost cache stack (Ghost Cache). The real cache stack stores the real data together with its metadata, while the ghost cache stack stores only metadata. The data structure of the real cache stack is an LRU stack, and the ghost cache stack is a first-in first-out queue;
(b) The whole cache is divided into three regions according to the characteristics of recency, frequency, and reuse distance: a potential hot data region R, a short reuse distance hot data region S, and a long reuse distance hot data region L;
(c) The potential hot data region R stores data that has recently been accessed exactly once; according to the size of the reuse distance, it is further divided into a short reuse distance stack R1 and a long reuse distance stack R2, where R1 is a real cache stack and R2 is a ghost cache stack. The short reuse distance hot data region S and the long reuse distance hot data region L are both hot data regions, storing data accessed at least twice recently, and each has one real cache stack and one ghost cache stack. That is, S is divided into an S1 stack and an S2 stack, where the S1 stack stores the short reuse distance hot data and is the real cache stack, and the S2 stack stores the cold data evicted from the S1 stack and is the ghost cache stack. Similarly, L is divided into an L1 stack and an L2 stack with the same structure as S: the L1 stack stores the long reuse distance hot data and is the real cache stack, and the L2 stack stores the cold data evicted from the L1 stack and is the ghost cache stack.
(d) In all of the above partitions, assuming that the cache holds C data items, the region sizes must satisfy the following constraints:
(1) |R1| ≤ mC, m ∈ (0.1, 1);
(2) |R1| + |R2| ≤ C, |S1| + |S2| ≤ C, |L1| + |L2| ≤ C;
(3) |R1| + |S1| + |L1| ≤ C;
(4) |R| + |S| + |L| ≤ 3C;
Note that the |·| symbol denotes the number of data items in a cache region. In this embodiment the upper limit of the R1 stack size is set to 0.2C, but in practical applications m = 0.2 does not limit the technical solution of the present invention.
Identifying the data types in the load data stream and performing cache allocation according to the data type comprise the following steps:
S1. Initially, the real cache stacks and the ghost cache stacks are all empty. For each read or write request in the load data stream, exactly one of three cases occurs: A. a real cache stack hit; B. a ghost cache stack hit; C. a miss in both the real and the ghost cache stacks. When A holds, execute step S2; when B holds, execute step S3; when C holds, execute step S4;
S2. When a real cache stack hit occurs, the data has been accessed at least twice recently, so it should be stored in the hot data region's S1 stack or L1 stack. Specifically, on an R1 hit, the data is migrated from the R1 stack to the MRU position of the S1 stack; on an S1 or L1 stack hit, the data is already in a hot data stack and only needs to be moved to the MRU position of its own stack;
S3. On a ghost cache stack hit, three steps are executed: 1) adjust the matching value of each real cache size according to the current trend of data accesses; 2) if the real cache is full, an eviction function designed by the algorithm decides, according to the matching values of the current real cache sizes, which real cache page to evict; 3) finally, on an R2 hit, the data is inserted at the MRU end of the L1 stack; on an S2 hit, the data is inserted at the MRU end of the S1 stack; on an L2 hit, the data is inserted at the MRU end of the L1 stack. In every case the hit ghost cache page is deleted.
S4. When both the real cache stack and the ghost cache stack miss, if the R1 stack is not full, the data is stored at the MRU position of the R1 stack; otherwise one of three cases occurs. In case 1, R, S, or L is full: the algorithm deletes the LRU page of the full ghost cache, then calls the eviction function to select a real cache page to evict, and finally stores the data at the MRU position of the R1 stack. In case 2, the real cache of the whole cache is full: the algorithm directly calls the eviction function to select a real cache page to delete, and then stores the data at the MRU position of the R1 stack. In case 3, neither of the above occurs: the algorithm directly migrates the data at the LRU end of the R1 stack to the MRU end of the R2 stack, and then stores the data at the LRU end of the R1 stack.
In step S3, the process of adjusting the matching value of each real cache size according to the current trend of data accesses is specifically as follows:
Two tunable parameters p and q are designed, where p denotes the cache-size matching value obtained by the R1 stack under the current load, and q denotes the cache-size matching value obtained by the S1 stack under the current load;
On an R2 stack hit, the current cache size of the R1 stack is considered insufficient, and the algorithm increases p; on an S2 or L2 stack hit, the current cache of the corresponding S1 or L1 stack is considered too small, and the algorithm decreases p. Furthermore, on an S2 stack hit the algorithm also takes the current access trend to favor short reuse distance data and therefore increases q; similarly, on an L2 stack hit the algorithm decreases q. The specific adjustment formulas for p and q are as follows:
On an R2 stack hit:
(adjustment formula for p, available only as an image in the original publication)
On an S2 stack hit:
(adjustment formulas for p and q, available only as images in the original publication)
On an L2 stack hit:
(adjustment formulas for p and q, available only as images in the original publication)
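Since the formulas themselves survive only as images in the published text, they cannot be reproduced here. Purely as an illustrative assumption, an ARC-style update with step sizes proportional to the relative ghost-stack sizes would match the qualitative behaviour described above; the following sketch, as a method of the hypothetical CRDCache, is our guess and not the patented formulas:

    def adjust_matching(self, key):
        # Assumed ARC-style updates; the patent's actual formulas are only
        # published as images and may differ.
        if key in self.R2:      # R1 undersized: grow p
            step = max(1, len(self.S2) // max(1, len(self.R2)))
            self.p = min(self.p + step, self.C)
        elif key in self.S2:    # trend toward short reuse distances: grow q, shrink p
            step = max(1, len(self.R2) // max(1, len(self.S2)))
            self.p = max(self.p - step, 0)
            self.q = min(self.q + step, self.C)
        elif key in self.L2:    # trend toward long reuse distances: shrink p and q
            step = max(1, len(self.R2) // max(1, len(self.L2)))
            self.p = max(self.p - step, 0)
            self.q = max(self.q - step, 0)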
in this embodiment, the processing method for selecting an actual cache page for eviction by the eviction function in step S3) and step S4 is specifically as follows;
the algorithm compares two parameter values p and q with S corresponding to the two parameter values in sequence 1 Stack and S 2 Stack size and then evict cache pages whose actual cache is larger than their corresponding matching parameter values.
The above embodiment is a preferred embodiment of the present invention, but the present invention is not limited thereto; any change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention shall be regarded as an equivalent replacement and falls within the protection scope of the present invention.

Claims (5)

1. A caching method for capturing hot data by adaptive classification of reuse distance, characterized by comprising cache region division, identification of the data types in the load data stream, and cache allocation for different types of data, wherein the cache region division is specifically as follows:
dividing the whole cache into three regions according to the characteristics of recency, frequency, and reuse distance: a potential hot data region R, a short reuse distance hot data region S, and a long reuse distance hot data region L; the whole cache is provided with two types of cache stacks, a real cache stack and a ghost cache stack, where the real cache stack stores real data and metadata while the ghost cache stack stores only metadata, the data structure of the real cache stack is an LRU stack, and the ghost cache stack is a first-in first-out queue;
the potential hot data region R stores data that has recently been accessed exactly once and is further divided, according to the size of the reuse distance, into a short reuse distance stack R1 and a long reuse distance stack R2, where the short reuse distance stack R1 is a real cache stack and the long reuse distance stack R2 is a ghost cache stack;
the short reuse distance hot data region S and the long reuse distance hot data region L are hot data regions that store data accessed at least twice recently; the short reuse distance hot data region S is divided into an S1 stack and an S2 stack, where the S1 stack stores the short reuse distance hot data and is the real cache stack, and the S2 stack stores the cold data evicted from the S1 stack and is the ghost cache stack; the long reuse distance hot data region L is divided into an L1 stack and an L2 stack, where the L1 stack stores the long reuse distance hot data and is the real cache stack, and the L2 stack stores the cold data evicted from the L1 stack and is the ghost cache stack;
assuming that the cache holds C data items, the region sizes must satisfy the following constraints:
(1) |R1| ≤ mC, m ∈ (0.1, 1);
(2) |R1| + |R2| ≤ C, |S1| + |S2| ≤ C, |L1| + |L2| ≤ C;
(3) |R1| + |S1| + |L1| ≤ C;
(4) |R| + |S| + |L| ≤ 3C;
where the |·| symbol denotes the number of data items in a cache region;
the identification of the data types in the load data stream and the cache allocation for different types of data comprise the following steps:
S1. initially, the real cache stacks and the ghost cache stacks are all empty, and for each read or write request in the load data stream exactly one of three cases occurs: A. a real cache stack hit; B. a ghost cache stack hit; C. a miss in both the real and the ghost cache stacks; for these three cases, when A holds, execute step S2; when B holds, execute step S3; when C holds, execute step S4;
S2. when a real cache stack hit occurs, the data should be stored into the hot data S1 stack or L1 stack: on an R1 stack hit, the data is migrated from the R1 stack to the MRU position of the S1 stack; on an S1 or L1 stack hit, since the data is already in a hot data stack, the data is moved to the MRU position of its own stack;
S3. when a ghost cache stack hit occurs, three steps are executed: 1) adjust the matching value of each real cache size according to the current trend of data accesses; 2) if the real cache is full, decide according to the matching values of the current real cache sizes which real cache page to evict; 3) finally, on an R2 stack hit, the data is inserted at the MRU end of the L1 stack; on an S2 stack hit, the data is inserted at the MRU end of the S1 stack; on an L2 stack hit, the data is inserted at the MRU end of the L1 stack; in every case the hit ghost cache page is deleted;
s4, when the actual cache stack and the ghost cache stack are not hit, if the potential hot data cache region is not full, the data are stored in the R 1 MRU position of stack; otherwise it will go outNow, the following three cases, case 1 is that the potential hot data region R or the short reuse distance hot data region S or the long reuse distance hot data region L is full; case 2 is actually cache full; in case 3, when the above two cases do not occur, the data at the LRU end of the real buffer stack of the potential hot data area is directly migrated to the MRU end of the ghost buffer stack of the potential hot data area, and then the data is stored at the LRU end of the real buffer stack of the potential hot data area.
2. The method according to claim 1, wherein the process in step S3 of adjusting the matching value of each real cache size according to the current trend of data accesses comprises the following steps:
two tunable parameters p and q are designed, where p denotes the cache-size matching value obtained by the R1 stack under the current load, and q denotes the cache-size matching value obtained by the S1 stack under the current load;
on an R2 stack hit, the current cache size of the R1 stack is considered insufficient and p is increased; on an S2 or L2 stack hit, the current cache of the corresponding S1 or L1 stack is considered too small and p is decreased; furthermore, on an S2 stack hit the current access trend is taken to favor short reuse distance data, so q is increased, and similarly, on an L2 stack hit q is decreased; the adjustment formulas for p and q are as follows:
on an R2 stack hit:
(adjustment formula for p, available only as an image in the original publication)
on an S2 stack hit:
(adjustment formulas for p and q, available only as images in the original publication)
on an L2 stack hit:
(adjustment formulas for p and q, available only as images in the original publication)
3. The method as claimed in claim 2, wherein in step 2) of step S3, when the real cache is full, the decision according to the matching values of the current real cache sizes of which real cache page to evict is made as follows:
the two parameter values p and q are compared in turn with the sizes of the corresponding S1 and L1 stacks, and a cache page is then evicted from a real cache stack whose size exceeds its corresponding matching parameter value.
4. The caching method for capturing hot data according to claim 3, wherein the processing in step S4 for case 1 of a miss, when the potential hot data region R, the short reuse distance hot data region S, or the long reuse distance hot data region L is full, is specifically as follows:
the LRU page of the full ghost cache is deleted, a real cache page is selected for eviction according to the matching values of the current real cache sizes using the method of claim 3, and the data is stored at the MRU position of the R1 stack.
5. The method according to claim 3, wherein the processing in step S4 for case 2 of a miss, when the real cache is full, is as follows:
the method of claim 3 is directly used to select a real cache page for eviction according to the matching values of the current real cache sizes, and the data is then stored at the MRU position of the R1 stack.
CN201810748365.0A 2018-07-10 2018-07-10 Caching method for capturing hot data by adaptive classification of reuse distance Active CN109086224B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810748365.0A CN109086224B (en) 2018-07-10 2018-07-10 Caching method for capturing hot data by adaptive classification of reuse distance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810748365.0A CN109086224B (en) 2018-07-10 2018-07-10 Caching method for capturing hot data by adaptive classification of reuse distance

Publications (2)

Publication Number Publication Date
CN109086224A CN109086224A (en) 2018-12-25
CN109086224B true CN109086224B (en) 2022-10-21

Family

ID=64837366

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810748365.0A Active CN109086224B (en) Caching method for capturing hot data by adaptive classification of reuse distance

Country Status (1)

Country Link
CN (1) CN109086224B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6282613B1 (en) * 1999-04-30 2001-08-28 International Business Machines Corporation Very efficient technique for dynamically tracking locality of a reference
CN104834608A (en) * 2015-05-12 2015-08-12 华中科技大学 Cache replacement method under heterogeneous memory environment
CN105446659A (en) * 2015-11-11 2016-03-30 暨南大学 Method for improving disk performance by compressing disk on-board cache
CN107463509A (en) * 2016-06-05 2017-12-12 华为技术有限公司 Buffer memory management method, cache controller and computer system
CN107870729A (en) * 2016-09-23 2018-04-03 伊姆西Ip控股有限责任公司 A method, apparatus and system for caching data

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6996676B2 (en) * 2002-11-14 2006-02-07 International Business Machines Corporation System and method for implementing an adaptive replacement cache policy
US7143240B2 (en) * 2003-10-31 2006-11-28 International Business Machines Corporation System and method for providing a cost-adaptive cache
KR102682253B1 (en) * 2016-11-29 2024-07-08 에스케이하이닉스 주식회사 Memory system and operation method for the same

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6282613B1 (en) * 1999-04-30 2001-08-28 International Business Machines Corporation Very efficient technique for dynamically tracking locality of a reference
CN104834608A (en) * 2015-05-12 2015-08-12 华中科技大学 Cache replacement method under heterogeneous memory environment
CN105446659A (en) * 2015-11-11 2016-03-30 暨南大学 Method for improving disk performance by compressing disk on-board cache
CN107463509A (en) * 2016-06-05 2017-12-12 华为技术有限公司 Buffer memory management method, cache controller and computer system
WO2017211247A1 (en) * 2016-06-05 2017-12-14 华为技术有限公司 Cache management method, cache controller, and computer system
CN107870729A (en) * 2016-09-23 2018-04-03 伊姆西Ip控股有限责任公司 A method, apparatus and system for caching data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Skewly replicating hot data to construct a power-efficient storage cluster; Lingwei Zhang et al.; Journal of Network and Computer Applications; 2015-04-30; vol. 50; pp. 168-179 *
A data update strategy for network-coding-based distributed storage systems; 刘冰星 et al.; Journal of Chinese Computer Systems; 2017-08-22; pp. 645-650 *

Also Published As

Publication number Publication date
CN109086224A (en) 2018-12-25

Similar Documents

Publication Publication Date Title
US7464246B2 (en) System and method for dynamic sizing of cache sequential list
EP3414665B1 (en) Profiling cache replacement
US11106600B2 (en) Cache replacement based on translation lookaside buffer evictions
CN108763110B (en) Data caching method and device
US9529724B2 (en) Layered architecture for hybrid controller
EP2542989B1 (en) Buffer pool extension for database server
US9298604B2 (en) Flash memory cache including for use with persistent key-value store
US20140115261A1 (en) Apparatus, system and method for managing a level-two cache of a storage appliance
US20220283955A1 (en) Data cache region prefetcher
WO2012030900A1 (en) Method and system for removing cache blocks
CN110119487B (en) Cache updating method suitable for divergent data
CN115562592A (en) Memory and disk hybrid caching method based on cloud object storage
CN109002400B (en) Content-aware computer cache management system and method
CN111506517B (en) Flash memory page level address mapping method and system based on access locality
US20130086325A1 (en) Dynamic cache system and method of formation
CN109086224B (en) Caching method for capturing hot data by adaptive classification of reuse distance
KR101976320B1 (en) Last level cache memory and data management method thereof
CN109857680B (en) LRU flash memory cache management method based on dynamic page weight
CN117270778A (en) Caching method, device and equipment of storage system
Golan Multilevel cache management based on application Hints

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant