CN112015678A - Log caching method and device - Google Patents

Log caching method and device

Info

Publication number
CN112015678A
Authority
CN
China
Prior art keywords
cache
log
page
cache page
xth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910460169.8A
Other languages
Chinese (zh)
Inventor
李玥
何小锋
刘海锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN201910460169.8A priority Critical patent/CN112015678A/en
Publication of CN112015678A publication Critical patent/CN112015678A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/12Replacement control
    • G06F12/121Replacement control using replacement algorithms

Abstract

The invention discloses a log caching method, which comprises the following steps: every n consecutive logs form a cache page, and a plurality of cache pages are stored in a cache in a storage structure that supports range search, where the index of each cache page is the position of its first log; when the xth log is requested, it is searched for in the storage structure; if the cache is missed, the cache page where the xth log is located is read from disk, added to the cache when the cache does not overflow, and the xth log is returned; if the cache overflows, cache cleaning is performed: cache pages whose last access time is within t seconds of the current time are divided into a hot area, and the other cache pages into a cold area; cache pages belonging to the cold area are preferentially removed from the cache, and if the cold area is empty, cache pages in the hot area are removed. A corresponding electronic device and computer-readable storage medium are also disclosed. By applying the disclosed technical scheme, the cache hit rate of the log can be improved.

Description

Log caching method and device
Technical Field
The present invention relates to the field of cache technologies, and in particular, to a log caching method and device.
Background
A log is a sequence of records that is append-only (existing records cannot be modified) and ordered by time. The log in the invention is not only the operation record log generated while a system or program runs, but also a general storage abstraction. Most logs also have tail-read and sequential-read characteristics, i.e., read accesses to the log mostly occur near the tail of the log and are sequential.
The log is stored on disk. Because memory has faster read-write performance than disk, memory is generally used as a cache to improve the read-write performance of the log, and a part of the log data is cached in memory. When an external application needs to access the log, it preferentially reads from the cache in memory; if the log data to be accessed exists in the cache, it is returned directly, which is called a cache hit; if the log data to be accessed does not exist in the cache, it is read from disk.
Obviously, the higher the cache hit rate, the better the log access performance. The method of deciding which data to keep in the cache so that the cache hit rate is as high as possible is called a cache policy.
Common cache policies include the least recently used (LRU) and least frequently used (LFU) page replacement policies. The LRU policy preferentially evicts the page that has not been used for the longest time; the LFU policy preferentially evicts the page that has been accessed the fewest times within a certain period.
Disclosure of Invention
The embodiment of the invention provides a log caching method and device, which are used for improving the cache hit rate of a log.
The embodiment of the invention discloses a log caching method, which comprises the following steps:
every n consecutive logs form a cache page, and a plurality of cache pages are stored in a cache in a storage structure that supports range search; in the storage structure, the index of each cache page is the position of its first log in the cache, wherein n is a set constant;
when the xth log is requested, searching for the xth log in the storage structure; if the cache is missed, reading the cache page where the xth log is located from a disk, adding the cache page to the cache, and returning the xth log, wherein x is a non-negative integer;
before adding the cache page to the cache, checking whether the cache overflows, and if the cache overflows, performing cache cleaning until the cache no longer overflows;
the performing cache cleaning comprises:
dividing cache pages whose last access time is within t seconds of the current time into a hot area, and dividing the other cache pages into a cold area, wherein t is a set constant;
preferentially removing cache pages belonging to the cold area from the cache, and if the cold area is empty, continuing to remove cache pages belonging to the hot area.
Preferably, the performing cache cleaning further includes:
calculating the weight of each cache page:
weight of the cache page = (log tail position - cache page position) × (-1)
and, within the same area, preferentially removing the cache page with the smallest weight according to the weights of the cache pages.
Preferably, the step of forming a cache page by every n consecutive logs specifically includes:
according to P_k = {L_{n*k}, L_{n*k+1}, …, L_{n*(k+1)-1}}, determining the logs contained in each cache page;
wherein P_k represents the kth cache page, and k is a non-negative integer;
L_{n*k} represents the (n*k)th log;
n is the number of logs each cache page contains.
Preferably, the step of searching for the xth log in the storage structure specifically includes:
searching, among all cache pages in the storage structure whose first log sequence number is smaller than or equal to x, for the cache page with the largest first log sequence number, and denoting the sequence number of the first log of the found cache page as y;
judging whether x satisfies the condition x < y + n; if satisfied, the cache is hit, otherwise the cache is missed.
Preferably, the method further comprises:
if the cache is hit, the hit cache page P_k is located in the hot area, and the xth log is located at the tail of the cache page P_k, asynchronously loading the next cache page P_{k+1}.
Preferably, judging whether the xth log is located at the tail of the cache page P_k specifically includes:
judging whether (position of the xth log - position of the cache page P_k) > cache page length × tail threshold holds;
if so, the xth log is located at the tail of the cache page P_k;
wherein the tail threshold is a predefined constant with a value range of (0, 1).
The embodiment of the invention also discloses an electronic device, which comprises a memory, a processor and a computer program which is stored on the memory and can be run on the processor, wherein the processor is specifically used for executing the following operations when executing the program:
every n consecutive logs form a cache page, and a plurality of cache pages are stored in a cache in a storage structure that supports range search; in the storage structure, the index of each cache page is the position of its first log in the cache, wherein n is a set constant;
when the xth log is requested, searching for the xth log in the storage structure; if the cache is missed, reading the cache page where the xth log is located from a disk, adding the cache page to the cache, and returning the xth log, wherein x is a non-negative integer;
before adding the cache page to the cache, checking whether the cache overflows, and if the cache overflows, performing cache cleaning until the cache no longer overflows;
the performing cache cleaning comprises:
dividing cache pages whose last access time is within t seconds of the current time into a hot area, and dividing the other cache pages into a cold area, wherein t is a set constant;
preferentially removing cache pages belonging to the cold area from the cache, and if the cold area is empty, continuing to remove cache pages belonging to the hot area.
Preferably, when performing cache cleaning, the processor is further configured to:
calculate the weight of each cache page:
weight of the cache page = (log tail position - cache page position) × (-1)
and, within the same area, preferentially remove the cache page with the smallest weight according to the weights of the cache pages.
Preferably, the processor is specifically configured to:
according to P_k = {L_{n*k}, L_{n*k+1}, …, L_{n*(k+1)-1}}, determining the logs contained in each cache page;
wherein P_k represents the kth cache page, and k is a non-negative integer;
L_{n*k} represents the (n*k)th log;
n is the number of logs each cache page contains.
Preferably, when searching for the xth log in the storage structure, the processor is specifically configured to:
search, among all cache pages in the storage structure whose first log sequence number is smaller than or equal to x, for the cache page with the largest first log sequence number, and denote the sequence number of the first log of the found cache page as y;
judge whether x satisfies the condition x < y + n; if satisfied, the cache is hit, otherwise the cache is missed.
Preferably, the processor is further configured to perform the following operations:
if the cache is hit, the hit cache page P_k is located in the hot area, and the xth log is located at the tail of the cache page P_k, asynchronously loading the next cache page P_{k+1}.
Preferably, when judging whether the xth log is located at the tail of the cache page P_k, the processor is specifically configured to:
judge whether (position of the xth log - position of the cache page P_k) > cache page length × tail threshold holds;
if so, the xth log is located at the tail of the cache page P_k;
wherein the tail threshold is a predefined constant with a value range of (0, 1).
The embodiment of the invention also discloses a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the log caching method described above.
One embodiment of the above invention has the following beneficial effects:
aiming at the characteristics of log data and its read-write pattern, the invention improves the existing cache policy according to the rule that a page that has just been accessed has a relatively high probability of being accessed again in the future, and performs cache cleaning in a cold/hot partition manner, so that recently accessed pages are preferentially kept in the cache, thereby achieving the technical effect of improving the cache hit rate.
Another embodiment of the above invention has the following beneficial effects:
on the basis of the above cache cleaning policy, a policy of cleaning the cache based on the distance between a cache page and the tail of the log is further provided, according to the rule that the closer a page is to the tail of the log, the higher the probability that it will be accessed again in the future; this keeps a high hit rate for hot-area data that needs to be accessed frequently and effectively solves the cache pollution caused by sporadic batch accesses.
In addition, another embodiment of the above invention has the following beneficial effects:
the invention considers that log data has a sequential read-write characteristic in most cases, i.e., the log is read and written continuously backward from a certain position. Based on this characteristic, the invention predicts the position to be accessed and asynchronously loads the corresponding cache page in advance; this cache preloading mechanism further improves the cache hit rate and reduces jitter.
Drawings
FIG. 1 is a flow chart illustrating a log caching method according to the present invention;
FIG. 2 is a diagram illustrating a data structure of a cache according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a log access flow in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and examples.
In the process of implementing the present invention, the inventor finds that the existing caching strategy mainly has the following problems:
1. Because existing cache policies are designed for general random-access storage structures and are not optimized for log data, the cache hit rate for logs is low.
2. The hit rate of existing cache policies drops sharply under sporadic batch operations, and cache pollution is severe.
3. Existing cache policies lack a page preloading mechanism, so loading a page from disk causes performance jitter.
In order to solve the problems in the prior art, the invention provides a log caching method tailored to the characteristics of log data and its read-write pattern, which manages the cache used for log data access so as to maximize the cache hit rate and thereby improve the read-write performance of the log.
The invention provides a log caching method. As shown in FIG. 1, the method comprises the following steps:
step 101: storing the log in a cache, specifically: every continuous n logs form a cache page, a plurality of cache pages are stored in a cache in a storage structure supporting range finding, in the storage structure, the index of each cache page is the position of the first log in the cache, and n is a set constant.
In this step, as long as the storage structure supporting range finding can be used in the present invention, almost all trees support range finding, and thus, all trees can be used to implement the present invention. Based on the application scenario of the present invention, the red-black tree has relatively optimal performance, and therefore, the present invention can be preferably implemented by using the red-black tree.
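As a minimal sketch of such a structure, the following assumes Java's java.util.TreeMap, which is backed by a red-black tree and exposes exactly the range search needed here; the class, field and method names are illustrative and are not taken from the patent:

import java.util.Map;
import java.util.TreeMap;

// Sketch only: TreeMap is a red-black tree, so floorEntry() provides the
// "largest key <= x" range search in O(log N). CachePage is a placeholder type.
class LogCacheIndex {
    // Key: position (sequence number) of the first log in each cache page.
    private final TreeMap<Long, CachePage> pages = new TreeMap<>();

    // The only candidate page for log x is the one whose first-log position
    // is the largest value that is still <= x.
    Map.Entry<Long, CachePage> candidateFor(long x) {
        return pages.floorEntry(x);
    }

    static class CachePage { /* page contents elided */ }
}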
Step 102: when the xth log is requested, search for the xth log in the storage structure; if the cache is hit, execute step 107; if the cache is missed, execute step 103.
Wherein x is a non-negative integer.
Step 103: read the cache page where the xth log is located from the disk.
Step 104: judge whether the cache overflows; if so, execute step 105, otherwise execute step 106.
Step 105: perform cache cleaning and return to step 104.
In this scheme, if the cache overflows, the cache is cleaned and whether the cache overflows is judged again; this is repeated until the cache no longer overflows.
In this step, the cache cleaning includes:
dividing cache pages whose last access time is within t seconds of the current time into a hot area, and dividing the other cache pages into a cold area, wherein t is a set constant;
preferentially removing cache pages belonging to the cold area from the cache, and if the cold area is empty, continuing to remove cache pages belonging to the hot area.
Step 106: add the cache page where the xth log is located to the cache.
Step 107: return the xth log.
This concludes the flow shown in FIG. 1.
The cache cleaning policy provided by the invention performs cache cleaning in a cold/hot partition manner, taking into account the rule that a page that has just been accessed has a relatively high probability of being accessed again in the future. On the basis of the cold/hot partition, and according to the rule that the closer a page is to the tail of the log, the higher the probability that it will be accessed again in the future, the invention further provides a cache cleaning policy based on the distance between a cache page and the tail of the log; specifically, on top of the cold/hot partition, the following processing is performed:
calculating the weight of each cache page:
weight of the cache page = (log tail position - cache page position) × (-1)
and, within the same area, preferentially removing the cache page with the smallest weight according to the weights of the cache pages.
In the present example, the logs contained in each cache page are determined according to P_k = {L_{n*k}, L_{n*k+1}, …, L_{n*(k+1)-1}}, that is, the correspondence between a cache page and its logs is:
P_k = {L_{n*k}, L_{n*k+1}, …, L_{n*(k+1)-1}}
wherein P_k represents the kth cache page, and k is a non-negative integer;
L_{n*k} represents the (n*k)th log;
n is the number of logs each cache page contains.
The search for the xth log in the storage structure involved in step 102 specifically includes:
searching, among all cache pages in the storage structure whose first log sequence number is smaller than or equal to x, for the cache page with the largest first log sequence number, and denoting the sequence number of the first log of the found cache page as y;
judging whether x satisfies the condition x < y + n; if satisfied, the cache is hit, otherwise the cache is missed.
On the basis of the above scheme, in order to further improve the hit rate of the cache, the present invention proposes the following cache preloading mechanism:
if the cache is hit, the hit cache page P_k is located in the hot area, and the xth log is located at the tail of the cache page P_k, the next cache page P_{k+1} is loaded asynchronously.
Judging whether the xth log is located at the tail of the cache page P_k specifically includes:
judging whether (position of the xth log - position of the cache page P_k) > cache page length × tail threshold holds;
if so, the xth log is located at the tail of the cache page P_k;
the tail threshold is a predefined constant with a value range of (0, 1), preferably between 80% and 90%.
Therefore, the technical scheme of the application improves the prior art in terms of the cache data structure, log access, cache cleaning, cache preloading, and so on. The technical solution of the present invention is described in further detail below from these aspects.
First, cached data structure
In order to avoid performance loss caused by frequent swap-in and swap-out, the minimum unit of the cache is generally a cache page, and each page comprises a plurality of logs.
The log cache of the present invention is composed of a plurality of cache pages. FIG. 2 is a schematic diagram of the data structure of the cache in an embodiment of the present invention:
each cache page contains n consecutive logs, where n is a preset constant, and the following correspondence holds between a cache page and its logs:
P_k = {L_{n*k}, L_{n*k+1}, …, L_{n*(k+1)-1}}
wherein P denotes a cache page and k is any non-negative integer; P_k denotes the kth cache page, and P_0, P_1, …, P_k in FIG. 2 each represent a cache page;
L denotes a log record and L_x represents the xth log record; for example, L_{n*k} represents the (n*k)th log record;
n is the number of logs contained in each cache page;
so the kth cache page P_k contains the logs from L_{n*k} to L_{n*(k+1)-1}, n logs in total.
As can be seen from the above definition of cache pages, once the constant n is determined, the cache page to which any log belongs exists and is unique, and cache pages do not overlap, that is, no log belongs to more than one cache page at the same time.
The number of pages that the cache can accommodate depends on the capacity of the cache and the size of each page within the cache. It is to be noted that, although the number of logs in a cache page is fixed, since the length of the log is variable, the length of the cache page is also variable.
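Before turning to the access flow, here is a small sketch of the page/log correspondence defined above; the class, the method names and the concrete value of n are illustrative assumptions only:

// Sketch of the correspondence P_k = {L_{n*k}, ..., L_{n*(k+1)-1}}.
// LOGS_PER_PAGE plays the role of n; 20 is used here only because the embodiments use it.
final class PageMapping {
    static final int LOGS_PER_PAGE = 20; // n, a preset constant

    // The cache page to which log x belongs is unique: k = x / n (integer division).
    static long pageOf(long x) {
        return x / LOGS_PER_PAGE;
    }

    // First log of page k, L_{n*k}; its position is also the page's index in the cache.
    static long firstLogOf(long k) {
        return k * LOGS_PER_PAGE;
    }

    // Last log of page k, L_{n*(k+1)-1}.
    static long lastLogOf(long k) {
        return (k + 1) * LOGS_PER_PAGE - 1;
    }

    private PageMapping() {}
}

With n = 20 as in the embodiments below, pageOf(430) evaluates to 21 and firstLogOf(21) to 420, which matches the page boundaries used in the first embodiment.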
Second, log access flow
For fast lookup, in the embodiment of the present invention all cache pages are stored in memory in a storage structure that supports range search, with the position of the first log in each cache page used as its index. As described above, the red-black tree has relatively optimal performance for the application scenario of the present invention, so the red-black tree is taken as an example in the following embodiments. The red-black tree is a self-balancing binary search tree, a data structure in computer science; it is an efficient search tree that can complete search, insertion and deletion in O(log N) time. Accordingly, the log access flow in the embodiment of the present invention is as shown in FIG. 3.
Assuming that the xth log (hereinafter referred to as log x) is currently accessed, according to the flow shown in FIG. 3, the log access flow includes the following steps:
Step 1: among all cache pages in the cached red-black tree whose first log sequence number is smaller than or equal to x, find the cache page with the largest first log sequence number, and denote the sequence number of the first log of the found cache page as y.
Step 2: judge whether log x is in the found cache page; it suffices to judge whether x satisfies x < y + n, where n is the number of logs in each cache page. If satisfied, the cache is hit, log x is returned directly from the found cache page, and the log access flow ends; otherwise, the cache is missed, and step 3 is executed.
Step 3: since the cache is missed, read the cache page where log x is located from the disk into memory.
According to the correspondence between cache pages and logs described above, the cache page corresponding to log x can be determined.
Step 4: judge whether the cache overflows. The condition for cache overflow is: current cache size + size of the page where log x is located > cache capacity. If the cache overflows, execute the cache cleaning flow and judge the cache overflow again; repeat this until the cache no longer overflows, then execute the next step.
Step 5: add the page where log x is located to the log cache.
Step 6: return log x from the page, and the flow ends.
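The flow above can be sketched roughly as follows, again assuming a TreeMap-based cache. CachePage, readPageFromDisk, evictOnce, the size bookkeeping and the capacity value are illustrative assumptions rather than interfaces defined by the patent:

import java.util.Map;
import java.util.TreeMap;

// Rough sketch of steps 1-6 of the access flow (FIG. 3); not a definitive implementation.
class LogCache {
    static class CachePage {
        long firstLog;     // y: sequence number of the first log in this page
        int logCount;      // n: number of logs in this page
        long sizeInBytes;  // pages have variable length, since logs do
        byte[] getLog(long x) { return new byte[0]; } // placeholder for the stored record
    }

    private final TreeMap<Long, CachePage> pages = new TreeMap<>();
    private long currentSize;
    private final long capacity = 64L << 20; // assumed cache capacity in bytes

    byte[] read(long x) {
        // Steps 1-2: range search; hit if a page with first log y <= x exists and x < y + n.
        Map.Entry<Long, CachePage> e = pages.floorEntry(x);
        if (e != null && x < e.getKey() + e.getValue().logCount) {
            return e.getValue().getLog(x); // cache hit
        }
        // Step 3: cache miss -- read the page containing log x from disk.
        CachePage page = readPageFromDisk(x);
        // Steps 4-5: clean the cache until the new page fits, then add it.
        while (currentSize + page.sizeInBytes > capacity) {
            evictOnce(); // cold/hot partition + weight policy
        }
        pages.put(page.firstLog, page);
        currentSize += page.sizeInBytes;
        // Step 6: return the requested log.
        return page.getLog(x);
    }

    private CachePage readPageFromDisk(long x) { return new CachePage(); } // placeholder
    private void evictOnce() { /* remove one page and subtract its size */ }
}

The eviction step is deliberately left as a stub here; a sketch of the cold/hot plus weight policy it would apply follows in the next subsection.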
Third, cache cleaning strategy
The cache cleaning policy determines which cache pages will be preferentially removed from the cache when the cache is about to overflow.
The embodiment of the invention considers the read-write characteristics of the log data and summarizes the following rules:
1. the cache page which is accessed just now has relatively high probability of being accessed again in the future;
2. the closer the cache page is to the tail of the log, the higher the probability of being accessed again in the future.
Based on these rules, the invention combines two dimensions, the cold/hot partition and the distance to the log tail, to decide which pages are moved out of the cache.
Taking the current time as the cutoff, the time before the current time is divided into a hot area and a cold area: the hot area is closer to the current time, and the cold area is farther from it. For example, if the time range of the hot area is set to 10 seconds, then cache pages whose last access time is within 10 seconds of the current time belong to the hot area, and the other cache pages belong to the cold area.
Another factor that determines whether a cache page is moved out is the position of the cache page. The present invention defines:
weight of the cache page = (log tail position - cache page position) × (-1)
From the above definition it follows that the closer a cache page is to the tail of the log file, the greater its weight. It is worth noting that, because the log file is continuously appended, the position of the log tail changes continuously, and correspondingly the weight of a cache page also changes continuously.
The cache cleaning strategy is as follows:
1. preferentially move out cache pages in the cold area; if the cold area is empty, continue to clean cache pages in the hot area;
2. within the same area, select the cache page to move out according to its weight: preferentially move out the cache page with the smallest weight.
This cache cleaning strategy not only keeps a high hit rate for hot-area data that needs to be accessed frequently, but also effectively solves the cache pollution caused by sporadic batch accesses.
For example, normally the requests to the cache are concentrated at the tail of the log file, and most cache pages in the cache are also located near the tail of the log file. Suppose a certain user continuously accesses log data backward from some position in the middle of the log file:
if a cache policy such as LRU is used, then as this user's accesses proceed, a large number of cache pages from the middle positions replace a large number of cache pages from the tail, reducing the cache hit rate of other users who normally access the tail of the log;
with the cache cleaning strategy of the invention, because the weight of a cache page in the middle is smaller than that of a page at the tail, even if a user continuously accesses log data backward from some middle position of the log file, the weight of the tail cache pages remains larger than that of the middle cache pages, so the sporadic batch access does not cause the tail logs to be preferentially cleaned out of the cache; the high cache hit rate for the tail logs is maintained, and the cache pollution problem is effectively avoided.
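A small sketch of one eviction decision under this policy is given below; the Page fields, the 10-second hot-zone window and the logTailPosition parameter are illustrative assumptions:

import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Sketch of picking the next page to evict under the cold/hot + weight policy.
class CacheCleaner {
    static class Page {
        long position;         // position of the page in the log
        long lastAccessMillis; // last access time
    }

    static final long HOT_WINDOW_MILLIS = 10_000; // t = 10 s, as in the embodiments

    // weight = (log tail position - page position) * (-1): pages closer to the tail weigh more.
    static long weight(Page p, long logTailPosition) {
        return -(logTailPosition - p.position);
    }

    // Victim selection: minimum weight in the cold area; if the cold area is empty,
    // minimum weight in the hot area.
    static Optional<Page> pickVictim(List<Page> pages, long logTailPosition, long nowMillis) {
        Comparator<Page> byWeight = Comparator.comparingLong(p -> weight(p, logTailPosition));
        Optional<Page> coldVictim = pages.stream()
                .filter(p -> nowMillis - p.lastAccessMillis >= HOT_WINDOW_MILLIS)
                .min(byWeight);
        if (coldVictim.isPresent()) {
            return coldVictim;
        }
        return pages.stream().min(byWeight);
    }
}

Because the weight is the negated distance to the tail, picking the minimum weight evicts the page farthest from the tail first, which is what keeps sporadic mid-file scans from displacing the tail pages.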
Fourth, cache preloading
In most cases, log data has a continuous read-write characteristic, i.e., the read-write is continuously performed from a certain position backward.
Based on the characteristic, the invention predicts the position to be accessed and asynchronously loads the corresponding cache page in advance so as to further improve the hit rate of the cache.
The embodiment of the invention sets the following trigger conditions for cache preloading:
When a log is requested, it is judged whether the following conditions are all satisfied:
1. the cache is hit; denote the hit cache page as P_k;
2. P_k is located in the hot area;
3. the requested log is located at the tail of P_k. The condition for judging whether the xth log is at the tail of P_k is: (position of the xth log - position of cache page P_k) > cache page length × tail threshold; if this holds, the xth log is located at the tail of cache page P_k. The tail threshold is a predefined constant with a value range of (0, 1), typically 80% to 90%.
If all of the above conditions are satisfied, the cache page P_{k+1} is loaded asynchronously.
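As a rough illustration, the trigger can be written as a single predicate; the parameter names, the 0.8 threshold and the 10-second hot window below are assumptions made for the sketch, not values fixed by the patent:

// Sketch of the cache preloading trigger; not the patent's exact interface.
final class PreloadTrigger {
    static final double TAIL_THRESHOLD = 0.8;     // predefined constant in (0, 1), e.g. 80%
    static final long HOT_WINDOW_MILLIS = 10_000; // t = 10 s, as in the embodiments

    // Returns true when the hit page P_k is in the hot area and the requested log
    // falls in the tail of P_k, i.e. when P_{k+1} should be loaded asynchronously.
    static boolean shouldPreloadNext(long logPosition, long pagePosition, long pageLength,
                                     long lastAccessMillis, long nowMillis) {
        boolean inHotArea = nowMillis - lastAccessMillis < HOT_WINDOW_MILLIS;
        // (position of the xth log - position of page P_k) > page length * tail threshold
        boolean inTail = (logPosition - pagePosition) > pageLength * TAIL_THRESHOLD;
        return inHotArea && inTail;
    }

    private PreloadTrigger() {}
}

When the predicate holds, the load of P_{k+1} would typically be handed to a background thread (for example an ExecutorService) so the current request is not blocked; the patent only requires that the load be asynchronous, so this is an implementation assumption.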
As can be seen from the above, compared with a general caching strategy, the log caching method provided by the present invention has the following advantages:
1. The existing caching strategy is improved according to the characteristics of log data and its read-write pattern, so the cache hit rate for logs is higher.
2. Sporadic batch operations are tolerated, and the hit rate does not drop sharply.
3. And a cache preloading mechanism is supported, so that the cache hit rate is further improved, and the jitter is reduced.
Based on the above technical solutions of the present application, specific embodiments of the technical solutions are now described in detail with reference to two examples.
The first embodiment is as follows:
the log caching method provided by the embodiment comprises the following steps:
step 1: a cache page is formed by every 20 continuous logs, and a plurality of cache pages are stored in a cache in the form of a red-black tree, wherein each cache page takes the position of the first log in the cache as an index.
In this embodiment, cache page P0The log contained is { L }0,L1,…,L19In a red-black tree, page P is cached0With log L0As an index;
cache page P1The log contained is { L }20,L1,…,L39In a red-black tree, page P is cached1With log L20As an index;
by analogy, the cache page P20The log contained is { L }400,L401,…,L421In a red-black tree, page P is cached20With log L400As an index.
The present embodiment assumes that the cache page currently stored in the cache is P1~P20
Step 2: assuming that the 430th log is currently requested, the cache is searched and missed, and step 3 is executed.
In this step, searching for log L_430 in the cache includes:
first, searching the cache according to the cache page indexes to find the cache page whose index sequence number is smaller than or equal to 430 and is the largest, namely P_20, whose index is L_400;
then, since 430 is not smaller than 400 + 20, determining that log L_430 is not in cache page P_20, so the cache is missed.
Step 3: read, from the disk into memory, the cache page P_21 where log L_430 is located.
Step 4: judge whether the cache overflows according to whether current cache size + size of cache page P_21 > cache capacity. In this embodiment it is assumed that the cache overflows, so the cache cleaning flow needs to be executed; the cache cleaning scheme provided in this embodiment specifically includes:
first, taking the current time as the cutoff, the time before the current time is divided into a cold area and a hot area. For example, this embodiment sets the hot area to the 10 seconds before the current time, that is, cache pages whose last access time is within 10 seconds of the current time belong to the hot area, and the other cache pages belong to the cold area.
Then, cache pages in the cold area are preferentially moved out; if the cold area is empty, cache pages in the hot area are cleaned.
On the basis of the cache cleaning strategy, different weights can be further given to the cache pages in the same region according to the positions of the cache pages, and the cache pages are cleaned according to the weights of the cache pages. The scheme provided by the embodiment specifically comprises the following steps:
the cache page is given a weight according to the following formula:
the weight of the cache page is (log tail position-cache page position) × 1
And for the cache pages in the same region, preferentially removing the cache page with the minimum weight value according to the weight value of the cache page.
And (5) performing cache cleaning according to the cache cleaning strategy in the embodiment until the cache does not overflow, and executing the step.
Step 5: add the cache page P_21 where log L_430 is located to the log cache.
Step 6: return log L_430 from the page, and the flow ends.
The second embodiment is as follows:
in this embodiment, a cache preloading mechanism is introduced on the basis of the first embodiment to further improve the hit rate of the cache.
The log caching method provided by the embodiment comprises the following steps:
step 1: a cache page is formed by every 20 continuous logs, and a plurality of cache pages are stored in a cache in the form of a red-black tree, wherein each cache page takes the position of the first log in the cache as an index.
In this embodiment, cache page P0The log contained is { L }0,L1,…,L19In a red-black tree, page P is cached0With log L0As an index;
cache page P1The log contained is { L }20,L1,…,L39In a red-black tree, page P is cached1With log L20As an index;
by analogy, the cache page P20The log contained is { L }400,L401,…,L421In a red-black tree, page P is cached20With log L400As an index.
The present embodiment assumes that the cache page currently stored in the cache is P1~P20
Step 2: assuming that the 338th log is currently requested, search for log L_338 in the cache.
In this step, searching for log L_338 in the cache includes:
first, searching the cache according to the cache page indexes to find the cache page whose index sequence number is smaller than or equal to 338 and is the largest, namely P_16, whose index is L_320;
then, since 338 < 320 + 20, determining that log L_338 is in cache page P_16, so the cache is hit.
Step 3: return log L_338.
Step 4: judge whether the cache preloading conditions are satisfied, which specifically includes:
1) judging whether cache page P_16 is located in the hot area.
As described above, the time before the current time can be divided into a hot area and a cold area with the current time as the cutoff. For example, this embodiment sets the hot area to the 10 seconds before the current time; assuming that the last access time of cache page P_16 is within 10 seconds of the current time, cache page P_16 belongs to the hot area.
2) judging whether log L_338 is located at the tail of cache page P_16.
Specifically, it is judged whether the following formula holds:
position of log L_338 - position of log L_320 > length of cache page P_16 × 80%
wherein 80% is the tail threshold set in this embodiment;
the length of cache page P_16 can be obtained by accumulating the lengths of the logs L_320 through L_339.
After the above judgments, it can be determined that the cache preloading conditions are satisfied, and step 5 is executed.
Step 5: asynchronously load cache page P_17.
The flow of this embodiment ends.
Furthermore, the present application also provides a computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the log caching method as described above.
In addition, the present application further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor is specifically configured to perform the following operations when executing the program:
every n consecutive logs form a cache page, and a plurality of cache pages are stored in a cache in a storage structure that supports range search; in the storage structure, the index of each cache page is the position of its first log in the cache, wherein n is a set constant;
when the xth log is requested, searching for the xth log in the storage structure; if the cache is missed, reading the cache page where the xth log is located from a disk, adding the cache page to the cache, and returning the xth log, wherein x is a non-negative integer;
before adding the cache page to the cache, checking whether the cache overflows, and if the cache overflows, performing cache cleaning until the cache no longer overflows;
the performing cache cleaning comprises:
dividing cache pages whose last access time is within t seconds of the current time into a hot area, and dividing the other cache pages into a cold area, wherein t is a set constant;
preferentially removing cache pages belonging to the cold area from the cache, and if the cold area is empty, continuing to remove cache pages belonging to the hot area.
Wherein, the processor is further configured to:
calculating the weight of each cache page:
weight of the cache page = (log tail position - cache page position) × (-1)
and, within the same area, preferentially removing the cache page with the smallest weight according to the weights of the cache pages.
Wherein the processor is specifically configured to:
according to P_k = {L_{n*k}, L_{n*k+1}, …, L_{n*(k+1)-1}}, determining the logs contained in each cache page;
wherein P_k represents the kth cache page, and k is a non-negative integer;
L_{n*k} represents the (n*k)th log;
n is the number of logs each cache page contains.
Wherein, when searching for the xth log in the storage structure, the processor is specifically configured to:
search, among all cache pages in the storage structure whose first log sequence number is smaller than or equal to x, for the cache page with the largest first log sequence number, and denote the sequence number of the first log of the found cache page as y;
judge whether x satisfies the condition x < y + n; if satisfied, the cache is hit, otherwise the cache is missed.
Wherein the processor is further configured to:
if the cache is hit, the hit cache page P_k is located in the hot area, and the xth log is located at the tail of the cache page P_k, asynchronously loading the next cache page P_{k+1}.
Wherein, when judging whether the xth log is located at the tail of the cache page P_k, the processor is specifically configured to:
judge whether (position of the xth log - position of the cache page P_k) > cache page length × tail threshold holds;
if so, the xth log is located at the tail of the cache page P_k;
the tail threshold is a predefined constant, and the value range is (0,1), preferably 80% -90%.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (13)

1. A method of log caching, comprising:
every n consecutive logs form a cache page, and a plurality of cache pages are stored in a cache in a storage structure that supports range search; in the storage structure, the index of each cache page is the position of its first log in the cache, wherein n is a set constant;
when the xth log is requested, searching for the xth log in the storage structure; if the cache is missed, reading the cache page where the xth log is located from a disk, adding the cache page to the cache, and returning the xth log, wherein x is a non-negative integer;
before adding the cache page to the cache, checking whether the cache overflows, and if the cache overflows, performing cache cleaning until the cache no longer overflows;
the performing cache cleaning comprises:
dividing cache pages whose last access time is within t seconds of the current time into a hot area, and dividing the other cache pages into a cold area, wherein t is a set constant;
preferentially removing cache pages belonging to the cold area from the cache, and if the cold area is empty, continuing to remove cache pages belonging to the hot area.
2. The method of claim 1, wherein the performing cache scrubbing further comprises:
calculating the weight of each cache page:
weight of the cache page = (log tail position - cache page position) × (-1)
and, within the same area, preferentially removing the cache page with the smallest weight according to the weights of the cache pages.
3. The method according to claim 1 or 2, characterized in that:
the step of forming one cache page from every n consecutive logs specifically comprises:
determining, according to P_k = {L_{n*k}, L_{n*k+1}, …, L_{n*(k+1)-1}}, the logs contained in each cache page;
wherein P_k represents the kth cache page, and k is a non-negative integer;
L_{n*k} represents the (n*k)th log;
n is the number of logs each cache page contains.
4. The method according to claim 3, wherein the searching for the xth log in the storage structure specifically comprises:
searching, among all cache pages in the storage structure whose first log sequence number is smaller than or equal to x, for the cache page with the largest first log sequence number, and denoting the sequence number of the first log of the found cache page as y;
judging whether x satisfies the condition x < y + n; if satisfied, the cache is hit, otherwise the cache is missed.
5. The method of claim 3, further comprising:
if the cache is hit, the hit cache page P_k is located in the hot area, and the xth log is located at the tail of the cache page P_k, asynchronously loading the next cache page P_{k+1}.
6. The method of claim 5, wherein judging whether the xth log is located at the tail of the cache page P_k specifically comprises:
judging whether (position of the xth log - position of the cache page P_k) > cache page length × tail threshold holds;
if so, the xth log is located at the tail of the cache page P_k;
wherein the tail threshold is a predefined constant with a value range of (0, 1).
7. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor is specifically configured to perform the following operations when executing the program:
every n consecutive logs form a cache page, and a plurality of cache pages are stored in a cache in a storage structure that supports range search; in the storage structure, the index of each cache page is the position of its first log in the cache, wherein n is a set constant;
when the xth log is requested, searching for the xth log in the storage structure; if the cache is missed, reading the cache page where the xth log is located from a disk, adding the cache page to the cache, and returning the xth log, wherein x is a non-negative integer;
before adding the cache page to the cache, checking whether the cache overflows, and if the cache overflows, performing cache cleaning until the cache no longer overflows;
the performing cache cleaning comprises:
dividing cache pages whose last access time is within t seconds of the current time into a hot area, and dividing the other cache pages into a cold area, wherein t is a set constant;
preferentially removing cache pages belonging to the cold area from the cache, and if the cold area is empty, continuing to remove cache pages belonging to the hot area.
8. The electronic device of claim 7, wherein, when performing cache cleaning, the processor is further configured to:
calculate the weight of each cache page:
weight of the cache page = (log tail position - cache page position) × (-1)
and, within the same area, preferentially remove the cache page with the smallest weight according to the weights of the cache pages.
9. The electronic device of claim 7 or 8, wherein the processor is specifically configured to:
determine, according to P_k = {L_{n*k}, L_{n*k+1}, …, L_{n*(k+1)-1}}, the logs contained in each cache page;
wherein P_k represents the kth cache page, and k is a non-negative integer;
L_{n*k} represents the (n*k)th log;
n is the number of logs each cache page contains.
10. The electronic device of claim 9, wherein the processor, when looking up the xth log in the storage structure, is specifically configured to:
search, among all cache pages in the storage structure whose first log sequence number is smaller than or equal to x, for the cache page with the largest first log sequence number, and denote the sequence number of the first log of the found cache page as y;
judge whether x satisfies the condition x < y + n; if satisfied, the cache is hit, otherwise the cache is missed.
11. The electronic device of claim 9, wherein the processor is further configured to:
if the cache is hit, the hit cache page P_k is located in the hot area, and the xth log is located at the tail of the cache page P_k, asynchronously loading the next cache page P_{k+1}.
12. The electronic device of claim 11, wherein, when judging whether the xth log is located at the tail of the cache page P_k, the processor is specifically configured to:
judge whether (position of the xth log - position of the cache page P_k) > cache page length × tail threshold holds;
if so, the xth log is located at the tail of the cache page P_k;
wherein the tail threshold is a predefined constant with a value range of (0, 1).
13. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method steps of any one of claims 1 to 6.
CN201910460169.8A 2019-05-30 2019-05-30 Log caching method and device Pending CN112015678A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910460169.8A CN112015678A (en) 2019-05-30 2019-05-30 Log caching method and device

Publications (1)

Publication Number Publication Date
CN112015678A true CN112015678A (en) 2020-12-01

Family

ID=73501823

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910460169.8A Pending CN112015678A (en) 2019-05-30 2019-05-30 Log caching method and device

Country Status (1)

Country Link
CN (1) CN112015678A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102760101A (en) * 2012-05-22 2012-10-31 中国科学院计算技术研究所 SSD-based (Solid State Disk) cache management method and system
US20140281131A1 (en) * 2013-03-15 2014-09-18 Fusion-Io, Inc. Systems and methods for persistent cache logging
CN108829343A (en) * 2018-05-10 2018-11-16 中国科学院软件研究所 A kind of cache optimization method based on artificial intelligence
CN109271355A (en) * 2018-08-27 2019-01-25 杭州迪普科技股份有限公司 A kind of method and device of cleaning journal file caching

Similar Documents

Publication Publication Date Title
EP3229142B1 (en) Read cache management method and device based on solid state drive
Zhou et al. Second-level buffer cache management
CN103440207B (en) Caching method and caching device
Liu et al. Hybrid storage management for database systems
US7203815B2 (en) Multi-level page cache for enhanced file system performance via read ahead
CN106547476B (en) Method and apparatus for data storage system
Wu et al. {AC-Key}: Adaptive caching for {LSM-based}{Key-Value} stores
CN108268219B (en) Method and device for processing IO (input/output) request
US8032708B2 (en) Method and system for caching data in a storgae system
JP6711121B2 (en) Information processing apparatus, cache memory control method, and cache memory control program
CN108845957B (en) Replacement and write-back self-adaptive buffer area management method
US11620219B2 (en) Storage drive dependent track removal in a cache for storage
US20060143395A1 (en) Method and apparatus for managing a cache memory in a mass-storage system
Yao et al. Building efficient key-value stores via a lightweight compaction tree
CN104077242A (en) Cache management method and device
US11593268B2 (en) Method, electronic device and computer program product for managing cache
Ahn et al. μ*-Tree: An ordered index structure for NAND flash memory with adaptive page layout scheme
CN109002400B (en) Content-aware computer cache management system and method
US20170262485A1 (en) Non-transitory computer-readable recording medium, data management device, and data management method
CN109144431B (en) Data block caching method, device, equipment and storage medium
CN113672166A (en) Data processing method and device, electronic equipment and storage medium
CN112015678A (en) Log caching method and device
US20100077147A1 (en) Methods for caching directory structure of a file system
CN113050894A (en) Agricultural spectrum hybrid storage system cache replacement algorithm based on cuckoo algorithm
Park et al. A flash-based SSD cache management scheme for high performance home cloud storage

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination