CN114741382A - Caching method and system for reducing read time delay - Google Patents
Caching method and system for reducing read time delay
- Publication number
- CN114741382A (application CN202110020501.6A)
- Authority
- CN
- China
- Prior art keywords
- key
- sst
- sst file
- column family
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2246—Trees, e.g. B+trees
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24552—Database cache management
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a cache system for reducing read latency, characterized by comprising: a Key-Value column family B, wherein column family B contains SST files; and a secondary index holding <f(Key), SST file location information> entries, wherein f(Key) is a one-to-one function mapping of the Key and the SST file location information identifies the SST file in the column family that contains the Key. By building a secondary index from each Key to the location information of the Level 0 SST file that holds it, a read can jump directly to the correct Level 0 SST file, reducing the time complexity of a read operation from the traditional O(n) to O(1). Experimental results show that after a service system adopted the optimized storage engine, average read latency dropped from several milliseconds to the order of 100 microseconds, and read time no longer grows with the amount of stored data.
Description
Technical Field
The invention relates to a method and system for reducing read-operation latency, applied to cache scenarios based on an LSM tree.
Background
How to maintain high capacity and high QPS in a data service system under massive data volumes has become a common challenge across the data service industry, and a cache system with low read/write latency is a widespread requirement. A commonly used cache storage system today is one based on an LSM tree (Log-Structured Merge-Tree), such as RocksDB. When data is written, it is first appended to a log and then inserted into an in-memory file (the MemTable), a structure similar to a balanced binary tree. When the MemTable reaches a certain size, it is flushed to a disk file; disk files written this way become the Level 0 SST (Sorted Strings Table) files. Data within each Level 0 SST file is ordered, but the key ranges of different SST files may overlap. Consequently, looking up a key may require traversing all SST files, and read latency rises sharply as the data volume grows.
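The Level 0 lookup problem described above can be sketched as a minimal model (all names here are illustrative, not from the patent): flushed MemTables become Level 0 "SST files" whose key ranges may overlap, so a read must probe every file from newest to oldest — O(n) in the number of files.

```python
class TinyLSM:
    """Toy model of an LSM tree's MemTable + Level 0 files (illustrative only)."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}     # in-memory buffer (plain dict for brevity)
        self.l0_files = []     # flushed files; newest appended last
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            # "Flush": the MemTable becomes a new Level 0 SST file.
            self.l0_files.append(dict(self.memtable))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Key ranges of L0 files may overlap, so in the worst case
        # EVERY file must be probed, newest first -- the O(n) read path.
        for sst in reversed(self.l0_files):
            if key in sst:
                return sst[key]
        return None
```

Each `get` miss in the MemTable scans the entire `l0_files` list, which is exactly the cost the patent's secondary index is designed to eliminate.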
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, an object of the present invention is to provide a caching method and system for reducing read latency, which avoids missed data synchronization and greatly reduces development effort.
To achieve the above and other related objects, the present invention provides a cache system for reducing read latency, comprising: a Key-Value column family B, wherein column family B contains SST files; and a secondary index holding <f(Key), SST file location information> entries, wherein f(Key) is a one-to-one function mapping of the Key, and the SST file location information identifies the SST file in the column family that contains the Key.
Preferably, in the cache system for reducing read latency, f(Key) is the Key itself, or a hash value of the Key, or the Key mapped to 8 bytes of data by SipHash.
Preferably, in the cache system for reducing read latency, the SST file location information includes the ID of the SST file or a system timestamp recorded when the Key is written.
Preferably, in the above cache system for reducing read latency, the cache system is applied in a storage engine supporting multiple column families, and the secondary index C is a newly created column family.
The invention also provides a caching method for reducing read latency, applied in a storage engine supporting multiple column families, characterized by comprising the following steps: establishing a Key-Value column family, wherein the Key-Value column family contains SST files; and establishing a secondary index comprising <f(Key), SST file location information> entries, wherein f(Key) is a one-to-one function mapping of the Key, and the SST file location information identifies, within the column family, the SST file containing the Key.
Preferably, in the above caching method for reducing read latency, f (Key) is the Key itself, or a hash value of the Key, or the Key is mapped to 8 bytes of data by SipHash.
Preferably, in the caching method for reducing read latency, the SST file location information includes the ID of the SST file or a system timestamp recorded when the Key is written.
The invention also provides a cache access method for reducing read latency in an LSM-tree-based storage system, characterized by comprising: a step of accessing a secondary index to obtain the SST file location information for a Key in the Key-Value column family, the secondary index holding <f(Key), SST file location information> entries, where f(Key) is a one-to-one function mapping of the Key.
Preferably, in the above cache access method for reducing read latency based on an LSM tree storage system, f (Key) is the Key itself, or a hash value of the Key, or the Key is mapped to 8 bytes of data by SipHash.
Preferably, in the cache access method for reducing read latency in the LSM-tree-based storage system, the SST file location information includes the ID of the SST file or a system timestamp recorded when the Key is written.
Drawings
Fig. 1 is a logical structure diagram of a cache system for reducing read latency according to the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention.
Fig. 1 is a logical structure diagram of a cache system for reducing read latency according to the present invention. Mark A denotes the KV cache system, which includes a Key-Value column family (KV column family) B; column family B contains SST files. The cache system A further includes a secondary index C, which can be a newly created column family in a storage engine that supports multiple column families.
The data format of the secondary index C is <f(Key), SST file location information>; that is, an association is established between f(Key) and the SST file location information. Here f(Key) is a one-to-one function mapping of the Key, for example the Key itself or a hash value of the Key. The SST file location information identifies, within column family B, the SST file that contains the Key. It can be implemented in various ways depending on the business scenario and implementation difficulty: for example, the globally unique, monotonically increasing file ID of the SST file can be used, or some feature recorded when the Key is written (for example, a system timestamp) can serve as a locating cursor for the SST file. Any scheme works as long as it can accurately and quickly locate, in column family B, the SST file whose contents the system should return as the read result, such as the most recent SST file containing the Key.
The smaller the <f(Key), SST file location information> entries are, the larger the portion of the secondary index C that can be held in memory, and therefore the more efficient its insert, delete, and lookup operations become. In this embodiment, f(Key) maps the Key to 8 bytes of data using SipHash, which helps keep the entire secondary index of Keys in memory as far as possible.
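The 8-byte fingerprint idea can be sketched as follows. The patent specifies SipHash, but Python's standard library has no SipHash primitive, so this sketch substitutes BLAKE2b with an 8-byte digest as a stand-in with the same relevant property: a fixed 8-byte output regardless of key length, keeping each index entry small.

```python
import hashlib


def fingerprint8(key: bytes) -> bytes:
    """Map a key of any length to a fixed 8-byte fingerprint.

    Stand-in for the SipHash mapping described in the patent: the stdlib
    lacks SipHash, so BLAKE2b with digest_size=8 is used for illustration.
    """
    return hashlib.blake2b(key, digest_size=8).digest()
```

Because every entry key is exactly 8 bytes, the secondary index stays compact even when the original Keys are long, which is what allows it to fit in memory.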
The following describes the caching method for reducing read latency according to the present invention with reference to the system of Fig. 1, where the numbers in Fig. 1 indicate the logical order of the steps.
The write operation is similar to that of a conventional cache system and is not repeated here. Unlike the conventional approach, the invention adds a step that generates the secondary-index information for the Key: an entry <f(Key), SST file location information> is written into the secondary index C, associating f(Key) with the location, in column family B, of the SST file that contains the Key. As described above, the location information can be the globally unique, monotonically increasing file ID of the SST file, or a feature recorded at write time (for example, a system timestamp) that serves as a locating cursor, as long as it accurately and quickly identifies the SST file whose contents the system should return as the read result.
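The extra write step can be sketched as below (a minimal sketch with assumed names; the column families are modeled as plain dicts): alongside the normal KV write, an <f(key), SST locator> entry is recorded in the secondary-index column family.

```python
def write_with_index(kv_cf, index_cf, f, key, value, current_sst_locator):
    """Write a key-value pair and its secondary-index entry.

    kv_cf              -- column family B (key -> value), modeled as a dict
    index_cf           -- secondary index C (f(key) -> SST locator)
    f                  -- one-to-one mapping of the key (e.g. an 8-byte hash)
    current_sst_locator -- e.g. the file ID the key will land in, or a
                           write-time system timestamp used as a cursor
    """
    kv_cf[key] = value                          # normal write into column family B
    index_cf[f(key)] = current_sst_locator      # added step: <f(key), SST location>
```

A later write of the same key simply overwrites the index entry, so the index always points at the newest location — matching the requirement that reads return the most recent SST file containing the key.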
For a read operation, the SST file location information is first obtained from the secondary index C. With the position of the SST file in column family B known, the SST file containing the Key can be queried directly from column family B and the result returned.
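The read path above can be sketched as follows (illustrative names; Level 0 files are modeled as dicts keyed by their file ID): consult the secondary index to get the SST locator for f(key), then probe only that one file instead of scanning all of them — O(1) probes instead of O(n).

```python
def indexed_get(key, f, index_cf, l0_files_by_id):
    """Read via the secondary index: one index lookup, one SST probe.

    index_cf        -- secondary index C: f(key) -> SST file ID
    l0_files_by_id  -- Level 0 SST files keyed by file ID (dicts here)
    """
    locator = index_cf.get(f(key))       # <f(key), SST file location information>
    if locator is None:
        return None                      # key was never written
    sst = l0_files_by_id[locator]        # jump straight to the right L0 file
    return sst.get(key)
```

Compare with the unindexed path, which would iterate over every Level 0 file: the index replaces that linear scan with a single hash lookup plus a single file probe.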
According to the method, a secondary index is built from each Key to the location information of the Level 0 SST file that holds it, so a read can be directed straight to the correct Level 0 SST file. This reduces the time complexity of a read operation from the traditional O(n) to O(1). Experimental results show that after a service system adopted the optimized storage engine, average read latency dropped from several milliseconds to the order of 100 microseconds, and read time no longer grows with the amount of stored data, so the method has high value for industrial use.
Claims (10)
1. A cache system for reducing read latency, comprising:
a Key-Value column family B, wherein column family B contains SST files;
and a secondary index comprising <f(Key), SST file location information> entries, wherein f(Key) is a one-to-one function mapping of the Key, and the SST file location information is the location, in the column family, of the SST file containing the Key.
2. The cache system for reducing read latency of claim 1, wherein f(Key) is the Key itself, or a hash value of the Key, or the Key mapped to 8 bytes of data using SipHash.
3. The cache system for reducing read latency of claim 1, wherein the SST file location information comprises the ID of the SST file or a system timestamp recorded when the Key is written.
4. The cache system for reducing read latency of claim 1, wherein the cache system is applied in a storage engine supporting multiple column families, and the secondary index C is a newly created column family.
5. A cache method for reducing read latency, applied in a storage engine supporting multiple column families, is characterized by comprising:
establishing a Key-Value column family, wherein the Key-Value column family contains SST files;
establishing a secondary index comprising <f(Key), SST file location information> entries, wherein f(Key) is a one-to-one function mapping of the Key, and the SST file location information is the location, in the column family, of the SST file containing the Key.
6. The caching method for reducing read latency according to claim 5, wherein f (Key) is the Key itself, or a hash value of the Key, or the Key is mapped to 8 bytes of data by SipHash.
7. The caching method for reducing read latency according to claim 5, wherein the SST file location information comprises the ID of the SST file or a system timestamp recorded when the Key is written.
8. A cache access method for reducing read latency based on an LSM tree storage system, characterized by comprising:
a step of accessing a secondary index to obtain SST file location information for a Key in the Key-Value column family, the secondary index including < f (Key), SST file location information > data, the f (Key) being a one-to-one function mapping of the Key.
9. The LSM tree storage system-based cache access method with reduced read latency of claim 8, wherein f(Key) is the Key itself, or a hash value of the Key, or the Key mapped to 8 bytes of data by SipHash.
10. The LSM tree storage system-based cache access method with reduced read latency of claim 8, wherein the SST file location information comprises the ID of the SST file or a system timestamp recorded when the Key is written.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202110020501.6A | 2021-01-07 | 2021-01-07 | Caching method and system for reducing read time delay |
Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202110020501.6A | 2021-01-07 | 2021-01-07 | Caching method and system for reducing read time delay |
Publications (1)

| Publication Number | Publication Date |
| --- | --- |
| CN114741382A | 2022-07-12 |
Family

ID=82274016

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| CN202110020501.6A (CN114741382A, pending) | Caching method and system for reducing read time delay | 2021-01-07 | 2021-01-07 |

Country Status (1)

| Country | Link |
| --- | --- |
| CN | CN114741382A |
Cited By (2)

| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
| CN117390031A | 2023-12-11 | 2024-01-12 | Wuhan Textile University | Verification method for validity of secondary index in storage system based on LSM tree |
| CN117390031B | 2023-12-11 | 2024-03-08 | Wuhan Textile University | Verification method for validity of secondary index in storage system based on LSM tree |
Legal Events

| Code | Title | Description |
| --- | --- | --- |
| DD01 | Delivery of document by public notice | Addressee: the person in charge of patents at Xiaohongshu Technology Co., Ltd. Document name: Notification of conformity |
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |