CN114741382A - Caching method and system for reducing read time delay - Google Patents

Caching method and system for reducing read time delay

Info

Publication number
CN114741382A
CN114741382A (application CN202110020501.6A)
Authority
CN
China
Prior art keywords
key
sst
sst file
column family
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110020501.6A
Other languages
Chinese (zh)
Inventor
刘军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiaohongshu Technology Co ltd
Original Assignee
Xiaohongshu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiaohongshu Technology Co ltd filed Critical Xiaohongshu Technology Co ltd
Priority to CN202110020501.6A priority Critical patent/CN114741382A/en
Publication of CN114741382A publication Critical patent/CN114741382A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval of structured data, e.g. relational data
    • G06F16/22 - Indexing; Data structures therefor; Storage structures
    • G06F16/2228 - Indexing structures
    • G06F16/2246 - Trees, e.g. B+trees
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval of structured data, e.g. relational data
    • G06F16/24 - Querying
    • G06F16/245 - Query processing
    • G06F16/2455 - Query execution
    • G06F16/24552 - Database cache management
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a cache system for reducing read latency, comprising: a Key-Value column family B that contains SST files; and a secondary index holding <f(Key), SST file location information> entries, where f(Key) is a one-to-one function mapping of the Key and the SST file location information is the position, within the column family, of the SST file containing the Key. By building a secondary index from each Key to the location of the Level-0 SST file that holds it, a read can jump directly to the correct Level-0 SST file, reducing the time complexity of a read operation from the traditional O(n) to O(1). Experimental results show that after a service system adopted the optimized storage engine, average read latency dropped from several milliseconds to just over 100 µs, and read time no longer grows with the amount of stored data.

Description

Caching method and system for reducing read time delay
Technical Field
The method of the invention is applied to cache scenarios based on LSM trees, and relates to a method and a system for reducing read-operation latency.
Background
How to maintain high capacity and high QPS in a data service system under massive data volumes is a common challenge across the data service industry, and a cache system with low read-write latency is a widespread requirement. A commonly used cache store today is one based on an LSM tree (Log-Structured Merge-Tree), such as RocksDB. When data is written, it is first appended to a log and then inserted into an in-memory file (the MemTable), a structure similar to a balanced binary tree. When the MemTable reaches a certain size, it is written out to a disk file, called a Level-0 SST (Sorted Strings Table) file. Data inside each Level-0 SST file is ordered, but the key ranges of different SST files may overlap. Consequently, looking up a key may require traversing all Level-0 SST files, and read latency rises sharply as the data volume grows.
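The Level-0 lookup cost described above can be illustrated with a small model. The `SSTFile` and `Level0` classes below are invented for illustration only and are not RocksDB's API:

```python
from bisect import bisect_left

class SSTFile:
    """A sorted, immutable run of (key, value) pairs flushed from the MemTable."""
    def __init__(self, file_id, items):
        self.file_id = file_id
        self.items = sorted(items)            # data inside one SST file is ordered
        self.keys = [k for k, _ in self.items]

    def get(self, key):
        i = bisect_left(self.keys, key)       # binary search within one file
        if i < len(self.keys) and self.keys[i] == key:
            return self.items[i][1]
        return None

class Level0:
    """Level-0 files may overlap in key range, so a lookup must probe every
    file, newest first: O(n) in the number of SST files."""
    def __init__(self):
        self.files = []

    def flush(self, memtable_items):
        self.files.append(SSTFile(len(self.files), memtable_items))

    def get(self, key):
        probes = 0
        for sst in reversed(self.files):      # newest file wins
            probes += 1
            v = sst.get(key)
            if v is not None:
                return v, probes
        return None, probes

level0 = Level0()
level0.flush([("a", 1), ("m", 2)])
level0.flush([("b", 3), ("z", 4)])
level0.flush([("c", 5), ("n", 6)])
value, probes = level0.get("a")   # key lives in the oldest file
# value == 1, probes == 3: every Level-0 file had to be searched
```

Because the key ranges overlap, the probe count grows with the number of flushed files, which is exactly the latency growth the patent sets out to remove.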
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, an object of the present invention is to provide a caching method and system for reducing read latency, which avoids missed data synchronization and greatly reduces development workload.
To achieve the above and other related objects, the present invention provides a cache system for reducing read latency, comprising: a Key-Value column family B that contains SST files; and a secondary index comprising <f(Key), SST file location information> entries, where f(Key) is a one-to-one function mapping of the Key and the SST file location information is the position, within the column family, of the SST file containing the Key.
Preferably, in the above cache system for reducing read latency, f(Key) is the Key itself, a hash value of the Key, or the Key mapped to 8 bytes of data by SipHash.
Preferably, in the above cache system for reducing read latency, the SST file location information includes the ID of the SST file or the system timestamp recorded when the Key was written.
Preferably, in the above cache system for reducing read latency, the cache system is applied in a storage engine supporting multiple column families, and the secondary index C is a newly created column family.
The invention also provides a caching method for reducing read latency, applied in a storage engine supporting multiple column families, characterized by comprising the following steps: establishing a Key-Value column family that contains SST files; and establishing a secondary index comprising <f(Key), SST file location information> entries, where f(Key) is a one-to-one function mapping of the Key and the SST file location information is the position, within the column family, of the SST file containing the Key.
Preferably, in the above caching method for reducing read latency, f (Key) is the Key itself, or a hash value of the Key, or the Key is mapped to 8 bytes of data by SipHash.
Preferably, in the above caching method for reducing read latency, the SST file location information includes the ID of the SST file or the system timestamp recorded when the Key was written.
The invention also provides a cache access method for reducing read time delay based on the LSM tree storage system, which is characterized by comprising the following steps: a step of accessing a secondary index to obtain SST file location information for a Key in the Key-Value column family, the secondary index including < f (Key), SST file location information > data, the f (Key) being a one-to-one function mapping of the Key.
Preferably, in the above cache access method for reducing read latency based on an LSM tree storage system, f (Key) is the Key itself, or a hash value of the Key, or the Key is mapped to 8 bytes of data by SipHash.
Preferably, in the above cache access method for reducing read latency based on an LSM tree storage system, the SST file location information includes the ID of the SST file or the system timestamp recorded when the Key was written.
Drawings
Fig. 1 is a logical structure diagram of a cache system for reducing read latency according to the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific examples; other advantages and effects of the invention will be readily apparent to those skilled in the art from the disclosure of this specification. The invention is capable of other and different embodiments, and its several details may be modified in various respects, all without departing from the spirit and scope of the present invention.
Fig. 1 is a logical structure diagram of a cache system for reducing read latency according to the present invention. The mark A indicates a KV cache system, which includes a Key-Value column family (KV column family) B, and column family B contains SST files. The cache system A further includes a secondary index C, which may be a newly created column family when the storage engine supports multiple column families.
The data format of the secondary index C is <f(Key), SST file location information>; that is, an association is established between f(Key) and the SST file location information. Here f(Key) is a one-to-one function mapping of the Key, for example the Key itself or a hash value of the Key. The SST file location information is the position, within column family B, of the SST file containing the Key. It can be implemented in several ways depending on the business scenario and implementation difficulty: for example, the globally unique, monotonically increasing file ID of the SST file can be used, or some feature recorded when the Key was written (for example a system timestamp) can serve as a cursor for locating the SST file. Any scheme works as long as it accurately and quickly locates, within column family B, the SST file whose content the system should return as the read result, such as the most recent SST file containing the Key.
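As one hypothetical realization of the timestamp cursor mentioned above: if the time of each MemTable flush is recorded, the Level-0 SST file holding a key can be recovered from the key's write timestamp with a binary search. The `flush_times` and `file_ids` values below are invented for this sketch:

```python
from bisect import bisect_right

# Each flushed SST file covers a contiguous span of write time, so the flush
# timestamps form a sorted cursor into the list of Level-0 file IDs.
flush_times = [100, 250, 400]   # time at which each SST file was flushed (sorted)
file_ids    = [7, 8, 9]         # corresponding Level-0 SST file IDs

def locate(write_ts):
    """Return the ID of the first SST file flushed at or after write_ts,
    i.e. the file that absorbed a key written at that moment."""
    i = bisect_right(flush_times, write_ts - 1)   # first flush_time >= write_ts
    return file_ids[i] if i < len(file_ids) else None

# A key written at t=120 was still in the MemTable at the t=100 flush,
# so it landed in the file flushed at t=250.
```

This shows why a write-time feature can substitute for an explicit file ID: both uniquely identify one SST file in column family B.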
The smaller each <f(Key), SST file location information> entry is, the larger the portion of the secondary index C that can be held in memory, and therefore the more efficient insertion, deletion, and lookup become. In this example, f(Key) is generated by mapping the Key to 8 bytes of data with SipHash, so that the Key's secondary index can be kept entirely in memory as far as possible.
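A sketch of the 8-byte f(Key) mapping. Python's standard library does not expose SipHash, so `hashlib.blake2b` truncated to an 8-byte digest stands in for it here; the choice of hash is an assumption of this example, and the point is only the fixed 8-byte fingerprint that keeps each index entry small:

```python
import hashlib

def f(key: bytes) -> bytes:
    """8-byte one-to-one fingerprint of the key (effectively collision-free
    for cache-scale key counts). The patent uses SipHash; blake2b with an
    8-byte digest is a stdlib stand-in with the same output size."""
    return hashlib.blake2b(key, digest_size=8).digest()

# Every index entry costs 8 bytes of f(Key) plus a small SST locator, so even
# hundreds of millions of keys produce an index small enough to pin in memory.
entry = f(b"user:42")
```

A fixed-width hashed key also avoids storing long user keys twice, once in column family B and once in the index.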
The following describes the caching method for reducing read latency according to the present invention with reference to the system of Fig. 1, where the numbers in Fig. 1 indicate the logical order of the steps.
The write operation is similar to that of a conventional cache system and is not repeated here. What differs from the conventional scheme is the added step of generating the Key's secondary-index entry: for each written Key, a <f(Key), SST file location information> record is stored in the secondary index C. Here f(Key) is a one-to-one mapping of the Key (for example the Key itself or a hash of the Key), and the SST file location information (for example the SST file's globally unique, monotonically increasing file ID, or a system timestamp recorded at write time) identifies the SST file in column family B that contains the Key, such as the most recent SST file that should be returned as the read result.
For a read operation, the SST file location information is first obtained from the secondary index C using f(Key). With the position of the SST file in column family B known, the SST file containing the Key can be queried directly from column family B and the result returned.
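The write and read paths together can be sketched as follows. This is an illustrative model, not the actual RocksDB column-family API: `flush` stands in for the MemTable being written out as a Level-0 SST file, and `f` is shown as the identity mapping for readability (the patent also allows a hash such as SipHash):

```python
def f(key):
    # One-to-one mapping of the key; identity here, a hash in practice.
    return key

class CacheSystem:
    """Column family B (Level-0 SST files) plus the secondary index C."""
    def __init__(self):
        self.sst_files = {}   # file_id -> {key: value}   (column family B)
        self.index = {}       # f(key)  -> file_id        (secondary index C)
        self.memtable = {}
        self.next_id = 0

    def put(self, key, value):
        self.memtable[key] = value

    def flush(self):
        # The MemTable becomes a new Level-0 SST file; every flushed key's
        # index entry is repointed at this newest file.
        fid = self.next_id                  # globally unique, increasing ID
        self.next_id += 1
        self.sst_files[fid] = dict(self.memtable)
        for key in self.memtable:
            self.index[f(key)] = fid
        self.memtable = {}

    def get(self, key):
        # One index lookup locates the single SST file to probe, instead of
        # scanning all Level-0 files.
        fid = self.index.get(f(key))
        if fid is None:
            return None
        return self.sst_files[fid].get(key)

cache = CacheSystem()
cache.put("a", 1); cache.flush()
cache.put("b", 2); cache.flush()
cache.put("a", 3); cache.flush()    # newer version of "a" in a newer file
```

Compared with the Level-0 scan, `get` touches exactly one SST file regardless of how many files exist, which is the O(n)-to-O(1) reduction the patent claims.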
According to the method, a secondary index is built from each Key to the location information of the Level-0 SST file that holds it, so that a read can jump directly to the correct Level-0 SST file, reducing the time complexity of a read operation from the traditional O(n) to O(1). Experimental results show that after a service system adopted the optimized storage engine, average read latency dropped from several milliseconds to just over 100 µs, and read time no longer grows with the amount of stored data; the method therefore has high industrial utility.

Claims (10)

1. A cache system for reducing read latency, comprising:
Key-Value column family B, column family B contains SST files;
and the secondary index comprises data of < f (key) and SST file positioning information >, wherein f (key) is a one-to-one function mapping of the key, and the SST file positioning information is the position information of the SST file containing the key in the column family.
2. The cache system for reducing read latency of claim 1, wherein f(Key) is the Key itself, or a hash value of the Key, or the Key is mapped to 8 bytes of data by SipHash.
3. The cache system for reducing read latency of claim 1, wherein the SST file location information comprises an ID of the SST file or a system timestamp recorded when the Key was written.
4. The cache system for reducing read latency of claim 1, wherein the cache system is applied in a storage engine supporting multiple column families, and the secondary index C is a newly created column family.
5. A cache method for reducing read latency, applied in a storage engine supporting multiple column families, is characterized by comprising:
establishing a Key-Value column family, wherein the Key-Value column family contains SST files;
establishing a secondary index, wherein the secondary index comprises < f (key), SST file positioning information > data, wherein f (key) is a one-to-one function mapping of keys, and the SST file positioning information is the position information of the SST files containing the keys in the column family.
6. The caching method for reducing read latency according to claim 5, wherein f (Key) is the Key itself, or a hash value of the Key, or the Key is mapped to 8 bytes of data by SipHash.
7. The caching method for reducing the read latency according to claim 5, wherein the SST file location information comprises an ID of the SST file or a system timestamp of the Key when it is written.
8. A cache access method for reducing read delay based on an LSM tree storage system is characterized by comprising the following steps:
a step of accessing a secondary index to obtain SST file location information for a Key in the Key-Value column family, the secondary index including < f (Key), SST file location information > data, the f (Key) being a one-to-one function mapping of the Key.
9. The LSM tree storage system-based cache access method with reduced read latency of claim 8, wherein f(Key) is the Key itself, or a hash value of the Key, or the Key is mapped to 8 bytes of data by SipHash.
10. The LSM tree storage system-based cache access method with reduced read latency of claim 8, wherein the SST file location information comprises an ID of the SST file or a system timestamp of the Key when it is written.
CN202110020501.6A 2021-01-07 2021-01-07 Caching method and system for reducing read time delay Pending CN114741382A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110020501.6A CN114741382A (en) 2021-01-07 2021-01-07 Caching method and system for reducing read time delay

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110020501.6A CN114741382A (en) 2021-01-07 2021-01-07 Caching method and system for reducing read time delay

Publications (1)

Publication Number Publication Date
CN114741382A true CN114741382A (en) 2022-07-12

Family

ID=82274016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110020501.6A Pending CN114741382A (en) 2021-01-07 2021-01-07 Caching method and system for reducing read time delay

Country Status (1)

Country Link
CN (1) CN114741382A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117390031A (en) * 2023-12-11 2024-01-12 武汉纺织大学 Verification method for validity of secondary index in storage system based on LSM tree
CN117390031B (en) * 2023-12-11 2024-03-08 武汉纺织大学 Verification method for validity of secondary index in storage system based on LSM tree


Legal Events

Date Code Title Description
DD01 Delivery of document by public notice

Addressee: The person in charge of patents, Xiaohongshu Technology Co., Ltd.

Document name: Notification of conformity

PB01 Publication
SE01 Entry into force of request for substantive examination