WO2014090097A1 - Data storage method and apparatus - Google Patents
Data storage method and apparatus
- Publication number
- WO2014090097A1 (application PCT/CN2013/088286)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- key
- data block
- length
- block
- data
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
- G06F3/0613—Improving I/O performance in relation to throughput
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2255—Hash tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2365—Ensuring data consistency and integrity
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24553—Query execution of query operations
- G06F16/24561—Intermediate data storage techniques for performance improvement
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/064—Management of blocks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0673—Single storage device
- G06F3/0679—Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
Definitions
- Embodiments of the present invention relate to the field of information processing technologies, and more particularly to a data storage method and apparatus.
- A key-value distributed storage system offers fast queries, large storage capacity, and high concurrency (for example, support for multiple concurrent query processes). It is well suited to lookups by primary key but cannot perform complex conditional queries. If a real-time search engine (Real-Time Search Engine) is used for complex conditional retrieval and full-text search, it can replace lower-performance relational databases such as MySQL. This achieves high concurrency and high performance while reducing the number of servers required.
- Embodiments of the invention provide a data storage method that can improve storage-space utilization.
- Embodiments of the invention also provide a data storage device that can improve storage-space utilization.
- A data storage method, comprising:
- storing fixed-length keys and their values in a first data block, wherein storing the fixed-length keys comprises storing the common prefix of the fixed-length keys once and separately storing the remainder of each fixed-length key after the common prefix is removed; and
- storing variable-length keys and their values in a second data block, wherein storing the variable-length keys comprises storing each variable-length key of the reference key type in full and prefix-compressing each variable-length key of the prefix compression key type.
- A data storage device, comprising a fixed-length key storage unit and a variable-length key storage unit, wherein:
- the fixed-length key storage unit is configured to store fixed-length keys and their values in a first data block, wherein storing the fixed-length keys comprises storing the common prefix of the fixed-length keys once and separately storing the remainder of each fixed-length key after the common prefix is removed; and
- the variable-length key storage unit is configured to store variable-length keys and their values in a second data block, wherein storing the variable-length keys comprises storing variable-length keys of the reference key type in full and prefix-compressing variable-length keys of the prefix compression key type.
- FIG. 1 is a schematic structural diagram of a computing device according to an embodiment of the present invention.
- FIG. 2 is a schematic diagram of a file format according to an embodiment of the present invention.
- FIG. 3 is a flowchart of a data storage method according to an embodiment of the present invention.
- FIG. 4 is a schematic diagram of a data block storage structure for storing records having fixed-length keys according to an embodiment of the present invention.
- FIG. 5 is a schematic diagram of a process of writing fixed-length keys into a data block according to an embodiment of the present invention.
- FIG. 6 is a schematic diagram of a data block storage structure for storing records having variable-length keys according to an embodiment of the present invention.
- FIG. 7 is a schematic diagram of a process of writing variable length keys into data blocks according to an embodiment of the present invention.
- FIG. 8 is a schematic diagram of a storage structure of a bloom filter according to an embodiment of the present invention.
- FIG. 9 is a schematic diagram of a process of writing block index information into an index block according to an embodiment of the present invention.
- FIG. 10 is a schematic diagram of a storage structure of a file header according to an embodiment of the present invention.
- FIG. 11 is a schematic flowchart of writing a record to a file according to an embodiment of the present invention.
- FIG. 12 is a schematic diagram of a file reading method according to an embodiment of the present invention.
- FIG. 13 is a schematic diagram of a process of reading a record according to an embodiment of the present invention.
- FIG. 14 is a structural diagram of a data storage device according to an embodiment of the present invention.
- FIG. 15 is a structural diagram of a data storage device according to an embodiment of the present invention.
- FIG. 1 is a block diagram showing the structure of a computing device of an embodiment.
- Computer 100 can be any computing device capable of implementing the methods and software systems provided by the various examples of the present invention.
- Computer 100 can be a personal computer or a portable device such as a laptop, tablet, cell phone, or smartphone.
- Computer 100 can also be a server connected to such a device via a network.
- Computer 100 can have different capabilities and features; all such implementations fall within the scope of protection of this document.
- Computer 100 can include a keypad/keyboard 156 and a display 154, such as a liquid crystal display (LCD) or a display with advanced features such as a touch-sensitive 2D or 3D display.
- A web-enabled computer 100 can include one or more physical or virtual keyboards, as well as mass storage device 130.
- Computer 100 may also include or run various operating systems 141, such as Windows™ or Linux™, or a mobile operating system such as iOS™, Android™, or Windows Mobile™.
- Computer 100 can include or run various applications 142, such as data storage application 145.
- Data storage application 145 can store ordered records in the file format of the embodiments of the present invention on non-volatile storage device 130.
- Computer 100 can include one or more processor-readable non-volatile storage media 130 and one or more processors 122 in communication with storage medium 130.
- Processor-readable non-volatile storage medium 130 can be RAM, flash memory, ROM, EPROM, EEPROM, a register, a hard disk, a removable hard disk, a CD-ROM, or another form of storage medium.
- Storage medium 130 may store a series of instructions, or units and/or modules containing instructions, for performing the operations of the various embodiments of the present invention.
- Processor 122 can execute these instructions to perform the operations of the various embodiments.
- A data persistence file format is proposed. It stores records sorted by key, supports fixed-length and variable-length keys and values, and allows keys to be prefix-compressed.
- The data block (block) is the storage unit, which benefits both I/O and parsing.
- Prefix compression is applied within each data block, and each data block is also compressed as a whole. This effectively reduces the storage space occupied by the data and improves the machine's disk utilization.
- The index block and the ordering of records within each data block can be used to locate queried data quickly.
- The key-ordered data storage method of the embodiments of the present invention may include the following steps.
- Fixed-length keys and their values are stored in a first data block. Storing the fixed-length keys comprises storing the common prefix of the fixed-length keys once and separately storing the remainder of each key after the common prefix is removed.
- Variable-length keys and their values are stored in a second data block. Storing the variable-length keys comprises storing keys of the reference key type in full and prefix-compressing keys of the prefix compression key type.
- The first data block is dedicated to storing fixed-length keys and their values.
- The second data block is dedicated to storing variable-length keys and their values.
- Variable-length keys can be divided into a reference key type and a prefix compression key type according to a preset threshold length and a threshold difference. The current variable-length key is compared with the previous reference key, and if their common prefix is shorter than the threshold length, the current key is of the reference key type;
- if the common prefix of the current variable-length key and the previous reference key is longer than the threshold length but shorter than the sum of the threshold length and the threshold difference, the current key is of the prefix compression key type.
- Prefix compression of a variable-length key of the prefix compression key type comprises:
- storing the length of the common prefix shared by that key and the previous reference key, followed by the remainder of the key after the common prefix is removed.
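The classification rule and the prefix compression described above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation. The function names and tuple format are hypothetical, and the text does not say how to treat a shared prefix exactly equal to the threshold length, or one at or above the threshold length plus the threshold difference. This sketch assumes both cases become new reference keys.

```python
def common_prefix_len(a: bytes, b: bytes) -> int:
    """Length of the common prefix of two keys."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def encode_variable_keys(keys, threshold_len, threshold_diff):
    """Classify sorted variable-length keys as reference (base) or prefix-compressed.

    Returns ("base", key) or ("compressed", shared_len, remainder) per key.
    Each key is compared with the most recent *reference* key, not simply the
    previous key.
    """
    out, base = [], None
    for key in keys:
        shared = common_prefix_len(base, key) if base is not None else 0
        if base is not None and threshold_len < shared < threshold_len + threshold_diff:
            out.append(("compressed", shared, key[shared:]))
        else:
            # shared < threshold_len: reference key per the text.
            # Boundary / over-long cases are unspecified; assumed reference key here.
            out.append(("base", key))
            base = key
    return out


def decode_variable_keys(entries):
    """Rebuild the full keys from the encoded entries."""
    keys, base = [], None
    for entry in entries:
        if entry[0] == "base":
            base = entry[1]
            keys.append(base)
        else:
            _, shared, rest = entry
            keys.append(base[:shared] + rest)
    return keys
```

With `threshold_len=3` and `threshold_diff=10`, the keys `user:alice`, `user:alina`, `user:bob`, `zeta` produce one reference key, two compressed keys (shared lengths 8 and 5, both against `user:alice`), and a new reference key `zeta`.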
- When the first data block is determined to be full of fixed-length keys and their values, the first data block is compressed and a storage buffer is allocated. If the size of the compressed first data block is smaller than the storage buffer, the compressed first data block is written into the storage buffer;
- likewise, the second data block is compressed and a storage buffer is allocated. If the size of the compressed second data block is smaller than the storage buffer, the compressed second data block is written into the storage buffer.
- The method can also include:
- writing the Bloom filter information of the fixed-length keys and their values into the Bloom filter.
- A read buffer may be set. When the length read so far is smaller than the buffer and the current data block is not the last data block, the start address of the next data block is fetched and its length is recorded. Reading continues until the read length exceeds the length of the buffer.
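The read-buffer rule above amounts to coalescing consecutive data blocks into one large read. The following is a minimal sketch, assuming the blocks are stored contiguously and the block index is available as a list of `(offset, length)` pairs. Both function names are hypothetical.

```python
import io


def plan_batched_read(block_index, start, buffer_size):
    """Decide how many consecutive blocks one read starting at `start` covers.

    block_index: list of (offset, length) per data block, in file order.
    Assumes data blocks are stored contiguously in the file.
    """
    first_offset, read_len = block_index[start]
    last = start
    # Keep extending while the read is shorter than the buffer and the
    # current block is not the last data block.
    while read_len < buffer_size and last + 1 < len(block_index):
        last += 1
        next_offset, next_len = block_index[last]
        read_len = next_offset + next_len - first_offset
    return first_offset, read_len, last - start + 1


def read_blocks(f, block_index, start, buffer_size):
    """Fetch several blocks with a single seek/read, then split them apart."""
    offset, read_len, count = plan_batched_read(block_index, start, buffer_size)
    f.seek(offset)
    chunk = f.read(read_len)
    return [chunk[off - offset: off - offset + length]
            for off, length in block_index[start:start + count]]


if __name__ == "__main__":
    f = io.BytesIO(b"aaaabbbbbbccddd")
    index = [(0, 4), (4, 6), (10, 2), (12, 3)]
    print(read_blocks(f, index, 0, 8))  # [b'aaaa', b'bbbbbb']
```

As the text specifies, the final read may overshoot the buffer size by part of one block. This trades a slightly larger read for fewer disk round trips.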
- The file format of an embodiment of the present invention consists of data blocks (data block 1, data block 2, through data block n), a Bloom filter, an index block, and a file header (comprising the header itself and the header length).
- Ordered records are stored in the data block portion. Each record is divided into a key and a value (value), and keys can be either fixed-length or variable-length.
- Keys of the same type and their values can be stored in the same data block. For example, data block 1 stores fixed-length keys and their values, data block 2 stores variable-length keys and their values, and so on. More than one data block may be dedicated to the same type of key and its values. In a data block storing fixed-length keys and their values, preferably only one copy of the common prefix (prefix) of the keys is stored, and for each key only the part left after removing the common prefix (that is, the differing part, the remainder) is stored.
- Base keys and prefix-compressed keys can be distinguished according to a preset threshold length and a threshold difference (threshold diff).
- For a prefix-compressed key, the length of the prefix it shares with the previous base key is stored first (as a variable-length compressed integer), followed by the remaining part of the prefix-compressed key.
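The text stores the shared-prefix length as a variable-length integer but does not specify the encoding. This sketch assumes the common LEB128-style varint used by Protocol Buffers: 7 payload bits per byte, with the high bit set on every byte except the last. The helper names are hypothetical.

```python
def encode_varint(n: int) -> bytes:
    """Unsigned LEB128-style varint (assumed encoding)."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)  # more bytes follow
        else:
            out.append(low)
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0):
    """Return (value, position just after the varint)."""
    result, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def encode_compressed_key(shared_len: int, remainder: bytes) -> bytes:
    """Shared-prefix length (varint) followed by the differing part of the key."""
    return encode_varint(shared_len) + remainder


def decode_compressed_key(entry: bytes, base_key: bytes) -> bytes:
    shared, pos = decode_varint(entry)
    return base_key[:shared] + entry[pos:]
```

Shared lengths below 128 cost a single byte, which is why a varint suits this field. The end of each entry comes from the key offsets stored alongside it, so the entry itself needs no length field for the remainder.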
- The compression results of all cells belonging to the same key can be stored.
- A memory-resident block must be managed by a file writer (for example, an SSTable writer). If the SSTable writer is destroyed, the memory-resident block is released as well. The user therefore constructs an SSTable reader, calls the SSTable writer's Assign method to swap out the dumped block, and then allocates the block, which is held by the SSTable writer. If the block is not memory-resident, it is dumped directly to the SSTable.
- The metadata of the file comprises the Bloom filter, block index, header, and header length information.
- When a record is pushed (push record), a block in the file format of the embodiments needs temporary space for storing the key offset, base offset, key, data offset, and data.
- The current key is compared with the reference key of the previous record (obtained through the previous record's base offset). If their common prefix is longer than the threshold (threshold_len, configurable), the key is prefix-compressed:
- the base offset entry for the key stores the offset of the previous record's reference key. The common-prefix length is encoded as a varint, and that length followed by the differing part of the key is written into the key temporary area, starting at the end position of the last write;
- the data offset and the data are then written into the data offset and data temporary areas.
- The current key is compared with the reference key of the previous record; if their common prefix does not exceed the threshold,
- prefix compression is not performed:
- the key is stored as a reference key, and its base offset is set
- to the current key offset (the current key is itself the base key and is not compressed). The full key is then written into the key temporary area, starting at the end position of the last write, and the data offset and data are written into the data offset and data temporary areas.
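The push-record procedure above can be modeled with one growable buffer or list per temporary area. This minimal sketch uses only the threshold rule from this paragraph, not the threshold difference, and assumes a LEB128-style varint and non-empty keys. The class and method names are hypothetical. A record is recognized as a reference key when its base offset equals its own key offset, which follows from the rule that a reference key's base offset is set to its current key offset.

```python
def common_prefix_len(a: bytes, b: bytes) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def encode_varint(n: int) -> bytes:
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0):
    result, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


class VarKeyBlockBuilder:
    """Temporary areas for one variable-length-key block (assumes non-empty keys)."""

    def __init__(self, threshold_len: int):
        self.threshold_len = threshold_len
        self.key_offsets, self.base_offsets, self.data_offsets = [], [], []
        self.key_area = bytearray()
        self.data_area = bytearray()
        self._base_key = None      # most recent reference (base) key
        self._base_offset = None

    def push(self, key: bytes, value: bytes) -> None:
        key_offset = len(self.key_area)  # end position of the last write
        shared = common_prefix_len(self._base_key, key) if self._base_key else 0
        if self._base_key is not None and shared > self.threshold_len:
            # Prefix-compressed: base offset points at the previous reference key.
            self.base_offsets.append(self._base_offset)
            self.key_area += encode_varint(shared) + key[shared:]
        else:
            # Reference key: stored in full; base offset = its own key offset.
            self.base_offsets.append(key_offset)
            self.key_area += key
            self._base_key, self._base_offset = key, key_offset
        self.key_offsets.append(key_offset)
        self.data_offsets.append(len(self.data_area))
        self.data_area += value

    def _entry(self, i: int) -> bytes:
        end = self.key_offsets[i + 1] if i + 1 < len(self.key_offsets) else len(self.key_area)
        return bytes(self.key_area[self.key_offsets[i]:end])

    def key_at(self, i: int) -> bytes:
        entry = self._entry(i)
        if self.base_offsets[i] == self.key_offsets[i]:
            return entry  # reference key, stored in full
        base_index = self.key_offsets.index(self.base_offsets[i])
        shared, pos = decode_varint(entry)
        return self._entry(base_index)[:shared] + entry[pos:]

    def value_at(self, i: int) -> bytes:
        end = self.data_offsets[i + 1] if i + 1 < len(self.data_offsets) else len(self.data_area)
        return bytes(self.data_area[self.data_offsets[i]:end])
```

Pushing `user:alice`, `user:alina`, `user:bob`, `zeta` with `threshold_len=3` stores the first and last keys in full and the middle two as 3-byte and 4-byte compressed entries.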
- The longest common string of the keys that have been pushed is calculated.
- The block's in-memory storage structure is then produced: the longest common string is stored first, followed by the remaining string of each key, then the data, and finally the block header.
- A read (get) operation on a data block in this file format mainly consists of either locating the record within the block by its index or using the block index to search for the record that corresponds to a given key.
- The Bloom filter of the embodiments is a special, degenerate hash table: it does not handle collisions and does not store key values. The number of hash functions is configurable. For each record, the record's positions in the bitmap are calculated from its key using that number of hashes, and those positions are set.
- Lookups are first filtered through the Bloom filter, whose bitmap positions for each record were set to 1 according to the number of hashes and the record's key. If a position computed for the current key is not 1, the key is not in the existing file. If the positions are 1, the key may exist in the file, and it is then searched for using the end keys in the file's block index.
- The index block (block index) of the embodiments stores variable-length keys with fixed-length values, in a manner similar to how data blocks are stored.
- The key field is the cell key, which stores the full key of the last cell in each data block (row key + cfid + column). The value field holds the offset and length of the data block in the file (offset, length) and the length of the current row key (row key length).
- The header of the embodiments stores file-related information and the offset and length of each part. This allows each part to be located quickly and avoids wasting system resources on traversing the file from the beginning.
- The value is the data length; for non-fixed lengths, it is 0.
- The present invention proposes a method for writing record data.
- The record is first written into the block. If the write succeeds, the Bloom filter information of the current record is written to the Bloom filter structure. If the write fails, the current block is full, so the current block is compressed and an appropriate write buffer is allocated according to the unused space in the write buffer and the size of the block.
- If the current write buffer has enough space, the block is written to the write buffer; if it does not, the existing write buffer is written to disk.
- The write buffer acts as a layer of cache: it holds several blocks that have already been written and then writes them to disk together.
- Each time a block is written, the last key of the block is recorded in the Block Index structure.
- The embodiments of the present invention propose a file format, hereinafter referred to as the sstable format; a file in this format is referred to as an sstable file.
- This file format can be used to save data permanently, that is, for data persistence.
- This file format can store records with fixed-length or variable-length keys, together with their values.
- A record consists of a key and a value.
- A key and the value belonging to the same record are referred to as a key and its value, or a key and its corresponding value. The key and value of the record currently being processed are referred to as the current key and the current value.
- The key is the record's keyword; it can be entered by the user or generated by other means.
- FIG. 2 is a schematic diagram of a file format according to an embodiment of the present invention.
- The file format is composed of data blocks (data block 1, data block 2, ..., data block n) and metadata (meta data).
- The metadata includes the Bloom filter, the index block, and the file header (comprising the header and the header length).
- A data block is used to store records, which can be ordered or unordered.
- For example, records that have been sorted by key can be stored in the data block.
- The Bloom filter is a special, degenerate hash table: it does not handle collisions (Collision) and does not store key values.
- The Bloom filter stores information about whether each record is in the file (also known as Bloom filter information).
- The positions of a record in the bitmap can be calculated from the preset number of hashes and the record's key, and the values at those positions are set to indicate that the key exists in the file. For example, when a value at one of these positions in the bitmap is not 1, the record does not exist in the file; when the values are 1, the record may exist in the file.
- Conversely, the positions of a key in the bitmap (BitMap) can be calculated from the preset number of hashes and the key, and the values at those positions determine whether the key may exist in the file.
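A Bloom filter with a configurable number of hashes, as described above, can be sketched as follows. The text does not name the hash functions. This sketch assumes that salting SHA-256 with the hash index is acceptable for illustration; a real implementation would normally use a faster non-cryptographic hash. The class name and parameters are hypothetical.

```python
import hashlib


class BloomFilter:
    """Degenerate hash table: only bits are kept; keys and collisions are not stored."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # One bitmap position per hash; hash i is SHA-256 salted with i (assumption).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def may_contain(self, key: bytes) -> bool:
        # Any position not set to 1: the key is definitely not in the file.
        # All positions set to 1: the key may be in the file.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))
```

A negative answer is always correct, while a positive answer still requires the block-index search. This is why the filter is used only as the first layer of the lookup.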
- The index block portion stores the location of each data block and the range of keys stored in each data block.
- The stored records are sorted by key, for example according to the ASCII codes of the keys.
- The last key stored in each data block can be recorded in the index block as that block's index key (end key).
- The data block that holds a record can be determined from the block index keys in the index block. That data block is then read from its recorded position, and the record is searched for within it.
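Because records are sorted and each index entry holds the last (end) key of its block, the block that may contain a key is the first block whose end key is not less than that key. This minimal sketch models index entries as hypothetical `(end_key, offset, length)` tuples and uses binary search from the standard `bisect` module.

```python
from bisect import bisect_left


def find_block(index, key: bytes):
    """Return the (end_key, offset, length) entry of the block that may hold `key`.

    `index` is sorted by end key (byte order, i.e. ASCII order for ASCII keys).
    Returns None when the key is larger than every key in the file.
    """
    end_keys = [entry[0] for entry in index]
    i = bisect_left(end_keys, key)  # first end key >= key
    return index[i] if i < len(index) else None
```

For example, with end keys `apple`, `mango`, and `peach`, the key `banana` maps to the `mango` block, because that block covers every key after `apple` up to and including `mango`.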
- The file header (comprising the header and the header length) stores file information and the offset and length of each part. This helps locate each part quickly and avoids wasting system resources on traversing the file from the beginning.
- In one example, generating an sstable file proceeds as follows. A record is written to a data block (also referred to as a block); if the write succeeds, the current record's Bloom filter information is written to the Bloom filter structure. When a block is full, its block index information is written to the index block portion. When all blocks have been written, the file's metadata (meta data) is written, which completes the file.
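The generation order above (data blocks, then metadata) and the header's role of recording each part's offset and length can be shown with a toy layout. The byte layout here is entirely an assumption, not the patent's format. The header holds six little-endian 64-bit integers, an (offset, length) pair for the data, Bloom filter, and index parts. A 4-byte header length at the very end of the file lets a reader locate the header without scanning. Function and constant names are hypothetical.

```python
import struct

HEADER_FMT = "<6Q"  # hypothetical: (offset, length) for data, bloom filter, index block


def assemble_sstable(blocks, bloom_bytes: bytes, index_bytes: bytes) -> bytes:
    """Concatenate the data blocks, then the metadata, in the order described."""
    out = bytearray()
    for block in blocks:
        out += block
    data_len = len(out)
    bloom_off = len(out)
    out += bloom_bytes
    index_off = len(out)
    out += index_bytes
    header = struct.pack(HEADER_FMT, 0, data_len, bloom_off, len(bloom_bytes),
                         index_off, len(index_bytes))
    out += header
    out += struct.pack("<I", len(header))  # header length, read first by a reader
    return bytes(out)


def read_parts(file_bytes: bytes) -> dict:
    """Locate each part via the trailing header length and the header."""
    (header_len,) = struct.unpack("<I", file_bytes[-4:])
    fields = struct.unpack(HEADER_FMT, file_bytes[-4 - header_len:-4])
    names = ("data", "bloom", "index")
    return {name: file_bytes[fields[2 * i]:fields[2 * i] + fields[2 * i + 1]]
            for i, name in enumerate(names)}
```

Placing the header length at a fixed position from the end of the file means a reader needs only two small reads to find every part, which matches the stated goal of avoiding a traversal from the start of the file.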
- I/O throughput can be improved;
- parsing efficiency can be improved; and
- parsing speed can be increased.
- A write buffer for the file is set up in memory, and multiple temporary storage areas are allocated in memory for each block. The record's information is written to these temporary storage areas; for example, one temporary storage area can be allocated for each part of the data block structure. If the write fails, the current data block is full. Whether the storage buffer has enough space is then determined from the size of its unused space and the size of the current data block. If it does, the data block is written to the storage buffer. If it does not, the existing storage buffer is written to disk and the storage buffer is reallocated to store the current data block and subsequent data blocks.
- The storage buffer acts as a layer of cache. It holds several blocks that have already been written and then writes them to disk at once, which reduces interaction with the disk, speeds up writing, and so improves disk-write efficiency.
- Other examples may also use other caching mechanisms, for example, writing the block to the disk every time a block is written, etc., which is not limited by the present invention.
- When a block is written to a file, it can first be compressed, and the compressed block is written instead to save storage space.
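The write-buffer mechanism and per-block compression described above can be combined in one small class. This is a sketch under stated assumptions: `zlib` stands in for the unspecified compression algorithm, and the `sink` is any binary file-like object. The class name, method names, and capacity value are hypothetical. The code follows the rule in the text: when a compressed block does not fit in the unused space, the existing buffer is written out first.

```python
import io
import zlib


class WriteBuffer:
    """Caches several compressed blocks and writes them to the sink in one call."""

    def __init__(self, sink, capacity: int):
        self.sink = sink          # binary file-like object with write()
        self.capacity = capacity
        self.buf = bytearray()
        self.flushes = 0          # number of write() calls issued to the sink

    def add_block(self, raw_block: bytes) -> int:
        """Compress a full block and stage it; returns the compressed size."""
        block = zlib.compress(raw_block)  # assumed compression algorithm
        # Not enough unused space: write the existing buffer to disk first.
        if self.buf and len(self.buf) + len(block) > self.capacity:
            self.flush()
        # A block larger than the capacity simply occupies a larger, fresh buffer.
        self.buf += block
        return len(block)

    def flush(self) -> None:
        if self.buf:
            self.sink.write(bytes(self.buf))
            self.flushes += 1
            self.buf = bytearray()


if __name__ == "__main__":
    disk = io.BytesIO()
    wb = WriteBuffer(disk, capacity=4096)
    for i in range(10):
        wb.add_block(bytes([i]) * 1000)
    wb.flush()
    print(wb.flushes, len(disk.getvalue()))
```

The caller must record each returned compressed size, for example in the block index, because concatenated compressed blocks cannot otherwise be split apart when read back.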
- The key in a record can be a fixed-length key or a variable-length key.
- The length of a fixed-length key equals a preset value, whereas the length of a variable-length key is not fixed.
- Keys of the same type and their values can be stored in the same data block. For example, data block 1 stores fixed-length keys and their values, data block 2 stores variable-length keys and their values, and so on. More than one data block can hold the same type of key and its values.
- FIG. 3 is a flowchart of a data storage method according to an embodiment of the present invention. As shown in Figure 3, the method can include the following steps.
- Step S11: store fixed-length keys and their values in the first data block. Storing the fixed-length keys comprises storing the common prefix of the fixed-length keys once and separately storing the remainder of each fixed-length key after the common prefix is removed.
- Step S12: store variable-length keys and their values in the second data block. Storing the variable-length keys comprises storing variable-length keys of the reference key type in full and prefix-compressing variable-length keys of the prefix compression key type.
- The size of each data block can be preset, either by the user or by other means. All data blocks can have the same size, or the first and second data blocks can be sized individually.
- In the first data block, the common prefix of the fixed-length keys can be stored only once, together with the remaining part of each key after the common prefix is removed (that is, the differing part, or remainder).
- FIG. 4 is a schematic diagram of a storage structure of a data block for storing a record having a fixed length key according to an embodiment of the present invention.
- The data block is composed of the following parts (also called areas or fields): a block header 401, a data offset 402, data 403, a key remainder 404, and a key common prefix 405.
- The block header 401 stores information about the data block and can be defined as needed; for example, it may include the length of the common prefix stored in the key common prefix 405.
- The key common prefix 405 is the common prefix shared by all keys stored in the data block.
- The data offset 402, data 403, and key remainder 404 each include multiple storage units (cells), and each cell of each part corresponds to one record.
- Each record in the data block corresponds to one cell in the key remainder 404, one cell in the data offset 402, and one cell in the data 403; that is, each record corresponds to three cells in the data block.
- The key remainder 404 stores the remaining part of each key after the common prefix is removed.
- The data offset 402 stores the offset of each data item (that is, the value in each record) relative to a starting position in the block, and is used to locate the values in the block.
- The data 403 stores the value of each record.
- FIG. 5 is a schematic diagram of a process of writing a fixed length key to a data block according to an embodiment of the present invention.
- Temporary storage spaces 501, 502, 503, 504, and 505 may be allocated in memory for the block header 401, data offset 402, data 403, key remainder 404, and key common prefix 405 of the first data block shown in FIG. 4, respectively.
- The common prefix is written to temporary storage space 505, and the length of the common prefix is written to temporary storage space 501.
- For each record, the part matching the common prefix is removed from the key, and the remaining part of the key is written to the record's storage unit in temporary storage space 504.
- The record's value is written to the record's storage unit in temporary storage space 503, and the offset of that storage unit within temporary storage space 503 is written to the record's storage unit in temporary storage space 502.
- Each time a record is written, the sum of the sizes of temporary storage spaces 501, 502, 503, 504, and 505 can be calculated. If the sum is greater than or equal to the preset data block size (size), the data block is full and the current record is not written; that is, storing the current record fails.
- Temporary storage spaces 502, 503, 504, and 505 are then written into a storage buffer (write buffer) according to the storage structure of the data block (for example, the structure shown in FIG. 4). The block header's temporary storage space 501 records information about each written part, such as the positions (for example, offsets) of the data offset 402, data 403, key remainder 404, and key common prefix 405 within the data block.
- Finally, temporary storage space 501 is written to the storage buffer, which completes the writing of the first data block. If records with fixed-length keys remain unwritten, temporary storage spaces can be allocated again and the above writing process repeated to generate a new data block.
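The FIG. 5 process can be modeled in memory: one area per part, a fullness check on every push, and serialization with the header written last. This is a minimal sketch under several assumptions. The header layout (record count, prefix length, and the offsets of the data, remainder, and prefix parts as five 32-bit integers), the 4-byte data-offset cells, and placing the header at the end of the block are all hypothetical choices; the text does not specify them. The class and constant names are also hypothetical.

```python
import struct

BLOCK_HEADER = struct.Struct("<5I")  # hypothetical: count, prefix len, data/rem/prefix positions


class FixedKeyBlock:
    """Temporary areas 501-505 for one fixed-length-key data block."""

    def __init__(self, common_prefix: bytes, key_len: int, block_size: int):
        self.prefix = common_prefix      # area 505
        self.key_len = key_len
        self.block_size = block_size
        self.data_offsets = []           # area 502 (4-byte cells, assumed)
        self.data = bytearray()          # area 503
        self.remainders = bytearray()    # area 504

    def _size_with(self, value_len: int) -> int:
        """Sum of all area sizes if one more record of this value length were added."""
        n = len(self.data_offsets) + 1
        rem = self.key_len - len(self.prefix)
        return (BLOCK_HEADER.size + 4 * n + len(self.data) + value_len
                + n * rem + len(self.prefix))

    def push(self, key: bytes, value: bytes) -> bool:
        if len(key) != self.key_len or not key.startswith(self.prefix):
            raise ValueError("key does not match this block's fixed length/prefix")
        if self._size_with(len(value)) >= self.block_size:
            return False  # block full: the current record is not written
        self.data_offsets.append(len(self.data))
        self.data += value
        self.remainders += key[len(self.prefix):]
        return True

    def serialize(self) -> bytes:
        """Write the parts in order, then the header recording their positions."""
        offsets = b"".join(struct.pack("<I", o) for o in self.data_offsets)
        data_pos = len(offsets)
        rem_pos = data_pos + len(self.data)
        prefix_pos = rem_pos + len(self.remainders)
        header = BLOCK_HEADER.pack(len(self.data_offsets), len(self.prefix),
                                   data_pos, rem_pos, prefix_pos)
        return offsets + bytes(self.data) + bytes(self.remainders) + self.prefix + header
```

With a common prefix `user00`, 8-byte keys, and a 64-byte block, three records fit (59 bytes in total) and a fourth push returns `False`, at which point the block would be serialized and a new one started.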
- The common prefix of the fixed length keys may be determined according to actual needs. For example, the number of records that fit in the data block can be estimated from the size of the temporary storage space allocated for the key remaining portion 404, and the keys of that many records can then be read to obtain their common prefix.
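The fixed length key layout described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the five-field header, the 4-byte little-endian offsets, and the names `build_fixed_block` and `read_fixed_record` are hypothetical, and the common prefix is computed over all candidate records rather than over an estimated record count (a prefix shared by all candidates is still shared by the subset actually written).

```python
import struct

# Hypothetical block header: record count, common-prefix length, and the start
# offsets of the data, key-remainder, and common-prefix areas (uint32 each).
HEADER = struct.Struct("<5I")
OFFSET = struct.Struct("<I")


def common_prefix(keys):
    """Longest prefix shared by all keys: equal to that of the smallest and largest key."""
    if not keys:
        return b""
    lo, hi = min(keys), max(keys)
    n = 0
    while n < min(len(lo), len(hi)) and lo[n] == hi[n]:
        n += 1
    return lo[:n]


def build_fixed_block(records, block_size):
    """Pack (key, value) pairs with equal-length keys; return (block, records_written)."""
    prefix = common_prefix([k for k, _ in records])
    key_len = len(records[0][0]) if records else 0
    rem_len = key_len - len(prefix)
    offsets, data, remainders = [], bytearray(), bytearray()
    for key, value in records:
        if len(key) != key_len:
            raise ValueError("keys in a fixed-length block must all have the same length")
        # Size of header + data offsets + data + key remainders + common prefix
        # if this record were added.
        total = (HEADER.size + OFFSET.size * (len(offsets) + 1) + len(data) + len(value)
                 + len(remainders) + rem_len + len(prefix))
        if total >= block_size:
            break  # block is full: the current record is not written
        offsets.append(len(data))
        data += value
        remainders += key[len(prefix):]
    count = len(offsets)
    data_start = HEADER.size + OFFSET.size * count
    rem_start = data_start + len(data)
    prefix_start = rem_start + len(remainders)
    header = HEADER.pack(count, len(prefix), data_start, rem_start, prefix_start)
    body = b"".join(OFFSET.pack(o) for o in offsets) + bytes(data) + bytes(remainders) + prefix
    return header + body, count


def read_fixed_record(block, i):
    """Restore record i: common prefix + key remainder, plus its value."""
    count, plen, data_start, rem_start, prefix_start = HEADER.unpack_from(block, 0)
    prefix = block[prefix_start:prefix_start + plen]
    rem_len = (prefix_start - rem_start) // count
    key = prefix + block[rem_start + i * rem_len:rem_start + (i + 1) * rem_len]
    start = OFFSET.unpack_from(block, HEADER.size + OFFSET.size * i)[0]
    if i + 1 < count:
        end = OFFSET.unpack_from(block, HEADER.size + OFFSET.size * (i + 1))[0]
    else:
        end = rem_start - data_start  # last value runs to the end of the data area
    return key, block[data_start + start:data_start + end]
```

For keys `b"user0001"`, `b"user0002"`, `b"user0003"`, the common prefix `b"user000"` is stored once and each key contributes a single remainder byte.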
- When storing a fixed length key and its value into the first data block fails, the first data block is determined to be full.
- The first data block may then be compressed to reduce the storage space occupied by the data.
- If the size of the compressed first data block is less than the available space of the storage buffer, the compressed first data block is written into the storage buffer.
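A minimal sketch of the compress-then-fit check, assuming zlib as the compressor (the text does not name a compression algorithm) and a hypothetical `WriteBuffer` class. A block is appended only if its compressed size is less than the buffer's available space.

```python
import zlib


class WriteBuffer:
    """Hypothetical fixed-capacity write buffer that receives compressed data blocks."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = bytearray()
        self.block_positions = []  # (offset in buffer, compressed length) per block

    def available(self):
        return self.capacity - len(self.data)

    def append_block(self, raw_block, level=6):
        """Compress raw_block and append it only if it is smaller than the free space."""
        compressed = zlib.compress(raw_block, level)
        if len(compressed) >= self.available():
            return False  # caller must flush the buffer or allocate a new one
        self.block_positions.append((len(self.data), len(compressed)))
        self.data += compressed
        return True
```

Recording each block's position and compressed length at append time is what later allows an index entry to point at the block.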
- When variable length keys are stored, each key can be classified as either a reference key or a prefix compression key. A reference key is stored in full, that is, the complete key is stored.
- For a prefix compression key, the length of the prefix it shares with its reference key (hereinafter the same prefix, or the same prefix string) is stored, followed by the remaining part of the key after that shared prefix is removed. This step is hereinafter simply referred to as prefix compression, or variable length compression.
- FIG. 6 is a schematic diagram of a storage structure of a data block for storing variable length keys according to an embodiment of the present invention.
- The data block consists of a block header 601, a data offset 602, a reference offset 603, a key offset 604, data 605, and a key (key) area 606.
- The block header 601 stores information about the data block; its contents can be defined as needed.
- The data offset 602, reference offset 603, key offset 604, data 605, and key 606 areas each contain multiple storage units (cells), and each storage unit in an area corresponds to one record. Every record in the data block has one corresponding storage unit in each of these five areas.
- The information in those storage units is sufficient to restore the record (key + value).
- The data offset 602 stores, for each data item (i.e., the value of each record), the offset of its storage location relative to a starting position in the block, and is used to locate data within the block.
- The data 605 stores the value of each record.
- The reference offset 603 stores, for each key, the offset of its reference key's storage location in the data block relative to a starting position (i.e., the key offset of the reference key), and is used to locate the reference key of each key within the block.
- The reference offset of a reference key itself is set to zero.
- The key offset 604 stores the offset of each key's storage location in the data block relative to a starting position, and is used to locate keys within the block.
- The key 606 stores, for each key, the length of the prefix it shares with its reference key and the remaining part after that shared prefix is removed.
- The key of the first record written into the current data block can serve as a reference key, and the keys of subsequent records are prefix-compressed against it.
- Prefix compression of a variable length key of the prefix compression key type includes storing the length of the prefix it shares with its reference key, and storing the remainder of the key after that shared prefix is removed.
- Prefix compression within the data block reduces the storage space of the data and increases the disk utilization of the machine.
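The prefix compression of a single key against its reference key can be sketched as below. The varint format (7 data bits per byte, high bit as continuation flag, as in Protocol Buffers) is an assumption, since the text only says the shared prefix length is stored as a varint; the function names are hypothetical.

```python
def encode_varint(n):
    """Unsigned varint: 7 data bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos=0):
    """Decode a varint starting at pos; return (value, position after it)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def shared_prefix_len(a, b):
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def prefix_compress(key, reference_key):
    """Store varint(length of shared prefix) followed by the rest of the key."""
    n = shared_prefix_len(key, reference_key)
    return encode_varint(n) + key[n:]


def prefix_restore(encoded, reference_key):
    """Take the shared prefix from the reference key and append the stored remainder."""
    n, pos = decode_varint(encoded)
    return reference_key[:n] + encoded[pos:]
```

For example, `b"row123/cf1/colB"` compressed against the reference key `b"row123/cf1/colA"` becomes two bytes: the length 14 and the differing byte `B`.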
- A data block may also contain one or more reference keys.
- Variable length keys can be classified into the reference key type and the prefix compression key type by a preset method; that is, each variable length key is determined to be either a reference key or a prefix compression key.
- For example, variable length keys can be classified into the reference key type and the prefix compression key type according to a preset threshold length (threshold_len).
- The current variable length key is prefix-compared with the current reference key (i.e., the reference key of the previous key; when the current key is determined to be a reference key, the reference key of the previous key is also called the previous reference key, to distinguish the two).
- If the length of the same prefix string is greater than the threshold length, the current variable length key is determined to be of the prefix compression key type; otherwise, it is of the reference key type.
- The threshold length can be set as needed.
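The threshold-length rule can be sketched as follows. The function name is hypothetical, and reference keys are identified by list index rather than by key offset, to keep the example short.

```python
def shared_prefix_len(a, b):
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def classify_threshold_len(keys, threshold_len):
    """Return, per key, None for a reference key or the index of its reference key."""
    result, ref_idx = [], None
    for i, key in enumerate(keys):
        if ref_idx is not None and shared_prefix_len(key, keys[ref_idx]) > threshold_len:
            result.append(ref_idx)  # prefix compression key
        else:
            result.append(None)     # reference key, stored in full
            ref_idx = i             # becomes the current reference key
    return result
```

With `threshold_len = 3`, the keys `apple/1, apple/2, banana/1, banana/22` yield two reference keys (`apple/1` and `banana/1`), each followed by one key compressed against it.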
- Variable length keys can also be classified into the reference key type and the prefix compression key type according to a preset threshold difference. The length of the same prefix string shared by the current key and the previous key is calculated and recorded as the first length, and the length of the common prefix string shared by the current key and the current reference key is calculated and recorded as the second length.
- If the first length is less than the sum of the second length and the threshold difference, the current key is a prefix compression key, and the key offset of the current reference key is used as its reference offset. If the first length is greater than or equal to the sum of the second length and the threshold difference, the current key is determined to be a reference key: it is not compressed, and its reference offset is set to zero.
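Similarly, a sketch of the threshold-difference rule, assuming a positive threshold difference and, as above, using list indexes in place of key offsets; the function name is hypothetical.

```python
def shared_prefix_len(a, b):
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def classify_threshold_diff(keys, threshold_diff):
    """Return, per key, None for a reference key or the index of its reference key."""
    result, ref_idx = [], None
    for i, key in enumerate(keys):
        if ref_idx is None:
            result.append(None)  # the first key is always a reference key
            ref_idx = i
            continue
        first = shared_prefix_len(key, keys[i - 1])     # vs the previous key
        second = shared_prefix_len(key, keys[ref_idx])  # vs the current reference key
        if first >= second + threshold_diff:
            result.append(None)     # becomes the new reference key
            ref_idx = i
        else:
            result.append(ref_idx)  # prefix-compressed against the current reference key
    return result
```

Unlike the threshold-length rule, this rule starts a new reference key when the current key shares much more with its predecessor than with the current reference key.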
- FIG. 7 is a schematic diagram of a process of writing a variable length key into a data block according to an embodiment of the present invention.
- When records are written into a data block in the format shown in FIG. 6, temporary storage spaces 701, 702, 703, 704, 705, and 706 may be allocated in memory for the block header 601, data offset 602, reference offset 603, key offset 604, data 605, and key 606, respectively.
- The key of the first record is used as the reference key. Since a reference key is not compressed, the complete key is stored in the storage unit corresponding to the record in temporary storage space 706, which holds the keys (for example, the first record may correspond to the first storage unit of each temporary storage space).
- The reference offset of the key is stored in the storage unit corresponding to the record in temporary storage space 703, which holds the reference offsets.
- The reference offset of a reference key can be set to 0.
- The data of the record (i.e., its value) is stored in the storage unit corresponding to the record in temporary storage space 705.
- The offset of the current key within temporary storage space 706 (its position relative to the start of that area) is written to the storage unit corresponding to the record in temporary storage space 704, which holds the key offsets.
- The offset of the current record's data within temporary storage space 705 (its position relative to the start of that area) is written to the storage unit corresponding to the record in temporary storage space 702, which holds the data offsets.
- The key offset and the data offset of the first record can both be set to zero.
- As each subsequent record is written into the data block, it is determined whether the current key should become a reference key. If so, the reference offset of the current key is set to 0, and the complete key (a reference key is not compressed) and the reference offset are stored in the storage units corresponding to the current record in temporary storage spaces 706 and 703, respectively.
- Otherwise, the key offset of the current reference key is written into the storage unit corresponding to the current record in temporary storage space 703, and the key is prefix-compressed: the length of the prefix shared by the current key and the reference key (encoded as a varint) and the differing part (the remainder of the current key after the shared prefix is removed) are stored in the storage unit corresponding to the current record in temporary storage space 706.
- The key offset, data offset, and data of the current record are stored in the storage units corresponding to the current record in temporary storage spaces 704, 702, and 705, respectively. This completes the write of one variable length key.
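The writing process for a variable length key block can be sketched as below, using the threshold-length rule to choose reference keys. In this sketch (an assumption, not the source's exact encoding) every entry in the key area starts with a varint shared-prefix length, which is 0 for reference keys, so a reference key is still stored in full; offsets are kept as Python lists rather than packed storage units, `threshold_len >= 0` is assumed, and all names are hypothetical.

```python
def encode_varint(n):
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos):
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def shared_prefix_len(a, b):
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def build_variable_block(records, threshold_len):
    """Fill the data offset, reference offset, key offset, data, and key areas."""
    data_offsets, ref_offsets, key_offsets = [], [], []
    data, key_area = bytearray(), bytearray()
    ref_key, ref_off = None, 0
    for key, value in records:
        key_off = len(key_area)
        n = shared_prefix_len(key, ref_key) if ref_key is not None else 0
        if ref_key is None or n <= threshold_len:
            # New reference key: reference offset 0, complete key stored
            # (behind a zero shared-length varint in this sketch).
            ref_key, ref_off = key, key_off
            ref_offsets.append(0)
            key_area += encode_varint(0) + key
        else:
            # Prefix compression key: point at the current reference key.
            ref_offsets.append(ref_off)
            key_area += encode_varint(n) + key[n:]
        key_offsets.append(key_off)
        data_offsets.append(len(data))
        data += value
    return data_offsets, ref_offsets, key_offsets, bytes(data), bytes(key_area)


def restore_key(block, i):
    """Rebuild key i from its own entry and, if compressed, its reference key."""
    _, ref_offsets, key_offsets, _, key_area = block

    def entry(off):
        end = next((o for o in key_offsets if o > off), len(key_area))
        n, pos = decode_varint(key_area, off)
        return n, key_area[pos:end]

    n, rest = entry(key_offsets[i])
    if n == 0:
        return rest  # reference key: stored in full
    _, ref_key = entry(ref_offsets[i])  # reference keys are always stored in full
    return ref_key[:n] + rest
```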
- One way to determine the current reference key is to obtain the reference offset of the previous record.
- Another way is to save information about the current reference key, such as its key offset, while records are being written. For example, a temporary storage space can be allocated to hold this information, and the current reference key is then obtained from the saved information.
- When the current key is determined to be a reference key, the saved information can be updated to that of the current key, such as the current key's key offset.
- Other examples may use other methods to determine the current reference key.
- Each time a record is written, the total size of temporary storage spaces 701 to 706 can be calculated. If the total is greater than or equal to the preset data block size (size), the data block is full and the current record is not written; that is, storing the current record fails.
- Temporary storage spaces 702, 703, 704, 705, and 706 are then written into the storage buffer (write buffer) in the order of the data block's storage structure (for example, the structure shown in FIG. 6), and the block header in temporary storage space 701 records where each written part is located, such as the positions (e.g., offsets) of the data offset 602, reference offset 603, key offset 604, data 605, and key 606 within the data block.
- Finally, temporary storage space 701 is written to the storage buffer, which completes the write of the second data block. If records with variable length keys remain to be written, the temporary storage spaces can be reallocated and the above process repeated to generate a new second data block.
- Each time a record is written, or when a data block is full, the sum of the used space of the block storage area (i.e., the storage buffer above) and the sizes of the temporary storage spaces can be calculated. If the sum is greater than or equal to the size of the storage buffer, the block storage area is full and the current record or data block is not written; that is, storing the current record or data block fails. If the sum is smaller than the size of the storage buffer, the temporary storage spaces holding the current record (its reference offset, key offset, data offset, and so on) are written to the unused area of the storage buffer.
- When storing a variable length key and its value into the second data block fails (i.e., the block is full), the second data block may be compressed.
- If the size of the compressed second data block is smaller than the unused space of the storage buffer, the compressed second data block is written into the storage buffer. Compressing each data block effectively reduces the storage space of the data.
- The compressed result of all storage units (cells) corresponding to the same key may also be stored, i.e., the value is stored in compressed form, which further reduces the storage space of the data.
- After all records have been written into blocks, the blocks can be written to a file in sstable format.
- The blocks can be written into the sstable file sequentially, in the order of the keys they store. If a data block is to reside in memory, it is held by the file writer; otherwise, it is written directly to the sstable file.
- Data blocks resident in memory are managed by the file writer, and the user needs to construct a file reader (for example, an sstable reader) to read them.
- A file reader method (such as the Assign method, which can be implemented as a function) is called to swap out the data blocks in the written file.
- The file writer and file reader are interfaces provided to the user for managing data blocks resident in memory.
- The metadata of the file includes the Bloom filter, the block index, the file header, and the file header length.
- FIG. 8 is a schematic diagram of a storage structure of a bloom filter according to an embodiment of the present invention.
- The Bloom filter is a one-dimensional array (or vector), for example expressed as {v1, v2, ..., vn}. Each element corresponds to information about the key of a record stored in the file.
- When a fixed length key and its value have been successfully stored in the first data block, the Bloom filter information of that key is written into the Bloom filter; when a variable length key and its value have been successfully stored in the second data block, the Bloom filter information of that key is written into the Bloom filter.
- The Bloom filter information of a key can be computed by a preset calculation method; for example, the number of hash computations performed on the key can be preset.
- The Bloom filter information indicates the position of the key in the Bloom filter, and the value at that position indicates whether the key is present in the file. In one example, when the Bloom filter information of a key is 1, the key may exist in the file; when it is not 1 (for example, 0 or null), the key does not exist in the file.
- To check a key, its Bloom filter information is computed by the preset method, the value it indicates is read from the Bloom filter, and whether the key may exist in the file is judged from that value.
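A minimal Bloom filter sketch follows. The position calculation (double hashing over a SHA-256 digest) is a swapped-in technique, since the text only says that the number of hash computations is preset; the class and method names are hypothetical.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes  # e.g. the number of hashes recorded in the file header
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        """The key's Bloom filter information: num_hashes bit positions."""
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        """Called after a key and its value are successfully stored in a data block."""
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        """False means the key is definitely absent; True means it may be present."""
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(key))
```

A negative answer is exact, while a positive answer still requires the index block and data block lookup described below.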
- The Bloom filter is complete once all records have been written, and it is written to the file after all the data blocks have been written to the file.
- A temporary storage space can be allocated in memory to hold the Bloom filter; when all records have been stored, the Bloom filter in that space is written to the sstable file on disk.
- FIG. 9 is a schematic diagram of a process of writing block index information into an index block according to an embodiment of the present invention.
- The index block includes an index block header 901, a data offset 902, a reference offset 903, a block index offset 904, data 905, and a block index key 906.
- The index block header 901 stores information about the index block, including the starting position and length of each area in the index block.
- Each data block in the file corresponds to one storage unit in each of the data offset 902, data 905, reference offset 903, block index offset 904, and block index key 906 areas of the index block.
- The position of a data block in the file can be found from the information stored in its storage units in the data offset 902, data 905, reference offset 903, block index offset 904, and block index key 906 areas.
- The index block is stored in a manner similar to the data block storing variable length keys (i.e., the second data block) and can also be regarded as storing a series of variable length key+value pairs, each corresponding to one data block.
- Each key stored in the index block is the end key of a data block, i.e., the complete key of the last storage unit (cell) in that block; for example, it may include a row key (row key) + a column family ID (cfid) + a column (column).
- For a first data block, this key is the complete form of the last key in the block, that is, the common prefix + the remaining part of the key.
- For a second data block, this key is the complete form of the last key in the block: if the last key is a reference key, it is the key as stored in the key area 606; if the last key is a prefix compression key, it is the complete key recovered from its reference key (found via the last key's reference offset) together with the shared prefix length and the differing part stored in the key.
- While records are being written into a data block, the last successfully written key can be recorded; when the data block is full, that key is written into the index block.
- Each value stored in the index block corresponds to the position of a data block in the file, such as its offset (offset length) and the current row key length (row key length).
- The position of a data block in the file can be written into the index block when the data block is written to the file.
- The data offset 902 stores the offset, within the index block, of each data block's value (i.e., the data block's position in the file);
- The data 905 stores the position (offset) of the corresponding data block in the file;
- The block index offset 904 stores the storage location, within the index block, of the last key of the corresponding data block;
- The reference offset 903 stores the storage location, within the index block, of the reference key used for each data block's end key in the index block (not the reference key of that end key within the data block);
- The block index key 906 stores, for each data block's end key, the length of the common prefix it shares with its reference key in the index block and the remainder of the end key after that common prefix is removed.
- Each time a data block is written, the key of its last record is written into the index block structure.
- Each time a data block is written to the file, its location in the file is written into the index block structure.
- Writing each data block's end key and value into the index block is similar to writing variable length keys and their values into the second data block, and is not described again here.
- Once the index block is complete, it is written to the file after the Bloom filter.
- A temporary storage space can be allocated in memory to hold the index block; after all data blocks and the Bloom filter have been written to the file on disk, the index block in that space is written to the file.
- The data blocks, the Bloom filter, and the index block are written to the file in sequence, and the positions of the Bloom filter and the index block are recorded in the file header (header).
- FIG. 10 is a schematic diagram of a storage structure of a file header according to an embodiment of the present invention.
- The file header stores information about the file and the offset and length of each part, which allows each part to be located quickly and avoids wasting system resources on scanning the file from the beginning.
- An example file header structure is shown in FIG. 10; the header fields can be set as follows.
- KV type (KVtype): indicates the KV type of the records written to the file.
- The KV type covers two key types (variable length key and fixed length key) and two value types (variable length value and fixed length value).
- Threshold length: the parameter used in the foregoing example to determine whether a variable length key is used as a reference key, i.e., the threshold on the length of the common prefix string shared with the current reference key.
- Threshold difference: the parameter used in the foregoing example to determine whether a variable length key is used as a reference key.
- File ID number: indicates the identifier of the file.
- Compressed length: indicates the length of the sstable file after compression.
- Data block size: indicates the size of each data block, i.e., the parameter used to determine whether a data block is full; it can be set by the user.
- SSTable creation timestamp: indicates the creation time of the sstable file.
- FIG. 11 is a schematic flowchart of writing records to a file according to an embodiment of the present invention. As shown in FIG. 11, the record writing process may include the following steps.
- Step S201: Write a record into the current data block (block). A record with a fixed length key is written into a data block that stores fixed length keys and their values; a record with a variable length key is written into a data block that stores variable length keys and their values.
- Storing fixed length keys includes: in a data block dedicated to fixed length keys and their values, storing the common prefix of the fixed length keys once and separately storing the remaining part of each fixed length key after the common prefix is removed.
- Storing variable length keys includes: storing variable length keys of the reference key type in full, and prefix-compressing variable length keys of the prefix compression key type.
- Step S202: Determine whether the record write in step S201 succeeded. If so, perform step S211 and the subsequent steps; otherwise, perform step S203 and the subsequent steps.
- Step S203: Determine whether the current data block is empty. If so, return a parameter error and exit the process; if not, perform step S204 and the subsequent steps.
- Step S204: Compress the current data block.
- Step S205: Determine whether compression of the current data block succeeded. If not, return a compression error and exit the process; if so, perform step S206 and the subsequent steps.
- Step S206: Determine whether the current buffer (write_buffer) is empty and the compressed current block is larger than the buffer. If so, perform step S208 and the subsequent steps; otherwise, perform step S207 and the subsequent steps.
- Step S207: Determine whether the remaining buffer space can hold the current data block. If so, perform step S210 and the subsequent steps; otherwise, perform step S209 and the subsequent steps.
- Step S208: Re-apply for buffer space, then perform step S210 and the subsequent steps.
- Step S209: Start a dump and end the process.
- Step S210: Write the current data block into the buffer, reserve its index entry, cache the data block, and reset the current data block.
- Step S211: Write the Bloom filter information of the current data block.
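The flow of FIG. 11 can be sketched as a loop, under several assumptions: blocks are serialized as simple length-prefixed records instead of the layouts above, zlib stands in for the compressor, a record is retried in the fresh block after step S210, caching the block is omitted, and the Bloom filter update of step S211 is reduced to collecting keys. All class names are hypothetical.

```python
import zlib


class Block:
    """Hypothetical in-memory data block; counts key + value bytes against block_size."""

    def __init__(self, block_size):
        self.block_size = block_size
        self.records = []
        self.used = 0

    def try_add(self, key, value):
        if self.used + len(key) + len(value) >= self.block_size:
            return False  # block full: record not written
        self.records.append((key, value))
        self.used += len(key) + len(value)
        return True

    def serialize(self):
        # Length-prefixed records; a stand-in for the real block layouts.
        return b"".join(len(k).to_bytes(2, "little") + k + len(v).to_bytes(4, "little") + v
                        for k, v in self.records)


class RecordWriter:
    def __init__(self, block_size, buffer_size):
        self.block_size = block_size
        self.buffer_size = buffer_size
        self.block = Block(block_size)
        self.buffer = bytearray()  # write_buffer
        self.index = []            # (last key, offset in buffer, compressed length)
        self.bloom_keys = []       # stand-in for Bloom filter updates
        self.dumped = False

    def write(self, key, value):
        # S201/S202: try to write the record into the current block.
        while not self.block.try_add(key, value):
            if not self.block.records:  # S203
                raise ValueError("parameter error: record does not fit in an empty block")
            compressed = zlib.compress(self.block.serialize())  # S204 (S205: zlib.error)
            if not self.buffer and len(compressed) > self.buffer_size:    # S206
                self.buffer_size = len(compressed)                        # S208
            elif len(compressed) > self.buffer_size - len(self.buffer):   # S207
                self.dumped = True  # S209: start dump and end the process
                return False
            # S210: write the block to the buffer, reserve its index entry, reset.
            self.index.append((self.block.records[-1][0], len(self.buffer), len(compressed)))
            self.buffer += compressed
            self.block = Block(self.block_size)
        self.bloom_keys.append(key)  # S211
        return True
```

With a 32-byte block budget and 10-byte records, three records fit per block, so the index collects every third key as a block end key.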
- The stored records may be sorted in advance, for example by the ASCII codes of their keys. When data is read, the queried data can then be located quickly using the index block and the ordering of the data within each data block.
- The following describes a method of reading data from a file, taking as an example a file whose records have been sorted in advance by key.
- Basic information such as the file header length, the file header, the index block, and the Bloom filter may be read in sequence.
- FIG. 12 is a schematic diagram of a file reading method according to an embodiment of the present invention. As shown in Figure 12, the method can include the following steps.
- Step S31: Read the file header length field of the file to obtain the length of the file header area.
- Step S32: Read the file header area according to the file header length.
- Step S33: Read the index block area according to the information in the file header area.
- For example, the starting position of the index block in the file may be determined from the index block offset recorded in the file header, and the index block is then read from that starting position using the compressed length of the index block (or its length before compression) recorded in the file header.
- Step S34: Read the Bloom filter area according to the information in the file header area. For example, the starting position of the Bloom filter in the file can be determined from the Bloom filter offset recorded in the file header, and the Bloom filter is then read from that position using the Bloom filter length recorded in the file header.
- Step S35: Notify the upper layer that opening the sstable file is complete.
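Steps S31 to S35 can be sketched with Python's `struct` module. The header field widths, the field order, and placing the header length in a 4-byte trailer at the end of the file are assumptions (the text does not say where the header length field is stored), and only some of the FIG. 10 fields are included; `write_file` and `open_file` are hypothetical names.

```python
import io
import struct
import time
from collections import namedtuple

# Hypothetical header layout; field names follow the FIG. 10 description.
Header = namedtuple("Header", "kv_type threshold_len threshold_diff file_id block_size "
                              "num_hashes bloom_offset bloom_len index_offset index_len created_ts")
HEADER_FMT = struct.Struct("<BHHIIBQIQIq")
TRAILER = struct.Struct("<I")  # assumed: header length stored in the last 4 bytes


def write_file(f, blocks, bloom, index, **fields):
    """Write data blocks, Bloom filter, index block, header, then header length."""
    for b in blocks:
        f.write(b)
    bloom_offset = f.tell()
    f.write(bloom)
    index_offset = f.tell()
    f.write(index)
    header = Header(bloom_offset=bloom_offset, bloom_len=len(bloom),
                    index_offset=index_offset, index_len=len(index),
                    created_ts=int(time.time()), **fields)
    packed = HEADER_FMT.pack(*header)
    f.write(packed)
    f.write(TRAILER.pack(len(packed)))


def open_file(f):
    f.seek(-TRAILER.size, io.SEEK_END)
    (header_len,) = TRAILER.unpack(f.read(TRAILER.size))      # S31: header length
    f.seek(-(TRAILER.size + header_len), io.SEEK_END)
    header = Header(*HEADER_FMT.unpack(f.read(header_len)))  # S32: header
    f.seek(header.index_offset)
    index = f.read(header.index_len)                         # S33: index block
    f.seek(header.bloom_offset)
    bloom = f.read(header.bloom_len)                         # S34: Bloom filter
    return header, index, bloom                              # S35: caller is notified
```

Because only the header length sits at a known position, opening a file costs a handful of seeks instead of a scan from the beginning.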
- A record can be looked up in the file by key (for example, a key to be searched for that is received from the user). Because records are stored in the file sorted by key, the key-based search can use binary search to locate a record.
- A first layer of filtering may be performed with the Bloom filter: the positions in the Bloom filter bitmap corresponding to the key are calculated using the number of hashes recorded in the file header, and the values at those positions determine whether the key exists in the file. For example, if a value is not 1, the key does not exist in the file; if it is 1, the key may exist in the file, and the record for the key is then searched for using the end key of each data block stored in the index block.
- Binary search can be used to find the data block, among those listed in the index block, in which the target key may be stored.
- The position and length of the key offset area in the index block can be read from the index block header. The key offset at a selected position in the key offset area (for example, the middle) is obtained, then the key corresponding to that key offset and the key's reference offset are obtained, and the key is restored.
- The keys stored in the index block are the last keys of the data blocks, and the values are the positions of the data blocks in the file.
- The restored key is compared with the target key. If they are equal, the data block corresponding to the key is read directly, and the last key stored in that data block and its corresponding value, i.e., the record being searched for, are read.
- Otherwise, the range of key offsets still to be searched is determined by whether the restored key is greater than the target key. For example, with keys sorted in ascending order, if the restored key is greater than the target key, the key before the restored key is obtained and compared with the target key; if it equals the target key, the record of that previous key is the record being searched for.
- Otherwise, the search for the target key continues by the same method within the narrowed range of the key offset area.
- When the target key is determined to lie between two adjacent keys in the index block, the target key may be stored in the data block of the latter key; that data block can be read and the target key then searched for within it.
- When the data block that may store the target key has been determined, its data, i.e., the location of the data block in the file, can be obtained from the data offset corresponding to that data block in the index block. The data block is then read from that location in the file using the block size recorded in the file header.
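The binary search over data block end keys in the index block can be sketched as follows. The function name is hypothetical, and the end keys are assumed to be already restored into a sorted list; the returned block number would then select the block's position stored in the index block.

```python
def find_candidate_block(end_keys, target):
    """Binary search over the sorted end keys stored in the index block.

    Returns the number of the first data block whose end key is >= target,
    i.e. the only block that can contain target, or None if target is larger
    than every end key.
    """
    lo, hi = 0, len(end_keys)
    while lo < hi:
        mid = (lo + hi) // 2
        if end_keys[mid] == target:
            return mid   # target is the last record of block mid
        if end_keys[mid] > target:
            hi = mid     # keep mid: target may lie inside this block
        else:
            lo = mid + 1
    return lo if lo < len(end_keys) else None
```

For example, with end keys `[b"b", b"f", b"m"]`, the target `b"c"` lies between `b"b"` and `b"f"` and therefore maps to block 1.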
- The sequence number of a key offset in the key offset area is also the sequence number of its corresponding reference offset and data offset in the reference offset area and the data offset area, respectively. There are many ways to obtain the corresponding reference offset and data offset from a key offset.
- For example, the sequence number of the key offset can be calculated from its position in the key offset area; the reference offset with the same sequence number is then obtained from the reference offset area based on the length of each reference offset, and the position of the key's reference key is obtained from that reference offset.
- The data offset and the data corresponding to the key are obtained in the same way.
- Alternatively, the sequence number of each key can be stored in each key offset, reference offset, and data offset.
- In that case, the sequence number of the record in the data block (or of the data block's entry in the index block) can be obtained, and the reference offset with that sequence number is then obtained from the reference offset area.
- The data offset corresponding to the key is obtained in the same way.
- Finding the target key in the data block that has been read is similar to finding the target key in the index block, as described above.
- With binary search, a position is selected in the key storage area (for example, the middle), the key stored at that position is obtained and compared with the target key, and the search range is narrowed accordingly.
- The start position and length of the key offset area are read from the data block header.
- A position in the key offset area, such as the middle, is selected; the key offset at that position is obtained, and the key is restored from that key offset.
- The restored key is compared with the target key.
- If the restored key equals the target key, the record corresponding to it is the record being searched for. If the restored key is greater than the target key, a new position is selected between the start of the key offset area and the selected position, and the search is repeated; if the restored key is smaller than the target key, a new position is selected between the selected position and the end of the key offset area, and the search is repeated.
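The in-block binary search can be sketched with packed offset areas, where the same sequence number indexes both the key offset area and the data offset area. The 4-byte offsets and the helper names are assumptions, and keys are stored in full here rather than restored from a common prefix or a reference key.

```python
import struct

OFFSET = struct.Struct("<I")  # assumed width of every stored offset


def offset_at(area, seq):
    """Offset stored under sequence number seq in a packed offset area."""
    return OFFSET.unpack_from(area, seq * OFFSET.size)[0]


def slice_at(offsets, area, seq, count):
    """Bytes of entry seq: from its offset to the next entry's offset (or the area end)."""
    start = offset_at(offsets, seq)
    end = offset_at(offsets, seq + 1) if seq + 1 < count else len(area)
    return area[start:end]


def pack_block(records):
    key_offsets, key_area = bytearray(), bytearray()
    data_offsets, data_area = bytearray(), bytearray()
    for key, value in records:  # records must already be sorted by key
        key_offsets += OFFSET.pack(len(key_area))
        key_area += key
        data_offsets += OFFSET.pack(len(data_area))
        data_area += value
    return bytes(key_offsets), bytes(key_area), bytes(data_offsets), bytes(data_area)


def search_block(key_offsets, key_area, data_offsets, data_area, target):
    count = len(key_offsets) // OFFSET.size
    lo, hi = 0, count - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        key = slice_at(key_offsets, key_area, mid, count)
        if key == target:
            # The same sequence number locates the value in the data offset area.
            return slice_at(data_offsets, data_area, mid, count)
        if key > target:
            hi = mid - 1  # continue between the start and the selected position
        else:
            lo = mid + 1  # continue between the selected position and the end
    return None
```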
- How a key is restored from its storage location (key offset) in a data block depends on which of the two data block storage structures described above is used.
- For a data block storing fixed length keys, the position and length of the common prefix are read from the data block header and the common prefix is read from the data block. The position of the key remaining portion is also read from the data block header; the storage position of the key's remaining part is determined from the key offset, the remaining part is read, and the common prefix is prepended to it to restore the key.
- Whether keys are of variable length can be determined from the key-value type (KV type) field in the file header.
- For a data block storing variable length keys, the position and length of each area are read from the data block header, and the key and its reference offset are read from the data block. The reference key is then read, the shared prefix length and the differing part stored in the key are read, and the key is restored by taking a string of the shared prefix length from the front of the reference key and appending the differing part.
- the method of obtaining the corresponding reference offset according to the key offset of a key in the data block storing the variable length key is obtained by the key offset according to a key (the end key of the data block) in the index block.
- the method of the reference offset is the same and will not be described again.
- Once the key is restored, the corresponding data offset can be obtained; the position of the data area is read from the data block header, and the value corresponding to the key is read according to the data offset. The record (key + value) is then restored from the key and the value.
- The data offset is obtained from the key offset in the same way as the data offset corresponding to a key (the end key of the data block) is obtained from its key offset in the index block, and is not described again.
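Restoring a prefix-compressed variable-length key from its reference key can be sketched like this. It is a simplified model under stated assumptions: entries live in a Python list, and the reference offset is represented by a hypothetical `ref_index` into that list rather than a byte offset; the names `FullKey`, `CompressedKey`, and `restore_key` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class FullKey:
    key: bytes  # reference key type: the key is stored in full


@dataclass
class CompressedKey:
    ref_index: int   # hypothetical stand-in for the reference offset
    shared_len: int  # length of the prefix shared with the reference key
    diff: bytes      # the part of the key that differs from the reference key


Entry = Union[FullKey, CompressedKey]


def restore_key(entries: List[Entry], i: int) -> bytes:
    """Rebuild the i-th key of a variable-length-key block."""
    entry = entries[i]
    if isinstance(entry, FullKey):
        return entry.key
    ref = entries[entry.ref_index]
    if not isinstance(ref, FullKey):
        raise ValueError("a reference offset must point at a reference-type key")
    # Cut shared_len bytes from the front of the reference key, then append the differing part.
    return ref.key[: entry.shared_len] + entry.diff
```

Only reference-type keys are stored in full, so each compressed key needs a single lookup to rebuild, regardless of its position in the block.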
- When records are read, a read buffer may be set. When the read length of a read operation is smaller than the buffer and the current data block is not the last data block, the start address of the next data block is fetched, the length of the next data block is recorded, and reading continues until the read length is greater than the length of the buffer.
- Other embodiments may also read files using other caching methods.
- According to the user's request, the position of the user key (that is, the target key input by the user) is located, the block containing the key is loaded, and the information the user requires is read.
- A prefetch read reads multiple blocks at one time.
- A delayed read aggregates multiple reads: after multiple read requests are received, multiple blocks are read at one time.
- For a prefetch read, the start address of the start block is obtained. If the current block is the last block of the SSTable file, the length of the last block is obtained directly, recorded, and returned; if it is not the last block of the SSTable file, the start address and length of the next block are obtained.
- While the read length is less than the length of the Read Buffer and the current block is not the last block, the start address of the next block is taken and the length of that block is recorded, until the read length is greater than the length of the Read Buffer.
- For a delayed read, the block information for each read is aggregated; the start address of the start block is then obtained, and reading proceeds in the same way as a prefetch read.
- Both prefetch reads and delayed reads essentially read multiple blocks in one operation, which minimizes seek and rotational latency when reading from disk and speeds up disk reads.
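One way to picture the aggregation step of a delayed read is to collect the (start, length) pairs of all pending block reads and merge those that are adjacent in the file, so each merged range costs a single disk read. This is a sketch under assumptions: the merge-adjacent-ranges policy, the `Block` tuple, and the function name `coalesce_reads` are illustrative and are not taken from the text.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (start offset in file, length), as recorded in the index block


def coalesce_reads(blocks: List[Block]) -> List[Block]:
    """Merge pending block reads that are contiguous in the file into single reads."""
    merged: List[Block] = []
    # Deduplicate repeated requests and process blocks in file order.
    for start, length in sorted(set(blocks)):
        if merged and merged[-1][0] + merged[-1][1] == start:
            prev_start, prev_len = merged[-1]
            merged[-1] = (prev_start, prev_len + length)  # extend the previous read
        else:
            merged.append((start, length))  # gap in the file: start a new read
    return merged
```

Requests for blocks at offsets 0, 4096, and 12288 (each 4096 bytes long) become two reads: one of 8192 bytes at offset 0 and one of 4096 bytes at offset 12288.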
- FIG. 13 is a schematic diagram of a process of reading records according to an embodiment of the present invention. As shown in FIG. 13, the record reading process may include the following steps.
- Step S41: Acquire the start address of the start data block.
- Step S42: Determine whether the current data block is the last data block. If so, perform step S43 and subsequent steps; otherwise, perform step S44 and subsequent steps.
- Step S43: Obtain and record the length of the last data block, return, and exit the process.
- Step S44: Acquire the start address of the next data block and the length of the current data block.
- Step S45: Determine whether a prefetch operation is to be performed. If not, exit the process; if so, perform step S46 and subsequent steps.
- Step S46: Determine whether the read length is less than the maximum read size (KMaxReadSize) and the current data block is not the last data block. If so, perform step S47 and subsequent steps; otherwise, perform step S48 and subsequent steps.
- Step S47: Obtain the start address of the next data block and record the length of the next data block.
- Step S49: Obtain and record the length of the last data block.
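The prefetch loop of FIG. 13 (steps S41 to S47) can be sketched as a function that decides how many bytes to read starting at a given block. This is an illustration under assumptions: `blocks` is a hypothetical list of (start, length) pairs in file order with blocks stored back to back, `max_read_size` plays the role of KMaxReadSize, and because the text does not give step S48, the sketch simply ends the loop there.

```python
from typing import List, Tuple


def plan_read(blocks: List[Tuple[int, int]], first: int, prefetch: bool,
              max_read_size: int) -> Tuple[int, int]:
    """Return (start_address, read_length) for one read beginning at block `first`.

    Assumes consecutive data blocks are contiguous in the file.
    """
    start, length = blocks[first]           # S41: start address of the start block
    last = len(blocks) - 1
    if first == last or not prefetch:       # S42/S43 (last block) and S45 (no prefetch)
        return start, length
    read_len, i = length, first
    # S46/S47: keep adding the next block while under the limit and not at the last block.
    while read_len < max_read_size and i < last:
        i += 1
        read_len += blocks[i][1]            # record the length of the next block
    return start, read_len
```

With four blocks of lengths 100, 100, 100, and 50 and a limit of 250, a prefetch read from the first block covers three blocks (300 bytes), since the limit is checked before each block is added.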
- FIG. 14 is a structural diagram of a data storage device according to an embodiment of the present invention. As shown in FIG. 14, the apparatus includes a fixed-length key storage unit 1401 and a variable-length key storage unit 1402.
- The fixed-length key storage unit 1401 is configured to store fixed-length keys and their values in a first data block, where storing the fixed-length keys includes storing the common prefix of the fixed-length keys once and separately storing the remainder of each fixed-length key after the common prefix is removed;
- the variable-length key storage unit 1402 is configured to store variable-length keys and their values in a second data block, where storing the variable-length keys includes storing variable-length keys of the reference key type in full and applying prefix compression to variable-length keys of the prefix compression key type.
- FIG. 15 is a structural diagram of a data storage device according to an embodiment of the present invention. As shown in FIG. 15, the device includes a fixed-length key storage unit 1501 and a variable-length key storage unit 1502, whose functions are similar to those of the fixed-length key storage unit 1401 and the variable-length key storage unit 1402 shown in FIG. 14.
- The apparatus may also include a key type distinguishing unit 1503.
- The key type distinguishing unit 1503 is configured to classify variable-length keys as the reference key type or the prefix compression key type according to a preset threshold length and a threshold difference. The current variable-length key is compared with the previous reference key to find their common prefix: if the common prefix is shorter than the threshold length, the current variable-length key is of the reference key type; if the common prefix is longer than the sum of the threshold length and the threshold difference, the current variable-length key is also of the reference key type; if the common prefix is longer than the threshold length but shorter than the sum of the threshold length and the threshold difference, the current variable-length key is of the prefix compression key type.
- When storing a variable-length key of the prefix compression key type, the variable-length key storage unit 1502 is configured to store the length of the common prefix shared by that key and the previous reference key, and to store the remainder of the key after the common prefix is removed.
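The classification rule of unit 1503 and the storage format for prefix-compressed keys can be sketched together as a small encoder over sorted keys. Assumptions to note: the tuple encoding (`"ref"` for a full key, `"pc"` with the shared length and remainder) and all function names are hypothetical, the first key of a block is assumed to become the first reference key, and the text does not say how a common prefix exactly equal to the threshold length or to the threshold sum is handled, so this sketch treats both boundary values as prefix-compressed.

```python
from typing import List, Optional, Tuple, Union


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def is_reference_key(key: bytes, prev_ref: bytes, threshold_len: int, threshold_diff: int) -> bool:
    same = common_prefix_len(key, prev_ref)
    # Too little in common, or more than threshold_len + threshold_diff in common:
    # store the key in full and make it the new reference key.
    # Assumption: boundary values are treated as prefix-compressed.
    return same < threshold_len or same > threshold_len + threshold_diff


Encoded = Union[Tuple[str, bytes], Tuple[str, int, bytes]]


def encode_keys(sorted_keys: List[bytes], threshold_len: int, threshold_diff: int) -> List[Encoded]:
    out: List[Encoded] = []
    ref: Optional[bytes] = None
    for key in sorted_keys:
        if ref is None or is_reference_key(key, ref, threshold_len, threshold_diff):
            out.append(("ref", key))  # reference key type: stored in full
            ref = key
        else:
            n = common_prefix_len(key, ref)
            out.append(("pc", n, key[n:]))  # shared-prefix length + remainder
    return out
```

With a threshold length of 4 and a threshold difference of 8, `b"user:1001"` shares 8 bytes with the reference `b"user:1000"` and is stored as `(8, b"1")`, while `b"video:1"` shares nothing and starts a new reference key.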
- The apparatus may also include an inter-block compression unit 1504.
- The inter-block compression unit 1504 is configured to compress the first data block once it is determined that the fixed-length keys and their values have been stored in it, and to compress the second data block once it is determined that the variable-length keys and their values have been stored in it.
- The apparatus may also include a storage buffer unit 1505.
- The storage buffer unit 1505 is configured to allocate a storage buffer, to write the compressed first data block into the storage buffer when its compressed size is smaller than the storage buffer, and to write the compressed second data block into the storage buffer when its compressed size is smaller than the storage buffer.
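Inter-block compression followed by staging in a storage buffer might look like the sketch below. The text names no compression algorithm, so `zlib` is swapped in purely for illustration; the class `StorageBuffer`, the list `flushed` standing in for writes to the data storage file, and the policy of flushing and writing a compressed block directly when it does not fit are all assumptions.

```python
import zlib
from typing import List


class StorageBuffer:
    """Hypothetical write buffer that batches compressed data blocks before they reach the file."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.buf = bytearray()
        self.flushed: List[bytes] = []  # stands in for writes to the data storage file

    def add_block(self, raw_block: bytes) -> None:
        compressed = zlib.compress(raw_block)  # inter-block compression (zlib assumed)
        if len(compressed) >= self.capacity:
            # Not smaller than the buffer: flush what is pending, then write it directly (assumption).
            self.flush()
            self.flushed.append(compressed)
            return
        if len(self.buf) + len(compressed) > self.capacity:
            self.flush()  # make room before buffering this block
        self.buf += compressed

    def flush(self) -> None:
        if self.buf:
            self.flushed.append(bytes(self.buf))
            self.buf.clear()
```

A highly repetitive 1000-byte block compresses to far less than a 64-byte buffer and is staged, whereas an incompressible 256-byte block is written out directly after the pending data is flushed.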
- The apparatus may also include a Bloom filter 1506.
- The Bloom filter 1506 is configured to write the Bloom filter information of the fixed-length keys and their values into the first data block once it is determined that they have been stored successfully, and to write the Bloom filter information of the variable-length keys and their values into the second data block once it is determined that they have been stored successfully.
- The apparatus may also include a read buffer unit 1507.
- The read buffer unit 1507 is configured to set a read buffer. When the read length of a read operation is smaller than the buffer and the current data block is not the last data block, it fetches the start address of the next data block, records the length of the next data block, and continues reading until the read length is greater than the length of the buffer.
- The apparatus may also include a data block index storage unit 1508.
- The data block index storage unit 1508 is configured to store the full key of the last cell of the first data block and of the second data block, and to store the offsets of the first and second data blocks in the data storage file together with the length of the current row key.
- The apparatus may also include a key type distinguishing unit.
- The key type distinguishing unit is configured to compare the current variable-length key with the current reference key: if their common prefix is shorter than the threshold length, the current variable-length key is determined to be of the reference key type; if the common prefix is greater than or equal to the threshold length, it is determined to be of the prefix compression key type.
- Alternatively, the key type distinguishing unit is configured to compare the current variable-length key with the previous key stored in the second data block: if their common prefix is greater than or equal to the sum of the threshold length and the threshold difference, the current variable-length key is determined to be of the reference key type; if the common prefix is smaller than that sum, it is determined to be of the prefix compression key type.
- Alternatively, the key type distinguishing unit is configured to obtain a first length, which is the length of the common prefix of the current variable-length key and the previous key stored in the second data block, and a second length, which is the length of the common prefix of the current variable-length key and the current reference key. If the first length is greater than or equal to the sum of the second length and the threshold difference, the current variable-length key is determined to be of the reference key type; if the first length is smaller than that sum, it is determined to be of the prefix compression key type.
- For a variable-length key of the prefix compression key type, the variable-length key storage unit is configured to store the length of the common prefix shared by that key and the current reference key, and to store the remainder of the key after the common prefix is removed.
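The three alternative classification rules and the storage of a prefix-compressed key against the current reference key can be sketched as plain predicates. The function names are hypothetical labels for the three alternatives, and "previous key" is assumed to mean the key stored immediately before the current one in the second data block.

```python
from typing import Tuple


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def is_ref_by_reference_prefix(key: bytes, current_ref: bytes, threshold_len: int) -> bool:
    # Rule 1: reference type if the prefix shared with the current reference key is short.
    return common_prefix_len(key, current_ref) < threshold_len


def is_ref_by_previous_prefix(key: bytes, prev_key: bytes, threshold_len: int, threshold_diff: int) -> bool:
    # Rule 2: reference type if the prefix shared with the previous key is long.
    return common_prefix_len(key, prev_key) >= threshold_len + threshold_diff


def is_ref_by_prefix_gain(key: bytes, prev_key: bytes, current_ref: bytes, threshold_diff: int) -> bool:
    # Rule 3: reference type if the previous key shares at least threshold_diff more
    # prefix bytes with this key than the current reference key does.
    first = common_prefix_len(key, prev_key)
    second = common_prefix_len(key, current_ref)
    return first >= second + threshold_diff


def compress_against_reference(key: bytes, current_ref: bytes) -> Tuple[int, bytes]:
    # Stored form of a prefix-compression-type key: (shared length, remainder).
    n = common_prefix_len(key, current_ref)
    return n, key[n:]
```

Rule 3 promotes a key to a new reference when the previous key would serve as a noticeably better base for compression than the current reference key.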
- The apparatus may also include a data block index storage unit.
- The data block index storage unit is configured to store the location information of the first data block in the file into an index block when the first data block is stored in the file, to store the location information of the second data block in the file into the index block when the second data block is stored in the file, and to store the index block into the file.
- The device may also include a Bloom filter.
- When the fixed-length keys and their values are successfully stored in the first data block, their Bloom filter information is written into the Bloom filter; when the variable-length keys and their values are successfully stored in the second data block, their Bloom filter information is written into the Bloom filter; the Bloom filter is then stored in the file.
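A minimal Bloom filter of the kind that could be filled as keys are stored and then serialized into the file is sketched below. The bit-array size, the number of hash positions, the double-hashing scheme over SHA-256, and the choice to insert only the key are all assumptions; the text does not specify how the filter is constructed.

```python
import hashlib
from typing import List


class BloomFilter:
    """Minimal Bloom filter sketch: m bits, k positions derived by double hashing."""

    def __init__(self, m_bits: int = 1024, k: int = 3):
        self.m, self.k = m_bits, k
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key: bytes) -> List[int]:
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def may_contain(self, key: bytes) -> bool:
        # False means definitely absent; True means possibly present.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

    def to_bytes(self) -> bytes:
        # Serialized form that could be written into the data file.
        return bytes(self.bits)
```

A lookup can consult `may_contain` before reading any data block; a negative answer skips the disk read entirely, while a positive answer may be a false positive.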
- The apparatus may also include an inter-block compression unit.
- The inter-block compression unit is configured to compress the first data block or the second data block and to store the compressed data block in the file.
- The apparatus may also include a value compression unit.
- The value compression unit is configured so that storing fixed-length keys and their values in the first data block includes compressing the value corresponding to each fixed-length key and providing the compressed value to the fixed-length key storage unit for storage in the first data block;
- and storing variable-length keys and their values in the second data block includes compressing each value and providing the compressed value to the variable-length key storage unit for storage in the second data block.
- The fixed-length key storage unit is configured to store the fixed-length keys and their values in the first data block in the order obtained by sorting them by fixed-length key in advance;
- the data block index storage unit is configured to store the last fixed-length key stored in the first data block, together with the start position and length of the first data block in the file, into the index block;
- the variable-length key storage unit is configured to store the variable-length keys and their values in the second data block in the order obtained by sorting them by variable-length key in advance;
- the data block index storage unit is configured to store the last variable-length key stored in the second data block, together with the start position and length of the second data block in the file, into the index block.
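Because each index entry holds the last key of a data block plus that block's start position and length, and blocks are written in key order, finding the block that may contain a key reduces to a binary search over the last keys. The sketch uses the standard `bisect` module; the class `IndexBlock` and its fields are hypothetical in-memory stand-ins for the on-disk index block.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class IndexBlock:
    last_keys: List[bytes] = field(default_factory=list)            # last key of each data block
    locations: List[Tuple[int, int]] = field(default_factory=list)  # (start, length) in the file

    def add(self, last_key: bytes, start: int, length: int) -> None:
        # Data blocks are written in sorted key order, so last keys must ascend.
        if self.last_keys and last_key <= self.last_keys[-1]:
            raise ValueError("data blocks must be added in ascending key order")
        self.last_keys.append(last_key)
        self.locations.append((start, length))

    def locate(self, key: bytes) -> Optional[Tuple[int, int]]:
        # The first block whose last key is >= the target is the only block that can hold it.
        i = bisect.bisect_left(self.last_keys, key)
        return self.locations[i] if i < len(self.last_keys) else None
```

For index entries ending at `b"apple"`, `b"mango"`, and `b"zebra"`, a lookup of `b"banana"` returns the location of the second block, and any key after `b"zebra"` is rejected without reading a data block.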
- The device shown in FIG. 14 can be integrated into the hardware entities of various communication networks.
- The key-sorting-based data storage device proposed by the embodiments of the present invention can be embodied in various forms.
- For example, an application programming interface conforming to a given standard can be used to write the key-sorting-based data storage device as a plug-in installed in a storage server, or the device can be packaged as an application for users to download and use.
- When written as a plug-in, it can be implemented in various plug-in formats such as ocx, dll, and cab.
- The key-sorting-based data storage device proposed by the embodiments of the present invention may also be implemented using specific technologies such as a Flash plug-in, RealPlayer plug-in, MMS plug-in, MIDI staff plug-in, or ActiveX plug-in.
- The key-sorting-based data storage method proposed by the embodiments of the present invention can be stored on various storage media as instructions or as an instruction set.
- These storage media include, but are not limited to, floppy disks, optical discs, DVDs, hard disks, flash memory, USB flash drives, CF cards, SD cards, MMC cards, SM cards, Memory Sticks, xD cards, and the like.
- The key-sorting-based data storage method proposed by the embodiments of the present invention may also be applied to NAND-flash-based storage media, such as USB flash drives, CF cards, SD cards, SDHC cards, MMC cards, SM cards, Memory Sticks, xD cards, and the like.
- In the embodiments of the present invention, fixed-length keys and their values are stored in a first data block, where storing the fixed-length keys includes storing the common prefix of the fixed-length keys once and separately storing the remainder of each fixed-length key after the common prefix is removed; variable-length keys and their values are stored in a second data block, where storing the variable-length keys includes storing variable-length keys of the reference key type in full and applying prefix compression to variable-length keys of the prefix compression key type.
- By applying prefix compression within variable-length key data blocks and optionally compressing each data block, the embodiments of the present invention can effectively reduce the storage space required for data and improve machine disk utilization.
- Because the embodiments of the present invention use the data block as the storage unit, they also facilitate I/O and parsing.
- In addition, the embodiments of the present invention can quickly locate queried data using the index block and the key ordering within each data block, improving query efficiency.
- The hardware modules in the embodiments may be implemented mechanically or electronically.
- For example, a hardware module may include a specially designed permanent circuit or logic device (such as a dedicated processor like an FPGA or ASIC) for performing specific operations.
- A hardware module may also include a programmable logic device or circuit (for example, a general-purpose processor or other programmable processor) temporarily configured by software to perform specific operations.
- Whether a hardware module is implemented mechanically, with a dedicated permanent circuit, or with a temporarily configured circuit (for example, one configured by software) can be decided based on cost and time considerations.
- The present invention also provides a machine-readable storage medium storing instructions for causing a machine to perform the method described herein.
- Specifically, a system or apparatus may be provided with a storage medium storing software program code that implements the functions of any of the above embodiments, and a computer (or CPU or MPU) of the system or apparatus reads and executes the program code stored in the storage medium.
- In addition, some or all of the actual operations may be performed by an operating system or the like running on the computer, based on the instructions of the program code.
- The program code read from the storage medium may also be written into a memory provided on an expansion board inserted into the computer, or into a memory provided in an expansion unit connected to the computer; a CPU or the like on the expansion board or expansion unit then performs some or all of the actual operations based on the instructions of the program code, thereby implementing the functions of any of the above embodiments.
- Storage media for providing program code include floppy disks, hard disks, magneto-optical disks, optical discs (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, and DVD+RW), magnetic tape, non-volatile memory cards, and ROM.
- Alternatively, the program code can be downloaded from a server computer over a communication network.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Human Computer Interaction (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Computer Security & Cryptography (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/652,002 US9377959B2 (en) | 2012-12-14 | 2013-12-02 | Data storage method and apparatus |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210541207.0 | 2012-12-14 | | |
CN201210541207.0A CN103870492B (zh) | 2012-12-14 | 2012-12-14 | A data storage method and apparatus based on key sorting |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014090097A1 (zh) | 2014-06-19 |
Family ID: 50909035
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2013/088286 WO2014090097A1 (zh) | A data storage method and apparatus | 2012-12-14 | 2013-12-02 |
Country Status (3)
Country | Link |
---|---|
US (1) | US9377959B2 (zh) |
CN (1) | CN103870492B (zh) |
WO (1) | WO2014090097A1 (zh) |
2012
- 2012-12-14: CN application CN201210541207.0A (patent CN103870492B/zh), status: Active
2013
- 2013-12-02: WO application PCT/CN2013/088286 (patent WO2014090097A1/zh), status: Application Filing
- 2013-12-02: US application US14/652,002 (patent US9377959B2/en), status: Active
Also Published As
Publication number | Publication date |
---|---|
US20150331619A1 (en) | 2015-11-19 |
CN103870492A (zh) | 2014-06-18 |
CN103870492B (zh) | 2017-08-04 |
US9377959B2 (en) | 2016-06-28 |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 13862302; Country of ref document: EP; Kind code of ref document: A1 |
WWE | WIPO information: entry into national phase | Ref document number: 14652002; Country of ref document: US |
NENP | Non-entry into the national phase | Ref country code: DE |
32PN | EP: public notification in the EP bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 20/08/2015) |
122 | EP: PCT application non-entry in European phase | Ref document number: 13862302; Country of ref document: EP; Kind code of ref document: A1 |