WO2014090097A1 - Data storage method and apparatus - Google Patents

Data storage method and apparatus

Info

Publication number
WO2014090097A1
Authority
WO
WIPO (PCT)
Prior art keywords
key
data block
length
block
data
Prior art date
Application number
PCT/CN2013/088286
Other languages
English (en)
French (fr)
Inventor
陈峥
邓大付
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to US14/652,002 (patent US9377959B2)
Publication of WO2014090097A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G06F3/0613 Improving I/O performance in relation to throughput
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/24561 Intermediate data storage techniques for performance improvement
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0679 Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]

Definitions

  • Embodiments of the present invention relate to the field of information processing technologies, and, more particularly, to a data storage method and apparatus.
  • Background of the invention
  • The key-value distributed storage system has the advantages of fast query speed, large data storage capacity, and support for high concurrency (such as multiple concurrent query processes). It is well suited to queries by primary key, but it cannot perform complex conditional queries. If a real-time search engine (Real-Time Search Engine) is used for complex conditional retrieval and full-text search, it can replace a low-performance relational database such as MySQL, achieving high concurrency and high performance while reducing the number of servers.
  • the embodiment of the invention provides a data storage method, which can improve the utilization of the storage space.
  • the embodiment of the invention also provides a data storage device, which can improve the utilization of the storage space.
  • A data storage method, comprising:
  • storing a fixed length key and its value in a first data block, wherein storing the fixed length key comprises: storing a single copy of the common prefix of the fixed length keys, and separately storing the remaining part of each fixed length key after the common prefix is removed;
  • storing a variable length key and its value in a second data block, wherein storing the variable length key comprises: storing variable length keys of the reference key type in full, and performing prefix compression on variable length keys of the prefix compression key type.
  • A data storage device, comprising a fixed length key storage unit and a variable length key storage unit, wherein:
  • the fixed length key storage unit is configured to store a fixed length key and its value in the first data block, wherein storing the fixed length key includes: storing a single copy of the common prefix of the fixed length keys, and separately storing the remaining part of each fixed length key after the common prefix is removed;
  • the variable length key storage unit is configured to store a variable length key and its value in the second data block, wherein storing the variable length key comprises: storing variable length keys of the reference key type in full, and performing prefix compression on variable length keys of the prefix compression key type.
  • FIG. 1 is a schematic structural diagram of a computing device of an embodiment.
  • FIG. 2 is a schematic diagram of a file format according to an embodiment of the present invention.
  • FIG. 3 is a flowchart of a data storage method according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a data block storage structure for storing a record having a fixed length key according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of a process of writing a fixed length key to a data block according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of a data block storage structure for storing records having variable length keys according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of a process of writing variable length keys into data blocks according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of a storage structure of a bloom filter according to an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of a process of writing block index information into an index block according to an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of a storage structure of a file header according to an embodiment of the present invention.
  • FIG. 11 is a schematic flowchart of writing a record to a file according to an embodiment of the present invention.
  • FIG. 12 is a schematic diagram of a file reading method according to an embodiment of the present invention.
  • FIG. 13 is a schematic diagram of a process of reading a record according to an embodiment of the present invention.
  • FIG. 14 is a structural diagram of a data storage device according to an embodiment of the present invention.
  • FIG. 15 is a structural diagram of a data storage device according to an embodiment of the present invention.
  • FIG. 1 is a block diagram showing the structure of a computing device of an embodiment.
  • computer 100 can be a computing device capable of implementing the methods and software systems provided by the various examples of the present invention.
  • computer 100 can be a personal computer or a portable device such as a laptop, tablet, cell phone or smartphone, and the like.
  • the computer 100 can also be a server connected to the above device via a network.
  • Computer 100 can have different capabilities and features. The various possible implementations fall within the scope of protection of this document.
  • computer 100 can include a keypad/keyboard 156, and can also include a display 154, such as a liquid crystal display (LCD), or a display with advanced features, such as a touch sensitive 2D or 3D display.
  • a web-enabled computer 100 can include one or more physical or virtual keyboards, as well as mass storage device 130.
  • The computer 100 may also include or run various operating systems 141, such as the Windows™ or Linux™ operating system, or a mobile operating system such as iOS™, Android™, or Windows Mobile™.
  • Computer 100 can include or run various applications 142, such as data storage application 145.
  • the data storage application 145 is capable of storing an ordered record in the file format of the embodiment of the present invention into the non-volatile storage device 130.
  • computer 100 can include one or more processor readable non-volatile storage media 130 and one or more processors 122 in communication with storage medium 130.
  • the processor readable non-volatile storage medium 130 can be a RAM, a flash memory, a ROM, an EPROM, an EEPROM, a register, a hard disk, a removable hard disk, a CD-ROM, or other various forms of non-volatile storage media.
  • Storage medium 130 may store a series of instructions or units and/or modules containing instructions for performing the operations of various embodiments of the present invention.
  • the processor can execute the above instructions to perform the operations in the various embodiments.
  • A data persistence file format is proposed that is based on key ordering, supports fixed length and variable length keys and values, and allows keys to be prefix-compressed.
  • The data block (block) is used as the storage unit, which is advantageous for I/O and for parsing.
  • Prefix compression is used within the data blocks of the embodiment of the present invention, and each data block is compressed, thereby effectively reducing the storage space of the data and improving the disk utilization of the machine.
  • The index block and the ordering inside the data block may be used to quickly locate the queried data.
  • the data storage method of the key ordering of the embodiment of the present invention may include the following steps.
  • the fixed length key and its value are stored in the first data block, wherein the storing the fixed length key comprises: uniformly storing the common prefix of each fixed length key, and respectively storing the remaining part after each fixed length key removes the common prefix.
  • The variable length key and its value are stored in the second data block, wherein storing the variable length key comprises: storing variable length keys of the reference key type in full, and performing prefix compression on variable length keys of the prefix compression key type.
  • the first data block is dedicated to storing fixed length keys and their values
  • the second data block is dedicated to storing variable length keys and their values.
  • Variable length keys can be divided into a reference key type and a prefix compression key type according to a preset threshold length and a threshold difference, where:
  • the current variable length key is compared with the previous reference key by prefix, and if the same prefix string is shorter than the threshold length, the current variable length key is determined to be of the reference key type;
  • the current variable length key is compared with the previous reference key by prefix, and if the same prefix string is longer than the threshold length and shorter than the sum of the threshold length and the threshold difference, the current variable length key is determined to be of the prefix compression key type.
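  • The classification rule above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, and the text does not say what happens when the shared prefix equals the threshold length or reaches the sum of threshold length and threshold difference, so this sketch assumes such keys fall back to the reference key type.

```python
from typing import Optional


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Return the length of the longest common prefix of two byte strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def classify_key(key: bytes, prev_reference: Optional[bytes],
                 threshold_len: int, threshold_diff: int) -> str:
    """Classify a variable length key as 'reference' or 'prefix_compressed'."""
    if prev_reference is None:
        # Assumption: the first key in a block has nothing to compress against.
        return "reference"
    shared = common_prefix_len(key, prev_reference)
    if threshold_len < shared < threshold_len + threshold_diff:
        return "prefix_compressed"
    # Covers shared < threshold_len (stated in the text) and, by assumption,
    # the boundary cases the text leaves unspecified.
    return "reference"
```

  • With a threshold length of 3 and a threshold difference of 10, the key `user:1002` following the reference key `user:1001` shares 8 bytes and is classified as a prefix compression key, while `zebra` shares nothing and becomes a new reference key.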
  • Performing prefix compression on a variable length key of the prefix compression key type includes:
  • storing the length of the common prefix shared by the variable length key of the prefix compression key type and the previous reference key, and storing the remaining part of that variable length key after the common prefix is removed.
  • When it is determined that the first data block is full after the fixed length key and its value are stored in it, the first data block is compressed and a storage buffer is allocated; when the size of the compressed first data block is smaller than the storage buffer, the compressed first data block is written into the storage buffer;
  • likewise, the second data block is compressed and a storage buffer is allocated; when the size of the compressed second data block is smaller than the storage buffer, the compressed second data block is written to the storage buffer.
  • the method can also include:
  • the Bloom filter information of the fixed length key and its value is written into the Bloom filter
  • A read buffer may be set. When it is determined that the length read by a read operation is smaller than the buffer and the current data block is not the last data block, the start address and the length of the next data block are obtained and reading continues until the read length is greater than the length of the buffer.
  • a file format of an embodiment of the present invention consists of a data block (including data block 1, data block 2, up to data block n), a Bloom filter, an index block, and a file header (including a file header and a file header length).
  • An ordered record (record) is stored in the data block portion, which is divided into a Key and a value (value). For the Key part, it can be divided into fixed length Key and variable length Key.
  • The same type of Key and its Value can be stored in the same data block; for example, data block 1 stores fixed length Keys and their Values, data block 2 stores variable length Keys and their Values, and so on. Moreover, there may be more than one data block that exclusively stores the same type of Key and its Value. In a data block storing fixed length Keys and their Values, it is preferable to store only one copy of the common prefix (prefix) of the fixed length keys, and to store for each key only the remaining part after the common prefix is removed (i.e., the differing part, or remainder).
  • the base key and the prefix compressed key can be distinguished according to a preset threshold length and a threshold diff.
  • For a prefix compressed key, the length of the prefix shared by the current prefix compressed key and the previous base key is stored first (compressed as a variable length integer), and then the remainder of the prefix compressed key is stored.
  • the compression result of all the cells belonging to the same key can be stored.
  • A memory-resident block must be managed by a file writer (for example, an SSTable writer); if the SSTable writer disappears, the memory-resident block disappears as well (the user then needs to build an SSTable reader and call the SSTable writer's Assign method to swap out the dumped block). For a memory-resident block, the block is allocated and held in the SSTable writer; if the block is not resident in memory, it is dumped directly to the SSTable.
  • the metadata of the file (i.e., Bloom filter, block index, header, and header length information)
  • When a record is pushed, a block based on the file format of the embodiment of the present invention needs to allocate temporary spaces for storing the key offset, the base offset, the key, the data offset, and the data.
  • The current key is compared with the reference key of the previous record (obtained through the base offset of the previous record). If the same prefix string is longer than the threshold (threshold_len, configurable), the prefix is compressed: the base offset corresponding to the key stores the offset of the reference key of the previous record, and the length of the same prefix is obtained and encoded as a var int. Then, in the key temporary allocation area, starting at the end position of the last write, the length of the same prefix (represented as a var int) and the differing part of the key are written.
  • The offset of the data and the data itself are written to the data offset and data temporary allocation areas.
  • Otherwise, when the comparison of the current key with the reference key of the previous record does not yield a same prefix string longer than the threshold, prefix compression is not performed: the key is stored in full as a reference key, and the base offset is set to the current key offset (the current key is itself the reference for compression and is not compressed). Then, in the key temporary allocation area, starting at the end position of the last write, the key is written; the data offset and the data are written to the data offset and data temporary allocation areas.
  • The longest common string of the keys that have been pushed is calculated, and the storage structure of the block in memory is obtained: the longest common string is stored first, then the remaining string of each key, then the data, and finally the block header.
  • the read (get) operation of the data block in the file format mainly includes: when performing get record, searching in the block according to the index of the record; or searching for the record corresponding to the key in the block index.
  • The Bloom filter of the embodiment of the present invention is a special degenerate hash table: it is degenerate in that it does not handle collisions and does not store key values. The number of hashes of the Bloom filter can be set; the positions of a record in the bitmap are calculated according to the number of hashes and the key of each record, and those positions are set.
  • On a query, a first layer of filtering is performed with the Bloom filter: the positions in the bitmap corresponding to the record are calculated according to the number of hashes and the key of the record, and it is checked whether they are set to 1. If not, the current key is not in the file; if they are 1, the key may exist in the file, and it is then searched for in the file according to the end key of the block index.
  • The index block (block index) in the embodiment of the present invention (variable length key, fixed length value) is stored in a way similar to how blocks are stored.
  • The key field is the cell key, which stores the full key of the last cell of each data block (row key + cfid + column); the value field holds the offset of the data block in the file (offset, length) and the length of the current row key (row key length).
  • The header of the embodiment of the present invention stores file-related information and the offset and length of each part, which facilitates rapid positioning of each part and avoids the waste of system resources caused by traversing the file from the beginning.
  • the value is the length of data; for non-fixed length, it is 0;
  • The present invention proposes a method of writing record data.
  • The record is first written into the block. If the write succeeds, the Bloom filter information of the current record is written to the Bloom filter structure. If the write fails, the current block is full; the current block is compressed, and an appropriate write buffer is allocated according to the size of the unused space in the write buffer and the size of the block. If the current write buffer space is sufficient, the block is written to the write buffer; if the write buffer space is insufficient, the existing write buffer is written to disk.
  • The write buffer acts as a layer of cache: several blocks that have been written are cached and then written to disk together.
  • Each time a block is written, the last key recorded in the block is written into the Block Index structure.
  • a file format proposed by the embodiment of the present invention (hereinafter, this file format is referred to as sstable format, and a file of this format is referred to as sstable file).
  • This file format can be used to permanently save data, also known as data persistence.
  • This file format can store records with fixed length or variable length keys and values.
  • A record consists of a key and a value.
  • A key and a value belonging to the same record are referred to as a key and its value, or a key and its corresponding value; the key and value of the current record are referred to as the current key and the current value.
  • the key is the keyword of the record, which can be entered by the user or generated by other means.
  • FIG. 2 is a schematic diagram of a file format according to an embodiment of the present invention.
  • The file format is composed of data blocks (including data block 1, data block 2, ..., data block n) and metadata (meta data).
  • Metadata includes Bloom filters, index blocks, and file headers (including header and header length).
  • a block is used to store records, which can be ordered or unordered.
  • a record that has been sorted by key can be stored in the data block.
  • The Bloom filter is a special degenerate hash table: it does not handle conflicts (collisions) and does not store key values.
  • the Bloom filter stores information about whether each record is in the file (also known as Bloom filter information).
  • When a record is written, the position of the record in the bitmap can be calculated according to the preset number of hashes and the key of the record, and the value of that position is set to 1 to indicate that the key exists in the file. When the value of the position in the bitmap is not 1, the record does not exist in the file; when the value is 1, the record may exist in the file.
  • the position of the key in the bitmap (BitMap) can be calculated according to the preset number of hashes and the key, and whether the key exists in the file is determined according to the value of the position.
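  • The bitmap behaviour described above can be illustrated with a small sketch. The class name, the choice of SHA-256 as the hash source, and the way multiple hash positions are derived are all assumptions made for illustration; the text only specifies a preset number of hashes and a bitmap.

```python
import hashlib


class BloomFilter:
    """Bitmap-only filter: no collision handling, no keys stored."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bitmap = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Assumption: derive each hash by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        """Set the bit at every position computed for the key."""
        for p in self._positions(key):
            self.bitmap[p // 8] |= 1 << (p % 8)

    def may_contain(self, key: bytes) -> bool:
        """False means definitely absent; True means possibly present."""
        return all(self.bitmap[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))
```

  • A `False` result lets the reader skip the file entirely; a `True` result still requires a lookup through the block index, because different keys can map to the same bits.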
  • the index block portion is used to store location information of each data block and range information of keys stored in the respective data blocks.
  • When the stored records are sorted by key, for example according to the ASCII codes of the keys, the last key stored in each data block can be recorded in the index block as the block index key (end key).
  • the data block in which the record is stored can be determined according to the block index key in the index block, and the data block is read according to the position of the data block, thereby searching for the record in the data block.
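  • Because records are sorted and each index entry holds the last key of its block, the block that may contain a key is the first one whose end key is not smaller than that key. The sketch below assumes a hypothetical in-memory index of `(end_key, offset, length)` tuples; the on-disk layout of the index block is not reproduced here.

```python
import bisect
from typing import List, Optional, Tuple

IndexEntry = Tuple[bytes, int, int]  # (end_key, offset, length), assumed layout


def find_block(index: List[IndexEntry], key: bytes) -> Optional[Tuple[int, int]]:
    """Return (offset, length) of the block that may hold key, or None."""
    end_keys = [entry[0] for entry in index]
    # First block whose end key is >= the searched key.
    i = bisect.bisect_left(end_keys, key)
    if i == len(index):
        return None  # key sorts after the last key in the file
    return index[i][1], index[i][2]
```

  • For an index with end keys `apple`, `mango`, and `zebra`, a lookup for `banana` selects the second block, which is then read from its offset and searched internally.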
  • the file header (including header and header length) is used to store the file information and the offset and length of each part, which helps to quickly locate each part, eliminating the wasted system resources caused by traversing from the beginning of the file.
  • The sstable file generation process in an example includes: writing a record to a data block (also referred to as a block); if the write is successful, writing the Bloom filter information of the current record to the Bloom filter structure; when a block is full, writing the block index information to the index block portion; and, when all the blocks are written, writing the metadata (meta data) of the file, which completes the writing of the file.
  • the I/O throughput can be improved
  • the parsing efficiency can be improved
  • the parsing speed can be accelerated.
  • A write buffer for the file is allocated in memory; multiple temporary storage areas are allocated in memory for one block, and the information of the record is written into the temporary storage areas (for example, a temporary storage area can be allocated for each part of the data block structure). If the write fails, the current data block is full. Whether the storage buffer space is sufficient is determined according to the size of the unused space in the write buffer and the size of the current data block. If the current storage buffer space is sufficient, the data block is written to the storage buffer; if it is insufficient, the existing storage buffer is written to the disk, and the storage buffer is then reallocated to store the current data block and subsequent data blocks.
  • the storage buffer is equivalent to a layer of cache, which is used to cache several blocks that have been written, and then write these blocks to the disk at one time, reducing interaction with the disk and speeding up the writing speed. This improves the efficiency of writing to disk.
  • Other examples may also use other caching mechanisms, for example, writing the block to the disk every time a block is written, etc., which is not limited by the present invention.
  • When a block is written to a file, the block can be compressed first and the compressed block written, to save storage space.
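  • The buffering and compression steps above can be sketched together. The class name and `flush_count` field are hypothetical, and `zlib` is used only as a stand-in, since the text does not name a compression algorithm. In this sketch a block larger than the whole buffer is still appended after a flush, which is a simplification.

```python
import io
import zlib


class BufferedBlockWriter:
    """Caches compressed blocks and writes them to the output in batches."""

    def __init__(self, out, buffer_size: int):
        self.out = out
        self.buffer_size = buffer_size
        self.buffer = bytearray()
        self.flush_count = 0  # number of writes issued to the output

    def write_block(self, block: bytes) -> int:
        compressed = zlib.compress(block)
        # Not enough unused space: write the existing buffer out first.
        if len(self.buffer) + len(compressed) > self.buffer_size:
            self.flush()
        self.buffer += compressed
        return len(compressed)

    def flush(self) -> None:
        if self.buffer:
            self.out.write(self.buffer)
            self.buffer = bytearray()
            self.flush_count += 1


out = io.BytesIO()
writer = BufferedBlockWriter(out, buffer_size=1 << 20)
for block in (b"a" * 100, b"b" * 100, b"c" * 100):
    writer.write_block(block)
writer.flush()  # all three blocks reach the output in a single write
```

  • Batching several blocks into one write is what reduces the number of interactions with the disk; the buffer size is the tuning knob.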
  • the key in the record can be a fixed length key or a variable length key.
  • the length of the fixed length key is equal to the preset value, and the length of the variable length key has no fixed value.
  • The same type of key and its value can be stored in the same data block; for example, data block 1 stores fixed length keys and their values, data block 2 stores variable length keys and their values, and so on. There can be more than one data block holding the same type of key and its value.
  • FIG. 3 is a flowchart of a data storage method according to an embodiment of the present invention. As shown in Figure 3, the method can include the following steps.
  • Step S11: storing a fixed length key and its value in the first data block, wherein storing the fixed length key comprises: storing a single copy of the common prefix of the fixed length keys, and separately storing the remaining part of each fixed length key after the common prefix is removed.
  • Step S12: storing a variable length key and its value in the second data block, wherein storing the variable length key comprises: storing variable length keys of the reference key type in full, and performing prefix compression on variable length keys of the prefix compression key type.
  • The size of each data block can be preset. This preset size can be set by the user or otherwise determined. All data blocks can be set to have the same size, or the first data block and the second data block can be sized individually.
  • In the first data block, only one copy of the common prefix of the fixed length keys may be stored, together with the remaining part of each key after the common prefix is removed (i.e., the differing part, or remainder).
  • FIG. 4 is a schematic diagram of a storage structure of a data block for storing a record having a fixed length key according to an embodiment of the present invention.
  • The data block is composed of the following parts (also called areas or fields): a block header 401, a data offset 402, data 403, a key remainder 404, and a key common prefix (common prefix key) 405.
  • the block header 401 stores information of the data block, which may be defined as needed, for example, may include the length of the common prefix stored by the key common prefix 405.
  • the key common prefix 405 refers to the common prefix portion of all keys stored in the data block.
  • the data offset 402, the data 403, and the remaining portion of the key 404 respectively include a plurality of memory cells (cells), and each memory cell of each portion corresponds to one record.
  • Each record in the data block corresponds to one memory cell of the remaining key portion 404, one memory cell of the data offset 402, and one memory cell of the data 403; that is, three memory cells correspond to one record in the data block.
  • the remaining portion of the key 404 stores the remaining portions of each key after the common prefix is removed.
  • the data offset 402 stores the offset of the storage location of each data (i.e., the value in each record) relative to a starting location in the block for locating the values in the block.
  • Data 403 stores the value in each record.
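  • The five parts of the fixed length key block can be modelled in memory as follows. This is an illustrative sketch only: the dictionary field names are hypothetical, the header content beyond the prefix length is an assumption, and data offsets are taken relative to the start of the data area.

```python
from typing import Dict, List, Tuple


def common_prefix(keys: List[bytes]) -> bytes:
    """Longest prefix shared by all keys (compare only the min and max key)."""
    if not keys:
        return b""
    lo, hi = min(keys), max(keys)
    n = 0
    while n < len(lo) and lo[n] == hi[n]:
        n += 1
    return lo[:n]


def build_fixed_key_block(records: List[Tuple[bytes, bytes]]) -> Dict:
    """Split records into header, data offsets, data, remainders, prefix."""
    prefix = common_prefix([k for k, _ in records])
    data = bytearray()
    offsets = []
    for _, value in records:
        offsets.append(len(data))  # where this record's value starts
        data += value
    return {
        "header": {"prefix_len": len(prefix), "count": len(records)},
        "data_offsets": offsets,
        "data": bytes(data),
        "remainders": [k[len(prefix):] for k, _ in records],
        "prefix": prefix,
    }


def get_record(block: Dict, i: int) -> Tuple[bytes, bytes]:
    """Rebuild record i from its three cells plus the shared prefix."""
    offsets = block["data_offsets"]
    end = offsets[i + 1] if i + 1 < len(offsets) else len(block["data"])
    key = block["prefix"] + block["remainders"][i]
    return key, block["data"][offsets[i]:end]
```

  • For the keys `user0001`, `user0002`, and `user0107`, the prefix `user0` is stored once and only the three-byte remainders are stored per record.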
  • FIG. 5 is a schematic diagram of a process of writing a fixed length key to a data block according to an embodiment of the present invention.
  • Temporary storage spaces 501, 502, 503, 504, and 505 may be allocated in memory for the block header 401, the data offset 402, the data 403, the key remaining portion 404, and the key common prefix 405 of the first data block described above, respectively.
  • The common prefix is written to the key temporary storage space 505, and the length of the common prefix is written to the temporary storage space 501.
  • The portion identical to the common prefix is removed from the key of the record, and the remaining portion of the key is written to the storage unit in the temporary storage space 504 corresponding to the record.
  • the value of the record is written to the storage unit in the temporary storage space 503 corresponding to the record, and the offset of the storage unit in the temporary storage space 503 is written into the storage unit corresponding to the record in the temporary storage space 502.
  • the sum of the sizes of the temporary storage spaces 501, 502, 503, 504, 505 described above can be calculated each time a record is written. If the sum is greater than or equal to the preset data block size (size), the data block is already full, and the current record is not written, that is, the current record storage fails.
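  • The size check that decides whether a record still fits can be sketched as below. The header size, the 4-byte offset width, and the class name are assumptions chosen for illustration; the sketch also assumes the common prefix is already known, and it counts the incoming record before comparing against the block size.

```python
class FixedKeyBlockWriter:
    """Accumulates records in temporary areas until the block is full."""

    HEADER_SIZE = 8  # assumed size of the block header area
    OFFSET_SIZE = 4  # assumed size of one data offset cell

    def __init__(self, block_size: int, prefix: bytes):
        self.block_size = block_size
        self.prefix = prefix
        self.remainders = []     # key remainder area
        self.offsets = []        # data offset area
        self.data = bytearray()  # data area

    def size(self) -> int:
        """Sum of the sizes of all temporary areas."""
        return (self.HEADER_SIZE + len(self.prefix)
                + self.OFFSET_SIZE * len(self.offsets)
                + sum(len(r) for r in self.remainders)
                + len(self.data))

    def push(self, key: bytes, value: bytes) -> bool:
        """Return False (record not written) when the block is full."""
        if not key.startswith(self.prefix):
            raise ValueError("key does not share the block's common prefix")
        remainder = key[len(self.prefix):]
        needed = self.OFFSET_SIZE + len(remainder) + len(value)
        if self.size() + needed >= self.block_size:
            return False
        self.offsets.append(len(self.data))
        self.remainders.append(remainder)
        self.data += value
        return True
```

  • A `False` return is the signal to serialize the current block, hand it to the storage buffer, and start a new block with freshly allocated temporary areas.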
  • The temporary storage spaces 502, 503, 504, and 505 are respectively written into a storage buffer (write buffer) according to the storage structure of the data block (for example, the storage structure shown in FIG. 4), and the temporary storage space 501 of the block header records information about each written part, such as the position (e.g., offset) of the data offset 402, the data 403, the key remaining portion 404, and the key common prefix 405 in the data block.
  • The temporary storage space 501 is then written to the storage buffer. At this point, the writing process of the first data block is complete. If records with fixed-length keys remain to be written, temporary storage space can be reallocated and the above writing process repeated to generate a new data block.
  • the method for determining the common prefix of the fixed length key may be determined according to actual needs. For example, the number of records that can be stored in the data block can be predicted based on the size of the allocated temporary storage space for storing the remaining portion of the key 404, and then the corresponding number of recorded keys are read to obtain a common prefix for the keys.
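  • The fixed-length layout described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `common_prefix`, `build_fixed_block`, and `read_record` are hypothetical, the block is modelled as a Python dict rather than a byte layout, and the block header and size check are omitted.

```python
def common_prefix(keys):
    """Longest byte string that every key starts with."""
    prefix = keys[0]
    for key in keys[1:]:
        n = 0
        while n < min(len(prefix), len(key)) and prefix[n] == key[n]:
            n += 1
        prefix = prefix[:n]
    return prefix


def build_fixed_block(records):
    """records: list of (key, value) byte pairs whose keys share one length."""
    prefix = common_prefix([key for key, _ in records])
    remainders = []      # stands in for area 404, one cell per record
    data_offsets = []    # stands in for area 402
    data = bytearray()   # stands in for area 403
    for key, value in records:
        remainders.append(key[len(prefix):])   # key without the common prefix
        data_offsets.append(len(data))         # where this value starts
        data += value
    return {"prefix": prefix, "remainders": remainders,
            "data_offsets": data_offsets, "data": bytes(data)}


def read_record(block, i):
    """Restore record i as (key, value)."""
    key = block["prefix"] + block["remainders"][i]
    offsets = block["data_offsets"]
    end = offsets[i + 1] if i + 1 < len(offsets) else len(block["data"])
    return key, block["data"][offsets[i]:end]
```

  • The common prefix is stored once per block, so each record only pays for the bytes in which its key differs from the others.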
  • When it is determined that the fixed-length key and its value cannot be stored in the first data block, it is determined that the first data block is full.
  • the first data block may be compressed to reduce the storage space of the data.
  • If the size of the first data block after compression is less than the available space of the storage buffer, the compressed first data block is written into the storage buffer.
  • When storing variable-length keys, each variable-length key can be classified as either a reference key or a prefix compression key. A reference key is stored in full, that is, the complete key is stored.
  • For a prefix compression key, the length of the prefix that the current prefix compression key shares with its reference key (hereinafter referred to as the same prefix or the same prefix string) is stored, followed by the remaining part of the prefix compression key after the same prefix is removed. This step is hereinafter referred to simply as prefix compression, or variable-length compression.
  • FIG. 6 is a schematic diagram of a storage structure of a data block for storing variable length keys according to an embodiment of the present invention.
  • The data block is composed of a block header 601, a data offset 602, a reference offset 603, a key offset 604, data 605, and a key 606.
  • the block header 601 stores information of the data block, which can be defined as needed.
  • The data offset 602, reference offset 603, key offset 604, data 605, and key 606 each include multiple storage units (cells), and each storage unit of each part corresponds to one record. Each record in the data block has one corresponding storage unit in each of the data offset 602, the reference offset 603, the key offset 604, the data 605, and the key 606; that is, the five storage units corresponding to a record can be found in the data block, and the record (key + value) can be restored from the information in those storage units.
  • the data offset 602 stores the offset of the storage location of each data (i.e., the value in each record) relative to a starting location in the block for locating data in the block.
  • Data 605 stores the value in each record.
  • the reference offset 603 stores the offset of the storage position of the reference key of each key in the data block from a start position (i.e., the key offset of the reference key) for positioning the reference key of each key in the block.
  • The reference offset of a reference key itself is set to zero.
  • the key offset 604 stores the offset of the storage location of each key in the data block relative to a starting position for locating the keys in the block.
  • the key 606 stores the length of the same prefix of each key and its reference key and the remaining part after the same prefix is removed.
  • the key of the first record written in the current data block can be used as a reference key, and the keys of the subsequent record are prefix-compressed according to the reference key.
  • Performing prefix compression on a variable-length key of the prefix compression key type includes: storing the length of the same prefix shared by the variable-length key and its reference key, and storing the remainder of the variable-length key after the same prefix is removed.
  • Prefix compression in the data block can reduce the storage space of the data and increase the disk utilization of the machine.
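  • A minimal sketch of prefix compression against a reference key is given below. It assumes the same-prefix length is written as a little-endian base-128 varint (one common varint form; the exact encoding is not specified here), and the function names are hypothetical.

```python
def encode_varint(n):
    """Encode a non-negative integer, 7 bits per byte, low bits first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos=0):
    """Return (value, position just after the varint)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def same_prefix_len(a, b):
    """Number of leading bytes shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_compress(key, ref_key):
    """Store the shared-prefix length, then the part of key that differs."""
    n = same_prefix_len(key, ref_key)
    return encode_varint(n) + key[n:]


def prefix_decompress(cell, ref_key):
    """Rebuild the full key from its compressed cell and the reference key."""
    n, pos = decode_varint(cell)
    return ref_key[:n] + cell[pos:]
```

  • For example, compressing `b"row_abc_2"` against the reference key `b"row_abc_1"` stores the length 8 and the single differing byte, two bytes in place of nine.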
  • a data block can also have one or more reference keys.
  • the variable length key can be divided into a reference key type and a prefix compression key type by a preset method, that is, a variable length key is determined as a reference key or a prefix compression key.
  • The variable-length key can be divided into a reference key type and a prefix compression key type according to a preset threshold length (threshold_len).
  • A prefix comparison is performed between the current variable-length key and the current reference key (i.e., the reference key of the previous key; when the current key is determined to be a reference key, the reference key of the previous key is also referred to as the previous reference key, to distinguish it from the current key).
  • If the length of the same prefix string is not greater than the threshold length, the current variable-length key is determined to be of the reference key type; if the length of the same prefix string is greater than the threshold length, the current variable-length key is determined to be of the prefix compression key type.
  • the threshold length can be set as needed.
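  • The threshold-length rule can be sketched as below. This is an illustration under assumptions: the first key of a block is taken as a reference key, a key whose shared prefix does not exceed `threshold_len` becomes the new current reference key, and the function names are hypothetical.

```python
def same_prefix_len(a, b):
    """Number of leading bytes shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def classify_by_threshold_len(keys, threshold_len):
    """Label each key 'reference' or 'prefix_compressed', in order."""
    kinds = []
    ref_key = None
    for key in keys:
        if ref_key is not None and same_prefix_len(key, ref_key) > threshold_len:
            kinds.append("prefix_compressed")   # shares enough with the reference
        else:
            kinds.append("reference")           # stored in full
            ref_key = key                       # becomes the current reference key
    return kinds
```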
  • The variable-length key can also be divided into the reference key type and the prefix compression key type according to a preset threshold difference.
  • The length of the same prefix string of the current key and the previous key is calculated and recorded as the first length. The length of the same prefix string of the current key and the current reference key is calculated and recorded as the second length.
  • If the first length is less than the sum of the second length and the threshold difference, the current key is determined to be a prefix compression key, and the key offset of the current reference key is used as the reference offset of the current key; if the first length is greater than or equal to the sum of the second length and the threshold difference, the current key is determined to be a reference key, the current key is not compressed, and the reference offset of the current key is set to zero.
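  • The threshold-difference rule can be sketched in the same way. The sketch assumes that the first key of a block is a reference key; the function names are hypothetical.

```python
def same_prefix_len(a, b):
    """Number of leading bytes shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def classify_by_threshold_diff(keys, threshold_diff):
    """Label each key 'reference' or 'prefix_compressed', in order."""
    kinds = []
    ref_key = prev_key = None
    for key in keys:
        if ref_key is None:
            kinds.append("reference")            # first key (assumed reference)
            ref_key = key
        else:
            first_len = same_prefix_len(key, prev_key)   # vs. previous key
            second_len = same_prefix_len(key, ref_key)   # vs. current reference key
            if first_len >= second_len + threshold_diff:
                kinds.append("reference")        # the old reference no longer fits
                ref_key = key
            else:
                kinds.append("prefix_compressed")
        prev_key = key
    return kinds
```

  • The intent of the rule is that a new reference key is chosen once neighbouring keys share noticeably more with each other than with the current reference key.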
  • FIG. 7 is a schematic diagram of a process of writing a variable length key into a data block according to an embodiment of the present invention.
  • When a data block in the format shown in FIG. 6 is written, temporary storage spaces 701, 702, 703, 704, 705, and 706 may be allocated in memory for the block header 601, the data offset 602, the reference offset 603, the key offset 604, the data 605, and the key 606, respectively.
  • The key of the first record is used as the reference key. Since the reference key is not compressed, the complete key is stored in the storage unit corresponding to the record in the temporary storage space 706 for storing keys (for example, the first record can correspond to the first storage unit of each temporary storage space).
  • The reference offset of the key is stored in the storage unit corresponding to the record in the temporary storage space 703 for storing reference offsets.
  • The reference offset of a reference key can be set to 0.
  • The data (i.e., value) of the record is written to the storage unit corresponding to the record in the temporary storage space 705 for storing data.
  • the offset of the current key in the temporary storage space 706 (relative position relative to the start position of the area) is written to the storage unit corresponding to the record in the temporary storage space 704 for storing the key offset.
  • the offset of the currently recorded data in the temporary storage space 705 (relative position relative to the start position of the area) is written to the storage unit corresponding to the record in the temporary storage space 702 for storing the data offset.
  • the key offset of the first record and its corresponding data offset can both be set to zero.
  • When each subsequent record is written to the data block, it is determined whether the key of the current record can become a reference key. If the current key is determined to be a reference key, its reference offset is determined to be 0, and the complete key (because a reference key is not compressed) and the reference offset are stored in the storage units corresponding to the current record in the temporary storage spaces 706 and 703, respectively.
  • If the current key is not a reference key, the key offset of the current reference key is written into the storage unit corresponding to the current record in the temporary storage space 703, and the key is prefix-compressed; that is, the length of the same prefix of the current key and the reference key (represented as a varint) and the differing part (that is, the remainder of the current key after the same prefix is removed) are stored in the storage unit corresponding to the current record in the temporary storage space 706.
  • the currently recorded key offset, data offset, and data are stored in the storage locations corresponding to the current record in the temporary storage spaces 704, 702, and 705, respectively. In this way, the writing of a variable length key is completed.
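  • The write path for variable-length keys can be sketched as follows. This is a simplified illustration, not the patented layout: the class `VarKeyBlockBuilder` and its fields are hypothetical, the threshold-length rule is used to pick reference keys, the same-prefix length is limited to one byte, and (as an assumption made so that one decoder handles both key types) a reference key is stored as a cell with prefix length 0 followed by the complete key.

```python
def same_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return min(n, 255)   # sketch: the length must fit in one byte


class VarKeyBlockBuilder:
    def __init__(self, threshold_len):
        self.threshold_len = threshold_len
        self.data_offsets = []    # stands in for space 702
        self.ref_offsets = []     # stands in for space 703
        self.key_offsets = []     # stands in for space 704
        self.data = bytearray()   # stands in for space 705
        self.keys = bytearray()   # stands in for space 706
        self.ref_key = None       # saved information about the current reference key
        self.ref_key_offset = 0

    def add(self, key, value):
        key_offset = len(self.keys)
        n = same_prefix_len(key, self.ref_key) if self.ref_key is not None else 0
        if n > self.threshold_len:                 # prefix compression key
            self.ref_offsets.append(self.ref_key_offset)
            self.keys += bytes([n]) + key[n:]
        else:                                      # reference key, stored in full
            self.ref_offsets.append(0)
            self.keys += bytes([0]) + key
            self.ref_key, self.ref_key_offset = key, key_offset
        self.key_offsets.append(key_offset)
        self.data_offsets.append(len(self.data))
        self.data += value

    def _cell(self, offsets, area, i):
        end = offsets[i + 1] if i + 1 < len(offsets) else len(area)
        return bytes(area[offsets[i]:end])

    def get(self, i):
        """Restore record i as (key, value)."""
        cell = self._cell(self.key_offsets, self.keys, i)
        n, rest = cell[0], cell[1:]
        if n:   # compressed: fetch the reference key via the reference offset
            j = self.key_offsets.index(self.ref_offsets[i])
            ref_key = self._cell(self.key_offsets, self.keys, j)[1:]
            rest = ref_key[:n] + rest
        return rest, self._cell(self.data_offsets, self.data, i)
```

  • Note that with area-relative offsets the first key has key offset 0, which coincides with the reference offset 0 used for reference keys; the sketch tells the two cases apart by the stored prefix length instead.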
  • The method of determining the current reference key can include: obtaining the reference offset of the previous record to determine the current reference key.
  • The method of determining the current reference key may also include: saving information about the current reference key, such as its key offset, while records are being written. For example, a temporary storage space can be allocated to hold information about the current reference key, and the current reference key can then be obtained from the saved information.
  • When the current key is determined to be a reference key, the saved information about the current reference key can be updated to the information of the current key, such as the key offset of the current key.
  • Other examples can also use other methods to determine the current reference key.
  • the sum of the sizes of the temporary storage spaces 701, 702, 703, 704, 705, 706 described above can be calculated each time a record is written. If the sum is greater than or equal to the preset data block size (size), the data block is already full, and the current record is not written, that is, the current record storage fails.
  • The temporary storage spaces 702, 703, 704, 705, and 706 are respectively written into a storage buffer (write buffer) according to the storage structure of the data block (for example, the storage structure shown in FIG. 6), and the temporary storage space 701 records information about each written part, such as the position (e.g., offset) of the data offset 602, the reference offset 603, the key offset 604, the data 605, and the key 606 in the data block.
  • The temporary storage space 701 is then written to the storage buffer. At this point, the writing process of the second data block is complete. If records with variable-length keys remain to be written, temporary storage space can be reallocated and the above writing process repeated to generate a new second data block.
  • Each time a record is written, or when the data block is full, the sum of the used space of the block storage area (i.e., the above storage buffer) and the sizes of the temporarily allocated spaces can be calculated. If the sum is greater than or equal to the size of the storage buffer, the block storage area is already full and the current record or data block is not written, that is, storing the current record or data block fails; if the sum is smaller than the size of the storage buffer, the contents of the temporary storage spaces holding the current record (the reference offset, the key offset, the data offset, and so on) are written to the unused area of the storage buffer.
  • When it is determined that the variable-length key and its value cannot be stored in the second data block because the block is full, the second data block may be compressed.
  • If the size of the second data block after compression is smaller than the unused space of the storage buffer, the compressed second data block is written into the storage buffer. Compressing each data block can effectively reduce the storage space of the data.
  • The compression result of all the storage units (cells) corresponding to the same key may be stored, that is, the compressed value is stored, so as to reduce the storage space of the data.
  • All blocks can be written to a file in sstable format after the records are full.
  • The blocks can be written into the sstable file sequentially in the order of the stored keys. If a data block is resident in memory, the block is held in the file writer; if the block is not resident, it is written directly to the sstable file.
  • Data blocks resident in memory need to be managed by a file writer, and the user needs to construct a file reader (for example, an sstable reader) to read them.
  • A method of the file reader (such as the Assign method, which can be implemented by a function) is called to swap out the data blocks in the written file.
  • File writers and file readers are an interface provided to the user to manage blocks of resident memory.
  • the metadata of the file includes information about the Bloom filter, block index, file header, and file header length.
  • FIG. 8 is a schematic diagram of a storage structure of a bloom filter according to an embodiment of the present invention.
  • The Bloom filter is a one-dimensional array or vector, for example expressed as {v1, v2, ..., vn}. Each of these elements corresponds to the information of a record's key stored in the file.
  • When it is determined that the fixed-length key and its value have been successfully stored in the first data block, the Bloom filter information of the fixed-length key and its value is written into the Bloom filter; when it is determined that the variable-length key and its value have been successfully stored in the second data block, the Bloom filter information of the variable-length key and its value is written into the Bloom filter.
  • the Bloom filter information of the key can be obtained according to a preset calculation method, for example, the number of hash calculations on the key can be preset.
  • The Bloom filter information indicates the position of the key in the Bloom filter, and the value at that position indicates whether the key is present in the file. In one example, when the value for a key is 1, the key may exist in the file; when the value is not 1 (for example, 0 or null), the key does not exist in the file.
  • The Bloom filter information of a key can be obtained according to the preset calculation method, the value indicated by that information can be read from the Bloom filter, and whether the key may exist in the file can then be judged from the value obtained.
  • When all the records have been written, writing to the Bloom filter is complete. After all the data blocks have been written to the file, the Bloom filter is written to the file.
  • a temporary storage space can be allocated in memory for temporary storage of the Bloom filter. When all the records have been stored, the Bloom filter in the temporary storage space is written to the sstable file on the disk.
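  • A minimal Bloom filter sketch is shown below. It is illustrative only: the class name, the use of SHA-256 to derive bit positions, and the bit-array size are assumptions; only the preset number of hash calculations (`num_hashes`) corresponds to a parameter mentioned above.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes          # preset number of hash calculations
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        """Yield one bit position per hash calculation."""
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        """Called after a record has been stored successfully."""
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def may_contain(self, key):
        """False: the key is certainly absent. True: the key may be present."""
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(key))
```

  • A lookup that returns False lets the reader skip the file entirely; a True result still has to be confirmed through the index block and the data block.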
  • FIG. 9 is a schematic diagram of a process of writing block index information into an index block according to an embodiment of the present invention.
  • the index block includes an index block header 901, a data offset 902, a reference offset 903, a block index offset 904, a data 905, and a block index key 906.
  • The index block header 901 is configured to store information about the index block, including the starting position, length, etc. of each area in the index block.
  • Each data block in the file corresponds to a storage unit in the data offset 902, data 905, reference offset 903, block index offset 904, and block index key 906 of the index block, respectively.
  • The position of a data block in the file can be found from the information stored for that data block in the data offset 902, the data 905, the reference offset 903, the block index offset 904, and the block index key 906.
  • The index block is stored in a manner similar to the data block storing variable-length keys (i.e., the second data block), and can also be regarded as storing a series of variable-length key + value pairs, each of which corresponds to one data block.
  • Each key stored in the index block corresponds to the end key of the last storage unit ( cell ) of each data block, that is, the complete key, for example, may include a row key (row key) + a column family ID (cfid ) + column ( column ).
  • For a first data block, the key is the complete form of the last key in the first data block, that is, the common prefix plus the remaining part of the key.
  • For a second data block, the key is the complete form of the last key in the second data block: when the last key is a reference key, the key is the last key in the key 606; when the last key is a prefix compression key, the key is the complete key recovered from the reference key located by the last key's reference offset, the same-prefix length stored in the key, and the differing part.
  • the last successful write key can be recorded while the record is being written to the data block. When the data block is full, the last successfully written key of the record is written to the index block.
  • Each value stored in the index block corresponds to the position of each data block in the file, such as the offset (offset length) and the current row key length (row key length).
  • the position of the data block in the file can be written in the index block when the data block is written to the file.
  • The data offset 902 stores the storage location, in the index block, of the value of the data block (i.e., the position of the data block in the file);
  • the data 905 stores the position (offset) of the corresponding data block in the file;
  • the block index offset 904 stores the storage location, in the index block, of the last key of the corresponding data block;
  • the reference offset 903 stores the storage location, in the index block, of the reference key of the end key of the data block (its reference key in the index block, not the reference key of the end key in the data block);
  • the block index key 906 stores the length of the common prefix shared by the end key of the data block and its reference key in the index block, and the remainder of the end key after the common prefix is removed.
  • each time a data block is written the key of the last record of the data block is written to the index block structure.
  • Each time a block of data is written to a file the location information of that block in the file is written to the index block structure.
  • the process of writing the corresponding end key and value of each data block into the index block is similar to the process of writing the variable length key and its corresponding value into the second data block, and details are not described herein again.
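  • Collecting the index information while data blocks are written to the file can be sketched as follows. The sketch keeps the index as a plain list of tuples rather than the prefix-compressed index block described above; `write_blocks` and the tuple layout are hypothetical.

```python
import io


def write_blocks(out, blocks):
    """Write blocks in key order and return one index entry per block.

    blocks: list of (end_key, block_bytes); end_key is the last key
    successfully written to that block.
    """
    index = []
    for end_key, payload in blocks:
        offset = out.tell()              # position of the block in the file
        out.write(payload)
        index.append((end_key, offset, len(payload)))
    return index


# Usage: two already-serialized blocks written to an in-memory file.
out = io.BytesIO()
index = write_blocks(out, [(b"c", b"AAAA"), (b"f", b"BB")])
```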
  • Writing of the index block is then complete. After the Bloom filter is written to the file, the index block is written to the file.
  • A temporary storage space can be allocated in memory for temporary storage of the index block. When all data blocks and the Bloom filter have been written to the file on disk, the index block in the temporary storage space is written to the file on the disk.
  • each data block, Bloom filter, and index block are sequentially written to the file, and the position information of the Bloom filter and the index block is recorded to the header (header).
  • FIG. 10 is a schematic diagram of a storage structure of a file header according to an embodiment of the present invention.
  • The file header stores information about the file and the offset and length of each part, which helps to locate each part quickly and avoids the system resources that would be wasted by traversing the file.
  • the file header structure of an example can be as shown in Figure 10, and the header can be set as follows.
  • KVtype which indicates the KV type of the record written by the file.
  • KV type includes two types of key types (variable length key, fixed length key) and two value types (variable length value, fixed length value).
  • Threshold length, that is, the parameter used in the foregoing example to determine whether a variable-length key is used as a reference key: a threshold on the length of the prefix string it shares with the current reference key.
  • Threshold difference, that is, the threshold difference used in the foregoing example to determine whether a variable-length key is used as a reference key.
  • (7) File id number, indicating the identifier of the file.
  • Length after compression, indicating the length of the sstable file after compression.
  • Size of the data block, indicating the size of each data block, that is, the parameter for determining whether a data block is full; it can be set by the user.
  • (22) sstable creation timestamp, indicating the creation time of the sstable file.
  • FIG. 11 is a schematic flowchart of writing a record to a file according to an embodiment of the present invention. As shown in the record data writing process of Figure 11, the method may include the following steps.
  • Step S201 Write a record into the current data block (also referred to simply as a block). A record with a fixed-length key is written to a data block that stores fixed-length keys and their values; a record with a variable-length key is written to a data block that stores variable-length keys and their values.
  • Storing a fixed-length key includes: in a data block that exclusively stores fixed-length keys and their values, storing the common prefix of the fixed-length keys once, and separately storing the remaining part of each fixed-length key after the common prefix is removed.
  • Storing a variable-length key includes: storing variable-length keys of the reference key type in full, and performing prefix compression on variable-length keys of the prefix compression key type.
  • Step S202 It is judged whether the record writing in step S201 is successful, and if yes, step S211 and subsequent steps are performed; otherwise, step S203 and subsequent steps are performed.
  • Step S203 Determine whether the current data block is empty. If yes, return a parameter error, and exit the process. If not, perform step S204 and subsequent steps.
  • Step S204 Compress the current data block.
  • Step S205 Determine whether the compression of the current data block succeeded. If not, return a compression error and exit the process; if it succeeded, perform step S206 and subsequent steps.
  • Step S206 It is judged whether the current buffer (write_buffer) is empty and the current block is larger than the size of the buffer after compression. If yes, step S208 and subsequent steps are performed; otherwise, step S207 and subsequent steps are performed.
  • Step S207 Determine whether the remaining buffer space can hold the current data block. If yes, perform step S210 and subsequent steps; otherwise, perform step S209 and subsequent steps.
  • Step S208 Re-apply the buffer space, and perform step S210 and subsequent steps.
  • Step S209 Start dump, and end the process.
  • Step S210 Write the current data block into the buffer, reserve the index, cache the data block, and reset the current data block.
  • Step S211 Write the Bloom filter information of the current data block.
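  • The control flow of steps S201 to S211 can be sketched as below. This is a heavily simplified illustration: compression (S204, S205) and the buffer checks, re-application, and dump (S206 to S209) are collapsed into appending the finished block to a list, a set stands in for the Bloom filter, and retrying the record in a fresh block after S210 is an assumption. All names are hypothetical.

```python
class Block:
    """Toy data block that only tracks its records and their total size."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.records = []
        self.size = 0

    def add(self, key, value):
        new_size = self.size + len(key) + len(value)
        if new_size >= self.max_size:   # greater than or equal: block is full
            return False
        self.records.append((key, value))
        self.size = new_size
        return True


def write_records(records, block_size):
    write_buffer = []    # finished blocks, standing in for write_buffer
    bloom_keys = set()   # stands in for the Bloom filter of step S211
    current = Block(block_size)
    for key, value in records:
        if not current.add(key, value):            # S201/S202: write failed
            if not current.records:                # S203: block is empty
                raise ValueError("parameter error: record exceeds block size")
            write_buffer.append(current.records)   # S204-S210, simplified
            current = Block(block_size)            # reset the current block
            if not current.add(key, value):        # retry (assumed)
                raise ValueError("parameter error: record exceeds block size")
        bloom_keys.add(key)                        # S211
    if current.records:
        write_buffer.append(current.records)       # flush the last block
    return write_buffer, bloom_keys
```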
  • the stored records may be pre-arranged, for example, sorted according to the ASCII code of the recorded keys. In this way, when reading data, the query data can be quickly located according to the index block and the order of the internal data of the data block.
  • The following describes a method of reading data from a file, taking as an example a file whose records are sorted in advance in key order.
  • Basic information such as the header length, the header, the index block, and the Bloom filter may be read in sequence.
  • FIG. 12 is a schematic diagram of a file reading method according to an embodiment of the present invention. As shown in Figure 12, the method can include the following steps.
  • Step S31 Read a file header length (field) of the file, and obtain a length of a header area.
  • Step S32 The file header area is read according to the length of the file header.
  • Step S33 Read an index block area according to information in a file header area.
  • the starting position of the index block in the file may be determined according to the offset of the index block in the file header, and then starting from the above starting position according to the compressed length of the index block in the file header or the length before the index block is compressed. Read out the index block.
  • Step S34 reading a bloom filter area according to information in the file header area. For example, you can determine the starting position of the Bloom filter in the file based on the offset of the Bloom filter in the file header. The Bloom filter in the file is then read from the above starting position based on the Bloom filter length in the header.
  • Step S35 Notify the upper layer that opening the sstable file is complete.
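  • Steps S31 to S34 can be sketched as follows. The concrete layout is an assumption made for illustration: the header length is taken to be a 4-byte little-endian integer at the very end of the file, the header is encoded as JSON, and the field names (`bloom_offset`, `index_length`, and so on) are hypothetical.

```python
import io
import json
import struct


def write_file(out, data_blocks, bloom, index):
    """Write data blocks, Bloom filter, index block, header, header length."""
    for block in data_blocks:
        out.write(block)
    bloom_offset = out.tell()
    out.write(bloom)
    index_offset = out.tell()
    out.write(index)
    header = json.dumps({
        "bloom_offset": bloom_offset, "bloom_length": len(bloom),
        "index_offset": index_offset, "index_length": len(index),
    }).encode()
    out.write(header)
    out.write(struct.pack("<I", len(header)))    # header length field


def open_file(f):
    f.seek(-4, io.SEEK_END)                      # S31: read the header length
    (header_len,) = struct.unpack("<I", f.read(4))
    f.seek(-4 - header_len, io.SEEK_END)         # S32: read the header area
    header = json.loads(f.read(header_len))
    f.seek(header["index_offset"])               # S33: read the index block
    index = f.read(header["index_length"])
    f.seek(header["bloom_offset"])               # S34: read the Bloom filter
    bloom = f.read(header["bloom_length"])
    return header, index, bloom


# Usage with an in-memory file.
buf = io.BytesIO()
write_file(buf, [b"BLOCK-A", b"BLOCK-B"], b"\x05\x00", b"INDEX")
header, index, bloom = open_file(buf)
```

  • Because the header records the offset and length of each part, opening the file takes a fixed number of seeks regardless of how many data blocks it holds.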
  • A record can be found in a file by key (e.g., a key received from the user to be searched for). Since the records in the file are stored sorted by key, the key-based search can use binary search to locate the record.
  • A first layer of filtering may be performed with the Bloom filter: the position of the key in the Bloom filter bitmap is calculated according to the number of hashes in the file header, and the value at that position in the bitmap determines whether the key exists in the file. For example, if the value is not 1, the key does not exist in the file; if it is 1, the key may exist in the file, and the record of the key is then searched for in the file according to the end key of each data block stored in the index block.
  • Binary search can be used to find, in the index block, the data block in which the target key may be stored.
  • the position and length of the key offset area in the index block can be read from the index block header. Gets the key offset at a selected position in the key offset area (eg, a key offset in the middle of the key offset area). Get the key corresponding to the key offset, the reference offset of the key, and restore the key.
  • the key stored in the index block is the last key in each data block, and the value is the position of each data block in the file.
  • the recovered key is compared with the target key. If they are equal, the data block corresponding to the key is directly read, and then the last key stored in the data block and its corresponding value are read, that is, the record to be found.
  • The above search process determines the range of key-offset positions still to be searched based on whether the restored key is greater than the target key. For example, with keys sorted in ascending order, if the recovered key is larger than the target key, the key before the recovered key is obtained and compared with the target key; if it is equal to the target key, the record of that previous key is the record to be found.
  • If the recovered key is smaller than the target key, the search for the target key continues in the same way between the position of the next key and the end of the key offset area.
  • When the target key is determined to lie between two adjacent keys in the index block, it is determined that the target key may be stored in the data block corresponding to the latter key; that data block may be read, and the target key is then searched for in it.
  • When the data block that may store the target key has been determined, the data, that is, the position of the data block in the file, may be acquired according to the data offset corresponding to the data block in the index block. The data block is then read from the corresponding location of the file based on that position and the block size in the file header.
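  • Locating the candidate data block through the index can be sketched with the standard `bisect` module. The sketch assumes the end keys have already been restored to their complete form and are sorted in ascending order; `find_block` and the entry layout are hypothetical.

```python
import bisect


def find_block(index_entries, target_key):
    """Return the file position of the block that may hold target_key.

    index_entries: list of (end_key, position) sorted by end_key, where
    end_key is the last key stored in the block.
    """
    end_keys = [end_key for end_key, _ in index_entries]
    # First block whose end key is >= target_key: the target is either
    # that end key itself or lies between it and the previous end key.
    i = bisect.bisect_left(end_keys, target_key)
    if i == len(index_entries):
        return None                  # larger than every key in the file
    return index_entries[i][1]


# Usage: three blocks described by (end key, (offset, length)).
entries = [(b"c", (0, 100)), (b"f", (100, 80)), (b"k", (180, 120))]
```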
  • The storage sequence number of a key offset in the key offset area is the same as the storage sequence numbers of its corresponding reference offset and data offset in the reference offset area and the data offset area. There are many ways to obtain the corresponding reference offset and data offset based on the key offset.
  • For example, the sequence number of the key offset in the key offset area can be calculated from the position of the key offset in that area; the reference offset corresponding to the key can then be obtained from the reference offset area based on the length of a reference offset and that sequence number, and the position of the reference key of the key is obtained from the reference offset.
  • the data offset corresponding to the key and the data acquisition method are the same.
  • the serial number of each key can be stored in each key offset, reference offset, and data offset.
  • The sequence number of the record in the data block (or of the data block's information in the index block) can then be obtained, and the reference offset corresponding to that sequence number is obtained in the reference offset area.
  • the method of obtaining the data offset corresponding to the key is the same as above.
  • the method of finding the target key in the read data block is similar to the above method of finding the target key in the index block.
  • Binary search can be used; that is, a position is selected from the key storage area (for example, the position in the middle of the area), the key stored at that position is obtained and compared with the target key, and the search range is then narrowed further.
  • the start position and length of the key offset area are read from the head of the data block.
  • Select a position from the key offset area such as the middle position, to obtain the key offset of the position, and obtain the key according to the key offset.
  • the recovered key is compared with the target key.
  • If the restored key is equal to the target key, the record corresponding to the key is the record to be found; if the restored key is larger than the target key, a new position is selected between the start position of the key offset area and the selected position, and the above search process is repeated; if the restored key is smaller than the target key, a new position is selected between the selected position and the end position of the key offset area, and the above search process is repeated.
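  • The search inside a data block can be sketched as an explicit binary search over record positions. The callback `restore_key` is a hypothetical stand-in for whichever key-restoration method the block type requires; keys are assumed to be sorted in ascending order.

```python
def find_record(num_records, restore_key, target_key):
    """Return the sequence number of target_key in the block, or None.

    restore_key(i) must return the complete key of record i.
    """
    lo, hi = 0, num_records - 1
    while lo <= hi:
        mid = (lo + hi) // 2          # selected position, e.g. the middle
        key = restore_key(mid)
        if key == target_key:
            return mid                # the record to be found
        if key > target_key:
            hi = mid - 1              # continue between start and mid
        else:
            lo = mid + 1              # continue between mid and end
    return None


# Usage: a plain list stands in for the restored keys of one block.
keys = [b"a1", b"a3", b"a5", b"b2", b"b9"]
position = find_record(len(keys), keys.__getitem__, b"b2")
```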
  • the method of restoring a key from a storage location (key offset) in a data block is related to the storage structure of the two data blocks described above.
  • in a data block storing fixed-length keys, the position and length of the common prefix can be read from the data block header and the common prefix read from the data block; the position of the key-remainder area is read from the data block header, the storage position of this key's remainder is determined from the key offset, the remainder is read, and the common prefix is prepended to the remainder to restore the key.
  • whether a key is a variable length key can be determined from the value of the key-value type (KV type) in the file header.
  • in a data block storing variable-length keys, the position and length of each area can be read from the data block header; the key entry and the reference offset are read from the data block, the reference key is then read, the same-prefix length and the differing part are read from the key entry, a string of the same-prefix length is cut from the front of the reference key, and the differing part is appended to it to restore the key.
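Restoring a variable-length key can be sketched in a few lines. This is an illustrative in-memory model only: each entry is assumed to be a tuple `(ref_index, shared_len, remainder)`, and a reference (base) key is marked by pointing at its own index, which is a simplification of the byte-offset scheme described in the text.

```python
from typing import List, Tuple

# (index of the reference key, length of the shared prefix, differing part)
Entry = Tuple[int, int, bytes]


def restore_key(entries: List[Entry], i: int) -> bytes:
    """Rebuild the full key stored at position i.

    Assumption for this sketch: a reference key refers to itself and
    keeps its full text in the remainder field.
    """
    ref_index, shared_len, remainder = entries[i]
    if ref_index == i:
        return remainder                       # reference key, stored in full
    ref_key = entries[ref_index][2]
    return ref_key[:shared_len] + remainder    # cut the prefix, append the rest


entries = [
    (0, 0, b"user:1001:name"),   # reference key
    (0, 8, b"2:name"),           # shares "user:100" with entry 0
    (2, 0, b"video:77"),         # new reference key
    (2, 6, b"80"),               # shares "video:" with entry 2
]
print(restore_key(entries, 1))
```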
  • in a data block storing variable length keys, the method of obtaining the reference offset from the key offset of a key is the same as the method of obtaining the reference offset from the key offset of a key (the end key of a data block) in the index block, and will not be described again.
  • the data offset corresponding to the key can be obtained, the position of the data part is then obtained from the head of the data block, and the value corresponding to the key is read according to the data offset; the record (key + value) can then be restored from the key and the value.
  • the method of obtaining the corresponding data offset according to the key offset is the same as the method of obtaining the data offset corresponding to the key according to the key offset of a key (the end key of the data block) in the index block, and will not be described again.
  • when a record is read, a read buffer may be set; when it is determined that the read length of the read operation is smaller than the buffer and the current data block is not the last data block, the start address of the next data block is fetched, the length of the next data block is recorded, and reading continues until the read length is greater than the length of the buffer.
  • Other examples can also use other caching methods to read files.
  • according to the user's request, the position of the user key (that is, the target key input by the user) is looked up, the block in which the key is located is loaded, and the information required by the user is read.
  • prefetch read reads multiple blocks at one time.
  • delayed read aggregates multiple reads: after multiple read requests have been received, multiple blocks are read at one time.
  • the start address of the start block is obtained. If the current block is the last block of the SSTable file, the length of the last block is obtained and recorded, and the process returns; if it is not the last block of the SSTable file, the start address and length of the next block are obtained.
  • if the read length is less than the length of the read buffer and the current block is not the last block, the start address of the next block is taken and the length of the next block is recorded, until the read length is greater than the length of the read buffer.
  • for delayed read, the block information of each read is aggregated and the start address of the start block is then obtained; the read procedure is similar to that of prefetch read.
  • prefetch read and delayed read both essentially read multiple blocks at one time, which minimizes seek and rotation when reading a disk and speeds up disk reads.
  • FIG. 13 is a schematic flowchart of reading a record according to an embodiment of the present invention. As shown in FIG. 13, the record reading process may include the following steps.
  • Step S41: Acquire the start address of the starting data block.
  • Step S42: Determine whether the current data block is the last block; if yes, perform step S43 and subsequent steps; otherwise, perform step S44 and subsequent steps.
  • Step S43: Obtain and record the length of the last data block, then return and exit the process.
  • Step S44: Acquire the start address of the next data block and the length of the current data block.
  • Step S45: Determine whether a prefetch operation is to be performed; if not, exit the process; if so, perform step S46 and subsequent steps.
  • Step S46: Determine whether the read length is less than the maximum read size (KMaxReadSize) and the current block is not the last data block; if yes, perform step S47 and subsequent steps; otherwise, perform step S48 and subsequent steps.
  • Step S47: Obtain the start address of the next data block and record the length of the next block.
  • Step S49: Obtain and record the length of the last block.
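The prefetch loop in the steps above can be sketched as a small planning function. It is only an illustration under stated assumptions: blocks are laid out contiguously and described by `(start_address, length)` pairs, and the names `plan_prefetch` and `max_read_size` (standing in for KMaxReadSize) are hypothetical.

```python
from typing import List, Tuple


def plan_prefetch(blocks: List[Tuple[int, int]],
                  start: int,
                  max_read_size: int) -> Tuple[int, int]:
    """Return (start_address, total_length) for one aggregated read.

    Following blocks are added while the accumulated read length is
    below max_read_size and the current block is not the last one.
    """
    start_addr, length = blocks[start]
    i = start
    while length < max_read_size and i + 1 < len(blocks):
        i += 1
        length += blocks[i][1]    # record the length of the next block
    return start_addr, length


blocks = [(0, 100), (100, 150), (250, 80), (330, 60)]
print(plan_prefetch(blocks, 0, 200))
```

One large sequential read replaces several small ones, which is the disk seek saving the text attributes to prefetch and delayed reads.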
  • FIG. 14 is a structural diagram of a data storage device according to an embodiment of the present invention. As shown in Fig. 14, the apparatus includes a fixed length key storage unit 1401 and a variable length key storage unit 1402.
  • the fixed length key storage unit 1401 is configured to store the fixed length key and its value in the first data block, wherein storing the fixed length key comprises: storing the common prefix of the fixed length keys once, and separately storing the remainder of each fixed length key after the common prefix is removed;
  • the variable length key storage unit 1402 is configured to store the variable length key and its value in the second data block, wherein storing the variable length key comprises: storing variable length keys of the reference key type in full, and performing prefix compression on variable length keys of the prefix compression key type.
  • FIG. 15 is a structural diagram of a data storage device according to an embodiment of the present invention. As shown in FIG. 15, the device includes a fixed length key storage unit 1501 and a variable length key storage unit 1502, whose functions are similar to those of the fixed length key storage unit 1401 and the variable length key storage unit 1402 shown in FIG. 14.
  • the apparatus can also include a key type distinguishing unit 1503.
  • the key type distinguishing unit 1503 is configured to divide variable length keys into a reference key type and a prefix compression key type according to a preset threshold length and threshold difference, wherein: the current variable length key is prefix-compared with the previous reference key; if the same prefix string is shorter than the threshold length, the current variable length key is determined to be of the reference key type; if the same prefix string is longer than the sum of the threshold length and the threshold difference, the current variable length key is determined to be of the reference key type; and if the same prefix string is longer than the threshold length and shorter than the sum of the threshold length and the threshold difference, the current variable length key is determined to be of the prefix compression key type.
  • the variable length key storage unit 1502 is configured, for a variable length key of the prefix compression key type, to store the length of the common prefix shared by that key and the previous reference key, and to store the remainder of that key after the common prefix is removed.
  • the apparatus can also include an inter-block compression unit 1504.
  • the inter-block compression unit 1504 is configured to compress the first data block when it is determined that storing the fixed length key and its value into the first data block fails, and to compress the second data block when it is determined that storing the variable length key and its value into the second data block fails.
  • the apparatus can also include a storage buffer unit 1505.
  • the storage buffer unit 1505 is configured to allocate a storage buffer, and when the size of the first data block after compression is smaller than the storage buffer, write the compressed first data block into the storage buffer; When the size of the second data block after compression is smaller than the storage buffer, the compressed second data block is written into the storage buffer.
  • the apparatus can also include a Bloom filter 1506.
  • the Bloom filter 1506 is configured to record the Bloom filter information of the fixed length key and its value when it is determined that the fixed length key and its value are successfully stored in the first data block; and, when it is determined that the variable length key and its value are successfully stored in the second data block, to record the Bloom filter information of that key and its value.
  • the apparatus can also include a read buffer unit 1507.
  • the read buffer unit 1507 is configured to set a read buffer and, when it is determined that the read length of the read operation is smaller than the buffer and the current data block is not the last data block, to fetch the start address of the next data block, record the length of the next data block, and continue reading until the read length is greater than the length of the buffer.
  • the apparatus can also include a data block index storage unit 1508.
  • the data block index storage unit 1508 is configured to store the full key of the last cell of the first data block and of the second data block, and to store the offsets of the first data block and the second data block in the data storage file and the length of the current row key.
  • the apparatus can also include a key type distinguishing unit.
  • the key type distinguishing unit is configured to compare the current variable length key with the current reference key; if the same prefix string is shorter than the threshold length, the current variable length key is determined to be of the reference key type; if the same prefix string is longer than or equal to the threshold length, the current variable length key is determined to be of the prefix compression key type.
  • the key type distinguishing unit is configured to compare the current variable length key with the previous key stored in the second data block; if the same prefix string is longer than or equal to the sum of the threshold length and a threshold difference, the current variable length key is determined to be of the reference key type; if the same prefix string is shorter than the sum of the threshold length and the threshold difference, the current variable length key is determined to be of the prefix compression key type.
  • the key type distinguishing unit is configured to obtain a first length, being the length of the same prefix string of the current variable length key and the previous key stored in the second data block, and a second length, being the length of the same prefix string of the current variable length key and the current reference key; if the first length is greater than or equal to the sum of the second length and the threshold difference, the current variable length key is determined to be of the reference key type; if the first length is smaller than the sum of the second length and the threshold difference, the current variable length key is determined to be of the prefix compression key type.
  • the variable length key storage unit is configured, for a variable length key of the prefix compression key type, to store the length of the common prefix shared by that key and the current reference key, and to store the remainder of that key after the common prefix is removed.
  • the apparatus can also include a data block index storage unit.
  • the data block index storage unit is configured to store the location information of the first data block in the file into the index block when the first data block is stored in the file; to store the location information of the second data block in the file into the index block when the second data block is stored in the file; and to store the index block into the file.
  • the device can also include a Bloom filter.
  • the Bloom filter is used to record the Bloom filter information of the fixed length key and its value when it is determined that the fixed length key and its value are successfully stored in the first data block; when it is determined that the variable length key and its value are successfully stored in the second data block, the Bloom filter information of that key and its value is recorded in the Bloom filter; the Bloom filter is stored in the file.
  • the apparatus can also include an inter-block compression unit.
  • the inter-block compression unit is configured to compress the first data block or the second data block, and store the compressed first data block or the second data block into a file.
  • the apparatus can also include a value compression unit.
  • the value compression unit is configured so that storing the fixed length key and its value to the first data block includes: compressing the value corresponding to the fixed length key, and providing the compressed value to the fixed length key storage unit for storage in the first data block; and so that storing the variable length key and its value to the second data block includes: compressing the value, and providing the compressed value to the variable length key storage unit for storage in the second data block.
  • the fixed length key storage unit is configured to store the fixed length key and its value into the first data block according to an order sorted according to the fixed length key in advance;
  • the data block index storage unit is configured to store a last fixed length key stored in the first data block and a starting position and length of the first data block in the file into the index block;
  • variable length key storage unit is configured to store the variable length key and its value into the second data block according to an order sorted according to the variable length key in advance;
  • the data block index storage unit is configured to store a last variable length key stored in the second data block and a starting position and length of the second data block in the file into the index block.
  • the device shown in Figure 14 can be integrated into the hardware entities of various communication networks.
  • the key sorting-based data storage device proposed by the embodiments of the present invention can be embodied in various forms.
  • a standard-formatted application interface can be used to write a key-based data storage device as a plug-in in a storage server, or it can be packaged as an application for users to download and use.
  • when written as a plug-in, it can be implemented in a variety of plug-in forms such as ocx, dll, and cab.
  • the key sorting-based data storage device proposed by the embodiment of the present invention may also be implemented by a specific technology such as a Flash plug-in, a RealPlayer plug-in, an MMS plug-in, a MIDI staff plug-in, or an ActiveX plug-in.
  • the key sorting-based data storage method proposed by the embodiment of the present invention can be stored on various storage media in the form of instructions or instruction sets.
  • These storage media include, but are not limited to, floppy disks, optical disks, DVDs, hard disks, flash memories, USB flash drives, CF cards, SD cards, MMC cards, SM cards, Memory Sticks, xD cards, and the like.
  • the key sorting-based data storage method proposed by the embodiment of the present invention may be applied to a NAND flash-based storage medium, such as a USB flash drive, a CF card, an SD card, an SDHC card, an MMC card, an SM card, a Memory Stick, or an xD card.
  • the fixed length key and its value are stored in the first data block, wherein storing the fixed length key comprises: storing the common prefix of the fixed length keys once, and separately storing the remainder of each fixed length key after the common prefix is removed; the variable length key and its value are stored in the second data block, wherein storing the variable length key comprises: storing variable length keys of the reference key type in full, and performing prefix compression on variable length keys of the prefix compression key type. It can be seen that, with the embodiment of the present invention, using prefix compression within the variable length key data block and choosing to compress each data block can effectively reduce the storage space of the data and improve the disk utilization of the machine.
  • the embodiment of the present invention uses the data block as the storage unit, which benefits the granularity of I/O and parsing.
  • the embodiment of the present invention can quickly locate the queried data according to the index block and the order within the data block, thereby improving query efficiency.
  • the hardware modules in the various embodiments may be implemented mechanically or electronically.
  • a hardware module can include specially designed permanent circuits or logic devices (such as dedicated processors such as FPGAs or ASICs) for performing specific operations.
  • the hardware modules may also include programmable logic devices or circuits (e.g., including general purpose processors or other programmable processors) that are temporarily configured by software for performing particular operations.
  • the specific use of mechanical means, or the use of dedicated permanent circuits, or the use of temporarily configured circuits (such as software configuration) to implement hardware modules can be determined based on cost and time considerations.
  • the present invention also provides a machine readable storage medium storing instructions for causing a machine to perform a method as described herein.
  • a system or apparatus may be provided with a storage medium on which software program code implementing the functions of any of the above embodiments is stored, and a computer (or CPU or MPU) of the system or apparatus may read and execute the program code stored in the storage medium.
  • some or all of the actual operations may be performed by an operating system or the like running on the computer based on the instructions of the program code. The program code read from the storage medium may also be written into a memory provided on an expansion board inserted into the computer or in an expansion unit connected to the computer, after which a CPU or the like mounted on the expansion board or expansion unit performs part or all of the actual operations based on the instructions of the program code, thereby realizing the functions of any of the above embodiments.
  • storage medium embodiments for providing program code include floppy disks, hard disks, magneto-optical disks, optical disks (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), magnetic tape, non-volatile memory cards, and ROM.
  • the program code can also be downloaded from a server computer via a communication network.

Abstract

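Embodiments of the present invention provide a data storage method and apparatus. Fixed-length keys and their values are stored in a first data block, where storing the fixed-length keys includes storing the common prefix of the fixed-length keys once and storing, for each fixed-length key, the remainder left after the common prefix is removed. Variable-length keys and their values are stored in a second data block, where storing the variable-length keys includes storing variable-length keys of the base key type in full and performing prefix compression on variable-length keys of the prefix-compressed key type.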

Description

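Data Storage Method and Apparatus
Related Application
This application claims priority to Chinese patent application No. 201210541207.0, entitled "Key-sorting-based data storage method and apparatus", filed with the Chinese Patent Office on December 14, 2012, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present invention relate to the field of information processing, and more particularly to a data storage method and apparatus.
Background
Key-value distributed storage systems offer fast queries, large data capacity, and high concurrency (for example, support for multiple concurrent queries). They are well suited to queries by primary key but cannot perform complex conditional queries. When supplemented by a real-time search engine for complex conditional retrieval and full-text retrieval, such a system can replace relational databases with lower concurrency performance, such as MySQL, achieving high concurrency and high performance while reducing the number of servers.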
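Summary
Embodiments of the present invention provide a data storage method that can improve the utilization of storage space.
Embodiments of the present invention also provide a data storage apparatus that can improve the utilization of storage space.
The specific solutions of the embodiments are as follows.
A data storage method includes:
storing fixed-length keys and their values in a first data block, where storing the fixed-length keys includes storing the common prefix of the fixed-length keys once and storing, for each fixed-length key, the remainder left after the common prefix is removed; and
storing variable-length keys and their values in a second data block, where storing the variable-length keys includes storing variable-length keys of the base key type in full and performing prefix compression on variable-length keys of the prefix-compressed key type.
A data storage apparatus includes a fixed-length key storage unit and a variable-length key storage unit, where:
the fixed-length key storage unit is configured to store fixed-length keys and their values in a first data block, where storing the fixed-length keys includes storing the common prefix of the fixed-length keys once and storing, for each fixed-length key, the remainder left after the common prefix is removed; and
the variable-length key storage unit is configured to store variable-length keys and their values in a second data block, where storing the variable-length keys includes storing variable-length keys of the base key type in full and performing prefix compression on variable-length keys of the prefix-compressed key type.
As can be seen from the above technical solution, using prefix compression within data blocks of variable-length keys can effectively reduce the storage space of the data and improve the disk utilization of the machine.
Brief Description of the Drawings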
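The following drawings are only some examples of the technical solution of the present invention, and the invention is not limited to the features shown in them. In the drawings, like reference numerals denote like elements:
FIG. 1 is a schematic structural diagram of a computing device according to an embodiment;
FIG. 2 is a schematic diagram of the file format according to an embodiment of the present invention;
FIG. 3 is a flowchart of the data storage method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the storage structure of a data block for storing records with fixed-length keys according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the process of writing fixed-length keys into a data block according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the storage structure of a data block for storing records with variable-length keys according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the process of writing variable-length keys into a data block according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of the storage structure of the Bloom filter according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of the process of writing block index information into the index block according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of the storage structure of the file header according to an embodiment of the present invention;
FIG. 11 is a schematic flowchart of writing records into a file according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of the file reading method according to an embodiment of the present invention;
FIG. 13 is a schematic flowchart of reading records according to an embodiment of the present invention;
FIG. 14 is a structural diagram of a data storage apparatus according to an embodiment of the present invention;
FIG. 15 is a structural diagram of a data storage apparatus according to an embodiment of the present invention.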
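Detailed Description
For brevity and clarity, the solution of the present invention is set out below by describing several representative embodiments. The many details in the embodiments serve only to aid understanding. It is evident, however, that the technical solution can be implemented without being limited to these details. To avoid unnecessarily obscuring the solution, some embodiments are not described in detail and only an outline is given. Hereinafter, "including" means "including but not limited to", and "according to ..." means "at least according to ..., but not limited to only according to ...". In keeping with Chinese language conventions, where the number of a component is not specified below, the component may be one or more, or may be understood as at least one.
FIG. 1 is a schematic structural diagram of a computing device according to an embodiment. As shown in FIG. 1, the computer 100 may be a computing device capable of implementing the methods and software systems provided by the examples of the present invention. For example, the computer 100 may be a personal computer or a portable device such as a notebook computer, tablet computer, mobile phone, or smartphone. The computer 100 may also be a server connected to such devices over a network.
The computer 100 may have different capabilities and features, and all possible implementations fall within the scope of protection of this document. For example, the computer 100 may include a keypad/keyboard 156 and a display 154, such as a liquid crystal display (LCD) or a display with advanced functions such as a touch-sensitive 2D or 3D display. In one example, a web-enabled computer 100 may include one or more physical or virtual keyboards and a mass storage device 130.
The computer 100 may also include or run various operating systems 141, such as Windows™ or Linux™, or a mobile operating system such as iOS™, Android™, or Windows Mobile™. The computer 100 may include or run various application programs 142, such as a data storage application 145. The data storage application 145 can store ordered records in the non-volatile storage device 130 in the file format of the embodiments of the present invention.
In addition, the computer 100 may include one or more processor-readable non-volatile storage media 130 and one or more processors 122 in communication with the storage media 130. For example, the processor-readable non-volatile storage medium 130 may be RAM, flash memory, ROM, EPROM, EEPROM, registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of non-volatile storage medium. The storage medium 130 may store a series of instructions, or units and/or modules containing instructions, for carrying out the operations of the embodiments of the present invention. The processor may execute these instructions to carry out those operations.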
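Embodiments of the present invention propose a file format for data persistence that is based on key ordering, supports fixed-length keys, variable-length keys, and values, and allows keys to be prefix-compressed. In addition, the embodiments use the data block (block) as the storage unit, which benefits the granularity of I/O and parsing.
Preferably, prefix compression is used within a data block and each data block is itself compressed, which effectively reduces the storage space of the data and improves disk utilization. When data is read, the queried data can be located quickly by means of the index block and the ordering within each data block.
The key-sorted data storage method of the embodiments may include the following steps.
Fixed-length keys and their values are stored in a first data block, where storing the fixed-length keys includes storing the common prefix of the fixed-length keys once and storing, for each fixed-length key, the remainder left after the common prefix is removed.
Variable-length keys and their values are stored in a second data block, where storing the variable-length keys includes storing variable-length keys of the base key type in full and performing prefix compression on variable-length keys of the prefix-compressed key type.
The embodiments place no restriction on the order in which the above steps are performed.
Here, the first data block is dedicated to storing fixed-length keys and their values, and the second data block is dedicated to storing variable-length keys and their values.
In one example, variable-length keys may be divided into a base key type and a prefix-compressed key type according to a preset threshold length and threshold difference, where:
the current variable-length key is prefix-compared with the previous base key, and if the common prefix string is shorter than the threshold length, the current variable-length key is determined to be of the base key type;
the current variable-length key is prefix-compared with the previous base key, and if the common prefix string is longer than the sum of the threshold length and the threshold difference, the current variable-length key is determined to be of the base key type;
the current variable-length key is prefix-compared with the previous base key, and if the common prefix string is longer than the threshold length and shorter than the sum of the threshold length and the threshold difference, the current variable-length key is determined to be of the prefix-compressed key type.
In one example, performing prefix compression on a variable-length key of the prefix-compressed key type includes:
storing the length of the common prefix of that key and the previous base key, and storing the remainder of that key after the common prefix is removed.
In one embodiment, when it is determined that storing a fixed-length key and its value into the first data block fails, the first data block is compressed and a write buffer is allocated; when the size of the compressed first data block is smaller than the write buffer, the compressed first data block is written into the write buffer.
When it is determined that storing a variable-length key and its value into the second data block fails, the second data block is compressed and a write buffer is allocated; when the size of the compressed second data block is smaller than the write buffer, the compressed second data block is written into the write buffer.
In one example, the method may further include:
when it is determined that a fixed-length key and its value have been stored successfully in the first data block, writing the Bloom filter information of the fixed-length key and its value into the Bloom filter;
when it is determined that a variable-length key and its value have been stored successfully in the second data block, writing the Bloom filter information of that key and its value into the Bloom filter.
In one embodiment, a read buffer may be set up; when it is determined that the read length of a read operation is smaller than the buffer and the current data block is not the last data block, the start address of the next data block is fetched, the length of the next data block is recorded, and reading continues until the read length is greater than the length of the buffer.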
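A file format according to the embodiments consists of data blocks (data block 1, data block 2, through data block n), a Bloom filter, an index block, and a file header part (the file header and the file header length).
The data block part stores ordered records, each divided into a key and a value. Keys are divided into fixed-length keys and variable-length keys.
Keys of the same type and their values may be stored in the same data block; for example, data block 1 stores only fixed-length keys and their values, data block 2 stores only variable-length keys and their values, and so on. There may be multiple data blocks dedicated to keys of the same type and their values. In a data block storing fixed-length keys and their values, preferably only one copy of the common prefix of the fixed-length keys is stored, and for each key only the remainder left after removing the common prefix (the differing part) is stored.
In a data block storing variable-length keys and their values, base keys and prefix-compressed keys can be distinguished according to a preset threshold length and threshold difference (threshold diff). A base key is stored in full. For a prefix-compressed key, the length of the prefix it shares with the preceding base key is stored (as a variable-length integer), followed by the remainder of the prefix-compressed key.
In both kinds of data block, the value part (variable-length) may store the compressed result of all cells belonging to the same key.
Based on this file structure, each block is written to the file in order once it is full of records. If a block is memory-resident, a block is allocated and held in the SSTable writer; if it is not memory-resident, it is dumped directly to the SSTable. Memory-resident blocks are managed by a file writer (for example, an SSTable writer): if the SSTable writer goes away, the memory-resident blocks go away too, and the user must construct an SSTable reader and call the SSTable writer's Assign method to swap out the blocks that have already been dumped.
After all blocks have been written, the file's metadata (the Bloom filter, block index, header, and header length) is written, at which point the file writer has finished.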
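For variable-length keys, when a record is written (pushed) into a block of this file format, temporary areas must be allocated to hold the key offsets, base offsets, keys, data offsets, and data.
(1) When the first record is written, the offsets of the current key and data within the temporary key offset and data offset areas are set (0 for the first record), and it is determined whether the current key can serve as a base key. If so, the base offset is recorded (the current key serves as the base for compression and is not itself compressed), and the key and data are stored in the temporary areas.
(2) Each time a record is pushed, the current key is prefix-compared with the base key of the previous record (obtained via the previous record's base offset). When the common prefix string is longer than the threshold (threshold_len, configurable), prefix compression is applied: the base offset for this key stores the offset of the previous record's base key, and the length of the common prefix is obtained (represented as a var int); then, in the temporary key area, continuing from where the last write ended, the common prefix length (var int) and the differing part of the key are written; the data offset and the data are written to the temporary data offset and data areas.
(3) Each time a record is pushed, the current key is prefix-compared with the base key of the previous record. When the common prefix string is shorter than the threshold, no prefix compression is applied and the key is stored as a base key: the base offset is set to the offset of the current key (the current key serves as the base for compression and is not itself compressed), and the key is written in the temporary key area, continuing from where the last write ended; the data offset and the data are written to the temporary data offset and data areas.
(4) Each time a record is pushed, the sum of the space already used in the block storage area and the sizes of the temporary areas is computed. If the sum exceeds the block size, the block is full and the current record is not written. At this point, it is considered whether to write the temporarily allocated base offset, key, data offset, and data areas into the unused area of the block.
For fixed-length keys, each time a record is pushed, the longest common string (prefix) of the keys pushed so far is computed. When the block is full, the longest common string and its length are known. In memory, the block first stores the longest common string, then the remainder of each key, then the data, and finally the block header.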
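A read (get) operation on a data block of this file format mainly involves looking up a record in the block by the record's index, or looking up the index of a record in the block by key.
(1) Given a record index, the key offset and base offset are located. The base offset indicates whether the key of the current record was prefix-compressed. If not, the key is read directly. If so, the base key is located via the base offset, the length of the common prefix (var int) is read from the position where the current key is stored, and the key is restored.
(2) Given a record index, the data offset is located and the data field of the current record is read.
(3) Looking up a record index by key relies mainly on binary search for fast positioning.
The Bloom filter of the embodiments is a special, degenerate hash table: it does not handle collisions and does not store key values. The number of hash functions can be configured; from the number of hashes and the key of each record, the record's positions in the bitmap are computed and set.
When a record is read, the Bloom filter provides a first layer of filtering: from the number of hashes and the record's key, it is checked whether the corresponding positions in the bitmap are set to 1. If not, the key does not exist in the file; if so, the key may exist in the file, and it is then looked up in the file using the end keys in the file's block index.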
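The write-side and read-side use of the Bloom filter can be sketched as below. This is a minimal illustration, not the patent's implementation: the bitmap size, the class name `BloomFilter`, and the way the hash functions are derived (SHA-256 salted with the hash index) are all assumptions made for the example.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Bitmap plus a configurable number of hashes; keys are never stored."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes          # must stay below 256 in this sketch
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes) -> Iterator[int]:
        # Derive one bitmap position per hash by salting with the index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        """Called after a record has been written successfully."""
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def may_contain(self, key: bytes) -> bool:
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


bf = BloomFilter(num_bits=1024, num_hashes=3)
bf.add(b"row-0001")
print(bf.may_contain(b"row-0001"))
```

A `True` answer still requires the lookup through the block index, since collisions are not resolved.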
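The index block (Block Index) is stored in a way similar to a data block (variable-length key, fixed-length value). The key field is the cell key, holding the full key (row key + cfid + column) of the last cell of each data block; the value field holds the data block's offset in the file (offset length) and the length of the current row key (row key length).
The file header (Header) stores information about the file together with the offset and length of each part, which allows each part to be located quickly and avoids the waste of system resources caused by traversing from the start of the file.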
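The file header may contain:
(1) the KV type of the records written to the file;
(2) the compression method of each block;
(3) the key length for fixed-length keys, or 0 for variable-length keys;
(4) the data length for fixed-length data, or 0 for variable-length data;
(5) the threshold used when selecting base keys for variable-length keys: the length of the common prefix string with the previous base key;
(6) the threshold used when selecting base keys for variable-length keys: the difference between the length of the common prefix string with the previous key and the length of the common prefix string with the current base key;
(7) the file id;
(8) whether the file is memory-resident;
(9) the table number (table no);
(10) the LG id;
(11) the number of records in the sstable;
(12) the length of the sstable before compression;
(13) the length of the sstable after compression;
(14) the block size;
(15) the length of the index block after compression;
(16) the length of the index block before compression;
(17) the offset of the index block;
(18) the offset of the Bloom filter block;
(19) the length of the Bloom filter block;
(20) the number of hashes of the Bloom filter;
(21) the collision probability of the Bloom filter;
(22) the creation time of the sstable;
(23) the number of rows in the sstable;
(24) the number of cells in the sstable.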
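Based on the above description, the present invention proposes a method for writing record data.
In the embodiments, a record is first written into a block. If the write succeeds, the Bloom filter information of the current record is written into the Bloom filter structure. If the write fails, the current block is full: the block is compressed, and a suitable write buffer is allocated based on the unused space in the write buffer and the size of the block. If the current write buffer has enough space, the block is simply written into it; if not, the existing write buffer is written to disk. The write buffer acts as a cache layer that holds several written blocks so that multiple blocks can be written to disk in one operation.
Each time a block is written, the last key of that block is written into the Block Index structure.
Once all records have been written, the Bloom filter and Block Index are complete; they are written in turn, and their position information is recorded in the SSTable header.
Next, the file header (Header) is written to disk. Finally, the file header length (Header Length) is written to disk.
An embodiment of the present invention proposes a file format (hereinafter called the sstable format; a file in this format is called an sstable file). The format can be used to store data permanently, also known as data persistence. It can store records with fixed-length or variable-length keys and values. A record consists of a key and a value. Hereinafter, a key and the value belonging to the same record are referred to simply as the key and its value, or the key and its corresponding value; the key and value of the current record are referred to as the current key and the current value. The key is the keyword of the record and may be entered by a user or generated in some other way.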
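FIG. 2 is a schematic diagram of the file format according to an embodiment of the present invention. As shown in FIG. 2, the file format consists of data blocks (data block 1, data block 2, ... data block n) and metadata (meta data). The metadata includes the Bloom filter, the index block, and the file header part (the file header and the file header length).
A data block (block) stores records, which may be ordered or unordered. In one example, a data block stores records that have already been sorted by key.
The Bloom filter is a special, degenerate hash table: it does not handle collisions and does not store key values. It stores information on whether each record is in the file (also called Bloom filter information). When a record is written, its position in the bitmap is computed from the preset number of hashes and the record's key, and the value at that position is set to indicate that the key exists in the file. For example, a value other than 1 at the record's position means the record does not exist in the file; a value of 1 means the record may exist in the file. When a key is queried, its position in the bitmap is computed from the preset number of hashes and the key, and the value at that position determines whether the key exists in the file.
The index block stores the position information of each data block and the range of keys stored in each data block. In one example, when the stored records are sorted by key (for example, by the ASCII codes of the keys), the last key stored in each data block can be recorded in the index block as the block index key (end key). The block index keys then determine which data block holds a given record; that block is read from its recorded position, and the record is looked up within it.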
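Locating a data block through the end keys can be sketched with the standard `bisect` module. The index is modeled here as a list of `(end_key, offset, length)` tuples sorted by end key; that representation and the function name `locate_block` are assumptions for illustration.

```python
import bisect
from typing import List, Optional, Tuple

IndexEntry = Tuple[bytes, int, int]   # (end key, block offset, block length)


def locate_block(index: List[IndexEntry],
                 target: bytes) -> Optional[Tuple[int, int]]:
    """Return (offset, length) of the only block that can hold target.

    The first block whose end key is >= target is the candidate; if the
    target is greater than every end key, it is not in the file.
    """
    end_keys = [entry[0] for entry in index]
    i = bisect.bisect_left(end_keys, target)
    if i == len(index):
        return None
    _, offset, length = index[i]
    return offset, length


index = [(b"d", 0, 100), (b"m", 100, 120), (b"z", 220, 90)]
print(locate_block(index, b"e"))
```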
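The file header part (header and header length) stores information about the file and the offset and length of each part, which allows each part to be located quickly and avoids the waste of system resources caused by traversing from the start of the file.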
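Because the header and its length come last in the file, a reader can find every other part without scanning from the beginning. The sketch below illustrates the idea only; the 4-byte little-endian length field, the JSON-encoded header, and the field names are assumptions, since the text does not specify an encoding.

```python
import io
import json
import struct
from typing import Dict, List


def write_file(blocks: List[bytes], bloom: bytes, index: bytes) -> bytes:
    """Lay out: data blocks, Bloom filter, index block, header, header length."""
    out = io.BytesIO()
    for block in blocks:
        out.write(block)
    header: Dict[str, int] = {}
    header["bloom_offset"], header["bloom_length"] = out.tell(), len(bloom)
    out.write(bloom)
    header["index_offset"], header["index_length"] = out.tell(), len(index)
    out.write(index)
    header_bytes = json.dumps(header).encode("utf-8")   # hypothetical encoding
    out.write(header_bytes)
    out.write(struct.pack("<I", len(header_bytes)))     # header length is last
    return out.getvalue()


def read_header(data: bytes) -> Dict[str, int]:
    """Read the trailing length, then the header just before it."""
    (header_len,) = struct.unpack("<I", data[-4:])
    return json.loads(data[-4 - header_len:-4])


data = write_file([b"B1", b"B22"], b"FILTER", b"INDEX")
header = read_header(data)
start = header["index_offset"]
print(data[start:start + header["index_length"]])
```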
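Based on this file format, the sstable file generation process in one example includes: writing records into data blocks (also simply called blocks); if a write succeeds, writing the Bloom filter information of the current record into the Bloom filter structure; when a block is full, writing its block index information into the index block part; and, after all blocks have been written, writing the file's metadata (meta data), at which point the file is complete.
The embodiments use the data block (block) as the storage unit, which improves I/O throughput and parsing efficiency.
In one example, when records are written to an sstable file on disk, a write buffer can first be allocated in memory for the file. Several temporary storage areas are allocated in memory for a data block, and the items of each record are written into them; for example, one temporary area may be allocated for each part of the data block structure. If a write fails, the current data block is full, and whether the write buffer has enough space is determined from its unused space and the size of the current data block. If there is enough space, the data block is written into the write buffer; if not, the existing write buffer is written to disk and a write buffer is reallocated to hold the current data block and any subsequent ones. The write buffer acts as a cache layer that holds several written blocks and then writes them to disk in one operation, which reduces interaction with the disk and improves write efficiency. Other examples may use other caching mechanisms, for example writing each block to disk as soon as it is full; the invention is not limited in this respect.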
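The buffering behaviour described above can be sketched as follows. The class name `BufferedBlockWriter`, the fixed `capacity`, and the use of an in-memory `BytesIO` object in place of a disk file are assumptions made for the example.

```python
import io
from typing import BinaryIO


class BufferedBlockWriter:
    """Collects finished blocks and writes several of them out in one call."""

    def __init__(self, sink: BinaryIO, capacity: int) -> None:
        self.sink = sink
        self.capacity = capacity
        self.buffer = bytearray()
        self.flushes = 0

    def add_block(self, block: bytes) -> None:
        # Not enough unused space: write the existing buffer out first.
        if len(self.buffer) + len(block) > self.capacity:
            self.flush()
        # A block larger than capacity is still accepted in this sketch.
        self.buffer += block

    def flush(self) -> None:
        if self.buffer:
            self.sink.write(bytes(self.buffer))   # one write for many blocks
            self.buffer.clear()
            self.flushes += 1


sink = io.BytesIO()
writer = BufferedBlockWriter(sink, capacity=10)
for block in (b"aaaa", b"bbbb", b"cccc"):
    writer.add_block(block)
writer.flush()
print(sink.getvalue(), writer.flushes)
```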
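In one example, a block may be compressed before it is written to the file; writing the compressed block saves storage space.
The key of a record may be a fixed-length key or a variable-length key. A fixed-length key has a length equal to a preset value; a variable-length key has no fixed length. In one example, keys of the same type and their values are stored in the same data block; for example, data block 1 stores only fixed-length keys and their values, data block 2 stores only variable-length keys and their values, and so on. There may be multiple data blocks storing keys of the same type and their values.
FIG. 3 is a flowchart of the data storage method according to an embodiment of the present invention. As shown in FIG. 3, the method may include the following steps.
Step S11: store fixed-length keys and their values in a first data block, where storing the fixed-length keys includes storing the common prefix of the fixed-length keys and storing, for each fixed-length key, the remainder left after the common prefix is removed.
Step S12: store variable-length keys and their values in a second data block, where storing the variable-length keys includes storing variable-length keys of the base key type in full and performing prefix compression on variable-length keys of the prefix-compressed key type.
The embodiments place no restriction on the order in which steps S11 and S12 are performed.
Hereinafter, a data block dedicated to storing fixed-length keys and their values is called a first data block, and a data block dedicated to storing variable-length keys and their values is called a second data block. The size of each data block may be preset, either by the user or in some other way. All data blocks may be given the same size, or sizes may be set separately for first data blocks and second data blocks.
In one example, when fixed-length keys are stored, the first data block stores only one copy of the common prefix of the keys, plus the remainder of each key after the common prefix is removed (the differing part).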
图 4为本发明实施例用于存储具有定长键的记录的数据块的存储结构示意图。 如图 4所示, 该数据块由块头部 ( block header ) 401、 数据偏移 (data offset ) 402、 数据( data ) 403、键剩余部分( remainder key ) 404和键公共前缀 ( common prefix key ) 405这几个部分(也称为区域, field )组成。
块头部 401存储该数据块的信息, 可根据需要定义, 例如, 可以包括键公共前 缀 405存储的公共前缀的长度。
键公共前缀 405是指该数据块中存放的所有键的公共前缀部分。
数据偏移 402、 数据 403和键剩余部分 404分别包括多个存储单元( cell ), 每 个部分的每个存储单元对应一条记录。 数据块中的每个记录都对应剩余键部分 404 的一个存储单元、数据偏移 402的一个存储单元和数据 404的一个存储单元, 也即, 在数据块中得到一条记录对应的三个存储单元的信息, 再加上键公共前缀 405就能 还原出该 i己录 ( key+value )。
键剩余部分 404存放各个键去除公共前缀后的剩余部分。
数据偏移 402存放各个数据 (即各记录中的 value )在 block中的存储位置相对 一个起始位置的偏移量, 用于在 block中定位各 value。
数据 403存储各记录中的 value。
图 5为本发明实施例的将定长键写入数据块的过程示意图。 如图 5所示, 可以 在内存中为如图 4所述的第一数据块的块头部 401、 数据偏移 402、 数据 403、 键剩 余部分 404和键公共前缀 405分别分配临时存储空间 501、 502、 503、 504、 505。 将公共前缀写入键临时存储空间 505 , 并将公共前缀的长度写入临时存储空间 501。 写入一条记录时, 从该记录的键前缀中去除与公共前缀相同的部分后, 将键的 剩余部分写入该记录对应的临时存储空间 504中的存储单元。 将该记录的值写入该 记录对应的临时存储空间 503 中的存储单元, 并将该存储单元在临时存储空间 503 中的偏移量写入临时存储空间 502中该记录对应的存储单元。
每次写入一条记录时, 可以计算上述临时存储空间 501、 502、 503、 504、 505 的大小之和。 若该和大于等于预设的数据块大小 (size ), 则该数据块已经写满, 当 前记录未写入, 即当前记录存储失败。 将临时存储空间 502、 503、 504、 505按照数 据块的存储结构 (例如, 按照如图 4所示的存储结构)分别写入存储緩冲区 (write buffer )中, 并且在块头部的临时存储空间 501分别记录写入的各部分的信息, 例如 数据偏移 402、 数据 403、键剩余部分 404和键公共前缀 405在数据块中的位置(如 偏移量等)。 最后将临时存储空间 501写入存储緩冲区。 至此, 完成了第一数据块的 写入过程。 如果仍有尚未写入的带有定长键的记录, 可以重新分配临时存储空间并 重复上述数据块的写入过程, 已生成新的数据块。
其中, 定长键公共前缀的确定方法可以根据实际需要确定。 例如, 可以根据分 配的用于存储键剩余部分 404的临时存储空间的大小预测该数据块中能存储的记录 条数, 然后读取相应数目的记录的键, 得到这些键的公共前缀。
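As a rough illustration of the first data block layout, the sketch below computes the common prefix of a batch of fixed-length keys and splits each record into a key remainder, a data offset, and data. The function names and the dictionary used to stand in for the block regions are assumptions made for the example; the actual byte layout of Figure 4 is not reproduced.

```python
def common_prefix(keys):
    """Longest prefix shared by every key in the list."""
    if not keys:
        return b""
    # The prefix shared by the smallest and largest key is shared by all keys.
    lo, hi = min(keys), max(keys)
    n = 0
    while n < len(lo) and lo[n] == hi[n]:
        n += 1
    return lo[:n]


def encode_fixed_block(records):
    """records: list of (key, value) byte strings with fixed-length keys."""
    prefix = common_prefix([key for key, _ in records])
    remainders = [key[len(prefix):] for key, _ in records]
    data, data_offsets = bytearray(), []
    for _, value in records:
        data_offsets.append(len(data))  # offset of this value in the data region
        data += value
    return {
        "prefix": prefix,               # stored once per block
        "remainders": remainders,       # one cell per record
        "data_offsets": data_offsets,   # one cell per record
        "data": bytes(data),
    }


def restore_key(block, i):
    """Rebuild the full key of record i from the prefix and its remainder."""
    return block["prefix"] + block["remainders"][i]


block = encode_fixed_block([
    (b"user0001", b"a"),
    (b"user0002", b"bc"),
    (b"user0103", b"d"),
])
```

Here the prefix is derived from the keys actually placed in the block, which is one of several ways the text allows for determining it.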
一个例子中, 当判定将定长键及其值存储到第一数据块失败时, 确定该第一数 据块已满。 可以对所述第一数据块进行压缩以减小数据的存储空间。 当压缩后第一 数据块的大小小于存储緩冲区可用空间时, 将所述压缩后第一数据块写入所述存储 緩冲区。
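In one example, when variable-length keys are stored, they may be divided into base keys and prefix-compressed keys. A base key is stored in full; that is, the complete key is stored. For a prefix-compressed key, the length of the prefix that the key shares with its base key (hereinafter the shared prefix or shared prefix string) is stored, followed by the remainder of the key after the shared prefix is removed (hereinafter this step is called prefix compression, or variable-length-integer compression).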
图 6为本发明实施例的用于存储变长键的数据块的存储结构示意图。 如图 6所 示, 该数据块由块头部 ( block header ) 601、 数据偏移 (data offset ) 602、 基准偏移 ( base offset ) 603、 键偏移( key offset ) 604、 数据( data ) 605、 键( key ) 606组成。
块头部 601存储该数据块的信息, 可以根据需要定义。
数据偏移 602、基准偏移 603、 键偏移 604、 数据 605和键 606分别包括多个存 储单元(cell ), 每个部分的每个存储单元对应一条记录。 数据块中的每个记录都在 数据偏移 602、 基准偏移 603、 键偏移 604、 数据 605和键 606中具有对应的一个存 储单元, 也即, 在数据块部分找到一条记录对应的五个存储单元的信息就能还原出 该记录 ( key+value )。
数据偏移 602存储各个数据 (即各记录中的 value )在 block中的存储位置相对 一个起始位置的偏移量, 用于在 block中定位数据。
数据 605存储各个记录中的 value。
基准偏移 603存储各个键的基准键在该数据块中的存储位置相对一个起始位置 的偏移量(即基准键的键偏移), 用于在 block中定位各个键的基准键。 基准键的基 准偏移设为 0。
键偏移 604存储各个键在该数据块中的存储位置相对一个起始位置的偏移量, 用于在 block中定位各个键。
键 606存储各个键与其基准键的相同前缀的长度和去除该相同前缀后的剩余部 分。
一个例子中, 可以将当前数据块中写入的第一条记录的键作为基准键, 根据该 基准键对后续记录的键进行前缀压缩。
一个例子中, 对前缀压缩键类型的变长键执行前缀压缩包括: 存储该前缀压缩 键类型的变长键与其基准键的相同前缀的长度, 以及存储该前缀压缩键类型的变长 键去除该相同前缀后的剩余部分。
在数据块内釆用前缀压缩方式可以减小数据的存储空间, 并提高机器磁盘利用 率。
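Prefix compression of a single key against its base key might look like the sketch below. The text only states that the shared-prefix length is stored as a variable-length integer; the LEB128-style encoding used here and the function names are assumptions chosen for illustration.

```python
def encode_varint(n):
    """Encode a non-negative integer, 7 bits per byte, low bits first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def shared_prefix_len(a, b):
    """Number of leading bytes that a and b have in common."""
    n = 0
    limit = min(len(a), len(b))
    while n < limit and a[n] == b[n]:
        n += 1
    return n


def compress_key(key, base_key):
    """Store the shared-prefix length followed by the differing remainder."""
    shared = shared_prefix_len(key, base_key)
    return encode_varint(shared) + key[shared:]


cell = compress_key(b"row100:cf:colB", b"row100:cf:colA")
```

In this example the 14-byte key shrinks to a 2-byte cell, because 13 bytes are shared with the base key.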
一个数据块中也可以有一个或者多个基准键, 可以通过预设的方法将变长键划 分为基准键类型和前缀压缩键类型,即判定一个变长键作为基准键或者前缀压缩键。
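In one example, variable-length keys may be divided into the base-key type and the prefix-compressed-key type according to a preset threshold length (threshold_len). For example, the current variable-length key is compared by prefix with the current base key (that is, the base key of the previous key; when the current key is itself determined to be a base key, the base key of the previous key is also called the previous base key, to distinguish it from the current key). If the length of the shared prefix string is less than the threshold length, the current variable-length key is determined to be of the base-key type; if the length is greater than or equal to the threshold length, it is determined to be of the prefix-compressed-key type. The threshold length may be set as needed.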
一个例子中, 还可以根据预先设置的阈值差分将变长键划分为基准键类型和前 缀压缩键类型。 计算当前键与前一个键的相同前缀串的长度, 记为第一长度。 计算 当前键与当前基准键的公共前缀串的长度, 记为第二长度。 如果第一长度小于第二 长度与阈值差分之和, 则确定当前基准键为当前键的基准键, 则将当前基准键的键 偏移作为当前键的基准偏移; 若第一长度大于或等于第二长度与阈值差分之和, 则 确定当前键为基准键, 对当前键不进行压缩, 将当前键的基准偏移设为 0。 一个例 子中, 可以直接判断当前键与前一个键的相同前缀串的长度是否小于阈值长度与阈 值差分之和, 若该长度小于该和, 则该变长键作为前缀压缩键, 当前基准键为该变 长键的基准键; 若该长度大于或等于该和, 则该变长键作为基准键。
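The two classification rules described above (by threshold length, and by threshold differential) can be written as two small predicates. This is a sketch under the assumption that keys are byte strings; the function and parameter names are hypothetical.

```python
def shared_prefix_len(a, b):
    """Number of leading bytes that a and b have in common."""
    n = 0
    limit = min(len(a), len(b))
    while n < limit and a[n] == b[n]:
        n += 1
    return n


def is_base_by_threshold(key, current_base, threshold_len):
    """Threshold-length rule: too little overlap with the current base key
    means the key becomes a new base key."""
    return shared_prefix_len(key, current_base) < threshold_len


def is_base_by_diff(key, prev_key, current_base, threshold_diff):
    """Threshold-differential rule: the key becomes a new base key when it
    overlaps the previous key much more than it overlaps the current base."""
    first = shared_prefix_len(key, prev_key)       # "first length"
    second = shared_prefix_len(key, current_base)  # "second length"
    return first >= second + threshold_diff
```

For instance, with base key `b"aaa1"`, previous key `b"abc1"`, and a differential of 2, the key `b"abc2"` shares 3 bytes with the previous key but only 1 with the base key, so it would start a new base.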
图 7为本发明实施例的将变长键写入数据块的过程示意图。 如图 7所示的例子 中, 将记录写入图 6所示格式的数据块时, 可以在内存中为块头部 601、 数据偏移 602、基准偏移 603、键偏移 604、数据 605、键 606分别分配临时存储空间 701、 702、 703、 704、 705、 706。
向数据块写入第一条记录时, 以该第一条记录的键作为基准键, 由于基准键不 进行压缩, 因此将完整的键存放到用于存储键的临时存储空间 706中该记录对应的 存储单元(例如, 第一个记录可以对应各临时存储空间的第一个存储单元 )。 将该键 的基准偏移存放到用于存储基准偏移的临时存储空间 703 中该记录对应的存储单 元。 基准键的基准偏移可以设置为 0。 将该记录的数据(即 value )存放到用于存储 数据的临时存储空间 705 中该记录对应的存储单元。 将当前键在临时存储空间 706 的偏移量(相对该区域起始位置的相对位置)写入用于存储键偏移的临时存储空间 704中该记录对应的存储单元。 将当前记录的数据在临时存储空间 705 中的偏移量 (相对该区域起始位置的相对位置)写入用于存储数据偏移的临时存储空间 702中 该记录对应的存储单元。 第一个记录的键偏移量和其相应的数据偏移量可以均设为 0。
向数据块写入后续的每一条记录时, 计算当前记录的键是否可以成为基准键。 如果确定当前键为基准键, 则确定当前键的基准偏移为 0, 并将该完整的键(因为 基准键不进行压缩)和基准偏移存放到临时存储空间 706、 703中当前记录对应的存 储单元。 如果确定当前键为前缀压缩键, 将当前基准键的键偏移写入临时存储空间 703 中当前记录对应的存储单元, 对该键进行前缀压缩, 即将当前键与基准键的相 同前缀的长度(var int表示)和不同部分(即当前键除去相同前缀后剩余的部分) 存放到临时存储空间 706中当前记录对应的存储单元。 将当前记录的键偏移、 数据 偏移和数据分别存放到临时存储空间 704、 702和 705中当前记录对应的存储单元。 这样, 就完成了一个变长键的写入。
一个例子中, 确定当前基准键的方法可以包括: 获取前一条记录的基准偏移以 确定当前基准键。 一个例子中, 确定当前基准键的方法可以包括: 在写入记录的过 程中保存当前的基准键的信息, 例如当前基准键的键偏移等。 例如, 可以分配一个 临时存储空间用来暂存当前的基准键的信息。 这样就可以根据保存的当前基准键的 信息得到当前的基准键。 确定当前键作为基准键时, 可以将保存的当前基准键的信 息更新为当前键的信息, 如当前键的键偏移。 其它例子也可以利用其它方法确定当 前的基准键。
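In one example, each time a record is written, the sum of the sizes of the temporary storage spaces 701, 702, 703, 704, 705, and 706 may be calculated. If the sum is greater than or equal to the preset data block size, the data block is full and the current record is not written; that is, storing the current record fails. The temporary storage spaces 702, 703, 704, 705, and 706 are then written into the write buffer according to the storage structure of the data block (for example, the structure shown in Figure 6), and the temporary storage space 701 records information about each part written, such as the positions (for example, offsets) of the data offset 602, the base offset 603, the key offset 604, the data 605, and the key 606 within the data block. Finally, the temporary storage space 701 is written into the write buffer. This completes the writing of the second data block. If records with variable-length keys remain to be written, temporary storage spaces may be allocated again and the above block-writing process repeated to generate a new second data block.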
一个例子中, 每次写入一条记录时或者数据块写满时, 可以计算块存储区 (即 上述存储緩冲区) 已使用的空间大小和各临时分配空间区域的大小之和, 若该和大 于等于所述存储緩冲区的大小 (size ), 则该块存储区已经写满, 当前记录或数据块 未写入, 即当前记录或数据块存储失败; 若该和小于上述存储緩冲区的大小, 将存 储有当前记录的基准偏移、 键、 数据偏移、 数据的临时存储空间写入上述存储緩冲 区中未使用的区域。
一个例子中, 当判定将变长键及其值存储到第二数据块失败时, 可以压缩所述 第二数据块。 当压缩后第二数据块的大小小于所述存储緩冲区的未用空间时, 将所 述压缩后第二数据块写入所述存储緩冲区。 对每个数据块进行压缩可以有效减小数 据的存储空间。
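In one example, when variable-length values (values whose length is not fixed) are stored in a data block, the compressed result of all cells that correspond to the same key may be stored; that is, the compressed value is stored, which reduces the storage space the data occupies.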
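In one example, based on the file structure shown in Figure 2, once all blocks (for example, all blocks in one write buffer, or all blocks in several write buffers) are full of records, all blocks may be written sequentially into a file in the sstable format. For example, when the records have been sorted by key, the blocks may be written into the sstable file in the order of the keys they hold. If a data block is memory-resident, it is held in the file writer; if it is not memory-resident, it is written directly to the sstable file. Memory-resident data blocks are managed by the file writer, and the user must construct a file reader (for example, an sstable reader) to read them. When reading, a method of the file reader (for example, an Assign method, which may be implemented as a function) is called to swap out the data blocks that have already been written to the file. The file writer and the file reader are an interface provided to the user for managing memory-resident blocks.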
当所有块写完之后, 会写入文件的元数据, 此时文件写入完毕。 文件的元数据 包括布隆过滤器, 块索引、 文件头、 文件头长度的信息。
图 8为本发明实施例的布隆过滤器的存储结构示意图。 如图 8所示的例子中, 布隆过滤器为一个一维数组或矢量, 例如表示为 {vl, v2, ..., vn}。 其中的每个元素对 应文件中存放的一个记录的键的信息。
一个例子中, 当判定将定长键及其值存储到第一数据块成功时, 将所述定长键 及其值的布隆过滤器信息写入到布隆过滤器中; 当判定将变长键及其值存储到第二 数据块成功时, 将所述变长键及其值的布隆过滤器信息写入到布隆过滤器中。
键的布隆过滤器信息可以根据预设的计算方法得到, 例如可以预设对键进行 hash计算的次数等。 布隆过滤器信息表示该键在布隆过滤器中的位置, 该位置的值 则表示该键是否存在于该文件中。 一个例子中, 一个键的布隆过滤器信息为 1时, 表示该键可能存在于该文件中; 一个键的布隆过滤器信息不为 1 (例如为 0或者空 null )时, 表示该键不存在于该文件中。 搜索一个键时, 可以根据预设的计算方法得 到这个键的布隆过滤器信息, 从布隆过滤器中获取该布隆过滤器信息指示的位置的 值, 然后就可以根据获得的值判断该键是否有可能存在在该文件中了。
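A minimal bloom filter along the lines described above is sketched below. The text fixes only the idea of a configurable number of hash computations; the use of SHA-256 with a one-byte seed, the byte-per-position bitmap, and the class and method names are assumptions made for the example.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes     # the "number of hashes" parameter
        self.bits = bytearray(num_bits)  # one byte per position, for clarity

    def _positions(self, key):
        # Derive num_hashes positions by hashing the key with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def may_contain(self, key):
        # False means the key is definitely absent;
        # True means it may be present (false positives are possible).
        return all(self.bits[pos] for pos in self._positions(key))


bloom = BloomFilter(num_bits=1024, num_hashes=3)
bloom.add(b"row1")
```

A lookup that returns `False` lets the reader skip the file entirely, which is the first-layer filtering role the filter plays during a search.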
当所有的记录写完后, 布隆过滤器即写入完成。 在将所有数据块写入文件后, 将布隆过滤器写入文件。 一个例子中, 可以在内存中分配一块临时存储空间用于临 时存储布隆过滤器, 当所有记录存放完毕后, 将临时存储空间中的布隆过滤器写入 磁盘中的 sstable文件中。
图 9为本发明实施例的将块索引信息写入索引块的过程示意图。 如图 9所示, 索引块包括索引块头部 901、 数据偏移 902、 基准偏移 903、 块索引偏移 904、 数据 905、 块索引键 906。
索引块头部 901用于存储该索引块的信息, 包括索引块中各区域的起始位置、 长度等。
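Each data block in the file corresponds to one cell in each of the data offset 902, the data 905, the base offset 903, the block index offset 904, and the block index key 906 of the index block. The items of information stored for a data block in these five parts are enough to locate that data block in the file.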
索引块的存储方式与存放变长键的数据块(即第二数据块) 的存储方式相似, 也可以看做是存储一系列变长 key+value, 每个变长 key+value对应一个数据块。
索引块中存储的每个 key对应每个数据块的最后一个存储单元( cell )的全量键 ( end key ), 即完整的键, 例如, 可以包括行键( row key ) + 列族 ID ( cfid ) + 列 ( column )。 例如, 对于存放定长键的第一数据块, 该 key为该第一数据块中最后一 个键的完整形式, 即公共前缀 +键剩余部分; 对于存放变长键的第二数据块, 该 key 为该第二数据块中最后一个键的完整形式, 即最后一个键为基准键时, 该 key为键 606中最后一个键, 最后一个键为前缀压缩键时, 该 key为根据该最后一个键的基 准偏移得到的基准键、 键中存储的与基准键的相同前缀长度和不同部分恢复得到的 完整的前缀压缩键。 一个例子中, 可以在将记录写入数据块的时候记录最后一个写 入成功的键, 当数据块写满时, 将记录的最后一个写入成功的键写入索引块中。
索引块中存储的每个 value对应每个数据块在文件中的位置,例如偏移量( offset length )和当前 row key的长度 ( row key length )。 可以在将数据块写入文件时, 在 索引块中写入该数据块在文件中的位置。
与第二数据块的存储方法类似, 索引块中对应一个数据块的各部分中, 数据偏 移 902存放该数据块的 value (即数据块在文件中的位置)在索引块中的存储位置; 数据 905存储相应数据块在文件中的位置(偏移量); 块索引偏移 904存储相应数据 块的最后一个键 ( end key )在该索引块中的存储位置; 基准偏移 903存储该数据块 的 end key在该索引块中的基准 key (不是该 end key在数据块中的基准 key )在该 索引块中的存储位置; 块索引键 906存储该数据块的 end key与其在该索引块中的 基准键的公共前缀长度和该 end key除去公共前缀的剩余部分。
每次数据块写入完毕, 会将该数据块最后一条记录的键写入索引块结构。 每次 将数据块写入文件时, 会将该数据块在文件中的位置信息写入索引块结构。 将各数 据块相应的 end key和 value写入索引块的过程与将变长键及其相应的 value写入第 二数据块的过程相似, 这里不再赘述。
当所有的记录写完后, 索引块写入完成。 在将布隆过滤器写入文件后, 将索引 块写入文件。 一个例子中, 可以在内存中分配一块临时存储空间用于临时存储索引 块, 当所有数据块和布隆过滤器写入磁盘中的文件后, 将临时存储空间中的索引块 写入磁盘中的文件中。
一个例子中, 也可以在内存中暂存文件, 将文件各部分写入内存中的文件后, 将内存中的完整文件转存入磁盘中。 因此, 本发明的各实施例可以釆用不同的文件 生成方式, 本发明对此不作限定。
当所有的记录写完后, 依次向文件中写入各数据块、 布隆过滤器和索引块, 并 将布隆过滤器和索引块的位置信息记录到文件头 ( Header )。
图 10 为本发明实施例的文件头的存储结构示意图。 文件头存放文件相关信息 以及各个部分的偏移和长度, 有利于快速定位各个部分, 省去从文件开始进行遍历 导致的系统资源浪费。一个例子的文件头结构可以如图 10所示, 该文件头可以设置 以下部分。
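(1) Key-value type (KV type): the KV type of the records written to the file. In one example, the KV type covers the four combinations of two key types (variable-length key, fixed-length key) and two value types (variable-length value, fixed-length value).
(2) Compression type: the compression method used for each block.
(3) Fixed-length key length: for fixed-length keys, the length of the key; for keys that are not of fixed length, 0.
(4) Fixed-length value length: for fixed-length data (that is, values), the length of the data; for data that is not of fixed length, 0.
(5) Threshold length: the parameter used in an earlier example to determine whether a variable-length key becomes a base key; it is the threshold for the length of the prefix string shared with the current base key.
(6) Threshold differential: one of the parameters used in an earlier example to determine whether a variable-length key becomes a base key.
(7) File ID: the identifier of the file.
(8) In memory type: whether the file is memory-resident.
(9) Table No: used by calls from upper-layer applications.
(10) Lg ID: used by calls from upper-layer applications.
(11) Record count: the number of records stored in the sstable file.
(12) Length before compression: the length of the sstable file before compression.
(13) Length after compression: the length of the sstable file after compression.
(14) Data block size: the size of each data block, that is, the parameter used above to decide whether a data block is full; it may be set by the user.
(15) Length of the index block after compression.
(16) Length of the index block before compression.
(17) Offset of the index block.
(18) Offset of the bloom filter.
(19) Length of the bloom filter.
(20) Number of hashes of the bloom filter: one of the parameters used in an earlier example to compute the bloom filter information of a key.
(21) Collision probability of the bloom filter.
(22) sstable creation timestamp: the time at which the sstable file was created.
(23) Row count: the number of rows in the sstable file.
(24) Cell count: the number of cells in the sstable file.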
最后, 将文件头 ( Header )和文件头长度 ( Header Length )写入文件。 至此, 文件写入完毕。
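The write flow likewise writes multiple blocks at a time, which reduces the number of disk interactions and speeds up writing.
Figure 11 is a flowchart of writing records to a file according to an embodiment of the present invention. As shown in the record data write flow of Figure 11, the method may include the following steps.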
步骤 S201 : 将记录写入当前的数据块 (block ) (也简称为块) , 其中对于 定长键所对应的记录写入专门存储定长键及其值的数据块, 对于变长键所对应的 记录写入专门存储变长键及其值的数据块。存储定长键包括:在专门存储定长键及 其值的数据块中, 统一存储各个定长键的公共前缀, 并分别存储各个定长键去除公 共前缀后的剩余部分; 在专门存储变长键及其值的数据块中, 存储变长键及其值, 其中存储变长键包括: 全量存储基准键类型的变长键, 而对前缀压缩键类型的变长 键执行前缀压缩。
步骤 S202: 判断步骤 S201中的记录写入是否成功, 如果是则执行步骤 S211及 其后续步骤; 否则执行步骤 S203及其后续步骤。
步骤 S203: 判断当前数据块是否为空,如果是则返回参数错误, 并退出本流程, 如果不为空则执行步骤 S204及其后续步骤。
步骤 S204: 压缩当前的数据块。
步骤 S205: 判断当前数据块压缩是否成功, 如果不成功, 则返回压缩出错, 并 退出本流程, 如果成功, 则执行步骤 S206及其后续流程。
步骤 S206: 判断当前緩冲区 (write_buffer )是否为空且当前块压缩后是否大于 緩冲区的大小, 如果是则执行步骤 S208及其后续步骤, 否则执行步骤 S207及其后 续步骤。
步骤 S207: 判断剩余空间是否能写入当前的数据块, 如果是则执行步骤 S210 及其后续步骤, 否则执行步骤 S209及其后续步骤。
步骤 S208: 重新申请緩冲区空间, 并执行步骤 S210及其后续步骤。
步骤 S209: 启动 dump, 并结束本流程。
步骤 S210: 将当前数据块写入緩冲区, 保留索引, 緩存数据块, 并重置当前的 数据块。
步骤 S211 : 写入当前数据块的布隆过滤器信息。
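To read records stored in the file format of the present invention, the basic information is read in order: the header length (Header Length), the file header (header), the block index (Block index), and the bloom filter (BloomFilter).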
本发明实施例中, 存储的记录可以是预先排好顺序的, 例如按照记录的键的 ASCII码进行排序的。 这样, 当读取数据时, 可以依据索引块以及数据块内部数据 的有序性, 快速定位查询的数据。
下面的描述以文件中存储有预先根据键排好顺序的记录为例来说明从文件中 读取数据的方法。
一个例子中, 对于基于图 2所述文件格式所存储的记录, 可以依次读取文件头 长度( header length )、 文件格式头 (header ), 索引块、 布隆过滤器等基本信息。
图 12为本发明实施例的文件读取方法示意图。 如图 12所示, 该方法可以包括 以下步骤。
步骤 S31 , 读取文件的文件头长度 ( header length ) 区域(field ), 获取文件头 ( header ) 区域的长度。
步骤 S32, 根据文件头的长度读取文件头区域。
步骤 S33 , 根据文件头区域中的信息读取索引块(index block ) 区域。 例如, 可以根据文件头中的索引块的偏移量确定索引块在文件中的起始位置, 然后根据文 件头中的索引块压缩后的长度或者索引块压缩前的长度从上述起始位置开始读取出 索引块。
步骤 S34, 根据文件头区域中的信息读取布隆过滤器(bloom filter ) 区域。 例 如,可以根据文件头中的布隆过滤器的偏移量确定布隆过滤器在文件中的起始位置, 然后根据文件头中的布隆过滤器长度从上述起始位置开始读取出文件中的布隆过滤 器。
步骤 S35 , 通知上层打开 sstable文件的过程完成。
文件打开完成后, 就可以根据读取出的布隆过滤器和索引块的内容查找记录 了。
一个例子中, 可以依据键(例如, 收到用户输入的所要搜索的键)在文件中查 找记录。 由于记录在文件中的存放顺序是按照键排序的, 因此根据键进行的查找过 程可以依据二分法查找, 从而定位记录。
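In one example, when a key (hereinafter the target key) is searched for, the bloom filter may first be used as a first layer of filtering: the position in the bloom filter bitmap that corresponds to the key is computed according to the number of hashes in the file header, and the value at that position in the bitmap is then used to judge whether the key exists in the file. For example, if the value is not 1, the key does not exist in the file; if it is 1, the key may exist in the file, and the file is searched for the record with that key using the end key of each data block stored in the index block.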
一个例子中, 由于记录是根据键的顺序存放在各数据块中的, 则可以利用二分 法在索引块中查找可能存放有该目标键的数据块。
例如, 可以从索引块头部读取索引块中键偏移区域的位置和长度。 获取键偏移 区域中的选定的一个位置上的键偏移(例如键偏移区域中间的一个键偏移)。获取该 键偏移对应的键、 该键的基准偏移, 恢复该键。 如前所述, 索引块中存储的键为各 数据块中的最后一个键, 值为各数据块在文件中的位置。 将恢复出的键跟目标键比 较, 如果相等, 则直接读取这个键对应的数据块, 然后读取该数据块中存储的最后 一个键及其对应的值, 即为要找的记录。 如果恢复出的键不等于目标键, 则需要确 定目标键是否存在于恢复出的键所在的数据块或者恢复出的键所在的数据块的后一 个数据块中, 如果也不是, 则仍需要重复上述查找过程, 根据恢复的键是否大于目 标键确定即将搜索的键偏移的位置范围。 例如, 如果键是按照从小到大的顺序排序 的, 如果恢复出的键大于目标键, 则获取恢复出的键之前的一个键, 判断该前一个 键是否等于目标键, 如果等于目标键, 则获取该前一个键所在的记录即为要找的记 录, 如果仍大于目标键, 则在键偏移区域起始位置和该前一个键所在的位置之间按 照上述方法继续寻找目标键; 如果恢复出的键小于目标键, 则获取恢复出的键之后 的一个键, 判断该后一个键是否等于目标键, 如果等于目标键, 则获取该后一个键 所在的记录即为要找的记录, 如果仍小于目标键, 则在键偏移区域末尾位置和该后 一个键所在的位置之间按照上述方法继续寻找目标键。 当判断目标键的大小介于索 引块中两个相邻的键时, 则确定目标键可能存放在后一个键所在的数据块中, 可以 读取该数据块, 然后在该数据块中查找该目标键。
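The search over the index block described above amounts to finding the first data block whose end key is not smaller than the target key. The sketch below assumes the end keys have already been restored into a sorted Python list; in the embodiment they would be restored one at a time from key offsets and base offsets as the search proceeds. The function name is hypothetical.

```python
import bisect


def find_block(end_keys, target):
    """Return the index of the only block that can hold target, or None.

    end_keys holds the last key of each data block, in ascending order.
    """
    # bisect_left returns the first position whose end key is >= target.
    i = bisect.bisect_left(end_keys, target)
    return i if i < len(end_keys) else None


end_keys = [b"d", b"k", b"t"]
```

With these end keys, a target of `b"e"` falls between `b"d"` and `b"k"`, so it can only be in the block whose end key is `b"k"`; a target larger than every end key is not in the file.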
一个例子中, 当确定目标键可能存放的数据块时, 可以根据该数据块在索引块 中对应的数据偏移获取数据, 即该数据块在文件中的位置。 然后根据该位置和文件 头中的数据块大小从文件的相应位置读取出该数据块。
由于同一个数据块对应的各项信息在索引块中 (或者同一个记录对应的各项信 息在数据块中)具有相同的存放顺序, 键偏移在键偏移区域的存放序号也就是其相 应的基准偏移和数据偏移在基准偏移区域和数据偏移区域的存放序号。 根据键偏移 获取对应的基准偏移和数据偏移的方法有很多种。
例如, 当键偏移、 基准偏移、 数据偏移均为定长时, 可根据键偏移在键偏移区 域的位置计算该键偏移在键偏移区域的序号, 然后即可根据基准偏移的长度以及该 序号在基准偏移区域得到该键对应的基准偏移, 从而从基准偏移中取得该键的基准 键的位置。 该键对应的数据偏移以及数据的获取方法相同。
又例如, 可以在各键偏移、 基准偏移、 数据偏移中存储其在各自区域的序号, 获取一个键的键偏移后, 就能得到该记录在数据块中 (或者该数据块在索引块中) 的各项信息的序号, 然后在基准偏移区域获取该序号对应的基准偏移。 获取该键对 应的数据偏移的方法同上。
在读取出的数据块中寻找目标键的方法与上述在索引块中寻找目标键的方法 相似。 当记录是按照键的顺序存储时, 也可以釆用二分法, 即从键存储区域中选定 一个位置(例如区域中间的位置), 获取该位置存储的键, 与目标键进行比较, 然后 进一步缩小查找范围。 例如, 当记录是按照键从小到大的顺序存储在数据块中, 则 从数据块头部读取键偏移区域的起始位置和长度。 从键偏移区域中选择一个位置, 例如中间的位置, 获取该位置的键偏移, 根据该键偏移获取该键。 将恢复出的键与 目标键进行对比, 如果恢复出的键与目标键相等时, 该键对应的记录就是要找的记 录; 如果恢复出的键大于目标键, 则在键偏移区域的起始位置与上述选定位置选定 一个新的位置, 重复上述查找过程; 如果恢复出的键小于目标键, 则在键偏移区域 的末尾位置与上述选定位置选定一个新的位置, 重复上述查找过程。
从一个键在数据块中的存储位置(键偏移)恢复该键的方法与前面介绍两种数 据块的存储结构有关。
对于定长键(可从文件头中的键-值类型 (KV type ) 的值判断文件中存放的是 定长键),可以从数据块头部读取公共前缀的位置和长度,从数据块中读取公共前缀; 从数据块头部读取键剩余部分区域的位置, 根据键偏移确定该键对应的键剩余部分 的存放位置, 读取该键的键剩余部分, 将公共前缀加在键剩余部分前面就恢复出了 该键。
对于变长键(可从文件头中的键-值类型 (KV type ) 的值判断文件中存放的是 变长键),可以从数据块头部读取各个区域的位置和长度,从数据块中读取键和基准 偏移, 进而读取出基准键, 读取出键中的相同前缀长度和不同部分, 将基准键前缀 中截取相同前缀长度的串, 加上该键的不同部分, 就恢复出了这个键。 在存储变长 键的数据块中根据一个键的键偏移获取其相应的基准偏移的方法与上面在索引块中 根据一个键(数据块的 end key )的键偏移获取该键对应的基准偏移的方法相同, 不 再赘述。
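Restoring a prefix-compressed key from its stored cell and its base key can be sketched as follows. The sketch assumes the shared-prefix length was stored as a LEB128-style variable-length integer at the start of the cell and that the cell bytes have already been sliced out of the key region using the key offsets; both are assumptions, and the function names are hypothetical.

```python
def decode_varint(buf, pos=0):
    """Decode a 7-bits-per-byte integer; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7


def restore_key(cell, base_key):
    """Take the shared prefix from the base key and append the remainder."""
    shared, pos = decode_varint(cell)
    return base_key[:shared] + cell[pos:]


restored = restore_key(b"\x0dB", b"row100:cf:colA")
```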
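When a record is restored from a key offset, the data offset that corresponds to the key may be obtained, the position of the data part may be obtained from the data block header, and the value that corresponds to the key may be read according to the data offset; the record (key+value) is then restored from the key and the value. The method of obtaining the data offset that corresponds to a key offset is the same as the method, described above for the index block, of obtaining the data offset of a key (the end key of a data block) from its key offset, and is not repeated here.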
一个例子中, 读取记录时, 可以设置读取緩冲区, 并当判定该读取操作的读取 长度小于所述緩冲区且该当前数据块不是最后数据块时, 取出下一数据块的起始地 址并记录下一数据块的长度并继续读取,直到读取长度大于所述緩冲区的长度为止。 其它例子也可以釆用其它的緩存方法读取文件。
对于 Block的读取, 则是依据用户的请求寻找( seek )到用户 key (即用户输入 的目标键)所在的位置后, 加载 key所在的 Block块, 读取用户需要的信息。
数据块分为预取读和延迟读: 预取读是一次性的读多个 Block块; 延迟读是聚 集多次读, 即接收多个读请求后, 然后一次性的读多个 Block块。
根据预取读和聚集的多次读信息, 计算读取的起始 block块和 Read Buffer的大 小, 计算所需要读取的 block数目以及读取的长度, block块的起始位置和长度在 Block Index中获取。
首先获取起始块的开始地址, 若当前 Block块是 SSTable文件最后一块, 直接 获取最后一块的长度并记录后返回; 若不是 SSTable文件最后一块, 获取下一 Block 块的起始地址和长度。
若是预取读方式, 则读取长度小于 Read Buffer的长度, 且当前 Block块不是最 后一个 Block块, 取下一 Block块的起始地址, 并记录下一块的长度, 直到读取长 度大于 Read Buffer长度为止结束。
若是延迟读方式, 则聚集每次读取的 Block块信息, 然后获取起始块的开始地 址和读取方式与预取读方式类似。
预取读与延迟读本质都是一次性的读取多个 Block块, 尽量减少读取磁盘时的 寻道和转动, 加速磁盘读取。
图 13为本发明实施例的读取记录流程示意图。 如图 13所示的记录(record )数 据读取流程, 该方法可以包括以下步骤。
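Step S41: obtain the start address of the starting data block.
Step S42: determine whether the current data block is the last block; if so, perform step S43 and its subsequent steps; otherwise perform step S44 and its subsequent steps.
Step S43: obtain and record the length of the last data block, then return and exit this flow.
Step S44: obtain the start address of the next data block and the length of the current data block.
Step S45: determine whether to perform a prefetch; if not, exit this flow; if so, perform step S46 and its subsequent steps.
Step S46: determine whether the read length is less than the maximum read size (KMaxReadSize) and the block is not the last data block; if so, perform step S47 and its subsequent steps; otherwise perform step S48 and its subsequent steps.
Step S47: obtain the start address of the block after the next data block and record the length of the next data block.
Step S48: determine whether the read length is greater than or equal to the maximum read size (KMaxReadSize); if so, exit this flow; if not, perform step S49.
Step S49: obtain and record the length of the last data block.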
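The prefetch flow above can be condensed into a short planning function that decides how many consecutive blocks to read in one disk operation. This is a simplified sketch rather than a step-for-step transcription of the flowchart: it assumes the block index has already been turned into a list of `(offset, length)` pairs for blocks stored contiguously in file order, and the names `plan_read` and `max_read_size` are hypothetical.

```python
def plan_read(blocks, start, max_read_size):
    """Return (file offset, total length, block count) for one prefetch read.

    blocks: list of (offset, length) pairs taken from the block index.
    start:  index of the block that holds the requested key.
    """
    offset = blocks[start][0]
    length = 0
    i = start
    # Keep adding blocks until the read is large enough or the file ends.
    while i < len(blocks):
        length += blocks[i][1]
        i += 1
        if length >= max_read_size:
            break
    return offset, length, i - start


blocks = [(0, 100), (100, 150), (250, 50), (300, 400)]
```

Reading several adjacent blocks with one request is what reduces seeks and rotation when the data is on a spinning disk.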
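Based on the above detailed analysis, an embodiment of the present invention also provides a key-sorted data storage apparatus.
Figure 14 is a structural diagram of a data storage apparatus according to an embodiment of the present invention. As shown in Figure 14, the apparatus includes a fixed-length key storage unit 1401 and a variable-length key storage unit 1402.
The fixed-length key storage unit 1401 is configured to store fixed-length keys and their values in a first data block, where storing the fixed-length keys includes storing the common prefix of the fixed-length keys once and separately storing the remainder of each fixed-length key after the common prefix is removed;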
变长键存储单元 1402, 用于在第二数据块存储变长键及其值, 其中所述存储变 长键包括: 全量存储基准键类型的变长键, 而对前缀压缩键类型的变长键执行前缀 压缩。
图 15为本发明实施例的数据存储装置结构图。 如图 15所示, 该装置包括定 长键存储单元 1501和变长键存储单元 1502, 其功能与图 14所示的定长键存储单元 1401和变长键存储单元 1402类似。
一个例子中, 该装置还可以包括键类型区分单元 1503。
键类型区分单元 1503 用于根据预先设置的阈值长度和阈值差分将变长键划分 为基准键类型和前缀压缩键类型; 其中: 将当前变长键与上一个基准键进行前缀比 较, 如果相同前缀串小于所述阈值长度, 则判定该当前变长键为基准键类型; 将当 前变长键与上一个基准键进行前缀比较, 如果相同前缀串大于所述阈值长度与阈值 差分的和, 则判定该当前变长键为基准键类型; 将当前变长键与上一个基准键进行 前缀比较,如果相同前缀串大于所述阈值长度而小于所述阈值长度与阈值差分的和, 则判定该当前变长键为前缀压缩键类型。
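In one example, the variable-length key storage unit 1502 is configured, for a variable-length key of the prefix-compressed-key type, to store the length of the common prefix that the key shares with the previous base key, and to store the remainder of the key after that common prefix is removed.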
一个例子中, 该装置还可以包括数据块间压缩单元 1504。
数据块间压缩单元 1504用于当判定将定长键及其值存储到第一数据块失败时, 压缩所述第一数据块; 当判定将变长键及其值存储到第二数据块失败时, 压缩所述 第二数据块。
一个例子中, 该装置还可以包括存储緩冲单元 1505。
存储緩冲单元 1505用于分配存储緩冲区,并当压缩后第一数据块的大小小于所 述存储緩冲区时, 将所述压缩后第一数据块写入所述存储緩冲区; 当压缩后第二数 据块的大小小于所述存储緩冲区时,将所述压缩后第二数据块写入所述存储緩冲区。
一个例子中, 该装置还可以包括布隆过滤器 1506。
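The bloom filter 1506 is configured so that, when it is determined that a fixed-length key and its value have been stored in the first data block successfully, the bloom filter information of that fixed-length key and its value is written into it; and when it is determined that a variable-length key and its value have been stored in the second data block successfully, the bloom filter information of that variable-length key and its value is written into it.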
一个例子中, 该装置还可以包括读取緩冲单元 1507。
读取緩冲单元 1507用于设置读取緩冲区,并当判定该读取操作的读取长度小于 所述緩冲区且该当前数据块不是最后数据块时, 取出下一数据块的起始地址并记录 下一数据块的长度并继续读取, 直到读取长度大于所述緩冲区的长度为止。
一个例子中, 该装置还可以包括数据块索引存储单元 1508。
数据块索引存储单元 1508 用于存储第一数据块和第二数据块的最后一条单元 ( cell )的全量键, 并存储第一数据块和第二数据块在数据存储文件中的偏移量以及 当前行键的长度。
一个例子中, 该装置还可以包括键类型区分单元。
一个例子中,该键类型区分单元用于将当前变长键与当前基准键进行前缀比较, 如果相同前缀串小于所述阈值长度, 则判定该当前变长键为基准键类型; 如果相同 前缀串大于或等于所述阈值长度, 则判定该当前变长键为前缀压缩键类型。
一个例子中, 该键类型区分单元用于将当前变长键与所述第二数据块中存放的 前一个键进行前缀比较, 如果相同前缀串大于或者等于所述阈值长度与阈值差分的 和, 则判定该当前变长键为基准键类型; 如果相同前缀串小于所述阈值长度与阈值 差分的和, 则判定该当前变长键为前缀压缩键类型。
一个例子中, 该键类型区分单元用于获取当前变长键与所述第二数据块中存放 的前一个键的相同前缀串的第一长度, 获取当前变长键与当前基准键的相同前缀串 的第二长度, 如果第一长度大于或者等于所述第二长度与阈值差分的和, 则判定该 当前变长键为基准键类型, 如果第一长度小于所述第二长度与阈值差分的和, 则判 定该当前变长键为前缀压缩键类型。
一个例子中, 所述变长键存储单元用于针对前缀压缩键类型的变长键, 存储该 前缀压缩键类型的变长键与当前基准键的公共前缀的长度, 以及存储该前缀压缩键 类型的变长键去除该公共前缀后的剩余部分。
一个例子中, 该装置还可以包括数据块索引存储单元。
数据块索引存储单元用于将所述第一数据块存储到文件中时, 将所述第一数据 块在文件中的位置信息存储到索引块; 将所述第二数据块存储到文件中时, 将所述 第二数据块在文件中的位置信息存储到索引块; 将所述索引块存储到所述文件中。
一个例子中, 该装置还可以包括布隆过滤器。
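The bloom filter is configured so that, when it is determined that a fixed-length key and its value have been stored in the first data block successfully, the bloom filter information of that fixed-length key and its value is written into the bloom filter; when it is determined that a variable-length key and its value have been stored in the second data block successfully, the bloom filter information of that variable-length key and its value is written into the bloom filter; and the bloom filter is stored in the file.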
一个例子中, 该装置还可以包括数据块间压缩单元。
数据块间压缩单元用于压缩所述第一数据块或者所述第二数据块, 将压缩后的 所述第一数据块或者所述第二数据块存储到文件中。
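In one example, the apparatus may further include a value compression unit.
The value compression unit is configured so that storing a fixed-length key and its value in the first data block includes compressing the value that corresponds to the fixed-length key and providing the compressed value to the fixed-length key storage unit for storage in the first data block; and storing a variable-length key and its value in the second data block includes compressing the value and providing the compressed value to the variable-length key storage unit for storage in the second data block.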
一个例子中, 所述定长键存储单元用于按照预先根据所述定长键排序后的顺序 将所述定长键及其值存储到所述第一数据块中;
所述数据块索引存储单元用于将所述第一数据块中存储的最后一个定长键和所 述第一数据块在所述文件中的起始位置和长度存储到所述索引块中;
所述变长键存储单元用于按照预先根据所述变长键排序后的顺序将所述变长键 及其值存储到所述第二数据块中;
所述数据块索引存储单元用于将所述第二数据块中存储的最后一个变长键和所 述第二数据块在所述文件中的起始位置和长度存储到所述索引块中。
可以将图 14所示装置集成到各种通信网络的硬件实体当中。
实际上, 可以通过多种形式来具体实施本发明实施方式所提出的基于键排序的 数据存储装置。 比如, 可以遵循一定规范的应用程序接口, 将基于键排序的数据存 储装置编写为存储服务器中的插件程序, 也可以将其封装为应用程序以供用户自行 下载使用。 当编写为插件程序时, 可以将其实施为 ocx、 dll、 cab等多种插件形式。 也可以通过 Flash插件、 RealPlayer插件、 MMS插件、 MIDI五线谱插件、 ActiveX 插件等具体技术来实施本发明实施方式所提出的基于键排序的数据存储装置。
可以通过指令或指令集存储的储存方式将本发明实施方式所提出的基于键排序 的数据存储方法存储在各种存储介质上。 这些存储介质包括但是不局限于: 软盘、 光盘、 DVD、硬盘、 闪存、 U盘、 CF卡、 SD卡、 MMC卡、 SM卡、记忆棒( Memory Stick )、 xD卡等。
另外, 还可以将本发明实施方式所提出的基于键排序的数据存储方法应用到基 于闪存(Nand flash ) 的存储介质中, 比如 U盘、 CF卡、 SD卡、 SDHC卡、 MMC 卡、 SM卡、 记忆棒、 xD卡等。
综上所述, 在本发明实施方式中, 在第一数据块中存储定长键及其值, 其中所 述存储定长键包括: 统一存储各个定长键的公共前缀, 并分别存储各个定长键去除 公共前缀后的剩余部分; 在第二数据块存储变长键及其值, 其中所述存储变长键包 括: 全量存储基准键类型的变长键, 而对前缀压缩键类型的变长键执行前缀压缩。 由此可见, 应用本发明实施方式之后, 变长键数据块内釆用前缀压缩方式, 以及优 选对每个数据块进行压缩, 因此可以有效减小数据的存储空间, 提高机器磁盘利用 率。
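In addition, unlike the prior art, which uses a single key as the unit of storage, the embodiments of the present invention use the data block as the unit of storage, which benefits the granularity of I/O and parsing.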
而且, 本发明实施方式在读取数据时, 可以依据索引块以及数据块内部的有序 性, 快速定位查询的数据, 从而提高查询效率。
需要说明的是, 上述各流程和各结构图中不是所有的步骤和模块都是必须的, 可以根据实际的需要忽略某些步骤或模块。 各步骤的执行顺序不是固定的, 可以根 据需要进行调整。 各模块的划分仅仅是为了便于描述釆用的功能上的划分, 实际实 现时, 一个模块可以分由多个模块实现, 多个模块的功能也可以由同一个模块实现, 这些模块可以位于同一个设备中, 也可以位于不同的设备中。
各实施例中的硬件模块可以以机械方式或电子方式实现。 例如, 一个硬件模块 可以包括专门设计的永久性电路或逻辑器件(如专用处理器, 如 FPGA或 ASIC )用 于完成特定的操作。 硬件模块也可以包括由软件临时配置的可编程逻辑器件或电路 (如包括通用处理器或其它可编程处理器) 用于执行特定操作。 至于具体釆用机械 方式, 或是釆用专用的永久性电路, 或是釆用临时配置的电路(如由软件进行配置) 来实现硬件模块, 可以根据成本和时间上的考虑来决定。
本发明还提供了一种机器可读的存储介质, 存储用于使一机器执行如本文所述 方法的指令。 具体地, 可以提供配有存储介质的系统或者装置, 在该存储介质上存 储着实现上述实施例中任一实施例的功能的软件程序代码, 且使该系统或者装置的 计算机(或 CPU或 MPU )读出并执行存储在存储介质中的程序代码。 此外, 还可 以通过基于程序代码的指令使计算机上操作的操作系统等来完成部分或者全部的实 际操作。 还可以将从存储介质读出的程序代码写到插入计算机内的扩展板中所设置 的存储器中或者写到与计算机相连接的扩展单元中设置的存储器中, 随后基于程序 代码的指令使安装在扩展板或者扩展单元上的 CPU等来执行部分和全部实际操作, 从而实现上述实施例中任一实施例的功能。
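Examples of storage media for providing the program code include floppy disks, hard disks, magneto-optical disks, optical disks (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, and DVD+RW), magnetic tape, non-volatile memory cards, and ROM. Alternatively, the program code may be downloaded from a server computer over a communication network.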
综上所述, 权利要求的范围不应局限于以上描述的例子中的实施方式, 而应当 将说明书作为一个整体并给予最宽泛的解释。

Claims

1、 一种数据存储方法, 其特征在于, 该方法包括:
在第一数据块中存储定长键及其值, 其中所述存储定长键包括: 统一存储各个 定长键的公共前缀, 并分别存储各个定长键去除公共前缀后的剩余部分;
在第二数据块存储变长键及其值, 其中所述存储变长键包括: 全量存储基准键 类型的变长键, 而对前缀压缩键类型的变长键执行前缀压缩。
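2. The data storage method according to claim 1, characterized in that the method further comprises:
comparing the prefix of the current variable-length key with that of the current base key; if the shared prefix string is shorter than the threshold length, determining that the current variable-length key is of the base-key type; if the shared prefix string is longer than or equal to the threshold length, determining that the current variable-length key is of the prefix-compressed-key type.
3. The data storage method according to claim 1, characterized in that the method further comprises:
comparing the prefix of the current variable-length key with that of the previous key stored in the second data block; if the shared prefix string is longer than or equal to the sum of the threshold length and the threshold differential, determining that the current variable-length key is of the base-key type; if the shared prefix string is shorter than the sum of the threshold length and the threshold differential, determining that the current variable-length key is of the prefix-compressed-key type.
4. The data storage method according to claim 1, characterized in that the method further comprises:
obtaining a first length, being the length of the prefix string that the current variable-length key shares with the previous key stored in the second data block, and obtaining a second length, being the length of the prefix string that the current variable-length key shares with the current base key; if the first length is greater than or equal to the sum of the second length and the threshold differential, determining that the current variable-length key is of the base-key type; if the first length is less than the sum of the second length and the threshold differential, determining that the current variable-length key is of the prefix-compressed-key type.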
5、 根据权利要求 1所述的数据存储方法, 其特征在于, 所述对前缀压缩键类型 的变长键执行前缀压缩包括:
针对前缀压缩键类型的变长键, 存储该前缀压缩键类型的变长键与当前基准键 的公共前缀的长度, 以及存储该前缀压缩键类型的变长键去除该公共前缀后的剩余 部分。
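6. The data storage method according to claim 1, characterized in that the method further comprises:
when the first data block is stored in a file, storing position information of the first data block in the file into an index block;
when the second data block is stored in the file, storing position information of the second data block in the file into the index block; and storing the index block in the file.
7. The data storage method according to claim 1 or 6, characterized in that the method further comprises:
when it is determined that a fixed-length key and its value have been stored in the first data block successfully, writing the bloom filter information of the fixed-length key and its value into a bloom filter;
when it is determined that a variable-length key and its value have been stored in the second data block successfully, writing the bloom filter information of the variable-length key and its value into the bloom filter;
storing the bloom filter in the file.
8. The data storage method according to claim 1, characterized in that the method further comprises:
compressing the first data block or the second data block, and storing the compressed first data block or the compressed second data block in a file.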
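9. The data storage method according to claim 1, characterized in that:
storing the fixed-length key and its value in the first data block comprises compressing the value and storing the compressed value;
storing the variable-length key and its value in the second data block comprises compressing the value and storing the compressed value.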
10、 根据权利要求 6所述的数据存储方法, 其特征在于,
所述在第一数据块中存储定长键及其值包括: 按照预先根据所述定长键排序后 的顺序将所述定长键及其值存储到所述第一数据块中;
所述将第一数据块在文件中的位置信息存储到索引块包括: 将所述第一数据块 中存储的最后一个定长键和所述第一数据块在所述文件中的起始位置和长度存储到 所述索引块中;
所述在第二数据块中存储变长键及其值包括: 按照预先根据所述变长键排序后 的顺序将所述变长键及其值存储到所述第二数据块中;
所述将第二数据块在文件中的位置信息存储到索引块包括: 将所述第二数据块 中存储的最后一个变长键和所述第二数据块在所述文件中的起始位置和长度存储到 所述索引块中。
11、 一种数据存储装置, 其特征在于, 该装置包括定长键存储单元和变长键存 储单元, 其中:
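the fixed-length key storage unit is configured to store fixed-length keys and their values in a first data block, where storing the fixed-length keys includes storing the common prefix of the fixed-length keys once and separately storing the remainder of each fixed-length key after the common prefix is removed;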
变长键存储单元, 用于在第二数据块存储变长键及其值, 其中所述存储变长键 包括: 全量存储基准键类型的变长键, 而对前缀压缩键类型的变长键执行前缀压缩。
12、 根据权利要求 11所述的数据存储装置, 其特征在于, 该装置进一步包括键 类型区分单元;
键类型区分单元, 用于将当前变长键与当前基准键进行前缀比较, 如果相同前 缀串小于所述阈值长度, 则判定该当前变长键为基准键类型; 如果相同前缀串大于 或等于所述阈值长度, 则判定该当前变长键为前缀压缩键类型。
13、 根据权利要求 11所述的数据存储装置, 其特征在于, 该装置进一步包括键 类型区分单元;
键类型区分单元, 用于将当前变长键与所述第二数据块中存放的前一个键进行 前缀比较, 如果相同前缀串大于或者等于所述阈值长度与阈值差分的和, 则判定该 当前变长键为基准键类型; 如果相同前缀串小于所述阈值长度与阈值差分的和, 则 判定该当前变长键为前缀压缩键类型。
14、 根据权利要求 11所述的数据存储装置, 其特征在于, 该装置进一步包括键 类型区分单元;
键类型区分单元, 用于获取当前变长键与所述第二数据块中存放的前一个键的 相同前缀串的第一长度, 获取当前变长键与当前基准键的相同前缀串的第二长度, 如果第一长度大于或者等于所述第二长度与阈值差分的和, 则判定该当前变长键为 基准键类型, 如果第一长度小于所述第二长度与阈值差分的和, 则判定该当前变长 键为前缀压缩键类型。
15、 根据权利要求 11所述的数据存储装置, 其特征在于,
所述变长键存储单元用于针对前缀压缩键类型的变长键, 存储该前缀压缩键类 型的变长键与当前基准键的公共前缀的长度, 以及存储该前缀压缩键类型的变长键 去除该公共前缀后的剩余部分。
16、 根据权利要求 11所述的数据存储装置, 其特征在于, 进一步包括数据块索 引存储单元;
数据块索引存储单元, 用于将所述第一数据块存储到文件中时, 将所述第一数 据块在文件中的位置信息存储到索引块; 将所述第二数据块存储到文件中时, 将所 述第二数据块在文件中的位置信息存储到索引块;将所述索引块存储到所述文件中。
17、 根据权利要求 11或 16所述的数据存储装置, 其特征在于, 进一步包括布 隆过滤器; 其中:
布隆过滤器, 用于当判定将定长键及其值存储到第一数据块成功时, 将所述定 长键及其值的布隆过滤器信息写入到布隆过滤器中; 当判定将变长键及其值存储到 第二数据块成功时, 将所述定长键及其值的布隆过滤器信息写入到布隆过滤器中; 将所述布隆过滤器存储到所述文件中。
18、 根据权利要求 11所述的数据存储装置, 其特征在于, 进一步包括数据块间 压缩单元, 其中:
数据块间压缩单元, 用于压缩所述第一数据块或者所述第二数据块, 将压缩后 的所述第一数据块或者所述第二数据块存储到文件中。
19、 根据权利要求 11所述的数据存储装置, 其特征在于, 进一步包括值压缩单 元, 其中:
值压缩单元, 用于所述将存储到第一数据块包括: 压缩所述定长键对应的值, 将压缩后的值提供给所述定长键存储单元以存储到第一数据块; 所述将变长键及其 值存储到第二数据块包括: 压缩所述值, 将压缩后的值提供给所述变长键存储单元 以存储到第二数据块。
20、 根据权利要求 16所述的数据存储装置, 其特征在于,
所述定长键存储单元用于按照预先根据所述定长键排序后的顺序将所述定长键 及其值存储到所述第一数据块中;
所述数据块索引存储单元用于将所述第一数据块中存储的最后一个定长键和所 述第一数据块在所述文件中的起始位置和长度存储到所述索引块中;
所述变长键存储单元用于按照预先根据所述变长键排序后的顺序将所述变长键 及其值存储到所述第二数据块中;
所述数据块索引存储单元用于将所述第二数据块中存储的最后一个变长键和所 述第二数据块在所述文件中的起始位置和长度存储到所述索引块中。
PCT/CN2013/088286 2012-12-14 2013-12-02 一种数据存储方法和装置 WO2014090097A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/652,002 US9377959B2 (en) 2012-12-14 2013-12-02 Data storage method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210541207.0 2012-12-14
CN201210541207.0A CN103870492B (zh) 2012-12-14 2012-12-14 一种基于键排序的数据存储方法和装置

Publications (1)

Publication Number Publication Date
WO2014090097A1 true WO2014090097A1 (zh) 2014-06-19

Family

ID=50909035

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/088286 WO2014090097A1 (zh) 2012-12-14 2013-12-02 一种数据存储方法和装置

Country Status (3)

Country Link
US (1) US9377959B2 (zh)
CN (1) CN103870492B (zh)
WO (1) WO2014090097A1 (zh)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111316255A (zh) * 2017-11-20 2020-06-19 华为技术有限公司 数据存储系统以及用于提供数据存储系统的方法
US10896022B2 (en) 2017-11-30 2021-01-19 International Business Machines Corporation Sorting using pipelined compare units
US10936283B2 (en) 2017-11-30 2021-03-02 International Business Machines Corporation Buffer size optimization in a hierarchical structure
CN112486910A (zh) * 2020-11-23 2021-03-12 天津津航计算技术研究所 一种快速解析海量数据文件的方法
US11048475B2 (en) 2017-11-30 2021-06-29 International Business Machines Corporation Multi-cycle key compares for keys and records of variable length
US11354094B2 (en) 2017-11-30 2022-06-07 International Business Machines Corporation Hierarchical sort/merge structure using a request pipe

Families Citing this family (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103927124B (zh) * 2013-01-15 2018-03-13 深圳市腾讯计算机系统有限公司 以Hash方式组织的磁盘访问控制装置及方法
CN104657500A (zh) * 2015-03-12 2015-05-27 浪潮集团有限公司 一种基于key-value键值对的分布式存储方法
JP6468098B2 (ja) * 2015-07-02 2019-02-13 富士通株式会社 情報処理プログラム、装置、及び方法
CN105224828B (zh) * 2015-10-09 2017-10-27 人和未来生物科技(长沙)有限公司 一种基因序列片段快速定位用键值索引数据压缩方法
US10037148B2 (en) 2016-01-05 2018-07-31 Microsoft Technology Licensing, Llc Facilitating reverse reading of sequentially stored, variable-length data
CN107092607B (zh) * 2016-02-18 2021-04-23 中国移动通信集团安徽有限公司 一种话单入库方法及装置
US20170308561A1 (en) * 2016-04-21 2017-10-26 Linkedin Corporation Indexing and sequentially storing variable-length data to facilitate reverse reading
US20170371551A1 (en) * 2016-06-23 2017-12-28 Linkedin Corporation Capturing snapshots of variable-length data sequentially stored and indexed to facilitate reverse reading
US10310997B2 (en) * 2016-09-22 2019-06-04 Advanced Micro Devices, Inc. System and method for dynamically allocating memory to hold pending write requests
US10191693B2 (en) 2016-10-14 2019-01-29 Microsoft Technology Licensing, Llc Performing updates on variable-length data sequentially stored and indexed to facilitate reverse reading
CN106874348B (zh) * 2016-12-26 2020-06-16 贵州白山云科技股份有限公司 文件存储和索引方法、装置及读取文件的方法
CN107832343B (zh) * 2017-10-13 2020-02-21 天津大学 一种基于位图的mbf数据索引结构对数据快速检索的方法
US10735826B2 (en) * 2017-12-20 2020-08-04 Intel Corporation Free dimension format and codec
CN109388641B (zh) * 2018-10-22 2019-10-18 无锡华云数据技术服务有限公司 一种检索键值数据库中键的共同前缀的方法、设备、介质
CN109299112B (zh) * 2018-11-15 2020-01-17 北京百度网讯科技有限公司 用于处理数据的方法和装置
CN111208933B (zh) * 2018-11-21 2023-06-30 昆仑芯(北京)科技有限公司 数据访问的方法、装置、设备和存储介质
CN110825940B (zh) * 2019-09-24 2023-08-22 武汉智美互联科技有限公司 网络数据包存储和查询方法
US11863445B1 (en) 2019-09-25 2024-01-02 Juniper Networks, Inc. Prefix range to identifier range mapping
US11062507B2 (en) * 2019-11-04 2021-07-13 Apple Inc. Compression techniques for pixel write data
CN111241398B (zh) * 2020-01-10 2023-07-25 百度在线网络技术(北京)有限公司 数据预取方法、装置、电子设备及计算机可读存储介质
CN111241108B (zh) * 2020-01-16 2023-12-26 北京百度网讯科技有限公司 基于键值对kv系统的索引方法、装置、电子设备和介质
CN113381932B (zh) * 2020-03-09 2022-12-27 华为技术有限公司 一种生成段标识sid的方法和网络设备
US11366796B2 (en) * 2020-04-30 2022-06-21 Oracle International Corporation Systems and methods for compressing keys in hierarchical data structures
CN113779014A (zh) * 2020-06-10 2021-12-10 深信服科技股份有限公司 一种数据存储方法、装置、设备和存储介质
CN111930757B (zh) * 2020-09-24 2021-01-12 南京中兴软件有限责任公司 数据处理方法、系统、封装节点和解封装节点
CN112612925B (zh) * 2020-12-29 2022-12-23 度小满科技(北京)有限公司 数据的存储方法、读取方法以及电子设备
CN113923209B (zh) * 2021-09-29 2023-07-14 北京轻舟智航科技有限公司 一种基于LevelDB进行批量数据下载的处理方法
CN116089415A (zh) * 2021-11-05 2023-05-09 中兴通讯股份有限公司 键-值存储的方法和设备、计算机可读介质
CN116414828A (zh) * 2021-12-31 2023-07-11 华为技术有限公司 一种数据管理方法及相关装置
CN114077609B (zh) * 2022-01-19 2022-04-22 北京四维纵横数据技术有限公司 数据存储及检索方法,装置,计算机可读存储介质及电子设备
CN115202767B (zh) * 2022-09-19 2022-11-25 腾讯科技(深圳)有限公司 一种振动控制方法、装置、设备及计算机可读存储介质
CN116521090B (zh) * 2023-06-25 2023-09-12 苏州浪潮智能科技有限公司 数据落盘方法、装置、电子设备及存储介质
CN117271440B (zh) * 2023-11-21 2024-02-06 深圳市云希谷科技有限公司 一种基于freeRTOS文件信息存储方法、读取方法及相关设备

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1564991A (zh) * 2001-10-02 2005-01-12 索尼国际(欧洲)股份有限公司 字数据库压缩
CN101777056A (zh) * 2009-12-31 2010-07-14 成都市华为赛门铁克科技有限公司 数据存储方法及设备
CN102223289A (zh) * 2010-04-15 2011-10-19 杭州华三通信技术有限公司 一种存储IPv4地址和IPv6地址的方法和装置

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB9320404D0 (en) * 1993-10-04 1993-11-24 Dixon Robert Method & apparatus for data storage & retrieval
US20040220941A1 (en) * 2003-04-30 2004-11-04 Nielson Mark R. Sorting variable length keys in a database
US7496572B2 (en) * 2003-07-11 2009-02-24 Bmc Software, Inc. Reorganizing database objects using variable length keys
US7647329B1 (en) * 2005-12-29 2010-01-12 Amazon Technologies, Inc. Keymap service architecture for a distributed storage system
US9047330B2 (en) * 2008-10-27 2015-06-02 Ianywhere Solutions, Inc. Index compression in databases
CN101639848B (zh) * 2009-06-01 2011-06-01 北京四维图新科技股份有限公司 一种空间数据引擎及应用其管理空间数据的方法
CN102609490B (zh) * 2012-01-20 2014-07-02 东华大学 一种面向列存储dwms的b+树索引方法

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111316255A (zh) * 2017-11-20 2020-06-19 华为技术有限公司 数据存储系统以及用于提供数据存储系统的方法
CN111316255B (zh) * 2017-11-20 2023-11-03 华为技术有限公司 数据存储系统以及用于提供数据存储系统的方法
US10896022B2 (en) 2017-11-30 2021-01-19 International Business Machines Corporation Sorting using pipelined compare units
US10936283B2 (en) 2017-11-30 2021-03-02 International Business Machines Corporation Buffer size optimization in a hierarchical structure
US11048475B2 (en) 2017-11-30 2021-06-29 International Business Machines Corporation Multi-cycle key compares for keys and records of variable length
US11354094B2 (en) 2017-11-30 2022-06-07 International Business Machines Corporation Hierarchical sort/merge structure using a request pipe
CN112486910A (zh) * 2020-11-23 2021-03-12 天津津航计算技术研究所 一种快速解析海量数据文件的方法

Also Published As

Publication number Publication date
CN103870492B (zh) 2017-08-04
US9377959B2 (en) 2016-06-28
CN103870492A (zh) 2014-06-18
US20150331619A1 (en) 2015-11-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13862302

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 14652002

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 20/08/2015)

122 Ep: pct application non-entry in european phase

Ref document number: 13862302

Country of ref document: EP

Kind code of ref document: A1