WO2018205151A1 - Data update method and storage device - Google Patents

Data update method and storage device

Info

Publication number
WO2018205151A1
WO2018205151A1 PCT/CN2017/083657 CN2017083657W WO2018205151A1 WO 2018205151 A1 WO2018205151 A1 WO 2018205151A1 CN 2017083657 W CN2017083657 W CN 2017083657W WO 2018205151 A1 WO2018205151 A1 WO 2018205151A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
storage area
subtree
key value
storage
Prior art date
Application number
PCT/CN2017/083657
Other languages
English (en)
Chinese (zh)
Inventor
徐君
于群
王元钢
薛常亮
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN201780070813.XA (CN110168532B)
Priority to PCT/CN2017/083657 (WO2018205151A1)
Publication of WO2018205151A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • the present application relates to the field of computer storage, and in particular, to a data update method and a storage device.
  • the value (Value) can be quickly determined by looking up a keyword, that is, a key (Key), so that large-scale services can be processed in real time.
  • the Log-Structured Merge Tree (LSM Tree) is the main algorithmic structure of KV databases.
  • random writes can be changed to sequential writes by layer-by-layer merging. However, since a storage device such as a disk stores data in units of blocks, each read and write operation is performed in units of blocks. Thus, in layer-by-layer merging, the entire data block related to the data needs to be read into memory, merged with the data, and then written back to the disk. The resulting write amplification in the system is serious, which further limits the improvement of data storage performance.
  • the present application provides a data updating method and storage device, which can improve data reading and writing efficiency.
  • a data update method is provided. The method is performed by a storage device including a first storage area and a second storage area, where the data read/write speed of the first storage area is higher than that of the second storage area. The method comprises: searching an index tree according to a first key value of first data to obtain a first subtree corresponding to the first key value, where the first data is stored in the first storage area, the index tree includes M layers, the nodes of the first n layers of the index tree are stored in the first storage area, the root node of the first subtree is located in the first n layers of the index tree, the leaf nodes of the first subtree include information of data stored in the second storage area, M and n are positive integers, and n is less than or equal to M; writing the first data from the first storage area to the second storage area; and updating the first subtree according to the first key value, where a first leaf node of the updated first subtree includes information of the first data.
  • the storage area in the storage device is partitioned, and the read/write performance of the first storage area is better than that of the second storage area. The nodes of the first n layers of the index tree are stored in the first storage area, and the root node of the first subtree is also located in the first n layers of the index tree. During a data update, compared with the prior-art approach in which each level must be merged layer by layer, the corresponding first subtree can be found quickly from the first key value of the data to be written, which improves search speed. Moreover, in the present application, data in the first storage area and the second storage area can be merged on the basis of the first subtree, thereby reducing the write amplification caused by layer-by-layer merging in the prior art.
  • the root node of the first subtree is located in the first n layers of the index tree, and n is less than or equal to the total number of layers M of the index tree; thus, the root node of the first subtree is also located in the first storage area.
  • the root node of the first subtree is located in the nth layer of the index tree; that is, among the first n layers of the index tree, which are located in the first storage area, the root node of the first subtree is in the last of those n layers.
  • the total number of layers of the index tree is 5, wherein the first 3 layers are stored in the first storage area, and the root node of the first subtree is located in the third layer of the index tree.
  • the first sub-tree corresponding to the first key value means that the first key value is within a range of key values corresponding to the first sub-tree.
  • the information of the first data may include at least one of the following: a value of the first data, a first key value of the first data, an address (or a link) of the first data, an address (or a link) of the first key value, and so on.
  • the information of the second data may include at least one of the following: a value of the second data, a second key value of the second data, an address (or a link) of the second data, an address (or a link) of the second key value, and so on.
  • the method further includes: receiving a write request, where the write request includes the first data to be written and the first key value; and writing the first data and the first key value into the first storage area.
  • the newly written data is first written into the first storage area, and the read/write performance of the first storage area is superior, thereby increasing the data writing speed.
  • transferring data from the first storage area to the second storage area also reduces the write amplification caused by updating data directly in the second storage area, which improves data storage performance.
  • the method further includes: receiving a read request, where the read request includes a second key value of second data; when the second data is not found in the first storage area according to the second key value, searching the index tree according to the second key value to obtain a second subtree corresponding to the second key value, where the root node of the second subtree is located in the first n layers of the index tree; and reading the second data from the second storage area according to the information of the second data included in a second leaf node of the second subtree, where the second leaf node is the leaf node found according to the second key value.
  • writing the first data from the first storage area to the second storage area includes: when the remaining storage space of the first storage area is smaller than a first threshold and the number of reads and writes of the first subtree satisfies a second threshold, writing the first data from the first storage area to the second storage area.
  • if the index range corresponding to each read operation covers multiple subtrees, and a certain number of cold subtrees (subtrees with too low an access frequency) or subtrees with few leaf nodes exist among the multiple subtrees, the multiple subtrees can be merged. Before the multiple subtrees are merged, the data indexed by the key values corresponding to those subtrees needs to be written from the first storage area to the second storage area.
  • a storage device is provided that can be used to perform the processes of the data update method described in the first aspect and its various implementations.
  • the storage device includes a storage module and a processing module. The storage module includes a first storage area and a second storage area, and the data read/write speed of the first storage area is higher than that of the second storage area. The processing module is configured to: search an index tree according to a first key value of first data to obtain a first subtree corresponding to the first key value, where the first data is stored in the first storage area, the index tree includes M layers, the nodes of the first n layers of the index tree are stored in the first storage area, the root node of the first subtree is located in the first n layers of the index tree, the leaf nodes of the first subtree include information of data stored in the second storage area, M and n are positive integers, and n is less than or equal to M; write the first data from the first storage area to the second storage area; and update the first subtree according to the first key value, where a first leaf node of the updated first subtree includes information of the first data.
  • the processing module is further configured to: receive a write request, where the write request includes the first data to be written and the first key value; and write the first data and the first key value into the first storage area.
  • the processing module is further configured to: receive a read request, where the read request includes a second key value of second data; when the second data is not found in the first storage area according to the second key value, search the index tree according to the second key value to obtain a second subtree corresponding to the second key value, where the root node of the second subtree is located in the first n layers of the index tree; and read the second data from the second storage area according to the information of the second data included in a second leaf node of the second subtree, where the second leaf node is the leaf node found according to the second key value.
  • the processing module is specifically configured to: when the remaining storage space of the first storage area is smaller than a first threshold and the number of reads and writes of the first subtree satisfies a second threshold, write the first data from the first storage area to the second storage area.
  • the first storage area includes a non-volatile storage medium.
  • a storage device including a transceiver, a processor, and a memory.
  • the memory stores a program, and the processor executes the program to perform the processes of the data update method described in the first aspect and its various implementations.
  • a computer is provided, including a processor and a memory. The memory is configured to store computer-executable instructions, and the processor and the memory communicate with each other through an internal connection path. When the computer runs, the processor executes the computer-executable instructions stored in the memory to cause the computer to perform the data update method described in the first aspect and its various implementations.
  • a computer-readable storage medium is provided, storing a program that causes a device to perform the method of the first aspect or any of its various implementations.
  • a system chip is provided, comprising an input interface, an output interface, a processor, and a memory. The processor is configured to execute instructions stored in the memory, and when the instructions are executed, the processor can implement the method of the first aspect or any of its various implementations.
  • FIG. 1 is a schematic diagram of data storage in the prior art.
  • FIG. 2 is a schematic flowchart of a data update method according to an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a data update method according to another embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a data update method according to another embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a storage device according to an embodiment of the present application.
  • FIG. 6 is a schematic block diagram of a storage device according to an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a storage device according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a system chip according to an embodiment of the present application.
  • the data update method described in the embodiments of the present application may be applied to a storage system supporting key-values.
  • data is stored as key-value pairs, and multiple key-value pairs are stored in a corresponding file. By looking up the keyword Key of a key-value pair, the data value Value corresponding to that Key is quickly determined, so that large-scale services can be processed in real time.
  • the updated data is first written sequentially to the disk log, and then the data update is performed in the memory cache.
  • the data is transferred directly to a Level 0 file on the disk. When the data volume of the Level 0 files has accumulated to a certain extent, they are merged with the Level 1 files, the merged new file is stored in Level 1, and the redundant data is deleted. When the data volume of the Level 1 files has accumulated to a certain extent, they are merged with the Level 2 files, the merged new file is stored in Level 2, and the redundant data is deleted; and so on, forming a smaller number of files with larger storage capacity.
  • Such layer-by-layer merging results in significant write amplification. For example, if only a small amount of data in an upper layer needs to be merged into the next layer, ideally only that small amount of data would be written to the next layer. However, since a storage device such as a disk stores data in units of blocks, each read and write operation must be performed in units of data blocks. The entire data block associated with the data therefore needs to be read into memory, merged with the data, and written back to the disk, so that the amount of data written is a whole data block. Layer-by-layer merging thus leads to serious write amplification, which further affects the performance of data storage.
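  • The block-granularity effect described above can be put into numbers with a minimal sketch. The function name and the 4096-byte block size are illustrative assumptions, not values from the application; the sketch covers a single write-back and ignores the repeated merges across levels.

```python
def write_amplification(payload_bytes: int, block_bytes: int) -> float:
    """Ratio of bytes physically written to bytes logically changed.

    Assumes every touched block is rewritten in full (hypothetical model).
    """
    blocks_touched = -(-payload_bytes // block_bytes)  # ceiling division
    return blocks_touched * block_bytes / payload_bytes


# Updating 100 bytes forces one whole 4096-byte block to be rewritten.
small_update = write_amplification(100, 4096)
```

  • Under these assumptions a 100-byte change costs a full block write, and the ratio approaches 1 only when the payload fills whole blocks.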
  • the key corresponding to the data is first searched for in the memory table (memtable). If the key is not found, the index file of each layer of data, that is, the Sorted String Table (SSTable), is checked in reverse order until the key is found. Each SSTable is sorted, and the search slows down as the number of SSTables increases.
  • the time complexity is O(K log N), where K is the number of SSTable files and N is the average size of an SSTable. Therefore, the complexity of this lookup also limits the performance of data storage.
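  • The prior-art lookup described above can be sketched as follows. This is a simplified model, assuming each SSTable is held as a pair of parallel sorted lists and that tables are ordered newest first; the function names are hypothetical. Each table costs one binary search, O(log N), and up to K tables are probed, giving O(K log N).

```python
from bisect import bisect_left


def sstable_get(keys, values, key):
    """Binary-search one sorted table; returns None when the key is absent."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return values[i]
    return None


def lsm_get(memtable, sstables, key):
    """Check the in-memory table first, then each SSTable, newest first."""
    if key in memtable:
        return memtable[key]
    for keys, values in sstables:
        hit = sstable_get(keys, values, key)
        if hit is not None:
            return hit
    return None
```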
  • the embodiment of the present application provides a data update method for improving storage performance.
  • the write operation in the embodiment of the present application may include an operation of writing new data (put) or updating data (update), and the read operation in the embodiment of the present application may include operations such as reading data (get) or a range query.
  • FIG. 2 is a schematic flowchart of a data update method according to an embodiment of the present application.
  • the method is performed by a storage device including a first storage area and a second storage area, where the data read/write speed of the first storage area is higher than that of the second storage area. The storage device accesses data stored in the second storage area through an index tree. The index tree includes M layers, the first n layers of the index tree are stored in the first storage area, M and n are both positive integers, and n is less than or equal to the total number of layers M of the index tree.
  • the index tree may be any type of index tree, such as a binary tree or a balanced multiway search tree (B-Tree, B+Tree), which is not limited in this application.
  • the first storage area may be, for example, a storage-class memory (SCM) or another byte-addressable non-volatile storage medium;
  • the second storage area may be, for example, a NAND flash memory (NAND Flash) or a hard disk drive (HDD). Since the first storage area uses a fast storage medium and the second storage area uses a slow storage medium, the data read/write speed of the first storage area is higher than that of the second storage area; that is, the access performance of the first storage area is better than that of the second storage area.
  • the index tree is used to access data in the second storage area, but since the upper layers of the index tree are accessed more frequently, the first n layers of the index tree can be stored in the first storage area.
  • when n is smaller than the total number of layers of the index tree, only some layers of the index tree are stored in the first storage area; when n is equal to the total number of layers of the index tree, the entire index tree is stored in the first storage area.
  • for a first storage area such as storage-class memory (SCM), it may be preferred to store only a part of the index tree, that is, the first few layers, in the first storage area, with the remaining layers stored in the second storage area. If the first storage area, such as SCM, becomes sufficiently cheap in the future and its space is large enough, the entire index tree may also be stored in the first storage area, which is not limited herein.
  • the data update method may include the following steps:
  • an index tree is searched according to a first key value of the first data to obtain a first subtree corresponding to the first key value.
  • the first data is stored in the first storage area, and the index tree includes M layers. The nodes of the first n layers of the index tree are stored in the first storage area, the root node of the first subtree is located in the first n layers of the index tree, the leaf nodes of the first subtree include information of data stored in the second storage area, M and n are positive integers, and n is less than or equal to M.
  • the nodes of the index tree are divided into four categories: root node, leaf node, parent node, and child node.
  • a child node is a next-level node of its parent node. If a node has a node one level above it, that upper node is called its parent node; a node with no level above it has no parent node.
  • a node with no children in a tree is called a leaf node. A node with no other nodes above it is called the root node.
  • the nodes in the embodiments of the present application can also be written as nodes.
  • the root node of the first subtree is located in the first n layers of the index tree, and n is less than or equal to the total number of layers M of the index tree; thus, the root node of the first subtree is also located in the first storage area.
  • the root node of the first subtree is located in the nth layer of the index tree; that is, among the first n layers of the index tree, which are located in the first storage area, the root node of the first subtree is in the last of those n layers.
  • the total number of layers of the index tree is 5, wherein the first 3 layers are stored in the first storage area, and the root node of the first subtree is located in the third layer of the index tree.
  • the first sub-tree corresponding to the first key value means that the first key value is within a range of key values corresponding to the first sub-tree.
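  • Finding the subtree whose key range contains a given key can be sketched as below. This is a minimal illustration that assumes the subtree roots in layer n partition the key space into contiguous ranges, each identified by its inclusive lower bound; the list layout and function name are hypothetical and not taken from the application.

```python
from bisect import bisect_right


def find_subtree(lower_bounds, key):
    """Return the position of the subtree whose key range contains `key`.

    `lower_bounds` is a sorted list; subtree i covers keys from
    lower_bounds[i] up to (but not including) lower_bounds[i + 1].
    """
    i = bisect_right(lower_bounds, key) - 1
    if i < 0:
        raise KeyError(f"{key!r} is below every subtree range")
    return i


# Three subtrees covering [0, 100), [100, 200), and [200, ...).
bounds = [0, 100, 200]
```

  • Because the roots sit in the fast storage area, this step touches only the first storage area before any access to the slower medium.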
  • data may be stored in the form of a Key Value.
  • the data stored in the first storage area and the second storage area may include a value of the data and an index corresponding to the data. For example, the first data mentioned herein includes the value of the first data and the index of the first data; the index of the first data is the key (Key) in the key-value pair, and the value of the first data is the value (Value) in the key-value pair.
  • the Value corresponding to the Key can be quickly found.
  • Different data can be managed by the index tree. When the data is read and written, the index tree can be used to determine the data block where the corresponding data is located by using the index of the data, thereby realizing access to the data.
  • for example, a student management database stores each student's student number and name, where the student number is the key (Key) and the name is the value (Value). To write new data into the student management database, execute put(0600100, Chen Meiling), that is, add the key-value pair (0600100, Chen Meiling) to the student management database.
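  • The student-number example can be written as a tiny key-value sketch. The in-memory dictionary and the `put`/`get` helpers are stand-ins for illustration only; the key is kept as a string so that the leading zero of the student number survives.

```python
student_db = {}  # hypothetical in-memory stand-in for the KV database


def put(key: str, value: str) -> None:
    """Insert a new pair or overwrite an existing one."""
    student_db[key] = value


def get(key: str):
    """Return the value stored under `key`, or None if absent."""
    return student_db.get(key)


put("0600100", "Chen Meiling")
```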
  • the first data is written from the first storage area to the second storage area.
  • the storage device may write the first data stored in the first storage area into the second storage area. That is, the data in the first storage area indexed by the key values corresponding to the first subtree may be merged with the data in the second storage area indexed by the key values corresponding to the first subtree; in other words, the data is reordered and stored again according to the index order of the data in the two storage areas. The merged data is stored in the second storage area. Since the data in the first storage area has been transferred to the second storage area, the storage space in the first storage area can be released.
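  • The per-subtree merge described above might look like the following sketch. It assumes that entries buffered in the first storage area are newer and therefore override entries with the same key in the second storage area; that precedence rule and the function name are assumptions for illustration.

```python
def merge_subtree_data(fast_items, slow_items):
    """Merge one subtree's key range from both areas into key order.

    Both arguments are lists of (key, value) pairs. Pairs from the fast
    area win on key collisions (assumed to be the newer writes).
    """
    merged = dict(slow_items)
    merged.update(fast_items)
    return sorted(merged.items())


merged = merge_subtree_data([(5, "new"), (2, "b")], [(1, "a"), (5, "old")])
```

  • Only the data under one subtree is rewritten, which is the source of the reduced write amplification relative to merging whole levels.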
  • writing the first data from the first storage area to the second storage area includes: when the remaining storage space of the first storage area is less than a first threshold and the number of reads and writes of the first subtree satisfies a second threshold, writing the first data from the first storage area to the second storage area.
  • the timing of writing the first data from the first storage area to the second storage area may be determined according to the size of the remaining storage space of the first storage area at that time. When the remaining storage space of the first storage area is insufficient, for example, less than the first threshold, the transfer of the data is initiated, that is, the data is written from the first storage area to the second storage area.
  • the storage device first selects, among all the subtrees whose root nodes are located in the first n layers (for example, in the nth layer), at least one subtree that satisfies a condition; the at least one subtree includes, for example, the first subtree described above. According to the at least one subtree, the storage device writes data in the first storage area to the second storage area.
  • a judgment of whether each subtree in the index tree satisfies the condition is triggered, so that the data indexed by the key values corresponding to each subtree whose root node is located at the nth layer and that satisfies the condition is written from the first storage area to the second storage area.
  • the condition that the at least one subtree satisfies may include, for example, any one of the following: the storage space that can be released after the data indexed by the key values corresponding to the subtree is written from the first storage area to the second storage area is greater than a preset threshold; the ratio of the number of write operations of the subtree to the number of read operations is greater than a preset threshold; or the read operation frequency and/or the write operation frequency corresponding to the subtree is less than a preset threshold.
  • the following uses the first subtree as an example to describe these three conditions.
  • the storage space that can be released after the sub-tree merge operation is performed on the first sub-tree is greater than a preset threshold.
  • the primary purpose of writing the first data from the first storage area to the second storage area is to release storage space in the fast storage area. So when the storage space of the fast storage area is insufficient, the size of the storage space that can be released after the data indexed by the key values corresponding to each subtree is written from the first storage area to the second storage area may be used to determine which subtrees' data is written from the first storage area to the second storage area. For example, if writing the data (including the first data) indexed by the key values (including the first key value) corresponding to the first subtree from the first storage area to the second storage area would release storage space greater than a preset threshold, the storage device may write the data indexed by the key values corresponding to the first subtree from the first storage area to the second storage area.
  • the ratio of the number of write operations of the first subtree to the number of read operations of the first subtree is greater than a preset threshold.
  • a subtree with many write operations and few read operations can be selected, so that the data indexed by the key value corresponding to the subtree is written from the first storage area to the second storage area.
  • the ratio of the number of write operations of the first subtree to the number of read operations of the first subtree is used as a measure: if this ratio is greater than a preset threshold, the data (including the first data) indexed by the key values corresponding to the first subtree is written from the first storage area to the second storage area.
  • read operations are faster in the first storage area and slower in the second storage area, so if the data corresponding to the first subtree in the first storage area has many read operations, the data is left in the first storage area so that the read speed of the system can be increased.
  • a write operation of the first subtree means that the key value of the written data is within the range of key values corresponding to the first subtree; a read operation corresponding to the first subtree means that the key value of the read data is within the range of key values corresponding to the first subtree.
  • the read operation frequency and/or the write operation frequency corresponding to the first subtree are less than a preset threshold.
  • for a subtree that satisfies this condition, the data indexed by the key values corresponding to the subtree may also be written from the first storage area to the second storage area. For example, if the read operation frequency and/or the write operation frequency of the first subtree is less than a preset threshold, the data (including the first data) indexed by the key values (including the first key value) corresponding to the first subtree may be written from the first storage area to the second storage area.
  • I/O statistics may be updated each time a write operation or a read operation is performed in the storage device. The I/O statistics may include, for example, at least one of the following: the total number of write operations (including put operations and update operations) for each subtree, the number of read operations, the index range of read operations, the number of queries for the index range, and the timestamp of each operation record, for example, the time at which a write operation, an update operation, or a read operation was performed.
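  • The three selection conditions and the per-subtree I/O statistics can be combined in a short sketch. All names and thresholds here are hypothetical; operation counts over some observation window stand in for frequencies, and the third condition is implemented in its "and" variant (both reads and writes are rare).

```python
from dataclasses import dataclass


@dataclass
class SubtreeStats:
    buffered_bytes: int  # space this subtree's data occupies in the fast area
    writes: int          # put + update operations in the observation window
    reads: int           # read operations in the observation window


def should_flush(s, min_release_bytes, min_write_read_ratio, max_cold_ops):
    """True if any one of the three conditions from the text holds."""
    frees_enough = s.buffered_bytes > min_release_bytes
    write_heavy = s.writes / max(s.reads, 1) > min_write_read_ratio
    cold = s.reads < max_cold_ops and s.writes < max_cold_ops
    return frees_enough or write_heavy or cold


# Example thresholds: release > 1000 bytes, write/read ratio > 2, fewer than 50 ops.
read_hot = SubtreeStats(buffered_bytes=10, writes=5, reads=500)
keep_in_fast_area = not should_flush(read_hot, 1000, 2.0, 50)
```

  • A read-heavy subtree like `read_hot` fails all three tests and stays in the fast area, which matches the stated goal of keeping frequently read data where reads are fast.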
  • the first subtree is updated according to the first key value.
  • the first leaf node of the updated first subtree includes the information of the first data.
  • the storage device further updates the first subtree according to the first key value, and the first leaf node of the updated first subtree includes the information of the first data.
  • the first leaf node may be any leaf node in the first subtree.
  • the information of the first data may include at least one of the following information: a value of the first data, a first key value of the first data, an address (or a link) of the first data, and a first The address (or link) of the key value, etc.
  • the storage area in the storage device is partitioned, and the read/write performance of the first storage area is better than that of the second storage area. The nodes of the first n layers of the index tree are stored in the first storage area, and the root node of the first subtree is also located in the first n layers of the index tree. During a data update, compared with the prior-art approach in which each level must be merged layer by layer, the corresponding first subtree can be found quickly from the first key value of the data to be written, which improves search speed. Moreover, in the present application, data in the first storage area and the second storage area can be merged on the basis of the first subtree, thereby reducing the write amplification caused by layer-by-layer merging in the prior art.
  • the method further includes 240 and 250.
  • a write request is received, the write request including the first data to be written and the first key value.
  • the first data and the first key value are written to the first storage area.
  • the storage device may first store the received data into the first storage area for temporary management.
  • the first storage area does not merge layer by layer when writing data; instead, data can be written, for example, at byte granularity, so that writing data into the first storage area is significantly faster than writing data into the second storage area, which avoids the write amplification problem.
  • the newly written data is first written into the first storage area, and the read/write performance of the first storage area is superior, thereby increasing the data writing speed.
  • transferring data from the first storage area to the second storage area also reduces the write amplification caused by updating data directly in the second storage area, which improves data storage performance.
  • the method further includes:
  • a read request is received, the read request including a second key value of the second data.
  • the root node of the second subtree is located in the first n layers of the index tree.
  • the second data is read from the second storage area according to the information of the second data included in the second leaf node of the second subtree.
  • the second leaf node is a leaf node that is found according to the second key value.
  • the information of the second data includes at least one of the following information: a value of the second data, a second key value of the second data, an address (or a link) of the second data, and a second key The address (or link) of the value, etc.
  • the storage device first searches for the second data in the first storage area according to the second key value, and if the second data is found in the first storage area, directly reads the second data. data. Since part of the data is also stored in the first storage area, if the part of the data has not been written to the second storage area, the storage device can find the data in the first storage area. Since the data access speed of the first storage area is significantly higher than the data access speed of the second storage area, fast reading of data can be realized.
  • If the storage device does not find the index of the second data in the first storage area, it searches the index tree according to the second key value to obtain the second subtree corresponding to the second key value, and reads the second data from the second storage area according to the information of the second data included in the second leaf node of the second subtree.
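  • The read path described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the class `TwoTierStore` and its attribute names are hypothetical, and a plain dictionary lookup stands in for the index-tree search of the second storage area.

```python
from typing import Dict, Optional


class TwoTierStore:
    """Toy model of a fast first storage area in front of a slower second one."""

    def __init__(self) -> None:
        # Hypothetical stand-in for data still held in the first storage area.
        self.first_area: Dict[int, str] = {}
        # Hypothetical stand-in for the index tree plus data blocks of the
        # second storage area; a real device would walk the tree to a leaf.
        self.second_area: Dict[int, str] = {}

    def read(self, key: int) -> Optional[str]:
        # Step 1: try the first storage area, which is faster to access.
        if key in self.first_area:
            return self.first_area[key]
        # Step 2: otherwise fall back to the second storage area.
        return self.second_area.get(key)


store = TwoTierStore()
store.first_area[1] = "Value 1"
store.second_area[3] = "Value 3"
```

  • In this sketch a key present in both areas is served from the first storage area, on the assumption that the copy there is the more recent one.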
  • FIG. 5 is a schematic diagram of a data update method according to an embodiment of the present application.
  • the first storage area is a storage class memory (SCM)
  • the second storage area is a NAND Flash or a hard disk drive (HDD)
  • the first storage area includes a data area and an index area.
  • the data area is used to store data
  • the index area is used to store the index tree. It is assumed here that the index tree is a balanced multipath search tree B+Tree.
  • the second storage area is used to store the index tree and data.
  • the first storage area may be a storage medium, and the storage medium is divided into a data area and an index area.
  • the data area and the index area may use different storage media, which are not limited herein.
  • the total number of layers of the index tree for finding data in the second storage area in the storage device is 5, and the first 3 layers are stored in the index area of the first storage area, and the last 2 layers are stored in the second storage area.
  • The index tree has node A as its root node. The child nodes of node A include node B and node P; the child nodes of node B include node C and node I; the child nodes of node C include node D and node E; and the child nodes of node D include node F, node G, and node H.
  • Node F, node G, and node H are leaf nodes of the entire index tree.
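  • The placement of layers across the two storage areas in this example (5 layers in total, the first 3 held in the first storage area) can be sketched as below. The dictionary `TREE` and the function `place_nodes` are hypothetical names introduced only for illustration; the node names follow the example above.

```python
from typing import Dict, List

# Parent -> children, following the node names used in the example above.
TREE: Dict[str, List[str]] = {
    "A": ["B", "P"],
    "B": ["C", "I"],
    "C": ["D", "E"],
    "D": ["F", "G", "H"],
}


def place_nodes(tree: Dict[str, List[str]], root: str, n: int) -> Dict[str, str]:
    """Map each node to the storage area that holds it.

    Nodes in the first n layers go to the first storage area; deeper
    nodes go to the second storage area.
    """
    placement: Dict[str, str] = {}
    stack = [(root, 1)]  # the root node is counted as layer 1
    while stack:
        node, layer = stack.pop()
        placement[node] = "first" if layer <= n else "second"
        for child in tree.get(node, []):
            stack.append((child, layer + 1))
    return placement


PLACEMENT = place_nodes(TREE, "A", 3)
```

  • With n = 3, nodes A, B, P, C, and I are assigned to the first storage area, while D, E, F, G, and H are assigned to the second storage area.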
  • When data is written, for example the first data, it is not written into the second storage area as in the prior art. Instead, the first data is written into the data area of the first storage area. Because the data write speed of the first storage area is significantly higher than that of the second storage area, the first data can be written there first.
  • As data is written, the free storage space of the first storage area keeps shrinking. When it has shrunk to a certain extent, for example below a space threshold, the storage device writes data from the first storage area to the second storage area to release storage space in the first storage area.
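  • The write path and the space threshold can be illustrated with the following sketch. The class `FirstStorageArea`, its byte-counting model of capacity, and the boolean return value are assumptions made for illustration; the source does not specify how free space is measured, and capacity overflow is deliberately not handled here.

```python
from typing import Dict


class FirstStorageArea:
    """Toy model of the first storage area with a free-space threshold."""

    def __init__(self, capacity: int, space_threshold: int) -> None:
        self.capacity = capacity                # total bytes available (assumed unit)
        self.space_threshold = space_threshold  # transfer is triggered below this
        self.data: Dict[int, bytes] = {}

    def free_space(self) -> int:
        used = sum(len(value) for value in self.data.values())
        return self.capacity - used

    def write(self, key: int, value: bytes) -> bool:
        """Store the value; return True when a transfer should be triggered."""
        # New data always lands in the first storage area first.
        self.data[key] = value
        return self.free_space() < self.space_threshold


area = FirstStorageArea(capacity=10, space_threshold=4)
first_result = area.write(1, b"abc")    # 7 bytes free afterwards
second_result = area.write(2, b"defg")  # 3 bytes free afterwards
```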
  • Specifically, when transferring data to the second storage area, the storage device selects, from the multiple subtrees whose root nodes are located in the third layer, the subtrees that satisfy a certain condition, and writes the data indexed by the key values corresponding to those subtrees from the first storage area to the second storage area.
  • The storage device may determine whether each subtree satisfies the merge condition according to the size of the storage space that can be released once the data indexed by the key values corresponding to the subtree is written from the first storage area to the second storage area, or according to the number of read/write operations corresponding to the subtree.
  • the plurality of subtrees in the index tree in which the root node is located in the third layer includes the first subtree and the second subtree.
  • The storage device determines whether the first subtree and the second subtree satisfy a preset condition, for example, whether the number of read/write operations corresponding to each subtree reaches a certain threshold. Assuming that the first subtree satisfies the preset condition, the storage device writes the data indexed by the key values corresponding to the first subtree from the first storage area to the second storage area.
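  • The selection of subtrees to transfer might look like the sketch below. It assumes that "reaches a certain threshold" means the read/write count is greater than or equal to the threshold, which the source does not state explicitly; `SubtreeStats`, `select_subtrees_to_flush`, and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubtreeStats:
    name: str
    buffered_bytes: int  # space in the first storage area this subtree would free
    rw_count: int        # read/write operations recorded for this subtree


def select_subtrees_to_flush(
    stats: List[SubtreeStats],
    free_space: int,
    space_threshold: int,
    rw_threshold: int,
) -> List[str]:
    """Return the names of subtrees whose data should be transferred."""
    # Nothing is transferred while the first storage area has enough room.
    if free_space >= space_threshold:
        return []
    # Assumption: a subtree qualifies once its read/write count reaches the threshold.
    return [s.name for s in stats if s.rw_count >= rw_threshold]


STATS = [
    SubtreeStats("first subtree", buffered_bytes=4096, rw_count=120),
    SubtreeStats("second subtree", buffered_bytes=1024, rw_count=7),
]
```

  • A variant could rank subtrees by `buffered_bytes` instead, matching the alternative criterion based on the storage space that can be released.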
  • The data included in the first subtree in the first storage area and the data included in the first subtree in the second storage area are read into the memory and sorted by key, and the sorted data is transferred from the memory to the second storage area and stored at the corresponding locations there. In this way, storage space in the first storage area can be released.
  • As shown in the figure, the data indexed by the key values corresponding to the first subtree in the first storage area includes (Key 1, Value 1), (Key 2, Value 2), and (Key 4, Value 4), and the data indexed by the key values corresponding to the first subtree in the second storage area includes (Key 3, Value 3), (Key 5, Value 5), and (Key 6, Value 6). The data is rearranged by key.
  • The merged data (Key 1, Value 1), (Key 2, Value 2), (Key 3, Value 3), (Key 4, Value 4), (Key 5, Value 5), and (Key 6, Value 6) is formed and stored in the second storage area.
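  • The merge in this example can be reproduced with a short sketch. The function name `merge_subtree_data` is hypothetical, and the rule that the copy from the first storage area wins when a key appears in both areas is an assumption; the source example contains no overlapping keys.

```python
from typing import List, Tuple

Pair = Tuple[int, str]

# Data of the first subtree as given in the example above.
FIRST_AREA: List[Pair] = [(1, "Value 1"), (2, "Value 2"), (4, "Value 4")]
SECOND_AREA: List[Pair] = [(3, "Value 3"), (5, "Value 5"), (6, "Value 6")]


def merge_subtree_data(buffered: List[Pair], stored: List[Pair]) -> List[Pair]:
    """Combine both data sets in memory and sort the result by key."""
    combined = dict(stored)
    # Assumption: on a duplicate key, the buffered (newer) value replaces the stored one.
    combined.update(dict(buffered))
    return sorted(combined.items())


MERGED = merge_subtree_data(FIRST_AREA, SECOND_AREA)
```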
  • the data indexed by the key value corresponding to the first subtree refers to the data corresponding to the key value in the range of the key value of the first subtree.
  • For example, if the key values corresponding to the first subtree range from 10 to 25 and the first key value of the first data is 15, then the first key value 15 lies within the key value range 10-25 of the first subtree, and the data indexed by the key values corresponding to the first subtree includes the first data.
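  • Mapping a key value to the subtree that covers it can be sketched as follows. The range 10 to 25 for the first subtree comes from the example above; the range given for the second subtree, the inclusive bounds, and the function name `find_subtree` are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# Key value ranges per subtree; only the first entry comes from the text,
# the second one is a hypothetical value added for contrast.
RANGES: Dict[str, Tuple[int, int]] = {
    "first subtree": (10, 25),
    "second subtree": (26, 40),
}


def find_subtree(key: int, ranges: Dict[str, Tuple[int, int]]) -> Optional[str]:
    """Return the name of the subtree whose key value range contains the key."""
    for name, (low, high) in ranges.items():
        # Assumption: both ends of the range are inclusive.
        if low <= key <= high:
            return name
    return None
```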
  • When data is stored in the second storage area, the key-value pair (Key, Value) of the data may be stored in the data block as shown in FIG. 5. For example, the data in the dotted box in the lower left corner of the second storage area is stored as the key-value pair (Key 3, Value 3).
  • Alternatively, since the leaf node F already stores the key Key 3 of the data, only Value 3 may be stored in the data block; that is, the dotted box in the lower left corner of the second storage area stores only Value 3 rather than the full (Key 3, Value 3).
  • the embodiment of the present application does not limit the data storage form in the data block.
  • While data is temporarily held in the first storage area, it may also be managed by means of an index tree.
  • For example, the small squares in the index area of the first storage area in FIG. 5 may represent three child nodes whose parent node is node C. These three child nodes each manage data of a different index range, and they may in turn have other child nodes, which are not shown here.
  • The index tree with node A as the root node mentioned above is the index tree for accessing data in the second storage area; the index tree used for temporarily managing data in the first storage area is a different index tree.
  • the index tree in the embodiment of the present application refers to an index tree for accessing data of the second storage area, that is, a 5-layer index tree with node A as the root node in FIG. 5, unless otherwise specified.
  • When data is read, the second key value Key 3 of the second data to be read is first searched for in the index area of the first storage area.
  • If Key 3 is found in the first storage area, the second data (Key 3, Value 3) is read from the data area of the first storage area according to Key 3.
  • Otherwise, the leaf node where Key 3 is located is searched for layer by layer, starting from the root node of the index tree, that is, node A. Assuming that Key 3 is found in the leaf node F in the second storage area, the data (Key 3, Value 3) is read from the data block in the second storage area that corresponds to Key 3.
  • If the index range corresponding to each read operation covers multiple subtrees, and those subtrees include a certain number of cold subtrees (subtrees that are accessed too infrequently) or subtrees with few leaf nodes, these subtrees can be merged.
  • the data indexed by the key values corresponding to the subtrees needs to be written into the second storage area from the first storage area before the multiple subtrees are merged.
  • the process of merging multiple subtrees is the same as the process of merging multiple subtrees in the prior art. For brevity, no further details are provided here.
  • For example, if the index range corresponding to each read operation covers the first subtree and the second subtree, and at least one of the two subtrees has fewer than a certain number of leaf nodes or is a cold subtree, the first subtree and the second subtree may be merged. By this point, the data indexed by the key values corresponding to the first subtree and the second subtree has already been written from the first storage area to the second storage area. After merging, the first subtree and the second subtree form a new subtree, whose parent node is also the node B.
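  • The merge decision for two subtrees could be expressed as in the sketch below. The `Subtree` type, the thresholds `min_access` and `min_leaves`, and the way access counts are combined are all hypothetical; the source only says that cold subtrees or subtrees with few leaf nodes may be merged, and leaves the merging procedure itself to the prior art.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subtree:
    leaf_keys: List[int]  # keys held in the leaf nodes of this subtree
    access_count: int     # how often the subtree has been accessed


def should_merge(a: Subtree, b: Subtree, min_access: int, min_leaves: int) -> bool:
    """True if at least one subtree is cold or has too few leaf keys."""
    return any(
        s.access_count < min_access or len(s.leaf_keys) < min_leaves
        for s in (a, b)
    )


def merge(a: Subtree, b: Subtree) -> Subtree:
    """Form one new subtree that covers the keys of both inputs."""
    return Subtree(sorted(a.leaf_keys + b.leaf_keys), a.access_count + b.access_count)


COLD = Subtree(leaf_keys=[10, 15], access_count=2)
HOT = Subtree(leaf_keys=[30, 40, 50], access_count=90)
```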
  • The sequence numbers of the foregoing processes do not imply an order of execution. The order in which the processes are executed should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • a storage device according to an embodiment of the present application will be described below with reference to FIG. 6 to FIG. 8.
  • the technical features described in the method embodiments may be applied to the following device embodiments.
  • FIG. 6 is a schematic block diagram of a storage device 600 in accordance with an embodiment of the present application.
  • the storage device includes a storage module 610 and a processing module 620.
  • the storage module 610 includes a first storage area and a second storage area, and the data read/write speed of the first storage area is higher than that of the second storage area. The processing module 620 is configured to:
  • the leaf node includes information of data stored in the second storage area, and M and n are both positive integers with n less than or equal to M; write the first data from the first storage area to the second storage area; and update the first subtree according to the first key value, where the first leaf node of the updated first subtree includes information of the first data.
  • The storage area in the storage device is partitioned, with the read/write performance of the first storage area better than that of the second storage area. The nodes of the first n layers of the index tree are stored in the first storage area, and the root node of the first subtree of the index tree is also located in the first n layers. During a data update, compared with the prior-art approach in which the index tree must be merged layer by layer at every level, the corresponding first subtree can be found quickly from the first key value of the data to be written, which improves the search speed. Moreover, in the present application, data in the first storage area and the second storage area can be merged on the basis of the first subtree, which reduces the write amplification caused by layer-by-layer merging in the prior art.
  • The processing module 620 is further configured to: receive a write request, where the write request includes the first data to be written and the first key value; and write the first data and the first key value to the first storage area.
  • Newly written data is first written into the first storage area, whose read/write performance is better, which increases the data writing speed.
  • Transferring data from the first storage area to the second storage area also reduces the write amplification caused by updating data directly in the second storage area, thereby improving data storage performance.
  • The processing module 620 is further configured to: receive a read request, where the read request includes a second key value of second data; when the second data is not found in the first storage area according to the second key value, search the index tree according to the second key value to obtain a second subtree corresponding to the second key value, where the root node of the second subtree is located in the first n layers of the index tree; and read the second data from the second storage area according to the information of the second data included in a second leaf node of the second subtree, where the second leaf node is the leaf node found according to the second key value.
  • the processing module 620 is specifically configured to: when the remaining storage space of the first storage area is less than a first threshold, and the number of read/write times of the first subtree meets the second threshold, The first data is written from the first storage area to the second storage area.
  • the first storage area comprises a non-volatile storage medium.
  • FIG. 7 is a schematic block diagram of a storage device 700 in accordance with an embodiment of the present application.
  • the storage device may include the storage device 600 shown in FIG. 6, which may be, for example, a device for storing data, such as a computer, a server, or the like.
  • the storage device 700 includes a processor 710, a transceiver 720, and a memory 730, wherein the processor 710, the transceiver 720, and the memory 730 communicate with each other through an internal connection path.
  • the memory 730 is used to store data and instructions in the file, and the processor 710 is configured to execute instructions stored in the memory 730 to control the transceiver 720 to receive signals or transmit signals.
  • the memory 730 includes a first storage area and a second storage area. The data read/write speed of the first storage area is higher than the data read/write speed of the second storage area.
  • the processor 710 is configured to:
  • the leaf node includes information of data stored in the second storage area, and M and n are both positive integers with n less than or equal to M; write the first data from the first storage area to the second storage area; and update the first subtree according to the first key value, where the first leaf node of the updated first subtree includes information of the first data.
  • The storage area in the storage device is partitioned, with the read/write performance of the first storage area better than that of the second storage area. The nodes of the first n layers of the index tree are stored in the first storage area, and the root node of the first subtree of the index tree is also located in the first n layers. During a data update, compared with the prior-art approach in which the index tree must be merged layer by layer at every level, the corresponding first subtree can be found quickly from the first key value of the data to be written, which improves the search speed. Moreover, in the present application, data in the first storage area and the second storage area can be merged on the basis of the first subtree, which reduces the write amplification caused by layer-by-layer merging in the prior art.
  • The processor 710 is further configured to: receive a write request, where the write request includes the first data to be written and the first key value; and write the first data and the first key value to the first storage area.
  • The processor 710 is further configured to: receive a read request, where the read request includes a second key value of second data; when the second data is not found in the first storage area according to the second key value, search the index tree according to the second key value to obtain a second subtree corresponding to the second key value, where the root node of the second subtree is located in the first n layers of the index tree; and read the second data from the second storage area according to the information of the second data included in a second leaf node of the second subtree, where the second leaf node is the leaf node found according to the second key value.
  • the processor 710 is specifically configured to: when the remaining storage space of the first storage area is less than a first threshold, and the number of read/write times of the first subtree meets the second threshold, The first data is written from the first storage area to the second storage area.
  • the first storage area comprises a non-volatile storage medium.
  • The processor 710 may be a central processing unit (CPU), or it may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • the memory 730 can include read only memory and random access memory and provides instructions and data to the processor 710. A portion of the memory 730 may also include a non-volatile random access memory.
  • each step of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 710 or an instruction in a form of software.
  • The steps of the method disclosed in the embodiments of the present application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor 710.
  • the software module can be located in a conventional storage medium such as random access memory, flash memory, read only memory, programmable read only memory or electrically erasable programmable memory, registers, and the like.
  • the storage medium is located in memory 730, and processor 710 reads the information in memory 730 and, in conjunction with its hardware, performs the steps of the above method. To avoid repetition, it will not be described in detail here.
  • The storage device 700 according to the embodiment of the present application may correspond to the storage device that performs the foregoing method 200 and to the storage device 600 according to the embodiment of the present application, and each unit or module in the storage device 700 is respectively used to perform the operations or processes performed by the storage device in the method 200. To avoid redundancy, a detailed description is omitted here.
  • FIG. 8 is a schematic structural diagram of a system chip according to an embodiment of the present application.
  • the system chip 800 of FIG. 8 includes an input interface 801, an output interface 802, at least one processor 803, and a memory 804.
  • the input interface 801, the output interface 802, the processor 803, and the memory 804 are interconnected by an internal connection path.
  • the processor 803 is configured to execute code in the memory 804. When the code is executed, the processor 803 can implement the method 200 performed by the storage device in a method embodiment. For the sake of brevity, it will not be repeated here.
  • the disclosed systems, devices, and methods may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • For example, the division into units is only a division by logical function.
  • In actual implementation there may be other ways of dividing; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device, or unit, and may be electrical, mechanical, or in another form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • If the functions are implemented in the form of a software functional unit and sold or used as a standalone product, they may be stored in a computer-readable storage medium.
  • Based on such an understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application.
  • The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a data update method and a storage device. The method is performed by a storage device comprising a first storage area and a second storage area, where the data read/write speed of the first storage area is higher than that of the second storage area. The method comprises: searching an index tree according to a first key value of first data to obtain a first subtree corresponding to the first key value, where the first data is stored in the first storage area, the index tree comprises M layers, the nodes of the first n layers of the index tree are stored in the first storage area, the root node of the first subtree is located in the first n layers of the index tree, and the leaf nodes of the first subtree include information of data stored in the second storage area; writing the first data from the first storage area to the second storage area; and updating the first subtree according to the first key value, where the first leaf node of the updated first subtree includes information of the first data. The data update method provided by the present invention can improve the storage performance of the storage device.
PCT/CN2017/083657 2017-05-09 2017-05-09 Data update method and storage device WO2018205151A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201780070813.XA CN110168532B (zh) 2017-05-09 2017-05-09 Data update method and storage device
PCT/CN2017/083657 WO2018205151A1 (fr) 2017-05-09 2017-05-09 Data update method and storage device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/083657 WO2018205151A1 (fr) 2017-05-09 2017-05-09 Data update method and storage device

Publications (1)

Publication Number Publication Date
WO2018205151A1 (fr) 2018-11-15

Family

ID=64104067

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/083657 WO2018205151A1 (fr) 2017-05-09 2017-05-09 Data update method and storage device

Country Status (2)

Country Link
CN (1) CN110168532B (fr)
WO (1) WO2018205151A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115086168A (zh) * 2022-08-19 2022-09-20 北京全路通信信号研究设计院集团有限公司 一种车载设备通信参数更新存储方法、系统

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111104403B (zh) * 2019-11-30 2022-06-07 北京浪潮数据技术有限公司 一种lsm树数据处理方法、系统、设备及计算机介质
CN111131015B (zh) * 2019-12-27 2021-09-03 芯启源(南京)半导体科技有限公司 一种基于PC-Trie动态更新路由的方法
CN111475507B (zh) * 2020-03-31 2022-06-21 浙江大学 一种工作负载自适应单层lsmt的键值数据索引方法
CN111857582B (zh) * 2020-07-08 2024-04-05 平凯星辰(北京)科技有限公司 一种键值存储系统
CN114626532B (zh) * 2020-12-10 2023-11-03 本源量子计算科技(合肥)股份有限公司 基于地址读取数据的方法、装置、存储介质及电子装置
CN115374127B (zh) * 2022-10-21 2023-04-28 北京奥星贝斯科技有限公司 数据存储方法及装置

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030093613A1 (en) * 2000-01-14 2003-05-15 David Sherman Compressed ternary mask system and method
CN104090942A (zh) * 2014-06-30 2014-10-08 中国电子科技集团公司第三十二研究所 应用于网络处理器中的Trie搜索方法及装置
CN104899297A (zh) * 2015-06-08 2015-09-09 南京航空航天大学 具有存储感知的混合索引结构
CN105447059A (zh) * 2014-09-29 2016-03-30 华为技术有限公司 一种数据处理方法及装置

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101751406B (zh) * 2008-12-18 2012-01-04 赵伟 一种实现基于列存储的关系型数据库的方法及装置
CN102591864B (zh) * 2011-01-06 2015-03-25 上海银晨智能识别科技有限公司 比对系统中的数据更新方法及装置


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115086168A (zh) * 2022-08-19 2022-09-20 北京全路通信信号研究设计院集团有限公司 一种车载设备通信参数更新存储方法、系统
CN115086168B (zh) * 2022-08-19 2022-11-22 北京全路通信信号研究设计院集团有限公司 一种车载设备通信参数更新存储方法、系统

Also Published As

Publication number Publication date
CN110168532B (zh) 2021-08-20
CN110168532A (zh) 2019-08-23

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17909104

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17909104

Country of ref document: EP

Kind code of ref document: A1