WO2018205151A1

WO2018205151A1 - Data updating method and storage device

Info

Publication number: WO2018205151A1
Application number: PCT/CN2017/083657
Authority: WO
Inventors: 徐君 (Xu Jun); 于群 (Yu Qun); 王元钢 (Wang Yuangang); 薛常亮 (Xue Changliang)
Original assignee: 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date: 2017-05-09
Filing date: 2017-05-09
Publication date: 2018-11-15
Also published as: CN110168532B; CN110168532A

Abstract

The present application provides a data updating method and a storage device. The method is performed by a storage device comprising a first storage area and a second storage area, where the data read/write speed of the first storage area is higher than that of the second storage area. The method comprises: searching an index tree according to a first key value of first data to obtain a first subtree corresponding to the first key value, wherein the first data is stored in the first storage area, the index tree comprises M layers, the nodes of the first n layers of the index tree are stored in the first storage area, the root node of the first subtree is located in the first n layers of the index tree, and the leaf nodes of the first subtree comprise information of data stored in the second storage area; writing the first data from the first storage area to the second storage area; and updating the first subtree according to the first key value, wherein a first leaf node of the updated first subtree comprises information of the first data. The data updating method provided by the present application can improve the storage performance of the storage device.

Description

Data update method and storage device

Technical field

The present application relates to the field of computer storage, and in particular, to a data update method and a storage device.

Background art

In a storage system that supports key-value (KV) storage, the value corresponding to a key can be determined quickly by looking up the key, which enables large-scale, real-time service processing. The log-structured merge tree (LSM tree) is the main data structure used by KV databases. When data needs to be updated, the LSM tree converts random writes into sequential writes by merging data layer by layer. However, because storage devices such as disks store data in units of blocks, every read and write operation is performed in units of blocks. Consequently, during layer-by-layer merging, the entire data block related to the data must be read into memory, merged with the data, and then written back to the disk. This causes severe write amplification in the system and limits improvements in data storage performance.

Summary of the invention

The present application provides a data updating method and a storage device that can improve data read/write efficiency.

In a first aspect, a data update method is provided. The method is performed by a storage device including a first storage area and a second storage area, where the data read/write speed of the first storage area is higher than the data read/write speed of the second storage area. The method includes: searching an index tree according to a first key value of first data to obtain a first subtree corresponding to the first key value, where the first data is stored in the first storage area, the index tree includes M layers, the nodes of the first n layers of the index tree are stored in the first storage area, the root node of the first subtree is located in the first n layers of the index tree, the leaf nodes of the first subtree include information of data stored in the second storage area, M and n are positive integers, and n is less than or equal to M; writing the first data from the first storage area to the second storage area; and updating the first subtree according to the first key value, where a first leaf node of the updated first subtree includes information of the first data.

In the data update method provided by the present application, the storage space of the storage device is partitioned so that the read/write performance of the first storage area is better than that of the second storage area. The nodes of the first n layers of the index tree are stored in the first storage area, and the root node of the first subtree of the index tree is also located in the first n layers. Therefore, during a data update, the corresponding first subtree can be found quickly from the first key value of the data to be written, instead of merging the index trees of every level layer by layer as in the prior art, which improves search speed. Moreover, data in the first storage area and the second storage area can be merged on a per-subtree basis, which reduces the write amplification caused by layer-by-layer merging in the prior art.

It should be understood that because the first n layers of the index tree are stored in the first storage area, the root node of the first subtree is located in the first n layers, and n is less than or equal to the total number of layers M of the index tree, the root node of the first subtree is also located in the first storage area. Specifically, the root node of the first subtree is located in the nth layer of the index tree, that is, the last of the n layers stored in the first storage area. For example, if the index tree has 5 layers in total and the first 3 layers are stored in the first storage area, the root node of the first subtree is located in the third layer of the index tree.
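The layer placement described above can be illustrated with a short Python sketch. It assumes layers are numbered from 1 at the root and that subtree roots sit on layer n; the function names `storage_area_for_layer` and `subtree_root_layer` are hypothetical and are not defined in the application.

```python
def storage_area_for_layer(layer: int, m: int, n: int) -> str:
    """Return the storage area holding a given layer of an M-layer index tree.

    Assumption for this sketch: layers are numbered 1..m from the root down,
    and the first n layers (1 <= n <= m) are kept in the first storage area.
    """
    if not 1 <= n <= m:
        raise ValueError("n must satisfy 1 <= n <= m")
    if not 1 <= layer <= m:
        raise ValueError("layer must satisfy 1 <= layer <= m")
    return "first" if layer <= n else "second"


def subtree_root_layer(n: int) -> int:
    """In this sketch, every subtree root is on layer n, the last fast layer."""
    return n


# Example from the text: M = 5 layers, the first 3 in the first storage area.
placement = {layer: storage_area_for_layer(layer, 5, 3) for layer in range(1, 6)}
print(placement)
# {1: 'first', 2: 'first', 3: 'first', 4: 'second', 5: 'second'}
print(storage_area_for_layer(subtree_root_layer(3), 5, 3))  # first
```

When n equals M, every layer maps to the first storage area, which corresponds to keeping the whole index tree in fast storage.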

It should also be understood that "the first subtree corresponding to the first key value" means that the first key value falls within the range of key values covered by the first subtree.

Optionally, the information of the first data may include at least one of the following: the value of the first data, the first key value of the first data, the address (or link) of the first data, the address (or link) of the first key value, and so on.

Optionally, the information of the second data may include at least one of the following: the value of the second data, the second key value of the second data, the address (or link) of the second data, the address (or link) of the second key value, and so on.

Optionally, in an implementation of the first aspect, the method further includes: receiving a write request, where the write request includes the first data to be written and the first key value; and writing the first data and the first key value into the first storage area.

In this manner, newly written data is first written into the first storage area, whose read/write performance is better, which increases the data write speed. Moreover, transferring data from the first storage area to the second storage area after it has been written into the first storage area reduces the write amplification that would be caused by updating data directly in the second storage area, which improves data storage performance.

Optionally, in an implementation of the first aspect, the method further includes: receiving a read request, where the read request includes a second key value of second data; when the second data is not found in the first storage area according to the second key value, searching the index tree according to the second key value to obtain a second subtree corresponding to the second key value, where the root node of the second subtree is located in the first n layers of the index tree; and reading the second data from the second storage area according to information of the second data included in a second leaf node of the second subtree, where the second leaf node is the leaf node found according to the second key value.

Optionally, in an implementation of the first aspect, writing the first data from the first storage area to the second storage area includes: when the remaining storage space of the first storage area is smaller than a first threshold and the number of reads and writes of the first subtree satisfies a second threshold, writing the first data from the first storage area to the second storage area.

Optionally, if multiple read operations are performed on the index tree, the index range of each read operation covers multiple subtrees, and among these subtrees there are a certain number of cold subtrees (subtrees with a low access frequency) or subtrees with few leaf nodes, these subtrees may be merged. Before the subtrees are merged, the data indexed by the key values corresponding to these subtrees needs to be written from the first storage area to the second storage area.
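The merge decision described above might look like the following Python sketch. The frequency and leaf-count thresholds, the `min_count` parameter, and the `SubtreeInfo` fields are assumptions for illustration; the application does not specify how "a certain number" of subtrees or a "low" access frequency is measured.

```python
from dataclasses import dataclass


@dataclass
class SubtreeInfo:
    """Hypothetical summary of one subtree covered by range reads."""
    name: str
    access_frequency: float  # accesses per unit time (unit is an assumption)
    leaf_count: int


def should_merge(covered, freq_threshold, leaf_threshold, min_count):
    """Decide whether the subtrees covered by range reads should be merged.

    A subtree counts toward the decision if it is cold (access frequency
    below freq_threshold) or small (fewer than leaf_threshold leaves).
    Before an actual merge, data indexed by these subtrees' keys would first
    be written from the first storage area to the second storage area.
    """
    cold_or_small = sum(
        1
        for s in covered
        if s.access_frequency < freq_threshold or s.leaf_count < leaf_threshold
    )
    return cold_or_small >= min_count


covered = [
    SubtreeInfo("A", access_frequency=0.1, leaf_count=40),  # cold
    SubtreeInfo("B", access_frequency=5.0, leaf_count=3),   # few leaves
    SubtreeInfo("C", access_frequency=8.0, leaf_count=50),  # neither
]
print(should_merge(covered, freq_threshold=1.0, leaf_threshold=10, min_count=2))  # True
```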

In a second aspect, a storage device is provided, which can be used to perform the data update method described in the first aspect and its implementations. The storage device includes a storage module and a processing module. The storage module includes a first storage area and a second storage area, and the data read/write speed of the first storage area is higher than that of the second storage area. The processing module is configured to: search an index tree according to a first key value of first data to obtain a first subtree corresponding to the first key value, where the first data is stored in the first storage area, the index tree includes M layers, the nodes of the first n layers of the index tree are stored in the first storage area, the root node of the first subtree is located in the first n layers of the index tree, the leaf nodes of the first subtree include information of data stored in the second storage area, M and n are positive integers, and n is less than or equal to M; write the first data from the first storage area to the second storage area; and update the first subtree according to the first key value, where a first leaf node of the updated first subtree includes information of the first data.

Optionally, in an implementation of the second aspect, the processing module is further configured to: receive a write request, where the write request includes the first data to be written and the first key value; and write the first data and the first key value into the first storage area.

Optionally, in an implementation of the second aspect, the processing module is further configured to: receive a read request, where the read request includes a second key value of second data; when the second data is not found in the first storage area according to the second key value, search the index tree according to the second key value to obtain a second subtree corresponding to the second key value, where the root node of the second subtree is located in the first n layers of the index tree; and read the second data from the second storage area according to information of the second data included in a second leaf node of the second subtree, where the second leaf node is the leaf node found according to the second key value.

Optionally, in an implementation of the second aspect, the processing module is specifically configured to: when the remaining storage space of the first storage area is smaller than a first threshold and the number of reads and writes of the first subtree satisfies a second threshold, write the first data from the first storage area to the second storage area.

Optionally, in an implementation manner of the second aspect, the first storage area includes a non-volatile storage medium.

In a third aspect, a storage device is provided, including a transceiver, a processor, and a memory. The memory stores a program, and the processor executes the program to perform the processes in the data update method described in the first aspect and its implementations.

In a fourth aspect, a computer is provided, including a processor and a memory. The memory is configured to store computer-executable instructions, and the processor and the memory communicate with each other through an internal connection path. When the computer runs, the processor executes the computer-executable instructions stored in the memory, so that the computer performs the data update method described in the first aspect and its implementations.

In a fifth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores a program, and the program causes an apparatus to perform the method in the first aspect or any of its implementations.

In a sixth aspect, a system chip is provided. The system chip includes an input interface, an output interface, a processor, and a memory. The processor is configured to execute instructions stored in the memory, and when the instructions are executed, the processor can implement the method in the first aspect or any of its implementations.

Description of drawings

FIG. 1 is a schematic diagram of data storage in the prior art.

FIG. 2 is a schematic flowchart of a data update method according to an embodiment of the present application.

FIG. 3 is a schematic flowchart of a data update method according to another embodiment of the present application.

FIG. 4 is a schematic flowchart of a data update method according to another embodiment of the present application.

FIG. 5 is a schematic structural diagram of a storage device according to an embodiment of the present application.

FIG. 6 is a schematic block diagram of a storage device according to an embodiment of the present application.

FIG. 7 is a schematic structural diagram of a storage device according to an embodiment of the present application.

FIG. 8 is a schematic structural diagram of a system chip according to an embodiment of the present application.

Detailed description

The technical solutions in the present application will be described below with reference to the accompanying drawings.

It should be understood that the data update method described in the embodiments of the present application may be applied to a storage system that supports key-value storage. In such a system, data is stored as key-value pairs, and multiple key-value pairs are stored in corresponding files. By looking up the key (Key) of a key-value pair, the corresponding value (Value) can be determined quickly, which enables large-scale, real-time service processing.

In the prior art, when data needs to be updated, the updated data is first written sequentially to a log on disk, and the update is then applied in the memory cache. As shown in FIG. 1, when the amount of data in memory reaches a threshold, the data is transferred directly to the level 0 files on disk. When the data volume of the level 0 files accumulates to a certain extent, the level 0 files are merged with the level 1 files, the merged new files are stored in level 1, and redundant data is deleted. When the data volume of the level 1 files accumulates to a certain extent, they are merged with the level 2 files, the merged new files are stored in level 2, and redundant data is deleted; and so on, eventually forming a smaller number of files with larger capacity. Such layer-by-layer merging causes significant write amplification. For example, if only a small amount of data in an upper level needs to be merged into the next level, ideally only that small amount of data would need to be written. However, because storage devices such as disks store data in units of blocks, every read and write operation must be performed in units of data blocks. The entire data block associated with the data must therefore be read into memory, merged with the data, and written back to disk, so that a whole block is written even though only a small amount of data changed. Layer-by-layer merging therefore leads to severe write amplification, which further limits data storage performance.
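The block-granularity effect can be quantified as the ratio of bytes physically written to bytes logically changed. The Python sketch below assumes the changed bytes start at a block boundary and counts a single merge step only; the 4 KiB block and 100-byte change are illustrative numbers, not figures from the application. In a multi-level LSM tree the same data is rewritten again at each level, so the total amplification accumulates across levels.

```python
import math


def write_amplification(changed_bytes: int, block_size: int) -> float:
    """Bytes physically rewritten divided by bytes logically changed.

    Assumes the changed bytes start at a block boundary, so
    ceil(changed_bytes / block_size) whole blocks are read, merged in
    memory, and written back.
    """
    if changed_bytes <= 0 or block_size <= 0:
        raise ValueError("sizes must be positive")
    blocks = math.ceil(changed_bytes / block_size)
    return blocks * block_size / changed_bytes


print(write_amplification(100, 4096))   # 40.96: a 100-byte change rewrites a 4 KiB block
print(write_amplification(8192, 4096))  # 1.0: whole blocks changed, no amplification
```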

For a read operation, when data needs to be read, the key corresponding to the data is first searched for in the in-memory table (memtable). If the key is not found there, the index files of each level, that is, the sorted string tables (SSTables), are checked in reverse order until the key is found. Although each SSTable is sorted, the search slows down as the number of SSTables increases: the time complexity is O(K log N), where K is the number of SSTable files and N is the average size of an SSTable. Therefore, the complexity of the read operation also limits data storage performance.
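The prior-art read path can be sketched in Python as follows. Each SSTable is modeled, as an assumption, as a pair of parallel sorted lists (keys, values), and the list of SSTables is ordered from oldest to newest; deletions (tombstones), Bloom filters, and block indexes used by real LSM engines are omitted. Each SSTable probe is a binary search, which is where the O(K log N) bound comes from.

```python
import bisect


def lsm_get(key, memtable, sstables):
    """Look up key in the memtable, then in SSTables from newest to oldest.

    memtable: dict mapping key -> value.
    sstables: list of (keys, values) pairs ordered oldest to newest, where
        keys is sorted and values[i] belongs to keys[i].
    Worst case: K binary searches of O(log N) each, i.e. O(K log N).
    """
    if key in memtable:
        return memtable[key]
    for keys, values in reversed(sstables):  # newest SSTable first
        i = bisect.bisect_left(keys, key)
        if i < len(keys) and keys[i] == key:
            return values[i]
    return None


sstables = [
    (["a", "c", "e"], [1, 3, 5]),  # older SSTable
    (["c", "d"], [30, 40]),        # newer SSTable shadows "c"
]
memtable = {"e": 500}
print(lsm_get("c", memtable, sstables))  # 30
print(lsm_get("e", memtable, sstables))  # 500
print(lsm_get("z", memtable, sstables))  # None
```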

The embodiments of the present application provide a data update method for improving storage performance. It should be understood that a write operation in the embodiments of the present application may include writing new data (put) or updating data (update), and a read operation may include reading data (get), range queries (range query), and other operations.

FIG. 2 is a schematic flowchart of a data update method according to an embodiment of the present application. The method is performed by a storage device including a first storage area and a second storage area, where the data read/write speed of the first storage area is higher than that of the second storage area. The storage device accesses data stored in the second storage area through an index tree. The index tree includes M layers, the first n layers of the index tree are stored in the first storage area, M and n are both positive integers, and n is less than or equal to M. The index tree may be any type of index tree, such as a binary tree or a balanced multi-way search tree (B-tree, B+ tree), which is not limited in this application.

It should be understood that the first storage area may be, for example, storage-class memory (SCM) or another byte-addressable non-volatile storage medium, and the second storage area may be, for example, NAND flash memory (NAND Flash) or a hard disk drive (HDD). Because the first storage area uses a fast storage medium and the second storage area uses a slow one, the data read/write speed of the first storage area is higher than that of the second storage area; that is, the access performance of the first storage area is better than that of the second storage area.

It should also be understood that the index tree is used to access data in the second storage area, but because the upper layers of the index tree are accessed more frequently, the first n layers of the index tree can be stored in the first storage area. When n is smaller than the total number of layers of the index tree, only some layers of the index tree are stored in the first storage area; when n equals the total number of layers, the entire index tree is stored in the first storage area. However, because the first storage area, such as storage-class memory (SCM), is currently limited by manufacturing cost and capacity, it may be preferable to store only part of the index tree, that is, the first few layers, in the first storage area and to store the remaining layers in the second storage area. If the first storage area, such as SCM, becomes sufficiently cheap and large in the future, the entire index tree may also be stored in the first storage area, which is not limited herein.

As shown in FIG. 2, the data update method may include the following steps:

At 210, the index tree is searched according to the first key value of the first data to obtain the first subtree corresponding to the first key value.

The first data is stored in the first storage area, and the index tree includes M layers. The nodes of the first n layers of the index tree are stored in the first storage area, the root node of the first subtree is located in the first n layers of the index tree, the leaf nodes of the first subtree include information of data stored in the second storage area, M and n are positive integers, and n is less than or equal to M.
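Step 210 can be sketched as a descent from the root through the first n layers, stopping at the layer-n node whose key range contains the key. The Python sketch assumes a B+-tree-style node with sorted separator keys; the `Node` class, its fields, and `find_subtree_root` are hypothetical names used only for illustration.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical index node: child i covers separators[i-1] <= key < separators[i]."""
    label: str
    separators: list = field(default_factory=list)
    children: list = field(default_factory=list)  # len(separators) + 1 children


def find_subtree_root(root: Node, key, n: int) -> Node:
    """Descend n - 1 levels from the root (layer 1) to the layer-n node covering key.

    Because only layers 1..n are visited, this search touches only nodes held
    in the first storage area. The returned node is the subtree root.
    """
    node = root
    for _ in range(n - 1):
        i = bisect.bisect_right(node.separators, key)
        node = node.children[i]
    return node


# Three layers, all assumed to be in the first storage area (n = 3).
a, b, c, d = (Node(x) for x in "ABCD")
left = Node("L", [50], [a, b])
right = Node("R", [150], [c, d])
root = Node("root", [100], [left, right])
print(find_subtree_root(root, 120, 3).label)  # C
print(find_subtree_root(root, 49, 3).label)   # A
```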

The following briefly describes the nodes of an index tree. The nodes of an index tree fall into four categories: root nodes, leaf nodes, parent nodes, and child nodes. A child node is a node on the level directly below its parent node. If a node is connected to a node on the level above it, that upper node is called its parent node; if there is no such upper node, the node has no parent node. A node with no child nodes is called a leaf node. A node with no other node above it is called the root node.
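The node relationships above can be expressed with a small Python class. The class and method names are illustrative and are not part of the application.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class TreeNode:
    """A tree node that knows its parent and its children."""
    name: str
    parent: Optional["TreeNode"] = field(default=None, repr=False)
    children: List["TreeNode"] = field(default_factory=list, repr=False)

    def add_child(self, child: "TreeNode") -> "TreeNode":
        child.parent = self  # this node becomes the child's parent node
        self.children.append(child)
        return child

    def is_root(self) -> bool:
        return self.parent is None  # no node above it

    def is_leaf(self) -> bool:
        return not self.children  # no child nodes


root = TreeNode("root")
inner = root.add_child(TreeNode("inner"))
leaf = inner.add_child(TreeNode("leaf"))
print(root.is_root(), inner.is_leaf(), leaf.is_leaf())  # True False True
```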

It should also be understood that in a storage system that supports key-value storage, data may be stored as key-value pairs. In the embodiments of the present application, the data stored in the first storage area and the second storage area may include the value of the data and the index corresponding to the data. For example, the first data mentioned herein includes the value of the first data and the index of the first data, where the index of the first data is the key (Key) of a key-value pair and the value of the first data is the value (Value) of that pair. The Value corresponding to a Key can be found quickly through the Key. Different data can be managed by the index tree: when data is read or written, the index tree and the index of the data are used to determine the data block in which the data is located, so that the data can be accessed.

For example, a student management database stores students' student numbers and names, where the student number is the key (Key) and the name is the value (Value). To write new data into the database, put(0600100, Chen Meiling) is executed, which adds the key-value pair (0600100, Chen Meiling) to the database.

To update data in the student management database, assume that (0600100, Chen Meiling) has already been written, that is, the key-value pair (0600100, Chen Meiling) has been added to the database, and Chen Meiling has since been renamed Chen Meimei. The pair (0600100, Chen Meiling) then needs to be updated to (0600100, Chen Meimei) by executing update(0600100, Chen Meimei), after which the entry (0600100, Chen Meiling) in the database becomes (0600100, Chen Meimei). In other words, an update operation changes the value corresponding to an existing Key, whereas a write operation adds a new key-value pair (Key, Value).
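The put/update distinction in this example can be sketched in Python. The `StudentDB` class is hypothetical; it rejects put on an existing key to mirror the distinction drawn in the text, whereas many real key-value stores treat put as insert-or-overwrite. The student number is kept as a string so that its leading zero is preserved.

```python
class StudentDB:
    """Toy key-value store separating put (add a new key) from update."""

    def __init__(self):
        self._data = {}

    def put(self, key: str, value: str) -> None:
        if key in self._data:
            raise KeyError(f"{key} already exists; use update")
        self._data[key] = value

    def update(self, key: str, value: str) -> None:
        if key not in self._data:
            raise KeyError(f"{key} does not exist; use put")
        self._data[key] = value

    def get(self, key: str):
        return self._data.get(key)


db = StudentDB()
db.put("0600100", "Chen Meiling")    # write: adds a new key-value pair
db.update("0600100", "Chen Meimei")  # update: changes the value of an existing key
print(db.get("0600100"))  # Chen Meimei
```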

At 220, the first data is written from the first storage area to the second storage area.

Specifically, the storage device may write the first data stored in the first storage area into the second storage area. That is, the data in the first storage area indexed by the key values corresponding to the first subtree may be merged with the data in the second storage area indexed by the key values corresponding to the first subtree: the data in both areas is rearranged and re-stored in the order of its indexes, and the merged data is stored in the second storage area. Because the data in the first storage area is transferred to the second storage area, the storage space it occupied in the first storage area can be released.
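The per-subtree merge in step 220 resembles a sorted merge of two runs. The Python sketch below assumes that each area's entries for the subtree are available as key-sorted (key, value) lists with unique keys, and that the first-area entry is the newer one when both areas hold the same key; the function name `merge_subtree_data` is hypothetical.

```python
import heapq


def merge_subtree_data(fast_items, slow_items):
    """Merge one subtree's entries from the first (fast) and second (slow) areas.

    Both inputs are lists of (key, value) sorted by key with unique keys.
    Entries from the fast area are treated as newer and override slow-area
    entries with the same key. Returns a key-sorted list for the second area.
    """
    merged = {}
    # Tag each entry with its source: 0 = slow (older), 1 = fast (newer).
    tagged_slow = ((k, 0, v) for k, v in slow_items)
    tagged_fast = ((k, 1, v) for k, v in fast_items)
    for key, _, value in heapq.merge(tagged_slow, tagged_fast):
        merged[key] = value  # for equal keys the fast entry arrives last and wins
    return list(merged.items())


slow = [(1, "a"), (3, "c"), (5, "e")]
fast = [(2, "B"), (3, "C")]
print(merge_subtree_data(fast, slow))
# [(1, 'a'), (2, 'B'), (3, 'C'), (5, 'e')]
```

After the merged run is written to the second storage area, the fast-area entries for this subtree can be discarded to release space.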

Optionally, at 220, writing the first data from the first storage area to the second storage area includes: when the remaining storage space of the first storage area is less than a first threshold and the number of reads and writes of the first subtree satisfies a second threshold, writing the first data from the first storage area to the second storage area.

Specifically, the time at which the first data is written from the first storage area to the second storage area may be determined according to the remaining storage space of the first storage area. When the remaining storage space is insufficient, for example, less than the first threshold, the data transfer is initiated, that is, data is written from the first storage area to the second storage area. The storage device first selects, from all subtrees whose root nodes are located in the first n layers (for example, the nth layer), at least one subtree that satisfies a condition; the at least one subtree includes, for example, the first subtree described above. The storage device then writes the data in the first storage area to the second storage area according to the at least one subtree. In other words, insufficient remaining space in the first storage area triggers a check of whether each subtree in the index tree satisfies the condition, and the data indexed by the key values corresponding to each qualifying subtree whose root node is in the nth layer is written from the first storage area to the second storage area.

The condition that the at least one subtree satisfies may include, for example, any one of the following: the storage space that can be released by writing the data indexed by the key values corresponding to the subtree from the first storage area to the second storage area is greater than a preset threshold; the ratio of the number of write operations to the number of read operations of the subtree is greater than a preset threshold; or the read operation frequency and/or write operation frequency of the subtree is less than a preset threshold. The following uses the first subtree as an example to describe these three conditions.

Mode 1

The storage space that can be released after the subtree merge operation is performed on the first subtree is greater than a preset threshold.

Specifically, the main purpose of writing the first data from the first storage area to the second storage area is to release storage space in the fast storage area. Therefore, when the storage space of the fast storage area is insufficient, the storage device may determine, for each subtree, how much storage space would be released by writing the data indexed by that subtree's key values from the first storage area to the second storage area, and use this to decide which subtrees' data to write. For example, if writing the data (including the first data) indexed by the key values corresponding to the first subtree (including the first key value) from the first storage area to the second storage area would release more storage space than a preset threshold, the storage device may write that data from the first storage area to the second storage area.

Mode 2

The ratio of the number of write operations of the first subtree to the number of read operations of the first subtree is greater than a preset threshold.

Specifically, subtrees with many write operations and few read operations can be preferentially selected, and the data indexed by their key values is written from the first storage area to the second storage area. For example, using the ratio of the number of write operations to the number of read operations of the first subtree as the measure, if this ratio is greater than a preset threshold, the data (including the first data) indexed by the key values corresponding to the first subtree (including the first key value) is written from the first storage area to the second storage area. Because read operations are fast in the first storage area and slow in the second storage area, if the data corresponding to the first subtree in the first storage area is read frequently, keeping that data in the first storage area increases the read speed of the system.

It should be understood that a write operation of the first subtree is one in which the key value of the written data falls within the range of key values corresponding to the first subtree; likewise, a read operation of the first subtree is one in which the key value of the read data falls within that range.

Mode 3

The read operation frequency and/or the write operation frequency corresponding to the first subtree are less than a preset threshold.

Specifically, some subtrees receive relatively few read and write operations, that is, their read operation frequency and/or write operation frequency is very low and the subtree is cold. The data indexed by the key values corresponding to such a subtree may also be written from the first storage area to the second storage area. For example, if the read operation frequency and/or write operation frequency of the first subtree is less than a preset threshold, the data (including the first data) indexed by the key values corresponding to the first subtree (including the first key value) may be written from the first storage area to the second storage area.
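The trigger and the three selection modes can be combined into one decision function. In this Python sketch, the thresholds, the `SubtreeCandidate` fields, and the use of a single combined operation frequency for Mode 3 are assumptions; the application leaves the exact metrics and threshold values open and also allows the read and write frequencies to be checked separately.

```python
from dataclasses import dataclass


@dataclass
class SubtreeCandidate:
    """Hypothetical per-subtree figures gathered for the flush decision."""
    name: str
    releasable_bytes: int   # space freed in the first area if flushed
    writes: int
    reads: int
    ops_per_second: float   # combined read/write frequency


def select_subtrees(candidates, remaining_bytes, space_threshold,
                    release_threshold, write_read_ratio, cold_frequency):
    """Pick subtrees whose data should move from the first to the second area.

    Nothing is selected unless the first area's remaining space is below
    space_threshold. A subtree qualifies if it meets any one condition:
    Mode 1 (enough space released), Mode 2 (write-heavy), Mode 3 (cold).
    """
    if remaining_bytes >= space_threshold:
        return []
    selected = []
    for c in candidates:
        mode1 = c.releasable_bytes > release_threshold
        # Treat a subtree with no reads as arbitrarily write-heavy.
        ratio = c.writes / c.reads if c.reads else float("inf")
        mode2 = ratio > write_read_ratio
        mode3 = c.ops_per_second < cold_frequency
        if mode1 or mode2 or mode3:
            selected.append(c.name)
    return selected


cands = [
    SubtreeCandidate("read_heavy", 1_000, writes=10, reads=500, ops_per_second=50.0),
    SubtreeCandidate("large", 1_000_000, writes=100, reads=100, ops_per_second=20.0),
    SubtreeCandidate("write_heavy", 1_000, writes=400, reads=100, ops_per_second=20.0),
    SubtreeCandidate("cold", 1_000, writes=1, reads=1, ops_per_second=0.01),
]
print(select_subtrees(cands, remaining_bytes=10_000, space_threshold=64_000,
                      release_threshold=4_096, write_read_ratio=2.0,
                      cold_frequency=0.1))
# ['large', 'write_heavy', 'cold']
```

The read-heavy subtree stays in the first storage area, consistent with the reasoning given for Mode 2.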

Optionally, in a specific implementation, input/output (I/O) statistics may be updated each time a write operation or read operation is performed in the storage device. The I/O statistics may include, for example, at least one of the following: the total number of write operations (including put and update operations) for each subtree, the number of read operations, the index ranges of read operations, the number of queries for each index range, and a timestamp for each operation, for example, the time at which a write operation, an update operation, or a read operation was performed.
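A per-subtree statistics record of the kind described could be kept as follows. The `IOStats` structure, the operation names, and the choice to count get and range queries as reads are assumptions for this Python sketch.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class IOStats:
    """Hypothetical per-subtree I/O statistics, updated on every operation."""
    writes: int = 0  # put + update operations
    reads: int = 0   # get + range operations
    range_queries: dict = field(default_factory=lambda: defaultdict(int))
    timestamps: list = field(default_factory=list)  # (operation, time) records

    def record(self, op: str, index_range=None, now=None):
        now = time.time() if now is None else now
        if op in ("put", "update"):
            self.writes += 1
        elif op in ("get", "range"):
            self.reads += 1
            if index_range is not None:
                self.range_queries[index_range] += 1  # queries per index range
        else:
            raise ValueError(f"unknown operation {op!r}")
        self.timestamps.append((op, now))


stats = defaultdict(IOStats)  # subtree id -> statistics
stats["subtree-7"].record("put", now=1.0)
stats["subtree-7"].record("update", now=2.0)
stats["subtree-7"].record("range", index_range=(100, 200), now=3.0)
print(stats["subtree-7"].writes, stats["subtree-7"].reads)  # 2 1
```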

At 230, the first subtree is updated according to the first key value.

The first leaf node of the updated first subtree includes the information of the first data.

Specifically, after the first data is written from the first storage area into the second storage area, the storage device updates the first subtree according to the first key value, so that the first leaf node of the updated first subtree includes the information of the first data. The first leaf node may be any leaf node in the first subtree.
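Step 230 can be sketched as locating the leaf whose key range covers the first key value and recording where the first data now resides. The leaf layout (half-open key ranges), the `FlushedSubtree` and `LeafEntry` names, and the use of an address as the "information of the first data" are assumptions for this Python sketch; the text also allows the value, the key, or links to either.

```python
from dataclasses import dataclass, field


@dataclass
class LeafEntry:
    """Information about one datum held in a leaf (an address, in this sketch)."""
    key: int
    address: int  # hypothetical location of the data in the second storage area


@dataclass
class FlushedSubtree:
    """Hypothetical subtree whose leaves are identified by half-open key ranges."""
    leaf_ranges: list  # [(low, high, leaf_id), ...] with low <= key < high
    leaves: dict = field(default_factory=dict)  # leaf_id -> {key: LeafEntry}

    def leaf_for(self, key: int) -> str:
        for low, high, leaf_id in self.leaf_ranges:
            if low <= key < high:
                return leaf_id
        raise KeyError(key)

    def update(self, key: int, address: int) -> str:
        """Record the new location of the flushed data; return the leaf id."""
        leaf_id = self.leaf_for(key)
        self.leaves.setdefault(leaf_id, {})[key] = LeafEntry(key, address)
        return leaf_id


subtree = FlushedSubtree(leaf_ranges=[(0, 100, "leaf-0"), (100, 200, "leaf-1")])
print(subtree.update(150, address=4096))  # leaf-1
```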

Optionally, the information of the first data may include at least one of the following: the value of the first data, the first key value of the first data, the address (or link) of the first data, the address (or link) of the first key value, and so on.

Therefore, in this embodiment, the storage space of the storage device is partitioned so that the read/write performance of the first storage area is better than that of the second storage area. Because the nodes of the first n layers of the index tree are stored in the first storage area and the root node of the first subtree is also located in the first n layers, the corresponding first subtree can be found quickly from the first key value of the data to be written during a data update, rather than merging the index trees of every level layer by layer as in the prior art, which improves search speed. Moreover, data in the first storage area and the second storage area is merged on a per-subtree basis, which reduces the write amplification caused by layer-by-layer merging in the prior art.

Optionally, as shown in FIG. 3, before 210, the method further includes steps 240 and 250.

At 240, a write request is received, the write request including the first data to be written and the first key value.

At 250, the first data and the first key value are written to the first storage area.

Specifically, when performing a write operation, the storage device may first store the received data in the first storage area for temporary management. Writing data to the first storage area does not involve layer-by-layer merging; instead, data can be written at, for example, byte granularity. As a result, writing data to the first storage area is significantly faster than writing it to the second storage area, and the write amplification problem is avoided.
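A minimal Python sketch of this write path (steps 240 and 250) follows. It assumes, for illustration only, that the first storage area is modeled as an in-memory dictionary with a byte budget; the names `FastTier` and `FastTierFull` are hypothetical. Each put is applied in place and only the byte accounting changes, so no block is read back and rewritten.

```python
class FastTierFull(Exception):
    """Raised when a write would exceed the first storage area's budget."""


class FastTier:
    """Hypothetical first storage area: writes are applied in place to a
    byte-budgeted buffer, with no layer-by-layer merging."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.data = {}

    def put(self, key: bytes, value: bytes):
        old = self.data.get(key)
        old_size = len(key) + len(old) if old is not None else 0
        new_size = len(key) + len(value)
        if self.used - old_size + new_size > self.capacity:
            raise FastTierFull(key)      # a flush to the second area is needed
        self.data[key] = value           # update in place, byte granularity
        self.used += new_size - old_size

    @property
    def free(self):
        return self.capacity - self.used
```

The `free` property is the quantity that a space-threshold check before flushing would read.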

Optionally, as shown in FIG. 4, the method further includes:

At 260, a read request is received, the read request including a second key value of the second data.

At 270, when the second data is not found in the first storage area according to the second key value, the index tree is searched according to the second key value to obtain a second subtree corresponding to the second key value.

The root node of the second subtree is located in the first n layers of the index tree.

At 280, the second data is read from the second storage area according to the information of the second data included in the second leaf node of the second subtree.

The second leaf node is the leaf node found according to the second key value. Optionally, the information of the second data includes at least one of the following: the value of the second data, the second key value of the second data, the address (or link) of the second data, and the address (or link) of the second key value.

Specifically, when data needs to be read, the storage device first searches for the second data in the first storage area according to the second key value, and if the second data is found there, reads it directly. Because part of the data is stored in the first storage area, data that has not yet been written to the second storage area can be found in the first storage area. Since the data access speed of the first storage area is significantly higher than that of the second storage area, data can be read quickly.

If data in the first storage area, for example the second data, has already been written to the second storage area, the storage device may not find the index of the second data in the first storage area. In this case, the storage device searches the index tree according to the second key value to obtain the second subtree corresponding to the second key value, and reads the second data from the second storage area according to the information of the second data included in the second leaf node of the second subtree.
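The read path of steps 260 to 280 can be sketched in Python as below. The sketch assumes, hypothetically, that the first storage area is a dictionary, that `find_subtree` returns the leaf entries (key to slow-tier address) of the subtree responsible for a key, and that the second storage area is a list indexed by address. The first storage area is consulted first, so a buffered copy that has not yet been flushed takes precedence over the older copy in the second storage area.

```python
def read(key, fast_tier, find_subtree, slow_tier):
    """Sketch of steps 260-280.

    fast_tier:    dict acting as the first storage area.
    find_subtree: callable key -> dict mapping keys to slow-tier addresses
                  (the leaf entries of the subtree responsible for key).
    slow_tier:    list acting as the second storage area.
    """
    if key in fast_tier:                 # hit in the first storage area
        return fast_tier[key]
    leaf_entries = find_subtree(key)     # step 270: search the index tree
    address = leaf_entries.get(key)
    if address is None:
        raise KeyError(key)
    return slow_tier[address]            # step 280: read from the slow tier
```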

The data updating method of the embodiments of the present application is described below in detail with reference to the example of FIG. 5. FIG. 5 is a schematic diagram of a data update method according to an embodiment of the present application, showing a first storage area and a second storage area. The first storage area is a storage class memory (SCM) medium, and the second storage area is NAND flash or an HDD. The first storage area includes two parts, a data area and an index area: the data area is used to store data, and the index area is used to store the index tree. It is assumed here that the index tree is a B+ tree, that is, a balanced multi-way search tree. The second storage area is used to store the index tree and data. It should be understood that the first storage area may be a single storage medium divided into a data area and an index area; alternatively, the data area and the index area may use different storage media, which is not limited herein.

The index tree used to find data in the second storage area of the storage device has 5 layers in total: the first 3 layers are stored in the index area of the first storage area, and the last 2 layers are stored in the second storage area. As shown in FIG. 5, the index tree has node A as its root node. The child nodes of node A include node B and node P; the child nodes of node B include node C and node I; the child nodes of node C include node D and node E; and the child nodes of node D include node F, node G, and node H. Node F, node G, and node H are leaf nodes of the entire index tree.
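The following Python sketch illustrates how the layers of such a tree could be assigned to the two storage areas. It uses only the nodes named in the description of FIG. 5 (the figure may contain more), and the function name `partition_tree` and the parent-to-children dictionary are hypothetical. A breadth-first walk places every node at depth n or less in the first storage area and reports the nodes at depth n, which are the roots of the subtrees used as merge units.

```python
from collections import deque


def partition_tree(children, root, n):
    """Assign each node to a storage area by depth (root at depth 1).

    Returns (fast_nodes, slow_nodes, subtree_roots), where subtree_roots are
    the nodes at depth n, i.e. the roots of subtrees whose lower layers live
    in the second storage area.
    """
    fast, slow, subtree_roots = [], [], []
    queue = deque([(root, 1)])
    while queue:
        node, depth = queue.popleft()
        (fast if depth <= n else slow).append(node)
        if depth == n:
            subtree_roots.append(node)
        for child in children.get(node, []):
            queue.append((child, depth + 1))
    return fast, slow, subtree_roots


# Nodes named in the description of FIG. 5 (other nodes omitted).
FIG5 = {"A": ["B", "P"], "B": ["C", "I"], "C": ["D", "E"], "D": ["F", "G", "H"]}
```

With n = 3, nodes A, B, P, C, and I fall in the first storage area, and C and I are the third-layer subtree roots.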

When data is written, for example the first data, the first data is not written to the second storage area as in the prior art; instead, it is written to the data area of the first storage area. Because the data write speed of the first storage area is significantly higher than that of the second storage area, the first data is written there first.

However, as user data keeps being written to the first storage area, its free space keeps shrinking. When the free space of the first storage area falls to a certain level, for example below a space threshold, the storage device writes data from the first storage area to the second storage area to release space in the first storage area. Specifically, when transferring data to the second storage area, the storage device selects, from the plurality of subtrees whose root nodes are located in the third layer, the subtrees that satisfy a certain condition, and writes the data indexed by the key values corresponding to those subtrees from the first storage area to the second storage area. The storage device may determine whether each subtree satisfies the merge condition according to the amount of storage space that would be released once the data indexed by the key values corresponding to the subtree is written from the first storage area to the second storage area, or according to the number of read/write operations corresponding to each subtree.
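A Python sketch of the selection step is shown below. The function name, the parameters, and the choice of "greater than or equal to" as the meaning of "reaches a threshold" are assumptions made for illustration. Either criterion from the text (read/write count or releasable space) can be enabled.

```python
def select_subtrees_to_flush(free_bytes, space_threshold, candidates,
                             rw_threshold=None, min_release=None):
    """Return ids of subtrees whose buffered data should move to the slow tier.

    candidates maps subtree id -> (releasable_bytes, rw_count).
    A subtree is chosen if its read/write count reaches rw_threshold, or,
    failing that, if flushing it would release at least min_release bytes.
    """
    if free_bytes >= space_threshold:
        return []                        # enough space left: no flush yet
    chosen = []
    for sid, (releasable, rw_count) in candidates.items():
        if rw_threshold is not None and rw_count >= rw_threshold:
            chosen.append(sid)
        elif min_release is not None and releasable >= min_release:
            chosen.append(sid)
    return chosen
```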

For example, as shown in FIG. 5, the plurality of subtrees in the index tree whose root nodes are located in the third layer includes the first subtree and the second subtree. When the storage space of the first storage area is insufficient, the storage device determines whether the first subtree and the second subtree satisfy a preset condition, for example, whether the number of read/write operations corresponding to each subtree reaches a certain threshold. Assuming that the first subtree satisfies the preset condition, the storage device writes the data indexed by the key values corresponding to the first subtree from the first storage area to the second storage area. That is, the data belonging to the first subtree in the first storage area and the data belonging to the first subtree in the second storage area are read into memory and sorted by key, and the sorted data is transferred from memory to the second storage area and stored at the corresponding locations there. In this way, storage space in the first storage area is released. As shown in FIG. 5, the data indexed by the key values corresponding to the first subtree in the first storage area includes (Key 1, Value 1), (Key 2, Value 2), and (Key 4, Value 4), and the data indexed by the key values corresponding to the first subtree in the second storage area includes (Key 3, Value 3), (Key 5, Value 5), and (Key 6, Value 6). After these data are rearranged by key, the merged data (Key 1, Value 1), (Key 2, Value 2), (Key 3, Value 3), (Key 4, Value 4), (Key 5, Value 5), and (Key 6, Value 6) is formed and stored in the second storage area.
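The in-memory merge of one subtree's data can be sketched in Python with `heapq.merge`, assuming both inputs are already sorted by key. The description does not say how a key present in both storage areas is handled; the sketch assumes that the copy in the first storage area is newer and keeps it. The function name is hypothetical.

```python
import heapq


def merge_subtree_data(fast_run, slow_run):
    """Merge one subtree's buffered and stored records into one sorted run.

    Both inputs are lists of (key, value) sorted by key, with unique keys
    per list. On a duplicate key, the first-storage-area copy is assumed
    newer and wins.
    """
    tagged = heapq.merge(
        ((k, 0, v) for k, v in fast_run),   # tag 0: newer, sorts first
        ((k, 1, v) for k, v in slow_run),   # tag 1: older copy
    )
    merged = []
    for key, _, value in tagged:
        if merged and merged[-1][0] == key:
            continue                        # older duplicate of a key: drop
        merged.append((key, value))
    return merged
```

With the FIG. 5 data, the buffered run (Key 1, Key 2, Key 4) and the stored run (Key 3, Key 5, Key 6) yield the six pairs in key order.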

Here, the data indexed by the key values corresponding to the first subtree refers to the data whose key values fall within the key value range of the first subtree. For example, if the key value range of the first subtree is 10 to 25 and the first key value of the first data is 15, then the first key value 15 falls within the range 10 to 25 of the first subtree, and the data indexed by the key values corresponding to the first subtree includes the first data.
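The range test can be sketched in Python with the standard `bisect` module, assuming, for illustration, that the subtrees' key ranges are contiguous and sorted, so that each subtree can be described by its smallest key. Only the 10 to 25 range comes from the text; the other ranges and the names are hypothetical.

```python
import bisect


def find_subtree(range_starts, subtree_ids, key):
    """Locate the subtree whose key range contains `key`.

    range_starts[i] is the smallest key of subtree_ids[i]; subtree i covers
    [range_starts[i], range_starts[i + 1]) and the last one is open-ended.
    """
    i = bisect.bisect_right(range_starts, key) - 1
    if i < 0:
        raise KeyError(key)              # below the smallest covered key
    return subtree_ids[i]


STARTS = [10, 26, 40]                    # first subtree covers 10..25
IDS = ["first", "second", "third"]
```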

It should be understood that, in FIG. 5, when the data corresponding to the first subtree is stored in the first storage area and the second storage area, the key-value pair (Key, Value) of each data item may be stored in the data block, as shown in FIG. 5; for example, the data in the dotted box in the lower left corner of the second storage area is stored as the key-value pair (Key 3, Value 3). However, in some cases, because leaf node F already stores the key Key 3, only Value 3 may be stored in the data block; that is, the dotted box in the lower left corner of the second storage area stores only Value 3 rather than the full (Key 3, Value 3). The embodiments of the present application do not limit the form in which data is stored in the data block.

It should also be understood that data temporarily managed in the first storage area may also be managed by an index tree. For example, the small squares in the index area of the first storage area in FIG. 5 may represent three child nodes whose parent node is node C. These three child nodes manage data of different index ranges and may themselves have further child nodes, which are not shown here. The index tree with node A as its root node described above is used to access data in the second storage area, whereas the index tree used to temporarily manage data in the first storage area is a different index tree. Because the amount of data stored in the first storage area is generally small and data is usually read and written at byte granularity, the first storage area may also manage data temporarily without an index tree, using a simple structure such as a list; this is not limited in the embodiments of the present application. Unless otherwise specified, the index tree in the embodiments of the present application refers to the index tree used to access data in the second storage area, that is, the 5-layer index tree with node A as its root node in FIG. 5.

The write operation based on the data update method of the present application has been illustrated above with reference to FIG. 5; the read operation is illustrated below, also with reference to FIG. 5.

For example, to look up the second data (Key 3, Value 3), the storage device first searches the index area of the first storage area for the second key value Key 3 of the second data to be read. If Key 3 is found in the first storage area, the second data (Key 3, Value 3) is read from the data area of the first storage area according to Key 3. If, as shown in FIG. 5, the second key value Key 3 cannot be found in the first storage area, the leaf node containing Key 3 is searched for layer by layer through the index tree, starting from its root node, namely node A. Assuming that Key 3 is found in leaf node F in the second storage area, the data (Key 3, Value 3) is read from the data block in the second storage area that corresponds to Key 3.
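The layer-by-layer descent through a B+ tree can be sketched in Python as follows. This is a generic B+ tree lookup under an assumed separator convention (child i of an internal node covers keys in [keys[i-1], keys[i])), not the storage device's actual node layout, and the `Node` class and example tree are hypothetical.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    keys: List[int]
    children: List["Node"] = field(default_factory=list)  # empty for leaves
    values: List[str] = field(default_factory=list)       # leaves only

    @property
    def is_leaf(self):
        return not self.children


def bplus_search(root, key) -> Optional[str]:
    """Descend from the root one layer at a time until reaching a leaf."""
    node = root
    while not node.is_leaf:
        # Assumed convention: child i covers keys in [keys[i-1], keys[i]).
        node = node.children[bisect.bisect_right(node.keys, key)]
    i = bisect.bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.values[i]
    return None
```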

Optionally, if multiple read operations are performed on the index tree, the index range of each read operation covers multiple subtrees, and among these subtrees there are a certain number of cold subtrees (subtrees with a low access frequency) or subtrees with few leaf nodes, these subtrees may be merged. Before the subtrees are merged, the data indexed by the key values corresponding to them needs to be written from the first storage area to the second storage area. The process of merging multiple subtrees is the same as in the prior art and, for brevity, is not described further here.
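One possible reading of this merge criterion is sketched in Python below. Every name and threshold is an assumption for illustration. Recent read ranges are grouped by the set of subtrees they cover; a group spanning two or more subtrees that recurs at least `min_repeats` times, and that contains at least one cold subtree or one with few leaf nodes, is reported as a merge candidate.

```python
from collections import Counter


def merge_candidates(read_ranges, covered_subtrees, access_freq, leaf_counts,
                     min_repeats=2, cold_freq=5, min_leaves=2):
    """Return groups of subtrees that may be worth merging (sketch).

    read_ranges:      list of (lo, hi) index ranges of recent reads.
    covered_subtrees: callable mapping a range to a tuple of subtree ids.
    access_freq, leaf_counts: dicts keyed by subtree id.
    """
    groups = Counter(covered_subtrees(r) for r in read_ranges)
    candidates = []
    for group, repeats in groups.items():
        if len(group) < 2 or repeats < min_repeats:
            continue
        has_cold_or_small = any(
            access_freq[s] < cold_freq or leaf_counts[s] < min_leaves
            for s in group
        )
        if has_cold_or_small:
            candidates.append(group)
    return candidates
```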

Still taking FIG. 5 as an example, if multiple read operations are performed on the index tree, the index range of each read operation covers the first subtree and the second subtree, and at least one of the first subtree and the second subtree is a cold subtree or has fewer than a certain number of leaf nodes, then the first subtree and the second subtree may be merged. At this point, the data indexed by the key values corresponding to the first subtree and the second subtree has already been written from the first storage area to the second storage area. After the merge, the first subtree and the second subtree form a new subtree whose parent node is still node B.

It should be understood that, in the embodiments of the present application, the sequence numbers of the foregoing processes do not imply an order of execution. The execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.

A storage device according to an embodiment of the present application will be described below with reference to FIG. 6 to FIG. 8. The technical features described in the method embodiments may be applied to the following device embodiments.

FIG. 6 is a schematic block diagram of a storage device 600 according to an embodiment of the present application. As shown in FIG. 6, the storage device includes a storage module 610 and a processing module 620. The storage module 610 includes a first storage area and a second storage area, and the data read/write speed of the first storage area is higher than that of the second storage area. The processing module 620 is configured to:

Search an index tree according to a first key value of first data to obtain a first subtree corresponding to the first key value, where the first data is stored in the first storage area, the index tree includes M layers, the nodes of the first n layers of the index tree are stored in the first storage area, the root node of the first subtree is located in the first n layers of the index tree, the leaf nodes of the first subtree include information of data stored in the second storage area, M and n are both positive integers, and n is less than or equal to M; write the first data from the first storage area to the second storage area; and update the first subtree according to the first key value, where a first leaf node of the updated first subtree includes information of the first data.

Optionally, the processing module 620 is further configured to: receive a write request, where the write request includes the first data to be written and the first key value; and write the first data and the first key value to the first storage area.

In this manner, newly written data is first written to the first storage area, whose read/write performance is better, which increases the data write speed. Moreover, because data is first written to the first storage area and later transferred to the second storage area, the write amplification caused by updating data directly in the second storage area is reduced, which improves data storage performance.

Optionally, the processing module 620 is further configured to: receive a read request, where the read request includes a second key value of second data; when the second data is not found in the first storage area according to the second key value, search the index tree according to the second key value to obtain a second subtree corresponding to the second key value, where the root node of the second subtree is located in the first n layers of the index tree; and read the second data from the second storage area according to information of the second data included in a second leaf node of the second subtree, where the second leaf node is the leaf node found according to the second key value.

Optionally, the processing module 620 is specifically configured to write the first data from the first storage area to the second storage area when the remaining storage space of the first storage area is less than a first threshold and the number of reads/writes of the first subtree satisfies a second threshold.

Optionally, the first storage area comprises a non-volatile storage medium.

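FIG. 7 is a schematic block diagram of a storage device 700 according to an embodiment of the present application. The storage device 700 may include the storage device 600 shown in FIG. 6 and may be, for example, a device for storing data, such as a computer or a server. As shown in FIG. 7, the storage device 700 includes a processor 710, a transceiver 720, and a memory 730, which communicate with each other through an internal connection path. The memory 730 is configured to store data and instructions, and the processor 710 is configured to execute the instructions stored in the memory 730 to control the transceiver 720 to receive or send signals. The memory 730 includes a first storage area and a second storage area, and the data read/write speed of the first storage area is higher than that of the second storage area. The processor 710 is configured to:

Search an index tree according to a first key value of first data to obtain a first subtree corresponding to the first key value, where the first data is stored in the first storage area, the index tree includes M layers, the nodes of the first n layers of the index tree are stored in the first storage area, the root node of the first subtree is located in the first n layers of the index tree, the leaf nodes of the first subtree include information of data stored in the second storage area, M and n are both positive integers, and n is less than or equal to M; write the first data from the first storage area to the second storage area; and update the first subtree according to the first key value, where a first leaf node of the updated first subtree includes information of the first data.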

Optionally, the processor 710 is further configured to: receive a write request, where the write request includes the first data to be written and the first key value; and write the first data and the first key value to the first storage area.

Optionally, the processor 710 is further configured to: receive a read request, where the read request includes a second key value of second data; when the second data is not found in the first storage area according to the second key value, search the index tree according to the second key value to obtain a second subtree corresponding to the second key value, where the root node of the second subtree is located in the first n layers of the index tree; and read the second data from the second storage area according to information of the second data included in a second leaf node of the second subtree, where the second leaf node is the leaf node found according to the second key value.

Optionally, the processor 710 is specifically configured to write the first data from the first storage area to the second storage area when the remaining storage space of the first storage area is less than a first threshold and the number of reads/writes of the first subtree satisfies a second threshold.

Optionally, the first storage area comprises a non-volatile storage medium.

It should be understood that, in the embodiments of the present application, the processor 710 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

The memory 730 can include read only memory and random access memory and provides instructions and data to the processor 710. A portion of the memory 730 may also include a non-volatile random access memory.

In an implementation process, the steps of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 710 or by instructions in the form of software. The steps of the method disclosed in the embodiments of the present application may be directly performed by a hardware processor, or performed by a combination of hardware and software modules in the processor 710. The software module may be located in a conventional storage medium such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 730, and the processor 710 reads information in the memory 730 and performs the steps of the foregoing method in combination with its hardware. To avoid repetition, details are not described here again.

The storage device 700 according to the embodiments of the present application may correspond to the storage device that performs the foregoing method 200 and to the storage device 600 according to the embodiments of the present application, and the units or modules in the storage device 700 are respectively used to perform the operations or processes performed by the storage device in the foregoing method 200. To avoid redundancy, details are not described here again.

FIG. 8 is a schematic structural diagram of a system chip according to an embodiment of the present application. The system chip 800 of FIG. 8 includes an input interface 801, an output interface 802, at least one processor 803, and a memory 804. The input interface 801, the output interface 802, the processor 803, and the memory 804 are interconnected through an internal connection path. The processor 803 is configured to execute code in the memory 804. When the code is executed, the processor 803 can implement the method 200 performed by the storage device in the method embodiments. For brevity, details are not repeated here.

Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the various examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the solution. A person skilled in the art can use different methods to implement the described functions for each particular application, but such implementation should not be considered to be beyond the scope of the present application.

A person skilled in the art can clearly understand that for the convenience and brevity of the description, the specific working process of the system, the device and the unit described above can refer to the corresponding process in the foregoing method embodiment, and details are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.

If the functions are implemented in the form of a software functional unit and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application essentially, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The foregoing descriptions are merely specific implementations of the present application, but are not intended to limit the protection scope of the present application. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A data updating method, performed by a storage device comprising a first storage area and a second storage area, wherein a data read/write speed of the first storage area is higher than a data read/write speed of the second storage area, and the method comprises:

Searching an index tree according to a first key value of first data to obtain a first subtree corresponding to the first key value, wherein the first data is stored in the first storage area, the index tree comprises M layers, nodes of the first n layers of the index tree are stored in the first storage area, a root node of the first subtree is located in the first n layers of the index tree, leaf nodes of the first subtree comprise information of data stored in the second storage area, M and n are positive integers, and n is less than or equal to M;

Writing the first data from the first storage area to the second storage area;

Updating the first subtree according to the first key value, wherein a first leaf node of the updated first subtree comprises information of the first data.
2. The method according to claim 1, further comprising:

Receiving a write request, where the write request includes the first data to be written and the first key value;

Writing the first data and the first key value into the first storage area.
3. The method according to claim 1 or 2, wherein the method further comprises:

Receiving a read request, where the read request includes a second key value of the second data;

When the second data is not found in the first storage area according to the second key value, searching the index tree according to the second key value to obtain a second subtree corresponding to the second key value, wherein a root node of the second subtree is located in the first n layers of the index tree;

Reading the second data from the second storage area according to information of the second data comprised in a second leaf node of the second subtree, wherein the second leaf node is the leaf node found according to the second key value.
4. The method according to any one of claims 1 to 3, wherein the writing the first data from the first storage area to the second storage area comprises:

Writing the first data from the first storage area to the second storage area when the remaining storage space of the first storage area is less than a first threshold and the number of reads/writes of the first subtree satisfies a second threshold.
5. The method according to any one of claims 1 to 4, wherein the first storage area comprises a non-volatile storage medium.
6. A storage device, comprising a storage module and a processing module, wherein the storage module comprises a first storage area and a second storage area, a data read/write speed of the first storage area is higher than a data read/write speed of the second storage area, and the processing module is configured to perform:

Searching an index tree according to a first key value of first data to obtain a first subtree corresponding to the first key value, wherein the first data is stored in the first storage area, the index tree comprises M layers, nodes of the first n layers of the index tree are stored in the first storage area, a root node of the first subtree is located in the first n layers of the index tree, leaf nodes of the first subtree comprise information of data stored in the second storage area, M and n are positive integers, and n is less than or equal to M;

Writing the first data from the first storage area to the second storage area;

Updating the first subtree according to the first key value, wherein a first leaf node of the updated first subtree comprises information of the first data.
7. The storage device according to claim 6, wherein the processing module is further configured to perform:

Receiving a write request, where the write request includes the first data to be written and the first key value;

Writing the first data and the first key value into the first storage area.
8. The storage device according to claim 6 or 7, wherein the processing module is further configured to perform:

Receiving a read request, where the read request includes a second key value of the second data;

When the second data is not found in the first storage area according to the second key value, searching the index tree according to the second key value to obtain a second subtree corresponding to the second key value, wherein a root node of the second subtree is located in the first n layers of the index tree;

Reading the second data from the second storage area according to information of the second data comprised in a second leaf node of the second subtree, wherein the second leaf node is the leaf node found according to the second key value.
9. The storage device according to any one of claims 6 to 8, wherein the processing module is specifically configured to perform:

Writing the first data from the first storage area to the second storage area when the remaining storage space of the first storage area is less than a first threshold and the number of reads/writes of the first subtree satisfies a second threshold.
10. The storage device according to any one of claims 6 to 9, wherein the first storage area comprises a non-volatile storage medium.
11. A computer, comprising a processor and a memory, wherein:

The memory is configured to store computer-executable instructions, the processor and the memory communicate with each other through an internal connection path, and when the computer runs, the processor executes the computer-executable instructions stored in the memory to cause the computer to perform the method according to any one of claims 1 to 5.
12. A computer-readable storage medium, comprising computer-executable instructions that, when executed by a processor of a computer, cause the computer to perform the method according to any one of claims 1 to 5.