WO2023108360A1 - Method and apparatus for managing data in storage system

Info

Publication number: WO2023108360A1
Application number: PCT/CN2021/137522
Authority: WO
Inventors: 张海波; 郭小东; 唐飞龙; 李旭
Original assignee: 华为技术有限公司
Priority date: 2021-12-13
Filing date: 2021-12-13
Publication date: 2023-06-22

Abstract

Embodiments of the present application provide a method and apparatus for managing data in a storage system and relate to the field of data storage. The method and apparatus can save resources for data management. The method comprises: on the basis of content of target data, dividing the target data into M candidate data blocks, M being an integer greater than or equal to 2; then, according to respective fingerprint features of the M candidate data blocks, dividing the M candidate data blocks into N target data blocks, N being a positive integer less than or equal to M, wherein each target data block comprises at least one candidate data block; storing the N target data blocks and fingerprint features of the N target data blocks, the target data blocks and the fingerprint features of the target data blocks being in a one-to-one correspondence; and finally, generating an index tree of the target data according to the N target data blocks, the index tree being used for addressing the content of the target data.

Description

Data management method and device in a storage system

Technical field

The embodiments of the present application relate to the field of data storage, and in particular, to a data management method and device in a storage system.

Background

In the field of data storage, the method of using Merkle directed acyclic graph (Merkle-DAG) to manage data is favored by major enterprises.

As is well known, a Merkle-DAG is composed of at least one tree; that is, data management is based on a tree structure. At present, one method for managing data in a storage system based on a Merkle-DAG is as follows: divide the target data (that is, the object to be managed) into multiple data blocks of a fixed size (for example, a fixed number of bytes), and calculate the hash value of each data block; then store the multiple data blocks and their hash values according to the correspondence between each data block and its hash value. Thereafter, the multiple data blocks are grouped according to a fixed number of data blocks per group; finally, a Merkle-DAG for managing the target data is generated based on the multiple groups and the hash value of each data block in the groups (hereinafter, the Merkle-DAG of the target data is called an index tree), and this index tree is saved.

However, when the application layer updates the target data (for example, by data insertion, data modification, or data deletion), the total data volume of the target data may change. If the target data is then divided into blocks according to the above fixed size, the number of resulting data blocks may change, which in turn leads to a large change in the number of groups of data blocks, so that the index tree needs to be rebuilt. Therefore, more resources are consumed in the process of data management.

Summary of the invention

Embodiments of the present application provide a data management method and device in a storage system, which can save resources required for data management.

In order to achieve the above purpose, the embodiment of the present application adopts the following technical solutions:

In a first aspect, an embodiment of the present application provides a data management method in a storage system, the method comprising: dividing the target data into M candidate data blocks based on the content of the target data, where M is an integer greater than or equal to 2;

dividing the M candidate data blocks into N target data blocks according to the respective fingerprint features of the M candidate data blocks, where N is a positive integer less than or equal to M and each target data block includes at least one candidate data block; storing the N target data blocks and the fingerprint features of the N target data blocks, where the target data blocks and the fingerprint features of the target data blocks are in a one-to-one correspondence; and generating an index tree of the target data according to the N target data blocks, where the index tree is used to address the content of the target data.

In the data management method in the storage system provided by the embodiments of the present application, the data management device divides the target data into M candidate data blocks based on the content of the target data, then divides the M candidate data blocks into N target data blocks according to the respective fingerprint features of the M candidate data blocks, and finally generates, according to the N target data blocks, an index tree for addressing the target data. In the embodiments of the present application, the target data is first divided into multiple candidate data blocks based on its content, and the candidate data blocks are then divided into target data blocks according to their fingerprint features, instead of the target data being divided into multiple target data blocks of a fixed size; the size of a target data block is therefore not specifically limited. When data is inserted into the target data, the size of the target data changes, but the number of target data blocks obtained by dividing the updated target data does not necessarily change. Therefore, the resources consumed by data management can be saved to a certain extent.

In a possible implementation manner, dividing the target data into M candidate data blocks based on the content of the target data specifically includes: determining M-1 division points of the target data according to the fingerprint features of the target data; and dividing the target data into M candidate data blocks according to the M-1 division points.

In a possible implementation manner, determining the M-1 division points of the target data according to the fingerprint features of the target data includes: for any one of the M-1 division points, when the fingerprint feature of first data in a sliding window satisfies a first preset condition, determining the end position of the sliding window as the division point, where the first data is part of the target data, and the first preset condition is that the fingerprint feature of the data in the sliding window modulo a first threshold is equal to a second threshold.

In a possible implementation manner, determining the M-1 division points of the target data according to the fingerprint features of the target data includes: for any one of the M-1 division points, when the fingerprint feature of the first data in the sliding window does not satisfy the first preset condition, sliding the sliding window by a preset length along a preset direction, and when the fingerprint feature of second data in the sliding window satisfies the first preset condition, determining the end position of the sliding window as the division point, where the second data is part of the target data and is different from the first data; the preset length is less than or equal to the length of the sliding window, and the first preset condition is that the fingerprint feature of the data in the sliding window modulo the first threshold is equal to the second threshold.
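By way of non-limiting illustration, the sliding-window determination of division points described above may be sketched in Python as follows. The function names, the use of a truncated SHA-256 digest as the fingerprint feature, the default window length, preset length, and threshold values, and the choice to restart the window at each division point are all assumptions made for this sketch and are not specified by the embodiments.

```python
import hashlib


def window_fingerprint(window: bytes) -> int:
    # Assumption: the fingerprint feature is the first 8 bytes of SHA-256.
    return int.from_bytes(hashlib.sha256(window).digest()[:8], "big")


def find_division_points(data, window_len=8, step=1,
                         first_threshold=16, second_threshold=0):
    """Return end positions of windows that satisfy the first preset condition."""
    if not 1 <= step <= window_len:
        raise ValueError("preset length must be between 1 and the window length")
    points = []
    start = 0
    # Stop before the window end reaches the end of the data, so that every
    # division point lies strictly inside the data.
    while start + window_len < len(data):
        end = start + window_len
        fingerprint = window_fingerprint(data[start:end])
        if fingerprint % first_threshold == second_threshold:
            points.append(end)  # the window's end position is a division point
            start = end         # assumption: the next window starts here
        else:
            start += step       # slide the window by the preset length
    return points


def split_at(data, points):
    """Divide data into len(points) + 1 candidate data blocks."""
    bounds = [0, *points, len(data)]
    return [data[a:b] for a, b in zip(bounds, bounds[1:])]
```

With M-1 division points returned by `find_division_points`, `split_at` yields M candidate data blocks whose concatenation is the original target data. Because each division point depends only on the bytes inside the window, inserting data far from a division point is not expected to move it relative to the surrounding content.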

In a possible implementation manner, dividing the target data into M candidate data blocks based on the content of the target data includes: determining M-1 division points of the target data according to transformation values of the target data, where a transformation value is the value obtained by converting a piece of data in a preset window into numerical form based on a preset rule; and dividing the target data into M candidate data blocks according to the M-1 division points.

In a possible implementation manner, determining the M-1 division points of the target data according to the transformation values of the target data includes the following: the preset window includes a first fixed-length window, a variable-length window, and a second fixed-length window; the length of the second fixed-length window is the length of one piece of data in the target data or the length of the transformation value of one piece of data in the target data; and for any one of the M-1 division points, when the transformation value of the data included in the second fixed-length window satisfies a second preset condition, the end position of the second fixed-length window is determined as the division point, where the second preset condition is that the transformation value of the data in the second fixed-length window is greater than the maximum of the transformation values of the data in the first fixed-length window and greater than the maximum of the transformation values of the data in the variable-length window.

In a possible implementation manner, determining the M-1 division points of the target data according to the transformation values of the target data includes the following: the preset window includes the first fixed-length window, the variable-length window, and the second fixed-length window; the length of the second fixed-length window is the length of one piece of data in the target data or the length of the transformation value of one piece of data in the target data; and for any division point, when the transformation value of the data in the second fixed-length window does not satisfy the second preset condition, the length of the variable-length window in the preset window is increased, and when the transformation value of the data in the second fixed-length window satisfies the second preset condition, the end position of the second fixed-length window is determined as the division point, where the second preset condition is that the transformation value of the data in the second fixed-length window is greater than the maximum of the transformation values of the data in the first fixed-length window and greater than the maximum of the transformation values of the data in the variable-length window.
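As a non-limiting illustration, the window-based determination of division points described above may be sketched in Python as follows. The sketch assumes that one piece of data is one byte, that the transformation value of a byte is simply its integer value, that the variable-length window starts with length zero, and that the preset window restarts at each division point; the function name and the default length of the first fixed-length window are likewise hypothetical.

```python
def find_division_points_by_maximum(data, first_len=2):
    """Return division points found with the second preset condition."""
    if first_len < 1:
        raise ValueError("the first fixed-length window must hold at least one byte")
    points = []
    start = 0
    while start + first_len < len(data):
        # Maximum transformation value in the first fixed-length window.
        first_max = max(data[start:start + first_len])
        var_len = 0  # current length of the variable-length window
        cut = None
        while start + first_len + var_len < len(data):
            pos = start + first_len + var_len  # second fixed-length window (one byte)
            variable = data[start + first_len:pos]
            value = data[pos]
            if value > first_max and (not variable or value > max(variable)):
                cut = pos + 1  # end position of the second fixed-length window
                break
            var_len += 1       # condition not met: lengthen the variable window
        if cut is None or cut >= len(data):
            break              # no further division point inside the data
        points.append(cut)
        start = cut            # assumption: the preset window restarts here
    return points


sample = bytes([5, 3, 1, 2, 9, 4, 4, 4, 4, 7, 1])
print(find_division_points_by_maximum(sample))  # prints [5, 10]
```

In the sample, the byte 9 at index 4 exceeds every earlier byte of its block, so the first division point is 5; the byte 7 at index 9 plays the same role for the second block. Recomputing `max(variable)` on every step keeps the sketch close to the wording of the condition; a running maximum would give the same result with less work.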

In a possible implementation manner, dividing the M candidate data blocks into N target data blocks according to the respective fingerprint features of the M candidate data blocks specifically includes: determining, as N-1 division points of the candidate data blocks, the end positions of those candidate data blocks among the M candidate data blocks whose fingerprint features satisfy a third preset condition, where the third preset condition is that the fingerprint feature of the data of the candidate data block modulo a third threshold is within a preset range; and dividing the M candidate data blocks into N target data blocks according to the N-1 division points.

In a possible implementation manner, the fingerprint feature of the data in the candidate data block is the fingerprint feature of all the data in the candidate data block; or, the fingerprint feature of the data in the candidate data block is the fingerprint feature of some data in the candidate data block.
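As a non-limiting illustration, the division of candidate data blocks into target data blocks may be sketched in Python as follows. The function names, the use of a truncated SHA-256 digest as the fingerprint feature, the default third threshold, and the default preset range are assumptions of this sketch. The hypothetical parameter `prefix_len` selects whether the fingerprint covers all of the data in a candidate data block (`None`) or only part of it.

```python
import hashlib


def fingerprint(block, prefix_len=None):
    # Assumption: truncated SHA-256 over all (prefix_len=None) or the first
    # prefix_len bytes of the candidate data block.
    return int.from_bytes(hashlib.sha256(block[:prefix_len]).digest()[:8], "big")


def merge_candidates(candidates, third_threshold=4,
                     preset_range=range(0, 1), prefix_len=None):
    """Merge M candidate data blocks into N <= M target data blocks."""
    targets = []
    current = []
    for block in candidates:
        current.append(block)
        # Third preset condition: fingerprint modulo the third threshold
        # falls within the preset range, so this block ends a target block.
        if fingerprint(block, prefix_len) % third_threshold in preset_range:
            targets.append(b"".join(current))
            current = []
    if current:
        # Candidate blocks after the last division point form the final target block.
        targets.append(b"".join(current))
    return targets
```

Each target data block produced this way contains at least one candidate data block, and widening `preset_range` produces more, smaller target data blocks.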

In a possible implementation manner, generating the index tree of the target data according to the N target data blocks specifically includes: dividing the N target data blocks into at least one data group according to the respective fingerprint features of the N target data blocks; and generating the index tree of the target data based on the respective fingerprint features of the data groups.

In a possible implementation manner, dividing the N target data blocks into at least one data group according to the respective fingerprint features of the N target data blocks specifically includes: determining, as at least one division point of the N target data blocks, each target data block among the N target data blocks whose fingerprint feature satisfies a fourth preset condition; and dividing the N target data blocks into multiple data groups according to the at least one division point of the target data blocks.

In a possible implementation manner, the fourth preset condition is: a modulo value between the fingerprint feature of the target data block and the fourth threshold is greater than or equal to the fifth threshold.

In a possible implementation manner, the fourth preset condition is: the fingerprint feature of the target data block modulo the fourth threshold is greater than or equal to the fifth threshold, and the number of target data blocks in the data group is between a second number threshold and a first number threshold, where the first number threshold is greater than the second number threshold.
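As a non-limiting illustration, the grouping of target data blocks under the fourth preset condition may be sketched in Python as follows. The function name and default values are hypothetical; `max_count` and `min_count` stand for the first and second number thresholds. Closing a group as soon as it reaches `max_count` blocks, even when the modulo test fails, is an assumption of this sketch, made so that no group can exceed the first number threshold.

```python
def group_target_blocks(fingerprints, fourth_threshold=8, fifth_threshold=6,
                        max_count=4, min_count=2):
    """Group target data blocks (by index) using their fingerprint features."""
    if not 1 <= min_count < max_count:
        raise ValueError("the first number threshold must exceed the second")
    groups = []
    current = []
    for index, fp in enumerate(fingerprints):
        current.append(index)
        hit = fp % fourth_threshold >= fifth_threshold
        in_bounds = min_count <= len(current) <= max_count
        # Assumption: a full group is closed even if the modulo test fails.
        if (hit and in_bounds) or len(current) == max_count:
            groups.append(current)
            current = []
    if current:
        groups.append(current)  # remaining blocks form the last data group
    return groups


print(group_target_blocks([6, 7, 1, 2, 3, 4, 14, 5]))
# prints [[0, 1], [2, 3, 4, 5], [6, 7]]
```

In the example, block 0 passes the modulo test but its group would hold only one block, so the group is closed at block 1 instead; blocks 2 to 5 never pass the test, and the group is closed when it reaches four blocks.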

In a possible implementation manner, the fingerprint feature is a hash value.

In a second aspect, an embodiment of the present application provides a data management device in a storage system, the data management device including a processing module, a storage module, and a generation module. The processing module is configured to divide the target data into M candidate data blocks based on the content of the target data, where M is an integer greater than or equal to 2; the processing module is further configured to divide the M candidate data blocks into N target data blocks according to the respective fingerprint features of the M candidate data blocks, where N is a positive integer less than or equal to M and each target data block includes at least one candidate data block; the storage module is configured to store the N target data blocks and the fingerprint features of the N target data blocks, where the target data blocks and the fingerprint features of the target data blocks are in a one-to-one correspondence; and the generation module is configured to generate an index tree of the target data according to the N target data blocks, where the index tree is used to address the content of the target data.

In a possible implementation manner, the determination module is configured to determine M-1 division points of the target data according to the fingerprint features of the target data; and the processing module is specifically configured to divide the target data into M candidate data blocks according to the M-1 division points.

In a possible implementation manner, the data management device in the storage system further includes a determination module; the determination module is configured to determine the end position of the sliding window as a division point when the fingerprint feature of first data in the sliding window satisfies the first preset condition, where the first data is part of the target data, and the first preset condition is that the fingerprint feature of the data in the sliding window modulo the first threshold is equal to the second threshold.

In a possible implementation manner, the data management device in the storage system further includes a sliding module; the sliding module is configured to, for any one of the M-1 division points, slide the sliding window by a preset length along a preset direction when the fingerprint feature of the first data in the sliding window does not satisfy the first preset condition; when the fingerprint feature of second data in the sliding window satisfies the first preset condition, the determination module determines the end position of the sliding window as the division point, where the second data is part of the target data and is different from the first data; the preset length is less than or equal to the length of the sliding window, and the first preset condition is that the fingerprint feature of the data in the sliding window modulo the first threshold is equal to the second threshold.

In a possible implementation manner, the data management device in the storage system further includes a determination module; the determination module is configured to determine M-1 division points of the target data according to transformation values of the target data, where a transformation value is the value obtained by converting a piece of data in the preset window into numerical form based on a preset rule; and the processing module is configured to divide the target data into M candidate data blocks according to the M-1 division points.

In a possible implementation manner, the preset window includes a first fixed-length window, a variable-length window, and a second fixed-length window that are sequentially adjacent; the length of the second fixed-length window is the length of one piece of data in the target data or the length of the transformation value of one piece of data in the target data; the processing module is configured to determine the end position of the second fixed-length window as the division point when the transformation value of the data included in the second fixed-length window satisfies the second preset condition, where the second preset condition is that the transformation value of the data in the second fixed-length window is greater than the maximum of the transformation values of the data in the first fixed-length window and greater than the maximum of the transformation values of the data in the variable-length window.

In a possible implementation manner, the preset window includes a first fixed-length window, a variable-length window, and a second fixed-length window that are sequentially adjacent; the length of the second fixed-length window is the length of one piece of data in the target data or the length of the transformation value of one piece of data in the target data; the processing module is configured to increase the length of the variable-length window in the preset window when the transformation value of the data in the second fixed-length window does not satisfy the second preset condition, and to determine the end position of the second fixed-length window as the division point when the transformation value of the data in the second fixed-length window satisfies the second preset condition, where the second preset condition is that the transformation value of the data in the second fixed-length window is greater than the maximum of the transformation values of the data in the first fixed-length window and greater than the maximum of the transformation values of the data in the variable-length window.

In a possible implementation manner, the determination module is configured to determine, as N-1 division points of the candidate data blocks, the end positions of those candidate data blocks among the M candidate data blocks whose fingerprint features satisfy the third preset condition, where the third preset condition is that the fingerprint feature of the data of the candidate data block modulo the third threshold is within the preset range; and the processing module is configured to divide the M candidate data blocks into N target data blocks according to the N-1 division points.

In a possible implementation manner, the processing module is configured to divide the N target data blocks into at least one data group according to the respective fingerprint features of the N target data blocks; and the index tree of the target data is generated based on the respective fingerprint features of the data groups.

In a possible implementation manner, the determination module is configured to determine, as at least one division point of the N target data blocks, each target data block among the N target data blocks whose fingerprint feature satisfies the fourth preset condition; and the processing module is configured to divide the N target data blocks into multiple data groups according to the at least one division point of the target data blocks.

In a possible implementation manner, the fingerprint feature is a hash value.

In a third aspect, an embodiment of the present application provides a data management device in a storage system, including a memory and a processor, where the memory is coupled to the processor; the memory is used to store computer program code, and the computer program code includes computer instructions; when the computer instructions are executed by the processor, the data management device in the storage system is caused to execute the method described in any one of the first aspect and its possible implementation manners.

In a fourth aspect, an embodiment of the present application provides a computer storage medium including computer instructions. When the computer instructions are run on a computing device, the computing device is caused to execute the method described in any one of the first aspect and its possible implementation manners.

In the fifth aspect, the embodiments of the present application provide a computer program product, which, when run on a computer, causes the computer to execute the method described in any one of the above first aspect and possible implementations thereof.

It should be understood that, for the beneficial effects obtained by the technical solutions of the second aspect to the fifth aspect of the embodiments of the present application and the corresponding possible implementation manners, reference may be made to the technical effects of the first aspect and its corresponding possible implementation manners; details are not repeated here.

Brief description of the drawings

FIG. 1 is a schematic diagram of a process of dividing target data into blocks and groups provided by an embodiment of the present application;

FIG. 2 is a schematic structural diagram of an index tree provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of a construction process of an index tree provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of a hardware structure of a storage system provided by an embodiment of the present application;

FIG. 5 is a first schematic flowchart of a data management method in a storage system provided by an embodiment of the present application;

FIG. 6 is a second schematic flow diagram of a data management method in a storage system provided by an embodiment of the present application;

FIG. 7 is a schematic diagram of a division process of a candidate data block provided in an embodiment of the present application;

FIG. 8 is a schematic flowchart of a method for dividing candidate data blocks provided by an embodiment of the present application;

FIG. 9 is a third schematic flowchart of a data management method in a storage system provided by an embodiment of the present application;

FIG. 10 is a schematic diagram of another candidate data block division process provided by the embodiment of the present application;

FIG. 11 is a schematic flowchart of another method for dividing candidate data blocks provided by the embodiment of the present application;

FIG. 12 is a fourth schematic flowchart of a data management method in a storage system provided by an embodiment of the present application;

FIG. 13 is a fifth schematic flowchart of a data management method in a storage system provided by an embodiment of the present application;

FIG. 14 is a sixth schematic flow diagram of a data management method in a storage system provided by an embodiment of the present application;

FIG. 15 is a schematic diagram of a data management device in a storage system provided by an embodiment of the present application;

FIG. 16 is a schematic diagram of another data management device in a storage system provided by an embodiment of the present application.

Detailed description

The term "and/or" in this document merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone.

The terms "first" and "second" in the description and claims of the embodiments of the present application are used to distinguish different objects, rather than to describe a specific order of objects. For example, the first threshold and the second threshold are used to distinguish different thresholds, but not to describe a specific order of the thresholds.

In the embodiments of the present application, words such as "exemplary" or "for example" are used to mean serving as an example, illustration, or explanation. Any embodiment or design scheme described as "exemplary" or "for example" in the embodiments of the present application shall not be interpreted as being more preferred or more advantageous than other embodiments or design schemes. Rather, the use of words such as "exemplary" or "for example" is intended to present related concepts in a concrete manner.

In the description of the embodiments of the present application, unless otherwise specified, "plurality" means two or more. For example, a plurality of data groups refers to two or more data groups.

With the development of Internet technology, the way of using Merkle-DAG to manage data is more and more widely used in storage systems.

As is well known, a Merkle-DAG is composed of at least one tree; that is, data management is based on a tree structure. In the prior art, data in a storage system is managed based on a Merkle-DAG as follows. First, the target data is divided into multiple data blocks of a fixed size. Assuming that the size of the target data is 1024 bytes, as shown in (A) of FIG. 1, the target data is divided into 4 data blocks of 256 bytes each, namely data block 1 to data block 4 shown in (B) of FIG. 1. Second, the hash value of each data block is calculated; for example, the hash values of data block 1 to data block 4 are hash value 1 to hash value 4. The data blocks and their hash values are stored in a preset table according to the correspondence between each data block and its hash value; Table 1 below is an example of the correspondence between data block 1 to data block 4 and their hash values. Then, the multiple data blocks are grouped according to a fixed number; for example, with 2 data blocks per group, the 4 data blocks are divided into 2 data groups, namely data group 1 and data group 2 shown in (C) of FIG. 1, where data group 1 includes data block 1 and data block 2, and data group 2 includes data block 3 and data block 4. After the data blocks have been grouped, a first index tree for managing the target data is generated based on the hash value of each data block and saved.

Table 1

Data block index	Data block content	Hash value index	Hash value
Data block 1	AASDECC	Hash value 1	-502161580
Data block 2	FWEQFWE	Hash value 2	257690423
Data block 3	FWEQFFT	Hash value 3	257689911
Data block 4	JYTEWQC	Hash value 4	-416492375
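As a non-limiting illustration of the prior-art steps described above, the fixed-size division and the stored correspondence between data blocks and hash values may be sketched in Python as follows. The function names and the use of SHA-256 as the hash function are assumptions of this sketch; the hash function used to produce the values in Table 1 is not stated.

```python
import hashlib


def hash_value(block):
    # Assumption: SHA-256 stands in for the unspecified hash function.
    return hashlib.sha256(block).hexdigest()


def split_fixed(data, block_size=256):
    """Divide the target data into data blocks of a fixed size."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def build_block_table(blocks):
    """Store each data block under its hash value (the preset table)."""
    return {hash_value(block): block for block in blocks}
```

For 1024 bytes of target data and a block size of 256 bytes, `split_fixed` returns 4 data blocks, and `build_block_table` holds one entry per distinct data block.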

Still taking the above 1024-byte target data as an example, as shown in FIG. 2, the first index tree of the target data is generated as follows. Hash value 1 to hash value 4, which correspond to data block 1 to data block 4 obtained by dividing the target data, are used as the leaf nodes of the first index tree and may be denoted as leaf node 1, leaf node 2, leaf node 3, and leaf node 4. Then, a hash value is calculated for the leaf nodes of each of the two data groups into which the 4 leaf nodes are divided; that is, the hash value of hash value 1 and hash value 2 is calculated, and the hash value of hash value 3 and hash value 4 is calculated. The hash value of hash value 1 and hash value 2 is used as the parent node of leaf node 1 and leaf node 2 (called the first parent node, which is a child node in the first index tree), and the hash value of hash value 3 and hash value 4 is used as the parent node of leaf node 3 and leaf node 4 (called the second parent node, which is another child node in the first index tree). Finally, the hash value of the first parent node and the second parent node is calculated and used as the root node.
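As a non-limiting illustration, the generation of the first index tree described above may be sketched in Python as follows. The function names, the use of SHA-256, and the representation of a parent node as the hash of its children's concatenated hexadecimal hash values are assumptions of this sketch.

```python
import hashlib


def hash_value(data):
    # Assumption: SHA-256 stands in for the unspecified hash function.
    return hashlib.sha256(data).hexdigest()


def build_index_tree(blocks, group_size=2):
    """Return the tree as a list of levels, from leaf nodes up to the root node."""
    if group_size < 2:
        raise ValueError("each group must contain at least two nodes")
    level = [hash_value(block) for block in blocks]  # leaf nodes
    levels = [level]
    while len(level) > 1:
        # Each parent node is the hash of its group's concatenated hash values.
        level = [hash_value("".join(level[i:i + group_size]).encode())
                 for i in range(0, len(level), group_size)]
        levels.append(level)
    return levels


tree = build_index_tree([b"block1", b"block2", b"block3", b"block4"])
print([len(level) for level in tree])  # prints [4, 2, 1]
```

With 4 data blocks and 2 data blocks per group, the tree has 4 leaf nodes, 2 parent nodes, and 1 root node, matching the structure described for FIG. 2. Changing one data block changes its leaf node, the parent node of its group, and the root node, while the other parent node is unaffected.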

When the target data is updated, the total data volume of the target data may change. For example, after new data is inserted into the target data, the data volume of the target data increases from 1024 bytes to 1280 bytes. If the target data is still divided into blocks with a fixed block size of 256 bytes, 5 data blocks are obtained, and grouping these 5 data blocks according to the above grouping method yields 3 data groups: data group 1 includes data block 1 and data block 5, data group 2 includes data block 2 and data block 3, and data group 3 includes data block 4. Before the new data was inserted, the target data was divided into two data groups, where data group 1 includes data block 1 and data block 2, and data group 2 includes data block 3 and data block 4. It can be seen that the change in the total data volume of the target data results in a large change in the grouping of the target data. The second index tree shown in FIG. 3 is then generated based on the hash values of data block 1 to data block 5. Compared with the first index tree, the second index tree has changed considerably; the shaded part in FIG. 3 is the part that has changed. Therefore, generating the index tree for the target data after the insertion requires a large amount of calculation, which causes the above prior art to consume more resources in the process of data management.

In the above scheme, inserting data into the target data changes the division of the target data into blocks and groups, so that regenerating the index tree for the updated target data requires a large amount of calculation and the data management process consumes more resources. To address this problem, the embodiments of the present application provide a data management method and device in a storage system. The data management device divides the target data into M candidate data blocks based on the content of the target data, where M is an integer greater than or equal to 2; divides the M candidate data blocks into N target data blocks according to the respective fingerprint features of the M candidate data blocks, where N is a positive integer less than or equal to M and each target data block includes at least one candidate data block; stores the N target data blocks and the fingerprint features of the N target data blocks, where the target data blocks and the fingerprint features of the target data blocks are in a one-to-one correspondence; and generates an index tree of the target data according to the N target data blocks, where the index tree is used to address the content of the target data.
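As a non-limiting illustration, the effect of the insertion on fixed-number grouping may be sketched in Python as follows. The function name and the string labels standing in for data blocks are hypothetical; the order of the blocks after insertion follows the grouping given in the example above.

```python
def group_fixed(items, group_size=2):
    """Group consecutive data blocks into groups of a fixed number."""
    return [items[i:i + group_size] for i in range(0, len(items), group_size)]


before = group_fixed(["block1", "block2", "block3", "block4"])
after = group_fixed(["block1", "block5", "block2", "block3", "block4"])
unchanged = [group for group in after if group in before]

print(before)     # prints [['block1', 'block2'], ['block3', 'block4']]
print(after)      # prints [['block1', 'block5'], ['block2', 'block3'], ['block4']]
print(unchanged)  # prints []
```

No data group survives the insertion unchanged, so every parent node computed from a data group, and therefore the root node, has to be recalculated.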

Through the technical solutions provided by the embodiments of the present application, resource consumption in the process of data management can be saved.

The data management method and device in the storage system provided by the embodiment of the present application can be applied to the storage system shown in Figure 4. The storage system can be a storage system composed of a solid-state hard disk, or a storage system composed of other types of storage media. system. As shown in Figure 4, the storage system includes a controller (abbreviation: main control) 401 and a plurality of hard disks 405, wherein the main control 401 includes: a processor 402, optionally, the controller 401 also includes a host interface 404, and n (n>0) channel controllers 403 .

The main control 401 is used to issue executable commands to the multiple hard disks 405, so as to read or update data on the hard disks 405.

The host interface 404 is used to communicate with the host, receive command requests sent by the host, and forward the command requests to the processor 402. The host may be any device such as a server, a personal computer or an array controller, and is not limited thereto.

The processor 402 sends executable commands to the multiple hard disks 405 according to the command request forwarded by the host interface 404. Specifically, the processor 402 is used to execute the data management method in the storage system provided by the embodiments of the present application; for example, the processor 402 is used to divide the target data into blocks, group the data blocks, and generate an index tree. Optionally, the processor 402 may include one or more CPUs, and each CPU may be a single-core CPU (single-CPU) or a multi-core CPU (multi-CPU).

The channel controller 403 is used to carry the executable commands issued by the processor 402 to the hard disk 405 .

Optionally, the storage system further includes a bus 406, and the processor 402, the channel controller 403, the host interface 404, and the hard disk 405 are generally connected to each other through the bus 406, or are connected to each other in other ways.

When the storage system receives the target data transmitted by the host, the host interface 404 in the main control 401 forwards the target data to the processor 402 in the main control 401. The processor 402 divides the target data into M candidate data blocks, and divides the M candidate data blocks into N target data blocks according to the respective fingerprint features of the M candidate data blocks. Then, according to the correspondence between the target data blocks and the fingerprint features of the target data blocks, the processor 402 sends the N target data blocks and the fingerprint features of the N target data blocks to n hard disks 405 through the channel controller 403, to be stored in the hard disks 405. Finally, the processor 402 generates the index tree of the target data according to the N target data blocks, and stores the index tree in the hard disk 405.

Optionally, the device for executing the data management method in the storage system provided in the embodiment of the present application may be the processor 402 in the controller in the storage system shown in FIG. 4 above.

With reference to the schematic architecture diagram of the storage system shown in FIG. 4 above, as shown in FIG. 5 , the data management method in the storage system provided by the embodiment of the present application may include S510-S540.

S510. The data management device divides the target data into M candidate data blocks based on the content of the target data.

Wherein, M is an integer greater than or equal to 2.

The M candidate data blocks may be obtained in either of two ways. In the first, the data management device determines M-1 division points based on the content of the target data, and then cuts the target data into M data blocks according to the M-1 division points. In the second, the data management device determines M-1 division points in the content of the target data based on the content of the target data; the M-1 division points divide the target data into M intervals, each interval is a candidate data block, and the target data does not need to be cut according to the M-1 division points. The embodiments of the present application do not limit the division method of the M candidate data blocks.

It should be understood that the content of the above-mentioned target data refers to all elements that make up the target data, and the sizes of the above-mentioned M candidate data blocks may be all the same, may be partly the same, or may be different from each other.

S520. The data management device divides the M candidate data blocks into N target data blocks according to the respective fingerprint features of the M candidate data blocks.

The above N is a positive integer less than or equal to M.

Optionally, the fingerprint feature of a candidate data block may be the hash value of the candidate data block, or another feature that can uniquely identify a data block. This is determined according to actual needs and is not limited in the embodiments of the present application.

S520 is specifically: the data management device determines N-1 division points according to the respective fingerprint features of the M candidate data blocks, and then divides the M candidate data blocks into N target data blocks according to the N-1 division points, where each target data block includes at least one candidate data block.

S530. The data management device stores N target data blocks and fingerprint features of the N target data blocks.

It should be noted that there is a one-to-one correspondence between the target data block and the fingerprint feature of the target data block.

The above step is specifically: storing the fingerprint feature of a target data block and the content of the target data block in the same row of a preset table; or storing the fingerprint feature of the target data block and an index of the target data block in preset table 1, and storing the index of the target data block and the content of the target data block in preset table 2. The present application does not limit the storage method of the N target data blocks and the fingerprint features of the N target data blocks.
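The two storage layouts can be sketched with in-memory dictionaries standing in for the preset tables. This is only an illustration: the function names are hypothetical, SHA-256 is an assumed fingerprint, and a real storage system would persist the tables on the hard disks.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    """Assumed fingerprint feature: SHA-256 of the block content."""
    return hashlib.sha256(block).hexdigest()


def store_single_table(blocks):
    """Layout 1: fingerprint and content are kept in the same row."""
    return {fingerprint(block): block for block in blocks}


def store_two_tables(blocks):
    """Layout 2: fingerprint -> index in one table, index -> content in another."""
    fingerprint_to_index = {}
    index_to_content = {}
    for index, block in enumerate(blocks):
        fingerprint_to_index[fingerprint(block)] = index
        index_to_content[index] = block
    return fingerprint_to_index, index_to_content


def load_from_two_tables(fp, fingerprint_to_index, index_to_content):
    """Resolve a fingerprint to block content through the index."""
    return index_to_content[fingerprint_to_index[fp]]
```

Either layout keeps the one-to-one correspondence between a target data block and its fingerprint feature; the second adds one level of indirection so that content can be relocated without touching the fingerprint table.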

S540. The data management device generates an index tree of the target data according to the N target data blocks, and the index tree is used for addressing content of the target data.

It should be noted that when the target data needs to be queried, the index tree corresponding to the target data is determined first; the leaf nodes of the index tree are then found recursively, starting from the root node of the index tree; finally, the content of the data block corresponding to each leaf node is looked up according to the hash value in that leaf node, so as to obtain the target data.

Exemplarily, when the data management device needs to obtain the target data corresponding to the index tree shown in Figure 2, the data management device finds the child nodes of the root node (the first parent node and the second parent node) according to the root node of the index tree, then finds the 4 leaf nodes according to these child nodes, and finally queries the preset table for the content of the data blocks corresponding to the 4 hash values in the 4 leaf nodes, so as to obtain the target data.
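The lookup described above can be sketched as follows. The sketch assumes a three-level tree (root, parent nodes, leaf nodes) like the one in the example, SHA-256 as the hash, and a dictionary as the preset table; the `Node` class and the function names are hypothetical and chosen for illustration only.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Node:
    digest: str
    children: list = field(default_factory=list)


def sha(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_index_tree(groups, table):
    """Build root -> parent -> leaf nodes from non-empty groups of blocks.

    Each leaf holds the hash of one block; each parent and the root hold
    a hash computed over the hashes of their children.
    """
    parents = []
    for group in groups:
        leaves = []
        for block in group:
            leaf = Node(sha(block))
            table[leaf.digest] = block  # hash value -> block content
            leaves.append(leaf)
        joined = "".join(leaf.digest for leaf in leaves).encode()
        parents.append(Node(sha(joined), leaves))
    joined = "".join(parent.digest for parent in parents).encode()
    return Node(sha(joined), parents)


def read_target_data(node, table):
    """Walk from the root to the leaves and concatenate the block contents."""
    if not node.children:  # leaf node: look the content up by hash value
        return table[node.digest]
    return b"".join(read_target_data(child, table) for child in node.children)
```

With two groups of two blocks each, `build_index_tree` yields a root with two parent nodes and four leaf nodes, and `read_target_data` returns the blocks concatenated in their original order.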

Referring to FIG. 5 , as shown in FIG. 6 , in an implementation manner, the method for dividing the target data into M candidate data blocks based on the content of the target data (that is, S510 ) may specifically include: S610-S620.

S610. The data management device determines M−1 division points of the target data according to the fingerprint feature of the target data.

It can be understood that, in the process of determining the M-1 division points of the target data, the method of determining each division point is the same. As shown in FIG. 8 , the method of determining a division point of the target data includes S810-S830.

S810. The data management device judges whether the fingerprint feature of the data in the sliding window satisfies a first preset condition.

The size of the above-mentioned sliding window is fixed, the sliding step of the sliding window is a preset length, and the sliding step of the sliding window can be set according to actual needs, for example, the sliding step is 1 byte or 2 bytes, etc. This embodiment of the present application does not limit it.

The above-mentioned first preset condition is that the modulo value between the fingerprint feature of the data in the sliding window and the first threshold is equal to the second threshold, where the first threshold and the second threshold may be pre-configured.

Exemplarily, assume that the first threshold is 20 and the second threshold is 9. As shown in (A) in Figure 7, the content of the target data is ABCDEF...XYZ, and the size of the sliding window corresponds to two letters. If the data in the current sliding window is "BC" and the hash value of "BC" is 2113, the modulo value between the hash value of the data in the sliding window and the first threshold is first calculated as 2113 mod 20 = 13, where mod represents a modulo operation; it is then judged whether this modulo value is equal to the second threshold.

Since the modulo value 13 is not equal to the second threshold 9, the hash value of the data in the sliding window does not satisfy the first preset condition.

S820. In a case where the fingerprint feature of the first data in the sliding window satisfies a first preset condition, the data management device determines an end position of the sliding window as a division point.

The above-mentioned first data is part of the target data.

It should be noted that the end position of the sliding window is the position on the sliding window that is furthest along the sliding direction. For example, as shown in (A) in Figure 7, the end position of the sliding window at this time is the boundary between the letters "C" and "D".

S830. When the fingerprint feature of the first data in the sliding window does not satisfy the first preset condition, the data management device slides the sliding window along a preset direction for a preset length.

The preset length is the sliding step of the sliding window, and the preset length is less than or equal to the length of the sliding window.

It should be noted that the sliding direction of the sliding window is pre-configured, and may be from right to left or from left to right. The embodiments of the present application are described by taking sliding to the right as an example, and details are not repeated later.

It should be noted that after executing S830, the data management device executes S810 again, and repeats S810 and S830 until the fingerprint feature of the data in the sliding window satisfies the first preset condition, at which point the end position of the sliding window is determined as the division point. For example, when the fingerprint feature of second data in the sliding window satisfies the first preset condition, the end position of the sliding window is determined as the division point, where the second data is part of the target data.

Exemplarily, based on the example of S810, assume that the preset length is the size of one letter. Referring to (A) in FIG. 7, the hash value of the data in the sliding window (that is, "BC") does not satisfy the first preset condition, so the sliding window is slid to the right by one letter, and the data in the sliding window becomes "CD", as shown in (B) in Figure 7. At this time, it is determined whether the hash value of the data "CD" in the sliding window satisfies the first preset condition; if so, the end position of the sliding window, that is, the boundary between "D" and "E", is determined as the division point.

S620. The data management device divides the target data into M candidate data blocks according to the M-1 division points.

Still taking the target data "ABCDEF...XYZ" as an example, assume that 4 division points of the target data are determined according to S610 and that the target data is divided into 5 intervals according to these 4 division points. The 5 intervals correspond to 5 candidate data blocks, which are {ABCDEF}, {GHIJK}, {LMNOP}, {QRSTU} and {VWXYZ}.

It should be noted that after the target data is updated (for example, data is inserted into the target data), the content of the target data changes, but the positions in the updated target data that satisfy the first preset condition may not change. In that case, after the updated target data is divided into candidate data blocks, the number of candidate data blocks does not change, and the content of most candidate data blocks may also remain unchanged.

Exemplarily, based on the example of S620, assume that the first threshold is 20 and the second threshold is 9, and that the data "AALMXXWX" is inserted into the target data shown in (A) or (B), at a position between "C" and "D" in the candidate data block {ABCDEF}. S610-S620 are then executed on the new target data. Because the positions that satisfy the first preset condition have not changed in the target data after the insertion, the number and positions of the division points are determined based on the technical solution of S810-S830, and the target data is divided into 5 candidate data blocks according to the 4 division points: {ABCAALMXXWXDEF}, {GHIJK}, {LMNOP}, {QRSTU} and {VWXYZ}.
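S810-S830 and S620 can be sketched as below. Several points are assumptions rather than statements of the source: the elided target data "ABCDEF...XYZ" is taken to be the 26 uppercase letters; the fingerprint is the 31-based polynomial string hash with 32-bit wrap-around (the same scheme as Java's `String.hashCode`), chosen because it yields the value 2113 quoted for "BC"; and the window simply keeps sliding by one step after a division point is found.

```python
def java_string_hash(text: str) -> int:
    """31-based polynomial hash with 32-bit signed wrap-around (assumed fingerprint)."""
    h = 0
    for ch in text:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h


def find_division_points(data, window=2, step=1,
                         first_threshold=20, second_threshold=9):
    """Return the end positions of all windows meeting the first preset condition."""
    points = []
    start = 0
    while start + window <= len(data):
        end = start + window
        fp = java_string_hash(data[start:end])
        # First preset condition: fingerprint mod first threshold == second threshold.
        # Two-letter windows give a non-negative hash, so Python's % is sufficient here.
        if fp % first_threshold == second_threshold and end < len(data):
            points.append(end)  # S820: the window's end position is a division point
        start += step           # S830: slide by the preset length (also after a hit)
    return points


def split_at(data, points):
    """S620: divide the data into candidate data blocks at the division points."""
    bounds = [0, *points, len(data)]
    return [data[a:b] for a, b in zip(bounds, bounds[1:])]


alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed expansion of "ABCDEF...XYZ"
blocks = split_at(alphabet, find_division_points(alphabet))

updated = alphabet[:3] + "AALMXXWX" + alphabet[3:]  # insert between "C" and "D"
updated_blocks = split_at(updated, find_division_points(updated))
```

Under these assumptions the windows "EF", "JK", "OP" and "TU" meet the condition, so `blocks` is the five candidate data blocks listed above, and after the insertion only the first block of `updated_blocks` differs.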

Optionally, referring to FIG. 5, as shown in FIG. 9, in another implementation manner, the above-mentioned method of dividing the target data into M candidate data blocks (that is, S510) based on the content of the target data may include: S910- S920.

S910. The data management device determines M-1 division points of the target data according to the conversion value of the target data.

The conversion value is a numeric value obtained by converting each data item in the preset window according to a preset rule.

The conversion value of each data item is the value obtained by converting that data item in the target data with a conversion method. When the file is composed of letters, the conversion value of a letter may be the American Standard Code for Information Interchange (ASCII) code corresponding to the letter, the hash value corresponding to the letter, or another value represented in numeric form. (A) in FIG. 10 shows the conversion value text of the target data, obtained by converting the content of the target data into numbers according to the preset rule.

It can be understood that in the process of determining the M-1 division points of the target data, the method of determining each division point is the same, and in one implementation, each division point of the target data is determined by using a preset window The method is specifically shown in Figure 11, including S1110-S1130.

S1110. The data management device judges whether the conversion value of the data in the second fixed-length window in the preset window satisfies a second preset condition.

It should be noted that the preset window includes a first fixed-length window, a variable-length window and a second fixed-length window that are adjacent in sequence. The length of the first fixed-length window is an integer greater than 0. The initial length of the variable-length window is a preset value, and the preset value is an integer greater than or equal to 0. The length of the second fixed-length window is the length of one data item in the target data or the length of the conversion value of one data item in the target data; that is, when the length of the variable-length window is greater than 0, the length of the second fixed-length window is the smallest unit of the data corresponding to the variable-length window and the first fixed-length window. For example, when the length of the first fixed-length window is 4 bytes and the length of the variable-length window is 2 bytes, the length of the second fixed-length window is 1 byte.

The second preset condition is that the conversion value of the data in the second fixed-length window is greater than the maximum of the conversion values of the data in the first fixed-length window, and is also greater than the maximum of the conversion values of the data in the variable-length window.

Specifically, the data management device determines the maximum value of each transformation value in the first fixed-length window, and the maximum value is called the first maximum value; the data management device then determines the maximum value of each transformation value in the variable-length window , that is: the second maximum value; then, the data management device judges whether the transformation value in the second fixed-length window is greater than the first maximum value and also greater than the second maximum value.

S1120. In a case where the transformation value of the data included in the second fixed-length window satisfies a second preset condition, the data management device determines an end position of the second fixed-length window as a division point.

S1130. In a case where the conversion value of the data in the second fixed-length window does not satisfy the second preset condition, the data management device increases the length of the variable-length window in the preset window.

The above-mentioned data management device may increase the length of the variable-length window in the preset window according to a pre-configured length, and the pre-configured length may be determined according to actual conditions, for example, configured as 2 bytes.

It can be understood that after the length of the variable-length window is increased by a preset length, since the second fixed-length window is adjacent to the variable-length window, the second fixed-length window will move backward by a preset length.

It should be noted that after the execution of S1130, the data management device judges whether the transformation value of the data in the current second fixed-length window satisfies the second preset condition, and if so, determines the end position of the current second fixed-length window as the dividing point ; If not satisfied, then continue to increase the length of the variable-length window in the preset window until the conversion value of the data in the second fixed-length window meets the second preset condition, and determine the end position of the second fixed-length window as the dividing point.

Exemplarily, as shown in (A) in Figure 10, the first fixed-length window includes 12 and 18, the variable-length window includes 2, and the second fixed-length window includes 6. The conversion value 6 in the second fixed-length window is less than the maximum value 18 in the first fixed-length window, so the conversion value of the data in the second fixed-length window does not satisfy the second preset condition. The data management device therefore increases the length of the variable-length window by 1 unit length, so that the length of the variable-length window is 2, as shown in (B) in Figure 10. At this time, the first fixed-length window includes 12 and 18, the variable-length window includes 2 and 6, and the second fixed-length window includes 45. The conversion value 45 in the second fixed-length window is greater than the maximum value 18 in the first fixed-length window and also greater than the maximum value 6 in the variable-length window; therefore, the data management device determines the end position of the second fixed-length window as the division point.

It should be noted that after one division point of the candidate data blocks is determined, the preset window is applied again, starting from the conversion value that follows the division point, to judge whether the conversion value in the preset window satisfies the second preset condition and thereby determine the next division point. The length of the variable-length window in the preset window used to determine the next division point is reset to the initial length. It should be understood that every division point is determined in a similar way. For example, with reference to (C) in FIG. 10, after a division point of the candidate data blocks (for example, the position of conversion value 45) is determined, the initial position of the preset window for determining the next division point is conversion value 5; the conversion values in the first fixed-length window of the preset window include 5 and 9, the conversion values in the variable-length window include 36, and the conversion values in the second fixed-length window include 5.

S920. The data management device divides the target data into M candidate data blocks according to the M-1 division points.

Exemplarily, based on the above S1110-S1130, the conversion value text of the target data can be divided into four candidate data blocks, for example, respectively: {12,18,2,6,45}, {5,9,36,5, 5,65}, {56,5,9,7,62}, and {8,8,432,9,81,20}.

It should be noted that after the target data is updated (for example, data is inserted into the target data), the content of the target data changes, but the positions in the updated target data that satisfy the second preset condition may not change. In that case, after the updated target data is divided into candidate data blocks, the number of candidate data blocks does not change, and the content of most candidate data blocks may also remain unchanged.

Exemplarily, data is inserted into the target text shown in (A) in Figure 10. The conversion value set of the inserted data is {9,10,12,1,-40}, and the insertion position is between 18 and 2 in the candidate data block {12,18,2,6,45}. Because the positions that satisfy the second preset condition have not changed in the target data after the insertion, the number and positions of the division points are determined based on the technical solution of S1110-S1130, and the target data is divided into 4 candidate data blocks according to the 3 division points: {12,18,9,10,12,1,-40,2,6,45}, {5,9,36,5,5,65}, {56,5,9,7,62} and {8,8,432,9,81,20}.
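S1110-S1130 and S920 can be sketched as below, operating directly on the list of conversion values. The parameter values are assumptions inferred from the example: a first fixed-length window of 2 values, an initial variable-length window of 1 value, a growth step of 1 value, and a second fixed-length window of 1 value. The function name is hypothetical.

```python
def split_by_preset_window(values, fixed_len=2, initial_var_len=1, grow_by=1):
    """Divide conversion values into candidate data blocks (S1110-S1130, S920)."""
    blocks = []
    start = 0
    while start < len(values):
        var_len = initial_var_len  # reset for every new division point
        cut = None
        while True:
            probe = start + fixed_len + var_len  # the single value in the second window
            if probe >= len(values):
                break  # ran out of data without finding another division point
            first_max = max(values[start:start + fixed_len])
            variable = values[start + fixed_len:probe]
            candidate = values[probe]
            # Second preset condition: larger than every value in both other windows.
            if candidate > first_max and (not variable or candidate > max(variable)):
                cut = probe + 1  # S1120: end of the second fixed-length window
                break
            var_len += grow_by   # S1130: lengthen the variable-length window
        if cut is None:
            blocks.append(values[start:])  # the remaining values form the last block
            break
        blocks.append(values[start:cut])
        start = cut
    return blocks


original = [12, 18, 2, 6, 45, 5, 9, 36, 5, 5, 65,
            56, 5, 9, 7, 62, 8, 8, 432, 9, 81, 20]
inserted = original[:2] + [9, 10, 12, 1, -40] + original[2:]

original_blocks = split_by_preset_window(original)
inserted_blocks = split_by_preset_window(inserted)
```

With these parameters the sketch reproduces the four candidate data blocks of the example, and after the insertion only the first block changes because 45 is still the first value that exceeds everything before it in the window.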

Optionally, in combination with FIG. 6 or FIG. 9, as shown in FIG. 12, the M candidate data blocks are divided into N target data blocks according to the respective fingerprint features of the M candidate data blocks (ie: S520), including: S1210-S1220.

S1210. The data management device determines, among the M candidate data blocks, the end positions of the candidate data blocks whose fingerprint features meet the third preset condition as N−1 division points of the candidate data blocks.

The fingerprint feature of a candidate data block may be the fingerprint feature corresponding to all the data in the candidate data block; for example, if the data in the candidate data block is "WANHH", the hash value of the candidate data block is the hash value of "WANHH" as a whole. Alternatively, it may be the fingerprint feature corresponding to part of the data in the candidate data block; for example, if the data in the candidate data block is "WANHH", the hash value of the candidate data block is the hash value corresponding to "NHH". In the embodiments of the present application, the fingerprint feature of a candidate data block is described by taking the fingerprint feature corresponding to all the data in the candidate data block as an example, and details are not repeated later.

The above-mentioned third preset condition is that the modulo value between the fingerprint feature of the data of the candidate data block and the third threshold is within a preset range.

S1220. The data management device divides the M candidate data blocks into N target data blocks according to the N-1 division points.

Exemplarily, assume that the third threshold is 60 and the preset range is 50 to 60, and that the target data is divided into 5 candidate data blocks, namely {ABCAALMXXWXDEF}, {GHIJK}, {LMNOP}, {QRSTU} and {VWXYZ}. The calculated hash values of the 5 candidate data blocks are -1130721247, 67787465, 72558990, 77330515 and 82102040, and the modulo values between these hash values and the third threshold are -7, 5, 30, 55 and 20 respectively. Only the modulo value 55 falls within the preset range, so the hash value of the candidate data block {QRSTU} satisfies the third preset condition. The end position of the candidate data block {QRSTU} is therefore used as the division point of the 5 candidate data blocks, and the candidate data blocks are divided into 2 target data blocks, namely {ABCAALMXXWXDEFGHIJKLMNOPQRSTU} and {VWXYZ}.
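S1210-S1220 can be sketched as below. The sketch assumes the same 31-based, 32-bit polynomial string hash as Java's `String.hashCode`, which reproduces the five hash values quoted in the example, and a remainder that keeps the sign of the hash value, which reproduces the quoted modulo value -7 (Python's own `%` would give 53 instead). The function names are hypothetical.

```python
def java_string_hash(text: str) -> int:
    """31-based polynomial hash with 32-bit signed wrap-around (assumed fingerprint)."""
    h = 0
    for ch in text:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h


def truncated_mod(value: int, modulus: int) -> int:
    """Remainder that keeps the sign of `value`, as in the example's -7."""
    remainder = abs(value) % modulus
    return -remainder if value < 0 else remainder


def merge_candidates(candidates, third_threshold=60, low=50, high=60):
    """Merge candidate data blocks into target data blocks (S1210-S1220)."""
    targets = []
    current = ""
    for block in candidates:
        current += block
        remainder = truncated_mod(java_string_hash(block), third_threshold)
        # Third preset condition: the remainder lies within the preset range,
        # so the end of this candidate block is a division point.
        if low <= remainder <= high:
            targets.append(current)
            current = ""
    if current:  # blocks after the last division point form the final target block
        targets.append(current)
    return targets


candidates = ["ABCAALMXXWXDEF", "GHIJK", "LMNOP", "QRSTU", "VWXYZ"]
remainders = [truncated_mod(java_string_hash(c), 60) for c in candidates]
targets = merge_candidates(candidates)
```

Here `remainders` evaluates to -7, 5, 30, 55 and 20, so the only candidate data block whose remainder lies in the range 50 to 60 is {QRSTU}, and the division point falls at its end.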

Optionally, the method for dividing the target data block in the above S520 may be similar to the method for dividing candidate data blocks S610-S620 and S910-S920. For details, refer to the relevant descriptions of the above-mentioned S610-S620 and S910-S920, which will not be repeated here.

In the data management method in the storage system provided by the embodiments of the present application, the data management device divides the target data into M candidate data blocks, determines the end positions of the candidate data blocks whose fingerprint features satisfy the third preset condition among the M candidate data blocks as N-1 division points of the candidate data blocks, and then determines N target data blocks. In this way, after the target data is updated (for example, data is inserted into the target data), even if the fingerprint feature or conversion value of the inserted data changes the number of candidate data blocks, the positions among the changed candidate data blocks that satisfy the third preset condition do not necessarily change, and the number of target data blocks does not necessarily change. Therefore, when an index tree is constructed based on the target data blocks, only the hash values of individual nodes differ from the index tree of the target data before the data was inserted, which saves the resources consumed by data management.

Optionally, referring to FIG. 6 or FIG. 9 , as shown in FIG. 13 , the above S540 includes: S1310-S1320.

S1310. The data management device divides the N target data blocks into at least one data group according to the respective fingerprint features of the N target data blocks.

It should be noted that the fingerprint feature of a target data block may be the hash value of all the data in the target data block, or the hash value corresponding to part of the data in the target data block. In the embodiments of the present application, the fingerprint feature of a target data block is described by taking the hash value corresponding to all the data in the target data block as an example.

Exemplarily, the data in the target data block is {ABCAALMXXWXDEFGHIJKLMNOP}, and the fingerprint feature of the target data block is the hash value of "ABCAALMXXWXDEFGHIJKLMNOP".

Optionally, referring to FIG. 13 , as shown in FIG. 14 , in an implementation manner, the foregoing S1310 specifically includes: S1410-S1420.

S1410. The data management device determines, among the N target data blocks, each target data block whose fingerprint feature satisfies a fourth preset condition as a division point of the N target data blocks, thereby obtaining at least one division point.

The fourth preset condition above is:

A modulo value between the fingerprint feature of the target data block and the fourth threshold is greater than or equal to the fifth threshold.

Or, the number of target data blocks in the data group is between the second number threshold and the first number threshold, and the modulo value between the fingerprint feature of the target data block and the fourth threshold is greater than or equal to the fifth threshold, wherein the first number threshold is greater than the second number threshold.

It should be noted that the first number threshold (the maximum threshold), the second number threshold (the minimum threshold), the fourth threshold and the fifth threshold are pre-configured.

S1420. The data management device divides the N target data blocks into multiple data groups according to at least one division point of the target data blocks.

Exemplarily, assume that the target data is divided into 4 target data blocks, namely {ALMXXWXDEFGHIJK}, {ABRFGRTGRTRGE}, {DEWFRTNEBJ} and {JDIEOFJDEJFOEW}, that the fourth threshold is 80, and that the fifth threshold is 70. The calculated modulo values between the hash values of the 4 target data blocks and the fourth threshold 80 are 25, 75, 33 and 55 respectively. It can be seen that the target data block {ABRFGRTGRTRGE} satisfies the fourth preset condition, so the 4 target data blocks are divided into two data groups: one includes {ALMXXWXDEFGHIJK} and {ABRFGRTGRTRGE}, and the other includes {DEWFRTNEBJ} and {JDIEOFJDEJFOEW}.
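S1410-S1420 can be sketched as below. The fingerprints are passed in as non-negative integers; the values used in the usage lines are stand-ins chosen only so that their remainders modulo 80 equal the 25, 75, 33 and 55 of the example, not the real hash values of the blocks. The optional `min_count` and `max_count` parameters are hypothetical names for the second and first number thresholds.

```python
def group_target_blocks(blocks, fingerprints, fourth_threshold=80,
                        fifth_threshold=70, min_count=None, max_count=None):
    """Divide target data blocks into data groups (S1410-S1420).

    A block closes the current group when its fingerprint modulo the fourth
    threshold is at least the fifth threshold and, if count thresholds are
    given, the group size lies between min_count and max_count.
    """
    groups = []
    current = []
    for block, fp in zip(blocks, fingerprints):
        current.append(block)
        is_division_point = fp % fourth_threshold >= fifth_threshold
        if min_count is not None and len(current) < min_count:
            is_division_point = False
        if max_count is not None and len(current) > max_count:
            is_division_point = False
        if is_division_point:
            groups.append(current)
            current = []
    if current:  # blocks after the last division point form the final group
        groups.append(current)
    return groups


blocks = ["ALMXXWXDEFGHIJK", "ABRFGRTGRTRGE", "DEWFRTNEBJ", "JDIEOFJDEJFOEW"]
stand_in_fingerprints = [105, 235, 33, 855]  # remainders mod 80: 25, 75, 33, 55
groups = group_target_blocks(blocks, stand_in_fingerprints)
```

With the first form of the fourth preset condition the second block closes the first data group, giving two groups of two blocks each; passing, for example, `min_count=3` would suppress that division point and leave all four blocks in one group.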

Optionally, the above S1310 may also use the above methods S610-S620 and/or S910-S920 for dividing candidate blocks, for details, refer to the relevant descriptions of S610-S620 and/or S910-S920, which will not be repeated here.

S1320. The data management device generates an index tree of the target data based on the respective fingerprint features of the multiple data groups.

In the data management method in the storage system provided by the embodiments of the present application, after dividing the target data into target data blocks, the data management device determines the target data blocks whose fingerprint features satisfy the fourth preset condition among the N target data blocks as at least one division point of the target data blocks, and divides the N target data blocks into multiple data groups according to the at least one division point. Compared with the prior art, when the target data is updated, even if the number of target data blocks of the target data changes, as long as the updated data of the target data (such as the inserted data) does not satisfy the fourth preset condition, the division points used for grouping the target data blocks remain unchanged, so the number of groups of target data blocks is the same as before the target data was updated. In this case, when the index tree of the updated target data is constructed, only some of the leaf nodes, the hash values of some of the child nodes, and the hash value of the root node of the index tree of the target data before the data was inserted need to be modified, which saves the resources required for data management.

Correspondingly, the embodiments of the present application provide a data management device in a storage system, which is used to execute each step of the above data management method in the storage system. In the embodiments of the present application, the data management device in the storage system may be divided into functional modules according to the above method examples. For example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated modules can be implemented in the form of hardware or in the form of software functional modules. The division of modules in the embodiments of the present application is schematic and is only a logical function division; other division methods are possible in actual implementation.

In the case of dividing each functional module corresponding to each function, FIG. 15 shows a possible structural diagram of the data management device in the storage system involved in the above embodiment. As shown in FIG. 15 , the data management device in the storage system includes: a processing module 1501 , a storage module 1502 and a generating module 1503 .

The processing module 1501 is configured to divide the target data into M candidate data blocks based on the content of the target data, for example, execute step S510 in the above method embodiment.

The processing module 1501 is further configured to divide the M candidate data blocks into N target data blocks according to their respective fingerprint features, for example, execute step S520 in the above method embodiment.

The storage module 1502 is configured to store N target data blocks and fingerprint features of the N target data blocks, for example, execute step S530 in the above method embodiment.

The generating module 1503 is configured to generate an index tree of the target data according to the N target data blocks, and the index tree is used to address the content of the target data, for example, execute step S540 in the above method embodiment.

Optionally, the data management device in the storage system provided in the embodiment of the present application further includes a determination module 1504.

The determining module 1504 is configured to determine M-1 division points of the target data according to the fingerprint feature of the target data, for example, execute step S610 in the above method embodiment.

The processing module 1501 is specifically configured to divide the target data into M candidate data blocks according to the M-1 division points, for example, execute step S620 in the above method embodiment.

Optionally, the data management device in the storage system provided in the embodiment of the present application further includes a sliding module 1505. The determining module 1504 is configured to determine the end position of the sliding window as the division point when the fingerprint feature of the first data in the sliding window satisfies the first preset condition, for example, execute step S820 in the above method embodiment.

The sliding module 1505 is configured to slide the sliding window along a preset direction for a preset length when the fingerprint feature of the first data in the sliding window does not satisfy the first preset condition, for example, execute step S830 in the above method embodiment.
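The cooperation of the determining module and the sliding module can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of this application: the fingerprint is taken as the first 8 bytes of SHA-256, the window length, step, first threshold and second threshold are arbitrary example values, the first preset condition is assumed to be "fingerprint modulo the first threshold equals the second threshold", and a new window is assumed to start right after each division point. All function names are hypothetical.

```python
import hashlib


def window_fingerprint(data: bytes) -> int:
    """Fingerprint of the data in the window (assumed: first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def find_division_points(target: bytes, window: int = 16, step: int = 1,
                         first_threshold: int = 64, second_threshold: int = 0):
    """Return the end offsets of windows whose fingerprint satisfies
    fingerprint % first_threshold == second_threshold."""
    points = []
    start = 0
    while start + window <= len(target):
        end = start + window
        if window_fingerprint(target[start:end]) % first_threshold == second_threshold:
            if end < len(target):  # the end of the data is not a division point
                points.append(end)
            start = end            # assumed: restart the window after the point
        else:
            start += step          # condition not met: slide by the preset length
    return points


def split_at(target: bytes, points):
    """Cut the target data into candidate data blocks at the division points."""
    bounds = [0, *points, len(target)]
    return [target[a:b] for a, b in zip(bounds, bounds[1:])]


# Deterministic sample data, used only to exercise the sketch.
sample = bytes((i * 37 + 11) % 251 for i in range(4096))
sample_points = find_division_points(sample)
sample_blocks = split_at(sample, sample_points)
```

Because the division points depend only on the bytes inside the window, inserting data in one place leaves the division points found in unaffected regions where they were.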

Optionally, the determination module 1504 is further configured to determine M-1 division points of the target data according to the transformation value of the target data, for example, execute step S910 in the above method embodiment.

The processing module 1501 is specifically configured to divide the target data into M candidate data blocks according to the M-1 division points, for example, execute step S920 in the above method embodiment.

Optionally, the determination module 1504 is further configured to determine the end position of the second fixed-length window as the division point when the transformation value of the data included in the second fixed-length window satisfies the second preset condition, for example, execute step S1120 in the above method embodiment.

The above-mentioned processing module 1501 is further configured to: when the transformation value of the data in the second fixed-length window does not satisfy the second preset condition, increase the length of the variable-length window in the preset window; and when the transformation value of the data in the second fixed-length window then satisfies the second preset condition, determine the end position of the second fixed-length window as the division point, for example, execute step S1130 in the above method embodiment.
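The window-growing procedure above can be sketched as follows. This is an illustration under assumptions, not the implementation of this application: each piece of data is one byte, its transformation value is assumed to be the byte value itself, the second fixed-length window holds exactly one byte, and the first fixed-length window is assumed to start at the beginning of each candidate block. The function name and the length of the first fixed-length window are hypothetical.

```python
def find_points_by_value(target: bytes, first_len: int = 8):
    """Return division points where the byte in the second fixed-length window
    is greater than every byte in the first fixed-length window and in the
    variable-length window that precede it."""
    points = []
    start = 0
    n = len(target)
    while start + first_len < n:
        # Maximum transformation value in the first fixed-length window.
        running_max = max(target[start:start + first_len])
        # The second fixed-length window starts right after the first one,
        # so the variable-length window is initially empty.
        pos = start + first_len
        while pos < n and target[pos] <= running_max:
            # Condition not met: this byte joins the variable-length window,
            # which grows by one. The maximum is unchanged because the byte
            # does not exceed it.
            pos += 1
        if pos >= n:
            break  # no further division point in the remaining data
        end = pos + 1  # end position of the second fixed-length window
        if end < n:
            points.append(end)
        start = end
    return points


example_points = find_points_by_value(bytes([5, 1, 2, 3, 9, 4, 4, 4, 2, 7, 1]), first_len=3)
```

In the example, the byte 9 exceeds the maximum 5 of the first window, giving a division point at offset 5, and the byte 7 exceeds the maximum 4 of the next first window, giving a division point at offset 10.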

Optionally, the determination module 1504 is configured to determine the end positions of the candidate data blocks whose fingerprint features satisfy the third preset condition, among the M candidate data blocks, as the N-1 division points of the candidate data blocks, for example, execute step S1210 in the above method embodiment.

The processing module 1501 is configured to divide the M candidate data blocks into N target data blocks according to the N-1 division points, for example, execute step S1220 in the above method embodiment.
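The merging of candidate data blocks into target data blocks can be sketched as follows. This is a hedged illustration, not the implementation of this application: the default fingerprint is assumed to be the first 8 bytes of SHA-256, the third threshold and the preset range are example values, and the names `merge_candidates` and `fp` are hypothetical. The fingerprint function is passed as a parameter so that the behaviour can be checked with a simple stand-in.

```python
import hashlib


def fingerprint(block: bytes) -> int:
    """Fingerprint of a candidate data block (assumed: first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(block).digest()[:8], "big")


def merge_candidates(candidates, third_threshold=4, preset_range=range(0, 1),
                     fp=fingerprint):
    """Merge consecutive candidate blocks into target blocks. The end of a
    candidate block is a division point when
    fp(block) % third_threshold falls within preset_range."""
    targets, current = [], []
    for block in candidates:
        current.append(block)
        if fp(block) % third_threshold in preset_range:
            targets.append(b"".join(current))
            current = []
    if current:  # remaining candidates form the last target block
        targets.append(b"".join(current))
    return targets


# Example with block length as a stand-in fingerprint, so the result is easy to follow.
example_candidates = [b"a", b"bb", b"cccc", b"d", b"eeeeeeee", b"f"]
example_targets = merge_candidates(example_candidates, fp=len)
```

In the example, the blocks of length 4 and 8 satisfy the condition, so six candidate data blocks become three target data blocks, each containing at least one candidate data block.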

Optionally, the above processing module 1501 is further configured to divide the N target data blocks into at least one data group according to their respective fingerprint features, for example, perform step S1310 in the above method embodiment.

The generating module 1503 is further configured to generate an index tree of target data based on the respective fingerprint features of multiple data groups, for example, execute step S1320 in the above method embodiment.

Optionally, the determination module 1504 is configured to determine each target data block whose fingerprint feature satisfies the fourth preset condition, among the N target data blocks, as a division point of the N target data blocks, so as to obtain at least one division point, for example, execute step S1410 in the above method embodiment.

The processing module 1501 is configured to divide the N target data blocks into multiple data groups according to at least one division point of the target data blocks, for example, execute step S1420 in the above method embodiment.
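One possible shape of the resulting index tree is sketched below. It is an assumption-laden illustration rather than the structure defined by this application: the fingerprint feature is taken to be a SHA-256 hash, the fourth preset condition is assumed to be "fingerprint modulo the fourth threshold is greater than or equal to the fifth threshold" with example thresholds, the tree is limited to two levels (root over groups over blocks), and all names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Fingerprint feature, assumed here to be a SHA-256 hash."""
    return hashlib.sha256(data).digest()


def group_blocks(block_hashes, fourth_threshold=8, fifth_threshold=6):
    """Close a data group after each block whose hash satisfies the assumed
    fourth preset condition."""
    groups, current = [], []
    for bh in block_hashes:
        current.append(bh)
        if int.from_bytes(bh[:8], "big") % fourth_threshold >= fifth_threshold:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


def build_index_tree(blocks):
    """Build a two-level tree: each group node hashes its block hashes,
    and the root hashes the group hashes."""
    block_hashes = [h(b) for b in blocks]
    groups = group_blocks(block_hashes)
    group_hashes = [h(b"".join(g)) for g in groups]
    return {
        "root": h(b"".join(group_hashes)),
        "groups": [{"hash": gh, "blocks": g} for gh, g in zip(group_hashes, groups)],
    }


# Twenty small target data blocks, used only to exercise the sketch.
sample_target_blocks = [bytes([i]) * 32 for i in range(20)]
sample_tree = build_index_tree(sample_target_blocks)
```

Addressing the content then amounts to walking from the root hash to a group hash and from there to the stored block that corresponds to a block hash.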

Each module of the data management device in the above-mentioned storage system can also be used to perform other actions in the above-mentioned method embodiment. For all relevant content of each step involved in the above-mentioned method embodiment, reference may be made to the function description of the corresponding functional module, and details are not repeated here.

In the case of using an integrated unit, a schematic structural diagram of a data management device in a storage system provided by an embodiment of the present application is shown in FIG. 16. In FIG. 16, the data management device includes: a processing module 1601 and a communication module 1602. The processing module 1601 is used to control and manage the actions of the data management device in the storage system, for example, to execute the steps performed by the processing module 1501, the generation module 1503, the determination module 1504, and the sliding module 1505, and/or to execute other processes of the technology described herein. The communication module 1602 is used to support the interaction between the data management device and other devices in the storage system. As shown in FIG. 16, the data management device in the storage system may also include a storage module 1603, which is used to store the program code of the data management device in the storage system, relationship data, and the like.

The processing module 1601 may be a processor or a controller, for example, the controller 401 or the processor 402 in FIG. 4. The communication module 1602 may be a transceiver, an RF circuit, or a communication interface, such as the bus 406 and/or the channel controller 403 in FIG. 4. The storage module 1603 may be a memory, such as the hard disk 405 in FIG. 4.

The above embodiments may be implemented in whole or in part by software, hardware, firmware or any combination thereof. When implemented using a software program, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, all or part of the processes or functions according to the embodiments of the present application are generated. The computer can be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (such as infrared, radio, or microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a magnetic disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid state drive (SSD)), etc.

Through the description of the above embodiments, those skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the above-mentioned functional modules is used as an example for illustration. In practical applications, the above-mentioned functions can be allocated to different functional modules as needed, that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above. For the specific working process of the above-described system, device, and unit, reference may be made to the corresponding process in the foregoing method embodiments, and details are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed system, device and method can be implemented in other ways. For example, the device embodiments described above are only illustrative. For example, the division of the modules or units is only a logical function division, and there may be other division methods in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.

The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit. The above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.

If the integrated unit is realized in the form of a software function unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor execute all or part of the steps of the method described in each embodiment of the present application. The aforementioned storage medium includes: flash memory, mobile hard disk, read-only memory, random access memory, magnetic disk or optical disk, and other various media capable of storing program codes.

The above description is only a specific implementation mode of the present invention, but the protection scope of the present invention is not limited thereto, and the protection scope of the present invention should be based on the protection scope of the claims.

Claims

A method for managing data in a storage system, comprising:

Based on the content of the target data, dividing the target data into M candidate data blocks, where M is an integer greater than or equal to 2;

According to the respective fingerprint features of the M candidate data blocks, dividing the M candidate data blocks into N target data blocks, where N is a positive integer less than or equal to M, and each of the target data blocks includes at least one of the candidate data blocks;

storing the N target data blocks and the fingerprint features of the N target data blocks, the target data blocks having a one-to-one correspondence with the fingerprint features of the target data blocks;

An index tree of the target data is generated according to the N target data blocks, and the index tree is used to address content of the target data.
The method according to claim 1, wherein the target data is divided into M candidate data blocks based on the content of the target data, specifically comprising:

Determine M-1 division points of the target data according to the fingerprint feature of the target data;

Divide the target data into M candidate data blocks according to the M-1 division points.
The method according to claim 2, wherein said determining M-1 division points of said target data according to the fingerprint feature of said target data comprises:

For any one of the M-1 division points, when the fingerprint feature of the first data in the sliding window satisfies the first preset condition, the end position of the sliding window is determined as the division point, where the first data is part of the target data, and the first preset condition is that the fingerprint feature of the target data in the sliding window modulo the first threshold is equal to the second threshold.
The method according to claim 2, wherein said determining M-1 division points of said target data according to the fingerprint feature of said target data comprises:

For any one of the M-1 division points, if the fingerprint feature of the first data in the sliding window does not satisfy the first preset condition, sliding the sliding window along a preset direction by a preset length, and when the fingerprint feature of the second data in the sliding window satisfies the first preset condition, determining the end position of the sliding window as the division point, where the second data is part of the target data and is different from the first data; wherein the preset length is less than or equal to the length of the sliding window, and the first preset condition is that the fingerprint feature of the target data in the sliding window modulo the first threshold is equal to the second threshold.
The method according to claim 1, wherein the target data is divided into M candidate data blocks based on the content of the target data, specifically comprising:

determining M-1 division points of the target data according to the transformation value of the target data, where the transformation value is a numerical value obtained by converting each piece of data in a preset window based on a preset rule;

Divide the target data into M candidate data blocks according to the M-1 division points.
The method according to claim 5, wherein, according to the transformation value of the target data, determining M-1 division points of the target data comprises:

The preset window includes a first fixed-length window, a variable-length window, and a second fixed-length window that are sequentially adjacent; the length of the second fixed-length window is the length of one piece of data in the target data or the length of the transformation value of one piece of data in the target data;

For any one of the M-1 division points, if the transformation value of the data included in the second fixed-length window satisfies the second preset condition, the end position of the second fixed-length window is determined as the division point, where the second preset condition is that the transformation value of the data in the second fixed-length window is greater than the maximum of the transformation values of the data in the first fixed-length window and greater than the maximum of the transformation values of the data in the variable-length window.
The method according to claim 5, wherein, according to the transformation value of the target data, determining M-1 division points of the target data comprises:

The preset window includes a first fixed-length window, a variable-length window, and a second fixed-length window that are sequentially adjacent; the length of the second fixed-length window is the length of one piece of data in the target data or the length of the transformation value of one piece of data in the target data;

For any division point, when the transformation value of the data in the second fixed-length window does not satisfy the second preset condition, increasing the length of the variable-length window in the preset window, and when the transformation value of the data in the second fixed-length window satisfies the second preset condition, determining the end position of the second fixed-length window as the division point, where the second preset condition is that the transformation value of the data in the second fixed-length window is greater than the maximum of the transformation values of the data in the first fixed-length window and greater than the maximum of the transformation values of the data in the variable-length window.
The method according to any one of claims 1-7, wherein dividing the M candidate data blocks into N target data blocks according to the respective fingerprint features of the M candidate data blocks specifically comprises:

Determining the end positions of the candidate data blocks whose fingerprint features satisfy the third preset condition, among the M candidate data blocks, as the N-1 division points of the candidate data blocks, wherein the third preset condition is that the fingerprint feature of the data of the candidate data block modulo the third threshold is within a preset range;

Divide the M candidate data blocks into the N target data blocks according to the N-1 division points.
The method according to any one of claims 1-8, wherein the generating an index tree of the target data according to the N target data blocks specifically includes:

Divide the N target data blocks into at least one data group according to the respective fingerprint features of the N target data blocks;

An index tree of the target data is generated based on the respective fingerprint features of the plurality of data groups.
The method according to claim 9, wherein said dividing said N target data blocks into at least one data group according to the respective fingerprint features of said N target data blocks, specifically comprises:

Determining a target data block that satisfies a fourth preset condition among the respective fingerprint features of the N target data blocks as at least one division point of the N target data blocks;

Divide the N target data blocks into the plurality of data groups according to at least one division point of the target data blocks.
The method according to claim 10, characterized in that,

The fourth preset condition is: a modulo value between the fingerprint feature of the target data block and the fourth threshold is greater than or equal to the fifth threshold.
The method according to claim 10, characterized in that,

The fourth preset condition is: the modulo value between the fingerprint feature of the target data block and the fourth threshold is greater than or equal to the fifth threshold, and the number of target data blocks in the data group is between a first number threshold and a second number threshold, the first number threshold being greater than the second number threshold.
The method according to any one of claims 1-12, characterized in that

The fingerprint feature is a hash value.
A data management device in a storage system, characterized by comprising: a processing module, a storage module and a generating module;

The processing module is configured to divide the target data into M candidate data blocks based on the content of the target data, where M is an integer greater than or equal to 2;

The processing module is further configured to divide the M candidate data blocks into N target data blocks according to the respective fingerprint features of the M candidate data blocks, where N is a positive integer less than or equal to M, and each of the target data blocks includes at least one of the candidate data blocks;

The storage module is configured to store the N target data blocks and the fingerprint features of the N target data blocks, and the target data blocks have a one-to-one correspondence with the fingerprint features of the target data blocks;

The generating module is configured to generate an index tree of the target data according to the N target data blocks, and the index tree is used to address the content of the target data.
The data management device in the storage system according to claim 14, further comprising: a determining module;

The determination module is configured to determine M-1 division points of the target data according to the fingerprint characteristics of the target data;

The processing module is specifically configured to divide the target data into M candidate data blocks according to the M-1 division points.
The data management device in the storage system according to claim 14, further comprising: a determining module;

The determination module is configured to determine M-1 division points of the target data according to the transformation value of the target data, where the transformation value is a numerical value obtained by converting each piece of data in a preset window based on a preset rule;

The processing module is configured to divide the target data into M candidate data blocks according to the M-1 division points.
The data management device in the storage system according to any one of claims 14-16, characterized in that,

The determining module is configured to determine the end positions of the candidate data blocks whose fingerprint features satisfy the third preset condition, among the M candidate data blocks, as the N-1 division points of the candidate data blocks, wherein the third preset condition is that the fingerprint feature of the data of the candidate data block modulo the third threshold is within a preset range;

The processing module is configured to divide the M candidate data blocks into the N target data blocks according to the N-1 division points.
The data management device in the storage system according to any one of claims 14-16, characterized in that,

The processing module is configured to divide the N target data blocks into at least one data group according to the respective fingerprint features of the N target data blocks;

The generating module is configured to generate an index tree of the target data based on the respective fingerprint features of the plurality of data groups.
A data management device in a storage system, characterized in that it includes a memory and a processor, the memory is coupled to the processor; the memory is used to store computer program codes, and the computer program codes include computer instructions; when the computer instructions are executed by the processor, the processor is caused to perform the method according to any one of claims 1 to 13.
A computer storage medium, characterized by comprising computer instructions, which, when the computer instructions are run on a computing device, cause the computing device to execute the method according to any one of claims 1 to 13.
A computer program product, characterized by comprising computer-readable instructions, which, when the computer-readable instructions are run on a computer device, cause the computer device to execute the method according to any one of claims 1 to 13.