
CN107219997B - Method and device for verifying data consistency

Info

Publication number: CN107219997B
Application number: CN201610161907.5A
Authority: CN
Inventors: 朱家稷
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2016-03-21
Filing date: 2016-03-21
Publication date: 2020-08-18
Anticipated expiration: 2036-03-21
Also published as: CN107219997A

Abstract

The application provides a method for verifying data consistency, used for replica data in distributed storage and characterized by comprising the following steps: acquiring a request to verify the consistency of the replica data contained in a specified data set; sending, to the storage nodes that store the fragments of the data set, a request for the check code of each fragment and for the secondary check code of the replica data, where the secondary check code is obtained from the check codes of all fragments by a predetermined algorithm; receiving the fragment check codes and the secondary check code of the replica data fed back by each node; recomputing a secondary check code from the received check codes of all fragments of the data set, using the same algorithm as the predetermined algorithm; and judging, from the recomputed secondary check code and the received secondary check code of the replica data, whether the replica data contained in the data set are consistent. This improves the efficiency of verifying data consistency.

Description

Method and device for verifying data consistency

Technical Field

The application relates to two methods and corresponding devices for verifying data consistency; it further relates to a method and a device for generating a secondary check code, and a method and a device for responding to a request for acquiring a check code. The application also relates to a system for verifying data consistency.

Background

With the development of IT, large amounts of data must be stored and backed up for access by various applications. During storage, data content may change because the data is updated, because a stored replica is damaged, or for other reasons, and system designers and maintainers must deal with these situations. Ensuring replica consistency means ensuring that the stored replica data are identical to the data before they were written and, for distributed storage, that all stored replicas are of the same version; only then can the value of the data be correctly reflected and fully realized.

To ensure replica consistency, any change in the content of replica data must be detected in time. With full-copy backup, each storage device stores a complete copy of the data to be backed up together with the check code of that copy. The check codes of the copies stored on different devices can then be compared directly: if they differ, the replicas are judged to have lost consistency; if they are identical, the replicas are judged consistent.

With distributed storage, however, each storage device holds only part of the data to be stored rather than all of it, so replica consistency cannot be determined by directly comparing the check codes stored on different devices. Instead, all parts of the data must be read from the different devices, the necessary calculations performed on them, and consistency determined from the results.

Consequently, the existing method for checking replica consistency must read the stored replica data and compute over it. This requires a large number of I/O operations and consumes substantial computing resources, so the check is difficult to complete in a short time and changes in replica content cannot be detected promptly.

Furthermore, reading and computing over a large amount of replica data during verification places a heavy burden on network transmission, which degrades service performance and may even cause the system or the network to fail.

Disclosure of Invention

The application provides a method and a device for verifying data consistency, a second method and device for verifying data consistency, a method and a device for generating a secondary check code, and a method and a device for responding to a request for acquiring a check code. The application also provides a system for verifying data consistency.

The application provides a method for verifying data consistency, used for replica data in distributed storage and comprising the following steps:

acquiring a request to verify the consistency of the replica data contained in a specified data set;

sending, to the storage nodes that store the fragments of the data set, a request for the check code of each fragment and for the secondary check code of the replica data, where the secondary check code is obtained from the check codes of all fragments by a predetermined algorithm;

receiving the fragment check code and the secondary check code of the replica data fed back by each node;

recomputing a secondary check code from the received check codes of all fragments of the data set, using the same algorithm as the predetermined algorithm;

and judging, from the recomputed secondary check code and the received secondary check code of the replica data, whether the replica data contained in the data set are consistent.

Preferably, judging whether the replica data contained in the data set are consistent from the recomputed secondary check code and the received secondary check code specifically includes:

judging whether the recomputed secondary check code is identical to the received secondary check code of the replica data;

if they differ, judging that the replica data contained in the data set are not consistent;

and if they are identical, judging that the replica data contained in the data set are consistent.
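The comparison described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the application's implementation: each fragment check code is assumed to arrive as a hex string, the predetermined algorithm is assumed to be MD5 over the check codes concatenated in fragment order (the example used in the first embodiment), and every node's reported secondary check code is compared against the recomputed one. The function names are hypothetical.

```python
import hashlib


def secondary_check_code(check_codes: list[str]) -> str:
    """Assumed predetermined algorithm: MD5 over the fragment check codes,
    concatenated in fragment order."""
    return hashlib.md5("".join(check_codes).encode("ascii")).hexdigest()


def verify_replica_consistency(responses: list[tuple[str, str]]) -> bool:
    """Judge consistency from one (fragment_check_code, reported_secondary_code)
    pair per fragment, listed in fragment order.

    Returns True only if the recomputed secondary check code matches the
    secondary check code reported by every node (assumption: each reported
    copy is checked, not just one).
    """
    fragment_codes = [code for code, _ in responses]
    recomputed = secondary_check_code(fragment_codes)
    return all(reported == recomputed for _, reported in responses)
```

Only short check codes cross the network here; the fragments themselves are never read by the verifier, which is the efficiency argument the application makes.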

Preferably, the predetermined algorithm is one under which different inputs produce different outputs (for a hash function, with overwhelming probability).

Preferably, this algorithm is the MD5 hash algorithm or the SHA-1 hash algorithm.
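To illustrate the choice between MD5 and SHA-1, the sketch below selects either one through Python's standard `hashlib.new`. A hash function cannot strictly guarantee different outputs for different inputs, but an accidental change to a check code is vanishingly unlikely to leave the digest unchanged, which is the property the method relies on. The function name and the practice of feeding the check codes to the hash one after another are assumptions for illustration.

```python
import hashlib


def secondary_check_code(check_codes: list[str], algorithm: str = "md5") -> str:
    """Hash the fragment check codes, in order, with MD5 or SHA-1."""
    if algorithm not in ("md5", "sha1"):
        raise ValueError(f"unsupported algorithm: {algorithm}")
    digest = hashlib.new(algorithm)
    # Feeding the codes one after another is equivalent to hashing
    # their concatenation as a single string.
    for code in check_codes:
        digest.update(code.encode("ascii"))
    return digest.hexdigest()
```

An MD5 hex digest is 32 characters and a SHA-1 hex digest is 40, so the secondary check code stays small whichever is chosen.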

Preferably, the method is used for verifying the consistency of replica data stored in a distributed manner using erasure coding. In this case, the data contained in the data set are replica data of the stored data, and the fragments in the set comprise the data fragments of the data blocks of the stored data and the encoded fragments computed from those data fragments;

correspondingly, sending the request for the fragment check codes and the secondary check code of the data set to the nodes storing the fragments of the data set includes:

sending, to a node that stores an encoded fragment of the data fragments of the data blocks of the stored data, a request for the check code of that encoded fragment and for the secondary check code of the replica data of the stored data.

The application further provides a second method for verifying data consistency, also used for replica data in distributed storage, which comprises the following steps after the request for fragment check codes and the secondary check code has been sent to the storage nodes:

receiving each node's response to the request;

and judging whether the replica data contained in the data set are consistent according to whether the response contains information indicating that a fragment of the data set has changed.

Preferably, this judgment specifically includes:

judging whether the response contains information indicating that a fragment of the data set has changed;

if so, judging that the replica data contained in the data set are not consistent;

if not, recomputing a secondary check code from the received check codes of all fragments of the data set, using the same algorithm as the predetermined algorithm;

and judging whether the recomputed secondary check code is identical to the received secondary check code of the replica data: if they differ, the replica data are not consistent; if they are identical, the replica data are consistent.
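The decision order of this second method (reported changes first, then the secondary check code comparison) might look like the following minimal sketch. The `NodeResponse` structure and the MD5-over-concatenation rule are assumptions chosen for illustration, not a response format defined by the application.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeResponse:
    """Hypothetical response format: either a change report, or the
    fragment check code plus the node's stored secondary check code."""
    fragment_changed: bool
    check_code: Optional[str] = None
    secondary_code: Optional[str] = None


def recompute_secondary(check_codes: list[str]) -> str:
    # Assumed predetermined algorithm: MD5 over the concatenated codes.
    return hashlib.md5("".join(check_codes).encode("ascii")).hexdigest()


def verify_with_change_info(responses: list[NodeResponse]) -> bool:
    """Return True if the replica data are judged consistent."""
    # Any reported fragment change settles the question immediately,
    # without recomputing anything.
    if any(r.fragment_changed for r in responses):
        return False
    recomputed = recompute_secondary([r.check_code for r in responses])
    return all(r.secondary_code == recomputed for r in responses)
```

Short-circuiting on a change report means a node that has already detected corruption locally saves the requester the hash computation.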

The application provides a method for generating a secondary check code, comprising the following steps:

acquiring a request to generate check codes for a specified data set;

computing, for each fragment of the data set, a corresponding check code according to a predetermined algorithm;

computing the secondary check code of the data set from the check codes of all fragments of the data set according to a predetermined algorithm;

storing each fragment of the data set, together with its check code, on the corresponding node;

and storing the secondary check code of the data set on the corresponding node.

Preferably, computing the check code of each fragment according to a predetermined algorithm includes:

computing a cyclic redundancy check (CRC) code for the fragment using a cyclic redundancy check algorithm;

preferably, computing the secondary check code of the data set according to a predetermined algorithm includes:

computing the secondary check code of the data set using an algorithm under which different inputs produce different outputs.
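The generation steps can be sketched as follows, assuming CRC-32 (via Python's `zlib.crc32`) as the cyclic redundancy check and MD5 as the secondary algorithm. Nodes are modelled as plain in-memory dicts, and the `placement` list mapping each fragment to a node name is hypothetical; following the layout of Fig. 2, every node that receives a fragment also receives the secondary check code (in the erasure-coding variant, only the nodes holding encoded fragments would).

```python
import hashlib
import zlib


def generate_codes(fragments: list[bytes]) -> tuple[list[str], str]:
    """Return (per-fragment CRC-32 codes, secondary check code)."""
    # CRC-32 of each fragment, rendered as a fixed-width 8-digit hex string.
    check_codes = [format(zlib.crc32(f), "08x") for f in fragments]
    # Secondary check code: MD5 over the check codes in fragment order.
    secondary = hashlib.md5("".join(check_codes).encode("ascii")).hexdigest()
    return check_codes, secondary


def store_data_set(fragments: list[bytes], placement: list[str]) -> dict:
    """Place fragment i with its check code on node placement[i], and give
    each such node a copy of the secondary check code (hypothetical model)."""
    check_codes, secondary = generate_codes(fragments)
    nodes: dict = {}
    for index, (fragment, code) in enumerate(zip(fragments, check_codes)):
        node = nodes.setdefault(placement[index],
                                {"fragments": {}, "secondary": secondary})
        node["fragments"][index] = (fragment, code)
    return nodes
```

Both codes are computed once at write time, so later consistency checks never need to reread the fragments on the verifier side.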

Preferably, the method is used for storing data in a distributed manner using erasure coding;

the fragments in the data set are then the data fragments of the data blocks of the data to be stored, together with the corresponding encoded fragments;

correspondingly, storing each fragment of the data set and its check code on the corresponding node comprises: storing each data fragment of the data blocks of the data to be stored, with its check code, on the corresponding node, and storing each encoded fragment computed from those data fragments, with its check code, on the corresponding node;

correspondingly, storing the secondary check code of the data set on the corresponding node comprises: storing the secondary check code of the data to be stored on the nodes that store the encoded fragments.

Preferably, before the data fragments and the encoded fragments, with their check codes, are stored on their corresponding nodes, the method includes acquiring the locations of those nodes, which comprises:

sending a request to the management node of the stored data to query the locations of the corresponding nodes;

and receiving, from the management node, the identifier of the storage node for each fragment of the data set.

The application provides a method for responding to a request for acquiring a check code, characterized by comprising the following steps:

acquiring a request to calculate the check code of a designated fragment stored on the node;

recalculating the check code of the fragment stored on the node, using the same algorithm and parameters that were used to calculate its stored check code;

judging and marking the state of the fragment according to the recalculated check code and the check code stored on the node;

receiving, from a requester, a request for the check code of the fragment and for the secondary check code of the replica data contained in the data set to which the fragment belongs;

acquiring the marked state of the fragment;

if the state is "changed", sending information that the fragment stored on the node has changed to the requester;

and if the state is "unchanged", sending the check code of the fragment stored on the node and the secondary check code of the replica data contained in the data set to which the fragment belongs to the requester.

Preferably, judging and marking the state of the fragment according to the recalculated check code and the stored check code includes:

judging whether the recalculated check code of the fragment is identical to the check code stored on the node;

if they differ, marking the state of the fragment stored on the node as changed;

and if they are identical, marking the state of the fragment stored on the node as unchanged.

Preferably, the check code stored on the node is the cyclic redundancy check code of the fragment, calculated from the fragment using a cyclic redundancy check algorithm.
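The node-side behaviour (a periodic recheck that marks each fragment, followed by a response that depends on the mark) can be sketched as below. This is a minimal model under assumptions: CRC-32 from `zlib` is the stored check code, a node holds fragments of a single data set, and the class names, method names and response dict keys are hypothetical.

```python
import zlib
from dataclasses import dataclass


@dataclass
class StoredFragment:
    data: bytes
    check_code: int           # CRC-32 recorded when the fragment was written
    state: str = "unchanged"  # hypothetical state labels: "unchanged" / "changed"


class StorageNode:
    """Hypothetical node holding fragments of a single data set."""

    def __init__(self, secondary_code: str) -> None:
        self.secondary_code = secondary_code
        self.fragments: dict[str, StoredFragment] = {}

    def put(self, fragment_id: str, data: bytes) -> None:
        self.fragments[fragment_id] = StoredFragment(data, zlib.crc32(data))

    def scrub(self, fragment_id: str) -> None:
        # Recompute with the same algorithm (CRC-32) used at write time
        # and mark the fragment's state accordingly.
        frag = self.fragments[fragment_id]
        recomputed = zlib.crc32(frag.data)
        frag.state = "unchanged" if recomputed == frag.check_code else "changed"

    def handle_request(self, fragment_id: str) -> dict:
        frag = self.fragments[fragment_id]
        if frag.state == "changed":
            return {"fragment_changed": True, "fragment_id": fragment_id}
        return {
            "fragment_changed": False,
            "check_code": frag.check_code,
            "secondary_code": self.secondary_code,
        }
```

Because the recheck runs locally on each node, the fragment bytes never leave the node during verification; only a change flag or two short codes are returned.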

Preferably, the method is used for verifying the consistency of data stored in a distributed manner using erasure coding.

The application provides an apparatus for verifying data consistency, used for data stored in a distributed manner and comprising the following units:

an acquisition unit, configured to acquire a request to verify the consistency of the replica data contained in a designated data set;

a request unit, configured to send, to the storage nodes that store the fragments of the data set, a request for the check code of each fragment and for the secondary check code of the replica data, where the secondary check code is obtained from the check codes of all fragments by a predetermined algorithm;

a receiving unit, configured to receive the fragment check code and the secondary check code of the replica data fed back by each node;

a computing unit, configured to recompute a secondary check code from the received check codes of all fragments of the data set, using the same algorithm as the predetermined algorithm;

and a judging unit, configured to judge, from the recomputed secondary check code and the received secondary check code of the replica data, whether the replica data contained in the data set are consistent.

The application further provides a second apparatus for verifying data consistency, used for data stored in a distributed manner and comprising the following units:

a request unit, configured to send, to the storage nodes that store the fragments of the data set, a request for the check code of each fragment and for the secondary check code of the replica data, where the secondary check code is obtained from the check codes of all fragments by a predetermined algorithm;

a receiving unit, configured to receive each node's response to the request;

and a judging unit, configured to judge whether the replica data contained in the data set are consistent according to the fragment change information of the data set contained in the response.

The application provides an apparatus for generating a secondary check code, comprising the following units:

an acquisition unit, configured to acquire a request to generate check codes for a specified data set;

a check code calculation unit, configured to calculate, for each fragment of the data set, a corresponding check code according to a predetermined algorithm;

a secondary check code calculation unit, configured to calculate the secondary check code of the data set from the check codes of all fragments of the data set according to a predetermined algorithm;

a first storage unit, configured to store each fragment of the data set, together with its check code, on the corresponding node;

and a second storage unit, configured to store the secondary check code of the data set on the corresponding node.

The application provides an apparatus for responding to a request for acquiring a check code, comprising the following units:

an acquisition unit, configured to acquire a request to calculate the check code of a designated fragment stored on the node;

a computing unit, configured to recalculate the check code of the fragment stored on the node, using the same algorithm and parameters that were used to calculate its stored check code;

a state marking unit, configured to judge and mark the state of the fragment according to the recalculated check code and the check code stored on the node;

a receiving unit, configured to receive, from a requester, a request for the check code of the fragment and for the secondary check code of the replica data contained in the data set to which the fragment belongs;

a second acquisition unit, configured to acquire the marked state of the fragment;

a state sending unit, configured to send information that the fragment stored on the node has changed to the requester if the state is "changed";

and a code sending unit, configured to send the check code of the fragment stored on the node and the secondary check code of the replica data contained in the data set to which the fragment belongs to the requester if the state is "unchanged".

The application provides a system for verifying data consistency, comprising:

either of the apparatuses for verifying data consistency provided by the application;

the apparatus for generating a secondary check code provided by the application;

and the apparatus for responding to a request for acquiring a check code provided by the application.

Compared with the prior art, the method for verifying data consistency has the following advantage: replica consistency is judged by verifying the secondary check code of the replica data. During a consistency check, only the fragment check codes and the secondary check code are transmitted, and only the secondary check code is recomputed and compared. This speeds up consistency checking, allows inconsistencies to be detected sooner, and makes it easier to keep replica data consistent.

Furthermore, because only check codes and the secondary check code are transmitted and only the secondary check code is computed, large volumes of replica data need not be transmitted or processed, and little computing capacity is consumed. The method therefore reduces the amount of data transmitted and computed during consistency checking and saves system resources.

Compared with the prior art, the second method for verifying data consistency additionally detects changes to the fragments of the replica data on the storage nodes, so replica consistency can be verified more comprehensively and accurately.

In the method for generating a secondary check code, when replica data are stored, the secondary check code of the replica data is stored alongside the fragments and their check codes for later use in consistency checking. Compared with the prior art, consistency can then be verified by processing only the fragment check codes and the secondary check code, without transmitting and computing over the full replica data, which reduces the workload of consistency verification and saves resources.

In the method for responding to a request for acquiring a check code, each fragment of the replica data is verified independently on its node, and the state of the fragment is reported to the requester when replica consistency is verified. Compared with the prior art, this distributes the workload of replica consistency verification across the nodes, reducing network data transmission and the computation needed for verification, and saving network and system resources.

Drawings

Fig. 1 is a flowchart illustrating a method for verifying data consistency according to a first embodiment of the present application.

Fig. 2 is a schematic diagram of replica data of distributed storage in the first and third embodiments of the present application.

Fig. 3 is a schematic diagram of data stored by using an erasure code (6, 3) algorithm in the first and third embodiments of the present application.

Fig. 4 is a flowchart illustrating a method for verifying data consistency according to a second embodiment of the present application.

Fig. 5 is a flowchart illustrating a method for generating a secondary check code according to a third embodiment of the present application.

Fig. 6 is a flowchart illustrating a method for responding to a request for acquiring a check code according to a fourth embodiment of the present application.

Fig. 7 is a block diagram illustrating an apparatus for verifying data consistency according to a fifth embodiment of the present application.

Fig. 8 is a block diagram illustrating a device for verifying data consistency according to a sixth embodiment of the present application.

Fig. 9 is a block diagram of an apparatus for generating a secondary check code according to a seventh embodiment of the present application.

Fig. 10 is a block diagram illustrating an apparatus for responding to a request for acquiring a check code according to an eighth embodiment of the present application.

Detailed Description

In the following description, numerous specific details are set forth to provide a thorough understanding of the present application. However, the application can be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from its spirit; the application is therefore not limited to the specific implementations disclosed below.

A first embodiment of the present application provides a method for verifying data consistency, whose flowchart is shown in Fig. 1. This embodiment is described using the consistency verification of replica data stored in a distributed manner as an example. Such replica data is the set of fragments of the data to be stored, and the fragments are stored on different node devices.

The replica data in distributed storage handled by this embodiment is shown in Fig. 2. The first, second and third node devices each store part of the data to be stored, namely fragments: the first node device stores fragments DP0 and DP1, the second node device stores DP1 and DP2, and the third node device stores DP2 and DP0. Fragments DP0, DP1 and DP2 form a data set, i.e., one set of replica data of the data to be stored, so the three node devices together hold two sets of replica data. Each node device also stores the check codes of the fragments it holds and the secondary check code c of the replica data: the first node device stores check code c0 of DP0, check code c1 of DP1 and c; the second node device stores c1, check code c2 of DP2 and c; and the third node device stores c2, c0 and c.

The verification of replica data consistency is described in detail below.

Step S101: acquire a request to verify the consistency of the replica data contained in a designated data set.

As an example, assume that a request has been acquired to verify the consistency of the replica data in data set one, formed by DP0, DP1 and DP2.

Step S102: send, to the storage nodes that store the fragments of the data set, a request for the check code of each fragment and for the secondary check code of the replica data, where the secondary check code is obtained from the check codes of all fragments by a predetermined algorithm.

A request for the check code of fragment DP0 and the secondary check code of the replica data is sent to the first node device; a request for the check code of DP1 and the secondary check code is sent to the second node device; and a request for the check code of DP2 and the secondary check code is sent to the third node device.
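Because every fragment in Fig. 2 is held by two node devices, the requester has to pick which node to ask for each fragment. The embodiment simply lists one assignment; the greedy rule below (prefer a node not yet asked, to spread the load) is an assumption that happens to reproduce it. The node names and the `LAYOUT` mapping are hypothetical.

```python
# Fig. 2 layout: node device -> fragments it stores (names are hypothetical).
LAYOUT: dict[str, list[str]] = {
    "node1": ["DP0", "DP1"],
    "node2": ["DP1", "DP2"],
    "node3": ["DP2", "DP0"],
}


def plan_requests(layout: dict[str, list[str]], fragments: list[str]) -> dict[str, str]:
    """Choose one node to ask for each fragment's check code."""
    assignment: dict[str, str] = {}
    asked: set[str] = set()
    for fragment in fragments:
        holders = [node for node, held in layout.items() if fragment in held]
        if not holders:
            raise LookupError(f"no node stores {fragment}")
        # Prefer a node that has not been asked yet, to spread the requests.
        fresh = [node for node in holders if node not in asked]
        chosen = (fresh or holders)[0]
        assignment[fragment] = chosen
        asked.add(chosen)
    return assignment
```

With the full layout this yields DP0 from node1, DP1 from node2 and DP2 from node3. If node2 is removed from the layout, the sketch falls back to asking node1 for both DP0 and DP1.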

The secondary check code c of the replica data is calculated from check code c0 of DP0, check code c1 of DP1 and check code c2 of DP2 using a predetermined algorithm.

The algorithm should produce different outputs for different inputs, i.e., a change in any of c0, c1 or c2 should yield a different secondary check code. This avoids the case in which fragment check codes change while the secondary check code stays the same, and so ensures the accuracy of the consistency verification.

To this end, a hash algorithm such as MD5 or SHA-1 may be used. In this embodiment, the secondary check code is the MD5 value computed with the MD5 algorithm: check codes c0, c1 and c2 are concatenated in order into a string, and the MD5 value of that string is the secondary check code c. MD5 is efficient and compact, makes modifications readily detectable, and its wide use has established it as a well-proven message digest algorithm, so it helps keep the verification as accurate as possible.
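The concatenation (splicing) rule matters as much as the hash itself, since the generator and the verifier must apply it identically. This sketch assumes the check codes are 32-bit integers (for example CRC-32 values) rendered as fixed-width 8-digit hex; the application does not specify a string format, and the function names are hypothetical.

```python
import hashlib


def splice(check_codes: list[int]) -> str:
    """Concatenate 32-bit check codes in order, each as 8 hex digits."""
    return "".join(format(code, "08x") for code in check_codes)


def secondary_md5(check_codes: list[int]) -> str:
    """Secondary check code c = MD5(c0 || c1 || c2 || ...)."""
    return hashlib.md5(splice(check_codes).encode("ascii")).hexdigest()
```

Fixed width is a deliberate choice: with unpadded decimal strings, the lists [12, 3] and [1, 23] would both splice to "123" and produce the same secondary check code. The order is also significant, so swapping c0 and c1 gives a different value, and the verifier must concatenate in the same fragment order used at write time.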

Step S103: receive the fragment check code and the secondary check code of the replica data fed back by each node.

Check code c0 of DP0 and secondary check code c are received from the first node device; check code c1 of DP1 and c are received from the second node device; and check code c2 of DP2 and c are received from the third node device.

Step S104: recompute a secondary check code from the received check codes of all fragments of the data set, using the same algorithm as the predetermined algorithm.

A secondary check code c' of the replica data is calculated from the received check codes c0 of DP0, c1 of DP1 and c2 of DP2, using the same algorithm and parameters that were used to generate the stored secondary check code c.

Where c is the MD5 value of the string formed by concatenating c0, c1 and c2, the received c0, c1 and c2 are concatenated into a string by the same concatenation rule, the MD5 value of that string is computed with the same MD5 algorithm, and the result is taken as c'.

Step S105: judge, from the recomputed secondary check code and the received secondary check code of the replica data, whether the replica data contained in the data set are consistent.

This judgment may be made by checking whether the recomputed secondary check code is identical to the received secondary check code of the replica data.

If the recomputed secondary check code c' is identical to the secondary check code c received from the node devices, the replica data are judged consistent.

If c' differs from the received c, the replica data are judged not consistent.

The method provided by this embodiment can also be used to verify the consistency of data stored in a distributed manner using erasure coding.

For such data, the data is divided into data blocks, and each data block is divided into data fragments of a set size (for example, 4 KB). The data fragments of one data block are stored on the same node of the distributed system. Encoded fragments are then calculated from the data fragments of the different data blocks using the chosen erasure code algorithm, and these encoded fragments are stored on different nodes of the distributed system.
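The division into data blocks and fixed-size data fragments can be sketched as follows. The assumptions are labelled: blocks are taken as contiguous slices of the data, and the data is zero-padded so that every block holds the same whole number of fragments; the application specifies neither. The function names are hypothetical, and the erasure coding itself is not implemented here.

```python
def split_for_erasure_coding(data: bytes, num_blocks: int = 6,
                             segment_size: int = 4096) -> list[list[bytes]]:
    """Split data into num_blocks contiguous data blocks, each cut into
    fixed-size data fragments. Result[m][n] is fragment n of block m (0-based)."""
    stripe_bytes = num_blocks * segment_size
    # Zero-pad so every block holds the same whole number of fragments
    # (the padding scheme is an assumption).
    padded_len = -(-len(data) // stripe_bytes) * stripe_bytes
    padded = data + b"\x00" * (padded_len - len(data))
    block_size = padded_len // num_blocks
    blocks = [padded[m * block_size:(m + 1) * block_size] for m in range(num_blocks)]
    return [[block[i:i + segment_size] for i in range(0, block_size, segment_size)]
            for block in blocks]


def stripes(fragments: list[list[bytes]]) -> list[list[bytes]]:
    """Group the n-th fragment of every block (DP1n..DP6n in Fig. 3), which
    is the input from which the encoded fragments E7n..E9n are computed."""
    return [list(column) for column in zip(*fragments)]
```

With six blocks of three fragments each, `stripes` returns three groups of six fragments, matching the layout of Fig. 3.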

Fig. 3 is a schematic diagram of data storage using an erasure code (6, 3) algorithm.

Data stored using erasure coding are distributed across nodes one to nine. Nodes one to six store data blocks DB1 to DB6 of the data to be stored, one data block per node, and each data block is divided into data fragments (three in this embodiment). For example, node one stores data fragments DP11, DP12 and DP13 of data block one (DB1), together with their check codes c11, c12 and c13; node two stores data fragments DP21, DP22 and DP23 of data block two (DB2), together with their check codes c21, c22 and c23; and so on for nodes three to six.

The fragment data stored by the nodes seven to nine are 3 encoded fragment data E7N to E9N generated by using the nth (N is the number of data fragments, respectively 1 to 3) data fragments DPNn (N is the number of data blocks, respectively 1 to 6) of the data blocks DB1 to DB6 that require stored data, and in addition, the nodes seven to nine also store check codes c7N to c9N corresponding to the encoded fragment data E7N to E9N, and data fragments DPNn calculated by using the check codes cNn of the data fragments DPNn and the check codes c7N to c9N of the encoded fragment data E7N to N9N corresponding to the data fragments DPNn, and secondary check codes cN of the encoded fragment data E7N to E9N.

Please refer to fig. 3.

Here, c11, c21, c31, c41, c51 and c61 are the check codes of DP11, DP21, DP31, DP41, DP51 and DP61 respectively; E71, E81 and E91 are the coded fragments corresponding to DP11, DP21, DP31, DP41, DP51 and DP61; c71, c81 and c91 are the check codes of E71, E81 and E91 respectively; and c1 is the secondary check code of the duplicate data DP11, DP21, DP31, DP41, DP51, DP61, E71, E81 and E91, calculated from c11, c21, c31, c41, c51, c61, c71, c81 and c91 using a predetermined algorithm (for example, the MD5 algorithm).

Likewise, c12, c22, c32, c42, c52 and c62 are the check codes of DP12, DP22, DP32, DP42, DP52 and DP62 respectively; E72, E82 and E92 are the coded fragments corresponding to DP12, DP22, DP32, DP42, DP52 and DP62; c72, c82 and c92 are the check codes of E72, E82 and E92 respectively; and c2 is the secondary check code of the duplicate data DP12, DP22, DP32, DP42, DP52, DP62, E72, E82 and E92, calculated from c12, c22, c32, c42, c52, c62, c72, c82 and c92 using a predetermined algorithm (for example, the MD5 algorithm).

Likewise, c13, c23, c33, c43, c53 and c63 are the check codes of DP13, DP23, DP33, DP43, DP53 and DP63 respectively; E73, E83 and E93 are the coded fragments corresponding to DP13, DP23, DP33, DP43, DP53 and DP63; c73, c83 and c93 are the check codes of E73, E83 and E93 respectively; and c3 is the secondary check code of the duplicate data DP13, DP23, DP33, DP43, DP53, DP63, E73, E83 and E93, calculated from c13, c23, c33, c43, c53, c63, c73, c83 and c93 using a predetermined algorithm (for example, the MD5 algorithm).

When verifying the consistency of data stored on the distributed system using erasure coding, all of the duplicate data stored in the system must be verified, namely the data fragments of the data blocks and the corresponding coded fragments. Specifically, the n-th data fragments of the different data blocks and their corresponding coded fragments are combined into one data set: DP11, DP21, DP31, DP41, DP51, DP61, E71, E81 and E91 form data set one; DP12, DP22, DP32, DP42, DP52, DP62, E72, E82 and E92 form data set two; and DP13, DP23, DP33, DP43, DP53, DP63, E73, E83 and E93 form data set three. The consistency of the duplicate data in each of the three data sets is then verified separately using the method described above in this embodiment. When the duplicate data in all three data sets is consistent, the duplicate data stored using erasure coding can be judged to be consistent.
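The grouping into data sets and the per-set check can be sketched as below, assuming each of nodes one to nine reports the check codes of its three fragments in fragment order and the verifier holds the received secondary check codes c1 to c3. The MD5-over-hex-strings encoding and all names are illustrative assumptions.

```python
import hashlib
from typing import Dict, List

NODES = range(1, 10)      # nodes one to six: DB1..DB6; nodes seven to nine: coded
FRAGMENTS_PER_BLOCK = 3   # three fragments per data block, as in Fig. 3


def set_check_codes(codes: Dict[int, List[int]], k: int) -> List[int]:
    # Data set k (1-based) is DP1k..DP6k followed by E7k, E8k, E9k, so its
    # check codes are c1k..c6k followed by c7k, c8k, c9k.
    return [codes[node][k - 1] for node in NODES]


def secondary_code(check_codes: List[int]) -> str:
    # Assumed predetermined algorithm: MD5 of the concatenated 8-digit hex codes.
    joined = "".join(f"{code:08x}" for code in check_codes)
    return hashlib.md5(joined.encode("ascii")).hexdigest()


def erasure_copy_consistent(codes: Dict[int, List[int]], received: Dict[int, str]) -> bool:
    # The erasure-coded data is consistent only if every data set is consistent.
    return all(
        secondary_code(set_check_codes(codes, k)) == received[k]
        for k in range(1, FRAGMENTS_PER_BLOCK + 1)
    )


if __name__ == "__main__":
    codes = {node: [node * 100 + k for k in range(1, 4)] for node in NODES}
    received = {k: secondary_code(set_check_codes(codes, k)) for k in range(1, 4)}
    print(erasure_copy_consistent(codes, received))  # True
    codes[8][1] ^= 1  # corrupt c82
    print(erasure_copy_consistent(codes, received))  # False
```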

Fig. 4 shows a flowchart of a method for verifying data consistency provided by a second embodiment of the present application. The method comprises the following steps:

Step S201: acquire a request for verifying the consistency of the duplicate data contained in a specified data set.

Step S202: send, to the storage node of each piece of fragmented data in the data set, a request to acquire the check code of the fragmented data and the secondary check code of the duplicate data, where the secondary check code is obtained from the check codes of all the fragmented data using a predetermined algorithm;

Step S203: receive the response information of the nodes to the request;

Step S204: determine whether the duplicate data contained in the data set is consistent according to whether the response information includes information that fragmented data of the data set has changed.

Compared with the method for verifying data consistency provided in the first embodiment of the present application, the method of this embodiment differs in that, after the request for the check codes of the fragmented data and the secondary check code of the duplicate data has been sent to the storage nodes of the fragmented data of the data set, different processing is performed depending on the response information received. Only the steps that differ from the first embodiment are described in detail below; for the other steps, refer to the description of the first embodiment.

Step S203: receive the response information of the nodes to the request.

After the request has been sent, the storage node of each piece of fragmented data in the data set may, depending on its own condition, either return the requested information or return the information it considers should be reported instead.

If the storage node of a piece of fragmented data finds, from its own checks, that the data it stores has changed, for example because the stored fragmented data no longer matches its stored check code, it may return information that the stored fragmented data has changed as its response. If no such mismatch is detected, it may return the check code of the fragmented data and the secondary check code of the duplicate data as its response.

Step S204 may then be performed as follows:

First, determine whether the response information contains information that fragmented data of the data set has changed.

If it does, the duplicate data contained in the data set is judged to be inconsistent.

If it does not, calculate a secondary verification code from the received check codes of all the fragmented data, and determine whether the duplicate data contained in the data set is consistent according to this secondary verification code and the received secondary check code of the duplicate data.

As in step S105 of the first embodiment, whether the duplicate data contained in the data set is consistent can be determined by checking whether the secondary verification code equals the received secondary check code of the duplicate data: if they differ, the duplicate data is judged to be inconsistent; if they are equal, it is judged to be consistent. For details, refer to the description of step S105 in the first embodiment, which is not repeated here.
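The decision logic of steps S203 and S204 can be sketched as follows for the three-node layout of Fig. 2. The reply format (NodeResponse), the MD5 encoding of the secondary verification code, the requirement that every node's returned secondary check code match, and the treatment of two replicas reporting different check codes as inconsistent are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NodeResponse:
    # Hypothetical reply: either a change notice, or the requested codes.
    changed: bool = False
    check_codes: Dict[str, int] = field(default_factory=dict)  # fragment -> code
    secondary: str = ""


def secondary_code(check_codes: List[int]) -> str:
    # Assumed predetermined algorithm: MD5 of the concatenated 8-digit hex codes.
    joined = "".join(f"{code:08x}" for code in check_codes)
    return hashlib.md5(joined.encode("ascii")).hexdigest()


def data_set_consistent(responses: List[NodeResponse], order: List[str]) -> bool:
    # A change notice from any node means the duplicate data is inconsistent.
    if any(r.changed for r in responses):
        return False
    codes: Dict[str, int] = {}
    for r in responses:
        for frag, code in r.check_codes.items():
            # Assumption: two replicas reporting different codes is inconsistent.
            if codes.setdefault(frag, code) != code:
                return False
    if any(frag not in codes for frag in order):
        return False  # a fragment was not reported at all
    # Compute the secondary verification code c' and compare it with each c.
    c_prime = secondary_code([codes[frag] for frag in order])
    return all(r.secondary == c_prime for r in responses)


if __name__ == "__main__":
    c = secondary_code([1, 2, 3])
    replies = [
        NodeResponse(check_codes={"DP0": 1, "DP1": 2}, secondary=c),
        NodeResponse(check_codes={"DP1": 2, "DP2": 3}, secondary=c),
        NodeResponse(changed=True),
    ]
    print(data_set_consistent(replies, ["DP0", "DP1", "DP2"]))  # False
```

Short-circuiting on a change notice means no hashing is needed when a node has already detected a mismatch locally.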

A third embodiment of the present application provides a method for generating a secondary check code; its flowchart is shown in Fig. 5. To enable fast and effective consistency verification of stored duplicate data, a corresponding group of data is treated as one data set when the duplicate data is stored, and the fragmented data in the data set is processed accordingly. The method is described below using the distributed storage procedure shown in Fig. 2 as an example, and includes the following steps:

Step S301: acquire a request to generate check codes for a specified data set.

In the distributed storage system, the data to be stored is divided into three data blocks DP0, DP1 and DP2, which serve as the fragmented data of one data set.

Step S302: for each piece of fragmented data in the data set, calculate its check code using a predetermined algorithm.

The check codes of the fragmented data DP0, DP1 and DP2 are calculated with a preselected algorithm, for example as their cyclic redundancy check codes c0, c1 and c2. Cyclic redundancy checks are simple to compute, mature, and have good error-detection performance.
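As an illustration of step S302, the sketch below uses Python's zlib.crc32 as the cyclic redundancy check. Which CRC variant and parameters the application intends is not stated, so CRC-32 here is an assumption; the fragment contents are placeholders.

```python
import zlib
from typing import Dict


def fragment_check_codes(fragments: Dict[str, bytes]) -> Dict[str, int]:
    # One CRC-32 check code per fragment, e.g. DP0 -> c0, DP1 -> c1, DP2 -> c2.
    return {name: zlib.crc32(data) for name, data in fragments.items()}


if __name__ == "__main__":
    codes = fragment_check_codes({"DP0": b"123456789", "DP1": b"hello", "DP2": b""})
    print(hex(codes["DP0"]))  # 0xcbf43926, the standard CRC-32 check value
    print(codes["DP2"])       # 0
```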

Step S303: calculate the secondary check code of the data set from the check codes of all its fragmented data using a predetermined algorithm.

The preselected algorithm applied to c0, c1 and c2 is preferably one in which different inputs produce different outputs, so that any change to c0, c1 or c2 yields a different secondary check code. This avoids the case in which the check code of DP0, DP1 or DP2 changes while the secondary check code stays the same, and thus ensures the accuracy of the consistency verification.

For this purpose, a hash algorithm such as MD5 or SHA-1 may be used. This embodiment uses MD5 as an example: the check code c0 of DP0, the check code c1 of DP1 and the check code c2 of DP2 are concatenated in order into a string, and the MD5 value of that string is calculated. This MD5 value is the secondary check code c. MD5 is efficient and compact, any change to its input alters its output with overwhelming probability, and its wide use has established it as an effective message digest algorithm, so it keeps the verification as accurate as possible.
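Step S303 with MD5 or SHA-1 can be sketched as follows. How the check codes are turned into a string is not specified above; writing each one as a fixed-width 8-digit hexadecimal field is an assumption chosen so that different code sequences cannot concatenate to the same string (without a fixed width, 0x1 followed by 0x23 and 0x12 followed by 0x3 would both give "123"). The function name is hypothetical.

```python
import hashlib
from typing import Sequence


def secondary_check_code(check_codes: Sequence[int], algorithm: str = "md5") -> str:
    # Concatenate c0, c1, c2, ... in order as fixed-width hex, then hash.
    joined = "".join(f"{code:08x}" for code in check_codes)
    return hashlib.new(algorithm, joined.encode("ascii")).hexdigest()


if __name__ == "__main__":
    c0, c1, c2 = 0xCBF43926, 0x352441C2, 0x00000000
    c = secondary_check_code([c0, c1, c2])
    print(len(c))                                           # 32 hex digits (MD5)
    print(len(secondary_check_code([c0, c1, c2], "sha1")))  # 40 hex digits (SHA-1)
    print(c == secondary_check_code([c1, c0, c2]))          # False: order matters
```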

Step S304: store each piece of fragmented data in the data set, together with its check code, on the corresponding node.

In distributed storage, the fragmented data of the data to be stored is usually placed on different node devices. In this embodiment, DP0 and DP1 with their check codes c0 and c1 are stored on node device one, DP1 and DP2 with their check codes c1 and c2 on node device two, and DP2 and DP0 with their check codes c2 and c0 on node device three.

Step S305: store the secondary check code of the data set on the corresponding nodes.

The secondary check code c of the duplicate data calculated in step S303 is stored on each node device.

Thus, the data to be stored is spread across three different node devices, the corresponding check codes and the secondary check code are stored on those node devices as well, and the fragmented data on the different node devices forms two copies of the data. The distribution of the fragmented data, check codes and secondary check code is shown in Fig. 2.
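The placement of steps S304 and S305 for the Fig. 2 example can be sketched as below: node device i keeps fragment i and the next fragment (wrapping around), with their check codes, plus the shared secondary check code, so every fragment is held twice. The CRC-32 and MD5 choices, the dictionary layout and the function name are illustrative assumptions.

```python
import hashlib
import zlib
from typing import Dict, List


def place_fragments(fragments: List[bytes]) -> Dict[int, dict]:
    # Check codes (step S302) and the secondary check code (step S303).
    codes = [zlib.crc32(frag) for frag in fragments]
    joined = "".join(f"{code:08x}" for code in codes)
    secondary = hashlib.md5(joined.encode("ascii")).hexdigest()
    n = len(fragments)
    layout: Dict[int, dict] = {}
    for i in range(n):
        held = (i, (i + 1) % n)  # node one: DP0, DP1; node two: DP1, DP2; ...
        layout[i + 1] = {
            "fragments": {f"DP{j}": (fragments[j], codes[j]) for j in held},  # S304
            "secondary": secondary,                                            # S305
        }
    return layout


if __name__ == "__main__":
    for node, content in place_fragments([b"part0", b"part1", b"part2"]).items():
        print(node, sorted(content["fragments"]))
    # 1 ['DP0', 'DP1']
    # 2 ['DP1', 'DP2']
    # 3 ['DP0', 'DP2']
```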

For distributed storage using erasure coding, the method may further include, before the duplicate data, the check codes and the secondary check codes are stored on the corresponding node devices, a step of acquiring the locations of those node devices. For example, for the erasure-coded storage shown in Fig. 3:

Send a request to the management node to query the locations of the corresponding node devices one to nine.

Receive from the management node the identifiers, such as IP addresses or host names, of node devices one to nine that are to store the data. In this way, the locations where duplicate data is stored can be managed centrally, which makes it easy to serve different service requirements.
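The location query can be sketched as a lookup against a central registry held by the management node. The ManagementNode class, its locate method and the sample addresses are hypothetical; the text only states that the management node returns identifiers such as IP addresses or host names.

```python
from typing import Dict, Iterable, List


class ManagementNode:
    """Hypothetical central registry: logical node name -> IP address or host name."""

    def __init__(self, registry: Dict[str, str]) -> None:
        self._registry = dict(registry)

    def locate(self, nodes: Iterable[str]) -> List[str]:
        # Answer a location query with one identifier per requested node device.
        wanted = list(nodes)
        missing = [name for name in wanted if name not in self._registry]
        if missing:
            raise KeyError(f"unknown node devices: {missing}")
        return [self._registry[name] for name in wanted]


if __name__ == "__main__":
    manager = ManagementNode({f"node{i}": f"10.0.0.{i}" for i in range(1, 10)})
    print(manager.locate(["node1", "node2", "node9"]))
    # ['10.0.0.1', '10.0.0.2', '10.0.0.9']
```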

A fourth embodiment of the present application provides a method for responding to a request to acquire a check code; its flowchart is shown in Fig. 6. The method includes the following steps:

Step S401: acquire a request to calculate the check code of specified fragmented data stored on the current node.

When a node device stores fragmented data of duplicate data, it may perform an integrity check on that fragmented data. The following description takes as an example a node device that stores the fragmented data DP0, its check code c0, and the secondary check code c of the duplicate data contained in the data set, where c0 is the cyclic redundancy check code of DP0.

Step S402: calculate a computed check code for the fragmented data stored on the node, using the same algorithm and parameters that were used to calculate the stored check code of that fragmented data.

The node device recalculates a check code over the DP0 it stores, using the same algorithm that was used when the check code c0 was stored (the cyclic redundancy check algorithm) with the same parameters, and obtains the computed check code c0' of DP0.

Step S403: determine and mark the state of the fragmented data stored on the node according to its computed check code and its stored check code.

This can be done by determining whether the computed check code c0' equals the stored check code c0:

If c0 is not equal to c0', the state of the fragmented data DP0 stored by the node device is marked as "changed".

If c0 is equal to c0', the state of DP0 is marked as "unchanged".
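Steps S402 and S403 on the node can be sketched as a small scrub routine. It assumes the stored check code c0 is a CRC-32 as computed by zlib.crc32, so recomputing with the same function reproduces "the same algorithm and parameters"; the StoredFragment record and the scrub name are hypothetical.

```python
import zlib
from dataclasses import dataclass


@dataclass
class StoredFragment:
    data: bytes               # fragment data, e.g. DP0
    check_code: int           # c0, recorded when the fragment was stored
    secondary: str            # secondary check code c of the data set
    state: str = "unchanged"  # initial state on first storage


def scrub(fragment: StoredFragment) -> str:
    # S402: recompute c0' with the same CRC used when c0 was stored.
    c0_prime = zlib.crc32(fragment.data)
    # S403: mark "changed" if c0' != c0, otherwise "unchanged".
    fragment.state = "unchanged" if c0_prime == fragment.check_code else "changed"
    return fragment.state


if __name__ == "__main__":
    dp0 = StoredFragment(b"fragment DP0", zlib.crc32(b"fragment DP0"), "c")
    print(scrub(dp0))            # unchanged
    dp0.data = b"fragment DPO"   # simulate silent corruption of one byte
    print(scrub(dp0))            # changed
```

A single-byte change is always caught here, because CRC-32 detects every burst error of up to 32 bits.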

Step S404: receive a request from a requester to acquire the check code of the fragmented data and the secondary check code of the duplicate data contained in the data set to which the fragmented data belongs.

The requester asks for the check code c0 of DP0 and the secondary check code c of the duplicate data to which DP0 belongs.

Step S405: acquire the marked state of the fragmented data.

The state of the fragmented data is initialized to "unchanged" when the fragmented data is first stored, and is updated in step S403.

Step S406: if the state is "changed", send information that the fragmented data stored on the node has changed to the requester.

If the state flag of DP0 is "changed", information that DP0 has changed is sent to the requester.

Step S407: if the state is "unchanged", send the check code of the fragmented data stored on the node and the secondary check code of the duplicate data contained in its data set to the requester.

If the state flag of DP0 is "unchanged", the check code c0 of DP0 and the secondary check code c of the duplicate data contained in its data set are sent to the requester.
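Steps S404 to S407 can be sketched as a request handler that reads the marked state and chooses the reply. The reply dictionaries, the FragmentRecord type and the handle_code_request name are hypothetical message formats, not ones defined by the application.

```python
from dataclasses import dataclass
from typing import Dict, Union

Reply = Dict[str, Union[str, int, bool]]


@dataclass
class FragmentRecord:
    check_code: int           # e.g. c0 of DP0
    secondary: str            # secondary check code c of its data set
    state: str = "unchanged"  # set to "unchanged" on first storage, updated by S403


def handle_code_request(store: Dict[str, FragmentRecord], fragment_id: str) -> Reply:
    record = store[fragment_id]                            # S404: request names the fragment
    if record.state == "changed":                          # S405: read the marked state
        return {"fragment": fragment_id, "changed": True}  # S406
    return {                                               # S407
        "fragment": fragment_id,
        "changed": False,
        "check_code": record.check_code,
        "secondary": record.secondary,
    }


if __name__ == "__main__":
    store = {"DP0": FragmentRecord(check_code=0x1234ABCD, secondary="secondary-c")}
    print(handle_code_request(store, "DP0")["changed"])  # False
    store["DP0"].state = "changed"
    print(handle_code_request(store, "DP0"))  # {'fragment': 'DP0', 'changed': True}
```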

A fifth embodiment of the present application provides an apparatus for verifying data consistency, used for replica data in distributed storage; its structural block diagram is shown in Fig. 7. The apparatus comprises the following units:

An obtaining unit U501, configured to acquire a request for verifying the consistency of the duplicate data contained in a specified data set.

A requesting unit U502, configured to send, to the storage node of each piece of fragmented data in the data set, a request to acquire the check code of the fragmented data and the secondary check code of the duplicate data, where the secondary check code is obtained from the check codes of all the fragmented data using a predetermined algorithm.

A receiving unit U503, configured to receive the check codes of the fragmented data and the secondary check code of the duplicate data fed back by the nodes.

A calculating unit U504, configured to calculate a secondary verification code from the received check codes of all the fragmented data of the data set, using the same algorithm as the predetermined algorithm.

A judging unit U505, configured to judge whether the duplicate data contained in the data set is consistent according to the secondary verification code and the received secondary check code of the duplicate data.

A sixth embodiment of the present application provides an apparatus for verifying data consistency, used for replica data in distributed storage; its structural block diagram is shown in Fig. 8. The apparatus includes:

An obtaining unit U601, configured to acquire a request for verifying the consistency of the duplicate data contained in a specified data set.

A requesting unit U602, configured to send a request for obtaining the check code of the fragmented data and the secondary check code of the duplicate data to a storage node that stores each fragmented data of the data set, where the secondary check code is obtained through a predetermined algorithm by using the check codes of all fragmented data.

A receiving unit U603, configured to receive response information of the node to the request.

A determining unit U604, configured to determine whether the duplicate data included in the data set has consistency according to whether the response information includes the fragmented data change information of the data set.

A seventh embodiment of the present application provides an apparatus for generating a secondary check code; its block diagram is shown in Fig. 9. The apparatus includes:

An obtaining unit U701, configured to acquire a request to generate check codes for a specified data set.

A check code calculating unit U702, configured to calculate, for each piece of fragmented data in the data set, its check code using a predetermined algorithm.

A secondary check code calculating unit U703, configured to calculate the secondary check code of the data set from the check codes of all its fragmented data using a predetermined algorithm.

A first storage unit U704, configured to store each piece of fragmented data in the data set, together with its check code, on the corresponding node.

A second storage unit U705, configured to store the secondary check code of the data set on the corresponding nodes.

An eighth embodiment of the present application provides an apparatus for responding to a request to acquire a check code; its block diagram is shown in Fig. 10. The apparatus includes:

An obtaining unit U801, configured to acquire a request to calculate the check code of specified fragmented data stored on the node.

A calculating unit U802, configured to calculate the computed check code of the fragmented data stored on the node, using the same algorithm and parameters as those used to calculate the stored check code of that fragmented data.

A state marking unit U803, configured to determine and mark the state of the fragmented data stored on the node according to its computed check code and its stored check code.

A receiving unit U804, configured to receive a request from a requester to acquire the check code of the fragmented data and the secondary check code of the duplicate data contained in the data set to which the fragmented data belongs.

A second obtaining unit U805, configured to acquire the marked state of the fragmented data.

A state sending unit U806, configured to send, if the state is "changed", information that the fragmented data stored on the node has changed to the requester.

A code sending unit U807, configured to send, if the state is "unchanged", the check code of the fragmented data stored on the node and the secondary check code of the duplicate data contained in its data set to the requester.

A ninth embodiment of the present application provides a system for verifying data consistency, which includes, or consists of, the following devices: the apparatus for verifying data consistency, the apparatus for generating a secondary check code, and the apparatus for responding to a request to acquire a check code.

For details of each device in the system, refer to the descriptions of the fifth to eighth embodiments of the present application.

Although the present application has been disclosed above by way of preferred embodiments, they are not intended to limit it. Those skilled in the art can make variations and modifications without departing from the spirit and scope of the present application; therefore, the scope of protection of the present application shall be determined by the appended claims.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media, including permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Claims

1. A method for verifying data consistency for replica data of distributed storage, comprising the steps of:

2. The method according to claim 1, wherein the determining whether the duplicate data included in the data set has consistency according to the secondary verification code and the received secondary verification code of the duplicate data specifically comprises:

3. The method for verifying data consistency according to claim 1, characterized in that the predetermined algorithm is an algorithm in which different inputs produce different outputs.

4. The method for verifying data consistency according to claim 3, characterized in that the algorithm in which different inputs produce different outputs is the MD5 hash algorithm or the SHA-1 hash algorithm.

5. The method for verifying data consistency according to any one of claims 1 to 4, wherein the method is used to verify the consistency of duplicate data stored in a distributed manner using erasure coding, the data contained in the data set is duplicate data of the stored data, and the fragmented data in the data set comprises the data fragments of the data blocks of the stored data and the coded fragments of those data fragments;

6. A method for verifying data consistency for replica data of distributed storage, comprising the steps of:

receiving response information of the node to the request;

7. The method according to claim 6, wherein the determining whether the duplicate data included in the data set has consistency according to whether the response information includes the fragmented data change information of the data set specifically includes:

8. The method for verifying data consistency according to claim 7, wherein the determining whether the duplicate data included in the data set has consistency according to the secondary verification code and the received secondary verification code of the duplicate data specifically comprises:

9. The method for verifying data consistency according to claim 6, wherein the predetermined algorithm is an algorithm in which different inputs produce different outputs.

10. The method for verifying data consistency according to claim 9, wherein the algorithm in which different inputs produce different outputs is the MD5 hash algorithm or the SHA-1 hash algorithm.

11. The method for verifying data consistency according to any one of claims 6 to 10, wherein the method is used to verify the consistency of duplicate data stored in a distributed manner using erasure coding, the data contained in the data set is duplicate data of the stored data, and the fragmented data in the data set comprises the data fragments of the data blocks of the stored data and the coded fragments of those data fragments;

12. A method for generating a secondary check code, comprising the steps of:

acquiring a request for generating a check code for a specified data set;

and storing the secondary check code of the data set to the corresponding node.

13. The method for generating a secondary check code according to claim 12, wherein calculating the check code corresponding to the fragmented data according to a predetermined algorithm comprises:

calculating the cyclic redundancy check code of the fragmented data using a cyclic redundancy check algorithm.

14. The method of claim 12, wherein calculating the secondary check code of the data set according to a predetermined algorithm comprises:

15. The method for generating a secondary check code according to claim 14, wherein the algorithm in which different inputs produce different outputs is the MD5 hash algorithm or the SHA-1 hash algorithm.

16. The method for generating secondary check code according to any of claims 12-15, wherein the method is used for performing distributed storage on the data to be stored by using erasure code technology;

17. The method for generating a secondary check code according to claim 16, further comprising, before the step of storing the data fragments of the data blocks of the data to be stored and their check codes on the corresponding nodes and the step of storing the coded fragments of those data fragments and their check codes on the corresponding nodes, a step of acquiring the locations of the corresponding nodes, the step comprising:

18. A method for responding to a request to obtain a check code, comprising the steps of:

acquiring the state of the marked fragment data;

19. The method according to claim 18, wherein determining and marking the state of the fragmented data stored on the node according to the computed check code of the fragmented data and the check code of the fragmented data stored on the node comprises:

20. The method according to claim 18, wherein the check code of the fragmented data stored by the node is the cyclic redundancy check code calculated from the fragmented data using a cyclic redundancy check algorithm.

21. The method for responding to a request to acquire a check code according to any one of claims 18 to 20, characterized in that the method is used to check the consistency of data stored in a distributed manner using erasure coding.

22. An apparatus for verifying data consistency, for replica data of distributed storage, comprising the following units:

a receiving unit, configured to receive a check code of the fragmented data and a secondary check code of the duplicate data, where the check codes are fed back by the nodes;

23. An apparatus for verifying data consistency, which is used for replica data of distributed storage, and is characterized by comprising the following units:

24. An apparatus for generating a secondary check code, comprising the following elements:

25. An apparatus for responding to a request to obtain a check code, comprising:

26. A system for verifying data consistency, comprising:

the apparatus for verifying data consistency according to claim 22 or the apparatus for verifying data consistency according to claim 23;

the apparatus for generating a secondary check code according to claim 24;

an apparatus for responding to a request to obtain a check code according to claim 25.