WO2013091162A1

WO2013091162A1 - Method, device, and system for recovering distributed storage data

Info

Publication number: WO2013091162A1
Application number: PCT/CN2011/084219
Authority: WO
Inventors: Wang Zhiyong (王志用); Yang Deping (杨德平)
Original assignee: Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date: 2011-12-19
Filing date: 2011-12-19
Publication date: 2013-06-27
Also published as: CN103262042B; CN103262042A

Abstract

Embodiments of the present invention provide a method, device, and system for recovering distributed storage data. The method comprises: a local node receiving a data operation sequence set list sent by a target node according to a version value of a data operation sequence set of the local node; receiving data corresponding to the data operation sequence set list sent by the target node; and updating data of the local node according to the data operation sequence set list and the data corresponding to the data operation sequence set list. The method, device, and system provided in the embodiments of the present invention recover data by sending the version value of the local node's data operation sequence set, thereby reducing data transmission volume and saving network bandwidth.

Description

Distributed storage data recovery method, device and system

Technical field

The present invention relates to the field of information technology, and in particular, to a distributed storage data recovery method, apparatus and system.

Background

With the development of information technology, massive data processing has challenged traditional data processing methods, and various large-scale distributed cluster systems have emerged. A distributed cluster system consists of a large number of conventional nodes; by distributing processing across the nodes, it presents powerful overall processing capability externally. The nodes collaborate through shared data to complete processing tasks.

A distributed block storage system is a distributed storage system that uses data blocks as storage units to meet massive storage requirements and presents powerful storage capability. Handling node failures in a distributed storage system, and rapidly recovering the data of a failed node when it rejoins the cluster, are key to providing high-quality services.

In the prior art, a node maintains a snapshot file that stores backups of all of the node's data. During data recovery, the failed node sends its snapshot file to the node that provides the recovery data; that node compares the received snapshot file with its own stored snapshot file and returns the differing data to the failed node. The inventors have found that, because the snapshot file stores the data backup itself, recovery requires sending the failed node's data backup for comparison with the backup on the node providing the recovery data. The amount of data transmitted is therefore very large, which seriously wastes network bandwidth.

Summary of the invention

A brief summary of the invention is set forth below in order to provide a basic understanding of certain aspects of the invention. It should be understood that this summary is not an exhaustive overview of the invention. It is not intended to identify key or critical aspects of the invention, and is not intended to limit the scope of the invention. Its purpose is to present some concepts in a simplified form as a prelude to the more detailed description that follows.

The embodiment of the invention provides a method for recovering distributed storage data, including:

Receiving, by a local node, a data operation sequence set list sent by a target node according to a version value of a data operation sequence set of the local node;

Receiving data corresponding to the data operation sequence set list sent by the target node; and

Updating data of the local node according to the data operation sequence set list and the data corresponding to the data operation sequence set list.

The embodiment of the invention further provides a distributed storage data recovery method, including:

Receiving a version value of a data operation sequence set of the local node;

Sending, according to the version value, a data operation sequence set list to the local node, and sending, according to the data operation sequence set list, the data corresponding to the data operation sequence set list to the local node.

The embodiment of the invention further provides a node, including:

a receiving unit, configured to receive a data operation sequence set list sent by the target node according to the version value of the data operation sequence set of the node, and to receive data corresponding to the data operation sequence set list sent by the target node; and

an updating unit, configured to update the data of the node according to the data operation sequence set list and the data corresponding to the data operation sequence set list.

The embodiment of the invention further provides a node, including:

a receiving unit, configured to receive a version value of a data operation sequence set of a local node; and

a sending unit, configured to send, according to the version value, a data operation sequence set list and the data corresponding to the data operation sequence set list to the local node.

The embodiment of the invention further provides a distributed storage data recovery system, comprising:

a local node, configured to receive a data operation sequence set list sent by a target node according to a version value of the data operation sequence set of the local node, to receive data corresponding to the data operation sequence set list sent by the target node, and to update the data of the local node according to the data operation sequence set list and the data corresponding to the data operation sequence set list; and

the target node, configured to receive the version value of the data operation sequence set of the local node, and to send, according to the version value, a data operation sequence set list and the data corresponding to the data operation sequence set list to the local node.

The distributed storage data recovery method, device and system provided by the embodiments of the present invention update the data of the local node by receiving the data operation sequence set list that the target node sends according to the version value of the local node's data operation sequence set. This reduces the amount of data transmitted during data recovery and saves network bandwidth.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. A person skilled in the art may derive other drawings from these drawings without creative effort.

FIG. 1 is a schematic diagram of a data operation sequence and a data operation sequence set;

FIG. 2 is a schematic diagram of a process of generating an operation record file;

FIG. 3 is a schematic diagram of a process of merging data operation sequences;

FIG. 4 is a schematic flowchart of a first embodiment of the present invention;

FIG. 5 is a schematic flowchart of a second embodiment of the present invention;

FIG. 6 is a schematic flowchart of a third embodiment of the present invention;

FIG. 7 is a schematic flowchart of a fourth embodiment of the present invention;

FIG. 8 is a schematic flowchart of a fifth embodiment of the present invention;

FIG. 9 is a schematic structural diagram of a node according to a sixth embodiment of the present invention;

FIG. 10 is a schematic structural diagram of a node according to a seventh embodiment of the present invention;

FIG. 11 is a schematic structural diagram of a node according to an eighth embodiment of the present invention;

FIG. 12 is a schematic structural diagram of a node according to a ninth embodiment of the present invention;

FIG. 13 is a schematic structural diagram of a node according to a tenth embodiment of the present invention;

FIG. 14 is a schematic structural diagram of a node according to an eleventh embodiment of the present invention;

FIG. 15 is a schematic structural diagram of a node according to a twelfth embodiment of the present invention;

FIG. 16 is a schematic structural diagram of a system according to a thirteenth embodiment of the present invention.

Detailed description

Exemplary embodiments of the present invention will be described hereinafter with reference to the accompanying drawings. For the sake of clarity and conciseness, not all features of an actual implementation are described in the specification. However, it should be understood that many implementation-specific decisions must be made in the development of any such practical embodiment in order to achieve the developer's specific goals, and these decisions may vary from implementation to implementation.

In the embodiments, the local node refers to the node that needs to perform data recovery, also referred to as the failed node. The target node refers to the node that provides the recovery data, also referred to as the recovery data target node.

In a distributed storage system, the same data is stored on different nodes to form backups, and the stored data on nodes that have a backup relationship should be consistent. As shown in FIG. 1, when data is written to a node, a corresponding data operation sequence is generated to record the write. A data operation sequence is represented as obj-id: <offset, size>, where obj-id is a data identifier that identifies the data being written, offset is the start position of the write, and size is the length of the write measured from offset. Data operation sequences are recorded in the node's cache, and when the node state changes, the data operation sequences in the cache are flushed to the operation record file. The operation record file may be, but is not limited to, a check-point file. The data operation sequences flushed into the operation record file in one flush are called a data operation sequence set, and each data operation sequence set has a version value. A data operation sequence set therefore contains a version value and one or more data operation sequences. The version value is a monotonically increasing non-negative integer; other symbols that express an order relationship may also be used. In the operation record file, the data operation sequence sets are stored in order of version value, forming a data operation sequence set list, which contains one or more data operation sequence sets.
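The record layout described above can be modeled with a few plain types. The sketch below is illustrative only: the class names `DataOpSeq`, `DataOpSeqSet` and `OperationRecordFile` are hypothetical, the record file is kept in memory rather than as a check-point file, and an empty file is assumed to report version value 0.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class DataOpSeq:
    """One recorded write, obj-id: <offset, size>. Holds no written data."""
    obj_id: str   # data identifier
    offset: int   # start position of the write
    size: int     # length of the write, counted from offset

    @property
    def end(self) -> int:
        # exclusive end of the written range [offset, offset + size)
        return self.offset + self.size


@dataclass
class DataOpSeqSet:
    """Sequences flushed to the operation record file in one flush."""
    version: int  # monotonically increasing non-negative integer
    seqs: List[DataOpSeq] = field(default_factory=list)


class OperationRecordFile:
    """Hypothetical in-memory stand-in for the operation record file."""

    def __init__(self) -> None:
        self.sets: List[DataOpSeqSet] = []  # kept in ascending version order

    def append(self, s: DataOpSeqSet) -> None:
        # sets are stored in version order, so a new set must have a larger value
        if self.sets and s.version <= self.sets[-1].version:
            raise ValueError("version values must strictly increase")
        self.sets.append(s)

    def current_version(self) -> int:
        # assumption: an empty file reports version value 0
        return self.sets[-1].version if self.sets else 0
```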

In addition to the operation record file, a node may also maintain a log data operation sequence list, which stores all data operation sequences generated during the current version of the data operation sequence set.

In all embodiments of the present invention, a data operation sequence records only a write operation performed by the node; it does not include the written data itself.

FIG. 2 shows the process of updating the operation record file, which may be, but is not limited to, a check-point file. The process includes the following steps:

Step 201: The node starts, initializes the data operation sequence cache, and reads the current version value V of the node's data operation sequence set.

Step 202: Set the working variable V' to the current version value V of the node's data operation sequence set.

Step 203: Set the read position (the start value for reading data operation sequences) in the log data operation sequence list to 0.

Step 204: Read one data operation sequence starting from the current read position in the log data operation sequence list, and record its length as len.

Step 205: Determine whether the read has reached the end of the log data operation sequence list.

If the end has been reached, all data operation sequences from the read position onward in the log data operation sequence list have been read, and the process proceeds to step 206a. If the read is unsuccessful or the end has not been reached, the process proceeds to step 206b.

Step 206a: Determine whether the working variable V' is equal to the current version value V of the data operation sequence set. If V' equals V, the node state has not changed, and the process proceeds to step 209a to sleep and wait. If V' is not equal to V, the node state has changed and so has the version value of the data operation sequence set, and step 209b is executed.

Step 206b: Determine whether the data identifier is already in the cache.

Determine whether a data operation sequence with the same data identifier exists in the cache. If so, proceed to step 207a to perform a vector merge operation; if not, proceed to step 207b to add the data operation sequence to the cache.

Step 207a: The data operation sequence is vector merged.

The new data operation sequence is vector-merged with the data operation sequence in the cache that has the same data identifier, and the resulting sequence replaces the original one in the cache.

Step 207b: Add a sequence of data operations to the cache.

The newly read sequence of data operations is added to the cache intact.

Step 208: Advance the read position in the log data operation sequence list by len.

Advancing the read position by len skips the data operation sequence that has just been read and moves to the next one. The process returns to step 204 to read the next data operation sequence.

Step 209a: Enter sleep waiting.

Sleep for a period T, then return to step 204 to merge the data operation sequences generated during the most recent period T. T can be set according to the state of the system and determines how often the node checks whether the data operation sequence set should be flushed.

Step 209b: Flush the merged data operation sequences in the cache to the operation record file.

The state of the node changes, causing the data operation sequence set version value V to increase, and the merged data operation sequence set in the cache is flushed to the operation record file.

Step 210: Clear the cache and log data operation sequence list.

Clear the cache and the log data operation sequence list, and return to step 202.

The cache is cleared so that it can store and vector-merge new data operation sequences. A new log data operation sequence list is created to store all subsequent data operation sequences, the old log data operation sequence list is deleted, and the process returns to step 202 to begin a new round of merging.
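The FIG. 2 loop can be sketched as follows, under assumptions not stated in the text: the log data operation sequence list is an in-memory list whose read position advances by one entry instead of by `len` bytes, each `step()` call stands for one pass from step 204 until step 209a or 209b, sleeping for T is left to the caller, and a flushed set is labeled with the version value V' under which its sequences were collected. `RecordFileUpdater` and `merge_interval` are hypothetical names; overlapping intervals are merged, while merely adjacent ones are kept apart.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (offset, size) for one data identifier


def merge_interval(existing: List[Interval], offset: int, size: int) -> List[Interval]:
    """Union (offset, size) into a sorted, non-overlapping interval list."""
    start, end = offset, offset + size
    kept: List[Interval] = []
    for o, s in existing:
        if o < end and start < o + s:            # overlap: absorb into the new interval
            start, end = min(start, o), max(end, o + s)
        else:
            kept.append((o, s))
    kept.append((start, end - start))
    return sorted(kept)


class RecordFileUpdater:
    """Hypothetical sketch of the FIG. 2 loop, one pass per call to step()."""

    def __init__(self, version: int = 0) -> None:
        self.version = version                   # V: current set version value
        self.work_version = version              # V' (step 202)
        self.log: List[Tuple[str, int, int]] = []  # log list: (obj_id, offset, size), unmerged
        self.pos = 0                             # read position (step 203)
        self.cache: Dict[str, List[Interval]] = {}
        self.record_file: List[Tuple[int, Dict[str, List[Interval]]]] = []

    def write(self, obj_id: str, offset: int, size: int) -> None:
        self.log.append((obj_id, offset, size))

    def on_state_change(self) -> None:
        self.version += 1                        # node state changed: V increases

    def step(self) -> str:
        while self.pos < len(self.log):          # steps 204/205: read until the end
            obj_id, offset, size = self.log[self.pos]
            # steps 206b/207a/207b: merge with the cached entry, or add a new one
            self.cache[obj_id] = merge_interval(self.cache.get(obj_id, []), offset, size)
            self.pos += 1                        # step 208: advance past this entry
        if self.work_version == self.version:    # step 206a
            return "sleep"                       # step 209a: caller waits T, then retries
        # step 209b: flush the merged set under the version it was collected in
        self.record_file.append((self.work_version, self.cache))
        # step 210: clear cache, start a new log list, and go back to steps 202/203
        self.cache, self.log, self.pos = {}, [], 0
        self.work_version = self.version
        return "flushed"
```

In this sketch, sequences that arrive while V' still equals V only accumulate in the cache; a flush happens on the first pass after `on_state_change()`.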

The above process shows that the data operation sequence sets in the operation record file are flushed from the cache when the node state changes. During the current version of the data operation sequence set, the node state has not yet changed, so the merged data operation sequences remain in the cache and have not been flushed into the operation record file; the operation record file therefore lacks the current version of the data operation sequence set. The log data operation sequence list records all data operation sequences of the current version, and sequences with the same data identifier are not vector-merged in it. Thus, when the operation record file lacks the current version's data operation sequence set, all data operation sequences of the current version can be found in the log data operation sequence list.

Data operation sequences with the same data identifier are vector-merged as shown in FIG. 3, which specifically includes the following steps:

Step 301: Obtain the data operation sequence to be merged.

Step 302: Determine whether the data operation sequence to be merged overlaps any existing data operation sequence, by comparing it with the already merged data operation sequences. If there is an overlap, the process proceeds to step 303a to perform a vector merge operation; otherwise, it proceeds to step 303b to insert the sequence among the existing sequences according to its start value.

Step 303a: Combine the existing data operation sequence with the interval of the data operation sequence to be merged.

All parts of the existing data operation sequences that overlap the data operation sequence to be merged are combined with it. For example, if the existing data operation sequence is A: <1, 5> and the data operation sequence to be merged is A: <2, 6>, the former covers positions [1, 6) and the latter covers [2, 8), so they overlap and combine into [1, 8), that is, A: <1, 7>.

Step 303b: Insert the data operation sequence to be merged into the existing data operation sequence.

The data operation sequence to be merged is inserted among the existing data operation sequences such that, after insertion, all sequences are in strictly increasing order of start value. For example, if the existing data operation sequence is A: <1, 4> and the one to be merged is A: <6, 3>, the two do not overlap ([1, 5) and [6, 9)), so by the strictly-increasing-start-value rule the result is A: <1, 4><6, 3>.

Step 304: The merge is complete.

The embodiments of the present invention perform data recovery based on the operation record file, which stores all data operation sequence sets (without the data) prior to the current version, and the log data operation sequence list, which stores all data operation sequences of the current version of the data operation sequence set. In the operation record file, data operation sequences with the same data identifier within each data operation sequence set are vector-merged if they overlap. The log data operation sequence list stores all data operation sequences generated on the node under the current version value, and sequences with the same data identifier are not vector-merged in it.
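The FIG. 3 steps amount to inserting an interval into a sorted list of non-overlapping intervals for one data identifier. In this sketch (the function name `vector_merge` is hypothetical), a sequence <offset, size> is treated as the half-open range [offset, offset + size), which reproduces both worked examples; sequences that merely touch are treated as non-overlapping, which is an assumption.

```python
from bisect import insort
from typing import List, Tuple

Seq = Tuple[int, int]  # (offset, size) for one data identifier


def vector_merge(existing: List[Seq], new: Seq) -> List[Seq]:
    """Merge `new` into `existing` (sorted by offset, non-overlapping).

    Step 302 checks for an overlap interval; step 303a unions the new
    sequence with every sequence it overlaps; step 303b inserts a
    non-overlapping sequence so that start values stay strictly increasing.
    """
    start, end = new[0], new[0] + new[1]
    overlapping = [(o, s) for o, s in existing if o < end and start < o + s]
    if not overlapping:                               # step 303b
        result = list(existing)
        insort(result, new)
        return result
    # step 303a: one interval covering `new` and everything it overlaps
    lo = min([start] + [o for o, _ in overlapping])
    hi = max([end] + [o + s for o, s in overlapping])
    rest = [seq for seq in existing if seq not in overlapping]
    insort(rest, (lo, hi - lo))
    return rest
```

With this representation, `vector_merge([(1, 5)], (2, 6))` gives `[(1, 7)]` and `vector_merge([(1, 4)], (6, 3))` gives `[(1, 4), (6, 3)]`, matching the examples in steps 303a and 303b.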

When a node in a distributed storage system fails, the other storage nodes that store the same data continue to work normally. When a node storing a backup of the same data fails, the node state changes. This causes each node that stores the same data backup to flush the current version of the data operation sequence set in its cache to its operation record file, after which its cache stores the data operation sequence set with the new, incremented version value. Because the failed node cannot perform data operations until it recovers, after recovery the local node only needs to send the current data operation sequence set version value in its operation record file to the node that provides data recovery. That node sends the local node a data operation sequence set list composed of the data operation sequence sets related to the local node, together with the data corresponding to that list, and the local node updates this data, so that the data corresponding to the operation record file is recovered. In addition, to recover all of the data, the target node's log data operation sequence list is sent to the local node together with its corresponding data, and the local node updates the data corresponding to the log data operation sequence list. The node thus again has an operation record file, a log data operation sequence list, and data. The operation record file stores a data operation sequence set list composed of the data operation sequence sets of all versions before the current version; the data operation sequences in these sets correspond to the data written before the current version, so every data write operation is recorded.

The log data operation sequence list stores all data operation sequences generated during the current version of the data operation sequence set; these data operation sequences correspond to the data written during the current version, so every data write operation is recorded.

The first embodiment of the present invention provides a distributed storage data recovery method. As shown in FIG. 4, the method includes:

Step 401: The local node receives a list of data operation sequence sets sent by the target node according to the version value of the data operation sequence set of the local node.

Optionally, the version value of the data operation sequence set of the local node is sent by the local node to the target node.

Optionally, the version value of the data operation sequence set of the local node may also be sent by the master node to the target node.

Step 402: Receive data corresponding to the data operation sequence set list sent by the target node.

Optionally, after the local node receives the data operation sequence set list sent by the target node according to the version value of the local node's data operation sequence set, the method further includes: performing, by the local node, a vector merge operation on the data operation sequence set list, and sending the vector-merged data operation sequence set list to the target node;

in this case, receiving the data corresponding to the data operation sequence set list sent by the target node is: receiving the data, sent by the target node, corresponding to the vector-merged data operation sequence set list.

Optionally, after the local node receives the data operation sequence set list that is sent by the target node according to the version value of the local node data operation sequence set, the method further includes:

The local node performs an operation of merging random input/output (IO) into sequential IO on the data operation sequence set list, and sends the resulting data operation sequence set list to the target node;

in this case, receiving the data corresponding to the data operation sequence set list sent by the target node is: receiving the data, sent by the target node, corresponding to the data operation sequence set list after the random-IO-to-sequential-IO merge operation.

Specifically, random IO in the data operation sequence set list is merged into sequential IO when, for data operation sequences having the same identifier, the ratio of the hole value between them to the size of the contiguous space spanned by the merged sequence is less than a set percentage, or the hole value between them is less than a set threshold.
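The random-to-sequential IO criterion can be sketched as a greedy pass over the sorted sequences of one data identifier. This is an illustration under assumptions: `coalesce_random_io`, `max_ratio` and `max_hole` are hypothetical names for the set percentage and the set threshold, the hole value of a merged run is taken as the total gap inside it, and each merged range is read as one sequential IO that also covers its holes.

```python
from typing import List, Tuple

Seq = Tuple[int, int]  # (offset, size), sorted by offset, non-overlapping


def coalesce_random_io(seqs: List[Seq], max_ratio: float, max_hole: int) -> List[Seq]:
    """Greedily turn nearby random IOs into larger sequential IOs.

    A neighbour joins the current run when the gap to it is below
    `max_hole`, or when (total hole in run) / (span of run) is below
    `max_ratio`. Both thresholds are hypothetical tuning parameters.
    """
    if not seqs:
        return []
    out: List[Seq] = [seqs[0]]
    holes = 0  # total hole size inside the current merged run
    for offset, size in seqs[1:]:
        run_off, run_size = out[-1]
        hole = offset - (run_off + run_size)      # gap not covered by any write
        span = offset + size - run_off            # contiguous range if merged
        if hole < max_hole or (holes + hole) / span < max_ratio:
            out[-1] = (run_off, span)             # one read covers both and the hole
            holes += hole
        else:
            out.append((offset, size))            # start a new run
            holes = 0
    return out
```

For example, with `max_ratio=0.2` and `max_hole=2`, the sequences `[(0, 4), (5, 4), (100, 4)]` become `[(0, 9), (100, 4)]`: the 1-unit hole is below the threshold, while the 91-unit hole fails both tests.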

Optionally, receiving the data corresponding to the data operation sequence set list sent by the target node is: receiving the data, sent by the target node, corresponding to the data operation sequence set list after the target node has performed a vector merge operation on it.

Optionally, receiving the data corresponding to the data operation sequence set list sent by the target node is: receiving the data, sent by the target node, corresponding to the data operation sequence set list after the target node has merged random IO into sequential IO on that list.

Step 403: Update the data of the local node according to the data operation sequence set list and the data corresponding to the data operation sequence set list.

Optionally, the local node updates the received list of data operation sequence sets to an operation record file of the local node.

Optionally, the method further includes:

The local node receives a log operation sequence list sent by the target node;

Receiving data corresponding to the log operation sequence list sent by the target node;

Updating the data of the local node according to the log operation sequence list and the data corresponding to the log operation sequence list.
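On the local node, step 403 and the optional log-list update can be sketched as below. Assumptions: `LocalNode` and its methods are hypothetical, stored data is an in-memory `bytearray` per data identifier, the data sent by the target node is a mapping from each (obj_id, offset, size) sequence to its bytes, and an object is zero-filled when a write lands past its current end.

```python
from typing import Dict, List, Tuple

Seq = Tuple[str, int, int]        # (obj_id, offset, size)
SeqSet = Tuple[int, List[Seq]]    # (version value, sequences)


class LocalNode:
    """Hypothetical local (failed) node applying recovery data (step 403)."""

    def __init__(self) -> None:
        self.record_file: List[SeqSet] = []     # operation record file
        self.log: List[Seq] = []                # log data operation sequence list
        self.store: Dict[str, bytearray] = {}   # data, keyed by data identifier

    def current_version(self) -> int:
        # version value reported to the target node; 0 if nothing recorded yet
        return self.record_file[-1][0] if self.record_file else 0

    def _write(self, obj_id: str, offset: int, payload: bytes) -> None:
        buf = self.store.setdefault(obj_id, bytearray())
        need = offset + len(payload)
        if len(buf) < need:
            buf.extend(b"\0" * (need - len(buf)))  # grow the object with zeros
        buf[offset:need] = payload

    def apply_set_list(self, sets: List[SeqSet], data: Dict[Seq, bytes]) -> None:
        for version, seqs in sets:
            for seq in seqs:
                self._write(seq[0], seq[1], data[seq])
            self.record_file.append((version, seqs))  # update operation record file

    def apply_log_list(self, log: List[Seq], data: Dict[Seq, bytes]) -> None:
        for seq in log:
            self._write(seq[0], seq[1], data[seq])
        self.log = list(log)                    # keep the received log list
```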

The distributed storage data recovery method provided by the first embodiment of the present invention sends the version value of the local node's data operation sequence set to the target node, and updates the local node's data using the data operation sequence set list that the target node sends according to that version value, thereby reducing the amount of data transmitted during data recovery and saving network bandwidth.

A second embodiment of the present invention provides a distributed storage data recovery method. As shown in FIG. 5, the method specifically includes:

Step 501: Receive a version value of a data operation sequence set of the local node.

The version value of the local node's data manipulation sequence set may be received by the target node.

Specifically, the current version value of the local node data operation sequence set received by the target node may be sent by the local node.

Specifically, the current version value of the local node data operation sequence set received by the target node may also be sent by the master node.

Step 502: Send a data operation sequence set list to the local node according to the version value.

Optionally, the target node searches its operation record file for data operation sequence sets whose version value is greater than the version value of the local node's data operation sequence set, and selects the data operation sequence sets related to the local node to generate the data operation sequence set list;

Sending the list of data manipulation sequence sets to the local node.

Specifically, a data operation sequence set associated with the local node is selected using a hash algorithm under a distributed hash table architecture or an allocation table algorithm in a metadata service architecture.
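Target-node list generation for step 502 can be sketched as a filter over the operation record file. The function name `build_recovery_list` and the `belongs_to_local` predicate are hypothetical; the predicate stands in for whichever data placement algorithm (distributed hash table or allocation table) the system uses, and dropping sets that end up empty is an assumption.

```python
from typing import Callable, List, Tuple

Seq = Tuple[str, int, int]        # (obj_id, offset, size)
SeqSet = Tuple[int, List[Seq]]    # (version value, sequences)


def build_recovery_list(
    record_file: List[SeqSet],
    local_version: int,
    belongs_to_local: Callable[[str], bool],
) -> List[SeqSet]:
    """Target-node side of step 502 (hypothetical names).

    Keeps sets whose version value exceeds the local node's, and within
    each set only the sequences whose data identifier the placement
    algorithm assigns to the local node.
    """
    result: List[SeqSet] = []
    for version, seqs in record_file:
        if version <= local_version:     # already present on the local node
            continue
        related = [s for s in seqs if belongs_to_local(s[0])]
        if related:                      # assumption: skip sets with nothing relevant
            result.append((version, related))
    return result
```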

Step 503: Send, according to the data operation sequence set list, data corresponding to the data operation sequence set list to the local node.

Optionally, before the data corresponding to the data operation sequence set list is sent to the local node according to the list, the method further includes: performing a vector merge operation on the data operation sequence set list;

in this case, sending the data corresponding to the data operation sequence set list to the local node is: sending the data corresponding to the vector-merged data operation sequence set list to the local node.

Optionally, before the data corresponding to the data operation sequence set list is sent to the local node according to the list, the method further includes: merging random IO into sequential IO on the data operation sequence set list; in this case, sending the data corresponding to the data operation sequence set list to the local node is: sending the data corresponding to the data operation sequence set list after the random-IO-to-sequential-IO merge operation to the local node.

Optionally, before the data corresponding to the data operation sequence set list is sent to the local node according to the list, the method further includes: receiving a data operation sequence set list on which the local node has performed a vector merge operation;

Optionally, before the data corresponding to the data operation sequence set list is sent to the local node according to the list, the method further includes: receiving a data operation sequence set list on which the local node has performed the random-IO-to-sequential-IO merge operation;

in this case, sending the data corresponding to the data operation sequence set list to the local node is: sending the data corresponding to the data operation sequence set list after the random-IO-to-sequential-IO merge operation to the local node.

Optionally, the method further includes: receiving a log data operation sequence request from the local node;

sending a log data operation sequence list to the local node according to the log data operation sequence request; and

sending, according to the log data operation sequence list, the data corresponding to the log data operation sequence list to the local node.

The distributed storage data recovery method provided by the second embodiment of the present invention receives the version value of the local node's data operation sequence set, sends a data operation sequence set list according to that version value, and sends the data corresponding to the list so that the local node can update its data, thereby reducing the amount of data transmitted during data recovery and saving network bandwidth.

A third embodiment of the present invention provides a distributed storage data recovery method. As shown in FIG. 6, the method specifically includes the following steps:

Step 601: Start recovering data.

After the local node fails, the data recovery process begins. While the local node is faulty, the other nodes that store backups of the same data continue to work normally and perform data storage operations. When the local node recovers from the failure, its data must be made consistent with the backups stored on the other nodes, so the data that the local node could not store during the failure must be restored.

Step 602: Send the current version value of the data operation sequence set to the target node.

The local node reads the current version value of the data operation sequence set from the operation record file it stored before the failure, and sends this value to the target node. When the local node fails, the node state changes, and the version value of the data operation sequence sets on the other nodes that store the same data backup increases. The current version value in the local node's operation record file must therefore be sent to the target node, so that the target node can find the data operation sequence sets added during the failure. In this step, the current version value of the local node's data operation sequence set may also be sent to the target node by the master node.

Step 603: Receive the version value of the local node's data operation sequence set, and find the data operation sequence sets in the operation record file whose version value is greater than the local node's version value.

The target node finds, in its operation record file, all data operation sequence sets whose version value is greater than the current version value of the data operation sequence set in the local node's operation record file. During the local node's failure, the state of the other nodes storing the same data backup may change more than once, so every set whose version value is greater than the local node's current version value must be found. Together, these data operation sequence sets record all write operations on the data backup after the local node failed.

Step 604: Generate a list of data operation sequence sets related to the local node.

The data operation sequence sets in the target node's operation record file whose version values are greater than the local current version value cover all data stored on the target node. They therefore also contain data operation sequences for data that does not belong to the local node. After the target node generates the list of data operation sequence sets whose version values are greater than the current version value in the local node's operation record file, it uses the data placement algorithm to determine, from the data identifier of each data operation sequence in the list, whether that sequence belongs to the local node, and removes the sequences that do not. Commonly used data placement algorithms include the distributed hash table (DHT) algorithm in a DHT architecture and the allocation table algorithm in a metadata service architecture. The result is the list of data operation sequence sets related to the local node.
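The filtering in step 604 can be sketched as below. The source only says that a DHT or allocation-table algorithm is used; the concrete placement here (SHA-1 of the data identifier, with replicas on consecutive nodes) is a hypothetical stand-in, as are the names `placement` and `related_to_node`.

```python
import hashlib
from typing import List, Tuple

OpSeq = Tuple[int, List[Tuple[int, int]]]   # (data_id, [(offset, length), ...])
OpSet = Tuple[int, List[OpSeq]]             # (version, [sequence, ...])


def placement(data_id: int, nodes: List[str], replicas: int = 3) -> List[str]:
    """Hypothetical DHT-style placement: hash the data identifier to a start
    position, then store the data on `replicas` consecutive nodes."""
    digest = hashlib.sha1(str(data_id).encode()).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    count = min(replicas, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(count)]


def related_to_node(op_sets: List[OpSet], local: str, nodes: List[str],
                    replicas: int = 3) -> List[OpSet]:
    """Keep only the sequences whose data the placement algorithm assigns to `local`."""
    related = []
    for version, sequences in op_sets:
        kept = [seq for seq in sequences if local in placement(seq[0], nodes, replicas)]
        if kept:  # drop sets that no longer contain any relevant sequence
            related.append((version, kept))
    return related


nodes = ["A", "B", "C", "D"]
op_sets = [(3, [(i, [(0, 4096)]) for i in range(20)])]
print(related_to_node(op_sets, "A", nodes, replicas=1))
```

Because the placement function is deterministic, the target node can compute it locally without asking the local node which data it holds.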

Step 605: Send a list of data operation sequence sets related to the local node.

To restore the data that the local node could not store during its failure, the local node must first obtain the data operation sequences generated on the target node during the failure, that is, the data operation sequences in the list of data operation sequence sets whose version values are greater than the current version value in the local node's operation record file. These operation sequences are then used to recover the corresponding data.

Step 606: The local node receives a list of data operation sequence sets related to the local node.

The target node sends the list of data operation sequence sets generated during the local node's failure, and the local node receives this list, which contains the data operation sequence sets related to the local node whose version values are greater than the current version value in its operation record file, and uses it for the update. Because each data operation sequence in the list records a data write operation, data is restored by updating, sequence by sequence, the data that each data operation sequence records. The received list may be stored in the local node's operation record file as received, that is, without vector merging and/or merging of random IO into sequential IO. Storing this list in the local node's operation record file is optional and is not a necessary step of the embodiment of the present invention.

Step 607: Find the data corresponding to the generated list of data operation sequence sets that are related to the local node and whose version values are greater than the current version value in the local node's operation record file.

The target node reads the corresponding data according to the generated list of data operation sequence sets related to the local node whose version values are greater than the current version value in the local node's operation record file. After this list has been sent to the local node, the data corresponding to it must also be sent to the local node. In this embodiment, the target node produces that data by looking up and reading the corresponding data one by one, following the data operation sequences in the list.
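A minimal sketch of the one-by-one read in step 607, assuming (hypothetically) that the target node's block data is a dictionary from data identifier to bytes and that the list uses the `(version, [(data_id, [(offset, length), ...]), ...])` shape; `read_for_list` is an illustrative name.

```python
from typing import Dict, List, Tuple

OpSeq = Tuple[int, List[Tuple[int, int]]]   # (data_id, [(offset, length), ...])
OpSet = Tuple[int, List[OpSeq]]             # (version, [sequence, ...])
Chunk = Tuple[int, int, bytes]              # (data_id, offset, payload)


def read_for_list(store: Dict[int, bytes], op_list: List[OpSet]) -> List[Chunk]:
    """Walk the list one data operation sequence at a time and read each recorded extent."""
    chunks = []
    for _version, sequences in op_list:
        for data_id, extents in sequences:
            blob = store[data_id]  # current contents held by the target node
            for offset, length in extents:
                chunks.append((data_id, offset, bytes(blob[offset:offset + length])))
    return chunks


store = {0x123: bytes(range(256)) * 20}
op_list = [(3, [(0x123, [(0, 4), (10, 2)])])]
print(read_for_list(store, op_list))
```

Since the target node reads its current contents, an extent written in several versions is sent in its latest state; the operation sequences only say *where* to read.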

Step 607 is performed after step 604, and may be performed simultaneously with step 605, or may be performed after step 605.

Step 608: The target node sends data corresponding to the list of data operation sequence sets related to the local node.

Step 609: Receive data corresponding to the data operation sequence set list.

The local node receives the data corresponding to the list of data operation sequence sets related to the local node whose version values are greater than the current version value in its operation record file.

Step 610: Update the data.

The data of the local node is updated according to the list of data operation sequence sets related to the local node whose version values are greater than the current version value in the local node's operation record file, and according to the data corresponding to that list. After receiving this data, the local node applies it following the data operation sequences in the received list, bringing its copy up to date.
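The update in step 610 amounts to writing each received extent into the local copy. This sketch assumes, hypothetically, that the local node keeps each data block as a `bytearray` keyed by data identifier and receives `(data_id, offset, payload)` tuples; `apply_chunks` is an illustrative name.

```python
from typing import Dict, List, Tuple

Chunk = Tuple[int, int, bytes]  # (data_id, offset, payload) as received from the target node


def apply_chunks(store: Dict[int, bytearray], chunks: List[Chunk]) -> None:
    """Write each received extent into the local copy, growing the block if needed."""
    for data_id, offset, payload in chunks:
        block = store.setdefault(data_id, bytearray())
        end = offset + len(payload)
        if len(block) < end:
            block.extend(bytes(end - len(block)))  # zero-fill up to the new end
        block[offset:end] = payload


local = {0x123: bytearray(b"abcdef")}
apply_chunks(local, [(0x123, 2, b"XY"), (0x321, 4, b"zz")])
print(local)
```

Applying the chunks in the order received preserves the write order recorded in the list.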

The following optional steps (not shown in FIG. 6) may also be performed in the embodiment of the present invention:

Step 611: Send a log data operation sequence request.

While the local node is performing data recovery, the operation record file of the target node contains all data operation sequence sets preceding the target node's current version. The data operation sequence set of the current version is held in the target node's cache and has not yet been flushed to the target node's operation record file. To recover all of the data, the data corresponding to the target node's current-version data operation sequence set must therefore also be restored to the local node. Because the current-version data operation sequences stay in the cache until the node state changes, they cannot yet be read from the target node's operation record file. The local node therefore requests the target node's log data operation sequences and uses the data corresponding to them for recovery. In this step, the log data operation sequence request of the local node may also be sent by the control node.
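The two recovery sources on the target node (flushed sets in the operation record file, and current-version sequences still in the cache) can be modelled as below. This is a hypothetical sketch: the class `TargetNode`, its methods, and the rule that a state change flushes the cache and opens a new version are illustrative assumptions based on the description above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

OpSeq = Tuple[int, List[Tuple[int, int]]]  # (data_id, [(offset, length), ...])


@dataclass
class TargetNode:
    """Hypothetical model of the target node's two recovery sources."""
    current_version: int = 1
    operation_record: List[Tuple[int, List[OpSeq]]] = field(default_factory=list)  # flushed sets
    cache: List[OpSeq] = field(default_factory=list)  # current-version sequences, not yet flushed

    def write(self, seq: OpSeq) -> None:
        self.cache.append(seq)  # new writes land in the current version's cache

    def on_state_change(self) -> None:
        """Flush the current version to the operation record file and open a new version."""
        self.operation_record.append((self.current_version, list(self.cache)))
        self.cache.clear()
        self.current_version += 1

    def sets_after(self, version: int) -> List[Tuple[int, List[OpSeq]]]:
        return [s for s in self.operation_record if s[0] > version]  # source for steps 603 to 605

    def log_sequences(self) -> List[OpSeq]:
        return list(self.cache)  # source for steps 612 and 613, before placement filtering


t = TargetNode()
t.write((0x123, [(0, 1024)]))
t.on_state_change()
t.write((0x321, [(0, 512)]))
print(t.sets_after(0), t.log_sequences())
```

The second write is invisible to `sets_after` until the next state change, which is exactly why the separate log request is needed.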

Step 612: The target node receives the log data operation sequence request and searches for a list of log data operation sequences related to the local node.

The log data operation sequence list stored on the target node contains all data operation sequences within the target node's current version of the data operation sequence set. Based on the log data operation sequence request sent by the local node, the target node therefore selects the log data operation sequences related to the local node and sends that list to the local node. The related log data operation sequences are selected in the same way as in step 604.

Step 613: The target node sends the log data operation sequence list.

Step 614: The local node receives and updates the log data operation sequence list.

After receiving the list of log data operation sequences sent by the target node, the local node updates the local log data operation sequence.

Step 615: Find the data corresponding to the log data operation sequence list related to the local node.

To recover this data, the target node looks up the corresponding data one by one according to the data operation sequences in the log data operation sequence list related to the local node.

Step 616: The target node sends data corresponding to the log data operation sequence list.

Step 615 may be performed after step 612, may be performed concurrently with step 613, or may be performed after step 613.

Step 617: The local node receives the data corresponding to the log data operation sequence list.

Step 618: The local node updates the local node data according to the log operation sequence list and the data corresponding to the log operation sequence list.

In the embodiment of the present invention, the local node sends the current version value of its data operation sequence set to the target node and receives the data operation sequence set list that the target node sends according to that version value; data recovery is then performed from this list. This reduces the amount of data transferred during recovery and saves network bandwidth.

A fourth embodiment of the present invention provides a distributed storage data recovery method. As shown in FIG. 7, the specific embodiment includes:

Steps 701 to 706 are the same as steps 601 to 606 of the third embodiment of the present invention, and details are not described herein again.

Step 707: The local node performs vector merging on the list of data operation sequence sets related to the local node whose version values are greater than the current version value in its operation record file.

To reduce the amount of data that must be recovered, after receiving the list of data operation sequence sets related to the local node whose version values are greater than the current version value in its operation record file, the local node performs vector merging on that list. Specifically, all data operation sequences in the list that have the same data identifier are vector-merged, regardless of whether the data operation sequence sets they belong to have the same version value. The merging principle is to merge all overlapping parts of the intervals, each interval being written as <offset, length>. For example, if two data operation sequences both have the data identifier 0x123, namely 0x123: <0, 1024> <2000, 1024> and 0x123: <500, 4096>, the merged sequence is 0x123: <0, 4596>. As another example, three data operation sequences with the data identifier 0x321, namely 0x321: <0, 512> <1024, 1024>, 0x321: <1500, 2000>, and 0x321: <4096, 10240>, merge into 0x321: <0, 512> <1024, 2476> <4096, 10240>.
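The vector merge of step 707 is a per-identifier interval merge. The sketch below reproduces both worked examples above; the function name `vector_merge` and the input shape `(data_id, [(offset, length), ...])` are illustrative assumptions. It also joins extents that merely touch, a choice that does not change the bytes covered.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Extent = Tuple[int, int]  # (offset, length)


def vector_merge(sequences: Iterable[Tuple[int, List[Extent]]]) -> Dict[int, List[Extent]]:
    """Merge overlapping extents of all sequences that share a data identifier,
    regardless of which version's set they came from."""
    spans_by_id = defaultdict(list)
    for data_id, extents in sequences:
        for offset, length in extents:
            spans_by_id[data_id].append((offset, offset + length))  # half-open [start, end)

    merged = {}
    for data_id, spans in spans_by_id.items():
        spans.sort()
        out: List[List[int]] = []
        for start, end in spans:
            if out and start <= out[-1][1]:  # overlaps (or touches) the previous interval
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[data_id] = [(s, e - s) for s, e in out]  # back to (offset, length)
    return merged


print(vector_merge([(0x123, [(0, 1024), (2000, 1024)]), (0x123, [(500, 4096)])]))
# {291: [(0, 4596)]}
```

Sorting first means each interval only needs to be compared with the last merged one, so the merge is O(n log n) per identifier.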

Step 708: Merge the random IO in the list of data operation sequence sets related to the local node whose version values are greater than the current version value in the local node's operation record file into sequential IO.

For the list of data operation sequence sets related to the local node whose version values are greater than the current version value in the local node's operation record file, the random distribution of the IO is evaluated with a statistical algorithm, and random IO is merged into sequential IO to reduce node and network overhead and achieve optimal recovery performance. Many statistical algorithms are possible. A common one sums the space occupied by the data operation sequences in the list (space covered by sequences with the same data identifier is counted only once), compares it with the size of the space that the merged sequence would occupy, and computes the percentage of holes. If this percentage is less than a value set by the system, the data operation sequences with the same data identifier can be merged. The system value can be set and adjusted as needed. This embodiment uses 20% as an example, which is not a limitation of the present invention but serves only to explain the embodiment more clearly. Taking the sequence merged in step 707, 0x321: <0, 512> <1024, 2476> <4096, 10240>, as an example, the hole size is [1024 - (0 + 512)] + [4096 - (1024 + 2476)] = 1108, the span is 4096 + 10240 - 0 = 14336, and the hole percentage is 1108 / 14336 ≈ 7.7%, which is less than 20%, so the sequences are merged into the sequential IO 0x321: <0, 14336>. Alternatively, whether to merge can be decided by whether each hole between data operation sequences exceeds a threshold, which can likewise be set and adjusted according to actual needs.
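The hole-percentage test of step 708 can be sketched as follows, using the 20% example value. The names `hole_ratio` and `to_sequential` are hypothetical, and the input is assumed to be the output of the vector merge (sorted, non-overlapping extents for one data identifier), so the covered space is simply the sum of the lengths.

```python
from typing import List, Tuple

Extent = Tuple[int, int]  # (offset, length)


def hole_ratio(extents: List[Extent]) -> float:
    """Fraction of the merged span not covered by any extent.
    Assumes the extents are already vector-merged (sorted, non-overlapping)."""
    start = extents[0][0]
    end = extents[-1][0] + extents[-1][1]
    covered = sum(length for _, length in extents)
    return (end - start - covered) / (end - start)


def to_sequential(extents: List[Extent], max_ratio: float = 0.20) -> List[Extent]:
    """Replace random IO with one sequential IO when the hole percentage is below `max_ratio`."""
    if len(extents) < 2:
        return list(extents)
    if hole_ratio(extents) < max_ratio:
        start = extents[0][0]
        end = extents[-1][0] + extents[-1][1]
        return [(start, end - start)]
    return list(extents)


seq = [(0, 512), (1024, 2476), (4096, 10240)]
print(round(hole_ratio(seq), 4))  # 0.0773  (1108 / 14336)
print(to_sequential(seq))         # [(0, 14336)]
```

Merging reads the 1108 hole bytes unnecessarily, but replaces three reads with one; the threshold bounds how much extra data that trade may cost.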

The embodiment of the present invention may perform only step 707 or only step 708, or may perform both. In step 709 and the subsequent steps, the resulting list of data operation sequence sets related to the local node whose version values are greater than the current version value in the local node's operation record file is collectively referred to as the data operation sequence set list after the processing operation.

Processing the list in step 707, step 708, or both does not affect the list of data operation sequence sets related to the local node, with version values greater than the current version value in the local node's operation record file, that was already received in step 706.

Step 709: Send a list of data operation sequence sets after the processing operation to the target node.

Step 710: The target node receives and searches for the corresponding data according to the data operation sequence set list after the processing operation.

The target node receives the data operation sequence set list after the processing operation, looks up and reads the corresponding data one by one according to the data operation sequences in that list, and generates the data corresponding to the list.

Step 711: Send data corresponding to the data operation sequence set list.

The data generated in step 710 for the processed data operation sequence set list is sent to the local node as the data corresponding to the list of data operation sequence sets related to the local node whose version values are greater than the current version value in the local node's operation record file.

Step 712: The local node receives the data corresponding to the data operation sequence set list after the processing operation.

Step 713: Update the data.

The local node updates its data according to the unprocessed data operation sequence set list and the received data corresponding to the data operation sequence set list after the processing operation.

The following are optional steps of an embodiment of the invention (not shown in Figure 7):

Steps 714 to 715 refer to steps 611 to 614 of the third embodiment of the present invention.

Step 716: Perform vector merging on the log data operation sequence list.

All data operation sequences with the same data identifier in the log data operation sequence list are vector-merged; the specific merging method is the same as in step 707.

Step 717: Merge the random IO in the log data operation sequence list into sequential IO.

The random IO in the log data operation sequence list is merged into sequential IO; the specific merging method is the same as in step 708.

The embodiment of the present invention may perform only step 716 or only step 717, or may perform both. In step 718 and the subsequent steps, the result is collectively referred to as the log data operation sequence list after the processing operation.

Processing the log data operation sequence list in step 716, step 717, or both does not affect the log data operation sequence list that the local node has already received in step 715.

Step 718: The local node sends a list of log data operation sequences after the processing operation.

Step 719: The target node receives and searches for the corresponding data according to the log data operation sequence list after the processed operation.

The target node receives the log data operation sequence list after the processed operation, and searches for the corresponding data one by one according to the data operation sequence in the log data operation sequence list after the processing operation, and forms data corresponding to the log data operation sequence list.

Step 720: Send the data corresponding to the log data operation sequence list.

Step 721: The local node receives the data corresponding to the log data operation sequence list after the processing operation.

Step 722: Update the data corresponding to the log data operation sequence list after the processing operation.

The local node updates its data according to the unprocessed log data operation sequence list and the data corresponding to the log data operation sequence list after the processing operation.

Alternatively, steps 615 to 618 of the third embodiment of the present invention may be executed directly instead of steps 716 to 722.

The distributed storage data recovery method provided by the embodiment of the invention can further reduce the transmission of duplicate data, save network bandwidth, and reduce the load of the target node.

A fifth embodiment of the present invention provides a distributed storage data recovery method. As shown in FIG. 8, the specific method includes:

Steps 801 to 806 are the same as steps 601 to 606 of the third embodiment of the present invention, and are not described again.

Step 807: Perform vector merging on the list of data operation sequence sets related to the local node whose version values are greater than the current version value in the local node's operation record file.

For the method of vector merging this list, refer to step 707 in the fourth embodiment of the present invention.

Step 808: Merge the random IO in the list of data operation sequence sets related to the local node whose version values are greater than the current version value in the local node's operation record file into sequential IO.

For the method of merging the random IO in this list into sequential IO, refer to step 708 in the fourth embodiment of the present invention.

The embodiment of the present invention may perform only step 807 or only step 808, or may perform both. In step 809 and the subsequent steps, the resulting list of data operation sequence sets related to the local node whose version values are greater than the current version value in the local node's operation record file is collectively referred to as the data operation sequence set list after the processing operation.

Processing the list in step 807, step 808, or both does not affect the list of data operation sequence sets related to the local node, with version values greater than the current version value in the local node's operation record file, that was already received in step 806.

Step 809: Find the corresponding data according to the data operation sequence set list after the processing operation.

The target node looks up and reads the corresponding data one by one according to the data operation sequences in the processed list, generating the data corresponding to that list. This data is used to restore, on the local node, the data corresponding to the list of data operation sequence sets whose version values are greater than the current version value in the local node's operation record file.

Steps 807 to 809 may be performed after step 805 and before step 806, simultaneously with step 806, or after step 806.

Step 810: The target node sends the data corresponding to the data operation sequence set list after the processing operation.

Step 811: The local node receives the data corresponding to the data operation sequence set list after the processing operation.

Step 812: Update the data of the local node with the data corresponding to the data operation sequence set list after the processing operation.

The following are optional steps of an embodiment of the invention (not shown in Figure 8):

For steps 813 to 816, refer to the description of steps 611 to 614 of the third embodiment of the present invention.

Step 817: The target node performs vector merging on the log data operation sequence list.

The target node vector-merges all data operation sequences with the same data identifier in the log data operation sequence list; for the specific method, refer to step 807.

Step 818: The target node merges the random IO in the log data operation sequence list into sequential IO.

The specific method is the same as in step 808.

The embodiment of the present invention may perform only step 817 or only step 818, or may perform both. In step 819 and the subsequent steps, the result is collectively referred to as the log data operation sequence list after the processing operation.

Processing the log data operation sequence list in step 817, step 818, or both does not affect the log data operation sequence list that the local node has already received in step 816.

Step 819: Find the corresponding data according to the log data operation sequence list after the processing operation.

The corresponding data is read one by one according to the data operation sequences in the processed log data operation sequence list, forming the data corresponding to that list.

Step 820: Send data corresponding to the log data operation sequence list.

Step 821: The local node receives the data corresponding to the log data operation sequence list after the processing operation.

Step 822: Update the data corresponding to the log data operation sequence list after the processing operation.

The local node updates its data according to the unprocessed log data operation sequence list and the data corresponding to the log data operation sequence list after the processing operation.

Alternatively, steps 615 to 618 of the third embodiment of the present invention may be executed directly instead of steps 817 to 822.

The distributed storage data recovery method provided by the embodiment of the invention reduces the transmission of duplicate data, further reduces the data that needs to be restored, and saves network bandwidth.

A sixth embodiment of the present invention provides a node which, as shown in FIG. 9, includes a receiving unit 901 and an updating unit 902.

The receiving unit 901 is configured to receive the data operation sequence set list that the target node sends according to the version value of the node's data operation sequence set, and to receive the data corresponding to that list sent by the target node.

The updating unit 902 is configured to update the data of the node according to the data operation sequence set list and the data corresponding to the data operation sequence set list. The receiving unit 901 is further configured to receive a log operation sequence list sent by the target node, and further configured to receive data corresponding to the log operation sequence list sent by the target node. The updating unit 902 is further configured to update the node data according to the log operation sequence list received by the receiving unit 901 and the data corresponding to the log operation sequence list.

Optionally, the data that the receiving unit 901 receives from the target node may be the data corresponding to the data operation sequence set list after the target node has performed a vector merging operation on that list. In this case, the updating unit 902 is specifically configured to update the data of the node according to the data operation sequence set list and the data corresponding to the list after the vector merging operation.

Optionally, the data that the receiving unit 901 receives from the target node may be the data corresponding to the data operation sequence set list after the target node has merged the random IO in that list into sequential IO. In this case, the updating unit 902 is specifically configured to update the data of the node according to the data operation sequence set list and the data corresponding to the list after this merging operation.

The update unit 902 can also be used to update the list of received data manipulation sequence sets to the operation log file.

In the node provided by the embodiment of the present invention, the receiving unit receives the data operation sequence set list that the target node sends according to the version value of the node's data operation sequence set, and the updating unit updates the node's data. This reduces the amount of data transferred over the network during data recovery and saves network bandwidth.

A seventh embodiment of the present invention provides a node which, as shown in FIG. 10, includes a sending unit 1001, a receiving unit 1002, and an updating unit 1003.

The sending unit 1001 is configured to send, to the target node, a version value of the data operation sequence set of the node. The receiving unit 1002 is configured to receive a data operation sequence set list sent by the target node according to the version value of the node data operation sequence set, and receive data corresponding to the data operation sequence set list sent by the target node. The updating unit 1003 is configured to update the data of the node according to the data operation sequence set list and the data corresponding to the data operation sequence set list.
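The cooperation of the three units can be sketched in-process as below. Everything here is a hypothetical illustration: `TargetStub` stands in for the target node, the data shapes are the ones assumed in the earlier steps, and the rule that the node's current version becomes the newest received version is an assumption consistent with the updating unit recording the received list in the operation record file.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

OpSeq = Tuple[int, List[Tuple[int, int]]]  # (data_id, [(offset, length), ...])
OpSet = Tuple[int, List[OpSeq]]            # (version, [sequence, ...])


@dataclass
class TargetStub:
    """Hypothetical in-process stand-in for the target node."""
    operation_record: List[OpSet]
    store: Dict[int, bytes]

    def handle_version(self, version: int) -> List[OpSet]:
        return [s for s in self.operation_record if s[0] > version]

    def data_for(self, op_list: List[OpSet]) -> List[Tuple[int, int, bytes]]:
        return [(data_id, off, self.store[data_id][off:off + length])
                for _, seqs in op_list for data_id, extents in seqs
                for off, length in extents]


@dataclass
class Node:
    version: int
    store: Dict[int, bytearray] = field(default_factory=dict)
    operation_record: List[OpSet] = field(default_factory=list)

    def recover(self, target: TargetStub) -> None:
        op_list = target.handle_version(self.version)  # sending unit 1001 / receiving unit 1002
        chunks = target.data_for(op_list)              # receiving unit 1002: data for the list
        self.update(op_list, chunks)                   # updating unit 1003

    def update(self, op_list: List[OpSet], chunks) -> None:
        for data_id, off, payload in chunks:
            block = self.store.setdefault(data_id, bytearray())
            if len(block) < off + len(payload):
                block.extend(bytes(off + len(payload) - len(block)))
            block[off:off + len(payload)] = payload
        self.operation_record.extend(op_list)  # optional: record the received list
        if op_list:
            self.version = max(v for v, _ in op_list)  # assumption: adopt newest version
```

Usage: with a target holding versions 1 to 3 and a node at version 1, `node.recover(target)` pulls only versions 2 and 3 and the data they touch.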

The updating unit 1003 can also be used to update the received data operation sequence set list to the operation record file.

Further description of the receiving unit 1002 can be referred to the receiving unit 901 of the sixth embodiment, and will not be described again.

Further description of the update unit 1003 can be referred to the update unit 902 of the sixth embodiment, and will not be described again.

In the node provided by the embodiment of the present invention, the sending unit sends the version value of the node's data operation sequence set to the target node, the receiving unit receives the data operation sequence set list that the target node sends according to that version value, and the updating unit updates the node's data. This reduces the amount of data transferred during data recovery and saves network bandwidth. The node's operation record file can also be updated.

The eighth embodiment of the present invention provides a node which, as shown in FIG. 11, includes a sending unit 1101, a receiving unit 1102, a vector merging unit 1103, and an updating unit 1104.

The sending unit 1101 is configured to send the version value of the node's data operation sequence set to the target node. The receiving unit 1102 is configured to receive the data operation sequence set list that the target node sends according to that version value. The vector merging unit 1103 is configured to perform a vector merging operation on the data operation sequence set list received by the receiving unit 1102. The sending unit 1101 is further configured to send the list after the vector merging operation to the target node, and the receiving unit 1102 is further configured to receive the data corresponding to that list. The updating unit 1104 is configured to update the data of the node according to the data operation sequence set list and the data corresponding to it, and can also be configured to update the received data operation sequence set list to the operation record file.

Further description of the receiving unit 1102 can be referred to the receiving unit 901 of the sixth embodiment, and will not be described again.

For further details of the updating unit 1104, refer to the updating unit 902 of the sixth embodiment; they are not repeated here.

By sending the current version value of its data operation sequence set to the target node and performing a vector merge operation on the data operation sequence set list returned by the target node, the node provided by this embodiment further reduces the amount of data transmitted for recovery, reduces the load on the target node when restoring data, and saves network bandwidth.

The ninth embodiment of the present invention provides a node, as shown in FIG. 12, which includes a sending unit 1201, a receiving unit 1202, a merging unit 1203, and an updating unit 1204.

The sending unit 1201 is configured to send the version value of the node's data operation sequence set to the target node. The receiving unit 1202 is configured to receive the data operation sequence set list that the target node sends according to that version value. The merging unit 1203 is configured to merge random IO operations in the data operation sequence set list received by the receiving unit 1202 into sequential IO operations. The sending unit 1201 is further configured to send the resulting data operation sequence set list, in which random IO has been merged into sequential IO, to the target node, and the receiving unit 1202 is further configured to receive the data corresponding to that list. The updating unit 1204 updates the data of the node according to the data operation sequence set list and the data corresponding to it. The updating unit 1204 can also be configured to update the received data operation sequence set list into the operation record file.
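Claim 5 gives the condition for merging random IO into sequential IO: for operations with the same identifier, either the hole between them is below a set threshold, or the ratio of the hole to the contiguous span covered by the merged operation is below a set percentage. A minimal Python sketch of that rule follows; the field names and the default limits (`max_hole_ratio=0.25`, `max_hole_bytes=4096`) are illustrative assumptions, not values from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataOp:
    """One IO in a data operation sequence (hypothetical fields)."""
    block_id: str  # the "identifier"; only IOs with the same one are merged
    offset: int
    length: int


def merge_random_to_sequential(ops, max_hole_ratio=0.25, max_hole_bytes=4096):
    """Merge nearby IOs with the same identifier into larger sequential IOs.

    Neighbouring IOs are merged when the hole between them is below
    max_hole_bytes, or when hole / (contiguous span of the merged IO) is below
    max_hole_ratio, following the two conditions stated in claim 5.
    """
    merged = []
    for op in sorted(ops, key=lambda o: (o.block_id, o.offset)):
        if merged and merged[-1].block_id == op.block_id:
            prev = merged[-1]
            prev_end = prev.offset + prev.length
            hole = max(0, op.offset - prev_end)  # 0 if the IOs touch or overlap
            span = max(prev_end, op.offset + op.length) - prev.offset
            if hole < max_hole_bytes or (span > 0 and hole / span < max_hole_ratio):
                merged[-1] = DataOp(prev.block_id, prev.offset, span)
                continue
        merged.append(op)
    return merged


ops = [
    DataOp("vol1", 6144, 4096),
    DataOp("vol1", 0, 4096),
    DataOp("vol1", 1_048_576, 4096),
    DataOp("vol2", 0, 512),
]
for io in merge_random_to_sequential(ops):
    print(io)
# vol1 bytes [0, 10240) become one IO; the distant vol1 IO and the vol2 IO stay separate
```

In this sketch, merging trades a few extra bytes (each hole is read as part of the larger IO) for fewer, sequential disk accesses.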

In another embodiment, the node may include both the vector merging unit 1103 and the merging unit 1203, so that the data operation sequence set list sent by the target node is vector-merged and also has its random IO merged into sequential IO. For further details of the receiving unit 1202, refer to the receiving unit 901 of the sixth embodiment; they are not repeated here.

For further details of the updating unit 1204, refer to the updating unit 902 of the sixth embodiment; they are not repeated here.

By sending the current version value of its data operation sequence set to the target node and merging random IO in the data operation sequence set list returned by the target node into sequential IO, the node provided by this embodiment effectively reduces the amount of data transmitted for recovery, saves network bandwidth, and reduces the load on the target node.

In the nodes provided in the sixth to ninth embodiments of the present invention, the receiving unit may further be configured to receive a log operation sequence list sent by the target node and the data corresponding to that list. The updating unit may further be configured to update the data of the node according to the log operation sequence list received by the receiving unit and the data corresponding to it. In this way all data of the node is restored while the amount of data transmitted is reduced.
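As a sketch of how the updating unit might apply a log operation sequence list, the Python below replays log entries onto an in-memory stand-in for local storage, in log order. The `LogOp` fields (`seq_no`, `block_id`, `offset`), the dictionary-backed store, and the assumption that every entry is a plain overwrite are all hypothetical; the source does not specify the log format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogOp:
    """One entry of a log operation sequence list (hypothetical fields)."""
    seq_no: int    # position of the operation in the target node's log
    block_id: str  # block the operation wrote to
    offset: int    # byte offset of the write within the block


def apply_log_ops(store, log_ops, payloads):
    """Replay log operations onto local storage in log order.

    store:    dict mapping block_id -> bytearray (stand-in for local disks)
    payloads: dict mapping seq_no -> bytes received from the target node
    """
    for op in sorted(log_ops, key=lambda o: o.seq_no):
        data = payloads[op.seq_no]
        block = store.setdefault(op.block_id, bytearray())
        end = op.offset + len(data)
        if len(block) < end:
            block.extend(bytes(end - len(block)))  # zero-fill up to the write end
        block[op.offset:end] = data  # later entries overwrite earlier ones
    return store


store = apply_log_ops(
    {},
    [LogOp(2, "vol1", 2), LogOp(1, "vol1", 0)],
    {1: b"abcd", 2: b"XY"},
)
print(store["vol1"])  # bytearray(b'abXY'): entry 1 is applied before entry 2
```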

For the nodes provided in the sixth to ninth embodiments of the present invention, refer also to the description of the local node in the first to fifth method embodiments.

The tenth embodiment of the present invention provides a node, as shown in FIG. 13, which includes a receiving unit 1301 and a sending unit 1302.

The receiving unit 1301 is configured to receive the version value of the local node's data operation sequence set. The sending unit 1302 is configured to send, according to the version value received by the receiving unit 1301, the data operation sequence set list and the data corresponding to that list to the local node.

The receiving unit 1301 is further configured to receive the data operation sequence set list sent by the local node after the local node has performed a vector merge operation on it. In this case, the data sent by the sending unit 1302 is specifically the data corresponding to the vector-merged data operation sequence set list.

The receiving unit 1301 is further configured to receive the data operation sequence set list sent by the local node after the local node has merged its random IO into sequential IO. In this case, the data sent by the sending unit 1302 is specifically the data corresponding to the data operation sequence set list after random IO has been merged into sequential IO.

The node provided by this embodiment can provide the local node with the list of data operation sequence sets whose version values are greater than the current version value of the local node's data operation sequence set, together with the data corresponding to that list, and thereby provides data recovery for the local node.

In this embodiment, the receiving unit receives the version value of the local node's data operation sequence set, and the sending unit sends, according to that version value, the data operation sequence set list and the corresponding data to the local node, which uses them to update its data. This reduces the amount of data transmitted during data recovery and saves network bandwidth.

The eleventh embodiment of the present invention provides a node, as shown in FIG. 14, which includes a receiving unit 1401, a sending unit 1402, and a vector merging unit 1403.

The receiving unit 1401 is configured to receive the version value of the local node's data operation sequence set. The sending unit 1402 is configured to send the data operation sequence set list to the local node according to the version value received by the receiving unit 1401. The vector merging unit 1403 is configured to perform a vector merge operation on the data operation sequence set list, and the sending unit 1402 is further configured to send the data corresponding to the vector-merged list to the local node.

For further details of the receiving unit 1401, refer to the receiving unit 1301 of the tenth embodiment; they are not repeated here.

For further details of the sending unit 1402, refer to the sending unit 1302 of the tenth embodiment; they are not repeated here.

The node provided by this embodiment can provide the local node with the list of data operation sequence sets whose version values are greater than the current version value of the local node's data operation sequence set, together with the data corresponding to that list after vector merging, thereby providing data recovery for the local node while further reducing the amount of data transmitted.

The twelfth embodiment of the present invention provides a node, as shown in FIG. 15, which includes a receiving unit 1501, a sending unit 1502, and a merging unit 1503.

The receiving unit 1501 is configured to receive the version value of the local node's data operation sequence set. The sending unit 1502 is configured to send the data operation sequence set list to the local node according to the version value received by the receiving unit 1501. The merging unit 1503 is configured to merge random IO in the data operation sequence set list into sequential IO, and the sending unit 1502 is further configured to send the data corresponding to the resulting list to the local node.

For further details of the receiving unit 1501, refer to the receiving unit 1301 of the tenth embodiment; they are not repeated here.

For further details of the sending unit 1502, refer to the sending unit 1302 of the tenth embodiment; they are not repeated here.

In another embodiment of the present invention, the node may include both the vector merging unit 1403 and the merging unit 1503, so that the data operation sequence set list is vector-merged and its random IO is merged into sequential IO. This reduces both the data transmitted and the node overhead.

The node provided by this embodiment can provide the local node with the list of data operation sequence sets whose version values are greater than that of the local node's data operation sequence set, together with the data corresponding to that list after random IO has been merged into sequential IO, thereby providing data recovery for the local node while reducing node overhead.

In the nodes provided by the tenth to twelfth embodiments of the present invention, the receiving unit is further configured to receive a log data operation sequence request from the local node. The sending unit is further configured to send a log data operation sequence list to the local node according to that request, and to send the data corresponding to the log data operation sequence list to the local node.

The nodes provided by the tenth to twelfth embodiments of the present invention may further include a search generating unit, configured to search the operation record file of the target node, according to the version value of the local node's data operation sequence set, for data operation sequence sets whose version values are greater than that version value, and to select the data operation sequence sets related to the local node to generate the data operation sequence set list. For the nodes provided in the tenth to twelfth embodiments, refer also to the description of the target node in the first to fifth method embodiments.
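The search generating unit can be sketched as a filter over the operation record file: keep the sets newer than the local node's version value, then keep only the operations that belong to the local node. Claim 12 names a hash algorithm under a distributed hash table architecture as one way to decide which operations belong to a node; the `owner_of` function below stands in for it with a simple SHA-1-modulo placement, which is an assumption (production DHTs usually use consistent hashing). All names are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DataOp:
    block_id: str
    offset: int
    length: int


@dataclass(frozen=True)
class OpSet:
    version: int  # version value of the data operation sequence set
    ops: tuple    # tuple of DataOp


def owner_of(block_id, nodes):
    """Map a block to a node with a stable hash (simplified DHT placement)."""
    digest = hashlib.sha1(block_id.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def build_op_set_list(record_file, local_version, local_node, nodes):
    """Return the sets newer than local_version, keeping only local_node's ops."""
    result = []
    for op_set in sorted(record_file, key=lambda s: s.version):
        if op_set.version <= local_version:
            continue  # the local node already holds this set
        related = tuple(op for op in op_set.ops
                        if owner_of(op.block_id, nodes) == local_node)
        if related:  # skip sets that touch no data owned by the local node
            result.append(OpSet(op_set.version, related))
    return result


record_file = [
    OpSet(3, (DataOp("blk-a", 0, 4096),)),
    OpSet(4, (DataOp("blk-b", 0, 4096), DataOp("blk-c", 0, 4096))),
    OpSet(5, (DataOp("blk-d", 0, 4096),)),
]
nodes = ["node1", "node2", "node3"]
for op_set in build_op_set_list(record_file, 3, "node2", nodes):
    print(op_set.version, [op.block_id for op in op_set.ops])
```

Which blocks land on `node2` depends on the hash, so no fixed output is shown; `hashlib` is used instead of the built-in `hash()` because string hashing in Python is randomized per process.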

A thirteenth embodiment of the present invention provides a distributed storage data recovery system, as shown in FIG. 16, including a local node 1601 and a target node 1602.

The local node 1601 is configured to receive the data operation sequence set list that the target node sends according to the version value of the local node's data operation sequence set, to receive the data corresponding to that list sent by the target node, and to update the data of the local node according to the data operation sequence set list and the corresponding data. The target node 1602 is configured to receive the version value of the local node's data operation sequence set and to send, according to that version value, the data operation sequence set list and the corresponding data to the local node.
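To tie the two roles together, the following Python sketch simulates one recovery round between a local node and a target node held in memory. It assumes that the data corresponding to a list is the current content of every block the list names, and it omits the optional vector and IO merging steps; `TargetNode`, `LocalNode`, and their methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class TargetNode:
    """Holds the operation record file and current block contents."""
    record_file: list  # (version value, [block_id, ...]) pairs
    blocks: dict       # block_id -> bytes

    def op_set_list_since(self, version):
        # Sets whose version value is greater than the local node's.
        return [(v, ids) for v, ids in self.record_file if v > version]

    def read_data(self, op_set_list):
        # Read each touched block once, even if several sets touched it.
        wanted = {bid for _, ids in op_set_list for bid in ids}
        return {bid: self.blocks[bid] for bid in wanted}


@dataclass
class LocalNode:
    version: int  # version value of the local data operation sequence set
    blocks: dict = field(default_factory=dict)
    record_file: list = field(default_factory=list)

    def recover_from(self, target):
        """Run one recovery round; return the number of blocks transferred."""
        op_set_list = target.op_set_list_since(self.version)  # send version, get list
        if not op_set_list:
            return 0  # already up to date
        data = target.read_data(op_set_list)  # get the data for the list
        self.blocks.update(data)              # update local data
        self.record_file.extend(op_set_list)  # update the operation record file
        self.version = max(v for v, _ in op_set_list)
        return len(data)


target = TargetNode(
    record_file=[(1, ["b1"]), (2, ["b2", "b1"]), (3, ["b3"])],
    blocks={"b1": b"new", "b2": b"x", "b3": b"y", "b4": b"z"},
)
local = LocalNode(version=1, blocks={"b1": b"old", "b4": b"z"})
print(local.recover_from(target), local.version)  # 3 3
```

Block `b4` is never sent because no set newer than the local node's version value touched it; in this sketch that is where the saving over a full copy comes from.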

By transmitting the version value of the local node's data operation sequence set for data recovery, the system provided by this embodiment reduces the amount of data transmitted and saves network bandwidth.

For the nodes provided in the sixth to twelfth embodiments and the system provided in the thirteenth embodiment of the present invention, refer to the description of the method embodiments of the present invention.

By merging random IO in the data operation sequence set list and in the log data operation sequence list into sequential IO, the distributed storage data recovery system provided by this embodiment further reduces the amount of data transmitted and saves network bandwidth, while also relieving the load on the target node.

A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the compositions and steps of each example have been described above generally in terms of function. Whether these functions are performed by hardware or software depends on the particular application and the design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present invention.

A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the system, device, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.

In addition, each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present invention essentially, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.

The foregoing descriptions are merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A distributed storage data recovery method, comprising:

The local node receives a list of data operation sequence sets sent by the target node according to the version value of the data operation sequence set of the local node;

Receiving data corresponding to the data operation sequence set list sent by the target node; updating data of the local node according to the data operation sequence set list and the data corresponding to the data operation sequence set list.

2. The method according to claim 1, wherein a version value of the data operation sequence set of the local node is sent by the local node to the target node.

3. The method of claim 1 or 2, wherein

after the local node receives the data operation sequence set list sent by the target node according to the version value of the local node's data operation sequence set, the method further includes: performing, by the local node, a vector merge operation on the data operation sequence set list, and sending the vector-merged data operation sequence set list to the target node;

4. The method according to any one of claims 1 to 3, wherein after the local node receives the data operation sequence set list sent by the target node according to the version value of the local node's data operation sequence set, the method further includes:

merging, by the local node, random IO in the data operation sequence set list into sequential IO, and sending the data operation sequence set list in which random IO has been merged into sequential IO to the target node;

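and the receiving of the data corresponding to the data operation sequence set list sent by the target node is specifically: receiving the data, sent by the target node, corresponding to the data operation sequence set list after random IO has been merged into sequential IO.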

5. The method according to claim 4, wherein random IO in the data operation sequence set list is merged into sequential IO when, for data operation sequences having the same identifier, the ratio of the hole size between them to the size of the contiguous space spanned by the merged sequence is less than a set percentage, or the hole size between them is less than a set threshold.

6. The method according to claim 1 or 2, wherein receiving the data corresponding to the data operation sequence set list sent by the target node is specifically: receiving the data, sent by the target node, corresponding to the data operation sequence set list after the target node has performed a vector merge operation on the data operation sequence set list.

7. The method according to claim 1, 2 or 6, wherein receiving the data corresponding to the data operation sequence set list sent by the target node is specifically: receiving the data, sent by the target node, corresponding to the data operation sequence set list after the target node has merged random IO in the data operation sequence set list into sequential IO.

8. The method according to any one of claims 1 to 7, wherein the local node updates the received data operation sequence set list into the operation record file of the local node.

9. The method according to any one of claims 1 to 8, further comprising:

The local node receives a log operation sequence list sent by the target node;

10. A distributed storage data recovery method, comprising:

Receiving a version value of a data operation sequence set of the local node;

And sending, according to the version value, a data operation sequence set list to the local node; and sending, according to the data operation sequence set list, data corresponding to the data operation sequence set list to the local node.

11. The method according to claim 10, wherein sending the data operation sequence set list to the local node according to the version value comprises:

finding, in the operation record file of the target node, the data operation sequence sets whose version values are greater than the version value of the local node's data operation sequence set, and selecting the data operation sequence sets related to the local node to generate the data operation sequence set list;

Sending the list of data manipulation sequence sets to the local node.

12. The method according to claim 11, wherein the data operation sequence sets related to the local node are selected by using a hash algorithm under a distributed hash table architecture or an allocation table algorithm under a metadata service architecture.

13. A method according to any one of claims 10 to 12, characterized in that

14. A method according to any one of claims 10 to 13 wherein:

before the data corresponding to the data operation sequence set list is sent to the local node according to the data operation sequence set list, the method further includes: merging random IO in the data operation sequence set list into sequential IO;

and the data corresponding to the data operation sequence set list sent to the local node is specifically: the data corresponding to the data operation sequence set list after random IO has been merged into sequential IO.

15. A method as claimed in any one of claims 10 to 12, characterized in that

before the data corresponding to the data operation sequence set list is sent to the local node according to the data operation sequence set list, the method further includes: receiving the data operation sequence set list sent by the local node after the local node has performed a vector merge operation on the data operation sequence set list;

16. The method according to claim 10, 11, 12 or 15, wherein before the data corresponding to the data operation sequence set list is sent to the local node according to the data operation sequence set list, the method further includes: receiving the data operation sequence set list sent by the local node after the local node has merged random IO in the data operation sequence set list into sequential IO;

17. The method according to any one of claims 10 to 16, further comprising: receiving a log data operation sequence request of the local node;

18. A node, comprising:

and an updating unit, configured to update data of the node according to the data operation sequence set list and the data corresponding to the data operation sequence set list.

19. The node according to claim 18, further comprising:

And a sending unit, configured to send, to the target node, a version value of the data operation sequence set of the node.

20. The node according to claim 19, further comprising:

a vector merging unit, configured to perform a vector merge operation on the data operation sequence set list received by the receiving unit;

The sending unit is further configured to send, to the target node, a data operation sequence set list after the vector combining operation;

The data corresponding to the data operation sequence set list received by the receiving unit is specifically: the data corresponding to the data operation sequence set list after the vector combining operation.

21. The node according to claim 19 or 20, further comprising:

a merging unit, configured to merge random IO in the data operation sequence set list received by the receiving unit into sequential IO;

The data corresponding to the data operation sequence set list received by the receiving unit is specifically: the data corresponding to the data operation sequence set list after random IO has been merged into sequential IO.

22. The node according to claim 18 or 19, wherein the data corresponding to the data operation sequence set list sent by the target node and received by the receiving unit is specifically: the data, sent by the target node, corresponding to the data operation sequence set list after the target node has performed a vector merge operation on the data operation sequence set list.

23. The node according to claim 18, 19 or 22, wherein the data corresponding to the data operation sequence set list sent by the target node and received by the receiving unit is specifically: the data, sent by the target node, corresponding to the data operation sequence set list after the target node has merged random IO in the data operation sequence set list into sequential IO.

24. A node according to any of claims 18 to 23, characterized in that

The updating unit is further configured to update the data operation sequence set list received by the receiving unit to an operation record file of the node.

25. A node according to any of claims 18 to 24, characterized in that

The receiving unit is further configured to receive a log operation sequence list sent by the target node, and further configured to receive data corresponding to the log operation sequence list sent by the target node;

The updating unit is further configured to update data of the node according to the log operation sequence list and the data corresponding to the log operation sequence list.

26. A node, comprising:

27. The node of claim 26, further comprising:

a search generating unit, configured to search, according to the version value, the operation record file of the target node for data operation sequence sets whose version values are greater than the version value of the local node's data operation sequence set, and to select the data operation sequence sets related to the local node to generate the data operation sequence set list.

28. The node according to claim 26 or 27, wherein the receiving unit is further configured to receive the data operation sequence set list sent by the local node after the local node has performed a vector merge operation on the data operation sequence set list;

The data corresponding to the data operation sequence set list sent by the sending unit is specifically: the data corresponding to the data operation sequence set list after the vector combining operation.

29. The node according to any one of claims 26 to 28, wherein the receiving unit is further configured to receive the data operation sequence set list sent by the local node after the local node has merged random IO in the data operation sequence set list into sequential IO;

The data corresponding to the data operation sequence set list sent by the sending unit is specifically: the data corresponding to the data operation sequence set list after random IO has been merged into sequential IO.

30. The node according to claim 26 or 27, further comprising: a vector merging unit, configured to perform a vector merge operation on the data operation sequence set list; wherein the data corresponding to the data operation sequence set list sent by the sending unit to the local node is specifically: the data corresponding to the data operation sequence set list after the vector merge operation.

31. The node according to claim 26, 27 or 30, further comprising: a merging unit, configured to merge random IO in the data operation sequence set list into sequential IO;

and the sending unit is configured to send, to the local node, the data corresponding to the data operation sequence set list after random IO in the data operation sequence set list has been merged into sequential IO.

32. A node according to any of claims 26 to 31, characterized in that

The receiving unit is further configured to receive a log data operation sequence request of the local node;

The sending unit is further configured to send, according to the log data operation sequence request, a log data operation sequence list to the local node, and to send, according to the log data operation sequence list, the data corresponding to the log data operation sequence list to the local node.

33. A distributed storage data recovery system, comprising:

The target node is configured to receive the version value of the data operation sequence set of the local node, and to send, according to the version value, a data operation sequence set list and the data corresponding to the data operation sequence set list to the local node.