CN109976941B - Data recovery method and device - Google Patents
- Publication number
- CN109976941B (application number CN201711456084.XA / CN201711456084A)
- Authority
- CN
- China
- Prior art keywords
- data
- physical node
- replica
- factor parameter
- master
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application discloses a data recovery method and device. When a physical node fails, at least one replica data fragment associated with a master data fragment to be recovered on the physical node is determined, and the maximum value among the decision factor parameter values of these replica data fragments is determined. If the difference between the decision factor parameter value of the master data fragment and this maximum value is smaller than a preset value, the replica data fragment corresponding to the maximum value is selected as the new master data fragment, and the remaining replica data fragments are instructed to perform log synchronization with the new master data fragment so that their logs stay consistent with the logs of the new master data fragment. Data recovery for the master data fragment on the failed physical node can therefore be performed quickly, and data loss is reduced.
Description
Technical Field
The present invention relates to the field of databases, and in particular, to a data recovery method and apparatus.
Background
In a distributed database system, multiple copies of a data fragment are usually placed on different physical nodes to improve disaster tolerance, and the master and its replicas must be kept synchronized during operation; maintaining multiple replicas is a basic means of improving the reliability, availability, performance and scalability of a distributed database system. The industry implements replication between the master and the replicas in three ways: full synchronous replication, semi-synchronous replication and asynchronous replication. Full synchronous replication guarantees consistency among the replicas, but it affects availability, greatly limits partition fault tolerance, and increases request processing latency. Semi-synchronous replication, compared with asynchronous replication, adds extra waiting latency to the completion of a single transaction; compared with full synchronous replication its replay is delayed, and the master may stall when a slave node fails or the network between the master node and a slave node becomes abnormal. Distributed database systems with strict latency requirements generally adopt asynchronous replication. Asynchronous replication sacrifices consistency among some of the replicas and may cause a short-lived, small amount of data loss in the event of a failure, but it provides better availability and partition fault tolerance.
After a physical node in the physical node cluster fails, the metadata server switches the master role to one of the other replica nodes according to the replicas of the affected data fragments. In an asynchronous replication scenario, a message is returned immediately after being sent without waiting for acknowledgement, so network delay, transient disconnections and other causes may leave the replicas at inconsistent progress. When a physical node in a distributed database system fails, how to recover and synchronize the data of the data fragments on that physical node is a hot topic of current research.
Disclosure of Invention
The technical problem to be solved in the embodiments of the present invention is to provide a data recovery method in a distributed database system, so that when a physical node fails, data on the failed physical node can be quickly recovered, and data loss is reduced.
In a first aspect, a distributed database system includes a physical node cluster and a data recovery apparatus (hereinafter referred to as the apparatus), where the apparatus is configured to manage the physical node cluster and the physical node cluster includes one or more physical nodes. When a physical node in the physical node cluster fails, the physical node is removed from the physical node cluster. The apparatus determines m replica data fragments associated with a master data fragment on the physical node, where the m replica data fragments are located on different physical nodes and m is an integer greater than 0; determines the maximum value among the decision factor parameter values of the m replica data fragments, where a decision factor parameter value is a weighted average of at least one factor parameter value and the at least one factor parameter value includes a data change sequence number (DCN); and judges whether the difference between the decision factor parameter value of the master data fragment and the maximum value is smaller than a preset value. If so, the replica data fragment corresponding to the maximum value is selected as the new master data fragment, and the other replica data fragments among the m replica data fragments are synchronized with the new master data fragment.
The decision factor parameter value is a weighted average of at least one factor parameter value, and the weight of each factor parameter value may be pre-stored or pre-configured. A master data fragment is a data fragment in the master state, a replica data fragment is a data fragment in the replica state, and one master data fragment is associated with at least one replica data fragment. A data fragment represents a set of service data; for example, different data fragments correspond to different types of service data. At least one master data fragment is deployed on a physical node, and a master data fragment and its at least one associated replica data fragment may be located on different physical nodes to improve reliability.
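To make the construction of the decision factor parameter value concrete, a minimal sketch follows. The weight values and the factors other than the DCN are illustrative assumptions only; the embodiment merely requires a weighted average over at least one factor parameter value whose weights are pre-stored or pre-configured.

```python
# Minimal sketch of the decision factor parameter value of one data fragment.
# The weights and the non-DCN factors are assumptions made for illustration.
def decision_factor(dcn, affected_records=0, affected_tables=0,
                    weights=(0.8, 0.1, 0.1)):
    """Weighted average of at least one factor parameter value (DCN included)."""
    values = (dcn, affected_records, affected_tables)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```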
In this embodiment, when a physical node fails, at least one replica data fragment associated with a master data fragment to be recovered on the physical node is determined, and the maximum value of the decision factor parameter values of these replica data fragments is determined. When the difference between the maximum value and the decision factor parameter value of the master data fragment is smaller than a preset value, the replica data fragment corresponding to the maximum value is selected as the new master data fragment, and the other replica data fragments among the at least one replica data fragment are instructed to perform log synchronization with the new master data fragment, so that the logs of each replica data fragment stay consistent with the logs of the new master data fragment. Data recovery for the master data fragment on the failed physical node can therefore be performed quickly, and data loss is reduced.
In one possible design, further comprising: setting the m replica data fragments to a waiting-for-recovery state when the difference between the decision factor parameter value of the master data fragment and the maximum value is not smaller than the preset value;
when the failed physical node returns to normal, adding the physical node back into the physical node cluster; discarding the m replica data fragments, for example by deleting or freezing them; and recreating m new replica data fragments in the physical node cluster, where the m new replica data fragments are located on different physical nodes and the m physical nodes corresponding to the m new replica data fragments do not include the failed physical node; the m new replica data fragments then perform log synchronization with the master data fragment.
In one possible design, further comprising:
and receiving at least one factor parameter value of the data fragment reported by each physical node in the physical node cluster through the heartbeat message.
Each physical node in the physical node cluster periodically reports, to the apparatus, at least one factor parameter value of each data fragment deployed on it (master data fragments and replica data fragments), and the apparatus receives and stores the at least one factor parameter value of each data fragment reported by the physical node.
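A minimal sketch of this reporting path is given below; the heartbeat period, the message layout and the field names are assumptions made only for illustration.

```python
import time

REPORT_PERIOD_S = 5  # assumed heartbeat period


def report_loop(node_id, fragments, send):
    """Node side: periodically report, for every deployed data fragment
    (master or replica), its factor parameter values in a heartbeat message."""
    while True:
        send({"node_id": node_id,
              "fragments": {f.fragment_id: {"dcn": f.dcn} for f in fragments}})
        time.sleep(REPORT_PERIOD_S)


class FactorStore:
    """Apparatus side: overwrite stored values so only the latest report is kept."""

    def __init__(self):
        self.factors = {}  # (node_id, fragment_id) -> latest factor parameter values

    def on_heartbeat(self, msg):
        for fragment_id, values in msg["fragments"].items():
            self.factors[(msg["node_id"], fragment_id)] = dict(values)
```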
In one possible design, the at least one factor parameter value further includes the number of affected service records and/or the number of affected tables.
In one possible design, further comprising: sending the identifiers of the m replica data fragments to an application client to notify the application client that the m replica data fragments are unavailable.
In a second aspect, the present application provides a data recovery apparatus, comprising:
the processing unit is used for removing the physical node out of the physical node cluster under the condition that the physical node in the physical node cluster fails;
the processing unit is further configured to determine m replica data shards associated with the master data shard on the physical node; the m replica data fragments are respectively located at different physical nodes, and m is an integer greater than 0;
the processing unit is further configured to determine a maximum value of decision factor parameter values of the m replica data slices; the judgment factor parameter value is obtained by weighted average of at least one factor parameter value, and the at least one factor parameter value comprises a data change sequence number DCN;
a judging unit, configured to judge whether a difference between a decision factor parameter value of the master data segment and the maximum value is smaller than a preset value;
and the synchronization unit is used for selecting the replica data fragment corresponding to the maximum value as a new master data fragment and performing log synchronization on other replica data fragments in the m replica data fragments and the new master data fragment if the judgment result of the judgment unit is yes.
In a possible design, the processing unit is further configured to set the m pieces of replica data to a waiting recovery state if the determination result of the determining unit is negative;
the processing unit is further configured to add the physical node to the physical node cluster when the physical node returns to normal;
the processing unit is further configured to discard the m duplicate data fragments;
the processing unit is further configured to recreate m new replica data fragments on different physical nodes in the physical node cluster; wherein the m physical nodes corresponding to the m new replica data fragments do not include the physical node;
the synchronization unit is further configured to perform log synchronization on the m new replica data fragments and the master data fragment.
In one possible design, further comprising:
a receiving unit, configured to receive at least one factor parameter value of a data fragment reported by each physical node in the physical node cluster through a heartbeat message.
In one possible design, the at least one factor parameter value further includes the number of affected service records and/or the number of affected tables.
In one possible design, further comprising:
a sending unit, configured to send, to an application client, the identifier of the m duplicate data fragments, so as to notify the application client that the m duplicate data fragments are unavailable.
In a third aspect, an embodiment of the present invention provides a data recovery apparatus, where the data recovery apparatus includes: a receiver, a transmitter, a memory, and a processor; wherein the memory stores a set of program codes and the processor is configured to call the program codes stored in the memory to perform the method according to the first aspect or each possible implementation manner of the first aspect.
Based on the same inventive concept, the principle and the beneficial effects of the apparatus in solving the problems can refer to the possible method embodiments of the first aspect and the beneficial effects brought by those methods, so the implementation of the apparatus follows the implementation of the method and repeated parts are not described again.
A further aspect of the present application provides a computer-readable storage medium having stored therein instructions, which when run on a computer, cause the computer to perform the method of the first aspect or each of the possible implementations of the first aspect.
A further aspect of the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect or each of the possible implementations of the first aspect.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments or the background art of the present invention, the drawings required to be used in the embodiments or the background art of the present invention will be described below.
Fig. 1 is a schematic structural diagram of a distributed database system according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of a data recovery method according to an embodiment of the present invention;
fig. 3 is another schematic flow chart of a data recovery method according to an embodiment of the present invention;
fig. 4 is a schematic diagram illustrating a principle of data recovery on a physical node according to an embodiment of the present invention;
FIG. 5 is another schematic diagram of data recovery at a physical node according to an embodiment of the present invention;
FIG. 6 is another schematic diagram of data recovery at a physical node according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a data recovery apparatus according to an embodiment of the present invention;
fig. 8 is another schematic structural diagram of a data recovery apparatus according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention will be described below with reference to the drawings.
Fig. 1 is a schematic structural diagram of a distributed database system according to an embodiment of the present invention. The distributed database system includes multiple application clients, a data recovery device, and multiple physical nodes; the data recovery device in this embodiment may also be referred to as a metadata service subsystem, a management node, a management server, or by other names. Multiple data fragments are deployed on each physical node, and the data fragments on a physical node are of two types: master data fragments and replica data fragments. Under normal conditions, one master data fragment may be associated with several replica data fragments, and the master data fragment and its associated replica data fragments are deployed on different physical nodes to improve the disaster tolerance of the data. The application clients are used to access the data fragments on the physical nodes and include, but are not limited to, personal computers, mobile terminals, personal digital assistants, vehicle-mounted devices and other devices. The device storing the data fragments on each physical node may be a storage array, and the storage array may communicate with the physical node through a block IO interface. A physical node may be implemented by a minicomputer, an X86 platform server, a PC server, and the like.
The application clients and the data recovery device may communicate based on the IP protocol, such as TCP/IP or UDP, and the data recovery device and the physical nodes may also communicate based on the IP protocol.
The data recovery device may be deployed independently of the physical nodes, or may be deployed in each physical node. The main functions of the data recovery device are as follows; a schematic metadata model is sketched in code after this list:
metadata management: and the method is responsible for defining and maintaining metadata information and relationships among metadata. For the embodiment of the invention, the data base level metadata such as a table structure, a data type and a DataBase control function, and the distributed metadata such as data fragmentation and physical node mapping relation are mainly defined. In the invention, the DCN of the data fragments on each physical node is added in the metadata.
Route management: manages the routing data of service objects, i.e. the mapping relationship between data fragments and physical nodes (calling the metadata definition unit interfaces).
Master/replica management: when the physical node hosting a master data fragment becomes abnormal, selects a suitable replica data fragment as the new master data fragment (per data fragment); if master/replica management decides to wait for the failed node to recover, it sets all replica data fragments associated with the failed master data fragment to the waiting-for-recovery state. In the embodiment of the invention, a management mechanism that makes this decision based on the metadata is added to master/replica management.
Fault detection: monitors whether each physical node has failed or become abnormal (calling the communication service unit to obtain information from each physical node).
Fault recovery: responsible for fault-related processing and self-healing of physical nodes, and also for failure recovery of the metadata node (the data recovery apparatus) itself.
Replica management: defines and manages the master/replica replication relationship of each data fragment and is responsible for controlling replication integrity (calling the metadata definition unit interface). The replica management unit issues a replication request to the agent on the source physical node, and the source physical node completes the replication of the service data to the target physical node. After a failed physical node actively reports its recovery, this function is responsible for reactivating the failed master data fragment and recreating the replica data fragments.
Metadata synchronization/persistence: responsible for synchronizing metadata definition information to the data nodes, the application client Driver and the slave node of the metadata service subsystem (the data recovery apparatus) (calling the communication service unit interface). When the metadata changes, the change information needs to be notified to these network elements. The metadata is also persisted to ensure its high reliability.
Communication service: a basic unit responsible for network communication with the peripheral network elements (the physical nodes, the application client Driver and the slave node of the metadata service subsystem).
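The following is a hypothetical, minimal sketch of the metadata model that these functions operate on (fragment-to-node routing, master/replica association and the per-fragment DCN); all class and field names are assumptions for illustration and not part of the claimed method.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FragmentMeta:
    """Metadata kept by the data recovery device for one data fragment."""
    fragment_id: str
    role: str                    # "master" or "replica"
    node_id: str                 # physical node currently hosting the fragment
    dcn: int = 0                 # latest data change sequence number reported
    replica_ids: List[str] = field(default_factory=list)  # filled when role == "master"
```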
Referring to fig. 2, fig. 2 is a schematic flow chart of a data recovery method according to an embodiment of the present invention, where the method includes, but is not limited to, the following steps:
s201, the data recovery device detects that the physical node fails.
Specifically, the data recovery device is used to manage each physical node in a physical node cluster of a distributed database system. The physical node cluster includes at least one physical node, and at least one data fragment is deployed on a physical node; a data fragment represents a set of service data on the physical node. According to backup needs, the data fragments on a physical node are divided into master data fragments and replica data fragments: a master data fragment is a data fragment in the master state, and a replica data fragment is a data fragment in the replica state. One master data fragment is associated with at least one replica data fragment, and the master data fragment and its associated replica data fragments are located on different physical nodes to improve data reliability. A physical node may host multiple master data fragments; the embodiment of the present invention describes the recovery process for any one of these master data fragments.
The data recovery device may detect a physical node failure as follows: the physical node periodically sends heartbeat messages to the data recovery device, and the data recovery device determines that the physical node has failed when it does not receive a heartbeat message from the physical node within a preset time length.
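A minimal sketch of this timeout check follows; the preset time length and the bookkeeping structure are assumptions for illustration.

```python
import time

HEARTBEAT_TIMEOUT_S = 30  # assumed preset time length


def failed_nodes(last_heartbeat, now=None):
    """Return the physical nodes whose latest heartbeat is older than the preset
    time length; last_heartbeat maps node id -> timestamp of the last heartbeat."""
    now = time.time() if now is None else now
    return [node for node, ts in last_heartbeat.items()
            if now - ts > HEARTBEAT_TIMEOUT_S]
```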
S202, the data recovery device sends a removal instruction to the physical node, and the physical node receives the removal instruction from the data recovery device.
Specifically, the removal indication is used to indicate that the failed physical node is removed from the physical node cluster.
S203, the data recovery device determines the maximum value of the judgment factor parameter values of the m duplicate data fragments associated with the main data fragment on the physical node.
Specifically, the master data fragment is any data fragment in the master state on the failed physical node. The data recovery device obtains the m replica data fragments associated with the master data fragment according to a pre-stored or pre-configured mapping relationship, and then obtains the respective decision factor parameter values of the m replica data fragments. A decision factor parameter value is a weighted average of at least one factor parameter value, the at least one factor parameter value includes the DCN, the weight of each factor may be pre-stored or pre-configured, and the specific weight values are not limited in this embodiment. The mapping relationship includes, but is not limited to, the mapping relationship between data fragments and physical nodes and the mapping relationship between master data fragments and replica data fragments.
In a possible implementation manner, the data recovery device receives, through heartbeat messages, at least one factor parameter value of each data fragment reported by each physical node in the physical node cluster, and every time it receives factor parameter values it overwrites the stored values with the received ones, so that the stored factor parameter values are always the latest.
For example, when the data recovery device receives a heartbeat message reported by a physical node, it overwrites the stored DCN, number of affected service records and number of affected tables of each data fragment on that physical node with the values carried in the heartbeat message.
The physical node maintains a DCN for each data fragment. The DCN is the sequence number of the latest log record of the data fragment, where a log records the operations performed on the data fragment. The DCN generally starts numbering from 1: a log record is generated for each transaction operation on the data fragment, and the updated DCN is obtained by adding 1 to the current DCN of the data fragment.
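A minimal sketch of this per-fragment DCN bookkeeping on a physical node follows; the class and method names are assumptions for illustration.

```python
class DataFragment:
    """One data fragment on a physical node, together with its log and DCN."""

    def __init__(self, fragment_id):
        self.fragment_id = fragment_id
        self.dcn = 0   # the first transaction produces DCN 1
        self.log = []

    def apply_transaction(self, operation):
        """Generate one log record per transaction and advance the DCN by 1."""
        self.dcn += 1
        self.log.append((self.dcn, operation))
        return self.dcn
```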
The data recovery device determines the maximum value of the judgment factor parameter values of m duplicate data fragments associated with the main data fragment on the physical node according to the stored data.
S204, the data recovery device determines that the difference between the decision factor parameter value of the master data fragment and the maximum value is smaller than a preset value.
Specifically, the data recovery apparatus determines the decision factor parameter value of the master data fragment and calculates the difference between it and the maximum value determined in S203, for example by subtracting the maximum value from the decision factor parameter value of the master data fragment. If the difference is smaller than the preset value, S205 is performed. The preset value is pre-stored or pre-configured, and its specific size is not limited in this embodiment.
S205, the data recovery device selects the replica data fragment corresponding to the maximum value as the new master data fragment.
Specifically, the data recovery device determines the replica data fragment corresponding to the maximum value among the m replica data fragments, takes that replica data fragment as the new master data fragment, establishes the association relationship between the new master data fragment and the remaining m-1 replica data fragments, and stores the association relationship. Optionally, the data recovery apparatus may create one replica data fragment on a physical node other than the physical nodes hosting the new master data fragment and the m-1 replica data fragments, and keep the created replica data fragment synchronized with the new master data fragment so that its DCN is consistent with that of the new master data fragment after synchronization; at this point, the new master data fragment is again associated with m replica data fragments.
S206, the data recovery device executes a data synchronization process.
Specifically, the data fragments other than the new master data fragment among the m replica data fragments are m-1 replica data fragments. Log synchronization and log replay are performed between the m-1 replica data fragments and the new master data fragment, so that the data of the m-1 replica data fragments stays synchronized with the data of the new master data fragment, and after synchronization the DCN of each of the m-1 replica data fragments is the same as the DCN of the new master data fragment.
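Taking S203 to S206 together, the following sketch shows the selection logic, assuming the decision factor parameter values of the master and its replicas have already been computed; the preset value used here is an illustrative assumption.

```python
PRESET_VALUE = 5  # assumed preset value


def select_new_master(master_factor, replica_factors):
    """replica_factors maps replica fragment id -> decision factor parameter value.
    Returns (new_master_id, replicas_to_sync) when recovery proceeds (S205/S206),
    or None when the difference is too large and recovery must wait."""
    best = max(replica_factors, key=replica_factors.get)        # S203: maximum value
    if master_factor - replica_factors[best] < PRESET_VALUE:    # S204: compare difference
        others = [r for r in replica_factors if r != best]
        return best, others  # S205: promote; S206: log-sync the others to the new master
    return None
```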
Optionally, when the failed physical node recovers to normal, the data recovery apparatus adds the physical node into the physical node cluster, and notifies other network elements (for example, application clients) that the physical node is online, and the data recovery apparatus needs to delete the old master data fragment on the physical node at the same time.
By implementing this embodiment, when a physical node fails, at least one replica data fragment associated with the master data fragment to be recovered on the physical node is determined, and the maximum value of the decision factor parameter values of these replica data fragments is determined. When the difference between the maximum value and the decision factor parameter value of the master data fragment is smaller than the preset value, the replica data fragment corresponding to the maximum value is selected as the new master data fragment, and the other replica data fragments among the at least one replica data fragment perform log synchronization with the new master data fragment, so that the logs of each replica data fragment stay consistent with those of the new master data fragment. Data recovery can therefore be performed quickly for the master data fragment on the failed physical node, and data loss is reduced.
Referring to fig. 3, another flow diagram of a data recovery method according to an embodiment of the present invention is shown, where in the embodiment of the present invention, the method includes:
s301, the data recovery device detects that the physical node fails.
The specific process may refer to the description of S201 in fig. 2, and is not described herein again.
S302, the data recovery device sends a removal instruction to the physical node, and the physical node receives the removal instruction from the data recovery device.
The specific process may refer to the description of S202 in fig. 2, and is not described herein again.
S303, the data recovery device determines the maximum value of the judgment factor parameter values of the m duplicate data fragments associated with the main data fragment on the physical node.
The specific process may refer to S203 in fig. 2, which is not described herein again.
S304, the data recovery device determines that the difference between the decision factor parameter value of the master data fragment and the maximum value is not smaller than a preset value.
Specifically, the data recovery apparatus calculates the difference between the decision factor parameter value of the master data fragment and the maximum value determined in S303, for example by subtracting the maximum value from the decision factor parameter value of the master data fragment. If the difference is not smaller than the preset value, S305 is performed. The preset value is pre-stored or pre-configured, and its specific size is not limited in this embodiment.
S305, the data recovery device sets the m copy data fragments to be in a recovery waiting state.
Specifically, the data recovery device sets the m replica data fragments associated with the master data fragment to the waiting-for-recovery state; this state indicates that the data fragments are unavailable and cannot normally respond to service requests from the application client. Optionally, the data recovery apparatus may send the identifiers of the m replica data fragments to the application client to notify it that the m replica data fragments are unavailable.
S306, the data recovery device detects that the physical node is recovered to be normal.
Specifically, under the condition that the physical node is restored to normal through manual restoration or other manners, the data restoration device may receive a heartbeat message periodically sent by the physical node, where the heartbeat message carries at least one factor parameter value of each data fragment on the physical node.
S307, the data recovery device sends the joining instruction to the physical node, and the physical node receives the joining instruction from the data recovery device.
S308, the data recovery device discards the m copy data fragments.
Specifically, the data recovery apparatus discards the m replica data fragments, for example by deleting or freezing them.
S309, creating m new copy data fragments.
Specifically, the data recovery apparatus creates m new replica data fragments on different physical nodes, establishes the association relationship between the m new replica data fragments and the master data fragment, and stores the association relationship. The m physical nodes corresponding to the m new replica data fragments do not include the failed physical node; for example, the m new replica data fragments may be placed on the same physical nodes that hosted the m original replica data fragments.
S310, the data recovery device executes a data synchronization process.
Specifically, the data recovery device performs log synchronization and log replay between the m new replica data fragments and the master data fragment, so that data between the m new replica data fragments and the master data fragment are kept synchronized, and the DCNs of the m new replica data fragments after synchronization are the same as the DCNs of the master data fragment.
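A minimal sketch of the discard-and-recreate part of this branch (S308 to S310) is given below; discard, create_replica and log_sync are hypothetical helpers standing in for the device's replica-management operations, not functions defined by the embodiment.

```python
def recreate_and_sync(master, old_replicas, cluster_nodes, failed_node,
                      discard, create_replica, log_sync):
    """Discard the stale replicas, recreate the same number of new replicas on
    nodes other than the failed one, and replay logs until the DCNs match."""
    for replica in old_replicas:                       # S308: delete or freeze
        discard(replica)
    candidates = [n for n in cluster_nodes if n != failed_node]
    new_replicas = [create_replica(master, node)       # S309: one per distinct node
                    for node in candidates[:len(old_replicas)]]
    for replica in new_replicas:                       # S310: log synchronization
        log_sync(replica, master)
    return new_replicas
```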
By implementing this embodiment, when a physical node fails, at least one replica data fragment associated with the master data fragment to be recovered on the physical node is determined, and the maximum value of the decision factor parameter values of these replica data fragments is determined. When the difference between the maximum value and the decision factor parameter value of the master data fragment is not smaller than the preset value, the master data fragment is not switched, and data recovery is performed after the failed physical node recovers, which reduces data loss.
Referring to fig. 4, which is a schematic structural diagram of a physical node cluster according to an embodiment of the present invention, the physical node cluster includes a physical node 1, a physical node 2, a physical node 3 and a physical node 4, and 6 data fragments are deployed on each physical node. In the following description, a master data fragment is simply referred to as a "master", a replica data fragment is simply referred to as a "copy", shaded rectangular boxes in fig. 4 represent masters, and unshaded rectangular boxes represent copies. Physical node 1 is deployed with master 1, master 2, copy 3b, copy 4b, copy 5b and copy 6a; physical node 2 is deployed with master 3, master 4, copy 1a, copy 6a, copy 7a and copy 7b; physical node 3 is deployed with master 5, master 6, copy 1b, copy 2a, copy 3a and copy 8a; and physical node 4 is deployed with master 7, master 8, copy 2b, copy 3b, copy 4a and copy 5a. Master 1, copy 1a and copy 1b are associated with one another and deployed on different physical nodes; the DCN of master 1 is 1001, the DCN of copy 1a is 1000, and the DCN of copy 1b is 900. Master 2, copy 2a and copy 2b are associated with one another and deployed on different physical nodes; the DCN of master 2 is 801, the DCN of copy 2a is 700, and the DCN of copy 2b is 701. Master 3, copy 3a and copy 3b are associated with one another and deployed on different physical nodes. Master 4, copy 4a and copy 4b are associated with one another and deployed on different physical nodes. Master 5, copy 5a and copy 5b are associated with one another and deployed on different physical nodes. Master 6, copy 6a and copy 6b are associated with one another and deployed on different physical nodes. Master 7, copy 7a and copy 7b are associated with one another and deployed on different physical nodes. Master 8, copy 8a and copy 8b are associated with one another and deployed on different physical nodes.
In this embodiment, the at least one factor parameter value includes a DCN, and the physical node 1, the physical node 2, the physical node 3, and the physical node 4 periodically send a heartbeat message carrying the DCN of each data fragment to the data recovery apparatus, and the data recovery apparatus stores the received DCN of each data fragment.
Referring to fig. 5, when the data recovery apparatus determines, according to the heartbeat mechanism, that physical node 1 in the physical node cluster has failed, it removes physical node 1 from the physical node cluster. The data recovery device determines, according to the pre-stored mapping relationship, that master 1 and master 2 are deployed on physical node 1, and starts the data recovery processes for master 1 and master 2. According to the predefined metadata information, the data recovery device queries the physical nodes where the masters and copies of physical node 1 are located, their operating states and their DCNs. For example, the metadata information corresponding to physical node 1 is shown in Table 1:
TABLE 1

Data fragment | Role | Physical node | DCN
---|---|---|---
Master 1 | master | Physical node 1 | 1001
Copy 1a | replica | Physical node 2 | 1000
Copy 1b | replica | Physical node 3 | 900
Master 2 | master | Physical node 1 | 801
Copy 2a | replica | Physical node 3 | 700
Copy 2b | replica | Physical node 4 | 701
Data recovery process for master 1: the data recovery device obtains the DCN of copy 1a (1000) and the DCN of copy 1b (900) associated with master 1 and determines that the maximum of the two is 1000. The data recovery device obtains the DCN of master 1 (1001) and computes the difference between 1001 and 1000, which is 1. Assuming the pre-stored or pre-configured preset value is 5, the calculated difference 1 < 5, so the data recovery device selects copy 1a as the new master (master 1a), and master 1a is then associated only with copy 1b. The data recovery apparatus may create a copy 1a' for master 1a on physical node 4 and perform log synchronization and log replay on copy 1a' and copy 1b, so that the data of copy 1a' and copy 1b is consistent with master 1a; after synchronization is completed, the DCNs of copy 1a' and copy 1b are the same as that of master 1a, namely 1000.
Data recovery process for master 2: the data recovery apparatus obtains the DCN of copy 2a (700) and the DCN of copy 2b (701) associated with master 2; the maximum of the two DCNs is 701. The data recovery apparatus obtains the DCN of master 2 (801) and computes the difference between 801 and 701, which is 100. Assuming the pre-configured or pre-stored preset value is 5, the calculated difference 100 > 5, so the data recovery apparatus sets copy 2a and copy 2b to the waiting-for-recovery state (see the hatched boxes in fig. 5).
Referring to fig. 6, when the data recovery apparatus detects that physical node 1 has returned to a normal state, it adds physical node 1 back into the physical node cluster and deletes or freezes the old master 1 on physical node 1. In addition, the data recovery apparatus deletes or freezes copy 2a and copy 2b associated with master 2, recreates one copy 2a' on physical node 3 and one copy 2b' on physical node 4, and establishes the association relationship between master 2 and copies 2a' and 2b'. The data recovery apparatus performs log synchronization and log replay on copy 2a' and copy 2b' with master 2, so that the data of copy 2a' and copy 2b' stays synchronized with the data of master 2; after synchronization is completed, the DCNs of copy 2a' and copy 2b' are the same as that of master 2, namely 801.
In the method described in fig. 3 to 6, when a physical node fails, at least one replica data fragment associated with a master data fragment to be recovered on the physical node is determined, a maximum value of a decision factor parameter value of each of the at least one replica data fragment is determined, and when a difference between the maximum value and the decision factor parameter value of the master data fragment is not less than a preset value, switching of the master data fragment is not performed, and data recovery is performed after the physical node that has failed recovers, so that data loss can be reduced.
The method of embodiments of the present invention is set forth above in detail and the apparatus of embodiments of the present invention is provided below.
Referring to fig. 7, fig. 7 is a schematic structural diagram of a data recovery apparatus according to an embodiment of the present invention, where the data recovery apparatus may include a processing unit 701, a determining unit 702, and a synchronizing unit 703, where details of each unit are described below.
A processing unit 701, configured to remove a physical node from a physical node cluster when the physical node in the physical node cluster fails.
The processing unit 701 is further configured to determine m replica data fragments associated with the master data fragment on the physical node; the m replica data fragments are respectively located at different physical nodes, and m is an integer greater than 0.
The processing unit 701 is further configured to determine a maximum value of decision factor parameter values of the m replica data slices; the judgment factor parameter value is obtained by weighted average of at least one factor parameter value, and the at least one factor parameter value comprises a data change sequence number DCN.
A determining unit 702, configured to determine whether a difference between a decision factor parameter value of the master data segment and the maximum value is smaller than a preset value;
a synchronizing unit 703, configured to select, if the determination result of the determining unit is yes, the replica data segment corresponding to the maximum value as a new master data segment, and perform log synchronization on other replica data segments in the m replica data segments and the new master data segment.
Optionally, the processing unit 701 is further configured to set the m pieces of replica data to a state of waiting for recovery if the determination result of the determining unit is negative;
the processing unit 701 is further configured to add the physical node to the physical node cluster when the physical node returns to normal;
the processing unit 701 is further configured to discard the m pieces of replica data;
the processing unit 701 is further configured to recreate m new replica data fragments on different physical nodes in the physical node cluster; wherein the m physical nodes corresponding to the m new replica data fragments do not include the physical node;
the synchronization unit 703 is further configured to perform log synchronization on the m new replica data fragments and the master data fragment.
Optionally, the data recovery apparatus 7 further includes:
and the receiving unit is used for receiving at least one factor parameter value of the data fragment reported by each physical node in the physical node cluster through the heartbeat message.
Optionally, the at least one factor parameter value further includes the number of affected service records and/or the number of affected tables.
Optionally, the data recovery apparatus 7 further includes:
a sending unit, configured to send, to an application client, the identifier of the m duplicate data fragments, so as to notify the application client that the m duplicate data fragments are unavailable.
The embodiment of the present invention and the method embodiments of fig. 2 and 3 are based on the same concept, and the technical effects brought by the embodiment are also the same, and the specific process can refer to the description of the method embodiments of fig. 2 and 3, and will not be described again here.
The data recovery device 7 may also be implemented as a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a system on chip (SoC), a central processing unit (CPU), a network processor (NP), a digital signal processing circuit, a micro controller unit (MCU), a programmable logic device (PLD) or another integrated chip.
Referring to fig. 8, fig. 8 is a data recovery apparatus 8, hereinafter referred to as the apparatus 8 for short, according to an embodiment of the present invention, where the apparatus 8 includes a processor 801, a memory 802, a receiver 803, and a transmitter 804, and the processor 801, the memory 802, the receiver 803, and the transmitter 804 are connected to each other through a bus.
The memory 802 includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM) or a compact disc read-only memory (CD-ROM), and the memory 802 is used for storing related instructions and data. The receiver 803 is used for receiving data, and the transmitter 804 is used for transmitting data.
The processor 801 may be one or more Central Processing Units (CPU), and in the case that the processor 801 is a CPU, the CPU may be a single-core CPU or a multi-core CPU.
The processor 801 in the apparatus 8 is configured to read the program code stored in the memory 802, and perform the following operations:
under the condition that a physical node in a physical node cluster fails, removing the physical node out of the physical node cluster;
determining the maximum value of the respective judgment factor parameter values of the m replica data fragments associated with the master data fragment on the physical node; the judgment factor parameter value is obtained by weighted average of at least one factor parameter value, and the at least one factor parameter value comprises a data change sequence number DCN;
judging whether the difference value between the judgment factor parameter value of the main data fragment and the maximum value is smaller than a preset value or not;
if so, selecting the replica data fragment corresponding to the maximum value as a new master data fragment, and performing log synchronization on other replica data fragments in the m replica data fragments and the new master data fragment.
Optionally, the processor 801 is further configured to:
if the difference value is not smaller than a preset value, setting the m replica data fragments to be in a state of waiting for recovery;
adding the physical node into the physical node cluster under the condition that the physical node is recovered to be normal;
discarding the m replica data slices;
recreating m new replica data fragments on different physical nodes in the physical node cluster; wherein the m physical nodes corresponding to the m new replica data fragments do not include the physical node;
and performing log synchronization on the m new copy data fragments and the master data fragment.
Optionally, the receiver 803 is configured to receive at least one factor parameter value of a data fragment reported by each physical node in the physical node cluster through a heartbeat message.
Optionally, the at least one factor parameter value further includes the number of affected service records and/or the number of affected tables.
Optionally, the transmitter 804 is configured to transmit an identifier of the m duplicate data pieces to the application client, so as to notify the application client that the m duplicate data pieces are not available.
The embodiment of the present invention and the method embodiments of fig. 2 and 3 are based on the same concept, and the technical effects brought by the embodiment are also the same, and the specific process can refer to the description of the method embodiments of fig. 2 and 3, and will not be described again here.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the invention to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in or transmitted over a computer-readable storage medium. The computer instructions may be transmitted from one website site, computer, server, or data center to another website site, computer, server, or data center by wired (e.g., coaxial cable, fiber optic, digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid State Disk (SSD)), among others.
One of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by hardware related to instructions of a computer program, which may be stored in a computer-readable storage medium, and when executed, may include the processes of the above method embodiments. And the aforementioned storage medium includes: various media capable of storing program codes, such as ROM or RAM, magnetic or optical disks, etc.
Claims (8)
1. A method for data recovery, comprising:
removing a physical node from a physical node cluster under the condition that the physical node in the physical node cluster fails;
determining the maximum value of the respective judgment factor parameter values of m replica data fragments associated with the main data fragment on the physical node; the judgment factor parameter value is obtained by weighted average of at least one factor parameter value, wherein the at least one factor parameter value comprises a data change sequence number DCN, and m is an integer greater than 0;
judging whether the difference value between the judgment factor parameter value of the main data fragment and the maximum value is smaller than a preset value or not;
if so, selecting the replica data fragment corresponding to the maximum value as a new master data fragment, and performing log synchronization on other replica data fragments in the m replica data fragments and the new master data fragment;
if the difference value is not less than a preset value, setting the m replica data fragments to be in a state of waiting for recovery, adding the physical node into the physical node cluster under the condition that the physical node recovers to be normal, discarding the m replica data fragments, recreating m new replica data fragments on different physical nodes in the physical node cluster, and performing log synchronization on the m new replica data fragments and the main data fragment; wherein the m physical nodes corresponding to the m new replica data fragments do not include the physical node.
2. The method of claim 1, further comprising:
and receiving at least one factor parameter value of the data fragment reported by each physical node in the physical node cluster through the heartbeat message.
3. The method of claim 2, wherein the at least one factor parameter value further comprises an impact business record number and/or an impact table number.
4. The method of claim 1, further comprising:
sending the identification of the m replica data fragments to an application client to inform the application client that the m replica data fragments are unavailable.
5. A data recovery apparatus, comprising:
the processing unit is used for removing the physical node out of the physical node cluster under the condition that the physical node in the physical node cluster fails;
the processing unit is further configured to determine m replica data shards associated with the master data shard on the physical node; the m replica data fragments are respectively located at different physical nodes, and m is an integer greater than 0;
the processing unit is further configured to determine a maximum value among the decision factor parameter values of each of the m replica data slices; the judgment factor parameter value is obtained by weighted average of at least one factor parameter value, and the at least one factor parameter value comprises a data change sequence number DCN;
a judging unit, configured to judge whether a difference between a decision factor parameter value of the master data segment and the maximum value is smaller than a preset value;
a synchronization unit, configured to select the replica data segment corresponding to the maximum value as a new master data segment if the determination result of the determination unit is yes, and perform log synchronization on other replica data segments in the m replica data segments and the new master data segment;
the processing unit is further configured to set the m replica data fragments to a state of waiting for recovery if the difference is not smaller than a preset value, add the physical node to the physical node cluster when the physical node recovers to normal, discard the m replica data fragments, and recreate m new replica data fragments on different physical nodes in the physical node cluster, where m physical nodes corresponding to the m new replica data fragments do not include the physical node; the synchronization unit is further configured to perform log synchronization on the m new replica data fragments and the master data fragment.
6. The apparatus of claim 5, further comprising:
and the receiving unit is used for receiving at least one factor parameter value of the data fragment reported by each physical node in the physical node cluster through the heartbeat message.
7. The apparatus of claim 6, wherein the at least one factor parameter value further comprises a number of impact service records and/or a number of impact tables.
8. The apparatus of claim 5, further comprising:
a sending unit, configured to send, to an application client, the identifier of the m duplicate data fragments, so as to notify the application client that the m duplicate data fragments are unavailable.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711456084.XA CN109976941B (en) | 2017-12-28 | 2017-12-28 | Data recovery method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711456084.XA CN109976941B (en) | 2017-12-28 | 2017-12-28 | Data recovery method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109976941A CN109976941A (en) | 2019-07-05 |
CN109976941B true CN109976941B (en) | 2022-12-13 |
Family
ID=67074328
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711456084.XA Active CN109976941B (en) | 2017-12-28 | 2017-12-28 | Data recovery method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109976941B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110457167A (en) * | 2019-08-20 | 2019-11-15 | 北京博睿宏远数据科技股份有限公司 | Replica processes method, apparatus, equipment and medium |
CN113051103B (en) * | 2019-12-27 | 2023-09-05 | 中国移动通信集团湖南有限公司 | A data processing method, device and electronic equipment |
CN111444274B (en) * | 2020-03-26 | 2021-04-30 | 上海依图网络科技有限公司 | Data synchronization method, data synchronization system, and apparatus, medium, and system thereof |
CN111581016B (en) * | 2020-04-14 | 2021-05-18 | 上海爱数信息技术股份有限公司 | Copy data management system and method for modern application |
CN111428271A (en) * | 2020-04-17 | 2020-07-17 | 上海坤仪金科信息技术有限公司 | Block chain cloud storage user data security solution method |
CN113918531A (en) * | 2020-07-08 | 2022-01-11 | 北京金山云网络技术有限公司 | Data synchronization method and device of distributed table system and server equipment |
CN115168367B (en) * | 2022-09-07 | 2022-11-25 | 太极计算机股份有限公司 | Data configuration method and system for big data |
CN116112569B (en) * | 2023-02-23 | 2023-07-21 | 安超云软件有限公司 | Micro-service scheduling method and management system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103139308A (en) * | 2013-02-27 | 2013-06-05 | 华为技术有限公司 | Database system and data synchronization method thereof |
CN103973470A (en) * | 2013-01-31 | 2014-08-06 | 国际商业机器公司 | Cluster management method and equipment for shared-nothing cluster |
CN105930498A (en) * | 2016-05-06 | 2016-09-07 | 中国银联股份有限公司 | Distributed database management method and system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10896127B2 (en) * | 2013-01-23 | 2021-01-19 | Lucata Corporation | Highly configurable memory architecture for partitioned global address space memory systems |
- 2017
- 2017-12-28 CN CN201711456084.XA patent/CN109976941B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103973470A (en) * | 2013-01-31 | 2014-08-06 | 国际商业机器公司 | Cluster management method and equipment for shared-nothing cluster |
CN103139308A (en) * | 2013-02-27 | 2013-06-05 | 华为技术有限公司 | Database system and data synchronization method thereof |
CN105930498A (en) * | 2016-05-06 | 2016-09-07 | 中国银联股份有限公司 | Distributed database management method and system |
Also Published As
Publication number | Publication date |
---|---|
CN109976941A (en) | 2019-07-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109976941B (en) | Data recovery method and device | |
CN108509153B (en) | OSD selection method, data writing and reading method, monitor and server cluster | |
CN108572976A (en) | Data reconstruction method, relevant device and system in a kind of distributed data base | |
CN111259072B (en) | Data synchronization method, device, electronic equipment and computer readable storage medium | |
CN108920489B (en) | Method, device and equipment for deploying database | |
US12050558B2 (en) | Facilitating immediate performance of volume resynchronization with the use of passive cache entries | |
CN108762668B (en) | Method and device for processing write conflict | |
CN110351313B (en) | Data caching method, device, equipment and storage medium | |
CN112698926B (en) | Data processing method, device, equipment, storage medium and system | |
CN112632029B (en) | Data management method, device and equipment of distributed storage system | |
CN111342986B (en) | Distributed node management method and device, distributed system and storage medium | |
CN111752488A (en) | Management method and device of storage cluster, management node and storage medium | |
CN105323271B (en) | Cloud computing system and processing method and device thereof | |
CN106951443B (en) | Method, equipment and system for synchronizing copies based on distributed system | |
CN116668269A (en) | Arbitration method, device and system for dual-activity data center | |
CN113055203B (en) | Method and device for recovering exception of SDN control plane | |
CN110737543B (en) | Method, device and storage medium for recovering distributed file system data | |
CN111835544B (en) | Monitoring method and system of virtual router based on user mode protocol stack | |
WO2017054643A1 (en) | Data recovery method and file server | |
CN110798492B (en) | Data storage method and device and data processing system | |
CN114124803B (en) | Device management method and device, electronic device and storage medium | |
CN109962797B (en) | Storage system and method for pushing business view | |
CN117729095A (en) | Data management method, storage system and related equipment | |
CN111106966B (en) | Information processing method and device, equipment and storage medium | |
CN114625578A (en) | Data processing method and device, electronic equipment and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |