CN114546724A - Two-center deployed data center level disaster recovery method and system - Google Patents

Two-center deployed data center level disaster recovery method and system

Info

Publication number
CN114546724A
Authority
CN
China
Prior art keywords
center
copy
main
copies
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210161220.7A
Other languages
Chinese (zh)
Inventor
朱林浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yunxi Technology Co ltd
Original Assignee
Shandong Inspur Scientific Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Inspur Scientific Research Institute Co Ltd filed Critical Shandong Inspur Scientific Research Institute Co Ltd
Priority to CN202210161220.7A priority Critical patent/CN114546724A/en
Publication of CN114546724A publication Critical patent/CN114546724A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1479Generic software techniques for error detection or fault masking
    • G06F11/1489Generic software techniques for error detection or fault masking through recovery blocks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1458Management of the backup or restore process
    • G06F11/1464Management of the backup or restore process for networked environments

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention relates to a data center level disaster recovery method and system deployed across two centers. At least 2 nodes are deployed in the primary center and 3 nodes in the secondary center; one replica is placed in the secondary center in strong-synchronization mode, the remaining replicas are placed in the primary center, and leader preference is configured for the primary center. When the primary center fails, its replicas are forcibly demoted to non-voting replicas, the secondary-center replica is elected leader, and replicas are supplemented on the vacant secondary-center nodes to restore availability. After the primary-center fault is repaired, the primary-center nodes are restarted and rejoin the cluster either one by one or all together. The method and system improve the disaster recovery capability of the distributed consensus algorithm and reduce deployment cost, and, because the secondary-center replica is deployed in strong-synchronization mode, the RPO of a primary-center failure can be reduced to 0.

Description

Two-center deployed data center level disaster recovery method and system
Technical Field
The invention relates to the technical field of distributed systems, and in particular to a data center level disaster recovery method and system deployed across two centers.
Background
The Raft algorithm is a distributed consensus algorithm: multiple replicas redundantly store the same data, and the data can be read and written as long as more than half of the replicas survive. In the etcd implementation of raft, writes of user data and configuration changes generate log entries, which take effect only after being committed (configuration changes also modify the metadata), and committing an entry requires the votes of more than half of the voting replicas in the cluster. A leader is elected from the voting replicas to handle read and write requests, while the other replicas act as followers. A lagging replica is brought up to date in one of two ways: if the leader can still locate the lagging replica's last log entry, data is synchronized directly by appending log entries; otherwise a snapshot is sent to it. Sending a snapshot is significantly more expensive. A non-voting replica has no voting right and therefore does not affect log commitment, but it still synchronizes data from the leader. A non-voting replica can be promoted to a voting replica, and conversely a voting replica can be demoted to a non-voting replica. Unless otherwise specified, the replicas mentioned below are voting replicas.
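As an informal illustration of the roles and catch-up rule just described, the following Go sketch uses simplified, assumed types and names (it is not the etcd raft API) to show how a leader might choose between log appending and snapshot transfer for a lagging replica, and how voting rights can be toggled:

    package sketch

    // Replica is a simplified stand-in for a raft member; the field and
    // function names below are illustrative assumptions, not etcd's API.
    type Replica struct {
        ID        uint64
        NextIndex uint64 // next log index the leader would send to this replica
        CanVote   bool   // voting replicas count toward commitment; non-voting ones do not
    }

    // Leader holds the bounds of the log the leader still retains.
    type Leader struct {
        FirstIndex uint64 // oldest log entry not yet compacted away
        LastIndex  uint64
    }

    // catchUp chooses the cheaper synchronization path for a lagging replica:
    // append the missing log entries if the leader still has them, otherwise
    // fall back to the much more expensive snapshot transfer.
    func (l *Leader) catchUp(r *Replica) string {
        if r.NextIndex >= l.FirstIndex {
            return "append-log" // the leader can still locate the replica's last entry
        }
        return "send-snapshot"
    }

    // promote and demote toggle the voting right; a non-voting replica keeps
    // synchronizing data from the leader but does not affect log commitment.
    func promote(r *Replica) { r.CanVote = true }
    func demote(r *Replica)  { r.CanVote = false }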
In a distributed system, the technique of managing different data shards with independent raft clusters is called multi-raft.
An existing distributed system deployed as a single cluster across two centers can become unavailable when one of the data centers fails. To achieve disaster recovery capability at the data center level, deployment across at least three centers is conventionally required.
In order to break through the above limitations, the present invention provides a data center level disaster recovery method and system deployed in two centers.
Disclosure of Invention
In order to make up for the defects of the prior art, the invention provides a simple and efficient data center level disaster recovery method and system deployed by two centers.
The invention is realized by the following technical scheme:
a data center level disaster recovery method deployed in two centers is characterized in that: based on a raft distributed consistency algorithm, at least 2 nodes are deployed in a main center, 3 nodes are deployed in a secondary center, one copy is distributed in the secondary center, strong synchronization characteristics are added, and the rest copies are distributed in the main center and configured with leader preference;
when the main center fails, forcibly degrading the copy of the main center into a non-voting copy, selecting the copy of the secondary center as a leader, and supplementing the copy at the spare node of the secondary center to restore the usability of the secondary center;
and after the main center fault is repaired, the main center nodes are restarted one by one and rejoin the cluster, or all the main center nodes are restarted and rejoin the cluster together.
To avoid snapshots being sent for replica movement, the secondary center is prohibited from moving replicas back to the primary center while the primary center is restarting and the secondary center is still in takeover.
To prevent the strong-synchronization attribute from slowing down log commitment, after the secondary center takes over and supplements the replicas, the strong-synchronization configuration of the secondary center is cancelled until the primary center recovers.
At this time the actual number of voting replicas is smaller than the configured number; to avoid the risk of a single point of failure while the primary center is down, replicas are automatically supplemented on the vacant secondary-center nodes based on the original secondary-center replica, so that the raft cluster returns to a multi-replica state.
If data is written during the period when the secondary center holds authority, the logs of the secondary-center replica are retained so that, after the primary-center nodes restart, the secondary-center replica does not have to send them a snapshot; the primary center completes data synchronization by log appending after it restarts.
When the primary-center nodes are recovered one by one, each node is restarted in turn, synchronizes the logs, and has its replicas upgraded to voting replicas again: each time one primary-center node is started, a command is executed manually that connects all normal nodes, upgrades all replicas on the started node back to voting replicas, sets the leader replica according to the original leader preference configuration, and deletes the redundant replicas supplemented on the spare secondary-center nodes.
When the primary center is recovered all at once, the primary-center replicas are temporarily started as non-voting replicas and synchronize the logs from the secondary center; a command is then executed on a secondary-center node to upgrade the primary-center replicas to voting replicas, set the leader replica according to the original leader preference configuration, and delete the redundant replicas supplemented on the spare secondary-center nodes.
The invention further discloses a system based on the two-center deployed data center level disaster recovery method, characterized in that: the system comprises a primary center, a secondary center, a node management module and a replica management module; at least 2 nodes are deployed in the primary center and 3 nodes in the secondary center, one replica is placed in the secondary center with the strong-synchronization attribute added, and the remaining replicas are placed in the primary center with leader preference configured for the primary center;
the replica management module is responsible for forcibly demoting the primary-center replicas to non-voting replicas when the primary center fails, electing the secondary-center replica as leader, and supplementing replicas on the vacant secondary-center nodes to restore availability;
it is also responsible, after the primary-center fault is repaired, for upgrading all replicas on the started primary-center nodes back to voting replicas, setting the leader replica according to the original leader preference configuration, and deleting the redundant replicas supplemented on the spare secondary-center nodes;
the node management module is responsible for restarting the primary-center nodes one by one and rejoining them to the cluster after the primary center is repaired, or restarting all the primary-center nodes and rejoining them to the cluster together.
The invention has the beneficial effects that: the two-center deployed data center level disaster recovery method and system achieve data center level disaster recovery with only two data centers, improving the disaster recovery capability of the distributed consensus algorithm and reducing deployment cost; at the same time, because the secondary-center replica is deployed in strong-synchronization mode, the RPO (Recovery Point Objective) of a primary-center failure can be reduced to 0.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic diagram of the two-center deployed data center level disaster recovery method of the present invention with 3 nodes deployed in each center.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the embodiment of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The distributed system is deployed across two data centers. The center that holds more than half of the replicas and is configured with leader preference is called the primary center (Primary), and the other is called the secondary center (Secondary); the primary center is provided with at least 2 nodes and the secondary center with at least 1 node.
The two-center deployed data center level disaster recovery method is based on the raft distributed consensus algorithm. It recommends deploying at least 2 nodes in the primary center and 3 nodes in the secondary center, placing one replica in the secondary center with the strong-synchronization attribute added, and placing the remaining replicas in the primary center with leader preference configured;
when the primary center fails, the primary-center replicas are forcibly demoted to non-voting replicas, the secondary-center replica is elected leader, and replicas are supplemented on the vacant secondary-center nodes to restore availability;
after the primary-center fault is repaired, the primary-center nodes are restarted and rejoin the cluster either one by one or all together.
If 3 replicas are deployed, the primary center holds at most 2 of them, while the secondary center may eventually have to hold all 3; deploying 2 nodes in the primary center and 3 nodes in the secondary center is therefore a reasonable arrangement. If desired, the secondary center can deploy more nodes without concern. If the primary center also deployed 3 or more nodes, then when the primary center restarts, replicas could be moved from the secondary center back to the primary center according to the configuration of "multiple replicas in the primary center, one replica in the secondary center"; such movement requires sending snapshots and would cause unnecessary network transmission.
To avoid snapshots being sent for the replica movement described above, moving replicas to the primary center should be explicitly prohibited while the primary center has restarted but has not yet fully recovered (i.e., while the secondary center is still in takeover).
For a 3-replica raft cluster, 2 replicas are placed on nodes of the primary center and 1 replica on a node of the secondary center. For larger node and replica counts, it is recommended that the secondary center hold 1 replica and the primary center hold the rest. If the distributed system is multi-raft, different primary/secondary center relationships can be arranged for different data shards as required, which improves the utilization of both data centers and achieves an active-active effect.
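The placement rule above can be written down as a small planning step. The Go sketch below uses assumed type and function names (not part of any existing library): one replica goes to the secondary center with strong synchronization, the rest go to the primary center, and the leader is preferred in the primary center; for multi-raft, the two centers can swap roles per shard.

    package sketch

    // Center identifies one of the two data centers.
    type Center string

    // ShardPlacement describes where the replicas of one raft group (data shard)
    // are placed; the type and field names are assumptions for illustration only.
    type ShardPlacement struct {
        Primary          Center // holds most replicas and the preferred leader
        Secondary        Center // holds exactly one strongly synchronized replica
        Replicas         int    // e.g. 3
        PrimaryReplicas  int
        SecondaryStrong  bool
        LeaderPreference Center
    }

    // place applies the rule described above: one replica goes to the secondary
    // center with strong synchronization enabled, the rest go to the primary
    // center, and the leader is preferred in the primary center.
    func place(primary, secondary Center, replicas int) ShardPlacement {
        return ShardPlacement{
            Primary:          primary,
            Secondary:        secondary,
            Replicas:         replicas,
            PrimaryReplicas:  replicas - 1, // e.g. 2 of 3
            SecondaryStrong:  true,         // commits wait for the secondary replica
            LeaderPreference: primary,      // keep the leader close to most traffic
        }
    }

    // In a multi-raft system the two centers can swap roles per shard, e.g.
    // place("dcA", "dcB", 3) for some shards and place("dcB", "dcA", 3) for
    // others, which uses both data centers actively.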
Because network latency between the replicas of the primary center is extremely small, log entries can easily obtain a majority of votes within the primary center and be committed, so data reaches the secondary center with some delay. To achieve zero data loss (i.e., RPO = 0) when the primary center fails and service switches to the secondary center, the strong-synchronization attribute must be configured on the secondary-center replica, so that log commitment must also obtain the vote of the secondary-center replica and data is written synchronously to both data centers; this places high demands on the network.
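A minimal sketch of the commit rule implied here, with assumed names: an entry commits only when it is acknowledged by more than half of the voting replicas and by every strongly synchronized replica, which is what drives RPO to 0 at the price of cross-center latency on every commit.

    package sketch

    // ack records a voting replica's acknowledgement of a log entry; StrongSync
    // marks the secondary-center replica. Names are assumptions for illustration.
    type ack struct {
        ReplicaID  uint64
        StrongSync bool
    }

    // canCommit reports whether an entry may be committed under the rule above:
    // more than half of the voting replicas have acknowledged it AND every
    // strongly synchronized replica (the secondary-center replica) has done so,
    // which guarantees the entry is durable in both data centers (RPO = 0).
    func canCommit(acks []ack, votingReplicas, strongSyncReplicas int) bool {
        strongAcks := 0
        for _, a := range acks {
            if a.StrongSync {
                strongAcks++
            }
        }
        return len(acks) > votingReplicas/2 && strongAcks >= strongSyncReplicas
    }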
When the secondary center suffers a data center level failure, availability is not affected, because more than half of the replicas of the raft cluster remain alive. If the secondary-center replica is configured with the strong-synchronization attribute, it is handled by the fault-handling mechanism for strongly synchronized replicas.
To prevent the strong-synchronization attribute from slowing down log commitment, after the secondary center takes over and supplements the replicas, the strong-synchronization configuration of the secondary center is cancelled until the primary center recovers.
When the primary center suffers a data center level failure, fewer than half of the replicas of the raft cluster survive and the cluster is unavailable. A command is executed manually against a node of the secondary center to forcibly demote the primary-center replicas to non-voting replicas (the forced demotion only modifies the replica role attribute in memory; no log entry is generated and no log commitment is required). The number of voting replicas of the raft cluster in the secondary center is thereby reduced to 1, which is equivalent to a single-replica cluster, so the secondary-center replica can be elected leader directly. Because the forced demotion does not change the raft cluster information recorded in the metadata, once the mismatch between the numbers of voting and non-voting raft replicas and the metadata is detected, a complete configuration change is performed to formally demote the primary-center replicas to non-voting replicas. At this point the raft cluster has fully changed from three replicas to a single replica, and availability has been restored.
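The takeover sequence just described can be summarized as follows (hypothetical cluster-management functions, not an existing CLI or API):

    package sketch

    // member is a simplified raft group member; all names here are assumed for
    // illustration and do not refer to an existing cluster-management API.
    type member struct {
        ID        uint64
        InPrimary bool
        Voting    bool
    }

    type group struct {
        members []member
    }

    // takeOver sketches the manual command executed against a secondary-center
    // node when the whole primary center is down. The forced demotion only flips
    // the in-memory role (no log entry is produced, so nothing needs to commit);
    // the surviving secondary replica then wins election as a de facto
    // single-replica cluster, and a formal configuration change is proposed once
    // the mismatch with the metadata is detected.
    func (g *group) takeOver() {
        for i := range g.members {
            if g.members[i].InPrimary {
                g.members[i].Voting = false // forced, in-memory demotion
            }
        }
        g.campaignOnSecondaryReplica()     // the only voting replica becomes leader
        g.proposeConfChangeMatchingRoles() // make the metadata agree with the roles
    }

    func (g *group) campaignOnSecondaryReplica()     { /* trigger election on the secondary replica */ }
    func (g *group) proposeConfChangeMatchingRoles() { /* ordinary raft configuration change */ }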
At this time the actual number of voting replicas is smaller than the configured number; to avoid the risk of a single point of failure while the primary center is down, replicas are automatically supplemented on the vacant secondary-center nodes based on the original secondary-center replica, so that the raft cluster returns to a multi-replica state.
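A sketch of the automatic re-supplementation step, again with assumed callback names:

    package sketch

    // resupply sketches the automatic re-supplementation step: learner replicas
    // are created on vacant secondary-center nodes from the surviving secondary
    // replica and promoted once caught up, until the group is back to its
    // configured size.
    func resupply(vacantSecondaryNodes []string, current, configured int,
        addLearner func(node string), promoteWhenCaughtUp func(node string)) {
        for _, node := range vacantSecondaryNodes {
            if current >= configured {
                return // multi-replica state restored
            }
            addLearner(node)          // data is copied from the original secondary replica
            promoteWhenCaughtUp(node) // becomes a voting replica once it has caught up
            current++
        }
    }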
Normally, committed logs are cleaned up quickly. If data is written during the period when the secondary center holds authority, the logs of the secondary-center replica are retained so that, after the primary-center nodes restart, the secondary-center replica does not have to send them a snapshot; the primary center completes data synchronization by log appending after it restarts.
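The retention rule can be expressed as a small guard around log compaction; the names below are assumptions for illustration:

    package sketch

    // replicaGroup carries just enough state to show the retention rule.
    type replicaGroup struct {
        takeoverActive bool   // true while the secondary center holds authority
        appliedIndex   uint64 // highest log index applied to the state machine
        compactTo      func(index uint64)
    }

    // maybeCompact skips log compaction while the secondary center is in charge,
    // so that restarted primary replicas can later catch up by log appending
    // instead of receiving a full snapshot.
    func (r *replicaGroup) maybeCompact() {
        if r.takeoverActive {
            return // retain committed logs for the absent primary replicas
        }
        r.compactTo(r.appliedIndex)
    }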
The real difficulty in restarting the primary-center nodes is that the replicas on those nodes do not know they have been demoted to non-voting replicas; if all the primary-center nodes were started at the same time, the primary-center replicas would form a majority of the raft cluster, reach consensus among themselves, and split from the secondary-center replica (meaning that the primary and secondary centers would each have their own leader, an abnormal, split-brain state for a raft cluster).
When the primary-center nodes are recovered one by one, each node is restarted in turn, synchronizes the logs, and has its replicas upgraded to voting replicas: each time one primary-center node is started, a command is executed manually that connects all normal nodes (including the secondary-center nodes and the primary-center nodes that have already been recovered) and upgrades all replicas on the started node back to voting replicas. Because the primary center restarts only one node at a time, that node's replicas cannot obtain enough votes to be elected leader; instead they act as followers, append logs from the secondary-center replica, and are demoted to non-voting replicas by the pending configuration change. After the command is executed manually, the primary-center replicas are upgraded to voting replicas (the upgrade works like the demotion: the raft role is forcibly upgraded, and a complete configuration change is performed once the mismatch between the number of voting raft replicas and the metadata is detected), and the leader replica is set according to the original leader preference configuration. At this point the actual number of voting replicas exceeds the configured number, so the redundant replicas supplemented on the spare secondary-center nodes are deleted.
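The per-node recovery path can be sketched as follows; the helper names are assumptions, and the concrete operator commands of a real system are not specified here:

    package sketch

    // recoverySteps groups the operations assumed in this sketch; their concrete
    // implementations belong to the cluster-management layer.
    type recoverySteps struct {
        Restart                          func(node string)
        WaitLogCaughtUp                  func(node string)
        PromoteToVoting                  func(node string) // forced upgrade + formal config change
        ApplyLeaderPreference            func()
        RemoveRedundantSecondaryReplicas func()
    }

    // recoverNodeByNode restarts the primary-center nodes one at a time. A single
    // restarted node can never gather a majority against the secondary center, so
    // split-brain is impossible; its replicas rejoin as followers, append the
    // missed logs, and are then promoted back to voting replicas.
    func recoverNodeByNode(primaryNodes []string, s recoverySteps) {
        for _, node := range primaryNodes {
            s.Restart(node)         // comes back as a follower, demoted by the pending config change
            s.WaitLogCaughtUp(node) // catches up by log appending, not snapshots
            s.PromoteToVoting(node) // manual command connecting all normal nodes
        }
        s.ApplyLeaderPreference()            // leader moves back to the primary center
        s.RemoveRedundantSecondaryReplicas() // drop replicas added during takeover
    }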
When the primary center is recovered all at once, the primary-center replicas are temporarily started as non-voting replicas and first synchronize the logs from the secondary center; a command is then executed on a secondary-center node to upgrade all the primary-center replicas to voting replicas. Concretely, all primary-center nodes are started directly with an added start parameter asLearner, so that the replicas on the primary center temporarily restart as non-voting replicas; since non-voting replicas have no election right, split-brain cannot occur, and they synchronize the latest logs from the secondary-center replica. A command is then executed on a secondary-center node to upgrade all the primary-center replicas to voting replicas (comprising both the forced upgrade and the configuration change); the recovery of the primary center is then complete, and the redundant replicas supplemented on the spare secondary-center nodes are deleted.
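For the all-at-once variant, a sketch under the same assumptions: every primary node is restarted with a parameter equivalent to the asLearner flag mentioned above, so its replicas come up without election rights, synchronize the latest logs, and are then promoted in a single step:

    package sketch

    // startOptions mirrors the idea of the asLearner start parameter mentioned
    // above; everything else in this sketch is an assumed name for illustration.
    type startOptions struct {
        AsLearner bool // start replicas without election rights
    }

    // recoverAllAtOnce restarts every primary-center node as a learner, which
    // makes split-brain impossible even though the nodes come back together,
    // waits for their logs to catch up from the secondary replica, and then
    // promotes them in one step.
    func recoverAllAtOnce(primaryNodes []string,
        start func(node string, opts startOptions),
        waitCaughtUp func(node string),
        promoteAllAndSetLeader func(),
        removeRedundantSecondaryReplicas func()) {
        for _, node := range primaryNodes {
            start(node, startOptions{AsLearner: true})
        }
        for _, node := range primaryNodes {
            waitCaughtUp(node) // latest logs are appended from the secondary-center replica
        }
        promoteAllAndSetLeader()           // forced upgrade, config change, leader preference restored
        removeRedundantSecondaryReplicas() // delete replicas supplemented on spare secondary nodes
    }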
The system based on the two-center deployed data center level disaster recovery method comprises a primary center, a secondary center, a node management module and a replica management module; at least 2 nodes are deployed in the primary center and 3 nodes in the secondary center, one replica is placed in the secondary center with the strong-synchronization attribute added, and the remaining replicas are placed in the primary center with leader preference configured for the primary center;
the replica management module is responsible for forcibly demoting the primary-center replicas to non-voting replicas when the primary center fails, electing the secondary-center replica as leader, and supplementing replicas on the vacant secondary-center nodes to restore availability;
it is also responsible, after the primary-center fault is repaired, for upgrading all replicas on the started primary-center nodes back to voting replicas, setting the leader replica according to the original leader preference configuration, and deleting the redundant replicas supplemented on the spare secondary-center nodes;
the node management module is responsible for restarting the primary-center nodes one by one and rejoining them to the cluster after the primary center is repaired, or restarting all the primary-center nodes and rejoining them to the cluster together.
The above-described embodiment is only one specific embodiment of the present invention, and general changes and substitutions by those skilled in the art within the technical scope of the present invention are included in the protection scope of the present invention.

Claims (8)

1. A data center level disaster recovery method deployed across two centers, characterized in that: based on the raft distributed consensus algorithm, at least 2 nodes are deployed in the primary center and 3 nodes in the secondary center; one replica is placed in the secondary center with the strong-synchronization attribute added, and the remaining replicas are placed in the primary center with leader preference configured;
when the primary center fails, the primary-center replicas are forcibly demoted to non-voting replicas, the secondary-center replica is elected leader, and replicas are supplemented on the vacant secondary-center nodes to restore availability;
after the primary-center fault is repaired, the primary-center nodes are restarted and rejoin the cluster either one by one or all together.
2. The two-center deployed data center level disaster recovery method according to claim 1, characterized in that: to avoid snapshots being sent for replica movement, the secondary center is prohibited from moving replicas back to the primary center while the primary center is restarting and the secondary center is still in takeover.
3. The two-center deployed data center level disaster recovery method according to claim 1, characterized in that: to prevent the strong-synchronization attribute from slowing down log commitment, after the secondary center takes over and supplements the replicas, the strong-synchronization configuration of the secondary center is cancelled until the primary center recovers.
4. The two-center deployed data center level disaster recovery method according to claim 1, characterized in that: when the primary center fails, replicas are automatically supplemented on the vacant secondary-center nodes based on the original secondary-center replica, so that the raft cluster returns to a multi-replica state.
5. The two-center deployed data center level disaster recovery method according to claim 1, characterized in that: if data is written during the period when the secondary center holds authority, the logs of the secondary-center replica are retained so that, after the primary-center nodes restart, the secondary-center replica does not have to send them a snapshot, and the primary center completes data synchronization by log appending after it restarts.
6. The two-center deployed data center level disaster recovery method according to claim 4, characterized in that: when the primary-center nodes are recovered one by one, each node is restarted in turn, synchronizes the logs, and has its replicas upgraded to voting replicas: each time one primary-center node is started, a command is executed manually that connects all normal nodes, upgrades all replicas on the started node back to voting replicas, sets the leader replica according to the original leader preference configuration, and deletes the redundant replicas supplemented on the spare secondary-center nodes.
7. The two-center deployed data center level disaster recovery method according to claim 4, characterized in that: when the primary center is recovered all at once, the primary-center replicas are temporarily started as non-voting replicas and first synchronize the logs from the secondary center; a command is then executed on a secondary-center node to upgrade all the primary-center replicas to voting replicas, set the leader replica according to the original leader preference configuration, and delete the redundant replicas supplemented on the spare secondary-center nodes.
8. A system based on the two-center deployed data center level disaster recovery method according to any one of claims 1-7, characterized in that: the system comprises a primary center, a secondary center, a node management module and a replica management module; at least 2 nodes are deployed in the primary center and 3 nodes in the secondary center, one replica is placed in the secondary center with the strong-synchronization attribute added, and the remaining replicas are placed in the primary center with leader preference configured for the primary center;
the replica management module is responsible for forcibly demoting the primary-center replicas to non-voting replicas when the primary center fails, electing the secondary-center replica as leader, and supplementing replicas on the vacant secondary-center nodes to restore availability;
it is also responsible, after the primary-center fault is repaired, for upgrading all replicas on the started primary-center nodes back to voting replicas, setting the leader replica according to the original leader preference configuration, and deleting the redundant replicas supplemented on the spare secondary-center nodes;
the node management module is responsible for restarting the primary-center nodes one by one and rejoining them to the cluster after the primary center is repaired, or restarting all the primary-center nodes and rejoining them to the cluster together.
CN202210161220.7A 2022-02-22 2022-02-22 Two-center deployed data center level disaster recovery method and system Pending CN114546724A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210161220.7A CN114546724A (en) 2022-02-22 2022-02-22 Two-center deployed data center level disaster recovery method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210161220.7A CN114546724A (en) 2022-02-22 2022-02-22 Two-center deployed data center level disaster recovery method and system

Publications (1)

Publication Number Publication Date
CN114546724A true CN114546724A (en) 2022-05-27

Family

ID=81676633

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210161220.7A Pending CN114546724A (en) 2022-02-22 2022-02-22 Two-center deployed data center level disaster recovery method and system

Country Status (1)

Country Link
CN (1) CN114546724A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20221228

Address after: Room 305-22, Building 2, No. 1158 Zhangdong Road and No. 1059 Dangui Road, China (Shanghai) Pilot Free Trade Zone, Pudong New Area, Shanghai, 200120

Applicant after: Shanghai Yunxi Technology Co.,Ltd.

Address before: Building S02, 1036 Gaoxin Langchao Road, Jinan, Shandong 250100

Applicant before: Shandong Inspur Scientific Research Institute Co.,Ltd.
