WO2016050074A1

WO2016050074A1 - Cluster split brain processing method and apparatus

Info

Publication number: WO2016050074A1
Application number: PCT/CN2015/079096
Authority: WO
Inventors: Hu Zhijiang (胡智江)
Original assignee: ZTE Corporation (中兴通讯股份有限公司)
Priority date: 2014-09-29
Filing date: 2015-05-15
Publication date: 2016-04-07
Also published as: CN105450717A

Abstract

A cluster split-brain processing method and apparatus, relating to the field of computer applications. When split-brain splits a cluster into a plurality of subsets, the only subset allowed to continue providing service is selected, and the nodes in all other subsets are controlled to stop working.

Description

Cluster split-brain processing method and apparatus

Technical field

This disclosure relates to the field of computer applications, and in particular to a cluster split-brain processing method and apparatus.

Background

High-availability (HA) clustering is a server clustering technology designed to reduce service downtime. The node currently running the service is called the primary machine; a node that is not running the service but may later take over the service from the primary machine is called a standby machine. When the primary machine fails, a standby machine takes over and continues running the service, so that service is provided continuously.

The network interconnecting the nodes is called the heartbeat line. Through the heartbeat line, each node in the cluster can communicate with every other node, and through the communication protocol it can also learn which nodes are currently in the cluster (the module providing this communication function is referred to below as the "heartbeat communication module"). When a node finds that it can no longer communicate with another node, the cause may be a heartbeat line failure or a failure of the peer node. Either way, the cluster may split into multiple subsets; the industry calls this situation "split-brain". A node in one subset cannot tell why it has lost contact with the other subsets, and it must not decide whether to run the service based on a guess (the module that controls or starts the service is referred to below as the "service control logic module"); otherwise the cluster may end up with no primary or with multiple primaries.

Consider a cluster of two nodes, A and B, in which the service runs on A and B is the standby machine. When node B finds that it cannot communicate with A and guesses that the network has failed, B keeps its standby role. If A has actually failed, however, the cluster has lost its primary and the upper-layer application cannot continue running. Conversely, if B guesses that A has failed, B takes over the service from A. If the fault is actually only in the network and A is still running normally, the cluster now has two primaries, A and B. Multiple primaries must also be avoided, because they compete with each other for resources and, in severe cases, may corrupt data.

In summary, it is an open problem how to continue controlling a cluster after split-brain has occurred.

Summary of the invention

This disclosure provides a cluster split-brain processing method and apparatus, which solve the problem of controlling a cluster after split-brain.

A cluster split-brain processing method includes:

when the cluster splits into multiple subsets, selecting, from the multiple subsets, the only subset allowed to continue providing service;

controlling the nodes in all subsets other than the subset allowed to continue providing service to stop working.

Optionally, when the cluster splits into multiple subsets, selecting from the multiple subsets the only subset allowed to continue providing service includes:

selecting the subset containing the primary node from before the split-brain as the desired primary subset;

selecting the subset whose number of nodes is greater than half the number of cluster nodes before the split-brain as the unique large subset;

selecting, from the desired primary subset and the unique large subset, the only subset allowed to continue providing service.

Optionally, the method further includes:

when the cluster is initialized, allocating disk space on a shared medium as a decision disk and partitioning the decision disk, so that each node in the cluster corresponds to exactly one partition of the decision disk;

each node in the cluster writing its current timestamp to its corresponding partition of the decision disk through disk input/output (I/O) operations;

selecting, as the primary node, one of the nodes whose number of timestamp updates within a time window is greater than a threshold.
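The count-based health rule and a shared election rule can be sketched in Python as follows. This is a minimal sketch, assuming each node has already collected the observed update times of every node (timestamp writes on the decision disk, or KeepAlive arrivals) into a dictionary; the function names, the half-open window, and the smallest-ID election rule are illustrative assumptions rather than the patented implementation.

```python
from typing import Dict, List, Optional


def healthy_nodes(update_times: Dict[int, List[float]], now: float,
                  window: float, threshold: int) -> List[int]:
    """IDs of nodes with more than `threshold` updates in the window (now - window, now]."""
    healthy = []
    for node_id, times in update_times.items():
        recent = [t for t in times if now - window < t <= now]
        if len(recent) > threshold:
            healthy.append(node_id)
    return sorted(healthy)


def elect_primary(healthy: List[int]) -> Optional[int]:
    """Shared deterministic rule (assumed here): the smallest healthy node ID wins."""
    return min(healthy) if healthy else None


# Observed update times per node ID (timestamp writes or KeepAlive arrivals).
updates = {
    0: [1.0, 2.0, 3.0, 4.0],  # node 0 went silent after t=4
    1: [6.0, 7.0, 8.0, 9.0],
    2: [7.5, 8.5, 9.5],
}
healthy = healthy_nodes(updates, now=10.0, window=5.0, threshold=2)
print(healthy, elect_primary(healthy))  # [1, 2] 1
```

Because every node applies the same deterministic rule, all healthy nodes that share the same health view agree on one primary node without further negotiation.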

Optionally, the method further includes:

when no split-brain has occurred, each node in the cluster periodically broadcasting or multicasting a KeepAlive message over an additional Ethernet network;

selecting, as the primary node, one of the nodes that sent more KeepAlive messages than a threshold within a time window.

Optionally, after the step of selecting the subset containing the primary node from before the split-brain as the desired primary subset, the method further includes:

designating a representative node from the desired primary subset, and instructing the representative node to notify all nodes of every subset other than the desired primary subset to stop working after a first delay time.

Optionally, selecting, from the desired primary subset and the unique large subset, the only subset allowed to continue providing service includes:

when no unique large subset exists, selecting the desired primary subset as the only subset allowed to continue providing service;

when the desired primary subset and the unique large subset are the same subset, using that subset as the only subset allowed to continue providing service;

when the desired primary subset and the unique large subset are different subsets, using the unique large subset as the only subset allowed to continue providing service.
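The three selection rules above reduce to a small decision function. A minimal Python sketch, assuming subsets are represented as frozensets of hypothetical node names and that `None` stands for "no unique large subset exists":

```python
from typing import FrozenSet, Optional

Subset = FrozenSet[str]


def select_surviving_subset(desired_primary: Subset,
                            unique_large: Optional[Subset]) -> Subset:
    """Return the only subset allowed to continue providing service."""
    if unique_large is None:
        # No subset holds a strict majority: keep the desired primary subset.
        return desired_primary
    if unique_large == desired_primary:
        # Both criteria agree on the same subset.
        return desired_primary
    # The criteria disagree: the unique large subset wins.
    return unique_large


abc, d = frozenset("ABC"), frozenset("D")
print(sorted(select_surviving_subset(desired_primary=d, unique_large=abc)))   # ['A', 'B', 'C']
print(sorted(select_surviving_subset(desired_primary=d, unique_large=None)))  # ['D']
```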

Optionally, after the step of selecting the subset whose number of nodes is greater than half the number of cluster nodes before the split-brain as the unique large subset, the method further includes:

selecting a node from the unique large subset as the unique large subset representative;

instructing the unique large subset representative, upon determining that the unique large subset and the desired primary subset are different subsets, to notify all nodes of the subsets other than the unique large subset to stop working after zero delay or a second delay time, where the second delay time is less than the first delay time.

Optionally, the method further includes:

when a node detects that heartbeat communication is interrupted, interrupting communication between the node's underlying heartbeat communication and its upper-layer service control logic; after a first time length has elapsed, determining that split-brain has occurred and restoring communication between the underlying heartbeat communication and the upper-layer service control logic.
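One way to picture this interruption window is a gate that withholds membership updates from the service control logic until the interruption has lasted the first time length. This Python sketch is hypothetical: the class and method names, the returned state strings, and the choice to reset the timer if the heartbeat recovers early are assumptions rather than details from the source.

```python
from typing import Optional


class HeartbeatGate:
    """Hypothetical gate between the heartbeat layer and the service control logic."""

    def __init__(self, t1: float) -> None:
        self.t1 = t1                          # the first time length
        self.interrupted_at: Optional[float] = None

    def on_heartbeat_lost(self, now: float) -> None:
        # Start the window only on the first loss notification.
        if self.interrupted_at is None:
            self.interrupted_at = now

    def on_heartbeat_restored(self) -> None:
        # Assumption: an early recovery cancels the pending split-brain decision.
        self.interrupted_at = None

    def poll(self, now: float) -> str:
        if self.interrupted_at is None:
            return "forwarding"               # normal operation
        if now - self.interrupted_at < self.t1:
            return "withheld"                 # upper layer is cut off for now
        self.interrupted_at = None            # restore the upper-layer link
        return "split-brain"                  # report the split-brain event


gate = HeartbeatGate(t1=5.0)
gate.on_heartbeat_lost(now=10.0)
print(gate.poll(12.0), gate.poll(15.0), gate.poll(16.0))  # withheld split-brain forwarding
```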

Optionally, when the unique large subset is used as the only subset allowed to continue providing service, the method further includes:

electing a node from the unique large subset as the new primary node, where the time taken to elect the new primary node, counted from the moment split-brain is determined, is a second time length, and the second time length is less than the first time length.

Optionally, the method further includes:

The current cluster member list, the number of members, and the cluster member change notification information are maintained at each node of the cluster.

A cluster split-brain processing apparatus includes:

a continuing-service subset selection module, configured to: when the cluster splits into multiple subsets, select from the multiple subsets the only subset allowed to continue providing service;

a node shutdown control module, configured to control the nodes in all subsets other than the subset allowed to continue providing service to stop working.

Optionally, the continuing-service subset selection module includes:

a desired primary subset selection unit, configured to select the subset containing the primary node from before the split-brain as the desired primary subset;

a unique large subset selection unit, configured to select the subset whose number of nodes is greater than half the number of cluster nodes before the split-brain as the unique large subset;

a continuing-service subset selection unit, configured to select, from the desired primary subset and the unique large subset, the only subset allowed to continue providing service.

Optionally, the continuing-service subset selection module further includes:

a representative node selection unit, configured to designate a representative node from the desired primary subset and instruct the representative node to notify all nodes of every subset other than the desired primary subset to stop working after the first delay time.

Optionally, the continuing-service subset selection unit includes:

a first selection subunit, configured to select the desired primary subset as the only subset allowed to continue providing service when no unique large subset exists;

a second selection subunit, configured to use the desired primary subset as the only subset allowed to continue providing service when the desired primary subset and the unique large subset are the same subset;

and a third selection subunit, configured to use the unique large subset as the only subset allowed to continue providing service when the desired primary subset and the unique large subset are different subsets.

Optionally, the continuing-service subset selection module further includes:

a unique large subset representative selection unit, configured to select a node from the unique large subset as the unique large subset representative and instruct it, upon determining that the unique large subset and the desired primary subset are different subsets, to notify all nodes of the subsets other than the unique large subset to stop working after zero delay or the second delay time, where the second delay time is less than the first delay time.

Optionally, the apparatus further includes:

an internal communication management module, configured to: when the node detects that heartbeat communication is interrupted, interrupt communication between the node's underlying heartbeat communication and the upper-layer service control logic; after the first time length has elapsed, determine that split-brain has occurred and restore communication between the underlying heartbeat communication and the upper-layer service control logic.

Optionally, when the unique large subset is used as the only subset allowed to continue providing service, the apparatus further includes:

a primary node election module, configured to elect a node from the unique large subset as the new primary node, where the time taken to elect the new primary node, counted from the moment split-brain is determined, is the second time length, and the second time length is less than the first time length.

Optionally, the apparatus further includes:

a storage module, configured to maintain the current cluster member list, the number of members, and cluster member change notification information.

A computer-readable storage medium storing program instructions which, when executed, implement the above method.

This disclosure provides a cluster split-brain processing method and apparatus. When split-brain occurs in a cluster, the only subset allowed to continue providing service is selected, and the nodes in all other subsets are controlled to stop working. This achieves orderly management of the cluster after split-brain and solves the problem of controlling the cluster after split-brain.

Brief description of the drawings

FIG. 1 is a schematic diagram of a cluster split-brain processing system according to Embodiment 1 of the present invention;

FIG. 2 is a schematic diagram of module cooperation and timing for the first-step decision method when the primary node fails;

FIG. 3 is a schematic diagram of module cooperation and timing for the first-step decision method when a non-primary node fails or the heartbeat line breaks;

FIG. 4 is a schematic diagram of module cooperation and timing for the second-step decision method when, after the heartbeat line breaks, a unique large subset is found and it is not the desired primary subset;

FIG. 5 is a schematic diagram of module cooperation and timing for the second-step decision method when, after the heartbeat line breaks, a unique large subset is found and it is identical to the desired primary subset;

FIG. 6 is a flowchart of a cluster split-brain processing method according to Embodiment 2 of the present invention;

FIG. 7 is a detailed flowchart of step 601 in FIG. 6;

FIG. 8 is a schematic structural diagram of a cluster split-brain processing apparatus according to Embodiment 3 of the present invention;

FIG. 9 is a schematic structural diagram of the continuing-service subset selection module 801 in FIG. 8;

FIG. 10 is a schematic structural diagram of the continuing-service subset selection unit 8013 in FIG.

Embodiments of the invention

To solve the above problems, embodiments of the present invention provide a cluster split-brain processing method and apparatus, described in detail below with reference to the accompanying drawings. Where no conflict arises, the embodiments of this application and the features in them may be combined with each other arbitrarily.

Embodiment 1 of the present invention will be described below with reference to the accompanying drawings.

An embodiment of the present invention provides a cluster split-brain processing system. As shown in FIG. 1, the system includes an underlying heartbeat communication module, an upper-layer service control logic module, and, between them, a split-brain decision module. The heartbeat communication module provides the split-brain decision module with information such as the current cluster member list, the number of members, and cluster member change notifications (also called split-brain events). The split-brain decision module uses this information to determine which subset should continue running the service and reports the result to the service control logic module, which performs the necessary service control operations, such as active/standby switchover, based on that result.
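The layering can be sketched as a decision module that receives member change notifications and forwards only its verdict upward. This is a hypothetical Python sketch: `MembershipChange`, `SplitBrainDecisionModule`, and the callback signatures are illustrative names, and the toy majority rule merely stands in for the two-step method of this embodiment.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class MembershipChange:
    """Cluster member change notification (split-brain event) from the heartbeat layer."""
    previous_members: FrozenSet[str]
    current_members: FrozenSet[str]   # the subset this node now belongs to


Decide = Callable[[MembershipChange], bool]   # True: this subset keeps the service
Report = Callable[[bool], None]               # entry point of the service control logic


class SplitBrainDecisionModule:
    """Sits between the heartbeat communication module and the service control logic."""

    def __init__(self, decide: Decide, report: Report) -> None:
        self._decide = decide
        self._report = report

    def on_member_change(self, change: MembershipChange) -> None:
        # Called by the heartbeat communication module; forwards only the verdict.
        self._report(self._decide(change))


reports: List[bool] = []
module = SplitBrainDecisionModule(
    # Toy stand-in rule (majority only), not the two-step method of this embodiment.
    decide=lambda c: 2 * len(c.current_members) > len(c.previous_members),
    report=reports.append,
)
module.on_member_change(MembershipChange(frozenset("ABCD"), frozenset("ABC")))
print(reports)  # [True]
```

Keeping the decision behind a single callable makes it easy to swap in a different rule without touching the heartbeat or service control layers.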

After split-brain divides the cluster into multiple subsets, the split-brain decision module is responsible for determining the only subset allowed to continue running the service. This subset is called the "primary subset". Services on the other subsets must be stopped (this is called Fence), and these subsets are called "secondary subsets". The decision has three properties. First, in this embodiment a fenced node is immediately powered off or restarted and stops working. Second, the primary subset is, as far as possible, the subset with the most nodes among all subsets produced by the split, which ensures that most nodes can continue working after the split. Third, after a split-brain event the split-brain decision module can determine the primary subset immediately from the information available when the event occurs, so that the upper-layer service control logic can perform the active/standby switchover as soon as possible.

To achieve these properties, the split-brain decision module in this embodiment implements a two-step decision method. Step 1: when a split-brain event occurs, the "desired primary subset" is first determined through an additional information channel, meaning any channel other than the heartbeat line through which cluster nodes can exchange information. Step 2: if some subset contains more than 50% of the number of cluster nodes before the split, it must be the largest of all subsets (hereinafter the "unique large subset"). If the unique large subset is not the desired primary subset determined in step 1, it immediately replaces that desired primary subset and is decided to be the final subset allowed to continue running the service (hereinafter the "primary subset"). If step 2 finds no unique large subset, or the unique large subset is the desired primary subset found in step 1, step 2 has no effect and the desired primary subset is decided to be the final primary subset.

The first-step decision is implemented by a first-step decision submodule, and the second-step decision by a second-step decision submodule. To ensure that the first-step decision works correctly, this embodiment requires the additional information channel to provide the first-step decision submodule with the following information-exchange capabilities, 1.1) to 1.3):

1.1) In normal operation, when no split-brain has occurred, every node that can access the additional information channel regularly indicates to the other nodes in the cluster, through that channel, that it is in a normal state. A node confirmed through the additional information channel to be in a normal state is called a "healthy" node.

1.2) In normal operation, when no split-brain has occurred, all healthy nodes elect a unique node of the cluster through the additional information channel as the "primary node". The primary node must be a healthy node, but a healthy node is not necessarily the primary node.

1.3) Re-election of the primary node: if split-brain is caused by a failure of the primary node and no new primary node has been elected by the time the first-step split-brain decision runs, the first-step decision loses its ability to decide because there is no primary node. To avoid this, the underlying heartbeat communication protocol module guarantees a minimum interval T1 between the interruption of communication and the reporting of the split-brain event to the split-brain decision module, and the additional information channel needs at most time T2 to re-elect a new primary node after losing the old one. As long as T2 < T1, a new primary node is re-elected before the split-brain event caused by the primary node failure is reported, which ensures that the first-step decision is correct. If split-brain is caused by a failure of a non-primary node or of the heartbeat line, no re-election of the primary node is involved.

The following 1.4) to 1.7) describe the decisions made by the first-step decision method after a split-brain event, using the already-established information about the current primary node:

1.4) When a split occurs, the subset containing the primary node is immediately judged to be the "desired primary subset", and the other subsets are judged to be secondary subsets. This result is not yet reported to the service control logic.

1.5) The split-brain decision module of each node in the desired primary subset obtains the new member list, that is, the member list of the desired primary subset, from the split-brain event message reported by the underlying heartbeat communication module.

1.6) The desired primary subset designates a representative node from the new member list, which performs a delayed Fence operation on the other, secondary subsets, that is, stops them from working, to avoid multiple primaries. The Fence operation here is preceded by a delay of length T_d so that it is slower than the zero-delay Fence that the second-step decision described below may perform, leaving time for the second-step decision to take effect. T_d must therefore be greater than the time the second-step decision takes.

1.7) If the second-step decision does not take effect, then after time T_d the representative of the desired primary subset performs the Fence operation, the desired primary subset is finally judged to be the primary subset, and the result is reported to the service control logic.
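The two timing requirements (T2 < T1 from 1.3, and T_d greater than the second-step decision time from 1.6) can be checked when a cluster is configured. A minimal Python sketch; the function name and the example values are hypothetical:

```python
from typing import List


def check_timing(t1: float, t2: float, t_d: float, t_step2: float) -> List[str]:
    """Return the violated constraints (an empty list means the settings are consistent).

    t1      -- minimum delay between heartbeat loss and reporting the split-brain event
    t2      -- maximum time to re-elect a primary node over the additional channel
    t_d     -- pre-delay of the desired primary subset's Fence
    t_step2 -- time the second-step decision takes
    """
    problems = []
    if not t2 < t1:
        problems.append("T2 must be less than T1, or the event may arrive before a new primary exists")
    if not t_d > t_step2:
        problems.append("T_d must exceed the second-step decision time, or the majority may be fenced")
    return problems


print(check_timing(t1=10.0, t2=6.0, t_d=3.0, t_step2=0.5))       # []
print(len(check_timing(t1=5.0, t2=6.0, t_d=0.2, t_step2=0.5)))   # 2
```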

Figure 2 depicts the module cooperation and timing relationships for the first-step decision method in the event of a primary node failure. Figure 3 depicts the module cooperation and timing relationships for the first-step decision method in the event of a non-primary node failure or heartbeat line break.

The first-step split-brain decision above is enough to avoid both multiple primaries and the loss of the primary. However, it may still happen that most of the working nodes lose the primary role, that is, the cluster loses most of its computing power. Consider a split of a cluster with more than two nodes: suppose four nodes A, B, C, and D form the cluster {A, B, C, D}, configured with an additional information channel and Fence capability that meet the requirements of the first-step decision method. Suppose split-brain is caused by a heartbeat line failure, so all nodes are still working, and A, B, and C form one subset while D forms another, giving two subsets {A, B, C} and {D}. If D happens to be the primary node, the first-step decision judges {D} to be the desired primary subset and {A, B, C} to be a secondary subset. In the end, the 3/4 of the computing power represented by the still-working subset {A, B, C} is excluded from the cluster, which wastes a large amount of computing power.

The second-step decision method is described in detail below. It starts after the first-step decision, and its purpose is to let the subset with the most nodes replace the desired primary subset from the first step as the true final primary subset wherever possible. The second-step decision uses the latest membership information, which is readily available, to make its decision, as follows:

2.1) In normal operation, when no split-brain has occurred, each node records the member list and the number of members of the current cluster. This information is provided by the underlying heartbeat communication module at the most recent member change event.

2.2) After split-brain occurs, the second-step decision module of each node in each subset likewise obtains the member list and node count of its subset from the split-brain event message reported by the underlying heartbeat communication module. If the number of nodes in a subset exceeds 50% of the original cluster, all nodes in that subset can immediately determine that it is the unique subset with the most nodes, that is, the "unique large subset".

2.3) The unique large subset selects a representative node from its new member list, called the unique large subset representative.

2.4) If the unique large subset representative finds that its subset is not the desired primary subset from the first-step decision, it immediately performs a zero-delay Fence operation that stops all nodes outside the unique large subset. This zero-delay Fence necessarily happens before the delayed Fence by the desired primary subset representative in 1.6), so it successfully stops the desired primary subset, and step 1.7) above is therefore never executed.

2.5) The unique large subset is finally judged to be the primary subset, and the result is reported to the service control logic.

2.6) Because the other subsets have stopped working and are no longer healthy, they can no longer compete for the primary node role. Within time T2, a new primary node is necessarily re-elected within the unique large subset, in preparation for the next split-brain decision. Given the timing constraint in 1.3), this re-election will not be interrupted by a new split-brain event.

2.7) If no unique large subset is found in step 2.2), or the unique large subset is found in 2.4) to be the desired primary subset, the second-step decision is not needed: steps 2.5) and 2.6) are not performed, and step 1.7) of the first-step decision is executed.
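The majority test in 2.2) needs only the subset's own size and the recorded pre-split cluster size, so each node can evaluate it locally. A minimal Python sketch, assuming both sizes are plain node counts:

```python
def is_unique_large(subset_size: int, original_size: int) -> bool:
    """True if a subset holds a strict majority (> 50%) of the pre-split cluster.

    Integer arithmetic avoids float rounding: size > original/2  <=>  2*size > original.
    Two disjoint subsets cannot both pass, so a subset that passes is the unique one.
    """
    return 2 * subset_size > original_size


# Four-node cluster {A, B, C, D} split into {A, B, C} and {D}:
print(is_unique_large(3, 4), is_unique_large(1, 4))  # True False
# An even 2/2 split has no unique large subset, so the first-step result stands:
print(is_unique_large(2, 4))  # False
```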

FIG. 4 depicts the module cooperation and timing relationships of the second-step decision method when, after the heartbeat line breaks, a unique large subset is found and it is not the desired primary subset. FIG. 5 depicts the module cooperation and timing relationships of the second-step decision method when, after the heartbeat line breaks, a unique large subset is found and it is identical to the desired primary subset.

Applying the second-step decision method to the four-node cluster above: the subset {A, B, C}, representing 3/4 of the computing power, is the unique large subset, so it replaces {D} as the final primary subset and becomes the new cluster that continues working. Assuming A is the unique large subset representative, D is fenced by A before it has time to fence {A, B, C}.
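The Fence race can be illustrated by replaying scheduled Fence operations in time order, with the rule that a representative that has already been fenced cannot fence anyone. This Python sketch is a logical simulation that assumes a single shared timeline; the names and the value of `T_D` are hypothetical. It reproduces the four-node example, where A fences {D} at zero delay before D's delayed Fence fires:

```python
from typing import FrozenSet, List, Tuple

# (fire_time, representative, targets) for each scheduled Fence operation
Fence = Tuple[float, str, FrozenSet[str]]


def surviving_nodes(all_nodes: FrozenSet[str], fences: List[Fence]) -> FrozenSet[str]:
    """Replay Fence operations in time order; a fenced representative never fires."""
    alive = set(all_nodes)
    for _, representative, targets in sorted(fences, key=lambda f: f[0]):
        if representative in alive:
            alive -= targets
    return frozenset(alive)


T_D = 3.0  # hypothetical first delay time
fences = [
    (T_D, "D", frozenset("ABC")),   # desired primary subset {D}: delayed Fence
    (0.0, "A", frozenset("D")),     # unique large subset {A, B, C}: zero-delay Fence
]
print(sorted(surviving_nodes(frozenset("ABCD"), fences)))  # ['A', 'B', 'C']
```

A real cluster has no shared clock; the sketch only shows why a zero-delay Fence that completes well within T_d always wins the race.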

The Fence mechanism can be a node-level Fence based on power management: a fenced node stops running the service because it loses power.

In a specific embodiment, the Fence can also be a node-level Fence based on a kernel panic: a fenced node stops running the service because its CPU stops working.

In summary, the Fence mechanism is not limited to the above two mechanisms; any technical means by which any node in the cluster can fence any other node in the cluster falls within the scope of the present invention.

In a specific embodiment, the additional information channel can be implemented with a decision disk on a shared storage medium. To obtain the primary node, the decision disk and the first-step decision submodule interact as follows:

1) When the cluster is initialized, disk space is allocated on a shared medium (such as iSCSI, AoE, or a SAN) as the decision disk. The decision disk is divided into several blocks. Each node of the cluster is assigned a node ID, incrementing from zero, and with this ID as the index, each node corresponds to a unique block (blocks are also indexed from zero).

2) In normal operation, when no split-brain has occurred, every node that can access the decision disk writes its current timestamp to its block in the decision disk through disk I/O operations. The other nodes in the cluster determine whether a node is healthy by whether its timestamp changes: a node that fails to update its timestamp for a long time is considered unhealthy.

3) Each node in the cluster is configured with the same primary node selection rule. For example, in normal operation, when no split-brain has occurred, all healthy nodes take the healthy node with the smallest index as the unique primary node; the healthy node with the largest index could be chosen instead. The present invention is not limited to this: any method that selects exactly one healthy node as the primary node falls within the scope of the present invention.

4) When a node failure or heartbeat line break occurs and the failed node is the primary node, a new primary node is selected within T2.
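The decision disk layout and the timestamp-based health check can be sketched with an in-memory buffer standing in for the shared disk. This is a minimal Python sketch under stated assumptions: the 512-byte block size, the big-endian float64 timestamp encoding, the `stale_after` timeout, and the class names are hypothetical choices, not values from the source.

```python
import struct
from typing import Dict

BLOCK_SIZE = 512   # assumed size of each node's block
TS_FORMAT = ">d"   # assumed encoding: one big-endian float64 timestamp per block


class DecisionDisk:
    """In-memory stand-in for the shared decision disk (e.g. an iSCSI LUN)."""

    def __init__(self, num_nodes: int) -> None:
        self.num_nodes = num_nodes
        self.data = bytearray(BLOCK_SIZE * num_nodes)

    def write_timestamp(self, node_id: int, ts: float) -> None:
        # Node `node_id` writes only its own block; IDs and blocks start at zero.
        struct.pack_into(TS_FORMAT, self.data, node_id * BLOCK_SIZE, ts)

    def read_timestamp(self, node_id: int) -> float:
        return struct.unpack_from(TS_FORMAT, self.data, node_id * BLOCK_SIZE)[0]


class HealthMonitor:
    """Marks a node unhealthy if its timestamp has not changed for `stale_after` seconds."""

    def __init__(self, disk: DecisionDisk, stale_after: float) -> None:
        self.disk = disk
        self.stale_after = stale_after
        self.last_value: Dict[int, float] = {}
        self.last_change: Dict[int, float] = {}

    def poll(self, now: float) -> Dict[int, bool]:
        health = {}
        for n in range(self.disk.num_nodes):
            ts = self.disk.read_timestamp(n)
            if self.last_value.get(n) != ts:
                # Record the change on the observer's own clock.
                self.last_value[n] = ts
                self.last_change[n] = now
            health[n] = now - self.last_change[n] <= self.stale_after
        return health


disk = DecisionDisk(num_nodes=3)
monitor = HealthMonitor(disk, stale_after=2.0)
for n in range(3):
    disk.write_timestamp(n, 1.0)
print(monitor.poll(now=1.0))  # {0: True, 1: True, 2: True}
disk.write_timestamp(0, 4.0)
disk.write_timestamp(1, 4.0)  # node 2 stops writing
print(monitor.poll(now=4.0))  # {0: True, 1: True, 2: False}
```

Because each observer times changes on its own clock, the writers' and readers' clocks need not agree; only the staleness timeout matters.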

In a specific embodiment, the additional information channel can also be implemented with an additional Ethernet network (not the heartbeat line). To obtain the primary node, the additional Ethernet network and the first-step decision submodule interact as follows:

1) In normal operation, when no split-brain has occurred, each node in the cluster periodically broadcasts or multicasts a KeepAlive message over the additional Ethernet network.

2) Each node in the cluster is configured with the same primary node selection rule. For example, in normal operation, when no split-brain has occurred, all healthy nodes take the healthy node with the smallest MAC address or IP address as the unique primary node. The present invention is not limited to this: any method that selects exactly one healthy node as the primary node falls within the scope of the present invention.

3) When a node failure or heartbeat line break occurs and the failed node is the primary node, a new primary node is selected within T2.
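The smallest-MAC rule in step 2) can be sketched as follows, assuming each node's health flag has already been derived from KeepAlive traffic. Comparing addresses as integers rather than strings avoids surprises from mixed upper- and lower-case spellings; the function names are hypothetical.

```python
from typing import Dict, Optional


def mac_to_int(mac: str) -> int:
    """'00:1a:2b:3c:4d:5e' -> int, so MAC addresses compare numerically."""
    return int(mac.replace(":", "").replace("-", ""), 16)


def elect_primary_by_mac(health: Dict[str, bool]) -> Optional[str]:
    """Smallest MAC address among healthy nodes (KeepAlive-derived health flags)."""
    healthy = [mac for mac, ok in health.items() if ok]
    return min(healthy, key=mac_to_int) if healthy else None


health = {
    "00:1A:2B:3C:4D:5E": True,
    "00:0a:2b:3c:4d:5f": False,   # smallest MAC, but unhealthy, so ineligible
    "00:1a:2b:3c:4d:02": True,
}
print(elect_primary_by_mac(health))  # 00:1a:2b:3c:4d:02
```

A plain string comparison would have picked `00:1A:2B:3C:4D:5E` here, because upper-case letters sort before lower-case ones. For the IP-address variant, `ipaddress.ip_address` from the Python standard library returns orderable objects and can serve directly as the `key`.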

In summary, the additional information channel is not limited to the above two implementations; any implementation through which it can be determined whether a node is healthy falls within the scope of the present invention, and the embodiments of the present invention impose no limitation on this.

In a specific embodiment, the heartbeat communication module can use, but is not limited to, the Totem multicast communication protocol.

In a particular embodiment, the service control logic module may use, but is not limited to, Pacemaker or AMF of OpenAIS.

The following a.1) to a.4) describe the decisions made by the first-step decision method after a split-brain event, using the already-established information about the current primary node:

a.1) After split-brain occurs, the subset containing the primary node is immediately judged to be the desired primary subset, and the other subsets are judged to be secondary subsets.

a.2) The split-brain decision module of each node in the desired primary subset obtains the new member list, that is, the member list of the desired primary subset, from the split-brain event message reported by the underlying heartbeat communication module.

a.3) The desired primary subset designates the node with the lowest IP address in its member list as the desired primary subset representative, which performs a delayed Fence operation on the other, secondary subsets, with delay time T_d.

a.4) After time T_d, the desired primary subset representative performs the Fence operation, and the desired primary subset is finally judged to be the primary subset. Each node of the desired primary subset reports this decision to its own service control logic module.

Steps b.1) to b.6) below form the second-step decision, which starts after the first-step decision.

b.1) Each subset compares its number of members with the number of members of the original cluster: if the subset contains more than 50% of the original cluster's members, it considers itself the only large subset.

b.2) If a subset finds that it is not the only large subset, the second-step decision ends immediately. Go to step a.4).

b.3) If a subset finds that it is the only large subset but it is also the expected primary subset, the second-step decision ends immediately. Go to step a.4).

b.4) If a subset finds that it is the only large subset and it is not the expected primary subset, the only large subset designates the node with the lowest IP address in its member list, which immediately performs a zero-delay node-level Fence operation on all nodes outside the only large subset to stop them from working. Because the expected primary subset representative has been fenced, step a.4) above is never executed.

b.5) The only large subset is finally judged to be the primary subset after the split brain. The result is reported to the service control logic module of each node of the only large subset.

b.6) Because the other subsets have stopped working and cannot access the decision disk, they appear unhealthy on the additional information channel and therefore lose the ability to compete for the primary node. After time T2, a new primary node is necessarily re-elected within the only large subset, in preparation for the next split-brain decision.
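Combining both decision steps, the outcome for a given split can be sketched as below. This is a hedged sketch under the same assumptions as before (IPv4 address strings as node identifiers, a known pre-split primary node); `only_large_subset` and `resolve_split_brain` are hypothetical names, and only the zero-delay variant of b.4) is modelled. The majority test is written as `2 * len(s) > cluster_size` so that "more than 50%" is evaluated exactly in integers.

```python
import ipaddress


def only_large_subset(subsets, cluster_size):
    """b.1): the subset holding strictly more than half of the original members."""
    for s in subsets:
        if 2 * len(s) > cluster_size:   # integer form of "more than 50%"
            return s
    return None                         # e.g. an even 2 + 2 split of 4 nodes


def resolve_split_brain(subsets, cluster_size, previous_primary, t_d):
    """Return (surviving subset, fencing node, fenced nodes, fence delay)."""
    expected = next(s for s in subsets if previous_primary in s)
    large = only_large_subset(subsets, cluster_size)
    if large is None or large == expected:
        # b.2) / b.3): second step ends; a.4) fences the others after T_d
        survivor, delay = expected, t_d
    else:
        # b.4): zero-delay node-level Fence; the expected primary subset
        # representative is fenced before T_d expires, so a.4) never runs
        survivor, delay = large, 0.0
    representative = min(survivor, key=ipaddress.ip_address)
    fenced = frozenset().union(*(s for s in subsets if s != survivor))
    return survivor, representative, fenced, delay


a = frozenset({"10.0.0.1", "10.0.0.2"})               # holds the old primary
b = frozenset({"10.0.0.3", "10.0.0.4", "10.0.0.5"})   # 3 of 5 nodes
survivor, rep, fenced, delay = resolve_split_brain([a, b], 5, "10.0.0.2", t_d=5.0)
print(rep, delay)  # prints: 10.0.0.3 0.0
```

Because two disjoint subsets cannot both hold a strict majority, `only_large_subset` returns at most one candidate regardless of the order of `subsets`.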

Embodiment 2 of the present invention will be described below with reference to the accompanying drawings.

This embodiment of the present invention provides a cluster split-brain processing method, which can be applied to a node as shown in FIG. 1 and is performed by the split-brain decision module. The flow by which this method manages and controls the cluster when a split brain occurs is shown in FIG. 6 and includes:

Step 601: When a split brain occurs in the cluster, select the subset of the cluster that is uniquely allowed to continue serving;

In this embodiment of the present invention, each node of the cluster maintains the current cluster member list, the number of members, and cluster member change notification information. Alternatively, this information can be maintained by the heartbeat communication module of FIG.

After a split brain occurs in the cluster, multiple subsets are formed. One of these subsets must be selected as the subset allowed to continue serving, and the other subsets must stop working. This step is shown in FIG. 7 and includes:

Step 6011: Select the subset containing the primary node from before the split brain as the expected primary subset;

In this step, each node knows which node was the primary node from the heartbeat-line communication between nodes before the split brain. When the split brain occurs, the subset containing that primary node is selected as the expected primary subset.

The primary node is selected in either of the following ways:

1. When the cluster is initialized, disk space on a shared medium is allocated as a decision disk, and the decision disk is partitioned so that each node in the cluster corresponds uniquely to one partition. Each node in the cluster writes the current timestamp to its partition of the decision disk through disk I/O operations.

One of the nodes that continuously update their timestamps is then selected as the primary node. A continuously updated timestamp indicates that the node is connected properly and is healthy, so any such node can be selected as the primary node. The selection rule can be configured as needed, with the same rule configured on all nodes in the cluster. "Continuously updating the timestamp" may mean that the number of timestamp updates within a time range is greater than a threshold.

If the primary node fails, the failed node is excluded and a new primary node is selected from the remaining healthy nodes.

2. Under normal conditions, when no split brain has occurred, each node in the cluster periodically broadcasts or multicasts a KeepAlive message over an additional Ethernet network.

One of the nodes that continuously send KeepAlive messages is then selected as the primary node. Continuous KeepAlive messages indicate that the node is connected properly and is healthy, so any such node can be selected as the primary node. The selection rule can be configured as needed, with the same rule configured on all nodes in the cluster. "Continuously sending KeepAlive messages" may mean that the number of KeepAlive messages sent within a time range is greater than a threshold.
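Both ways of electing the primary node reduce to counting liveness events per node within a time range and applying one shared selection rule. The sketch below is an illustration under stated assumptions: `HealthWindow` is a hypothetical name, the window length and threshold are placeholders, "lowest node identifier" (`min`) stands in for the configurable rule, and actually reading the decision disk or receiving KeepAlive packets from the network is left out; the caller feeds in what it read or received.

```python
from collections import defaultdict, deque


class HealthWindow:
    """Counts liveness events per node inside a sliding time range.

    An event is an observed timestamp change in a node's decision-disk
    partition (method 1) or a received KeepAlive message (method 2).
    """

    def __init__(self, window, threshold, rule=min):
        self.window = window        # length of the time range
        self.threshold = threshold  # a node needs strictly more events than this
        self.rule = rule            # selection rule, identical on every node
        self.events = defaultdict(deque)   # node -> event times, oldest first
        self.last_disk_value = {}          # node -> last timestamp read from disk

    def on_keepalive(self, node, now):
        self.events[node].append(now)

    def on_disk_read(self, partitions, now):
        """partitions maps each node to the timestamp currently in its partition."""
        for node, ts in partitions.items():
            previous = self.last_disk_value.get(node)
            self.last_disk_value[node] = ts
            if previous is not None and ts != previous:
                self.events[node].append(now)

    def healthy(self, now):
        for times in self.events.values():
            while times and now - times[0] > self.window:
                times.popleft()     # forget events older than the time range
        return {n for n, times in self.events.items() if len(times) > self.threshold}

    def elect_primary(self, now, exclude=()):
        candidates = self.healthy(now) - set(exclude)
        return self.rule(candidates) if candidates else None


disk = HealthWindow(window=3.0, threshold=2)
for now in range(4):                # node 0 has stopped writing its partition
    disk.on_disk_read({0: 100, 1: 100 + now, 2: 100 + now}, now)
print(disk.elect_primary(3))        # prints 1
```

A failed primary stops producing events and drops out of the healthy set once its events age out of the window, so the next call re-elects from the remaining nodes; `exclude` lets a caller remove a node explicitly.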

Step 6012: Designate a representative node from the expected primary subset, and instruct the representative node to notify all nodes of each subset other than the expected primary subset to stop working after a first delay time;

Step 6013: Select the subset whose number of nodes is greater than half the number of cluster nodes before the split brain as the only large subset;

This step is optional. If a subset contains more nodes than half the total number of nodes in the cluster before the split brain, that subset is taken as the only large subset. Because two disjoint subsets cannot both contain more than half of the nodes, at most one such subset exists.

Step 6014: Select a node from the only large subset as the only large subset representative;

This step is optional. When step 6013 determines that an only large subset exists, this step selects one node in that subset as the only large subset representative.

Step 6015: Instruct the only large subset representative that, when it determines that the only large subset is different from the expected primary subset, it shall notify all nodes of the subsets other than the only large subset to stop working after zero delay or a second delay time, where the second delay time is less than the first delay time;

This step is optional and is performed only when an only large subset exists.

Because the second delay time is less than the first delay time, the expected primary subset can issue its notification requiring the other subsets to stop working only after the check for an only large subset has been completed. This prevents the situation in which an only large subset exists but, before it can be selected, the expected primary subset notifies its nodes to stop working, which would lose processing capacity.
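Why the second delay time must be shorter than the first can be seen by replaying the two fence timers as discrete events. In this hypothetical sketch (`simulate_fencing` is not from the source), each timer is owned by a subset, a timer whose owning subset has already been fenced never fires, and a timer that fires stops every node outside its owner.

```python
import heapq


def simulate_fencing(subsets, expected, large, t_first, t_second):
    """Replay the fence timers; return (timers that fired, nodes still working)."""
    timers = [(t_first, "expected", expected)]
    if large is not None and large != expected:
        timers.append((t_second, "large", large))
    heapq.heapify(timers)                  # earliest deadline first
    alive = set().union(*subsets)
    fired = []
    while timers:
        when, owner, nodes = heapq.heappop(timers)
        if not nodes <= alive:             # owner already fenced: timer never fires
            continue
        alive &= nodes                     # every node outside the owner stops working
        fired.append((when, owner))
    return fired, frozenset(alive)


a = frozenset({1, 2})       # expected primary subset (2 of 5 nodes)
b = frozenset({3, 4, 5})    # only large subset (3 of 5 nodes)
print(simulate_fencing([a, b], a, b, t_first=5.0, t_second=0.0))
```

With `t_second < t_first` the only large subset survives; swapping the delays lets the smaller expected primary subset fence the majority first, which is the loss of processing capacity described above.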

Step 6016: From the expected primary subset and the only large subset, select the subset uniquely allowed to continue serving;

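This step involves the following situations:

When no only large subset exists, the expected primary subset is selected as the subset uniquely allowed to continue serving;

When the expected primary subset and the only large subset are the same subset, that subset is the subset uniquely allowed to continue serving;

When the expected primary subset and the only large subset are different subsets, the only large subset is the subset uniquely allowed to continue serving.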

In addition, when a split brain is determined to have occurred, the communication between the node's underlying heartbeat communication and its upper-layer service control logic must also be interrupted, and restored only when the first time length is reached. This ensures that the upper-layer service control logic does not respond to the cluster split event during the first time length, which gains time to complete, between the underlying layer and the upper layer, the selection of the subset uniquely allowed to continue serving.

The selection of the subset uniquely allowed to continue serving is completed while the communication between the underlying heartbeat communication and the upper-layer service control logic is interrupted. When the only large subset becomes the subset uniquely allowed to continue serving, a new primary node must also be selected. Optionally, a node is elected from the only large subset as the new primary node, and this election is completed within a second time length measured from the time the split brain is determined, where the second time length is less than the first time length. In this way, the new primary node has already been elected when communication between the underlying heartbeat communication and the upper-layer service control logic is restored, and the upper-layer service control logic obtains the new primary node information directly. This avoids the management confusion that would result from each node determining its own running state.
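The interplay of the first and second time lengths can be sketched as a small relay between the two layers. This is a minimal illustration with hypothetical names (`SplitBrainGate`, `ServiceControlStub`) and a caller-supplied clock; it assumes that the decision result, including the new primary node, is handed to the relay as soon as it is known.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceControlStub:
    """Stand-in for the upper-layer service control logic; records what it receives."""
    received: list = field(default_factory=list)

    def notify(self, event):
        self.received.append(event)


class SplitBrainGate:
    """Hypothetical relay between the heartbeat layer and service control logic."""

    def __init__(self, upper, first_time_length, second_time_length):
        if not second_time_length < first_time_length:
            raise ValueError("second time length must be shorter than the first")
        self.upper = upper
        self.t1 = first_time_length
        self.t2 = second_time_length
        self.detected_at = None   # None: communication with the upper layer is open
        self.result = None

    def on_heartbeat_lost(self, now):
        # interrupt communication with the upper layer for the first time length
        self.detected_at = now
        self.result = None

    def on_membership_event(self, event):
        # raw membership events reach the upper layer only while not interrupted
        if self.detected_at is None:
            self.upper.notify(event)

    def on_decision(self, now, survivor, new_primary):
        # the surviving subset and its new primary must be known within T2
        if self.detected_at is None:
            raise RuntimeError("no split brain in progress")
        if now - self.detected_at > self.t2:
            raise RuntimeError("primary election exceeded the second time length")
        self.result = {"survivor": survivor, "primary": new_primary}

    def tick(self, now):
        # restore communication once the first time length has elapsed and
        # hand over the final decision (nothing is sent if none was reached)
        if self.detected_at is not None and now - self.detected_at >= self.t1:
            self.detected_at = None
            if self.result is not None:
                self.upper.notify(self.result)


upper = ServiceControlStub()
gate = SplitBrainGate(upper, first_time_length=10.0, second_time_length=6.0)
gate.on_heartbeat_lost(now=0.0)
gate.on_membership_event({"members": ["10.0.0.3"]})   # withheld from the upper layer
gate.on_decision(4.0, frozenset({"10.0.0.3"}), "10.0.0.3")
gate.tick(10.0)                                       # upper layer now sees the result
```

The upper layer therefore never sees the intermediate split event, only the final surviving subset and its new primary node.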

Step 602: Control the nodes in the subsets other than the subset uniquely allowed to continue serving to stop working;

In this step, the nodes in the subset allowed to continue serving may notify the nodes in the other subsets to stop working.

Embodiment 3 of the present invention will be described below with reference to the accompanying drawings.

This embodiment of the present invention provides a cluster split-brain processing device, whose structure is shown in FIG. 8 and includes:

The continuation service subset selection module 801 is configured to select, when a split brain occurs in the cluster, the subset of the cluster uniquely allowed to continue serving;

The node downtime control module 802 is configured to control the nodes in the subsets other than the subset uniquely allowed to continue serving to stop working.

Optionally, the continuation service subset selection module 801 is structured as shown in FIG. 9 and includes:

The expected primary subset selection unit 8011 is configured to select the subset containing the primary node from before the split brain as the expected primary subset;

The only large subset selection unit 8012 is configured to select the subset whose number of nodes is greater than half the number of cluster nodes before the split brain as the only large subset;

The continuation service subset selection unit 8013 is configured to select, from the expected primary subset and the only large subset, the subset uniquely allowed to continue serving.

Optionally, the continuation service subset selection module 801 further includes:

The representative node selection unit 8014 is configured to designate a representative node from the expected primary subset, and to instruct the representative node to notify all nodes of the subsets other than the expected primary subset to stop working after the first delay time.

Optionally, the continuation service subset selection unit 8013 is structured as shown in FIG. 10 and includes:

A first selection sub-unit 1001, configured to select the expected primary subset as the subset uniquely allowed to continue serving when no only large subset exists;

A second selection sub-unit 1002, configured to take the subset as the subset uniquely allowed to continue serving when the expected primary subset and the only large subset are the same subset;

A third selection sub-unit 1003, configured to take the only large subset as the subset uniquely allowed to continue serving when the expected primary subset and the only large subset are different subsets.

The only large subset representative selection unit 8015 is configured to select a node from the only large subset as the only large subset representative, and to instruct the representative, when it determines that the only large subset and the expected primary subset are different subsets, to notify all nodes of the subsets other than the only large subset to stop working after zero delay or the second delay time, where the second delay time is less than the first delay time.

Optionally, the device further includes:

The internal communication management module 803 is configured to: when the node detects that heartbeat communication is interrupted, interrupt the communication between the node's underlying heartbeat communication and the upper-layer service control logic; and, when the first time length is reached, determine that a split brain has occurred and restore the communication between the underlying heartbeat communication and the upper-layer service control logic.

The primary node election module 804 is configured to elect a node from the only large subset as the new primary node, where the election of the new primary node is completed within the second time length measured from the time the split brain is determined, and the second time length is less than the first time length.

Optionally, the device further includes:

The storage module 805 is configured to maintain the current cluster member list, the number of members, and the cluster member change notification information.
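A storage module of this kind can be sketched as a member set with change listeners. `MembershipStore` and its callback signature are hypothetical; the sketch passes the member count from before the change to the listeners because the majority test in step 6013 needs the pre-split cluster size.

```python
from typing import Callable, Iterable, List

Listener = Callable[[frozenset, frozenset, int], None]


class MembershipStore:
    """Hypothetical sketch of storage module 805."""

    def __init__(self, members: Iterable[str]):
        self._members = set(members)
        self._listeners: List[Listener] = []

    @property
    def members(self) -> frozenset:
        return frozenset(self._members)

    @property
    def count(self) -> int:
        return len(self._members)

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def update(self, new_members: Iterable[str]) -> None:
        """Replace the member list and notify listeners of joins and departures."""
        new = set(new_members)
        joined = frozenset(new - self._members)
        left = frozenset(self._members - new)
        previous_count = len(self._members)   # pre-split size for the majority test
        self._members = new
        if joined or left:
            for listener in self._listeners:
                listener(joined, left, previous_count)


store = MembershipStore(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
events = []
store.subscribe(lambda joined, left, prev: events.append((joined, left, prev)))
store.update(["10.0.0.2", "10.0.0.3"])   # 10.0.0.1 is no longer reachable
```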

The cluster split-brain processing device can be integrated into the nodes of the cluster, where it sits between the underlying heartbeat communication and the upper-layer service control logic and implements the cluster split-brain processing method provided by the embodiments of the present invention.

The embodiments of the present invention provide a cluster split-brain processing method and apparatus. When a split brain occurs in a cluster, the subset of the cluster uniquely allowed to continue serving is selected, and the nodes in all other subsets are controlled to stop working. This achieves orderly management of the cluster when a split brain occurs and solves the problem of controlling the cluster after the split.

Those of ordinary skill in the art will appreciate that all or some of the steps of the above embodiments can be implemented by a computer program flow. The computer program can be stored in a computer-readable storage medium and executed on a corresponding hardware platform (for example, a system, equipment, apparatus, or device), and when executed it performs one or a combination of the steps of the method embodiments.

Alternatively, all or some of the steps of the above embodiments may be implemented by integrated circuits. These steps may be fabricated as separate integrated circuit modules, or multiple modules or steps among them may be fabricated into a single integrated circuit module. The devices/functional modules/functional units in the above embodiments may be implemented by general-purpose computing devices, which may be centralized on a single computing device or distributed over a network of multiple computing devices.

When the devices/functional modules/functional units in the above embodiments are implemented in the form of software functional modules and sold or used as stand-alone products, they can be stored in a computer-readable storage medium. The computer-readable storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.

The scope of the invention should be determined by the scope of the claims.

Industrial applicability

The embodiments of the present invention achieve orderly management of the cluster when a split brain occurs and solve the problem of controlling the cluster after a split brain.

Claims

1. A cluster split-brain processing method, comprising:

when a split brain occurs in a cluster and the cluster splits into a plurality of subsets, selecting, from the plurality of subsets, a subset uniquely allowed to continue serving; and

controlling the nodes in the subsets other than the subset uniquely allowed to continue serving to stop working.
2. The cluster split-brain processing method according to claim 1, wherein, when a split brain occurs in the cluster and the cluster splits into a plurality of subsets, selecting from the plurality of subsets the subset uniquely allowed to continue serving comprises:

selecting the subset containing the primary node from before the split brain as an expected primary subset;

selecting a subset whose number of nodes is greater than half the number of cluster nodes before the split brain as an only large subset; and

selecting, from the expected primary subset and the only large subset, the subset uniquely allowed to continue serving.
3. The cluster split-brain processing method according to claim 2, further comprising:

when the cluster is initialized, allocating disk space on a shared medium as a decision disk and partitioning the decision disk, each node in the cluster corresponding uniquely to one partition of the decision disk;

writing, by each node in the cluster, a current timestamp to its corresponding partition in the decision disk through disk input/output (I/O) operations; and

selecting, as the primary node, one of the nodes whose number of timestamp updates within a time range is greater than a threshold.
4. The cluster split-brain processing method according to claim 2, further comprising:

periodically broadcasting or multicasting, by each node in the cluster, a KeepAlive message over an additional Ethernet network under normal conditions when no split brain has occurred; and

selecting, as the primary node, one of the nodes whose number of KeepAlive messages sent within a time range is greater than a threshold.
5. The cluster split-brain processing method according to claim 2, wherein, after the step of selecting the subset containing the primary node from before the split brain as the expected primary subset, the method further comprises:

designating a representative node from the expected primary subset, and instructing the representative node to notify all nodes of each subset other than the expected primary subset to stop working after a first delay time.
6. The cluster split-brain processing method according to claim 5, wherein selecting, from the expected primary subset and the only large subset, the subset uniquely allowed to continue serving comprises:

when no only large subset exists, selecting the expected primary subset as the subset uniquely allowed to continue serving;

when the expected primary subset and the only large subset are the same subset, taking that subset as the subset uniquely allowed to continue serving; and

when the expected primary subset and the only large subset are different subsets, taking the only large subset as the subset uniquely allowed to continue serving.
7. The cluster split-brain processing method according to claim 6, wherein, after the step of selecting the subset whose number of nodes is greater than half the number of cluster nodes before the split brain as the only large subset, the method further comprises:

selecting a node from the only large subset as an only large subset representative; and

instructing the only large subset representative, when it determines that the only large subset is different from the expected primary subset, to notify all nodes of the subsets other than the only large subset to stop working after zero delay or a second delay time, the second delay time being less than the first delay time.
8. The cluster split-brain processing method according to claim 6, further comprising:

when a node detects that heartbeat communication is interrupted, interrupting the communication between the node's underlying heartbeat communication and upper-layer service control logic, and, when a first time length is reached, determining that a split brain has occurred and restoring the communication between the underlying heartbeat communication and the upper-layer service control logic.
9. The cluster split-brain processing method according to claim 8, wherein, when the only large subset is taken as the subset uniquely allowed to continue serving, the method further comprises:

electing a node from the only large subset as a new primary node, wherein the election of the new primary node is completed within a second time length measured from the time the split brain is determined, the second time length being less than the first time length.
10. The cluster split-brain processing method according to claim 1, further comprising:

maintaining, at each node of the cluster, the current cluster member list, the number of members, and cluster member change notification information.
11. A cluster split-brain processing device, comprising:

a continuation service subset selection module, configured to: when a split brain occurs in a cluster and the cluster splits into a plurality of subsets, select from the plurality of subsets a subset uniquely allowed to continue serving; and

a node downtime control module, configured to: control the nodes in the subsets other than the subset uniquely allowed to continue serving to stop working.
12. The cluster split-brain processing device according to claim 11, wherein the continuation service subset selection module comprises:

an expected primary subset selection unit, configured to: select the subset containing the primary node from before the split brain as an expected primary subset;

an only large subset selection unit, configured to: select a subset whose number of nodes is greater than half the number of cluster nodes before the split brain as an only large subset; and

a continuation service subset selection unit, configured to: select, from the expected primary subset and the only large subset, the subset uniquely allowed to continue serving.
13. The cluster split-brain processing device according to claim 12, wherein the continuation service subset selection module further comprises:

a representative node selection unit, configured to: designate a representative node from the expected primary subset, and instruct the representative node to notify all nodes of each subset other than the expected primary subset to stop working after a first delay time.
14. The cluster split-brain processing device according to claim 13, wherein the continuation service subset selection unit comprises:

a first selection sub-unit, configured to: when no only large subset exists, select the expected primary subset as the subset uniquely allowed to continue serving;

a second selection sub-unit, configured to: when the expected primary subset and the only large subset are the same subset, take that subset as the subset uniquely allowed to continue serving; and

a third selection sub-unit, configured to: when the expected primary subset and the only large subset are different subsets, take the only large subset as the subset uniquely allowed to continue serving.
15. The cluster split-brain processing device according to claim 14, wherein the continuation service subset selection module further comprises:

an only large subset representative selection unit, configured to: select a node from the only large subset as an only large subset representative, and instruct the only large subset representative, when it determines that the only large subset and the expected primary subset are different subsets, to notify all nodes of the subsets other than the only large subset to stop working after zero delay or a second delay time, the second delay time being less than the first delay time.
16. The cluster split-brain processing device according to claim 14, further comprising:

an internal communication management module, configured to: when the node detects that heartbeat communication is interrupted, interrupt the communication between the node's underlying heartbeat communication and upper-layer service control logic, and, when a first time length is reached, determine that a split brain has occurred and restore the communication between the underlying heartbeat communication and the upper-layer service control logic.
17. The cluster split-brain processing device according to claim 16, wherein, when the only large subset is the subset uniquely allowed to continue serving, the device further comprises:

a primary node election module, configured to: elect a node from the only large subset as a new primary node, wherein the election of the new primary node is completed within a second time length measured from the time the split brain is determined, the second time length being less than the first time length.
18. The cluster split-brain processing device according to claim 11, further comprising:

a storage module, configured to: maintain the current cluster member list, the number of members, and cluster member change notification information.
19. A computer-readable storage medium storing program instructions that, when executed, implement the method according to any one of claims 1 to 10.