WO2017215430A1

WO2017215430A1 - Node management method in cluster and node device

Info

Publication number: WO2017215430A1
Application number: PCT/CN2017/085935
Authority: WO
Inventors: Luo Xujian (骆旭剑)
Original assignee: ZTE Corporation (中兴通讯股份有限公司)
Priority date: 2016-06-14
Filing date: 2017-05-25
Publication date: 2017-12-21
Also published as: CN107508694A; CN107508694B

Abstract

Disclosed are a node management method in a cluster and a node device. The method comprises: when a first node detects an inter-node heartbeat connection abnormality, determining, according to a first management policy, that the first node is the preliminary management node of the subgroup of the cluster in which the first node is located; and determining, based on a second management policy, whether the first node can be converted from the preliminary management node into the management node, and, when it is determined that it can, performing, by the first node as the management node, inter-node reconfiguration and inter-node task scheduling of cluster resources.

Description

Node management method and node device in a cluster

Technical field

The present application relates to, but is not limited to, the field of communication technologies, and in particular, to a node management method and a node device in a cluster.

Background

To keep the overall service of a cluster as available as possible, when a node in a high-availability cluster fails, the cluster system should respond quickly and assign the tasks of the system to other working nodes in the cluster, and the shared resources of the faulty node (such as IP addresses and disk arrays) should also be taken over by other nodes.

In general, nodes in a high-availability cluster use heartbeats to detect each other. However, when the heartbeat fails, a split-brain problem may occur. Split-brain can cause loss of data integrity and seriously affect services, and a high-availability cluster inevitably faces this problem. Current approaches to split-brain include the following:

1) Add redundant heartbeat links; however, this only reduces the likelihood of split-brain and cannot avoid it;

2) Monitor and raise alarms on split-brain (for example, by e-mail or SMS), so that when the problem occurs an administrator can intervene to arbitrate and reduce losses; however, this requires manual intervention;

3) Enable a disk lock: the party providing the service locks the shared disk, so that when split-brain occurs the other party cannot seize the shared disk resource. However, if the party occupying the shared disk does not actively unlock it, the other party can never obtain the shared disk; and if the node occupying the shared disk suddenly hangs or crashes, it cannot execute the unlock command, so the backup node cannot take over the shared resources and application services;

4) Introduce a third-party arbitration mechanism to decide which side wins the resources. However, this requires introducing a third party, and if the third party fails, split-brain still cannot be resolved.

Summary of the invention

The following is an overview of the subject matter described in detail in this document. This summary is not intended to limit the scope of protection of the claims.

The embodiments of the present invention provide a node management method and a node device in a cluster, which can effectively solve the split-brain problem using the existing resources of the cluster, without introducing a third-party device, and ensure the high availability and reliability of the cluster.

An embodiment of the present invention provides a node management method in a cluster, where the method is applied to a first node, and the method includes:

When an abnormality of the heartbeat connection between nodes is detected, determining, according to a first management policy, that the first node is the preliminary management node of the subgroup in which the first node is located;

Determining, based on a second management policy, whether the first node can be converted from the preliminary management node into the management node, and, when it is determined that it can, performing, by the first node as the management node, inter-node reconfiguration and inter-node task scheduling of cluster resources.

In an embodiment, before the abnormality of the heartbeat connection between nodes is detected, the method further includes:

Determining, according to the third management policy, that the second node is a management node, so that the second node performs resource configuration and task scheduling.

In an embodiment, the determining, based on the second management policy, whether the first node can be converted from the preliminary management node into a management node includes:

Determining, based on a preset network detection manner, whether the first node has an external network connection; when the determination is yes, determining that the first node can be converted into the management node; when the determination is no, determining that the first node cannot be converted into the management node.

In an implementation manner, the shared storage device in the cluster supports multi-node common access;

The determining, based on the second management policy, whether the first node can be converted from the preliminary management node into a management node includes:

When it is determined that the shared storage device is not occupied, creating a placeholder file in a specific directory of the shared storage device and, after a preset time, detecting whether a placeholder file created by another preliminary management node exists in the specific directory; if not, determining that the first node can be converted into the management node; if so, comparing the number of nodes of the subgroup in which the first node is located with the number of nodes of the subgroup in which the other preliminary management node is located, and determining, based on the comparison result, whether the first node can be converted from the preliminary management node into the management node.

In an embodiment, the comparing of the number of nodes of the subgroup in which the first node is located with the number of nodes of the subgroups in which the other preliminary management nodes are located, and the determining, based on the comparison result, whether the first node can be converted from the preliminary management node into a management node, include:

When it is determined that the number of nodes of the subgroup in which the first node is located is greater than the number of nodes of each subgroup in which the other preliminary management nodes are located, determining that the first node can be converted into the management node;

When it is determined that the subgroup in which the first node is located has the largest number of nodes and that another subgroup has the same number of nodes, determining whether the node number of the first node is smaller than the node number of the preliminary management node of that subgroup; when the determination is yes, determining that the first node can be converted into the management node; when the determination is no, determining that the first node cannot be converted into the management node.

In an embodiment, the shared storage device in the cluster supports only single-node exclusive access; the determining, based on the second management policy, whether the first node can be converted from the preliminary management node into a management node includes:

Determining an access time of the first node to a first partition of the shared storage device, mounting the first partition when the access time arrives, and determining whether a placeholder file exists in the first partition; when it is determined that no placeholder file exists in the first partition, determining that the first node can be converted into the management node; when it is determined that a placeholder file exists in the first partition, determining that the first node cannot be converted into the management node.

In an embodiment, the determining, according to the first management policy, that the first node is the preliminary management node of its subgroup includes:

When it is determined that the first node is the node with the smallest node number in the subgroup in which it is located, determining that the first node is the preliminary management node of that subgroup.

An embodiment of the present invention further provides a node device, where the node device includes a determination module and a judgment module;

The determination module is configured to: when an abnormal heartbeat connection between nodes is detected, determine, according to the first management policy, that the first node is the preliminary management node of the subgroup in which it is located;

The judgment module is configured to determine, based on the second management policy, whether the first node can be converted from the preliminary management node into the management node, and, when it is determined that it can, to perform, as the management node, inter-node reconfiguration and inter-node task scheduling of cluster resources.

In an embodiment, the determination module is further configured to determine, according to the third management policy, that the second node is the management node, so that the second node performs resource configuration and task scheduling.

In an embodiment, the judgment module is configured to determine, based on a preset network detection manner, whether the first node has an external network connection; if so, to determine that the first node can be converted into the management node; if not, to determine that the first node cannot be converted into the management node.

In an embodiment, the judgment module is configured to: when the shared storage device is not occupied, create a placeholder file in a specific directory of the shared storage device and, after a preset time, detect whether a placeholder file created by another preliminary management node exists in the specific directory; if not, determine that the first node can be converted into the management node; if so, compare the number of nodes of the subgroup in which the first node is located with the number of nodes of the subgroup in which the other preliminary management node is located, and determine, based on the comparison result, whether the first node can be converted from the preliminary management node into the management node.

In an embodiment, the judgment module is configured to: when it is determined that the number of nodes of the subgroup in which the first node is located is greater than the number of nodes of the subgroup in which the other preliminary management node is located, determine that the first node can be converted into the management node.

In an embodiment, the judgment module is configured to determine an access time of the first node to a first partition of the shared storage device, mount the first partition when the access time arrives, and determine whether a placeholder file exists in the first partition; when no placeholder file exists in the first partition, determine that the first node can be converted into the management node; when a placeholder file exists in the first partition, determine that the first node cannot be converted into the management node.

In an embodiment, the determination module is configured to: when it is determined that the first node is the node with the smallest node number in the subgroup in which it is located, determine that the first node is the preliminary management node of that subgroup.

An embodiment of the present invention further provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the above node management method in a cluster.

With the node management method and node device in a cluster according to the embodiments of the present invention, when a first node in a high-availability cluster detects that the heartbeat connection between nodes is abnormal, the first node determines, according to a preset first management policy, that it is the preliminary management node of its subgroup; it then determines, according to a preset second management policy, whether it can be converted from the preliminary management node into the management node, and when the determination is yes, it performs inter-node reconfiguration and inter-node task scheduling as the management node. Thus, when an abnormal heartbeat connection splits the cluster into two or more subgroups, the first node, after determining that it is the preliminary management node of its subgroup, further determines whether it can become the management node of the cluster, and only if so performs inter-node reconfiguration and inter-node task scheduling of the cluster resources. This effectively avoids split-brain and ensures the high availability and reliability of the cluster. Moreover, because the first node is itself a node in the cluster, no third-party management device needs to be introduced, which makes the solution simple to implement.

Other aspects will be apparent upon reading and understanding the drawings and detailed description.

Brief description of the drawings

FIG. 1 is a first schematic flowchart of a node management method in a cluster according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a cluster in an embodiment of the present invention;

FIG. 3 is a second schematic flowchart of a node management method in a cluster according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of splitting a cluster into multiple subgroups and determining preliminary management nodes according to an embodiment of the present invention;

FIG. 5 is a third schematic flowchart of a node management method in a cluster according to an embodiment of the present invention;

FIG. 6 is a fourth schematic flowchart of a node management method in a cluster according to an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a node device according to an embodiment of the present invention.

Detailed description

The inventor found that a high-availability cluster inevitably faces the split-brain problem. A typical split-brain scenario is as follows:

Node A and node B in the cluster confirm each other's existence through heartbeat detection; when heartbeat detection can no longer confirm that the other party exists, each takes over the corresponding shared resources. If the heartbeat between node A and node B is suddenly lost (for example, because of a network disconnection) while both nodes are actually still active, node A takes over the resources of node B and, at the same time, node B takes over the resources of node A. This is split-brain.

When the heartbeat network of the cluster becomes abnormal, the cluster may be split into multiple node groups, that is, subgroups, each of which takes over the services and accesses file system resources (for example, writes to the file system concurrently), causing data corruption.

Split-brain can cause loss of data integrity and seriously affect services.

Split-brain causes loss of data integrity because, during split-brain, nodes in the cluster access the same shared resource at the same time without any lock mechanism controlling access to the data, so data integrity problems or other errors may occur.

Split-brain also seriously affects services; for example, node A and node B may contend for the same IP address, so that network data cannot be transmitted.

Split-brain in a cluster can therefore have serious negative consequences. In the embodiments of the present invention, if the first node in the cluster detects that the heartbeat connection between nodes is abnormal, it determines, according to a preset first management policy, that it is the preliminary management node of its subgroup, and then determines, based on a preset second management policy, whether it can be converted from the preliminary management node into the management node, so that when the determination is yes it performs inter-node reconfiguration and inter-node task scheduling as the management node. Thus, when an abnormal heartbeat connection splits the cluster into two or more subgroups, the first node, after determining that it is the preliminary management node of its subgroup, further determines whether it can become the management node of the cluster, and only if so performs inter-node reconfiguration and inter-node task scheduling of the cluster resources. This effectively avoids split-brain and ensures the high availability and reliability of the cluster; and since the first node is a node within the cluster, no third-party management device needs to be introduced, which makes the solution simple to implement.

The details will be further described below in conjunction with the drawings and embodiments.

Embodiment 1

FIG. 1 is a schematic flowchart of a node management method in a cluster according to an embodiment of the present invention. The method is applied to a first node, which is a node in the cluster. FIG. 2 is a schematic structural diagram of a cluster in an embodiment of the present invention. As shown in FIG. 1 and FIG. 2, the node management method in the cluster in this embodiment includes:

Step 101: When an abnormality of the heartbeat connection between nodes is detected, determine, according to a preset first management policy, that the first node is the preliminary management node of its subgroup.

Here, when the first node in the cluster determines through heartbeat detection that the heartbeat connection between nodes is abnormal, for example because a node in the current cluster has failed, the cluster is split into two or more subgroups, and the first node belongs to one of them.

The first node determines, according to the preset first management policy, that it is the preliminary management node of its subgroup. Because the first management policy applies to all nodes in the cluster, this can be understood as all nodes in the first node's subgroup electing the first node as the preliminary management node according to the first management policy; at the same time, the nodes in the other subgroups produced by the heartbeat abnormality likewise elect the preliminary management node of each subgroup according to the first management policy.

In implementation, the preset first management policy may be any preset election rule; for example, the node with the smallest (or largest) node number is used as the preliminary management node (each node in the cluster has a unique number).
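The smallest-number election rule can be sketched in Python as follows. This is a minimal illustration, assuming that each node already knows the node numbers of the members of its own subgroup; the function names are hypothetical and not taken from the source.

```python
from typing import Iterable


def elect_preliminary_manager(subgroup: Iterable[int], prefer_smallest: bool = True) -> int:
    """Return the node number of the preliminary management node of a subgroup.

    Every node evaluates the same deterministic rule over the same membership,
    so all members of a subgroup agree on the result without exchanging messages.
    """
    members = list(subgroup)
    if not members:
        raise ValueError("a subgroup must contain at least one node")
    return min(members) if prefer_smallest else max(members)


def is_preliminary_manager(node_id: int, subgroup: Iterable[int]) -> bool:
    """True if node_id wins the smallest-number election in its own subgroup."""
    return elect_preliminary_manager(subgroup) == node_id


if __name__ == "__main__":
    # Two subgroups left after a hypothetical split of nodes 1..5.
    print(elect_preliminary_manager([2, 1]))                             # 1
    print(elect_preliminary_manager([5, 3, 4]))                          # 3
    print(elect_preliminary_manager([5, 3, 4], prefer_smallest=False))  # 5
```

Because the rule depends only on the membership list, no extra election messages are needed inside a subgroup; choosing the largest number instead is the alternative the text mentions.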

Based on the foregoing embodiment, in practical applications, before the first node detects the abnormality of the heartbeat connection between nodes, the method may further include:

Determining, according to a third management policy, that a second node is the management node, so that the second node performs resource configuration and task scheduling. The third management policy may be the same as or different from the first management policy. When they are the same, while the cluster is normal, the nodes in the cluster elect the second node as the management node according to the preset first management policy, and the second node performs resource configuration and task scheduling; for example, the node with the smallest node number in the cluster may be determined as the second node, which then becomes the management node. The management node may also create a specific partition, directory, or file on the shared storage device for negotiation between nodes when the cluster is abnormal.

It should be noted that, in any case, there is only one management node in the cluster at any given time. Therefore, when a heartbeat abnormality occurs in the cluster, the management node determined while the cluster was normal no longer serves as the management node.

Step 102: Determine, based on a preset second management policy, whether the first node can be converted from the preliminary management node into the management node, and when the determination is yes, perform inter-node reconfiguration and inter-node task scheduling of the cluster resources as the management node.

In actual implementation, each subgroup produced by the inter-node heartbeat abnormality determines its own preliminary management node based on the first management policy. However, to avoid split-brain, only one subgroup may continue to work. Therefore, one of the preliminary management nodes is further determined as the management node to perform inter-node reconfiguration and inter-node task scheduling of the cluster resources; correspondingly, the subgroup of that management node becomes the working subgroup, and the other subgroups stop working.

The first node determines, based on the preset second management policy, whether it can become the management node, and when the determination is yes, performs inter-node reconfiguration and inter-node task scheduling as the management node.

Here, the second management policy is preset according to the actual situation of the cluster. In practical applications, if the cluster has an externally connected network, the second management policy may be set as follows: the first node determines whether it has an external network connection, that is, whether it is connected to an external network entity; if so, it determines that it can be converted into the management node; if not, it determines that it cannot be converted into the management node;

When the cluster has no externally connected network and the shared storage device of the cluster supports multi-node access, the second management policy may be set as follows: the first node determines whether the shared storage device is occupied; when it is not occupied, the first node creates a placeholder file in a specific directory of the shared storage device and, after a preset time, detects whether a placeholder file created by another preliminary management node exists in that directory. If not, the first node can be converted into the management node; if so, the first node compares the number of nodes of its subgroup with the number of nodes of the subgroup of the other preliminary management node, and determines, based on the comparison result, whether it can be converted from the preliminary management node into the management node;

When the cluster has no externally connected network and the shared storage device of the cluster supports only single-node exclusive access, the second management policy may be set as follows: the first node determines its own access time to the first partition of the shared storage device, mounts the first partition when the access time arrives, and determines whether a placeholder file exists in the first partition. If no placeholder file exists in the first partition, the first node can be converted into the management node; if one exists, it cannot.
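The following Python sketch shows how the three variants of the second management policy could be selected, in the order the text describes them, together with the single-node exclusive variant. It is a hypothetical illustration only: the time-slot rule (node N accesses the partition N slots after the split), the file name `manager.placeholder`, and the assumption that the winning node leaves a placeholder for later slots are not specified in the source. Mounting the partition is also omitted; `mount_point` is assumed to be the already-mounted first partition.

```python
import os
import time
from enum import Enum
from typing import Callable


class SecondPolicy(Enum):
    EXTERNAL_NETWORK = "external network check"
    SHARED_MULTI_NODE = "placeholder files on multi-node shared storage"
    SHARED_EXCLUSIVE = "time-slotted access to an exclusive partition"


def choose_second_policy(has_external_network: bool, multi_node_access: bool) -> SecondPolicy:
    """Pick a variant; assumes shared storage exists when there is no external network."""
    if has_external_network:
        return SecondPolicy.EXTERNAL_NETWORK
    if multi_node_access:
        return SecondPolicy.SHARED_MULTI_NODE
    return SecondPolicy.SHARED_EXCLUSIVE


def claim_exclusive_partition(node_id: int, mount_point: str, epoch: float,
                              slot_seconds: float,
                              sleep: Callable[[float], None] = time.sleep,
                              now: Callable[[], float] = time.time) -> bool:
    """Single-node exclusive variant (hypothetical slot rule and file name)."""
    # Assumption: node N may access the first partition N slots after the split.
    wait = epoch + node_id * slot_seconds - now()
    if wait > 0:
        sleep(wait)
    # A real node would mount the first partition here. Because access is
    # exclusive, no other node can race between the check and the write below.
    placeholder = os.path.join(mount_point, "manager.placeholder")
    if os.path.exists(placeholder):
        return False  # another preliminary management node claimed it first
    # Assumption: the winner leaves a placeholder so later slots see it.
    with open(placeholder, "w") as f:
        f.write(str(node_id))
    return True
```

With this hypothetical slot rule, smaller node numbers get earlier slots, which mirrors the smallest-number tie-break used elsewhere in the document.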

According to the foregoing embodiment, when an inter-node heartbeat abnormality occurs in the cluster, the first node, after determining that it is the preliminary management node of its subgroup, further determines whether it can become the management node of the cluster, and if so performs inter-node reconfiguration and inter-node task scheduling of the cluster resources. This keeps the cluster system load balanced, effectively avoids split-brain, and ensures the high availability and reliability of the cluster; since the first node is a node in the cluster, no third-party management device needs to be introduced, and the implementation is simple.
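Steps 101 and 102 can be tied together in a short Python sketch. The policies are passed in as callables because the text leaves both configurable; the type aliases and the role strings `"member"`, `"manager"` and `"stop"` are hypothetical names chosen for illustration.

```python
from typing import Callable, Sequence

# Hypothetical signatures: the source leaves both policies configurable.
FirstPolicyFn = Callable[[Sequence[int]], int]          # subgroup -> elected node number
SecondPolicyFn = Callable[[int, Sequence[int]], bool]   # (node, subgroup) -> may manage?


def on_heartbeat_abnormal(node_id: int, subgroup: Sequence[int],
                          first_policy: FirstPolicyFn,
                          second_policy: SecondPolicyFn) -> str:
    """Return the role this node takes after a heartbeat abnormality."""
    # Step 101: every node applies the same first policy to its own subgroup.
    if first_policy(subgroup) != node_id:
        return "member"  # defers to its subgroup's preliminary management node
    # Step 102: only a preliminary management node consults the second policy.
    if second_policy(node_id, subgroup):
        return "manager"  # reconfigures resources and schedules tasks
    return "stop"  # its subgroup is not the working subgroup
```

For example, with `min` as the first policy and a placeholder second policy such as `lambda n, g: len(g) >= 3` (purely illustrative), node 3 in subgroup `[3, 4, 5]` becomes the manager, while node 1 in subgroup `[1, 2]` stops.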

Embodiment 2

FIG. 3 is a second schematic flowchart of a node management method in a cluster according to an embodiment of the present invention. In this embodiment, the cluster has an external network connection. As shown in FIG. 3, the node management method in the cluster in this embodiment includes:

Step 301: The first node determines, according to the preset first management policy, that the second node is a management node, so that the second node performs resource configuration and task scheduling.

In this embodiment, since the cluster has an externally connected network, the externally connected network plane may be preset as a redundant heartbeat plane. If there are multiple network planes, a critical network plane may be selected as the redundant heartbeat plane; a critical network plane is one whose failure would affect data processing.

If the cluster has a shared storage device, then after this step, that is, after the second node is determined as the management node, the second node may create a specific partition, directory, or file on the shared storage device for negotiation between nodes when the cluster is abnormal.

In this embodiment, the first node determines that the node with the smallest node number in the cluster is the management node.

Step 302: When the first node detects that the heartbeat connection between nodes is abnormal, it determines, according to the first management policy, that it is the preliminary management node of its subgroup.

Here, when the heartbeat connection in the cluster is abnormal, the cluster is split into two or more subgroups, the first node belongs to one of them, and the nodes of each subgroup determine the preliminary management node of their subgroup according to the first management policy. FIG. 4 is a schematic diagram of splitting the cluster into multiple subgroups and determining the preliminary management nodes according to an embodiment of the present invention; in it, the heartbeats between A and C, between A and D, and between A and E are abnormal, so that node A and node B form one subgroup and the remaining nodes form another.
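One way to model how subgroups arise is to treat them as the sets of nodes that can still reach each other over working heartbeat links, that is, the connected components of the heartbeat graph. The source does not state this rule, so the sketch below is an assumption, and the link list used in the example is a hypothetical reading of FIG. 4 (apart from the failed A-C, A-D and A-E heartbeats, the only working links are A-B and those among C, D and E).

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def split_into_subgroups(nodes: Iterable[Hashable],
                         healthy_links: Iterable[Tuple[Hashable, Hashable]]) -> List[list]:
    """Group nodes that can still reach each other over working heartbeat links."""
    neighbours: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for a, b in healthy_links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen: Set[Hashable] = set()
    subgroups: List[list] = []
    for start in sorted(nodes):
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for peer in neighbours[node]:
                if peer not in seen:
                    seen.add(peer)
                    stack.append(peer)
        subgroups.append(sorted(component))
    return subgroups


if __name__ == "__main__":
    groups = split_into_subgroups("ABCDE", [("A", "B"), ("C", "D"), ("D", "E"), ("C", "E")])
    print(groups)                    # [['A', 'B'], ['C', 'D', 'E']]
    print([min(g) for g in groups])  # ['A', 'C'] under the smallest-number rule
```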

In this embodiment, the first node determines, according to the first management policy, that it is the preliminary management node of its subgroup as follows: when the first node determines that it is the node with the smallest node number in its subgroup, it determines that it is the preliminary management node of that subgroup.

Step 303: The first node determines whether there is an external network connection based on the preset network detection mode. If yes, step 304 is performed; if not, step 305 is performed.

In practical applications, the preset network detection mode may be ping or Address Resolution Protocol (ARP) probing.

The first node determines whether there is an external network connection, that is, whether the external network entity is connected.
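Steps 303 to 305 with ping as the detection mode can be sketched in Python as follows. This is an assumption-laden illustration: it shells out to the system `ping` tool using Linux iputils flags (`-c 1` sends one probe, `-W` sets the timeout in seconds; other platforms use different flags), the target addresses are hypothetical, and ARP probing is not shown. The probe is injectable so the decision logic can be exercised without a network.

```python
import subprocess
from typing import Callable, Iterable


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo with the system ping tool (Linux iputils flags)."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except FileNotFoundError:  # no ping binary on this host
        return False
    return result.returncode == 0


def has_external_connection(targets: Iterable[str],
                            probe: Callable[[str], bool] = ping_once) -> bool:
    """Step 303: the node counts as connected if any external entity answers."""
    return any(probe(host) for host in targets)


def step_303(targets: Iterable[str], probe: Callable[[str], bool] = ping_once) -> str:
    if has_external_connection(targets, probe):
        return "304: become management node"
    return "305: end processing"
```

A call such as `step_303(["192.0.2.1"])` (a documentation-range address standing in for a real external gateway) would return the step-304 result only if that host replies.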

Step 304: Determine that it can become a management node, and perform reconfiguration between nodes and task scheduling between nodes as a management node.

In implementation, if the heartbeat between nodes recovers, the first node may keep its role as the management node, or the management node may be re-determined according to the first management policy and the second management policy.

Step 305: Determine that it cannot become a management node, and end the current processing flow.

With the above embodiment, the cluster uses its own internal nodes to achieve self-management. When a heartbeat abnormality occurs in the cluster, the first node, after determining that it is the preliminary management node of its subgroup, determines whether it can become the management node by checking whether it has an external network connection; if it can, it performs inter-node reconfiguration and inter-node task scheduling as the management node. This keeps the cluster system load balanced, effectively avoids split-brain, and ensures the high availability and reliability of the cluster.

Embodiment 3

FIG. 5 is a third schematic flowchart of a node management method in a cluster according to an embodiment of the present invention. In this embodiment, the cluster has no external network connection but is provided with a shared storage device that supports multi-node access. As shown in FIG. 5, the node management method in the cluster in this embodiment includes:

Step 501: The first node determines, according to the preset first management policy, that the second node is a management node, so that the second node performs resource configuration and task scheduling.

In this embodiment, since the cluster has a shared storage device, after this step, that is, after the second node is determined as the management node, the second node may create a specific partition, directory, or file on the shared storage device for negotiation between nodes when the cluster is abnormal.

Step 502: When the first node detects that the heartbeat connection between nodes is abnormal, it determines, according to the first management policy, that it is the preliminary management node of its subgroup.

Here, when the heartbeat connection in the cluster is abnormal, the cluster is split into two or more subgroups, the first node belongs to one of them, and the nodes of each subgroup determine the preliminary management node of their subgroup according to the first management policy; see FIG. 4, the schematic diagram of splitting the cluster into multiple subgroups and determining the preliminary management nodes in an embodiment of the present invention.

In actual implementation, each subgroup produced by the inter-node heartbeat abnormality determines its own preliminary management node based on the first management policy. However, to avoid split-brain, only one subgroup may continue to work. Therefore, one of the preliminary management nodes is further determined as the management node to perform inter-node reconfiguration and inter-node task scheduling of the cluster resources; correspondingly, the subgroup of that management node becomes the working subgroup, and the other subgroups stop working.

Step 503: The first node determines whether the shared storage device of the cluster is occupied. If it is not occupied, step 504 is performed; if it is occupied, step 509 is performed.

While the heartbeat between nodes is normal, the second node creates an identification file on the shared storage device of the cluster as the occupation marker of the shared storage device, and updates the identification file periodically (for example, every S seconds, where S can be set according to actual needs), for instance by updating its timestamp and/or content.

When step 503 is performed, the first node determines whether the shared storage device of the cluster is occupied by periodically checking the identification file created by the second node on the shared storage device for changes. For example, it checks every W seconds (W ≥ S) whether the identification file has changed; if the identification file is found unchanged in T consecutive checks (T is a positive integer set according to actual needs), the shared storage device is determined to be unoccupied; if the identification file changes, the shared storage device is determined to be occupied.
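The identification-file mechanism can be sketched in Python as follows, under stated assumptions: the management node rewrites the file with a timestamp, and the preliminary management node treats any change in modification time or content as a sign of occupation. The function names are hypothetical, and `sleep` is injectable for testing. On network or cluster file systems, attribute caching may delay visible changes, which is one reason the text requires W ≥ S.

```python
import os
import time
from typing import Callable, Optional, Tuple


def refresh_identification_file(path: str) -> None:
    """Run by the management node every S seconds: rewrite a timestamp."""
    with open(path, "w") as f:
        f.write(str(time.time_ns()))


def _snapshot(path: str) -> Optional[Tuple[int, bytes]]:
    """Modification time plus content, or None if the file does not exist."""
    try:
        mtime = os.stat(path).st_mtime_ns
        with open(path, "rb") as f:
            return (mtime, f.read())
    except FileNotFoundError:
        return None


def shared_storage_occupied(path: str, w_seconds: float, t_checks: int,
                            sleep: Callable[[float], None] = time.sleep) -> bool:
    """Step 503: occupied if the file changes within T checks spaced W seconds apart."""
    last = _snapshot(path)
    for _ in range(t_checks):
        sleep(w_seconds)
        if _snapshot(path) != last:
            return True  # the management node is still refreshing the file
    return False  # unchanged for T consecutive checks
```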

Step 504: Create a placeholder file in a specific directory of the shared storage device and, after a preset time, detect whether the specific directory contains a placeholder file created by another preliminary management node. If yes, step 505 is performed; if not, step 508 is performed.

Here, when the first node determines that the shared storage device of the cluster is not occupied, it may create a placeholder file in a specific directory of the shared storage device. The placeholder file contains the node number of the first node and the number of nodes in the subgroup in which the first node is located. Correspondingly, a placeholder file created by another preliminary management node carries that node's number and the number of nodes in its subgroup. The length of the preset time can be set according to actual needs, but must be greater than or equal to S seconds.
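The placeholder exchange in step 504 can be sketched as below. The source does not specify a file name or content format, so the `placeholder-<node number>` naming and the "node number, subgroup size" text layout are hypothetical choices made only for illustration.

```python
import os
from typing import Dict


def write_placeholder(directory: str, node_id: int, subgroup_size: int) -> str:
    """Create this node's placeholder file; the (hypothetical) name embeds the node number."""
    path = os.path.join(directory, f"placeholder-{node_id}")
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"{node_id} {subgroup_size}\n")
    return path


def read_other_placeholders(directory: str, own_id: int) -> Dict[int, int]:
    """Return {node number: subgroup size} for placeholders created by other nodes."""
    others: Dict[int, int] = {}
    for name in os.listdir(directory):
        if not name.startswith("placeholder-"):
            continue
        with open(os.path.join(directory, name), encoding="utf-8") as f:
            node_id, size = (int(x) for x in f.read().split())
        if node_id != own_id:
            others[node_id] = size
    return others
```

A preliminary management node would call `write_placeholder`, wait the preset time (at least S seconds), and then call `read_other_placeholders`; an empty result corresponds to going directly to step 508.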

Step 505: Compare the number of nodes in the subgroup in which the first node is located with the numbers of nodes in the subgroups in which the other preliminary management nodes are located, and determine whether the first node's subgroup has the most nodes (no fewer than any other subgroup). If yes, go to step 506; if not, go to step 509.

Step 506: Determine whether any of the subgroups in which the other preliminary management nodes are located has the same number of nodes as the subgroup in which the first node is located. If yes, go to step 507; if not, go to step 508.

Step 507: Determine whether the node number of the first node is smaller than the node number of the preliminary management node of the subgroup that has the same number of nodes as the first node's subgroup. If yes, go to step 508; if not, go to step 509.

Step 508: Determine that the first node can become the management node; as the management node, the first node performs inter-node reconfiguration and inter-node task scheduling of cluster resources.

Step 509: Determine that the first node cannot become a management node, and end the current processing flow.
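The decision in steps 504 to 509 can be condensed into a single function. This is a sketch assuming every preliminary management node sees the same set of placeholder files; `others` is a hypothetical mapping from each other preliminary node's number to its subgroup size.

```python
from typing import Dict


def can_become_manager(own_id: int, own_size: int, others: Dict[int, int]) -> bool:
    """Return True if this preliminary management node may become the management node."""
    if not others:                    # step 504: no other placeholder file -> step 508
        return True
    if own_size < max(others.values()):
        return False                  # step 505: not the largest subgroup -> step 509
    tied = [nid for nid, size in others.items() if size == own_size]
    if not tied:                      # step 506: strictly largest subgroup -> step 508
        return True
    # step 507: among equally large subgroups, the smallest node number wins
    return own_id < min(tied)
```

Because node numbers are unique, applying the same rule on every preliminary management node selects exactly one winner, provided they all observe the same placeholder files.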

According to the foregoing embodiment of the present invention, when a heartbeat abnormality occurs in the cluster, the first node determines whether it can become the management node based on the occupation state of the shared storage device, the creation and detection of placeholder files, the number of nodes in its subgroup, and its own node number. When it can, it performs, as the management node, inter-node reconfiguration and inter-node task scheduling of cluster resources. This balances the load of the cluster system, effectively avoids split-brain, and ensures the high availability and reliability of the cluster.

Embodiment 4

FIG. 6 is a schematic flowchart of a node management method in a cluster according to an embodiment of the present invention. In this embodiment, the cluster has no external network connection and has only a shared storage device that supports single-node exclusive access. As shown in FIG. 6, the node management method in the cluster in this embodiment includes the following steps:

Step 601: The first node determines, according to the preset first management policy, that the second node is a management node, so that the second node performs resource configuration and task scheduling.

In this embodiment, since the cluster has a shared storage device, after step 601, that is, after the second node is determined to be the management node, the second node may create a specific partition, directory, or file on the shared storage device, to be used for negotiation between nodes when the cluster becomes abnormal.

In this embodiment, the first node determines that the node with the smallest node number in the cluster is the management node.

Step 602: When the first node detects that the heartbeat connection between the nodes is abnormal, the first node determines that it is the preliminary management node of the subgroup according to the first management policy.
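The smallest-node-number rule used in steps 601 and 602 can be sketched as follows. The assumption that a node's subgroup consists of itself plus the peers whose heartbeat it still receives is illustrative; the source does not say how subgroup membership is learned, and the helper names are hypothetical.

```python
from typing import Iterable


def smallest_node(node_ids: Iterable[int]) -> int:
    """Management policy as described: the node with the smallest number is chosen."""
    return min(node_ids)


def is_preliminary_manager(own_id: int, reachable_ids: Iterable[int]) -> bool:
    # Assumed subgroup: this node plus the peers whose heartbeat is still received.
    subgroup = set(reachable_ids) | {own_id}
    return smallest_node(subgroup) == own_id
```

Before a split, `smallest_node` applied to all node numbers gives the management node (step 601); after a split, each node applies the same rule within its own subgroup (step 602).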

Step 603: The first node determines its own access time to the first partition of the shared storage device.

Here, the first partition is a small partition created by the second node on the shared storage device for disaster-recovery negotiation between nodes. While the inter-node heartbeat in the cluster is normal, the first partition is empty.

The shared storage device of the cluster supports only single-node exclusive access, that is, only one node is allowed to access the shared storage device at any given time. The node numbered N may access it from n·M + N·T to n·M + (N+1)·T seconds after 00:00 of the current day, where n is an integer greater than or equal to 0, M is the total number of nodes, N is the node number, and T is the configurable safe access duration.
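The access schedule can be sketched as a round-robin of time slots. The text writes the window as n·M + N·T to n·M + (N+1)·T; if M is read as the node count, windows of consecutive cycles only stay disjoint when the cycle length is M·T seconds, so this sketch assumes a window start of n·M·T + N·T. It also assumes node numbers 0 to M-1 and synchronized node clocks; the function names are hypothetical.

```python
from typing import Tuple


def access_window(n: int, node_no: int, total_nodes: int, t_secs: float) -> Tuple[float, float]:
    """Start and end (seconds after 00:00) of node N's n-th access window.

    Assumes a cycle length of M*T so that windows of different nodes never overlap.
    """
    cycle = total_nodes * t_secs
    start = n * cycle + node_no * t_secs
    return (start, start + t_secs)


def in_access_window(seconds_since_midnight: float, node_no: int,
                     total_nodes: int, t_secs: float) -> bool:
    """True if the given time falls inside one of this node's access windows."""
    offset = seconds_since_midnight % (total_nodes * t_secs)
    return node_no * t_secs <= offset < (node_no + 1) * t_secs
```

For example, with M = 4 nodes and T = 10 seconds, node 2 owns the slots [20, 30), [60, 70), and so on, so two nodes can never mount the first partition in the same slot.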

Step 604: When the first node determines that its access time has arrived, it mounts the first partition and determines whether a placeholder file exists in the first partition. If not, step 605 is performed; if so, step 606 is performed.

Here, when the first node finds a placeholder file in the first partition, it determines that the shared storage device is occupied; if there is no placeholder file in the first partition, the shared storage device is not occupied. The placeholder file is a file created by a preliminary management node that carries its own node number and the number of nodes in the subgroup in which it is located.

Step 605: Determine that the first node can become the management node, create a placeholder file in the first partition, and unmount the first partition.

After the first node determines that it can become the management node, it creates a placeholder file in the first partition to mark the shared storage device as occupied, and, as the management node, performs inter-node reconfiguration and inter-node task scheduling of cluster resources.

Step 606: Determine that it cannot become a management node, and end the current processing flow.
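Steps 604 to 606 can be sketched as one function that runs once the node's access window has arrived. Mounting and unmounting are passed in as hypothetical callbacks because the real commands and device names are not given in the source; the placeholder naming is likewise an assumption. Unlike step 606, this sketch also unmounts when the partition is already occupied, so the partition is released for the next node's window.

```python
import os
from typing import Callable


def negotiate_on_exclusive_partition(
    mount_point: str,
    node_id: int,
    subgroup_size: int,
    mount: Callable[[], None],
    unmount: Callable[[], None],
) -> bool:
    """Return True if this node claims the first partition (step 605)."""
    mount()                                    # step 604: mount the first partition
    try:
        existing = [n for n in os.listdir(mount_point) if n.startswith("placeholder-")]
        if existing:                           # occupied by another preliminary node
            return False                       # step 606
        path = os.path.join(mount_point, f"placeholder-{node_id}")
        with open(path, "w", encoding="utf-8") as f:
            f.write(f"{node_id} {subgroup_size}\n")   # step 605: claim the partition
        return True
    finally:
        unmount()                              # release the partition in every case
```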

According to the foregoing embodiment of the present invention, when a heartbeat abnormality occurs in the cluster, the first node, having determined that it is the preliminary management node of its subgroup, determines its access time to the first partition of the shared storage device. When that time arrives, it determines whether it can become the management node based on whether the first partition is occupied, and if so, it performs, as the management node, inter-node reconfiguration and inter-node task scheduling of cluster resources. This balances the load of the cluster system, effectively avoids split-brain, and ensures the high availability and reliability of the cluster.

Embodiment 5

FIG. 7 is a schematic structural diagram of a node device according to an embodiment of the present invention. As shown in FIG. 7, the node device in this embodiment includes a determining module 71 and a judging module 72.

The determining module 71 is configured to determine, according to the preset first management policy and upon detecting an abnormal heartbeat connection between nodes, that the first node is the preliminary management node of the subgroup in which it is located;

The judging module 72 is configured to determine, according to the preset second management policy, whether the first node can be converted from the preliminary management node into the management node, and, when the determination is yes, to perform, as the management node, inter-node reconfiguration and inter-node task scheduling of cluster resources.

In an embodiment, the determining module 71 is further configured to determine, according to a third management policy, that the second node is the management node, so that the second node performs resource configuration and task scheduling.

In an embodiment, the judging module 72 is configured to determine, according to a preset network detection manner, whether the first node has an external network connection; when the determination is yes, it determines that the first node can be converted into the management node, and when the determination is no, it determines that the first node cannot be converted into the management node.
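The source leaves the "preset network detection manner" open. One possible stand-in, sketched below, is a TCP connection attempt to an address outside the cluster; the target host and port are hypothetical configuration values, and a TCP probe is used instead of ICMP ping because raw ICMP sockets usually require elevated privileges.

```python
import socket


def has_external_network(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the (hypothetical) external address succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat as no external network connection.
        return False
```

A preliminary management node would convert itself into the management node only when `has_external_network(...)` returns True.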

In an embodiment, the shared storage device of the cluster supports multi-node common access;

The judging module 72 is configured to determine whether the shared storage device is occupied; when the shared storage device is not occupied, create a placeholder file in a specific directory of the shared storage device and, after a preset time, detect whether the specific directory contains a placeholder file created by another preliminary management node; if not, determine that the first node can be converted into the management node; if so, compare the number of nodes in the subgroup in which the first node is located with the numbers of nodes in the subgroups in which the other preliminary management nodes are located, and determine, based on the comparison result, whether the first node can be converted from the preliminary management node into the management node.

In an embodiment, the shared storage device of the cluster supports single node exclusive access;

The judging module 72 is configured to determine the access time of the first node to the first partition of the shared storage device, mount the first partition when the access time arrives, and determine whether a placeholder file exists in the first partition; if there is no placeholder file in the first partition, it determines that the first node can be converted into the management node.

In an embodiment, the determining module 71 is configured to determine, when the first node has the smallest node number in the subgroup in which it is located, that the first node is the preliminary management node of that subgroup.

In this embodiment of the present invention, the determining module 71 and the judging module 72 in the node device may be implemented by a central processing unit (CPU), a digital signal processor (DSP), a field programmable gate array (FPGA), or an application-specific integrated circuit (ASIC) in a terminal or server.

The above description of the node device is similar to the description of the method above and has the same beneficial effects as the method, which are not repeated here. For technical details not disclosed in the node device embodiment of the present invention, refer to the description of the method embodiments of the present invention.

Those skilled in the art will understand that all or some of the steps of the above method embodiments may be completed by hardware instructed by a program. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as a removable storage device, a random access memory (RAM), a read-only memory (ROM), a magnetic disk, or an optical disk.

Alternatively, if the above integrated unit of the embodiments of the present invention is implemented as a software function module and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the embodiments of the present invention may be embodied as a software product that is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the methods described in the embodiments of the present invention. The storage medium includes various media that can store program code, such as a removable storage device, a RAM, a ROM, a magnetic disk, or an optical disk.

The foregoing is only an embodiment of the present invention, but the scope of protection of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application shall fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.

Industrial applicability

According to the embodiments of the present invention, when the heartbeat connection between nodes is abnormal and the cluster splits into two or more subgroups, the first node, after determining that it is the preliminary management node of the subgroup in which it is located, further determines whether it can become the management node of the cluster. When the determination is yes, it performs, as the management node, inter-node reconfiguration and inter-node task scheduling of cluster resources, which effectively avoids split-brain and ensures the high availability and reliability of the cluster. Moreover, because the first node is a node within the cluster, no third-party management device needs to be introduced, and the implementation is simple.

Claims

1. A node management method in a cluster, the method being applied to a first node and comprising:

when detecting a heartbeat connection abnormality between nodes, determining, according to a first management policy, that the first node is a preliminary management node of the subgroup in which the first node is located; and

determining, based on a second management policy, whether the first node can be converted from the preliminary management node into a management node, and, when it is determined that the first node can be converted from the preliminary management node into the management node, performing, as the management node, inter-node reconfiguration and inter-node task scheduling on cluster resources.

2. The method according to claim 1, wherein before the detecting of the heartbeat connection abnormality between nodes, the method further comprises:

Determining, according to the third management policy, that the second node is a management node, so that the second node performs resource configuration and task scheduling.

3. The method according to claim 1 or 2, wherein the determining, based on the second management policy, whether the first node can be converted from the preliminary management node into a management node comprises:

determining, based on a preset network detection manner, whether the first node has an external network connection; determining that the first node can be converted into a management node when the determination is yes; and determining that the first node cannot be converted into a management node when the determination is no.

4. The method according to claim 1 or 2, wherein the shared storage device in the cluster supports multi-node common access; and

The determining, according to the second management policy, whether the first node can be converted into a management node by the preliminary management node includes:

when it is determined that the shared storage device is not occupied, creating a placeholder file in a specific directory of the shared storage device and, after a preset time, detecting whether the specific directory contains a placeholder file created by another preliminary management node; if not, determining that the first node can be converted into a management node; if so, comparing the number of nodes of the subgroup in which the first node is located with the number of nodes of the subgroup in which each other preliminary management node is located, and determining, based on the comparison result, whether the first node can be converted from the preliminary management node into a management node.

5. The method according to claim 4, wherein the comparing of the number of nodes of the subgroup in which the first node is located with the number of nodes of the subgroups in which the other preliminary management nodes are located, and the determining, based on the comparison result, whether the first node can be converted from the preliminary management node into a management node, comprise:

when it is determined that the number of nodes of the subgroup in which the first node is located is greater than the number of nodes of each subgroup in which the other preliminary management nodes are located, determining that the first node can be converted into a management node; and

when it is determined that the subgroup in which the first node is located has the most nodes and that another subgroup has the same number of nodes, determining whether the node number of the first node is smaller than the node number of the preliminary management node of the subgroup having the same number of nodes; determining that the first node can be converted into a management node when the determination is yes, and determining that the first node cannot be converted into a management node when the determination is no.

6. The method according to claim 1 or 2, wherein the shared storage device in the cluster supports single-node exclusive access; and

The determining, according to the second management policy, whether the first node can be converted into a management node by the preliminary management node includes:

determining an access time of the first node to a first partition of the shared storage device, and mounting the first partition when the access time arrives; when it is determined that there is no placeholder file in the first partition, determining that the first node can be converted into a management node; and when it is determined that a placeholder file exists in the first partition, determining that the first node cannot be converted into a management node.

7. The method according to claim 1 or 2, wherein the determining, according to the first management policy, that the first node is the preliminary management node of the subgroup comprises:

when the first node has the smallest node number in the subgroup in which the first node is located, determining that the first node is the preliminary management node of that subgroup.

8. A node device, comprising a determining module and a judging module, wherein

The determining module is configured to: when detecting an abnormal heartbeat connection between the nodes, determine, according to the first management policy, that the first node is a preliminary management node of the subgroup in which the node is located;

The judging module is configured to determine, according to the second management policy, whether the first node can be converted from the preliminary management node into a management node, and, when it is determined that the first node can be so converted, to perform, as the management node, inter-node reconfiguration and inter-node task scheduling on cluster resources.

9. The node device according to claim 8, wherein

The determining module is further configured to determine, according to the third management policy, that the second node is a management node, so that the second node performs resource configuration and task scheduling.

10. The node device according to claim 8 or 9, wherein

The judging module is configured to determine, according to a preset network detection manner, whether the first node has an external network connection; when the determination is yes, to determine that the first node can be converted into a management node; and when the determination is no, to determine that the first node cannot be converted into a management node.

11. The node device according to claim 8 or 9, wherein the shared storage device in the cluster supports multi-node common access; and

The judging module is configured to: when the shared storage device is not occupied, create a placeholder file in a specific directory of the shared storage device and, after a preset time, detect whether the specific directory contains a placeholder file created by another preliminary management node; if not, determine that the first node can be converted into a management node; if so, compare the number of nodes of the subgroup in which the first node is located with the number of nodes of the subgroup in which each other preliminary management node is located, and determine, based on the comparison result, whether the first node can be converted from the preliminary management node into a management node.

12. The node device according to claim 11, wherein

The judging module is configured to: when it is determined that the number of nodes of the subgroup in which the first node is located is greater than the number of nodes of each subgroup in which the other preliminary management nodes are located, determine that the first node can be converted into a management node; and

when it is determined that the subgroup in which the first node is located has the most nodes and that another subgroup has the same number of nodes, determine whether the node number of the first node is smaller than the node number of the preliminary management node of the subgroup having the same number of nodes; determine that the first node can be converted into a management node when the determination is yes, and determine that the first node cannot be converted into a management node when the determination is no.

13. The node device according to claim 8 or 9, wherein the shared storage device in the cluster supports single-node exclusive access; and

The judging module is configured to determine an access time of the first node to a first partition of the shared storage device, mount the first partition when the access time arrives, and determine whether a placeholder file exists in the first partition; when it is determined that there is no placeholder file in the first partition, determine that the first node can be converted into a management node; and when it is determined that a placeholder file exists in the first partition, determine that the first node cannot be converted into a management node.

14. The node device according to claim 8 or 9, wherein

The determining module is configured to determine, when the first node has the smallest node number in the subgroup in which the first node is located, that the first node is the preliminary management node of that subgroup.

15. A computer-readable storage medium storing computer-executable instructions for performing the node management method in a cluster according to any one of claims 1 to 7.