WO2012083693A1 - Voting arbitration method and apparatus for a cluster computer system - Google Patents

Voting arbitration method and apparatus for a cluster computer system

Info

Publication number
WO2012083693A1
Authority
WO
WIPO (PCT)
Prior art keywords
resource
cluster
node
votes
sub
Prior art date
Application number
PCT/CN2011/077598
Other languages
English (en)
French (fr)
Inventor
杜学文
王卫伟
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to PCT/CN2011/077598 (WO2012083693A1)
Priority to CN201180001450.7A (CN102308559B)
Publication of WO2012083693A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/142 Reconfiguring to eliminate the error
    • G06F11/1425 Reconfiguring to eliminate the error by reconfiguration of node membership
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2038 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant with a single idle spare processing component
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2046 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant where the redundant components share persistent storage

Definitions

  • The present invention relates to computer communication networks, and more particularly to a voting arbitration method and apparatus for a cluster computer system.
  • Background art
  • In a cluster computer system, when a fault occurs and the cluster is split into several sub-clusters, split-brain will occur unless preventive measures are taken: each sub-cluster left after the split tries to take over the services of the other sub-clusters, provide services, or access shared resources.
  • An arbitration mechanism is one of the current means of resolving split-brain in a cluster computer system.
  • It is implemented as follows: each node in the cluster system casts one or more votes. When the cluster splits, the sub-cluster with more node votes becomes the legal cluster. The legal cluster then takes over the services of the illegal sub-cluster.
  • The invention provides a voting arbitration method and apparatus for a cluster computer system. By taking both the number of nodes of each split sub-cluster and the number of resource votes on its nodes into account when arbitrating the legal-cluster takeover, it addresses the problem that using only the number of nodes as the arbitration factor increases the takeover switching time and reduces the continuous service time of the cluster system.
  • A voting arbitration method for a cluster computer system, comprising:
  • when the cluster computer system splits, performing arbitration to determine the legal sub-cluster according to the number of resource votes and the number of node votes on the nodes in the split sub-clusters, so that the legal sub-cluster obtained by arbitration continues to provide services.
  • A voting arbitration apparatus for a cluster computer system, the apparatus comprising:
  • an arbitration module configured to, when the cluster computer system splits, perform arbitration to determine the legal sub-cluster according to the number of resource votes and the number of node votes on the nodes in the split sub-clusters, so that the legal sub-cluster obtained by arbitration continues to provide services, wherein
  • the number of resource votes may be set according to the startup time of the application resources running on the node.
  • The embodiments of the present invention have the following beneficial effects: the number of resource votes of each node is set according to the startup time of the resources running on that node, and after the cluster splits, both the number of nodes of each split sub-cluster and the number of resource votes on its nodes serve as arbitration factors for the sub-cluster takeover.
  • This effectively reduces the switching time of the sub-cluster takeover and thereby reduces service downtime.
  • FIG. 1 is a flow chart showing a voting arbitration method for a cluster computer system in accordance with an embodiment of the present invention.
  • FIG. 2 is a flow chart of setting the number of resource votes on a node within a cluster in accordance with an embodiment of the present invention.
  • FIG. 3 illustrates a schematic diagram of a networking model of a two-node high availability cluster computer system in accordance with an embodiment of the present invention.
  • FIG. 4 is a block diagram showing the structure of a voting arbitration apparatus for a cluster computer system according to an embodiment of the present invention.
  • FIG. 5 illustrates a schematic structural diagram of a resource ticket number setting module according to an embodiment of the present invention.
  • Detailed description
  • Referring to FIG. 1, a flow chart of a voting arbitration method for a cluster computer system according to an embodiment of the present invention is illustrated.
  • The voting arbitration method for a cluster computer system provided by the present invention includes:
  • when the cluster computer system splits, arbitration is performed according to the number of resource votes and the number of node votes on the nodes in the split sub-clusters, so that the legal sub-cluster obtained by arbitration continues to provide services, wherein
  • the number of resource votes may be set according to the startup time of the application resources running on the node.
  • In other words, the determination of the legal sub-cluster after the split is arbitrated so that the resulting legal sub-cluster continues to provide services.
  • The split may be caused by a heartbeat detection fault between nodes or by a fault in a node itself. For example, a two-node cluster system splits because heartbeat detection between the two nodes fails, producing sub-cluster 1 (including node 1) and sub-cluster 2 (including node 2).
  • The number of node votes in a split sub-cluster can be implemented as one vote per node or as multiple votes per node.
  • For example, if each node casts one vote, sub-cluster 1 includes only one node (node 1), so its number of node votes is one; sub-cluster 2 also includes one node (node 2), so its number of node votes is also one.
  • The number of resource votes on a node is the sum of the votes of the application resources running on that node.
  • The votes of each application resource can be set according to its startup time on the node.
  • For example, if the startup time of application resource app1 is 20 s (s denotes seconds), its number of resource votes can be set to 1; if the startup time of application resource app2 is 40 s, its number of resource votes can be set to 2.
  • The number of resource votes on node 1 is then the sum of the resource votes of app1 and app2 on that node, that is, 3 votes. Note that the correspondence between application resource startup time and number of resource votes can be set according to application requirements and is not limited to the correspondence mentioned in the embodiments of the present invention.
  • The number of resource votes of a split sub-cluster is the sum of the resource votes of the nodes in that sub-cluster. For example, if sub-cluster 1 includes node 1 with 2 resource votes and node 2 with 3 resource votes, the number of resource votes of sub-cluster 1 is the sum for node 1 and node 2, that is, 5 votes.
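The two summations above can be sketched in a few lines of Python. This is only an illustration: the function names and the `APP_VOTES` dictionary are hypothetical, and the vote values are the ones used in the example in the text.

```python
from typing import Dict, Iterable

# Per-application resource votes from the example in the text
# (app1: 20 s startup -> 1 vote, app2: 40 s startup -> 2 votes).
APP_VOTES: Dict[str, int] = {"app1": 1, "app2": 2}


def node_resource_votes(apps_on_node: Iterable[str], app_votes: Dict[str, int]) -> int:
    """Resource votes of a node: sum over the application resources it runs."""
    return sum(app_votes[app] for app in apps_on_node)


def subcluster_resource_votes(votes_per_node: Iterable[int]) -> int:
    """Resource votes of a sub-cluster: sum over the nodes it contains."""
    return sum(votes_per_node)


node1_votes = node_resource_votes(["app1", "app2"], APP_VOTES)  # 1 + 2
subcluster1_votes = subcluster_resource_votes([2, 3])           # node 1 + node 2
```

With the values from the text, `node1_votes` is 3 and `subcluster1_votes` is 5.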
  • Take a four-node cluster computer system as an example. It splits into two sub-clusters due to a failure: sub-cluster 1 (including node 1 and node 2) and sub-cluster 2 (including node 3 and node 4). Node 1 has 4 resource votes, node 2 has 2 resource votes, node 3 has 1 resource vote, and node 4 has 1 resource vote.
  • Sub-cluster 1 includes two nodes, so its number of node votes is 2; sub-cluster 2 also includes two nodes, so its number of node votes is also 2.
  • With the voting arbitration method provided by the present invention, the legal sub-cluster is determined according to the number of resource votes and the number of node votes on the nodes in the split sub-clusters.
  • Sub-cluster 1 and sub-cluster 2 have the same number of nodes, so the legal sub-cluster cannot be determined by comparing the number of nodes.
  • Comparing the resource votes of sub-cluster 1 (6 votes) and sub-cluster 2 (2 votes), sub-cluster 1 has more resource votes than sub-cluster 2, so sub-cluster 1 is determined to be the legal sub-cluster and takes over sub-cluster 2.
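A minimal sketch of this four-node example follows. It assumes, for this example only, that node votes are compared first and resource votes break a tie; the `Node` class and the `arbitrate` function are hypothetical names, not part of the patent text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    name: str
    resource_votes: int
    node_votes: int = 1  # one vote per node, as in the example


def arbitrate(sub_a: List[Node], sub_b: List[Node]) -> Optional[str]:
    """Return "a" or "b" for the legal sub-cluster, or None if fully tied.

    Assumption: node votes are compared first; resource votes decide a tie.
    """
    node_a = sum(n.node_votes for n in sub_a)
    node_b = sum(n.node_votes for n in sub_b)
    if node_a != node_b:
        return "a" if node_a > node_b else "b"
    res_a = sum(n.resource_votes for n in sub_a)
    res_b = sum(n.resource_votes for n in sub_b)
    if res_a != res_b:
        return "a" if res_a > res_b else "b"
    return None  # the text does not say what happens on a full tie


sub_cluster_1 = [Node("node1", 4), Node("node2", 2)]  # 6 resource votes
sub_cluster_2 = [Node("node3", 1), Node("node4", 1)]  # 2 resource votes
legal = arbitrate(sub_cluster_1, sub_cluster_2)
```

Here both sub-clusters have 2 node votes, so the comparison falls through to resource votes and `legal` is `"a"`, i.e. sub-cluster 1.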
  • Referring to FIG. 2, a schematic flow chart of setting the number of resource votes on a node in a cluster according to an embodiment of the present invention is illustrated, which may specifically include:
  • The resource startup unit starts an application resource on the node.
  • The monitoring unit monitors the startup time of the application resource by using a monitoring script.
  • The resource voting score setter sets the number of resource votes of the application resource according to the startup time obtained by the monitoring.
  • The resource startup unit, the monitoring unit, and the resource voting score setter in the embodiments of the present invention may be deployed in a device for managing a cluster computer system.
  • The application resources include an httpd application resource, a tomcat application resource, and the like.
  • After the resource startup unit starts the application resource, a corresponding number of votes is set for the application resource on each node according to the startup time monitored by the monitoring script. The number of votes is related to the monitored startup time: the longer the startup time, the higher the number of resource votes.
  • The correspondence between resource startup time and number of resource votes may be as shown in Table 1, where T is the startup time and s (seconds) is the unit of time.
  • The relationship between startup time and number of resource votes can be set according to application requirements and is not limited to the correspondence shown in Table 1 in the embodiment of the present invention.
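The start, monitor, and set steps above can be sketched as follows. Table 1 is not reproduced in this text, so the mapping here is an assumption: one vote per started 20 s of startup time, chosen only because it reproduces the examples given elsewhere in the description (20 s gives 1 vote, 30 s and 40 s give 2, 80 s gives 4). The names `votes_for_startup_time`, `measure_startup`, and `SECONDS_PER_VOTE` are hypothetical.

```python
import math
import time
from typing import Callable

# Assumption: one vote per started 20 s; NOT taken from Table 1.
SECONDS_PER_VOTE = 20.0


def votes_for_startup_time(startup_seconds: float,
                           seconds_per_vote: float = SECONDS_PER_VOTE) -> int:
    """Map a monitored startup time to a number of resource votes.

    Longer startup times yield more votes; the minimum is one vote.
    """
    if startup_seconds < 0:
        raise ValueError("startup time cannot be negative")
    return max(1, math.ceil(startup_seconds / seconds_per_vote))


def measure_startup(start_resource: Callable[[], None],
                    clock: Callable[[], float] = time.monotonic) -> float:
    """Start a resource and return how long the start call took, in seconds."""
    begin = clock()
    start_resource()  # stands in for the resource startup unit
    return clock() - begin
```

In practice the monitoring script would decide when a resource counts as "started"; here the duration of the `start_resource` call stands in for that. The `clock` parameter exists only so the measurement can be tested with a fake clock.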
  • Because the startup time of a resource increases as its traffic volume increases after the application resource has been started, a function for monitoring the traffic volume of the resource may be added to the monitoring script.
  • The application resources in the embodiment of the present invention may further include an Oracle database application resource, that is, an oracle application resource.
  • For such a resource, the traffic volume is a major factor affecting the resource startup time.
  • The method of the present invention may therefore include:
  • The traffic volume of the application resource is obtained by the monitoring script; a command for obtaining the traffic volume may be added to the monitoring script for this purpose.
  • The resource voting score setter is started to reset the number of resource votes for the application resource.
  • The predetermined threshold may be set by a technician according to application requirements.
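The reset step can be sketched as below. The text mentions a predetermined threshold but the exact trigger condition is not spelled out, so this sketch assumes the reset happens when the monitored traffic volume exceeds the threshold. The function name and all parameters are hypothetical.

```python
from typing import Callable


def refresh_resource_votes(current_votes: int,
                           traffic_volume: float,
                           threshold: float,
                           recompute_votes: Callable[[], int]) -> int:
    """Return the resource votes to use after a traffic check.

    Assumption: votes are recomputed only when traffic exceeds the threshold;
    otherwise the previously set value is kept.
    """
    if traffic_volume > threshold:
        # Stands in for starting the resource voting score setter again.
        return recompute_votes()
    return current_votes
```

For example, `refresh_resource_votes(2, 1500.0, 1000.0, lambda: 4)` returns the recomputed value 4, while a traffic volume of 500.0 with the same threshold leaves the votes at 2.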
  • The high availability cluster computer system of the present invention may comprise a two-node high availability cluster computer system.
  • Referring to FIG. 3, a schematic diagram of a networking model of a two-node high availability cluster computer system according to an embodiment of the present invention is illustrated.
  • When a two-node cluster computer system fails, it needs a third party (a disk, a quorum server, etc.) to determine which node is the legal node, and the legal node takes over the services running on the other node.
  • The relationship between startup time and number of resource votes shown in Table 1 can be used.
  • The startup time of resource app1 is 80 s, so its number of votes can be set to 4; the startup time of resource app2 is 30 s, so its number of votes can be set to 2; the startup time of resource app3 is 20 s, so its number of votes can be set to 1, where s (seconds) is the unit of time.
  • By comparing the resource votes on the two nodes of the two-node high availability cluster computer system, the node with the largest number of resource votes (node 1) is determined and, as the legal sub-cluster, takes over the illegal sub-cluster (node 2) so that the legal sub-cluster can continue to provide external services. Because the two nodes in a two-node cluster system have the same number of node votes, the resource votes are what is compared during arbitration.
  • By comparing the resource votes of node 1 and node 2, node 1 is determined to be the legal node. Node 1 acquires control of the disk and takes over the resources running on node 2, that is, restarts resources app2 and app3 on node 1, which takes about 30 seconds. Note that, to ensure the split two-node cluster continues to provide external services, if the node with more resource votes (node 1) is chosen to take over the services of the other node (node 2) but node 1 fails and cannot take over, node 2 can take over instead and continue to provide services.
  • Under the networking model shown in FIG. 3, if the existing node voting method is used for arbitration, node 2 has at least a 50% chance of acquiring control of the disk and taking over resource app1 running on node 1, which takes approximately 80 s.
  • The method provided by the present invention therefore effectively reduces the processing time of resource switching and increases the continuous external service time of the cluster system.
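The two-node comparison can be checked with a short sketch. The startup times, votes, and placement are the ones given in the text; the assumption that the taken-over resources restart in parallel (so the takeover time is the longest single startup time) is an inference from the "about 30 seconds" figure, not something the text states. All names are hypothetical.

```python
# Values from the two-node example in the text.
STARTUP_SECONDS = {"app1": 80, "app2": 30, "app3": 20}
RESOURCE_VOTES = {"app1": 4, "app2": 2, "app3": 1}
PLACEMENT = {"node1": ["app1"], "node2": ["app2", "app3"]}


def node_resource_votes(node: str) -> int:
    """Sum of the resource votes of the applications running on a node."""
    return sum(RESOURCE_VOTES[app] for app in PLACEMENT[node])


def takeover_seconds(taken_over_node: str) -> int:
    """Approximate time to restart the resources of the node being taken over.

    Assumption: the resources restart in parallel, so the longest one dominates.
    """
    return max(STARTUP_SECONDS[app] for app in PLACEMENT[taken_over_node])


# Node votes are equal (one each), so resource votes decide.
legal_node = max(PLACEMENT, key=node_resource_votes)
taken_over = next(node for node in PLACEMENT if node != legal_node)
```

Node 1 holds 4 resource votes against 3 on node 2, so `legal_node` is `"node1"` and the takeover costs about 30 s; had node 2 won, restarting app1 would cost about 80 s.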
  • The cluster computer system of the present invention may include not only a two-node high availability cluster computer system but also a high availability cluster computer system with three or more nodes. Note that in a three-node cluster system, when the cluster splits into two sub-clusters, with sub-cluster 1 including two nodes and sub-cluster 2 including one node, the number of node votes of the split sub-clusters is considered first in order to avoid a single point of failure in the cluster, so sub-cluster 1, which includes two nodes, is determined to be the legal sub-cluster.
  • A four-node cluster system is taken as an example to describe how arbitration, based on the resource votes and node votes on the nodes in each sub-cluster after the split, allows services to continue.
  • Each node holds the resource vote information and the corresponding node vote information of all nodes in the cluster. Node 1 runs resource app1, node 2 runs resources app2 and app3, node 3 runs resource app4, and node 4 runs resource app5. The relationship between the startup time of each resource and the number of resource votes on each node can be as shown in Table 2.
  • Each node in the cluster can store the number of resource votes and the number of node votes as shown in Table 3.
  • The method arbitrates according to the number of resource votes and the number of node votes on the nodes in the split sub-clusters. Sub-cluster 1 and sub-cluster 2 have the same number of node votes, 2 each. The resource votes of the nodes of sub-cluster 1 sum to 4 and those of sub-cluster 2 sum to 7, so sub-cluster 2 has more resource votes than sub-cluster 1 and can be determined to be the legal sub-cluster.
  • Arbitrating on the combination of the node votes and the resource votes of the split sub-clusters, as provided by the present invention, significantly reduces the processing time required for resource switching when the legal sub-cluster takes over, and increases the continuous external service time of the cluster system.
  • The embodiment of the present invention can determine the legal sub-cluster by comparing the number of resource votes and the number of node votes on the nodes in the split sub-clusters. For example, the node votes of each split sub-cluster can be compared to determine whether any sub-cluster holds more than two-thirds of the total node votes of the cluster. If the split sub-clusters include a sub-cluster that meets this node vote condition, that sub-cluster is determined to be the legal sub-cluster.
  • If no split sub-cluster meets this node vote condition, it is further determined whether the split sub-clusters include a sub-cluster that holds more than one-third of the total node votes of the cluster and contains the node with the largest number of resource votes. If such a sub-cluster exists, it is determined to be the legal sub-cluster. If no sub-cluster meets both the node vote condition and the resource vote condition, the cluster system goes down and cannot continue to provide services.
  • Node 1 runs resource app1, node 2 runs resource app2, node 3 runs resource app3, node 4 runs resource app4, and node 5 runs resource app5.
  • The correspondence between the startup time of the resource running on each node and the number of resource votes can be as shown in Table 4, and the resource vote and node vote information stored on the nodes is shown in Table 5.
  • When the cluster splits into sub-cluster 1 (including node 1, node 2, node 3, and node 4) and sub-cluster 2 (including only node 5), the number of node votes of sub-cluster 1 (4 votes) is greater than two-thirds of the total node votes of the cluster (5 votes), so sub-cluster 1 can be determined to be the legal cluster.
  • When the cluster splits into sub-cluster 1 (including node 1, node 2, and node 3) and sub-cluster 2 (including node 4 and node 5), the above arbitration scheme provided by the present invention is applied.
  • The resources of the other sub-cluster that is taken over all have shorter startup times than the resource with the largest number of resource votes, so the processing time of resource switching can be shortened during the sub-cluster takeover, and the continuous external service time of the cluster can be increased.
  • Other fractions may be used in place of two-thirds and one-third, and those skilled in the art may set them according to the application.
  • The apparatus 400 includes:
  • The arbitration module 402, configured to, when the cluster computer system splits, arbitrate the determination of the legal sub-cluster after the split according to the number of resource votes and the number of node votes on the nodes in the split sub-clusters, so that the legal sub-cluster obtained by arbitration continues to provide services, wherein
  • the number of resource votes may be set according to the startup time of the application resources running on the node.
  • The arbitration module 402 in the embodiment of the present invention may be deployed in a device for managing a cluster computer system.
  • When the cluster computer system splits, the arbitration module 402 may arbitrate the determination of the legal sub-cluster according to the number of resource votes and the number of node votes on the nodes in each split sub-cluster, so that the legal sub-cluster obtained by arbitration continues to provide external services.
  • The number of node votes in a split sub-cluster can be implemented as one vote per node or as multiple votes per node.
  • For example, if each node casts one vote, sub-cluster 1 includes only one node (node 1), so its number of node votes is one; sub-cluster 2 also includes one node (node 2), so its number of node votes is also one.
  • The number of resource votes on a node is the sum of the votes of the application resources running on that node.
  • The votes of each application resource can be set according to its startup time on the node.
  • For example, if the startup time of application resource app1 is 20 s (s denotes seconds), its number of resource votes can be set to 1; if the startup time of application resource app2 is 40 s, its number of resource votes can be set to 2.
  • The number of resource votes on node 1 is then the sum of the resource votes of app1 and app2 on that node, that is, 3 votes. Note that a person skilled in the art can set the correspondence between application resource startup time and number of resource votes according to application requirements; it is not limited to the correspondence mentioned in the embodiments of the present invention.
  • The number of resource votes of a split sub-cluster is the sum of the resource votes of the nodes in that sub-cluster. For example, if sub-cluster 1 includes node 1 with 2 resource votes and node 2 with 3 resource votes, the number of resource votes of sub-cluster 1 is the sum for node 1 and node 2, that is, 5 votes.
  • Take a four-node cluster computer system as an example. It splits into two sub-clusters due to a failure: sub-cluster 1 (including node 1 and node 2) and sub-cluster 2 (including node 3 and node 4). Node 1 has 4 resource votes, node 2 has 2 resource votes, node 3 has 1 resource vote, and node 4 has 1 resource vote.
  • With one vote per node, sub-cluster 1 includes two nodes and has 2 node votes, and sub-cluster 2 includes two nodes and also has 2 node votes.
  • The arbitration module 402 determines the legal sub-cluster according to the number of resource votes and the number of node votes on the nodes in the split sub-clusters.
  • Sub-cluster 1 and sub-cluster 2 have the same number of nodes, so the legal sub-cluster cannot be determined by comparing the number of nodes.
  • Comparing the resource votes of sub-cluster 1 (6 votes) and sub-cluster 2 (2 votes), sub-cluster 1 has more resource votes than sub-cluster 2, so sub-cluster 1 is determined to be the legal sub-cluster and takes over sub-cluster 2.
  • The voting arbitration apparatus for the cluster computer system includes not only the module shown in FIG. 4 but also a resource ticket number setting module.
  • Referring to FIG. 5, a schematic structural diagram of a resource ticket number setting module according to an embodiment of the present invention is illustrated.
  • The resource ticket number setting module 500 may specifically include:
  • a resource startup unit 502, configured to start an application resource on the node;
  • a monitoring unit 504, configured to monitor the startup time of the application resource by using a monitoring script;
  • a resource voting score setter 506, configured to set the number of resource votes of the application resource according to the startup time monitored by the monitoring unit.
  • The resource startup unit 502, the monitoring unit 504, and the resource voting score setter 506 in the embodiment of the present invention may be deployed in a device for managing a cluster computer system.
  • The application resources include httpd application resources, tomcat application resources, and the like.
  • The resource startup unit 502 starts the application resource on the node, the monitoring unit 504 monitors the startup time of the application resource by using the monitoring script, and the resource voting score setter 506 then sets the number of resource votes for the application resource according to the startup time monitored by the monitoring unit 504.
  • The number of resource votes is related to the monitored resource startup time: the longer the startup time monitored by the monitoring unit 504, the higher the number of resource votes that the resource voting score setter 506 sets for the application resource.
  • The application resources in the embodiment of the present invention may further include an Oracle database application resource, that is, an oracle application resource.
  • For such a resource, the traffic volume is a major factor affecting the resource startup time: as the traffic volume increases after the resource has been started, the startup time of the resource increases.
  • The monitoring unit 504 is further configured to acquire the traffic volume of the application resource by using the monitoring script; a command for obtaining the traffic volume may be added to the monitoring script for this purpose.
  • The resource voting score setter 506 is started to reset the number of resource votes for the application resource, where the predetermined threshold can be set by a technician according to the needs of the application.
  • the cluster computer system in the embodiment of the present invention may comprise a two-node high availability cluster computer system, which may be a two-node high availability cluster computer system as shown in FIG.
  • The arbitration module 402 can be configured to determine the node with the largest number of resource votes by comparing the resource votes on the two nodes of the two-node high availability cluster computer system, and to treat that node as the legal sub-cluster so that the legal sub-cluster can continue to provide services.
  • the two-node cluster system with the resource ticket number setting as shown in FIG.
  • The arbitration module 402 determines that node 1 is the legal node by comparing the resource votes of node 1 and node 2. Node 1 acquires control of the disk and takes over the resources running on node 2, that is, restarts resources app2 and app3 on node 1, which takes about 30 seconds.
  • the cluster computer system of the present invention may include not only a two-node high availability cluster computer system, but also a high availability cluster computer system including three or more nodes.
  • The arbitration module of the embodiment of the present invention can determine the legal cluster by comparing the number of resource votes and the number of node votes on the nodes in the split sub-clusters. For example, the node votes of each split sub-cluster can be compared to determine whether any sub-cluster holds more than two-thirds of the total node votes of the cluster. If the split sub-clusters include a sub-cluster that meets this node vote condition, that sub-cluster is determined to be the legal sub-cluster.
  • If no split sub-cluster meets this node vote condition, it is further determined whether the split sub-clusters include a sub-cluster that holds more than one-third of the total node votes of the cluster and contains the node with the largest number of resource votes. If such a sub-cluster exists, it is determined to be the legal sub-cluster. If no sub-cluster meets both the node vote condition and the resource vote condition, the cluster system goes down and cannot continue to provide services.
  • For example, node 1 runs resource app1, node 2 runs resource app2, node 3 runs resource app3, node 4 runs resource app4, and node 5 runs resource app5.
  • The correspondence between the startup time of the resource running on each node and the number of resource votes can be as shown in Table 4.
  • The numbers of resource votes and node votes stored on the nodes are as shown in Table 5.
  • When the cluster splits into sub-cluster 1 (including node 1, node 2, node 3, and node 4) and sub-cluster 2 (including only node 5), the arbitration module can compare the node votes of the split sub-cluster 1 and sub-cluster 2.
  • The number of node votes of sub-cluster 1 (4 votes) is greater than two-thirds of the total node votes of the cluster (5 votes), so sub-cluster 1 can be determined to be the legal cluster.
  • When the cluster splits into sub-cluster 1 (including node 1, node 2, and node 3) and sub-cluster 2 (including node 4 and node 5), the arbitration module compares the node votes and the resource votes and can determine sub-cluster 2 to be the legal cluster.
  • The resources of the other sub-cluster that it takes over all have shorter startup times than the resource with the largest number of resource votes, so the processing time of resource switching can be shortened during the sub-cluster takeover, and the continuous external service time of the cluster can be increased.
  • Other fractions may be used in place of two-thirds and one-third, and those skilled in the art may set them according to the application.
  • a voting arbitration method and apparatus for a cluster computer system embodying the present invention sets a resource ticket number for a node resource according to a startup time of an application resource on a node in the cluster, and combines the number of resource votes with the number of node votes into the determination of the legal sub-cluster after splitting (that is, the takeover of the sub-cluster) is arbitrated, which effectively reduces the processing time required for resource switching in the process of taking over the legitimate sub-cluster, and improves the continuous service time of the cluster system.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bus Control (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Hardware Redundancy (AREA)

Abstract

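The present invention discloses a voting arbitration method for a cluster computer system. The method comprises: when the cluster computer system splits, arbitrating the determination of the legal sub-cluster after the split according to the resource votes and node votes of the nodes in the split sub-clusters, so that the legal sub-cluster obtained by the arbitration continues to provide service. Correspondingly, the present invention also provides a voting arbitration apparatus for a cluster computer system. Implementing the method and apparatus provided by the present invention can effectively reduce the resource-switching processing time in the process of determining the legal sub-cluster and increase the time during which the cluster system continuously provides external service.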

Description

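Voting arbitration method and apparatus for a cluster computer system

Technical Field
The present invention relates to computer communication networks, and in particular to a voting arbitration method and apparatus for a cluster computer system.

Background
In a cluster computer system, when a failure splits the cluster into several sub-clusters, the cluster suffers split-brain unless preventive measures are taken: the sub-clusters each take over the services of the others and simultaneously provide service externally or access shared resources. An arbitration mechanism is one of the current means of resolving split-brain in cluster computer systems. It works as follows: each node in the cluster system casts one or more votes; when the cluster splits, the sub-cluster with more node votes is the legal cluster, and the legal cluster then takes over the services of the illegal sub-clusters. However, when this node-voting approach is used to arbitrate the determination of the legal sub-cluster after a split (that is, the takeover between sub-clusters), a problem arises if the illegal sub-cluster runs resources that take a long time to start and to switch, while the resources running on the legal sub-cluster are comparatively simple to switch. The resource switching time then increases, which increases service downtime and reduces the time during which the cluster computer system continuously provides external service, that is, it lowers the availability of the cluster system.

Summary of the Invention
The present invention provides a voting arbitration method and apparatus for a cluster computer system. By taking both the node votes of the split sub-clusters and the resource votes on their nodes into account when arbitrating which cluster legally takes over, it effectively solves the problem that considering only the number of nodes increases the takeover switching time and reduces the continuous service time of the cluster system.
According to a first aspect of the present invention, a voting arbitration method for a cluster computer system is provided, the method comprising:
when the cluster computer system splits, arbitrating the determination of the legal sub-cluster after the split according to the resource votes and node votes of the nodes in the split sub-clusters, so that the legal sub-cluster obtained by the arbitration continues to provide service, wherein
the resource votes can be set according to the startup time of the application resources running on the nodes.
According to a second aspect of the present invention, a voting arbitration apparatus for a cluster computer system is provided, the apparatus comprising:
an arbitration module, configured to, when the cluster computer system splits, arbitrate the determination of the legal sub-cluster after the split according to the resource votes and node votes of the nodes in the split sub-clusters, so that the legal sub-cluster obtained by the arbitration continues to provide service, wherein
the resource votes can be set according to the startup time of the application resources running on the nodes.
Implementing the embodiments of the present invention has the following beneficial effects: resource votes are set for each node according to the startup time of the resources running on it; after the cluster splits, the node votes of the split sub-clusters and the resource votes on their nodes are used as the arbitration factors for the takeover between split sub-clusters. This effectively reduces the switching time of the takeover and thereby reduces service downtime.

Brief Description of the Drawings
The drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.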
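Figure 1 is a schematic flowchart of a voting arbitration method for a cluster computer system according to an embodiment of the present invention.
Figure 2 is a schematic flowchart of setting the resource votes on the nodes in a cluster according to an embodiment of the present invention.
Figure 3 is a schematic diagram of the networking model of a two-node high-availability cluster computer system according to an embodiment of the present invention.
Figure 4 is a schematic structural diagram of a voting arbitration apparatus for a cluster computer system according to an embodiment of the present invention.
Figure 5 is a schematic structural diagram of a resource vote setting module according to an embodiment of the present invention.

Detailed Description of the Embodiments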
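The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Referring to Figure 1, which shows a schematic flowchart of the voting arbitration method for a cluster computer system according to an embodiment of the present invention, the method provided by the present invention includes:
when the cluster computer system splits, arbitrating the determination of the legal sub-cluster after the split according to the resource votes and node votes of the nodes in the split sub-clusters, so that the legal sub-cluster obtained by the arbitration continues to provide service, wherein
the resource votes can be set according to the startup time of the application resources running on the nodes.
In the embodiments of the present invention, when the cluster computer system splits because of a failure, the determination of the legal cluster after the split (that is, the takeover between sub-clusters) is arbitrated according to the resource votes and node votes of the nodes in each split sub-cluster, so that the legal sub-cluster obtained by the arbitration continues to provide service. The failure may be a cluster split caused by a heartbeat-detection failure between nodes, or a cluster split caused by the failure of a node itself. For example, a two-node cluster system splits because of a heartbeat-detection failure between the two nodes into sub-cluster 1 (containing node 1) and sub-cluster 2 (containing node 2).
In the embodiments of the present invention, the node votes within a split sub-cluster can be implemented by having each node cast one vote or more than one vote. In the two-node cluster system above, for example, each node can cast one vote: sub-cluster 1 contains only one node (node 1), so its node votes are 1, and sub-cluster 2 contains one node (node 2), so its node votes are also 1. The resource votes on a node are the sum of the votes of the application resources running on that node, and they can be set according to the startup time of each application resource on the node. For example, node 1 runs application resources app1 and app2. The startup time of app1 is 20 s (s denotes seconds), so it can be assigned 1 resource vote; the startup time of app2 is 40 s, so it can be assigned 2 resource votes. The resource votes on node 1 are the sum of the resource votes of app1 and app2 on that node, namely 3. It is worth pointing out that a person skilled in the art can set the correspondence between application resource startup time and resource votes according to the needs of the application, and it is not limited to the correspondence mentioned in the embodiments of the present invention.
In the embodiments of the present invention, the resource votes of a split sub-cluster are the sum of the resource votes of the nodes in that sub-cluster. For example, sub-cluster 1 contains node 1 and node 2; node 1 has 2 resource votes and node 2 has 3 resource votes, so the resource votes of sub-cluster 1 are the sum of the resource votes of node 1 and node 2 in that sub-cluster, namely 5.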
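The two summations described above can be sketched as follows. This is a minimal illustration only; the function names and the dictionary layout are assumptions made for the example and do not come from the patent.

```python
def node_resource_votes(app_votes):
    """Resource votes of a node: sum of the votes of the application
    resources it runs (app_votes maps resource name -> votes)."""
    return sum(app_votes.values())


def subcluster_resource_votes(votes_per_node):
    """Resource votes of a sub-cluster: sum of the resource votes of its
    member nodes (votes_per_node maps node name -> resource votes)."""
    return sum(votes_per_node.values())


# Node 1 runs app1 (1 vote) and app2 (2 votes).
print(node_resource_votes({"app1": 1, "app2": 2}))
# Sub-cluster 1 contains node 1 (2 votes) and node 2 (3 votes).
print(subcluster_resource_votes({"node1": 2, "node2": 3}))
```

The two calls reproduce the figures given in the text: 3 resource votes for the node and 5 for the sub-cluster.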
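In some embodiments of the present invention, a four-node cluster computer system is taken as an example. Because of a failure it splits into two sub-clusters, sub-cluster 1 (nodes 1 and 2) and sub-cluster 2 (nodes 3 and 4). Node 1 has 4 resource votes, node 2 has 2, node 3 has 1, and node 4 has 1. Each node casts one vote, so sub-cluster 1, with two nodes, has 2 node votes, and sub-cluster 2, also with two nodes, likewise has 2 node votes. The voting arbitration method provided by the present invention determines the legal sub-cluster according to the resource votes and node votes of the nodes in the split sub-clusters. Because sub-cluster 1 and sub-cluster 2 have the same node votes, the legal sub-cluster cannot be determined by comparing node votes. Comparing the resource votes of sub-cluster 1 (6 votes) and sub-cluster 2 (2 votes) shows that sub-cluster 1 has more resource votes, so sub-cluster 1 is determined to be the legal sub-cluster and takes over sub-cluster 2.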
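The four-node example can be reproduced with a short sketch. It assumes one particular reading of the text: node votes are compared first and resource votes only break a tie. The function name, the data layout, and the choice to return `None` on a complete tie are assumptions for illustration.

```python
def pick_legal_subcluster(subclusters):
    """subclusters maps a sub-cluster name to a list of
    (node_votes, resource_votes) pairs, one pair per member node.

    Returns the name with the most node votes, using total resource
    votes as the tie-breaker, or None if both totals tie or the input
    is empty (an assumption; the text does not cover that case)."""
    totals = {
        name: (sum(n for n, _ in nodes), sum(r for _, r in nodes))
        for name, nodes in subclusters.items()
    }
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


split = {
    "subcluster1": [(1, 4), (1, 2)],  # nodes 1 and 2
    "subcluster2": [(1, 1), (1, 1)],  # nodes 3 and 4
}
print(pick_legal_subcluster(split))
```

Both sub-clusters hold 2 node votes, so the comparison falls through to resource votes (6 against 2) and sub-cluster 1 is selected.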
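The resource votes on the nodes in the split sub-clusters can be set using the flow shown in Figure 2. Referring to Figure 2, which shows a schematic flowchart of setting the resource votes on the nodes in a cluster according to an embodiment of the present invention, the flow may specifically include:
S200: a resource startup unit starts the application resource on the node;
S202: a monitoring unit monitors the startup time of the application resource by means of a monitoring script;
S204: a resource vote score setter sets the resource votes of the application resource according to the monitored startup time.
It should be noted that the resource startup unit, the monitoring unit, and the resource vote score setter in the embodiments of the present invention can be deployed in the device used to manage the cluster computer system. In the embodiments of the present invention, the application resources running on the nodes (for example httpd and tomcat application resources) take part in voting. After the resource startup unit starts an application resource, a corresponding number of votes can be set for the application resources on each node according to the startup time measured by the monitoring script. The number of votes depends on the measured startup time: the longer the startup time measured by the monitoring script, the more resource votes the resource vote score setter assigns to the application resource. For example, in some embodiments of the present invention, the correspondence between resource startup time and resource votes can be as shown in Table 1 below, where T is the startup time and s denotes seconds.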
Table 1
Startup time T (s)        Resource votes
0 s < T <= 10 s           0 votes
10 s < T <= 30 s          1 vote
30 s < T <= 50 s          2 votes
50 s < T <= 70 s          3 votes
70 s < T <= 90 s          4 votes
T > 90 s                  5 votes
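It should be noted that a person skilled in the art can set the relationship between startup time and resource votes according to the needs of the application, and it is not limited to the correspondence shown in Table 1.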
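The Table 1 correspondence is a step function of the startup time, which can be sketched with a sorted list of band boundaries. The function and constant names are hypothetical; only the band boundaries and vote values are taken from Table 1.

```python
import bisect

# Inclusive upper bounds, in seconds, of the first five Table 1 bands.
# The index of the band a startup time falls into is its vote count.
UPPER_BOUNDS_S = [10, 30, 50, 70, 90]


def resource_votes(startup_seconds):
    """Map a measured startup time to resource votes per Table 1."""
    if startup_seconds <= 0:
        raise ValueError("startup time must be positive")
    # bisect_left keeps a value equal to a bound in the lower band,
    # matching the "<=" on the upper edge of each row.
    return bisect.bisect_left(UPPER_BOUNDS_S, startup_seconds)


for t in (20, 40, 80, 120):
    print(t, resource_votes(t))
```

A different correspondence only requires changing the list of bounds, which fits the remark that the mapping is left to the implementer.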
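In the embodiments of the present invention, because the startup time of an application resource increases as its workload grows after startup, a function for monitoring the workload of the resource can be added to the monitoring script. The application resources in the embodiments of the present invention may also include Oracle database application resources (oracle application resources). For an oracle application resource, the size of the workload is the main factor affecting the startup time. In setting the resource votes of an oracle resource, the method of the present invention may include:
obtaining the workload of the application resource by means of the monitoring script, and, when the workload exceeds a predetermined threshold, having the resource vote score setter reset the resource votes of the application resource.
Specifically, a command for obtaining the workload can be added to the monitoring script to obtain the workload of the application resource; when the workload exceeds the predetermined threshold, the resource vote score setter is started to reset the resource votes of the resource. The predetermined threshold can be set by a technician according to the needs of the application.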
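The threshold check can be sketched as below. The text does not say how the new vote count is computed, so the `reassign` callback is a placeholder assumption, as are the function name and the sample numbers.

```python
def refresh_resource_votes(current_votes, workload, threshold, reassign):
    """Return the votes to keep for a resource.

    The stored votes stay unchanged unless the monitored workload exceeds
    the threshold, in which case the hypothetical `reassign` callback
    (standing in for the resource vote score setter) supplies new votes."""
    if workload > threshold:
        return reassign(workload)
    return current_votes


# Hypothetical policy: one extra vote per full multiple of the threshold.
def policy(workload):
    return 2 + workload // 1000


print(refresh_resource_votes(2, 500, 1000, policy))   # below threshold
print(refresh_resource_votes(2, 2500, 1000, policy))  # above threshold
```

Passing the policy in as a callback keeps the check independent of whichever reset rule an implementer chooses.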
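The high-availability cluster computer system of the present invention may include a two-node high-availability cluster computer system. Figure 3 shows a schematic diagram of the networking model of a two-node high-availability cluster computer system according to an embodiment of the present invention. When a two-node cluster computer system splits because of a failure, a third party (a disk, an arbitration server, and so on) is needed to determine which node is the legal node, and the legal node takes over the services running on the other node. In the two-node cluster shown in Figure 3, the correspondence between startup time and resource votes shown in Table 1 can be used: resource app1 has a startup time of 80 s and can be assigned 4 votes, resource app2 has a startup time of 30 s and can be assigned 2 votes, and resource app3 has a startup time of 20 s and can be assigned 1 vote, where s denotes seconds. When a failure occurs, the node with the most resource votes can be identified by comparing the resource votes on the two nodes of the two-node high-availability cluster computer system. That node (node 1) acts as the legal sub-cluster and takes over the illegal sub-cluster (node 2), so that the legal sub-cluster can continue to provide external service. Because the two nodes of a two-node cluster system have the same node votes, node votes need not be compared during arbitration. Comparing the resource votes of node 1 and node 2 identifies node 1 as the legal node; node 1 obtains control of the disk and takes over the resources running on node 2, that is, resources app2 and app3 are restarted on node 1, which takes about 30 s. It should be noted that, to ensure that the split two-node cluster continues to provide external service, if the node with more resource votes (node 1) is determined to take over the services on the other node (node 2) but node 1 has failed and cannot perform the takeover, node 2 can perform the takeover instead so that service continues. Under the networking model shown in Figure 3, with the existing node-voting arbitration, node 2 has at least a 50% chance of obtaining control of the disk and taking over resource app1 running on node 1, which takes about 80 s. Evidently, the method provided by the present invention effectively reduces the processing time of resource switching and increases the continuous external service time of the cluster system.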
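The two-node decision, including the fallback when the preferred node has itself failed, can be sketched as follows. The function name, the `healthy` map, and the handling of equal vote counts (first node listed wins) are assumptions; the text does not address a tie.

```python
def two_node_takeover(resource_votes, healthy):
    """Choose which node of a split two-node cluster takes over.

    resource_votes maps node name -> resource votes; healthy maps node
    name -> whether that node is able to perform the takeover. The node
    with more resource votes is preferred; if it cannot take over, the
    other node does. Returns None if no node is healthy."""
    by_votes = sorted(resource_votes, key=resource_votes.get, reverse=True)
    for node in by_votes:
        if healthy.get(node, False):
            return node
    return None


# Node 1 runs app1 (4 votes); node 2 runs app2 and app3 (2 + 1 votes).
votes = {"node1": 4, "node2": 3}
print(two_node_takeover(votes, {"node1": True, "node2": True}))
print(two_node_takeover(votes, {"node1": False, "node2": True}))
```

With both nodes healthy, node 1 takes over and only app2 and app3 need restarting; if node 1 is down, node 2 takes over instead.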
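The cluster computer system of the present invention may include not only a two-node high-availability cluster computer system but also high-availability cluster computer systems with three or more nodes. It should be noted that in a three-node cluster system, when the cluster splits into two sub-clusters, with sub-cluster 1 containing two nodes and sub-cluster 2 containing one node, the node votes of the split sub-clusters are considered first in order to avoid a single point of failure in the cluster, and sub-cluster 1 with its two nodes is determined to be the legal sub-cluster.
Preferably, a four-node cluster system is taken as an example to illustrate the method of arbitrating the takeover between split sub-clusters according to the resource votes and node votes of the nodes in each split sub-cluster so that service continues. In the four-node cluster computer system, each node stores the resource vote information and the corresponding node vote information of all nodes in the cluster. Node 1 runs resource app1, node 2 runs resources app2 and app3, node 3 runs resource app4, and node 4 runs resource app5. The correspondence between the startup time and the resource votes of the resources on each node can be as shown in Table 2, and each node in the cluster can store the resource vote and node vote information shown in Table 3.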
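Table 2
[Image imgf000007_0001: startup time and resource votes of the resources on each node]

Table 3
Resource    Running node    Resource votes
app1        node 1
app2        node 2          2 votes
app3        node 2
app4        node 3          4 votes
app5        node 4          3 votes

In the embodiments of the present invention, when a failure occurs, the cluster splits into sub-cluster 1 (nodes 1 and 2) and sub-cluster 2 (nodes 3 and 4). The method provided by the present invention arbitrates the determination of the legal sub-cluster according to the resource votes and node votes of the nodes in the split sub-clusters. Sub-cluster 1 and sub-cluster 2 have the same node votes, 2 each. The resource votes of sub-cluster 1, the sum of the resource votes of its nodes, are 4, while the sum of the resource votes of the nodes in sub-cluster 2 is 7. Sub-cluster 2 has more resource votes than sub-cluster 1, so sub-cluster 2 can be determined to be the legal cluster and takes over the resources running on sub-cluster 1; the required switching time is about 50 s. Under the existing node-voting scheme, sub-cluster 1 has a 50% chance of taking over the resources running on sub-cluster 2, and the required resource switching time is then about 140 s. Evidently, the method provided by the present invention, which combines the node votes and resource votes of the split sub-clusters, significantly reduces the processing time required for resource switching during the takeover by the legal sub-cluster and increases the continuous external service time of the cluster system.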
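The embodiments of the present invention can determine the legal sub-cluster by comparing the resource votes and node votes of the nodes in the split sub-clusters. For example, the node votes of the split sub-clusters can be compared to find a sub-cluster whose node votes exceed two-thirds of the cluster's total node votes. If the split sub-clusters include one that meets this node vote condition, it is determined to be the legal sub-cluster. If none meets the condition, it is further determined whether the split sub-clusters include one whose node votes exceed one-third of the cluster's total node votes and which contains the node with the most resource votes. If a sub-cluster meets both this node vote condition and this resource vote condition, it is determined to be the legal sub-cluster. If no sub-cluster meets the node vote and resource vote conditions, the cluster system goes down and can no longer provide service.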
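The two-stage rule can be sketched as follows. Several points are assumptions rather than statements from the patent: the thresholds are treated as strict inequalities, the node with the most resource votes is looked up among the nodes present in the split sub-clusters, and arbitration gives up (returns `None`) if more than one sub-cluster passes the second stage. The resource vote values in the example are invented for illustration, with node 5 given the largest count.

```python
from fractions import Fraction


def arbitrate(subclusters, total_node_votes,
              high=Fraction(2, 3), low=Fraction(1, 3)):
    """subclusters maps name -> {node: (node_votes, resource_votes)}.
    Returns the legal sub-cluster's name, or None if the cluster must stop."""
    node_totals = {
        name: sum(nv for nv, _ in nodes.values())
        for name, nodes in subclusters.items()
    }
    # Stage 1: a sub-cluster with more than two-thirds of all node votes wins.
    for name, votes in node_totals.items():
        if votes > high * total_node_votes:
            return name
    # Stage 2: more than one-third of the node votes AND the top-resource node.
    top = max((rv for nodes in subclusters.values()
               for _, rv in nodes.values()), default=0)
    candidates = [
        name for name, nodes in subclusters.items()
        if node_totals[name] > low * total_node_votes
        and any(rv == top for _, rv in nodes.values())
    ]
    return candidates[0] if len(candidates) == 1 else None


# Five nodes, one node vote each; resource votes are illustrative only.
split_4_1 = {
    "subcluster1": {"node1": (1, 1), "node2": (1, 1),
                    "node3": (1, 2), "node4": (1, 2)},
    "subcluster2": {"node5": (1, 5)},
}
split_3_2 = {
    "subcluster1": {"node1": (1, 1), "node2": (1, 1), "node3": (1, 2)},
    "subcluster2": {"node4": (1, 2), "node5": (1, 5)},
}
print(arbitrate(split_4_1, 5))
print(arbitrate(split_3_2, 5))
```

In the 4/1 split, sub-cluster 1 holds 4 of 5 node votes and wins at the first stage. In the 3/2 split, no sub-cluster exceeds two-thirds, and sub-cluster 2 wins at the second stage because it holds more than one-third of the node votes and contains the node with the most resource votes.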
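Take a five-node cluster system as an example. Node 1 runs resource app1, node 2 runs resource app2, node 3 runs resource app3, node 4 runs resource app4, and node 5 runs resource app5. The correspondence between the startup time and the resource votes of the resources running on each node can be as shown in Table 4, and the resource vote and node vote information stored on the nodes is as shown in Table 5.

Table 4
[Image imgf000009_0001: startup time and resource votes of the resources running on each node]

In some embodiments of the present invention, after a failure the cluster splits into sub-cluster 1 (nodes 1, 2, 3 and 4) and sub-cluster 2 (node 5 only). Under the arbitration scheme described above, the node votes of sub-cluster 1 (4 votes) exceed two-thirds of the cluster's total node votes (5 votes), so sub-cluster 1 can be determined to be the legal cluster. In other embodiments of the present invention, the cluster splits into sub-cluster 1 (nodes 1, 2 and 3) and sub-cluster 2 (nodes 4 and 5). Under the arbitration scheme described above, once it is established that no sub-cluster holds more than two-thirds of the cluster's total node votes, it is further determined whether there is a sub-cluster whose node votes exceed one-third of the cluster's total node votes and which contains the node with the most resource votes. By comparing node votes and resource votes, sub-cluster 2 can be determined to be the legal cluster. Because this sub-cluster contains the node with the most resource votes (the node running app5, the resource with the longest startup time), the resources of the other sub-cluster that it takes over all start faster than the resource with the most votes. The processing time of resource switching during the takeover is therefore shortened and the continuous external service time of the cluster is increased. It is worth pointing out that the node vote thresholds in the embodiments of the present invention are not limited to the two-thirds and one-third mentioned above; other fractions can be used, and a person skilled in the art can set them according to the application.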
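The voting arbitration method for a cluster computer system of the present invention has been described above with reference to the drawings and tables. The voting arbitration apparatus for a cluster computer system of the present invention is described below with reference to the drawings.
Referring to Figure 4, which shows a schematic structural diagram of a voting arbitration apparatus for a cluster computer system according to an embodiment of the present invention, the apparatus 400 includes:
an arbitration module 402, configured to, when the cluster computer system splits, arbitrate the determination of the legal sub-cluster after the split according to the resource votes and node votes of the nodes in the split sub-clusters, so that the legal sub-cluster obtained by the arbitration continues to provide service, wherein
the resource votes can be set according to the startup time of the application resources running on the nodes.
It should be noted that the arbitration module 402 in the embodiments of the present invention can be deployed in the device used to manage the cluster computer system. In the embodiments of the present invention, when the cluster computer system splits, the arbitration module 402 can be used to arbitrate the determination of the legal sub-cluster after the split according to the resource votes and node votes of the nodes in each split sub-cluster, so that the legal sub-cluster obtained by the arbitration continues to provide external service.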
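In the embodiments of the present invention, the node votes within a split sub-cluster can be implemented by having each node cast one vote or more than one vote. In the two-node cluster system above, for example, each node can cast one vote: sub-cluster 1 contains only one node (node 1), so its node votes are 1, and sub-cluster 2 contains one node (node 2), so its node votes are also 1. The resource votes on a node are the sum of the votes of the application resources running on that node, and they can be set according to the startup time of each application resource on the node. For example, node 1 runs application resources app1 and app2. The startup time of app1 is 20 s (s denotes seconds), so it can be assigned 1 resource vote; the startup time of app2 is 40 s, so it can be assigned 2 resource votes. The resource votes on node 1 are the sum of the resource votes of app1 and app2 on that node, namely 3. It is worth pointing out that a person skilled in the art can set the correspondence between application resource startup time and resource votes according to the needs of the application, and it is not limited to the correspondence mentioned in the embodiments of the present invention.
In the embodiments of the present invention, the resource votes of a split sub-cluster are the sum of the resource votes of the nodes in that sub-cluster. For example, sub-cluster 1 contains node 1 and node 2; node 1 has 2 resource votes and node 2 has 3 resource votes, so the resource votes of sub-cluster 1 are the sum of the resource votes of node 1 and node 2 in that sub-cluster, namely 5.
In some embodiments of the present invention, a four-node cluster computer system is taken as an example. Because of a failure it splits into two sub-clusters, sub-cluster 1 (nodes 1 and 2) and sub-cluster 2 (nodes 3 and 4). Node 1 has 4 resource votes, node 2 has 2, node 3 has 1, and node 4 has 1. Each node casts one vote, so sub-cluster 1, with two nodes, has 2 node votes, and sub-cluster 2, also with two nodes, likewise has 2 node votes. The arbitration module 402 provided by the present invention determines the legal sub-cluster according to the resource votes and node votes of the nodes in the split sub-clusters. Because sub-cluster 1 and sub-cluster 2 have the same node votes, the legal sub-cluster cannot be determined by comparing node votes. Comparing the resource votes of sub-cluster 1 (6 votes) and sub-cluster 2 (2 votes) shows that sub-cluster 1 has more resource votes, so sub-cluster 1 is determined to be the legal sub-cluster and takes over sub-cluster 2.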
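In the embodiments of the present invention, the voting arbitration apparatus for a cluster computer system includes not only the module shown in Figure 4 but may also include a resource vote setting module. Referring to Figure 5, which shows a schematic structural diagram of the resource vote setting module according to an embodiment of the present invention, the resource vote setting module 500 may specifically include:
a resource startup unit 502, configured to start the application resource on the node;
a monitoring unit 504, configured to monitor the startup time of the application resource by means of a monitoring script;
a resource vote score setter 506, configured to set the resource votes of the application resource according to the startup time measured by the monitoring unit.
It should be noted that the resource startup unit 502, the monitoring unit 504, and the resource vote score setter 506 in the embodiments of the present invention can be deployed in the device used to manage the cluster computer system. In the embodiments of the present invention, the application resources running on the nodes (for example httpd and tomcat application resources) take part in voting. The resource startup unit 502 can start the application resource on the node, the monitoring unit 504 monitors the startup time of the application resource by means of the monitoring script, and the resource vote score setter 506 then sets the resource votes of the application resource according to the startup time measured by the monitoring unit 504. The resource votes depend on the measured startup time: the longer the startup time measured by the monitoring unit 504, the more resource votes the resource vote score setter 506 assigns to the application resource.
The application resources in the embodiments of the present invention may also include Oracle database application resources (oracle application resources). For an oracle application resource, the size of the workload is the main factor affecting the startup time, and the startup time of a resource increases as its workload grows after startup. The monitoring unit 504 is therefore further configured to obtain the workload of the application resource by means of the monitoring script and, when the workload exceeds a predetermined threshold, to start the resource vote score setter 506 to reset the resource votes of the application resource. Specifically, a command for obtaining the workload can be added to the monitoring script to obtain the workload of the application resource; when the workload exceeds the predetermined threshold, the resource vote score setter is started to reset the resource votes of the resource. The predetermined threshold can be set by a technician according to the needs of the application.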
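The cluster computer system in the embodiments of the present invention may include a two-node high-availability cluster computer system, such as the one shown in Figure 3. For a two-node high-availability cluster computer system, the arbitration module 402 can be used to: identify the node with the most resource votes by comparing the resource votes on the two nodes of the two-node high-availability cluster computer system, and take the identified node as the legal sub-cluster so that the legal sub-cluster continues to provide service. In the two-node cluster system with the resource vote settings shown in Figure 3, when a failure occurs, the arbitration module 402 compares the resource votes of node 1 and node 2 and identifies node 1 as the legal node. Node 1 obtains control of the disk and takes over the resources running on node 2, that is, resources app2 and app3 are restarted on node 1, which takes about 30 s.
The cluster computer system of the present invention may include not only a two-node high-availability cluster computer system but also high-availability cluster computer systems with three or more nodes. The arbitration module of the embodiments of the present invention can determine the legal cluster by comparing the resource votes and node votes of the nodes in the split sub-clusters. For example, the node votes of the split sub-clusters can be compared to find a sub-cluster whose node votes exceed two-thirds of the cluster's total node votes. If the split sub-clusters include one that meets this node vote condition, it is determined to be the legal sub-cluster. If none meets the condition, it is further determined whether the split sub-clusters include one whose node votes exceed one-third of the cluster's total node votes and which contains the node with the most resource votes. If a sub-cluster meets both this node vote condition and this resource vote condition, it is determined to be the legal sub-cluster. If no sub-cluster meets the node vote and resource vote conditions, the cluster system goes down and can no longer provide service. Take a five-node cluster system as an example. Node 1 runs resource app1, node 2 runs resource app2, node 3 runs resource app3, node 4 runs resource app4, and node 5 runs resource app5. The correspondence between the startup time and the resource votes of the resources running on each node can be as shown in Table 4, and the resource vote and node vote information stored on the nodes is as shown in Table 5.
In some embodiments of the present invention, after a failure the cluster splits into sub-cluster 1 (nodes 1, 2, 3 and 4) and sub-cluster 2 (node 5 only). By comparing the node votes of the split sub-cluster 1 and sub-cluster 2, the arbitration module can establish that the node votes of sub-cluster 1 (4 votes) exceed two-thirds of the cluster's total node votes (5 votes), so sub-cluster 1 can be determined to be the legal cluster. In other embodiments of the present invention, the cluster splits into sub-cluster 1 (nodes 1, 2 and 3) and sub-cluster 2 (nodes 4 and 5). Under the arbitration scheme described above, once it is established that no sub-cluster holds more than two-thirds of the cluster's total node votes, it is further determined whether there is a sub-cluster whose node votes exceed one-third of the cluster's total node votes and which contains the node with the most resource votes. By comparing node votes and resource votes, the arbitration module can determine sub-cluster 2 to be the legal cluster. Because this sub-cluster contains the node with the most resource votes (the node running app5, the resource with the longest startup time), the resources of the other sub-cluster that it takes over all start faster than the resource with the most votes. The processing time of resource switching during the takeover is therefore shortened and the continuous external service time of the cluster is increased. It is worth pointing out that the node vote thresholds in the embodiments of the present invention are not limited to the two-thirds and one-third mentioned above; other fractions can be used, and a person skilled in the art can set them according to the application.
The voting arbitration method and apparatus for a cluster computer system of the present invention set resource votes for node resources according to the startup time of the application resources on the nodes in the cluster, and combine resource votes with node votes to arbitrate the determination of the legal sub-cluster after a split (that is, the takeover between sub-clusters). This effectively reduces the processing time required for resource switching during the takeover by the legal sub-cluster and increases the continuous service time of the cluster system.
A person of ordinary skill in the art will understand that all or part of the flows in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, can include the flows of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The above are preferred embodiments of the present invention. It should be pointed out that a person of ordinary skill in the art can make various improvements and changes without departing from the principles of the present invention, and these improvements and changes are also regarded as falling within the protection scope of the present invention.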

Claims

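1. A voting arbitration method for a cluster computer system, characterized in that the method comprises:
when the cluster computer system splits, arbitrating the determination of the legal sub-cluster after the split according to the resource votes and node votes of the nodes in the split sub-clusters, so that the legal sub-cluster obtained by the arbitration continues to provide service, wherein
the resource votes can be set according to the startup time of the application resources running on the nodes.
2. The method according to claim 1, characterized in that the step of setting the resource votes on the nodes in the split sub-clusters comprises:
a resource startup unit starting the application resource on the node;
a monitoring unit monitoring the startup time of the application resource by means of a monitoring script;
a resource vote score setter setting the resource votes of the application resource according to the monitored startup time.
3. The method according to claim 2, characterized in that the longer the monitored startup time, the more resource votes the resource vote score setter sets for the application resource.
4. The method according to claim 3, characterized in that the application resources include Oracle database application resources.
5. The method according to claim 4, characterized in that the method comprises:
obtaining the workload of the application resource by means of the monitoring script, and, when the workload exceeds a predetermined threshold, the resource vote score setter resetting the resource votes of the application resource.
6. The method according to claim 3, characterized in that the cluster computer system includes a two-node high-availability cluster computer system.
7. The method according to claim 6, characterized in that arbitrating the determination of the legal sub-cluster after the split according to the resource votes and node votes of the nodes in the split sub-clusters, so that the legal sub-cluster obtained by the arbitration continues to provide service, comprises:
identifying the node with the most resource votes by comparing the resource votes on the two nodes of the two-node high-availability cluster computer system, and taking the identified node with the most resource votes as the legal sub-cluster so that the legal sub-cluster continues to provide service.
8. The system according to claim 3, characterized in that the cluster computer system includes a high-availability cluster computer system with three or more nodes.
9. A voting arbitration apparatus for a cluster computer system, characterized in that the apparatus comprises:
an arbitration module, configured to, when the cluster computer system splits, arbitrate the determination of the legal sub-cluster after the split according to the resource votes and node votes of the nodes in the split sub-clusters, so that the legal sub-cluster obtained by the arbitration continues to provide service, wherein
the resource votes can be set according to the startup time of the application resources running on the nodes.
10. The apparatus according to claim 9, characterized in that the apparatus further comprises a resource vote setting module, which specifically comprises:
a resource startup unit, configured to start the application resource on the node;
a monitoring unit, configured to monitor the startup time of the application resource by means of a monitoring script;
a resource vote score setter, configured to set the resource votes of the application resource according to the startup time measured by the monitoring unit.
11. The apparatus according to claim 10, characterized in that the longer the startup time measured by the monitoring unit, the more resource votes the resource vote score setter sets for the application resource.
12. The apparatus according to claim 11, characterized in that the application resources include Oracle database application resources.
13. The apparatus according to claim 12, characterized in that the monitoring unit is further configured to obtain the workload of the application resource by means of the monitoring script and, when the workload exceeds a predetermined threshold, to start the resource vote score setter to reset the resource votes of the application resource.
14. The apparatus according to claim 11, characterized in that the cluster computer system includes a two-node high-availability cluster computer system.
15. The apparatus according to claim 14, characterized in that the arbitration module is configured to: identify the node with the most resource votes by comparing the resource votes on the two nodes of the two-node high-availability cluster computer system, and take the identified node with the most resource votes as the legal sub-cluster so that the legal sub-cluster continues to provide service.
16. The apparatus according to claim 11, characterized in that the cluster computer system includes a high-availability cluster computer system with three or more nodes.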
PCT/CN2011/077598 2011-07-26 2011-07-26 Voting arbitration method and apparatus for a cluster computer system WO2012083693A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2011/077598 WO2012083693A1 (zh) 2011-07-26 2011-07-26 Voting arbitration method and apparatus for a cluster computer system
CN201180001450.7A CN102308559B (zh) 2011-07-26 2011-07-26 Voting arbitration method and apparatus for a cluster computer system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2011/077598 WO2012083693A1 (zh) 2011-07-26 2011-07-26 Voting arbitration method and apparatus for a cluster computer system

Publications (1)

Publication Number Publication Date
WO2012083693A1 (zh) 2012-06-28

Family

ID=45381277

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/077598 WO2012083693A1 (zh) 2011-07-26 2011-07-26 Voting arbitration method and apparatus for a cluster computer system

Country Status (2)

Country Link
CN (1) CN102308559B (zh)
WO (1) WO2012083693A1 (zh)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
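CN102904946B (zh) * 2012-09-29 2015-06-10 浪潮(北京)电子信息产业有限公司 Method and apparatus for managing nodes in a cluster
CN103647820B (zh) * 2013-12-09 2016-11-23 华为数字技术(苏州)有限公司 Arbitration method and arbitration apparatus for a distributed cluster system
CN104717077B (zh) * 2013-12-11 2018-05-22 中国移动通信集团山东有限公司 Method, apparatus and system for managing a data center
CN105450717A (zh) * 2014-09-29 2016-03-30 中兴通讯股份有限公司 Cluster split-brain handling method and apparatus
CN104378232B (zh) * 2014-11-10 2018-01-19 东软集团股份有限公司 Method and apparatus for split-brain detection and recovery in active/standby cluster networking mode
CN105704187B (zh) * 2014-11-27 2019-03-05 华为技术有限公司 Method and apparatus for handling cluster split-brain
WO2016106682A1 (zh) * 2014-12-31 2016-07-07 华为技术有限公司 Arbitration processing method after cluster split-brain, arbitration storage apparatus, and system
CN106502822A (zh) * 2015-09-08 2017-03-15 中兴通讯股份有限公司 Data read/write method and apparatus
CN107181834B (zh) * 2017-06-13 2021-02-12 聚好看科技股份有限公司 Method and apparatus for managing virtual IP addresses with redis, and redis system
CN108134712B (zh) * 2017-12-19 2020-12-18 海能达通信股份有限公司 Method, apparatus and device for handling split-brain in a distributed cluster
US11169854B2 (en) 2019-01-31 2021-11-09 Hewlett Packard Enterprise Development Lp Node eligibility determinations
CN111835534B (zh) * 2019-04-15 2022-05-06 华为技术有限公司 Method for cluster control, network device, master control node apparatus, and computer-readable storage medium
CN112711632A (zh) * 2019-12-27 2021-04-27 山东鲁能软件技术有限公司 Asynchronous data stream replication method and system for a high-availability cluster
CN112468596B (zh) 2020-12-02 2022-07-05 苏州浪潮智能科技有限公司 Cluster arbitration method and apparatus, electronic device, and readable storage medium
CN113608836A (zh) * 2021-08-06 2021-11-05 上海英方软件股份有限公司 Cluster-based virtual machine high-availability method and system
US20230161633A1 (en) * 2021-11-23 2023-05-25 International Business Machines Corporation Avoidance of Workload Duplication Among Split-Clusters
CN114374707B (zh) * 2022-03-22 2022-06-21 联想凌拓科技有限公司 Management method, apparatus, device and medium for a storage cluster
CN115617917B (zh) * 2022-12-16 2023-03-10 中国西安卫星测控中心 Method, apparatus, system and device for multi-active control of a database cluster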

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
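CN1483163A (zh) * 2000-12-21 2004-03-17 Method for improving the availability of a computer cluster system by using a network medium link state function
CN101178668A (zh) * 2006-11-11 2008-05-14 国际商业机器公司 Method and apparatus for managing partitioning in a cluster of nodes
CN101252603A (zh) * 2008-04-11 2008-08-27 清华大学 Cluster distributed lock management method based on a storage area network (SAN)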
US7496782B1 (en) * 2004-06-01 2009-02-24 Network Appliance, Inc. System and method for splitting a cluster for disaster recovery

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1483163A (zh) * 2000-12-21 2004-03-17 Method for improving the availability of a computer cluster system by using a network medium link state function
US7496782B1 (en) * 2004-06-01 2009-02-24 Network Appliance, Inc. System and method for splitting a cluster for disaster recovery
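CN101178668A (zh) * 2006-11-11 2008-05-14 国际商业机器公司 Method and apparatus for managing partitioning in a cluster of nodes
CN101252603A (zh) * 2008-04-11 2008-08-27 清华大学 Cluster distributed lock management method based on a storage area network (SAN)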

Also Published As

Publication number Publication date
CN102308559A (zh) 2012-01-04
CN102308559B (zh) 2014-04-02

Similar Documents

Publication Publication Date Title
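WO2012083693A1 (zh) Voting arbitration method and apparatus for a cluster computer system
US6789213B2 (en) Controlled take over of services by remaining nodes of clustered computing system
WO2018192533A1 (zh) Node device operating method, working-state switching apparatus, node device, and medium
WO2021184587A1 (zh) Prometheus-based private cloud monitoring method and apparatus, computer device, and storage medium
WO2016107172A1 (zh) Arbitration processing method after cluster split-brain, arbitration storage apparatus, and system
US9489230B1 (en) Handling of virtual machine migration while performing clustering operations
US10127124B1 (en) Performing fencing operations in multi-node distributed storage systems
US8671218B2 (en) Method and system for a weak membership tie-break
CN106533805B (zh) Microservice request processing method, microservice controller, and microservice architecture
WO2018192534A1 (zh) Node device operating method, working-state switching apparatus, node device, and medium
WO2014101424A1 (zh) Distributed database synchronization method and system
WO2021042733A1 (zh) Blockchain transaction processing method and apparatus, computer device, and storage medium
CN104410674B (zh) Web session synchronization method for a single sign-on system
WO2014067254A1 (zh) Method and apparatus for checking database data consistency, and database system
WO2016078362A1 (zh) Method and apparatus for board-by-board upgrade with dual main control isolation
US9367298B1 (en) Batch configuration mode for configuring network devices
US20200067912A1 (en) Implementing authentication protocol for merging multiple server nodes with trusted platform modules utilizing provisioned node certificates to support concurrent node add and remove
CN113612614A (zh) Consensus disaster-recovery method, apparatus, device and storage medium based on a blockchain network
CN111709023A (zh) Application isolation method and system based on a trusted operating system
CN111209265A (zh) Database switching method and terminal device
WO2013075501A1 (zh) Node hot-plug method and apparatus
WO2016101409A1 (zh) Data switchover method, device, and system
WO2023024821A1 (zh) Data processing method, system and apparatus, computer device, and storage medium
WO2014075526A1 (zh) Network data processing method and device
US20210240698A1 (en) Asynchronous remote calls with undo data structures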

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 201180001450.7

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11851383

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11851383

Country of ref document: EP

Kind code of ref document: A1