CN112054926A - Cluster management method and device, electronic equipment and storage medium - Google Patents

Cluster management method and device, electronic equipment and storage medium Download PDF

Info

Publication number
CN112054926A
CN112054926A CN202010899123.9A CN202010899123A CN112054926A CN 112054926 A CN112054926 A CN 112054926A CN 202010899123 A CN202010899123 A CN 202010899123A CN 112054926 A CN112054926 A CN 112054926A
Authority
CN
China
Prior art keywords
node
cluster
nodes
determining
voting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010899123.9A
Other languages
Chinese (zh)
Other versions
CN112054926B (en
Inventor
王文博
万磊
李毅
王志远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd filed Critical WeBank Co Ltd
Priority to CN202010899123.9A priority Critical patent/CN112054926B/en
Publication of CN112054926A publication Critical patent/CN112054926A/en
Application granted granted Critical
Publication of CN112054926B publication Critical patent/CN112054926B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/30Decision processes by autonomous network management units using voting and bidding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06Management of faults, events, alarms or notifications
    • H04L41/0631Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/50Network services
    • H04L67/60Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a cluster management method, a cluster management device, electronic equipment and a storage medium. The cluster management method provided by the application is applied to a cluster, and the cluster comprises a plurality of data centers. Firstly, randomly selecting a node in each data center as a first supervision node, determining a leading node of each data center through the first supervision node, then obtaining a first vote number of each leading node in a classification duration, finally determining a main node of a cluster according to the first vote number corresponding to each leading node, and determining other nodes except the main node as slave nodes of the cluster so as to cooperatively process cluster services by using the main node and the slave nodes. Therefore, the interaction times among the nodes are greatly reduced, the time consumed for determining the master node and the slave node is effectively reduced, and the efficiency is improved. And the number of nodes in each data center, the actual transaction processing capacity and other factors are considered, so that the determined master node and the determined slave nodes meet the service requirements of the whole cluster.

Description

Cluster management method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a cluster management method and apparatus, an electronic device, and a storage medium.
Background
With the rapid development of computer technology and internet technology, the amount of traffic to be processed increases, so that the application of the cluster becomes wider and wider. For the nodes in the data centers included in the cluster, in order to perform the traffic processing, each data center is generally configured with a distributed application collaboration service (zookeeper) to determine the master node and the slave node from the nodes for performing the network traffic processing.
In the prior art, a node with the highest transaction processing capability in a cluster is generally determined as a master node, or nodes with a voting result exceeding half of the number of all nodes are determined as master nodes through mutual voting among all nodes, so as to perform service processing of the cluster.
However, for a relatively complex large cluster, the number of interactions between each other is increased due to the existence of multiple nodes, and thus, the time for determining the master node is long. Moreover, large clusters are usually distributed across data centers, and a conventional consistency principle is usually adopted in the prior art, that is, the number of nodes and the transaction processing capability of each data center are consistent by default, so that the determined master node and slave nodes cannot meet the service requirements of the whole cluster of the large cluster.
Disclosure of Invention
The application provides a cluster management method, a cluster management device, an electronic device and a storage medium, which are used for solving the technical problems that in the prior art, the efficiency of a main node is low, the processing capacity and the number of actual nodes are not considered, and the requirement of the whole cluster service cannot be met.
In a first aspect, the present application provides a cluster management method, which is applied to a cluster, where the cluster includes a plurality of data centers; the method comprises the following steps:
determining a leading node of each data center through a first supervision node, wherein the first supervision node is a node randomly selected from each data center;
acquiring a first vote number of each leading node in a classification time length;
and determining a master node of the cluster according to the first votes corresponding to the various leader nodes, and determining other nodes except the master node as slave nodes of the cluster so as to cooperatively process the service of the cluster by using the master node and the slave nodes.
In one possible design, each of the data centers is configured with a distributed application collaboration service zookeeper.
In one possible design, the number of nodes in each of the data centers may be the same or different.
In one possible design, the determining the master nodes of the cluster according to the first vote count corresponding to each of the master nodes includes:
determining a right calculation result corresponding to each leader node according to the first voting number and a first preset weighting algorithm;
and determining the main nodes of the cluster according to the rights calculation results corresponding to the various head nodes.
In one possible design, the determining the master nodes of the cluster according to the equity calculation result corresponding to each of the collar nodes includes:
and determining the leading node with the maximum rights and interests calculation result as the main node of the cluster.
In a possible design, if the number of the head nodes with the largest equity calculation result is multiple, the determining the master node of the cluster according to the equity calculation result corresponding to each head node includes:
generating a second vote number through a second supervision node, wherein the second supervision node is a randomly selected node in the candidate leading nodes;
and determining the target leader node corresponding to the second vote number as the main node of the cluster.
In one possible design, the determining the master nodes of the cluster according to the first vote count corresponding to each of the master nodes includes:
and when the first vote number corresponding to the candidate leading node is more than half of the total number of all the nodes, determining the candidate leading node as the main node.
In one possible design, the determining, by the first supervisory node, a head node of each data center includes:
splitting nodes in each data center into a plurality of voting clusters according to a preset division mechanism, wherein each voting cluster comprises at least one node;
acquiring a third voting number of each node in each voting cluster within a preset fixed time length through the first supervision node;
and determining the node with the highest third vote number as the leader node through the first supervision node.
In one possible design, when the third vote number is greater than half of the total number of nodes in the same data center, the candidate node corresponding to the third vote number is determined as the leading node.
In one possible design, when the number of candidate nodes with the highest third vote number is multiple, a fourth vote number is generated by the first supervising node, so that the candidate node corresponding to the fourth vote number is determined as the leading node.
In one possible design, the obtaining, by the first supervising node, a third number of votes for each node in each voting cluster within a preset fixed time period includes:
determining, by the first supervisory node, a survival status of each node in each data center;
and when the number of the nodes in the survival state is greater than a preset number threshold, acquiring the third vote number of the nodes in the survival state within the preset fixed time.
In one possible design, before obtaining the first number of votes for each leader node within the categorization time length, the method further includes:
judging whether all nodes have fault nodes or not through the second supervision node;
if so, the second supervision node sends heartbeat test packets of preset times to the failed node, and determines whether the failed node is in a down state according to feedback data of the heartbeat test packets;
and when the fault node is determined to be in the downtime state, the second monitoring node broadcasts the downtime state of the fault node to all nodes so as to enable the fault node not to generate the first vote number.
In one possible design, when the leading node is the failure node, the first monitoring node corresponding to the failure node is determined as the leading node by the second monitoring node.
In one possible design, when the leading node is the failed node, a candidate node in the same data center to which the failed node belongs is determined as the leading node by the second supervising node, and the candidate node is the node with the second ranking of the third votes.
In one possible design, before obtaining the first number of votes for each leader node within the categorization time length, the method further includes:
determining a minimum voting cluster according to the plurality of voting clusters, and acquiring the basic processing time of the transaction processing capacity corresponding to the minimum voting cluster, wherein the minimum voting cluster is the voting cluster with the least number of nodes in all the plurality of voting clusters;
determining a complex coefficient corresponding to each voting cluster based on a preset simulation algorithm;
acquiring the time difference for processing a single request between each voting cluster and the minimum voting cluster;
and determining the classification duration according to the basic processing time, the complex coefficient, the time difference and the fixed time delay based on a second preset weighting algorithm.
In a second aspect, the present application provides a cluster management apparatus, which is applied to a cluster, where the cluster includes a plurality of data centers; the device comprises:
the first processing module is used for determining a leading node of each data center through a first supervision node, and the first supervision node is a node randomly selected from each data center;
the obtaining module is used for obtaining a first vote number of each leading node in the classification duration;
and the second processing module is used for determining a main node of the cluster according to the first votes corresponding to the various head nodes, determining other nodes except the main node as slave nodes of the cluster, and cooperatively processing the service of the cluster by using the main node and the slave nodes.
In one possible design, each of the data centers is configured with a distributed application collaboration service zookeeper.
In one possible design, the number of nodes in each of the data centers may be the same or different.
In one possible design, the second processing module includes:
the computing module is used for determining a rights calculation result corresponding to each leader node according to the first votes and a first preset weighting algorithm;
and the first determining module is used for determining the main nodes of the cluster according to the rights calculation result corresponding to each of the collar nodes.
In one possible design, the first determining module is specifically configured to:
and determining the leading node with the maximum rights and interests calculation result as the main node of the cluster.
In a possible design, if the number of the head nodes with the largest equity calculation result is multiple, the first determining module is specifically configured to:
generating a second vote number through a second supervision node, wherein the second supervision node is a randomly selected node in the candidate leading nodes;
and determining the target leader node corresponding to the second vote number as the main node of the cluster.
In one possible design, the second processing module is further specifically configured to:
and when the first vote number corresponding to the candidate leading node is more than half of the total number of all the nodes, determining the candidate leading node as the main node.
In one possible design, the first processing module further includes:
the splitting module is used for splitting the nodes in each data center into a plurality of voting clusters according to a preset dividing mechanism, and each voting cluster comprises at least one node;
the vote collecting module is used for acquiring a third vote number of each node in each voting cluster within a preset fixed time length through the first supervision node;
and the second determining module is used for determining the node with the highest third vote number as the leading node through the first supervision node.
In one possible design, the first processing module is further specifically configured to:
and when the third vote number is more than half of the total number of the nodes in the same data center, determining the candidate node corresponding to the third vote number as the leader node.
In one possible design, the first processing module is further specifically configured to:
when the number of the candidate nodes with the highest third vote number is multiple, generating a fourth vote number through the first supervision node, and determining the candidate node corresponding to the fourth vote number as the leader node.
In one possible design, the booking module is specifically configured to:
determining, by the first supervisory node, a survival status of each node in each data center;
and when the number of the nodes in the survival state is greater than a preset number threshold, acquiring the third vote number of the nodes in the survival state within the preset fixed time.
In one possible design, the cluster management apparatus further includes: a third processing module; the third processing module is configured to:
judging whether all nodes have fault nodes or not through the second supervision node;
if so, the second supervision node sends heartbeat test packets of preset times to the failed node, and determines whether the failed node is in a down state according to feedback data of the heartbeat test packets;
and when the fault node is determined to be in the downtime state, the second monitoring node broadcasts the downtime state of the fault node to all nodes so as to enable the fault node not to generate the first vote number.
In one possible design, the third processing module is further specifically configured to:
and when the leading node is the fault node, determining the first supervision node corresponding to the fault node as the leading node through the second supervision node.
In one possible design, the third processing module is further specifically configured to:
and when the leading node is the fault node, determining a candidate node in the same data center to which the fault node belongs as the leading node through the second supervision node, wherein the candidate node is the node with the second third vote number rank.
In one possible design, the cluster management apparatus further includes: a fourth processing module; the fourth processing module is configured to:
determining a minimum voting cluster according to the plurality of voting clusters, and acquiring the basic processing time of the transaction processing capacity corresponding to the minimum voting cluster, wherein the minimum voting cluster is the voting cluster with the least number of nodes in all the plurality of voting clusters;
determining a complex coefficient corresponding to each voting cluster based on a preset simulation algorithm;
acquiring the time difference for processing a single request between each voting cluster and the minimum voting cluster;
and determining the classification duration according to the basic processing time, the complex coefficient, the time difference and the fixed time delay based on a second preset weighting algorithm.
In a third aspect, the present application provides an electronic device, comprising:
a processor; and
a memory communicatively coupled to the processor; wherein the memory stores instructions executable by the processor, and the instructions are executed by the processor to enable the processor to execute the cluster management method according to any one of the first aspect.
In a fourth aspect, the present application provides a non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the cluster management method of any one of the first aspects.
The application provides a cluster management method, a cluster management device, electronic equipment and a storage medium, which are applied to a cluster, wherein the cluster comprises a plurality of data centers. The method comprises the steps of firstly, determining a leading node of each data center through a first supervision node, wherein the first supervision node is a node randomly selected from each data center. And then acquiring a first vote number of each leader node within the classification duration, finally determining a main node of the cluster according to the first vote number corresponding to each leader node, and determining other nodes except the main node as slave nodes of the cluster so as to cooperatively perform service processing of the cluster by using the main node and the slave nodes. And the master node is determined based on the nodes actually included in the cluster, so that the stability of the overall dispatching of the cluster is improved, and the overall service requirement of a large cluster is met.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application;
fig. 2 is a schematic flowchart of a cluster management method according to an embodiment of the present application;
fig. 3 is a schematic flowchart of a process for determining a master node according to an embodiment of the present application;
fig. 4 is a schematic flowchart of another process for determining a master node according to an embodiment of the present application;
fig. 5 is a schematic flowchart of determining a leading node according to an embodiment of the present disclosure;
fig. 6 is a schematic flowchart of another cluster management method according to an embodiment of the present application;
fig. 7 is a schematic flowchart of another cluster management method according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a cluster management device according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of another cluster management device according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a first processing module according to an embodiment of the present disclosure;
fig. 11 is a schematic structural diagram of another cluster management device according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of another cluster management device according to an embodiment of the present application;
fig. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of methods and apparatus consistent with certain aspects of the present application, as detailed in the appended claims.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the above-described drawings (if any) are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
With the rapid development of computer technology and internet technology, the service processing capacity is increased, and further the application of the cluster is widened. In order to perform service processing through a cluster, each data center included in the cluster is configured with a distributed application collaboration service (zookeeper) to determine a master node and a slave node from the slave nodes for performing network service processing. In the prior art, a node with the strongest processing capability in a cluster is generally determined as a master node, or nodes with a voting result exceeding half of the number of all nodes are determined as master nodes through mutual voting among the nodes, so as to perform service processing of the cluster. However, as the complexity of the cluster increases and the data center is distributed across regions, the interaction between each other is dramatically increased by voting among an excessive number of nodes, resulting in an inefficient process of determining the master node. On the other hand, the conventional method generally adopts a conventional consistency principle, and the number of actual nodes and the actual processing capacity of each node are not considered, so that the determined master node and slave nodes cannot meet the service requirement of the whole cluster of the large cluster.
In view of the foregoing problems in the prior art, the present application provides a cluster management method, an apparatus, an electronic device, and a storage medium. And determining a leading node of each data center included in the cluster through the first round of voting, wherein the leading node participates in the second round of voting based on the rights and interests configuration, determines a main node from the second round of voting, and determines other nodes except the main node as slave nodes. Therefore, the time for determining the main node is greatly shortened, and the efficiency is improved. And the main node is determined based on the nodes actually included in the cluster, so that the stability of the overall scheduling of the cluster is improved, and the overall service requirement of a large cluster is met.
An exemplary application scenario of the embodiments of the present application is described below.
The cluster management method provided by the embodiment of the application can be executed by the cluster management device provided by the embodiment of the application, and the cluster management device provided by the embodiment of the application can be part or all of a server. Fig. 1 is a schematic view of an application scenario provided by an embodiment of the present application, where a cluster management method provided by the embodiment of the present application is applied to a cluster, where the cluster includes a plurality of Data centers, and a Data Center in the plurality of Data centers may be an Internet Data Center (IDC for short) distributed across logical areas, or a Data Communication Network Center (DCN for short) distributed in the same logical area, where the embodiment of the present application is not limited, each node of the plurality of Data centers is correspondingly configured with a server, and a node of each Data Center is configured with a distributed application collaboration service (zookeeper) to process a service of the cluster by using collaboration among the nodes, where the embodiment of the present application does not limit the number of all nodes, further, the number of nodes of each Data Center may be the same, or may be different. In other words, the number and specific types of servers included in the multiple data centers are not limited in the embodiments of the present application. In fig. 1, only two data centers included in the cluster 10 are described as each including two servers, and as shown in fig. 1, the network is used to provide a medium for communication links among the terminal devices 11, the terminal devices 12, the servers 13, the servers 14, the servers 15, and the servers 16. The network may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few. 
The terminal device 11, the terminal device 12, and the server 13, the server 14, the server 15 and the server 16 may interact with each other through a network to receive or send messages, so that the server 13, the server 14, the server 15 and the server 16 in the data center included in the cluster 10 cooperate to process related traffic with each terminal device. The embodiment of the present application does not limit the type, number, and the like of the terminal device, for example, the terminal device may be various electronic devices installed thereon, such as various applications supporting a browser application, a virus killing application, a search application, and the like, including but not limited to a smart phone, a smart watch, a tablet computer, a personal digital assistant, an e-book reader, a laptop portable computer, a desktop computer, and the like.
Fig. 2 is a schematic flowchart of a cluster management method provided in an embodiment of the present application, and as shown in fig. 2, the cluster management method provided in this embodiment includes:
s101: and determining a leading node of each data center through the first supervision node.
The first supervision node is a node randomly selected from each data center.
The cluster includes a plurality of data centers, each including a plurality of nodes. For each data center, any node is randomly selected as a first supervision node of the data center, so that a leading node of the data center to which the node belongs is determined through the first supervision node. It is understood that any node in each data center that is alive is represented by a randomly selected node. The first supervision node of each data center has a Token (Token) right, for example, the first supervision node may judge the survival state of each node in the corresponding data center in real time, and eliminate nodes not in the survival state, so that the nodes do not participate in the link of determining the leading node.
The first monitoring node determines the leading node of each data center, which can be understood as that the first monitoring node performs vote counting on votes generated by nodes in the same data center to determine the leading node of the corresponding data center.
S102: and acquiring a first vote number of each leading node in the classification time length.
After the leader node of each data center is determined, voting is carried out among the leader nodes within the classification duration to generate a first voting number, and the first voting number is subjected to voting, namely the first voting number aiming at each leader node is obtained. In order to ensure the stability of the cluster, each leading node should be given a first vote number generated within a reasonable time, and the time is the classification time. The classification duration is related to the number of nodes in each data center included in the cluster, the transaction processing capability, the complexity of the cluster, and other factors.
S103: and determining a main node of the cluster according to the first vote number corresponding to each leading node, and determining other nodes except the main node as slave nodes of the cluster so as to cooperatively process the service of the cluster by using the main node and the slave nodes.
After the first vote number of each leader node is obtained, the master node of the cluster is determined according to the first vote number corresponding to each leader node, and the other nodes in the cluster are then determined as slave nodes, so that the master node and the slave nodes cooperatively process the services of the cluster. It should be noted that the embodiment of the present application does not limit the services of the cluster; in actual working conditions, the services involved may be determined by configuring the server corresponding to each node according to the services to be executed, for example, the storage of data in a database, the reading and writing of data, and the like.
As can be seen from the foregoing description, in the cluster management method provided in the embodiments of the present application, the master node and the slave nodes of the cluster are determined through two rounds of voting. The first round of voting is performed within each data center to determine the leader node of that data center; each leader node then participates in the second round of voting on behalf of the data center to which it belongs, and the master node and the slave nodes are determined from that round. In a cluster of higher complexity, the data centers contain many nodes and may be distributed across logical regions, so their network qualities differ. If all nodes in the cluster voted simultaneously to determine the master and slave nodes, the process would be time-consuming and inefficient. In the embodiment of the application, not all nodes vote simultaneously; instead, each leader node votes on behalf of the nodes of its data center, which greatly reduces the number of interactions between nodes, effectively shortens the time needed to determine the master and slave nodes, and thereby improves efficiency. Further, the master and slave nodes are determined based on the actual nodes in the data centers, taking into account factors such as the number of nodes in each data center and the actual transaction processing capacity, so the determined master and slave nodes are well suited to the service requirements of the whole cluster.
The cluster management method provided by the embodiment of the application is applied to a cluster that includes a plurality of data centers. First, one node in each data center is randomly selected as the first supervision node, and the leader node of each data center is determined through the first supervision node. Then the first vote number of each leader node within the classification duration is obtained. Finally, the master node of the cluster is determined according to the first vote number corresponding to each leader node, and the other nodes are determined as slave nodes, so that the master node and the slave nodes cooperatively process the services of the cluster. Because each leader node represents its data center when the master and slave nodes are determined, the number of interactions between nodes is greatly reduced, the time needed to determine the master and slave nodes is effectively shortened, and efficiency is improved. Furthermore, the determination is based on the actual nodes in the data centers and takes into account factors such as the number of nodes in each data center and the actual transaction processing capacity, so the determined master and slave nodes meet the service requirements of the whole cluster.
In a possible design, a possible implementation manner of determining a master node through step S103 is shown in fig. 3, where fig. 3 is a schematic flowchart of a process for determining a master node provided in an embodiment of the present application, and as shown in fig. 3, the method for determining a master node provided in this embodiment includes:
s1031: determining a right calculation result corresponding to each leader node according to the first voting number and a first preset weighting algorithm;
s1032: and determining the main nodes of the cluster according to the interest calculation results corresponding to the various main node nodes.
A first preset weighting algorithm is applied to the first votes generated by the leader nodes voting among themselves, yielding an equity calculation result for each leader node. Because each leader node represents its data center in determining the master and slave nodes, and the data centers may contain the same or different numbers of nodes, the first votes of the leader nodes are weighted by the first preset weighting algorithm to obtain the corresponding equity calculation results, and the master node of the cluster is determined from these results. The master node is thus determined starting from the actual number of nodes in each data center of the cluster, which helps meet the actual service requirements of a complex cluster.
For example, the first preset weighting algorithm may be represented by the following formula (1):
(k1*a1+k2*a2+k3*a3+…kn*an)/(k1+k2+k3+…kn) (1)
where n represents the number of data centers included in the cluster; k1, k2, k3, …, kn represent the number of nodes in data centers 1 to n respectively; and a1, a2, a3, …, an take the value 0 or 1 according to the first votes of the corresponding data centers, that is, ai is 1 when a first vote is generated and 0 when it is not. It is understood that the value of n is determined by the number of data centers actually included in the cluster and is a natural number greater than 1.
The equity calculation is performed on the first vote number corresponding to each leader node based on the first preset weighting algorithm to obtain the corresponding equity calculation results, and the master node of the cluster is then determined according to these results.
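Formula (1) can be illustrated with a small Python sketch (the data-center sizes and ballots below are hypothetical examples, not values from the patent):

```python
def equity_result(node_counts, ballots):
    """Formula (1): weighted average of the 0/1 first-vote indicators a_i,
    weighted by the node count k_i of each data center."""
    total = sum(node_counts)
    return sum(k * a for k, a in zip(node_counts, ballots)) / total

# Hypothetical cluster of three data centers with 5, 3 and 2 nodes.
k = [5, 3, 2]
# a_i = 1 if data center i generated a first vote for the candidate, else 0.
print(equity_result(k, [1, 1, 0]))  # -> 0.8
print(equity_result(k, [0, 1, 1]))  # -> 0.5
```

A candidate backed by larger data centers thus obtains a larger equity calculation result, reflecting the actual node distribution.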
According to the cluster management method provided by the embodiment of the application, the master node is determined by first computing the equity calculation result of each leader node from the first vote number and the first preset weighting algorithm, and then determining the master node of the cluster according to the equity calculation result of each leader node. The master node is therefore determined based on the actual number of nodes in each data center of the cluster, which can meet the actual service requirements of a complex cluster.
Further, one possible implementation of step S1032 is to determine the leader node with the largest equity calculation result as the master node of the cluster and, accordingly, to determine the other nodes as slave nodes.
When the master node of the cluster is determined according to the equity calculation result of each leader node, the leader node with the largest equity calculation result is selected as the master node, completing the determination. Because this leader node has the largest equity calculation result, taking it as the master node and having the master and slave nodes cooperate to complete the cluster's services helps ensure the operational stability of the cluster.
It should be noted that determining the master node according to the equity calculation result is not limited to selecting the leader node with the largest result; the leader node ranked second, or another leader node, may also be determined as the master node according to the equity calculation results.
In the foregoing embodiment, more than one leader node may share the largest equity calculation result. In other words, if multiple leader nodes have the largest equity calculation result, that is, when the leader nodes are tied, the master node may be determined through the possible implementation shown in fig. 4. Fig. 4 is a schematic flowchart of another method for determining a master node provided in this embodiment. As shown in fig. 4, the method includes:
S201: generate a second vote number through the second supervision node.
The second supervision node is a node randomly selected from the candidate leader nodes.
S202: determine the target leader node corresponding to the second vote number as the master node of the cluster.
If multiple leader nodes have the largest equity calculation result, a second vote number needs to be generated through the second supervision node for those leader nodes, and the target leader node corresponding to the second vote number is determined as the master node of the cluster. The second supervision node is a node randomly selected from the candidate leader nodes, and the candidate leader nodes are the current leader nodes. Generating the second vote number can be understood as the second supervision node voting again among the tied leader nodes with the largest equity calculation result; the leader node corresponding to the second vote number, that is, the target leader node, is then determined as the master node of the cluster.
According to the cluster management method provided by the embodiment of the application, when multiple leader nodes share the largest equity calculation result, one node is randomly selected from the candidate leader nodes as the second supervision node, a second vote number is generated through the second supervision node, and the target leader node corresponding to the second vote number is determined as the master node of the cluster, thereby solving the problem of determining the master node in the case of a tie.
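The tie-breaking step can be sketched as follows (a hypothetical Python sketch; the random stand-in below only models that the second supervision node is randomly chosen and casts a deciding vote, not any specific voting logic from the patent):

```python
import random

def break_tie(tied_leaders, candidate_leaders):
    """When several leader nodes share the largest equity calculation result,
    a second supervision node (randomly chosen from the candidate leader
    nodes) generates a second vote that picks the master node."""
    second_supervisor = random.choice(candidate_leaders)
    # Stand-in for the second supervision node's second vote.
    master = random.choice(tied_leaders)
    return second_supervisor, master

sup, master = break_tie(["L1", "L3"], ["L1", "L2", "L3"])
print(master in ("L1", "L3"))  # -> True
```

Whatever the supervision node's actual decision rule, the invariant is that the chosen master is one of the tied leaders.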
In one possible design, when the master node is determined through step S103, if the first vote number corresponding to a candidate leader node is greater than half of the total number of all nodes, that candidate leader node is determined to be the master node.
After the first vote number of each leader node is determined, if the first vote number of a candidate leader node is greater than half of the total number of all nodes in the cluster, that candidate leader node is directly determined as the master node without performing the equity calculation of the embodiment shown in fig. 3. Accordingly, all nodes except the master node are determined as slave nodes. This further accelerates the process of determining the master node and effectively improves efficiency.
In one possible design, in order to speed up the process of determining the cluster master node and the slave node, when the head node of each data center is determined through step S101, the determination may be implemented through steps shown in fig. 5. Fig. 5 is a schematic flowchart of a process for determining a leading node according to an embodiment of the present application, and as shown in fig. 5, the method for determining a leading node according to the embodiment includes:
s1011: and splitting the nodes in each data center into a plurality of voting clusters according to a preset division mechanism.
Wherein each voting cluster comprises at least one node.
The nodes included in each data center are divided through a preset dividing mechanism to obtain a plurality of voting clusters, and each voting cluster includes at least one node. The preset partitioning mechanism may be set by an initial configuration menu, for example, the partitioning may be performed in a manner of dividing an integer by a unit number, so as to speed up the splitting of each data center. It should be understood that, the embodiment of the present application is not limited to a specific form of the preset partitioning mechanism, and may be set according to the distribution of the number of nodes in the data center included in the cluster.
S1012: and acquiring a third voting number of each node in each voting cluster within a preset fixed time through the first supervision node.
After each data center is divided into a plurality of voting clusters, and for each data center, a third voting number generated by each node in each voting cluster within a preset fixed time is acquired through a corresponding first supervision node. For example, each node in each data center is associated with a respective code number (Myid). Within a preset fixed time, each node in each voting cluster can simultaneously reflect the third voting number generated by each node through the code number corresponding to the selected node, and then the first supervision node counts the third voting number of each node in each voting cluster to obtain the corresponding third voting number. It should be noted that the preset fixed duration may be a fixed duration adopted by each voting cluster in each data center, and a specific numerical value of the duration may be set according to an actual situation of the cluster, which is not limited in this embodiment of the present application.
Further, when acquiring the third number of votes, in order to ensure the stability of each data center and the reliability of the third number of votes, step S1012 may specifically include:
determining the survival state of each node in each data center through a first supervision node;
and when the number of the nodes in the survival state is larger than a preset number threshold, acquiring a third vote number of the nodes in the survival state within a preset fixed time length.
The survival status of each node in each corresponding data center is determined by the first supervisory node, for example, the survival status of each node may be determined by communicating with each node in the same data center in real time. An alive state may be understood as each node being in a normal, fault-free operational state. The first supervision node determines that when the number of the nodes in the survival state is larger than a preset number threshold, a third vote number of the nodes in the survival state within the preset fixed time length is acquired. In other words, the third vote number should be generated by the node in the alive state. The preset data volume threshold may be determined according to the number of actual nodes of each data center or the operating environment, and the like, which is not limited in the embodiments of the present application.
Correspondingly, the first supervision node can also temporarily store the determined nodes which are not in the survival state into a preset blacklist, so that the nodes which are not in the survival state do not participate in the link of determining the leading node, and the preset blacklist is deleted until the round of determining the leading node is completed.
When the third vote number is obtained, the survival state of each node in each corresponding data center is determined through the first supervision node, and when the number of the nodes in the survival state is greater than the preset number threshold, the third vote number generated within the preset fixed time duration by the nodes in the survival state is obtained, so that all the nodes for determining the leader node are the nodes in the survival state, the stability of each data center is ensured, and the reliability of determining the leader node through the third vote number is effectively improved.
S1013: and determining the node with the highest third vote number as the leading node through the first supervision node.
After the first supervision node obtains the third vote number of each node in each voting cluster, the node with the highest third vote number is determined as a leading node for each data center, so that the leading node represents the corresponding data center to perform next round of voting, and accordingly the main node and the slave node of the cluster are determined.
According to the cluster management method provided by the embodiment of the application, when the leader node of each data center is determined through the first supervision node, the nodes in each data center are divided into a plurality of voting clusters according to a preset division mechanism, wherein each voting cluster comprises at least one node, then the third voting number of each node in each voting cluster in a preset fixed time duration is obtained through the first supervision node, and finally the node with the highest third voting number is determined as the leader node through the first supervision node. Each data center is divided into a plurality of voting clusters, so that each voting cluster generates a third vote at the same time, and then a leader node is determined according to the third voting number, so that the process of the first-round voting link is accelerated, the time for determining the master node and the slave node is shortened, and the determination efficiency is improved.
In the embodiment shown in fig. 5, when the third vote number in step S1012 is greater than half of the total number of nodes in the same data center, the candidate node corresponding to the third vote number is determined as the leading node.
After the third vote number is determined, when the third vote number is greater than half of the total number of all nodes in the same data center, the candidate node corresponding to the third vote number is directly determined as the leading node without performing step S1013, so as to further accelerate the process of determining the leading node, and effectively improve the determination efficiency. Wherein the candidate node generates a third number of votes for each node in each data center.
In step S1013, when the number of candidate nodes with the highest third vote count is multiple, a fourth vote count may be generated by the first supervising node, so as to determine the candidate node corresponding to the fourth vote count as the leader node.
And when the number of the candidate nodes with the highest third vote number is more than one, the first supervision node needs to generate a fourth vote number again, and the candidate node corresponding to the fourth vote number is determined as the head node. And voting again by the first supervision node aiming at the candidate nodes with the highest third voting number, namely generating a fourth voting number, and further determining the candidate node corresponding to the fourth voting number as the head node. The problem of the same ticket when the leading node is determined is solved.
In a possible design, before performing step S102, the cluster management method provided in this embodiment of the present application may further include the step shown in fig. 6, where fig. 6 is a flowchart of another cluster management method provided in this embodiment of the present application, and as shown in fig. 6, the cluster management method provided in this embodiment further includes:
s301: and judging whether all the nodes have fault nodes through the second supervision node.
Before the first vote number of each leading contact in the classification duration is obtained, in order to ensure that no fault node exists in all current nodes, the second supervision node needs to judge whether a fault node exists in all current nodes, for example, real-time communication is performed between the second supervision node and all current nodes to obtain the survival state of each node, and further judge whether a fault node exists.
S302: and when the judgment result is yes, the second supervision node sends heartbeat test packets of preset times to the fault node, and determines whether the fault node is in the downtime state or not according to the feedback data of the heartbeat test packets.
After the judgment of the second monitoring node, if the judgment result is yes, that is, it is determined that the fault node exists in all the current nodes, the second monitoring node may further confirm, for example, the second monitoring node may send a heartbeat test packet of a preset number of times to the judged fault node, and further determine whether the fault node is in the down state according to the feedback data of the heartbeat test packet. If the second supervising node sends three preset test packets to the failed node, and the three preset test packets are all returned, the failed node can be determined to be in a downtime state. At this time, the second supervising node may also identify the downtime state of the failed node, where a specific identification manner is not limited in the embodiment of the present application.
It should be noted that the downtime state related in this embodiment is not a downtime state of the node, and may be any state in a preset fault mode, which is not limited in this embodiment.
S303: and when the fault node is determined to be in the downtime state, the second monitoring node broadcasts the downtime state of the fault node to all the nodes so that the fault node does not generate the first vote number.
After the first monitoring node determines that the fault node is in the downtime state, further, the second monitoring node quickly broadcasts the downtime state of the fault node to all the current nodes, so that the fault node does not generate the first voting number, namely, the fault node does not participate in the voting link of determining the master node and the slave node by the second round of slave leading nodes, and the effective right of determining the leading node of the master node is improved.
Further, the second supervising node may also record the failed node and its downtime state in a corresponding data Version (Version) for subsequent recovery and cleaning.
According to the cluster management method provided by the embodiment of the application, before the first vote number of each leading node in the classification duration is obtained, whether fault nodes exist in all nodes can be judged through the second supervision node. And if so, the second supervision node sends heartbeat test packets of preset times to the fault node, and further determines whether the fault node is in the downtime state according to feedback data of the heartbeat test packets. And when the fault node is determined to be in the downtime state, the second monitoring node broadcasts the downtime state of the fault node to all the nodes so as to enable the fault node not to generate the first voting number. Therefore, the main node is determined based on the actual node, the effective rights and interests of the leader node when the main node is determined are improved, the reliability of the determined main node is further improved, and the stability of the cluster is guaranteed.
Further, when the leading node is determined to be the failed node through the embodiment shown in fig. 6, that is, when the leading node is the failed node, in a possible implementation manner, the first monitoring node corresponding to the failed node may be determined to be the leading node through the second monitoring node.
When the leading node is determined to be the fault node, the first supervision node in the data center to which the fault node belongs is used for replacing the fault node as the leading node through the second supervision node, so that all the nodes when the main node is determined are ensured to be non-fault nodes, the reliability of the subsequently determined main node is further improved, and the stability of the cluster is ensured.
When the leading node is the failed node, in another possible implementation manner, a candidate node in the same data center to which the failed node belongs may be determined as the leading node by the second supervising node, where the candidate node is a node with the second ranking of the third votes.
In this possible implementation, the second supervising node replaces the failed node with the node ranked second, i.e. the candidate node, in the same data center to which the failed node belongs as the lead node. All nodes are guaranteed to be non-fault nodes when the master node is determined, reliability of the subsequently determined master node is further improved, and stability of the cluster is guaranteed.
According to the cluster management method provided by the embodiment of the application, the actual transaction processing capacity and other factors of all nodes in the cluster are also considered in the determination of the master node and the slave nodes, so that the classification duration for generating the first vote number is determined according to the actual situation, and the determined master node and slave nodes are ensured to meet the overall cluster service requirement of the complex cluster. Therefore, in a possible design, before performing step S102, the cluster management method provided in the embodiment of the present application may further include a step of determining a classification duration as shown in fig. 7, where fig. 7 is a schematic flowchart of another cluster management method provided in the embodiment of the present application, and as shown in fig. 7, the cluster management method provided in this embodiment further includes:
s401: and determining a minimum voting cluster according to the plurality of voting clusters, and acquiring the basic processing time of the transaction processing capacity corresponding to the minimum voting cluster.
The minimum voting cluster is the voting cluster with the minimum number of nodes in the plurality of voting clusters.
As can be seen from the foregoing description of the illustrated embodiments, each data center may be split into a plurality of voting clusters, each voting cluster including at least one node. Therefore, for the whole cluster, the voting cluster with the smallest number of nodes can be determined from all the voting clusters, that is, the smallest voting cluster. And then acquiring the basic processing time of the transaction processing capacity corresponding to the minimum voting cluster. It will be appreciated that for each cluster, there is a respective transaction capability, and thus the base processing time for the smallest cluster may be represented by Tb, for example, corresponding to the base processing time for the existing transaction capability. It should be understood that the embodiment of the present application is not limited to the specific transaction corresponding to the transaction processing capability.
S402: and determining a complex coefficient corresponding to each voting cluster based on a preset simulation algorithm.
Because the number of nodes in a plurality of data centers included in a cluster may be the same or different, and because the plurality of data centers may be distributed across logical regions, various differences also exist in the network environment where the plurality of data centers are located, thereby affecting parameters such as data transmission delay between nodes and the like. For this situation in real conditions, it can be described by the complex coefficients of the clusters.
The linear relation between the complex coefficients of different clusters and the number of nodes included in the clusters, namely the number of servers, can be determined through a preset simulation algorithm. For example, the linear relationship is shown in equation (2):
Pi=n′ik (2)
wherein n' represents the number of nodes in the current cluster, k represents the corresponding linear coefficient, and PiRepresenting complex coefficients, i, of the corresponding clusterThe values are 1 to n ', and n' is a natural number larger than 1.
Because the number of the nodes corresponding to different clusters is different, the linear coefficient k of the corresponding cluster can be obtained by a linear regression method, and then the corresponding complex coefficient P is determinedi
In this embodiment, a corresponding P may be determined for each voting clusteriI.e. determining the corresponding complex coefficient for each voting cluster.
S403: acquiring the time difference for processing a single request between each voting cluster and the minimum voting cluster;
because the number of nodes correspondingly included in each voting cluster is different, the time length for processing a single request by each voting cluster is different. Thus, the time duration for processing a single request by the minimum voting cluster may be determined first, and then the time difference between the time duration for processing a single request between each voting cluster and the minimum voting cluster may be determined in turn, e.g., Δ t may be usediRepresenting the determined corresponding time difference.
S404: and determining classification duration according to the basic processing time, the complex coefficient, the time difference and the fixed time delay based on a second preset weighting algorithm.
For each voting cluster in the entire cluster, each voting cluster has a fixed time delay to generate the first number of votes within the fixed time delay, e.g., X may be usedi′Representing the fixed time delay of the corresponding voting cluster. Further, the determined basic processing time, the determined complex coefficient, the determined time difference and the determined fixed time delay are combined, and the actual time delay of generating the first vote count by each leader node, namely the classification time length, is determined based on a second preset weighting algorithm. Wherein, the second preset weighting algorithm is shown as formula (3):
Figure BDA0002659391730000181
where n "represents the number of voting clusters in the current cluster, Pi′Representing the corresponding complex coefficient, Δ t, for each voting clusteri′Representing the corresponding time difference, Xi′Representing the fixed time delay of the corresponding voting cluster, the value of i' is 1 to n ", and n" is a natural number greater than 1.
For example, in actual conditions, the time for generating the first number of votes for each lead node may be controlled based on a logic clock and a classification duration. When the first vote number is generated, firstly, the logic clock is determined not to be disordered, then, the timing is started at the current moment corresponding to a certain logic clock, each leading node starts to generate the first vote number, after the classification duration, the timing is ended, the current round of voting is ended, and the first vote number generated by each leading node is obtained, so that the duration for generating the first vote is controlled within the classification duration. Further, in the process of generating the first vote number, each leader node may also send a data packet including information such as the number of nodes included in each data center, the corresponding data version, and the current logic clock to other nodes in the same data center for confirmation, so as to synchronize the current state of each node with other nodes.
According to the cluster management method provided by this embodiment of the application, before the first vote count of each leader node within the classification duration is obtained, the actual classification duration is determined from factors such as the actual number of nodes and the transaction processing capacity. First, the minimum voting cluster — the voting cluster with the fewest nodes among all the voting clusters — is determined, and the basic processing time of the transaction processing capacity corresponding to it is acquired. Then the complex coefficient corresponding to each voting cluster is determined based on a preset simulation algorithm, the time difference for processing a single request between each voting cluster and the minimum voting cluster is acquired, and the classification duration is determined from the basic processing time, the complex coefficients, the time differences, and the fixed time delays based on the second preset weighting algorithm. In this way, the classification duration for generating the first vote count reflects the actual conditions of all nodes in the cluster, so that the resulting master node and slave nodes can meet the overall service requirements of a complex cluster.
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Fig. 8 is a schematic structural diagram of a cluster management device provided in an embodiment of the present application, where the cluster management device provided in the embodiment of the present application is applied to a cluster, and the cluster includes a plurality of data centers. As shown in fig. 8, the cluster management apparatus 500 provided in this embodiment includes:
a first processing module 501, configured to determine a leader node of each data center through a first supervisory node, where the first supervisory node is a node randomly selected from each data center;
an obtaining module 502, configured to obtain a first vote count of each leader node within a classification duration; and
a second processing module 503, configured to determine a master node of the cluster according to the first vote count corresponding to each leader node, and to determine the nodes other than the master node as slave nodes of the cluster, so that the master node and the slave nodes cooperatively process the services of the cluster.
In one possible design, each data center is configured with a distributed application collaboration service.
In one possible design, the number of nodes in each data center may be the same or different.
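The master-selection step of the second processing module — pick the leader node with the highest first vote count as master, demote every other node to slave — can be sketched as follows; the node names and dict-based data shapes are illustrative assumptions.

```python
def elect_master(data_centers, first_votes):
    """data_centers: {data_center: [node, ...]};
    first_votes: {leader_node: first vote count within the classification duration}."""
    # The leader with the highest first vote count becomes the master...
    master = max(first_votes, key=first_votes.get)
    # ...and every other node in the cluster becomes a slave.
    slaves = [n for nodes in data_centers.values() for n in nodes if n != master]
    return master, slaves

dcs = {"dc1": ["a1", "a2"], "dc2": ["b1", "b2"], "dc3": ["c1"]}
master, slaves = elect_master(dcs, {"a1": 3, "b1": 5, "c1": 2})
# master == "b1"; the remaining four nodes are slaves
```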
Based on the embodiment shown in fig. 8, fig. 9 is a schematic structural diagram of another cluster management apparatus provided in the embodiment of the present application, and as shown in fig. 9, the second processing module 503 of the cluster management apparatus 500 provided in the embodiment of the present application includes:
a calculating module 5031, configured to determine, according to the first vote count and a first preset weighting algorithm, an equity calculation result corresponding to each leader node;
the first determining module 5032 is configured to determine the master node of the cluster according to the equity calculation result corresponding to each leader node.
In one possible design, the first determining module 5032 is specifically configured to:
determine the leader node with the largest equity calculation result as the master node of the cluster.
In a possible design, if there are multiple leader nodes with the largest equity calculation result, the first determining module 5032 is specifically configured to:
generate a second vote count through a second supervisory node, where the second supervisory node is a node randomly selected from among the candidate leader nodes; and
determine the target leader node corresponding to the second vote count as the master node of the cluster.
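A hedged sketch of this equity-based selection with the supervisory tie-break follows; the linear equity formula is a placeholder (the patent's first preset weighting algorithm is not reproduced here), and the random choice models the second supervisory node's deciding vote.

```python
import random

def pick_master(first_votes, equity_weight=1.0):
    # Placeholder equity: weight times the first vote count; the patent's
    # actual first preset weighting algorithm may combine more factors.
    equity = {node: equity_weight * v for node, v in first_votes.items()}
    best = max(equity.values())
    candidates = [n for n, e in equity.items() if e == best]
    if len(candidates) == 1:
        return candidates[0]
    # Tie: a second supervisory node (modeled as a random choice among the
    # candidate leader nodes) generates the deciding second vote.
    return random.choice(candidates)

assert pick_master({"a": 3, "b": 5, "c": 5}) in {"b", "c"}
```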
In one possible design, the second processing module 503 is further specifically configured to:
when the first vote count corresponding to a candidate leader node is more than half of the total number of all nodes, determine that candidate leader node as the master node.
On the basis of the foregoing embodiment, fig. 10 is a schematic structural diagram of a first processing module provided in this embodiment, and as shown in fig. 10, the first processing module 501 in the cluster management apparatus 500 provided in this embodiment further includes:
the splitting module 5011 is configured to split the nodes in each data center into a plurality of voting clusters according to a preset splitting mechanism, where each voting cluster includes at least one node;
the vote collecting module 5012 is configured to obtain, through the first supervisory node, a third vote count of each node in each voting cluster within a preset fixed duration; and
a second determining module 5013 is configured to determine, through the first supervisory node, the node with the highest third vote count as the leader node.
In one possible design, the first processing module 501 is further specifically configured to:
when the third vote count is more than half of the total number of nodes in the same data center, determine the candidate node corresponding to the third vote count as the leader node.
In one possible design, the first processing module 501 is further specifically configured to:
when there are multiple candidate nodes with the highest third vote count, generate a fourth vote count through the first supervisory node, so as to determine the candidate node corresponding to the fourth vote count as the leader node.
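Within one data center, the majority shortcut, the highest-count rule, and the supervisory tie-break combine into a single selection routine; a sketch under assumed data shapes (the random choice again models the first supervisory node's deciding fourth vote):

```python
import random

def pick_leader(third_votes, total_nodes_in_dc):
    """third_votes: {candidate node: third vote count within the fixed duration}."""
    # Majority shortcut: more than half of the data center's nodes voted for it.
    for node, count in third_votes.items():
        if count > total_nodes_in_dc // 2:
            return node
    best = max(third_votes.values())
    tied = [n for n, c in third_votes.items() if c == best]
    # Multiple highest-count candidates: the first supervisory node casts
    # the deciding fourth vote (modeled here as a random choice).
    return tied[0] if len(tied) == 1 else random.choice(tied)
```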
In one possible design, the vote collecting module 5012 is specifically configured to:
determine the survival state of each node in each data center through the first supervisory node; and
when the number of nodes in the survival state is greater than a preset number threshold, obtain the third vote count of the nodes in the survival state within the preset fixed duration.
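A minimal liveness gate under these rules; the `is_alive` predicate and `vote_count_of` callable are assumed stand-ins for the first supervisory node's survival check and vote collection, respectively.

```python
def collect_third_votes(nodes, is_alive, threshold, vote_count_of):
    """Collect third vote counts only from surviving nodes, and only when
    more than `threshold` nodes survive."""
    alive = [n for n in nodes if is_alive(n)]
    if len(alive) <= threshold:          # too few survivors: skip the round
        return None
    return {n: vote_count_of(n) for n in alive}

votes = collect_third_votes(
    ["a", "b", "c", "d"],
    is_alive=lambda n: n != "c",
    threshold=2,
    vote_count_of=lambda n: {"a": 2, "b": 1, "d": 0}[n],
)
# votes == {"a": 2, "b": 1, "d": 0}
```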
Fig. 11 is a schematic structural diagram of another cluster management device provided in an embodiment of the present application, and as shown in fig. 11, the cluster management device 500 provided in this embodiment further includes: a third processing module 504.
Wherein, the third processing module 504 is configured to:
determine, through the second supervisory node, whether there is a failed node among all the nodes;
if so, the second supervisory node sends a preset number of heartbeat test packets to the failed node and determines, from the feedback data of the heartbeat test packets, whether the failed node is in a down state; and
when the failed node is determined to be in the down state, the second supervisory node broadcasts the down state of the failed node to all nodes, so that the failed node does not generate the first vote count.
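The heartbeat check can be sketched like this; the `probe` callable models one heartbeat test packet and its feedback, and the function names and retry shape are assumptions of this sketch.

```python
def is_down(node, probe, attempts=3):
    """Send up to `attempts` heartbeat test packets; the node is considered
    down only if none of them returns feedback."""
    return not any(probe(node) for _ in range(attempts))

def exclude_failed(nodes, probe, attempts=3):
    """Second supervisory node: detect down nodes and exclude them from
    generating first vote counts (the broadcast is implied by the filter)."""
    down = {n for n in nodes if is_down(n, probe, attempts)}
    return [n for n in nodes if n not in down]

voters = exclude_failed(["a", "b", "c"], probe=lambda n: n != "b")
# voters == ["a", "c"]
```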
In one possible design, the third processing module 504 is further specifically configured to:
when the leader node is a failed node, determine, through the second supervisory node, the first supervisory node corresponding to the failed node as the leader node.
In one possible design, the third processing module 504 is further specifically configured to:
when the leader node is a failed node, determine, through the second supervisory node, a candidate node in the data center to which the failed node belongs as the leader node, where the candidate node is the node ranked second by third vote count.
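Both failover designs reduce to promoting a standby when the leader fails; the runner-up variant can be sketched as follows (data shapes are illustrative assumptions).

```python
def failover_leader(leader, third_votes, failed):
    """If the current leader has failed, promote the node ranked second
    by third vote count in the same data center."""
    if leader not in failed:
        return leader
    survivors = {n: v for n, v in third_votes.items() if n != leader}
    return max(survivors, key=survivors.get)   # runner-up by third votes

assert failover_leader("a", {"a": 5, "b": 3, "c": 1}, failed={"a"}) == "b"
```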
Fig. 12 is a schematic structural diagram of another cluster management apparatus provided in an embodiment of the present application, and as shown in fig. 12, the cluster management apparatus 500 provided in this embodiment further includes: a fourth processing module 505.
The fourth processing module 505 is configured to:
determine a minimum voting cluster from the plurality of voting clusters and obtain the basic processing time of the transaction processing capacity corresponding to the minimum voting cluster, where the minimum voting cluster is the voting cluster with the fewest nodes among the plurality of voting clusters;
determining a complex coefficient corresponding to each voting cluster based on a preset simulation algorithm;
acquiring the time difference for processing a single request between each voting cluster and the minimum voting cluster;
and determining classification duration according to the basic processing time, the complex coefficient, the time difference and the fixed time delay based on a second preset weighting algorithm.
The above device embodiments provided in the present application are merely illustrative, and the module division is only one logic function division, and there may be another division manner in actual implementation. For example, multiple modules may be combined or may be integrated into another system. The coupling of the various modules to each other may be through interfaces that are typically electrical communication interfaces, but mechanical or other forms of interfaces are not excluded. Thus, modules described as separate components may or may not be physically separate, may be located in one place, or may be distributed in different locations on the same or different devices.
It should be noted that the cluster management apparatus provided in the foregoing illustrated embodiment may be used to execute corresponding steps of the cluster management method provided in the foregoing embodiment, and specific implementation, principle and technical effect are similar to those of the foregoing method embodiment, and are not described herein again.
Fig. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present application, and as shown in fig. 13, an electronic device 700 according to the embodiment includes:
a processor 701; and
a memory 702 communicatively coupled to the processor 701; wherein:
the memory 702 stores instructions executable by the processor 701, and the instructions are executed by the processor 701 so as to enable the processor 701 to execute each step of the cluster management method in the foregoing method embodiments; for details, reference may be made to the description in the foregoing method embodiments.
Alternatively, the memory 702 may be separate or integrated with the processor 701.
When the memory 702 is a device separate from the processor 701, the electronic device 700 may further include:
a bus 703 for connecting the processor 701 and the memory 702.
In addition, embodiments of the present application also provide a non-transitory computer readable storage medium storing computer instructions for causing a computer to execute the steps of the cluster management method in the foregoing embodiments. For example, the readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (18)

1. The cluster management method is applied to a cluster, wherein the cluster comprises a plurality of data centers; the method comprises the following steps:
determining a leading node of each data center through a first supervision node, wherein the first supervision node is a node randomly selected from each data center;
acquiring a first vote number of each leading node in a classification time length;
and determining a master node of the cluster according to the first vote count corresponding to each leader node, and determining the nodes other than the master node as slave nodes of the cluster, so that the master node and the slave nodes cooperatively process the service of the cluster.
2. The cluster management method according to claim 1, wherein each data center is configured with a distributed application collaboration service zookeeper.
3. The cluster management method according to claim 1, wherein the number of nodes in each of the data centers is the same or different.
4. The cluster management method according to any one of claims 1 to 3, wherein the determining the master node of the cluster according to the first vote count corresponding to each leader node comprises:
determining an equity calculation result corresponding to each leader node according to the first vote count and a first preset weighting algorithm; and
determining the master node of the cluster according to the equity calculation result corresponding to each leader node.
5. The cluster management method according to claim 4, wherein the determining the master node of the cluster according to the equity calculation result corresponding to each leader node comprises:
determining the leader node with the largest equity calculation result as the master node of the cluster.
6. The cluster management method according to claim 5, wherein, if there are a plurality of leader nodes with the largest equity calculation result, the determining the master node of the cluster according to the equity calculation result corresponding to each leader node comprises:
generating a second vote count through a second supervisory node, wherein the second supervisory node is a node randomly selected from among the candidate leader nodes; and
determining the target leader node corresponding to the second vote count as the master node of the cluster.
7. The cluster management method according to any one of claims 1 to 3, wherein the determining the master node of the cluster according to the first vote count corresponding to each leader node comprises:
determining a candidate leader node as the master node when the first vote count corresponding to the candidate leader node is more than half of the total number of all nodes.
8. The cluster management method according to any one of claims 1 to 3, wherein the determining, by the first supervisory node, the leader node of each data center comprises:
splitting nodes in each data center into a plurality of voting clusters according to a preset division mechanism, wherein each voting cluster comprises at least one node;
acquiring a third voting number of each node in each voting cluster within a preset fixed time length through the first supervision node;
and determining the node with the highest third vote number as the leader node through the first supervision node.
9. The cluster management method according to claim 8, wherein when the third vote number is greater than half of the total number of nodes in the same data center, the candidate node corresponding to the third vote number is determined as the leader node.
10. The cluster management method according to claim 8, wherein when the number of candidate nodes with the highest third vote count is plural, a fourth vote count is generated by the first supervising node, and a candidate node corresponding to the fourth vote count is determined as the leader node.
11. The cluster management method of claim 8, wherein the obtaining, by the first supervising node, a third number of votes for each node in each voting cluster within a preset fixed time period comprises:
determining, by the first supervisory node, a survival status of each node in each data center;
and when the number of the nodes in the survival state is greater than a preset number threshold, acquiring the third vote number of the nodes in the survival state within the preset fixed time.
12. The cluster management method according to claim 8, wherein before obtaining the first number of votes for each leader node within the classification duration, further comprising:
judging whether all nodes have fault nodes or not through the second supervision node;
if so, the second supervision node sends heartbeat test packets of preset times to the failed node, and determines whether the failed node is in a down state according to feedback data of the heartbeat test packets;
and when the fault node is determined to be in the downtime state, the second monitoring node broadcasts the downtime state of the fault node to all nodes so as to enable the fault node not to generate the first vote number.
13. The cluster management method according to claim 12, wherein when the leading node is the failure node, the first supervising node corresponding to the failure node is determined as the leading node by the second supervising node.
14. The cluster management method according to claim 12, wherein, when the leader node is the failed node, a candidate node in the data center to which the failed node belongs is determined as the leader node by the second supervising node, the candidate node being the node ranked second by the third vote count.
15. The cluster management method according to claim 8, wherein before obtaining the first number of votes for each leader node within the classification duration, further comprising:
determining a minimum voting cluster from the plurality of voting clusters, and acquiring the basic processing time of the transaction processing capacity corresponding to the minimum voting cluster, wherein the minimum voting cluster is the voting cluster with the fewest nodes among the plurality of voting clusters;
determining a complex coefficient corresponding to each voting cluster based on a preset simulation algorithm;
acquiring the time difference for processing a single request between each voting cluster and the minimum voting cluster;
and determining the classification duration according to the basic processing time, the complex coefficient, the time difference and the fixed time delay based on a second preset weighting algorithm.
16. The cluster management device is applied to a cluster, and the cluster comprises a plurality of data centers; the device comprises:
the first processing module is used for determining a leading node of each data center through a first supervision node, wherein the first supervision node is a node randomly selected from each data center;
the obtaining module is used for obtaining a first vote number of each leading node in the classification duration;
and a second processing module, configured to determine a master node of the cluster according to the first vote count corresponding to each leader node, determine the nodes other than the master node as slave nodes of the cluster, and cooperatively process the service of the cluster by using the master node and the slave nodes.
17. An electronic device, comprising:
a processor; and
a memory communicatively coupled to the processor; wherein the memory stores instructions executable by the processor to enable the processor to perform the cluster management method of any of claims 1-15.
18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the cluster management method of any of claims 1-15.
CN202010899123.9A 2020-08-31 2020-08-31 Cluster management method and device, electronic equipment and storage medium Active CN112054926B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010899123.9A CN112054926B (en) 2020-08-31 2020-08-31 Cluster management method and device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN112054926A true CN112054926A (en) 2020-12-08
CN112054926B CN112054926B (en) 2023-03-10

Family

ID=73606603


Country Status (1)

Country Link
CN (1) CN112054926B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114363350A (en) * 2021-12-14 2022-04-15 中科曙光南京研究院有限公司 Service management system and method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105187465A (en) * 2014-06-20 2015-12-23 中国科学院深圳先进技术研究院 File sharing method, device and system
CN107277111A (en) * 2017-05-19 2017-10-20 腾讯科技(深圳)有限公司 A kind of company-data processing method, master node, slave node and cluster
CN107729359A (en) * 2017-09-01 2018-02-23 广州市百果园信息技术有限公司 Count the method and device of polled data
US20190394266A1 (en) * 2018-06-20 2019-12-26 Hitachi, Ltd. Cluster storage system, data management control method, and non-transitory computer readable medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG Haiyong et al., "Byzantine Fault-Tolerant Consensus Algorithm Based on a Voting Mechanism", Journal of Computer Applications (《计算机应用》) *



Similar Documents

Publication Publication Date Title
CN107453929B (en) Cluster system self-construction method and device and cluster system
CN113347164B (en) Block chain-based distributed consensus system, method, device and storage medium
CN110581887B (en) Data processing method, device, block chain node and storage medium
CN111405030B (en) Message pushing method and device, electronic equipment and storage medium
CN106815254A (en) A kind of data processing method and device
CN114357495B (en) Prediction machine under-chain aggregation method, device, equipment and medium based on block chain
CN109783151A (en) The method and apparatus of rule change
US11341842B2 (en) Metering data management system and computer readable recording medium
CN112054926B (en) Cluster management method and device, electronic equipment and storage medium
CN114091610A (en) Intelligent decision method and device
CN114153609A (en) Resource control method and device, electronic equipment and computer readable storage medium
CN113347238A (en) Message partitioning method, system, device and storage medium based on block chain
CN109729130A (en) Information analysis method, service server, storage medium and device
CN114610504A (en) Message processing method and device, electronic equipment and storage medium
CN111343212A (en) Message processing method, device, equipment and storage medium
CN115168042A (en) Management method and device of monitoring cluster, computer storage medium and electronic equipment
CN115563160A (en) Data processing method, data processing device, computer equipment and computer readable storage medium
CN114676420A (en) AI and big data combined cloud office information processing method and server
CN104580498B (en) A kind of adaptive cloud management platform
CN109302723A (en) A kind of multinode real-time radio pyroelectric monitor control system Internet-based and control method
CN114090201A (en) Resource scheduling method, device, equipment and storage medium
CN113656046A (en) Application deployment method and device
CN112417015A (en) Data distribution method and device, storage medium and electronic device
CN113342795B (en) Data checking method and device in application program, electronic equipment and storage medium
CN115081233B (en) Flow simulation method and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant