CN115801773A

CN115801773A - Method and device for electing distributed cluster

Info

Publication number: CN115801773A
Application number: CN202210913742.8A
Authority: CN
Inventors: 李想; 樊晓光; 曲秀超; 张廷乐; 皇甫利刚; 侯捷
Original assignee: Tianyi Cloud Technology Co Ltd
Current assignee: Tianyi Cloud Technology Co Ltd
Priority date: 2022-08-01
Filing date: 2022-08-01
Publication date: 2023-03-14

Abstract

The invention discloses a distributed cluster election method and device that determine the election timeout from cluster history information, making elections more reasonable and keeping the system stable. The method comprises: if the distributed cluster has no master node, each candidate node in the distributed cluster determines its own election timeout as follows: the current candidate node determines its election timeout according to a stored history information set, wherein the history information set comprises the history information of the current candidate node and the history information of at least one other candidate node; each candidate node initiates a vote after its own election timeout elapses, and a candidate node that obtains votes from more than half of the nodes in the distributed cluster is determined to be the elected master node.

Description

Method and equipment for electing distributed cluster

Technical Field

The present invention relates to the field of distributed system technologies, and in particular, to a method and an apparatus for electing a distributed cluster.

Background

At present, distributed clusters generally adopt a master-slave architecture comprising one master node and a plurality of slave nodes; when the master node fails, a new master node is chosen by election to keep the cluster highly available. A distributed file system uses a cluster election algorithm to achieve second-level failover and to guarantee data consistency. Although such an algorithm elects quickly, has low complexity and is easy to implement, the following problem remains:

The distributed file system generates the election timeout randomly, so every slave node has the same probability of being elected as the new master node and the outcome is partly random. In practice, however, not every slave node is suitable to become the new master node, so the election result is not reasonable enough and cluster reliability is low.

Disclosure of Invention

The invention provides a distributed cluster election method and distributed cluster election equipment, which are used for determining election timeout time through cluster historical information, improving election rationality and ensuring system stability.

In a first aspect, a method for electing a distributed cluster provided in an embodiment of the present invention includes:

if the distributed cluster has no master node, each candidate node in the distributed cluster determines its own election timeout by the following method:

determining election timeout time by a current candidate node according to a stored historical information set, wherein the historical information set comprises historical information of the current candidate node and historical information of at least one other candidate node except the current candidate node;

and each candidate node initiating a vote after its own election timeout elapses, and if the number of votes obtained exceeds half of the number of nodes in the distributed cluster, determining that candidate node as the elected master node.

According to the distributed cluster election method provided by the embodiment of the invention, the election timeout is determined from the cluster history information, so that the most suitable master node is selected, the rationality of the election is improved, and the stability of the system is ensured.

As an optional implementation manner, the determining, by the current candidate node, an election timeout time according to the stored historical information set includes:

determining the arrangement order of the history information of the current candidate node in the history information set;

and determining the election timeout of the current candidate node according to a preset election timeout upper limit, a preset election timeout lower limit, and the ranking order of the history information of the current candidate node.

As an optional implementation, the history information includes a log operation sequence number and a tenure number;

the determining the ranking order of the history information of the current candidate node in the history information set includes:

determining the weight sum of each candidate node according to the log operation serial number and the tenure number of each candidate node in the historical information set and the weight parameters respectively corresponding to the log operation serial number and the tenure number;

and sequencing the historical information of each candidate node in the historical information set according to the weight sum of each candidate node, and determining the arrangement order of the historical information of the current candidate node.

As an optional implementation manner, the determining the election timeout time of the current candidate node according to a preset election timeout upper limit, a preset election timeout lower limit, and a ranking order of the history information of the current candidate node includes:

determining an election time interval according to the preset election timeout upper limit, the preset election timeout lower limit and the number of candidate nodes corresponding to each historical information in the historical information set;

and determining the election timeout of the current candidate node according to the preset election timeout lower limit, the election time interval, and the ranking order of the history information of the current candidate node.

As an optional implementation, the method further comprises:

and if no candidate node obtains votes from more than half of the nodes in the distributed cluster, determining the elected master node according to the history information of each candidate node in the distributed cluster.

According to the distributed cluster election method provided by the embodiment of the invention, when an election fails, the new master node is determined directly from the history information, so that the time cost of repeated elections is avoided and election efficiency is improved.

As an optional implementation manner, the determining an elected master node according to history information of each candidate node in the distributed cluster includes:

and aiming at any candidate node, determining the elected master node according to each historical information in the historical information set stored by any candidate node.

As an optional implementation, the history information includes a log operation sequence number and a tenure number;

the determining the elected master node according to each historical information in the historical information set stored by any candidate node includes:

determining the weight sum of each historical information according to the log operation serial number and the tenure number of each historical information stored in any candidate node and the weight parameters respectively corresponding to the log operation serial number and the tenure number;

and determining the candidate node corresponding to the maximum weight sum as the elected master node.

As an optional implementation, the method further comprises:

when no election is in progress, if the history information of any node in the distributed cluster is updated, synchronizing the updated history information to each of the other nodes in the distributed cluster;

wherein the respective other nodes comprise nodes in the distributed cluster other than the any node.

In a second aspect, an apparatus for distributed cluster election provided in an embodiment of the present invention includes a processor and a memory, where the memory is used to store a program executable by the processor, and the processor is used to read the program in the memory and execute the following steps:

if the distributed cluster has no master node, each candidate node in the distributed cluster determines its own election timeout by the following method:

and each candidate node initiating a vote after its own election timeout elapses, and if the number of votes obtained exceeds half of the number of nodes in the distributed cluster, determining that candidate node as the elected master node.

As an alternative embodiment, the processor is configured to perform:

the processor is specifically configured to perform:

As an alternative embodiment, the processor is configured to perform:

As an optional implementation manner, the processor is specifically further configured to perform:

and if no candidate node obtains votes from more than half of the nodes in the distributed cluster, determining the elected master node according to the history information of each candidate node in the distributed cluster.

As an alternative embodiment, the processor is configured to perform:

the processor is specifically configured to perform:

determining the weight sum of each piece of historical information according to the log operation serial number and the tenure number of each piece of historical information stored in any candidate node and the weight parameters respectively corresponding to the log operation serial number and the tenure number;

when no election is in progress, if the history information of any node in the distributed cluster is updated, synchronizing the updated history information to each of the other nodes in the distributed cluster;

In a third aspect, an embodiment of the present invention further provides a device for electing a distributed cluster, where the device includes:

an election time determining unit, configured to determine the election timeout of each candidate node in the distributed cluster by the following method: the current candidate node determines its election timeout according to a stored history information set, wherein the history information set comprises the history information of the current candidate node and the history information of at least one other candidate node;

and a voting election unit, configured to have each candidate node initiate a vote after its own election timeout elapses and, if the number of votes obtained exceeds half of the number of nodes in the distributed cluster, determine that candidate node as the elected master node.

In a fourth aspect, an embodiment of the present invention further provides a computer storage medium, on which a computer program is stored, where the computer program is used to implement the steps of the method in the first aspect when the computer program is executed by a processor.

These and other aspects of the present application will be more readily apparent from the following description of the embodiments.

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.

Fig. 1 is a flowchart of cluster election in the existing CStor distributed file system according to an embodiment of the present invention;

Fig. 2 is a flowchart of an implementation of a method for electing a distributed cluster according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of cluster tenures according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of log operation sequence numbers according to an embodiment of the present invention;

Fig. 5 is a flowchart of a specific implementation of a distributed cluster election according to an embodiment of the present invention;

Fig. 6 is a schematic diagram of a distributed cluster election device according to an embodiment of the present invention;

Fig. 7 is a schematic diagram of a distributed cluster election apparatus according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The term "and/or" in the embodiments of the present invention describes an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.

The application scenario described in the embodiment of the present invention is for more clearly illustrating the technical solution of the embodiment of the present invention, and does not form a limitation on the technical solution provided in the embodiment of the present invention, and it can be known by a person skilled in the art that, with the occurrence of a new application scenario, the technical solution provided in the embodiment of the present invention is also applicable to similar technical problems. In the description of the present invention, the term "plurality" means two or more unless otherwise specified.

Embodiment 1. At present, a distributed cluster generally adopts a master-slave architecture, that is, the distributed cluster includes one master node and a plurality of slave nodes; when the master node fails, a new master node is chosen by election to keep the cluster highly available. The CStor distributed file system uses the Raft cluster election algorithm to achieve second-level failover and to guarantee data consistency. Although this algorithm elects quickly, has low complexity and is easy to implement, the following problems remain:

On the one hand, the distributed file system generates the election timeout randomly, so every slave node has the same probability of being elected as the new master node and the outcome is partly random. In practice, however, not every slave node is suitable to become the new master node, so the election result is not reasonable enough and cluster reliability is low. On the other hand, when an election fails, the distributed file system simply repeats the election, which leads to long election time, low election efficiency and slow recovery from cluster failures.

As shown in Fig. 1, the cluster election process of the CStor distributed file system mainly includes the following steps:

step 100, judging whether the heartbeat is overtime, if yes, executing step 101, and otherwise, executing step 108;

In implementation, the master node (also called the Leader node) periodically sends a heartbeat to the other nodes in the cluster (that is, the slave nodes). As long as a slave node receives the heartbeat of the Leader node, it considers that the cluster currently has a master. If no heartbeat is received within a certain time, that is, the heartbeat times out, the cluster is considered to have no master, and an election is needed. Because the election is triggered by the master node going offline (that is, by the heartbeat timeout), the original master node is converted to a slave node, and the new master node is selected from the other slave nodes.

Heartbeat is a general term in cluster election that denotes a means of communication between nodes in a cluster: once a node becomes the master node, it must periodically send heartbeats to the other nodes to prove that it is the master node.

Step 101, all slave nodes (also called Follower nodes) in the cluster become candidate nodes (Candidate nodes);

Step 102, each candidate node initiates an election after waiting for its election timeout;

wherein the election timeout of each candidate node is generally generated randomly between 150 ms and 300 ms.

Step 103, after its election timeout elapses, the candidate node increases its own tenure number by 1 and initiates a voting request, asking the other candidate nodes to vote for it.

The shorter the election timeout, the earlier the node initiates voting, and the more likely it is to become the new master node.

Step 104, when a candidate node receives a voting request, it decides whether to grant or reject the vote according to certain check criteria.

Wherein the inspection criteria include:

(1) The tenure of the node initiating the vote is not behind that of the receiving node;

(2) There is no master node in the cluster at this time.

If the check is passed, the candidate node votes and updates the local tenure stored by the candidate node to the tenure in the voting request, otherwise, the voting is refused.

Step 105, judging whether any candidate node in the cluster has obtained more than N/2 votes; if so, executing step 106, otherwise returning to step 102;

where N is the number of all nodes in the cluster.

Step 106, the candidate node becomes the new master node and sends heartbeats to the other nodes to announce that it has become the master node;

Step 107, the other nodes immediately stop voting and election activity and return from candidate nodes to slave nodes, and the cluster election ends.

Step 108, the node state does not need to be changed.

Therefore, the cluster election method of the CStor distributed file system has the following disadvantages: because the election timeout is randomly generated, the election result is highly random and the election reliability is low; and because the election is retried after it fails, the election takes a long time and is inefficient.
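The baseline flow of steps 100 to 108 can be sketched in a few lines of Python. This is a simplified illustration, not the CStor implementation: the function name `baseline_round`, the rule that only the earliest candidate campaigns, the candidate counting its own vote, and reading check (1) as "the requester's tenure is not behind the voter's" are all assumptions made for the example.

```python
import random


def baseline_round(node_terms, lower_ms=150, upper_ms=300, seed=None):
    """One round of the baseline flow: random timeouts, then a majority check.

    node_terms maps node name -> locally stored tenure number.
    Returns the new master's name, or None if nobody exceeds N/2 votes.
    """
    rng = random.Random(seed)
    # Step 102: every candidate draws its timeout from the same range.
    timeouts = {name: rng.randint(lower_ms, upper_ms) for name in node_terms}
    # Step 103: the earliest timeout fires first; that node bumps its tenure.
    candidate = min(timeouts, key=timeouts.get)
    request_term = node_terms[candidate] + 1
    # Step 104: a peer grants its vote if its tenure is not ahead of the request.
    # Assumption: the candidate also counts one vote for itself.
    votes = 1 + sum(
        1
        for name, term in node_terms.items()
        if name != candidate and term <= request_term
    )
    # Step 105: strictly more than N/2 votes are required.
    return candidate if votes > len(node_terms) / 2 else None


winner = baseline_round({"n1": 3, "n2": 3, "n3": 3}, seed=7)
```

Because every node draws from the same range, which node wins depends only on the random draw, which is the weakness described above.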

In order to solve the foregoing technical problems, an embodiment of the present invention provides a method for electing a distributed cluster, whose core design idea is to determine the election timeout by using the history information of the nodes. It should be noted that, in a cluster with one master and multiple slaves, information is synchronized between nodes, including log synchronization, tenure synchronization and the like. Because the cluster's decision strategy is to treat an operation as successful once replies from a majority of nodes are received, the logs, tenures and other information stored by the nodes are not necessarily fully consistent in practice. Therefore, the richer a node's history information is, that is, the more complete its log and the newer its tenure number, the closer the node is to the previous master node, and the more suitable it is to become the new master node.

The distributed cluster election method provided by the embodiment of the invention optimizes the Raft cluster election algorithm: the election timeout is determined from the cluster history information, so that the most suitable master node is selected, the rationality of the election is improved, and the stability of the system is ensured. Meanwhile, when an election fails, the new master node is determined directly from the history information, so that the time cost of repeated elections is avoided and election efficiency is improved.

As shown in fig. 2, an implementation flow of the method for electing a distributed cluster provided in this embodiment is as follows:

Step 200, if the distributed cluster has no master node, each candidate node in the distributed cluster determines its own election timeout by the following method:

In implementation, whether the distributed cluster has a master node is determined by whether a heartbeat timeout occurs in the cluster: when a heartbeat timeout occurs, it is determined that the distributed cluster has no master node; otherwise, it is determined that the distributed cluster has a master node, and no election is performed.
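A minimal sketch of this check is given below, assuming each node records the time of the last heartbeat it received. The class name `HeartbeatWatcher`, its fields and the use of seconds are hypothetical choices for illustration and do not come from the source.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatWatcher:
    """Tracks the last heartbeat and decides whether the cluster has a master."""

    heartbeat_timeout_s: float
    last_heartbeat_s: float = 0.0
    role: str = "follower"

    def on_heartbeat(self, now_s):
        # A heartbeat proves a master exists, so the node is (or stays) a follower.
        self.last_heartbeat_s = now_s
        self.role = "follower"

    def tick(self, now_s):
        # Heartbeat timeout: the cluster is treated as having no master.
        if now_s - self.last_heartbeat_s > self.heartbeat_timeout_s:
            self.role = "candidate"
        return self.role


watcher = HeartbeatWatcher(heartbeat_timeout_s=0.5)
watcher.on_heartbeat(1.0)
role_in_time = watcher.tick(1.2)   # heartbeat still fresh
role_expired = watcher.tick(2.0)   # no heartbeat for 1.0 s
```

Passing the current time in as an argument keeps the sketch deterministic; a real node would read a monotonic clock.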

When it is determined that the distributed cluster has no master node, the original master node is switched to a slave node, the original slave node is switched to a candidate node, and a new master node is selected from the candidate nodes.

Each candidate node in this embodiment determines its election timeout by combining the locally stored history information of other nodes. A node in this embodiment refers to a node in the distributed cluster; depending on the current state of the cluster, the nodes comprise a master node and slave nodes, or slave nodes and candidate nodes. The master node, slave nodes and candidate nodes in this embodiment are all nodes in the distributed cluster and differ only in state.

It should be noted that, in this embodiment, the history information stored by each candidate node includes history information of itself and history information of other candidate nodes, and the history information of each node in the distributed cluster may be updated in all nodes synchronously.

In some embodiments, the history information in this embodiment includes, but is not limited to, a log operation sequence number and a tenure number; it may also include other history information, which is not limited in this embodiment.

In some embodiments, the current candidate node of the present embodiment determines the election timeout time from the stored historical information set by:

(1) Determining the arrangement order of the history information of the current candidate node in the history information set;

when the history information comprises a log operation sequence number and a tenure number, the ranking order of the history information of the current candidate node in the history information set is determined as follows:

1a) Determining the weight sum of each candidate node according to the log operation serial number and the tenure number of each candidate node in the historical information set and the weight parameters respectively corresponding to the log operation serial number and the tenure number;

In implementation, as shown in Fig. 3, this embodiment provides a schematic diagram of cluster tenures. A cluster goes through many tenures during operation. The initial value of the tenure number (denoted as term_id) is 0; each time the cluster elects a new master node (Leader node), term_id is increased by 1, and that Leader node leads the whole cluster until the next election produces a new Leader node. Thus, a larger term_id indicates a more recent tenure.

As shown in Fig. 4, this embodiment further provides a schematic diagram of log operation sequence numbers. The Leader node in a cluster is responsible for processing client requests; on receiving a client request, the Leader node appends an operation log. The initial value of the log operation sequence number (denoted as op_id) is 0, and op_id is increased by 1 each time a node appends an operation log. After appending the log, the Leader node broadcasts it to all Follower nodes in the cluster, which append the same operation log; as soon as the Leader node receives replies from more than half of the nodes, it considers the operation complete and returns the result to the client. Therefore, as the cluster runs longer, the log operation sequence numbers of the nodes gradually diverge, and a larger op_id means a newer log. For example, in Fig. 4, op_id = 7 for node 1, op_id = 5 for node 2, and op_id = 6 for node 3; node 1 has the largest op_id, so its log is the newest.
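The way op_id values drift apart can be illustrated with a short Python sketch. The helper `append_entry` is hypothetical; it assumes that the Leader's own copy counts toward the majority and that any follower reached by the broadcast ends up holding the new entry. Neither detail is specified in the source.

```python
def append_entry(op_ids, leader, reachable_followers):
    """Leader appends one operation log and replicates it.

    op_ids maps node name -> latest log operation sequence number (op_id).
    Returns True when more than half of the nodes hold the new entry.
    """
    op_ids[leader] += 1
    new_id = op_ids[leader]
    holders = 1  # assumption: the leader's own copy counts toward the majority
    for node in reachable_followers:
        op_ids[node] = new_id  # assumption: a reachable follower catches up fully
        holders += 1
    return holders > len(op_ids) / 2


op_ids = {"node1": 5, "node2": 5, "node3": 5}
first_ok = append_entry(op_ids, "node1", ["node3"])  # node2 misses this entry
second_ok = append_entry(op_ids, "node1", [])        # no follower receives this one
newest = max(op_ids, key=op_ids.get)                 # node with the largest op_id
```

After the two appends the sequence numbers are 7, 5 and 6, the same values as in the Fig. 4 example, and node 1 holds the newest log.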

It should be noted that the term_id and op_id of each node in the cluster are sent along with heartbeat messages and synchronized between nodes, so that the history information of each node is synchronized to the other nodes.
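One way a node might maintain its local history information set from such messages is sketched below. The `History` record, the `merge_heartbeat` helper and the rule for discarding stale reports (compare term_id first, then op_id) are assumptions for illustration; the source only states that the two values travel with the heartbeat.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class History:
    term_id: int
    op_id: int


def merge_heartbeat(history_set, sender, reported):
    """Store the history reported by `sender` unless the local copy is newer."""
    known = history_set.get(sender)
    is_newer = known is None or (
        (reported.term_id, reported.op_id) >= (known.term_id, known.op_id)
    )
    if is_newer:
        history_set[sender] = reported
    return history_set


history_set = {}
merge_heartbeat(history_set, "node2", History(term_id=5, op_id=5))
merge_heartbeat(history_set, "node2", History(term_id=5, op_id=6))
merge_heartbeat(history_set, "node2", History(term_id=4, op_id=9))  # stale, ignored
```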

Taking the case where the history information includes the log operation sequence number and the tenure number as an example, the weight parameters corresponding to the tenure number and the log operation sequence number are denoted as w_t and w_op respectively, and their values may be specified according to the particular scenario. The weight sum of each candidate node may be determined by the following formula:

$$W = w_t \cdot \mathit{term\_id} + w_{op} \cdot \mathit{op\_id} \tag{1}$$

where w_t is the weight corresponding to the tenure number, w_op is the weight corresponding to the log operation sequence number, term_id is the tenure number, op_id is the log operation sequence number, and W is the weight sum.

1b) And sequencing the historical information of each candidate node in the historical information set according to the weight sum of each candidate node, and determining the arrangement order of the historical information of the current candidate node.

In implementation, if the current candidate node locally stores the history information of i nodes, the weight sums corresponding to the history information of the nodes are W_1, W_2, W_3, ..., W_i,

where W_1 ≥ W_2 ≥ ... ≥ W_i. If the weight sum of the current candidate node is in the j-th position, the ranking order of the history information of the current candidate node is j.

For example, suppose the current candidate node stores the history information of four nodes including itself, with <term_id, op_id> values <5,6>, <5,5>, <6,6> and <6,4>. Taking w_t = 30% and w_op = 80%, the weight sums are W = 6.3, 5.5, 6.6 and 5.0 respectively. The weight sum of the current candidate node is 6.3, so it ranks 2nd.
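The ranking example can be reproduced with a short Python sketch. The node names and the tie-breaking rule (by name) are assumptions; sorting from largest to smallest weight sum is what the 2nd place of the value 6.3 implies.

```python
def weight_sum(term_id, op_id, w_t, w_op):
    """Formula (1): W = w_t * term_id + w_op * op_id."""
    return w_t * term_id + w_op * op_id


def rank_of(current, history_set, w_t, w_op):
    """1-based position of `current` when nodes are sorted by weight sum, largest first."""
    sums = {
        name: weight_sum(term_id, op_id, w_t, w_op)
        for name, (term_id, op_id) in history_set.items()
    }
    # Assumption: ties are broken by node name so that every node sorts identically.
    ordered = sorted(sums, key=lambda name: (-sums[name], name))
    return ordered.index(current) + 1


# <term_id, op_id> pairs from the example; "self" is the current candidate node.
history_set = {"self": (5, 6), "node_b": (5, 5), "node_c": (6, 6), "node_d": (6, 4)}
print(rank_of("self", history_set, w_t=0.3, w_op=0.8))  # 2
```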

(2) And determining the election overtime time of the current candidate node according to a preset election overtime upper limit, a preset election overtime lower limit and the arrangement sequence of the historical information of the current candidate node.

When the history information comprises a log operation sequence number and a tenure number, the election timeout of the current candidate node is determined as follows:

2a) Determining an election time interval according to the preset election timeout upper limit, the preset election timeout lower limit and the number of candidate nodes corresponding to each historical information in the historical information set;

In practice, if each cluster has a preset election timeout upper limit (denoted as Upb) and a preset election timeout lower limit (denoted as Lob), the election time interval (denoted as I) can be obtained by the following formula:

$$I = \frac{Upb - Lob}{n} \tag{2}$$

where n is the number of candidate nodes whose history information is locally stored in the current candidate node.

2b) And determining the election overtime time of the current candidate node according to the preset election overtime lower limit, the election time interval and the arrangement sequence of the historical information of the current candidate node.

In practice, the election timeout time is determined by the following equation:

$$T = \mathrm{random}\bigl(Lob + I \cdot (j - 1),\ Lob + I \cdot j - 1\bigr) \tag{3}$$

where j is the ranking order of the history information of the current candidate node, Lob is the preset election timeout lower limit, I is the election time interval, and T is the election timeout;

random(a, b) denotes a random number within the interval [a, b].

For example, suppose the cluster presets Upb = 200 ms and Lob = 100 ms, node 1 and node 2 each locally store the history information of 4 nodes (n = 4), the weight sum of node 1 ranks 1st, and the weight sum of node 2 ranks 3rd. Then:

Election time interval: I = (200 ms - 100 ms) / 4 = 25 ms;

Election timeout of node 1: T_1 = random(100 ms + 25 ms × (1 - 1), 100 ms + 25 ms × 1 - 1) = random(100 ms, 124 ms);

Election timeout of node 2: T_2 = random(100 ms + 25 ms × (3 - 1), 100 ms + 25 ms × 3 - 1) = random(150 ms, 174 ms);

Obviously, node 1 has richer history information, and the election timeout time is shorter than that of node 2, so that election can be initiated more quickly and is more likely to become a Leader node. Therefore, by combining the historical information, the election timeout time can be more reasonable, the node which is more suitable for the Leader initiates the voting first, and the purpose of improving the election reliability is achieved.
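The timeout calculation for this example can be checked with the following Python sketch. The function names are hypothetical, and the interval is taken as (Upb - Lob) / n, which is the value consistent with the 25 ms used in the example. Drawing the value with `random.uniform` is one possible reading of random(a, b).

```python
import random


def timeout_window(lob_ms, upb_ms, n, j):
    """Return the [low, high] window in ms for the node ranked j-th out of n."""
    interval = (upb_ms - lob_ms) / n    # election time interval I
    low = lob_ms + interval * (j - 1)   # lower bound of formula (3)
    high = lob_ms + interval * j - 1    # upper bound of formula (3)
    return low, high


def election_timeout(lob_ms, upb_ms, n, j, rng=random):
    """Draw the election timeout T uniformly from the node's window."""
    low, high = timeout_window(lob_ms, upb_ms, n, j)
    return rng.uniform(low, high)


print(timeout_window(100, 200, 4, 1))  # (100.0, 124.0)
print(timeout_window(100, 200, 4, 3))  # (150.0, 174.0)
```

Because the windows of different ranks do not overlap, a node with a better rank always times out before a node with a worse rank, while nodes still keep some randomness inside their own window.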

Step 201, each candidate node initiates a vote after its own election timeout elapses, and if the number of votes obtained exceeds half of the number of nodes in the distributed cluster, that candidate node is determined to be the elected master node.

In implementation, after its own election timeout elapses, each candidate node increases its tenure number by 1 and initiates a vote, asking the other candidate nodes to vote for it.

When the candidate node receives the voting request, the candidate node decides to approve the voting or reject the voting according to a certain check standard.

The check criteria include:

(1) The tenure of the node initiating the vote is not behind that of the receiving node;

(2) There is no master node in the cluster at this time.

If a candidate node obtains more than N/2 votes, where N is the number of all nodes in the cluster, that candidate node becomes the new master node and sends heartbeats to the other candidate nodes to announce that it has become the master node. On receiving the heartbeat of the master node, the other candidate nodes in the cluster know that a new master node has been produced; they immediately stop voting and election activity and return from candidate nodes to slave nodes, and the cluster election ends.

In some embodiments, if there are no candidate nodes whose obtained votes exceed half the number of nodes in the distributed cluster, the elected master node is determined according to the history information of each candidate node in the distributed cluster.

In some embodiments, for any candidate node, the elected master node is determined according to each historical information in the historical information set stored by the any candidate node.

In some embodiments, when the historical information includes a log operation serial number and a tenure number, the weight sum of each piece of historical information is determined according to the log operation serial number and the tenure number of each piece of historical information stored in any candidate node, and the weight parameters respectively corresponding to the log operation serial number and the tenure number; the candidate node corresponding to the maximum weight sum is determined as the elected master node.

In implementation, if no candidate node in the cluster obtains the votes of more than half of the nodes, a new master node is selected according to the historical information. That is, the current candidate node compares the weight sums of the locally stored historical information of the candidate nodes; if the weight sum corresponding to the current candidate node is the maximum, the current candidate node immediately becomes the master node and sends heartbeats to the other nodes to declare its master node status.

In some embodiments, the weight sum of the historical information of each candidate node is determined according to the log operation serial number and the tenure number in the historical information of each candidate node, and the weight parameters respectively corresponding to the log operation serial number and the tenure number; the candidate node corresponding to the maximum weight sum is determined as the elected master node.
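As a rough illustration of this fallback, the following sketch computes a weight sum for each stored history record and picks the node with the largest one. It assumes the weight sum is a linear combination of the two fields, and it adds a tie-break on node identifier so that every node reaches the same answer; neither detail is stated in the text. The names `HistoryInfo`, `weight_sum`, `pick_master`, `w_log` and `w_tenure` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class HistoryInfo:
    node_id: str
    log_seq: int  # log operation serial number
    tenure: int   # tenure number


def weight_sum(info: HistoryInfo, w_log: float = 1.0, w_tenure: float = 1.0) -> float:
    """Assumed form: weighted linear combination of the two history fields."""
    return w_log * info.log_seq + w_tenure * info.tenure


def pick_master(
    history_set: Iterable[HistoryInfo],
    w_log: float = 1.0,
    w_tenure: float = 1.0,
) -> str:
    """Return the id of the node whose history has the maximum weight sum.

    Ties are broken by the larger node_id (an assumption made here so the
    result is deterministic across nodes).
    """
    best = max(
        history_set,
        key=lambda h: (weight_sum(h, w_log, w_tenure), h.node_id),
    )
    return best.node_id
```

A candidate node would declare itself master only when `pick_master(...)` returns its own identifier, which matches the comparison described above.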

In some embodiments, in a non-election situation, if the historical information of any node in the distributed cluster is updated, the updated historical information is synchronized to the other nodes in the distributed cluster, where the other nodes are the nodes in the distributed cluster other than that node. That is, historical information can be shared among the nodes in the distributed cluster, so that when no candidate node obtains a number of votes exceeding half the number of nodes in the distributed cluster, an elected master node is determined according to each piece of historical information in the historical information set stored by any candidate node in the distributed cluster. The historical information set stored by any candidate node comprises the historical information of more than half of the nodes.
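The sharing of historical information can be pictured with a small in-memory sketch. It only illustrates the idea of pushing an updated record to every other node and checking that a stored set covers more than half of the cluster; the class `Node`, its methods, and the direct method call standing in for a network message are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# A history record here is (log operation serial number, tenure number).
Record = Tuple[int, int]


class Node:
    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.history: Dict[str, Record] = {}  # node_id -> latest known record
        self.peers: List["Node"] = []         # the other nodes in the cluster

    def update_own_history(self, log_seq: int, tenure: int) -> None:
        """Record a local update and synchronize it to all other nodes."""
        record = (log_seq, tenure)
        self.history[self.node_id] = record
        for peer in self.peers:
            # Stand-in for a network message in a real cluster.
            peer.receive_history(self.node_id, record)

    def receive_history(self, node_id: str, record: Record) -> None:
        self.history[node_id] = record

    def covers_majority(self, cluster_size: int) -> bool:
        """True if the stored set holds history for more than half the nodes."""
        return len(self.history) > cluster_size / 2
```

For example, in a three-node cluster where two nodes have reported updates, the third node's stored set already covers a majority and can be used for the fallback decision.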

Because the election timeout time is determined based on historical information, the election result is more reasonable, the randomness in which node becomes the cluster master is reduced, and the stability and reliability of the cluster are effectively improved. When an election fails, a new master node is quickly determined based on historical information, which speeds up the election, effectively improves election efficiency, and meets the cluster fault recovery speed requirements of various scenarios.

As shown in Fig. 5, an embodiment of the present invention provides a specific implementation flow of distributed cluster election, which is as follows:

Step 500, judging whether the heartbeat has timed out; if yes, executing step 501, otherwise executing step 508;

Step 501, changing all slave nodes in the cluster into candidate nodes;

Step 502, each candidate node determines its election timeout time according to its stored historical information set;

Step 502a, in a non-election situation, if the historical information of any node in the distributed cluster is updated, the updated historical information is synchronized to the other nodes in the distributed cluster, where the other nodes are the nodes in the distributed cluster other than that node;

Step 502b, the historical information comprises a log operation serial number and a tenure number; determining the weight sum of each candidate node according to the log operation serial number and the tenure number of each candidate node in the historical information set and the weight parameters respectively corresponding to the log operation serial number and the tenure number; sorting the historical information of the candidate nodes in the historical information set according to the weight sum of each candidate node, and determining the ranking order of the historical information of the current candidate node;

Step 502c, determining an election time interval according to the preset election timeout upper limit, the preset election timeout lower limit and the number of candidate nodes corresponding to the historical information in the historical information set; and determining the election timeout time of the current candidate node according to the preset election timeout lower limit, the election time interval and the ranking order of the historical information of the current candidate node.

Here, the current candidate node determines its election timeout time according to its stored historical information set, which comprises the historical information of the current candidate node and the historical information of at least one other candidate node.

Step 503, after its own election timeout time has elapsed, each candidate node increases its tenure number by 1, initiates a voting election, and requests the remaining candidate nodes to vote for it;

When a candidate node receives a voting request, it approves or rejects the vote according to check criteria, which include:

(2) There is no master node in the cluster at this time.

If the check is passed, the candidate node casts its vote and updates its locally stored tenure number to the tenure number in the voting request; otherwise, it rejects the vote.

Step 504, judging whether any candidate node in the cluster has obtained a number of votes exceeding half the number of nodes in the distributed cluster; if so, executing step 505, otherwise executing step 506;

Step 505, determining that the candidate node becomes the new master node, and sending heartbeats to the other nodes to declare that it has become the master node;

Step 506, determining the elected master node according to the historical information of each candidate node in the distributed cluster, and sending heartbeats to the other nodes to declare that it has become the master node;

For any candidate node, the elected master node is determined according to each piece of historical information in the historical information set stored by that candidate node. Specifically, the weight sum of each piece of historical information is determined according to the log operation serial number and the tenure number of each piece of historical information stored in that candidate node, and the weight parameters respectively corresponding to the log operation serial number and the tenure number; the candidate node corresponding to the maximum weight sum is determined as the elected master node.

Step 507, the remaining candidate nodes immediately stop voting and election and return to being slave nodes, and the cluster election is completed.

Step 508, the node state does not need to be changed.
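Steps 502b and 502c of the flow above can be sketched as follows. The exact formulas are not reproduced in this passage, so the sketch makes three labelled assumptions: the weight sum is a linear combination of the log operation serial number and the tenure number; the election time interval is the gap between the upper and lower limits divided evenly by the number of candidate nodes; and the timeout is the lower limit plus the zero-based rank times that interval, with the largest weight sum ranked first so that it receives the shortest timeout. The function name `election_timeout` and its parameters are hypothetical.

```python
from typing import Dict, Tuple

# node_id -> (log operation serial number, tenure number)
HistorySet = Dict[str, Tuple[int, int]]


def election_timeout(
    own_id: str,
    history_set: HistorySet,
    lower_ms: float,
    upper_ms: float,
    w_log: float = 1.0,
    w_tenure: float = 1.0,
) -> float:
    """Return the election timeout (ms) of own_id under the stated assumptions."""
    # Step 502b: rank nodes by weight sum, largest first; ties broken by
    # node id (an assumption) so every node computes the same order.
    ranked = sorted(
        history_set,
        key=lambda nid: (
            -(w_log * history_set[nid][0] + w_tenure * history_set[nid][1]),
            nid,
        ),
    )
    rank = ranked.index(own_id)  # 0 for the best-suited node

    # Step 502c: split the allowed window evenly among the candidates
    # (assumed form of the election time interval).
    interval = (upper_ms - lower_ms) / len(ranked)
    return lower_ms + rank * interval
```

Under these assumptions, with a window of 150 ms to 300 ms and three candidates, the node with the largest weight sum times out at 150 ms and the others at 200 ms and 250 ms, so the best-suited node is the first to request votes.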

The method addresses the high election randomness caused by randomly generated election timeout times: it determines the election timeout time based on historical information such as the log and the tenure number, and assigns shorter timeouts to the nodes better suited to become the Leader, so that election results are more reasonable and the stability and reliability of the cluster are effectively improved. The method also addresses the long election time caused by repeated elections: a new master node is determined quickly based on historical information, which shortens the election time, effectively improves election efficiency, and speeds up cluster fault recovery.

This embodiment can be applied to file storage products of the CStor distributed file storage system. In complex cluster application scenarios, randomly generated election timeout times make cluster election less reliable, and repeating the election whenever it fails makes it time-consuming. Performing cluster election based on historical information, rather than with traditional randomly generated election timeout times, effectively improves cluster reliability and makes the election result more reasonable. When an election fails, a new master node is quickly determined from historical information, which, compared with the traditional repeated-election approach, effectively reduces election time, improves election efficiency, and meets the fault recovery speed requirements of various scenarios. The present application is therefore well suited to CStor distributed file storage.

Embodiment 2: based on the same inventive concept, an embodiment of the present invention further provides a device for distributed cluster election. Since this device is the device used in the method of the embodiments of the present invention, and the principle by which it solves the problem is similar to that of the method, the implementation of the device may refer to the implementation of the method, and repeated parts are not described again.

As shown in fig. 6, the apparatus includes a processor 600 and a memory 601, where the memory 601 is used for storing programs executable by the processor 600, and the processor 600 is used for reading the programs in the memory 601 and executing the following steps:

As an alternative implementation, the processor 600 is specifically configured to perform:

the processor 600 is specifically configured to perform:

As an optional implementation manner, the processor 600 is specifically further configured to perform:

for any candidate node, determining the elected master node according to each piece of historical information in the historical information set stored by that candidate node.

the processor 600 is specifically configured to perform:

As an optional implementation, the processor 600 is specifically further configured to perform:

Embodiment 3: based on the same inventive concept, an embodiment of the present invention further provides a device for distributed cluster election. Since this device is the device used in the method of the embodiments of the present invention, and the principle by which it solves the problem is similar to that of the method, the implementation of the device may refer to the implementation of the method, and repeated details are not described here.

As shown in fig. 7, the apparatus includes:

an election time determining unit 700, configured to, if the distributed cluster has no master node, determine the election timeout time of each candidate node in the distributed cluster in the following manner: the current candidate node determines its election timeout time according to a stored historical information set, wherein the historical information set comprises historical information of the current candidate node and historical information of at least one other candidate node except the current candidate node;

and a voting node unit 701, configured to initiate a voting election after each candidate node's own election timeout time has elapsed, and to determine that a candidate node is the elected master node if the number of votes it obtains exceeds half the number of nodes in the distributed cluster.

As an optional implementation manner, the election time determining unit 700 is specifically configured to:

As an optional implementation, the historical information includes a log operation serial number and a tenure number; the election time determining unit 700 is specifically configured to:

As an optional implementation manner, the election time determining unit 700 is specifically configured to:

As an optional implementation manner, the voting node unit 701 is further specifically configured to:

As an optional implementation manner, the voting node unit 701 is specifically configured to:

the voting node unit 701 is specifically configured to:

and determining the candidate node corresponding to the maximum weight sum as the elected master node.

As an optional implementation manner, the device further includes a synchronous update unit, specifically configured to:

Based on the same inventive concept, an embodiment of the present invention further provides a computer storage medium having a computer program stored thereon, where the computer program is executed by a processor to implement the following steps:

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A method for distributed cluster election, the method comprising:

2. The method of claim 1, wherein determining, by the current candidate node, an election timeout time based on the stored set of historical information comprises:

3. The method of claim 2, wherein the historical information comprises a log operation serial number and a tenure number;

the determining the ranking order of the history information of the current candidate node in the history information set comprises:

4. The method of claim 2, wherein determining the election timeout time of the current candidate node according to a preset election timeout upper limit, a preset election timeout lower limit and a ranking order of the history information of the current candidate node comprises:

5. The method of any one of claims 1 to 4, further comprising:

6. The method of claim 5, wherein determining the elected master node according to the historical information of each candidate node in the distributed cluster comprises:

7. The method of claim 6, wherein the historical information comprises a log operation serial number and a tenure number;

8. The method according to any one of claims 1 to 4, further comprising:

9. A device for distributed cluster election, characterized in that the device comprises a processor and a memory, said memory being adapted to store a program executable by said processor, said processor being adapted to read the program in said memory and to perform the steps of the method according to any of claims 1 to 8.

10. A computer storage medium on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the steps of the method according to any one of claims 1 to 8.