CN113014634B - Cluster election processing method, device, equipment and storage medium - Google Patents


Info

Publication number
CN113014634B
CN113014634B (application CN202110194074.3A)
Authority
CN
China
Prior art keywords
node
slave
information
standby
new
Prior art date
Legal status
Active
Application number
CN202110194074.3A
Other languages
Chinese (zh)
Other versions
CN113014634A (en)
Inventor
赵永亮
裴雁峰
高斌
张清林
Current Assignee
Chengdu New Hope Finance Information Co Ltd
Original Assignee
Chengdu New Hope Finance Information Co Ltd
Priority date
Filing date
Publication date
Application filed by Chengdu New Hope Finance Information Co Ltd filed Critical Chengdu New Hope Finance Information Co Ltd
Priority to CN202110194074.3A priority Critical patent/CN113014634B/en
Publication of CN113014634A publication Critical patent/CN113014634A/en
Application granted granted Critical
Publication of CN113014634B publication Critical patent/CN113014634B/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/06 Management of faults, events, alarms or notifications
    • H04L 41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L 41/0663 Performing the actions predefined by failover planning, e.g. switching to standby network elements

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Hardware Redundancy (AREA)

Abstract

The application provides a cluster election processing method, device, equipment and storage medium, relating to the technical field of distributed clusters. The method comprises the following steps: when the current master node fails, the current standby node is switched to become the new master node of the distributed cluster system; and each slave node respectively initiates first proposal information to the other slave nodes and elects a new standby node according to the first voting information of the other slave nodes, obtaining the new standby node of the distributed cluster system, wherein the first proposal information represents the election of a new standby node and the first voting information indicates the slave node identifier that each of the other slave nodes has selected as the standby node. When the master node fails, the standby node can be switched over directly, so the switchover takes little time and the system keeps working continuously. After the standby node is switched to the master node, each slave node immediately triggers the election of a new standby node to ensure that a standby node always exists in the system, thereby guaranteeing switchover continuity and improving system performance.

Description

Cluster election processing method, device, equipment and storage medium
Technical Field
The present application relates to the field of distributed cluster technologies, and in particular, to a cluster election processing method, apparatus, device and storage medium.
Background
With the rise of large-scale internet companies in recent years, business complexity has kept increasing and systems have trended toward microservices and componentization, which places higher requirements on the fault tolerance of systems; high-availability system architectures have emerged as a result. Such high-availability architectures involve techniques for determining a master node by election.
In a traditional high-availability system architecture, a master node is elected from all nodes during the election process, and the elected master node controls the slave nodes to execute system tasks; when the master node becomes unavailable, the system triggers a new election to determine a new master node.
However, with the above method, the service of the whole system may be suspended while the new election is in progress, which reduces the reliability and availability of the system.
Disclosure of Invention
An object of the present application is to provide a method, an apparatus, a device, and a storage medium for cluster election processing, so as to solve the problem in the prior art that the reliability and the availability of a distributed cluster system are poor.
In order to achieve the above purpose, the technical solutions adopted in the embodiments of the present application are as follows:
in a first aspect, an embodiment of the present application provides a cluster election processing method, which is applied to a distributed cluster system, where the distributed cluster system includes: the system comprises a current main node, a current standby node and at least one slave node; the method comprises the following steps:
when the current master node fails, the current standby node is switched to become the new master node of the distributed cluster system;
and each slave node respectively initiates first proposal information to the other slave nodes except itself, and elects a new standby node according to the first voting information of the other slave nodes, obtaining the new standby node of the distributed cluster system, wherein the first proposal information is used for representing the election of a new standby node, and the first voting information is used for indicating the slave node identifier that each of the other slave nodes has selected as the standby node.
Optionally, the switching the current standby node to be a new master node of the distributed cluster system includes:
the new master node replaces a target identifier with a node identifier of the new master node, wherein the target identifier is used for identifying the current master node of the distributed cluster system;
and the new master node sends synchronization information to the nodes in the distributed cluster system other than itself, wherein the synchronization information instructs those nodes to replace their locally stored target identifier with the node identifier of the new master node.
Optionally, when the current master node fails, switching the current standby node to a new master node of the distributed cluster system includes:
if the current standby node does not receive the heartbeat information of the current master node within a preset time length, the current standby node sends second proposal information to each slave node of the distributed cluster system, wherein the second proposal information is used for representing the switchover from standby node to master node;
receiving second voting information of each slave node, wherein the second voting information is used for indicating whether each slave node confirms that a current master node fails;
and determining, according to the second voting information, to switch to become the new master node of the distributed cluster system.
Optionally, the initiating, by each slave node, first proposal information to other slave nodes except the slave node, and electing a new standby node according to the first voting information of the other slave nodes, respectively includes:
the first slave node sends third voting information to other slave nodes except the first slave node, wherein the third voting information is used for indicating the slave node identifier selected by the first slave node as a standby node, and the first slave node is any one of the slave nodes;
the first slave node receiving first voting information of other slave nodes, the first voting information being determined by the other slave nodes based on the received proposal information;
the first slave node determines a node to be selected according to the first voting information and the local voting information;
and the first slave node sends a confirmation proposal to the node to be selected, and the node to be selected determines whether to switch the node to be selected to a new standby node according to the number of the received confirmation proposals.
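As a rough illustration of this optional step, the decision made by the node to be selected could look like the following sketch. The function name and the majority threshold are assumptions; the patent only speaks of "the number of the received confirmation proposals" without fixing an exact count:

```python
def should_switch_to_standby(confirmations: int, slave_count: int) -> bool:
    """The node to be selected switches itself to the new standby node once
    enough confirmation proposals have arrived. A majority of the slave
    nodes is assumed here as the threshold, since the text leaves the exact
    number open."""
    return confirmations > slave_count / 2
```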
Optionally, the determining, by the first slave node, a node to be selected according to the first voting information and the local voting information includes:
and the first slave node determines a node to be selected according to the first voting information, the local voting information and the received proposal information from other slave nodes.
Optionally, the determining, by the first slave node, a node to be selected according to the first voting information, the local voting information, and the received proposal information from other slave nodes includes:
and if the first slave node receives proposal information from a second slave node and determines that the number of votes obtained by the second slave node meets a vote threshold value based on the first voting information and the local voting information, determining the second slave node as a node to be selected.
Optionally, the method further comprises:
when the current standby node fails, each slave node respectively initiates first proposal information to other slave nodes except the slave node, and elects a new standby node according to the first voting information of the other slave nodes.
Optionally, when the current master node fails, before the current standby node is switched to a new master node of the distributed cluster system, the method further includes:
when the distributed cluster system is initialized, an initial master node is selected from all nodes of the distributed cluster system, and an initial standby node is selected from all nodes except the initial master node.
In a second aspect, an embodiment of the present application further provides a device for processing cluster election, where the device is applied to a distributed cluster system, and the distributed cluster system includes: the system comprises a current main node, a current standby node and at least one slave node; the device comprises: a switching module and an election module;
the switching module is configured to switch the current standby node to a new master node of the distributed cluster system when the current master node fails;
the election module is configured to have each slave node respectively initiate first proposal information to the other slave nodes except itself, and elect a new standby node according to the first voting information of the other slave nodes, obtaining the new standby node of the distributed cluster system, where the first proposal information is used to represent the election of a new standby node, and the first voting information is used to indicate the slave node identifier selected by the other slave nodes as the standby node.
Optionally, the switching module is specifically configured to replace, by the new master node, a target identifier with a node identifier of the new master node, where the target identifier is used to identify the current master node of the distributed cluster system; and the new master node sends synchronization information to the nodes in the distributed cluster system other than itself, where the synchronization information instructs those nodes to replace their locally stored target identifier with the node identifier of the new master node.
Optionally, the switching module is specifically configured to: if the current standby node does not receive heartbeat information of the current master node within a preset time period, send, by the current standby node, second proposal information to each slave node of the distributed cluster system, where the second proposal information is used to represent the switchover from standby node to master node; receive second voting information of each slave node, where the second voting information is used to indicate whether each slave node confirms that the current master node has failed; and determine, according to the second voting information, to switch to become the new master node of the distributed cluster system.
Optionally, the election module is specifically configured to send, by the first slave node, third voting information to other slave nodes except the first slave node, where the third voting information is used to indicate a slave node identifier selected by the first slave node as a standby node, and the first slave node is any one of the slave nodes; the first slave node receiving first voting information of other slave nodes, the first voting information being determined by the other slave nodes based on the received proposal information; the first slave node determines a node to be selected according to the first voting information and the local voting information; and the first slave node sends a confirmation proposal to the node to be selected, and the node to be selected determines whether to switch the node to be selected to a new standby node according to the number of the received confirmation proposals.
Optionally, the election module is specifically configured to determine, by the first slave node, a node to be selected according to the first voting information, the local voting information, and the received proposal information from other slave nodes.
Optionally, the election module is specifically configured to determine the second slave node as the node to be elected, if the first slave node receives proposal information from the second slave node, and it is determined that a vote count of the second slave node satisfies a vote threshold based on the first voting information and the local voting information.
Optionally, the election module is further configured to, when the current standby node fails, have each slave node initiate first proposal information to the other slave nodes except itself, and elect a new standby node according to the first voting information of the other slave nodes.
Optionally, the election module is further configured to, when the distributed cluster system is initialized, select an initial master node from each node of the distributed cluster system, and select an initial standby node from each node except the initial master node.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor; when the electronic device operates, the processor and the storage medium communicate via the bus, and the processor executes the machine-readable instructions to perform the steps of the method provided in the first aspect.
In a fourth aspect, embodiments of the present application provide a storage medium having a computer program stored thereon, where the computer program is executed by a processor to perform the steps of the method as provided in the first aspect.
The beneficial effects of this application are as follows:
the application provides a cluster election processing method, a device, equipment and a storage medium, wherein the method comprises the following steps: when the current main node fails, the current standby node is switched to a new main node of the distributed cluster system; and each slave node respectively initiates first proposal information to other slave nodes except the slave node, and elects a new standby node according to the first voting information of the other slave nodes to obtain a new standby node of the distributed cluster system, wherein the first proposal information is used for representing and election the new standby node, and the first voting information is used for indicating the slave node identifier selected by the other slave nodes as the standby node. In the scheme, the distributed cluster system can comprise a main node and a standby node, and when the main node fails, the standby node can be switched to the main node to take over the task of the main node. Compared with the prior art, when the main node fails, each slave node triggers new election to determine the new main node, and service of the system is suspended in the election process. In addition, after the standby nodes are switched to the main node, each slave node can trigger and elect a new standby node immediately so as to ensure that the standby nodes exist in the system at all times, thereby ensuring the switching continuity and further improving the availability of the system.
In addition, because the master node is supervised by the standby node and the slave nodes, and the standby node is elected and supervised by the slave nodes, the division of labor of each node in the system is clearer, the reliability of the system architecture is higher, and cluster unavailability caused by a master-node failure is avoided.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a schematic architecture diagram of a distributed cluster system according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a cluster election processing method according to an embodiment of the present application;
fig. 3 is a schematic flowchart of another cluster election processing method according to an embodiment of the present application;
fig. 4 is a schematic diagram of information synchronization provided in an embodiment of the present application;
fig. 5 is a schematic flowchart of another cluster election processing method according to an embodiment of the present application;
fig. 6 is a schematic diagram illustrating an acknowledgement of a standby node switching master node according to an embodiment of the present application;
fig. 7 is a schematic flowchart of another cluster election processing method according to an embodiment of the present application;
fig. 8 is a schematic diagram of a cluster broadcasting process according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a validation proposal according to an embodiment of the present application;
fig. 10 is a schematic diagram of a cluster election processing apparatus according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it should be understood that the drawings in the present application are only for illustration and description purposes and are not used to limit the protection scope of the present application. Further, it should be understood that the schematic drawings are not drawn to scale. The flowcharts used in this application illustrate operations implemented according to some embodiments of the present application. It should be understood that the operations of the flow diagrams may be performed out of order, and that steps without logical context may be reversed in order or performed concurrently. In addition, one skilled in the art, under the guidance of the present disclosure, may add one or more other operations to the flowchart, or may remove one or more operations from the flowchart.
In addition, the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, as presented in the figures, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the term "comprising" will be used in the embodiments of the present application to indicate the presence of the features stated hereinafter, but does not exclude the addition of further features.
First, some noun terms that may be referred to in the present application are explained:
1. holding Leader (Ruling Leader): RL, the node that is providing Leader service, i.e. the master node, from which the data of other followers are synchronized.
2. Supervisor Leader (Opposition Leader): for short, OL, the backup Leader node, that is, the backup node, performs switching when RL cannot provide service.
3. Follower (Follow): f, the node that really provides service, has the right to vote, synchronizes data from the RL and monitors the condition of the RL.
4. Election: procedure for the whole cluster to pick out a new RL or OL when the RL is up or not up.
5. And (3) changing: the process of transferring from OL to RL when RL is not available.
6. And (3) during the election period: the period between when the whole cluster is started and when RL, OL is selected.
7. Switching period: the time when OL is switched to RL.
8. A certificate enforcement period: the RL controls the period during which the entire cluster provides normal service.
9. Proposal (propofol): the voting content of one election.
Fig. 1 is a schematic architecture diagram of a distributed cluster system provided in an embodiment of the present application; the cluster election processing method provided in the following embodiments of the present application is applied to this distributed cluster system. As shown in fig. 1, the distributed cluster system may include a master node, a standby node and slave nodes. The master node controls the execution of the tasks of the whole cluster system; the standby node serves as the backup of the master node and is switched to the master node when the master node is unavailable, ensuring that the system's tasks are executed without interruption; and each slave node, as the actual executor of tasks, executes the corresponding task according to the task information sent by the master node.
Optionally, the standby node may monitor whether the master node is abnormal through heartbeat monitoring. When no heartbeat from the master node is observed for a preset time, the standby node initiates a proposal to the slave nodes to switch itself to master; each slave node accepts or rejects the proposal according to its own monitoring of the master node's heartbeat, and when more than half of the slave nodes accept the proposal, the standby node is switched over to become the new master node.
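The heartbeat supervision and the more-than-half acceptance rule described above can be sketched as follows; this is a simplified model with hypothetical names and an illustrative timeout value, with transport and retries omitted:

```python
HEARTBEAT_TIMEOUT = 5.0  # the "preset time" in seconds; illustrative value


def master_timed_out(last_heartbeat: float, now: float,
                     timeout: float = HEARTBEAT_TIMEOUT) -> bool:
    """The standby node (OL) treats the master (RL) as failed when no
    heartbeat has been observed for the preset duration."""
    return now - last_heartbeat > timeout


def majority_accepts(votes: list[bool]) -> bool:
    """The OL switches to RL only when more than half of the slave nodes
    accept its switchover proposal."""
    return sum(votes) > len(votes) / 2
```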
After the standby node is switched to the master node, a new standby node is competitively elected among the slave nodes through proposals and voting, ensuring that the cluster always has a standby node ready. The standby node can thus be switched over in time whenever the master node becomes unavailable, reducing service interruption, ensuring uninterrupted operation of the system and improving its high availability.
Fig. 2 is a schematic flowchart of a cluster election processing method according to an embodiment of the present application; the execution subject of the method may be any node in the distributed cluster system. The distributed cluster system includes: the system comprises a current main node, a current standby node and at least one slave node; as shown in fig. 2, the method may include:
s201, when the current main node has a fault, the current standby node is switched to be a new main node of the distributed cluster system.
Generally, a distributed cluster system is a system formed by a plurality of servers or computers through distributed arrangement, each server or computer can be used as a node in the system, a plurality of nodes can cooperate with each other, and the functions of the whole system are distributed on the nodes, so that the high availability of the system is achieved.
It should be noted that the master node at any given time is also referred to as the current master node. For example, if the current time is 10 o'clock and the master node of the system at 10 o'clock is A, then master node A is the current master node; when the current time becomes 11 o'clock, if the master node of the system at 11 o'clock is B, then master node B is the current master node.
Optionally, in the distributed cluster system of the present application, when the system starts up, an initial master node and an initial standby node are obtained by election voting among the nodes of the system, and the remaining nodes that were not elected serve as slave nodes. When receiving a task request, the initial master node sends the task request to each slave node, so that each slave node executes the task and maintains the service of the system.
In some cases, the master node may go down and become unavailable because it is overloaded or its hardware fails. When the current master node fails, the current standby node can be switched over to become the new master node of the system and take over the master node's tasks. Because the standby node is determined in advance, it can be switched to the master node immediately when the master node fails, executing the master node's tasks, thereby ensuring uninterrupted operation of the system and improving its high availability.
S202, each slave node respectively initiates first proposal information to other slave nodes except the slave node, and elects a new standby node according to the first voting information of the other slave nodes to obtain the new standby node of the distributed cluster system, wherein the first proposal information is used for representing the election of the new standby node, and the first voting information is used for indicating the slave node identifier selected by the other slave nodes as the standby node.
Optionally, after the current standby node is switched to the master node, the system no longer has a standby node, so the election of a new standby node can be triggered immediately among the slave nodes. A new standby node is elected to ensure that a standby node always exists in the system, so that a real-time switchover remains possible when the master node fails.
In one implementation, each slave node sends first proposal information to the other slave nodes except itself, and likewise receives the first proposal information sent by the other slave nodes, the first proposal information representing the slave node's proposal of a candidate for the standby node. Optionally, each slave node may broadcast its own proposal information within the cluster to inform the other slave nodes.
Each slave node can also receive the first voting information sent by the other slave nodes at the same time; the first voting information represents the voting targets the other slave nodes have selected, so each slave node can count votes according to the received first voting information. A new standby node is then selected according to the tally result and serves as the new standby node of the distributed cluster system.
When the node that was switched from standby to master fails in turn, the newly elected standby node can be switched to become the master node, and the system continues to operate, ensuring its high availability.
In summary, the cluster election processing method provided in this embodiment includes: when the current master node fails, the current standby node is switched to become the new master node of the distributed cluster system; and each slave node respectively initiates first proposal information to the other slave nodes and elects a new standby node according to the first voting information of the other slave nodes, obtaining the new standby node of the distributed cluster system, wherein the first proposal information represents the election of a new standby node and the first voting information indicates the slave node identifier selected by the other slave nodes as the standby node. In this scheme, the distributed cluster system includes a master node and a standby node, and when the master node fails, the standby node can be switched over to take over the master node's tasks. In the prior art, by contrast, a new election to determine a new master node is triggered only after the master node fails, and the service of the system is suspended during that election; here, the pre-elected standby node is switched over directly, so the switchover takes little time. In addition, after the standby node is switched to the master node, each slave node immediately triggers the election of a new standby node to ensure that a standby node always exists in the system, thereby guaranteeing switchover continuity and further improving the availability of the system.
Fig. 3 is a schematic flowchart of another cluster election processing method according to an embodiment of the present application; optionally, as shown in fig. 3, in step S201, switching the current standby node to be a new master node of the distributed cluster system may include:
s301, the new master node replaces the target identification with the node identification of the new master node, and the target identification is used for identifying the current master node of the distributed cluster system.
Generally, each node's locally marked data includes an RL ID, i.e. the identifier of the master node, and an OL ID, i.e. the identifier of the standby node; at any time, the RL ID and OL ID marked on a node are the latest current master node identifier and standby node identifier.
Optionally, after the current standby node is switched to the new master node, the new master node needs to replace the target identifier it has marked with its own node identifier, i.e. with the node identifier of the standby node that was switched over. The target identifier identifies the current master node of the distributed cluster system, and after the switchover the current master node is the new master node.
For example, suppose the current master node is node A, so the marked target identifier is the identifier of node A. After node A fails, the current standby node B is switched to be the new master node: before the switch, the target identifier marked by standby node B is the identifier of node A; after the switch, node B becomes the new master node and replaces the marked target identifier with its own identifier.
S302, the new master node sends synchronization information to the nodes in the distributed cluster system other than itself, where the synchronization information is used to instruct those nodes to replace their locally stored target identifier with the node identifier of the new master node.
In some embodiments, after replacing its marked target identifier with its own identifier, the new master node may further send synchronization information to the other nodes to synchronize the identifier of the new master node, that is, to instruct the other nodes to replace their locally marked target identifier with the node identifier of the new master node.
Fig. 4 is a schematic diagram of information synchronization provided in an embodiment of the present application. As shown in fig. 4, assume that standby node A has been switched to be the new master node in the above manner, and the other nodes include node B, node C, node D, and node E. The new master node A marks the target identifier ID001 and synchronizes ID001 to node B, node C, node D, and node E.
It should be noted that one of node B, node C, node D, and node E may be the failed master node. Although the failed master node may not receive the synchronization information because of its failure, during synchronization all nodes other than the new master node are targeted, so sending the synchronization information to the failed master node is not excluded.
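As a minimal sketch of steps S301 and S302, the following Python fragment models the identifier replacement and synchronization; the `Node` class, the identifiers, and the in-process "synchronization" loop are illustrative assumptions, not the patent's actual implementation:

```python
class Node:
    """Minimal node model for the target-identifier synchronization (S301-S302)."""
    def __init__(self, node_id, target_id):
        self.node_id = node_id      # this node's own identifier
        self.target_id = target_id  # identifier of the cluster's current master

def switch_to_master(new_master, others):
    # S301: the new master replaces its marked target identifier
    # with its own node identifier
    new_master.target_id = new_master.node_id
    # S302: synchronize the new target identifier to every other node,
    # including the failed old master (delivery to it may simply fail)
    for node in others:
        node.target_id = new_master.node_id

# Example: standby node A (ID001) takes over from failed master ID000
a = Node("ID001", "ID000")
others = [Node(i, "ID000") for i in ("ID002", "ID003", "ID004", "ID005")]
switch_to_master(a, others)
print(a.target_id)                                   # ID001
print(all(n.target_id == "ID001" for n in others))   # True
```

In a real cluster the loop would of course be a network broadcast rather than direct attribute assignment; the point is only the ordering: the new master updates its own mark first, then propagates it.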
Fig. 5 is a schematic flowchart of another cluster election processing method according to an embodiment of the present application; optionally, in step S201, when the current master node fails, switching the current standby node to a new master node of the distributed cluster system may include:
S501, if the current standby node does not receive heartbeat information from the current master node within a preset duration, the current standby node sends second proposal information to each slave node of the distributed cluster system, where the second proposal information is used to propose switching the standby node to be the master node.
Generally, while acting in the role of master node, the master node of the system sends heartbeat information to the standby node and the slave nodes at a preset interval. The heartbeat information may be a self-defined structure (a heartbeat packet) with which the standby node and the slave nodes monitor the master node to ensure the validity of the connection. The preset interval may be several seconds and may be set adaptively according to actual conditions.
In this scheme, the standby node acts as a supervisor that monitors the state of the master node. If the current standby node does not receive heartbeat information from the current master node within the preset duration, it may send second proposal information to each slave node to declare that it will switch to be the master node, and then enter a waiting state.
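The standby node's heartbeat supervision described above can be sketched as a small watchdog; the timeout value, class name, and use of a monotonic clock are assumptions for illustration:

```python
import time

HEARTBEAT_TIMEOUT = 3.0  # assumed preset duration, in seconds

class StandbyWatchdog:
    """Tracks the last heartbeat seen from the master (sketch of the S501 trigger)."""
    def __init__(self, timeout=HEARTBEAT_TIMEOUT):
        self.timeout = timeout
        self.last_heartbeat = time.monotonic()

    def on_heartbeat(self):
        # called whenever a heartbeat packet from the master arrives
        self.last_heartbeat = time.monotonic()

    def master_timed_out(self, now=None):
        # True once no heartbeat has arrived within the preset duration;
        # at that point the standby would send second proposal information
        now = time.monotonic() if now is None else now
        return now - self.last_heartbeat > self.timeout

w = StandbyWatchdog(timeout=3.0)
w.last_heartbeat = 100.0              # pretend the last heartbeat arrived at t=100s
print(w.master_timed_out(now=102.0))  # False: still within the preset duration
print(w.master_timed_out(now=104.0))  # True: standby proposes switching to master
```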
Optionally, with this division of roles, in which the standby node supervises the master node and the slave nodes supervise the standby node, the responsibility of each node in the system is clearer, the reliability of the system architecture is higher, and cluster unavailability caused by a master node failure is avoided.
Fig. 6 is a schematic diagram of a standby node confirming the switch to master node according to an embodiment of the present application. Assume the standby node is node E and the slave nodes include node B, node C, and node D. Standby node E sends the second proposal information to node B, node C, and node D, respectively.
S502, second voting information of each slave node is received, wherein the second voting information is used for indicating whether each slave node confirms that the current master node has a fault.
Alternatively, as shown in fig. 6, after receiving the second proposal information sent by the standby node, each slave node may determine the second voting information according to the heartbeat information of the master node monitored by the slave node, so as to send the second voting information to the standby node.
Optionally, each slave node may vote according to the heartbeat information of the master node that it monitors. When any slave node receives the second proposal information of the standby node but finds that it can still monitor the heartbeat information of the master node, the slave node determines that the master node has not failed and sends second voting information rejecting the proposal to the standby node. When any slave node receives the second proposal information and finds that it cannot monitor the heartbeat information of the master node, the slave node confirms that the master node has failed and sends second voting information accepting the proposal to the standby node.
And S503, determining to switch to a new master node of the distributed cluster system according to the second voting information.
In an implementation manner, the standby node may determine whether to switch to a new master node according to the received second voting information sent by each slave node.
Optionally, if the number of second votes accepting the proposal among the second voting information received by the standby node exceeds half of the total number of voting nodes, it is determined that the standby node can switch to be the new master node. The total number of voting nodes refers to the number of all nodes participating in the vote.
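The majority rule of steps S502 and S503 can be sketched as a simple tally; the `"accept"`/`"reject"` vote encoding is an assumed representation of the second voting information:

```python
def standby_may_switch(second_votes, total_voters):
    """S503: the standby node switches to master only if the votes accepting
    its proposal exceed half of the total number of voting nodes."""
    accepted = sum(1 for v in second_votes if v == "accept")
    return accepted > total_voters / 2

# Three slaves vote on the standby's second proposal information
print(standby_may_switch(["accept", "accept", "reject"], total_voters=3))  # True
print(standby_may_switch(["accept", "reject", "reject"], total_voters=3))  # False
```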
Fig. 7 is a schematic flowchart of another cluster election processing method according to an embodiment of the present application; optionally, in step S202, the initiating, by each slave node, first proposal information to other slave nodes except the slave node, and electing a new standby node according to the first voting information of the other slave nodes may include:
S701, the first slave node sends third voting information to the other slave nodes except the first slave node, where the third voting information is used to indicate the identifier of the slave node that the first slave node selects as the standby node; the first slave node is any one of the slave nodes.
In this embodiment, the method of electing a new standby node is described from the perspective of any one of the slave nodes.
Fig. 8 is a schematic diagram of a cluster broadcasting process according to an embodiment of the present application. Assume that, in addition to the master node and the standby node, the slave nodes include node A, node B, node C, node D, and node E. In the process of competing to become the new standby node, as shown in fig. 8, each slave node may initiate first proposal information to any other slave node, and at the same time each slave node may also vote on the proposal information initiated by any other slave node.
Taking the first slave node as an example, the first slave node may send third voting information to other slave nodes except for itself, where the third voting information is used to indicate a voting object selected by the first slave node, where the voting object may refer to a slave node that the first slave node recognizes as a standby node.
S702, the first slave node receives first voting information of other slave nodes, and the first voting information is determined by the other slave nodes based on the received proposal information.
The first slave node can also receive the first voting information of other slave nodes while transmitting the voting information to other slave nodes, so that the first slave node can count votes according to the first voting information of each other slave node.
The first voting information of each slave node can be determined according to proposal information received by each slave node and sent by other slave nodes. The proposal information sent by each node may include: an identification of the node, and a timestamp, where the timestamp may refer to a timestamp of the initiation of the proposal information.
The following table 1 shows the proposal information broadcast among node A, node B, node C, node D, and node E.

TABLE 1

[Table 1 appears as an image (Figure BDA0002945769180000131) in the original publication; it lists the first proposal information exchanged among node A through node E, i.e. each proposing node's identifier and the timestamp of its proposal.]
Taking node A as an example, node A may determine its first voting information, that is, its vote, according to the first proposal information received from node B, node C, node D, and node E.

In one implementation, if the timestamps of the received first proposal information differ, the node whose proposal has the earliest timestamp is determined as the voting object of node A; if the timestamps of at least two nodes are the same, the node with the smallest ID is determined as the voting object. According to the proposal information received by node A in table 1 and the above rule, the voting object of node A is node A itself; that is, the first voting information of node A is ID001, node A. Similarly, the first voting information of node B is also ID001 of node A, and the first voting information of node C, node D, and node E is also ID001 of node A.
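A sketch of the voting-object selection, assuming the rule is read as "earliest proposal timestamp wins, ties broken by smallest node ID"; the dictionary layout of the proposal information is an illustrative assumption:

```python
def choose_voting_object(proposals):
    """Pick the voting object from the received first proposal information:
    earliest timestamp first, ties broken by the smallest node ID.
    (This reading of the tie-break rule is an assumption based on the text.)"""
    return min(proposals, key=lambda p: (p["timestamp"], p["node_id"]))["node_id"]

proposals = [
    {"node_id": "ID001", "timestamp": 10},  # node A
    {"node_id": "ID002", "timestamp": 10},  # node B, same timestamp as A
    {"node_id": "ID003", "timestamp": 12},  # node C, later proposal
]
print(choose_voting_object(proposals))  # ID001: tied timestamp, smaller ID wins
```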
S703, the first slave node determines the node to be selected according to the first voting information and its local voting information.
Optionally, the first slave node may determine the node to be selected according to the first voting information received from each slave node and its own voting information: the node whose number of votes, counted from the first voting information and the node's own vote, exceeds half of the total number of nodes participating in the vote is determined as the node to be selected.
For example, assume node A is the first slave node. Node A's local vote is cast for node A, and the votes node A receives from node B and node C are also cast for node A, so node A's current tally is three votes. The voters are node A, node B, node C, node D, and node E, five in total, so node A's three votes exceed half of five, and node A can determine that it is the node to be selected. Similarly, node B, node C, node D, and node E can each determine from their own tallies that the node to be selected is node A.
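The tally in step S703 can be sketched as follows; the more-than-half threshold follows the rule stated above, while the list-of-identifiers vote representation is an assumption:

```python
from collections import Counter

def candidate_from_votes(votes, total_voters):
    """S703: a slave node tallies its own vote plus the received first voting
    information; a node becomes the node to be selected only if its votes
    exceed half of the total number of voters."""
    tally = Counter(votes)
    node_id, count = tally.most_common(1)[0]
    return node_id if count > total_voters / 2 else None

# Node A's local vote plus the votes received from B and C, out of five voters
votes = ["ID001", "ID001", "ID001"]  # A, B, and C all vote for node A
print(candidate_from_votes(votes, total_voters=5))           # ID001: 3 > 5/2
print(candidate_from_votes(["ID001", "ID002"], total_voters=5))  # None: no majority yet
```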
S704, the first slave node sends a confirmation proposal to the node to be selected, and the node to be selected determines whether to switch the node to be selected to a new standby node according to the number of the received confirmation proposals.
Optionally, the first slave node may send a confirmation proposal to the node to be selected that it has confirmed and wait for the result to be published; the node to be selected, in turn, may determine whether it becomes the new standby node according to the number of confirmation proposals it receives.
Fig. 9 is a schematic diagram of a confirmation proposal according to an embodiment of the present application. As shown in the figure, since each node has determined that node A is the node to be selected, node B, node C, node D, and node E all send confirmation proposal information to node A.
When the node to be selected judges that the number of confirmation proposals it has received exceeds half of all the nodes participating in the vote, that is, exceeds half of 5, it publishes to each slave node that it is the new standby node and switches itself to be the new standby node.
Optionally, in step S703, the determining, by the first slave node, the node to be selected according to the first voting information and the local voting information may include: and the first slave node determines the node to be selected according to the first voting information, the local voting information and the received proposal information from other slave nodes.
In some embodiments, some slave nodes may determine, according to their own operating state or other factors, that they cannot assume the role of standby node, and therefore give up the election; that is, they do not send first proposal information to the other slave nodes, but they still vote on the first proposal information received from the other slave nodes.
Optionally, when determining the node to be selected, the first slave node further refers to the received proposal information sent by other slave nodes, and determines the node to be selected from the slave nodes that have sent the first proposal information according to the first voting information and the local voting information.
Optionally, in the foregoing step, the determining, by the first slave node, the node to be selected according to the first voting information, the local voting information, and the received proposal information from other slave nodes may include:
If the first slave node receives proposal information from the second slave node and determines, based on the first voting information and the local voting information, that the number of votes obtained by the second slave node meets the vote threshold, the second slave node is determined as the node to be selected.
In this application, the vote threshold may be set to half of the total number of nodes participating in the vote; that is, when the votes obtained by the second slave node exceed half of the total, the second slave node may be determined as the node to be selected.
Optionally, the method of the present application may further include: when the current standby node fails, each slave node initiates first proposal information to other slave nodes except the slave node, and elects a new standby node according to the first voting information of the other slave nodes.
In the above embodiment, the scheme in which the standby node switches to the master node and a new standby node is reselected when the master node fails has been described. In practical applications, however, the master node may be operating normally while the standby node fails because of a hardware fault or a load problem. In such cases, to ensure that a standby node can take over in time when the master node fails, a new standby node needs to be elected immediately.
Optionally, a new standby node may be elected among the slave nodes other than the master node and the failed standby node. The specific election process is similar to the process in fig. 2 in which, after the current master node fails and the current standby node switches to be the new master node, the other slave nodes elect a new standby node; details are not repeated here.
Optionally, before the current standby node is switched to be a new master node of the distributed cluster system in step S201, the method of the present application may further include: when the distributed cluster system is initialized, electing an initial master node from all nodes of the distributed cluster system, and electing an initial standby node from the nodes other than the initial master node.
Optionally, when the system starts up, the nodes in the system may elect an initial master node and an initial standby node by voting. All nodes in the system may participate in the election of the initial master node; after the initial master node is determined, the election of the initial standby node is carried out among the other nodes. The specific election method is similar to the election of a new standby node described above and is not repeated here.
To sum up, the cluster election processing method provided in the embodiment of the present application includes: when the current master node fails, the current standby node is switched to be a new master node of the distributed cluster system; and each slave node initiates first proposal information to the other slave nodes and elects a new standby node according to the first voting information of the other slave nodes, so as to obtain a new standby node of the distributed cluster system, wherein the first proposal information is used for proposing election of a new standby node, and the first voting information is used for indicating the identifier of the slave node that the other slave nodes select as the standby node. In this scheme, the distributed cluster system includes a master node and a standby node, and when the master node fails, the standby node can be switched to the master node to take over its tasks. Compared with the prior art, in which each slave node triggers a new election to determine a new master node only after the master node fails, the direct switching ensures continuous operation of the system and improves its availability and reliability. In addition, after the standby node is switched to the master node, each slave node immediately triggers the election of a new standby node, so that a standby node exists in the system at all times, thereby ensuring switching continuity and further improving the availability of the system.
In addition, with the standby node supervising the master node and the slave nodes supervising the standby node, the division of labor of each node in the system is clearer, the reliability of the system architecture is higher, and cluster unavailability caused by a master node failure can be avoided.
The following describes a device, an apparatus, and a storage medium for executing the cluster election processing method provided in the present application, where specific implementation processes and technical effects of the device, the apparatus, and the storage medium are referred to above, and details are not described below.
Fig. 10 is a schematic diagram of a cluster election processing apparatus provided in this embodiment, where functions implemented by the cluster election processing apparatus correspond to steps executed by the foregoing method. The apparatus may be understood as a server corresponding to any node described above, or a processor of the server, or may be understood as a component that is independent of the server or the processor and implements the functions of the present application under the control of the server, as shown in fig. 10, the apparatus may include: a switching module 910 and an election module 920;
a switching module 910, configured to switch a current standby node to a new master node of the distributed cluster system when a current master node fails;
and an election module 920, configured for each slave node to initiate first proposal information to the other slave nodes and elect a new standby node according to the first voting information of the other slave nodes, so as to obtain a new standby node of the distributed cluster system, where the first proposal information is used to propose election of a new standby node, and the first voting information is used to indicate the identifier of the slave node selected by the other slave nodes as the standby node.
Optionally, the switching module 910 is specifically configured for the new master node to replace the target identifier with the node identifier of the new master node, where the target identifier is used to identify the current master node of the distributed cluster system; and for the new master node to send synchronization information to the nodes in the distributed cluster system other than the new master node, where the synchronization information is used to instruct those nodes to replace the locally stored target identifier with the node identifier of the new master node.
Optionally, the switching module 910 is specifically configured to: if the current standby node does not receive heartbeat information from the current master node within a preset duration, send, by the current standby node, second proposal information to each slave node of the distributed cluster system, where the second proposal information is used to propose switching the standby node to be the master node; receive second voting information of each slave node, where the second voting information is used to indicate whether each slave node confirms that the current master node has failed; and determine, according to the second voting information, to switch to be the new master node of the distributed cluster system.
Optionally, the election module 920 is specifically configured to send, by the first slave node, third voting information to other slave nodes except the first slave node, where the third voting information is used to indicate a slave node identifier selected by the first slave node as a standby node, and the first slave node is any one of the slave nodes; the first slave node receives first voting information of other slave nodes, and the first voting information is determined by the other slave nodes based on the received proposal information; the first slave node determines a node to be selected according to the first voting information and the local voting information; and the first slave node sends a confirmation proposal to the node to be selected, and the node to be selected determines whether to switch the node to be selected to a new standby node according to the number of the received confirmation proposals.
Optionally, the election module 920 is specifically configured to determine, by the first slave node, a node to be selected according to the first voting information, the local voting information, and the received proposal information from other slave nodes.
Optionally, the election module 920 is specifically configured to determine the second slave node as the node to be selected if the first slave node receives proposal information from the second slave node and determines, based on the first voting information and the local voting information, that the number of votes obtained by the second slave node meets the vote threshold.
Optionally, the election module 920 is further configured to, when the current standby node fails, initiate the first proposal information to the other slave nodes by each slave node, and elect a new standby node according to the first voting information of the other slave nodes.
Optionally, the election module 920 is further configured to, when the distributed cluster system is initialized, select an initial master node from each node of the distributed cluster system, and select an initial standby node from each node except the initial master node.
The above-mentioned apparatus is used for executing the method provided by the foregoing embodiment, and the implementation principle and technical effect are similar, which are not described herein again.
The above modules may be one or more integrated circuits configured to implement the above methods, for example: one or more application specific integrated circuits (ASICs), or one or more digital signal processors (DSPs), or one or more field programmable gate arrays (FPGAs), among others. For another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor capable of calling program code. As another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).
The modules may be connected or in communication with each other via a wired or wireless connection. The wired connection may include a metal cable, an optical cable, a hybrid cable, etc., or any combination thereof. The wireless connection may comprise a connection over a LAN, WAN, bluetooth, zigBee, NFC, or the like, or any combination thereof. Two or more modules may be combined into a single module, and any one module may be divided into two or more units. It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to corresponding processes in the method embodiments, and are not described in detail in this application.
Fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application, where the electronic device may be a computing device with a data processing function.
The apparatus comprises: a processor 801 and a memory 802.
The memory 802 is used for storing programs, and the processor 801 calls the programs stored in the memory 802 to execute the above-mentioned method embodiments. The specific implementation and technical effects are similar, and are not described herein again.
The memory 802 stores therein program code that, when executed by the processor 801, causes the processor 801 to perform various steps of a cluster election processing method according to various exemplary embodiments of the present application described in the above section "exemplary methods" of the present specification.
The Processor 801 may be a general-purpose Processor, such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, or the like, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present Application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in a processor.
Memory 802, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory may include at least one type of storage medium, for example a flash memory, a hard disk, a multimedia card, a card-type memory, a random access memory (RAM), a static random access memory (SRAM), a programmable read only memory (PROM), a read only memory (ROM), an electrically erasable programmable read only memory (EEPROM), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory may also be, but is not limited to, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 802 in the embodiments of the present application may also be circuitry or any other device capable of performing a storage function for storing program instructions and/or data.
Optionally, the present application also provides a program product, such as a computer readable storage medium, comprising a program which, when being executed by a processor, is adapted to carry out the above-mentioned method embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor (in english: processor) to execute some steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

Claims (10)

1. A cluster election processing method, applied to a distributed cluster system, wherein the distributed cluster system comprises: a current master node, a current standby node, and at least one slave node; the method comprises:
when the current master node fails, switching the current standby node to be a new master node of the distributed cluster system;
each slave node initiating first proposal information to the other slave nodes except itself, and electing a new standby node according to first voting information of the other slave nodes, to obtain a new standby node of the distributed cluster system, wherein the first proposal information is used for proposing election of a new standby node, and the first voting information is used for indicating the identifier of the slave node selected by the other slave nodes as the standby node;
when the current master node fails, switching the current standby node to a new master node of the distributed cluster system, including:
if the current standby node does not receive the heartbeat information of the current main node within a preset time length, the current standby node sends second proposal information to each slave node of the distributed cluster system, wherein the second proposal information is used for representing the switching from the standby node to the main node;
receiving second voting information of each slave node, wherein the second voting information is used for indicating whether each slave node confirms that a current master node fails;
and determining to switch to a new main node of the distributed cluster system according to the second voting information.
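The standby-to-master switch in claim 1 can be illustrated with a minimal sketch. This is not the patented implementation; the class name, the majority rule, and the five-second default timeout are all illustrative assumptions, chosen only to show the heartbeat-timeout check and the second-vote tally:

```python
from dataclasses import dataclass


@dataclass
class StandbyNode:
    """Illustrative sketch of claim 1's standby node (names are assumptions)."""
    node_id: str
    slave_ids: list          # identifiers of the cluster's slave nodes
    heartbeat_timeout: float = 5.0   # the "preset time period" of claim 1
    last_heartbeat: float = 0.0      # timestamp of the last master heartbeat

    def should_propose_switch(self, now: float) -> bool:
        # Second proposal information is sent only after no heartbeat
        # has arrived from the master within the preset time period.
        return (now - self.last_heartbeat) > self.heartbeat_timeout

    def decide_switch(self, second_votes: dict) -> bool:
        # second_votes maps a slave id to True if that slave also
        # confirms the current master has failed (second voting information).
        confirmations = sum(1 for ok in second_votes.values() if ok)
        # A simple majority rule stands in for the claim's unspecified
        # decision criterion.
        return confirmations > len(self.slave_ids) // 2
```

Under this sketch, the standby only promotes itself when a majority of slaves independently confirm the master's failure, which avoids a false switchover caused by a network fault between the standby and the master alone.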
2. The method of claim 1, wherein switching the current standby node to be a new master node of the distributed cluster system comprises:
replacing, by the new master node, a target identifier with the node identifier of the new master node, wherein the target identifier identifies the current master node of the distributed cluster system;
and sending, by the new master node, synchronization information to the nodes of the distributed cluster system other than the new master node, wherein the synchronization information instructs those nodes to replace their locally stored target identifier with the node identifier of the new master node.
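The identifier synchronization in claim 2 amounts to every node overwriting its locally stored target identifier. A minimal sketch, assuming the cluster state is modeled as a mapping from node id to that node's locally stored target identifier (the function name and data shape are illustrative, not from the patent):

```python
def apply_master_switch(cluster_state: dict, new_master_id: str) -> dict:
    """Sketch of claim 2: after receiving synchronization information, every
    node replaces its locally stored target identifier with the node
    identifier of the new master.

    cluster_state maps node id -> locally stored target identifier.
    """
    for node_id in cluster_state:
        # Each node overwrites the target identifier it holds locally.
        cluster_state[node_id] = new_master_id
    return cluster_state
```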
3. The method of claim 1, wherein each slave node initiating first proposal information to the other slave nodes and electing a new standby node according to the first voting information of the other slave nodes comprises:
a first slave node sending third voting information to the other slave nodes, wherein the third voting information indicates the identifier of the slave node selected by the first slave node as the standby node, and the first slave node is any one of the slave nodes;
the first slave node receiving first voting information from the other slave nodes, the first voting information being determined by the other slave nodes based on the proposal information they have received;
the first slave node determining a candidate node according to the first voting information and its local voting information;
and the first slave node sending a confirmation proposal to the candidate node, the candidate node determining, according to the number of confirmation proposals it receives, whether to switch itself to be the new standby node.
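The candidate determination in claims 3 through 5 can be sketched as a vote tally at a single slave node. The function below is an illustrative reading, not the patented implementation: `first_votes`, `local_vote`, `proposers`, and `vote_threshold` are assumed names for the first voting information, the local voting information, the set of slaves from which proposal information was received, and the claim-5 vote threshold:

```python
from collections import Counter


def pick_candidate(first_votes: dict, local_vote: str,
                   proposers: set, vote_threshold: int):
    """Sketch of claims 3-5: tally the votes a slave has seen and return
    the candidate node id, or None if no proposer reaches the threshold.

    first_votes: slave id -> id of the node that slave voted for
    local_vote:  id of the node this slave itself voted for
    proposers:   ids from which this slave received first proposal information
    """
    tally = Counter(first_votes.values())
    tally[local_vote] += 1  # local voting information counts too
    # Per claim 5: the candidate must have sent proposal information and
    # its vote count must meet the vote threshold.
    for node_id, votes in tally.most_common():
        if node_id in proposers and votes >= vote_threshold:
            return node_id
    return None
```

The slave would then send a confirmation proposal to the returned candidate; the candidate switches itself to standby once enough confirmation proposals arrive, mirroring the two-phase structure of claim 3.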
4. The method of claim 3, wherein the first slave node determining a candidate node according to the first voting information and the local voting information comprises:
the first slave node determining the candidate node according to the first voting information, the local voting information, and the proposal information received from the other slave nodes.
5. The method of claim 4, wherein the first slave node determining the candidate node according to the first voting information, the local voting information, and the proposal information received from the other slave nodes comprises:
if the first slave node has received proposal information from a second slave node and determines, based on the first voting information and the local voting information, that the number of votes obtained by the second slave node meets a vote threshold, determining the second slave node to be the candidate node.
6. The method of any one of claims 1-4, further comprising:
when the current standby node fails, each slave node initiating first proposal information to the other slave nodes, and electing a new standby node according to the first voting information of the other slave nodes.
7. The method of any one of claims 1-4, wherein, before switching the current standby node to be a new master node of the distributed cluster system when the current master node fails, the method further comprises:
when the distributed cluster system is initialized, electing an initial master node from all nodes of the distributed cluster system, and electing an initial standby node from the nodes other than the initial master node.
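Claim 7's initialization can be sketched as two successive selections over a shrinking pool of nodes. The claim does not specify the election rule, so the sketch below assumes a deterministic lowest-id rule purely for illustration; any agreed-upon election rule would fit the same shape:

```python
def initialize_roles(node_ids: list):
    """Illustrative sketch of claim 7: elect an initial master from all
    nodes, then an initial standby from the remaining nodes; the rest
    become slaves. The lowest-id rule here is an assumption, standing in
    for whatever deterministic election the cluster actually uses."""
    if len(node_ids) < 2:
        raise ValueError("need at least a master and a standby")
    nodes = sorted(node_ids)
    master, standby, slaves = nodes[0], nodes[1], nodes[2:]
    return master, standby, slaves
```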
8. A cluster election processing apparatus, applied to a distributed cluster system, the distributed cluster system comprising: a current master node, a current standby node, and at least one slave node; the apparatus comprising a switching module and an election module;
the switching module being configured to switch the current standby node to be a new master node of the distributed cluster system when the current master node fails;
the election module being configured to cause each slave node to initiate first proposal information to the other slave nodes and elect a new standby node of the distributed cluster system according to first voting information of the other slave nodes, wherein the first proposal information is used to propose the election of a new standby node, and the first voting information indicates the identifier of the slave node selected by the other slave nodes as the standby node;
the switching module being specifically configured to: if the current standby node does not receive heartbeat information from the current master node within a preset time period, send second proposal information from the current standby node to each slave node of the distributed cluster system, wherein the second proposal information is used to propose switching the standby node to the master node; receive second voting information from each slave node, wherein the second voting information indicates whether that slave node confirms that the current master node has failed; and determine, according to the second voting information, to switch to be the new master node of the distributed cluster system.
9. An electronic device, comprising: a processor, a storage medium, and a bus, the storage medium storing program instructions executable by the processor, the processor and the storage medium communicating via the bus when the electronic device is running, and the processor executing the program instructions to perform the steps of the method of any one of claims 1 to 7.
10. A computer-readable storage medium, wherein the storage medium stores a computer program which, when executed by a processor, implements the steps of the method of any one of claims 1 to 7.
CN202110194074.3A 2021-02-20 2021-02-20 Cluster election processing method, device, equipment and storage medium Active CN113014634B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110194074.3A CN113014634B (en) 2021-02-20 2021-02-20 Cluster election processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110194074.3A CN113014634B (en) 2021-02-20 2021-02-20 Cluster election processing method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113014634A CN113014634A (en) 2021-06-22
CN113014634B true CN113014634B (en) 2023-01-31

Family

ID=76404705

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110194074.3A Active CN113014634B (en) 2021-02-20 2021-02-20 Cluster election processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113014634B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113489149B (en) * 2021-07-01 2023-07-28 广东电网有限责任公司 Power grid monitoring system service master node selection method based on real-time state sensing
CN113505028A (en) * 2021-07-14 2021-10-15 珠海格力电器股份有限公司 Device switching method and device, electronic device and computer readable storage medium
CN113342902B (en) * 2021-08-09 2021-11-12 腾讯科技(深圳)有限公司 Data processing method and device for block chain network, computer equipment and medium
CN113641692A (en) * 2021-08-18 2021-11-12 福建天晴数码有限公司 Scheme and system for realizing distributed cluster node participation
CN113742075B (en) * 2021-09-07 2024-04-09 北京百度网讯科技有限公司 Task processing method, device and system based on cloud distributed system
CN114137942B (en) * 2021-11-29 2023-11-10 北京天融信网络安全技术有限公司 Control method and device for distributed controller cluster
CN113923222B (en) * 2021-12-13 2022-05-31 云和恩墨(北京)信息技术有限公司 Data processing method and device
CN114500525B (en) * 2021-12-24 2024-04-26 天翼云科技有限公司 Method, device, computer equipment and medium for updating nodes in distributed system
CN114598710A (en) * 2022-03-14 2022-06-07 苏州浪潮智能科技有限公司 Method, device, equipment and medium for synchronizing distributed storage cluster data
CN115174447B (en) * 2022-06-27 2023-09-29 京东科技信息技术有限公司 Network communication method, device, system, equipment and storage medium
CN115412419B (en) * 2022-08-29 2024-05-14 福建乐摩物联科技有限公司 Ad hoc network master node election and data synchronization method
CN115883575A (en) * 2022-11-23 2023-03-31 紫光云技术有限公司 High-availability cluster optimization method based on B tree
CN116346507B (en) * 2023-05-31 2023-07-21 深圳市前海望潮科技有限公司 Vulnerability scanning system for industrial production data

Citations (1)

Publication number Priority date Publication date Assignee Title
CN107832138A (en) * 2017-09-21 2018-03-23 南京邮电大学 A kind of implementation method of the High Availabitity namenode models of flattening

Family Cites Families (10)

Publication number Priority date Publication date Assignee Title
US9710344B1 (en) * 2010-12-13 2017-07-18 Amazon Technologies, Inc. Locality based quorum eligibility
US8473775B1 (en) * 2010-12-14 2013-06-25 Amazon Technologies, Inc. Locality based quorums
CN104933132B (en) * 2015-06-12 2019-11-19 深圳巨杉数据库软件有限公司 Distributed data base based on the sequence of operation number has the right to weigh electoral machinery
US10432470B2 (en) * 2015-09-23 2019-10-01 International Business Machines Corporation Distributed subnet manager for InfiniBand networks
EP3553669A4 (en) * 2016-12-30 2019-10-16 Huawei Technologies Co., Ltd. Failure recovery method and device, and system
CN108989391B (en) * 2018-06-19 2021-09-07 北京百悟科技有限公司 Consistency processing method and system
CN109327467B (en) * 2018-11-20 2020-07-24 北京交通大学 Management method of RSSP-II secure communication protocol key management mechanism
CN109951331B (en) * 2019-03-15 2021-08-20 北京百度网讯科技有限公司 Method, device and computing cluster for sending information
CN110674215A (en) * 2019-09-17 2020-01-10 郑州阿帕斯科技有限公司 Master selection method and device for distributed system and distributed system
CN111988203B (en) * 2020-09-03 2022-08-23 深圳壹账通智能科技有限公司 Node election method, device and storage medium

Patent Citations (1)

Publication number Priority date Publication date Assignee Title
CN107832138A (en) * 2017-09-21 2018-03-23 南京邮电大学 A kind of implementation method of the High Availabitity namenode models of flattening

Non-Patent Citations (3)

Title
DPHM: A Fault Detection Protocol Based on Heartbeat of Multiple Master-Nodes; Dong Jian et al.; Journal of Electronics (China); 2007-07-15 (No. 04); full text *
ODTrans: A Fault-Tolerant Transaction Protocol for Cloud Data Storage Systems (in English); Cheng Xu et al.; Acta Scientiarum Naturalium Universitatis Pekinensis (Journal of Peking University, Natural Science Edition); 2015-05-20 (No. 03); full text *
Research on Virtual Cluster Technology for Carrier-Grade Application Gateway Devices; Hangzhou DPtech Technologies Co., Ltd.; Telecommunications Technology; 2013-11-25 (No. 11); full text *

Also Published As

Publication number Publication date
CN113014634A (en) 2021-06-22

Similar Documents

Publication Publication Date Title
CN113014634B (en) Cluster election processing method, device, equipment and storage medium
US11102084B2 (en) Fault rectification method, device, and system
CN109194514B (en) Dual-computer monitoring method and device, server and storage medium
CN109802986B (en) Equipment management method, system, device and server
CN112328421B (en) System fault processing method and device, computer equipment and storage medium
WO2017118080A1 (en) Heat removing and heat adding method and device for central processing unit (cpu)
CN109921942B (en) Cloud platform switching control method, device and system and electronic equipment
CN114070739B (en) Cluster deployment method, device, equipment and computer readable storage medium
CN114265753A (en) Management method and management system of message queue and electronic equipment
CN110635941A (en) Database node cluster fault migration method and device
CN114554593A (en) Data processing method and device
CN110569124A (en) Task allocation method and device
CN110417833B (en) Data processing method and device based on block chain and storage medium
CN110781039B (en) Sentinel process election method and device
CN113438111A (en) Method for restoring RabbitMQ network partition based on Raft distribution and application
CN113765690A (en) Cluster switching method, system, device, terminal, server and storage medium
CN113810216A (en) Cluster fault switching method and device and electronic equipment
CN112054926B (en) Cluster management method and device, electronic equipment and storage medium
CN116668269A (en) Arbitration method, device and system for dual-activity data center
CN114168071A (en) Distributed cluster capacity expansion method, distributed cluster capacity expansion device and medium
CN116827966B (en) Data processing method and system
CN112564968B (en) Fault processing method, device and storage medium
CN111953760B (en) Data synchronization method, device, multi-activity system and storage medium
CN116991591B (en) Data scheduling method, device and storage medium
CN114422567A (en) Data request processing method, device, system, computer equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant