CN113965578A

CN113965578A - Method, device, equipment and storage medium for electing master node in cluster

Info

Publication number: CN113965578A
Application number: CN202111260095.7A
Authority: CN
Inventors: 钱晨亮; 刘新宇; 王蒙蒙
Original assignee: Shanghai Dameng Database Co Ltd
Current assignee: Shanghai Dameng Database Co Ltd
Priority date: 2021-10-28
Filing date: 2021-10-28
Publication date: 2022-01-21
Anticipated expiration: 2041-10-28
Also published as: CN113965578B

Abstract

The invention discloses a method, device, equipment and storage medium for electing a master node in a cluster. The method is executed by a computer device serving as a node in the cluster and comprises the following steps: the original master node in the cluster sends messages to all slave nodes and receives the reply messages fed back by the slave nodes in response; when the original master node receives at least one target reply message indicating that its term has expired, it updates its cluster term number based on the target reply messages, initiates an election, and performs the election operation of the consistency protocol algorithm. Because the original master node updates its own cluster term number and initiates the election itself when it receives such a reply, the embodiment reduces the probability of master node anomalies and effectively prevents the master node in the cluster from being frequently switched to a slave node, thereby improving cluster performance and user experience.

Description

Method, device, equipment and storage medium for electing master node in cluster

Technical Field

The embodiment of the invention relates to the field of databases, in particular to a method, a device, equipment and a storage medium for electing a master node in a cluster.

Background

RAFT is a consensus algorithm for managing log replication. The only requirement when electing a leader is that the elected leader's log be at least as up to date as the logs held by the other nodes in the cluster. RAFT requires that, while the current leader is still valid, nodes that have not timed out refuse to vote for other nodes, so that those nodes cannot be elected master. This largely avoids leader switches in the cluster caused by follower network faults and similar causes. However, the original master node still has to go through the mode switch leader -> follower -> leader to be elected leader again, so the master node is still switched frequently.

The RAFT protocol can be applied to a traditional data-guard cluster. In a database system, switching the primary database node of a cluster to a standby is very costly. The mode switch requires terminating and rolling back the current transactions, cleaning up rollback pages and truncating the log, and when the database system is under heavy load this takes a long time. Because the node that serves external requests (the master node) switches to a state in which it does not serve them (a slave node), the switch also disconnects the current sessions and causes the current operations to fail, and a large amount of uncommitted data may be discarded entirely, which harms the fluency and reliability of the database service.

Disclosure of Invention

The invention provides a method, device, equipment and storage medium for electing a master node in a cluster, so that the master and slave nodes are not switched frequently under various fault scenarios.

In a first aspect, an embodiment of the present invention provides a method for electing a master node in a cluster, where the method is executed by a computer device serving as a node in the cluster, and includes:

the original master node in the cluster sends messages to each slave node and receives the reply messages fed back by each slave node in response, where the original master node is determined in advance by a given consistency protocol algorithm;

when the original master node receives at least one target reply message indicating that its term has expired, the original master node updates its own cluster term number based on each target reply message, initiates an election, and performs the election operation of the consistency protocol algorithm.

In a second aspect, an embodiment of the present invention further provides an apparatus for electing a master node in a cluster, where the apparatus is configured in a computer device serving as a node in the cluster and includes:

a message receiving module, configured for the original master node in the cluster to send messages to each slave node and receive the reply messages fed back by each slave node in response, where the original master node is determined in advance by a given consistency protocol algorithm;

and an election execution module, configured to, when the original master node receives at least one target reply message indicating that its term has expired, update the original master node's cluster term number based on each target reply message, initiate an election, and perform the election operation of the consistency protocol algorithm.

In a third aspect, an embodiment of the present invention further provides a computer device, where the computer device, as a node in a cluster, includes:

one or more processors;

a storage device for storing one or more programs,

where, when the one or more programs are executed by the one or more processors, the one or more processors implement the method for electing a master node in a cluster described in the first aspect.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the method for electing a master node in a cluster as described in the first aspect.

In the technical solution provided by the embodiment of the invention, the original master node in the cluster sends messages to each slave node and receives the reply messages fed back by each slave node in response, where the original master node is determined in advance by a given consistency protocol algorithm; when the original master node receives at least one target reply message indicating that its term has expired, it updates its own cluster term number based on each target reply message, initiates an election, and performs the election operation of the consistency protocol algorithm. Because the original master node updates its own cluster term number and initiates the election as soon as it receives such a reply, the master node in the cluster is effectively prevented from being frequently switched to a slave node, which improves the fluency and reliability of the database service and hence cluster performance and user experience. Compared with the prior art, in which the primary database node in the cluster is switched to a standby, this election method reduces the number of primary switches under various fault scenarios: even when an election occurs, the original primary is re-elected whenever possible and does not need to be switched to a standby.
In addition, by avoiding frequent switching between master and slave nodes, the technical solution of this embodiment avoids losing large amounts of data in the database, improves the fluency and reliability of the database service, and to some extent improves cluster performance and user experience.

Drawings

Fig. 1 is a flowchart of a method for electing a master node in a cluster according to a first embodiment of the present invention;

Fig. 2 is a schematic structural diagram of an apparatus for electing a master node in a cluster according to a second embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a computer device provided in the third embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.

To illustrate how a prior-art method elects the master node in a cluster, its flow is as follows. A slave node suffers a network anomaly, can no longer receive the master node's messages, increments its term number and initiates an election. After the slave node's network recovers, it receives the messages that the master node sends periodically, replies to them, and thereby informs the master node that the master node's term number has expired. On receiving the reply, the master node updates its term number to the one in the reply and switches to a slave node. In this method, the master node must first switch to a slave node, wait for the election timeout, and only then initiate an election. A normally working master node is thus forced to switch to a slave node, which causes unnecessary master/slave switches and increases their frequency.

To illustrate another prior-art method for electing the master node in a cluster, its flow is as follows. When the original master node's network is abnormal, the cluster elects a new master node with an incremented term number. The original master node's term number does not change during the anomaly; after its network recovers, it sends messages to the other nodes carrying its own, old term number. The other nodes receive these messages and reply. On receiving a reply with a larger term number, the original master node updates its term number to that value and switches to a slave node. If the original master node receives a message from the new master node first, it rejoins the cluster directly; if the original master node initiates an election, it competes with the new master node. In this method, the new master node can end up being switched to a slave node.

Example one

Fig. 1 is a flowchart of a method for electing a master node in a cluster according to a first embodiment of the present invention. The method is applicable to situations in which master and slave nodes switch frequently and may be executed by an apparatus for electing a master node in a cluster, which may be configured in a computer device.

A cluster is a group of independent computer devices interconnected by a high-speed network that form a group and are managed as a single system. The cluster as a whole provides services to users; to a user interacting with it, the cluster appears to be a single standalone server. Clusters are configured to improve availability and scalability. The computer device corresponds to a node in the cluster, and a node can be understood as a server. The nodes in the cluster are peers, and a user can log in to any node to obtain the complete database service. Each node runs a complete database system. The database system is used for data storage, analysis and transaction processing, and can store data such as business log data.

The method for electing the master node in the cluster specifically comprises the following steps:

S110: the original master node in the cluster sends messages to each slave node and receives the reply messages fed back by each slave node in response, where the original master node is determined in advance by a given consistency protocol algorithm.

The original master node can be understood as the original leader in the cluster, and the slave nodes as the followers of the master node in the cluster. The original master node is determined in advance by a given consistency protocol algorithm.

In this embodiment, a message is a packet sent by the original master node in the cluster to each slave node. Generally, the original master node sends messages from the time it starts until it shuts down, sending them periodically or repeatedly without interruption. If a slave node receives no message within a certain receiving period, it may conclude that the original master node has shut down, has failed, or is currently unavailable.
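The slave-side check described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patent's implementation: the class `HeartbeatMonitor`, its `timeout` parameter (the message receiving period, in seconds) and the explicit `now` timestamps are hypothetical, chosen so that the logic can be exercised without a real clock.

```python
class HeartbeatMonitor:
    """Slave-side tracker for the master's periodic messages (hypothetical helper)."""

    def __init__(self, timeout: float, now: float = 0.0) -> None:
        self.timeout = timeout    # length of the message receiving period, in seconds
        self.last_heard = now     # time at which the last master message arrived

    def on_message(self, now: float) -> None:
        # Any message from the master (heartbeat, log packet, ...) resets the timer.
        self.last_heard = now

    def master_suspected(self, now: float) -> bool:
        # No message for longer than one receiving period: the master may be
        # shut down, faulty, or currently unreachable.
        return now - self.last_heard > self.timeout


monitor = HeartbeatMonitor(timeout=3.0)
monitor.on_message(now=1.0)
print(monitor.master_suspected(now=3.5))  # False: 2.5 s since the last message
print(monitor.master_suspected(now=4.5))  # True: 3.5 s since the last message
```

Passing time in explicitly keeps the sketch deterministic; a real node would read a monotonic clock instead.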

It should be noted that the messages the master node in the cluster sends to each node may include heartbeat messages, log packets, and messages waiting for the application of log packets to complete; this embodiment is not limited in this respect. Log packets are of two kinds: real-time log packets, in which the master node sends the current log records to the slave nodes in real time; and asynchronous log packets, in which the master node finds the log records a slave node needs in the log archive and sends them to that slave node to recover log lost because of a failure.

It should be noted that a heartbeat message can be understood as a log that contains no operation steps. It carries the original master node's term number, the sequence number of the currently committed log packet, the sequence number of the written log packet and similar information, and serves to prevent an election timeout while the original master node is working. On receiving a message from the original master node, each slave node must return a reply message to it, and the reply message contains that slave node's current term number. The log may specifically be the REDO log: when objects in the DM database are added, deleted or modified, or data is changed, DM writes the results of these operations into the current REDO log in a specific format. The REDO log is mainly used for database backup and recovery: if data pages in the database buffer have not yet been written to the data files when a failure occurs, the DM instance can, on restart, redo the information in the log to restore the database to its state at the time of the failure. In the cluster, data changes are synchronized through the log: after receiving the master node's log, a slave node replays it to apply the same changes locally, so that its data stays consistent with the master's.
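The heartbeat and reply contents listed above can be modeled as plain records. The sketch below is illustrative only: the field names `term`, `commit_seq` and `write_seq` and the class `SlaveNode` are hypothetical, and the slave's handling of a heartbeat that carries a larger term than its own is deliberately omitted.

```python
from dataclasses import dataclass


@dataclass
class Heartbeat:
    """Heartbeat from the master: a 'log' with no operation steps."""
    term: int        # the master's term number
    commit_seq: int  # sequence number of the currently committed log packet
    write_seq: int   # sequence number of the written log packet


@dataclass
class HeartbeatReply:
    """Reply from a slave; carries the slave's current term number."""
    term: int


class SlaveNode:
    def __init__(self, term: int) -> None:
        self.term = term

    def on_heartbeat(self, msg: Heartbeat) -> HeartbeatReply:
        # Every message from the master is answered with the slave's current
        # term number, which lets the master detect that its own term is stale.
        return HeartbeatReply(term=self.term)


slave = SlaveNode(term=5)
reply = slave.on_heartbeat(Heartbeat(term=4, commit_seq=10, write_seq=12))
print(reply)  # HeartbeatReply(term=5)
```

Here a slave whose term (5) is ahead of the master's (4) reports that larger term back, which is exactly the signal the original master node reacts to in S120.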

It should be noted that the reply messages may include heartbeat replies, log packet replies, vote request replies and the like, which this embodiment does not limit. The replies received differ by scenario: for example, a heartbeat message is answered with a heartbeat reply, a vote request with a vote request reply, and asynchronous recovery involves a series of replies of its own.

It can be appreciated that, in a normal database cluster, the logs of the master node and the slave nodes are synchronized and consistent. If a node fails, its log becomes inconsistent with those of the other nodes, and it must then be decided, case by case, whether to truncate redundant log or fill in missing log.

It should be noted that, in the cluster, if the original master node suffers a network anomaly while the networks of the other slave nodes are normal, the original master node cannot send messages to the slave nodes until its network recovers; if the original master node's network is normal but the network of one or more slave nodes is abnormal, those slave nodes cannot receive the original master node's messages until their networks recover. A network anomaly may be caused by a network card failure, severe network congestion, a network hardware failure and the like, which this embodiment does not limit.

In this embodiment, a reply message is the message that each slave node must feed back after receiving a message sent by the original master node. The reply message contains the slave node's current term number, and it may be one that indicates that the original master node's term has expired; this embodiment is not limited in this respect.

It can be known that the master node in the cluster is fully responsible for receiving client request commands, modifying data according to the commands, generating log entries, replicating them to the other slave node servers, and, once it is safe to do so, committing the log and persisting the data changes.

In this embodiment, the consistency protocol algorithm covers three aspects: leader election, log replication and safety. After a new master node is elected, it manages log replication for the whole cluster. The consistency protocol algorithm makes the servers of a cluster form a replicated state machine: a group of servers, each of which is a state machine whose state can only be changed by a sequence of commands. Each state machine stores a log containing a series of instructions that are executed strictly in order, one by one; if all state machines execute the instructions of the same log, they eventually reach the same state.
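The replicated-state-machine property can be illustrated with a toy model in which each log command is a key/value assignment, an assumption made purely for illustration. Two replicas that apply the same log strictly in order end in the same state.

```python
def apply_log(log: list[tuple[str, int]]) -> dict[str, int]:
    """Replay (key, value) commands one by one, in log order, into a fresh state."""
    state: dict[str, int] = {}
    for key, value in log:
        state[key] = value  # each command overwrites one key
    return state


log = [("x", 1), ("y", 2), ("x", 3)]
replica_a = apply_log(log)
replica_b = apply_log(log)
print(replica_a == replica_b, replica_a)  # True {'x': 3, 'y': 2}
```

Reordering the same commands can produce a different state, which is why the instructions in the log must be executed strictly in sequence.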

S120: when the original master node receives at least one target reply message indicating that its term has expired, it updates its own cluster term number based on each target reply message, initiates an election, and performs the election operation of the consistency protocol algorithm.

An expired term means that the original master node can no longer act as the master node: either a master node must be re-elected, or the original master node acknowledges the new master node and rejoins the cluster as a slave node.

In this embodiment, a target reply message is a reply message, among those from the slave nodes received by the original master node, that indicates that the original master node's term has expired. The number of such target reply messages may be, for example, two or three, which this embodiment does not limit.

It should be noted that, when the original master node receives at least one target reply message indicating that its term has expired, this can be understood as meaning that one or more of the slave nodes have experienced a network anomaly, such as a network card failure, large network fluctuations, network congestion, a network attack, surge interference or a network hardware failure, which is not limited herein.

It should be noted that the cluster term number is the term number held by each node in the cluster, including the original master node and each slave node. When the original master node receives at least one target reply message indicating that its term has expired, it updates its own cluster term number based on each target reply message, and then initiates an election and performs the election operation of the consistency protocol algorithm.

It should be noted that the original master node may update its own cluster term number from the target reply messages as follows: first extract, from the reply messages fed back by the slave nodes, the cluster term numbers carried in the target reply messages; then take the largest of them plus 1 as its own cluster term number; and then initiate an election.

In this embodiment, an election is the process of producing a new master node in the cluster, so that a new master node is produced promptly after the old master node fails. Elections are organized by node and by term: a node can initiate only one election in one term. The nodes here include the original master node and each slave node.

Further, updating its own cluster term number based on each target reply message includes:

extracting the cluster term number carried in each target reply message;

and taking the largest extracted cluster term number plus 1 as its own cluster term number.

The maximum cluster term number is the largest node term number among the target reply messages, that is, the replies indicating an expired term, fed back by the slave nodes in the cluster.

In this embodiment, updating the cluster term number based on the target reply messages is performed by the original master node alone. The original master node sets the term number held in database memory to the largest cluster term number in the target reply messages plus 1, then persists the new term number by writing it to disk. For example, if the cluster term number carried in the extracted target reply message is 5, the original master node adds 1 to this maximum of 5, so that its own cluster term number becomes 6.
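The term update in S120 can be sketched as below. It assumes, as in RAFT, that a reply signals an expired term when it carries a term number larger than the master's own; the names `Reply`, `stale_term_replies` and `next_term` are hypothetical. Persisting the new term to disk and sending the vote requests are left out.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Reply:
    node_id: str
    term: int  # the replying slave's current cluster term number


def stale_term_replies(own_term: int, replies: Iterable[Reply]) -> list[Reply]:
    # A "target" reply reports a term larger than the master's, i.e. it tells
    # the master that its own term has expired.
    return [r for r in replies if r.term > own_term]


def next_term(own_term: int, replies: Iterable[Reply]) -> Optional[int]:
    """New term for the original master, or None if its term is still valid."""
    targets = stale_term_replies(own_term, replies)
    if not targets:
        return None  # no target reply: keep serving as master
    # Largest term carried in the target replies, plus 1; the master then
    # persists this value and initiates an election without stepping down.
    return max(r.term for r in targets) + 1


replies = [Reply("s1", 4), Reply("s2", 5)]
print(next_term(4, replies))           # 6
print(next_term(4, [Reply("s1", 4)]))  # None
```

This reproduces the worked example: a target reply carrying term 5 moves the original master node to term 6.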

It should be noted that when one or more slave nodes whose network was abnormal return to normal and again receive the messages sent by the original master node, they feed back reply messages to the original master node indicating that its term has expired. On receiving such a reply message, the original master node extracts the cluster term number carried in the target reply message and immediately updates its own cluster term number to the largest cluster term number in the target reply messages plus 1.

In the technical solution provided by this embodiment, the original master node in the cluster sends messages to each slave node and receives the reply messages fed back by each slave node in response, where the original master node is determined in advance by a given consistency protocol algorithm; when the original master node receives at least one target reply message indicating that its term has expired, it updates its own cluster term number based on each target reply message, initiates an election, and performs the election operation of the consistency protocol algorithm. Because the original master node learns of its expired term from the replies to its own messages and reacts by starting an election rather than stepping down, the master node in the cluster is effectively prevented from being frequently switched to a slave node, which improves the fluency and reliability of the database service as well as cluster performance and user experience. Compared with the prior art, in which the primary database node in the cluster is switched to a standby, this election method reduces the number of primary switches under various fault scenarios: even when an election occurs, the original primary is re-elected whenever possible and does not need to be switched to a standby. In addition, by avoiding frequent switching between master and slave nodes, the technical solution of this embodiment avoids losing large amounts of data in the database, improves the fluency and reliability of the database service, and to some extent improves cluster performance and user experience.

Optionally, after initiating the election and performing the election operation of the consistency protocol algorithm, the method further includes:

when the original master node is elected as the new master node, performing asynchronous recovery of data on the abnormal slave nodes that fed back the target reply messages;

and when the original master node is not elected as the new master node, the new master node synchronizing data to the original master node and the other slave nodes.

In this embodiment, the new master node is the master node chosen after the election of the consistency protocol algorithm has been initiated and performed. As the new leader in the cluster, the new master node sends messages to each node and receives the reply messages fed back by each slave node in response.

In this embodiment, when the original master node receives at least one target reply message indicating that its term has expired, it updates its own cluster term number based on the target reply messages and then sends vote request messages to all other nodes. Each of the other nodes judges from the received message whether the voting requirements are met, and sends a vote request reply reporting its decision. The original master node that initiated the election then evaluates the vote request replies it receives: when half or more of the other nodes, whose term numbers are now older than its own, vote for it, the original master node is considered to have been successfully re-elected.
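The success test in the preceding paragraph can be written as a one-line predicate. The sketch below, with a hypothetical function name, counts only the votes granted by the other nodes, as the text does; once the candidate's own vote is added, the condition is equivalent to a strict majority of the whole cluster, which the loop checks for small cluster sizes.

```python
def reelection_succeeded(granted_by_others: int, cluster_size: int) -> bool:
    """True when at least half of the other nodes voted for the candidate."""
    other_nodes = cluster_size - 1
    return 2 * granted_by_others >= other_nodes


# Equivalent check: the candidate's own vote plus the granted votes form a
# strict majority of the cluster.
for n in range(1, 10):
    for g in range(n):
        assert reelection_succeeded(g, n) == (2 * (g + 1) > n)

print(reelection_succeeded(1, 3))  # True: 2 of 3 votes
print(reelection_succeeded(1, 5))  # False: 2 of 5 votes
print(reelection_succeeded(2, 5))  # True: 3 of 5 votes
```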

It should be noted that an election consists of messages that the node initiating it sends separately to each of the other nodes. During the election, the original master node initiating the election sends a vote request, and each slave node that receives it sends a vote request reply back to that node. The vote request carries the candidate's current term number, the term number and packet sequence number of its last log, and other information. On receiving the vote request, each slave node compares this information with its own current term number, last packet sequence number and the term number of that packet to decide whether to vote for the original master node. The conditions are that the term number of the original master node initiating the election is greater than or equal to the slave node's current term number, and that the term number of the candidate's last log is newer than or the same as the term number of the slave node's last log. If the conditions are met, the slave node votes for the original master node initiating the election; otherwise it does not.

It should be noted that each slave node has only one vote in each term.
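A slave node's voting decision, combining the two conditions above with the one-vote-per-term rule, might look as follows. This is a sketch rather than the patent's code: the names are hypothetical, and comparing `(last_log_term, last_seq)` as a pair, so that the packet sequence number breaks ties between equal last-log terms, is an assumption borrowed from standard RAFT.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoteRequest:
    candidate_id: str
    term: int           # candidate's current term number
    last_log_term: int  # term number of the candidate's last log packet
    last_seq: int       # sequence number of the candidate's last log packet


class Voter:
    def __init__(self, term: int, last_log_term: int, last_seq: int) -> None:
        self.term = term
        self.last_log_term = last_log_term
        self.last_seq = last_seq
        self.voted_for: Optional[str] = None  # at most one vote per term

    def handle(self, req: VoteRequest) -> bool:
        if req.term < self.term:
            return False                # candidate's term is older than ours
        if req.term > self.term:
            self.term = req.term        # new term: the single vote is available again
            self.voted_for = None
        if self.voted_for not in (None, req.candidate_id):
            return False                # vote already cast in this term
        # Candidate's last log must be at least as new as ours.
        log_ok = (req.last_log_term, req.last_seq) >= (self.last_log_term, self.last_seq)
        if log_ok:
            self.voted_for = req.candidate_id
        return log_ok


voter = Voter(term=5, last_log_term=4, last_seq=100)
print(voter.handle(VoteRequest("old_master", 6, 4, 120)))  # True
print(voter.handle(VoteRequest("s2", 6, 4, 130)))          # False: vote already cast in term 6
print(voter.handle(VoteRequest("s3", 5, 4, 140)))          # False: term 5 is older than 6
```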

It can be known that, after voting, when the original master node initiating the election is successfully re-elected as the new master node, asynchronous recovery of data is performed on the abnormal slave nodes that fed back the target reply messages. When these abnormal slave nodes, after their networks recover, receive a message or vote request from the new master node carrying a larger term number, they update their own term numbers to the term number in that message and rejoin the cluster.

In this embodiment, the abnormal slave nodes are the slave nodes that experienced a network anomaly. There may be one, two or three of them, which this embodiment does not limit. When an abnormal slave node's network fails, it cannot receive the original master node's messages, so it increments its term number by 1 and initiates an election on its own, trying to send vote requests. Because these requests are not received by the original master node or the other slave nodes, the election fails. For example, if the abnormal slave node's term number was 4, when its network fails and it cannot receive the original master node's messages, it increments the term number from 4 to 5 and starts an election.

In this embodiment, asynchronous recovery is the process of restoring an abnormal slave node to a normal slave node. After asynchronous recovery completes, the abnormal slave node is a normal slave node again and follows the usual protocol flow.

It should be noted that the asynchronous recovery process may be as follows. The new master node sends a message to the abnormal slave node to notify it that asynchronous recovery is needed. On receiving it, the abnormal slave node scans its own log file, finds its current log entry information, and feeds it back to the new master node, which compares it with its own log. If they differ, the abnormal slave node continues scanning its log file and feeds back the preceding entry, and the new master node compares again; this repeats until a log entry is found on which the new master node and the abnormal slave node agree. The new master node then sends the subsequent, inconsistent log entries to the abnormal slave node one by one to complete the asynchronous recovery.
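The backward search just described can be sketched in a single process with both logs held in memory; in the real system each comparison is a message exchange between the abnormal slave node and the new master node. Entries are modeled as `(term, payload)` pairs, and the names `Entry` and `async_recover` are hypothetical.

```python
Entry = tuple[int, str]  # (term number, payload) of one log packet


def async_recover(master_log: list[Entry], slave_log: list[Entry]) -> list[Entry]:
    """Return the slave's log after asynchronous recovery from the new master."""
    match = len(slave_log)
    # Step back through the slave's log, newest entry first, until an entry
    # equals the master's entry at the same position (or the log is exhausted).
    while match > 0:
        i = match - 1
        if i < len(master_log) and master_log[i] == slave_log[i]:
            break
        match -= 1
    recovered = slave_log[:match]     # prefix both nodes agree on
    for entry in master_log[match:]:  # later entries, sent one by one
        recovered.append(entry)
    return recovered


master = [(1, "a"), (1, "b"), (2, "c"), (3, "d")]
slave = [(1, "a"), (1, "b"), (1, "x")]
print(async_recover(master, slave) == master)  # True
```

The slave's divergent entry `(1, "x")` is discarded and the master's entries from that position on are appended; the input list is not modified.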

It can be seen that if, after voting, the original master node that initiated the election is not elected as the new master node, another node is considered to have won the election and becomes the new master node, and the original master node steps down; the new master node is thus any node other than the original master node. The new master node then synchronizes data information to the original master node and the other slave nodes. The original master node and any slave node whose log is newer than the new master node's must truncate their logs, while any node whose log is older than the new master node's must be asynchronously recovered. Truncating the log means deleting log records, which reduces the size of the redo log.
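The choice between truncating and asynchronously recovering a node's log can be sketched by comparing each node's log with the new master's. This assumes the same hypothetical `(index, term)` entries with contiguous indexes from 1; `plan_sync` and `truncate` are illustrative names only.

```python
from typing import List, Tuple

LogEntry = Tuple[int, int]  # (index, term); hypothetical


def common_prefix_len(a: List[LogEntry], b: List[LogEntry]) -> int:
    """Number of leading entries on which both logs agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def plan_sync(master_log: List[LogEntry], node_log: List[LogEntry]) -> List[str]:
    """Return the actions that bring node_log in line with master_log."""
    match = common_prefix_len(master_log, node_log)
    actions = []
    if len(node_log) > match:
        actions.append("truncate")  # node holds entries the new master lacks
    if len(master_log) > match:
        actions.append("recover")   # node misses entries the new master has
    return actions


def truncate(node_log: List[LogEntry], master_log: List[LogEntry]) -> List[LogEntry]:
    """Delete the node's log records after the last agreed entry."""
    return node_log[:common_prefix_len(master_log, node_log)]


new_master = [(1, 1), (2, 1), (3, 1)]
old_master = [(1, 1), (2, 1), (3, 1), (4, 1)]  # an entry the new master never received
lagging = [(1, 1), (2, 1)]
print(plan_sync(new_master, old_master))  # ['truncate']
print(plan_sync(new_master, lagging))     # ['recover']
print(truncate(old_master, new_master))   # [(1, 1), (2, 1), (3, 1)]
```

A node whose log has diverged in the middle would get both actions: truncate the conflicting suffix, then recover the new master's entries.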

In this embodiment, the synchronization operation refers to the data information synchronization that the new master node performs on the original master node and the other slave nodes when the original master node is not elected as the new master node.

and the abnormal slave nodes that returned the target reply messages update their respective cluster term numbers according to the cluster term number carried in the messages sent by the new master node, and rejoin the cluster.

In this embodiment, when the original master node is successfully re-elected as the new master node, its term can be regarded as the latest term number, and the term number of each abnormal slave node is smaller than that of the new master node. After an abnormal slave node receives a message from the new master node, it updates its cluster term number to the new master node's latest term number, as carried in that message, and rejoins the cluster.

Optionally, the method for electing a master node in a cluster further includes:

when a slave node in the cluster does not receive a message from the original master node within a given election duration, initiating an election and determining a new master node by executing the election operation of the consensus protocol algorithm;

and preventing, by each slave node in the cluster other than the original master node, the original master node from initiating an election again through a configured election blocking policy.

The given election duration is an election duration configured in advance for each node in the cluster.

It should be noted that the given election duration is not fixed across nodes. All nodes may be configured with the same duration, all with different durations, or with a mix in which some nodes share a duration and others differ; this is not limited here. For example, for three slave nodes the durations may be 1500 ms, 1500 ms and 1500 ms (all the same); 1000 ms, 1200 ms and 1400 ms (all different); or 1000 ms, 1000 ms and 1300 ms (mixed).
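The effect of the three example configurations can be illustrated with a small timeout check. The function name and the millisecond bookkeeping are assumptions for illustration; a real node would read a monotonic clock rather than take `now_ms` as a parameter.

```python
from typing import Dict, List


def timed_out(durations_ms: Dict[str, int], last_message_ms: int, now_ms: int) -> List[str]:
    """Return the slaves whose election duration has elapsed since the last
    message from the master, i.e. the nodes that would start an election."""
    silence = now_ms - last_message_ms
    return sorted(name for name, d in durations_ms.items() if silence >= d)


same = {"slave-1": 1500, "slave-2": 1500, "slave-3": 1500}
different = {"slave-1": 1000, "slave-2": 1200, "slave-3": 1400}
mixed = {"slave-1": 1000, "slave-2": 1000, "slave-3": 1300}

print(timed_out(same, 0, 1600))       # ['slave-1', 'slave-2', 'slave-3']
print(timed_out(different, 0, 1100))  # ['slave-1']
print(timed_out(mixed, 0, 1100))      # ['slave-1', 'slave-2']
```

With identical durations every slave times out at the same moment and may start competing elections; Raft, for instance, randomizes election timeouts to make this less likely. This embodiment leaves the choice open.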

It should be noted that when a slave node in the cluster does not receive a message from the original master node within the given election duration, the original master node may have a network abnormality, such as a network card failure, network fluctuation, machine failure or carrier network outage. These messages therefore let a slave node determine whether, and when, the original master node has failed or terminated.

In this embodiment, the term number of the original master node stays unchanged during the network abnormality; after the network recovers, the original master node sends message packets carrying its own term number to each node.

In this embodiment, when a slave node in the cluster does not receive a message from the original master node within the given election duration, the cluster automatically updates the term number, initiates an election and determines a new master node by executing the election operation of the consensus protocol algorithm.

It should be noted that before the cluster automatically updates the term, initiates an election and determines a new master node through the election operation of the consensus protocol algorithm, an election mechanism applies: each slave node waits for an election timeout. That is, when the original master node has a network abnormality, the other slave nodes cannot receive its messages within the given election duration. Each node in the cluster performs this detection automatically, and when one or more slave nodes do not receive a message from the original master node within the given election duration, such a slave node re-initiates an election. The slave nodes other than the original master node then receive the election information from the node initiating the election. The election blocking policy configured on all slave nodes other than the original master node then prevents the original master node from initiating an election again; the election blocking policy is precisely the mechanism that keeps the original master node from initiating an election.

In this embodiment, the slave nodes other than the original master node may use the configured election blocking policy to prevent the original master node from initiating an election again as follows: if, after the new master node has been determined, such a slave node receives a delay message from the original master node whose cluster term number is smaller than the slave node's current cluster term number, the slave node ignores the delay message and sends no reply to the original master node. When the original master node receives no reply to its delay message, it is prohibited from initiating an election, which prevents the old master database from starting another election.

Optionally, the step in which each slave node in the cluster other than the original master node prevents the original master node from initiating an election again through a configured election blocking policy includes:

if, after the new master node has been determined, each slave node other than the original master node receives a delay message from the original master node, and the cluster term number carried in the delay message is smaller than the slave node's current cluster term number, ignoring the delay message and sending no reply message to the original master node;

and when the original master node receives no reply message to the delay message, prohibiting it from initiating an election.

A delay message is a message that the original master node sends to each slave node after recovering from a network abnormality.

It should be noted that the delay may be a time delay or a network delay. For example, suppose the original master node loses network connectivity to some slave nodes, including the new master node, while remaining connected to the others. The original master node then receives no messages from the new master node and still considers itself the master. Even after the new master node has become the master of all slave nodes other than the original master node, the original master node keeps sending messages carrying its own term number to the database nodes that are still connected to it.

It should be noted that when a slave node other than the original master node receives a delay message from the original master node after the new master node has been determined, it checks the correctness of the message, parses its content, and compares the cluster term number carried in the message with the cluster term number held in its own memory. If the term number in the delay message is smaller than the slave node's current cluster term number, the slave node ignores the delay message and sends no reply to the original master node. This prevents the original master node from initiating an election again, prevents the new master node from being switched back to a slave node, and speeds up fault recovery of the old master database.
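The slave-side check can be sketched as below. The JSON wire format with a `term` field and the names `SlaveState`, `handle_delay_message` and `old_master_may_elect` are hypothetical; the description does not specify a message format. A return value of `None` means the message is discarded and no reply is sent.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SlaveState:
    current_term: int  # cluster term number held in memory


def handle_delay_message(state: SlaveState, raw: bytes) -> Optional[dict]:
    """Validate and parse a message, then apply the election blocking rule."""
    try:
        msg = json.loads(raw)
        term = int(msg["term"])
    except (ValueError, KeyError, TypeError):
        return None  # malformed message: fails the correctness check
    if term < state.current_term:
        return None  # stale term from the old master: ignore, send no reply
    # Handling of equal or newer terms is outside this sketch; just acknowledge.
    return {"term": state.current_term, "ack": True}


def old_master_may_elect(replies: List[Optional[dict]]) -> bool:
    """The old master may only initiate an election if some reply came back."""
    return any(r is not None for r in replies)


slave = SlaveState(current_term=5)
print(handle_delay_message(slave, b'{"term": 4}'))  # None
print(handle_delay_message(slave, b'{"term": 5}'))  # {'term': 5, 'ack': True}
```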

Optionally, after the original master node is prevented from initiating an election again, the method further includes:

the new master node sends new messages to the original master node and the other slave nodes;

after receiving the new message, the original master node updates its cluster term number and joins the cluster as a slave node;

and the new master node synchronizes data information to each slave node in the cluster.

A new message is a message that the new master node sends to the original master node and the other slave nodes.

In this embodiment, after the original master node is prevented from initiating an election again, the new master node sends new messages to the original master node and the other slave nodes. On receiving the new message, the original master node updates its own cluster term number and joins the cluster as a slave node, and the new master node then synchronizes data information to each slave node in the cluster.

For example, to aid understanding of the method for electing a master node in a cluster, the following describes the election procedure when the original master node receives at least one target reply message indicating that its term has expired. The steps may be:

a1, the original master node in the cluster sends messages to each slave node and receives the reply messages that each slave node returns in response.

a2, the original master node receives at least one target reply message indicating an expired term. This indicates that one or more slave nodes may have had a network abnormality, could not receive the original master node's messages, and initiated elections after incrementing their own term numbers; those elections were invalid.

a3, when the one or more slave nodes recover from the abnormality, they receive the messages sent by the original master node and reply to inform the original master node that its term has expired.

a4, on receiving at least one target reply message indicating an expired term, the original master node extracts the cluster term numbers carried in those replies, sets its own cluster term number to the largest of them plus 1, and then initiates an election and executes the election operation of the consensus protocol algorithm.

a5, the original master node initiating the election sends vote requests, and each slave node, on receiving a request, returns a vote reply to it. If the original master node is elected as the new master node, it performs asynchronous recovery of data information on the abnormal slave nodes that returned the target reply messages; if it is not elected, the new master node synchronizes data information to the original master node and the other slave nodes.

a6, on receiving a vote request or message carrying the new master node's larger term number, each abnormal slave node that returned a target reply message updates its cluster term number according to the term number carried in the new master node's message, and rejoins the cluster.
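Steps a1 to a6 can be traced with a small simulation. It is a sketch under stated assumptions, not the patented implementation: the reply format `(term_expired, term)`, the helper names, and the simplified vote rule (a node grants its vote to any candidate with a strictly higher term, with no log check) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Node:
    name: str
    term: int
    role: str = "slave"


def reply_to_master(slave: Node, master_term: int) -> Tuple[bool, int]:
    """a3: a slave reports whether the master's term has expired."""
    return (slave.term > master_term, slave.term)


def update_term_from_replies(master: Node, replies: List[Tuple[bool, int]]) -> bool:
    """a4: take the largest term among the target replies, plus 1."""
    expired_terms = [term for expired, term in replies if expired]
    if not expired_terms:
        return False  # no target reply: keep the current term, no election
    master.term = max(expired_terms) + 1
    return True


def run_election(candidate: Node, voters: List[Node]) -> bool:
    """a5/a6: request votes; voters adopt the candidate's larger term."""
    candidate.role = "candidate"
    votes = 1  # the candidate votes for itself
    for v in voters:
        # Simplified rule: grant the vote to a strictly higher term (no log check).
        if candidate.term > v.term:
            v.term = candidate.term  # a6: adopt the larger term and (re)join as a slave
            v.role = "slave"
            votes += 1
    won = votes > (len(voters) + 1) // 2
    candidate.role = "master" if won else "slave"
    return won


master = Node("master", 4, "master")
# slave-3 bumped its term from 4 to 5 while isolated (step a2).
slaves = [Node("slave-1", 4), Node("slave-2", 4), Node("slave-3", 5)]
replies = [reply_to_master(s, master.term) for s in slaves]
triggered = update_term_from_replies(master, replies)
won = triggered and run_election(master, slaves)
print(master.term, won, master.role)  # 6 True master
print([s.term for s in slaves])       # [6, 6, 6]
```

The original master keeps its role instead of handing it to the node that was briefly isolated; data recovery for that node (step a5) is left out of the sketch.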

Similarly, to aid understanding of the method for electing a master node in a cluster, the following describes the election procedure when a slave node in the cluster does not receive a message from the original master node within the given election duration. The steps may be:

b1, the original master node in the cluster sends messages to each slave node and receives the reply messages that each slave node returns in response.

b2, when a slave node in the cluster does not receive a message from the original master node within the given election duration, this indicates that the original master node may have a network abnormality, and the slave node initiates an election on its own. The slave nodes other than the original master node then receive the election information of the new master node.

b3, after the original master node's network recovers, it sends delay messages carrying its own term number to the other nodes.

b4, if, after the new master node has been determined, a slave node other than the original master node receives a delay message from the original master node whose cluster term number is smaller than the slave node's current cluster term number, the slave node ignores the delay message and sends no reply to the original master node; when the original master node receives no reply to its delay message, it is prohibited from initiating an election.

b5, the new master node sends new messages to the original master node and the other slave nodes; after receiving the new message, the original master node updates its cluster term number and joins the cluster as a slave node.

b6, the new master node synchronizes data information to each slave node in the cluster.
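Steps b1 to b5 can likewise be traced in a short simulation. The helper names and the vote rule (a voter grants its vote to a strictly higher term, with no log check) are assumptions for illustration; data synchronization in b6 is omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    term: int
    role: str = "slave"


def elect(candidate: Node, voters: List[Node], cluster_size: int) -> bool:
    """b2: a timed-out slave increments its term and asks for votes."""
    candidate.term += 1
    votes = 1  # self-vote
    for v in voters:
        if candidate.term > v.term:  # simplified vote rule
            v.term = candidate.term
            votes += 1
    if votes > cluster_size // 2:
        candidate.role = "master"
        return True
    return False


def replies_to_delay_message(old_master: Node, slaves: List[Node]) -> List[int]:
    """b4: slaves ignore a delay message whose term is below their own."""
    return [s.term for s in slaves if old_master.term >= s.term]


def on_new_message(node: Node, new_master: Node) -> None:
    """b5: adopt the new master's term and join the cluster as a slave."""
    if new_master.term > node.term:
        node.term = new_master.term
        node.role = "slave"


old = Node("master", 4, "master")
s1, s2, s3 = Node("slave-1", 4), Node("slave-2", 4), Node("slave-3", 4)

# b2: the old master is partitioned; slave-1 times out first and wins 3 of 4 votes.
won = elect(s1, [s2, s3], cluster_size=4)
# b3/b4: the old master recovers with term 4, but every slave now has term 5.
may_elect = bool(replies_to_delay_message(old, [s1, s2, s3]))
# b5: the new master's message turns the old master into a slave with term 5.
on_new_message(old, s1)
print(won, may_elect, old.term, old.role)  # True False 5 slave
```

Because the old master's term stays at 4 during the partition, the stale-term check alone is enough to keep it from triggering a second election.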

Example Two

Fig. 2 shows an apparatus for electing a master node in a cluster according to the second embodiment of the present invention. The apparatus of this embodiment may be implemented in software and/or hardware and may be configured in a server to implement the method for electing a master node in a cluster provided by the embodiments of the present invention. As shown in Fig. 2, the apparatus may specifically include a message receiving module 210 and an election and execution module 220.

The message receiving module 210 is configured so that the original master node in the cluster sends messages to each slave node and receives the reply messages returned by each slave node in response, the original master node having been determined in advance using a given consensus protocol algorithm;

an election and execution module 220, configured so that, when the original master node receives at least one target reply message indicating an expired term, it updates its own cluster term number based on each target reply message, initiates an election and executes the election operation of the consensus protocol algorithm.

In the technical solution provided by this embodiment, through the message receiving module, the original master node in the cluster sends messages to each slave node and receives the reply messages returned in response, the original master node having been determined in advance using a given consensus protocol algorithm. Then, through the election and execution module, when the original master node receives at least one target reply message indicating an expired term, it updates its own cluster term number based on each target reply message, initiates an election and executes the election operation of the consensus protocol algorithm. Because the original master node uses these message exchanges to detect an expired term and re-initiates the election itself, the master node in the cluster is effectively prevented from being frequently switched to a slave node, which improves the smoothness and reliability of the database service, cluster performance and user experience. Compared with prior-art approaches in which the master database node in the cluster is switched to a standby database, this election method reduces the number of master switchovers across various fault scenarios: even when an election occurs, the original master is re-elected wherever possible instead of being switched to a standby. Reducing frequent master/slave switchovers also avoids losing large amounts of data in the database.

Optionally, on the basis of the foregoing embodiments, the election and execution module 220 may include:

a term number extraction unit, configured to extract the cluster term number carried in each target reply message;

and a term number determination unit, configured to set its own cluster term number to the largest extracted cluster term number plus 1.

Optionally, on the basis of the foregoing embodiments, the apparatus further includes:

an information recovery unit, configured to perform asynchronous recovery of data information on the abnormal slave nodes that returned the target reply messages when the original master node is elected as the new master node;

an information synchronization unit, configured so that, when the original master node is not elected as the new master node, the new master node synchronizes data information to the original master node and the other slave nodes;

and a cluster rejoining unit, configured so that the abnormal slave nodes that returned the target reply messages update their respective cluster term numbers according to the cluster term number carried in the messages sent by the new master node, and rejoin the cluster.

Optionally, the apparatus further includes:

a new master node determination module, configured to initiate an election and determine a new master node by executing the election operation of the consensus protocol algorithm when a slave node in the cluster does not receive a message from the original master node within the given election duration;

and an original master node election blocking module, configured so that each slave node in the cluster other than the original master node prevents the original master node from initiating an election again through a configured election blocking policy.

Optionally, the original master node election blocking module includes:

an information ignoring unit, configured so that, if a slave node other than the original master node receives a delay message from the original master node after the new master node has been determined, and the cluster term number carried in the delay message is smaller than the slave node's current cluster term number, the slave node ignores the delay message and sends no reply message to the original master node;

and an election prohibition unit, configured to prohibit the original master node from initiating an election when it receives no reply message to the delay message.

a message sending module, configured so that the new master node sends new messages to the original master node and the other slave nodes;

a term number updating module, configured so that, after receiving the new message, the original master node updates its own term number and joins the cluster as a slave node;

and a data recovery module, configured so that the new master node synchronizes data information to each slave node in the cluster.

The apparatus for electing a master node in a cluster provided by this embodiment can execute the method for electing a master node in a cluster provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to that method.

Example Three

Fig. 3 is a schematic structural diagram of a computer device provided in the third embodiment of the present invention. As shown in Fig. 3, the device includes a processor 310, a memory 320, an input device 330 and an output device 340. The device may have one or more processors 310; one processor 310 is taken as an example in Fig. 3. The processor 310, memory 320, input device 330 and output device 340 may be connected by a bus or by other means; Fig. 3 takes a bus connection as an example.

The memory 320, as a computer-readable storage medium, stores software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the method for electing a master node in a cluster in the embodiments of the present invention (for example, the message receiving module 210 and the election and execution module 220 of the apparatus for electing a master node in a cluster). By running the software programs, instructions and modules stored in the memory 320, the processor 310 performs the various functional applications and data processing of the device, that is, implements the method for electing a master node in a cluster.

The memory 320 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and an application program required for at least one function; the data storage area may store data created through use of the terminal, and the like. Further, the memory 320 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 320 may further include memory located remotely from the processor 310, which may be connected to the device/terminal/server via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 330 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the apparatus/terminal/server. The output device 340 may include a display device such as a display screen.

Example Four

An embodiment of the present invention further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a method for electing a master node in a cluster. The method is performed by a computer device acting as a node in the cluster and includes:

Of course, the computer-executable instructions contained in the storage medium provided by this embodiment are not limited to the method operations described above, and may also perform related operations in the method for electing a master node in a cluster provided by any embodiment of the present invention.

From the above description of the embodiments, those skilled in the art will clearly understand that the present invention can be implemented by software plus the necessary general-purpose hardware, or by hardware alone, although the former is the better implementation in many cases. Based on this understanding, the technical solutions of the present invention may be embodied in the form of a software product stored in a computer-readable storage medium, such as a floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk or optical disk of a computer, and including instructions that cause a computer device (which may be a personal computer, a server or a network device) to execute the methods according to the embodiments of the present invention.

It should be noted that in the above embodiment of the election apparatus, the units and modules are divided only according to functional logic; the division is not limited to the above as long as the corresponding functions can be implemented. In addition, the specific names of the functional units are only for convenience in distinguishing them from each other and are not intended to limit the protection scope of the present invention.

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A method for electing a master node in a cluster, performed by a computer device acting as a node in the cluster, comprising:

2. The method of claim 1, wherein updating the cluster term number based on each of the target reply messages comprises:

extracting the cluster term number carried in each target reply message;

3. The method of claim 1, further comprising, after initiating the election and performing the election operation of the consensus protocol algorithm:

when the original master node is elected as the new master node, performing asynchronous recovery of data information on the abnormal slave nodes that returned the target reply messages;

and when the original master node is not elected as the new master node, synchronizing, by the new master node, data information to the original master node and the other slave nodes.

4. The method of claim 1, further comprising, after initiating the election and performing the election operation of the consensus protocol algorithm:

updating, by the abnormal slave nodes that returned the target reply messages, their respective cluster term numbers according to the cluster term number carried in the messages sent by the new master node, and rejoining the cluster.

5. The method of claim 1, further comprising:

when a slave node in the cluster does not receive a message from the original master node within a given election duration, initiating an election and determining a new master node by executing the election operation of the consensus protocol algorithm;

and preventing, by each slave node in the cluster other than the original master node, the original master node from initiating an election again through a configured election blocking policy.

6. The method of claim 5, wherein preventing, by each slave node in the cluster other than the original master node, the original master node from initiating an election again through a configured election blocking policy comprises:

if, after the new master node has been determined, a slave node other than the original master node receives a delay message sent by the original master node, and the cluster term number carried in the delay message is smaller than the slave node's current cluster term number, ignoring the delay message and sending no reply message to the original master node;

and prohibiting, by the original master node, initiation of an election when no reply message to the delay message is received.

7. The method of claim 5, further comprising, after preventing the original master node from initiating an election again:

after receiving the new message, updating, by the original master node, its own cluster term number and joining the cluster as a slave node;

8. An apparatus for electing a master node in a cluster, configured in a computer device acting as a node in the cluster, the apparatus comprising:

9. A computer device acting as a node in a cluster, comprising:

one or more processors; and

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for electing a master node in a cluster according to any one of claims 1 to 7.

10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for electing a master node in a cluster according to any one of claims 1 to 7.