CN117938720A - Cluster arbitration method, device, equipment and readable storage medium - Google Patents

Publication number
CN117938720A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311836926.XA
Other languages
Chinese (zh)
Inventor
高文强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Innovation Technology Co ltd
Original Assignee
Innovation Technology Co ltd

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a cluster arbitration method, device, equipment and readable storage medium, and relates to the technical field of cluster arbitration. A master node sends heartbeat packets to other BMCs through its local BMC. A slave node acquires message packets from the message buffer of its corresponding BMC; when the slave node finds no heartbeat packet among the message packets, the master node is deemed unhealthy, and the slave nodes and the master node are taken as alternative nodes. The candidate nodes send health messages to the other BMCs through their local BMCs; when the candidate nodes corresponding to the other BMCs receive the health messages, the local BMC is deemed healthy. The alternative nodes are divided into different grades, and a new master node is elected from the graded alternative nodes according to a preset election rule. By adding the BMC network as a second heartbeat network, the invention accurately determines whether a network fault has occurred and provides a preset rule for electing a new master node from the slave nodes.

Description

Cluster arbitration method, device, equipment and readable storage medium
Technical Field
The present invention relates to the field of cluster arbitration technologies, and in particular, to a cluster arbitration method, apparatus, device, and readable storage medium.
Background
Cluster technology delivers relatively high gains in performance, reliability, and flexibility at low cost, and task scheduling is a core technology in a cluster system. A cluster is a group of mutually independent computers interconnected by a high-speed network; they form a group and are managed as a single system. When a client interacts with a cluster, the cluster appears as a single independent server. Cluster configurations are used to increase availability and scalability. When an individual node in a cluster, or the heartbeat network, fails, the cluster may split into multiple sub-clusters. To avoid business conflicts among the sub-clusters, arbitration voting is used to determine the health of the cluster and to avoid split-brain. However, the native operating system cannot directly use the BMC network for cluster management tasks such as arbitration.
Disclosure of Invention
The present invention aims to provide a cluster arbitration method, device, equipment and readable storage medium to address the above problem. To achieve this purpose, the technical solution adopted by the invention is as follows:
In a first aspect, the present application provides a cluster arbitration method, including:
When the master node fails to send a heartbeat to any slave node, the master node sends a heartbeat packet to the other BMCs through its local BMC, wherein the local BMC and the other BMCs belong to the same BMC network;
Each other BMC stores the heartbeat packet as a message packet in a message buffer and, after checking that the heartbeat packet is correct, replies with a success message to the local BMC;
When the local BMC receives the success messages from the other BMCs and returns them to the cluster management software of the master node, this indicates that the slave nodes corresponding to the other BMCs are healthy;
The slave node acquires message packets from the message buffer of its corresponding BMC; when the slave node finds no heartbeat packet among the message packets, this indicates that the master node is unhealthy, and the slave nodes and the master node are taken as alternative nodes;
The candidate nodes send health messages to the other BMCs through their local BMCs; when the candidate nodes corresponding to the other BMCs receive the health messages, this indicates that the local BMC is healthy;
Based on whether the candidate nodes and their corresponding BMCs are healthy, the candidate nodes are divided into different grades, and a new master node is elected from the graded candidate nodes according to a preset election rule.
In a second aspect, the present application further provides a cluster arbitration device, including:
A first sending module: when the master node fails to send a heartbeat to a plurality of slave nodes, the master node sends a heartbeat packet to the other BMCs through its local BMC, wherein the local BMC and the other BMCs belong to the same BMC network;
A storage module: each other BMC stores the heartbeat packet as a message packet in a message buffer and, after checking that the heartbeat packet is correct, replies with a success message to the local BMC;
A receiving module: when the local BMC receives the success messages from the other BMCs and returns them to the cluster management software of the master node, this indicates that the slave nodes corresponding to the other BMCs are healthy;
An acquisition module: the slave node acquires message packets from the message buffer of its corresponding BMC; when the slave node finds no heartbeat packet among the message packets, this indicates that the master node is unhealthy, and the slave nodes and the master node are taken as alternative nodes;
A second sending module: the candidate nodes send health messages to the other BMCs through their local BMCs; when the candidate nodes corresponding to the other BMCs receive the health messages, this indicates that the local BMC is healthy;
An election module: based on whether the candidate nodes and their corresponding BMCs are healthy, the candidate nodes are divided into different grades, and a new master node is elected from the graded candidate nodes according to a preset election rule.
In a third aspect, the present application further provides a cluster arbitration device, including:
A memory for storing a computer program;
A processor for implementing the steps of the cluster arbitration method when executing the computer program.
In a fourth aspect, the present application also provides a readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the cluster arbitration method described above.
The beneficial effects of the invention are as follows:
By adding the BMC network as a second heartbeat network and using it to check the health of the slave nodes and the master node, the invention can accurately determine whether the master node or a slave node has failed. When a slave node fails, the node failure handling flow is entered; when the master node fails, the master node election process is entered. In the election process, the slave nodes are divided into candidates of three priority levels, and candidates of the highest level take part in the election first, so that they have a higher probability of becoming the master node. Under this election rule, a master node can be elected quickly and its uniqueness is ensured, which improves the reliability of the cluster.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the embodiments of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a cluster arbitration method according to an embodiment of the invention;
FIG. 2 is a diagram of a cluster physical node deployment according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an election process according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a cluster arbitration device according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a cluster arbitration device according to an embodiment of the present invention.
Reference numerals in the figures:
800, cluster arbitration device; 801, processor; 802, memory; 803, multimedia component; 804, I/O interface; 805, communication component.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Meanwhile, in the description of the present invention, the terms "first", "second", and the like are used only to distinguish the description, and are not to be construed as indicating or implying relative importance.
Example 1:
The embodiment provides a cluster arbitration method which is applied to cluster management.
Referring to FIG. 1, the method includes:
S1, when the master node fails to send a heartbeat to a plurality of slave nodes, that is, when the cluster heartbeat network has failed, the master node sends a heartbeat packet to the other BMCs through its local BMC, wherein the local BMC and the other BMCs belong to the same BMC network;
Referring to FIG. 2, the cluster's physical nodes are interconnected through a heartbeat network. The cluster includes a plurality of nodes: OS1 is installed on node 1, OS2 on node 2, and so on, where OS stands for operating system. A cluster has only one master node, OS1; the rest are slave nodes OSi. Each node corresponds to a local BMC: the master node OS1 corresponds to the local BMC1, which is also the master BMC, and each slave node OSi corresponds to the local BMCi. The BMC (Baseboard Management Controller) is the core component for out-of-band management of a server and exposes its access interfaces via the IPMI protocol.
It should be noted that, in this embodiment, the cluster heartbeat network is the first heartbeat communication network, and the BMC network can serve as a second heartbeat communication network in addition to it; an administrator can turn this function on or off in the BMC. When the BMC auxiliary communication function is on, the BMCs send message packets to one another, and the content of a message packet is the state information of the node OS.
In this embodiment, when the first heartbeat communication network fails, the BMC auxiliary communication function is turned on.
The step S1 includes:
S11, the master node calls the LocalSend interface at a preset frequency; preferably, the master node OS1 calls the LocalSend interface through the local operating system, for example through a system call interface of the IPMI driver, wherein the parameters of the LocalSend interface include the IP address of the target BMC;
S12, the master node sends a heartbeat packet to the local BMC1 through the LocalSend interface;
S13, after receiving the heartbeat packet, the local BMC forwards it to a message sender;
S14, the IP addresses of the other BMCs are acquired from the parameters of the LocalSend interface;
S15, the message sender sends the heartbeat packet as a message packet to the BMC network according to the IP addresses of the other BMCs;
S16, the other BMCs acquire the message packet from the BMC network.
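The send path of steps S11 to S16 can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration: the `Bmc` and `BmcNetwork` classes, the `local_send` function name, the packet layout, and the IP addresses are hypothetical and do not represent a real IPMI driver interface.

```python
from dataclasses import dataclass, field


@dataclass
class Bmc:
    """Hypothetical in-memory stand-in for one BMC on the BMC network."""
    ip: str
    inbox: list = field(default_factory=list)  # message packets received so far


class BmcNetwork:
    """Hypothetical BMC network that delivers packets by target IP address."""

    def __init__(self, bmcs):
        self.by_ip = {bmc.ip: bmc for bmc in bmcs}

    def deliver(self, target_ip, packet):
        # S16: the target BMC acquires the message packet from the network.
        self.by_ip[target_ip].inbox.append(packet)


def local_send(network, local_bmc, target_ips, heartbeat):
    """S12-S15: the local BMC's message sender wraps the heartbeat as a
    message packet and sends it to every target BMC IP address."""
    packet = {"type": "heartbeat", "source": local_bmc.ip, "payload": heartbeat}
    for ip in target_ips:  # S14: target IPs come from the interface parameters
        network.deliver(ip, packet)
    return len(target_ips)


bmc1, bmc2, bmc3 = Bmc("10.0.0.1"), Bmc("10.0.0.2"), Bmc("10.0.0.3")
network = BmcNetwork([bmc1, bmc2, bmc3])
sent = local_send(network, bmc1, ["10.0.0.2", "10.0.0.3"], {"node": "OS1", "seq": 1})
```

In this sketch the master's own BMC (`bmc1`) receives nothing, while each slave BMC ends up with one message packet in its inbox.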
Based on the above embodiment, the method further includes:
S2, each other BMC stores the heartbeat packet as a message packet in a message buffer and, after checking that the heartbeat packet is correct, replies with a success message to the local BMC, wherein the other BMCs are the local BMCs of the slave nodes, and the message buffer is a ring buffer that stores the message packets of several heartbeat cycles;
Specifically, the step S2 includes:
S21, the message receivers in the other BMCs receive the heartbeat packet and store it in the ring message buffer;
S22, after checking that the heartbeat packet is correct, the other BMCs reply with a success message to the local BMC;
S23, the local BMC returns the success message to the LocalSend interface;
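Steps S21 and S22 can be sketched as follows. The text does not say how a heartbeat packet is checked, so the CRC32 checksum is an assumption, as are the `MessageReceiver` class and the buffer capacity; `collections.deque` with `maxlen` stands in for the ring buffer.

```python
import zlib
from collections import deque


class MessageReceiver:
    """Hypothetical message receiver inside a slave-side BMC."""

    def __init__(self, capacity=8):
        # A deque with maxlen discards the oldest entry when full,
        # which gives ring-buffer behaviour over several heartbeat cycles.
        self.ring = deque(maxlen=capacity)

    def receive(self, payload: bytes, checksum: int) -> bool:
        # S21: store the packet in the ring message buffer.
        self.ring.append(payload)
        # S22: reply "success" only if the packet checks out
        # (CRC32 is an assumed check, not taken from the source).
        return zlib.crc32(payload) == checksum


receiver = MessageReceiver(capacity=3)
for seq in range(5):
    payload = f"heartbeat:{seq}".encode()
    ok = receiver.receive(payload, zlib.crc32(payload))

corrupt = b"heartbeat:5"
bad = receiver.receive(corrupt, zlib.crc32(corrupt) ^ 1)  # deliberately wrong checksum
```

The sketch stores before checking because that is the order given in S21 and S22; an implementation could equally check first and drop corrupt packets.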
Based on the above embodiment, the method further includes:
S3, when the local BMC receives the success messages from the other BMCs and returns them to the cluster management software of the master node, this indicates that the slave nodes corresponding to the other BMCs are healthy; otherwise, the corresponding slave node is faulty and the node failure handling flow must be entered;
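On the master side, step S3 amounts to comparing the BMCs that were sent a heartbeat with the BMCs that replied. A minimal sketch, in which the function name and the use of IP addresses as identifiers are assumptions:

```python
def classify_slaves(slave_bmc_ips, acked_ips):
    """Split slave BMCs into (healthy, faulty) according to whether a
    success message came back for the heartbeat sent to them."""
    acked = set(acked_ips)
    healthy = [ip for ip in slave_bmc_ips if ip in acked]
    faulty = [ip for ip in slave_bmc_ips if ip not in acked]
    return healthy, faulty


healthy, faulty = classify_slaves(
    ["10.0.0.2", "10.0.0.3", "10.0.0.4"],
    acked_ips=["10.0.0.2", "10.0.0.4"],
)
```

Nodes in `faulty` would be handed to the node failure handling flow, which the text does not detail.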
Based on the above embodiment, the method further includes:
S4, the slave node acquires message packets from the message buffer of its corresponding BMC; when the slave node finds no heartbeat packet among the message packets, this indicates that the master node is unhealthy, and the slave nodes and the master node are taken as alternative nodes;
Specifically, the step S4 includes:
S41, the message reader of the slave node calls the Local Get interface and acquires message packets from the ring message buffer of the corresponding BMC;
S42, when the slave node parses the message packets and obtains a heartbeat packet, the heartbeat is healthy;
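The slave-side check in S4 can be sketched as a scan over the packets read from the ring buffer. The packet fields (`type`, `source`) and both function names are hypothetical.

```python
def find_master_heartbeat(message_packets, master_id):
    """S41-S42: return the first heartbeat packet from the master, or None."""
    for packet in message_packets:
        if packet.get("type") == "heartbeat" and packet.get("source") == master_id:
            return packet
    return None


def alternative_nodes(message_packets, master_id, slave_ids):
    """S4: if no master heartbeat is found, the master is treated as
    unhealthy and the master and slaves all become alternative nodes."""
    if find_master_heartbeat(message_packets, master_id) is not None:
        return []
    return [master_id, *slave_ids]


packets = [{"type": "health", "source": "OS2"}]
no_heartbeat = alternative_nodes(packets, "OS1", ["OS2", "OS3"])
with_heartbeat = alternative_nodes(
    packets + [{"type": "heartbeat", "source": "OS1"}], "OS1", ["OS2", "OS3"]
)
```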
Based on the above embodiment, the method further includes:
S5, the candidate nodes send health messages to the other BMCs through their local BMCs; when the candidate nodes corresponding to the other BMCs receive the health messages, this indicates that the local BMC is healthy;
Based on the above embodiment, the method further includes:
S6, based on whether each alternative node and its corresponding BMC are healthy, the alternative nodes are divided into different grades, and a new master node is elected from the graded alternative nodes according to a preset election rule;
Specifically, the grades are assigned as follows:
The alternative node calls the GET MASTER interface to determine whether its corresponding BMC is the master BMC;
When the candidate node and its corresponding BMC are both healthy and the BMC is the master BMC, the candidate node is a super healthy node;
When the candidate node and its corresponding BMC are both healthy and the BMC is not the master BMC, the candidate node is a healthy node;
When the candidate node is healthy but its corresponding BMC is unhealthy, the candidate node is a sub-healthy node.
The priority order in the election process is, from highest to lowest: super healthy node, healthy node, sub-healthy node.
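The three grades and their priority order can be written as a small classification function. The names `Grade` and `grade_node` are illustrative; returning `None` for an unhealthy node is an assumption, since the text only grades nodes that are themselves healthy.

```python
from enum import IntEnum


class Grade(IntEnum):
    """Lower value means higher election priority."""
    SUPER_HEALTHY = 0
    HEALTHY = 1
    SUB_HEALTHY = 2


def grade_node(node_healthy, bmc_healthy, bmc_is_master):
    """Map the three health facts from the text onto a grade."""
    if not node_healthy:
        return None  # assumption: an unhealthy node is not graded at all
    if not bmc_healthy:
        return Grade.SUB_HEALTHY
    return Grade.SUPER_HEALTHY if bmc_is_master else Grade.HEALTHY


nodes = {
    "OS1": grade_node(True, True, True),
    "OS2": grade_node(True, True, False),
    "OS3": grade_node(True, False, False),
}
by_priority = sorted(nodes, key=nodes.get)
```

Using an `IntEnum` lets the grades be sorted directly into election priority order.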
A node in the master role processes write requests, manages log replication, and continuously sends heartbeat messages to indicate that it is still alive, so that no new election is initiated;
While a master node exists, all other nodes are slave nodes, which receive and process the messages from the master node; when the heartbeat of the master node times out, the original master node and the slave nodes all become candidate nodes and may nominate themselves as candidates;
A candidate node sends a vote request message to the other nodes, asking them to vote, and is promoted to master node if it wins a majority of the votes.
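The promotion rule in the last item is a strict-majority test. A minimal sketch, assuming the majority is taken over the total number of nodes in the cluster (the text does not say whether failed nodes are counted):

```python
def wins_election(votes_received, cluster_size):
    """A candidate is promoted only with a strict majority of the cluster."""
    return votes_received > cluster_size // 2


three_of_five = wins_election(3, 5)
two_of_four = wins_election(2, 4)
```

Requiring a strict majority means that at most one candidate can win in any given round, which is what keeps the master node unique.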
Specifically, referring to FIG. 3, the step S6 includes:
S61, the super healthy nodes become candidates and take part in the master node election, and the health group identifier of their grade is set in the voting request message, wherein the voting request message also includes the term number of the candidate node and the ID of the node receiving the vote;
When a super healthy node obtains the votes of a majority of nodes, it becomes the master node;
S62, when the election times out, that is, when it exceeds the preset election period without a master node being elected, the healthy nodes become candidates and take part in the master node election, and the health group identifier of their grade, namely healthy node, is set in the voting request message;
S63, when the election times out again and still no master node has been elected, the sub-healthy nodes become candidates and take part in the master node election, and the health group identifier of their grade is set in the voting request message;
S64, when the election times out again and still no master node has been elected, the super healthy nodes become candidates again and the next round of master node election is carried out, repeating until a master node is elected.
If a candidate of higher priority appears after a master node has been elected, the cluster continues to run with the current master node; only when that master node fails does the higher-priority candidate take part in a new master node election.
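The tiered rounds S61 to S64 can be sketched as a loop that cycles through the grades until some candidate wins a majority. This is a simplified, sequential model under several assumptions: `VoteRequest`, `run_election`, and `collect_votes` are hypothetical names, the term number is incremented once per attempt, candidates in a tier are tried one after another rather than concurrently, and `max_rounds` is a safety bound that the text does not have (it repeats until a master is elected).

```python
from dataclasses import dataclass
from itertools import cycle

TIERS = ("super_healthy", "healthy", "sub_healthy")


@dataclass(frozen=True)
class VoteRequest:
    term: int            # term number of the candidate
    candidate_id: str    # ID of the node receiving the votes
    health_group: str    # health group identifier of the candidate's grade


def run_election(nodes_by_tier, collect_votes, cluster_size, max_rounds=9):
    """Try each tier in priority order, wrapping around after sub-healthy.
    `collect_votes(request)` is a caller-supplied function returning the
    number of votes the request obtained within one election period."""
    term = 0
    for _, tier in zip(range(max_rounds), cycle(TIERS)):
        for candidate_id in nodes_by_tier.get(tier, []):
            term += 1
            request = VoteRequest(term, candidate_id, tier)
            if collect_votes(request) > cluster_size // 2:
                return request  # this candidate becomes the master node
        # Election period expired without a master: move to the next tier.
    return None


def collect_votes(request):
    # Hypothetical outcome: only node "n3" can gather a majority.
    return 3 if request.candidate_id == "n3" else 1


winner = run_election(
    {"super_healthy": ["n1"], "healthy": ["n2", "n3"], "sub_healthy": ["n4"]},
    collect_votes,
    cluster_size=5,
)
```

Here the super healthy node `n1` fails to win, so the election falls through to the healthy tier, where `n3` is elected on the third attempt.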
Example 2:
The embodiment also provides a cluster arbitration method which is applied to cluster management.
The method comprises the following steps:
S1, while sending commands to the slave nodes, the master node also sends a heartbeat packet to the other BMCs through its local BMC (baseboard management controller), wherein the local BMC and the other BMCs belong to the same BMC network;
Specifically, the step S1 includes:
S11, the master node calls the LocalSend interface at a second preset frequency;
S12, the master node sends a heartbeat packet to the local BMC through the LocalSend interface;
S13, after receiving the heartbeat packet, the local BMC forwards it to a message sender;
S14, the IP addresses of the other BMCs are acquired from the parameters of the LocalSend interface;
S15, the message sender sends the heartbeat packet as a message packet to the BMC network according to the IP addresses of the other BMCs;
S16, the other BMCs acquire the message packet from the BMC network.
Based on the above embodiment, the method further includes:
S2, each other BMC stores the heartbeat packet as a message packet in a message buffer and, after checking that the heartbeat packet is correct, replies with a success message to the local BMC;
Specifically, the step S2 includes:
S21, the message receivers in the other BMCs receive the heartbeat packet and store it in the ring message buffer;
S22, after checking that the heartbeat packet is correct, the other BMCs reply with a success message to the local BMC;
S23, the local BMC returns the success message to the LocalSend interface.
Based on the above embodiment, the method further includes:
S3, when the local BMC receives the success messages from the other BMCs and returns them to the cluster management software of the master node, this indicates that the slave nodes corresponding to the other BMCs are healthy;
Based on the above embodiment, the method further includes:
S4, when the master node fails to send a heartbeat to any slave node, the slave node acquires message packets from the message buffer of its corresponding BMC; when the slave node finds no heartbeat packet among the message packets, the master node is unhealthy, and the slave nodes and the master node are taken as alternative nodes;
Based on the above embodiment, the method further includes:
S5, the candidate nodes send health messages to the other BMCs through their local BMCs; when the candidate nodes corresponding to the other BMCs receive the health messages, this indicates that the local BMC is healthy;
Based on the above embodiment, the method further includes:
S6, based on whether each alternative node and its corresponding BMC are healthy, the alternative nodes are divided into different grades, and a new master node is elected from the graded alternative nodes according to a preset election rule;
Specifically, the step S6 includes:
S61, the super healthy nodes become candidates and take part in the master node election, and the health group identifier of their grade is set in the voting request message;
S62, when the election times out and still no master node has been elected, the healthy nodes become candidates and take part in the master node election, and the health group identifier of their grade is set in the voting request message;
S63, when the election times out again and still no master node has been elected, the sub-healthy nodes become candidates and take part in the master node election, and the health group identifier of their grade is set in the voting request message;
S64, when the election times out again and still no master node has been elected, the super healthy nodes become candidates again and the next round of master node election is carried out, repeating until a master node is elected.
In this embodiment, the cluster heartbeat network and the BMC network operate in parallel, that is, the first and second heartbeat communication networks are both on at the same time. When the master node fails to send a heartbeat to any slave node, the first heartbeat communication network has failed; whether the second heartbeat communication network has also failed is determined by whether the slave node can find the heartbeat packet among the message packets, and when the second heartbeat network has failed as well, the master node election process must be entered.
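The two-network decision described for this embodiment reduces to a small truth table. A minimal sketch; the function name and the returned action labels are assumptions, and what happens when only the first network fails is not spelled out in the text beyond the fact that no election is triggered.

```python
def next_action(cluster_heartbeat_ok, bmc_heartbeat_found):
    """Decide what a slave node does from the state of both heartbeat paths."""
    if cluster_heartbeat_ok:
        return "normal"                # first heartbeat network is working
    if bmc_heartbeat_found:
        return "master_alive_via_bmc"  # only the first network has failed
    return "elect_master"              # both paths failed: start an election


actions = [
    next_action(True, True),
    next_action(False, True),
    next_action(False, False),
]
```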
Example 3:
As shown in FIG. 4, the present embodiment provides a cluster arbitration device, which includes:
A first sending module: when the master node fails to send a heartbeat to a plurality of slave nodes, the master node sends a heartbeat packet to the other BMCs through its local BMC, wherein the local BMC and the other BMCs belong to the same BMC network;
A storage module: each other BMC stores the heartbeat packet as a message packet in a message buffer and, after checking that the heartbeat packet is correct, replies with a success message to the local BMC;
A receiving module: when the local BMC receives the success messages from the other BMCs and returns them to the cluster management software of the master node, this indicates that the slave nodes corresponding to the other BMCs are healthy;
An acquisition module: the slave node acquires message packets from the message buffer of its corresponding BMC; when the slave node finds no heartbeat packet among the message packets, this indicates that the master node is unhealthy, and the slave nodes and the master node are taken as alternative nodes;
A second sending module: the candidate nodes send health messages to the other BMCs through their local BMCs; when the candidate nodes corresponding to the other BMCs receive the health messages, this indicates that the local BMC is healthy;
An election module: based on whether the candidate nodes and their corresponding BMCs are healthy, the candidate nodes are divided into different grades, and a new master node is elected from the graded candidate nodes according to a preset election rule.
Based on the above embodiments, the first sending module includes:
A calling unit: the master node calls the LocalSend interface at a preset frequency;
A first sending unit: the master node sends a heartbeat packet to the local BMC through the LocalSend interface;
A forwarding unit: after receiving the heartbeat packet, the local BMC forwards it to a message sender;
A first acquisition unit: the IP addresses of the other BMCs are acquired from the parameters of the LocalSend interface;
A second sending unit: the message sender sends the heartbeat packet as a message packet to the BMC network according to the IP addresses of the other BMCs;
A second acquisition unit: the other BMCs acquire the message packet from the BMC network.
Based on the above embodiments, the storage module includes:
A storage unit: the message receivers in the other BMCs receive the heartbeat packet and store it in the ring message buffer;
A reply unit: after checking that the heartbeat packet is correct, the other BMCs reply with a success message to the local BMC;
A return unit: the local BMC returns the success message to the LocalSend interface.
Based on the above embodiments, the election module includes:
A first election unit: the super healthy nodes become candidates and take part in the master node election, and the health group identifier of their grade is set in the voting request message;
A second election unit: when the election times out, that is, when it exceeds the preset election period without a master node being elected, the healthy nodes become candidates and take part in the master node election, and the health group identifier of their grade is set in the voting request message;
A third election unit: when the election times out again and still no master node has been elected, the sub-healthy nodes become candidates and take part in the master node election, and the health group identifier of their grade is set in the voting request message;
A fourth election unit: when the election times out again and still no master node has been elected, the super healthy nodes become candidates again and the next round of master node election is carried out, repeating until a master node is elected.
It should be noted that, regarding the apparatus in the above embodiments, the specific manner in which the respective modules perform the operations has been described in detail in the embodiments regarding the method, and will not be described in detail herein.
Example 4:
Corresponding to the above method embodiment, this embodiment also provides a cluster arbitration device; the cluster arbitration device described below and the cluster arbitration method described above may be read with reference to each other.
FIG. 5 is a block diagram of a cluster arbitration device 800 according to an exemplary embodiment. As shown in FIG. 5, the cluster arbitration device 800 may include a processor 801 and a memory 802. The cluster arbitration device 800 may also include one or more of a multimedia component 803, an I/O interface 804, and a communication component 805.
The processor 801 is configured to control the overall operation of the cluster arbitration device 800 so as to perform all or part of the steps of the above cluster arbitration method. The memory 802 is used to store various types of data to support operation of the cluster arbitration device 800; such data may include, for example, instructions for any application or method operating on the cluster arbitration device 800, as well as application-related data such as contact data, sent and received messages, pictures, audio, and video. The memory 802 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. The multimedia component 803 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signals may be further stored in the memory 802 or transmitted through the communication component 805. The audio component further includes at least one speaker for outputting audio signals. The I/O interface 804 provides an interface between the processor 801 and other interface modules, which may be a keyboard, a mouse, buttons, and the like. The buttons may be virtual or physical. The communication component 805 is configured to perform wired or wireless communication between the cluster arbitration device 800 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, near field communication (NFC), 2G, 3G, or 4G, or a combination of one or more thereof; accordingly, the communication component 805 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
In an exemplary embodiment, the cluster arbitration device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the cluster arbitration method described above.
In another exemplary embodiment, a computer readable storage medium is also provided, comprising program instructions which, when executed by a processor, implement the steps of the cluster arbitration method described above. For example, the computer readable storage medium may be the memory 802 described above that includes program instructions executable by the processor 801 of the cluster arbitration device 800 to perform the cluster arbitration method described above.
Example 5:
Corresponding to the above method embodiment, this embodiment further provides a readable storage medium; the readable storage medium described below and the cluster arbitration method described above may be referred to in correspondence with each other.
A readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the cluster arbitration method of the above method embodiment.
The readable storage medium may be a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other readable storage medium that can store program code.
The above description covers only the preferred embodiments of the present invention and is not intended to limit the present invention; those skilled in the art can make various modifications and variations to it. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
The foregoing merely illustrates the present invention, and the present invention is not limited thereto; any variation or substitution readily conceivable by a person skilled in the art falls within the protection scope of the present invention. Therefore, the protection scope of the invention is defined by the protection scope of the claims.

Claims (10)

1. A cluster arbitration method applied to cluster management software, comprising:
when the master node fails to send a heartbeat to any slave node, the master node sends a heartbeat packet to other BMCs through the local BMC, wherein the local BMC and the other BMCs belong to the same BMC network;
the other BMCs store the heartbeat packet as a message packet in a message buffer and, after checking that the heartbeat packet is correct, reply with a success message to the local BMC;
when the local BMC receives the success messages from the other BMCs and returns them to the cluster management software of the master node, this indicates that the slave nodes corresponding to the other BMCs are healthy;
the slave node acquires a message packet from the message buffer of its corresponding BMC, and when the slave node does not find a heartbeat packet in the message packet, this indicates that the master node is unhealthy, and the slave node and the master node are taken as candidate nodes;
the candidate nodes send health messages to other BMCs through their local BMCs, and when the candidate nodes corresponding to the other BMCs receive the health messages, this indicates that the local BMCs are healthy;
based on whether the candidate nodes and the BMCs corresponding to the candidate nodes are healthy, the candidate nodes are divided into different grades, and a new master node is elected from the candidate nodes of different grades according to a preset election rule.
2. The cluster arbitration method according to claim 1, wherein the master node sending the heartbeat packet to the other BMCs through the local BMC, the local BMC and the other BMCs belonging to the same BMC network, comprises:
the master node calls a Local Send interface at a preset frequency;
the master node sends a heartbeat packet to the local BMC through the Local Send interface;
after receiving the heartbeat packet, the local BMC forwards the heartbeat packet to a message transmitter;
the IP addresses of the other BMCs are acquired from the parameters of the Local Send interface;
the message transmitter sends the heartbeat packet as a message packet to the BMC network according to the IP addresses of the other BMCs;
and the other BMCs acquire the message packet from the BMC network.
3. The cluster arbitration method according to claim 2, wherein the other BMCs storing the heartbeat packet as a message packet in the message buffer and, after checking that the heartbeat packet is correct, replying with a success message to the local BMC, comprises:
message receivers in the other BMCs receive the heartbeat packet and store it in the ring message buffer;
after checking that the heartbeat packet is correct, the other BMCs reply with a success message to the local BMC;
the local BMC returns the success message to the Local Send interface.
4. The cluster arbitration method according to claim 1, wherein electing the new master node from the candidate nodes of different grades according to the preset election rule comprises:
the super-healthy nodes become candidates and take part in the master node election, and the health group identifier of their grade is set in the voting request message;
when no master node has been elected by the time the election period expires, the healthy nodes become candidates and take part in the master node election, and the health group identifier of their grade is set in the voting request message;
when still no master node has been elected by the time the election period expires, the sub-healthy nodes become candidates and take part in the master node election, and the health group identifier of their grade is set in the voting request message;
and when again no master node has been elected by the time the election period expires, the super-healthy nodes once more become candidates and the next round of master node election is carried out, repeating until a master node is elected.
5. A cluster arbitration device, comprising:
a first sending module: when the master node fails to send a heartbeat to a plurality of slave nodes, the master node sends a heartbeat packet to other BMCs through a local BMC, wherein the local BMC and the other BMCs belong to the same BMC network;
a storage module: the other BMCs store the heartbeat packet as a message packet in a message buffer and, after checking that the heartbeat packet is correct, reply with a success message to the local BMC;
a receiving module: when the local BMC receives the success messages from the other BMCs and returns them to the cluster management software of the master node, this indicates that the slave nodes corresponding to the other BMCs are healthy;
an acquisition module: the slave node acquires a message packet from the message buffer of its corresponding BMC, and when the slave node does not find a heartbeat packet in the message packet, this indicates that the master node is unhealthy, and the slave node and the master node are taken as candidate nodes;
a second sending module: the candidate nodes send health messages to other BMCs through their local BMCs, and when the candidate nodes corresponding to the other BMCs receive the health messages, this indicates that the local BMCs are healthy;
an election module: based on whether the candidate nodes and the BMCs corresponding to the candidate nodes are healthy, the candidate nodes are divided into different grades, and a new master node is elected from the candidate nodes of different grades according to a preset election rule.
6. The cluster arbitration device of claim 5, wherein the first sending module comprises:
a calling unit: the master node calls a Local Send interface at a preset frequency;
a first transmitting unit: the master node sends a heartbeat packet to the local BMC through the Local Send interface;
a forwarding unit: after receiving the heartbeat packet, the local BMC forwards the heartbeat packet to a message transmitter;
a first acquisition unit: the IP addresses of the other BMCs are acquired from the parameters of the Local Send interface;
a second transmitting unit: the message transmitter sends the heartbeat packet as a message packet to the BMC network according to the IP addresses of the other BMCs;
a second acquisition unit: the other BMCs acquire the message packet from the BMC network.
7. The cluster arbitration device of claim 5, wherein the storage module comprises:
a storage unit: message receivers in the other BMCs receive the heartbeat packet and store it in the ring message buffer;
a reply unit: after checking that the heartbeat packet is correct, the other BMCs reply with a success message to the local BMC;
a return unit: the local BMC returns the success message to the Local Send interface.
8. The cluster arbitration device of claim 5, wherein the election module comprises:
a first election unit: the super-healthy nodes become candidates and take part in the master node election, and the health group identifier of their grade is set in the voting request message;
a second election unit: when no master node has been elected by the time the election period expires, the healthy nodes become candidates and take part in the master node election, and the health group identifier of their grade is set in the voting request message;
a third election unit: when still no master node has been elected by the time the election period expires, the sub-healthy nodes become candidates and take part in the master node election, and the health group identifier of their grade is set in the voting request message;
a fourth election unit: when again no master node has been elected by the time the election period expires, the super-healthy nodes once more become candidates and the next round of master node election is carried out, repeating until a master node is elected.
9. A cluster arbitration device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the cluster arbitration method according to any of claims 1 to 4 when executing said computer program.
10. A readable storage medium, characterized in that the readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the cluster arbitration method according to any of claims 1 to 4.
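As a purely illustrative reading aid for the graded election recited in claims 4 and 8, the order in which the grades stand for election can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `Grade`, `Candidate`, `grade_of`, and `elect_master`, the two health flags, the mapping from flags to grades, the `run_round` callback (standing in for one election period, returning the winner's name or `None` when the period expires), and the `max_cycles` bound are all hypothetical and do not come from the claims.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, List, Optional


class Grade(IntEnum):
    # Definition order is the order in which grades stand for election.
    SUPER_HEALTHY = 0
    HEALTHY = 1
    SUB_HEALTHY = 2


@dataclass(frozen=True)
class Candidate:
    name: str
    node_healthy: bool  # hypothetical flag: the node itself is judged healthy
    bmc_healthy: bool   # hypothetical flag: its local BMC is judged healthy


def grade_of(c: Candidate) -> Grade:
    # ASSUMPTION: this flag-to-grade mapping is illustrative only;
    # the claims do not spell out how grades are assigned.
    if c.node_healthy and c.bmc_healthy:
        return Grade.SUPER_HEALTHY
    if c.node_healthy:
        return Grade.HEALTHY
    return Grade.SUB_HEALTHY


# One election period for one grade: receives the grade (used here as the
# health group identifier carried in the voting request) and that grade's
# candidates; returns the elected name, or None if the period expired.
RoundFn = Callable[[Grade, List[Candidate]], Optional[str]]


def elect_master(candidates: List[Candidate], run_round: RoundFn,
                 max_cycles: int = 3) -> Optional[str]:
    by_grade = {g: [c for c in candidates if grade_of(c) == g] for g in Grade}
    # ASSUMPTION: max_cycles only keeps the sketch finite; the claims
    # describe repeating until a master node is elected.
    for _ in range(max_cycles):
        for g in Grade:  # super-healthy, then healthy, then sub-healthy
            group = by_grade[g]
            if not group:
                # Nobody in this grade can stand, so the sketch moves on.
                continue
            winner = run_round(g, group)
            if winner is not None:
                return winner
        # All three grades failed: start again from the super-healthy grade.
    return None
```

In this sketch, a `run_round` callback that succeeds only for `Grade.HEALTHY` causes `elect_master` to try the super-healthy grade once, fall through to the healthy grade, and return that grade's winner. Passing the grade into the callback mirrors the health group identifier that the claims place in the voting request message.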

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311836926.XA CN117938720A (en) 2023-12-28 2023-12-28 Cluster arbitration method, device, equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311836926.XA CN117938720A (en) 2023-12-28 2023-12-28 Cluster arbitration method, device, equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN117938720A true CN117938720A (en) 2024-04-26

Family

ID=90762180

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311836926.XA Pending CN117938720A (en) 2023-12-28 2023-12-28 Cluster arbitration method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN117938720A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination