CN111708668B - Cluster fault processing method and device and electronic equipment - Google Patents


Info

Publication number
CN111708668B
Authority
CN
China
Prior art keywords
node
cluster
nodes
root
slave
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010477541.9A
Other languages
Chinese (zh)
Other versions
CN111708668A (en)
Inventor
汤爱迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Cloud Network Technology Co Ltd
Original Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Cloud Network Technology Co Ltd
Priority to CN202010477541.9A
Publication of CN111708668A
Application granted
Publication of CN111708668B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G06F16/184 Distributed file systems implemented as replicated file system
    • G06F16/1844 Management specifically adapted to replicated file systems
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to a cluster fault processing method and device and electronic equipment in the field of cloud computing. The method comprises: receiving node information reported by the root nodes of a cluster, wherein the root nodes are determined through mutual communication among the nodes in the cluster; determining the number of root nodes in the cluster according to the received node information; and, when there is more than one root node, determining that the cluster has a network partition fault. The method can automatically detect database cluster faults caused by a network partition, so that problems are found in time and production accidents are avoided.

Description

Cluster fault processing method and device and electronic equipment
Technical Field
The present invention relates to the field of cloud computing technologies, and in particular to a method and device for processing a cluster failure, and an electronic device.
Background
Databases (e.g., Redis) are widely used in various business scenarios, such as content identification. When a single machine hits bottlenecks in memory, concurrency, traffic, and so on, a database cluster can be used to achieve high availability.
The nodes in a database cluster are divided into master nodes and slave nodes: master nodes handle all read and write requests and maintain the key cluster information, while slave nodes only replicate the data and state information of their master nodes.
For example, the nodes can communicate using the P2P Gossip protocol. Each master node in the cluster periodically sends ping messages to the other master nodes, and the receiving node replies with a pong message. If master node A cannot communicate with master node B for the whole cluster-node-timeout period, A marks B as subjectively offline and broadcasts this to the other master nodes. When more than half of the master nodes have marked B as subjectively offline, B is marked as objectively offline, and fault discovery is complete.
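The quorum rule above can be sketched as follows. This is a toy model under stated assumptions, not Redis's own implementation: `last_pong` is a hypothetical map from (observer, target) pairs to the time the observer last received a pong from the target, times are plain seconds, and all masters' views are assumed to be visible to one function.

```python
def subjectively_offline(last_pong_time: float, now: float, node_timeout: float) -> bool:
    """One master's local view: no pong from the target for longer than node_timeout."""
    return now - last_pong_time > node_timeout


def objectively_offline(masters, suspect, last_pong, now, node_timeout) -> bool:
    """True if more than half of all masters mark `suspect` subjectively offline.

    last_pong: hypothetical dict {(observer, target): time of last pong}.
    A master with no recorded pong is treated as having last heard at time 0.
    """
    votes = sum(
        1
        for m in masters
        if m != suspect
        and subjectively_offline(last_pong.get((m, suspect), 0.0), now, node_timeout)
    )
    return votes > len(masters) // 2


masters = ["A", "B", "C"]
pongs = {("A", "C"): 0.0, ("B", "C"): 10.0}
print(objectively_offline(masters, "C", pongs, now=20.0, node_timeout=15.0))  # False: only A votes
pongs[("B", "C")] = 2.0
print(objectively_offline(masters, "C", pongs, now=20.0, node_timeout=15.0))  # True: A and B vote
```

Note that the suspect never votes on itself, so with three masters two votes are needed.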
In the related art, the cluster's built-in fault discovery can only handle the case in which, with the network otherwise healthy, a server fault takes down an individual master node; it cannot discover faults caused by a network partition.
Therefore, a new technical solution for cluster fault handling needs to be provided.
Disclosure of Invention
An object of the present invention is to provide a new solution for cluster failure handling.
According to a first aspect of the present invention, there is provided a method of handling a cluster failure, the cluster comprising a plurality of nodes that include at least one master node corresponding to at least one slave node, wherein the method is implemented by a monitor outside the cluster and comprises:
receiving node information reported by root nodes of the cluster, wherein the root nodes are determined through mutual communication among nodes in the cluster;
determining the number of root nodes in the cluster according to the received node information;
and, when there is more than one root node, determining that the cluster has a network partition fault.
Optionally, after determining that the cluster has a network partition fault, the method further includes:
sending a merging instruction to the root nodes of the cluster so that the more than one root node in the cluster perform a merging operation with one another;
receiving merging feedback information from the more than one root node in the cluster;
and, when the merging feedback information indicates that the merging operation failed, confirming that the cluster has a network partition fault.
Optionally, after determining that the cluster has a network partition fault, the method further includes:
sending a status report instruction to the plurality of nodes of the cluster so that each node reports its own status information to the monitor, wherein the status information comprises at least one of a node identifier, a node address, a node type, the identifier of the root node to which the node belongs, information on the master node to which the node belongs, and the identifiers of the corresponding hash slots;
and receiving the status information sent by the plurality of nodes.
Optionally, after receiving the status information sent by the plurality of nodes, the method further includes:
determining at least two network communication areas according to the identifier of the root node to which each node belongs, wherein the network communication areas are in one-to-one correspondence with the root nodes;
detecting whether each master node and its corresponding slave nodes are located in the same network communication area;
and, if a first master node and at least one of its slave nodes are located in a first network communication area while at least one other slave node of the first master node is located in a second network communication area, sending a release instruction to the cluster to clear the master-slave relationship between the first master node and its slave nodes in the second network communication area.
Optionally, after receiving the status information sent by the plurality of nodes, the method further includes:
periodically sending connectivity detection instructions to each node of the cluster at a preset frequency;
determining, according to each node's response to the connectivity detection instruction, whether any node in the cluster is offline;
and, if an offline node exists in the cluster and its node type is master node, determining a new master node from the slave nodes of the offline node.
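The offline check above could look like the sketch below. The source only states that probes are sent at a preset frequency; the rule that a node is offline after `max_missed` consecutive unanswered probes, and the names `ProbeState` and `probe_round`, are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProbeState:
    node_id: str
    node_type: str   # "master" or "slave"
    missed: int = 0  # consecutive unanswered connectivity probes


def probe_round(nodes, responded, max_missed=3):
    """Apply one round of connectivity probes and return the offline masters.

    responded:  set of node IDs that answered this round's probe.
    max_missed: hypothetical offline threshold (not specified in the source).
    Only offline masters are returned, because only they need a new master.
    """
    offline_masters = []
    for node in nodes:
        # A reply resets the counter; silence increments it.
        node.missed = 0 if node.node_id in responded else node.missed + 1
        if node.missed >= max_missed and node.node_type == "master":
            offline_masters.append(node.node_id)
    return offline_masters


nodes = [ProbeState("m1", "master"), ProbeState("s1", "slave")]
for _ in range(3):
    result = probe_round(nodes, responded={"s1"})
print(result)  # ['m1']
```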
Optionally, determining a new master node from the slave nodes of the offline node includes:
sending an offset reporting instruction to the slave nodes of the offline node so that each slave node feeds back the offset of the data it has synchronized;
and determining the slave node with the largest offset among the slave nodes of the offline node, and sending a type conversion instruction to that slave node to convert it into a master node.
Optionally, the method further comprises:
sending a hash slot clearing instruction to the cluster to clear the mapping between the offline node and its hash slots;
and sending a hash slot allocation instruction to the cluster to map the hash slots previously held by the offline node to the slave node with the largest offset.
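The promotion and slot hand-over steps can be sketched in a few lines. This assumes the monitor already holds each slave's reported replication offset and a slot-to-owner table; `fail_over`, `slave_offsets` and `slot_owner` are hypothetical names, and the two loops stand in for the clearing and allocation instructions rather than any real Redis command.

```python
def fail_over(offline_master, slave_offsets, slot_owner):
    """Promote the most up-to-date slave and move the offline master's slots to it.

    slave_offsets: {slave_id: replication offset reported by that slave}
    slot_owner:    {slot_number: owning master_id}, updated in place
    Returns the ID of the promoted slave.
    """
    if not slave_offsets:
        raise ValueError(f"{offline_master} has no reachable slave to promote")
    # Largest offset = the slave that has replicated the most data.
    new_master = max(slave_offsets, key=slave_offsets.get)
    # Clearing step: collect the slots still mapped to the offline master ...
    orphaned = [slot for slot, owner in slot_owner.items() if owner == offline_master]
    # ... allocation step: map each of them to the promoted slave.
    for slot in orphaned:
        slot_owner[slot] = new_master
    return new_master


slots = {0: "m1", 1: "m1", 2: "m2"}
print(fail_over("m1", {"s1": 100, "s2": 250, "s3": 180}, slots))  # s2
print(slots)  # {0: 's2', 1: 's2', 2: 'm2'}
```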
Optionally, the method further comprises:
and, when the merging feedback information indicates that the merging operation succeeded, sending a network partition restoration instruction to the cluster to restore the cluster to a normal running state.
According to a second aspect of the present invention, there is provided a method of handling a cluster failure, implemented by a first node in a cluster, comprising:
receiving a parent node identifier sent by a second node in the cluster, wherein the parent node identifier of each node is initially its own identifier;
comparing the parent node identifier sent by the second node with its own parent node identifier to update its own parent node identifier;
after at least one update, if its own parent node identifier is the same as its own identifier, determining itself to be a root node;
and, if it is a root node, sending its own node information to the monitor described above.
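A minimal sketch of the first-node side of this aspect is shown below. It assumes the minimum node ID update rule mentioned later in the description and integer node IDs; it compares parent identifiers directly and deliberately omits the root lookup (find) used in the fuller example. `ClusterNode` and its methods are hypothetical names.

```python
class ClusterNode:
    """First-node behaviour of the second aspect, assuming the minimum-ID rule."""

    def __init__(self, node_id: int):
        self.node_id = node_id
        self.parent_id = node_id  # initially the node's own identifier

    def receive_parent_id(self, other_parent_id: int) -> None:
        # Compare with our own parent identifier and keep the smaller one.
        self.parent_id = min(self.parent_id, other_parent_id)

    def is_root(self) -> bool:
        return self.parent_id == self.node_id

    def report(self):
        """Node information for the monitor, or None if this node is not a root."""
        if not self.is_root():
            return None
        return {"node_id": self.node_id, "root_id": self.parent_id}


n = ClusterNode(3)
n.receive_parent_id(5)
print(n.report())  # {'node_id': 3, 'root_id': 3}
n.receive_parent_id(1)
print(n.report())  # None
```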
According to a third aspect of the present invention, there is provided an apparatus for handling a cluster failure, the cluster comprising a plurality of nodes that include at least one master node corresponding to at least one slave node, wherein the apparatus is applied to a monitor outside the cluster and comprises:
a first receiving module, configured to receive node information reported by the root nodes of the cluster, wherein the root nodes are determined through mutual communication among the nodes in the cluster;
a first processing module, configured to determine the number of root nodes in the cluster according to the received node information;
and a second processing module, configured to determine that the cluster has a network partition fault when there is more than one root node.
Optionally, the apparatus further comprises a review module for:
sending a merging instruction to the root nodes of the cluster so that the more than one root node in the cluster perform a merging operation with one another;
receiving merging feedback information from the more than one root node in the cluster;
and, when the merging feedback information indicates that the merging operation failed, confirming that the cluster has a network partition fault.
Optionally, the apparatus further comprises a state collection module for:
sending a status report instruction to the plurality of nodes of the cluster so that each node reports its own status information to the monitor, wherein the status information comprises at least one of a node identifier, a node address, a node type, the identifier of the root node to which the node belongs, information on the master node to which the node belongs, and the identifiers of the corresponding hash slots;
and receiving the status information sent by the plurality of nodes.
Optionally, the apparatus further comprises a master-slave control module for:
determining at least two network communication areas according to the identifier of the root node to which each node belongs, wherein the network communication areas are in one-to-one correspondence with the root nodes;
detecting whether each master node and its corresponding slave nodes are located in the same network communication area;
and, if a first master node and at least one of its slave nodes are located in a first network communication area while at least one other slave node of the first master node is located in a second network communication area, sending a release instruction to the cluster to clear the master-slave relationship between the first master node and its slave nodes in the second network communication area.
Optionally, the apparatus further comprises a failover module, which comprises:
a sending unit, configured to periodically send a connectivity detection instruction to each node of the cluster at a preset frequency;
a detection unit, configured to determine, according to each node's response to the connectivity detection instruction, whether any node in the cluster is offline;
and a master selection unit, configured to determine a new master node from the slave nodes of the offline node when an offline node exists in the cluster and its node type is master node.
Optionally, the master selection unit further includes:
an offset obtaining subunit, configured to send an offset reporting instruction to the slave nodes of the offline node so that each slave node feeds back the offset of the data it has synchronized;
and a master-slave conversion subunit, configured to determine the slave node with the largest offset among the slave nodes of the offline node and to send a type conversion instruction to that slave node to convert it into a master node.
Optionally, the apparatus further comprises a clearing module for:
sending a hash slot clearing instruction to the cluster to clear the mapping between the offline node and its hash slots;
and sending a hash slot allocation instruction to the cluster to map the hash slots previously held by the offline node to the slave node with the largest offset.
Optionally, the apparatus further comprises a recovery module for:
when the merging feedback information indicates that the merging operation succeeded, sending a network partition restoration instruction to the cluster to restore the cluster to a normal running state.
According to a fourth aspect of the present invention, there is provided a processing apparatus for a cluster failure, applied to a first node in a cluster, comprising:
a receiving module, configured to receive a parent node identifier sent by a second node in the cluster, wherein the parent node identifier of each node is initially its own identifier;
an updating module, configured to compare the parent node identifier sent by the second node with the first node's own parent node identifier, so as to update its own parent node identifier;
a determining module, configured to determine that the first node is a root node if, after at least one update, its own parent node identifier is the same as its own identifier;
and a sending module, configured to send the first node's own node information to the monitor when the first node is a root node.
According to a fifth aspect of the present invention there is also provided an electronic device comprising a memory storing executable commands and a processor implementing the method according to the first or second aspect of the present invention when executing the executable commands.
With the cluster fault processing method, the monitor obtains root node information from the cluster and uses it to judge whether a network partition fault has occurred. Faults of a database cluster under a network partition can thus be discovered automatically and in time, and production accidents are avoided.
Other features of the present invention and its advantages will become apparent from the following detailed description of exemplary embodiments of the invention, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a schematic diagram of an electronic device that may be used to implement an embodiment of the invention.
Fig. 2 is a flow chart of a method of handling a cluster failure according to an embodiment of the invention.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless it is specifically stated otherwise.
The following description of at least one exemplary embodiment is merely exemplary in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, the techniques, methods, and apparatus should be considered part of the specification.
In all examples shown and discussed herein, any specific values should be construed as merely illustrative, and not a limitation. Thus, other examples of exemplary embodiments may have different values.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
< hardware configuration >
Fig. 1 shows a hardware configuration of an electronic device that may be used to implement an embodiment of the invention.
Referring to fig. 1, the electronic apparatus 1000 includes a processor 1100, a memory 1200, an interface device 1300, a communication device 1400, a display device 1500, and an input device 1600. The processor 1100 may be, for example, a central processing unit CPU, a micro control unit MCU, or the like. The memory 1200 includes, for example, ROM (read only memory), RAM (random access memory), nonvolatile memory such as a hard disk, and the like. The interface device 1300 includes, for example, a USB interface, a serial interface, and the like. The communication device 1400 is, for example, a wired network card or a wireless network card. The display device 1500 is, for example, a liquid crystal display. The input device 1600 includes, for example, a touch screen, keyboard, mouse, microphone, etc.
In an embodiment of this description, the memory 1200 of the electronic device 1000 stores instructions that control the processor 1100 to implement a method according to any embodiment of this description. A skilled person can design the instructions according to the solution disclosed in this specification; how instructions control a processor is well known in the art and is not described in detail here.
It will be appreciated by those skilled in the art that although a plurality of devices of the electronic apparatus 1000 are shown in fig. 1, the electronic apparatus 1000 of the embodiment of the present description may relate to only some of the devices thereof, for example, only the processor 1100, the memory 1200, and the communication device 1400.
The hardware configuration shown in fig. 1 is merely illustrative and is in no way intended to limit the invention, its applications or uses.
< method example >
The present embodiment provides a method for handling a cluster failure, which is implemented by, for example, the electronic device 1000 shown in fig. 1.
In this embodiment, the cluster includes a plurality of nodes, where the plurality of nodes includes at least one master node, and the master node corresponds to at least one slave node.
It should also be noted that the present invention is applicable to database clusters in general. The following embodiments take a Redis database cluster as an example, but the invention is not limited to Redis, so the following examples do not limit the present invention.
In this embodiment, the electronic device 1000 is an off-cluster monitor. A monitor is a component dedicated to monitoring cluster status, e.g., a type of client software, capable of communicating with all nodes.
As shown in fig. 2, the method includes the following steps S1100-S1300.
In step S1100, node information reported by root nodes of the cluster is received, where the root nodes are determined by mutual communication between nodes in the cluster.
In this embodiment, the root nodes are determined through mutual communication among the nodes in the cluster. Each root node corresponds to one network connectivity area, i.e., a set of nodes that can exchange data with one another. Any two nodes in the same network connectivity area have a connectivity relationship.
In this embodiment, a connectivity relationship means that the two nodes can communicate directly or indirectly.
In one example, the root node is obtained as follows: a first node among the plurality of nodes acquires the parent node information of a second node based on a consistency protocol; the first node updates its own parent node information according to the parent node information of the second node and a preset update rule; and the first node determines that it is the root node if its parent node is itself.
In the above example, the parent node of a node is initially the node itself.
In the above example, the consistency protocol is, for example, the Gossip protocol.
In the above example, the update rule is, for example, the minimum node ID rule or the maximum node ID rule.
In one example, the first node updating its own parent node information according to the parent node information of the second node and a preset update rule includes: if the first node and the second node have different parent nodes, obtaining the root node of the first node and the root node of the second node; and the first node updating its parent node information according to the root node of the first node, the root node of the second node, and the preset update rule.
Obtaining the root node of a node works, for example, as follows: in the chain E-F-G-H-K-J, where J is the root node, the parent node of E is F and the root node of E is J. E then changes its parent node to J, so the chain becomes E-J, and J is both the parent node and the root node of E.
The process of obtaining the root node can be implemented with the Union-Find algorithm. A union-find set, also known as a disjoint-set data structure, maintains a collection of disjoint sets and provides two operations: Union (merge) and Find. find(i) returns the set to which i belongs; comparing find(i) with find(j) is the usual way to decide whether i and j are connected, i.e., belong to the same set. union(i, j) connects the set containing i with the set containing j, after which all elements of both sets are connected.
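A standard disjoint-set forest with path compression can be sketched as follows. The union step follows the minimum node ID rule from the example above (the smaller root becomes the new root); the class name `UnionFind` is illustrative, and in the cluster each node would hold only its own view rather than one shared structure.

```python
class UnionFind:
    """Disjoint-set forest keyed by node ID; the smaller root ID wins a union."""

    def __init__(self, node_ids):
        self.parent = {n: n for n in node_ids}  # every node starts as its own parent

    def find(self, x):
        # Walk up to the root: the node whose parent is itself.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            next_x = self.parent[x]
            self.parent[x] = root
            x = next_x
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Minimum-ID rule: the smaller root becomes the root of the merged set.
            self.parent[max(ra, rb)] = min(ra, rb)
        return self.find(a)


# The E-F-G-H-K-J chain from the text, with J as the root.
uf = UnionFind(list("EFGHKJ"))
uf.parent.update({"E": "F", "F": "G", "G": "H", "H": "K", "K": "J"})
print(uf.find("E"), uf.parent["E"])  # J J  (E now points straight at J)
```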
In one example, the process of obtaining node information for the root node includes the following steps.
1. All master and slave nodes in the cluster propagate information to one another according to the Gossip protocol, and each message carries the nodeId (node ID) of the sender's parent node.
2. When node A receives a message from node B, it checks whether B's parent node is the same as its own parent node.
3. If they are the same, the two nodes are already in the same connected graph, and node A takes no further action.
4. If not, node A performs a find operation, following parent pointers until it reaches the final root node (the node whose parent node is itself).
5. The find operation also updates the parent node to the root node, which compresses the path.
6. Node A checks whether its root node and node B's root node are the same; if so, node A takes no further action.
7. If the root nodes differ, then, since A and B can communicate with each other, node A initiates a Union of its own root node and B's root node; for example, the root node with the smallest nodeId becomes the root node of the new connected graph, and the root nodes of both nodes are updated accordingly. A and B are now in the same connected graph. Because Redis cluster communication follows the Gossip protocol, after some time all nodes that can communicate with one another have the same root node.
8. A node whose parent node is itself is a root node, and each connected graph has exactly one root node.
9. Root nodes periodically report their own state to the monitor.
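The convergence claim in the steps above can be checked with a small simulation. It simplifies the protocol on purpose: each node stores its root ID directly (as if paths were always fully compressed), the union keeps the smaller ID, and a network partition is modelled simply by removing links. `gossip_until_stable` and the link-list input are hypothetical.

```python
import random


def gossip_until_stable(node_ids, links, seed=0, max_rounds=1000):
    """Gossip root IDs over reachable links until no node changes its root.

    links: pairs of node IDs that can reach each other; a partition removes links.
    Returns (root_of_each_node, sorted list of root nodes).
    """
    rng = random.Random(seed)
    root = {n: n for n in node_ids}  # initially every node is its own root
    links = list(links)
    for _ in range(max_rounds):
        changed = False
        rng.shuffle(links)  # gossip order is arbitrary
        for a, b in links:
            merged = min(root[a], root[b])  # union by minimum node ID
            for n in (a, b):
                if root[n] != merged:
                    root[n] = merged
                    changed = True
        if not changed:
            break
    # A node whose root is itself is a root node; these report to the monitor.
    roots = sorted(n for n in node_ids if root[n] == n)
    return root, roots


# Link 3-4 is cut, so {1, 2, 3} and {4, 5, 6} form two partitions.
_, roots = gossip_until_stable([1, 2, 3, 4, 5, 6], [(1, 2), (2, 3), (4, 5), (5, 6)])
print(roots)  # [1, 4]
```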
In step S1200, the number of root nodes in the cluster is determined according to the received node information.
In this embodiment, the monitor counts the distinct root node IDs in the received node information; this count is the number of root nodes in the cluster.
In step S1300, in the case where the number of root nodes is greater than one, it is determined that the cluster has a network partition failure.
In this embodiment, more than one root node means that the cluster network has split into at least two partitions, that is, a network partition fault has occurred.
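On the monitor side, steps S1200 and S1300 reduce to counting distinct root IDs within one reporting period. In this sketch the report shape `(timestamp, root_id)` and the half-open time window are assumptions; the source does not define a report format.

```python
def roots_in_window(reports, window_start, window_end):
    """Distinct root IDs reported within [window_start, window_end).

    reports: iterable of (timestamp, root_id) pairs (assumed format).
    """
    return sorted({root for ts, root in reports if window_start <= ts < window_end})


def has_partition(reports, window_start, window_end):
    """More than one root node in the same period indicates a network partition."""
    return len(roots_in_window(reports, window_start, window_end)) > 1


reports = [(1.0, "a"), (2.0, "a"), (3.0, "b")]
print(roots_in_window(reports, 0.0, 10.0))  # ['a', 'b']
print(has_partition(reports, 0.0, 2.5))     # False
```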
With this cluster fault processing method, the monitor obtains the root node information of the cluster and uses it to judge whether a network partition fault has occurred, so faults of a Redis cluster under a network partition can be discovered automatically and in time, and production accidents are avoided.
In one example, after determining that the cluster has a network partition fault, the method further comprises: sending a merging instruction to the root nodes of the cluster so that the more than one root node in the cluster perform a merging operation with one another; receiving merging feedback information from those root nodes; and, when the merging feedback information indicates that the merging operation failed, confirming that the cluster has a network partition fault.
This process reconfirms the network partition fault, which improves the accuracy of partition fault detection.
As an example of reconfirming the network partition failure, after the foregoing step 9, the method further includes the steps of:
10. If the Redis monitor receives reports from only one root node within the same time period, the network is in a normal state and the nodes can communicate with one another; monitoring continues.
11. If the Redis monitor receives reports from two or more root nodes, the Redis cluster may have a network partition fault.
12. The Redis monitor issues Union requests to the multiple root nodes.
13. Each Union request carries the address information of another root node B. Node A, on receiving the Union request, first pings node B; if the ping succeeds, the two nodes perform a Union operation, and the root node with the smallest nodeId becomes the root node of the new connected graph.
14. On success, the node updates its state and returns Union success; on failure, it returns Union failure.
15. If the Union succeeds among all root nodes, the network is normal, and the Redis cluster is updated so that all nodes belong to one connected graph.
16. If any Union fails, the network partition is confirmed.
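The reconfirmation in steps 12 to 16 can be sketched as below, under two simplifying assumptions: every other root is asked to Union with the smallest root (which, under the minimum-nodeId rule, survives every successful Union), and success depends only on a direct ping. `can_ping` is a hypothetical stand-in for the ping performed before the Union, not a Redis API.

```python
def confirm_partition(root_ids, can_ping):
    """Ask root pairs to Union; any failed ping confirms the partition.

    can_ping(a, b): hypothetical hook, True if root a can ping root b.
    Returns (partition_confirmed, failed_pairs).
    """
    roots = sorted(root_ids)
    anchor = roots[0]  # smallest nodeId: the root that survives each Union
    failed_pairs = [(anchor, other) for other in roots[1:] if not can_ping(anchor, other)]
    return bool(failed_pairs), failed_pairs


def reachable(a, b):
    # Example topology: only r1 and r2 can reach each other.
    return {a, b} == {"r1", "r2"}


print(confirm_partition(["r2", "r1"], reachable))        # (False, [])
print(confirm_partition(["r1", "r2", "r3"], reachable))  # (True, [('r1', 'r3')])
```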
In one example, when a network partition fault occurs, the method further comprises the following steps, by which the monitor detects the cluster state: sending a status report instruction to the plurality of nodes of the cluster so that each node reports its own status information to the monitor, wherein the status information comprises at least one of a node identifier, a node address, a node type, the identifier of the root node to which the node belongs, information on the master node to which the node belongs, and the identifiers of the corresponding hash slots; and receiving the status information sent by the plurality of nodes.
In one example of detecting cluster status by a monitor, the following steps 1-4 are specifically included.
1. The monitor notifies all root nodes that a network partition has occurred, and each root node propagates the network partition mode to the nodes it can reach according to the Gossip protocol.
2. After receiving this message, every node switches to network partition mode and updates its parent node information to its root node.
3. In network partition mode, all nodes periodically report their state to the Redis monitor, including nodeId, address, root node, and node type (Master/Slave); master nodes also report their hash slot information, and slave nodes report the nodeId of the master node they belong to.
4. In network partition mode, the Redis cluster loses its automatic failover capability: responsibility for fault discovery and failover moves from inside the Redis cluster to the Redis monitor, which manages and maintains the node metadata.
A Redis cluster does not use consistent hashing; instead, it introduces the concept of hash slots. A Redis cluster has 16384 built-in hash slots. When a key-value pair is to be stored in the cluster, Redis computes the CRC16 of the key and takes the result modulo 16384, so every key maps to a hash slot numbered 0-16383. Redis distributes the hash slots across the nodes so that each node holds roughly the same number of slots.
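The key-to-slot mapping can be reproduced with a bitwise CRC-16/XMODEM (polynomial 0x1021, initial value 0), the CRC16 variant used by Redis Cluster. This sketch hashes the whole key; real Redis also honours hash tags such as `{user}1000`, where only the text inside the braces is hashed, and that rule is omitted here.

```python
HASH_SLOTS = 16384


def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM: polynomial 0x1021, initial value 0, no bit reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Map a key to a slot in 0..16383 (hash tags are not handled in this sketch)."""
    return crc16_xmodem(key.encode("utf-8")) % HASH_SLOTS


print(hex(crc16_xmodem(b"123456789")))  # 0x31c3, the standard CRC-16/XMODEM check value
print(key_slot("foo"))                   # 12182
```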
In the above example, each node in the cluster normally maintains the cluster metadata and the nodes broadcast updates to one another; during a partition, metadata maintenance is moved to the monitor instead. This helps avoid metadata inconsistency when a network partition fault occurs and avoids data loss during the partition.
In one example, when the cluster has a network partition fault, the method further includes a step of reassigning master and slave nodes: determining at least two network communication areas according to the identifier of the root node to which each node belongs, where the network communication areas correspond one-to-one to the root nodes; detecting whether each master node and its slave nodes are located in the same network communication area; and, if a first master node and at least one of its slave nodes are located in a first network communication area while at least one other slave node of the first master node is located in a second network communication area, sending a release instruction to the cluster to remove the master-slave relationship between the first master node and its slave nodes in the second network communication area.
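The area check above can be sketched as follows. This is a hypothetical illustration, not the patented implementation: it assumes the monitor already holds one state report per node (the `NodeReport` fields mirror the state information listed earlier), groups nodes into communication areas by root identifier, and returns the (slave, master) pairs that a release instruction would target.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class NodeReport:
    node_id: str
    root_id: str                     # root node the reporting node belongs to
    role: str                        # "master" or "slave"
    master_id: Optional[str] = None  # for slaves: nodeId of their master


def group_by_area(reports: List[NodeReport]) -> Dict[str, Set[str]]:
    """One network communication area per root node."""
    areas: Dict[str, Set[str]] = defaultdict(set)
    for r in reports:
        areas[r.root_id].add(r.node_id)
    return dict(areas)


def slaves_to_release(reports: List[NodeReport]) -> List[Tuple[str, str]]:
    """Slaves cut off from a master that still has a slave in its own area."""
    area_of = {r.node_id: r.root_id for r in reports}
    slaves_of: Dict[str, List[str]] = defaultdict(list)
    for r in reports:
        if r.role == "slave" and r.master_id is not None:
            slaves_of[r.master_id].append(r.node_id)

    releases = []
    for r in reports:
        if r.role != "master":
            continue
        local = [s for s in slaves_of[r.node_id] if area_of[s] == r.root_id]
        remote = [s for s in slaves_of[r.node_id] if area_of[s] != r.root_id]
        if local and remote:  # only this case is covered by the release rule
            releases.extend((s, r.node_id) for s in remote)
    return releases
```

Masters with no slave left in their own area are deliberately skipped here; they are handled by the fuller reassignment procedure.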
In another example, master-slave assignment proceeds as follows: if the target master node and at least one of its target slave nodes are in the same partition, the target master node and those slave nodes are left unchanged, and the slave nodes of the target master node located in other partitions are released; alternatively, if the target master node is in a first partition and at least two of its slave nodes are in a second partition, the target master node is released and a new master node is chosen from among those slave nodes.
As an example, master and slave nodes are reassigned through the following steps.
1. If the Master node and all of its Slave nodes are in the same network partition, nothing is changed and the monitor moves on to check the next Master node.
2. If the Master node and its Slave nodes are in different partitions, the monitor checks whether at least one Slave node is in the same partition as the Master. If so, it releases the resources of the Slave nodes that are not in that partition and moves on to the next node.
3. If the Master node and all of its Slave nodes are in different partitions, the monitor checks whether there are resources to start a new Slave node in the Master's partition. If so, it adds a new Slave node there and releases the Slave resources in the other partitions.
4. If the Master node and all of its Slave nodes are in different partitions and no resources are available for a new Slave in the Master's partition, the monitor checks whether some partition contains at least 2 Slave nodes. If so, it promotes the Slave node with the largest offset in that partition to Master, which preserves the one-Master-multiple-Slaves topology, and then releases the resources of the original Master node and of the Slave nodes in the remaining partitions.
5. If no partition contains at least 2 Slave nodes, the monitor checks whether the partition of a Slave node has resources to start a new Slave node. If so, it adds a Slave node there, promotes the original Slave node to Master, and then releases the remaining resources.
6. If none of the above applies, the administrator is notified, for example by email, to add new machines.
7. After the Slave nodes are reassigned, the monitor broadcasts a ping message to the cluster to update the information of all Master and Slave nodes.
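Steps 1 to 6 can be condensed into one decision function per Master. This is a sketch under stated assumptions: areas are identified by root node, `has_spare` is a hypothetical input saying whether a new Slave can be started in an area, and when several partitions qualify in steps 4 or 5 the sketch picks the one holding the Slave with the largest offset, a tie-breaking rule the text does not specify.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Plan:
    action: str
    promote: Optional[str] = None         # slave to promote to master
    new_slave_area: Optional[str] = None  # area in which to start a new slave
    release: List[str] = field(default_factory=list)


def plan_reassignment(master_id: str, master_area: str,
                      slave_area: Dict[str, str], offsets: Dict[str, int],
                      has_spare: Dict[str, bool]) -> Plan:
    local = [s for s, a in slave_area.items() if a == master_area]
    remote = [s for s, a in slave_area.items() if a != master_area]
    if not remote:                                   # step 1: nothing to do
        return Plan("keep")
    if local:                                        # step 2
        return Plan("release_remote_slaves", release=remote)
    if has_spare.get(master_area, False):            # step 3
        return Plan("add_local_slave", new_slave_area=master_area, release=remote)

    by_area: Dict[str, List[str]] = {}
    for s in remote:
        by_area.setdefault(slave_area[s], []).append(s)

    crowded = [a for a, ss in by_area.items() if len(ss) >= 2]
    if crowded:                                      # step 4
        area = max(crowded, key=lambda a: max(offsets[s] for s in by_area[a]))
        new_master = max(by_area[area], key=lambda s: offsets[s])
        others = [s for s in remote if slave_area[s] != area]
        return Plan("promote_in_area", promote=new_master,
                    release=[master_id] + others)

    spare = [s for s in remote if has_spare.get(slave_area[s], False)]
    if spare:                                        # step 5
        new_master = max(spare, key=lambda s: offsets[s])
        others = [s for s in remote if s != new_master]
        return Plan("add_slave_and_promote", promote=new_master,
                    new_slave_area=slave_area[new_master],
                    release=[master_id] + others)
    return Plan("notify_admin")                      # step 6
```

In step 4 the other Slaves in the chosen partition are kept, so they remain Slaves of the newly promoted Master.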
In the above example, master and slave nodes are reassigned automatically during a network partition, which saves labor and time and keeps the cluster available while it is partitioned.
In one example, when the cluster has a network partition fault, the method further includes a failover step: periodically sending connectivity detection instructions to each node of the cluster at a preset frequency; determining, from each node's response to the connectivity detection instruction, whether any of the nodes in the cluster is offline; if an offline node exists and its node type is master, determining a new master node from the slave nodes of the offline node; sending a hash slot clearing instruction to the cluster to remove the mapping between the offline node and its hash slots; and sending a hash slot allocation instruction to the cluster to map those hash slots to the slave node with the largest offset.
In the above example, determining a new master node from the slave nodes of the offline node includes: sending an offset reporting instruction to the slave nodes of the offline node, so that each slave node reports the offset of the data it has synchronized; and determining the slave node with the largest offset and sending it a type conversion instruction to convert it into a master node.
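The detection and election logic above can be sketched with two small functions. This is a hypothetical illustration: it assumes the monitor records the time of each node's last successful reply to a connectivity probe, treats a node as offline once that reply is older than a timeout, and breaks offset ties by the smallest node identifier (the text does not specify a tie-break).

```python
from typing import Dict, List, Optional


def find_offline(last_reply: Dict[str, float], now: float, timeout: float) -> List[str]:
    """Nodes whose last reply to a connectivity probe is older than timeout."""
    return sorted(node for node, t in last_reply.items() if now - t > timeout)


def elect_new_master(slave_offsets: Dict[str, int]) -> Optional[str]:
    """Pick the slave with the largest replication offset; None if there is none."""
    if not slave_offsets:
        return None
    # max() keeps the first maximum it meets, so scanning sorted ids
    # makes ties resolve to the smallest node id.
    return max(sorted(slave_offsets), key=lambda node: slave_offsets[node])
```

Choosing the largest offset favours the slave that has replicated the most data from the failed master, which minimizes the writes lost in the switch.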
As one example, the failover process includes the following steps.
1. When the monitor finds that a node has been unreachable for the entire cluster-node-timeout period, it considers the node faulty and marks it as down.
2. If the node is a Slave node, the monitor takes no action; once the Slave node recovers, it automatically resynchronizes with its Master node and performs a full copy of the Master's existing data.
3. If the node is a Master node, the monitor sends a qualification check request to all Slave nodes under that Master, and the Slave nodes report their offsets to the monitor.
4. The monitor selects the Slave node with the largest offset to replace the Master node; that Slave node stops replicating and becomes a Master node.
5. The clusterDelSlot operation is executed to withdraw the slots the failed Master node was responsible for, and clusterAddSlot is executed to assign those slots to the promoted Slave node.
6. The monitor broadcasts a pong message to the cluster, informing all nodes that the Slave node has become a Master node and has taken over the slots of the failed Master.
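Steps 1 to 5 can be sketched against an in-memory view of the cluster metadata held by the monitor. This is an assumption-laden illustration: `ClusterView` and `fail_over` are hypothetical names, the slot table is a plain dictionary (deleting and re-adding an entry stands in for clusterDelSlot and clusterAddSlot), and re-pointing the remaining Slaves to the new Master is an assumption the text does not state. The broadcast in step 6 is not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ClusterView:
    roles: Dict[str, str]       # node id -> "master" or "slave"
    master_of: Dict[str, str]   # slave id -> master id
    offsets: Dict[str, int]     # slave id -> replication offset
    slot_owner: Dict[int, str]  # slot number -> owning master id
    down: Set[str] = field(default_factory=set)


def fail_over(view: ClusterView, failed: str) -> Optional[str]:
    """Return the id of the promoted slave, or None if nothing was promoted."""
    view.down.add(failed)                        # step 1: mark the node as down
    if view.roles.get(failed) != "master":
        return None                              # step 2: a failed slave is left alone
    candidates = [s for s, m in view.master_of.items()
                  if m == failed and s not in view.down]
    if not candidates:
        return None
    # steps 3-4: the slave with the largest reported offset wins
    new_master = max(sorted(candidates), key=lambda s: view.offsets.get(s, 0))
    view.roles[new_master] = "master"
    del view.master_of[new_master]               # it stops replicating
    for s in candidates:                         # assumption: others follow it
        if s != new_master:
            view.master_of[s] = new_master
    # step 5: withdraw the failed master's slots and hand them over
    moved = [slot for slot, owner in view.slot_owner.items() if owner == failed]
    for slot in moved:
        del view.slot_owner[slot]
        view.slot_owner[slot] = new_master
    return new_master
```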
In the above example, failover is performed automatically during a network partition, which saves labor and time and keeps the cluster highly available.
In one example, when the cluster has a network partition fault, the method further includes a cluster recovery step: when the merging feedback information indicates that the merging operation was executed successfully, sending a network partition recovery instruction to the cluster to restore it to the normal running state.
As one example, cluster recovery includes the following steps.
1. When the monitor detects that all nodes again belong to the same root node, the network partition has been resolved.
2. The monitor sends a network partition recovery request to the root node. Following the Gossip protocol, the root node includes the current normal state in the messages it propagates, and every node in the connectivity graph switches to the normal state once it receives the message.
3. The monitor hands fault discovery and automatic failover back to the cluster for self-management, and the cluster returns to its normal running state.
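The switch into and out of partition mode can be sketched as a small state machine on the monitor. This is a hypothetical illustration: `PartitionTracker` and the returned action strings are invented names, and each call stands for one detection round in which the monitor has collected the set of reported root nodes and, where applicable, the outcome of the merging operation.

```python
from enum import Enum
from typing import Set


class Mode(Enum):
    NORMAL = "normal"
    PARTITIONED = "partitioned"


class PartitionTracker:
    """Monitor-side mode tracking for one cluster (sketch)."""

    def __init__(self) -> None:
        self.mode = Mode.NORMAL

    def on_round(self, reported_roots: Set[str], merge_succeeded: bool = False) -> str:
        """Return the instruction the monitor would send after this round."""
        partitioned = len(reported_roots) > 1 and not merge_succeeded
        if partitioned:
            if self.mode is Mode.NORMAL:
                self.mode = Mode.PARTITIONED
                return "enter_partition_mode"   # monitor takes over failover
            return "none"                        # still partitioned
        if self.mode is Mode.PARTITIONED:
            self.mode = Mode.NORMAL
            return "restore_normal_mode"         # hand failover back to the cluster
        return "none"
```

Several roots together with a successful merge are treated as "not partitioned", matching the condition under which the recovery instruction is sent.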
In the above example, once the network partition is resolved, management passes from the monitor back to the cluster itself, which helps the cluster return to its normal state promptly.
In this embodiment, the Redis monitor uses the Union-Find algorithm to detect whether the Redis cluster has a network partition. If it does, responsibility for fault discovery and failover is handed over from the Redis cluster to the Redis monitor. The Redis monitor is then also responsible for maintaining cluster metadata: if it finds a Master node and its Slave node in networks that cannot reach each other, it reassigns the Slave node so that the Slave and the Master are in the same network environment. The monitor also watches for node failures; when a node fails, the monitor performs failover as the leader and notifies all nodes. When the monitor detects that the network partition has recovered, fault discovery and failover responsibilities are handed back from the Redis monitor to the Redis cluster, ensuring resilient recovery.
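As a generic illustration of the Union-Find technique named above (not the patent's exact procedure), the sketch below assumes the monitor has a list of node pairs that can communicate and counts the resulting connected groups; more than one group indicates a network partition. Class and function names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self, items: Iterable[Hashable]) -> None:
        self.parent: Dict[Hashable, Hashable] = {x: x for x in items}
        self.size: Dict[Hashable, int] = {x: 1 for x in self.parent}

    def find(self, x: Hashable) -> Hashable:
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while x != root:  # path compression: point every visited node at the root
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: Hashable, b: Hashable) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:  # attach the smaller tree under the larger
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def partitions(nodes: List[str], links: List[Tuple[str, str]]) -> List[List[str]]:
    """Group nodes into connected components given reachable node pairs."""
    uf = UnionFind(nodes)
    for a, b in links:
        uf.union(a, b)
    groups: Dict[Hashable, List[str]] = {}
    for n in nodes:
        groups.setdefault(uf.find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())
```

With path compression and union by size, each operation runs in near-constant amortized time, so the check stays cheap even for large clusters.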
This embodiment also provides another cluster fault processing method, implemented by a first node in the cluster, which includes the following steps: receiving a parent node identifier sent by a second node in the cluster, where the parent node identifier of each node is initially its own identifier; comparing the parent node identifier sent by the second node with its own parent node identifier, so as to update its own parent node identifier; after at least one such update, determining that it is a root node if its parent node identifier is the same as its own identifier; and, when it is a root node, sending its own node information to the monitor.
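The node-side procedure can be simulated to see why the number of roots equals the number of connected groups. The text does not state the comparison rule; this sketch assumes a node adopts the smaller of the received and current parent identifiers. Under that assumption every node in a connected group ends up with the group's smallest identifier as its parent, and only that node's parent still equals its own identifier. `neighbors` is a hypothetical symmetric reachability map.

```python
from typing import Dict, List, Set


def propagate_parents(neighbors: Dict[str, Set[str]]) -> Dict[str, str]:
    """Repeat gossip rounds until no node changes its parent identifier."""
    parent = {node: node for node in neighbors}  # initially every node is its own parent
    changed = True
    while changed:
        changed = False
        for node in sorted(neighbors):
            for peer in sorted(neighbors[node]):
                received = parent[node]          # node sends its parent id to peer
                if received < parent[peer]:      # assumed rule: keep the smaller id
                    parent[peer] = received
                    changed = True
    return parent


def roots(parent: Dict[str, str]) -> List[str]:
    """Nodes whose parent identifier is still their own identifier."""
    return sorted(node for node, p in parent.items() if p == node)
```

A real deployment would exchange these identifiers asynchronously over the cluster bus; the loop here only models the convergence.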
< device embodiment >
This embodiment provides a cluster fault processing device, where the cluster includes a plurality of nodes, the plurality of nodes include at least one master node, and each master node corresponds to at least one slave node. The device is applied to a monitor outside the cluster and includes a first receiving module, a first processing module, and a second processing module.
The first receiving module is configured to receive node information reported by the root nodes of the cluster, where the root nodes are determined through communication among the nodes in the cluster.
The first processing module is configured to determine the number of root nodes in the cluster according to the received node information.
The second processing module is configured to determine that the cluster has a network partition fault when the number of root nodes is greater than one.
In one example, the device further includes a review module configured to: send a merging instruction to the root nodes of the cluster, so that a merging operation is performed among the more than one root nodes in the cluster; receive merging feedback information from the more than one root nodes in the cluster; and, when the merging feedback information indicates that the merging operation failed, confirm that the cluster has a network partition fault.
In one example, the device further includes a state collection module configured to: send a state reporting instruction to the plurality of nodes of the cluster, so that each node reports its own state information to the monitor, where the state information includes at least one of a node identifier, a node address, a node type, the identifier of the root node to which the node belongs, information about the master node to which the node belongs, and the identifiers of the corresponding hash slots; and receive the state information sent by the plurality of nodes.
In one example, the device further includes a master-slave control module configured to: determine at least two network communication areas according to the identifier of the root node to which each node belongs, where the network communication areas correspond one-to-one to the root nodes; detect whether each master node and its slave nodes are located in the same network communication area; and, if a first master node and at least one of its slave nodes are located in a first network communication area while at least one other slave node of the first master node is located in a second network communication area, send a release instruction to the cluster to remove the master-slave relationship between the first master node and its slave nodes in the second network communication area.
In one example, the device further includes a failover module, which includes: a sending unit configured to periodically send connectivity detection instructions to each node of the cluster at a preset frequency; a detection unit configured to determine, from each node's response to the connectivity detection instruction, whether any of the nodes in the cluster is offline; and a master selection unit configured to determine a new master node from the slave nodes of the offline node when an offline node exists in the cluster and its node type is master.
In one example, the master selection unit further includes: an offset obtaining subunit configured to send an offset reporting instruction to the slave nodes of the offline node, so that each slave node reports the offset of the data it has synchronized; and a master-slave conversion subunit configured to determine the slave node with the largest offset among the slave nodes of the offline node and send it a type conversion instruction to convert it into a master node.
In one example, the device further includes a purge module configured to: send a hash slot clearing instruction to the cluster to remove the mapping between the offline node and its hash slots; and send a hash slot allocation instruction to the cluster to map the hash slots of the offline node to the slave node with the largest offset.
In one example, the device further includes a recovery module configured to: when the merging feedback information indicates that the merging operation was executed successfully, send a network partition recovery instruction to the cluster to restore it to the normal running state.
This embodiment also provides a cluster fault processing device applied to a first node in a cluster, including: a receiving module configured to receive a parent node identifier sent by a second node in the cluster, where the parent node identifier of each node is initially its own identifier; an updating module configured to compare the parent node identifier sent by the second node with the first node's own parent node identifier, so as to update its own parent node identifier; a determining module configured to determine that the first node is a root node if, after at least one update, its parent node identifier is the same as its own identifier; and a sending module configured to send the first node's own node information to the monitor when the first node is a root node.
The cluster fault processing device in this embodiment can implement each step described in the method embodiment of the present invention and achieve the same technical effects, which are not repeated here.
< electronic device embodiment >
This embodiment provides an electronic device including a processor and a memory, where the memory stores machine-executable instructions that can be executed by the processor, and the processor executes the machine-executable instructions to implement the cluster fault processing method described in the method embodiment of the present invention.
The electronic device in this embodiment can implement each step described in the method embodiment of the present invention and achieve the same technical effects, which are not repeated here.
The present invention may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present invention.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. Computer readable storage media, as used herein, are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., light pulses through fiber optic cables), or electrical signals transmitted through wires.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out operations of the present invention may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++ or the like, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry such as programmable logic circuitry, Field Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs) is personalized using state information of the computer readable program instructions, and this electronic circuitry can execute the computer readable program instructions to implement aspects of the present invention.
Various aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are all equivalent.
The foregoing description of embodiments of the invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the technical improvements in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims (11)

1. A method for handling a cluster failure, wherein the cluster comprises a plurality of nodes, the plurality of nodes comprising at least one master node, the master node corresponding to at least one slave node, and wherein the method is implemented by a monitor outside the cluster, comprising:
receiving node information reported by root nodes of the cluster, wherein the root nodes are determined through mutual communication among nodes in the cluster;
determining the number of root nodes in the cluster according to the received node information;
when the number of root nodes is more than one, determining that the cluster has a network partition fault;
after determining that the cluster has a network partition fault, the method further comprises:
sending a merging instruction to the root nodes of the cluster, so that a merging operation is performed among the more than one root nodes in the cluster;
receiving merging feedback information from the more than one root nodes in the cluster;
and, when the merging feedback information indicates that the merging operation failed, confirming that the cluster has a network partition fault.
2. The method of claim 1, wherein after determining that the cluster has a network partition fault, the method further comprises:
sending a status reporting instruction to a plurality of nodes of the cluster, so that each of the plurality of nodes reports its own status information to the monitor, wherein the status information comprises at least one of a node identifier, a node address, a node type, the identifier of the root node to which the node belongs, information about the master node to which the node belongs, and the identifiers of the corresponding hash slots;
and receiving the state information sent by the plurality of nodes.
3. The method of claim 2, wherein after said receiving status information sent by said plurality of nodes, said method further comprises:
determining at least two network communication areas according to the identifier of the root node to which each node belongs, wherein the network communication areas are in one-to-one correspondence with the root nodes;
detecting whether each master node and its corresponding slave nodes are located in the same network communication area;
and, if a first master node and at least one corresponding slave node are located in a first network communication area while at least one other slave node of the first master node is located in a second network communication area, sending a release instruction to the cluster to remove the master-slave relationship between the first master node and its slave nodes in the second network communication area.
4. The method of claim 2, wherein after said receiving status information sent by said plurality of nodes, said method further comprises:
periodically sending connectivity detection instructions to each node of the cluster according to a preset frequency;
determining, according to each node's response to the connectivity detection instruction, whether any of the plurality of nodes in the cluster is offline;
and determining a new master node from slave nodes of the offline node under the condition that the offline node exists in the cluster and the node type of the offline node is the master node.
5. The method of claim 4, wherein the determining a new master node from the slave nodes of the offline node comprises:
sending an offset reporting instruction to the slave nodes of the offline node, so that each slave node reports the offset of the data it has synchronized;
and determining the slave node with the largest offset among the slave nodes of the offline node, and sending a type conversion instruction to the slave node with the largest offset to convert it into the master node.
6. The method of claim 5, wherein the method further comprises:
sending a hash slot clearing instruction to the cluster to clear the corresponding relation between the offline node and the corresponding hash slot;
and sending a hash slot allocation instruction to the cluster to establish a corresponding relation between the hash slot corresponding to the offline node and the slave node with the largest offset.
7. The method according to claim 1, wherein the method further comprises:
and under the condition that the merging feedback information indicates that the merging operation is successfully executed, sending a network partition restoration instruction to the cluster so as to restore the cluster to a normal running state.
8. A method for handling a cluster failure, implemented by a first node in a cluster, comprising:
receiving a parent node identifier sent by a second node in the cluster, wherein the parent node identifier of each node is initially its own identifier;
comparing the parent node identifier sent by the second node with its own parent node identifier, so as to update its own parent node identifier;
after at least one updating step, if its own parent node identifier is the same as its own identifier, determining itself to be a root node;
when it is itself a root node, sending its own node information to the monitor of claim 1.
9. A cluster failure handling device, wherein the cluster includes a plurality of nodes, the plurality of nodes including at least one master node, the master node corresponding to at least one slave node, and wherein the device is applied to a monitor outside the cluster, and comprises:
the first receiving module is used for receiving node information reported by the root nodes of the cluster, wherein the root nodes are determined through mutual communication among the nodes in the cluster;
the first processing module is used for determining the number of root nodes in the cluster according to the received node information;
The second processing module is used for determining that the cluster has network partition faults under the condition that the number of the root nodes is more than one;
the device further comprises a review module configured to: send a merging instruction to the root nodes of the cluster, so that a merging operation is performed among the more than one root nodes in the cluster; receive merging feedback information from the more than one root nodes in the cluster; and, when the merging feedback information indicates that the merging operation failed, confirm that the cluster has a network partition fault.
10. A cluster failure handling device, applied to a first node in a cluster, comprising:
a receiving module configured to receive a parent node identifier sent by a second node in the cluster, wherein the parent node identifier of each node is initially its own identifier;
an updating module configured to compare the parent node identifier sent by the second node with the first node's own parent node identifier, so as to update its own parent node identifier;
a determining module configured to determine the first node to be a root node if, after at least one updating step, its own parent node identifier is the same as its own identifier;
a sending module configured to send the first node's own node information to the monitor of claim 1 when the first node is a root node.
11. An electronic device comprising a memory storing executable commands and a processor that, when executing the executable commands, performs the method of any of claims 1-8.
CN202010477541.9A 2020-05-29 2020-05-29 Cluster fault processing method and device and electronic equipment Active CN111708668B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010477541.9A CN111708668B (en) 2020-05-29 2020-05-29 Cluster fault processing method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111708668A CN111708668A (en) 2020-09-25
CN111708668B true CN111708668B (en) 2023-07-07

Family

ID=72538409

Country Status (1)

Country Link
CN (1) CN111708668B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113810216A (en) * 2020-12-31 2021-12-17 京东科技控股股份有限公司 Cluster fault switching method and device and electronic equipment
CN113315657B (en) * 2021-05-26 2023-11-24 中电信数智科技有限公司 Method and system for analyzing influence of telecommunication transmission network clients based on union collection
CN115037595B (en) * 2022-04-29 2024-04-23 北京华耀科技有限公司 Network recovery method, device, equipment and storage medium

Citations (1)

Publication number Priority date Publication date Assignee Title
JP2012209625A (en) * 2011-03-29 2012-10-25 Nec Corp System and method for reducing wiring complexity in cluster system

Family Cites Families (10)

Publication number Priority date Publication date Assignee Title
US6877107B2 (en) * 2001-07-05 2005-04-05 Softwired Ag Method for ensuring operation during node failures and network partitions in a clustered message passing server
US9098439B2 (en) * 2012-01-05 2015-08-04 International Business Machines Corporation Providing a fault tolerant system in a loosely-coupled cluster environment using application checkpoints and logs
CN102594596B (en) * 2012-02-15 2014-08-20 华为技术有限公司 Method and device for recognizing available partitions, and clustering network system
US9146820B2 (en) * 2013-04-29 2015-09-29 King Fahd University Of Petroleum And Minerals WSAN simultaneous failures recovery method
US10341252B2 (en) * 2015-09-30 2019-07-02 Veritas Technologies Llc Partition arbitration optimization
CN107526659B (en) * 2016-06-21 2021-02-12 伊姆西Ip控股有限责任公司 Method and apparatus for failover
CN106656624B (en) * 2017-01-04 2019-05-14 合肥康捷信息科技有限公司 Optimization method based on Gossip communication protocol and Raft election algorithm
US10237346B2 (en) * 2017-02-08 2019-03-19 Vmware, Inc. Maintaining partition-tolerant distributed metadata
US10719417B2 (en) * 2018-01-30 2020-07-21 EMC IP Holding Company, LLC Data protection cluster system supporting multiple data tiers
CN109040212B (en) * 2018-07-24 2021-09-21 苏州科达科技股份有限公司 Method, system, device and storage medium for accessing device to server cluster

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant