CN111708668A

CN111708668A - Cluster fault processing method and device and electronic equipment

Info

Publication number: CN111708668A
Application number: CN202010477541.9A
Authority: CN
Inventors: 汤爱迪
Original assignee: Beijing Kingsoft Cloud Network Technology Co Ltd
Current assignee: Beijing Kingsoft Cloud Network Technology Co Ltd
Priority date: 2020-05-29
Filing date: 2020-05-29
Publication date: 2020-09-25
Anticipated expiration: 2040-05-29
Also published as: CN111708668B

Abstract

The invention relates to a processing method and device of cluster faults and electronic equipment, and relates to the field of cloud computing. The method comprises the following steps: receiving node information reported by a root node of a cluster, wherein the root node is determined by mutual communication among nodes in the cluster; determining the number of root nodes in the cluster according to the received node information; and determining that the cluster has a network partition fault when the number of the root nodes is more than one. The method can automatically find the fault of the database cluster under the condition of network partition, find the problem in time and avoid production accidents.

Description

Cluster fault processing method and device and electronic equipment

Technical Field

The invention relates to the technical field of cloud computing, and in particular to a method and a device for processing a cluster fault, and to electronic equipment.

Background

Databases (e.g., Redis) are widely used in various service scenarios for content identification. When a stand-alone database runs into bottlenecks in memory, concurrency, traffic, and the like, a database cluster can be used to achieve high availability.

The nodes in a database cluster are divided into Master nodes and Slave nodes. A Master node is responsible for all read-write requests and for maintaining key cluster information, while a Slave node is only responsible for replicating the data and state information of its Master node.

The nodes can communicate using a P2P Gossip protocol. For example, each Master node in the cluster periodically sends ping messages to the other Master nodes and receives their pong replies. If Master node A fails to communicate with Master node B throughout the cluster-node-timeout period, A marks B as subjectively offline and broadcasts to the Master nodes of the cluster a message that it considers B subjectively offline. When more than half of the Master nodes consider B subjectively offline, B is determined to be objectively offline, and fault discovery is complete.

In the related art, the fault discovery function of the cluster can only handle the case where, with the network operating normally, a server fault causes an individual Master node to fail; it cannot discover faults related to network partitions.

Therefore, it is necessary to provide a new technical solution for processing cluster failures.

Disclosure of Invention

An object of the present invention is to provide a new technical solution for cluster fault handling.

According to a first aspect of the present invention, there is provided a method for handling a cluster failure, where the cluster includes a plurality of nodes, and the plurality of nodes includes at least one master node corresponding to at least one slave node, where the method is implemented by a monitor outside the cluster, and includes:

receiving node information reported by a root node of the cluster, wherein the root node is determined through mutual communication among nodes in the cluster;

determining the number of root nodes in the cluster according to the received node information;

and determining that the cluster has a network partition fault when the number of the root nodes is more than one.

Optionally, after the determining that the cluster has a network partition failure, the method further includes:

sending a merging instruction to the root nodes of the cluster so as to enable more than one root node in the cluster to perform merging operation;

receiving merging feedback information of more than one root node in the cluster;

and under the condition that the merging feedback information indicates that the merging operation fails to be executed, confirming that the cluster has network partition faults.

Optionally, the method further comprises:

sending a state reporting instruction to a plurality of nodes of the cluster so that each node of the plurality of nodes reports its own state information to the monitor, wherein the state information comprises at least one of a node identifier, a node address, a node type, an identifier of a root node to which the node belongs, information of a master node to which the node belongs, and an identifier of a corresponding hash slot;

and receiving the state information sent by the plurality of nodes.

Optionally, after the receiving the status information sent by the plurality of nodes, the method further includes:

determining at least two network communication areas according to the identifier of a root node to which each node belongs, wherein the network communication areas correspond to the root nodes one to one;

detecting whether each master node and the corresponding slave node are located in the same network communication area;

and if a first master node and at least one corresponding slave node are located in a first network communication area and at least one slave node of the first master node is located in a second network communication area, sending a release instruction to the cluster to clear the master-slave relationship between the first master node and the slave node in the second network communication area.

Optionally, the method further comprises:

periodically sending a connectivity detection instruction to each node of the cluster at a preset frequency;

determining whether offline nodes exist in a plurality of nodes in the cluster according to the response result of each node to the connectivity detection instruction;

and determining a new master node from the slave nodes of the offline node under the condition that the offline node exists in the cluster and the node type of the offline node is the master node.

Optionally, the determining a new master node from the slave nodes of the offline node includes:

sending an offset reporting instruction to the slave nodes of the offline node so that each slave node feeds back the offset of the data it has synchronized;

and determining the slave node with the maximum offset from the slave nodes of the offline node, and sending a type conversion instruction to the slave node with the maximum offset so as to convert the slave node with the maximum offset into the master node.

Optionally, the method further comprises:

sending a hash slot clearing instruction to the cluster so as to clear the corresponding relation between the offline node and the corresponding hash slot;

and sending a hash slot distribution instruction to the cluster to establish a corresponding relation between the hash slot corresponding to the offline node and the slave node with the maximum offset.

Optionally, the method further comprises:

and sending a network partition recovery instruction to the cluster under the condition that the merging feedback information shows that the merging operation is successfully executed, so that the cluster is recovered to a normal operation state.

According to a second aspect of the present invention, there is provided a method for processing a cluster fault, which is implemented by a first node in a cluster, including:

receiving a parent node identifier sent by a second node in the cluster, wherein the parent node identifier of each node is initially the node's own identifier;

comparing the parent node identifier sent by the second node with its own parent node identifier to update its own parent node identifier;

after at least one updating step, if its own parent node identifier is the same as its own identifier, determining that it is a root node;

in the case where it is a root node, sending its own node information to the monitor as described above.

According to a third aspect of the present invention, there is provided an apparatus for processing a cluster failure, where the cluster includes a plurality of nodes, and the plurality of nodes includes at least one master node corresponding to at least one slave node, where the apparatus is applied to a monitor outside the cluster, and includes:

a first receiving module, configured to receive node information reported by a root node of the cluster, where the root node is determined by mutual communication between nodes in the cluster;

the first processing module is used for determining the number of root nodes in the cluster according to the received node information;

and the second processing module is used for determining that the cluster has a network partition fault when the number of the root nodes is more than one.

Optionally, the apparatus further comprises a review module configured to:

Optionally, the apparatus further comprises a status collection module configured to:

and receiving the state information sent by the plurality of nodes.

Optionally, the apparatus further comprises a master-slave control module configured to:

Optionally, the apparatus further comprises a failover module, the failover module further comprising:

the sending unit is used for regularly sending a connectivity detection instruction to each node of the cluster according to a preset frequency;

a detecting unit, configured to determine whether offline nodes exist in multiple nodes in the cluster according to a response result of each node to the connectivity detection instruction;

and the master selecting unit is used for determining a new master node from the slave nodes of the offline node under the condition that the offline node exists in the cluster and the node type of the offline node is the master node.

Optionally, the master selecting unit further comprises:

the offset obtaining subunit is configured to send an offset reporting instruction to the slave nodes of the offline node, so that each slave node feeds back the offset of the data it has synchronized;

and the master-slave conversion subunit is used for determining the slave node with the maximum offset from the slave nodes of the offline node and sending a type conversion instruction to the slave node with the maximum offset so as to convert the slave node with the maximum offset into the master node.

Optionally, the apparatus further comprises a clearing module configured to:

Optionally, the apparatus further comprises a recovery module configured to:

According to a fourth aspect of the present invention, there is provided a device for processing a cluster fault, which is applied to a first node in a cluster, and includes:

the receiving module is used for receiving a parent node identifier sent by a second node in the cluster, wherein the parent node identifier of each node is initially the node's own identifier;

the updating module is used for comparing the parent node identifier sent by the second node with the first node's own parent node identifier so as to update the first node's own parent node identifier;

the determining module is used for determining that the first node is a root node if, after at least one updating step, its own parent node identifier is the same as its own identifier;

a sending module, configured to send the first node's own node information to the monitor as described above when the first node is a root node.

According to a fifth aspect of the present invention, there is also provided an electronic device, including a memory and a processor, where the memory stores executable commands, and the processor executes the executable commands to implement the method according to the first aspect or the second aspect of the present invention.

In the cluster fault processing method in this embodiment, the monitor acquires the root node information of the cluster, and determines whether the cluster has a network partition fault or not according to the root node information, so that a fault of the database cluster can be automatically found under the condition of network partition, a problem can be timely found, and a production accident can be avoided.

Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.

FIG. 1 is a schematic diagram of an electronic device that may be used to implement an embodiment of the invention.

Fig. 2 is a flowchart of a processing method of a cluster failure according to an embodiment of the present invention.

Detailed Description

Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.

The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.

Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.

In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.

< hardware configuration >

Fig. 1 shows a hardware configuration of an electronic device that can be used to implement an embodiment of the present invention.

Referring to fig. 1, an electronic device 1000 includes a processor 1100, a memory 1200, an interface device 1300, a communication device 1400, a display device 1500, and an input device 1600. The processor 1100 may be, for example, a central processing unit CPU, a micro control unit MCU, or the like. The memory 1200 includes, for example, a ROM (read only memory), a RAM (random access memory), a nonvolatile memory such as a hard disk, and the like. The interface device 1300 includes, for example, a USB interface, a serial interface, and the like. The communication device 1400 is, for example, a wired network card or a wireless network card. The display device 1500 is, for example, a liquid crystal display panel. The input device 1600 includes, for example, a touch screen, a keyboard, a mouse, a microphone, and the like.

In an embodiment applied to this description, the memory 1200 of the electronic device 1000 is used to store instructions for controlling the processor 1100 to operate in support of implementing a method according to any embodiment of this description. The skilled person can design the instructions according to the solution disclosed in the present specification. How the instructions control the operation of the processor is well known in the art and will not be described in detail herein.

It should be understood by those skilled in the art that although a plurality of devices of the electronic apparatus 1000 are shown in fig. 1, the electronic apparatus 1000 of the embodiments of the present specification may involve only some of the devices, for example, only the processor 1100, the memory 1200 and the communication device 1400.

The hardware configuration shown in FIG. 1 is illustrative only and is not intended to limit the present invention, its application, or uses in any way.

< method examples >

The embodiment provides a method for processing a cluster fault, which is implemented by the electronic device 1000 shown in fig. 1, for example.

In this embodiment, the cluster includes a plurality of nodes, where the plurality of nodes includes at least one master node, and the master node corresponds to at least one slave node.

In addition, it should be further noted that the present invention is applicable to all database clusters, and in the following embodiments, a cluster of a Redis database is taken as an example, but is not limited to the Redis database, and therefore, the following example does not constitute a limitation to the present invention.

In this embodiment, the electronic device 1000 is a monitor outside the cluster. The monitor is a component dedicated to monitoring the cluster state, for example, client software that is capable of communicating with all nodes.

As shown in fig. 2, the method includes the following steps S1100-S1300.

In step S1100, node information reported by a root node of a cluster is received, where the root node is determined by mutual communication between nodes in the cluster.

In this embodiment, the root node is determined by mutual communication between nodes in the cluster. One root node corresponds to one network communication area. A network communication area is a set of nodes in which any two nodes can exchange data; that is, a connected relationship exists between any two nodes in the same network communication area.

In this embodiment, the connected relationship means that two nodes can directly or indirectly communicate.

In one example, the process of determining the root node includes: a first node of the plurality of nodes acquires parent node information of a second node based on a consistency protocol; the first node updates its own parent node information according to the parent node information of the second node and a preset update rule; and the first node determines that it is a root node in the case that its parent node is itself.

In the above example, the parent node of a certain node is initially the node itself.

In the above example, the consistency protocol is, for example, the Gossip protocol.

In the above example, the update rule is, for example, a node ID minimum rule or a node ID maximum rule.

In one example, the first node updating its own parent node information according to the parent node information of the second node and a preset update rule includes: in the case that the first node and the second node have different parent nodes, acquiring the root node of the first node and the root node of the second node; and the first node updating its own parent node information according to its own root node, the root node of the second node, and the preset update rule.

An example of obtaining the root node of the first node is as follows: in a chain E-F-G-H-K-J, where J is the root node, the parent node of E is F and the root node of E is J. E changes its parent node to J, so that the chain becomes E-J and both the parent node and the root node of E are J. The root node of the second node is obtained in the same way.

The process of acquiring the root node can be implemented based on a Union-Find algorithm. A Union-Find set, also known as a disjoint-set data structure, is a collection of disjoint sets that provides two operations, Union (merge) and Find (look up). find(i) returns the set to which i belongs, and find(i) and find(j) are usually compared to determine whether i and j are connected, i.e., belong to the same set. The Union operation connects the set containing i with the set containing j; after it executes, all elements of the set containing i are connected with all elements of the set containing j.
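The Union-Find structure described above can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the class name `UnionFind` and its method names are assumptions introduced here, and the minimum-ID rule is only one of the update rules the description permits.

```python
class UnionFind:
    """Disjoint-set structure with path compression (illustrative names)."""

    def __init__(self, ids):
        # Every element starts as its own parent, i.e. as the root of its own set.
        self.parent = {i: i for i in ids}

    def find(self, i):
        # Walk up to the root: the element whose parent is itself.
        root = i
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every element on the path directly at the root.
        while self.parent[i] != root:
            nxt = self.parent[i]
            self.parent[i] = root
            i = nxt
        return root

    def union(self, i, j):
        # Merge the two sets; the smaller root ID becomes the root (assumed rule).
        ri, rj = self.find(i), self.find(j)
        if ri != rj:
            self.parent[max(ri, rj)] = min(ri, rj)
        return min(ri, rj)

    def connected(self, i, j):
        return self.find(i) == self.find(j)


# The chain E-F-G-H-K-J from the description, with J as the root.
uf = UnionFind("EFGHKJ")
uf.parent.update({"E": "F", "F": "G", "G": "H", "H": "K", "K": "J"})
root_of_e = uf.find("E")  # after this call, E points directly at its root
```

After the `find` call, every node on the path from E has been re-pointed at the root, which is the path compression behaviour the chain example illustrates.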

In one example, the obtaining process of the node information of the root node includes the following steps.

1. All Master and Slave nodes in the cluster exchange information within the cluster according to the Gossip protocol, and the information carries the nodeId (node ID) of the sending node's parent node.

2. When node A receives information from node B, it checks whether B's parent node is the same as its own parent node.

3. If they are the same, the two nodes are already in the same connected graph, and node A takes no further action.

4. If they are different, node A executes the find operation, following parent nodes until it reaches the final root node (namely, the node whose parent node is itself).

5. For path compression, the find operation updates the parent node to the root node.

6. Node A then checks whether its root node and B's root node are the same. If they are, the two nodes are already in the same connected graph, and node A takes no further action.

7. If the root nodes are different, then, because A and B can communicate with each other, node A initiates a Union operation on its own root node and B's root node. For example, the root node with the minimum nodeId is taken as the root node of the new connected graph, and the root nodes of both are updated. A and B are then in the same connected graph. Because Redis cluster communication follows the Gossip protocol, after a period of time all nodes that can communicate with one another have the same root node.

8. If a node's parent node is the node itself, the node is a root node; each connected graph has only one root node.

9. The root node periodically reports its own state to the monitor.
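The node-side steps above can be simulated in a single process as a rough sketch. Everything named here is an assumption made for illustration: the shared `parent` dictionary stands in for the parent field that each node would hold itself, `links` lists the node pairs that can currently exchange Gossip messages, and the minimum-nodeId rule is used as the update rule.

```python
def find(parent, node):
    """Steps 4-5: follow parent pointers to the root, then compress the path."""
    root = node
    while parent[root] != root:
        root = parent[root]
    while parent[node] != root:
        nxt = parent[node]
        parent[node] = root
        node = nxt
    return root


def on_gossip(parent, a, b):
    """Node a handles a message from node b carrying b's parent (steps 2-7)."""
    if parent[a] == parent[b]:           # step 3: same parent, nothing to do
        return
    root_a, root_b = find(parent, a), find(parent, b)
    if root_a == root_b:                 # step 6: already in one connected graph
        return
    new_root = min(root_a, root_b)       # step 7: minimum-nodeId rule (assumed)
    parent[max(root_a, root_b)] = new_root
    parent[a] = parent[b] = new_root


def run_gossip(node_ids, links, rounds=2):
    """Exchange messages over every link and return the resulting root nodes."""
    parent = {n: n for n in node_ids}    # each node starts as its own parent
    for _ in range(rounds):
        for a, b in links:
            on_gossip(parent, a, b)
            on_gossip(parent, b, a)
    return sorted(n for n in parent if parent[n] == n)   # step 8


# Nodes 1-3 and nodes 4-6 cannot reach each other: two partitions.
partitioned = [(1, 2), (2, 3), (4, 5), (5, 6)]
roots_when_split = run_gossip(range(1, 7), partitioned)
roots_when_healed = run_gossip(range(1, 7), partitioned + [(3, 4)])
```

In this sketch the number of self-parented nodes equals the number of groups of mutually reachable nodes, which is the property the monitor relies on in the following steps.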

In step S1200, the number of root nodes in the cluster is determined according to the received node information.

In this embodiment, the monitor may determine the number of root node IDs according to the received node information, that is, the number of root nodes in the cluster.

In step S1300, when the number of root nodes is greater than one, it is determined that a network partition failure occurs in the cluster.

In this embodiment, the number of root nodes is greater than one, which means that the cluster network includes at least two partitions, that is, a network partition failure occurs.
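Steps S1200 and S1300 amount to counting distinct root node IDs among the reports received in one observation window. A minimal sketch follows, assuming a hypothetical report shape in which each report is a dictionary with a `node_id` key; the function name is likewise illustrative.

```python
def detect_partition(reports):
    """Count distinct reporting root nodes and flag a possible partition.

    reports: iterable of dicts with a 'node_id' key (hypothetical message shape),
    all received from root nodes within the same observation window.
    """
    root_ids = {r["node_id"] for r in reports}   # repeated reports count once
    return len(root_ids), len(root_ids) > 1


healthy = detect_partition([{"node_id": "n1"}, {"node_id": "n1"}])
split = detect_partition([{"node_id": "n1"}, {"node_id": "n4"}])
```

Using a set means that a root node reporting several times within the window is still counted only once.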

In the method for processing the cluster fault in the embodiment, the monitor is used for acquiring the root node information of the cluster, and whether the cluster has the network partition fault is judged according to the root node information, so that the fault of the Redis cluster can be automatically found under the condition of network partition, the problem can be timely found, and the production accident can be avoided.

In one example, after determining that the cluster has a network partition failure, the method further comprises: sending a merging instruction to the root nodes of the cluster so as to enable more than one root node in the cluster to perform a merging operation; receiving the merging feedback information of more than one root node in the cluster; and confirming that the cluster has a network partition fault in the case that the merging feedback information indicates that the merging operation fails to be executed.

Through the process, the network partition fault can be confirmed again, so that the accuracy of partition fault detection is guaranteed.

As an example of reconfirming the network partition fault, after step 9 above, the method further includes the following steps:

10. If the Redis monitor receives information reported by only one root node within the same time period, the network is in a normal state and the nodes can communicate with one another. Monitoring continues.

11. If the Redis monitor receives information reported by two or more root nodes, a network partition fault may have occurred in the Redis cluster.

12. The Redis monitor issues Union requests to the plurality of root nodes.

13. A Union request carries the address information of another node B. Node A, on receiving the Union request, first pings node B. If the ping succeeds, the two nodes perform the Union operation, and the root node with the minimum nodeId is taken as the root node of the new connected graph.

14. If the Union operation succeeds, the root node is updated and Union success is returned; otherwise, Union failure is returned.

15. If all root nodes can be merged successfully, the network is normal, and the Redis cluster is updated so that all nodes belong to one connected graph.

16. If any Union operation fails, it is confirmed that a network partition has occurred.
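The reconfirmation in steps 12 to 16 can be sketched from the monitor's point of view. The sketch makes two assumptions that are not in the description: the monitor asks every other root to merge with the root that has the minimum nodeId, and `can_ping(a, b)` is a hypothetical callable standing in for node a pinging node b over the real network.

```python
def confirm_partition(root_ids, can_ping):
    """Return (partition_confirmed, per-root Union result).

    root_ids: IDs of the root nodes that reported to the monitor.
    can_ping: hypothetical callable (a, b) -> bool modelling the ping in step 13.
    """
    roots = sorted(set(root_ids))
    target = roots[0]                    # minimum nodeId becomes the merged root
    # Steps 13-14: each other root pings the target and reports success/failure.
    results = {r: can_ping(r, target) for r in roots[1:]}
    # Step 16: any Union failure confirms the partition; step 15 otherwise.
    confirmed = not all(results.values())
    return confirmed, results


reachable_pairs = {frozenset({1, 4})}    # roots 1 and 4 can reach each other


def can_ping(a, b):
    return frozenset({a, b}) in reachable_pairs


false_alarm = confirm_partition([1, 4], can_ping)
real_partition = confirm_partition([1, 4, 7], can_ping)
```

With roots 1 and 4 mutually reachable the merge succeeds and the earlier suspicion is cleared, whereas root 7 cannot reach root 1, so the partition is confirmed.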

In one example, in the event of a network partition failure, the method further comprises the following steps, by which the monitor detects the cluster state: sending a state reporting instruction to a plurality of nodes of the cluster so that each node of the plurality of nodes reports its own state information to the monitor, wherein the state information comprises at least one of a node identifier, a node address, a node type, an identifier of the root node to which the node belongs, information of the master node to which the node belongs, and an identifier of a corresponding hash slot; and receiving the state information sent by the plurality of nodes.

In one example of the monitor detecting the cluster state, the following steps 1-4 are specifically included.

1. The monitor informs all root nodes that a network partition has occurred, and each root node uses the Gossip protocol to propagate to the nodes it can reach the information that the cluster is currently in network partition mode.

2. After receiving the change information, all nodes switch to network partition mode and update their parent node information to the root node.

3. All nodes in network partition mode periodically report their states to the Redis monitor. A state includes the nodeId, the address, the root node to which the node belongs, and the node type (Master/Slave); a Master node also reports its hash slot information, and a Slave node also reports the nodeId of the Master node to which it belongs.

4. In network partition mode, the Redis cluster loses its automatic failover capability. Responsibility for fault discovery and failover is handed over from the Redis cluster to the Redis monitor, and the monitor is responsible for managing and maintaining the node metadata.

A Redis cluster does not use consistent hashing but instead introduces the concept of hash slots. A Redis cluster has 16384 built-in hash slots. When a key-value pair needs to be placed in the cluster, Redis computes a CRC16 checksum of the key and takes the result modulo 16384, so that each key corresponds to a hash slot numbered between 0 and 16383. Redis maps the hash slots to the different nodes in approximately equal numbers.
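The key-to-slot mapping can be illustrated with a short sketch. It assumes the CRC16 variant is the XMODEM one (polynomial 0x1021, initial value 0) defined in the Redis Cluster specification, which Python's `binascii.crc_hqx` computes when given an initial value of 0. Hash tags are deliberately left out, and `even_slot_ranges` is a hypothetical helper showing one way of splitting the slots roughly evenly; it is not claimed to match the ranges any particular tool would assign.

```python
from binascii import crc_hqx

HASH_SLOTS = 16384


def key_slot(key: str) -> int:
    """CRC16 of the key modulo 16384 (hash tags are not handled in this sketch)."""
    return crc_hqx(key.encode("utf-8"), 0) % HASH_SLOTS


def even_slot_ranges(node_ids):
    """Split slots 0..16383 into contiguous, near-equal ranges, one per node."""
    base, extra = divmod(HASH_SLOTS, len(node_ids))
    ranges, start = {}, 0
    for i, node in enumerate(node_ids):
        size = base + (1 if i < extra else 0)   # spread the remainder over the first nodes
        ranges[node] = range(start, start + size)
        start += size
    return ranges


slot_of_foo = key_slot("foo")
ranges = even_slot_ranges(["master-a", "master-b", "master-c"])
owner_of_foo = next(n for n, r in ranges.items() if slot_of_foo in r)
```

Because the slot depends only on the key, reassigning a slot from one node to another moves every key in that slot without rehashing, which is what the failover steps later rely on.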

In the above example, each node in the cluster originally maintains the cluster metadata, and the nodes broadcast updates to one another. Having the monitor maintain the metadata instead helps to avoid metadata inconsistency under a network partition fault and to avoid data loss during the network partition.

In one example, in the event of a network partition failure of the cluster, the method further comprises reassigning the master and slave nodes by the following steps: determining at least two network communication areas according to the identifier of the root node to which each node belongs, wherein the network communication areas correspond to the root nodes one to one; detecting whether each master node and the corresponding slave node are located in the same network communication area; and if a first master node and at least one corresponding slave node are located in a first network communication area and at least one slave node of the first master node is located in a second network communication area, sending a release instruction to the cluster to clear the master-slave relationship between the first master node and the slave node in the second network communication area.
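The grouping and detection steps above can be sketched as follows. The `NodeState` record and both function names are assumptions introduced for illustration; the record only models the reported fields that this check needs.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeState:
    node_id: str
    node_type: str                    # "master" or "slave"
    root_id: str                      # root node the node belongs to
    master_id: Optional[str] = None   # set for slave nodes only


def communication_areas(states):
    """One network communication area per root node: {root_id: {node ids}}."""
    areas = defaultdict(set)
    for s in states:
        areas[s.root_id].add(s.node_id)
    return dict(areas)


def slaves_to_release(states):
    """For each master that keeps at least one slave in its own area, list the
    slaves located in a different area: {master_id: [slave ids]}."""
    by_id = {s.node_id: s for s in states}
    local, remote = defaultdict(list), defaultdict(list)
    for s in states:
        if s.node_type != "slave" or s.master_id not in by_id:
            continue
        same_area = s.root_id == by_id[s.master_id].root_id
        (local if same_area else remote)[s.master_id].append(s.node_id)
    return {m: sorted(ids) for m, ids in remote.items() if local[m]}


states = [
    NodeState("m1", "master", "r1"),
    NodeState("s1", "slave", "r1", "m1"),
    NodeState("s2", "slave", "r2", "m1"),
    NodeState("m2", "master", "r2"),
    NodeState("s3", "slave", "r1", "m2"),
]
areas = communication_areas(states)
release = slaves_to_release(states)
```

In the sample data, master m1 keeps slave s1 in its own area, so s2 is a candidate for release; master m2 has no slave in its own area and is therefore not covered by this particular rule.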

In another example, the process of master-slave node assignment includes: in the case that a target master node and at least one of its target slave nodes are located in the same partition, keeping the target master node and that target slave node unchanged and releasing the slave nodes of the target master node that are located in other partitions; or, in the case that the target master node is located in a first partition and at least two slave nodes of the target master node are located in a second partition, releasing the target master node and reassigning the master node from among the at least two slave nodes of the target master node.

As an example, the master node is reassigned by the following steps.

1. If the Master node and all of its Slave nodes are in the same network partition, no change is made and the monitor continues to check the next Master node.

2. If the Master node and some of its Slave nodes are in different partitions, the monitor checks whether the number of Slave nodes in the same partition as the Master node is greater than or equal to 1. If so, the monitor releases the Slave node resources that are not in that partition and continues to check the next node.

3. If the Master node and all of its Slave nodes are in different partitions, the monitor checks whether the partition where the Master node is located has resources for a new Slave node. If so, the monitor adds a new Slave node there and releases the Slave resources in the other partitions.

4. If the Master node and all of its Slave nodes are in different partitions and the partition where the Master node is located has no resources for a new Slave node, the monitor checks whether the number of Slave nodes in some partition is greater than or equal to 2. If so, the monitor promotes the Slave node with the largest offset to Master node, so that the one-Master, multiple-Slave topology is preserved, and then releases the Master node resource and the Slave node resources in the remaining partitions.

5. If no partition contains 2 or more of the Slave nodes, the monitor checks whether the partition where a Slave node is located has resources to start a new node. If so, the monitor adds a new Slave node, promotes the original Slave node to Master node, and then releases the resources.

6. If none of the above conditions is met, the administrator is notified, for example by mail, to add machines.

7. After the Slave nodes are reassigned, the monitor broadcasts ping messages to the cluster to update the information of all Master nodes and Slave nodes.
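The decision order in steps 1 to 6 can be summarised as a small planning function. This is a sketch under assumptions: the action labels are invented names, `has_spare(area)` is a hypothetical check for free resources in a partition, and the Master node is assumed to have at least one Slave node.

```python
from collections import Counter


def plan_for_master(master_area, slave_areas, has_spare):
    """Pick the action for one Master node.

    master_area: partition of the Master; slave_areas: partition of each of its Slaves;
    has_spare: hypothetical callable (area) -> bool, True if a new node can be started there.
    """
    if all(a == master_area for a in slave_areas):
        return "keep"                                           # step 1
    if any(a == master_area for a in slave_areas):
        return "release_remote_slaves"                          # step 2
    if has_spare(master_area):
        return "add_local_slave_and_release_remote_slaves"      # step 3
    if any(count >= 2 for count in Counter(slave_areas).values()):
        return "promote_max_offset_slave_and_release_master"    # step 4
    if any(has_spare(a) for a in slave_areas):
        return "add_slave_then_promote_existing_slave"          # step 5
    return "notify_administrator"                               # step 6


def no_spare(area):
    return False


plans = [
    plan_for_master("p1", ["p1", "p1"], no_spare),
    plan_for_master("p1", ["p1", "p2"], no_spare),
    plan_for_master("p1", ["p2", "p2"], lambda a: a == "p1"),
    plan_for_master("p1", ["p2", "p2"], no_spare),
    plan_for_master("p1", ["p2", "p3"], lambda a: a == "p3"),
    plan_for_master("p1", ["p2", "p3"], no_spare),
]
```

The function only chooses an action; issuing the corresponding release, promotion, or notification, and the broadcast in step 7, is left to the caller.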

In the above example, the master and slave nodes are automatically reassigned during network partitioning, saving labor and time and ensuring the availability of the cluster in the partitioned state.

In one example, in the event of a network partition failure of the cluster, the method further comprises the following failover steps: periodically sending a connectivity detection instruction to each node of the cluster at a preset frequency; determining, according to each node's response to the connectivity detection instruction, whether any of the nodes in the cluster is offline; in the case that an offline node exists in the cluster and the node type of the offline node is master node, determining a new master node from the slave nodes of the offline node; sending a hash slot clearing instruction to the cluster to clear the correspondence between the offline node and its hash slots; and sending a hash slot distribution instruction to the cluster to establish a correspondence between the hash slots of the offline node and the slave node with the maximum offset.

In the above example, determining a new master node from the slave nodes of the downline nodes includes: sending an offset reporting instruction to a slave node of a downlink node so that the slave node feeds back the offset of self synchronous data; and determining the slave node with the maximum offset from the slave nodes of the downline nodes, and sending a type conversion instruction to the slave node with the maximum offset so as to convert the slave node with the maximum offset into the master node.
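As a rough sketch of the two determinations described above (which masters are offline, and which slave replaces an offline master), assuming the monitor has already collected the connectivity responses and the reported offsets into plain dictionaries. The function names and the dictionary layout are illustrative assumptions, not part of the original text.

```python
from typing import Dict, List, Optional


def find_offline_masters(
    responded: Dict[str, bool], node_types: Dict[str, str]
) -> List[str]:
    """Nodes that did not answer the connectivity check and are master nodes."""
    return [
        node_id
        for node_id, ok in responded.items()
        if not ok and node_types.get(node_id) == "master"
    ]


def elect_new_master(offsets: Dict[str, int]) -> Optional[str]:
    """Pick the slave with the maximum reported offset, or None if there is none."""
    if not offsets:
        return None
    return max(offsets, key=lambda node_id: offsets[node_id])
```

The slave with the largest offset has replicated the most data from the failed master, so promoting it loses the least data.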

As an example, the process of failover specifically includes the following steps.

1. When the monitor finds that communication with a node has failed continuously for the cluster-node-timeout period, the node is considered faulty and is marked as down.

2. If the node is a Slave node, the monitor takes no action; after the Slave node recovers, it automatically synchronizes with its Master node and makes a full copy of the Master's existing data.

3. If the node is a Master node, the monitor issues a qualification check request to all Slave nodes under that Master node, and the Slave nodes report their offsets to the monitor.

4. The monitor selects the Slave node with the maximum offset to replace the Master node, and that Slave node is promoted to Master node.

5. The clusterDelSlot operation is executed to revoke the slots that the failed Master node was responsible for, and the clusterAddSlot operation is executed to delegate those slots to the Slave node.

6. The monitor broadcasts a pong message to the cluster, informing all nodes in the cluster that the Slave node has become a Master node and has taken over the slots of the failed Master.

In the above example, failover is automatically performed during network partitioning, saving labor and time, and ensuring that the cluster is highly available.
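Steps 1 and 5 of the failover process can be illustrated with an in-memory model. This sketch does not call Redis; the `slot_owner` dictionary stands in for the cluster's slot table, and the delete-then-add sequence only mirrors the clusterDelSlot and clusterAddSlot operations named above. All function and parameter names are hypothetical.

```python
from typing import Dict


def is_down(last_reply: float, now: float, node_timeout: float) -> bool:
    """Step 1: a node is marked down once it has been silent longer than the timeout."""
    return now - last_reply > node_timeout


def hand_over_slots(
    slot_owner: Dict[int, str], failed_master: str, new_master: str
) -> int:
    """Step 5: revoke the failed Master's slots and delegate them to the new Master."""
    moved = 0
    for slot, owner in list(slot_owner.items()):
        if owner == failed_master:
            del slot_owner[slot]           # analogous to clusterDelSlot
            slot_owner[slot] = new_master  # analogous to clusterAddSlot
            moved += 1
    return moved
```

Slots owned by other Master nodes are left untouched, so only the failed Master's share of the key space changes hands.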

In one example, when the cluster has a network partition failure, the method further comprises the following cluster recovery step: when the merging feedback information shows that the merging operation was executed successfully, sending a network partition recovery instruction to the cluster so that the cluster returns to the normal operation state.

As an example, the process of cluster recovery includes the following steps.

1. When the monitor finds that all nodes have returned to the same root node, this indicates that the network partition has recovered.

2. The monitor issues a network partition recovery request to the root node. The root node propagates the normal state in its messages according to the Gossip protocol, and every node in the connected graph switches to the normal state after receiving the message.

3. The monitor hands fault discovery and automatic failover back to the cluster's own self-management, and the cluster returns to its normal running state.

In the above example, the monitor handles recovery once the network partition has healed and hands control back to the cluster's self-management, which helps the cluster return to the normal state promptly.

In the embodiment, the Redis monitor detects whether a network partition exists in the Redis cluster through a Union-Find algorithm, and if the network partition exists, the fault discovery and fault transfer responsibility is handed over from the Redis cluster to the Redis monitor. The Redis monitor is simultaneously responsible for cluster metadata maintenance, and if the Master node and the Slave node are found to be in an unconnected network, the Slave node is reassigned to ensure that the Slave and the Master are in the same network environment. The monitor is responsible for monitoring node failures, and when a node failure occurs, the monitor is responsible for performing failover as a leader and notifying all nodes. When the monitor detects that the network partition is recovered, the fault discovery and fault transfer responsibilities are handed over from the Redis monitor back to the Redis cluster, and the elastic recovery is ensured.
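The handover of responsibility between the cluster and the monitor can be pictured as a two-state machine driven by the number of distinct root nodes observed. The following is a minimal sketch under that assumption; the `Owner` enum, the `HandoverTracker` class and the event names are hypothetical.

```python
from enum import Enum
from typing import Iterable, List


class Owner(Enum):
    CLUSTER = "cluster"  # normal state: the cluster manages failover itself
    MONITOR = "monitor"  # partitioned state: the monitor manages failover


class HandoverTracker:
    def __init__(self) -> None:
        self.owner = Owner.CLUSTER
        self.events: List[str] = []

    def observe(self, root_ids: Iterable[str]) -> Owner:
        """Update the owner from the root identifiers reported in one round."""
        partitioned = len(set(root_ids)) > 1
        if partitioned and self.owner is Owner.CLUSTER:
            self.owner = Owner.MONITOR
            self.events.append("takeover")
        elif not partitioned and self.owner is Owner.MONITOR:
            self.owner = Owner.CLUSTER
            self.events.append("handback")
        return self.owner
```

Recording only the transitions keeps repeated observations of the same state from triggering a second takeover or handback.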

The present embodiment further provides another method for processing a cluster fault, which is implemented by a first node in a cluster and includes the following steps: receiving a parent node identifier sent by a second node in the cluster, wherein the parent node identifier of each node is initially the node's own identifier; comparing the parent node identifier sent by the second node with the first node's own parent node identifier in order to update its own parent node identifier; after at least one updating step, if its own parent node identifier is the same as its own identifier, determining that it is a root node; and, when it is a root node, sending its own node information to the monitor.
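The node-side procedure can be sketched as a simple label propagation. The comparison rule is not spelled out in this passage, so the sketch assumes that the smaller identifier wins; that rule, the `Node` class and the `gossip_until_stable` helper are illustrative assumptions only.

```python
from typing import Dict, Iterable, Tuple


class Node:
    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.parent_id = node_id  # initially each node is its own parent

    def receive(self, other_parent_id: str) -> bool:
        """Compare a received parent identifier with our own; return True if updated."""
        # Assumption: the smaller identifier is adopted as the parent.
        if other_parent_id < self.parent_id:
            self.parent_id = other_parent_id
            return True
        return False

    def is_root(self) -> bool:
        return self.parent_id == self.node_id


def gossip_until_stable(nodes: Dict[str, Node], links: Iterable[Tuple[str, str]]) -> None:
    """Exchange parent identifiers over reachable links until nothing changes."""
    links = list(links)
    changed = True
    while changed:
        changed = False
        for a, b in links:
            changed |= nodes[a].receive(nodes[b].parent_id)
            changed |= nodes[b].receive(nodes[a].parent_id)
```

With this rule, each connected group of nodes converges on exactly one node whose parent identifier equals its own identifier, so the number of root nodes equals the number of partitions.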

< apparatus embodiment >

The embodiment provides a processing apparatus for a cluster fault, where a cluster includes a plurality of nodes, the plurality of nodes includes at least one master node, and the master node corresponds to at least one slave node, where the apparatus is applied to a monitor outside the cluster, and includes a first receiving module, a first processing module, and a second processing module.

The first receiving module is used for receiving node information reported by a root node of a cluster, wherein the root node is determined by mutual communication among nodes in the cluster.

And the first processing module is used for determining the number of the root nodes in the cluster according to the received node information.

And the second processing module is used for determining that the cluster has a network partition fault under the condition that the number of the root nodes is more than one.

In one example, the apparatus further comprises a review module to: sending a merging instruction to the root nodes of the cluster so as to enable more than one root node in the cluster to perform merging operation; receiving the combined feedback information of more than one root node in the cluster; and confirming that the cluster has a network partition fault in the case that the merging feedback information indicates that the merging operation fails to be executed.
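The review step amounts to a two-phase check: more than one root node makes a partition suspected, and a failed merge confirms it. A minimal sketch follows, in which `try_merge` is a hypothetical callback standing in for sending the merging instruction and reading the merging feedback information.

```python
from typing import Callable, Iterable, List


def confirm_partition(
    root_ids: Iterable[str], try_merge: Callable[[List[str]], bool]
) -> bool:
    """Return True only if several roots exist and they cannot be merged."""
    roots = sorted(set(root_ids))
    if len(roots) <= 1:
        return False            # a single root: no partition suspected
    merged = try_merge(roots)   # ask the root nodes to merge
    return not merged           # a failed merge confirms the partition
```

If the merge succeeds, the multiple roots were only a transient view, and the cluster is not treated as partitioned.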

In one example, the apparatus further comprises a state collection module to: sending a state reporting instruction to a plurality of nodes of the cluster so that each node of the plurality of nodes reports state information of the node to a monitor, wherein the state information comprises at least one of node identification, node address, node type, identification of a root node to which the node belongs, information of a main node to which the node belongs and identification of a corresponding hash slot; and receiving the state information sent by the plurality of nodes.

In one example, the apparatus further comprises a master-slave control module to: determining at least two network communication areas according to the identifier of the root node to which each node belongs, wherein the network communication areas correspond to the root nodes one to one; detecting whether each main node and the corresponding slave node are located in the same network communication area; and if the first main node and the at least one corresponding slave node are positioned in the first network communication area and the at least one slave node of the first main node is positioned in the second network communication area, sending a release instruction to the cluster to clear the master-slave relationship between the slave node and the first main node in the second network communication area.
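The check performed by the master-slave control module can be sketched over the reported state information. The `NodeState` record below carries a subset of the state fields listed above; its field names and the `slaves_to_release` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeState:
    node_id: str
    node_type: str                   # "master" or "slave"
    root_id: str                     # root node, i.e. network communication area
    master_id: Optional[str] = None  # set for slave nodes only


def slaves_to_release(states: List[NodeState]) -> List[str]:
    """Slaves in a different area from a master that still has a slave in its own area."""
    area_of = {s.node_id: s.root_id for s in states}
    slaves = [s for s in states if s.node_type == "slave" and s.master_id in area_of]
    # Masters that keep at least one slave in their own communication area.
    covered = {s.master_id for s in slaves if area_of[s.master_id] == s.root_id}
    return [
        s.node_id
        for s in slaves
        if s.master_id in covered and area_of[s.master_id] != s.root_id
    ]
```

A master that has no slave left in its own area is deliberately excluded here, because that situation calls for reassignment or promotion rather than a simple release.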

In one example, the apparatus further comprises a failover module, the failover module comprising: a sending unit for periodically sending a connectivity detection instruction to each node of the cluster at a preset frequency; a detection unit for determining, from each node's response to the connectivity detection instruction, whether any of the nodes in the cluster is offline; and a master selecting unit for determining a new master node from the slave nodes of the offline node when an offline node exists in the cluster and its node type is master node.

In one example, the master selecting unit further comprises: an offset acquisition subunit for sending an offset reporting instruction to the slave nodes of the offline node so that each slave node feeds back the offset of its own synchronized data; and a master-slave conversion subunit for determining the slave node with the maximum offset among the slave nodes of the offline node and sending a type conversion instruction to that slave node to convert it into the master node.

In one example, the apparatus further comprises a clearing module to: send a hash slot clearing instruction to the cluster to clear the correspondence between the offline node and its hash slots; and send a hash slot allocation instruction to the cluster to establish a correspondence between the hash slots of the offline node and the slave node with the maximum offset.

In one example, the apparatus further comprises a recovery module to: and under the condition that the merging feedback information shows that the merging operation is successfully executed, sending a network partition recovery instruction to the cluster so as to enable the cluster to recover to a normal operation state.

The present embodiment further provides a device for processing a cluster fault, which is applied to a first node in a cluster and includes: a receiving module for receiving a parent node identifier sent by a second node in the cluster, wherein the parent node identifier of each node is initially the node's own identifier; an updating module for comparing the parent node identifier sent by the second node with the first node's own parent node identifier in order to update its own parent node identifier; a determining module for determining that the first node is a root node if, after at least one updating step, its own parent node identifier is the same as its own identifier; and a sending module for sending the first node's own node information to the monitor when the first node is a root node.

The processing apparatus for cluster failure in this embodiment can implement each step described in the method embodiment of the present invention, and can also implement the same technical effect, which is not described herein again.

< electronic device embodiment >

The embodiment provides an electronic device, which includes a processor and a memory, where the memory stores machine executable instructions capable of being executed by the processor, and the processor executes the machine executable instructions to implement the method for handling cluster faults described in the method embodiment of the present invention.

The electronic device in this embodiment can implement each step described in the method embodiment of the present invention, and can also implement the same technical effect, which is not described herein again.

The present invention may be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therewith for causing a processor to implement various aspects of the present invention.

The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.

The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.

The computer program instructions for carrying out operations of the present invention may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present invention are implemented by personalizing an electronic circuit, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), with state information of computer-readable program instructions, which can execute the computer-readable program instructions.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, by software, and by a combination of software and hardware are equivalent.

Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims

1. A method for handling a cluster failure, wherein the cluster includes a plurality of nodes, and wherein the plurality of nodes includes at least one master node corresponding to at least one slave node, and wherein the method is implemented by a monitor outside the cluster, and comprises:

2. The method of claim 1, wherein after the determining that the cluster has a network partition failure, the method further comprises:

3. The method of claim 1, wherein after the determining that the cluster has a network partition failure, the method further comprises:

and receiving the state information sent by the plurality of nodes.

4. The method of claim 3, wherein after said receiving the status information sent by the plurality of nodes, the method further comprises:

5. The method of claim 3, wherein after said receiving the status information sent by the plurality of nodes, the method further comprises:

6. The method of claim 5, wherein determining a new master node from the slave nodes of the offline node comprises:

7. The method of claim 5, further comprising:

8. The method of claim 1, further comprising:

9. A method for handling cluster failures, implemented by a first node in a cluster, includes:

in case of itself being a root node, sending its own node information to the monitor of claim 1.

10. An apparatus for processing cluster failure, wherein the cluster comprises a plurality of nodes, and the plurality of nodes comprises at least one master node corresponding to at least one slave node, and wherein the apparatus is applied to a monitor outside the cluster, and comprises:

11. The device for processing the cluster fault is applied to a first node in a cluster and comprises the following components:

a sending module, configured to send node information of itself to the monitor according to claim 1, if the node is a root node.

12. An electronic device comprising a memory storing executable commands and a processor that, when executing the executable commands, implements the method of any one of claims 1-9.