CN111901422A - Method, system and device for managing nodes in cluster - Google Patents

Method, system and device for managing nodes in cluster

Info

Publication number
CN111901422A
Authority
CN
China
Prior art keywords
node
cluster
nodes
fault
grouping
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010738723.7A
Other languages
Chinese (zh)
Other versions
CN111901422B (en
Inventor
李二明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd
Priority to CN202010738723.7A
Publication of CN111901422A
Application granted
Publication of CN111901422B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/2866 Architectures; Arrangements
    • H04L67/30 Profiles

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention discloses a method, a system and a device for managing nodes in a cluster. When a deployment instruction representing a cluster grouping mode is received, the grouping of each node in the cluster is determined according to a preset cluster grouping deployment strategy. The configuration file corresponding to each node is then configured according to that grouping. After the configuration files have been configured, each node in the cluster is restarted so that the cluster grouping mode takes effect. In the cluster grouping mode, a target node monitors the node states of the remaining nodes in its group through message passing with those nodes. In a large-scale cluster, the nodes can therefore be deployed in the cluster grouping mode so that only nodes in the same group need to monitor one another, which favors stable monitoring of node states and makes misjudgment less likely. The cluster grouping mode also helps identify faulty nodes in the cluster, so that a faulty node does not affect the service.

Description

Method, system and device for managing nodes in cluster
Technical Field
The present invention relates to the field of cluster node management, and in particular, to a method, a system, and an apparatus for managing nodes in a cluster.
Background
At present, one method for monitoring node states in a cluster is as follows: each node in the cluster establishes a TCP (Transmission Control Protocol) connection with every other node, and each node judges whether the states of all other nodes are normal through message passing with them. When the cluster is small, the message traffic among nodes is modest. When the cluster is large, for example when the number of nodes reaches hundreds or even thousands, the message traffic among nodes becomes very heavy, which hinders stable monitoring of node states and easily causes misjudgment. Moreover, in a large-scale cluster it is difficult to identify a faulty node, and the consequences are serious if the service is affected.
Therefore, how to provide a solution to the above technical problem is a problem that needs to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to provide a method, a system and a device for managing nodes in a cluster. In a large-scale cluster, the nodes can be deployed in a cluster grouping mode in which only nodes in the same group need to monitor one another, which favors stable monitoring of node states and makes misjudgment less likely. The cluster grouping mode also helps identify faulty nodes in the cluster, so that a faulty node does not affect the service.
In order to solve the above technical problem, the present invention provides a method for managing nodes in a cluster, including:
when a deployment instruction representing a cluster grouping mode is received, determining the grouping condition of each node in a cluster according to a preset cluster grouping deployment strategy;
configuring configuration files corresponding to the nodes in the cluster according to the grouping condition of the nodes in the cluster; the target configuration file corresponding to the target node represents a specific node in the same group with the target node; the target node is any node in the cluster;
restarting each node in the cluster after configuration files corresponding to each node in the cluster are configured, so that the cluster grouping mode takes effect; wherein, in the cluster grouping mode, the target node is configured to monitor node states of the remaining nodes through message passing with the remaining nodes of the same group.
Preferably, the process of determining the grouping condition of each node in the cluster according to the preset cluster grouping deployment policy includes:
determining the grouping condition of each node in the cluster based on a cluster grouping deployment strategy that divides nodes belonging to the same network segment and/or corresponding to the same storage pool into the same group.
Preferably, the method for managing nodes in the cluster further includes:
when a node in the cluster fails, electing a master node from the normal nodes in the faulty node's group according to a preset election mechanism;
judging whether the master node holds a distributed lock;
if yes, executing the fault handling operation for the faulty node;
if not, triggering the master node to send distributed lock acquisition requests to the other normal nodes in the same group, and judging whether the total number of nodes that reply to the master node in response to those requests is greater than a preset reply number threshold;
if so, determining that the master node has successfully acquired the distributed lock, and executing the fault handling operation for the faulty node;
if not, determining that the master node is a false master node, prohibiting the false master node from entering a connection state within a preset duration, and re-executing the operation of electing a master node from the normal nodes in the faulty node's group according to the preset election mechanism.
Preferably, the process of performing the fault handling operation of the faulty node includes:
performing data recovery on the database of the faulty node, and synchronizing the database contents of the normal nodes in the faulty node's group based on the recovered database; wherein the database contents of the nodes in the same group are the same;
releasing the virtual IP of the faulty node, and reallocating it to a normal node in the faulty node's group based on a load balancing strategy, so that the normal node takes over the faulty node's tasks.
Preferably, the process of executing the fault handling operation of the faulty node further includes:
notifying all normal nodes in the faulty node's group of the fault information of the faulty node.
Preferably, the method for managing nodes in the cluster further includes:
when a deployment instruction representing a cluster mode is received, all nodes in the cluster are divided into the same group;
configuring a configuration file corresponding to each node in the cluster according to the condition that each node in the cluster belongs to the same group;
restarting each node in the cluster after configuration files corresponding to each node in the cluster are configured, so that the cluster mode takes effect; wherein, in the cluster mode, the target node is configured to monitor node states of the remaining nodes in the cluster through message passing with the remaining nodes.
In order to solve the above technical problem, the present invention further provides a management system for nodes in a cluster, including:
the first grouping module is used for determining the grouping condition of each node in the cluster according to a preset cluster grouping deployment strategy when a deployment instruction for representing a cluster grouping mode is received;
the first configuration module is used for configuring configuration files corresponding to the nodes in the cluster according to the grouping condition of the nodes in the cluster; the target configuration file corresponding to the target node represents a specific node in the same group with the target node; the target node is any node in the cluster;
the first restarting module is used for restarting each node in the cluster after the configuration files corresponding to each node in the cluster are configured, so that the cluster grouping mode takes effect; wherein, in the cluster grouping mode, the target node is configured to monitor node states of the remaining nodes through message passing with the remaining nodes of the same group.
Preferably, the management system of the nodes in the cluster further includes:
the election module is used for electing a master node from the normal nodes in the faulty node's group according to a preset election mechanism when a node in the cluster fails;
the judging module is used for judging whether the master node holds the distributed lock; if yes, executing the processing module; if not, executing the acquisition module;
the processing module is used for executing the fault handling operation for the faulty node;
the acquisition module is used for triggering the master node to send distributed lock acquisition requests to the other normal nodes in the same group, and judging whether the total number of nodes that reply to the master node in response to those requests is greater than a preset reply number threshold; if yes, executing the processing module; if not, executing the prohibiting module;
the prohibiting module is used for determining that the master node is a false master node, prohibiting the false master node from entering a connection state within a preset duration, and re-executing the election module.
Preferably, the management system of the nodes in the cluster further includes:
the second grouping module is used for dividing all nodes in the cluster into the same group when a deployment instruction for representing a cluster mode is received;
the second configuration module is used for configuring configuration files corresponding to the nodes in the cluster according to the condition that the nodes in the cluster belong to the same group;
the second restarting module is used for restarting each node in the cluster after the configuration files corresponding to each node in the cluster are configured, so that the cluster mode takes effect; wherein, in the cluster mode, the target node is configured to monitor node states of the remaining nodes in the cluster through message passing with the remaining nodes.
In order to solve the above technical problem, the present invention further provides a management apparatus for nodes in a cluster, including:
a memory for storing a computer program;
a processor for implementing the steps of any one of the above methods for managing nodes in a cluster when executing the computer program.
The invention provides a method for managing nodes in a cluster. When a deployment instruction representing a cluster grouping mode is received, the grouping of each node in the cluster is determined according to a preset cluster grouping deployment strategy. The configuration file corresponding to each node is then configured according to that grouping, where the target configuration file corresponding to a target node identifies the specific nodes in the same group as the target node. After the configuration files have been configured, each node in the cluster is restarted so that the cluster grouping mode takes effect. In the cluster grouping mode, the target node monitors the node states of the remaining nodes in its group through message passing with those nodes. In a large-scale cluster, the nodes can therefore be deployed in the cluster grouping mode so that only nodes in the same group need to monitor one another, which favors stable monitoring of node states and makes misjudgment less likely. The cluster grouping mode also helps identify faulty nodes in the cluster, so that a faulty node does not affect the service.
The invention also provides a system and a device for managing the nodes in the cluster, and the system and the device have the same beneficial effects as the management method.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the prior art and the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for managing nodes in a cluster according to an embodiment of the present invention;
Fig. 2 is a deployment scheme of a cluster grouping mode according to an embodiment of the present invention;
Fig. 3 is a flowchart of distributed lock acquisition in a cluster grouping mode according to an embodiment of the present invention;
Fig. 4 is a deployment scheme of a cluster mode according to an embodiment of the present invention.
Detailed Description
The core of the invention is to provide a method, a system and a device for managing nodes in a cluster. In a large-scale cluster, the nodes can be deployed in a cluster grouping mode in which only nodes in the same group need to monitor one another, which favors stable monitoring of node states and makes misjudgment less likely. The cluster grouping mode also helps identify faulty nodes in the cluster, so that a faulty node does not affect the service.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart illustrating a method for managing nodes in a cluster according to an embodiment of the present invention.
The management method of the nodes in the cluster comprises the following steps:
Step S1: when a deployment instruction representing a cluster grouping mode is received, determining the grouping condition of each node in the cluster according to a preset cluster grouping deployment strategy.
It should be noted that "preset" in the present application means set in advance: a preset item only needs to be set once and does not need to be reset unless it has to be modified according to the actual situation.
Specifically, the cluster grouping mode is defined for the nodes in the cluster (each node in the cluster is able to establish TCP connections with all other nodes). In this mode the nodes in the cluster are divided into a plurality of groups, as shown in fig. 2 (taking nodes n1-n9 as an example). Only the nodes within the same group need to monitor one another, and nodes in different groups are independent of each other.
Based on this, the cluster grouping deployment strategy for guiding the grouping of the nodes in the cluster needs to be set in advance, so that when a deployment instruction representing a cluster grouping mode is received, the nodes in the cluster are grouped and divided according to the set cluster grouping deployment strategy, and thus the grouping condition of each node in the cluster is determined.
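The effect of grouping on monitoring traffic can be illustrated with a small calculation. The sketch below counts pairwise monitoring links for a full mesh and for a grouped layout. The function names are hypothetical, and the split of the nine nodes of fig. 2 into three groups of three is an assumption made only for illustration, since the group sizes are not stated in the text.

```python
def full_mesh_links(node_count: int) -> int:
    """Pairwise links when every node monitors every other node."""
    return node_count * (node_count - 1) // 2


def grouped_links(group_sizes: list[int]) -> int:
    """Pairwise links when nodes only monitor peers inside their own group."""
    return sum(full_mesh_links(size) for size in group_sizes)


if __name__ == "__main__":
    # Nine nodes as in fig. 2; three groups of three is an assumed split.
    print("full mesh:", full_mesh_links(9))
    print("grouped:  ", grouped_links([3, 3, 3]))
```

Because the link count grows quadratically with the number of nodes that monitor one another, splitting a cluster into groups of bounded size keeps the per-group monitoring load roughly constant as the cluster grows.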
Step S2: configuring a configuration file corresponding to each node in the cluster according to the grouping condition of each node in the cluster; and the target configuration file corresponding to the target node represents a specific node in the same group with the target node.
It should be noted that the target node in the present application is any node in the cluster.
Specifically, each node in the cluster corresponds to one configuration file. Taking the target node as an example, the role of the configuration file is as follows: the target configuration file corresponding to the target node identifies the specific nodes in the same group as the target node, that is, the target configuration file shows which nodes the target node monitors.
Based on this, the configuration file corresponding to each node in the cluster can be configured according to the grouping condition of each node, so that nodes in the same group subsequently monitor one another while nodes in different groups remain independent.
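As a rough sketch of step S2, the per-node configuration can be derived from the grouping result by listing, for each node, the other members of its group. The function name `build_peer_configs` and the in-memory dictionary format are assumptions for illustration; the patent does not specify the configuration file format.

```python
def build_peer_configs(groups: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each node to the list of same-group peers it should monitor.

    `groups` maps a hypothetical group name to the node names in that group.
    """
    configs: dict[str, list[str]] = {}
    for members in groups.values():
        for node in members:
            # A node monitors every member of its own group except itself.
            configs[node] = [peer for peer in members if peer != node]
    return configs


if __name__ == "__main__":
    example = {"group1": ["n1", "n2", "n3"], "group2": ["n4", "n5"]}
    for node, peers in build_peer_configs(example).items():
        print(node, "monitors", peers)
```

In a real deployment each entry would be written to that node's configuration file before the restart in step S3.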
Step S3: after configuration files corresponding to the nodes in the cluster are configured, restarting the nodes in the cluster to enable the cluster grouping mode to take effect; wherein, in the cluster grouping mode, the target node is used for monitoring the node states of the other nodes through message transmission with the other nodes in the same group.
Specifically, after the configuration files corresponding to the nodes in the cluster have been configured, the nodes need to be restarted; once they have restarted, the cluster grouping mode takes effect. Taking the target node as an example, node state monitoring in the cluster grouping mode works as follows: the target node monitors the node states of the remaining nodes in its group through message passing with them. Specifically, within a group, node state monitoring is carried out by the CTDB (Cluster Trivial Database) service running on each node.
The invention provides a method for managing nodes in a cluster. When a deployment instruction representing a cluster grouping mode is received, the grouping of each node in the cluster is determined according to a preset cluster grouping deployment strategy. The configuration file corresponding to each node is then configured according to that grouping, where the target configuration file corresponding to a target node identifies the specific nodes in the same group as the target node. After the configuration files have been configured, each node in the cluster is restarted so that the cluster grouping mode takes effect. In the cluster grouping mode, the target node monitors the node states of the remaining nodes in its group through message passing with those nodes. In a large-scale cluster, the nodes can therefore be deployed in the cluster grouping mode so that only nodes in the same group need to monitor one another, which favors stable monitoring of node states and makes misjudgment less likely. The cluster grouping mode also helps identify faulty nodes in the cluster, so that a faulty node does not affect the service.
On the basis of the above-described embodiment:
as an optional embodiment, the process of determining the grouping condition of each node in the cluster according to the preset cluster grouping deployment policy includes:
determining the grouping condition of each node in the cluster based on a cluster grouping deployment strategy that divides nodes belonging to the same network segment and/or corresponding to the same storage pool into the same group.
Specifically, there are three cluster grouping deployment strategies: 1) Nodes belonging to the same network segment are divided into the same group, which facilitates later maintenance. 2) Nodes corresponding to the same storage pool are divided into the same group, which avoids data loss caused by discontinuous data storage during subsequent failover; for example, if a cluster has 200 nodes and every 40 nodes share one storage pool, the nodes can be divided into five groups, with the nodes of each storage pool forming one group. 3) Nodes that both belong to the same network segment and correspond to the same storage pool are divided into the same group.
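The three strategies above can be sketched as one grouping function with two switches. This is a minimal illustration, not the patented implementation: the `Node` record, the `group_nodes` function, and the use of a /24 prefix to define a "network segment" are all assumptions.

```python
import ipaddress
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    ip: str
    storage_pool: str


def group_nodes(nodes, by_segment=True, by_pool=True, prefix_len=24):
    """Group node names by network segment and/or storage pool.

    by_segment only -> strategy 1, by_pool only -> strategy 2, both -> strategy 3.
    """
    groups = defaultdict(list)
    for node in nodes:
        key = []
        if by_segment:
            # strict=False lets us pass a host address and get its network back.
            net = ipaddress.ip_network(f"{node.ip}/{prefix_len}", strict=False)
            key.append(str(net))
        if by_pool:
            key.append(node.storage_pool)
        groups[tuple(key)].append(node.name)
    return dict(groups)


if __name__ == "__main__":
    cluster = [
        Node("n1", "10.0.1.5", "poolA"),
        Node("n2", "10.0.1.6", "poolA"),
        Node("n3", "10.0.2.7", "poolA"),
        Node("n4", "10.0.2.8", "poolB"),
    ]
    print(group_nodes(cluster, by_segment=True, by_pool=False))
    print(group_nodes(cluster, by_segment=False, by_pool=True))
    print(group_nodes(cluster))
```

Using a composite key means the combined strategy falls out of the same code path: two nodes share a group only when every enabled criterion matches.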
As an optional embodiment, the method for managing nodes in a cluster further includes:
when a node in the cluster fails, electing a master node from the normal nodes in the faulty node's group according to a preset election mechanism;
judging whether the master node holds a distributed lock;
if yes, executing the fault handling operation for the faulty node;
if not, triggering the master node to send distributed lock acquisition requests to the other normal nodes in the same group, and judging whether the total number of nodes that reply to the master node in response to those requests is greater than a preset reply number threshold;
if so, determining that the master node has successfully acquired the distributed lock, and executing the fault handling operation for the faulty node;
if not, determining that the master node is a false master node, prohibiting the false master node from entering a connection state within a preset duration, and re-executing the operation of electing a master node from the normal nodes in the faulty node's group according to the preset election mechanism.
Further, referring to fig. 3, when a node in the cluster fails, the present application elects a single master node from the normal nodes in the faulty node's group according to a preset election mechanism (for example, a mechanism in which the node with the earliest start time becomes the master node). If the elected node was already the true master node before the election, it holds the distributed lock (a locking authority held by the true master node).
Based on this, after a single master node has been elected from the normal nodes in the faulty node's group, it is judged whether the elected master node holds the distributed lock. If it does, the master node is determined to be the true master node and the fault handling operation for the faulty node can proceed. If it does not, the master node is triggered to send distributed lock acquisition requests to the other normal nodes in the same group, which reply to the master node after receiving the request. It is then judged whether the total number of nodes that reply is greater than a preset reply number threshold (generally set to 1/2 of the number of all normal nodes in the master node's group). If the number of replies is greater than the threshold, the master node is determined to have acquired the distributed lock, its role as master node takes effect, and the fault handling operation for the faulty node proceeds. If the number of replies is not greater than the threshold, the master node is determined to be a false master node and the election result is invalidated. The false master node is prohibited from entering the connection state within a preset duration (only normal nodes in the group can be elected as the master node, so this prohibition means that the false master node cannot be elected as the master node for a period of time). The election is then restarted, that is, the operation of electing a master node from the normal nodes in the faulty node's group according to the preset election mechanism is re-executed, until a true master node is elected.
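The election and lock confirmation loop can be sketched as follows. This is a simplified single-process model under stated assumptions: the `Node` record, `elect_master`, `confirm_master`, and the `replies_to` callback (which stands in for sending lock acquisition requests and counting replies) are hypothetical names, the election rule is the "earliest start time" example from the text, and a false master stays excluded for the remainder of the call rather than for a timed duration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    name: str
    start_time: float
    holds_lock: bool = False


def elect_master(candidates: list[Node]) -> Node:
    """Example election rule: the node with the earliest start time wins."""
    return min(candidates, key=lambda n: n.start_time)


def confirm_master(
    normal_nodes: list[Node],
    replies_to: Callable[[Node], int],
    max_rounds: int = 10,
) -> Optional[Node]:
    """Elect a master and confirm it via the distributed lock.

    replies_to(master) returns how many peers replied to its lock request.
    """
    banned: set[str] = set()              # false masters, excluded from re-election
    threshold = len(normal_nodes) / 2     # reply threshold: half of the normal nodes
    for _ in range(max_rounds):
        candidates = [n for n in normal_nodes if n.name not in banned]
        if not candidates:
            return None
        master = elect_master(candidates)
        if master.holds_lock:
            return master                 # already the true master
        if replies_to(master) > threshold:
            master.holds_lock = True      # lock acquired, role takes effect
            return master
        banned.add(master.name)           # false master: invalidate and re-elect
    return None


if __name__ == "__main__":
    group = [Node("n1", 1.0), Node("n2", 2.0), Node("n3", 3.0), Node("n4", 4.0)]
    reply_counts = {"n1": 1, "n2": 3}     # assumed reply counts for the demo
    winner = confirm_master(group, lambda m: reply_counts.get(m.name, 0))
    print(winner.name if winner else None)
```

Requiring more than half of the normal nodes to reply means two candidates in a partitioned group cannot both be confirmed, which is the usual reason for a majority threshold.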
As an alternative embodiment, the process of performing the fault handling operation of the faulty node includes:
performing data recovery on the database of the faulty node, and synchronizing the database contents of the normal nodes in the faulty node's group based on the recovered database; wherein the database contents of the nodes in the same group are the same;
releasing the virtual IP of the faulty node, and reallocating it to a normal node in the faulty node's group based on a load balancing strategy, so that the normal node takes over the faulty node's tasks.
Specifically, the fault handling operation for the faulty node includes: 1) Database recovery: the database contents of nodes in the same group are kept synchronized, but when a node fails, the nodes in its group may not yet have performed a new round of database synchronization. The database of the faulty node is therefore recovered, and the database contents of the normal nodes in the faulty node's group are synchronized based on the recovered database, which ensures the integrity of the database contents across the group. 2) Virtual IP (Internet Protocol) reallocation: the virtual IP of the faulty node is released, a new node is selected from the normal nodes in the faulty node's group based on a load balancing strategy, and the virtual IP is reallocated to that node so that it takes over the faulty node's tasks.
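The virtual IP reallocation step can be sketched as below. The text does not specify the load balancing strategy, so this example assumes a simple "fewest virtual IPs first" rule with ties broken by node name; the function name `reassign_virtual_ips` and the dictionary layout are likewise hypothetical.

```python
def reassign_virtual_ips(
    vip_table: dict[str, list[str]], failed: str
) -> dict[str, list[str]]:
    """Release the failed node's virtual IPs and hand them to normal nodes.

    vip_table maps node name -> virtual IPs currently held by that node.
    Assumed policy: each released IP goes to the node holding the fewest IPs.
    """
    # Copy the table without the failed node so the input is not mutated.
    table = {node: list(vips) for node, vips in vip_table.items() if node != failed}
    if not table:
        raise ValueError("no normal node left in the group")
    for vip in vip_table.get(failed, []):
        # Pick the least loaded normal node; the name breaks ties deterministically.
        target = min(table, key=lambda node: (len(table[node]), node))
        table[target].append(vip)
    return table


if __name__ == "__main__":
    before = {
        "n1": ["10.0.0.1"],
        "n2": ["10.0.0.2", "10.0.0.3"],
        "n3": ["10.0.0.4"],
    }
    print(reassign_virtual_ips(before, failed="n2"))
```

Any other balancing criterion (for example current connection count) could replace the `min` key without changing the surrounding flow.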
As an alternative embodiment, the process of performing the fault handling operation of the faulty node further includes:
notifying all normal nodes in the faulty node's group of the fault information of the faulty node.
Further, the fault handling operation for the faulty node also includes fault notification, that is, notifying all normal nodes in the faulty node's group of the fault information of the faulty node.
As an optional embodiment, the method for managing nodes in a cluster further includes:
when a deployment instruction representing a cluster mode is received, all nodes in a cluster are divided into the same group;
configuring a configuration file corresponding to each node in the cluster according to the condition that each node in the cluster belongs to the same group;
after configuration files corresponding to the nodes in the cluster are configured, restarting the nodes in the cluster to enable the cluster mode to take effect; wherein, in the cluster mode, the target node is used for monitoring the node states of the rest nodes through message passing with the rest nodes in the cluster.
Further, the cluster mode is defined for the nodes in the cluster: all nodes in the cluster are placed in the same group and, as shown in fig. 4, all nodes in the cluster monitor one another. Based on this, when a deployment instruction representing the cluster mode is received, all nodes in the cluster are divided into the same group, and the configuration file corresponding to each node is configured accordingly, so that the nodes in the cluster subsequently monitor one another.
After the configuration files corresponding to the nodes in the cluster have been configured, the nodes need to be restarted; once they have restarted, the cluster mode takes effect. Taking the target node as an example, node state monitoring in the cluster mode works as follows: the target node monitors the node states of the remaining nodes in the cluster through message passing with them. Specifically, node state monitoring is carried out by the CTDB service running on each node in the cluster.
It should be noted that when the cluster is small (the number of nodes is less than a preset node number threshold), the nodes are deployed in the cluster mode; when the cluster is large (the number of nodes is greater than or equal to the preset node number threshold), the nodes are deployed in the cluster grouping mode.
The present application further provides a management system for nodes in a cluster, including:
the first grouping module is used for determining the grouping condition of each node in the cluster according to a preset cluster grouping deployment strategy when a deployment instruction for representing a cluster grouping mode is received;
the first configuration module is used for configuring configuration files corresponding to all nodes in the cluster according to the grouping condition of all nodes in the cluster; the target configuration file corresponding to the target node represents a specific node in the same group with the target node; the target node is any node in the cluster;
the first restarting module is used for restarting each node in the cluster after the configuration files corresponding to each node in the cluster are configured so as to enable the cluster grouping mode to take effect; wherein, in the cluster grouping mode, the target node is used for monitoring the node states of the other nodes through message transmission with the other nodes in the same group.
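As a hedged illustration of the grouping and configuration modules, the following Python sketch groups nodes by network segment (one of the grouping criteria named in the claims) and derives, for each node, the list of same-group peers that its configuration file would record. The node names, the CIDR address format and the in-memory dictionary standing in for a configuration file are all assumptions; the actual configuration file format is not specified in the application.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, List


def group_by_network_segment(node_addrs: Dict[str, str]) -> Dict[str, List[str]]:
    """Place nodes whose addresses fall in the same network segment
    into the same group. Addresses are given as 'ip/prefix' strings."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, cidr in node_addrs.items():
        # ip_interface(...).network yields the enclosing subnet.
        segment = str(ipaddress.ip_interface(cidr).network)
        groups[segment].append(name)
    return dict(groups)


def build_node_configs(groups: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """For each node, record the other nodes of its own group.
    This peer list stands in for the node's configuration file."""
    configs: Dict[str, List[str]] = {}
    for members in groups.values():
        for node in members:
            configs[node] = [peer for peer in members if peer != node]
    return configs
```

After such configuration data is written and the nodes are restarted, each node would only exchange monitoring messages with the peers listed for it, which is what limits monitoring traffic to the group.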
As an optional embodiment, the management system of the nodes in the cluster further includes:
the election module is used for electing a master node from the normal nodes of the group where a fault node is located according to a preset election mechanism when a node in the cluster fails;
the judging module is used for judging whether the master node holds the distributed lock; if yes, executing the processing module; if not, executing the acquisition module;
the processing module is used for executing the fault processing operation of the fault node;
the acquisition module is used for triggering the master node to send distributed lock acquisition requests to the other normal nodes in the same group, and judging whether the total number of nodes replying to the master node in response to the distributed lock acquisition requests is greater than a preset reply number threshold; if yes, executing the processing module; if not, executing the prohibiting module;
and the prohibiting module is used for determining that the master node is a false master node, prohibiting the false master node from entering a connection state within a preset duration, and executing the election module again.
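The interaction of the election, judging, acquisition and prohibiting modules can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the callables `elect`, `holds_lock` and `count_lock_replies` are hypothetical stand-ins for the preset election mechanism and the lock messaging, the `max_rounds` limit is added only to keep the sketch terminating, and the time-limited prohibition of a false master node is simplified to excluding that node for the remainder of the call.

```python
from typing import Callable, List, Optional, Set


def handle_node_fault(
    normal_nodes: List[str],
    elect: Callable[[List[str]], str],
    holds_lock: Callable[[str], bool],
    count_lock_replies: Callable[[str, List[str]], int],
    reply_threshold: int,
    max_rounds: int = 10,
) -> Optional[str]:
    """Return the master node that may execute fault processing,
    or None if no valid master is found within max_rounds."""
    banned: Set[str] = set()  # false master nodes excluded from re-election
    for _ in range(max_rounds):
        candidates = [n for n in normal_nodes if n not in banned]
        if not candidates:
            return None
        master = elect(candidates)
        if holds_lock(master):
            return master  # lock already held: proceed to fault processing
        others = [n for n in normal_nodes if n != master]
        if count_lock_replies(master, others) > reply_threshold:
            return master  # enough replies: lock acquired
        banned.add(master)  # false master node: exclude and re-elect
    return None
```

Requiring more replies than a threshold before the lock counts as acquired means that a node which has lost contact with most of its group cannot act as master, which is how the false master case is detected in this flow.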
As an optional embodiment, the management system of the nodes in the cluster further includes:
the second grouping module is used for dividing all nodes in the cluster into the same group when a deployment instruction for representing the cluster mode is received;
the second configuration module is used for configuring configuration files corresponding to the nodes in the cluster according to the condition that the nodes in the cluster belong to the same group;
the second restarting module is used for restarting each node in the cluster after the configuration files corresponding to each node in the cluster are configured, so that the cluster mode takes effect; wherein, in the cluster mode, the target node is used for monitoring the node states of the rest nodes through message passing with the rest nodes in the cluster.
For introduction of the management system provided in the present application, please refer to the above-mentioned embodiment of the management method, which is not described herein again.
The present application further provides a management apparatus for nodes in a cluster, including:
a memory for storing a computer program;
and a processor for implementing, when executing the computer program, the steps of any one of the above methods for managing nodes in a cluster.
For an introduction to the management apparatus provided in the present application, please refer to the above embodiments of the management method, which are not described herein again.
It is further noted that, in the present specification, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for managing nodes in a cluster is characterized by comprising the following steps:
when a deployment instruction representing a cluster grouping mode is received, determining the grouping condition of each node in a cluster according to a preset cluster grouping deployment strategy;
configuring the configuration file corresponding to each node in the cluster according to the grouping condition of each node in the cluster; the target configuration file corresponding to a target node indicates the specific nodes that are in the same group as the target node; the target node is any node in the cluster;
restarting each node in the cluster after configuration files corresponding to each node in the cluster are configured, so that the cluster grouping mode takes effect; wherein, in the cluster grouping mode, the target node is configured to monitor node states of the remaining nodes through message passing with the remaining nodes of the same group.
2. The method for managing nodes in a cluster according to claim 1, wherein the process of determining the grouping condition of each node in the cluster according to the preset cluster grouping deployment policy comprises:
and determining the grouping condition of each node in the cluster based on a cluster grouping deployment strategy for dividing the nodes belonging to the same network segment and/or corresponding to the same storage pool into the same group.
3. The method for managing nodes in a cluster according to claim 1, wherein the method for managing nodes in a cluster further comprises:
when a node in the cluster fails, electing a master node from the normal nodes in the group where the fault node is located according to a preset election mechanism;
judging whether the master node holds a distributed lock;
if yes, executing fault processing operation of the fault node;
if not, triggering the master node to send distributed lock acquisition requests to the other normal nodes in the same group, and judging whether the total number of nodes replying to the master node in response to the distributed lock acquisition requests is greater than a preset reply number threshold;
if so, determining that the master node successfully acquires the distributed lock, so as to execute the fault processing operation of the fault node;
and if not, determining that the master node is a false master node, prohibiting the false master node from entering a connection state within a preset duration, and re-executing the operation of electing a master node from the normal nodes in the group where the fault node is located according to the preset election mechanism.
4. The method for managing nodes in a cluster according to claim 3, wherein the process of performing the fault handling operation of the faulty node comprises:
performing data recovery on the database of the fault node, and synchronizing the database content of the normal nodes in the group where the fault node is located based on the recovered database; wherein the database contents of the nodes in the same group are the same;
and releasing the virtual IP of the fault node, and reallocating the virtual IP of the fault node to a normal node of the group where the fault node is located based on a load balancing strategy so that the normal node replaces the fault node to continue processing node tasks.
5. The method for managing nodes in a cluster according to claim 4, wherein the process of performing the fault handling operation of the faulty node further comprises:
and notifying all normal nodes in the group where the fault node is located of the fault information of the fault node.
6. The method for managing nodes in a cluster according to any one of claims 1 to 5, wherein the method for managing nodes in a cluster further comprises:
when a deployment instruction representing a cluster mode is received, all nodes in the cluster are divided into the same group;
configuring a configuration file corresponding to each node in the cluster according to the condition that each node in the cluster belongs to the same group;
restarting each node in the cluster after configuration files corresponding to each node in the cluster are configured, so that the cluster mode takes effect; wherein, in the cluster mode, the target node is configured to monitor node states of the remaining nodes in the cluster through message passing with the remaining nodes.
7. A system for managing nodes in a cluster, comprising:
the first grouping module is used for determining the grouping condition of each node in the cluster according to a preset cluster grouping deployment strategy when a deployment instruction for representing a cluster grouping mode is received;
the first configuration module is used for configuring the configuration file corresponding to each node in the cluster according to the grouping condition of each node in the cluster; the target configuration file corresponding to a target node indicates the specific nodes that are in the same group as the target node; the target node is any node in the cluster;
the first restarting module is used for restarting each node in the cluster after the configuration files corresponding to each node in the cluster are configured, so that the cluster grouping mode takes effect; wherein, in the cluster grouping mode, the target node is configured to monitor node states of the remaining nodes through message passing with the remaining nodes of the same group.
8. The system for managing nodes in a cluster of claim 7, wherein the system for managing nodes in a cluster further comprises:
the election module is used for electing a master node from the normal nodes of the group where a fault node is located according to a preset election mechanism when a node in the cluster fails;
the judging module is used for judging whether the master node holds the distributed lock; if yes, executing the processing module; if not, executing the acquisition module;
the processing module is used for executing the fault processing operation of the fault node;
the acquisition module is used for triggering the master node to send distributed lock acquisition requests to the other normal nodes in the same group, and judging whether the total number of nodes replying to the master node in response to the distributed lock acquisition requests is greater than a preset reply number threshold; if yes, executing the processing module; if not, executing the prohibiting module;
and the prohibiting module is used for determining that the master node is a false master node, prohibiting the false master node from entering a connection state within a preset duration, and re-executing the election module.
9. The management system for nodes in a cluster according to any of claims 7-8, wherein the management system for nodes in a cluster further comprises:
the second grouping module is used for dividing all nodes in the cluster into the same group when a deployment instruction for representing a cluster mode is received;
the second configuration module is used for configuring configuration files corresponding to the nodes in the cluster according to the condition that the nodes in the cluster belong to the same group;
the second restarting module is used for restarting each node in the cluster after the configuration files corresponding to each node in the cluster are configured, so that the cluster mode takes effect; wherein, in the cluster mode, the target node is configured to monitor node states of the remaining nodes in the cluster through message passing with the remaining nodes.
10. An apparatus for managing nodes in a cluster, comprising:
a memory for storing a computer program;
and a processor for implementing the steps of the method for managing nodes in a cluster according to any one of claims 1 to 6 when executing the computer program.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010738723.7A CN111901422B (en) 2020-07-28 2020-07-28 Method, system and device for managing nodes in cluster

Publications (2)

Publication Number Publication Date
CN111901422A true CN111901422A (en) 2020-11-06
CN111901422B CN111901422B (en) 2022-11-11

Family

ID=73182254

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010738723.7A Active CN111901422B (en) 2020-07-28 2020-07-28 Method, system and device for managing nodes in cluster

Country Status (1)

Country Link
CN (1) CN111901422B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030158921A1 (en) * 2002-02-15 2003-08-21 International Business Machines Corporation Method for detecting the quick restart of liveness daemons in a distributed multinode data processing system
US20050132154A1 (en) * 2003-10-03 2005-06-16 International Business Machines Corporation Reliable leader election in storage area network
CN109995842A (en) * 2018-01-02 2019-07-09 北京奇虎科技有限公司 A kind of group technology and device for distributed server cluster
CN110233767A (en) * 2019-07-26 2019-09-13 苏州浪潮智能科技有限公司 Service configuration method, system, device and the readable storage medium storing program for executing of distributed type assemblies
CN110519348A (en) * 2019-08-15 2019-11-29 苏州浪潮智能科技有限公司 A kind of mostly service distributed type assemblies deployment system and method
CN111221700A (en) * 2019-10-31 2020-06-02 北京浪潮数据技术有限公司 Cluster node state monitoring method, device, equipment and readable storage medium

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112328581A (en) * 2020-12-30 2021-02-05 望海康信(北京)科技股份公司 ProxySQL automatic operation and maintenance system and method, corresponding equipment and storage medium
CN113127310A (en) * 2021-04-30 2021-07-16 北京奇艺世纪科技有限公司 Task processing method and device, electronic equipment and storage medium
CN113127310B (en) * 2021-04-30 2023-09-01 北京奇艺世纪科技有限公司 Task processing method and device, electronic equipment and storage medium
CN113434279A (en) * 2021-07-14 2021-09-24 上海浦东发展银行股份有限公司 Task execution method, device, equipment and storage medium
CN113535517A (en) * 2021-07-23 2021-10-22 烽火通信科技股份有限公司 Controller cluster node management method and device
CN113535517B (en) * 2021-07-23 2022-04-12 烽火通信科技股份有限公司 Controller cluster node management method and device
CN114070739A (en) * 2021-11-11 2022-02-18 杭州和利时自动化有限公司 Cluster deployment method, device, equipment and computer readable storage medium
CN114070739B (en) * 2021-11-11 2024-01-26 杭州和利时自动化有限公司 Cluster deployment method, device, equipment and computer readable storage medium
CN115277379A (en) * 2022-07-08 2022-11-01 北京城市网邻信息技术有限公司 Distributed lock disaster tolerance processing method and device, electronic equipment and storage medium
CN115904822A (en) * 2022-12-21 2023-04-04 长春吉大正元信息技术股份有限公司 Cluster repairing method and device
CN116170346A (en) * 2022-12-27 2023-05-26 天翼云科技有限公司 Monitoring method and device and electronic equipment

Also Published As

Publication number Publication date
CN111901422B (en) 2022-11-11

Similar Documents

Publication Publication Date Title
CN111901422B (en) Method, system and device for managing nodes in cluster
CN111290834B (en) Method, device and equipment for realizing high service availability based on cloud management platform
CN108710544B (en) Process monitoring method of database system and rail transit comprehensive monitoring system
CN102355369B (en) Virtual clustered system as well as processing method and processing device thereof
WO2016106682A1 (en) Post-cluster brain split quorum processing method and quorum storage device and system
CN110830283B (en) Fault detection method, device, equipment and system
US7870425B2 (en) De-centralized nodal failover handling
CN106789141B (en) Gateway equipment fault processing method and device
CN109445927B (en) Task management method and device for storage cluster
CN102394914A (en) Cluster brain-split processing method and device
CN110134518A (en) A kind of method and system improving big data cluster multinode high application availability
CN107480014A (en) A kind of High Availabitity equipment switching method and device
CN112612545A (en) Configuration hot loading system, method, equipment and medium of server cluster
CN110971662A (en) Two-node high-availability implementation method and device based on Ceph
CN110532096B (en) System and method for multi-node grouping parallel deployment
CN108600284B (en) Ceph-based virtual machine high-availability implementation method and system
CN113965576B (en) Container-based big data acquisition method, device, storage medium and equipment
CN108509296B (en) Method and system for processing equipment fault
CN113867915A (en) Task scheduling method, electronic device and storage medium
CN113055203B (en) Method and device for recovering exception of SDN control plane
CN116185697B (en) Container cluster management method, device and system, electronic equipment and storage medium
CN115712521A (en) Cluster node fault processing method, system and medium
CN114036032A (en) Real-time program monitoring method and device
CN114116178A (en) Cluster framework task management method and related device
CN112612652A (en) Distributed storage system abnormal node restarting method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant