CN108847982B - Distributed storage cluster and node fault switching method and device thereof

Info

Publication number: CN108847982B
Application number: CN201810668234.1A
Authority: CN
Inventors: 孙业宽
Original assignee: Zhengzhou Yunhai Information Technology Co Ltd
Current assignee: Zhengzhou Yunhai Information Technology Co Ltd
Priority date: 2018-06-26
Filing date: 2018-06-26
Publication date: 2021-11-19
Anticipated expiration: 2038-06-26
Also published as: CN108847982A

Abstract

The invention discloses a distributed storage cluster node power-off switching method and a device thereof, which are applied to a master node of a distributed storage cluster. The method comprises the following steps: detecting the state of each node in the cluster according to the CTDB heartbeat detection mode; after a power-off node is detected, acquiring service information of the power-off node; and sending the service information to normal nodes with corresponding service functions in the distributed storage cluster, so that each normal node receiving the service information performs service drift and service recovery according to the service information. The method shortens the detection and recovery time for a power-off node from the original minute level to the second level, speeds up the return of the cluster to normal operation and the restoration of access to the services of the power-off node, and improves the reliability of the cluster. The invention also discloses a distributed storage cluster based on the method.

Description

Distributed storage cluster and node fault switching method and device thereof

Technical Field

The invention relates to the technical field of distributed cluster high availability, in particular to a distributed storage cluster node power-off switching method and a distributed storage cluster node power-off switching device. The invention also relates to a distributed storage cluster.

Background

A distributed storage cluster is a cluster formed by a plurality of storage node servers. It allows a piece of data to be stored across a plurality of nodes; each node can obtain the complete data by communicating with the other nodes, and when a node goes down the complete data can be recovered according to a configured policy. The distributed storage cluster comprises service modules such as a monitoring module, a storage pool module and a metadata management module.

When a distributed storage cluster is running, some nodes may lose power because of faults such as a loose or unplugged power cable. If the number of power-off nodes is within the number the cluster tolerates (namely, the node redundancy of the cluster), the distributed storage cluster can return to normal and continue to provide normal service access, but doing so takes time on the order of minutes. The reason is that, at present, each service module determines whether a node is powered off through its own heartbeat detection, and the heartbeat detection precision of the service modules is at the minute level, that is, more than 60 s (a heartbeat detection precision below 60 s causes the cluster to oscillate). Consequently, more than 60 s is currently needed to determine whether a node has lost power, and only then are cluster recovery, service recovery for the power-off node and the like performed.

Therefore, in the current node power failure detection and recovery process, the cluster cannot quickly detect the power failure fault, and further cannot quickly perform cluster recovery and recover service access on the power failure node, so that service interruption time is long, and cluster reliability is poor.

Therefore, how to provide a highly reliable distributed storage cluster node power-off switching method, an apparatus thereof, and a distributed storage cluster is a problem that those skilled in the art currently need to solve.

Disclosure of Invention

The invention aims to provide a distributed storage cluster node power-off switching method and a device thereof, which shorten the detection and recovery time for a power-off node from the original minute level to the second level, speed up the return of the cluster to normal operation and the restoration of access to the services of the power-off node, and improve the reliability of the cluster; another object of the present invention is to provide a distributed storage cluster based on the above method.

In order to solve the above technical problem, the present invention provides a distributed storage cluster node power-off switching method, which is applied to a master node of a distributed storage cluster, and the method includes:

detecting the state of each node in the cluster according to a heartbeat detection mode of a CTDB lightweight cluster database;

after detecting that a node is powered off, acquiring service information of the powered-off node;

and sending the service information to normal nodes with corresponding service functions in the distributed storage cluster, so that each normal node receiving the service information can perform service drift and service recovery according to the service information.

Preferably, after detecting that there is a node that is powered off, before acquiring service information of the powered-off node, the method further includes:

and judging whether the power-off node is obtained through heartbeat detection, and if so, acquiring service information of the power-off node.

Preferably, the service information includes a virtual IP.

Preferably, the service information further includes service cache data.

Preferably, the process of sending the service information to the normal node with the corresponding service function in the distributed storage cluster specifically includes:

calling a failover program in the distributed storage cluster;

selecting normal nodes containing each service function;

and sending the service information to the selected node.

Preferably, the service functions include a monitoring function, a storage pool function, and a metadata management function.

Preferably, the process of detecting the node state according to the CTDB heartbeat detection method specifically includes:

sending a plurality of heartbeat packets to each node in the distributed storage cluster in each heartbeat detection period;

and judging whether responses returned by all the nodes are received within preset time, and if the nodes which do not return responses exist, determining the nodes which do not return responses to be power-off nodes.

In order to solve the above technical problem, the present invention further provides a distributed storage cluster node power-off switching apparatus, which is applied to a master node of a distributed storage cluster, and the apparatus includes:

the state monitoring module is used for detecting the state of each node in the cluster according to the CTDB heartbeat detection mode;

the information acquisition module is used for acquiring the service information of the power-off node after detecting that the node is powered off;

and the sending module is used for sending the service information to normal nodes with corresponding service functions in the distributed storage cluster, and carrying out service drifting and service recovery by each normal node receiving the service information according to the service information.

In order to solve the technical problem, the invention also provides a distributed storage cluster, which comprises a plurality of nodes with CTDB functions, wherein one node is selected from the plurality of nodes as a main node; the master node includes:

a memory for storing a computer program;

a processor for implementing the steps of the distributed storage cluster node power-down switching method as described in any one of the above when executing the computer program.

Preferably, the nodes other than the master node are specifically configured to:

performing, in parallel, their own service recovery operation and the service drift operation according to the service information.

The invention provides a distributed storage cluster node power-off switching method and a device thereof, which use the heartbeat detection of CTDB to detect whether a power-off node exists in the cluster. If a power-off node is detected, its service information is acquired and sent to normal nodes with the corresponding service functions (which can also be understood as having the corresponding service modules), so that each normal node receiving the service information performs service drift and restores access to the services of the power-off node. Because the heartbeat detection time precision of CTDB is at the second level, usually a few seconds, CTDB heartbeat detection can quickly detect a node power-off, and the service information of the power-off node can be sent to the normal nodes promptly, so that the normal nodes perform data recovery and service drift in time. The detection and recovery time for a power-off node is thus shortened from the original minute level to the second level, which speeds up the return of the cluster to normal operation and the restoration of access to the services of the power-off node, shortens the service interruption time as far as possible, and improves the reliability of the cluster. The invention also provides a distributed storage cluster based on the method, which has the same advantages.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the prior art and the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart illustrating a procedure of a distributed storage cluster node power-off switching method according to the present invention;

Fig. 2 is a schematic structural diagram of a distributed storage cluster node power-off switching apparatus provided in the present invention.

Detailed Description

The core of the invention is to provide a distributed storage cluster node power-off switching method and a device thereof, which shorten the detection and recovery time for a power-off node from the original minute level to the second level, speed up the return of the cluster to normal operation and the restoration of access to the services of the power-off node, and improve the reliability of the cluster; the other core of the invention is to provide a distributed storage cluster based on the method.

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The invention provides a distributed storage cluster node power-off switching method, which is applied to a master node of a distributed storage cluster. Referring to Fig. 1, which is a flowchart of the distributed storage cluster node power-off switching method provided by the invention, the method comprises the following steps:

Step S1: detecting the state of each node in the cluster according to the CTDB heartbeat detection mode;

It is understood that CTDB (Clustered Trivial Database) is high-availability management software for clusters, used for monitoring cluster node status and allocating services. Generally, in a distributed storage cluster with the CTDB function, CTDB software is installed on every cluster node, so that each node can perform heartbeat detection through CTDB and the nodes exchange their detection results. All nodes in such a cluster elect one master node, and fault recovery operations (such as virtual IP allocation) are performed only by the master node.

In current distributed storage clusters, CTDB already has a heartbeat detection function with second-level time precision, but its detection result is not used for cluster and service recovery after a node is powered off. In the prior art, whether to perform cluster recovery and service recovery for a power-off node is decided by the detection result of the service modules in the nodes, whose time precision is at the minute level, so the process takes a long time and is inefficient. In other words, CTDB heartbeat detection and service recovery for a power-off node are two unrelated processes in the prior art. In the invention, the two are linked: the subsequent recovery of the cluster and of the power-off node's services is controlled by the CTDB heartbeat detection result, which shortens the recovery time to the second level and improves cluster recovery efficiency and reliability.

Step S2: after the power-off node is detected, acquiring service information of the power-off node;

After a node is powered off, the services on it are necessarily interrupted. To restore normal access to those services as soon as possible, they need to be migrated (or switched) to other normal nodes, so the information about the services that were running on the power-off node must be determined in order to select suitable nodes and migrate the services. Because the power-off node is down at this point, its service information is usually obtained from the master node: the master node is responsible for service allocation and therefore stores information about the services running on each node.

Step S3: sending the service information to normal nodes with corresponding service functions in the distributed storage cluster, so that each normal node receiving the service information performs service drift and service recovery according to the service information.

It is understood that a distributed storage cluster includes a plurality of nodes that cooperate to process services, so different nodes may have different functions; that is, different nodes may include the same service modules or different ones. When service drift is performed, to ensure that the service can subsequently run normally, the service functions required to run the service are determined first, and the service information is then distributed to normal nodes that have those service functions, so that the services of the power-off node drift to these normal nodes. Once these normal nodes have themselves returned to a normal working state, they can jointly handle access to the service according to the service information.

In addition, service recovery may also be understood as node recovery. Because the nodes in the distributed storage cluster are associated with each other, once a node fails and is powered off, the other nodes are also affected and cannot work normally. For the services of the power-off node to run on other nodes, the services need to be migrated to those nodes, and the configuration of those nodes needs to be adjusted so that they return to a normal working state. Moreover, if the service to be migrated has special requirements, the configuration data of the nodes needs to be adjusted during the recovery operation so that they can support the migrated service.

A service can run only after both service drift and service recovery have completed, so the time at which the service is successfully switched to other nodes is the time at which both are complete. If service drift and service recovery are performed in series, the service switching time equals the sum of their durations; if they are performed in parallel, the service switching time is determined by the longer of the two.

Experiments show that, with the above operations, power-off detection can generally be completed within 10 seconds, and service drift and node service recovery take roughly 10 seconds, so the overall service switching time is kept within 30 seconds. This is shorter than the current minute-level service switching and improves the reliability and stability of the cluster.
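The arithmetic behind these figures can be sketched as follows. This is only an illustration: the function name `switch_time` is hypothetical, and reading the quoted figures as roughly 10 seconds each for detection, drift and recovery is an interpretation of the text rather than a measured breakdown.

```python
def switch_time(detect_s: float, drift_s: float, recover_s: float, parallel: bool) -> float:
    """Total time from power-off to restored service access, in seconds."""
    if parallel:
        # Drift and recovery overlap, so the longer of the two dominates.
        return detect_s + max(drift_s, recover_s)
    # Performed one after the other, the durations add up.
    return detect_s + drift_s + recover_s


# Assumed figures: detection within 10 s, drift and recovery about 10 s each.
serial_total = switch_time(10, 10, 10, parallel=False)
parallel_total = switch_time(10, 10, 10, parallel=True)
print(serial_total, parallel_total)
```

Under these assumed durations the serial case lands on the 30-second bound, and overlapping drift with recovery leaves some margin below it.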

Wherein the service functions include a monitoring function, a storage pool function, and a metadata management function.

It can be understood that, for the cluster to operate normally, the storage and metadata management functions are indispensable, and a monitoring function is also required to monitor how services are running so that problems can be found in time. Of course, other service functions may also be included in the distributed storage cluster, and the present invention is not limited thereto.

In a preferred embodiment, after the power-off node is detected, before the service information of the power-off node is acquired, the method further includes:

and judging whether the power failure node is obtained through heartbeat detection, and if so, acquiring service information of the power failure node.

It can be understood that, although the invention uses CTDB heartbeat detection to detect whether a node has lost power, a node reported as powered off may not have been found by heartbeat detection: CTDB may also mark the node on which it runs as a power-off node when a stop or restart command is executed, which is obviously wrong. To distinguish this situation, before the service information of the power-off node is acquired, it is first determined whether the power-off node was found by heartbeat detection. Only a failed node detected by the heartbeat detection function is treated as a power-off node; otherwise, no processing is performed. In one specific implementation, a flag bit is added to the identifier of a power-off node detected by heartbeat, and the two cases are then distinguished by checking whether the identifier of the detected node contains the flag bit. Of course, this is only one implementation; whether a node is a power-off node may also be determined in other ways, which the present invention does not limit.
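The flag-bit check described above might look like the following minimal sketch. The names `HEARTBEAT_FLAG`, `FailedNodeReport` and `node_to_fail_over`, and the choice of bit value, are hypothetical; the text does not specify how the flag is encoded.

```python
from dataclasses import dataclass
from typing import Optional

HEARTBEAT_FLAG = 0x1  # hypothetical bit meaning "found by heartbeat detection"


@dataclass(frozen=True)
class FailedNodeReport:
    """Hypothetical identifier of a node reported as failed."""
    node_id: str
    flags: int = 0


def node_to_fail_over(report: FailedNodeReport) -> Optional[str]:
    """Return the node ID if failover should proceed, otherwise None."""
    if report.flags & HEARTBEAT_FLAG:
        return report.node_id
    # Reported for another reason (e.g. a stop or restart command): no processing.
    return None


by_heartbeat = FailedNodeReport("node-b", flags=HEARTBEAT_FLAG)
by_restart = FailedNodeReport("node-c")
print(node_to_fail_over(by_heartbeat), node_to_fail_over(by_restart))
```

Only the report carrying the flag leads to acquisition of service information; the other report is ignored.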

In a particular embodiment, the service information includes a virtual IP.

It is understood that, in a distributed storage cluster, virtual IPs correspond one-to-one with services, and the CTDB master node is responsible for allocating virtual IPs. When a cluster node fails, to ensure normal access to the services on that node, the virtual IP allocated to it is migrated to another node, and the service follows the virtual IP to that node, thereby ensuring high availability of the cluster.

Of course, for most distributed storage clusters, the service drift only needs virtual IP, but in some cases, the service drift may also be implemented according to other parameters, which is not limited by the present invention.

In addition, the service information also comprises service cache data. For some cases, the continuous access of the service may require previous data, and at this time, the continuous access of the service may not be completed only by the virtual IP, so that the service information needs to include service cache data. Of course, the service information may also include a host number of the power-off node, and the specific content of the service information is not limited in the present invention.
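The contents of the service information and the migration of virtual IPs can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `ServiceInfo` record, the `migrate_virtual_ips` helper, the example addresses, and in particular the round-robin placement, which the text does not prescribe.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ServiceInfo:
    """Hypothetical record of what the master node sends for one failed node."""
    virtual_ips: List[str]
    cache_data: Dict[str, bytes] = field(default_factory=dict)  # optional service cache
    host_number: Optional[str] = None  # optional host number of the failed node


def migrate_virtual_ips(
    assignment: Dict[str, List[str]], failed: str, targets: List[str]
) -> Dict[str, List[str]]:
    """Return a new node -> virtual IP table with the failed node's IPs moved.

    Round-robin placement is an assumption made for this sketch only.
    """
    if not targets:
        raise ValueError("no normal node available to take over")
    result = {node: list(ips) for node, ips in assignment.items() if node != failed}
    for i, vip in enumerate(assignment.get(failed, [])):
        result.setdefault(targets[i % len(targets)], []).append(vip)
    return result


table = {"node-a": ["10.0.0.11"], "node-b": ["10.0.0.12", "10.0.0.13"], "node-c": []}
info = ServiceInfo(virtual_ips=table["node-b"], host_number="node-b")
new_table = migrate_virtual_ips(table, "node-b", ["node-a", "node-c"])
print(info)
print(new_table)
```

The helper returns a new table instead of mutating the old one, so the pre-failure assignment stays available for comparison.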

In a specific embodiment, the process of sending the service information to the normal node having the corresponding service function in the distributed storage cluster specifically includes:

calling a fault switching program in the distributed storage cluster;

selecting normal nodes containing each service function;

and sending the service information to the selected node.

It can be understood that, in current distributed storage clusters, a failover program is usually provided on every node. Because the nodes in the cluster exchange data, once one node calls its own copy of the program, the copies on the other nodes also start running to carry out the service switching operation. Therefore, after acquiring the service information, the master node directly calls its own failover program; the programs on the other nodes are started as well and get ready for service switching. The failover program has a node selection function: after being called by the master node, it selects suitable nodes by itself, and the master node sends the service information to the selected nodes. Since the failover program on each selected node has already been started, the node can begin service switching (service drift, service recovery, etc.) as soon as it receives the service information.

Of course, if the cluster is not provided with the failover program with the above functions, the master node may analyze the setting condition of each node in the cluster by itself, and then select the corresponding node. Specifically, which way to select the node receiving the service information is adopted, the present invention is not limited.
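Whichever component performs the selection, the logic of choosing normal nodes that contain each service function and then sending them the service information can be sketched as below. The function names `select_nodes` and `dispatch`, the string labels for the service functions, and the `send` callback are hypothetical stand-ins, not part of any real failover program.

```python
from typing import Callable, Dict, List, Sequence, Set

# The three service functions named in the text, as hypothetical labels.
REQUIRED_FUNCTIONS = ("monitoring", "storage_pool", "metadata_management")


def select_nodes(
    normal_nodes: Dict[str, Set[str]],
    required: Sequence[str] = REQUIRED_FUNCTIONS,
) -> Dict[str, List[str]]:
    """Map each required service function to the normal nodes that provide it."""
    selection: Dict[str, List[str]] = {}
    for function in required:
        providers = sorted(n for n, funcs in normal_nodes.items() if function in funcs)
        if not providers:
            raise LookupError(f"no normal node provides {function!r}")
        selection[function] = providers
    return selection


def dispatch(
    selection: Dict[str, List[str]],
    service_info: dict,
    send: Callable[[str, dict], None],
) -> List[str]:
    """Send the service information once to every selected node."""
    recipients = sorted({n for nodes in selection.values() for n in nodes})
    for node in recipients:
        send(node, service_info)
    return recipients


cluster = {
    "node-a": {"monitoring", "storage_pool"},
    "node-c": {"storage_pool", "metadata_management"},
}
selection = select_nodes(cluster)
sent: List[str] = []
recipients = dispatch(selection, {"virtual_ip": "10.0.0.12"}, lambda n, info: sent.append(n))
print(selection)
print(recipients)
```

Raising an error when a required function has no provider is a design choice of this sketch; the text does not say how that case is handled.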

The process of detecting the node state according to the CTDB heartbeat detection mode specifically comprises the following steps:

sending a plurality of heartbeat packets to each node in the distributed storage cluster in each heartbeat detection period;

and judging whether responses returned by all the nodes are received within a preset time, and if there are nodes that do not return responses, determining the nodes that do not return responses to be power-off nodes.

The preset time generally corresponds to the heartbeat detection period, but since sending and receiving signals takes time, the preset time is preferably slightly longer than the heartbeat detection period.

For example, assume that each heartbeat detection period is 4 seconds (the interval between two heartbeat detection periods is not limited in the present invention) and a heartbeat packet is sent every 2 seconds, 2 times in total; if no response to these heartbeat packets is received from the peer node within the 4 seconds, that node is considered faulty. Alternatively, the heartbeat detection period may be 8 seconds, with a heartbeat packet sent every 2 seconds, 4 times in total, which avoids fault misjudgment caused by too short a heartbeat detection period. Of course, the length of the heartbeat detection period and the sending frequency of heartbeat packets are not limited in the present invention.
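The detection rule in this example can be sketched as a small simulation. It is not CTDB's actual implementation: the function name `detect_power_off_nodes`, the `probe` callback and the parameter names are hypothetical, and the network is replaced by a table of canned answers so the sketch runs on its own.

```python
from typing import Callable, Iterable, List, Set


def detect_power_off_nodes(
    nodes: Iterable[str],
    probe: Callable[[str, int], bool],
    packets_per_period: int = 2,
) -> List[str]:
    """Return nodes that answered none of the heartbeat packets in one period.

    `probe(node, seq)` is a hypothetical callback that sends heartbeat packet
    number `seq` to `node` and returns True if a response arrived within the
    preset time. With a 2-second send interval, packets_per_period=2 models a
    4-second detection period and packets_per_period=4 an 8-second one.
    """
    node_list = list(nodes)
    responded: Set[str] = set()
    for seq in range(packets_per_period):
        for node in node_list:
            if node not in responded and probe(node, seq):
                responded.add(node)
    # A node is treated as powered off only if every packet went unanswered.
    return [node for node in node_list if node not in responded]


# Simulated responses: node-b is down, node-c drops only the first packet.
answers = {
    "node-a": [True, True],
    "node-b": [False, False],
    "node-c": [False, True],
}
suspects = detect_power_off_nodes(answers, lambda node, seq: answers[node][seq])
print(suspects)
```

Sending several packets per period is what keeps node-c, which lost a single packet, from being misjudged as powered off.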

The invention provides a distributed storage cluster node power-off switching method, which uses the heartbeat detection of CTDB to detect whether a power-off node exists in the cluster. If a power-off node is detected, its service information is acquired and sent to normal nodes with the corresponding service functions (which can also be understood as having the corresponding service modules), so that each normal node receiving the service information performs service drift and restores access to the services of the power-off node. Because the heartbeat detection time precision of CTDB is at the second level, usually a few seconds, CTDB heartbeat detection can quickly detect a node power-off, and the service information of the power-off node can be sent to the normal nodes promptly, so that the normal nodes perform data recovery and service drift in time. The detection and recovery time for a power-off node is thus shortened from the original minute level to the second level, which speeds up the return of the cluster to normal operation and the restoration of access to the services of the power-off node, shortens the service interruption time as far as possible, and improves the reliability of the cluster.

The invention further provides a distributed storage cluster node power-off switching device, which is applied to a master node of a distributed storage cluster, as shown in Fig. 2, which is a schematic structural diagram of the distributed storage cluster node power-off switching device provided by the invention.

The device includes:

the state monitoring module 1 is used for detecting the state of each node in the cluster according to the CTDB heartbeat detection mode;

the information acquisition module 2 is used for acquiring the service information of the power-off node after the power-off node is detected;

and the sending module 3 is used for sending the service information to normal nodes with corresponding service functions in the distributed storage cluster, so that each normal node receiving the service information can perform service drift and service recovery according to the service information.
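How the three modules might fit together on the master node is sketched below. The class names, the `is_alive` callback, the in-memory `registry` and the `outbox` list are all hypothetical, and as a simplification the sketch sends the service information to every normal node rather than only to nodes with the corresponding service functions.

```python
from typing import Callable, Dict, Iterable, List, Tuple


class StateMonitor:
    """Module 1: reports nodes that failed heartbeat detection."""

    def __init__(self, is_alive: Callable[[str], bool]):
        self._is_alive = is_alive  # stand-in for real heartbeat detection

    def powered_off(self, nodes: Iterable[str]) -> List[str]:
        return [n for n in nodes if not self._is_alive(n)]


class InfoAcquirer:
    """Module 2: looks up service information held on the master node."""

    def __init__(self, registry: Dict[str, dict]):
        self._registry = registry

    def service_info(self, node: str) -> dict:
        return self._registry[node]


class Sender:
    """Module 3: delivers service information to the chosen normal nodes."""

    def __init__(self) -> None:
        self.outbox: List[Tuple[str, dict]] = []  # records (target, info) pairs

    def send(self, targets: Iterable[str], info: dict) -> None:
        for target in targets:
            self.outbox.append((target, info))


def run_once(nodes: List[str], monitor: StateMonitor,
             acquirer: InfoAcquirer, sender: Sender) -> List[str]:
    """One detection round: detect, acquire, then send."""
    down = monitor.powered_off(nodes)
    normal = [n for n in nodes if n not in down]
    for node in down:
        sender.send(normal, acquirer.service_info(node))
    return down


registry = {"node-b": {"virtual_ip": "10.0.0.12"}}
monitor = StateMonitor(is_alive=lambda node: node != "node-b")
sender = Sender()
down = run_once(["node-a", "node-b", "node-c"], monitor, InfoAcquirer(registry), sender)
print(down, sender.outbox)
```

Keeping the service registry on the master node reflects the point made earlier that a powered-off node cannot be asked for its own service information.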

The invention also provides a distributed storage cluster which comprises a plurality of nodes with CTDB functions, wherein one node is selected from the nodes as a main node; the master node includes:

a memory for storing a computer program;

a processor for implementing the steps of the distributed storage cluster node power-down switching method as described in any one of the above when executing the computer program.

In a preferred embodiment, the nodes other than the master node are specifically configured to:

performing, in parallel, their own service recovery operation and the service drift operation according to the service information.

It will be appreciated that, since the service recovery operation and the service drift operation do not interfere with each other, performing them in parallel reduces the service switching time as much as possible compared with performing them in series.
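A normal node could run the two operations concurrently along the following lines. This is a sketch under stated assumptions: `recover_own_services` and `take_over_service` are hypothetical placeholders that merely sleep, and a thread pool is one of several ways to run them in parallel.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Tuple


def recover_own_services(delay: float) -> str:
    time.sleep(delay)  # stands in for adjusting the node's own configuration
    return "recovered"


def take_over_service(virtual_ip: str, delay: float) -> str:
    time.sleep(delay)  # stands in for bringing up the migrated virtual IP
    return f"drifted {virtual_ip}"


def switch_in_parallel(virtual_ip: str, delay: float = 0.2) -> Tuple[str, str]:
    """Run both operations concurrently and wait for both to finish."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        recovery = pool.submit(recover_own_services, delay)
        drift = pool.submit(take_over_service, virtual_ip, delay)
        return recovery.result(), drift.result()


start = time.monotonic()
results = switch_in_parallel("10.0.0.12")
elapsed = time.monotonic() - start
print(results, f"{elapsed:.2f}s")
```

Because the two placeholders overlap, the elapsed time should be close to one delay rather than two, which mirrors the rule that the parallel switching time is determined by the longer operation.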

The above embodiments are only preferred embodiments of the present invention, and the above embodiments can be combined arbitrarily, and the combined embodiments are also within the scope of the present invention. It should be noted that other modifications and variations that may suggest themselves to persons skilled in the art without departing from the spirit and scope of the invention are intended to be included within the scope of the invention as defined by the appended claims.

The embodiments in the present description are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar the embodiments may be referred to one another. The device disclosed in the embodiments corresponds to the method disclosed in the embodiments, so its description is brief; for relevant details, refer to the description of the method.

It is further noted that, in the present specification, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A distributed storage cluster node power-off switching method is applied to a master node of a distributed storage cluster, and comprises the following steps:

after the node power failure is detected, judging whether the power failure node is obtained through heartbeat detection, if so, acquiring service information of the power failure node; if not, not processing;

sending the service information to normal nodes with corresponding service functions in the distributed storage cluster, and enabling each normal node receiving the service information to perform service drift and service recovery according to the service information;

the process of sending the service information to the normal nodes with the corresponding service functions in the distributed storage cluster specifically includes:

calling a failover program in the distributed storage cluster;

selecting normal nodes containing each service function;

sending the service information to the selected node;

judging whether responses returned by all nodes are received within preset time, if the nodes which do not return responses exist, the nodes which do not return responses are power-off nodes;

the fault switching program is a program which is pre-deployed in each node and has a function of selecting nodes; the service information comprises a virtual IP and service cache data; the service functions include a monitoring function, a storage pool function, and a metadata management function.

2. A distributed storage cluster node power-off switching apparatus, applied to a master node of the distributed storage cluster, the apparatus comprising:

the state monitoring module is used for detecting the state of each node in the cluster according to the CTDB heartbeat detection mode, and specifically comprises the following steps: sending a plurality of heartbeat packets to each node in the distributed storage cluster in each heartbeat detection period; judging whether responses returned by all nodes are received within preset time, if the nodes which do not return responses exist, the nodes which do not return responses are power-off nodes;

the information acquisition module is used for judging whether the power failure node is obtained through heartbeat detection after detecting that the node is powered off, and if so, acquiring service information of the power failure node; if not, not processing;

a sending module, configured to send the service information to normal nodes in the distributed storage cluster having corresponding service functions, so that each normal node receiving the service information performs service drift and service recovery according to the service information, where the sending module specifically includes: calling a failover program in the distributed storage cluster; selecting normal nodes containing each service function; sending the service information to the selected node;

3. A distributed storage cluster is characterized by comprising a plurality of nodes with CTDB functions, wherein one node is selected from the plurality of nodes as a main node; the master node includes:

a memory for storing a computer program;

a processor for implementing the steps of the distributed storage cluster node power-down switching method of claim 1 when executing the computer program.

4. The distributed storage cluster of claim 3, wherein the nodes other than the master node are specifically configured to: