CN111212127A - Storage cluster, service data maintenance method, device and storage medium - Google Patents

Storage cluster, service data maintenance method, device and storage medium

Info

Publication number
CN111212127A
Authority
CN
China
Prior art keywords
network
storage
storage node
service
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911386440.4A
Other languages
Chinese (zh)
Inventor
史宗华
何营
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd filed Critical Inspur Electronic Information Industry Co Ltd
Priority to CN201911386440.4A priority Critical patent/CN111212127A/en
Publication of CN111212127A publication Critical patent/CN111212127A/en
Pending legal-status Critical Current

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 - Network arrangements or protocols for supporting network services or applications
    • H04L67/01 - Protocols
    • H04L67/10 - Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 - Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 - Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 - Management of faults, events, alarms or notifications
    • H04L41/0654 - Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0663 - Performing the actions predefined by failover planning, e.g. switching to standby network elements
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 - Arrangements for monitoring or testing data switching networks
    • H04L43/08 - Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 - Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0817 - Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 - Arrangements for monitoring or testing data switching networks
    • H04L43/10 - Active monitoring, e.g. heartbeat, ping or trace-route
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 - Network arrangements or protocols for supporting network services or applications
    • H04L67/01 - Protocols
    • H04L67/10 - Protocols in which an application is distributed across nodes in the network
    • H04L67/104 - Peer-to-peer [P2P] networks
    • H04L67/1044 - Group management mechanisms

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Cardiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Environmental & Geological Engineering (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application discloses a storage cluster, a service data maintenance method and device, and a computer-readable storage medium. The method is applied to each storage node in the storage cluster, and each storage node runs a network-attached storage service based on a deployed CTDB. The method comprises the following steps: calling a network card monitoring service preset in the CTDB so as to periodically detect the network state of the storage node; if the storage node's network is abnormal, generating a kill command based on the CTDB to end residual processes of the network-attached storage service running on the storage node, so as to clear the currently cached service data; continuing to call the network card monitoring service to periodically detect the network state of the storage node; and after the storage node's network is recovered, restarting the network-attached storage service on the storage node. The method and device effectively avoid the problem that cached data of residual processes causes data inconsistency on the storage node, and thereby improve the data storage reliability of the storage cluster.

Description

Storage cluster, service data maintenance method, device and storage medium
Technical Field
The present application relates to the field of cluster storage technologies, and in particular, to a storage cluster, a service data maintenance method and apparatus, and a computer-readable storage medium.
Background
Today, with the rise of cloud computing and big data, the amount of data generated every day grows exponentially. Traditional storage can no longer meet this demand, and distributed mass storage that supports dynamic capacity expansion has emerged.
CTDB (Cluster Trivial Database) is a lightweight clustered database implementation. It is the cluster database component of clustered Samba and is commonly used to handle Samba cross-node messages. CTDB is a TDB database implemented in a distributed manner across cluster nodes and is effective for ensuring high availability of a storage service; in particular, it may be applied to a network-attached storage (NAS) service.
For a network-attached storage service, keeping the storage node's network unobstructed is one of the conditions for normal operation. In the prior art, therefore, the data on a storage node often becomes inconsistent with the client's data because of a network anomaly. In view of this, providing a solution to the above technical problem is an important need for those skilled in the art.
Disclosure of Invention
The aim of the present application is to provide a storage cluster, a service data maintenance method and apparatus, and a computer-readable storage medium, so as to effectively solve the problem of inconsistent service data and improve the data storage reliability of the storage cluster.
In order to solve the above technical problem, in a first aspect, the present application discloses a service data maintenance method, which is applied to each storage node in a storage cluster, where each storage node operates a network-attached storage service based on a deployed CTDB; the method comprises the following steps:
calling a network card monitoring service preset in the CTDB so as to periodically detect the network state of the storage node;
if the storage node's network is abnormal, generating a kill command based on the CTDB to end residual processes of the network-attached storage service running on the storage node, so as to clear the currently cached service data;
continuing to call the network card monitoring service to periodically detect the network state of the storage node;
and after the storage node network is recovered, restarting the network attached storage service on the storage node.
Optionally, the process of determining that the storage node's network is abnormal includes:
if the periodic network state detection results of a consecutive preset number of times are all abnormal, determining that the storage node's network is abnormal.
Optionally, after invoking the network card monitoring service preset in the CTDB to periodically detect the network state of the storage node, the method further includes:
if the storage node network is normal, calling a service state monitoring service preset in the CTDB so as to periodically detect the running state of the network-attached storage service of the storage node;
and if the network attached storage service of the storage node stops running, restarting the network attached storage service on the storage node.
Optionally, after invoking the network card monitoring service preset in the CTDB to periodically detect the network state of the storage node, the method further includes:
if the storage node's network is normal, sending heartbeat signals to the other storage nodes at regular intervals, so that, after the other storage nodes detect that the heartbeat signal of the storage node is interrupted, a proxy node with a normal network state is elected by triggering a service switching process preset in the CTDB, and the proxy node takes over and executes the network-attached storage service tasks of the storage node.
Optionally, after the storage node's network is recovered and the network-attached storage service is restarted on the storage node, the method further includes:
continuing to send heartbeat signals to the other storage nodes at regular intervals, so that, after the other storage nodes detect that the heartbeat signal of the storage node has recovered, the proxy node switches the network-attached storage service tasks it executed on behalf of the storage node back to the storage node by triggering the service switching process again.
Optionally, after the storage node's network becomes abnormal, the method further includes:
modifying the network state identifier of the storage node from a normal state to a fault state.
In a second aspect, the present application further discloses a service data maintenance apparatus, which is applied to each storage node in a storage cluster, where each storage node runs a network-attached storage service based on a deployed CTDB; the device comprises:
the network card monitoring module is used for calling network card monitoring service which is preset in the CTDB so as to periodically detect the network state of the storage node;
the cache clearing module is used for generating a kill command based on the CTDB to end residual processes of the network-attached storage service running on the storage node when the network of the storage node is abnormal, so as to clear the currently cached service data; the network card monitoring module then continues to call the network card monitoring service to periodically detect the network state of the storage node;
and the service restarting module is used for restarting the network attached storage service on the storage node after the storage node network is recovered.
Optionally, the apparatus further comprises:
the service monitoring module is used for calling service state monitoring service preset in the CTDB when the network of the storage node is normal so as to periodically detect the running state of the network-attached storage service of the storage node;
the service restart module is further configured to: and when the network of the storage node is normal and the network attached storage service stops running, restarting the network attached storage service on the storage node.
Optionally, the network card monitoring module is specifically configured to:
if the periodic network state detection results of a consecutive preset number of times are all abnormal, determine that the storage node's network is abnormal.
Optionally, the apparatus further comprises:
the heartbeat signal module is used for sending heartbeat signals to the other storage nodes at regular intervals if the storage node's network is normal after the network state of the storage node is periodically detected, so that, after the other storage nodes detect that the heartbeat signal of the storage node is interrupted, a proxy node with a normal network state is elected by triggering the service switching process preset in the CTDB, and the proxy node takes over and executes the network-attached storage service tasks of the storage node.
Optionally, the heartbeat signal module is further configured to:
after the storage node's network is recovered and the network-attached storage service is restarted on the storage node, continue to send the heartbeat signals to the other storage nodes at regular intervals, so that, after the other storage nodes detect that the heartbeat signal of the storage node has recovered, the proxy node switches the network-attached storage service tasks it executed on behalf of the storage node back to the storage node by triggering the service switching process again.
Optionally, the apparatus further comprises:
the state identification module is used for modifying the network state identifier of the storage node from a normal state to a fault state after the storage node's network becomes abnormal, and for modifying the network state identifier of the storage node from the fault state to the normal state after the storage node's network is recovered.
In a third aspect, the present application further discloses a storage cluster, including a plurality of storage nodes, where each of the storage nodes includes:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of any of the service data maintenance methods described above.
In a fourth aspect, the present application further discloses a computer-readable storage medium, in which a computer program is stored, and the computer program is used to implement the steps of any one of the service data maintenance methods described above when executed by a processor.
The service data maintenance method provided by the present application is applied to each storage node in a storage cluster, and each storage node runs a network-attached storage service based on a deployed CTDB. The method comprises the following steps: calling a network card monitoring service preset in the CTDB so as to periodically detect the network state of the storage node; if the storage node's network is abnormal, generating a kill command based on the CTDB to end residual processes of the network-attached storage service running on the storage node, so as to clear the currently cached service data; continuing to call the network card monitoring service to periodically detect the network state of the storage node; and after the storage node's network is recovered, restarting the network-attached storage service on the storage node.
Therefore, the network card monitoring service preset in the CTDB can effectively monitor the network state of the storage node, so that residual processes of the network-attached storage service are cleaned up in time after a network anomaly. This effectively avoids the problem that cached data of residual processes causes data inconsistency on the storage node, and thereby improves the data storage reliability of the storage cluster. The service data maintenance apparatus, the storage cluster, and the computer-readable storage medium provided by the present application have the same beneficial effects.
Drawings
In order to more clearly illustrate the technical solutions in the prior art and in the embodiments of the present application, the drawings needed for describing them are briefly introduced below. Of course, the following drawings relate only to some embodiments of the present application; those skilled in the art can obtain other drawings from the provided drawings without creative effort, and such other drawings also fall within the protection scope of the present application.
Fig. 1 is a flowchart of a service data maintenance method disclosed in an embodiment of the present application;
fig. 2 is a schematic diagram of a service switching process after a storage node network is abnormal according to an embodiment of the present application;
fig. 3 is a schematic diagram of a service switching process after a storage node network is restored according to an embodiment of the present disclosure;
fig. 4 is a block diagram of a service data maintenance apparatus according to an embodiment of the present application;
fig. 5 is a block diagram of a storage node in a storage cluster according to an embodiment of the present disclosure.
Detailed Description
The core of the application is to provide a storage cluster, a service data maintenance method, a service data maintenance device and a computer-readable storage medium, so as to effectively solve the problem of inconsistent service data and improve the data storage reliability of the storage cluster.
In order to more clearly and completely describe the technical solutions in the embodiments of the present application, the technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Currently, with the rise of cloud computing and big data, the amount of data generated every day grows exponentially. Traditional storage can no longer meet this demand, and distributed mass storage that supports dynamic capacity expansion has emerged.
CTDB (Cluster Trivial Database) is a lightweight clustered database implementation. It is the cluster database component of clustered Samba and is commonly used to handle Samba cross-node messages. CTDB is a distributed TDB database implemented across cluster nodes and may be used effectively to ensure high availability of storage services, for example by performing node monitoring, node switching, IP switching, and the like. In particular, it is applicable to the network-attached storage (NAS) service.
For a network-attached storage service, keeping the storage node's network unobstructed is one of the conditions for normal operation. In the prior art, therefore, the data on a storage node often becomes inconsistent with the client's data because of a network anomaly.
Specifically, after a client requests data storage from a storage node, if the storage node suffers only a network failure (hardware devices such as the power supply and the network card are normal), the storage node cannot run the network-attached storage service normally, but may still retain some residual processes of that service together with certain cached data. In this way, after the network of the storage node is restored and the client initiates a data storage request to the storage node again, the request is affected by the previously cached data, and the data on the storage node becomes inconsistent with the data requested by the client. In view of this, the present application provides a service data maintenance scheme that can effectively solve the above problem.
Referring to fig. 1, an embodiment of the present application discloses a service data maintenance method, which is applied to each storage node in a storage cluster, where each storage node runs a network-attached storage service based on a deployed CTDB; the method comprises the following steps:
s101: and calling a network card monitoring service preset in the CTDB so as to periodically detect the network state of the storage node.
Specifically, the service data maintenance method provided in the embodiment of the present application may be specifically applied to each storage node deployed with a CTDB. Because the network attached storage service running on the storage node needs the support of the network, and the data inconsistency of the storage node is likely to occur due to network interruption and other abnormalities, the embodiment of the application specifically sets the network card monitoring service in the CTDB in advance so as to detect the network state of the storage node.
In particular, the detection of the network state may be repeated periodically. Based on the network card monitoring service, the network state can be checked once every check period, and the check period can be set reasonably according to the actual situation; for example, it can be set to 2 s.
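By way of illustration only (this sketch is not part of the original disclosure), the periodic network card check described above could be approximated in Python as follows; the gateway address, the use of ping, and the helper name check_network_once are assumptions made for the example.

    import subprocess
    import time

    CHECK_PERIOD_S = 2  # example check period, matching the 2 s suggested above

    def check_network_once(gateway_ip: str) -> bool:
        """Return True if the storage node can reach the gateway, i.e. the network looks normal."""
        # A single ping with a short timeout stands in for the network card check.
        result = subprocess.run(
            ["ping", "-c", "1", "-W", "1", gateway_ip],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0

    def monitor_network(gateway_ip: str) -> None:
        """Periodically detect the network state of the local storage node."""
        while True:
            state_ok = check_network_once(gateway_ip)
            print("network normal" if state_ok else "network abnormal")
            time.sleep(CHECK_PERIOD_S)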
S102: if the storage node's network is abnormal, generating a kill command based on the CTDB to end residual processes of the network-attached storage service running on the storage node, so as to clear the currently cached service data.
Specifically, if the network of the storage node is abnormal, for example disconnected or congested, the network-attached storage service cannot operate normally, and problems such as communication interruption and data failing to be written to disk occur. Therefore, once the network of the storage node is found to be abnormal, the embodiment of the present application terminates the residual processes of the network-attached storage service with a kill command, so as to remove the cached data of those residual processes and prevent the cached service data from causing data inconsistency the next time the network-attached storage service is restarted.
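As a minimal sketch of this cleanup step, assuming the network-attached storage service runs as processes whose names are known (smbd and nfsd are used here purely as placeholders), the residual processes could be ended roughly as follows; the patent itself only states that a kill command is generated based on the CTDB.

    import os
    import signal
    import subprocess

    # Placeholder process names; the actual NAS service processes depend on the deployment.
    NAS_PROCESS_NAMES = ["smbd", "nfsd"]

    def kill_residual_nas_processes() -> None:
        """End residual NAS processes so that their cached service data is discarded."""
        for name in NAS_PROCESS_NAMES:
            # pgrep -x lists the PIDs of processes whose name matches exactly.
            found = subprocess.run(["pgrep", "-x", name], capture_output=True, text=True)
            for pid in found.stdout.split():
                try:
                    os.kill(int(pid), signal.SIGKILL)  # equivalent to `kill -9 <pid>`
                except ProcessLookupError:
                    pass  # the process already exited between pgrep and kill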
S103: and continuing to call the network card monitoring service to periodically detect the network state of the storage node.
Specifically, after the residual processes of the network-attached storage service on the storage node are terminated, the network state of the storage node may continue to be monitored periodically, so that the network-attached storage service can be restarted once the network recovers.
S104: and after the storage node network is recovered, restarting the network attached storage service on the storage node.
It is easy to understand that, based on step S102, after the communication between the client and the storage node is interrupted or delayed by a network anomaly, the residual processes are closed and the cached data is cleared, so that when the network-attached storage service is restarted, the storage node can maintain data consistency with the client.
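Continuing the sketch above (again an assumption, not the patent's own code), steps S103 and S104 amount to polling until the network recovers and then restarting the service; the systemd unit name nas-service and the network_ok callback are placeholders.

    import subprocess
    import time

    CHECK_PERIOD_S = 2  # example check period

    def wait_for_network_recovery(network_ok) -> None:
        """Step S103: keep polling until network_ok() reports that the network is reachable again."""
        while not network_ok():
            time.sleep(CHECK_PERIOD_S)

    def restart_nas_service(service_name: str = "nas-service") -> None:
        """Step S104: restart the network-attached storage service on this node.

        The systemd unit name is a placeholder; how the service is actually
        restarted depends on the deployment.
        """
        subprocess.run(["systemctl", "restart", service_name], check=False)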
The service data maintenance method provided by the embodiment of the present application is applied to each storage node in a storage cluster, and each storage node runs a network-attached storage service based on a deployed CTDB. The method comprises the following steps: calling a network card monitoring service preset in the CTDB so as to periodically detect the network state of the storage node; if the storage node's network is abnormal, generating a kill command based on the CTDB to end residual processes of the network-attached storage service running on the storage node, so as to clear the currently cached service data; continuing to call the network card monitoring service to periodically detect the network state of the storage node; and after the storage node's network is recovered, restarting the network-attached storage service on the storage node.
Therefore, the network card monitoring service preset in the CTDB can effectively monitor the network state of the storage node and clean up residual processes of the network-attached storage service in time after a network anomaly, which effectively avoids the problem that cached data of residual processes causes data inconsistency on the storage node, and thereby improves the data storage reliability of the storage cluster.
As a specific embodiment, in the service data maintenance method provided in the embodiment of the present application, on the basis of the foregoing content, a determination process of a network anomaly of the storage node includes:
if the periodic network state detection results of a consecutive preset number of times are all abnormal, determining that the storage node's network is abnormal.
Specifically, in order to prevent misjudgment of the network state, the determination may be made by combining the detection results of a preset number of consecutive periods. For example, the preset number may be 4; that is, if the detection results in 4 consecutive detection periods all indicate a network anomaly, it may be determined that the storage node's network is abnormal.
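A sketch of this debounce rule, under the assumption that the periodic results are kept in a simple list (the threshold of 4 mirrors the example above):

    CONSECUTIVE_FAILURES_REQUIRED = 4  # example preset number from the text

    def is_network_abnormal(recent_results: list) -> bool:
        """Judge the node's network abnormal only if the last N periodic checks all failed.

        recent_results holds the most recent periodic detection results in order,
        with True meaning the check succeeded.
        """
        if len(recent_results) < CONSECUTIVE_FAILURES_REQUIRED:
            return False
        return not any(recent_results[-CONSECUTIVE_FAILURES_REQUIRED:])

    # Example: three failures are not enough, four consecutive failures are.
    print(is_network_abnormal([True, False, False, False]))         # False
    print(is_network_abnormal([True, False, False, False, False]))  # True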
As a specific embodiment, the method for maintaining service data provided in this embodiment of the present application, based on the foregoing content, after invoking a network card monitoring service preset in the CTDB to periodically detect a network state of the storage node, further includes:
if the storage node network is normal, calling a service state monitoring service preset in the CTDB so as to periodically detect the running state of the network-attached storage service of the storage node;
and if the network attached storage service of the storage node stops running, restarting the network attached storage service on the storage node.
Specifically, in this embodiment, a service state monitoring service is further set in the CTDB and is used to periodically detect the running state of the network-attached storage service of the storage node. As described above, if the storage node suffers only a network failure (hardware devices such as the power supply and the network card are normal), the network-attached storage service will not run normally but will leave residual processes. However, if hardware devices such as the power supply or the network card of the storage node have a hardware fault (e.g., a power failure), the network-attached storage service cannot be started at all and stops running. Once the service state monitoring service detects such a situation, it may try to restart the network-attached storage service on the local storage node after the fault is resolved.
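For illustration, a sketch of such a service-state check, assuming the NAS service is managed as a systemd unit (the unit name nas-service is a placeholder; the patent does not specify how the running state is queried):

    import subprocess

    def nas_service_running(service_name: str = "nas-service") -> bool:
        """Return True if the NAS service reports an active state."""
        result = subprocess.run(["systemctl", "is-active", "--quiet", service_name])
        return result.returncode == 0

    def check_and_restart_service(service_name: str = "nas-service") -> None:
        """Periodic running-state check: restart the NAS service if it has stopped."""
        if not nas_service_running(service_name):
            subprocess.run(["systemctl", "restart", service_name], check=False)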
As a specific embodiment, the method for maintaining service data provided in this embodiment of the present application, based on the foregoing content, after invoking a network card monitoring service preset in the CTDB to periodically detect a network state of the storage node, further includes:
if the storage node's network is normal, sending heartbeat signals to the other storage nodes at regular intervals, so that, after the other storage nodes detect that the heartbeat signal of the storage node is interrupted, a proxy node with a normal network state is elected by triggering the service switching process preset in the CTDB, and the proxy node takes over and executes the network-attached storage service tasks of the storage node.
Specifically, the present embodiment further provides a service switching process based on the CTDB. If the storage node's network is abnormal, the other storage nodes can elect a proxy node based on the service switching process to run the storage node's tasks in its place.
Referring to fig. 2, fig. 2 is a schematic diagram of the service switching process after a storage node's network becomes abnormal according to an embodiment of the present application. Fig. 2 shows node 1, on which the network failure occurs, and two other storage nodes: master node 2 and node 3. After the service switching process is executed, master node 2 is elected as the proxy node, and the tasks of node 1 are switched to master node 2 for execution.
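By way of example only, the heartbeat-timeout detection and proxy election could be sketched as follows; the 10 s timeout and the policy of picking the healthy node with the lowest ID are assumptions, since the patent leaves the election policy to the CTDB service switching process.

    import time
    from typing import Dict, Optional

    HEARTBEAT_TIMEOUT_S = 10  # example timeout, not specified in the text

    def elect_proxy_node(last_heartbeat: Dict[int, float], failed_node: int) -> Optional[int]:
        """Elect a proxy node among the nodes whose heartbeat is still fresh.

        last_heartbeat maps a node id to the timestamp of its last heartbeat.
        Choosing the healthy node with the lowest id is only one simple policy.
        """
        now = time.time()
        healthy = [
            node for node, ts in last_heartbeat.items()
            if node != failed_node and now - ts < HEARTBEAT_TIMEOUT_S
        ]
        return min(healthy) if healthy else None

    # Example mirroring Fig. 2: node 1's heartbeat stopped, nodes 2 and 3 are alive.
    beats = {1: time.time() - 60.0, 2: time.time(), 3: time.time()}
    print(elect_proxy_node(beats, failed_node=1))  # -> 2, i.e. master node 2 becomes the proxy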
As a specific embodiment, on the basis of the foregoing, after the storage node's network is restored and the network-attached storage service is restarted on the storage node, the service data maintenance method provided in the embodiment of the present application further includes:
continuing to send heartbeat signals to the other storage nodes at regular intervals, so that, after the other storage nodes detect that the heartbeat signal of the storage node has recovered, the proxy node switches the network-attached storage service tasks it executed on behalf of the storage node back to the storage node by triggering the service switching process again.
Specifically, after the network is restored, the storage node may continue to send heartbeat signals to notify the other storage nodes. Based on cluster load-balancing control, the other storage nodes can then switch the tasks that previously belonged to the storage node back to it through the service switching process.
Referring to fig. 3, fig. 3 is a schematic diagram of the service switching process after a storage node's network is restored according to an embodiment of the present application. After the network fault of node 1 is resolved and the network returns to normal, master node 2 and node 3 execute the service switching process again, and the original tasks of node 1 are switched back to node 1 for execution.
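A small sketch of the failover and failback of tasks shown in Figs. 2 and 3, assuming the task-to-node assignment is represented as a plain dictionary (the share names are invented for the example):

    from typing import Dict

    def fail_over(task_owner: Dict[str, int], failed_node: int, proxy_node: int) -> None:
        """Fig. 2: move the failed node's NAS tasks to the elected proxy node."""
        for task, owner in task_owner.items():
            if owner == failed_node:
                task_owner[task] = proxy_node

    def fail_back(task_owner: Dict[str, int], original_owner: Dict[str, int],
                  recovered_node: int) -> None:
        """Fig. 3: return the tasks originally owned by the recovered node."""
        for task, owner in original_owner.items():
            if owner == recovered_node:
                task_owner[task] = recovered_node

    # Node 1 fails, master node 2 stands in, then node 1 recovers.
    original = {"share-a": 1, "share-b": 2, "share-c": 3}
    current = dict(original)
    fail_over(current, failed_node=1, proxy_node=2)   # share-a now served by node 2
    fail_back(current, original, recovered_node=1)    # share-a returned to node 1
    print(current)  # {'share-a': 1, 'share-b': 2, 'share-c': 3}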
As a specific embodiment, on the basis of the foregoing content, the service data maintenance method provided in the embodiment of the present application further includes, after the storage node's network becomes abnormal: modifying the network state identifier of the storage node from a normal state to a fault state;
and, after the storage node's network is recovered: modifying the network state identifier of the storage node from the fault state to the normal state.
Specifically, the embodiment of the present application further sets a network state identifier for each storage node, and the identifier has two states: a normal state and a fault state. After the storage node's network is determined to be abnormal, the identifier can be set to the fault state; similarly, after the storage node's network is determined to have recovered, the identifier can be set back to the normal state.
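A minimal sketch of the two-valued network state identifier and how it is toggled (the enum and the class are assumptions; the patent only requires that each node keeps a normal/fault flag):

    from enum import Enum

    class NetworkState(Enum):
        NORMAL = "normal"
        FAULT = "fault"

    class StorageNodeStatus:
        """Holds the per-node network state identifier described above."""

        def __init__(self) -> None:
            self.network_state = NetworkState.NORMAL

        def mark_network_abnormal(self) -> None:
            self.network_state = NetworkState.FAULT

        def mark_network_recovered(self) -> None:
            self.network_state = NetworkState.NORMAL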
Referring to fig. 4, an embodiment of the present application discloses a service data maintenance apparatus, which is applied to each storage node in a storage cluster, where each storage node runs a network-attached storage service based on a deployed CTDB; the device comprises:
a network card monitoring module 201, configured to invoke a network card monitoring service preset in the CTDB, so as to periodically detect a network state of the storage node;
the cache clearing module 202 is configured to generate a kill command based on the CTDB to end residual processes of the network-attached storage service running on the storage node when the network of the storage node is abnormal, so as to clear the currently cached service data; the network card monitoring module 201 then continues to call the network card monitoring service to periodically detect the network state of the storage node;
and the service restarting module 203 is used for restarting the network attached storage service on the storage node after the storage node network is recovered.
It can be seen that the service data maintenance apparatus disclosed in the embodiment of the present application, based on the network card monitoring service preset in the CTDB, can effectively monitor the network state of the storage node and clean up residual processes of the network-attached storage service in time after a network anomaly. This effectively avoids the problem that cached data of residual processes causes data inconsistency on the storage node, and thereby improves the data storage reliability of the storage cluster.
For the specific content of the service data maintenance device, reference may be made to the foregoing detailed description of the service data maintenance method, and details thereof are not repeated here.
As a specific embodiment, the service data maintenance apparatus disclosed in the embodiment of the present application further includes, on the basis of the foregoing content:
the service monitoring module is used for calling service state monitoring service preset in the CTDB when the storage node network is normal so as to periodically detect the running state of the network-attached storage service of the storage node;
the service restart module 203 is further configured to: and when the network of the storage node is normal and the network attached storage service stops running, restarting the network attached storage service on the storage node.
As a specific embodiment, in the service data maintenance apparatus disclosed in the embodiment of the present application, on the basis of the foregoing content, the network card monitoring module 201 is specifically configured to:
if the periodic network state detection results of a consecutive preset number of times are all abnormal, determine that the storage node's network is abnormal.
As a specific embodiment, the service data maintenance apparatus disclosed in the embodiment of the present application further includes, on the basis of the foregoing content:
the heartbeat signal module is used for sending heartbeat signals to the other storage nodes at regular intervals if the storage node's network is normal after the network state of the storage node is periodically detected, so that, after the other storage nodes detect that the heartbeat signal of the storage node is interrupted, a proxy node with a normal network state is elected by triggering the service switching process preset in the CTDB, and the proxy node takes over and executes the network-attached storage service tasks of the storage node.
As a specific embodiment, in the service data maintenance apparatus disclosed in the embodiment of the present application, on the basis of the foregoing content, the heartbeat signal module is further configured to:
after the storage node's network is recovered and the network-attached storage service is restarted on the storage node, continue to send the heartbeat signals to the other storage nodes at regular intervals, so that, after the other storage nodes detect that the heartbeat signal of the storage node has recovered, the proxy node switches the network-attached storage service tasks it executed on behalf of the storage node back to the storage node by triggering the service switching process again.
As a specific embodiment, the service data maintenance apparatus disclosed in the embodiment of the present application further includes, on the basis of the foregoing content:
the state identification module is used for modifying the network state identifier of the storage node from a normal state to a fault state after the storage node's network becomes abnormal, and for modifying the network state identifier of the storage node from the fault state to the normal state after the storage node's network is recovered.
Further, the present application also discloses a storage cluster, which includes a plurality of storage nodes, and as shown in fig. 5, each of the storage nodes includes:
a memory 301 for storing a computer program;
a processor 302 for executing said computer program to implement the steps of any of the service data maintenance methods described above.
Further, the present application also discloses a computer-readable storage medium, in which a computer program is stored, and the computer program is used for implementing the steps of any service data maintenance method described above when being executed by a processor.
For the specific content of the storage cluster and the computer-readable storage medium, reference may be made to the foregoing detailed description on the service data maintenance method, and details are not described here again.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the equipment disclosed by the embodiment, the description is relatively simple because the equipment corresponds to the method disclosed by the embodiment, and the relevant parts can be referred to the method part for description.
It is further noted that, throughout this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The technical solutions provided by the present application are described in detail above. The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that, for those skilled in the art, without departing from the principle of the present application, several improvements and modifications can be made to the present application, and these improvements and modifications also fall into the protection scope of the present application.

Claims (10)

1. A service data maintenance method, applied to each storage node in a storage cluster, wherein each storage node runs a network-attached storage service based on a deployed CTDB, the method comprising the following steps:
calling a network card monitoring service preset in the CTDB so as to periodically detect the network state of the storage node;
if the storage node network is abnormal, generating a kill command based on the CTDB to end a residual process of the network-attached storage service running on the storage node, so as to clear the currently cached business data;
continuing to call the network card monitoring service to periodically detect the network state of the storage node;
and after the storage node network is recovered, restarting the network attached storage service on the storage node.
2. The method for maintaining business data according to claim 1, wherein the process of determining that the storage node network is abnormal comprises:
if the periodic network state detection results of a consecutive preset number of times are all abnormal, determining that the storage node network is abnormal.
3. The method for maintaining business data according to claim 1, wherein after the invoking of the network card monitoring service preset in the CTDB for periodically detecting the network status of the storage node, the method further comprises:
if the storage node network is normal, calling a service state monitoring service preset in the CTDB so as to periodically detect the running state of the network-attached storage service of the storage node;
and if the network attached storage service of the storage node stops running, restarting the network attached storage service on the storage node.
4. The method for maintaining business data according to claim 1, wherein after the invoking of the network card monitoring service preset in the CTDB for periodically detecting the network status of the storage node, the method further comprises:
if the storage node network is normal, sending heartbeat signals to the other storage nodes at regular intervals, so that, after the other storage nodes detect that the heartbeat signal of the storage node is interrupted, a proxy node with a normal network state is elected by triggering the service switching process preset in the CTDB, and the proxy node takes over and executes the network-attached storage service tasks of the storage node.
5. The method for maintaining business data according to claim 4, wherein after the network of the local storage node is recovered and the network-attached storage service is restarted on the local storage node, the method further comprises:
continuing to send heartbeat signals to the other storage nodes at regular intervals, so that, after the other storage nodes detect that the heartbeat signal of the storage node has recovered, the proxy node switches the network-attached storage service tasks it executed on behalf of the storage node back to the storage node by triggering the service switching process again.
6. The method for maintaining business data according to any one of claims 1 to 5, wherein after the network of local storage nodes is abnormal, the method further comprises:
modifying the network state identifier of the storage node from a normal state to a fault state;
after the storage node network is recovered, the method further includes:
and modifying the network state identifier of the storage node from the fault state to the normal state.
7. A service data maintenance device, applied to each storage node in a storage cluster, wherein each storage node runs a network-attached storage service based on a deployed CTDB, the device comprising:
the network card monitoring module is used for calling network card monitoring service which is preset in the CTDB so as to periodically detect the network state of the storage node;
the cache clearing module is used for generating a kill command based on the CTDB to end a residual process of the network-attached storage service running on the storage node when the network of the storage node is abnormal, so as to clear the currently cached business data; the network card monitoring module then continues to call the network card monitoring service to periodically detect the network state of the storage node;
and the service restarting module is used for restarting the network attached storage service on the storage node after the storage node network is recovered.
8. The apparatus for maintaining business data according to claim 7, further comprising:
the service monitoring module is used for calling service state monitoring service preset in the CTDB when the network of the storage node is normal so as to periodically detect the running state of the network-attached storage service of the storage node;
the service restart module is further configured to: and when the network of the storage node is normal and the network attached storage service stops running, restarting the network attached storage service on the storage node.
9. A storage cluster comprising a plurality of storage nodes, wherein each of said storage nodes comprises:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the method for maintenance of business data according to any one of claims 1 to 6.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, is adapted to carry out the steps of the method for maintaining business data according to any one of claims 1 to 6.
CN201911386440.4A 2019-12-29 2019-12-29 Storage cluster, service data maintenance method, device and storage medium Pending CN111212127A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911386440.4A CN111212127A (en) 2019-12-29 2019-12-29 Storage cluster, service data maintenance method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911386440.4A CN111212127A (en) 2019-12-29 2019-12-29 Storage cluster, service data maintenance method, device and storage medium

Publications (1)

Publication Number Publication Date
CN111212127A true CN111212127A (en) 2020-05-29

Family

ID=70789434

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911386440.4A Pending CN111212127A (en) 2019-12-29 2019-12-29 Storage cluster, service data maintenance method, device and storage medium

Country Status (1)

Country Link
CN (1) CN111212127A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111817892A (en) * 2020-07-10 2020-10-23 济南浪潮数据技术有限公司 Network management method, system, electronic equipment and storage medium
CN112866408A (en) * 2021-02-09 2021-05-28 山东英信计算机技术有限公司 Service switching method, device, equipment and storage medium in cluster
CN113626238A (en) * 2021-07-23 2021-11-09 济南浪潮数据技术有限公司 ctdb service health state monitoring method, system, device and storage medium
CN114035905A (en) * 2021-11-19 2022-02-11 江苏安超云软件有限公司 Fault migration method and device based on virtual machine, electronic equipment and storage medium
CN115102962A (en) * 2022-06-22 2022-09-23 青岛中科曙光科技服务有限公司 Cluster management method and device, computer equipment and storage medium

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101237400A (en) * 2008-01-24 2008-08-06 创新科存储技术(深圳)有限公司 Migration method for network additive storage service and network additional storage node
CN106790565A (en) * 2016-12-27 2017-05-31 中国电子科技集团公司第五十二研究所 A kind of network attached storage group system
CN108847982A (en) * 2018-06-26 2018-11-20 郑州云海信息技术有限公司 A kind of distributed storage cluster and its node failure switching method and apparatus
CN108958991A (en) * 2018-07-26 2018-12-07 郑州云海信息技术有限公司 Clustered node failure business quick recovery method, device, equipment and storage medium
CN109218141A (en) * 2018-11-20 2019-01-15 郑州云海信息技术有限公司 A kind of malfunctioning node detection method and relevant apparatus
CN109474694A (en) * 2018-12-04 2019-03-15 郑州云海信息技术有限公司 A kind of management-control method and device of the NAS cluster based on SAN storage array
CN109614201A (en) * 2018-12-04 2019-04-12 武汉烽火信息集成技术有限公司 The OpenStack virtual machine high-availability system of anti-fissure
CN109634716A (en) * 2018-12-04 2019-04-16 武汉烽火信息集成技术有限公司 The OpenStack virtual machine High Availabitity management end device and management method of anti-fissure
CN109639794A (en) * 2018-12-10 2019-04-16 杭州数梦工场科技有限公司 A kind of stateful cluster recovery method, apparatus, equipment and readable storage medium storing program for executing
CN110519086A (en) * 2019-08-08 2019-11-29 苏州浪潮智能科技有限公司 A kind of method and apparatus of the fast quick-recovery storage cluster NAS business based on CTDB
CN110611603A (en) * 2019-09-09 2019-12-24 苏州浪潮智能科技有限公司 Cluster network card monitoring method and device

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111817892A (en) * 2020-07-10 2020-10-23 济南浪潮数据技术有限公司 Network management method, system, electronic equipment and storage medium
CN111817892B (en) * 2020-07-10 2023-04-07 济南浪潮数据技术有限公司 Network management method, system, electronic equipment and storage medium
CN112866408A (en) * 2021-02-09 2021-05-28 山东英信计算机技术有限公司 Service switching method, device, equipment and storage medium in cluster
CN112866408B (en) * 2021-02-09 2022-08-09 山东英信计算机技术有限公司 Service switching method, device, equipment and storage medium in cluster
CN113626238A (en) * 2021-07-23 2021-11-09 济南浪潮数据技术有限公司 ctdb service health state monitoring method, system, device and storage medium
CN113626238B (en) * 2021-07-23 2024-02-20 济南浪潮数据技术有限公司 ctdb service health state monitoring method, system, device and storage medium
CN114035905A (en) * 2021-11-19 2022-02-11 江苏安超云软件有限公司 Fault migration method and device based on virtual machine, electronic equipment and storage medium
CN115102962A (en) * 2022-06-22 2022-09-23 青岛中科曙光科技服务有限公司 Cluster management method and device, computer equipment and storage medium
CN115102962B (en) * 2022-06-22 2024-08-23 青岛中科曙光科技服务有限公司 Cluster management method, device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111212127A (en) Storage cluster, service data maintenance method, device and storage medium
CN107515796B (en) Equipment abnormity monitoring processing method and device
CN110830283B (en) Fault detection method, device, equipment and system
JP6019995B2 (en) Distributed system, server computer, and failure prevention method
CN109274544B (en) Fault detection method and device for distributed storage system
CN111953566B (en) Distributed fault monitoring-based method and virtual machine high-availability system
CN112463448B (en) Distributed cluster database synchronization method, device, equipment and storage medium
CN102360324B (en) Failure recovery method and equipment for failure recovery
CN112612545A (en) Configuration hot loading system, method, equipment and medium of server cluster
CN106506278B (en) Service availability monitoring method and device
CN109600264A (en) CloudStack cloud platform
JP6421516B2 (en) Server device, redundant server system, information takeover program, and information takeover method
US10157110B2 (en) Distributed system, server computer, distributed management server, and failure prevention method
JP5285044B2 (en) Cluster system recovery method, server, and program
CN104394033B (en) Monitoring system, method and device across data center
JP2007280155A (en) Reliability improving method in dispersion system
CN115712521A (en) Cluster node fault processing method, system and medium
CN112269693B (en) Node self-coordination method, device and computer readable storage medium
JP6984119B2 (en) Monitoring equipment, monitoring programs, and monitoring methods
CN115314361A (en) Server cluster management method and related components thereof
CN113157493A (en) Backup method, device and system based on ticket checking system and computer equipment
JP4968568B2 (en) Fault monitoring method, fault monitoring system and program
WO2014040470A1 (en) Alarm message processing method and device
CN112612652A (en) Distributed storage system abnormal node restarting method and system
CN107783855B (en) Fault self-healing control device and method for virtual network element

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200529