CN110515557B - Cluster management method, device and equipment and readable storage medium - Google Patents

Cluster management method, device and equipment and readable storage medium

Info

Publication number
CN110515557B
Authority
CN
China
Prior art keywords
node
metadata
root
mode
fault
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910785358.2A
Other languages
Chinese (zh)
Other versions
CN110515557A (en)
Inventor
王新忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Inspur Data Technology Co Ltd
Original Assignee
Beijing Inspur Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Inspur Data Technology Co Ltd filed Critical Beijing Inspur Data Technology Co Ltd
Priority to CN201910785358.2A priority Critical patent/CN110515557B/en
Publication of CN110515557A publication Critical patent/CN110515557A/en
Application granted granted Critical
Publication of CN110515557B publication Critical patent/CN110515557B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1004Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's to protect a block of data words, e.g. CRC or checksum
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1415Saving, restoring, recovering or retrying at system level
    • G06F11/1438Restarting or rejuvenating
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614Improving the reliability of storage systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/0644Management of space entities, e.g. partitions, extents, pools
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0656Data buffering arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • G06F3/0679Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]

Abstract

The application discloses a cluster management method, device, equipment and computer-readable storage medium, wherein the method includes the following steps: adding a newly received I/O request to a pending linked list; controlling the transaction module and the write cache module in the metadata of the surviving node to stop waiting for messages sent by the failed node, switching the write mode of the metadata to a log mode, and switching the master node mode to a mode that excludes the failed node; reading the root zone of the metadata whose master node is the failed node into the memory of the surviving node; controlling the write cache module to open the disk-flushing switch and controlling the transaction module to roll back and redo incomplete transactions; and issuing the I/O requests in the pending linked list. According to the technical solution disclosed by the application, all services of the failed node are taken over by means of the surviving node and the transaction module and write cache module in the metadata of the surviving node, so that high availability of the full flash storage system is achieved and the reliability of the full flash storage system is improved.

Description

Cluster management method, device and equipment and readable storage medium
Technical Field
The present application relates to the field of full flash storage technologies, and in particular, to a cluster management method, apparatus, device, and computer-readable storage medium.
Background
With the development of information technology, full flash storage systems, which feature strong processing capability and good scalability and maintainability, have been widely adopted. As the underlying foundation of related computer services, a storage system has high reliability requirements.
For a full flash storage system, metadata is the most important part of the system, and there is currently no effective way to achieve high availability of a full flash storage system from a metadata perspective.
In summary, how to achieve high availability of a full flash storage system from the perspective of metadata, so as to improve its reliability, is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, an object of the present application is to provide a cluster management method, apparatus, device and computer readable storage medium to achieve high availability of a full flash storage system from the perspective of metadata, thereby improving reliability thereof.
In order to achieve the above purpose, the present application provides the following technical solutions:
a cluster management method is applied to a full flash storage system based on a cluster, and comprises the following steps:
when a failed node exists in the cluster, adding an I/O request newly received by a surviving node to a pending linked list;
controlling the transaction module and the write cache module in the metadata of the surviving node to stop waiting for messages sent by the failed node, switching the write mode of the metadata to a log mode, and switching the master node mode to a mode that excludes the failed node;
reading the root zone of the metadata whose master node is the failed node into the memory of the surviving node;
controlling the write cache module to open a disk-flushing switch, and controlling the transaction module to roll back and redo incomplete transactions;
and issuing the I/O requests in the pending linked list.
Preferably, before reading the root zone of the metadata whose master node is the failed node into the memory of the surviving node, the method further includes:
dividing a preset region from the logical address space of the disk as the root zone of the metadata, and storing the root node of the metadata in the root zone.
Preferably, storing the root node of the metadata in the root zone includes:
storing the root node in the root zone in the form of a double copy.
Preferably, the root node includes a LunID, a CRC check value and a MagicNumber.
Preferably, the method further includes:
when the failed node recovers to normal, adding a newly received I/O request to the pending linked list;
waiting for the write cache module in the metadata of the surviving node to complete the flushing tasks in progress, and removing, from the metadata read cache module, the read cache data whose master node is the failed node;
switching the write mode of the metadata to a mirror mode, and switching the master node mode to a mode that includes the recovered node;
and synchronizing the write mode and the master node mode to the recovered node, and the recovered node restoring the root zone of the metadata whose master node is the failed node from the disk into its own memory.
Preferably, the recovered node restoring the root zone of the metadata whose master node is the failed node from the disk into its own memory includes:
the failed node traversing the logical addresses of the disk and reading the root zone of the metadata whose master node is the failed node into its own memory, wherein the root zone contains the root node in the form of a double copy;
the failed node performing a CRC check and a MagicNumber check on both copies of the root node;
if both copies of the root node pass the checks, the failed node keeping the copy with the later timestamp in its memory; if only one copy of the root node passes the checks, the failed node keeping the copy that passes the checks in its memory.
A cluster management device is applied to a cluster-based full flash storage system, and comprises:
a first adding module, used for adding an I/O request newly received by a surviving node to a pending linked list when a failed node exists in the cluster;
a first control module, used for controlling the transaction module and the write cache module in the metadata of the surviving node to stop waiting for messages sent by the failed node, switching the write mode of the metadata to a log mode, and switching the master node mode to a mode that excludes the failed node;
a reading module, used for reading the root zone of the metadata whose master node is the failed node into the memory of the surviving node;
a second control module, used for controlling the write cache module to open a disk-flushing switch and controlling the transaction module to roll back and redo incomplete transactions;
and an issuing module, used for issuing the I/O requests in the pending linked list.
Preferably, the device further comprises:
a dividing and storing module, used for dividing a preset region from the logical address space of the disk as the root zone of the metadata before the root zone of the metadata whose master node is the failed node is read into the memory of the surviving node, and for storing the root node of the metadata in the root zone.
A cluster management device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the cluster management method according to any one of the above when executing the computer program.
A computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the cluster management method according to any of the preceding claims.
The present application provides a cluster management method, device, equipment and computer-readable storage medium. The method is applied to a cluster-based full flash storage system and includes: when a failed node exists in the cluster, adding an I/O request newly received by a surviving node to a pending linked list; controlling the transaction module and the write cache module in the metadata of the surviving node to stop waiting for messages sent by the failed node, switching the write mode of the metadata to a log mode, and switching the master node mode to a mode that excludes the failed node; reading the root zone of the metadata whose master node is the failed node into the memory of the surviving node; controlling the write cache module to open the disk-flushing switch and controlling the transaction module to roll back and redo incomplete transactions; and issuing the I/O requests in the pending linked list. According to the technical solution disclosed in the present application, when a failed node exists in the cluster, all services of the failed node are taken over by means of the surviving node and the transaction module and write cache module in the metadata of the surviving node, so that high availability of the full flash storage system is achieved and its reliability is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of a cluster management method according to an embodiment of the present application;
fig. 2 is a flowchart for taking over a service of a failed node according to an embodiment of the present application;
fig. 3 is a flowchart of a failed node service switch back provided in an embodiment of the present application;
fig. 4 is a schematic diagram illustrating metadata root zone recovery performed by a failed node after restoration according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a cluster management device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a cluster management device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, which shows a flowchart of a cluster management method provided in an embodiment of the present application, applied to a full flash storage system based on a cluster, and including:
s11: and when the fault node exists in the cluster, adding the I/O request newly received by the surviving node into the to-be-processed linked list.
For a cluster-based full flash storage system, a plurality of nodes (i.e., a plurality of controllers) are included, and the nodes are divided into master nodes and slave nodes.
When one node in the cluster fails (the failed node is the failed node), in order to achieve high availability of the full flash storage system, the surviving node in the cluster (i.e., the surviving node) needs to take over all the services of the failed node, and in the process of taking over, the node is allowed to receive the I/O request sent by the upper layer service.
Considering that, in the system running process, for each process stage, operations may be performed, some configuration information needs to be read, and in order to be able to safely modify the configuration information, it is necessary to ensure that modifications are not performed in the running process, therefore, when a failed node exists in the cluster, the surviving node enters a silent state to notify the corresponding module to perform corresponding processing (specifically, to complete processing on a task being processed, and not to start processing on the task), so as to achieve a task that is not being processed within the whole process, so as to safely modify the task.
In the process that the surviving node performs the state in silence, the upper layer service can still issue a new I/O request (only the new I/O request issued by the upper layer service is not responded temporarily), at this time, the surviving node adds the newly received I/O request into the to-be-processed linked list, so as to issue and process the I/O request after all services of the failed node are taken over.
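As a rough illustration of this queuing behavior, the following sketch (in C) shows how a surviving node might hold newly received requests on a pending linked list while it is quiescing; the type and function names (io_request_t, pending_list_t, enqueue_if_quiescing) are assumptions made for this example and are not taken from the patent.

```c
#include <pthread.h>

typedef struct io_request {
    struct io_request *next;
    /* payload (logical address, length, buffer, ...) omitted */
} io_request_t;

typedef struct {
    io_request_t   *head;
    io_request_t   *tail;
    pthread_mutex_t lock;
    int             quiescing;  /* set when a failed node has been detected */
} pending_list_t;

/* While quiescing, a newly received I/O request is queued instead of issued. */
static int enqueue_if_quiescing(pending_list_t *pl, io_request_t *req)
{
    int queued = 0;
    pthread_mutex_lock(&pl->lock);
    if (pl->quiescing) {
        req->next = NULL;
        if (pl->tail != NULL)
            pl->tail->next = req;
        else
            pl->head = req;
        pl->tail = req;
        queued = 1;              /* not answered yet; issued after takeover */
    }
    pthread_mutex_unlock(&pl->lock);
    return queued;               /* 0 means the caller issues it normally */
}
```

Once the takeover finishes, the node walks this list and issues each queued request through the normal submission path (step S15).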
S12: control the transaction module and the write cache module in the metadata of the surviving node to stop waiting for messages sent by the failed node, switch the write mode of the metadata to the log mode, and switch the master node mode to a mode that excludes the failed node.
Under normal conditions there is message exchange and acknowledgment between the master node and the slave node, so a node sometimes has to wait for a peer message. For example, node A sends a message to node B, and after node B completes the operation it replies with a corresponding message, so that node A knows that node B has received the message and finished the processing. When a node fails, however, the failed node can no longer send messages to the surviving node, so the surviving node may actively cancel the wait: specifically, the surviving node controls the transaction module and the write cache module in its metadata to stop waiting for messages sent by the failed node, and may mark those messages as failed.
As for the metadata: for ordinary I/O services, the metadata needs to manage the mapping from logical addresses to physical addresses (LP); for the garbage collection function, the metadata needs to manage the mapping from physical addresses to logical addresses (PL); and for the supported deduplication function, the metadata needs to manage the mapping from the fingerprint values of the I/O to physical addresses (HP). A single I/O therefore requires the LP, PL and HP mappings to be modified several times, so transactions are needed to guarantee atomicity; a sketch of these mappings follows.
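The sketch below gives a minimal picture of the three mappings and of grouping their updates into one transaction; the field widths, the 20-byte fingerprint and all names are illustrative assumptions rather than the patent's actual layout.

```c
#include <stdint.h>

typedef uint64_t lba_t;                       /* logical block address   */
typedef uint64_t pba_t;                       /* physical block address  */
typedef struct { uint8_t bytes[20]; } fp_t;   /* I/O fingerprint value   */

/* LP: logical address -> physical address, used by ordinary I/O         */
typedef struct { lba_t lba; pba_t pba; } lp_entry_t;
/* PL: physical address -> logical address, used by garbage collection   */
typedef struct { pba_t pba; lba_t lba; } pl_entry_t;
/* HP: fingerprint value -> physical address, used by deduplication      */
typedef struct { fp_t fp; pba_t pba; } hp_entry_t;

/* One I/O touches LP, PL and HP together, so the three updates are grouped
 * into one transaction that either completes entirely or is rolled back. */
typedef struct {
    lp_entry_t lp_update;
    pl_entry_t pl_update;
    hp_entry_t hp_update;
    int        committed;
} metadata_txn_t;
```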
The metadata can be divided internally into the following modules. The metadata object module is responsible for managing metadata objects, including LUN (Logical Unit Number, which may be briefly referred to as a logical volume) information and the root node of the metadata tree structure (the metadata in this application takes a B+ tree as an example), as well as operations such as initializing, updating and recovering the root-zone data structure. The transaction module, as noted above, provides the transaction mechanism that guarantees atomicity, because one request can be divided into multiple sub-requests: if all of them complete, the request completes; if one sub-request fails, the request fails, a rollback and redo is needed, and the sub-requests that have already completed must be cancelled. The write cache module is responsible for caching the processing of service I/O in memory and, according to service requirements, works in either WRITE_BACK (write-back) mode or WRITE_THROUGH (write-through) mode: in WRITE_BACK mode, the write cache sets aside a certain amount of memory, caches the operations sent by the transaction module, and flushes them to disk once certain conditions are met, whereas in WRITE_THROUGH mode the requests sent by the transaction module are written to disk directly. The B+ tree module is responsible for implementing the metadata B+ tree operation algorithms; the read cache module is responsible for reading and caching metadata; and the query module is responsible for metadata query operations.
After the transaction module and the write cache module in the metadata of the surviving node have been controlled to stop waiting for messages sent by the failed node, the surviving node enters the quiescent state; it is then guaranteed that no I/O is being processed, and the configuration information can be modified safely. Specifically, the surviving node switches the write mode of the metadata to the log mode (i.e., the LOGGING mode, which means that the operation log of the transaction module is protected by writing it to disk; in the LOGGING mode the write cache module correspondingly works in write-through mode). Concretely, the dual-controller mirror mode of the transaction module is switched to the single-controller log mode, and the dual-controller write-back mode of the write cache module is switched to the single-controller write-through mode. Before the failed node failed, the write mode of the metadata was the mirror mode (i.e., the CACHE mode, which means that the operation log of the transaction module is synchronized only by writing it to the peer controller (i.e., the peer node); in the CACHE mode the write cache module correspondingly works in write-back mode).
In addition, the surviving node switches the master node mode to a mode that excludes the failed node, i.e., the failed node is excluded from subsequent master node assignment; for a cluster containing two nodes, the master node mode is switched to a mode containing only the surviving node (i.e., switched to single-node mode).
It should be noted that after the write mode and the master node mode have been switched, subsequent I/O requests are processed in the new mode, which achieves high availability of the system and improves its reliability. A sketch of this combined switch follows.
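The following sketch summarizes the switch from the dual-controller mirror/write-back configuration to the single-controller log/write-through configuration, together with the exclusion of the failed node from master assignment; the enum and field names are assumptions made only for illustration.

```c
/* Illustrative mode flags; the names are assumptions for this sketch. */
typedef enum { TXN_MODE_MIRROR, TXN_MODE_LOGGING } txn_mode_t;
typedef enum { CACHE_WRITE_BACK, CACHE_WRITE_THROUGH } cache_mode_t;

typedef struct {
    txn_mode_t   txn_mode;     /* transaction log: mirrored to peer vs. on disk */
    cache_mode_t cache_mode;   /* write cache: write-back vs. write-through     */
    unsigned     master_mask;  /* bitmap of nodes eligible to act as master     */
} metadata_config_t;

/* On failure: protect the transaction log on disk instead of mirroring it,
 * make the write cache write through, and exclude the failed node from
 * master assignment (single-node mode in a two-node cluster).             */
static void switch_to_failed_peer_mode(metadata_config_t *cfg,
                                       unsigned failed_node_id)
{
    cfg->txn_mode    = TXN_MODE_LOGGING;
    cfg->cache_mode  = CACHE_WRITE_THROUGH;
    cfg->master_mask &= ~(1u << failed_node_id);
}
```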
S13: read the root zone of the metadata whose master node is the failed node into the memory of the surviving node.
After the write mode and the master node mode have been switched, the surviving node needs to restore the root zone of the metadata. The root zone of the metadata is also divided between master and slave, and each node in the cluster keeps only the root zone for which it is the master node. Before the failure, the surviving node therefore held the root zone of the metadata for which the surviving node is the master, while the failed node held the root zone of the metadata for which the failed node is the master. When the failed node fails and the surviving node needs to restore the root zone of the metadata in order to take over the services of the failed node, the surviving node must read the root zone of the metadata whose master node is the failed node into its own memory, after which the metadata can be operated on.
S14: control the write cache module to open the disk-flushing switch, and control the transaction module to roll back and redo the incomplete transactions.
After the metadata root zone has been restored, the surviving node controls the write cache module to open the disk-flushing switch and controls the transaction module to roll back and redo the incomplete transactions.
It should be noted that before the transaction module starts to work, the write cache module must be allowed to flush in the single-controller write-through mode, and the transaction module must perform the redo according to the new write mode (i.e., the single-controller log mode), as sketched below.
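A simplified sketch of the rollback-and-redo pass over incomplete transactions is shown below; the transaction record layout and the two helper functions are placeholders standing in for the real undo and replay logic, which the patent does not spell out.

```c
typedef struct txn_record {
    struct txn_record *next;
    int sub_done;    /* sub-requests (LP/PL/HP updates) already applied */
    int sub_total;   /* sub-requests the transaction consists of        */
} txn_record_t;

/* Placeholder helpers: a real implementation would undo the applied
 * sub-requests and then replay the transaction under the log mode.     */
static void undo_applied_subrequests(txn_record_t *t) { (void)t; }
static void redo_transaction(txn_record_t *t)         { (void)t; }

/* Walk the outstanding transactions once the flush switch is open. */
static void rollback_redo_incomplete(txn_record_t *list)
{
    for (txn_record_t *t = list; t != NULL; t = t->next) {
        if (t->sub_done == t->sub_total)
            continue;                    /* already complete, nothing to do */
        undo_applied_subrequests(t);     /* cancel the partial effects      */
        redo_transaction(t);             /* redo under the new write mode   */
    }
}
```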
S15: issue the I/O requests in the pending linked list.
After the transaction module has rolled back and redone the incomplete transactions, the surviving node enters the running state. At this point the surviving node has finished taking over the tasks of the failed node: it can re-issue the I/O requests in the pending linked list and is again allowed to accept new I/O requests, so the full flash storage system keeps working normally, high availability of the system is achieved, and its reliability is improved.
In addition, while issuing the I/O requests in the pending linked list, or after issuing them, the surviving node can send the upper-layer service a notification that the takeover of the failed node's tasks is complete, so that the upper-layer service learns in time that the services of the failed node have been taken over successfully. Taking a cluster that includes a node A and a node B as an example, where node B is the failed node and node A is the surviving node, refer to fig. 2, which shows a flowchart for taking over the services of the failed node provided in an embodiment of the present application.
According to the technical solution disclosed in the present application, when a failed node exists in the cluster, all services of the failed node are taken over by means of the surviving node and the transaction module and write cache module in the metadata of the surviving node, so that high availability of the full flash storage system is achieved and its reliability is improved.
Before reading the root zone of the metadata whose master node is the failed node into the memory of the surviving node, the cluster management method provided in the embodiment of the present application may further include:
dividing a preset region from the logical address space of the disk as the root zone of the metadata, and storing the root node of the metadata in the root zone.
The general principle for metadata is to ensure that its data structure is always ready for read and write operations. Since the most important part of tree-structured metadata is the root node, and the tree can only be operated on once the root node is available, a preset region can be divided from the logical address space of the disk to serve as the root zone of the metadata (for example, a portion of the space starting from address zero is set aside as the root zone), and the root node of the metadata is stored in the root zone, so that when an exception occurs in the system the root node can be read directly into memory through its logical address and recovered; the sketch below illustrates the idea.
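The sketch below illustrates reserving a fixed root zone at the front of the disk's logical address space, with room for two copies of the root node; the concrete offset and sizes are assumptions chosen for the example, not values from the patent.

```c
#include <stdint.h>

#define ROOT_ZONE_OFFSET  0ULL             /* root zone starts at logical address 0 */
#define ROOT_ZONE_SIZE    (1ULL << 20)     /* e.g. reserve the first 1 MiB          */
#define ROOT_COPY_SIZE    (ROOT_ZONE_SIZE / 2)  /* room for two copies of the root  */

/* Because the offset is fixed, the root node can be read back with a plain
 * logical-address read even when the rest of the metadata tree is unusable. */
static inline uint64_t root_copy_address(int copy /* 0 or 1 */)
{
    return ROOT_ZONE_OFFSET + (uint64_t)copy * ROOT_COPY_SIZE;
}
```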
In addition, the root zone may be initialized once when the log volume is created, that is, the initialized root node is written to disk at that time.
In the cluster management method provided in the embodiment of the present application, storing the root node of the metadata in the root zone may include:
storing the root node in the root zone in the form of a double copy.
In order to improve the reliability of the root zone and the root node, the root node may be stored in the root zone in the form of a double copy, so that a redundancy design is implemented through the double copy and the reliability of the root zone and the root node is improved.
In the cluster management method provided in the embodiment of the present application, the root node may include a LunID, a CRC check value, and a MagicNumber.
The metadata root node includes, but is not limited to, a LunID, a Cyclic Redundancy Check (CRC) check value, and a MagicNumber corresponding to the current tree.
Every time the rootAddress is modified, the CRC check value is recalculated and updated in the data structure of the root node.
The CRC check value and the MagicNumber stored in the root node allow the root node to be verified in more than one way, which improves the reliability of retrieving the root node; a sketch of such a root-node record follows.
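A possible on-disk layout of such a root-node record, with the CRC recomputed whenever the rootAddress changes, is sketched below; the exact field order, the magic constant and the CRC routine are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define ROOT_MAGIC 0x524F4F54u   /* "ROOT": an illustrative MagicNumber */

typedef struct {
    uint32_t magic;         /* MagicNumber, a fixed constant for sanity checking */
    uint32_t lun_id;        /* LunID of the tree this root node belongs to       */
    uint64_t root_address;  /* rootAddress: disk address of the B+ tree root     */
    uint64_t timestamp;     /* used to pick the newer of the two copies          */
    uint32_t crc;           /* CRC check value over the fields above             */
} root_node_t;

/* Recompute the CRC check value whenever rootAddress is modified. */
static void root_node_set_address(root_node_t *rn, uint64_t new_root_address,
                                  uint64_t now,
                                  uint32_t (*crc32)(const void *, size_t))
{
    rn->root_address = new_root_address;
    rn->timestamp    = now;
    rn->crc          = crc32(rn, offsetof(root_node_t, crc));
}
```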
The cluster management method provided in the embodiment of the present application may further include:
when the failed node recovers to normal, adding a newly received I/O request to the pending linked list;
waiting for the write cache module in the metadata of the surviving node to complete the flushing tasks in progress, and removing, from the metadata read cache module, the read cache data whose master node is the failed node;
switching the write mode of the metadata to the mirror mode, and switching the master node mode to a mode that includes the recovered node;
and synchronizing the write mode and the master node mode to the recovered node, and the recovered node restoring the root zone of the metadata whose master node is the failed node from the disk into its own memory.
When the failed node recovers to normal, in order to improve the reliability and performance of the full flash storage system, the recovered node needs to take back the tasks that originally belonged to it, and a full flash storage system containing two nodes needs to re-form the dual-controller mode. This involves a service switch-back operation, during which service I/O is not interrupted.
When the failed node comes online again, all of its modules perform an initialization operation and recover their configuration information. Since all configuration information has been retained on the surviving node, the node that comes back online only needs to ensure that it is synchronized.
After the failed node recovers to normal, the surviving node enters the quiescent state in order to modify the configuration information safely; at this time the surviving node adds newly received I/O requests to the pending linked list and re-issues them in the running stage.
After adding the newly received I/O requests to the pending linked list, the surviving node waits for the write cache module in its metadata to complete the flushing tasks in progress, and can then remove, from the metadata read cache module, the read cache data whose master node is the failed node. Because the surviving node took over the services of the failed node before the failed node recovered, read cache entries that originally belonged to the failed node are cached on the surviving node; after the failed node recovers, the surviving node removes the read cache data that originally belonged to the failed node in order to reduce its own load and resource occupation, for example as in the sketch below.
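A minimal sketch of evicting the read-cache entries whose master is the recovered node is given below; the cache is modeled as a simple singly linked list, which is an assumption made only for illustration.

```c
typedef struct rc_entry {
    struct rc_entry *next;
    unsigned         master_node;   /* node that is master for this cached data */
    /* cached metadata payload omitted */
} rc_entry_t;

typedef struct {
    rc_entry_t *head;               /* singly linked list of cached entries */
} read_cache_t;

static void free_entry(rc_entry_t *e) { (void)e; /* release buffers, etc. */ }

/* Remove every read-cache entry whose master node is the recovered peer. */
static void evict_peer_entries(read_cache_t *rc, unsigned recovered_node_id)
{
    rc_entry_t **pp = &rc->head;
    while (*pp != NULL) {
        if ((*pp)->master_node == recovered_node_id) {
            rc_entry_t *victim = *pp;
            *pp = victim->next;     /* unlink the entry */
            free_entry(victim);
        } else {
            pp = &(*pp)->next;
        }
    }
}
```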
Then, while in the quiescent state, the surviving node switches the write mode back to the mirror mode and switches the master node mode to a mode that includes the recovered node (i.e., switches the master node mode back to the normal mode). For a description of the mirror mode, refer to the corresponding part of the description of the surviving node taking over the services of the failed node, which is not repeated here.
After the surviving node has switched the write mode and the master node mode, the recovered node also enters the quiescent state, and the surviving node can synchronize the switched write mode and master node mode to the recovered node.
Once the recovered node has been synchronized with the surviving node, it can restore the root zone of the metadata whose master node is the failed node from the disk into its own memory, thereby completing the switch-back of the services.
In addition, after the recovered node completes the switch-back of the services, the surviving node and the recovered node can notify the upper-layer service that the switch-back has been completed, so that the full flash storage system can work in the normal mode, which improves the reliability and performance of the full flash storage system.
Taking a cluster that includes a node A and a node B as an example, where node B is the failed node and node A is the surviving node, refer to fig. 3, which shows a flowchart of the service switch-back of the failed node provided in an embodiment of the present application.
Referring to fig. 4, which shows a schematic diagram of the metadata root-zone recovery performed by the failed node after it recovers to normal according to an embodiment of the present application. In the cluster management method provided in the embodiment of the present application, the recovered node restoring the root zone of the metadata whose master node is the failed node from the disk into its own memory may include:
S41: the failed node traverses the logical addresses of the disk and reads the root zone of the metadata whose master node is the failed node into its own memory.
The root zone contains the root node in the form of a double copy.
As described above, a preset region is divided from the logical address space of the disk as the root zone of the metadata, the root node is stored in the root zone in the form of a double copy, and the root node contains the CRC check value and the MagicNumber; therefore, when the metadata root zone is restored, the failed node can traverse the logical addresses of the disk and read the root zone whose master node is the failed node into its own memory.
Then, the failed node selects, according to the timestamp, the CRC check value and the MagicNumber, which of the two copies of the root node read into memory to keep (see steps S42 to S47 and the sketch after them), so as to ensure that the recovered metadata root zone and root node are correct.
S42: the failed node performs the CRC check and the MagicNumber check on both copies of the root node.
S43: the failed node judges whether both copies of the root node pass the checks; if so, step S44 is executed, and if not, step S45 is executed.
S44: the failed node keeps the copy of the root node with the later timestamp in its memory.
S45: the failed node judges whether one of the two copies of the root node passes the checks; if so, step S46 is executed, and if not, step S47 is executed.
S46: the failed node keeps the copy of the root node that passes the checks in its memory.
S47: the failed node marks the root zone as damaged.
An embodiment of the present application further provides a cluster management device; see fig. 5, which shows a schematic structural diagram of the cluster management device provided in the embodiment of the present application. The device is applied to a cluster-based full flash storage system and may include:
a first adding module 51, configured to add, when a failed node exists in the cluster, an I/O request newly received by a surviving node to the pending linked list;
a first control module 52, configured to control the transaction module and the write cache module in the metadata of the surviving node to stop waiting for messages sent by the failed node, switch the write mode of the metadata to the log mode, and switch the master node mode to a mode that excludes the failed node;
a reading module 53, configured to read the root zone of the metadata whose master node is the failed node into the memory of the surviving node;
a second control module 54, configured to control the write cache module to open the disk-flushing switch and control the transaction module to roll back and redo incomplete transactions;
and an issuing module 55, configured to issue the I/O requests in the pending linked list.
The cluster management device provided in this embodiment may further include:
a dividing and storing module, configured to divide a preset region from the logical address space of the disk as the root zone of the metadata, and to store the root node of the metadata in the root zone, before the root zone of the metadata whose master node is the failed node is read into the memory of the surviving node.
In the cluster management device provided in the embodiment of the present application, the dividing and storing module may include a storing unit, configured to store the root node in the root zone in the form of a double copy.
In the cluster management device provided in the embodiment of the present application, the root node may include a LunID, a CRC check value, and a MagicNumber.
The cluster management device provided in this embodiment may further include:
a second adding module, configured to add a newly received I/O request to the pending linked list after the failed node recovers to normal;
a waiting module, configured to wait for the write cache module in the metadata of the surviving node to complete the flushing tasks in progress, and to remove, from the metadata read cache module, the read cache data whose master node is the failed node;
a switching module, configured to switch the write mode of the metadata to the mirror mode and switch the master node mode to a mode that includes the recovered node;
and a synchronization module, configured to synchronize the write mode and the master node mode to the recovered node, wherein the recovered node restores the root zone of the metadata whose master node is the failed node from the disk into its own memory.
An embodiment of the present application further provides a cluster management device; see fig. 6, which shows a schematic structural diagram of the cluster management device provided in the embodiment of the present application, and the device may include:
a memory 61 for storing a computer program;
a processor 62, configured to execute the computer program stored in the memory 61 to implement the following steps:
when a failed node exists in the cluster, adding an I/O request newly received by a surviving node to a pending linked list; controlling the transaction module and the write cache module in the metadata of the surviving node to stop waiting for messages sent by the failed node, switching the write mode of the metadata to a log mode, and switching the master node mode to a mode that excludes the failed node; reading the root zone of the metadata whose master node is the failed node into the memory of the surviving node; controlling the write cache module to open the disk-flushing switch and controlling the transaction module to roll back and redo incomplete transactions; and issuing the I/O requests in the pending linked list.
An embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, can implement the following steps:
when a failed node exists in the cluster, adding an I/O request newly received by a surviving node to a pending linked list; controlling the transaction module and the write cache module in the metadata of the surviving node to stop waiting for messages sent by the failed node, switching the write mode of the metadata to a log mode, and switching the master node mode to a mode that excludes the failed node; reading the root zone of the metadata whose master node is the failed node into the memory of the surviving node; controlling the write cache module to open the disk-flushing switch and controlling the transaction module to roll back and redo incomplete transactions; and issuing the I/O requests in the pending linked list.
The computer-readable storage medium may include: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
For a description of a relevant part in a cluster management apparatus, a device, and a computer-readable storage medium provided in the embodiments of the present application, please refer to a detailed description of a corresponding part in a cluster management method provided in the embodiments of the present application, which is not described herein again.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Furthermore, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element. In addition, the parts of the technical solutions provided in the embodiments of the present application that are consistent with the implementation principles of the corresponding technical solutions in the prior art are not described in detail, so as to avoid redundant description.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (8)

1. A cluster management method, applied to a cluster-based full flash storage system, comprising the following steps:
when a failed node exists in the cluster, adding an I/O request newly received by a surviving node to a pending linked list;
controlling the transaction module and the write cache module in the metadata of the surviving node to stop waiting for messages sent by the failed node, switching the write mode of the metadata to a log mode, and switching the master node mode to a mode that excludes the failed node;
reading the root zone of the metadata whose master node is the failed node into the memory of the surviving node;
controlling the write cache module to open a disk-flushing switch and controlling the transaction module to roll back and redo incomplete transactions;
issuing the I/O requests in the pending linked list;
wherein, before reading the root zone of the metadata whose master node is the failed node into the memory of the surviving node, the method further comprises:
dividing a preset region from the logical address space of the disk as the root zone of the metadata, and storing the root node of the metadata in the root zone.
2. The cluster management method according to claim 1, wherein storing the root node of the metadata in the root zone comprises:
storing the root node in the root zone in the form of a double copy.
3. The cluster management method according to claim 2, wherein the root node comprises a LunID, a CRC check value and a MagicNumber.
4. The cluster management method according to claim 3, further comprising:
when the failed node recovers to normal, adding a newly received I/O request to the pending linked list;
waiting for the write cache module in the metadata of the surviving node to complete the flushing tasks in progress, and removing, from the metadata read cache module, the read cache data whose master node is the failed node;
switching the write mode of the metadata to a mirror mode, and switching the master node mode to a mode that includes the recovered node;
and synchronizing the write mode and the master node mode to the recovered node, and the recovered node restoring the root zone of the metadata whose master node is the failed node from the disk into its own memory.
5. The cluster management method according to claim 4, wherein the recovered node restoring the root zone of the metadata whose master node is the failed node from the disk into its own memory comprises:
the failed node traversing the logical addresses of the disk and reading the root zone of the metadata whose master node is the failed node into its own memory, wherein the root zone contains the root node in the form of a double copy;
the failed node performing a CRC check and a MagicNumber check on both copies of the root node;
if both copies of the root node pass the checks, the failed node keeping the copy with the later timestamp in its memory; if only one copy of the root node passes the checks, the failed node keeping the copy that passes the checks in its memory.
6. A cluster management device, applied to a cluster-based full flash storage system, comprising:
a first adding module, used for adding an I/O request newly received by a surviving node to a pending linked list when a failed node exists in the cluster;
a first control module, used for controlling the transaction module and the write cache module in the metadata of the surviving node to stop waiting for messages sent by the failed node, switching the write mode of the metadata to a log mode, and switching the master node mode to a mode that excludes the failed node;
a reading module, used for reading the root zone of the metadata whose master node is the failed node into the memory of the surviving node;
a second control module, used for controlling the write cache module to open a disk-flushing switch and controlling the transaction module to roll back and redo incomplete transactions;
an issuing module, used for issuing the I/O requests in the pending linked list;
and further comprising:
a dividing and storing module, used for dividing a preset region from the logical address space of the disk as the root zone of the metadata before the root zone of the metadata whose master node is the failed node is read into the memory of the surviving node, and for storing the root node of the metadata in the root zone.
7. A cluster management device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the cluster management method according to any of claims 1 to 4 when executing the computer program.
8. A computer-readable storage medium, characterized in that a computer program is stored thereon, which computer program, when being executed by a processor, carries out the steps of the cluster management method according to any of the claims 1 to 5.
CN201910785358.2A 2019-08-23 2019-08-23 Cluster management method, device and equipment and readable storage medium Active CN110515557B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910785358.2A CN110515557B (en) 2019-08-23 2019-08-23 Cluster management method, device and equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910785358.2A CN110515557B (en) 2019-08-23 2019-08-23 Cluster management method, device and equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN110515557A CN110515557A (en) 2019-11-29
CN110515557B true CN110515557B (en) 2022-06-17

Family

ID=68626592

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910785358.2A Active CN110515557B (en) 2019-08-23 2019-08-23 Cluster management method, device and equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN110515557B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109358812A (en) * 2018-10-09 2019-02-19 郑州云海信息技术有限公司 Processing method, device and the relevant device of I/O Request in a kind of group system
CN111124307B (en) * 2019-12-20 2022-06-07 北京浪潮数据技术有限公司 Data downloading and brushing method, device, equipment and readable storage medium
CN113448513B (en) * 2021-05-28 2022-08-09 山东英信计算机技术有限公司 Data reading and writing method and device of redundant storage system
CN113342512B (en) * 2021-08-09 2021-11-19 苏州浪潮智能科技有限公司 IO task silencing and driving method and device and related equipment
CN115905114B (en) * 2023-03-09 2023-05-30 浪潮电子信息产业股份有限公司 Batch updating method and system of metadata, electronic equipment and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7805632B1 (en) * 2007-09-24 2010-09-28 Net App, Inc. Storage system and method for rapidly recovering from a system failure
CN104933132A (en) * 2015-06-12 2015-09-23 广州巨杉软件开发有限公司 Distributed database weighted voting method based on operating sequence number
CN105159818A (en) * 2015-08-28 2015-12-16 东北大学 Log recovery method in memory data management and log recovery simulation system in memory data management
CN109582502A (en) * 2018-12-03 2019-04-05 郑州云海信息技术有限公司 Storage system fault handling method, device, equipment and readable storage medium storing program for executing

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20060117505A (en) * 2005-05-11 2006-11-17 인하대학교 산학협력단 A recovery method using extendible hashing based cluster log in a shared-nothing spatial database cluster

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7805632B1 (en) * 2007-09-24 2010-09-28 Net App, Inc. Storage system and method for rapidly recovering from a system failure
CN104933132A (en) * 2015-06-12 2015-09-23 广州巨杉软件开发有限公司 Distributed database weighted voting method based on operating sequence number
CN105159818A (en) * 2015-08-28 2015-12-16 东北大学 Log recovery method in memory data management and log recovery simulation system in memory data management
CN109582502A (en) * 2018-12-03 2019-04-05 郑州云海信息技术有限公司 Storage system fault handling method, device, equipment and readable storage medium storing program for executing

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
B. Y. Kushal; M. Chitra. "Cluster based routing protocol to prolong network lifetime through mobile sink in WSN". 2016 IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT). 2017. *
Wang Jiahao; Cai Peng; Qian Weining; Zhou Aoying. "Log Replication and Failure Recovery in Cluster Database Systems". Journal of Software. 2016. *

Also Published As

Publication number Publication date
CN110515557A (en) 2019-11-29

Similar Documents

Publication Publication Date Title
CN110515557B (en) Cluster management method, device and equipment and readable storage medium
US20230117542A1 (en) Remote Data Replication Method and System
US10860547B2 (en) Data mobility, accessibility, and consistency in a data storage system
CN103077222B (en) Cluster file system distributed meta data consistance ensuring method and system
US7779295B1 (en) Method and apparatus for creating and using persistent images of distributed shared memory segments and in-memory checkpoints
CN102891849B (en) Service data synchronization method, data recovery method, data recovery device and network device
JP5559821B2 (en) Method for storing data, method for mirroring data, machine-readable medium carrying an instruction sequence, and program for causing a computer to execute the method
US7836162B2 (en) Transaction processing system and transaction processing method
JP2006023889A (en) Remote copy system and storage system
CN110673978B (en) Data recovery method and related device after power failure of double-control cluster
JP5201133B2 (en) Redundant system, system control method and system control program
CN113220729A (en) Data storage method and device, electronic equipment and computer readable storage medium
WO2018076633A1 (en) Remote data replication method, storage device and storage system
CN113326006A (en) Distributed block storage system based on erasure codes
US10983709B2 (en) Methods for improving journal performance in storage networks and devices thereof
US10235256B2 (en) Systems and methods for highly-available file storage with fast online recovery
CN104991739A (en) Method and system for refining primary execution semantics during metadata server failure substitution
CN115955488B (en) Distributed storage copy cross-machine room placement method and device based on copy redundancy
JP2009265973A (en) Data synchronization system, failure recovery method, and program
US10846012B2 (en) Storage system for minimizing required storage capacity during remote volume replication pair duplication
US10656867B2 (en) Computer system, data management method, and data management program
JP5488681B2 (en) Redundant system, control method and control program
Al Hubail Data replication and fault tolerance in AsterixDB
CN114860773A (en) Method and device for solving double-write consistency of cache database
CN113835930A (en) Cache service recovery method, system and device based on cloud platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant