CN116610499A - Cluster role switching method, device, equipment and medium in file system

Info

Publication number: CN116610499A
Application number: CN202310882845.7A
Authority: CN
Inventors: 王凯; 刘昌鑫; 李红; 韦新伟
Original assignee: Lenovo Netapp Technology Ltd
Current assignee: Lenovo Netapp Technology Ltd
Priority date: 2023-07-19
Filing date: 2023-07-19
Publication date: 2023-08-18
Anticipated expiration: 2043-07-19
Also published as: CN116610499B

Abstract

The disclosure provides a cluster role switching method, device, equipment and medium in a file system. The cluster role switching method in the file system provided by the disclosure comprises the following steps: receiving a switching request for switching from a first cluster role to a second cluster role; in response to the switch request, configuring a storage area associated with a second cluster role to be switched to; transmitting a switching request for switching from the second cluster role to the first cluster role to a second cluster having the second cluster role; and in response to confirming that the role switch has been completed, setting the current state to the second cluster role.

Description

Cluster role switching method, device, equipment and medium in file system

Technical Field

The present disclosure relates to the field of data storage technologies, and in particular, to a method, an apparatus, a device, and a medium for switching cluster roles in a file system.

Background

In the field of file system storage, distributed storage can, for example, solve file system service access problems caused by the failure of one or more nodes in a cluster. However, the physical nodes at the back end of a cluster are geographically concentrated; in most cases, all physical nodes of the cluster are located in the same machine room. Because the nodes are so concentrated, an unforeseen event or periodic maintenance may cause the entire file system to stop providing service.

Disclosure of Invention

Some embodiments of the present disclosure provide a method, an apparatus, a device, and a medium for switching roles of a cluster in a file system, which are used to implement role switching of the cluster in the file system, so as to improve stability of the file system and provide stable file service.

According to an aspect of the present disclosure, there is provided a cluster role switching method in a file system, which is applicable to a first cluster having a first cluster role, including: receiving a switching request for switching from a first cluster role to a second cluster role; in response to the switch request, configuring a storage area associated with a second cluster role to be switched to; transmitting a switching request for switching from the second cluster role to the first cluster role to a second cluster having the second cluster role; and in response to confirming that the role switch has been completed, setting the current state to the second cluster role.

According to some embodiments of the present disclosure, the cluster role switching method further includes: before receiving the switching request, performing a remote copy task between the first cluster and the second cluster; and before configuring the storage area related to the second cluster role to be switched to, confirming that the remote copy task between the first cluster and the second cluster has stopped, and confirming that the master-slave directory in the first cluster is in a split state.

According to some embodiments of the present disclosure, configuring the storage area associated with the second cluster role to be switched to includes: creating a node mapping storage area, wherein the node mapping storage area is used for storing node mapping relation data of directories and files between the first cluster and the second cluster, and the node mapping relation data is used for executing a remote copy task between the first cluster and the second cluster.

According to some embodiments of the present disclosure, the cluster role switching method further includes: receiving, from the second cluster, node mapping relation reversal data held in a node mapping reversal storage area, and storing the received node mapping relation reversal data into the node mapping storage area as node mapping relation data, wherein the role switch is confirmed to be completed once all of the node mapping relation reversal data has been stored into the node mapping storage area.

According to some embodiments of the present disclosure, configuring the storage area associated with the second cluster role to be switched to includes: removing a replication task control storage area, wherein the replication task control storage area is used for storing flow control information of a remote copy task between the first cluster and the second cluster; and creating a metadata difference storage area, wherein the metadata difference storage area is used for storing the snapshot difference between the latest snapshot and the last copied snapshot of the data of the second cluster.

According to some embodiments of the present disclosure, the cluster role switching method further includes: after receiving the switching request, setting a role switching flag bit, wherein the role switching flag bit is used for monitoring a role switching process by a role switching initiating terminal in the background, and when the role switching flag bit is in a first state, the role switching flag bit indicates that the first cluster is in a role switching process, and when the role switching flag bit is in a second state, the role switching flag bit indicates that the role switching process of the first cluster is ended.
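For illustration only, the role switching flag bit can be pictured as a two-state value that a background monitor at the initiating end polls. The sketch below is a hypothetical Python model; the names `SwitchFlag`, `ClusterState`, and `is_switch_finished` are assumptions and do not appear in the disclosure.

```python
from dataclasses import dataclass
from enum import Enum


class SwitchFlag(Enum):
    """Hypothetical two-state role switching flag bit."""
    IN_PROGRESS = 1  # first state: the cluster is in a role switching process
    FINISHED = 2     # second state: the role switching process has ended


@dataclass
class ClusterState:
    role: str
    flag: SwitchFlag = SwitchFlag.FINISHED

    def on_switch_request(self) -> None:
        # Set the flag as soon as the switching request is received.
        self.flag = SwitchFlag.IN_PROGRESS

    def on_switch_completed(self, new_role: str) -> None:
        # Record the new role and mark the switching process as ended.
        self.role = new_role
        self.flag = SwitchFlag.FINISHED


def is_switch_finished(state: ClusterState) -> bool:
    """What a background monitor at the initiating end might check."""
    return state.flag is SwitchFlag.FINISHED
```

In this model the initiating end never inspects the storage areas directly; it only reads the flag, which keeps the monitoring logic independent of how the switch itself is carried out.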

According to another aspect of the present disclosure, there is provided a cluster role switching method in a file system, applicable to a second cluster having a second cluster role, including: receiving a switching request for switching from the second cluster role to the first cluster role; in response to a switching request, configuring a storage area related to a first cluster role to be switched to; transmitting a switching request for switching from the first cluster role to the second cluster role to the first cluster with the first cluster role; and in response to confirming that the role switch has been completed, setting the current state to the first cluster role.

According to some embodiments of the present disclosure, the cluster role switching method further includes: before receiving the switching request, performing a remote copy task between the first cluster and the second cluster; and before configuring the storage area related to the first cluster role to be switched to, confirming that the remote copy task between the first cluster and the second cluster has stopped, and confirming that the master-slave directory in the first cluster is in a split state.

According to some embodiments of the present disclosure, configuring a storage area associated with a first cluster role to switch to includes: creating a node mapping reversal storage area, wherein the second cluster comprises a node mapping storage area for storing node mapping relation data of a directory and a file between the first cluster and the second cluster, and the node mapping relation data is used for executing a remote copying task between the first cluster and the second cluster; inverting the node mapping relation data stored in the node mapping storage area to obtain node mapping relation inversion data, and storing the node mapping relation inversion data into the node mapping inversion storage area; and transmitting the node mapping relation reversal data stored in the node mapping reversal storage area to the first cluster.
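A minimal sketch of the reversal step, assuming that the node mapping relation data can be modeled as a dictionary keyed by the first cluster's node number with the second cluster's node number as the value. The function name `invert_node_mapping` and this dictionary layout are illustrative assumptions; the disclosure does not specify the on-disk format.

```python
from typing import Dict


def invert_node_mapping(inomap: Dict[int, int]) -> Dict[int, int]:
    """Swap every (first-cluster inode -> second-cluster inode) entry.

    The result maps second-cluster inode -> first-cluster inode, which is
    the direction needed once the second cluster becomes the master.
    """
    inverted: Dict[int, int] = {}
    for first_ino, second_ino in inomap.items():
        if second_ino in inverted:
            # A one-to-one mapping is assumed; refuse to overwrite silently.
            raise ValueError(f"inode {second_ino} is mapped more than once")
        inverted[second_ino] = first_ino
    return inverted
```

The inverted dictionary corresponds to the content that would be placed in the node mapping reversal storage area and then transmitted to the first cluster.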

According to some embodiments of the present disclosure, the cluster role switching method further includes: receiving, from the first cluster, feedback information indicating that the node mapping relation reversal data has been saved successfully, wherein the role switch is confirmed to be completed in response to receiving the feedback information.

According to some embodiments of the present disclosure, the cluster role switching method further includes: in response to confirming that the role switch has been completed, removing the node mapping reversal storage area and the node mapping storage area.

According to some embodiments of the present disclosure, configuring a storage area associated with a first cluster role to switch to includes: creating a replication task control storage area, wherein the replication task control storage area is used for storing flow control information of remote copy tasks between the first cluster and the second cluster; and creating a metadata difference storage area, wherein the metadata difference storage area is used for storing the snapshot difference between the latest snapshot and the last copied snapshot of the data of the second cluster.

According to some embodiments of the present disclosure, the cluster role switching method further includes: after receiving the switching request, setting a role switching flag bit, wherein the role switching flag bit is used by a role switching initiating terminal to monitor the role switching process in the background, and when the role switching flag bit is in a first state, the role switching flag bit indicates that the second cluster is in a role switching process, and when the role switching flag bit is in a second state, the role switching flag bit indicates that the role switching process of the second cluster has ended.

According to some embodiments of the present disclosure, the cluster role switching method further includes: when an interruption of the cluster role switching process is detected, reconfiguring the storage area related to the second cluster role to be switched to, regardless of the state the cluster role switching process had reached.

According to some embodiments of the present disclosure, the cluster role switching method further includes: when an interruption of the cluster role switching process is detected, comparing the data of the node mapping storage area with the data of the node mapping reversal storage area, and continuing the reversal of the node mapping relation data from the position at which the reversal was interrupted; and checking the progress of the transmission of the node mapping relation reversal data to the first cluster, and continuing to transmit the node mapping relation reversal data from the position at which the transmission was interrupted.
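The resume behavior described above can be sketched as follows. This is a hypothetical model under stated assumptions: both storage areas are represented as dictionaries, and transmission progress is represented as a count of entries already acknowledged by the first cluster. The names `resume_inversion` and `pending_transfer` are illustrative only.

```python
from typing import Dict, List, Tuple


def resume_inversion(inomap: Dict[int, int], inomap_reverse: Dict[int, int]) -> int:
    """Invert only the entries missing from the reversal area.

    Returns the number of entries that still had to be inverted.
    """
    added = 0
    for first_ino, second_ino in inomap.items():
        if inomap_reverse.get(second_ino) != first_ino:
            inomap_reverse[second_ino] = first_ino
            added += 1
    return added


def pending_transfer(inomap_reverse: Dict[int, int], acked: int) -> List[Tuple[int, int]]:
    """Return the entries that still need to be sent to the first cluster.

    Entries are ordered by key so that the acknowledged count is a stable
    resume position across restarts (an assumption of this sketch).
    """
    ordered = sorted(inomap_reverse.items())
    return ordered[acked:]
```

Comparing the two areas entry by entry makes the reversal idempotent, so repeating it after an interruption does not duplicate or corrupt entries that were already written.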

According to yet another aspect of the present disclosure, there is provided a first cluster apparatus in a file system, including: a receiving unit configured to receive a switching request to switch from a first cluster role to a second cluster role; a processing unit configured to configure, in response to the switching request, a storage area associated with the second cluster role to be switched to; and a transmission unit configured to transmit, to a second cluster having the second cluster role, a switching request for switching from the second cluster role to the first cluster role; wherein the processing unit is further configured to set the current state to the second cluster role in response to confirming that the role switch has been completed.

According to yet another aspect of the present disclosure, there is provided a second cluster apparatus in a file system, including: a receiving unit configured to receive a switching request to switch from the second cluster role to the first cluster role; a processing unit configured to configure, in response to the switching request, a storage area associated with the first cluster role to be switched to; and a transmission unit configured to transmit, to a first cluster having the first cluster role, a switching request for switching from the first cluster role to the second cluster role; wherein the processing unit is further configured to set the current state to the first cluster role in response to confirming that the role switch has been completed.

According to yet another aspect of the present disclosure, there is provided a cluster device in a file system, including: a processor, and a memory, wherein the memory has stored thereon computer executable instructions that, when executed by the processor, cause the processor to perform a cluster role switching method in a file system according to some embodiments of the present disclosure.

According to yet another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium having stored thereon computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, cause the processor to perform a cluster role switching method in a file system according to some embodiments of the present disclosure.

The disclosure provides a cluster role switching method, device, equipment and medium in a file system. The file system comprises a master cluster that handles file access tasks and a slave cluster that serves as a backup, and a remote copy backup function is provided between the master cluster and the slave cluster. When the master cluster fails or undergoes periodic maintenance, the master cluster is switched to the slave role and the slave cluster is switched to the master role, so that the former slave cluster continues to provide the file processing service as the new master cluster. This ensures the service stability of the file system and, further, the correctness and availability of the metadata and data in the file system during the role switching process.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings that are required to be used in the description of the embodiments will be briefly described below. It should be apparent that the drawings in the following description are only some exemplary embodiments of the present disclosure, and that other drawings may be obtained from these drawings by those of ordinary skill in the art without undue effort.

FIG. 1 illustrates a schematic flow diagram of a cluster role switch method in accordance with some embodiments of the present disclosure;

FIG. 2 illustrates another schematic flow diagram of a cluster role switch method in accordance with some embodiments of the present disclosure;

FIG. 3 illustrates a primary cluster synchronization role switch execution flow diagram in accordance with some embodiments of the present disclosure;

FIG. 4 illustrates a flow diagram for performing a slave cluster synchronization role switch in accordance with some embodiments of the present disclosure;

FIG. 5 illustrates a primary cluster asynchronous role switch execution flow diagram in accordance with some embodiments of the present disclosure;

FIG. 6A illustrates a schematic diagram of an abort occurring when a role switch is performed, according to some embodiments of the present disclosure;

FIG. 6B illustrates another diagram of an abort occurring during a role switch performed in accordance with some embodiments of the present disclosure;

FIG. 7 illustrates a schematic block diagram of a first cluster apparatus in accordance with some embodiments of the present disclosure;

FIG. 8 illustrates a schematic block diagram of a second cluster apparatus in accordance with some embodiments of the present disclosure;

FIG. 9 illustrates a schematic block diagram of a cluster device in accordance with some embodiments of the present disclosure;

FIG. 10 illustrates an architectural diagram of an exemplary computing device according to some embodiments of the present disclosure;

FIG. 11 illustrates a schematic diagram of a non-transitory computer-readable storage medium according to some embodiments of the present disclosure.

Detailed Description

The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the drawings in the embodiments of the present disclosure. It will be apparent that the described embodiments are merely some, but not all, of the embodiments of the present disclosure. All other embodiments that one of ordinary skill in the art can obtain from the embodiments in this disclosure without inventive effort are intended to fall within the scope of the present disclosure.

Furthermore, unless the context clearly indicates otherwise, the words "a," "an," and "the" are not limited to the singular and may include the plural. The terms "first," "second," and the like, as used in this disclosure, do not denote any order, quantity, or importance, but rather are used to distinguish one element or object from another. Likewise, the word "comprising" or "comprises" and the like means that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. The terms "connected" or "coupled" and the like are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect.

Flowcharts are used in this disclosure to describe the steps of methods according to embodiments of the present disclosure. It should be understood that the steps are not necessarily performed in exactly the order shown. Rather, the method steps may be performed in reverse order, concurrently, or in another feasible order. Other operations may also be added to these processes.

It will be understood that terms and nouns used herein have the meanings known to those of ordinary skill in the art.

In the field of file system storage, distributed storage is generally used to solve file system service access problems caused by single-node or multi-node failures within a cluster. In a distributed file system, however, the physical nodes at the back end of the cluster are geographically concentrated; in most cases, all physical nodes of the cluster are located in the same machine room. Because the nodes are so concentrated, a fault caused by an unforeseen event, or periodic maintenance of the cluster, may cause the entire file system to stop providing service. A distributed file system is therefore typically configured with a master-slave cluster architecture: the master cluster provides the file processing service, and the slave cluster serves as a backup device for the master cluster. This requires a remote copy function between the master cluster and the slave cluster, so that data in the master cluster can be backed up to the slave cluster, for example periodically. It will be appreciated that "master" and "slave" are used herein merely to distinguish clusters that implement different functions and do not represent any order or importance. For example, processing devices having the same physical structure may be implemented as a master cluster and a slave cluster, respectively.

Some embodiments of the present disclosure provide a cluster role switching method in a file system, which is applicable to a distributed file system with a master-slave cluster architecture. When the master cluster fails or needs maintenance, for example, the original master cluster is switched to the slave role and the original slave cluster is switched to the master role. The original slave cluster (i.e., the new master cluster) then takes over and continues to provide the file access service, while the original master cluster (i.e., the new slave cluster) serves as the backup device, thereby realizing failover. In addition, after the original master cluster recovers, it can be switched back to the master role and the original slave cluster can be switched back to the slave role, thereby realizing failback. Therefore, the cluster role switching method according to the embodiments of the present disclosure enables role switching between the master and slave clusters, provides a stable file service, ensures the accuracy and availability of data during switching, and improves the performance and user experience of the file system.

According to some embodiments of the present disclosure, a cluster role switching method in a file system is provided, which is specifically applicable to a first cluster having a first cluster role; that is, the method is executed by the first cluster. For example, the first cluster role is a master role, and the first cluster having the first cluster role is the master cluster in the file system and provides the file access service. Correspondingly, the second cluster role is a slave role, and the second cluster having the second cluster role is the slave cluster in the file system, which serves as a backup device of the master cluster. For example, a remote copy backup process may be executed periodically between the master cluster and the slave cluster.

FIG. 1 illustrates a schematic flow diagram of a cluster role switching method according to some embodiments of the present disclosure; the flow of FIG. 1 corresponds to the role switching steps performed by a master cluster. For ease of description, the method described in connection with FIG. 1 may also be referred to as a master cluster role switching method, i.e., a method performed by a master cluster to switch from the master role to the slave role. As shown in FIG. 1, the master cluster role switching method includes steps S101 to S104.

In step S101, a switching request to switch from a first cluster role to a second cluster role is received. As an example, this switching request may be sent to the master cluster by a role switch initiator (e.g., an administrator of the file system) to cause the master cluster to switch from its current master role to the slave role. For example, when the administrator finds that the master cluster is faulty or requires maintenance, the administrator prompts the master cluster to perform a role switch by sending a switching request. As another example, the switching request may be sent to the master cluster by the slave cluster in the file system. For example, a failure of the master cluster may be detected on the slave cluster side, or the administrator may instruct the master cluster to perform role switching through the slave cluster. Furthermore, it will be appreciated that the switching request may also be generated by a component of the master cluster itself, so that the master cluster performs the role switch on a self-triggered basis; no limitation is imposed here.

Next, in step S102, in response to the switching request, a storage area associated with the second cluster role to be switched to is configured. Since the current master cluster is to be switched to the slave role, some embodiments of the present disclosure propose that, during role switching, the master cluster configures the storage areas required by the slave role after receiving the switching request, so that it can operate as the new slave cluster once the switch is completed. The storage areas related to the slave role that need to be configured in step S102 are described in detail below.

Next, in step S103, a switching request for switching from the second cluster role to the first cluster role is transmitted to the second cluster having the second cluster role. For example, after the master cluster has configured the storage areas for role switching on its own side, it sends an instruction to the slave cluster in the file system so that the slave cluster also performs a role switch, that is, switches from the slave role to the master role and provides the file service as the new master cluster. The role switching steps performed by the slave cluster are described below in connection with FIG. 2.

In step S104, in response to confirming that the role switch has been completed, the current state is set to the second cluster role. The master cluster sets the current state to the second cluster role, i.e., the slave role, only in case it is confirmed that the role switch has been completed, thereby avoiding data loss or unavailability caused by the role switch.

Through the above steps S101 to S104, the master cluster in the file system can switch from the master role to the slave role and thereafter provide the backup service as the slave cluster.
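The four steps can be summarized in a short sketch. It is a simplified, hypothetical model: the class `MasterCluster`, its method names, and the `send_to_slave` callback (which stands in for the inter-cluster transport and returns whether the second cluster confirmed completion) are assumptions made for illustration and are not part of the disclosure.

```python
from typing import Callable, List


class MasterCluster:
    """Hypothetical model of the first cluster; all names are illustrative."""

    def __init__(self, send_to_slave: Callable[[str], bool]) -> None:
        self.role = "master"
        self.log: List[str] = []
        self._send_to_slave = send_to_slave

    def configure_slave_storage(self) -> None:
        # S102: prepare the storage areas needed by the slave role.
        self.log.append("S102: storage areas configured")

    def handle_switch_request(self) -> str:
        # S101: a switching request (master -> slave) has been received.
        self.log.append("S101: request received")
        self.configure_slave_storage()
        # S103: ask the second cluster to switch from slave to master.
        completed = self._send_to_slave("switch slave->master")
        self.log.append("S103: request sent to second cluster")
        # S104: change the role only after completion is confirmed.
        if completed:
            self.role = "slave"
            self.log.append("S104: role set to slave")
        return self.role
```

Usage under these assumptions: `MasterCluster(send_to_slave=lambda msg: True).handle_switch_request()` returns `"slave"`, whereas a callback that returns `False` leaves the role unchanged, mirroring the rule that the current state is set only after the role switch has been confirmed.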

According to some embodiments of the present disclosure, the master cluster role switching method may further include: before receiving the switching request, performing a remote copy task between the first cluster and the second cluster; and before configuring the storage area related to the second cluster role to be switched to, confirming that the remote copy task between the first cluster and the second cluster has stopped, and confirming that the master-slave directory in the first cluster is in a split state.

As described above, a master-slave cluster architecture is provided in the distributed file system: the master cluster provides the main file service, and the slave cluster serves as a backup device of the master cluster. For this reason, a remote copy task needs to be performed periodically between the master and slave clusters, that is, files stored in the master cluster are backed up to the slave cluster. In other words, the master cluster and the slave cluster implement different functions: the master cluster is generally responsible for driving the remote copy task, which periodically synchronizes the data under a designated master cluster directory, including directory metadata and file data, to a designated slave cluster directory; the slave cluster generally receives the backup data (metadata and file data) sent by the master cluster and saves the received data under the designated directory.

To make the cluster role switching method according to embodiments of the present disclosure easier to follow, the process by which the master and slave clusters in a distributed file system perform remote copy tasks, and the storage areas that each of them needs to maintain, are described first. The role switching method according to some embodiments of the present disclosure is then described in detail.

It will be appreciated that, in the normal state, the file service is provided by the master cluster and the slave cluster serves as a backup device, so the master cluster needs to back up data to the slave cluster regularly; that is, the master cluster periodically performs a remote copy task before receiving the switching request of step S101. In the method according to the embodiments of the present disclosure, after receiving the switching request of step S101, the master cluster may first confirm whether a remote copy task between the master and slave clusters is currently in progress and, if so, stop the task so that the role switch can proceed stably and data loss due to an ongoing remote copy task is avoided. The first cluster further confirms that its master-slave directory is in a split state. A timed synchronization task may be set in the master cluster (currently the first cluster). Therefore, once it is confirmed that no remote copy task is in progress, the master-slave directory in the master cluster can be set to the split state by a command. This prevents the timed synchronization task from starting automatically, ensures a stable role switching process, and avoids data loss or role switching failure caused by a remote copy task.
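The two preconditions can be sketched as a small check performed before step S102. The sketch assumes that the state of a remote copy pair can be reduced to two booleans; the names `PairState`, `prepare_for_switch`, and `ready_for_switch` are hypothetical, and no real command for setting the split state is shown.

```python
from dataclasses import dataclass


@dataclass
class PairState:
    """Hypothetical state of one remote copy pair on the master cluster."""
    copy_running: bool  # a remote copy task is currently in progress
    split: bool         # the master-slave directory is in the split state


def prepare_for_switch(pair: PairState) -> PairState:
    # Stop an in-flight remote copy task first, so no copy overlaps the switch.
    if pair.copy_running:
        pair.copy_running = False
    # Put the master-slave directory in the split state so that the timed
    # synchronization task cannot start a new round during the switch.
    if not pair.split:
        pair.split = True
    return pair


def ready_for_switch(pair: PairState) -> bool:
    """Both preconditions must hold before storage areas are reconfigured."""
    return (not pair.copy_running) and pair.split
```

Stopping the task before splitting matches the order given in the text: the split state is set only once it is confirmed that no remote copy task is running.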

Next, the storage areas that the master-slave cluster needs to maintain in performing remote copy tasks will be described in detail.

As one example, the storage area may be implemented as a RADOS Block Device (RBD) in a distributed file system (e.g., the Ceph file system). Specifically, Ceph is a unified distributed storage system with good performance, high availability, and scalability. Ceph can provide file system, block storage, and object storage services, and it is distributed in the sense that it can be expanded dynamically. RADOS is the core technology of Ceph: it is a complete object storage system, and all data stored in a Ceph system is ultimately stored by this layer. It will be appreciated that implementing the storage area as an RBD is merely exemplary, and that in other file system implementations the storage area described above may be implemented in other ways, without limitation.

As described above, a master-slave cluster architecture is provided in the distributed file system, and the master cluster and the slave cluster implement different functions: the master cluster is generally responsible for driving the remote copy task, which periodically synchronizes the data under a designated master cluster directory, such as directory metadata and file data, to a designated slave cluster directory; the slave cluster generally receives the backup data (metadata and file data) sent by the master cluster and saves the received data under the designated directory.

In order to realize remote data backup, the master cluster needs to remotely copy data to the slave cluster regularly, and the slave cluster needs to store backup data sent by the master cluster. To achieve the above functionality, a storage area is created in the primary cluster for implementing remote copy tasks, e.g., maintaining one or more RBDs. The slave clusters also create some RBDs for performing backup tasks.

As an example, the master cluster needs to maintain a replication task control storage area (control RBD). The control RBD stores flow control information for the remote copy task. It is created when the remote copy task process starts or when a remote copy pair (pair) is created, and it is deleted and recreated after each round of the copy task completes. As another example, the master cluster also needs to maintain a metadata difference storage area (metadif RBD). In each round of the remote copy task, the metadata difference is computed as the difference between the latest snapshot and the previous snapshot of the master cluster directory; the metadif RBD saves this metadata difference between the two snapshots so that it can be synchronized to the slave cluster. The metadif RBD is likewise created when the remote copy task process starts or when a pair is created, and it is deleted and recreated after each round of the copy task completes.
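To make the notion of a metadata difference between two snapshots concrete, the following sketch models each snapshot as a dictionary from path to a modification counter. This representation, and the name `metadata_diff`, are assumptions chosen for illustration; the disclosure does not describe how snapshots or differences are encoded.

```python
from typing import Dict, Set


def metadata_diff(previous: Dict[str, int], latest: Dict[str, int]) -> Dict[str, Set[str]]:
    """Compare two snapshots, each modeled as {path: modification counter}."""
    prev_paths, new_paths = set(previous), set(latest)
    return {
        # Paths that exist only in the latest snapshot.
        "created": new_paths - prev_paths,
        # Paths that existed in the previous snapshot but are gone now.
        "deleted": prev_paths - new_paths,
        # Paths present in both snapshots whose counter changed.
        "modified": {p for p in prev_paths & new_paths if previous[p] != latest[p]},
    }
```

In this model, the returned dictionary plays the part of the content saved in the metadif RBD for one round of the remote copy task.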

As an example, for the slave cluster, the RBD involved in the remote copy task includes a metadata differential storage area (metadif RBD), where the metadif RBD in the slave cluster is used to receive and store metadif data sent by the master cluster, and after receiving, the metadif data is applied to the slave cluster directory to ensure that the directory metadata at both ends of the master and slave clusters are consistent. The RBD is created when the remote copy task process starts or a pair is created, and the metadif RBD is deleted and rebuilt after each round of copy task is completed. As another example, for the slave cluster, the RBD involved in the remote copy task further includes a node map storage area (inomap RBD) to hold node map relationship data of directories and files between the master cluster and the slave cluster. In particular, the node mapping relationship between the master cluster and the slave cluster may be represented as a remote copy pair (pair), i.e., a directory of the master cluster maps to a directory of the slave cluster, which is called a pair. For example, the master cluster directory a and the slave cluster directory a create a mapping relationship, that is, a pair, so that the master cluster directory a can synchronize data to the slave cluster directory a periodically, multiple pairs can be created between the master cluster and the slave cluster, and each pair can set different remote copy time. Thus, a node (inode) mapping relationship between a master directory and a slave directory under a pair needs to be stored in a slave cluster, for example, a node number (inode number) of a file a under a master cluster directory a is 1, after the pair is created, the mapping relationship is mapped to a file a under a slave cluster directory a, the node number is 2, and a node mapping relationship between a node number 1 of the master cluster and a node number 2 of the slave cluster about the file a is stored in the inomap RBD. 
In the slave cluster, this inomap RBD is created when the pair is created, and removed when the pair is deleted.
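The node mapping example above can be illustrated with a minimal sketch. It assumes the inomap RBD can be modeled as an in-memory dictionary from master-cluster inode numbers to slave-cluster inode numbers; the function names `record_mapping` and `lookup_slave_inode` are hypothetical and do not come from the disclosure.

```python
# Hypothetical in-memory stand-in for the inomap RBD of one pair.
# Keys are master-cluster inode numbers, values are slave-cluster inode numbers.
inomap = {}


def record_mapping(inomap, master_ino, slave_ino):
    """Store the inode mapping for one file or directory of the pair."""
    inomap[master_ino] = slave_ino


def lookup_slave_inode(inomap, master_ino):
    """Return the slave-side inode for a master-side inode, or None."""
    return inomap.get(master_ino)


# File a: inode 1 under the master directory, inode 2 under the slave directory.
record_mapping(inomap, 1, 2)
```

A real storage area would persist these records rather than hold them in memory; the dictionary only shows the shape of the data.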

According to some embodiments of the present disclosure, configuring the storage area related to the second cluster role to be switched to in step S102 includes: creating a node mapping storage area, wherein the node mapping storage area is used for storing node mapping relation data of directories and files between the first cluster and the second cluster, and the node mapping relation data is used for executing a remote copy task between the first cluster and the second cluster.

Since the current master cluster is to be switched to the slave role, it must, as the new slave cluster after the role switch is completed, perform the processing required of a slave cluster in remote copy. Therefore, after receiving the switch request, the master cluster creates a node mapping storage area (inomap RBD) to store the above node mapping relation data, so as to implement the functions of the slave cluster.

According to some embodiments of the present disclosure, configuring the storage area related to the second cluster role to be switched to in step S102 further includes: removing a replication task control storage area (control RBD), wherein the replication task control storage area is used for storing flow control information of a remote replication task between a first cluster and a second cluster; and creating a metadata difference storage area (metadif RBD), wherein the metadata difference storage area is used for storing a snapshot difference between a latest snapshot and a last copied snapshot of the data of the second cluster.

It will be appreciated that, before the role switch, the master cluster maintains a control RBD for managing the remote copy task. In the process of switching from the master role to the slave role, the master cluster may delete this control RBD, since it is no longer needed by the new slave cluster, and a new control RBD will be configured by the new master cluster. In addition, in the process of performing the role switch, the master cluster needs to create a metadif RBD for implementing the slave role. As described above, although a metadif RBD is included in both the master cluster and the slave cluster, the two serve different functions. The metadif RBD in the master cluster is used to hold the metadata difference between two snapshots of the master cluster and transmit the difference to the slave cluster, whereas the metadif RBD in the slave cluster is used to receive and save the metadata difference transmitted by the master cluster. Thus, after receiving the switch request, the first cluster recreates the metadif RBD for implementing the function of the slave cluster. It will be appreciated that the metadif RBD recreated in the first cluster is used to implement the function of the slave cluster, i.e., to store the snapshot differences received from the new master cluster (i.e., the second cluster).
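The storage area changes on the master side can be sketched as follows. This is only an illustration under stated assumptions: the RBDs are modeled as entries of a dictionary, and the names `configure_for_slave_role`, `"control"`, `"metadif"` and `"inomap"` are hypothetical labels rather than identifiers from the disclosure.

```python
def configure_for_slave_role(rbds):
    """Return the RBD set a master would hold after preparing for the slave role.

    `rbds` maps an RBD name to its contents (hypothetical in-memory model).
    The input dictionary is left unchanged.
    """
    new_rbds = dict(rbds)
    new_rbds.pop("control", None)  # flow control will be owned by the new master
    new_rbds["metadif"] = []       # recreated empty: will receive differences
    new_rbds["inomap"] = {}        # filled later with the reversed mapping data
    return new_rbds


master_rbds = {"control": {"round": 7}, "metadif": ["diff-6-7"]}
after = configure_for_slave_role(master_rbds)
```

Returning a new dictionary instead of mutating the input keeps the old configuration available if the switch has to be retried.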

According to some embodiments of the present disclosure, the master cluster role switching method further includes: receiving node mapping relation reversal data of the node mapping reversal storage area from the second cluster, and storing the received node mapping relation reversal data as node mapping relation data into the node mapping storage area. It is understood that, before the role switch is performed, a metadif RBD and an inomap RBD for implementing the backup function are configured in the slave cluster. After the slave cluster receives the role switch request, it processes the data in its inomap RBD and sends the processed data to the master cluster, so that the master cluster holds the inomap RBD required by the slave role. The process by which the slave cluster obtains the node mapping relation reversal data will be described below in connection with fig. 2.

In accordance with some embodiments of the present disclosure, the master cluster confirms that the role switch has been completed once it confirms that the node mapping relation reversal data has all been saved to the node mapping storage area. This ensures that the reversal data is completely stored in the master cluster, which avoids losing this data and allows the master cluster to perform data backup as the new slave cluster after the role switch is completed.

By using the storage area configured in the step S102, the primary cluster after role switching can serve as a new secondary cluster to provide stable backup service, so as to ensure the stable performance of the file system and avoid the file service termination caused by factors such as faults. And the master-slave clusters in the file system can quickly and efficiently realize the fault transfer in the file system through a simple role switching step, and ensure the normal service after the master-slave roles are switched.

According to some embodiments of the present disclosure, after the role switch initiator sends the role switch request, its process may be set to wait for the execution result fed back by the master cluster, that is, the initiator stays in the role switch process until it receives the feedback result. This implementation may be understood as a synchronous switching manner: after triggering the role switch, the initiator waits for the execution result of the master cluster and performs no other operation in the meantime.

According to other embodiments of the present disclosure, the primary cluster role switching method further includes: after receiving the switching request, setting a role switching flag bit, wherein the role switching flag bit is used for monitoring a role switching process by a role switching initiating terminal in the background, and when the role switching flag bit is in a first state, the role switching flag bit indicates that the first cluster is in a role switching process, and when the role switching flag bit is in a second state, the role switching flag bit indicates that the role switching process of the first cluster is ended.

The above role switch mode in which the role switch flag bit is set may be understood as an asynchronous switching mode. For example, after the initiator of the role switch sends a command for executing the role switch to the master cluster, the call returns immediately. A background thread of the master cluster performs the role switch and reports the state of the background switching process, for example by setting the role switch flag bit in the pair state. The initiator follows the role switch process by checking the flag bit. If the flag bit is in the first state (true), the background is still executing the switching process; if the flag bit is in the second state (false), the switching process has ended, that is, the master and slave clusters have completed the role switch. Thus, the initiator first sends a switch request to a cluster, informing it that it needs to switch its role, for example from the master role to the slave role (Master to Slave) or from the slave role to the master role (Slave to Master), and then monitors through the flag bit whether the role switch process has ended. As another example, after the master cluster completes its role switch, it notifies the initiator of the result by returning a network response. If the response indicates success, the initiator continues its role switch flow; if it indicates failure, the initiator handles the error according to the returned error information.

The asynchronous control mode can enable the initiating terminal to return to other processes immediately after sending the role switching request, the front-end process such as an interactive interface and the like cannot be blocked in the role switching process, and other operations can be continuously realized. The initiating terminal only needs to query the flag bit in the pair state at fixed time to detect whether the role switching process is finished, and can also check error information returned by the master-slave clusters to query whether the switching process is successfully executed.
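The asynchronous mode described above can be sketched with a background thread and a polled flag. This is a minimal illustration, not the disclosed implementation: `PairState`, `start_switch` and `wait_until_done` are hypothetical names, and the pair state is modeled as a plain Python object rather than persisted cluster state.

```python
import threading
import time


class PairState:
    """Hypothetical pair state holding the role switch flag bit."""

    def __init__(self):
        self.switching = False  # True: switch in progress; False: switch ended
        self.error = None       # error information, if the switch failed


def start_switch(state, work):
    """Set the flag, run `work` in a background thread, and return at once."""
    state.switching = True

    def run():
        try:
            work()
        except Exception as exc:  # record the failure for the initiator
            state.error = str(exc)
        finally:
            state.switching = False

    thread = threading.Thread(target=run)
    thread.start()
    return thread


def wait_until_done(state, interval=0.01, timeout=5.0):
    """Poll the flag at a fixed interval.

    Returns True if the switch ended without an error, and False if it
    ended with an error or did not end before the timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not state.switching:
            return state.error is None
        time.sleep(interval)
    return False
```

The flag is set before the thread starts, so an initiator that polls immediately after `start_switch` returns never observes a stale "ended" state.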

The steps performed by the master cluster in a file system to switch roles (Master to Slave) are described above in connection with fig. 1. The master cluster role switching method according to the embodiment of the present disclosure described above in connection with fig. 1 can switch the role of the first cluster from the first cluster role (master role) to the second cluster role (slave role), so that the first cluster performs subsequent operations as a new slave cluster.

Next, the steps by which the slave cluster performs role switching (Slave to Master) according to an embodiment of the present disclosure will be described with reference to fig. 2.

According to some embodiments of the present disclosure, a cluster role switching method in a file system is provided, which is specifically applicable to a second cluster having a second cluster role, that is, an execution subject is the second cluster. For example, the first cluster role is a master role, and the first cluster with the first cluster role is a master cluster in a file system and provides a file access service. Correspondingly, the second cluster role is a slave role, the second cluster with the second cluster role is a slave cluster in the file system, and the second cluster is used as a backup device of the master cluster, and a remote copy backup process can be periodically executed between the master cluster and the slave cluster.

Fig. 2 illustrates another schematic flow diagram of a cluster role switch method according to some embodiments of the present disclosure, the flow of fig. 2 corresponding to the role switch steps performed by the slave clusters. For convenience of description, the method described in connection with fig. 2 is referred to as a Slave cluster role switching method, i.e., performed by a Slave cluster, to switch the Slave cluster to a Master cluster (Slave to Master). As shown in fig. 2, the slave cluster role switching method includes steps S201 to S204.

In step S201, a switch request to switch from the second cluster role to the first cluster role is received. As an example, this switch request may be sent to the slave cluster by a role switch initiator (e.g., an administrator of the file system) to cause the slave cluster to switch from the current slave role to the master role. For example, when the administrator finds that the master cluster has failed or needs maintenance, the administrator sends a switch request so that the slave cluster performs a role switch and takes over from the master cluster to provide the file service function. As another example, the switch request may also be sent by the master cluster to the slave cluster in the file system. For example, after the master cluster itself performs role switching, it may instruct the slave cluster in the file system to perform the corresponding role switch. Furthermore, it will be appreciated that the switch request may also be generated by a component of the slave cluster, such that the slave cluster performs the role switch autonomously; no limitation is imposed here.

Next, in step S202, in response to the switch request, a storage area associated with the first cluster role to be switched to is configured. Since the current slave cluster is to be switched to the master role, according to some embodiments of the present disclosure, in the process of performing role switching the slave cluster configures, after receiving the switch request, the storage areas corresponding to the master role to be switched to, so that it can operate as a new master cluster after the switch is completed. The storage areas related to the master role that need to be configured in step S202 will be described in detail below.

Next, in step S203, a switch request for switching from the first cluster role to the second cluster role is transmitted to the first cluster having the first cluster role. For example, after the slave cluster has configured the storage areas for the role switch locally, it sends an instruction to the master cluster in the file system so that the master cluster also performs a role switch, that is, switches from the master role to the slave role and operates as a new slave cluster. Regarding the steps of performing role switching by the master cluster, reference may be made to the description above in connection with fig. 1.

In step S204, in response to confirming that the role switch has been completed, the current state is set to the first cluster role. The slave cluster sets the current state to the first cluster role, i.e., the master role, only in case that it is confirmed that the role switch has been completed, thereby avoiding data loss or unavailability caused by the role switch.

Through the above steps S201 to S204, the slave cluster in the file system can realize the switching from the slave role to the master role, and provide the file service business as a new master cluster.
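Steps S201 to S204 can be summarized in a short sketch. It assumes a cluster is modeled as a dictionary with a `"role"` entry and an `"rbds"` entry, and that the request to the peer is a callable returning whether completion was confirmed; `slave_switch_to_master` and `send_to_master` are hypothetical names.

```python
def slave_switch_to_master(cluster, send_to_master):
    """Run S201 to S204 on a slave cluster and return its resulting role.

    `cluster` is a dict with "role" and "rbds"; `send_to_master` is a callable
    that delivers the switch request to the peer and returns True once the
    peer's side of the switch is confirmed complete (both are assumptions).
    """
    # S201: the switch request has been received (this call models it).
    if cluster["role"] != "slave":
        raise ValueError("only a slave cluster can switch to master")
    # S202: configure the storage areas needed for the master role.
    cluster["rbds"].update({"control": {}, "metadif": [], "inomap_invert": {}})
    # S203: ask the current master to switch to the slave role.
    confirmed = send_to_master("switch_to_slave")
    # S204: change the local role only after completion is confirmed.
    if confirmed:
        cluster["role"] = "master"
    return cluster["role"]
```

If the peer does not confirm, the role stays `"slave"`, which mirrors the requirement that the current state is set only after the role switch is confirmed.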

According to some embodiments of the present disclosure, the slave cluster role switching method may further include: before receiving the switch request, performing a remote copy task between the first cluster and the second cluster; and before configuring the storage area related to the first cluster role to be switched to, confirming that the remote copy task between the first cluster and the second cluster is stopped, and confirming that the master-slave directory in the first cluster is in a split state.

As described above, for the master cluster and the slave cluster in the file system, in the normal state the file service is provided by the master cluster and the slave cluster serves as a backup device, so the master cluster needs to back up data to the slave cluster regularly. That is, before receiving the switch request involved in step S201, the slave cluster periodically performs a remote copy task together with the master cluster, i.e., receives and saves the metadata differences transmitted by the master cluster. In the method according to the embodiment of the present disclosure, after the slave cluster receives the switch request involved in step S201, it may first check whether a remote copy task between the master and slave clusters is currently being performed and, if so, stop the task, so that the role switch proceeds stably and data loss due to an ongoing remote copy task is avoided.
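The check performed before the storage areas are configured can be sketched as below. The pair is modeled as a dictionary with the hypothetical keys `"copy_running"` and `"state"`; how a real remote copy task is stopped is not specified here and is represented only by clearing a flag.

```python
def state_validity_check(pair):
    """Stop a running remote copy task, then report whether the pair is split.

    `pair` is a hypothetical dict with keys "copy_running" and "state".
    """
    if pair["copy_running"]:
        pair["copy_running"] = False  # stand-in for stopping the task
    return pair["state"] == "split"
```

A caller would proceed to configure the storage areas only when this function returns True.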

According to some embodiments of the present disclosure, configuring the storage area related to the first cluster role to be switched to in step S202 includes: creating a node mapping reversal storage area, wherein the second cluster comprises a node mapping storage area for storing node mapping relation data of directories and files between the first cluster and the second cluster, and the node mapping relation data is used for executing a remote copy task between the first cluster and the second cluster; inverting the node mapping relation data stored in the node mapping storage area to obtain node mapping relation reversal data, and storing the node mapping relation reversal data into the node mapping reversal storage area; and transmitting the node mapping relation reversal data stored in the node mapping reversal storage area to the first cluster.

It will be appreciated that, before the role switch is performed, the slave cluster maintains the inomap RBD and the metadif RBD that are required for remote copy tasks. In the process of performing role switching, the slave cluster needs to invert the node mapping relation data in its own inomap RBD and transmit the inverted data (i.e., the node mapping relation reversal data) to the master cluster that is to become the new slave cluster. Because the roles of the master and slave clusters are exchanged by the role switch, the mapping relationship recorded in the inomap RBD also needs to be reversed. Thus, in the slave cluster role switching method according to the disclosed embodiment, a node mapping reversal storage area (inomap invert RBD) is created in the slave cluster for storing the node mapping relation reversal data.
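One plausible reading of the inversion, offered here as an assumption rather than as the disclosed procedure, is that each record mapping an old-master inode to an old-slave inode is swapped so that it maps the new-master inode to the new-slave inode. A minimal sketch with the hypothetical function `invert_inomap`:

```python
def invert_inomap(inomap):
    """Swap keys and values of a one-to-one inode mapping.

    `inomap` maps old-master inode numbers to old-slave inode numbers; the
    result maps new-master (old-slave) inodes to new-slave (old-master) inodes.
    """
    inverted = {}
    for old_master_ino, old_slave_ino in inomap.items():
        if old_slave_ino in inverted:
            raise ValueError("mapping is not one-to-one")
        inverted[old_slave_ino] = old_master_ino
    return inverted
```

With the earlier example of file a (inode 1 on the master, inode 2 on the slave), the input `{1: 2}` becomes `{2: 1}`; applying the function twice restores the original mapping.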

According to some embodiments of the present disclosure, configuring the storage area related to the first cluster role to be switched to in step S202 further includes: creating a replication task control storage area, wherein the replication task control storage area is used for storing flow control information of remote copy tasks between the first cluster and the second cluster; and creating a metadata difference storage area, wherein the metadata difference storage area is used for storing snapshot differences between a latest snapshot and a latest copied snapshot of data of the second cluster. Since the slave cluster will operate as a new master cluster after performing the role switch, and the master cluster needs to control the remote copy function, the slave cluster role switching method creates the control RBD and metadif RBD required by a master cluster. It will be appreciated that the metadata difference storage area newly created in the second cluster stores the snapshot difference for the cluster that next acts as the master cluster, i.e., the snapshot difference between the latest snapshot and the latest copied snapshot of the data of the second cluster. This is because the second cluster operates as the new master cluster after the role switch.

The slave cluster role switching method according to some embodiments of the present disclosure further includes: receiving, from the first cluster, feedback information indicating that the node mapping relation reversal data has been successfully saved, wherein in response to receiving the feedback information, the role switch is confirmed to be completed. The slave cluster role switching method according to some embodiments of the present disclosure may further include: in response to confirming that the role switch has been completed, removing the node mapping reversal storage area and the node mapping storage area. This is because the slave cluster will operate as a new master cluster after performing the role switch, and the functions of the master cluster do not need the inomap RBD and inomap invert RBD described above. Further, the slave cluster deletes its inomap RBD and inomap invert RBD only after determining that the role switch has been completed (i.e., after receiving the feedback information from the master cluster that the node mapping relation reversal data was successfully stored), so as to avoid losing node mapping data in the file system during the role switch.
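The ordering constraint described above (confirm first, delete afterwards) can be sketched as follows. The RBDs are again modeled as dictionary entries, and `finish_slave_switch` is a hypothetical name.

```python
def finish_slave_switch(rbds, ack_received):
    """Drop inomap and inomap_invert only once the peer confirmed the save.

    Returns True if the storage areas were removed, False if they were kept.
    """
    if not ack_received:
        return False  # keep both areas so the data can be sent again
    rbds.pop("inomap", None)
    rbds.pop("inomap_invert", None)
    return True
```

Keeping both areas until the acknowledgement arrives means an interrupted transfer can still be repeated or resumed from local data.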

According to some embodiments of the present disclosure, the slave cluster confirms that the role switch has been completed once it confirms, for example by receiving the above-described feedback information, that the node mapping relation reversal data has all been saved to the node mapping storage area of the master cluster. This ensures that the reversal data is completely stored in the master cluster, which avoids losing this data and allows the master cluster to perform data backup as the new slave cluster after the role switch is completed.

By using the storage area configured in the step S202, the slave cluster after role switching can serve as a new master cluster to provide stable file service, so that the performance stability of the file system is ensured, and file service termination caused by factors such as faults is avoided. And the master-slave clusters in the file system can quickly and efficiently realize the fault transfer in the file system through a simple role switching step, and ensure the normal service after the master-slave roles are switched.

According to some embodiments of the present disclosure, after the role switch initiator sends a role switch request to the slave cluster, its process may be set to wait for the execution result fed back by the slave cluster, that is, the initiator stays in the role switch process until it receives the feedback result. This implementation may be understood as a synchronous switching manner: after triggering the role switch, the initiator waits for the execution result of the slave cluster and performs no other operation in the meantime.

According to other embodiments of the present disclosure, the slave cluster role switching method further includes: after receiving the switch request, setting a role switch flag bit, wherein the role switch flag bit is used by the role switch initiator to monitor the role switch process in the background; when the role switch flag bit is in a first state, it indicates that the second cluster is in the role switch process, and when the role switch flag bit is in a second state, it indicates that the role switch process of the second cluster has ended.

The above role switch mode with the role switch flag bit set may be understood as an asynchronous switch mode, and the specific implementation of this asynchronous switch mode may refer to the description of the master cluster role switch method above, which is not repeated here. Similarly, the asynchronous control mode can enable the initiating terminal to return to other processes immediately after sending the role switching request, the front-end processes such as an interactive interface and the like cannot be blocked in the role switching process, and other operations can be continuously realized. The initiating terminal only needs to query the flag bit in the pair state at fixed time to detect whether the role switching process is finished, and can also check error information returned by the master-slave clusters to query whether the switching process is successfully executed.

By using the above-mentioned master cluster role switching method and slave cluster role switching method provided by the embodiments of the present disclosure, under the condition that a master cluster fails or performs regular inspection, etc., the role of the master cluster is switched to a slave cluster, and the role of the slave cluster is switched to the master cluster, so that the former slave cluster is used as a new master cluster to continue to provide file processing services, so as to ensure the service stability of a file system, and further, ensure the correctness and availability of metadata and data in the file system during the role switching process.

According to some embodiments of the present disclosure, the slave cluster role switching method may further include: in the case that an interruption of the cluster role switching process is detected, reconfiguring the storage area related to the second cluster role to be switched to, regardless of the current state of the cluster role switching process. This approach may be referred to as full-redo interrupt handling.

The slave cluster role switching method may further include: in the case that an interruption of the cluster role switching process is detected, comparing the data of the node mapping storage area with the data of the node mapping reversal storage area, and continuing the inversion of the node mapping relation data from the position at which the inversion was interrupted; and checking the progress of the transmission of the node mapping relation reversal data to the first cluster, and continuing to transmit the node mapping relation reversal data from the position at which the transmission was interrupted. This approach may be referred to as breakpoint-resume interrupt handling.

In the process of performing the role switch, each step of performing the operation may have an error or an abnormal interrupt. The core principle of role switching exception handling is how to ensure that the inomap RBD on the original slave cluster (i.e., the new master cluster) is not lost, and that the new slave cluster (i.e., the original master cluster) can reconstruct the inomap RBD completely at the home end.

To ensure the integrity of the inomap RBD data described above, in the cluster role switching method according to some embodiments of the present disclosure, the slave cluster confirms that the role switch has been completed only after receiving feedback information from the master cluster indicating that the inomap invert data has been successfully saved, and removes the inomap RBD and inomap invert RBD only after confirming that the role switch has been completed.

In addition, in order to ensure that the role switching is not affected by the abort, the role switching method according to the embodiment of the disclosure further includes the interrupt processing mode of the complete redo and the breakpoint resume.

In the full-redo interrupt handling approach, the slave cluster does not consider whether the inomap data inversion has already been completed. When the interruption ends, the slave cluster performs the role switch again from the beginning, i.e., the inversion restarts from the first inomap record, even if all inomap inversion data has already been recorded in the inomap invert RBD. Likewise, the slave cluster does not consider whether the inomap invert RBD data has already been transmitted to the master cluster. When the interruption ends, the slave cluster performs the role switch again and resends the reversal data to the master cluster starting from the first record of the inomap invert RBD, even if all the reversal data has already been sent to the master cluster and successfully saved.

In the breakpoint-resume interrupt handling approach, when the slave cluster performs the role switch again after the interruption ends, it compares the contents of the inomap RBD and the inomap invert RBD and continues the inversion from the last inomap record that has already been inverted. The slave cluster also persists the transmission progress of the inomap invert RBD data, so that when it performs the role switch again after the interruption ends, it continues transmitting the reversal data from the recorded breakpoint.
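The two interrupt handling approaches can be contrasted in a short sketch. It assumes that inversion swaps the two inode numbers of each record, that storage areas are in-memory dictionaries, and that transmission progress is a simple record count; the names `invert_with_resume`, `send_with_resume` and `full_redo` are hypothetical.

```python
def invert_with_resume(inomap, inomap_invert):
    """Breakpoint resume: invert only records missing from inomap_invert."""
    already_inverted = set(inomap_invert.values())
    for master_ino, slave_ino in inomap.items():
        if master_ino not in already_inverted:
            inomap_invert[slave_ino] = master_ino
    return inomap_invert


def send_with_resume(records, sent_count, send):
    """Continue sending from the recorded progress; return the new progress."""
    for record in records[sent_count:]:
        send(record)
        sent_count += 1  # a real system would persist this after each record
    return sent_count


def full_redo(inomap, send):
    """Full redo: rebuild the inverted data from scratch and resend all of it."""
    inverted = {slave: master for master, slave in inomap.items()}
    for record in sorted(inverted.items()):
        send(record)
    return inverted
```

Full redo is simpler because it needs no progress tracking, at the cost of repeating work; breakpoint resume avoids the repeated work but depends on the persisted progress being accurate.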

In addition, in other embodiments according to the present disclosure, before instructing the master and slave clusters to perform the role switch task, the initiator of the role switch (e.g., an administrator of the file system) may also persist the master-slave correspondence of the current role switch and set the role switch flag bit, and only then initiate the role switch operation. After the interruption ends, the administrator can initiate a role switch request for the same master-slave correspondence again according to the persisted record, so as to avoid the loss of inomap RBD data caused by the incomplete role switch operation of the previous round.

By the method, important data in the inomap RBD is not lost, so that stable file service business is provided, and a user can continuously access files without being influenced by the cluster role switching process.

The steps performed by the slave cluster in a file system to switch roles (Slave to Master) are described above in connection with fig. 2. The slave cluster role switching method according to the embodiment of the present disclosure described above in connection with fig. 2 can switch the role of the second cluster from the second cluster role (slave role) to the first cluster role (master role), so that the second cluster performs subsequent operations as a new master cluster.

For a more detailed understanding of aspects of the present disclosure, a role switch flow diagram in accordance with some embodiments of the present disclosure will be described next in conjunction with the accompanying drawings. It is to be understood that the following flow descriptions are merely exemplary and that the execution of methods according to embodiments of the present disclosure is not limited to the sequential steps shown in the flow diagrams.

Fig. 3 illustrates an execution flow diagram of a synchronous role switch initiated at the master cluster in accordance with some embodiments of the present disclosure. As shown in fig. 3, the initiator may first initiate a role switch to the master cluster, i.e., instruct the master cluster to switch to the slave role. Next, the master cluster (Master) may perform a state validity check. As an example, the check may determine whether a remote copy task is currently being performed and, if so, stop the remote copy; it may further include confirming that the master-slave directory is in a split state, so that no timed synchronization task is started. Next, the master cluster may configure the storage areas associated with the slave role to be switched to, including removing the replication task control storage area (remove control RBD), creating a metadata difference storage area (rebuild metadif RBD), and creating a node mapping storage area (rebuild inomap RBD). After the configuration is completed, the master cluster may send a network request to the peer to perform the role switch, i.e., the master cluster sends a switch request to the slave cluster in the file system (Switch role to Master). The master cluster confirms that the role switch is completed after successfully storing the inomap invert data sent by the slave cluster, and sets its local role to the slave role (Slave).

Referring to fig. 3, the slave cluster also performs a similar state validity check after receiving the switch request. Next, the slave cluster may configure the storage areas associated with the master role to be switched to, including creating a replication task control storage area (rebuild control RBD), creating a metadata difference storage area (rebuild metadif RBD), and creating a node mapping reversal storage area (rebuild inomap invert RBD). After the RBD configuration is completed, the slave cluster inverts the node mapping relation data in its own inomap RBD and saves the inverted data to the newly created inomap invert RBD. Thereafter, the slave cluster sends the data in the inomap invert RBD to the peer (i.e., the master cluster) and performs some necessary environment cleanup operations. After confirming that the data has been stored in the inomap RBD of the master cluster, the slave cluster sets its local role to the master role (Master) and finally deletes its own inomap RBD and inomap invert RBD.

In addition, as shown in fig. 3, in the process of performing role switching between the master cluster and the slave cluster, the current state feedback information may be continuously sent to the role switching initiator to inform the initiator of the current role switching execution progress, which may enable the initiator to obtain the content such as the switching progress and the stage where the switching failure occurs based on the feedback information.

Fig. 4 illustrates an execution flow chart of a synchronous role switch initiated at the slave cluster according to some embodiments of the present disclosure. In contrast to fig. 3, in the method illustrated in fig. 4 the role switch initiator first sends the switch request to the slave cluster (Slave). That is, the slave cluster starts the role switch first, for example by performing operations such as the state validity check and the RBD configuration, and the slave cluster then sends a role switch request to the master cluster so that the master cluster performs its role switch. For the rest of the process shown in fig. 4, reference may be made to the description of fig. 3, which is not repeated here.

Fig. 5 illustrates an execution flow chart of an asynchronous role switch initiated at the master cluster according to some embodiments of the present disclosure. In contrast to fig. 3, in fig. 5, after the role switch initiator sends a switch request to the master cluster, the master cluster first performs a state validity check to confirm whether a role switch can currently be performed. The master cluster then sets a role switch flag bit (e.g., in the pair state) and starts the background role switch flow. Once the flag bit is set, the initiator can monitor the role switch flag bit in a background process and check whether the flag bit has been reset, so as to confirm whether the role switch is completed. In addition, the initiator can also check, in the background, the error codes sent by the master cluster or the slave cluster, so as to inspect the role switch error information and record the execution result. For the subsequent role switch steps performed by the master and slave clusters, reference may be made to the description of fig. 3, which is not repeated here.

Fig. 6A illustrates a schematic diagram of an abort occurring while a role switch is performed. In the example of fig. 6A, the initiator persists the master-slave switching data, i.e., maintains a role switch record for the master and slave clusters, so that if an abort occurs, the cluster switching state can be obtained by looking up the record. As shown in fig. 6A, the cluster switching state may be recorded by setting the value of a switch flag (set switch_role=true). For example, a value of True indicates that a master-to-slave switch (Master to Slave) is currently in progress, and a value of False indicates that a slave-to-master switch (Slave to Master) is currently in progress. In contrast to the role switch procedure of fig. 3, after receiving the role switch request the master cluster first checks whether the inomap RBD exists and performs RBD configuration only if it does not. Similarly, after receiving the switch request the slave cluster first checks whether the inomap invert RBD exists and performs RBD configuration only if it does not. Fig. 6A further shows a situation in which the role switch is aborted, for example a data transmission error caused by the remote end going offline abnormally: the slave cluster in fig. 6A goes offline for an abnormal reason while transmitting inomap invert RBD data to the master cluster, and the transmission is suspended.
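A minimal Python sketch of these two ideas follows, assuming the switch record is kept as a small JSON file and the storage areas are modeled as entries in a dictionary. The file name, the `switch_role` key layout, and the helper names are illustrative assumptions; the disclosure does not specify how the record or the RBDs are actually stored.

```python
import json
import os
import tempfile
from typing import Dict, Optional


def save_switch_record(path: str, master_to_slave: bool) -> None:
    """Persist the switch direction; write to a temp file, then rename atomically."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump({"switch_role": master_to_slave}, fh)
    os.replace(tmp_path, path)


def load_switch_record(path: str) -> Optional[bool]:
    """Return the recorded direction, or None if no switch was recorded."""
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)["switch_role"]


def ensure_storage_area(areas: Dict[str, dict], name: str) -> bool:
    """Create the named area only if it is absent; return True if it was created."""
    if name in areas:
        return False
    areas[name] = {}
    return True


with tempfile.TemporaryDirectory() as workdir:
    record_path = os.path.join(workdir, "switch_record.json")
    before = load_switch_record(record_path)  # nothing recorded yet
    save_switch_record(record_path, True)     # True: master-to-slave in progress
    after = load_switch_record(record_path)

areas: Dict[str, dict] = {}
created_first = ensure_storage_area(areas, "inomap")
created_again = ensure_storage_area(areas, "inomap")  # no-op on a retry
```

Checking for existence before creating makes the configuration step safe to repeat, which is what allows the same request to be re-issued after an abort.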

Next, fig. 6B illustrates another schematic diagram of an abort occurring while a role switch is performed, according to some embodiments of the present disclosure. Fig. 6B shows the execution flow in which the slave cluster (Slave) comes back online after the abort shown in fig. 6A and the role switch is continued. After the slave cluster comes back online, the initiator acquires the persisted master-slave switching data, for example the cluster switching state recorded by the value of the switch flag (set switch_role=true). Next, the initiator sends a switch request to the master cluster, which, similar to fig. 6A, first checks whether the inomap RBD exists and, if so, whether it contains data. Because part of the role switch flow was already performed in fig. 6A and the transmission was interrupted by an anomaly, partial data exists in the current inomap RBD of the master cluster. The master cluster then sends a switch request to the slave cluster, which checks the transmission progress of the inomap invert RBD and continues transmitting data from the breakpoint. Figs. 6A and 6B thus describe an implementation in which an interruption occurs during a role switch and, after the abnormal situation has cleared, the role switch is continued from the recorded interruption position.
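The breakpoint behavior can be illustrated with the following Python sketch. It assumes, purely for illustration, that the mapping data is an ordered list of entries and that the number of entries already held by the receiver marks the breakpoint; the `fail_after` parameter only simulates the remote end going offline and is not part of the disclosure.

```python
from typing import List, Optional, Tuple

Entry = Tuple[int, int]  # hypothetical (remote node id, local node id) pair


def send_from_breakpoint(
    source: List[Entry], dest: List[Entry], fail_after: Optional[int] = None
) -> int:
    """Send entries the receiver does not yet hold; return how many were sent."""
    start = len(dest)  # entries already received mark the breakpoint
    sent = 0
    for entry in source[start:]:
        if fail_after is not None and sent == fail_after:
            raise ConnectionError("remote end went offline")  # simulated abort
        dest.append(entry)
        sent += 1
    return sent


source = [(101, 1), (102, 2), (103, 3), (104, 4)]
dest: List[Entry] = []

try:
    send_from_breakpoint(source, dest, fail_after=2)  # first attempt is cut short
except ConnectionError:
    pass
partial = list(dest)

resent = send_from_breakpoint(source, dest)  # second attempt resumes
```

Only the entries missing at the receiver are sent on the second attempt, so the work done before the abort is not repeated.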

The present disclosure provides a cluster role switching method in a file system. The file system includes a master cluster that performs file access tasks and a slave cluster that serves as a backup; that is, a remote copy backup function is provided between the master cluster and the slave cluster. When the master cluster fails or undergoes routine inspection, the role of the master cluster is switched to slave cluster and the role of the slave cluster is switched to master cluster, so that the former slave cluster continues to provide file processing service as the new master cluster and the former master cluster provides the backup function as the new slave cluster, thereby ensuring the service stability of the file system.

According to some embodiments of the present disclosure, there is also provided a first cluster apparatus in a file system, which may be, as an example, a primary cluster in the file system described above, i.e. having a first cluster role.

Fig. 7 illustrates a schematic block diagram of a first cluster apparatus in accordance with some embodiments of the disclosure. The apparatus is described in detail below in conjunction with fig. 7.

As shown in fig. 7, the first cluster apparatus 1000 may include a receiving unit 1010, a processing unit 1020, and a transmitting unit 1030. According to some embodiments of the present disclosure, the receiving unit 1010 is configured to receive a switch request to switch from a first cluster role to a second cluster role. The processing unit 1020 is configured to configure, in response to the switch request, a storage area associated with the second cluster role to be switched to. The transmitting unit 1030 is configured to transmit, to the second cluster having the second cluster role, a switch request to switch from the second cluster role to the first cluster role. The processing unit 1020 is further configured to set the current state to the second cluster role in response to confirming that the role switch has been completed.
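The division of work among the three units can be sketched in Python as below. The class names, the `PeerStub` stand-in for the second cluster, and the storage area label are hypothetical; the sketch only mirrors the order of operations described above and is not the implementation of the disclosure.

```python
from enum import Enum
from typing import Set


class Role(Enum):
    MASTER = "master"
    SLAVE = "slave"


class PeerStub:
    """Hypothetical stand-in for the second cluster."""

    def __init__(self) -> None:
        self.role = Role.SLAVE

    def handle_switch_request(self, target: Role) -> bool:
        self.role = target
        return True  # acknowledges that its side of the switch is complete


class FirstCluster:
    def __init__(self, peer: PeerStub) -> None:
        self.role = Role.MASTER
        self.peer = peer
        self.storage_areas: Set[str] = set()

    def handle_switch_request(self, target: Role) -> bool:
        # Receiving unit: the switch request arrives as this call.
        # Processing unit: configure the storage area for the target role.
        self.storage_areas.add("node_mapping")
        # Transmitting unit: ask the peer to take over the current role.
        completed = self.peer.handle_switch_request(self.role)
        # Processing unit: change state only after completion is confirmed.
        if completed:
            self.role = target
        return completed


peer = PeerStub()
first = FirstCluster(peer)
switched = first.handle_switch_request(Role.SLAVE)
```

Setting the new role last means that, if the peer never confirms, the first cluster still reports its original role.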

According to some embodiments of the present disclosure, the processing unit 1020 is further configured to: perform a remote copy task between the first cluster and the second cluster before receiving the switch request; and, before configuring the storage area associated with the second cluster role to be switched to, confirm that the remote copy task between the first cluster and the second cluster has stopped and that the master-slave directory in the first cluster is in a split state.
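The two preconditions can be expressed as a simple validity check, sketched here in Python. The `ReplicationStatus` type and its field names are assumptions made for illustration; how a cluster actually determines these two conditions is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReplicationStatus:
    copy_task_stopped: bool  # the remote copy task between the clusters has stopped
    directory_split: bool    # the master-slave directory is in a split state


def precondition_errors(status: ReplicationStatus) -> List[str]:
    """Return the reasons the switch may not proceed; an empty list means it may."""
    errors = []
    if not status.copy_task_stopped:
        errors.append("remote copy task is still running")
    if not status.directory_split:
        errors.append("master-slave directory is not in a split state")
    return errors


ready = precondition_errors(ReplicationStatus(True, True))
blocked = precondition_errors(ReplicationStatus(False, True))
```

Returning the list of unmet conditions, rather than a single boolean, gives the initiator a concrete reason when a switch request is refused.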

According to some embodiments of the present disclosure, the configuring, by the processing unit 1020, of the storage area associated with the second cluster role to be switched to includes: creating a node mapping storage area, wherein the node mapping storage area is used for storing node mapping relation data of directories and files between the first cluster and the second cluster, and the node mapping relation data is used for executing the remote copy task between the first cluster and the second cluster.

According to some embodiments of the present disclosure, the receiving unit 1010 is further configured to receive, from the second cluster, the node mapping relation reversal data of the node mapping reversal storage area. According to some embodiments of the present disclosure, the processing unit 1020 is further configured to save the received node mapping relation reversal data to the node mapping storage area as node mapping relation data, wherein the role switch is confirmed to have been completed once all of the node mapping relation reversal data is confirmed to have been saved to the node mapping storage area.

According to some embodiments of the present disclosure, the configuring, by the processing unit 1020, of the storage area associated with the second cluster role to be switched to further includes: removing a replication task control storage area, wherein the replication task control storage area is used for storing flow control information of the remote copy task between the first cluster and the second cluster; and creating a metadata difference storage area, wherein the metadata difference storage area is used for storing snapshot differences between a latest snapshot and a latest copy snapshot of data of the second cluster.

According to some embodiments of the present disclosure, after receiving the switch request, the processing unit 1020 is further configured to set a role switch flag bit, wherein the role switch flag bit is used by the role switch initiator to monitor the role switch process in the background. When the role switch flag bit is in a first state, it indicates that the first cluster is in the role switch process; when the role switch flag bit is in a second state, it indicates that the role switch process of the first cluster has ended.

According to some embodiments of the present disclosure, there is also provided a second cluster apparatus in a file system, which may be a slave cluster in the file system described above, i.e. having a second cluster role, as an example.

Fig. 8 illustrates a schematic block diagram of a second cluster apparatus in accordance with some embodiments of the disclosure. The apparatus is described in detail below in conjunction with fig. 8.

As shown in fig. 8, the second cluster apparatus 1100 may include a receiving unit 1110, a processing unit 1120, and a transmitting unit 1130. According to some embodiments of the present disclosure, the receiving unit 1110 is configured to receive a switch request to switch from the second cluster role to the first cluster role. The processing unit 1120 is configured to configure, in response to the switch request, a storage area associated with the first cluster role to be switched to. The transmitting unit 1130 is configured to transmit, to the first cluster having the first cluster role, a switch request to switch from the first cluster role to the second cluster role. The processing unit 1120 is further configured to set the current state to the first cluster role in response to confirming that the role switch has been completed.

According to some embodiments of the present disclosure, the processing unit 1120 is further configured to: before receiving the switch request, performing a remote copy task between the first cluster and the second cluster; and before configuring the storage area related to the first cluster role to be switched to, confirming that the remote copy task between the first cluster and the second cluster is stopped, and confirming that the master-slave directory in the first cluster is in a split state.

According to some embodiments of the present disclosure, the configuring, by the processing unit 1120, of the storage area associated with the first cluster role to be switched to includes: creating a node mapping reversal storage area, wherein the second cluster includes a node mapping storage area for storing node mapping relation data of directories and files between the first cluster and the second cluster, and the node mapping relation data is used for executing the remote copy task between the first cluster and the second cluster; and inverting the node mapping relation data stored in the node mapping storage area to obtain node mapping relation reversal data, and storing the node mapping relation reversal data into the node mapping reversal storage area. The transmitting unit 1130 is further configured to transmit the node mapping relation reversal data stored in the node mapping reversal storage area to the first cluster.
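As an illustration of the inversion step, the Python sketch below treats the node mapping relation data as a one-to-one dictionary from node identifiers on one cluster to node identifiers on the other. That data layout, the integer identifiers, and the function name are assumptions; the disclosure does not specify the format of the mapping.

```python
from typing import Dict


def invert_node_mapping(node_map: Dict[int, int]) -> Dict[int, int]:
    """Swap keys and values; reject mappings that are not one-to-one."""
    inverted: Dict[int, int] = {}
    for source_id, target_id in node_map.items():
        if target_id in inverted:
            raise ValueError(f"node {target_id} is mapped more than once")
        inverted[target_id] = source_id
    return inverted


# Hypothetical mapping: node id on one cluster -> node id on the other.
node_map = {1: 101, 2: 102, 3: 103}
reversal_data = invert_node_mapping(node_map)
```

After the switch the replication direction is reversed, so the receiving cluster can store the inverted pairs directly as its own node mapping relation data.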

According to some embodiments of the present disclosure, the receiving unit 1110 is further configured to receive, from the first cluster, feedback information indicating that the node mapping relation reversal data has been successfully saved, wherein the role switch is confirmed to have been completed in response to receiving the feedback information.

According to some embodiments of the present disclosure, the processing unit 1120 is further configured to remove the node mapping reversal storage area and the node mapping storage area in response to confirming that the role switch has been completed.

According to some embodiments of the present disclosure, the configuring, by the processing unit 1120, of the storage area associated with the first cluster role to be switched to includes: creating a replication task control storage area, wherein the replication task control storage area is used for storing flow control information of the remote copy task between the first cluster and the second cluster; and creating a metadata difference storage area, wherein the metadata difference storage area is used for storing snapshot differences between a latest snapshot and a latest copy snapshot of data of the second cluster.
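The role-dependent storage area changes described for the two apparatuses can be summarized in a short Python sketch. The area labels and the `configure_for_role` helper are illustrative names only; the sketch models each storage area as a label in a set and does not touch any real storage.

```python
from typing import Set


def configure_for_role(areas: Set[str], target_role: str) -> Set[str]:
    """Return the set of storage areas after configuring for the target role."""
    updated = set(areas)
    if target_role == "slave":
        # A master becoming a slave no longer controls the copy task.
        updated.discard("replication_control")
        updated.add("node_mapping")
    elif target_role == "master":
        # A slave becoming a master takes over control of the copy task.
        updated.add("replication_control")
        updated.add("node_mapping_reversal")
    else:
        raise ValueError(f"unknown role: {target_role}")
    # Both directions create a metadata difference area in the described embodiments.
    updated.add("metadata_diff")
    return updated


old_master = configure_for_role({"replication_control"}, "slave")
old_slave = configure_for_role({"node_mapping"}, "master")
```

Under these assumptions the replication task control area ends up only on the cluster that is about to become the master.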

According to some embodiments of the present disclosure, after receiving the switch request, the processing unit 1120 is further configured to set a role switch flag bit, wherein the role switch flag bit is used by the role switch initiator to monitor the role switch process in the background. When the role switch flag bit is in the first state, it indicates that the second cluster is in the role switch process; when the role switch flag bit is in the second state, it indicates that the role switch process of the second cluster has ended.

According to some embodiments of the present disclosure, the processing unit 1120 is further configured to, in a case where an interruption of the cluster role switching process is detected, reconfigure the storage area related to the second cluster role to be switched to, regardless of the state that the cluster role switching process is currently in.

According to some embodiments of the present disclosure, the processing unit 1120 is further configured to, in a case where an interruption of the cluster role switching process is detected, compare the data of the node mapping storage area with the data of the node mapping reversal storage area and continue inverting the node mapping relation data from the position at which the inversion was interrupted. The transmitting unit 1130 is further configured to check the progress of transmitting the node mapping relation reversal data to the first cluster and to continue transmitting the node mapping relation reversal data from the position at which the transmission was interrupted.
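One way to picture the comparison step is the Python sketch below, which assumes both storage areas can be read as dictionaries and that an entry counts as already inverted when its source identifier appears among the values of the reversal data. The data layout and the function name are hypothetical.

```python
from typing import Dict


def resume_inversion(node_map: Dict[int, int], reversal: Dict[int, int]) -> int:
    """Invert only the entries missing from `reversal`; return how many were added."""
    already_done = set(reversal.values())  # source ids that were inverted earlier
    added = 0
    for source_id, target_id in node_map.items():
        if source_id not in already_done:
            reversal[target_id] = source_id
            added += 1
    return added


node_map = {1: 101, 2: 102, 3: 103, 4: 104}
reversal = {101: 1, 102: 2}  # state left behind by an interrupted inversion

added = resume_inversion(node_map, reversal)
```

Comparing the two areas avoids redoing entries that were inverted before the interruption, and calling the function again after completion adds nothing.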

Regarding a specific implementation procedure involved in performing role switching by the first cluster apparatus 1000 according to an embodiment of the present disclosure, reference may be made to the cluster role switching method described above in connection with fig. 1, and a description thereof will not be repeated here. Similarly, regarding a specific implementation procedure involved in performing role switching by the second cluster apparatus 1100 according to an embodiment of the present disclosure, reference may be made to the cluster role switching method described above in connection with fig. 2, and a description thereof will not be repeated here. The first cluster device and the second cluster device can perform similar cluster role switching processes and achieve similar technical effects.

In addition to the cluster role switching method and apparatus in a file system, the present disclosure also provides a cluster device and a medium in a file system, which are described below with reference to the accompanying drawings. Unless explicitly stated otherwise, the above description of the cluster role switching method and apparatus applies equally to the cluster device and medium described below.

According to yet another aspect of the present disclosure, a cluster device in a file system is provided. Fig. 9 shows a schematic block diagram of a cluster device in accordance with an embodiment of the disclosure.

As shown in fig. 9, cluster device 2000 may include a processor 2010 and a memory 2020. In accordance with an embodiment of the present disclosure, memory 2020 has stored therein executable instructions that, when executed by processor 2010, can perform a cluster role switching method as described above.

Processor 2010 may perform various actions and processes in accordance with programs stored in memory 2020. In particular, processor 2010 may be an integrated circuit having signal processing capabilities. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. For example, a processor herein may refer to a device capable of implementing a distributed file system.

Memory 2020 stores computer-executable instructions that, when executed by processor 2010, are capable of causing the processor to implement a cluster role switching method in accordance with some embodiments of the present disclosure. The memory 2020 may be volatile memory or nonvolatile memory or may include both volatile and nonvolatile memory. It should be noted that the memory described herein may be any suitable type of memory. By way of example, a processor can implement the steps of a cluster role switch method for use in a file system, such as the cluster role switch method described above in connection with the figures, by executing computer-executable instructions in memory 2020.

The cluster role switching method or apparatus according to embodiments of the present disclosure may also be implemented by means of the architecture of an exemplary computing device 3000 as shown in fig. 10. As shown in fig. 10, the computing device 3000 may include a bus 3010, one or more central processing units (CPU) 3020, a read-only memory (ROM) 3030, a random access memory (RAM) 3040, a communication port 3050 connected to a network, input/output components 3060, a hard disk 3070, and the like. A storage device in the computing device 3000, such as the ROM 3030 or the hard disk 3070, may store various data or files used in the processing and/or communication of the methods provided by the present disclosure, as well as program instructions for execution by the CPU. The computing device 3000 may also include a user interface 3080. Of course, the architecture shown in fig. 10 is merely exemplary, and one or more components of the computing device shown in fig. 10 may be omitted as needed when implementing different devices.

According to yet another aspect of the present disclosure, there is also provided a non-transitory computer-readable storage medium. Fig. 11 shows a schematic diagram of a storage medium according to an embodiment of the present disclosure.

As shown in fig. 11, computer-readable storage medium 4000 has stored thereon computer-readable instructions 4010. When the computer-readable instructions 4010 are executed by a processor, the cluster role switching method described with reference to the above figures may be performed. Computer-readable storage media include, but are not limited to, volatile memory and/or nonvolatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory. Nonvolatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, and the like. For example, the computer-readable storage medium 4000 may be connected to a computing device such as a computer, and the cluster role switching method provided according to the embodiments of the present disclosure as described above may then be performed when the computing device runs the computer-readable instructions 4010 stored on the computer-readable storage medium 4000.

In summary, the present disclosure provides a method, an apparatus, a device, and a medium for switching cluster roles in a file system. The file system includes a master cluster that performs file access tasks and a slave cluster that serves as a backup; that is, a remote copy backup function is provided between the master cluster and the slave cluster. When the master cluster fails or undergoes routine inspection, the role of the master cluster is switched to slave cluster and the role of the slave cluster is switched to master cluster, so that the former slave cluster continues to provide file processing service as the new master cluster and the former master cluster provides the backup function as the new slave cluster, thereby ensuring the service stability of the file system.

Those skilled in the art will appreciate that various modifications and improvements can be made to the disclosure. For example, the various devices or components described above may be implemented in hardware, or may be implemented in software, firmware, or a combination of some or all of the three.

Further, while the present disclosure makes various references to certain elements in a system according to embodiments of the present disclosure, any number of different elements may be used and run on a client and/or server. The units are merely illustrative and different aspects of the systems and methods may use different units.

Flowcharts are used in this disclosure to describe the steps of methods according to embodiments of the present disclosure. It should be understood that the preceding or following steps need not be performed in the exact order shown. Rather, the various steps may be processed in reverse order or simultaneously. Other operations may also be added to these processes.

Those of ordinary skill in the art will appreciate that all or a portion of the steps of the methods described above may be implemented by a computer program to instruct related hardware, and the program may be stored in a computer readable storage medium, such as a read only memory, a magnetic disk, or an optical disk. Alternatively, all or part of the steps of the above embodiments may be implemented using one or more integrated circuits. Accordingly, each module/unit in the above embodiment may be implemented in the form of hardware, or may be implemented in the form of a software functional module. The present disclosure is not limited to any specific form of combination of hardware and software.

Unless defined otherwise, all terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

The foregoing is illustrative of the present disclosure and is not to be construed as limiting thereof. Although a few exemplary embodiments of this disclosure have been described, those skilled in the art will readily appreciate that many modifications are possible in the exemplary embodiments without materially departing from the novel teachings and advantages of this disclosure. Accordingly, all such modifications are intended to be included within the scope of this disclosure as defined in the claims. It is to be understood that the foregoing is illustrative of the present disclosure and is not to be construed as limited to the specific embodiments disclosed, and that modifications to the disclosed embodiments, as well as other embodiments, are intended to be included within the scope of the appended claims. The disclosure is defined by the claims and their equivalents.

Claims

1. A cluster role switching method in a file system, applicable to a first cluster having a first cluster role, the method comprising:

receiving a switching request for switching from the first cluster role to a second cluster role;

in response to the switch request, configuring a storage area associated with the second cluster role to be switched to;

transmitting a switching request for switching from the second cluster role to the first cluster role to a second cluster with the second cluster role; and

setting the current state as the second cluster role in response to confirming that the role switch has been completed.

2. The cluster role switching method in a file system according to claim 1, wherein the method further comprises:

before receiving the switching request, performing a remote copy task between the first cluster and the second cluster; and

before configuring a storage area associated with the second cluster role to be switched to, confirming that a remote copy task between the first cluster and the second cluster is stopped, and confirming that a master-slave directory in the first cluster is in a split state.

3. The cluster role switching method in a file system according to claim 1, wherein the configuring a storage area related to the second cluster role to be switched to includes:

creating a node mapping storage area, wherein the node mapping storage area is used for storing node mapping relation data of a directory and a file between the first cluster and the second cluster, and the node mapping relation data is used for executing a remote copy task between the first cluster and the second cluster.

4. The cluster role switching method in a file system according to claim 3, wherein the method further comprises:

receiving node mapping relation reversal data of a node mapping reversal storage area from the second cluster, and storing the received node mapping relation reversal data as the node mapping relation data into the node mapping storage area, wherein the role switch is confirmed to have been completed in a case where all of the node mapping relation reversal data is confirmed to have been stored into the node mapping storage area.

5. The cluster role switching method in a file system according to claim 1, wherein the configuring a storage area related to the second cluster role to be switched to includes:

removing a replication task control storage area, wherein the replication task control storage area is used for storing flow control information of a remote replication task between the first cluster and the second cluster; and

creating a metadata difference storage area, wherein the metadata difference storage area is used for storing snapshot differences between a latest snapshot and a latest copy snapshot of data of the second cluster.

6. The cluster role switching method in a file system according to claim 1, wherein the method further comprises:

after receiving the switching request, setting a role switch flag bit, wherein the role switch flag bit is used by a role switch initiator to monitor the role switch process in the background, the role switch flag bit indicates, when in a first state, that the first cluster is in the role switch process, and the role switch flag bit indicates, when in a second state, that the role switch process of the first cluster has ended.

7. A cluster role switching method in a file system, applicable to a second cluster having a second cluster role, the method comprising:

receiving a switching request for switching from the second cluster role to the first cluster role;

in response to the switch request, configuring a storage area associated with the first cluster role to be switched to;

transmitting a switching request for switching from the first cluster role to the second cluster role to a first cluster having the first cluster role; and

setting the current state as the first cluster role in response to confirming that the role switch has been completed.

8. The cluster role switching method in a file system of claim 7, wherein the method further comprises:

before configuring a storage area associated with the first cluster role to be switched to, confirming that a remote copy task between the first cluster and the second cluster is stopped, and confirming that a master-slave directory in the first cluster is in a split state.

9. The cluster role switching method in a file system according to claim 7, wherein the configuring a storage area related to the first cluster role to be switched to includes:

creating a node mapping reverse storage area, wherein the second cluster comprises a node mapping storage area for storing node mapping relation data of a directory and a file between the first cluster and the second cluster, and the node mapping relation data is used for executing a remote copy task between the first cluster and the second cluster;

inverting the node mapping relation data stored in the node mapping storage area to obtain node mapping relation reversal data, and storing the node mapping relation reversal data into the node mapping reversal storage area; and

sending the node mapping relation reversal data stored in the node mapping reversal storage area to the first cluster.

10. The cluster role switching method in a file system according to claim 9, wherein the method further comprises:

receiving, from the first cluster, feedback information indicating that the node mapping relation reversal data has been successfully saved, wherein the role switch is confirmed to have been completed in response to the feedback information.

11. The cluster role switching method in a file system according to claim 10, wherein the method further comprises:

removing the node mapping reversal storage area and the node mapping storage area in response to confirming that the role switch has been completed.

12. The cluster role switching method in a file system according to claim 7, wherein the configuring a storage area related to the first cluster role to be switched to includes:

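creating a replication task control storage area, wherein the replication task control storage area is used for storing flow control information of a remote replication task between the first cluster and the second cluster; and

creating a metadata difference storage area, wherein the metadata difference storage area is used for storing snapshot differences between a latest snapshot and a latest copy snapshot of data of the second cluster.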

13. The cluster role switching method in a file system of claim 7, wherein the method further comprises:

after receiving the switching request, setting a role switch flag bit, wherein the role switch flag bit is used by a role switch initiator to monitor the role switch process in the background, the role switch flag bit indicates, when in a first state, that the second cluster is in the role switch process, and indicates, when in a second state, that the role switch process of the second cluster has ended.

14. The cluster role switching method in a file system according to claim 9, wherein the method further comprises:

in a case where an interruption of the cluster role switching process is detected, reconfiguring a storage area related to the second cluster role to be switched to, regardless of the current state of the cluster role switching process.

15. The cluster role switching method in a file system according to claim 9, wherein the method further comprises:

in a case where an interruption of the cluster role switching process is detected, comparing the data of the node mapping storage area with the data of the node mapping reversal storage area, and continuing to invert the node mapping relation data from the position at which the inversion was interrupted; and

checking the progress of transmitting the node mapping relation reversal data to the first cluster, and continuing to transmit the node mapping relation reversal data from the position at which the transmission was interrupted.

16. A first cluster apparatus in a file system, the apparatus comprising:

a receiving unit configured to receive a switching request to switch from the first cluster role to a second cluster role;

a processing unit configured to configure a storage area associated with the second cluster role to be switched to in response to the switching request;

a transmission unit configured to transmit a switching request to a second cluster having the second cluster role to switch from the second cluster role to the first cluster role; and

the processing unit is further configured to set the current state to the second cluster role in response to confirming that the role switch has been completed.

17. A second cluster apparatus in a file system, the apparatus comprising:

a receiving unit configured to receive a switching request to switch from the second cluster role to the first cluster role;

a processing unit configured to configure a storage area associated with the first cluster role to be switched to in response to the switching request;

a transmission unit configured to transmit a switching request to switch from the first cluster role to the second cluster role to a first cluster having the first cluster role; and

the processing unit is further configured to set a current state to the first cluster role in response to confirming that a role switch has been completed.

18. A cluster device in a file system, the cluster device comprising:

a processor, and

a memory storing computer-executable instructions that, when executed by the processor, cause the processor to perform the cluster role switching method in a file system as claimed in any of claims 1-6 or to perform the cluster role switching method in a file system as claimed in any of claims 7-15.

19. A non-transitory computer-readable storage medium having stored thereon computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, cause the processor to perform the cluster role switching method in a file system according to any of claims 1-6 or the cluster role switching method in a file system according to any of claims 7-15.