CN115145782A - Server switching method, MooseFS system and storage medium

Info

Publication number: CN115145782A
Application number: CN202110341290.6A
Authority: CN
Inventors: 奚诚
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Suzhou Software Technology Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Suzhou Software Technology Co Ltd
Priority date: 2021-03-30
Filing date: 2021-03-30
Publication date: 2022-10-04

Abstract

The embodiments of the application disclose a server switching method, a MooseFS system and a storage medium. The MooseFS system comprises a main server, a backup server and a cluster monitoring module, and the server switching method comprises the following steps: when the backup server detects that the main server is operating abnormally, it sends a query request to the cluster monitoring module, the query request carrying identification information of the main server; the cluster monitoring module determines, according to the identification information, first service information corresponding to the main server, the first service information representing the actual running state of the main server; the cluster monitoring module sends a query response carrying the first service information to the backup server; and the main server and the backup server determine whether to perform server switching according to the first service information. The server switching method prevents the main node and the standby node in the MooseFS system from contending for resources, thereby achieving high availability of the MooseFS system.

Description

Server switching method, MooseFS system and storage medium

Technical Field

The invention relates to the technical field of distributed storage, in particular to a server switching method, a MooseFS system and a storage medium.

Background

Distributed storage refers to a storage cluster formed by connecting a plurality of single machines, so that the storage and read/write capabilities of all the machines can be combined. The MooseFS system is a distributed file system: it overcomes the limited storage capacity of a single machine by storing files across a cluster, overcomes the throughput limit of a single machine by using the network as the medium, guarantees data safety by multiple sets of local mechanisms, and is convenient to expand.

Ucarp, a tool for dynamic IP takeover, is a Linux implementation of the Common Address Redundancy Protocol (CARP). It allows a main server and other servers to share one virtual Internet Protocol (IP) address, and when the main server fails, one of the other servers automatically takes over from the main server to provide services to the cluster.

However, in practical applications, a failure of the main server is often misjudged. In that case the main server is still operating normally and providing services to the cluster, but Ucarp decides that another server should replace it. The other server then competes with the main server for resources, which leads to a disordered MooseFS system and damaged data.

Disclosure of Invention

The embodiment of the application provides a server switching method, a MooseFS system and a storage medium, which can avoid the problems of disordered states and data damage of the MooseFS system caused by the fact that a main node and a standby node in the MooseFS system contend for resources, and further effectively realize high availability of the MooseFS system.

The technical scheme of the embodiment of the application is realized as follows:

In a first aspect, an embodiment of the present application provides a server switching method applied to a MooseFS system, where the MooseFS system includes a main server, a backup server and a cluster monitoring module, and the method comprises the following steps:

when the backup server monitors that the main server operates abnormally, sending a query request to the cluster monitoring module; wherein the query request carries identification information of the main server;

the cluster monitoring module determines first service information corresponding to the main server according to the identification information; wherein the first service information characterizes an actual operating state of the primary server;

the cluster monitoring module sends a query response to the backup server; wherein the query response carries the first service information;

and the main server and the backup server determine whether to perform server switching or not according to the first service information.

In a second aspect, an embodiment of the present application provides a MooseFS system, which includes: main server, backup server and cluster monitoring module, wherein:

the main server is used for managing the MooseFS system, and carrying out data transmission with the data node;

the backup server is used for monitoring the running state of the main server and storing the metadata in the main server;

and the cluster monitoring module is used for monitoring the actual running states of the main server and the data nodes.

In a third aspect, an embodiment of the present application provides a MooseFS system that further comprises a processor and a memory storing instructions executable by the processor; when the instructions are executed by the processor, the server switching method described above is implemented.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium having a program stored thereon, for use in a MooseFS system that includes a main server, a backup server and a cluster monitoring module, wherein the server switching method described above is implemented when the program is executed by a processor.

The embodiments of the application provide a server switching method, a MooseFS system and a storage medium. The MooseFS system comprises a main server, a backup server and a cluster monitoring module. When the backup server detects that the main server is operating abnormally, it sends a query request to the cluster monitoring module, the query request carrying identification information of the main server; the cluster monitoring module determines, according to the identification information, first service information corresponding to the main server, the first service information representing the actual running state of the main server; the cluster monitoring module sends a query response carrying the first service information to the backup server; and the main server and the backup server determine whether to perform server switching according to the first service information. That is to say, in the embodiments of the present application, when the backup server in the MooseFS system determines that the current state of the main server is abnormal, it does not directly execute the server switching process. Instead, it first obtains the first service information corresponding to the main server through the cluster monitoring module, determines the actual operating state of the main server from that information, and only then decides whether to execute the server switching process. Therefore, if an abnormal condition such as a communication interruption between the main node and the standby node occurs while the main node is managing the whole MooseFS system normally, the cluster monitoring module's monitoring of both nodes prevents the standby node from misjudging the state of the main node and contending for resources and services. State confusion and data damage in the MooseFS system are thereby avoided, and high availability of the MooseFS system is ensured.

Drawings

Fig. 1 is a first schematic structural diagram of a MooseFS system according to an embodiment of the present application;

Fig. 2 is a first schematic flow chart of an implementation of a server switching method according to an embodiment of the present application;

Fig. 3 is a second schematic flow chart of an implementation of a server switching method according to an embodiment of the present application;

Fig. 4 is a second schematic structural diagram of a MooseFS system according to an embodiment of the present application;

Fig. 5 is a third schematic flow chart of an implementation of a server switching method according to an embodiment of the present application;

Fig. 6 is a fourth schematic flow chart of an implementation of a server switching method according to an embodiment of the present application;

Fig. 7 is a third schematic structural diagram of a MooseFS system according to an embodiment of the present application;

Fig. 8 is a fifth schematic flow chart of an implementation of a server switching method according to an embodiment of the present application;

Fig. 9 is a sixth schematic flow chart of an implementation of a server switching method according to an embodiment of the present application;

Fig. 10 is a seventh schematic flow chart of an implementation of a server switching method according to an embodiment of the present application;

Fig. 11 is an eighth schematic flow chart of an implementation of a server switching method according to an embodiment of the present application;

Fig. 12 is a fourth schematic structural diagram of a MooseFS system according to an embodiment of the present application;

Fig. 13 is a fifth schematic structural diagram of a MooseFS system according to an embodiment of the present application;

Fig. 14 is a sixth schematic structural diagram of a MooseFS system according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the specific embodiments described herein are illustrative of the relevant application and are not limiting of the application. It should be noted that, for the convenience of description, only the parts related to the related applications are shown in the drawings.

At present, with the development of internet technology and the increasing adoption of information systems, new data sources keep emerging, the volume of service data keeps growing, and the demand for unstructured file storage has increased markedly. Traditional centralized storage, however, usually relies on high-end storage such as a Storage Area Network (SAN) or Network Attached Storage (NAS) to cope with explosive data growth. Because the space and capacity of such storage cannot be expanded at will, upgrading the equipment is expensive, and the approach suffers from low storage efficiency, limited horizontal scalability, poor load balancing and low concurrent-access performance.

The MooseFS system is a distributed file system in which files are allocated and stored through a MooseFS client, replacing the traditional approach of storing directly on a local disk. However, the MooseFS system has a high overall failure rate, and a split-brain problem caused by abnormal conditions such as a communication failure between the master and slave nodes may occur, leaving the MooseFS system in a disordered state with damaged data. How to ensure that the MooseFS system does not lose important data in the face of such abnormal conditions, continues to provide service for users, and improves its overall high availability has therefore become a technical problem to be solved urgently.

In order to solve these problems of the existing MooseFS system, the embodiments of the application provide a server switching method, a MooseFS system and a storage medium, with script adjustments made on the basis of the MooseFS 3.0 mechanism. Specifically, the MooseFS system includes a main server, a backup server and a cluster monitoring module. When the backup server detects that the main server is operating abnormally, it sends a query request to the cluster monitoring module, the query request carrying identification information of the main server; the cluster monitoring module determines, according to the identification information, first service information corresponding to the main server, the first service information representing the actual running state of the main server; the cluster monitoring module sends a query response carrying the first service information to the backup server; and the main server and the backup server determine whether to perform server switching according to the first service information. This prevents the standby node from misjudging the state of the main node when network communication is abnormal and contending for resources and services, which would cause split-brain; state confusion and data damage in the MooseFS system are thereby prevented, and high availability of the MooseFS system is ensured.

Example one

The embodiment of the present application provides a server switching method, which is applied to a MooseFS system, and fig. 1 is a schematic view of a composition structure of the MooseFS system, and as shown in fig. 1, a MooseFS system 10 may include a main server 11, a backup server 12, and a cluster monitoring module 13.

In the embodiment of the present application, the main server 11 is responsible for maintaining the namespace of the entire MooseFS system 10 and exposing it to users, for data transmission with multiple data nodes, and for managing the entire MooseFS system 10.

The backup server 12 is connected to the main server 11 and can perform an active/standby switch to take over the work of the main server 11 when the main server fails. The backup server 12 can synchronize the metadata of the main server 11 in real time, synchronizing changed files or directories by means of rsync and Sersync; this not only gives a high transmission rate but also achieves real-time synchronization and backup of the metadata.
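As an illustration only, the rsync half of this synchronization could be driven from a script like the following sketch. The metadata directory, host name and destination path are hypothetical placeholders not taken from the application, the options shown (`-a`, `-z`, `--delete`) are standard rsync options chosen here as an assumption, and the Sersync change-detection side is not modelled. The function only builds the command and does not run it.

```python
from typing import List


def build_rsync_command(source_dir: str, backup_host: str, dest_dir: str) -> List[str]:
    """Build an rsync invocation that mirrors source_dir onto the backup server.

    All paths and host names are hypothetical; nothing is executed here.
    """
    # A trailing slash tells rsync to copy the directory contents,
    # not the directory itself.
    source = source_dir.rstrip("/") + "/"
    return [
        "rsync",
        "-az",       # archive mode (keep permissions and times) plus compression
        "--delete",  # remove files on the backup that no longer exist on the main server
        source,
        f"{backup_host}:{dest_dir}",
    ]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(" ".join(build_rsync_command("/var/lib/mfs", "backup-server", "/var/lib/mfs")))
```

A change watcher would call such a command whenever a file or directory under the metadata directory changes, so that only the differences are transferred.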

The cluster monitoring module 13 is used for monitoring the main server 11 and the backup server 12. By monitoring both servers in real time, it prevents the situation in which, under certain abnormal conditions such as a communication failure, the backup server misjudges the state of the main server as faulty and then contends for resources and services, causing split-brain. The MooseFS system 10 is thereby protected from state confusion and data damage.

Fig. 2 is a schematic flow chart illustrating an implementation of the server switching method provided in the embodiment of the present application, and as shown in fig. 2, in the embodiment of the present application, the server switching method may include the following steps:

Step 101, when the backup server detects that the main server is operating abnormally, it sends a query request to the cluster monitoring module, wherein the query request carries identification information of the main server.

In the embodiment of the application, the backup server in the MooseFS system may monitor the main server in real time based on a heartbeat monitoring mechanism. When the backup server determines, based on the heartbeat monitoring mechanism, that the current state of the main server is abnormal, it may request the cluster monitoring module to query the current state of the main server, that is, it sends a query request carrying the identification information of the main server to the cluster monitoring module.

It should be noted that, in the embodiment of the present application, the backup server periodically sends a heartbeat packet to the main server based on the heartbeat monitoring mechanism and receives feedback information. If no feedback information is received for a heartbeat packet within a preset time, the current state of the main server is determined to be abnormal, and the backup server then sends the query request carrying the identification information of the main server to the cluster monitoring module.
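The timeout rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the class name `HeartbeatMonitor`, its method names and the injected clock are all hypothetical, and the actual sending of heartbeat packets over the network is left out.

```python
import time
from typing import Callable, Optional


class HeartbeatMonitor:
    """Backup-server side bookkeeping for heartbeat feedback (hypothetical names)."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s          # the "preset time" from the text
        self.clock = clock                  # injectable so the logic can be tested
        self.pending_since: Optional[float] = None  # oldest unanswered heartbeat

    def on_heartbeat_sent(self) -> None:
        # Remember only the oldest heartbeat that is still waiting for feedback.
        if self.pending_since is None:
            self.pending_since = self.clock()

    def on_feedback_received(self) -> None:
        # Feedback from the main server clears the pending heartbeat.
        self.pending_since = None

    def main_server_abnormal(self) -> bool:
        """True when a heartbeat has gone unanswered for the preset time."""
        if self.pending_since is None:
            return False
        return self.clock() - self.pending_since >= self.timeout_s
```

When `main_server_abnormal()` returns true, the backup server would send the query request to the cluster monitoring module rather than taking over immediately.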

In an embodiment of the present application, the query request may be used to obtain the service state of the main server. Specifically, when the backup server does not receive the feedback information within the preset time and determines that the current state of the main server is abnormal, it requests the cluster monitoring module to send the first service information that the module has obtained by monitoring the main server; the service state of the main server can then be determined from the received first service information.

In an embodiment of the application, the identification information represents the identity of the main server. Using the identification information sent by the backup server, the cluster monitoring module can pick out the first service information corresponding to the main server from the service information of all nodes that it has obtained and stored. The way in which the identification information is set is not specifically limited in this application.

It should be noted that heartbeat-based monitoring of the main server by the backup server may produce a misjudgment when communication between the two servers is interrupted. Therefore, when the backup server determines that the main server may have failed, it needs to send a query request to the cluster monitoring module to establish the real state of the main server, instead of directly performing a main/backup switch on the basis of its own fault judgment.

Step 102, the cluster monitoring module determines first service information corresponding to the main server according to the identification information; the first service information represents the actual running state of the main server.

In the embodiment of the application, when the backup server determines, based on the heartbeat monitoring mechanism, that the current state of the main server is abnormal, it sends a query request to the cluster monitoring module. After receiving the query request carrying the identification information, the cluster monitoring module in the MooseFS system can determine the first service information of the main server according to the identification information.

In an embodiment of the present application, the first service information of the main server may be obtained by the cluster monitoring module through monitoring the main server, and is generated according to all services running on the main server. In other words, in addition to the current state information that the backup server obtains by monitoring the main server, the cluster monitoring module, which sits at a layer above the backup server, obtains a second piece of state information by monitoring the main server; this is the first service information, and it indicates the actual running state of the main server.

For example, in an embodiment of the present application, the cluster monitoring module may monitor the service state of the main server continuously and store the first service information keyed by the identification information; after receiving a query request from the backup server, it locates the pre-stored first service information according to the identification information carried in the request. Alternatively, the cluster monitoring module may not pre-store the first service information and may instead query the first service information of the main server directly, at the moment of the request, according to the identification information.
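The two lookup options just described can be combined in one small sketch. It assumes, for illustration only, that the first service information is a simple string status, that the identification information is a string key, and that a live query is available as a callable; the names `ClusterMonitor`, `record` and `probe` are hypothetical.

```python
from typing import Callable, Dict, Optional

NORMAL = "normal"
ABNORMAL = "abnormal"


class ClusterMonitor:
    """Resolves first service information from identification information."""

    def __init__(self, probe: Optional[Callable[[str], str]] = None):
        self._stored: Dict[str, str] = {}  # pre-stored info, keyed by server id
        self._probe = probe                # optional live query of the server state

    def record(self, server_id: str, service_info: str) -> None:
        # Called by continuous monitoring to keep the stored view up to date.
        self._stored[server_id] = service_info

    def first_service_info(self, server_id: str) -> str:
        # Option 1: return the information stored during continuous monitoring.
        if server_id in self._stored:
            return self._stored[server_id]
        # Option 2: nothing is pre-stored, so query the server state on demand.
        if self._probe is not None:
            return self._probe(server_id)
        raise LookupError(f"no service information for {server_id!r}")
```

Preferring the stored value and falling back to a live query is a design choice of this sketch; the application presents the two approaches as alternatives.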

For example, in an embodiment of the present application, when determining the first service information corresponding to the main server according to the identification information, the cluster monitoring module may monitor the main server to obtain the first service information, and the service state of the main server is then determined from it. In particular, when the backup server has judged the current service state of the main server to be faulty, a more objective determination of the main server's service state needs to be made according to the first service information sent by the cluster monitoring module.

Step 103, the cluster monitoring module sends a query response to the backup server, wherein the query response carries the first service information.

In an embodiment of the present application, after determining the first service information corresponding to the main server according to the identification information, the cluster monitoring module responds to the query request from the backup server by sending it a query response carrying the first service information, so that the backup server obtains the first service information of the main server as monitored by the cluster monitoring module.

Step 104, the main server and the backup server determine whether to perform server switching according to the first service information.

In an embodiment of the application, after the cluster monitoring module sends the query response to the backup server, the main server and the backup server determine whether to perform server switching according to the first service information.

In the embodiment of the application, the server switching refers to switching between the backup server and the main server, and the backup server is used for replacing the work of the main server, so as to provide service for the system.

It should be noted that, since the first service information represents the actual operating state of the main server, the main server and the backup server can determine from it whether server switching is needed. If the first service information indicates that switching is needed, the server switching is executed and the backup server takes over the work of the main server; if it indicates that switching is not needed, the server switching is not executed.

Fig. 3 is a schematic view of a second implementation flow of the server switching method provided in the embodiment of the present application, as shown in fig. 3, in the embodiment of the present application, a main server and a backup server determine whether to perform server switching according to first service information, that is, step 104 may include the following steps:

Step 104a, if the first service information received by the backup server is normal, the main server and the backup server do not execute server switching processing.

In the embodiment of the application, the main server and the backup server determine whether to perform server switching according to the first service information. If the backup server determines that the first service information of the main server is normal at this time, the main server and the backup server do not perform server switching processing.

In the embodiment of the application, if the first service information acquired by the backup server is normal, the backup server has probably misjudged the current service state of the main server. That is, the main server has not failed; the backup server may simply not have received the main server's feedback because communication between the two servers failed, and therefore mistakenly considered the state of the main server to be abnormal. From the obtained first service information the backup server can determine that the main server is operating normally, so the main server and the backup server do not execute server switching processing. The system confusion and data loss that split-brain would cause after a communication failure are avoided, and high availability of the MooseFS system is ensured.

In the embodiment of the application, when the main server temporarily fails to respond because of an excessively high load, that is, when the main server is in a false-dead state, the backup server may also determine that the main server is down; this misjudgment may likewise cause a split-brain problem. The specific fault information is not limited in this application.

Step 104b, if the first service information received by the backup server is abnormal, the main server and the backup server execute server switching processing.

In the embodiment of the application, the cluster monitoring module sends the backup server a query response carrying the first service information. If the first service information in the received query response is abnormal, the current state of the main server really is abnormal; the main server and the backup server execute server switching processing and the backup server takes over the work of the main server, which shortens the fault handling time of the MooseFS system and ensures its high availability.
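Steps 101 to 104 can be summarised in the following sketch of the backup server's decision after it suspects a failure. It is a minimal illustration under stated assumptions: the message types `QueryRequest` and `QueryResponse`, the `ServiceInfo` enumeration and the two callables are hypothetical stand-ins for the real network exchange and takeover procedure, which the application does not specify at this level.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class ServiceInfo(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"


@dataclass(frozen=True)
class QueryRequest:
    main_server_id: str              # identification information of the main server


@dataclass(frozen=True)
class QueryResponse:
    first_service_info: ServiceInfo  # actual running state seen by the monitor


def handle_suspected_failure(
    main_server_id: str,
    query_cluster_monitor: Callable[[QueryRequest], QueryResponse],
    take_over: Callable[[], None],
) -> bool:
    """Return True if server switching was executed, False otherwise."""
    # Steps 101 to 103: ask the cluster monitoring module instead of switching at once.
    response = query_cluster_monitor(QueryRequest(main_server_id))
    # Step 104a: the main server is actually fine, so the suspicion was a misjudgment.
    if response.first_service_info is ServiceInfo.NORMAL:
        return False
    # Step 104b: the main server really is abnormal, so the backup takes over.
    take_over()
    return True
```

Passing the query and takeover actions in as callables keeps the decision rule separate from transport details, which also makes the rule easy to exercise in isolation.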

The embodiments of the application provide a server switching method, a MooseFS system and a storage medium. The MooseFS system comprises a main server, a backup server and a cluster monitoring module. When the backup server detects that the main server is operating abnormally, it sends a query request to the cluster monitoring module, the query request carrying identification information of the main server; the cluster monitoring module determines, according to the identification information, first service information corresponding to the main server, the first service information representing the actual running state of the main server; the cluster monitoring module sends a query response carrying the first service information to the backup server; and the main server and the backup server determine whether to perform server switching according to the first service information. That is to say, in the embodiments of the present application, when the backup server in the MooseFS system determines that the current state of the main server is abnormal, it does not directly perform the server switching process. Instead, it first obtains the first service information corresponding to the main server through the cluster monitoring module, determines the actual operating state of the main server from that information, and only then decides whether to perform the server switching process. Therefore, if an abnormal condition such as a communication interruption between the main node and the standby node occurs while the main node is managing the whole MooseFS system normally, the cluster monitoring module's monitoring of both nodes prevents the standby node from misjudging the state of the main node and contending for resources and services. State confusion and data damage in the MooseFS system are thereby avoided, and high availability of the MooseFS system is ensured.

Example two

Based on the first embodiment, in a further embodiment of the present application, Fig. 4 is a second schematic structural diagram of the MooseFS system. As shown in Fig. 4, the cluster monitoring module 13 in the MooseFS system 10 may include a service monitoring submodule 131, a threshold monitoring submodule 132, an information storage submodule 133, and a global monitoring submodule 134.

Fig. 5 is a schematic flow chart illustrating a third implementation process of the server switching method provided in the embodiment of the present application, and as shown in fig. 5, the method for determining, by the cluster monitoring module, the first service information corresponding to the main server according to the identification information includes the following steps:

Step 102a, the service monitoring submodule monitors the service state of the main server based on a heartbeat monitoring mechanism and acquires at least one piece of service information corresponding to at least one service; wherein one service corresponds to one piece of service information.

It should be noted that, in the embodiment of the present application, the at least one service of the main server may include one or more of a heartbeat information storage service, a MooseFS system log storage service, a MooseFS system data node service and other services, which is not limited in this application.

Step 102b, if every piece of the at least one service information is normal, determining that the first service information is normal.

Step 102c, if any piece of the at least one service information is abnormal, determining that the first service information is abnormal.

For example, in the present application, if the service monitoring submodule monitors that the obtained heartbeat information storage service of the main server, the MooseFS system log storage service, and the MooseFS system data node service are all normal, the first service information of the main server is considered to be normal at this time.
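The aggregation rule of steps 102b and 102c can be written as a single function. The sketch below assumes, for illustration, that each piece of service information is reduced to a boolean (true for normal) and that services are identified by name; the service names used in the usage lines are hypothetical labels for the services listed above.

```python
from typing import Mapping


def first_service_info_is_normal(service_states: Mapping[str, bool]) -> bool:
    """Combine per-service states into the first service information.

    Returns True (normal) only when every monitored service is normal;
    a single abnormal service makes the overall result abnormal.
    """
    if not service_states:
        # The text requires at least one service; an empty input is a caller error.
        raise ValueError("at least one service must be monitored")
    return all(service_states.values())


if __name__ == "__main__":
    print(first_service_info_is_normal(
        {"heartbeat_storage": True, "log_storage": True, "data_node": True}
    ))
    print(first_service_info_is_normal(
        {"heartbeat_storage": True, "log_storage": False, "data_node": True}
    ))
```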

Further, in the embodiment of the present application, the information storage sub-module in the cluster monitoring module may receive the heartbeat monitoring file corresponding to the main server sent by the service monitoring sub-module, and store the heartbeat monitoring file.

It should be noted that, in the embodiment of the present application, the information storage sub-module stores a heartbeat monitoring file, that is, service heartbeat related information, in a pre-established shared directory, and the information storage sub-module serves the service monitoring sub-module.
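One possible way to keep such a heartbeat monitoring file in a shared directory is sketched below. The file name pattern, the JSON layout and the function names are assumptions made for illustration; the application does not define the file format. The file is written to a temporary name and then renamed so that a reader never sees a partially written file.

```python
import json
import os
import tempfile
import time
from pathlib import Path


def heartbeat_path(shared_dir: Path, server_id: str) -> Path:
    # Hypothetical naming scheme: one file per monitored server.
    return shared_dir / f"{server_id}.heartbeat.json"


def store_heartbeat_file(shared_dir: Path, server_id: str, services: dict) -> Path:
    """Write the latest heartbeat monitoring record for one server."""
    shared_dir.mkdir(parents=True, exist_ok=True)
    record = {"server_id": server_id, "timestamp": time.time(), "services": services}
    # Write to a temporary file in the same directory, then rename atomically.
    fd, tmp_name = tempfile.mkstemp(dir=str(shared_dir), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(record, fh)
    target = heartbeat_path(shared_dir, server_id)
    os.replace(tmp_name, target)
    return target


def load_heartbeat_file(shared_dir: Path, server_id: str) -> dict:
    """Read back the stored heartbeat monitoring record."""
    with open(heartbeat_path(shared_dir, server_id), encoding="utf-8") as fh:
        return json.load(fh)
```

In this sketch the service monitoring submodule would call `store_heartbeat_file` after each monitoring round, and any component that needs the file would call `load_heartbeat_file`.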

Fig. 6 is a fourth schematic flow chart of an implementation of the server switching method according to an embodiment of the present application. As shown in Fig. 6, the method by which the cluster monitoring module determines the first service information corresponding to the main server according to the identification information further comprises the following steps:

Step 102d, the cluster monitoring module acquires the heartbeat monitoring file corresponding to the main server stored by the information storage submodule.

In the embodiment of the application, the information storage submodule receives the heartbeat monitoring file corresponding to the main server and sent by the service monitoring submodule, and stores the heartbeat monitoring file. When the cluster monitoring module needs to acquire the heartbeat monitoring file, the heartbeat monitoring file can be acquired from the information storage submodule.

Step 102e, the cluster monitoring module performs verification processing according to the heartbeat maintenance file and the heartbeat monitoring file to obtain a verification result.

In the embodiment of the application, after the cluster monitoring module acquires the heartbeat monitoring file, verification processing can be performed according to the heartbeat maintenance file and the heartbeat monitoring file, and a verification result is obtained.

It should be noted that, in the embodiment of the present application, the heartbeat maintenance file is heartbeat information of the main server obtained by the backup server based on the heartbeat monitoring mechanism.

It should be noted that, in the embodiment of the present application, the heartbeat monitoring file is heartbeat monitoring information corresponding to the main server, which is obtained by the service monitoring sub-module in the cluster monitoring module.

Step 102f, if the verification result is verification success, determining that the first service information corresponding to the main server is normal.

Step 102g, if the verification result is verification failure, determining that the first service information corresponding to the main server is abnormal.

In the embodiment of the application, the cluster monitoring module performs verification processing according to the heartbeat maintenance file and the heartbeat monitoring file, and if the verification result is successful, the first service information of the main server is determined to be normal; and if the verification result is verification failure, determining that the first service information corresponding to the main server is abnormal.

In the embodiment of the application, verification success means that, after the heartbeat maintenance file and the heartbeat monitoring file are compared, the heartbeat monitoring file shows that the first service information of the main server is normal. In that case the abnormal state of the main server reported by the heartbeat maintenance file is a misjudgment, and the main server is in fact in a normal service state. Verification failure means that, after the two files are compared, the heartbeat monitoring file also shows that the first service information of the main server is abnormal, so it can be determined that the main server really is abnormal at this time.
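Steps 102d to 102g can be sketched as follows. This is only an illustration under stated assumptions: the application does not specify the format of either heartbeat file, so the `HeartbeatRecord` type, its fields, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatRecord:
    """Hypothetical parsed form of a heartbeat file (format not given in the source)."""
    server_id: str
    healthy: bool


def verify(maintenance: HeartbeatRecord, monitoring: HeartbeatRecord) -> bool:
    """Return True (verification success) when the monitoring file shows the server healthy."""
    if maintenance.server_id != monitoring.server_id:
        raise ValueError("records describe different servers")
    # The backup server only queries after its own maintenance file looks abnormal,
    # so the independent monitoring file decides the outcome.
    return monitoring.healthy


def first_service_info(maintenance: HeartbeatRecord, monitoring: HeartbeatRecord) -> str:
    """Map the verification result to the first service information (steps 102f/102g)."""
    return "normal" if verify(maintenance, monitoring) else "abnormal"


# The backup server lost the heartbeat, but the cluster monitor still sees the main server.
backup_view = HeartbeatRecord(server_id="main-1", healthy=False)
monitor_view = HeartbeatRecord(server_id="main-1", healthy=True)
print(first_service_info(backup_view, monitor_view))  # prints "normal"
```

In this sketch a lost heartbeat on the backup side alone never yields "abnormal"; only agreement from the independent monitoring record does, which mirrors the misjudgment case described above.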

The embodiment of the application provides a server switching method, a MooseFS system and a storage medium, wherein the MooseFS system comprises a main server, a backup server and a cluster monitoring module. When the backup server monitors that the main server operates abnormally, a query request carrying the identification information of the main server is sent to the cluster monitoring module; the cluster monitoring module determines first service information corresponding to the main server according to the identification information, the first service information representing the actual running state of the main server; the cluster monitoring module sends a query response carrying the first service information to the backup server; and the main server and the backup server determine whether to perform server switching according to the first service information. That is to say, in the embodiment of the present application, when the backup server in the MooseFS system determines that the current state of the main server is abnormal, it does not directly execute the server switching process. It first obtains the first service information corresponding to the main server through the cluster monitoring module, determines the actual running state of the main server from that information, and only then decides whether to execute the server switching process.
Therefore, if an abnormal condition such as a communication interruption between the main node and the standby node occurs while the main node is managing the whole MooseFS system normally, the monitoring of both nodes by the cluster monitoring module prevents the standby node from misjudging the state of the main node and contending for resources and services. State confusion and data damage in the MooseFS system are thereby avoided, and high availability of the MooseFS system is ensured.

Example three

Based on the above-described embodiments one and two, in another embodiment of the present application, fig. 7 is a schematic structural diagram of the MooseFS system according to the embodiment of the present application, and as shown in fig. 7, the MooseFS system 10 further includes a data node 14.

It should be noted that, in the embodiment of the present application, a plurality of data nodes 14 may also be included in the MooseFS system, and are used to provide a storage service for real file data.

Fig. 8 is a schematic flow chart of an implementation process of the server switching method provided in the embodiment of the present application, as shown in fig. 8, in the embodiment of the present application, the server switching method may include the following steps:

Step 201, the service monitoring submodule monitors the data node based on the heartbeat monitoring mechanism to obtain second service information corresponding to the data node.

In the embodiment of the application, the MooseFS system further includes a plurality of data nodes, the service monitoring submodule further monitors the plurality of data nodes in the system at the same time, and obtains second service information of the plurality of data nodes, where the second service information represents a service state of the data nodes.

It should be noted that, in the embodiment of the present application, the data node may use a dual network card binding technology, which both increases the network transmission speed and allows normal, efficient operation to continue when one of the network cards fails. Illustratively, when one network card of a data node fails, the other network card immediately takes over the full load, so that the service is not interrupted while the failed card awaits repair by maintenance personnel, thereby ensuring normal use of the whole system.

Step 202, if the second service information is suspended, performing service pull-up processing, and monitoring the data node again to obtain updated second service information.

Step 203, if the updated second service information is stopped, performing alarm processing.

In the embodiment of the application, the service monitoring submodule monitors the data node based on the heartbeat monitoring mechanism. After the second service information corresponding to the data node is obtained, if the second service information indicates that the service is suspended, service pull-up processing is executed and the data node is monitored again to obtain updated second service information. If the updated second service information indicates that the service is stopped, that is, the service could not be started again, alarm processing is performed.
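Steps 201 to 203 amount to a check, restart, re-check, alarm sequence, which the following Python sketch illustrates. The state strings, the callback signatures, and the `FakeNode` stand-in are assumptions made for illustration; the sketch also treats any state other than "running" after the pull-up as grounds for an alarm, which is slightly broader than the "stopped" case named in the text.

```python
from typing import Callable, List


def handle_data_node(
    check_state: Callable[[], str],
    pull_up: Callable[[], None],
    alarm: Callable[[str], None],
) -> str:
    """Monitor one data node: pull the service up if suspended, alarm if that fails."""
    state = check_state()                # step 201: obtain second service information
    if state == "suspended":
        pull_up()                        # step 202: service pull-up processing
        state = check_state()            # step 202: monitor again
        if state != "running":           # step 203 (assumption: any non-running state)
            alarm(f"data node service is {state} after pull-up")
    return state


class FakeNode:
    """Scripted stand-in for a data node; returns the given states in order."""

    def __init__(self, states: List[str]) -> None:
        self._states = list(states)
        self.pull_ups = 0

    def check(self) -> str:
        # Return the next scripted state and stay on the last one afterwards.
        return self._states.pop(0) if len(self._states) > 1 else self._states[0]

    def pull_up(self) -> None:
        self.pull_ups += 1


alarms: List[str] = []
node = FakeNode(["suspended", "stopped"])
final_state = handle_data_node(node.check, node.pull_up, alarms.append)
print(final_state, len(alarms))  # prints "stopped 1"
```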

Further, in the embodiment of the present application, the information storage sub-module in the cluster monitoring module may receive the second service information corresponding to the data node sent by the service monitoring sub-module, and store the second service information.

In the embodiment of the application, after the service monitoring submodule monitors the data node based on the heartbeat monitoring mechanism and obtains and processes the second service information corresponding to the data node, the information storage submodule is further configured to store the second service information of the data node.


Example four

Based on the first to third embodiments, in another embodiment of the present application, fig. 9 is a sixth schematic implementation flow chart of the server switching method provided in the embodiment of the present application, as shown in fig. 9, in the embodiment of the present application, the server switching method may further include the following steps:

Step 301, the threshold monitoring submodule acquires a first state parameter corresponding to the main server.

In an embodiment of the application, the threshold monitoring submodule may obtain a first state parameter corresponding to the main server.

It should be noted that, in the embodiment of the present application, the first state parameter is an operation state parameter of the main server, and may include multiple parameters representing operation states, such as a Central Processing Unit (CPU) usage rate, a memory usage rate, an Input/Output (I/O) access rate, a disk storage rate, and the like, which correspond to the main server, and the present application is not limited specifically.

Step 302, if the first state parameter is larger than a first preset state threshold value, performing alarm processing; the first state parameter is used for monitoring the running state of the main server.

In the embodiment of the application, after the threshold monitoring submodule acquires the first state parameter corresponding to the main server, the threshold monitoring submodule may compare the first state parameter with a first preset state threshold, so that further processing may be performed according to a comparison result.

Specifically, in the application, the first preset state threshold is a value preset according to the operation parameters of the main server and is used for monitoring the running state of the main server. When the first state parameter is greater than the first preset state threshold, the main server may be overloaded; an alarm is given and cluster management personnel are notified to perform timely maintenance. This avoids a main server failure that would leave the MooseFS system in disorder and cause data loss, thereby achieving high availability of the MooseFS system.

Fig. 10 is a seventh implementation flow diagram of the server switching method provided in the embodiment of the present application, as shown in fig. 10, in the embodiment of the present application, the server switching method may further include the following steps:

Step 303, the threshold monitoring submodule acquires a second state parameter corresponding to the data node.

In an embodiment of the present application, the second state parameter is an operation state parameter of each data node, and may include multiple parameters representing an operation state, such as a CPU usage rate, a memory usage rate, an I/O access rate, and a disk storage rate, corresponding to the data node.

Step 304, if the second state parameter is larger than a second preset state threshold value, performing alarm processing; and the second state parameter is used for monitoring the operation state of the data node.

In the embodiment of the application, after the threshold monitoring submodule acquires the second state parameter corresponding to the data node, if the second state parameter is greater than a second preset state threshold, performing alarm processing; and the second state parameter is used for monitoring the operation state of the data node.

In the embodiment of the application, the second preset state threshold is a value preset according to the operation parameters of the data node and is used for monitoring the operation state of the data node. When the second state parameter is greater than the second preset state threshold, the data node may be overloaded; an alarm is given and cluster management personnel are notified to perform timely maintenance. This avoids the situation in which one data node fails, the load on the remaining data nodes increases, and further data nodes may go down, thereby realizing high availability of the data nodes.

It should be noted that the first state parameter and the second state parameter are operation state parameters of the main server and the data node, respectively, and represent operation states of the main server and the data node, respectively; the first preset state threshold value and the second preset state threshold value are respectively preset values for the running states of the main server and the data nodes and are used for early warning the running states of the main server and the data nodes.
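The threshold checks in steps 301 to 304 can be sketched in Python as below. The parameter names and numeric thresholds are hypothetical values chosen for illustration; the application lists the kinds of parameters (CPU usage, memory usage, I/O access rate, disk storage rate) but gives no concrete names or limits.

```python
from typing import Dict, List


def exceeded_parameters(params: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return the names of state parameters that are greater than their preset threshold."""
    return sorted(
        name
        for name, value in params.items()
        if name in thresholds and value > thresholds[name]
    )


def check_node(node_id: str, params: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Build one alarm message per exceeded parameter (alarm delivery is out of scope here)."""
    return [
        f"{node_id}: {name}={params[name]:.2f} exceeds {thresholds[name]:.2f}"
        for name in exceeded_parameters(params, thresholds)
    ]


# Hypothetical sample values, expressed as fractions of capacity.
main_server_params = {"cpu_usage": 0.93, "memory_usage": 0.60, "io_rate": 0.40, "disk_usage": 0.85}
first_thresholds = {"cpu_usage": 0.90, "memory_usage": 0.90, "io_rate": 0.80, "disk_usage": 0.80}

alarm_messages = check_node("main-server", main_server_params, first_thresholds)
for message in alarm_messages:
    print(message)
```

The same `check_node` call could be applied to each data node with a second threshold table, matching the split between the first and second preset state thresholds.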

Fig. 11 is an eighth implementation flowchart of the server switching method provided in the embodiment of the present application, as shown in fig. 11, in the embodiment of the present application, the server switching method may further include the following steps:

Step 305, the global monitoring submodule monitors the service monitoring submodule, the threshold monitoring submodule and the information storage submodule to obtain a first service process corresponding to the service monitoring submodule, a second service process corresponding to the threshold monitoring submodule and a third service process corresponding to the information storage submodule.

In an embodiment of the present application, the global monitoring submodule is one submodule of the cluster monitoring module and is configured to monitor the service processes of the service monitoring submodule, the threshold monitoring submodule and the information storage submodule, so as to ensure that these service processes remain in a normal state and to realize high availability of the cluster monitoring module.

Step 306, if any one of the first service process, the second service process and the third service process is abnormal, performing alarm processing to ensure high availability of the cluster monitoring module.

In the embodiment of the application, after the global monitoring submodule monitors the service monitoring submodule, the threshold monitoring submodule and the information storage submodule and obtains a first service process corresponding to the service monitoring submodule, a second service process corresponding to the threshold monitoring submodule and a third service process corresponding to the information storage submodule, if any one of the first service process, the second service process and the third service process is abnormal, alarm processing is performed.

In the embodiment of the application, the first service process, the second service process and the third service process are monitored by the global monitoring submodule. When any of these processes is abnormal, an alarm is given and cluster management personnel are notified to perform maintenance, which guarantees the high availability of the cluster monitoring module.
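Steps 305 and 306 can be pictured with the following minimal Python sketch. How the global monitoring submodule actually inspects a process is not stated in the application, so the boolean status map and the submodule names used as keys are assumptions made for illustration.

```python
from typing import Callable, Dict, List


def check_submodules(process_ok: Dict[str, bool], alarm: Callable[[str], None]) -> bool:
    """Raise an alarm for every abnormal service process; return True if all are normal."""
    abnormal = [name for name, ok in process_ok.items() if not ok]
    for name in abnormal:
        alarm(f"service process of {name} is abnormal")
    return not abnormal


# Hypothetical snapshot of the three monitored service processes.
snapshot = {
    "service_monitoring_submodule": True,     # first service process
    "threshold_monitoring_submodule": False,  # second service process
    "information_storage_submodule": True,    # third service process
}
raised: List[str] = []
all_normal = check_submodules(snapshot, raised.append)
print(all_normal, raised)
```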

Fig. 12 is a schematic structural diagram of the MooseFS system provided in the embodiment of the present application, and as shown in fig. 12, in the embodiment of the present application, the MooseFS system may include a main server, a backup server, a data node, a switch, a cluster monitoring module, and a client application.

In the embodiment of the application, a main-backup high availability module is composed of the main server and the backup server. The high availability function between the main server and the backup server is mainly realized by means of Ucarp and a virtual IP technology. The virtual IP technology provides a floating access point between the application and the data node, so that the connection and interaction between the application and the data node are not affected during main-backup switching.

In the embodiment of the application, the backup server can synchronize the metadata of the main server in real time using an Rsync and Sersync architecture. Sersync records the name of any file or directory that changes under a monitored directory, and Rsync is responsible for transmitting the changed files in real time. With the two used together, only the changed files or directories are synchronized and the transmission rate is high, so that metadata can be synchronized and backed up in real time. This effectively prevents the metadata on the backup server from being lost when, for example, the metadata synchronization mechanism between the main server and the backup server fails.

In the embodiment of the application, the switch is used for connecting the servers and assisting them in receiving and forwarding data. The switch in this application is provided in duplicate, so that a failure of either switch does not affect normal use of the system, which improves the availability of the switch.

In the embodiment of the application, the data node uses a dual network card binding technology, which both increases the network transmission speed and allows normal, efficient operation to continue when one network card fails. Illustratively, when one network card of a data node fails, the other network card immediately takes over the full load, so that the service is not interrupted while the failed card awaits repair by maintenance personnel, thereby ensuring normal use of the whole system.

In the embodiment of the application, the cluster monitoring module includes a service monitoring submodule, a threshold monitoring submodule, an information storage submodule, and a global monitoring submodule. The cluster monitoring module is used for monitoring the main server, the backup server and the data nodes, and the integral high availability of the MooseFS system is realized.


Example five

Based on the server switching methods provided in the first to fourth embodiments, as shown in fig. 1, an embodiment of the present application provides a MooseFS system 10, including: main server 11, backup server 12, and cluster monitoring module 13, wherein:

the main server is used for managing the MooseFS system and transmitting data with the data nodes;

Fig. 13 is a schematic structural diagram of a MooseFS system according to an embodiment of the present application, and as shown in fig. 13, the MooseFS system 10 according to the embodiment of the present application includes: a sending unit 15, a determining unit 16 and an executing unit 17.

The sending unit 15 is configured to send a query request to the cluster monitoring module when the backup server monitors that the operation of the main server is abnormal; and the query request carries the identification information of the main server.

The determining unit 16 is configured to determine, by the cluster monitoring module, first service information corresponding to the main server according to the identification information; wherein the first service information characterizes the actual running state of the main server.

Further, the sending unit 15 is further configured to send, by the cluster monitoring module, a query response to the backup server; wherein the query response carries the first service information.

The determining unit 16 is further configured to determine whether to perform server switching according to the first service information by the main server and the backup server.

The executing unit 17 is configured so that, if the first service information is normal, the main server and the backup server do not execute the server switching process.

Further, the executing unit 17 is further configured so that, if the first service information is abnormal, the main server and the backup server execute the server switching process.
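The cooperation of the sending, determining and executing units can be summarized by the decision flow below. It is a sketch under stated assumptions: the `ClusterMonitor` class, the `should_switch` function, the server identifier, and the "normal"/"abnormal" strings are hypothetical, and the query is modeled as an in-process call rather than a network request.

```python
from typing import Dict


class ClusterMonitor:
    """Hypothetical stand-in for the cluster monitoring module."""

    def __init__(self, service_info: Dict[str, str]) -> None:
        self._service_info = dict(service_info)

    def query(self, server_id: str) -> str:
        """Return the first service information for the identified server."""
        return self._service_info[server_id]


def should_switch(heartbeat_ok: bool, monitor: ClusterMonitor, main_server_id: str) -> bool:
    """Decide whether the backup server should take over from the main server."""
    if heartbeat_ok:
        # No abnormality observed by the backup server, so no query is sent.
        return False
    # The backup server does not switch directly; it first asks the cluster monitor.
    first_service_info = monitor.query(main_server_id)
    return first_service_info == "abnormal"


# Heartbeat lost between the nodes, but the main server is actually running normally.
monitor = ClusterMonitor({"main-1": "normal"})
print(should_switch(heartbeat_ok=False, monitor=monitor, main_server_id="main-1"))  # prints "False"
```

Keeping the final decision dependent on the monitor's answer is what prevents a switch when only the link between the two servers has failed.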

In an embodiment of the present application, as shown in fig. 13, the MooseFS system provided in the embodiment of the present application may further include: an acquisition unit 18.

The obtaining unit 18 is configured to perform, by the service monitoring submodule, service state monitoring on the main server based on the heartbeat monitoring mechanism, and to obtain at least one service state corresponding to the at least one service.

Further, in an embodiment of the present application, the determining unit 16 is specifically configured to determine that the current service state is normal if every service state in the at least one service state is normal; and to determine that the current service state is abnormal if any one of the at least one service state is abnormal.

In an embodiment of the present application, as shown in fig. 13, the MooseFS system provided in the embodiment of the present application may further include: an alarm unit 19.

The alarm unit 19 is used for performing alarm processing if the first state parameter is greater than a first preset state threshold; the first state parameter is used for monitoring the running state of the main server.

Further, in the embodiment of the present application, the obtaining unit 18 is further configured to obtain, by the threshold monitoring submodule, the first state parameter corresponding to the main server.

In an embodiment of the present application, further, as shown in fig. 13, the MooseFS system provided in the embodiment of the present application may further include: a receiving unit 110 and a storage unit 111.

The receiving unit 110 is configured to receive, by the information storage sub-module, first service information corresponding to the main server sent by the service monitoring sub-module.

The storage unit 111 is configured to store the first service information.

Further, in the embodiment of the present application, the obtaining unit 18 is further configured to obtain, by the cluster monitoring module, first service information of the main server stored by the information storage submodule; and the cluster monitoring module performs verification processing according to the heartbeat maintenance file and the first service information to obtain a verification result.

Further, in the embodiment of the present application, the determining unit 16 is further configured to determine that the first service information corresponding to the main server is normal if the verification result is verification success; and to determine that the first service information corresponding to the main server is abnormal if the verification result is verification failure.

In an embodiment of the present application, as shown in fig. 13, the MooseFS system provided in the embodiment of the present application may further include: a monitoring unit 112.

The monitoring unit 112 is configured to monitor, by the global monitoring submodule, the service monitoring submodule, the threshold monitoring submodule, and the information storage submodule, and to obtain a first service process corresponding to the service monitoring submodule, a second service process corresponding to the threshold monitoring submodule, and a third service process corresponding to the information storage submodule.

Further, in the embodiment of the present application, the alarm unit 19 is further configured to perform alarm processing if any one of the first service process, the second service process, and the third service process is abnormal.

Further, in this embodiment of the application, the obtaining unit 18 is further configured to monitor the data node by the service monitoring sub-module based on a heartbeat monitoring mechanism, and obtain the second service information corresponding to the data node.

Further, in the embodiment of the present application, the executing unit 17 is further configured to pull up the second service if the second service information indicates that the service is suspended; and to perform alarm processing if the second service cannot be started again.

Further, in the embodiment of the present application, the obtaining unit 18 is further configured to obtain, by the threshold monitoring submodule, the second state parameter of the data node.

Further, in the embodiment of the present application, the execution unit 17 is further configured to perform an alarm process if the second state parameter is greater than a second preset state threshold.

Further, in the embodiment of the present application, the receiving unit 110 is further configured to receive, by the information storage sub-module, second service information corresponding to the data node sent by the service monitoring sub-module.

Further, in the embodiment of the present application, the storage unit 111 is further configured to store the second service information.

Fig. 14 is a sixth schematic structural diagram of the MooseFS system according to the embodiment of the present disclosure, and as shown in fig. 14, the MooseFS system further includes a processor 113, a memory 114 storing executable instructions of the processor 113, a communication interface 115, and a bus 116 for connecting the processor 113, the memory 114, and the communication interface 115.

In an embodiment of the present invention, the processor 113 may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller, and a microprocessor. It is understood that the electronic devices for implementing the above processor functions may be other devices, and the embodiments of the present application are not limited in particular. A memory 114 may also be included, which memory 114 may be coupled to the processor 113, wherein the memory 114 is configured to store executable program code comprising computer operating instructions, and the memory 114 may comprise a high speed RAM memory and may also include a non-volatile memory, such as at least two disk memories.

In an embodiment of the present application, the bus 116 is used to connect the communication interface 115, the processor 113, and the memory 114, and to carry the communication among these devices.

In an embodiment of the present application, the memory 114 is used for storing instructions and data.

Further, in an embodiment of the present application, the processor 113 is configured to send an inquiry request to the cluster monitoring module when the backup server monitors that the operation of the primary server is abnormal; wherein, the query request carries the identification information of the main server; the cluster monitoring module determines first service information corresponding to the main server according to the identification information; the first service information represents the actual running state of the main server; the cluster monitoring module sends a query response to the backup server; wherein the query response carries first service information; and the main server and the backup server determine whether to perform server switching according to the first service information.

In practical applications, the memory 114 may be a volatile memory, such as a random-access memory (RAM); or a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); or a combination of the above types of memory, and it provides instructions and data to the processor 113.

In addition, each functional module in this embodiment may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a hardware mode, and can also be realized in a software functional module mode.

Based on such understanding, the technical solution of the present embodiment, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method of the present embodiment. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

The embodiments of the present application provide a server switching method, a MooseFS system and a storage medium. The MooseFS system comprises a main server, a backup server and a cluster monitoring module. When the backup server monitors that the main server operates abnormally, the backup server sends a query request to the cluster monitoring module, wherein the query request carries the identification information of the main server; the cluster monitoring module determines first service information corresponding to the main server according to the identification information, wherein the first service information represents the actual running state of the main server; the cluster monitoring module sends a query response to the backup server, wherein the query response carries the first service information; and the main server and the backup server determine whether to perform server switching according to the first service information. That is to say, in the embodiments of the present application, when the backup server in the MooseFS system determines that the current state of the main server is abnormal, it does not directly execute the server switching process; instead, it first obtains the first service information corresponding to the main server through the cluster monitoring module, determines the actual running state of the main server from the first service information, and only then determines whether to execute the server switching process.
Therefore, if an abnormal condition such as a communication interruption between the main node and the standby node occurs while the main node is managing the whole MooseFS system normally, the monitoring of both nodes by the cluster monitoring module prevents the standby node from misjudging the state of the main node and contending for resources and services. State confusion and data damage in the MooseFS system are thereby avoided, and high availability of the MooseFS system is ensured.
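
The query-before-switch behaviour described above can be illustrated with a minimal sketch. It is not the implementation of the embodiment: the names `ClusterMonitor`, `ServiceState` and `should_switch` are hypothetical, the cluster monitoring module is reduced to an in-memory lookup keyed by the identification information, and treating an unknown server as abnormal is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from enum import Enum


class ServiceState(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"


@dataclass
class ClusterMonitor:
    """Hypothetical stand-in for the cluster monitoring module."""

    # Maps a server's identification information to its actual running state.
    observed: dict = field(default_factory=dict)

    def query(self, server_id: str) -> ServiceState:
        # Assumption of this sketch: a server the monitor knows nothing
        # about is reported as abnormal.
        return self.observed.get(server_id, ServiceState.ABNORMAL)


def should_switch(monitor: ClusterMonitor, main_id: str,
                  main_looks_abnormal: bool) -> bool:
    """Decide whether the backup server should take over from the main server."""
    if not main_looks_abnormal:
        return False  # nothing suspicious was observed, so no query is sent
    # Do not switch directly: first obtain the first service information,
    # which reflects the actual running state of the main server.
    first_service_info = monitor.query(main_id)
    return first_service_info is ServiceState.ABNORMAL
```

With this shape, a broken link between the two nodes alone does not trigger a takeover: the backup server switches only when the monitor also reports the main server as abnormal.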

The embodiment of the present application provides a first computer-readable storage medium, on which a program is stored, and the program implements the method according to the first to fourth embodiments when executed by a first processor.

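Specifically, the program instructions corresponding to the server switching method in the present embodiment may be stored on a storage medium such as an optical disc, a hard disk, or a USB flash drive, and when the program instructions corresponding to the server switching method in the storage medium are read or executed by an electronic device, the method includes the following steps:

when the backup server monitors that the main server operates abnormally, the backup server sends a query request to the cluster monitoring module; wherein the query request carries the identification information of the main server;

the cluster monitoring module determines first service information corresponding to the main server according to the identification information; wherein the first service information represents the actual running state of the main server;

the cluster monitoring module sends a query response to the backup server; wherein the query response carries the first service information;

and the main server and the backup server determine whether to perform server switching according to the first service information.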

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of implementations of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process, such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above description is only a preferred embodiment of the present application, and is not intended to limit the scope of the present application.

Claims

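1. A server switching method, characterized in that the server switching method is applied to a MooseFS distributed file system, the MooseFS system comprises a main server, a backup server, and a cluster monitoring module, and the method comprises the following steps:

when the backup server monitors that the main server operates abnormally, the backup server sends a query request to the cluster monitoring module; wherein the query request carries the identification information of the main server;

the cluster monitoring module determines first service information corresponding to the main server according to the identification information; wherein the first service information represents the actual running state of the main server;

the cluster monitoring module sends a query response to the backup server; wherein the query response carries the first service information;

and the main server and the backup server determine whether to perform server switching according to the first service information.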

2. The method of claim 1, wherein the determining, by the primary server and the backup server, whether to perform server switching according to the first service information comprises:

if the first service information received by the backup server is normal, the main server and the backup server do not execute server switching processing;

and if the first service information received by the backup server is abnormal, executing server switching processing by the main server and the backup server.

3. The method of claim 1, wherein the cluster monitoring module comprises a service monitoring submodule, and the determining, by the cluster monitoring module, of the first service information corresponding to the main server according to the identification information comprises:

the service monitoring submodule monitors the service state of the main server based on a heartbeat monitoring mechanism and obtains at least one piece of service information corresponding to at least one service; wherein one service corresponds to one piece of service information;

if every piece of the at least one piece of service information is normal, determining that the first service information is normal;

and if any one of the at least one piece of service information is abnormal, determining that the first service information is abnormal.
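
For illustration only, the aggregation rule of claim 3 can be sketched as follows. The function name `aggregate_service_info` and the representation of each piece of service information as a boolean (True meaning normal) are assumptions of the sketch and are not taken from the application.

```python
from typing import Mapping


def aggregate_service_info(per_service: Mapping[str, bool]) -> bool:
    """Return True (normal) only if every monitored service is normal.

    `per_service` maps a hypothetical service name to True (normal) or
    False (abnormal); one service corresponds to one entry.
    """
    if not per_service:
        # The claim requires at least one service to be monitored.
        raise ValueError("at least one service must be monitored")
    # A single abnormal service makes the first service information abnormal.
    return all(per_service.values())
```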

4. The method of claim 3, wherein the cluster monitoring module further comprises: a threshold monitoring sub-module, the method further comprising:

the threshold monitoring submodule acquires a first state parameter corresponding to the main server;

if the first state parameter is larger than a first preset state threshold value, performing alarm processing; the first state parameter is used for monitoring the running state of the main server.

5. The method of claim 4, wherein the cluster monitoring module further comprises: an information storage sub-module, the method further comprising:

and the information storage submodule receives the heartbeat monitoring file corresponding to the main server sent by the service monitoring submodule and stores the heartbeat monitoring file.

6. The method according to claim 5, wherein the query request further carries a heartbeat maintenance file of the primary server, and the determining, by the cluster monitoring module, first service information corresponding to the primary server according to the identification information includes:

the cluster monitoring module acquires the heartbeat monitoring file corresponding to the main server stored by the information storage submodule;

the cluster monitoring module carries out verification processing according to the heartbeat maintenance file and the heartbeat monitoring file to obtain a verification result;

if the verification result is that verification is successful, determining that the first service information corresponding to the main server is normal;

and if the verification result is verification failure, determining that the first service information corresponding to the main server is abnormal.
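
For illustration only, the verification step of claim 6 can be sketched as a comparison of the two files. The claim does not state how the verification is performed; comparing SHA-256 digests of the file contents is an assumption of this sketch, and the function names are hypothetical.

```python
import hashlib
import hmac


def verify_heartbeat(maintenance_file: bytes, monitoring_file: bytes) -> bool:
    """Check the heartbeat maintenance file against the stored monitoring file.

    Assumption of this sketch: verification succeeds when the two files
    have identical content, compared through their SHA-256 digests.
    """
    sent = hashlib.sha256(maintenance_file).digest()
    stored = hashlib.sha256(monitoring_file).digest()
    return hmac.compare_digest(sent, stored)


def first_service_info_from_heartbeat(maintenance_file: bytes,
                                      monitoring_file: bytes) -> str:
    """Map the verification result to the first service information."""
    if verify_heartbeat(maintenance_file, monitoring_file):
        return "normal"
    return "abnormal"
```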

7. The method of claim 6, wherein the cluster monitoring module further comprises: a global monitoring submodule, the method further comprising:

the global monitoring submodule monitors the service monitoring submodule, the threshold monitoring submodule and the information storage submodule to obtain a first service process corresponding to the service monitoring submodule, a second service process corresponding to the threshold monitoring submodule and a third service process corresponding to the information storage submodule;

and if any one of the first service process, the second service process and the third service process is abnormal, performing alarm processing to ensure high availability of the cluster monitoring module.

8. The method of claim 3, wherein the MooseFS system further comprises: a data node, the method further comprising:

the service monitoring submodule monitors the data node based on a heartbeat monitoring mechanism to obtain second service information corresponding to the data node;

if the second service information is suspended, performing service pull-up processing, monitoring the data node again, and acquiring updated second service information;

and if the second service information is stopped after the updating, performing alarm processing.
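
For illustration only, the pull-up and re-check sequence of claim 8 can be sketched as follows. The callables `probe`, `pull_up` and `alarm` are hypothetical hooks, and the sketch treats "suspended" and "stopped" as the same non-running state, represented by the string "stopped"; both choices are assumptions.

```python
from typing import Callable


def handle_data_node(probe: Callable[[], str],
                     pull_up: Callable[[], None],
                     alarm: Callable[[str], None]) -> str:
    """Monitor a data node, restart it once if stopped, and alarm on failure.

    `probe` returns the second service information as a string, where
    "stopped" means the service is not running (assumption of this sketch).
    """
    state = probe()
    if state != "stopped":
        return state  # the data node is running; nothing to do
    pull_up()  # service pull-up processing
    state = probe()  # monitor the data node again after the pull-up
    if state == "stopped":
        alarm("data node still stopped after pull-up")
    return state
```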

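9. A MooseFS system, comprising: a main server, a backup server, and a cluster monitoring module, wherein,

the backup server is configured to send a query request to the cluster monitoring module when monitoring that the main server operates abnormally; wherein the query request carries the identification information of the main server;

the cluster monitoring module is configured to determine first service information corresponding to the main server according to the identification information, and to send a query response to the backup server; wherein the first service information represents the actual running state of the main server, and the query response carries the first service information;

and the main server and the backup server are configured to determine whether to perform server switching according to the first service information.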

10. A MooseFS system, comprising: a primary server, a backup server, and a cluster monitoring module, the MooseFS system further comprising a processor, a memory storing instructions executable by the processor, the instructions when executed by the processor implementing the method of any one of claims 1-8.

11. A computer readable storage medium having a program stored thereon, for use in a MooseFS system, the MooseFS system comprising: a primary server, a backup server and a cluster monitoring module, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-8.