CN116112500B - NFS high availability system and method based on fault detection and routing strategy - Google Patents

NFS high availability system and method based on fault detection and routing strategy

Info

Publication number
CN116112500B
Authority
CN
China
Prior art keywords
nfs
node
server
current
virtual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310082854.8A
Other languages
Chinese (zh)
Other versions
CN116112500A (en
Inventor
陈奇
徐文豪
王弘毅
张凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SmartX Inc
Original Assignee
SmartX Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SmartX Inc filed Critical SmartX Inc
Priority to CN202310082854.8A priority Critical patent/CN116112500B/en
Publication of CN116112500A publication Critical patent/CN116112500A/en
Application granted granted Critical
Publication of CN116112500B publication Critical patent/CN116112500B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1034 Reaction to server failures by a load balancer
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/28 Routing or path finding of packets in data switching networks using route fault recovery
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses an NFS high availability system and method based on fault detection and a routing strategy. A subnet dedicated to NFS mode connections is established; a fixed NFS server node virtual IP is then set on the current NFS client node, and that virtual IP is routed to an NFS server node. The connection state of the NFS mode connection and the running state of the current NFS server node are detected in real time, and, according to the detection result of the monitoring module, a failed NFS mode connection is switched to a normal NFS server node, or a failed NFS server node is replaced by a normal NFS server node, in real time.

Description

NFS high availability system and method based on fault detection and routing strategy
Technical Field
The invention relates to the field of hyperconverged storage data processing, and in particular to an NFS high availability system and method based on fault detection and routing strategies.
Background
In a hyperconverged infrastructure, the computational load (applications and virtual machines) runs on the same set of physical servers that hold the associated data. Unlike traditional direct-attached storage, which is used only by the application connected to it, a hyperconverged system does not expose its storage resources (hard disks, or newer storage media such as AEP) directly to applications. Instead, all storage resources in the hyperconverged cluster are pooled first, and virtual storage services (virtual disks, virtual file systems, and the like) are then delivered to applications. The data accessed by an application may be distributed across all nodes of the system, and when a fault in a single storage server, disk, or other storage medium causes data loss, redundant or backup data can be read from other healthy disks or servers to reconstruct the lost data.
In addition, once the hyperconverged distributed storage cluster has pooled the storage resources, it can expose them to server hosts in several ways, and NFS is one of them. Using NFS directly has problems, however, the most significant being that a single point of failure can render storage unavailable; that is, NFS by itself has no high availability capability.
To address this problem, the prior art generally uses either a DNS-based or a VIP-based NFS high availability scheme.
In a common DNS-based high availability solution, the NFS client no longer accesses the NFS server directly but goes through a domain name. The domain name can map to multiple IPs; an available NFS server is found by simple polling, and the NFS request is forwarded to that server. Multiple NFS servers must be maintained to ensure high availability; they belong to the same cluster, represent the same set of data, and synchronize data among themselves.
The workflow of the VIP-based NFS high availability solution is similar to that of the DNS-based one, except that the client sees a single VIP (virtual IP) address. The VIP is kept highly available within the storage cluster (that is, only one primary server holds the VIP address at any given time). When the primary server becomes abnormal, the other servers in the cluster detect this through the cluster policy, elect a new primary server, and automatically configure the VIP on it to serve external requests. The client always connects to this VIP address.
Failover in the VIP-based solution may be faster than in the DNS-based solution. DNS resolution and probing are normally used on external public networks, so by design they tolerate higher anomaly thresholds than IP-level mechanisms used mainly on internal networks, in order to avoid unnecessary switching caused by the high delay jitter common in public networks. However, in the VIP scheme the VIP maps to exactly one NFS server at any moment. Conventional NFS high availability schemes therefore still have a single-point problem: only one NFS server actually provides service at any time, while the others act only as hot standbys. This prevents the cluster's performance from being fully exploited.
Disclosure of Invention
The invention aims to provide an NFS high availability system and method based on fault detection and routing strategies, which solve the technical problems pointed out in the prior art.
The invention provides an NFS high availability system based on fault detection and routing strategy, which comprises a distributed storage server cluster, an NFS system, a plurality of NFS clients and a monitoring module, wherein the NFS client is connected with the distributed storage server cluster;
the distributed storage server cluster comprises a plurality of NFS servers;
the NFS system comprises a heartbeat module, a network detection module and a fault detection module;
the heartbeat module is used for asynchronously collecting, at preset intervals that vary over time, the connection state of the NFS mode connection and the running state of the current NFS server node, and then reporting them to the monitoring module;
the network detection module is used for acquiring node information of the distributed storage cluster; then sending a data packet to each NFS server node to check whether each NFS server node is IP accessible;
the fault detection module is used for detecting whether the destination of the virtual IP of the current NFS server node is faulty in real time according to the IP accessible node list, and if the NFS server node is faulty, actively switching the virtual IP of the NFS server to the next normal NFS server node in the cluster according to the ordering of the IP accessible node list; meanwhile, if the NFS server node is recovered to be normal, actively switching the connection of the virtual IP of the NFS server back to the original NFS server node;
the NFS server side is connected with the NFS client side through an NFS system respectively;
the NFS client is used for sending an access signal to the NFS server through the NFS system;
the NFS server is used for receiving the access signal of the NFS client through the NFS system and feeding back the access signal of the NFS client.
Accordingly, the invention provides an NFS high availability method based on fault detection and routing strategy, comprising the following operation steps:
establishing, on the currently operated server, a subnet dedicated to NFS mode connections, so as to ensure that the current NFS client and the current NFS server node virtual IP are in the same network segment; then setting a fixed NFS server node virtual IP on the current NFS client node;
initializing node information of the current distributed storage cluster, recording IP information of the distributed storage cluster, and recording IP information of the NFS server nodes in the distributed storage cluster; and routing the NFS server node virtual IP to a normal NFS server node;
asynchronously collecting, at preset intervals that vary over time, the connection state of the NFS mode connection and the running state of the current NFS server node, and reporting them to a monitoring module;
acquiring node information of a distributed storage cluster; then, sending a data packet to each NFS server node to check whether each NFS server node is IP accessible, selecting an NFS server with IP accessible, and establishing an IP accessible node list according to the serial number of the NFS server;
detecting whether the destination of the virtual IP of the current NFS server node fails or not in real time according to the IP accessible node list, and if the NFS server node fails, actively switching the virtual IP of the NFS server to the next normal NFS server node in the cluster according to the ordering of the IP accessible node list; meanwhile, if the NFS server node returns to normal, the connection of the virtual IP of the NFS client is actively switched back to the original NFS server node.
Preferably, as an embodiment, the distributed storage cluster node information is distributed storage server cluster node information, and the distributed storage server cluster node information comprises attribute information of all NFS server nodes in a server cluster.
Preferably, as an embodiment; the fault detection module detects whether the destination of the virtual IP of the current NFS server node fails or not in real time according to the IP accessible node list, and if the NFS server node fails, the NFS server virtual IP is actively switched to the next normal NFS server node in the cluster according to the ordering of the IP accessible node list; meanwhile, if the NFS server node returns to normal, the connection of the virtual IP of the NFS client is actively switched back to the original NFS server node, which specifically includes the following steps:
accessing a distributed storage server cluster, and updating an NFS server node IP list in the distributed storage server cluster;
checking whether the connection configuration of the current NFS server node virtual IP has been set, and if not, preferentially routing the virtual IP to any normal NFS server node in the distributed storage NFS server cluster; if the selected NFS server node fails, selecting the next non-failed NFS server according to the ordering of the IP accessible node list, and routing the virtual IP to that non-failed NFS server node.
Preferably, as an embodiment, the subnet comprises an NFS server, an NFS client and an NFS system; the NFS mode connection refers to a connection between an NFS server and an NFS client realized by means of the NFS system.
Preferably, as an embodiment, the currently operated server is the server on which the current NFS client accesses its corresponding NFS server through the NFS mode connection.
Compared with the prior art, the embodiment of the invention has at least the following technical advantages:
according to the technical scheme adopted by the embodiment of the invention, a subnet dedicated to NFS mode connections is established, which ensures that the current NFS client and the current NFS server node virtual IP are in the same network segment, and that no IP conflict occurs even if the NFS server virtual IP on each node is the same; a fixed NFS server node virtual IP is then set on the current NFS client node;
the IP information of the NFS server nodes is acquired and recorded, and the NFS server node virtual IP is routed to an NFS server node; each NFS client has a corresponding NFS server, so that access signals initiated by different NFS clients are sent to different NFS servers, thereby dispersing access pressure and fully utilizing the capabilities of the multiple NFS servers;
the monitoring module is used to detect the connection state of the NFS mode connection and the running state of the current NFS server node in real time; whether the current NFS server has failed is determined in real time from the detection result of the monitoring module, and if it has, the NFS server node virtual IP is routed to another normal NFS server in real time.
An analysis of the NFS high availability system and method based on the fault detection and routing strategy provided by the invention shows that, in a specific application, the NFS client first accesses the NFS server node virtual IP; abstracting the server behind a virtual IP allows the client-side configuration to remain unchanged, shielding the client from changes in the back-end system;
compared with the traditional polling mode, problems are discovered faster and routes are switched actively; in addition, because requests are routed directly to the target NFS server without a polling strategy, access requests are served faster;
access exceptions are handled by switching routes, which is much faster than changing the IP through domain name resolution;
locality is preserved as far as possible: once the local NFS server node corresponding to an NFS client is normal again, the route is actively switched back to it even if no access abnormality has occurred, so that client access is faster and more accurate, delay is reduced, and access efficiency is improved;
the method also scales automatically: after nodes are added to or removed from the distributed storage cluster, the latest node list is acquired and updated automatically, without manual configuration.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of an NFS high availability system architecture based on a fault detection and routing policy according to a first embodiment of the present invention;
Fig. 2 is a schematic operation flow diagram of an NFS high availability method based on a fault detection and routing policy according to a second embodiment of the present invention;
Fig. 3 is a schematic diagram of the fault detection flow in an NFS high availability method based on a fault detection and routing policy according to a second embodiment of the present invention.
Reference numerals: a distributed storage server cluster 10; NFS system 20; NFS client 30; a monitoring module 40; NFS server 11; a heartbeat module 21; a network detection module 22; the fault detection module 23.
Detailed Description
The embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings, which show some, but not all, embodiments of the invention. All other embodiments that a person skilled in the art can obtain from these embodiments without inventive effort fall within the scope of the invention.
The invention will now be described in further detail with reference to specific examples thereof in connection with the accompanying drawings.
Example 1
As shown in fig. 1, the present invention proposes an NFS high availability system based on fault detection and routing policy, which includes a distributed storage server cluster 10, an NFS system 20, a plurality of NFS clients 30, and a monitoring module 40;
the distributed storage server cluster 10 includes a plurality of NFS servers 11;
the NFS system comprises a heartbeat module 21, a network detection module 22 and a fault detection module 23;
the heartbeat module 21 is configured to collect, for each interval of a preset time period, a connection state of NFS connection and an operation state of a current NFS server node, and then report, to the monitoring module 40, the connection state of the NFS connection and the operation state of the current NFS server node;
the network detection module 22 is configured to obtain node information of the distributed storage cluster 10; then, sending a data packet to each NFS server node to check whether each NFS server node is IP accessible, selecting an NFS server with IP accessible, and establishing an IP accessible node list according to the serial number of the NFS server;
the fault detection module 23 is configured to detect, in real time, whether a current NFS server node has a fault according to the IP accessible node list, and if it is detected that the current NFS server node has a fault, actively switch the NFS server virtual IP to a next normal NFS server node in the cluster according to the ordering of the IP accessible node list; meanwhile, if the NFS server node is recovered to be normal, actively switching the connection of the virtual IP of the NFS client back to the original NFS server node;
the NFS server 11 is connected with the NFS client 30 through the NFS system 20;
the NFS client 30 is configured to send, through the NFS system 20, an access signal to the NFS server 11;
the NFS server 11 is configured to receive, by using the NFS system 20, an access signal of the NFS client 30, and make an access signal feedback to the access signal of the NFS client 30.
In summary, in the above NFS high availability system based on fault detection and routing policy, when the current server first starts operating, a subnet dedicated to NFS mode connections is established, and a fixed NFS server node virtual IP is set on the current NFS client node. The IP information of the distributed storage cluster and of the NFS server nodes within it is recorded, and the NFS server node virtual IP is routed to the current NFS server node. The heartbeat module collects the connection state of the NFS mode connection and the running state of the current NFS server node and sends them to the monitoring module. The network detection module checks whether each NFS server node is IP accessible, selects the NFS servers that are, and builds an IP accessible node list ordered by NFS server serial number. The fault detection module then obtains the connection state and running state from the monitoring module and the IP accessible node list from the network detection module, and detects in real time whether the current NFS server node has failed; if it has, the failed NFS mode connection is switched to a normal NFS server node, or the failed NFS server node is replaced by a normal NFS server node.
Example two
As shown in fig. 2, correspondingly, the invention further provides an NFS high availability method based on fault detection and routing policy, which comprises the following operation steps:
Step S10: establishing, on the currently operated server, a subnet dedicated to NFS mode connections, so as to ensure that the current NFS client and the current NFS server node virtual IP are in the same network segment; the subnet comprises an NFS server, an NFS client and an NFS system; the NFS mode connection is a connection between an NFS server and an NFS client realized by means of the NFS system; then setting a fixed NFS server node virtual IP on the current NFS client node (because the NFS server and the NFS client have been placed in the same dedicated subnet, the current NFS client and the NFS server node virtual IP of the current NFS server are guaranteed to be on the same network segment);
after the subnet for NFS mode connections and the fixed NFS server node virtual IP have been set, initializing the node information of the current distributed storage cluster (that is, the distributed storage server cluster node information, which comprises attribute information of all NFS server nodes in the server cluster), recording the IP information of the distributed storage cluster, and recording the IP information of the NFS server nodes in the distributed storage cluster (this IP information is not the NFS server node virtual IP); then routing the NFS server node virtual IP to the current NFS server node;
the currently operated server is a server where the current NFS client accesses the corresponding NFS server through the NFS mode connection;
it should be noted that a subnet is planned in advance on the node on which an instance of the method runs (that is, on the server where the current NFS client accesses its corresponding NFS server through the NFS mode connection), a fixed NFS server node virtual IP is set on the current NFS client node, and the NFS server node virtual IP and the current NFS client node IP are ensured to be in the same network segment. Because the current NFS client and the NFS server node virtual IP are in the same subnet, no IP conflict arises even if the NFS server node virtual IP accessed on all NFS clients is the same;
initializing node information of a distributed storage cluster, and recording IP information of the storage cluster. In addition, the local storage node IP information in the distributed storage cluster needs to be recorded, and then the NFS server virtual IP is routed to a local NFS server node IP, where the local NFS server node is the NFS server node closest to the current NFS client node. The reason for this is that the method of the present invention hopes to make NFS highly available and also IO friendly, so when the local NFS server node is healthy, the method will reroute the NFS server node virtual IP to the local NFS server node as much as possible.
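The effect of step S10 can be illustrated with a minimal Python sketch. It models only the bookkeeping: every client node uses the same virtual IP, and each client's own route decides which real NFS server that virtual IP reaches. The addresses, the `ClientNode` class and the `route_vip` method are hypothetical names introduced for illustration; the source does not name the mechanism used to redirect the virtual IP, so the route is represented as a plain dictionary.

```python
from dataclasses import dataclass, field

# Hypothetical address, chosen for illustration only.
NFS_VIP = "192.168.33.2"


@dataclass
class ClientNode:
    """One host acting as an NFS client in the cluster."""
    name: str
    local_server_ip: str                        # storage IP of the local NFS server node
    routes: dict = field(default_factory=dict)  # virtual IP -> real server IP

    def route_vip(self, target_ip: str) -> None:
        # Stand-in for the real redirection mechanism (an assumption).
        self.routes[NFS_VIP] = target_ip

    def resolve(self, ip: str) -> str:
        # Addresses without a route entry are returned unchanged.
        return self.routes.get(ip, ip)


def initialize(clients):
    """Step S10: point the shared virtual IP at each client's local server."""
    for client in clients:
        client.route_vip(client.local_server_ip)


clients = [
    ClientNode("node-a", "10.0.0.11"),
    ClientNode("node-b", "10.0.0.12"),
    ClientNode("node-c", "10.0.0.13"),
]
initialize(clients)
targets = [c.resolve(NFS_VIP) for c in clients]
```

All three clients mount the same virtual IP, yet `targets` holds three different server addresses, which is how the scheme spreads load across the NFS servers without an IP conflict.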
Step S20: asynchronously collecting, at preset intervals that vary over time, the connection state of the NFS mode connection and the running state of the current NFS server node, and then reporting them to a monitoring module. Here "asynchronously" means that collection is not performed in lockstep with the other steps but is carried out in real time by a separate module; the asynchronous operation is unaffected by the connection state of the current NFS mode connection and the running state of the current NFS server node, and is therefore more efficient;
it should be noted that, in the above technical solution of the embodiment of the present invention, the current service state and the routing state are checked continuously and reported to the monitoring module (the "connection state of the NFS mode connection" is the state of the access connection between an NFS client and an NFS server; the "current service state" is the state of the current NFS server);
the heartbeat module is created and started, and is an independent module which can be independently operated after being created and started;
the heartbeat module asynchronously collects the routing state and the running state of the method every preset time period, and then reports the information to the monitoring module.
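A heartbeat of this kind might be sketched as follows, using Python's `asyncio`. The sketch reads "preset time period which changes at intervals" as a base interval with random jitter; that reading, the `collect_state` snapshot, and all parameter names are assumptions made for illustration rather than details given in the source.

```python
import asyncio
import random


def next_interval(base: float, jitter: float, rng: random.Random) -> float:
    """Return a reporting interval that varies around `base` seconds."""
    return base + rng.uniform(-jitter, jitter)


async def heartbeat(collect, report, beats, base=0.01, jitter=0.005, seed=0):
    """Collect and report state `beats` times without blocking other tasks."""
    rng = random.Random(seed)
    for _ in range(beats):
        report(collect())
        # Sleeping yields control, so other modules keep running meanwhile.
        await asyncio.sleep(next_interval(base, jitter, rng))


def collect_state():
    # Hypothetical snapshot; a real collector would inspect the route
    # and the NFS server process.
    return {"route_ok": True, "server_running": True}


reports = []  # stands in for the monitoring module
asyncio.run(heartbeat(collect_state, reports.append, beats=3))
```

Passing `collect` and `report` as callables keeps the heartbeat independent of how state is gathered and where it is sent, matching the description of the heartbeat as a module that runs on its own.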
Step S30: creating and starting a network detection module; acquiring node information of a distributed storage cluster; then, sending a data packet (the data packet is a detection data packet) to each NFS server node to check whether each NFS server node is IP accessible, selecting an NFS server with IP accessible, and establishing an IP accessible node list according to the serial number of the NFS server;
it should be noted that the network detection module records the cluster nodes that are IP accessible, but IP accessibility does not mean that the NFS service on a node is working normally, so the fault detection module goes on to perform fault detection. The network detection module is an independent module that runs on its own once created and started. It acquires the node information of the distributed storage cluster, sends data packets to each node to check whether the node is IP accessible, and finally updates and maintains the IP accessible node list;
Step S40: the fault detection module detects in real time, according to the IP accessible node list, whether the current NFS server node has failed (that is, whether the destination of the NFS server node virtual IP, namely the NFS server to which the NFS client is connected, has failed); if the NFS server node has failed, the NFS server virtual IP is actively switched to the next normal NFS server node in the cluster according to the ordering of the IP accessible node list; meanwhile, if the original NFS server node returns to normal (the original node is the NFS server node closest in the network to the NFS client in the initial state, or the NFS server node corresponding to the NFS client; the NFS server node connected to the NFS client during initialization is the local NFS server node), the virtual IP route (that is, the NFS mode connection) of the NFS client is actively switched back to the original NFS server node.
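Step S30 can be sketched in Python as below. The source says only that a detection data packet is sent to each node; this sketch substitutes a TCP connection attempt to port 2049 (the standard NFS port) because it needs no special privileges, and that substitution is an assumption. The function names and the `(serial_number, ip)` node format are likewise hypothetical.

```python
import socket

NFS_PORT = 2049  # standard NFS port; probing it over TCP is an assumption


def tcp_probe(ip: str, port: int = NFS_PORT, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to ip:port succeeds within timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def build_accessible_list(nodes, probe=tcp_probe):
    """Return the reachable IPs from `nodes`, ordered by serial number.

    nodes: iterable of (serial_number, ip) pairs.
    """
    reachable = [(serial, ip) for serial, ip in nodes if probe(ip)]
    return [ip for _, ip in sorted(reachable)]


# Usage with a fake probe, so the example runs without a network.
nodes = [(3, "10.0.0.13"), (1, "10.0.0.11"), (2, "10.0.0.12")]
down = {"10.0.0.12"}
accessible = build_accessible_list(nodes, probe=lambda ip: ip not in down)
```

The list only records reachability. As the text notes, a reachable node may still have a failed NFS service, which is why fault detection runs as a separate check.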
Specifically, as shown in fig. 3, in step S40, if it is detected that the NFS server node fails, the NFS server virtual IP is actively switched to the next normal NFS server node in the cluster according to the ordering of the IP accessible node list; meanwhile, if the NFS server node returns to normal, the virtual IP route of the NFS client is actively switched back to the original NFS server node, which includes the following steps:
Step S41: accessing a distributed storage server cluster, and updating an NFS server node IP list in the distributed storage server cluster;
it should be noted that, the distributed storage cluster (distributed storage server cluster) may dynamically add or delete nodes (NFS server nodes), and update node information to ensure that the latest node information can be maintained all the time. The storage nodes of the distributed storage cluster can be understood herein as NFS server side lists;
Step S42: checking whether the connection configuration of the current NFS server node virtual IP has been set, that is, whether the NFS server node virtual IP on the current NFS client node has already been routed to the current NFS server node; if not, preferentially routing the NFS server node virtual IP to any normal NFS server node in the distributed storage NFS server cluster; if the selected NFS server node fails (that is, the current NFS server fails), selecting the next non-failed NFS server according to the ordering of the IP accessible node list, and routing the NFS server node virtual IP to that non-failed NFS server node;
it should be noted that, in the fault detection module, an NFS server node counts as normal only if it is IP accessible and its NFS service can also provide service; whether the node's IP is accessible is determined from the IP accessible node list maintained by the network detection module.
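The selection logic of steps S40 to S42 might look like the following Python sketch. It assumes that the local node is preferred whenever it is normal (which covers both initial configuration and switching back), and that "next" means the entry after the current target in the IP accessible node list, wrapping around at the end. The function name `choose_target` and the `nfs_ok` callback are hypothetical.

```python
def choose_target(current, local, accessible, nfs_ok):
    """Return the server IP the virtual IP should be routed to, or None.

    current:    IP currently routed to, or None if not yet configured
    local:      the client's original (local) NFS server IP
    accessible: ordered IP accessible node list from network detection
    nfs_ok:     callable ip -> bool, True if the NFS service answers
    """
    def normal(ip):
        # "Normal" means IP accessible AND the NFS service is working.
        return ip in accessible and nfs_ok(ip)

    # Initial configuration or switch-back: prefer the local node.
    if normal(local):
        return local
    # Keep a healthy non-local target rather than switching needlessly.
    if current is not None and normal(current):
        return current
    if not accessible:
        return None
    # Walk the ordered list, starting just after the failed target.
    start = accessible.index(current) + 1 if current in accessible else 0
    for offset in range(len(accessible)):
        candidate = accessible[(start + offset) % len(accessible)]
        if nfs_ok(candidate):
            return candidate
    return None


# Usage: the local node fails, then recovers.
servers = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]
failed_over = choose_target("10.0.0.11", "10.0.0.11", servers,
                            lambda ip: ip != "10.0.0.11")
switched_back = choose_target(failed_over, "10.0.0.11", servers,
                              lambda ip: True)
```

In this usage, `failed_over` is the next node in the list after the failed local node, and `switched_back` is the local node again once it reports healthy.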
In summary, according to the NFS high availability system and method based on the fault detection and routing policy provided by the embodiments of the present invention, by establishing a subnet specifically used for NFS connection, it is ensured that the current NFS client and the current NFS server node virtual IP are in the same network segment, and it is ensured that no IP conflict occurs even if the NFS server virtual IP on each node is the same; then setting a fixed NFS service end node virtual IP on the current NFS service end node;
the IP information of the NFS server nodes is acquired and recorded, and the NFS server node virtual IP is routed to an NFS server node; each NFS client has a corresponding NFS server, so that access requests initiated by different NFS clients are sent to different NFS servers, which spreads the access load and makes full use of the capacity of the multiple NFS servers;
the monitoring module detects, in real time, the connection state of the NFS connection and the running state of the current NFS server node, and, according to the detection result, a failed NFS connection is switched to a normal NFS server node, or a failed NFS server node is switched over to a normal NFS server node, in real time;
the NFS client always accesses the NFS server node virtual IP; abstracting the server behind a virtual IP keeps the client-side configuration unchanged and shields the client from changes in the back-end system;
compared with the traditional polling approach, problems are discovered sooner and routes are switched proactively; in addition, because requests are routed directly to the target NFS server without a polling or similar strategy, access requests are served faster;
access exceptions are handled by route switching, which is much faster than changing the IP through domain name resolution;
locality is preserved as far as possible: once the NFS server node corresponding to an NFS client returns to normal, the connection is actively switched back to the local storage node even if no access exception has occurred, which favors local access and improves access efficiency;
the method also scales automatically: after nodes are added to or removed from the distributed storage cluster, the latest node list is obtained and updated automatically, without manual configuration.
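The automatic node-list refresh and the active switch-back summarized above can be sketched as one monitoring cycle. This is a rough sketch under stated assumptions rather than the method as implemented: the names `tcp_probe`, `build_accessible_list` and `next_target` are hypothetical, `fetch_nodes` stands in for whatever interface returns the current cluster membership, and probing TCP port 2049 (the standard NFS port) is an assumed stand-in for the unspecified "data packet" used to test IP accessibility.

```python
import socket
from typing import Callable, Iterable, List, Optional

NFS_PORT = 2049  # standard NFS port; using it for the probe is an assumption


def tcp_probe(ip: str, port: int = NFS_PORT, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to ip:port succeeds within timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def build_accessible_list(
    fetch_nodes: Callable[[], Iterable[str]],
    probe: Callable[[str], bool] = tcp_probe,
) -> List[str]:
    """Re-read cluster membership and keep, in cluster order, the nodes
    that answer the probe. Calling this every cycle picks up added or
    removed nodes without manual configuration."""
    return [ip for ip in fetch_nodes() if probe(ip)]


def next_target(
    accessible: List[str],
    current: Optional[str],
    preferred: Optional[str],
) -> Optional[str]:
    """Decide where the virtual IP should point after this cycle."""
    if preferred in accessible:
        return preferred  # switch back as soon as the original node is normal
    if current in accessible:
        return current    # keep the fallback node while it is still normal
    return accessible[0] if accessible else None  # fail over, or give up
```

Passing the probe as a parameter keeps the decision logic testable without a network; a caller would run `build_accessible_list` and `next_target` once per monitoring interval and update the route only when the returned target differs from the current one.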
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it; those of ordinary skill in the art may modify the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some or all of their technical features; such modifications and substitutions do not depart from the spirit of the invention.

Claims (6)

1. An NFS high availability system based on fault detection and routing strategy, comprising a distributed storage server cluster, an NFS system, a plurality of NFS clients and a monitoring module;
the distributed storage server cluster comprises a plurality of NFS servers;
the NFS system comprises a heartbeat module, a network detection module and a fault detection module;
the NFS system is used for establishing, on the currently running server, a subnet dedicated to NFS connections, so as to ensure that the current NFS client and the current NFS server node virtual IP are in the same network segment; then setting a fixed NFS server node virtual IP on the current NFS client node; initializing the node information of the current distributed storage cluster, recording the IP information of the distributed storage cluster, and recording the IP information of the NFS server nodes in the distributed storage cluster; and routing the NFS server node virtual IP to a normal NFS server node;
the heartbeat module is used for asynchronously collecting, at preset and varying time intervals, the connection state of the NFS connection and the running state of the current NFS server node, and then reporting the connection state of the NFS connection and the running state of the current NFS server node to the monitoring module;
the network detection module is used for acquiring the node information of the distributed storage cluster; then sending a data packet to each NFS server node to check whether each NFS server node is IP accessible; and selecting the NFS servers that are IP accessible and building an IP accessible node list ordered by NFS server number;
the fault detection module is used for detecting in real time, according to the IP accessible node list, whether the current NFS server node has failed, and, if the NFS server node is detected to have failed, actively switching the NFS server node virtual IP to the next normal NFS server node in the cluster according to the ordering of the IP accessible node list; and, if the NFS server node returns to normal, actively switching the connection of the virtual IP of the NFS server back to the original NFS server node;
the NFS servers are each connected to the NFS clients through the NFS system;
the NFS client is used for sending an access signal to the NFS server through the NFS system;
the NFS server is used for receiving the access signal of the NFS client through the NFS system and responding to the access signal of the NFS client.
2. An NFS high availability method based on fault detection and routing policy, comprising the following steps:
establishing, on the currently running server, a subnet dedicated to NFS connections, so as to ensure that the current NFS client and the current NFS server node virtual IP are in the same network segment; then setting a fixed NFS server node virtual IP on the current NFS client node;
initializing the node information of the current distributed storage cluster, recording the IP information of the distributed storage cluster, and recording the IP information of the NFS server nodes in the distributed storage cluster; routing the NFS server node virtual IP to a normal NFS server node;
asynchronously collecting, at preset and varying time intervals, the connection state of the NFS connection and the running state of the current NFS server node, and reporting the connection state of the NFS connection and the running state of the current NFS server node to a monitoring module;
acquiring the node information of the distributed storage cluster; then sending a data packet to each NFS server node to check whether each NFS server node is IP accessible, selecting the NFS servers that are IP accessible, and building an IP accessible node list ordered by NFS server number;
detecting in real time, according to the IP accessible node list, whether the current NFS server node has failed, and, if the NFS server node has failed, actively switching the NFS server node virtual IP to the next normal NFS server node in the cluster according to the ordering of the IP accessible node list; and, if the NFS server node returns to normal, actively switching the connection of the virtual IP of the NFS client back to the original NFS server node.
3. The NFS high availability method based on fault detection and routing policy according to claim 2, wherein the distributed storage cluster node information is distributed storage server cluster node information, and the distributed storage server cluster node information includes attribute information of all NFS server nodes in a server cluster.
4. The NFS high availability method based on fault detection and routing policy according to claim 3, wherein the step of detecting in real time, according to the IP accessible node list, whether the current NFS server node has failed, actively switching the NFS server node virtual IP to the next normal NFS server node in the cluster according to the ordering of the IP accessible node list if the NFS server node has failed, and actively switching the connection of the virtual IP of the NFS client back to the original NFS server node if the NFS server node returns to normal, specifically includes the following steps:
accessing a distributed storage server cluster, and updating an NFS server node IP list in the distributed storage server cluster;
checking whether the connection route of the virtual IP of the current NFS server has been configured; if not, preferentially routing the virtual IP to any NFS server node in the distributed storage NFS server cluster, and, if the local NFS server node is not available, selecting a normal NFS server node; if the selected current NFS server node fails, selecting the next non-failed NFS server according to the ordering of the IP accessible node list, and routing the NFS server node virtual IP to that non-failed NFS server.
5. The NFS high availability method based on fault detection and routing policy according to claim 4, wherein the subnet comprises an NFS server, an NFS client and an NFS system; and the NFS connection refers to a connection established between an NFS server and an NFS client by means of the NFS system.
6. The NFS high availability method based on fault detection and routing policy according to claim 5, wherein the currently running server is the server on which the current NFS client accesses its currently corresponding NFS server through the NFS connection.
CN202310082854.8A 2023-02-08 2023-02-08 NFS high availability system and method based on fault detection and routing strategy Active CN116112500B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310082854.8A CN116112500B (en) 2023-02-08 2023-02-08 NFS high availability system and method based on fault detection and routing strategy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310082854.8A CN116112500B (en) 2023-02-08 2023-02-08 NFS high availability system and method based on fault detection and routing strategy

Publications (2)

Publication Number Publication Date
CN116112500A CN116112500A (en) 2023-05-12
CN116112500B true CN116112500B (en) 2023-08-15

Family

ID=86259420

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310082854.8A Active CN116112500B (en) 2023-02-08 2023-02-08 NFS high availability system and method based on fault detection and routing strategy

Country Status (1)

Country Link
CN (1) CN116112500B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1554055A (en) * 2001-07-23 2004-12-08 先进微装置公司 High-availability cluster virtual server system
CN102231681A (en) * 2011-06-27 2011-11-02 中国建设银行股份有限公司 High availability cluster computer system and fault treatment method thereof
CN107404524A (en) * 2017-07-24 2017-11-28 郑州云海信息技术有限公司 A kind of method and device of distributed type assemblies node visit
CN107566466A (en) * 2017-08-24 2018-01-09 新华三大数据技术有限公司 Load-balancing method and device
CN108900647A (en) * 2018-09-13 2018-11-27 新华三技术有限公司成都分公司 Address switching handling method and device
CN111209260A (en) * 2019-12-30 2020-05-29 创新科技术有限公司 NFS cluster based on distributed storage and method for providing NFS service
CN111737201A (en) * 2020-06-05 2020-10-02 苏州浪潮智能科技有限公司 Method for closing opened file, computer equipment and storage medium
CN111885112A (en) * 2020-06-24 2020-11-03 广东浪潮大数据研究有限公司 Node service exception handling method, device, equipment and storage medium
CN112084007A (en) * 2020-09-10 2020-12-15 星辰天合(北京)数据科技有限公司 NAS storage upgrading method and device based on virtual machine technology
CN112087516A (en) * 2020-09-10 2020-12-15 星辰天合(北京)数据科技有限公司 Storage upgrading method and device based on Docker virtualization technology
CN112492011A (en) * 2020-11-19 2021-03-12 苏州浪潮智能科技有限公司 Distributed storage system fault switching method, system, terminal and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7246256B2 (en) * 2004-01-20 2007-07-17 International Business Machines Corporation Managing failover of J2EE compliant middleware in a high availability system
US11917001B2 (en) * 2020-02-04 2024-02-27 Nutanix, Inc. Efficient virtual IP address management for service clusters

Also Published As

Publication number Publication date
CN116112500A (en) 2023-05-12

Similar Documents

Publication Publication Date Title
EP1410229B1 (en) HIGH-AVAILABILITY CLUSTER VIRTUAL SERVER SYSTEM and method
CN111581284B (en) Database high availability method, device, system and storage medium
EP1653711B1 (en) Fault tolerant network architecture
KR100617344B1 (en) Reliable fault resolution in a cluster
JP5123955B2 (en) Distributed network management system and method
CN100544342C (en) Storage system
CN107465721B (en) Global load balancing method and system based on double-active architecture and scheduling server
US7518983B2 (en) Proxy response apparatus
CN109344014B (en) Main/standby switching method and device and communication equipment
US20070070975A1 (en) Storage system and storage device
US20130159487A1 (en) Migration of Virtual IP Addresses in a Failover Cluster
US10917289B2 (en) Handling network failures in networks with redundant servers
JP5617304B2 (en) Switching device, information processing device, and fault notification control program
JPH1168745A (en) System and method for managing network
JP2010103695A (en) Cluster system, cluster server and cluster control method
JPH08212095A (en) Client server control system
CA2401635A1 (en) Multiple network fault tolerance via redundant network control
CN113810439B (en) Ethernet storage system and information notification method and related device thereof
JP5326308B2 (en) Computer link method and system
JP4464256B2 (en) Network host monitoring device
CA2433576A1 (en) Software-based fault tolerant networking using a single lan
CN116112500B (en) NFS high availability system and method based on fault detection and routing strategy
JP3542980B2 (en) Network system, network entity monitoring method, recording medium
JP2003203018A (en) Pseudo cluster system using san
GB2362230A (en) Delegated fault detection in a network by mutual node status checking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant