CN110933142A - ICFS cluster network card monitoring method, device, equipment and medium
- Publication number
- CN110933142A CN201911082326.2A
- Authority
- CN
- China
- Prior art keywords
- node
- icfs
- network card
- cluster network
- nodes
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/10—Active monitoring, e.g. heartbeat, ping or trace-route
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/14—Session management
- H04L67/141—Setup of application sessions
Abstract
The application provides an ICFS cluster network card monitoring method, which comprises the following steps: establishing TCP connections to the cluster IPs of the other nodes by using the CTDB service; sending first ICFS heartbeat information to the other nodes at a preset period, and receiving second ICFS heartbeat information sent by the other nodes; and determining the running state of the ICFS cluster network cards according to the ICFS cluster network card state of the local node, the node mark set, and the second ICFS heartbeat information. The ICFS cluster network card of each node can thus be monitored in real time and efficiently, so that faults can be recovered promptly according to the running state of each node and the impact on client services is reduced. The application also provides an ICFS cluster network card monitoring device, an electronic device, and a computer-readable storage medium, which have the same beneficial effects.
Description
Technical Field
The present disclosure relates to the field of server technologies, and in particular, to an ICFS cluster network card monitoring method, an ICFS cluster network card monitoring apparatus, an electronic device, and a computer-readable storage medium.
Background
To keep traffic inside the cluster separate from external traffic, the CTDB network card and the ICFS cluster network card are usually configured as different network cards: the ICFS cluster network card is used only inside the cluster, while the CTDB network card carries communication between the CTDB nodes and provides the virtual IP through which clients access the cluster. When the CTDB network card of a node fails, CTDB detects the failure and recovers from it: the virtual IP of the failed node floats to another node, and the healthy nodes continue to serve clients. When the ICFS cluster network card fails, however, CTDB cannot detect the failure and performs no recovery, which can interrupt client traffic and have a serious impact.
Therefore, how to provide a solution to the above technical problem is a problem that needs to be solved by those skilled in the art.
Disclosure of Invention
The application aims to provide an ICFS cluster network card monitoring method, an ICFS cluster network card monitoring device, electronic equipment and a computer readable storage medium, and the ICFS cluster network card can be efficiently monitored. The specific scheme is as follows:
The application discloses an ICFS cluster network card monitoring method, which comprises the following steps:
establishing TCP connections to the cluster IPs of the other nodes by using the CTDB service;
sending first ICFS heartbeat information to the other nodes at a preset period, and receiving second ICFS heartbeat information sent by the other nodes;
and determining the running state of the ICFS cluster network cards according to the ICFS cluster network card state of the local node, the node mark set, and the second ICFS heartbeat information.
Optionally, determining the running state of the ICFS cluster network cards according to the ICFS cluster network card state of the local node, the node mark set, and the second ICFS heartbeat information includes:
acquiring the node mark of a target node from the node mark set;
if the node mark of the target node is not connectable, then after receiving second ICFS heartbeat information from the target node, determining that the ICFS cluster network card of the target node is normal, and changing the node mark of the target node in the node mark set to connectable;
and reading the ICFS cluster network card state of the local node by executing a script event, and if that state is normal and the node mark of the local node is not connectable, setting the node mark of the local node to connectable.
Optionally, if the node mark of the target node is connectable, then when the second ICFS heartbeat information of the target node has not been received for a preset number of consecutive times, reading the ICFS cluster network card state of the local node by executing a script event;
if the ICFS cluster network card of the local node is in a fault state and the node mark of the local node is connectable, setting the node mark of the local node to not connectable and determining that the ICFS cluster network card of the local node has failed;
if the ICFS cluster network card state of the local node is normal, determining that the ICFS cluster network card of the target node has failed;
and changing the node mark of the target node in the node mark set to not connectable.
Optionally, after the node mark of the target node in the node mark set is changed to not connectable, the method further includes:
attempting, by using a timer, to establish the TCP connection with the target node at preset time intervals until the connection succeeds.
Optionally, after determining the running state of the ICFS cluster network cards according to the ICFS cluster network card state of the local node, the node mark set, and the second ICFS heartbeat information, the method further includes:
determining a fault node according to the node mark set and the original node mark set;
performing fault recovery on the fault node, wherein the fault recovery comprises database synchronization, node fault notification, and virtual IP allocation;
wherein the local node, whose ICFS cluster network card state is used above, is the master node.
Optionally, performing fault recovery on the fault node includes:
judging whether the fault nodes include the local node;
and if the fault nodes include the local node, determining a new master node so that the new master node performs the fault recovery.
Optionally, establishing TCP connections to the cluster IPs of the other nodes by using the CTDB service includes:
when CTDB is started, reading the cluster IPs and ICFS cluster network cards of the other nodes from a configuration file;
determining the cluster IP of the local node, and executing bind and listen on it in sequence so as to monitor whether other nodes request a connection to the local node;
if a connection request from another node is read, sending consent information to that node so as to establish a message transmission queue with it;
and connecting to each other node, establishing a message transmission queue with that node after receiving its consent information, and sending the ICFS heartbeat information to it.
The application provides an ICFS cluster network card monitoring device, which includes:
a connection establishing module, used for establishing TCP connections to the cluster IPs of the other nodes by using the CTDB service;
a heartbeat information receiving and sending module, used for sending first ICFS heartbeat information to the other nodes at a preset period and receiving second ICFS heartbeat information sent by the other nodes;
and a running state determining module, used for determining the running state of the ICFS cluster network cards according to the ICFS cluster network card state of the local node, the node mark set, and the second ICFS heartbeat information.
The application provides an electronic device, including:
a memory for storing a computer program;
and the processor is used for realizing the steps of the ICFS cluster network card monitoring method when executing the computer program.
The present application provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the above-mentioned ICFS cluster network card monitoring method.
The application provides an ICFS cluster network card monitoring method, which comprises the following steps: establishing TCP connections to the cluster IPs of the other nodes by using the CTDB service; sending first ICFS heartbeat information to the other nodes at a preset period, and receiving second ICFS heartbeat information sent by the other nodes; and determining the running state of the ICFS cluster network cards according to the ICFS cluster network card state of the local node, the node mark set, and the second ICFS heartbeat information.
Thus, CTDB is used to establish TCP connections to the cluster IPs of the other nodes, and the running state of the ICFS cluster network cards is determined according to the ICFS cluster network card state of the local node, the node mark set, and the second ICFS heartbeat information. The ICFS cluster network card of each node can therefore be monitored in real time and efficiently, so that faults can be recovered promptly according to the running state of each node and the impact on client services is reduced.
The application also provides an ICFS cluster network card monitoring device, an electronic device and a computer readable storage medium, which all have the beneficial effects and are not described herein again.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of a method for monitoring an ICFS cluster network card according to an embodiment of the present disclosure;
Fig. 2 is a flowchart of establishing a TCP connection between nodes according to an embodiment of the present application;
Fig. 3 is a flowchart of a main process provided in an embodiment of the present application;
Fig. 4 is a flowchart of fault detection and recovery provided by an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an ICFS cluster network card monitoring device according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
When the CTDB network card of a node fails, CTDB detects the failure and recovers from it: the virtual IP of the failed node floats to another node, and the healthy nodes continue to serve clients. When the ICFS cluster network card fails, however, CTDB cannot detect the failure and performs no recovery, which can interrupt client traffic and have a serious impact. To address this technical problem, this embodiment provides an ICFS cluster network card monitoring method. Referring to Fig. 1, which is a flowchart of the ICFS cluster network card monitoring method provided in this embodiment, the method specifically includes:
S101, establishing TCP connections to the cluster IPs of the other nodes by using the CTDB service.
The execution subject of this embodiment is the master node, although it may also be another node as long as the purpose of this embodiment can be achieved. Every node runs a CTDB service, which is used to monitor the ICFS cluster network card of each node.
Step S101 includes: when CTDB is started, reading the cluster IPs and ICFS cluster network cards of the other nodes from the configuration file; determining the cluster IP of the local node, and executing bind and listen on it in sequence so as to monitor whether other nodes request a connection to the local node; if a connection request from another node is read, sending consent information to that node so as to establish a message transmission queue with it; and connecting to each other node, establishing a message transmission queue with that node after receiving its consent information, and sending ICFS heartbeat information to it.
Referring to Fig. 2, which is a flowchart of establishing TCP connections between nodes according to an embodiment of the present application: first, when CTDB is started, the Cluster IPs (cluster IPs) and ICFS cluster network cards of the other nodes are read from a configuration file. The local node then finds its own Cluster IP and binds to it; after the bind succeeds, it starts to listen. When another node connects, that is, when the local node reads a connection request from another node, the local node sends consent information by calling the accept function and establishes a message transmission queue with the peer for sending and receiving messages. The local node also acts as a client and connects to every other node. After a connection succeeds, that is, after the consent information of the other node is received, the local node establishes a message transmission queue with that node for sending and receiving messages, and actively sends ICFS heartbeat information to it once to trigger ICFS heartbeat detection. When there are n nodes, every node performs this connection establishment process, so that TCP connections for mutual communication are established between all nodes.
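The bind, listen, accept, and connect sequence described above can be sketched with ordinary TCP sockets. This is only a minimal illustration under stated assumptions, not the CTDB implementation: the loopback address, the `ICFS_HEARTBEAT` payload, and all function names are hypothetical, and the message transmission queue is reduced to a plain list.

```python
import socket
import threading

HEARTBEAT = b"ICFS_HEARTBEAT"  # hypothetical payload; the source does not define a wire format


def start_listener(cluster_ip, port=0):
    """Bind to the local cluster IP, then listen for connection requests from peers."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((cluster_ip, port))  # bind first ...
    srv.listen()                  # ... then listen
    srv.settimeout(5.0)           # keep the sketch from blocking forever
    return srv


def accept_peer(srv, inbox):
    """Accept one peer (the 'consent' step) and store its first message in inbox."""
    conn, _addr = srv.accept()
    conn.settimeout(5.0)
    with conn:
        data = b""
        while len(data) < len(HEARTBEAT):
            chunk = conn.recv(len(HEARTBEAT) - len(data))
            if not chunk:
                break
            data += chunk
        inbox.append(data)


def connect_and_greet(peer_ip, port):
    """Act as a client: connect to a peer and send one heartbeat to trigger detection."""
    with socket.create_connection((peer_ip, port), timeout=5.0) as cli:
        cli.sendall(HEARTBEAT)


def demo():
    """Run both roles on the loopback interface and return what the listener received."""
    srv = start_listener("127.0.0.1")
    port = srv.getsockname()[1]
    inbox = []
    worker = threading.Thread(target=accept_peer, args=(srv, inbox))
    worker.start()
    connect_and_greet("127.0.0.1", port)
    worker.join(timeout=10.0)
    srv.close()
    return inbox
```

In a real cluster each node would run the listener on its own cluster IP and the client role once per peer; the single loopback pair here only demonstrates the call order.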
S102, sending first ICFS heartbeat information to other nodes according to a preset period, and receiving second ICFS heartbeat information sent by other nodes.
The preset period is typically two seconds, although other values may be used as long as the purpose of this embodiment can be achieved. Specifically, every node sends first ICFS heartbeat information to the other nodes every two seconds, and consequently every node also receives the second ICFS heartbeat information sent by the other nodes.
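The periodic sending in step S102 might look like the following sketch. The function name, the `send` callback, and the fixed number of rounds are assumptions introduced so the loop terminates and can be tested; only the two-second default comes from the text.

```python
import time

HEARTBEAT_PERIOD_S = 2.0  # typical period stated in the text


def run_heartbeats(peers, send, rounds, period=HEARTBEAT_PERIOD_S, sleep=time.sleep):
    """Send one heartbeat to every peer per period, for a fixed number of rounds.

    send:  hypothetical callback that delivers a heartbeat to one peer.
    sleep: injectable so the loop can be exercised without real delays.
    """
    for _ in range(rounds):
        for peer in peers:
            send(peer)
        sleep(period)
```

Passing `sleep` and `send` as parameters is a design choice for the sketch only; a production service would more likely hook the send into its event loop timer.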
S103, determining the running state of the ICFS cluster network card according to the ICFS cluster network card state of the node, the node mark set and the second ICFS heartbeat information.
It can be understood that, when the ICFS cluster network card state of the local node is normal, the node mark set is determined from the received second ICFS heartbeat information. When the node mark set differs from the original node mark set, the running state of some other node has changed, either from normal to fault or from fault to normal. When the second ICFS heartbeat information of the target node has not been received for the preset number of times, the local node executes a script to obtain its own ICFS cluster network card state: if that state is a fault, the ICFS cluster network card of the local node has failed; if that state is normal, the ICFS cluster network card of the target node has failed. State monitoring of the ICFS cluster network cards is thereby realized.
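The rule for attributing a lost heartbeat can be written as a small pure function. This is a sketch of the reasoning above, with hypothetical names (`Verdict`, `classify`); how the local network card state is actually read by the script is outside its scope.

```python
from enum import Enum


class Verdict(Enum):
    NO_FAULT = "no_fault"
    LOCAL_FAULT = "local_fault"
    TARGET_FAULT = "target_fault"


def classify(missed, threshold, local_nic_ok):
    """Attribute a heartbeat loss to the local node or to the target node.

    missed:       consecutive heartbeats not received from the target node
    threshold:    the preset number of misses that counts as a loss
    local_nic_ok: result of the script that reads the local ICFS network card state
    """
    if missed < threshold:
        return Verdict.NO_FAULT
    # Heartbeats are lost: a faulty local card explains the loss,
    # otherwise the target node's card is taken to have failed.
    return Verdict.TARGET_FAULT if local_nic_ok else Verdict.LOCAL_FAULT
```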
Further, determining the running state of the ICFS cluster network cards according to the ICFS cluster network card state of the local node, the node mark set, and the second ICFS heartbeat information comprises the following steps:
acquiring the node mark of the target node from the node mark set.
If the node mark of the target node is not connectable, then after second ICFS heartbeat information of the target node is received, the ICFS cluster network card of the target node is determined to be normal, and the node mark of the target node in the node mark set is changed to connectable; the ICFS cluster network card state of the local node is then read by executing a script event, and if that state is normal and the node mark of the local node is not connectable, the node mark of the local node is set to connectable.
If the node mark of the target node is connectable, then when second ICFS heartbeat information of the target node has not been received for a preset number of consecutive times, the ICFS cluster network card state of the local node is read by executing a script event; if the ICFS cluster network card of the local node is in a fault state and the node mark of the local node is connectable, the node mark of the local node is set to not connectable and the ICFS cluster network card of the local node is determined to have failed; if the ICFS cluster network card state of the local node is normal, the ICFS cluster network card of the target node is determined to have failed; the node mark of the target node in the node mark set is changed to not connectable.
Each node stores a node mark set, which contains the node marks of all nodes. Note that each node sets its own node mark attribute itself, whereas the marks of the other nodes in the node mark set are determined from whether their ICFS heartbeat information is received. After the node mark of the target node in the node mark set is changed to not connectable, the method further includes: attempting, by using a timer, to establish a TCP connection with the target node at preset time intervals until the connection succeeds.
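The timer-driven reconnection can be sketched as a retry loop. The function name, the `try_connect` callback, and the optional `max_attempts` cap are assumptions added for illustration; the one-second default mirrors the interval mentioned in the main-process description.

```python
import time


def reconnect_until_success(try_connect, interval=1.0, sleep=time.sleep, max_attempts=None):
    """Call try_connect() at fixed intervals until it returns True.

    Returns the number of attempts made, or None if max_attempts
    (a hypothetical safety cap, not in the source) is exhausted first.
    """
    attempts = 0
    while True:
        attempts += 1
        if try_connect():
            return attempts
        if max_attempts is not None and attempts >= max_attempts:
            return None
        sleep(interval)
```

For example, a peer that accepts on the third attempt yields a return value of 3, after which the caller would stop its timer.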
For example, when the local node reads its own ICFS cluster network card state and that state is normal, it sets its own node mark to connectable; this sets the node mark attribute and writes the mark as connectable in the node mark set. When the node mark of the target node in the original node mark set is not connectable and ICFS heartbeat information of the target node is received, the ICFS cluster network card of the target node is shown to be normal, and a new node mark set is obtained in which the node mark of the target node has changed from not connectable to connectable. When the node mark of the target node in the original node mark set is connectable and the ICFS heartbeat information of the target node is not received for the preset number of times, the ICFS cluster network card of the target node is shown to have failed, and a new node mark set is obtained in which the node mark of the target node has changed from connectable to not connectable. It can be understood that, while the reconnection attempts are in progress, the master node performs fault recovery for the target node; once the fault recovery is complete, the connection can be established.
It can be understood that the CTDB service on each node has two processes, a main process and a recovery process. Referring to Fig. 3, which is a flowchart of the main process provided in the embodiment of the present application, the main process includes:
When CTDB is started, the icfs_flag (the node flag, i.e. the node mark) of every node is set to ICFS_DISCONNECTED, i.e. not connectable. The ICFS cluster network card state of the local node is then acquired once through the script event; if that state is normal, the icfs_flag of the local node is set to ICFS_OK, i.e. connectable.
Heartbeat detection is performed periodically (every 2 s by default), and each detection traverses every node.
If the local node considers the icfs_flag of the target node to be ICFS_DISCONNECTED, i.e. the node flag of the target node in the current node flag set is not connectable, but an ICFS heartbeat message from the target node is received, the ICFS cluster network card of the target node is considered to have recovered: the icfs_flag of the target node in the node flag set is set to ICFS_OK and the ICFS cluster network card state of the target node is set to normal. The ICFS cluster network card state of the local node is then acquired once in real time through a script event; if that state is normal and the icfs_flag of the local node is ICFS_DISCONNECTED, the icfs_flag of the local node is set to ICFS_OK.
If the local node considers the icfs_flag of the target node to be ICFS_OK but no ICFS heartbeat message from the target node has been received in 4 consecutive detections, the ICFS heartbeat of the target node is considered lost: the icfs_flag of the target node is set to ICFS_DISCONNECTED and the ICFS cluster network card state of the target node is set to abnormal. At the same time the ICFS cluster network card state of the local node is acquired; if that state is abnormal and the icfs_flag of the local node is not ICFS_DISCONNECTED, the icfs_flag of the local node is set to ICFS_DISCONNECTED. If the icfs_flag of the target node is not ICFS_DISCONNECTED and its ICFS heartbeat messages have not yet been missed 4 consecutive times, the ICFS cluster network card of the target node is still considered normal, and ICFS heartbeat messages continue to be sent to it. For a target node marked ICFS_DISCONNECTED, a timer is started that tries to connect to the target node every 1 s and is stopped when the connection succeeds, so that the TCP connection over the Cluster IP can be re-established when the ICFS cluster network card of the target node recovers. It can be understood that, while the reconnection attempts are in progress, the master node performs fault recovery for the target node; once the fault recovery is complete, the connection can be established.
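The main-process flag handling described above can be sketched as a small state holder that is advanced once per detection period. It is an illustration under stated assumptions: the class name, the `tick` method, and the way heartbeats and the local network card state are passed in are all hypothetical; only the `ICFS_OK` and `ICFS_DISCONNECTED` values and the limit of 4 missed detections come from the text.

```python
ICFS_OK = "ICFS_OK"
ICFS_DISCONNECTED = "ICFS_DISCONNECTED"
MISS_LIMIT = 4  # consecutive missed detections, per the text


class MainProcessState:
    def __init__(self, node_ids, local_id, local_nic_ok):
        self.local_id = local_id
        # At startup every node is marked not connectable.
        self.flags = {n: ICFS_DISCONNECTED for n in node_ids}
        # The local card state is read once; a healthy card marks the local node connectable.
        if local_nic_ok:
            self.flags[local_id] = ICFS_OK
        self.missed = {n: 0 for n in node_ids if n != local_id}

    def tick(self, heard_from, local_nic_ok):
        """Run one detection period over all peers.

        heard_from:   set of peers whose heartbeat arrived in this period
        local_nic_ok: local ICFS network card state as reported by the script
        Returns the peers newly marked ICFS_DISCONNECTED (candidates for the reconnect timer).
        """
        reconnect = []
        for peer in self.missed:
            if peer in heard_from:
                self.missed[peer] = 0
                if self.flags[peer] == ICFS_DISCONNECTED:
                    self.flags[peer] = ICFS_OK  # peer recovered
                    if local_nic_ok and self.flags[self.local_id] == ICFS_DISCONNECTED:
                        self.flags[self.local_id] = ICFS_OK
            elif self.flags[peer] == ICFS_OK:
                self.missed[peer] += 1
                if self.missed[peer] >= MISS_LIMIT:
                    self.flags[peer] = ICFS_DISCONNECTED  # heartbeat lost
                    if not local_nic_ok and self.flags[self.local_id] != ICFS_DISCONNECTED:
                        self.flags[self.local_id] = ICFS_DISCONNECTED
                    reconnect.append(peer)
        return reconnect
```

Keeping the state transitions free of I/O is a choice made for the sketch; it lets the four-miss rule be checked without sockets or timers.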
With this technical scheme, CTDB is used to establish TCP connections to the cluster IPs of the other nodes, and the running state of the ICFS cluster network cards is determined according to the ICFS cluster network card state of the local node, the node mark set, and the second ICFS heartbeat information. The ICFS cluster network card of each node can therefore be monitored in real time and efficiently, so that faults can be recovered promptly according to the running state of each node and the impact on client services is reduced.
In one implementation, after determining the running state of the ICFS cluster network cards according to the ICFS cluster network card state of the local node, the node mark set, and the second ICFS heartbeat information, the method further includes: determining a fault node according to the node mark set and the original node mark set; and performing fault recovery on the fault node, wherein the fault recovery comprises database synchronization, node fault notification, and virtual IP allocation. Here the local node, whose ICFS cluster network card state is used above, is the master node. Performing fault recovery on the fault node comprises: judging whether the fault nodes include the local node; and if the fault nodes include the local node, determining a new master node so that the new master node performs the fault recovery.
The CTDB service has two processes, a main process and a recovery process. The main process is responsible for ICFS heartbeat information receiving and sending and service processing among all the nodes, the recovery process circularly detects whether the cluster is abnormal or not, and if yes, fault recovery and the like are carried out. The ICFS heartbeat detection processing is carried out in the main process, and the fault detection and recovery processing is carried out in the recovery process of the main node.
Referring to Fig. 4, which is a flowchart of fault detection and recovery provided by an embodiment of the present application, the process includes:
The recovery process pulls the node flag icfs_flag of each node from the main process and stores the flags in the node flag set icfs_flags, where icfs_flags[i] indicates whether the Cluster IP of the i-th node is connectable.
A non-master node directly updates the icfs_flags saved by its recovery process. The master node compares the previously stored original node flag set icfs_flags with the newly obtained node flag set icfs_flags; if the two are inconsistent, the ICFS cluster network card of some node has failed or has recovered, so the master node updates the node flag set icfs_flags and then performs recovery processing.
During recovery processing, if the master node detects that its own node flag icfs_flag is ICFS_DISCONNECTED, its own ICFS cluster network card has failed; it does not perform fault recovery itself, a new master node is determined, and the new master node performs the fault recovery.
If the master node detects that its own icfs_flag is not ICFS_DISCONNECTED, it starts fault recovery, which mainly comprises database synchronization, executing a fault recovery script to inform the cluster which nodes have failed, virtual IP allocation, and so on.
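One pass of the recovery process can be sketched as a comparison of the saved and newly pulled flag sets. The function name, the returned action strings, and the list-of-strings representation of `icfs_flags` are assumptions for illustration; how a new master is chosen and how recovery is carried out are not modeled.

```python
ICFS_OK = "ICFS_OK"
ICFS_DISCONNECTED = "ICFS_DISCONNECTED"


def recovery_step(is_master, local_index, saved_flags, new_flags):
    """Decide what the recovery process should do after pulling fresh flags.

    Returns (updated_flags, action, changed_indices), where action is one of
    "none", "recover", or "elect_new_master" (hypothetical labels).
    """
    changed = [i for i, (old, new) in enumerate(zip(saved_flags, new_flags)) if old != new]
    # Non-master nodes only refresh their saved copy; an unchanged set needs no recovery.
    if not is_master or not changed:
        return list(new_flags), "none", changed
    # A master whose own card has failed must hand recovery to a new master.
    if new_flags[local_index] == ICFS_DISCONNECTED:
        return list(new_flags), "elect_new_master", changed
    return list(new_flags), "recover", changed
```

The `changed` list identifies which nodes failed or recovered, which is the information the fault recovery script would need to notify the cluster.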
Therefore, each node has a main process and a recovery process, but only the recovery process of the master node performs fault recovery. When the ICFS cluster network card of the master node fails, a new master node is determined and performs the fault recovery. In this way, TCP connections between the nodes are established through the CTDB service running on each node, so that the state of the ICFS cluster network card of each node can be detected through ICFS heartbeat information, which improves monitoring efficiency.
In this embodiment, the CTDB service lets the nodes of the cluster detect the state of one another's ICFS cluster network cards through ICFS heartbeats. When the ICFS heartbeat of a peer is lost while the ICFS cluster network card of the local node is operating normally, the ICFS cluster network card of the peer is considered to have failed, and the master node performs fault recovery processing, which ensures that client services are quickly switched to healthy nodes. When the ICFS cluster network card of the failed node returns to normal, the other nodes detect that its ICFS heartbeat has resumed and consider its ICFS cluster network card recovered, and the master node performs fault recovery processing to return the node to normal use. The method meets the changing requirements of actual production and improves the stability and high availability of the cluster. It also avoids the problem that, when the CTDB network card of a node fails, the state of that node's ICFS cluster network card cannot be reported to the other nodes through the CTDB network card. In addition, after an ICFS heartbeat loss is detected, the ICFS cluster network card state of the local node is acquired in real time by executing a script event, to determine whether the heartbeat loss is caused by a failure of the local node's ICFS cluster network card or of the peer's.
Therefore, the technical scheme provided by the embodiment has high monitoring efficiency, and when the ICFS network card of a node fails, the CTDB can sense and quickly recover the failure, so that the client service can be timely recovered, and the influence on the client service is reduced.
Referring to Fig. 5, which is a schematic structural diagram of an ICFS cluster network card monitoring device according to an embodiment of the present disclosure; the device described below and the ICFS cluster network card monitoring method described above may be referred to in correspondence with each other. The device includes:
a connection establishing module 100, configured to establish TCP connections to the cluster IPs of the other nodes by using the CTDB service;
a heartbeat information receiving and sending module 200, configured to send first ICFS heartbeat information to the other nodes at a preset period, and to receive second ICFS heartbeat information sent by the other nodes;
and a running state determining module 300, configured to determine the running state of the ICFS cluster network cards according to the ICFS cluster network card state of the local node, the node mark set, and the second ICFS heartbeat information.
Optionally, the running state determining module 300 includes:
a node mark acquisition unit, configured to acquire the node mark of a target node from the node mark set;
a first determining unit, configured to, if the node mark of the target node is not connectable, determine after receiving second ICFS heartbeat information from the target node that the ICFS cluster network card of the target node is normal, and change the node mark of the target node in the node mark set to connectable;
and a setting unit, configured to read the ICFS cluster network card state of the local node by executing a script event, and to set the node mark of the local node to connectable if that state is normal and the node mark of the local node is not connectable.
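As an illustration only, the mark handling performed by the first determining unit and the setting unit can be sketched in Python as follows. The names `Mark`, `on_heartbeat_received`, and `refresh_local_mark` are hypothetical and do not appear in the application, and the local network card state is assumed to have already been read by the script event and is passed in as a boolean.

```python
from enum import Enum


class Mark(Enum):
    # Hypothetical encoding of the two node mark values.
    CONNECTABLE = "connectable"
    NOT_CONNECTABLE = "not connectable"


def on_heartbeat_received(marks, target):
    """Mark a previously unreachable target node as connectable again.

    Returns True when the mark changed, i.e. the ICFS cluster network
    card of the target node is considered to be normal again.
    """
    if marks.get(target) is Mark.NOT_CONNECTABLE:
        marks[target] = Mark.CONNECTABLE
        return True
    return False


def refresh_local_mark(marks, local, local_nic_ok):
    """Apply the local network card state to the local node's mark.

    `local_nic_ok` is assumed to be the result of the script event.
    """
    if local_nic_ok and marks.get(local) is Mark.NOT_CONNECTABLE:
        marks[local] = Mark.CONNECTABLE
        return True
    return False
```

In this sketch the node mark set is a plain dictionary keyed by node name; a real implementation would keep whatever structure the CTDB service already uses.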
Optionally, the running state determining module 300 further includes:
a reading unit, configured to, if the node mark of the target node is connectable, read the ICFS cluster network card state of the local node by executing the script event when the second ICFS heartbeat information of the target node has not been received for a preset number of consecutive times;
a second determining unit, configured to, if the ICFS cluster network card of the local node is in a fault state and the node mark of the local node is connectable, set the node mark of the local node to not connectable and determine that the ICFS cluster network card of the local node has failed;
a fault determining unit, configured to determine that the ICFS cluster network card of the target node has failed if the ICFS cluster network card state of the local node is normal;
and a changing unit, configured to change the node mark of the target node in the node mark set to not connectable.
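The decision made by these units when heartbeats stop arriving can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the application: the function name, the string mark values, the `missed` counter dictionary, and the default of three consecutive misses are all hypothetical, and the callable `read_local_nic_ok` stands in for the script event that reads the local network card state.

```python
CONNECTABLE = "connectable"
NOT_CONNECTABLE = "not connectable"


def on_heartbeat_missed(marks, missed, local, target, read_local_nic_ok, max_missed=3):
    """Handle one heartbeat period in which nothing arrived from `target`.

    Returns "local" or "target" when a network card fault is determined,
    and None while no decision can be made yet.
    """
    if marks.get(target) != CONNECTABLE:
        return None  # only connectable targets are watched for heartbeat loss
    missed[target] = missed.get(target, 0) + 1
    if missed[target] < max_missed:
        return None  # fewer than the preset number of consecutive misses
    missed[target] = 0
    # Stand-in for the script event: the local state is read only now.
    if not read_local_nic_ok():
        if marks.get(local) == CONNECTABLE:
            marks[local] = NOT_CONNECTABLE
        return "local"
    # The local card is normal, so the loss is attributed to the target node.
    marks[target] = NOT_CONNECTABLE
    return "target"
```

For the misses to count as consecutive, the caller is assumed to reset `missed[target]` to zero whenever a heartbeat from that node is received.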
Optionally, the device further includes:
a connection establishing module, configured to attempt, by using a timer, to establish the TCP connection with the target node at preset time intervals until the connection succeeds.
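A timer-driven retry of this kind might look like the sketch below. The function name, the parameters, and the optional `max_attempts` bound are assumptions added for illustration (the application retries until the connection succeeds), and the `connect` argument exists only so the loop can be exercised without a real peer.

```python
import socket
import time


def reconnect_until_success(host, port, interval_s=1.0, timeout_s=1.0,
                            max_attempts=None, connect=socket.create_connection):
    """Retry a TCP connection at a fixed interval.

    Returns the connected socket, or None if `max_attempts` is set and
    every attempt failed. With max_attempts=None the loop runs until
    a connection is established.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            return connect((host, port), timeout=timeout_s)
        except OSError:
            # Connection refused, unreachable, or timed out: wait and retry.
            time.sleep(interval_s)
    return None
```

A blocking loop keeps the sketch short; inside an event-driven service the same retry would normally be scheduled as a timer callback instead.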
Optionally, the device further includes:
a fault node determining module, configured to determine a fault node according to the node mark set and the original node mark set;
and a fault recovery module, configured to perform fault recovery on the fault node, where the fault recovery includes database synchronization, node fault notification, and virtual IP allocation.
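The application does not spell out how the two mark sets are compared. One plausible reading, shown here purely as an assumption, is that a node whose mark changed from connectable to not connectable is a fault node, and a node whose mark changed in the opposite direction has recovered. The function name `diff_mark_sets` and the string mark values are hypothetical.

```python
def diff_mark_sets(original_marks, current_marks):
    """Compare the original and current node mark sets.

    Returns (failed, recovered): nodes that became not connectable and
    nodes that became connectable again, each in sorted order.
    """
    failed, recovered = [], []
    for node, mark in sorted(current_marks.items()):
        before = original_marks.get(node)
        if before == "connectable" and mark == "not connectable":
            failed.append(node)
        elif before == "not connectable" and mark == "connectable":
            recovered.append(node)
    return failed, recovered
```

Either list being non-empty would be the trigger for the master node to run fault recovery (database synchronization, node fault notification, and virtual IP allocation).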
Optionally, the fault recovery module includes:
a judging module, configured to judge whether the fault nodes include the local node;
and a new master node determining module, configured to determine a new master node if the fault nodes include the local node, so that the new master node performs the fault recovery.
Optionally, the connection establishing module includes:
a reading unit, configured to read the cluster IPs and ICFS cluster network cards of the other nodes from a configuration file when the CTDB is started;
a processing unit, configured to determine the cluster IP of the local node and to execute bind and listen on that cluster IP in sequence, so as to monitor whether other nodes establish connections with the local node;
an agreement unit, configured to, after reading a connection request from another node, send agreement information to that node so as to establish a message transmission queue with it;
and an establishing unit, configured to connect to the other nodes, to establish message transmission queues with them after receiving their agreement information, and to send the ICFS heartbeat information to the corresponding nodes.
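The bind, listen, agree, and connect sequence described for these units can be sketched with the Python standard `socket` module. The function names and the `b"AGREE\n"` message are hypothetical, since the application does not define the wire format of the agreement information, and the message transmission queue is left out.

```python
import socket


def listen_on_cluster_ip(cluster_ip, port=0, backlog=16):
    """Bind to the local cluster IP and start listening (bind, then listen)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((cluster_ip, port))  # port 0 lets the OS pick a free port
    srv.listen(backlog)
    return srv


def accept_and_agree(srv):
    """Accept one peer connection and reply with the agreement message."""
    conn, _peer = srv.accept()
    conn.sendall(b"AGREE\n")  # hypothetical agreement information
    return conn


def connect_and_wait_for_agreement(peer_ip, port, timeout_s=2.0):
    """Connect to a peer and return the socket once it has agreed."""
    conn = socket.create_connection((peer_ip, port), timeout=timeout_s)
    if conn.recv(16) != b"AGREE\n":
        conn.close()
        raise ConnectionError("peer did not send agreement information")
    return conn
```

Once both sides hold a connected socket, heartbeat messages can be sent over it at the preset period.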
Since the embodiment of the ICFS cluster network card monitoring device corresponds to the embodiment of the ICFS cluster network card monitoring method, reference is made to the description of the method embodiment for the details of the device embodiment, which are not repeated here.
The electronic device provided by the embodiment of the present application is introduced below; the electronic device described below and the ICFS cluster network card monitoring method described above may be referred to correspondingly.
The present embodiment provides an electronic device, including:
a memory for storing a computer program;
and the processor is used for realizing the steps of the ICFS cluster network card monitoring method when executing the computer program.
Since the embodiment of the electronic device corresponds to the embodiment of the ICFS cluster network card monitoring method, reference is made to the description of the method embodiment for the details of the electronic device embodiment, which are not repeated here.
A computer-readable storage medium provided in an embodiment of the present application is introduced below; the computer-readable storage medium described below and the ICFS cluster network card monitoring method described above may be referred to correspondingly.
The present embodiment provides a computer-readable storage medium, on which a computer program is stored, and when being executed by a processor, the computer program implements the steps of the above ICFS cluster network card monitoring method.
Since the embodiment of the computer-readable storage medium corresponds to the embodiment of the ICFS cluster network card monitoring method, reference is made to the description of the method embodiment for the details of the storage medium embodiment, which are not repeated here.
The embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another. Because the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is brief, and the relevant points can be found in the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), internal memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The ICFS cluster network card monitoring method, the ICFS cluster network card monitoring device, the electronic device, and the computer-readable storage medium provided by the present application are described in detail above. The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.
Claims (10)
1. An ICFS cluster network card monitoring method, characterized by comprising the following steps:
establishing TCP connections with other nodes over the cluster IP by using the CTDB service;
sending first ICFS heartbeat information to the other nodes according to a preset period, and receiving second ICFS heartbeat information sent by the other nodes;
and determining the running state of the ICFS cluster network card according to the ICFS cluster network card state of the local node, the node mark set, and the second ICFS heartbeat information.
2. The ICFS cluster network card monitoring method according to claim 1, wherein determining the running state of the ICFS cluster network card according to the ICFS cluster network card state of the local node, the node mark set, and the second ICFS heartbeat information comprises:
acquiring the node mark of a target node from the node mark set;
if the node mark of the target node is not connectable, determining, after receiving second ICFS heartbeat information from the target node, that the ICFS cluster network card of the target node is normal, and changing the node mark of the target node in the node mark set to connectable;
and reading the ICFS cluster network card state of the local node by executing a script event, and, if that state is normal and the node mark of the local node is not connectable, setting the node mark of the local node to connectable.
3. The ICFS cluster network card monitoring method according to claim 2, further comprising: if the node mark of the target node is connectable, reading the ICFS cluster network card state of the local node by executing the script event when the second ICFS heartbeat information of the target node has not been received for a preset number of consecutive times;
if the ICFS cluster network card of the local node is in a fault state and the node mark of the local node is connectable, setting the node mark of the local node to not connectable and determining that the ICFS cluster network card of the local node has failed;
if the ICFS cluster network card state of the local node is normal, determining that the ICFS cluster network card of the target node has failed, and changing the node mark of the target node in the node mark set to not connectable.
4. The ICFS cluster network card monitoring method according to claim 3, wherein after the node mark of the target node in the node mark set is changed to not connectable, the method further comprises:
attempting, by using a timer, to establish the TCP connection with the target node at preset time intervals until the connection succeeds.
5. The ICFS cluster network card monitoring method according to claim 1, wherein after determining the running state of the ICFS cluster network card according to the ICFS cluster network card state of the local node, the node mark set, and the second ICFS heartbeat information, the method further comprises:
determining a fault node according to the node mark set and an original node mark set;
performing fault recovery on the fault node, wherein the fault recovery comprises database synchronization, node fault notification, and virtual IP allocation;
wherein the node to which the ICFS cluster network card state of the local node corresponds is the master node.
6. The ICFS cluster network card monitoring method according to claim 5, wherein performing fault recovery on the fault node comprises:
judging whether the fault nodes include the local node;
and if the fault nodes include the local node, determining a new master node so that the new master node performs the fault recovery.
7. The ICFS cluster network card monitoring method according to claim 1, wherein establishing TCP connections with other nodes over the cluster IP by using the CTDB service comprises:
when the CTDB is started, reading the cluster IPs and ICFS cluster network cards of the other nodes from a configuration file;
determining the cluster IP of the local node, and executing bind and listen on that cluster IP in sequence, so as to monitor whether other nodes establish connections with the local node;
if a connection request from another node is read, sending agreement information to that node so as to establish a message transmission queue with it;
and connecting to the other nodes, establishing message transmission queues with them after receiving their agreement information, and sending the ICFS heartbeat information to the corresponding nodes.
8. An ICFS cluster network card monitoring device, characterized by comprising:
a connection establishing module, configured to establish TCP connections with other nodes over the cluster IP by using the CTDB service;
a heartbeat information receiving and sending module, configured to send first ICFS heartbeat information to the other nodes according to a preset period and to receive second ICFS heartbeat information sent by the other nodes;
and a running state determining module, configured to determine the running state of the ICFS cluster network card according to the ICFS cluster network card state of the local node, the node mark set, and the second ICFS heartbeat information.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the ICFS cluster network card monitoring method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the ICFS cluster network card monitoring method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911082326.2A CN110933142A (en) | 2019-11-07 | 2019-11-07 | ICFS cluster network card monitoring method, device and equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110933142A true CN110933142A (en) | 2020-03-27 |
Family
ID=69852560
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911082326.2A Pending CN110933142A (en) | 2019-11-07 | 2019-11-07 | ICFS cluster network card monitoring method, device and equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110933142A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200327 |