CN110933142A - ICFS cluster network card monitoring method, device and equipment and medium

Info

Publication number
CN110933142A
Authority
CN
China
Legal status
Pending
Application number
CN201911082326.2A
Other languages
Chinese (zh)
Inventor
翟云磊
张立强
Current Assignee
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Application filed by Inspur Electronic Information Industry Co Ltd
Priority to CN201911082326.2A
Publication of CN110933142A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/14 Session management
    • H04L67/141 Setup of application sessions

Abstract

The application provides an ICFS cluster network card monitoring method, which comprises the following steps: using the CTDB service to establish TCP connections with the other nodes over the cluster IPs; sending first ICFS heartbeat information to the other nodes at a preset period, and receiving second ICFS heartbeat information sent by the other nodes; and determining the running state of the ICFS cluster network card according to the ICFS cluster network card state of the local node, the node flag set and the second ICFS heartbeat information. In this way, the ICFS cluster network card of every node can be monitored in real time and efficiently, nodes can be recovered promptly according to their running state, and the impact on client services is reduced. The application also provides an ICFS cluster network card monitoring device, an electronic device and a computer-readable storage medium, which have the same beneficial effects.

Description

ICFS cluster network card monitoring method, device and equipment and medium
Technical Field
The present disclosure relates to the field of server technologies, and in particular, to an ICFS cluster network card monitoring method, an ICFS cluster network card monitoring apparatus, an electronic device, and a computer-readable storage medium.
Background
To keep intra-cluster traffic separate from external communication, the CTDB network card and the ICFS cluster network card are usually configured as different network cards: the ICFS cluster network card is used only inside the cluster, while the CTDB network card carries communication between the CTDB nodes and provides the virtual IPs that clients access. When the CTDB network card of a node fails, CTDB detects the failure and recovers from it: the virtual IP of the failed node floats to another node, and the healthy nodes continue to serve clients. When the ICFS cluster network card fails, however, CTDB does not detect the failure and performs no recovery, which can interrupt client traffic and cause serious impact.
Therefore, solving the above technical problem is an issue that those skilled in the art need to address.
Disclosure of Invention
The application aims to provide an ICFS cluster network card monitoring method, an ICFS cluster network card monitoring device, an electronic device and a computer-readable storage medium with which the ICFS cluster network card can be monitored efficiently. The specific scheme is as follows:
The application discloses an ICFS cluster network card monitoring method, comprising:
establishing, by using the CTDB service, TCP connections with the other nodes over the cluster IPs;
sending first ICFS heartbeat information to the other nodes at a preset period, and receiving second ICFS heartbeat information sent by the other nodes;
and determining the running state of the ICFS cluster network card according to the ICFS cluster network card state of the local node, the node flag set and the second ICFS heartbeat information.
Optionally, determining the running state of the ICFS cluster network card according to the ICFS cluster network card state of the local node, the node flag set and the second ICFS heartbeat information comprises:
acquiring the node flag of a target node from the node flag set;
if the node flag of the target node is not connectable, determining, after second ICFS heartbeat information of the target node is received, that the ICFS cluster network card of the target node is normal, and changing the node flag of the target node in the node flag set to connectable;
and reading the ICFS cluster network card state of the local node by executing a script event, and if that state is normal and the local node's flag is not connectable, setting the local node's flag to connectable.
Optionally, if the node flag of the target node is connectable and second ICFS heartbeat information of the target node has not been received for a preset number of consecutive times, reading the ICFS cluster network card state of the local node by executing a script event;
if the ICFS cluster network card of the local node is in a fault state and the local node's flag is connectable, setting the local node's flag to not connectable and determining that the ICFS cluster network card of the local node has failed;
if the ICFS cluster network card state of the local node is normal, determining that the ICFS cluster network card of the target node has failed;
and changing the node flag of the target node in the node flag set to not connectable.
Optionally, after the node flag of the target node in the node flag set is changed to not connectable, the method further comprises:
using a timer to attempt a TCP connection with the target node at preset time intervals until the connection succeeds.
Optionally, after determining the running state of the ICFS cluster network card according to the ICFS cluster network card state of the local node, the node flag set and the second ICFS heartbeat information, the method comprises:
determining the failed nodes according to the node flag set and an original node flag set;
performing fault recovery on the failed nodes, wherein the fault recovery comprises database synchronization, node fault notification and virtual IP allocation;
wherein the local node, whose ICFS cluster network card state is used above, is the master node.
Optionally, performing fault recovery on the failed nodes comprises:
judging whether the failed nodes include the local node;
and if the failed nodes include the local node, determining a new master node so that the new master node performs the fault recovery.
Optionally, establishing TCP connections with the other nodes over the cluster IPs by using the CTDB service comprises:
when CTDB starts, reading the cluster IPs and ICFS cluster network cards of the other nodes from a configuration file;
determining the cluster IP of the local node, and executing bind and listen on it in sequence, so as to monitor whether other nodes establish connections with the local node;
if a connection request from another node is read, sending consent information to that node so as to establish a message transmission queue with it;
and connecting to the other nodes, establishing a message transmission queue with each of them after receiving its consent information, and sending ICFS heartbeat information to the corresponding node.
The application provides an ICFS cluster network card monitoring device, comprising:
a connection establishing module, configured to establish TCP connections with the other nodes over the cluster IPs by using the CTDB service;
a heartbeat information receiving and sending module, configured to send first ICFS heartbeat information to the other nodes at a preset period and to receive second ICFS heartbeat information sent by the other nodes;
and a running state determining module, configured to determine the running state of the ICFS cluster network card according to the ICFS cluster network card state of the local node, the node flag set and the second ICFS heartbeat information.
The application provides an electronic device, comprising:
a memory for storing a computer program;
and a processor for implementing the steps of the above ICFS cluster network card monitoring method when executing the computer program.
The application provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the above ICFS cluster network card monitoring method.
The application provides an ICFS cluster network card monitoring method, which comprises the following steps: establishing, by using the CTDB service, TCP connections with the other nodes over the cluster IPs; sending first ICFS heartbeat information to the other nodes at a preset period, and receiving second ICFS heartbeat information sent by the other nodes; and determining the running state of the ICFS cluster network card according to the ICFS cluster network card state of the local node, the node flag set and the second ICFS heartbeat information.
Thus, CTDB is used to establish TCP connections over the cluster IPs of the other nodes, and the running state of the ICFS cluster network card is determined according to the ICFS cluster network card state of the local node, the node flag set and the second ICFS heartbeat information. The ICFS cluster network card of every node can therefore be monitored in real time and efficiently, nodes can be recovered promptly according to their running state, and the impact on client services is reduced.
The application also provides an ICFS cluster network card monitoring device, an electronic device and a computer-readable storage medium, all of which have the above beneficial effects, which are not repeated here.
Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an ICFS cluster network card monitoring method according to an embodiment of the present application;
Fig. 2 is a flowchart of establishing TCP connections between nodes according to an embodiment of the present application;
Fig. 3 is a flowchart of the main process according to an embodiment of the present application;
Fig. 4 is a flowchart of fault detection and recovery according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an ICFS cluster network card monitoring device according to an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art from these embodiments without creative effort fall within the protection scope of the present application.
When the CTDB network card of a node fails, CTDB detects the failure and recovers from it: the virtual IP of the failed node floats to another node, and the healthy nodes continue to serve clients. When the ICFS cluster network card fails, however, CTDB does not detect the failure and performs no recovery, which can interrupt client traffic and cause serious impact. To address this problem, this embodiment provides an ICFS cluster network card monitoring method. Referring to Fig. 1, which is a flowchart of the ICFS cluster network card monitoring method provided in this embodiment, the method includes:
S101, establishing TCP connections with the other nodes over the cluster IPs by using the CTDB service.
The execution subject of this embodiment is the master node, although another node may also serve, as long as the purpose of this embodiment is achieved. Each node runs a CTDB service that monitors the ICFS cluster network cards of the nodes.
Step S101 includes: when CTDB starts, reading the cluster IPs and ICFS cluster network cards of the other nodes from the configuration file; determining the cluster IP of the local node, and executing bind and listen on it in sequence, so as to monitor whether other nodes establish connections with the local node; if a connection request from another node is read, sending consent information to that node so as to establish a message transmission queue with it; and connecting to the other nodes, establishing a message transmission queue with each of them after receiving its consent information, and sending ICFS heartbeat information to the corresponding node.
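The configuration-reading step can be sketched as follows. The patent does not give the configuration file format, so the one "cluster_ip interface" pair per line layout, the NodeEntry record and the helper names parse_cluster_config and find_local_node are all hypothetical. The sketch only shows how a node could load the other nodes' cluster IPs and ICFS cluster network card names and pick out its own entry by matching against the addresses configured on the host.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class NodeEntry:
    """Hypothetical record for one configured node."""
    index: int        # position of the node in the configuration file
    cluster_ip: str   # cluster IP (address on the ICFS cluster network card)
    iface: str        # name of the ICFS cluster network card, e.g. "eth1"


def parse_cluster_config(text: str) -> List[NodeEntry]:
    """Parse a hypothetical 'cluster_ip iface' per-line configuration."""
    entries: List[NodeEntry] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        cluster_ip, iface = line.split()
        entries.append(NodeEntry(len(entries), cluster_ip, iface))
    return entries


def find_local_node(entries: Iterable[NodeEntry], local_ips: set) -> NodeEntry:
    """Return the entry whose cluster IP is configured on this host."""
    for entry in entries:
        if entry.cluster_ip in local_ips:
            return entry
    raise LookupError("no configured cluster IP belongs to this host")
```

In a real deployment local_ips would come from the host's interface addresses; here it is passed in so the matching logic can be checked on its own.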
Referring to Fig. 2, which is a flowchart of establishing TCP connections between nodes according to an embodiment of the present application: first, when CTDB starts, the Cluster IPs and ICFS cluster network cards of the other nodes are read from the configuration file; the node then finds its own Cluster IP and binds to it, and after binding succeeds it starts to listen. When another node connects, that is, when the node reads a connection request from another node, it sends consent information by calling the accept function, which establishes a message transmission queue with the peer for sending and receiving messages. The node also acts as a client and connects to every other node; after a connection succeeds, that is, after the other node's consent information is received, it establishes a message transmission queue with that node and actively sends it one ICFS heartbeat message to trigger ICFS heartbeat detection. When there are n nodes, every node performs this connection process, so that TCP connections for mutual communication are established among all nodes.
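A minimal sketch of the connection setup in Fig. 2, assuming plain TCP sockets on the loopback interface so that all nodes can run in one process. The class MeshNode, the helper build_mesh and the "HB <index>" text line used as the first heartbeat are hypothetical. Each node binds and listens on its own address, accepts one inbound connection per peer, and also connects to every peer as a client and immediately sends one heartbeat; the message transmission queues are reduced here to the connected sockets themselves.

```python
import socket
import threading


class MeshNode:
    """Hypothetical node of a full-mesh TCP heartbeat network (illustrative sketch)."""

    def __init__(self, index: int, host: str = "127.0.0.1"):
        self.index = index
        self.listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.listener.bind((host, 0))      # stand-in for the Cluster IP; port 0 = any free port
        self.listener.listen()             # then listen for peer connections
        self.listener.settimeout(5)        # avoid hanging forever if a peer never connects
        self.address = self.listener.getsockname()
        self.inbound = {}                  # peer index -> accepted socket
        self.outbound = {}                 # peer index -> socket we connected
        self.received = []                 # first heartbeats read from peers

    def accept_peers(self, count: int) -> None:
        """Accept `count` inbound connections and read each peer's first heartbeat."""
        for _ in range(count):
            conn, _ = self.listener.accept()
            with conn.makefile("rb") as stream:
                line = stream.readline()   # hypothetical format: b"HB <index>\n"
            peer = int(line.split()[1])
            self.inbound[peer] = conn
            self.received.append(line.decode().strip())

    def connect_to(self, peer_index: int, address) -> None:
        """Connect to a peer as a client and immediately send one heartbeat."""
        sock = socket.create_connection(address, timeout=5)
        sock.sendall(f"HB {self.index}\n".encode())
        self.outbound[peer_index] = sock

    def close(self) -> None:
        for sock in [*self.inbound.values(), *self.outbound.values(), self.listener]:
            sock.close()


def build_mesh(n: int) -> list:
    """Create n nodes and connect every node to every other node."""
    nodes = [MeshNode(i) for i in range(n)]
    acceptors = [threading.Thread(target=node.accept_peers, args=(n - 1,))
                 for node in nodes]
    for thread in acceptors:
        thread.start()
    for node in nodes:
        for peer in nodes:
            if peer is not node:
                node.connect_to(peer.index, peer.address)
    for thread in acceptors:
        thread.join()
    return nodes
```

With n nodes this produces n(n-1) TCP connections, one per direction per pair, because every node both accepts and initiates a connection to every peer.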
S102, sending first ICFS heartbeat information to the other nodes at a preset period, and receiving second ICFS heartbeat information sent by the other nodes.
The preset period is typically two seconds, although other values are possible as long as the purpose of this embodiment is achieved. Specifically, every node sends first ICFS heartbeat information to the other nodes every two seconds, and accordingly each node also receives the second ICFS heartbeat information sent by the other nodes.
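The periodic sending can be sketched as a loop that sends one heartbeat to every peer per period. The function heartbeat_loop, its send callback and the injectable clock and sleep parameters are assumptions made so the loop can be exercised without real waiting; only the two-second default comes from the description.

```python
import time
from typing import Callable, Iterable

HEARTBEAT_PERIOD_S = 2.0   # default period given in the description


def heartbeat_loop(peers: Iterable[int], send: Callable[[int], None], rounds: int,
                   period: float = HEARTBEAT_PERIOD_S,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> None:
    """Hypothetical helper: send one heartbeat to every peer per period, for `rounds` periods."""
    peers = list(peers)
    next_tick = clock()
    for _ in range(rounds):
        for peer in peers:
            send(peer)                      # the "first ICFS heartbeat information"
        next_tick += period
        # Wait until an absolute deadline so slow sends do not stretch the period.
        sleep(max(0.0, next_tick - clock()))
```

Scheduling against an absolute deadline (next_tick) rather than sleeping a fixed two seconds after each round keeps slow sends from gradually stretching the heartbeat period.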
S103, determining the running state of the ICFS cluster network card according to the ICFS cluster network card state of the local node, the node flag set and the second ICFS heartbeat information.
It can be understood that, while the ICFS cluster network card of the local node is normal, the node flag set is determined from the received second ICFS heartbeat information. When the node flag set differs from the original node flag set, the running state of some other node has changed, either from normal to fault or from fault to normal. If second ICFS heartbeat information of the target node is not received for the preset number of times, the local node executes the script to obtain the state of its own ICFS cluster network card: if that state is a fault, the fault lies with the local node's ICFS cluster network card; if it is normal, the fault lies with the target node's ICFS cluster network card. In this way the state of the ICFS cluster network cards is monitored.
Further, determining the running state of the ICFS cluster network card according to the ICFS cluster network card state of the local node, the node flag set and the second ICFS heartbeat information comprises the following steps:
Acquiring the node flag of the target node from the node flag set.
If the node flag of the target node is not connectable, then after second ICFS heartbeat information of the target node is received, the ICFS cluster network card of the target node is determined to be normal, and the target node's flag in the node flag set is changed to connectable; the state of the local node's ICFS cluster network card is then read by executing a script event, and if it is normal and the local node's flag is not connectable, the local node's flag is set to connectable.
If the node flag of the target node is connectable and second ICFS heartbeat information of the target node has not been received for a preset number of consecutive times, the state of the local node's ICFS cluster network card is read by executing a script event. If the local node's ICFS cluster network card is in a fault state and its flag is connectable, the local node's flag is set to not connectable and the local node's ICFS cluster network card is determined to have failed; if the local node's ICFS cluster network card state is normal, the target node's ICFS cluster network card is determined to have failed. The node flag of the target node in the node flag set is changed to not connectable.
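The two branches above can be condensed into one update step per target node. This is a sketch under stated assumptions: the string constants mirror the ICFS_OK and ICFS_DISCONNECTED flags named in the main-process description, the callable local_nic_ok stands in for the script event, and on heartbeat loss the target's flag is set to not connectable before the local card is checked (the reading taken from the main-process description). The function name update_flags and its return strings are hypothetical.

```python
ICFS_OK = "ICFS_OK"                        # node flag: connectable
ICFS_DISCONNECTED = "ICFS_DISCONNECTED"    # node flag: not connectable


def update_flags(flags: dict, local: int, target: int, heartbeat_received: bool,
                 misses_reached: bool, local_nic_ok) -> str:
    """Hypothetical helper: apply one detection step for `target` to this node's flag set.

    flags:          node index -> ICFS_OK / ICFS_DISCONNECTED
    misses_reached: True once the preset number of heartbeats has been missed
    local_nic_ok:   zero-argument callable standing in for the script event
    """
    if flags[target] == ICFS_DISCONNECTED:
        if not heartbeat_received:
            return "no change"
        flags[target] = ICFS_OK                    # target's cluster card is back
        if local_nic_ok() and flags[local] == ICFS_DISCONNECTED:
            flags[local] = ICFS_OK                 # our own card is healthy too
        return "target recovered"
    if heartbeat_received or not misses_reached:
        return "no change"
    # Target was connectable but its heartbeat is lost.
    flags[target] = ICFS_DISCONNECTED
    if local_nic_ok():
        return "target card failed"                # our card works, so the peer's failed
    flags[local] = ICFS_DISCONNECTED               # our own card is the faulty one
    return "local card failed"
```

Checking the local card before blaming the peer is what lets a node distinguish "I lost my own cluster network" from "the peer lost its cluster network card", since both look identical from the heartbeat alone.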
Each node stores a node flag set containing the node flags of all nodes. Note that each node sets its own flag attribute itself, whereas the flags of the other nodes in the set are determined from whether their ICFS heartbeat information is received. After the node flag of the target node in the node flag set is changed to not connectable, the method further comprises: using a timer to attempt a TCP connection with the target node at preset time intervals until the connection succeeds.
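The timer-driven reconnection can be sketched as a retry loop. The 1 s default interval is taken from the main-process description; the function name reconnect_until_success, the max_attempts limit and the injectable connect and sleep parameters are assumptions added for illustration and testing. socket.create_connection is the standard-library call that opens a TCP connection and raises an OSError subclass on failure.

```python
import socket
import time


def reconnect_until_success(address, interval: float = 1.0, max_attempts=None,
                            connect=socket.create_connection, sleep=time.sleep):
    """Hypothetical helper: retry a TCP connection every `interval` seconds until it succeeds.

    Returns (connection, attempts). If max_attempts is set and reached,
    the last connection error is re-raised.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return connect(address), attempts
        except OSError:            # refused, unreachable, timed out, ...
            if max_attempts is not None and attempts >= max_attempts:
                raise
            sleep(interval)        # the "timer": wait, then try again
```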
For example, when the node reads the state of its own ICFS cluster network card and finds it normal, it sets its own node flag to connectable, that is, it sets its flag attribute and writes connectable for itself into the node flag set. If the target node's flag in the original node flag set is not connectable but ICFS heartbeat information from the target node is received, the target node's ICFS cluster network card has evidently recovered, and the new node flag set records the target node's flag as changed from not connectable to connectable. If the target node's flag in the original node flag set is connectable but its ICFS heartbeat information is not received for the preset number of times, the target node's ICFS cluster network card is considered faulty, and the new node flag set records the target node's flag as changed from connectable to not connectable. While the reconnection attempts are in progress, the master node performs fault recovery for the target node, and once fault recovery is complete the connection can be established.
The CTDB service on each node has two processes, a main process and a recovery process. Referring to Fig. 3, which is a flowchart of the main process provided in an embodiment of the present application, the main process includes:
When CTDB starts, the icfs_flag (node flag) of every node is set to ICFS_DISCONNECTED, that is, not connectable. The node then obtains the state of its own ICFS cluster network card once through a script event, and if the card is normal, sets its own icfs_flag to ICFS_OK, that is, connectable.
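The start-up step can be sketched as follows, assuming a Linux host. The patent reads the card state through an unspecified script event; here that is replaced, as an assumption, by reading /sys/class/net/<iface>/operstate, where the kernel reports an interface's operational state. Treating only "up" as healthy is a simplification (some virtual interfaces report "unknown"), and the names nic_is_up and init_flags are hypothetical.

```python
from pathlib import Path

ICFS_OK = "ICFS_OK"
ICFS_DISCONNECTED = "ICFS_DISCONNECTED"


def nic_is_up(iface: str, sysfs_root: str = "/sys/class/net") -> bool:
    """Hypothetical stand-in for the script event: read the interface's operational state."""
    try:
        state = (Path(sysfs_root) / iface / "operstate").read_text().strip()
    except OSError:
        return False               # missing interface is treated as a failure
    return state == "up"


def init_flags(node_count: int, local: int, iface: str,
               sysfs_root: str = "/sys/class/net") -> list:
    """Hypothetical helper: every node starts not connectable, then check our own card once."""
    flags = [ICFS_DISCONNECTED] * node_count
    if nic_is_up(iface, sysfs_root):
        flags[local] = ICFS_OK
    return flags
```

The sysfs_root parameter exists only so the check can be pointed at a test directory; on a real node the default path would be used.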
Heartbeat detection is performed periodically (every 2 s by default), and each detection traverses all nodes.
If the node regards the icfs_flag of the target node as ICFS_DISCONNECTED, that is, the target node's flag in the current node flag set is not connectable, but it receives an ICFS heartbeat message from the target node, the target node's ICFS cluster network card is considered to have recovered: the target node's icfs_flag in the node flag set is set to ICFS_OK and its ICFS cluster network card state is set to normal. The node then obtains the state of its own ICFS cluster network card once, in real time, through a script event, and if the card is normal but its own icfs_flag is ICFS_DISCONNECTED, sets its own icfs_flag to ICFS_OK.
If the node regards the icfs_flag of the target node as ICFS_OK but has not received an ICFS heartbeat message from it in 4 consecutive detections, the target node's ICFS heartbeat is considered lost: the target node's icfs_flag is set to ICFS_DISCONNECTED and its ICFS cluster network card state is set to abnormal. At the same time the node obtains the state of its own ICFS cluster network card, and if the card is abnormal but its own icfs_flag is not yet ICFS_DISCONNECTED, sets its own icfs_flag to ICFS_DISCONNECTED. If the target node's icfs_flag is not ICFS_DISCONNECTED even though no ICFS heartbeat message has been received from it in 4 consecutive detections, the peer's ICFS cluster network card is considered normal and ICFS heartbeat messages continue to be sent to it. A timer is started that attempts to connect to the target node every 1 s and is stopped once the connection succeeds, so that the TCP connection over the Cluster IP is re-established as soon as the target node's ICFS cluster network card recovers. While the reconnection attempts are in progress, the master node performs fault recovery for the target node, and once fault recovery is complete the connection can be established.
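The "no heartbeat in 4 consecutive detections" rule needs a per-peer counter of consecutive misses. With the default 2 s period, 4 misses correspond to roughly 4 × 2 s = 8 s of silence before a peer's heartbeat is declared lost. The class MissCounter below is a hypothetical sketch: it resets a peer's count whenever any heartbeat arrives and reports the peer once, at the moment the threshold is reached.

```python
from typing import Iterable, List, Set

MISS_THRESHOLD = 4   # consecutive detections without a heartbeat (from the description)


class MissCounter:
    """Hypothetical helper: count consecutive heartbeat misses per peer."""

    def __init__(self, peers: Iterable[int], threshold: int = MISS_THRESHOLD):
        self.threshold = threshold
        self.misses = {peer: 0 for peer in peers}

    def cycle(self, heard_from: Set[int]) -> List[int]:
        """Run one detection pass; return the peers whose heartbeat is now lost."""
        lost = []
        for peer in self.misses:
            if peer in heard_from:
                self.misses[peer] = 0          # any heartbeat resets the count
            else:
                self.misses[peer] += 1
                if self.misses[peer] == self.threshold:
                    lost.append(peer)          # report once, when the threshold is hit
        return lost
```

Reporting only on the cycle where the count equals the threshold avoids re-triggering the loss handling (and the reconnect timer) on every later cycle while the peer stays silent.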
Based on the above technical solution, CTDB is used to establish TCP connections over the cluster IPs of the other nodes, and the running state of the ICFS cluster network card is determined according to the ICFS cluster network card state of the local node, the node flag set and the second ICFS heartbeat information. The ICFS cluster network card of every node can thus be monitored in real time and efficiently, so that nodes can be recovered promptly according to their running state and the impact on client services is reduced.
In one implementation, after the running state of the ICFS cluster network card is determined according to the ICFS cluster network card state of the local node, the node flag set and the second ICFS heartbeat information, the method comprises: determining the failed nodes according to the node flag set and the original node flag set; and performing fault recovery on the failed nodes, the fault recovery comprising database synchronization, node fault notification and virtual IP allocation, where the local node is the master node. Performing fault recovery on the failed nodes comprises: judging whether the failed nodes include the local node; and if so, determining a new master node so that the new master node performs the fault recovery.
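Determining the failed nodes from the original and current node flag sets amounts to a per-index comparison. A minimal sketch, assuming the flag sets are lists indexed by node number; diff_flags and pick_new_master are hypothetical names, and the rule "the lowest-numbered connectable node other than the old master becomes the new master" is an assumption, since the description does not say how the new master is chosen.

```python
ICFS_OK = "ICFS_OK"
ICFS_DISCONNECTED = "ICFS_DISCONNECTED"


def diff_flags(original: list, current: list):
    """Hypothetical helper: return (newly failed, newly recovered) node indices."""
    if len(original) != len(current):
        raise ValueError("flag sets must cover the same nodes")
    pairs = list(enumerate(zip(original, current)))
    failed = [i for i, (old, new) in pairs if old == ICFS_OK and new == ICFS_DISCONNECTED]
    recovered = [i for i, (old, new) in pairs if old == ICFS_DISCONNECTED and new == ICFS_OK]
    return failed, recovered


def pick_new_master(current: list, old_master: int) -> int:
    """Hypothetical rule: lowest-numbered connectable node other than the old master."""
    for i, flag in enumerate(current):
        if i != old_master and flag == ICFS_OK:
            return i
    raise RuntimeError("no connectable node can take over fault recovery")
```

For example, if node 0 is the master and appears in the failed list, it hands recovery over to pick_new_master(current, 0) instead of recovering the cluster itself.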
The CTDB service has two processes, a main process and a recovery process. The main process handles sending and receiving ICFS heartbeat information and service processing among the nodes; the recovery process repeatedly checks whether the cluster is abnormal and, if so, performs fault recovery. ICFS heartbeat detection runs in the main process, while fault detection and recovery run in the recovery process of the master node.
Referring to Fig. 4, which is a flowchart of fault detection and recovery provided by an embodiment of the present application, the process includes:
The recovery process pulls the node flag icfs_flag of each node from the main process and stores them in the node flag set icfs_flags, where icfs_flags[i] indicates whether the Cluster IP of the i-th node is connectable.
A non-master node simply updates the icfs_flags saved by its recovery process. The master node compares the previously saved original node flag set icfs_flags with the newly obtained one; if they differ, the ICFS cluster network card of some node has failed or recovered, so the master node updates its saved icfs_flags and then performs recovery processing.
During recovery processing, if the master node detects that its own icfs_flag is ICFS_DISCONNECTED, its own ICFS cluster network card has failed; it therefore does not perform fault recovery, a new master node is determined, and the new master node performs fault recovery.
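The decision made in each pass of the recovery process can be sketched as a small function. The RecoveryState fields and the returned action strings are hypothetical; the branches follow the description: a non-master only refreshes its copy, the master does nothing when the flag sets match, hands over when its own flag is ICFS_DISCONNECTED, and otherwise starts recovery.

```python
from dataclasses import dataclass, field
from typing import List

ICFS_DISCONNECTED = "ICFS_DISCONNECTED"


@dataclass
class RecoveryState:
    """Hypothetical per-node state kept by the recovery process."""
    local: int                                        # index of this node
    is_master: bool                                   # is this node the master?
    flags: List[str] = field(default_factory=list)    # saved ("original") flag set


def recovery_round(state: RecoveryState, pulled: List[str]) -> str:
    """One pass of the recovery process over the flags pulled from the main process."""
    if not state.is_master:
        state.flags = list(pulled)            # non-master: just refresh the copy
        return "updated"
    if pulled == state.flags:
        return "no change"                    # no card failed or recovered
    state.flags = list(pulled)                # remember the new flag set
    if pulled[state.local] == ICFS_DISCONNECTED:
        return "reelect master"               # our own cluster card is down
    return "run recovery"
```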
If its own icfs_flag is not ICFS_DISCONNECTED, the master node starts fault recovery, which mainly consists of database synchronization, executing a fault recovery script to notify the cluster which nodes have failed, and virtual IP allocation.
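The three recovery actions can be sequenced as below. This is a sketch: sync_databases and notify_failed are hypothetical callbacks standing in for database synchronization and the fault recovery script, and the round-robin reassignment of virtual IPs from failed nodes to healthy ones is an assumed placeholder policy, not necessarily the allocation algorithm CTDB itself uses.

```python
from typing import Callable, Dict, List

ICFS_OK = "ICFS_OK"


def reallocate_vips(assignment: Dict[str, int], flags: List[str]) -> Dict[str, int]:
    """Hypothetical placeholder policy: move VIPs off failed nodes, round-robin."""
    healthy = [i for i, flag in enumerate(flags) if flag == ICFS_OK]
    if not healthy:
        raise RuntimeError("no healthy node can host the virtual IPs")
    result = dict(assignment)
    moved = 0
    for vip in sorted(assignment):              # deterministic order
        if flags[assignment[vip]] != ICFS_OK:
            result[vip] = healthy[moved % len(healthy)]
            moved += 1
    return result


def run_recovery(flags: List[str], assignment: Dict[str, int],
                 sync_databases: Callable[[], None],
                 notify_failed: Callable[[List[int]], None]) -> Dict[str, int]:
    """Hypothetical sequencing of the recovery steps named in the description."""
    failed = [i for i, flag in enumerate(flags) if flag != ICFS_OK]
    sync_databases()                            # 1. database synchronization
    notify_failed(failed)                       # 2. tell the cluster which nodes failed
    return reallocate_vips(assignment, flags)   # 3. virtual IP allocation
```

Virtual IPs already hosted on healthy nodes are left in place, so only clients of the failed node are moved.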
It can therefore be understood that every node has a main process and a recovery process, but only the recovery process of the master node performs fault recovery; when the master node's ICFS cluster network card fails, a new master node is determined and performs the fault recovery. Because the CTDB service on each node establishes TCP connections between all nodes, the state of each node's ICFS cluster network card can be detected through ICFS heartbeat information, which improves monitoring efficiency.
It can be understood that, in this embodiment, the CTDB service lets the nodes of the cluster detect the state of each other's ICFS cluster network cards through the ICFS heartbeat. When a peer's ICFS heartbeat is lost while the local ICFS cluster network card is working normally, the peer's ICFS cluster network card is considered faulty, and the master node performs fault recovery, ensuring that client services are quickly switched to healthy nodes. When the failed node's ICFS cluster network card returns to normal, the other nodes detect that its ICFS heartbeat has resumed and consider the card recovered, and the master node can perform fault recovery to bring the node back into normal use. The method meets the changing requirements of actual production and improves the stability and high availability of the cluster. It also avoids the problem that, when a node's CTDB network card fails, the state of its ICFS cluster network card cannot be reported to the other nodes through the CTDB network card. In addition, after an ICFS heartbeat loss is detected, the node obtains the state of its own ICFS cluster network card in real time by executing a script event, to determine whether the loss was caused by a failure of its own ICFS cluster network card or of the peer's.
Therefore, the technical solution of this embodiment monitors efficiently: when the ICFS cluster network card of a node fails, CTDB can detect the failure and recover quickly, so client services are restored promptly and the impact on them is reduced.
Referring to Fig. 5, which is a schematic structural diagram of an ICFS cluster network card monitoring device according to an embodiment of the present application (the device described below and the method described above may be cross-referenced), the device includes:
a connection establishing module 100, configured to establish TCP connections with the other nodes over the cluster IPs by using the CTDB service;
a heartbeat information receiving and sending module 200, configured to send first ICFS heartbeat information to the other nodes at a preset period and to receive second ICFS heartbeat information sent by the other nodes;
and a running state determining module 300, configured to determine the running state of the ICFS cluster network card according to the ICFS cluster network card state of the local node, the node flag set and the second ICFS heartbeat information.
Optionally, the running state determining module 300 includes:
a node mark acquisition unit, configured to acquire the node mark of a target node from the node mark set;
a first determining unit, configured to, if the node mark of the target node is not connectable, determine after receiving the second ICFS heartbeat information of the target node that the ICFS cluster network card of the target node is normal, and change the node mark of the target node in the node mark set to connectable;
and a setting unit, configured to read the ICFS cluster network card state of the local node by executing a script event, and to set the local node mark to connectable if that state is normal and the local node mark is not connectable.
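The recovery path handled by the first determining unit and the setting unit can be sketched as a small state update. This is a sketch under assumptions: the node mark set is modelled as a dict from node id to a mark string, and `local_nic_ok` stands for the result of the script event. The function and constant names are hypothetical.

```python
CONNECTABLE = "connectable"
NOT_CONNECTABLE = "not connectable"


def on_heartbeat_received(marks, local, target, local_nic_ok):
    """Recovery path: second ICFS heartbeat information arrived from `target`.

    marks maps node id -> mark; local_nic_ok is the (assumed) result of the
    script event that reads this node's own ICFS cluster network card state.
    """
    if marks[target] == NOT_CONNECTABLE:
        # The target's ICFS cluster network card is considered normal again.
        marks[target] = CONNECTABLE
    if local_nic_ok and marks[local] == NOT_CONNECTABLE:
        # This node's own ICFS cluster network card has recovered.
        marks[local] = CONNECTABLE
    return marks
```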
Optionally, the running state determining module 300 further includes:
a reading unit, configured to, if the node mark of the target node is connectable and the second ICFS heartbeat information of the target node has not been received for a preset number of consecutive times, read the ICFS cluster network card state of the local node by executing a script event;
a second determining unit, configured to, if the ICFS cluster network card of the local node is in a fault state and the local node mark is connectable, set the local node mark to not connectable and determine that the ICFS cluster network card of the local node has failed;
a fault determining unit, configured to determine that the ICFS cluster network card of the target node has failed if the ICFS cluster network card state of the local node is normal;
and a changing unit, configured to change the node mark of the target node in the node mark set to not connectable.
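The units above can be condensed into one decision function for the case where the target's heartbeat has been missed the preset number of times. This is a sketch with the same assumed data model as before (a dict of node marks plus a boolean from the script event); the function name is hypothetical. The source does not cover a faulty local card whose mark is already not connectable, so the sketch just reports the local node again in that case.

```python
CONNECTABLE = "connectable"
NOT_CONNECTABLE = "not connectable"


def on_heartbeats_missed(marks, local, target, local_nic_ok):
    """Called when `target` has missed the preset number of heartbeats.

    Returns the node whose ICFS cluster network card is judged faulty,
    or None if the target was not connectable to begin with.
    """
    if marks[target] != CONNECTABLE:
        return None  # already marked; nothing new to decide
    if not local_nic_ok:
        # Our own ICFS card is down, so the lost heartbeat is our fault.
        if marks[local] == CONNECTABLE:
            marks[local] = NOT_CONNECTABLE
        return local
    # Our card is normal, so blame the target's ICFS cluster network card.
    marks[target] = NOT_CONNECTABLE
    return target
```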
Optionally, the device further includes:
a connection establishing module, configured to use a timer to attempt a TCP connection with the target node at preset time intervals until the connection succeeds.
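The timer-driven reconnection can be sketched as a retry loop. This is an assumption-laden simplification: a blocking loop with a fixed sleep stands in for the timer, and the connect step is passed in as a callable, for example `lambda: socket.create_connection((peer_ip, port), timeout=2)`, where `peer_ip` and `port` are placeholders. `reconnect_until_success` is a hypothetical name.

```python
import time


def reconnect_until_success(connect, interval_s=1.0, sleep=time.sleep):
    """Retry `connect()` every `interval_s` seconds until it succeeds.

    `connect` is any callable that returns a connection or raises OSError.
    Returns (connection, number_of_attempts).
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return connect(), attempts
        except OSError:
            sleep(interval_s)  # wait one timer interval before the next try
```

Injecting `sleep` keeps the loop testable. A production version would more likely re-arm a non-blocking timer so that the heartbeat loop keeps running while the peer is down.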
Optionally, the device further includes:
a fault node determining module, configured to determine the fault nodes according to the node mark set and the original node mark set;
and a fault recovery module, configured to perform fault recovery on the fault nodes, where the fault recovery includes database synchronization, node fault notification, and virtual IP allocation.
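One reading of these two modules is sketched below. It covers only fault-node detection and the virtual IP allocation step; database synchronization and fault notification are omitted. Two assumptions are not in the source. First, a fault node is one whose mark changed from connectable in the original set to not connectable in the current set. Second, orphaned virtual IPs are spread round-robin over healthy nodes. All names are hypothetical.

```python
def find_fault_nodes(original_marks, current_marks):
    """Nodes that were connectable before and are not connectable now."""
    return sorted(
        node for node, mark in current_marks.items()
        if mark == "not connectable" and original_marks.get(node) == "connectable"
    )


def reallocate_vips(vip_owner, fault_nodes, healthy_nodes):
    """Move virtual IPs owned by fault nodes onto healthy nodes, round-robin."""
    if not healthy_nodes:
        raise RuntimeError("no healthy node left to host virtual IPs")
    new_owner = dict(vip_owner)
    orphaned = sorted(vip for vip, owner in vip_owner.items() if owner in fault_nodes)
    for i, vip in enumerate(orphaned):
        new_owner[vip] = healthy_nodes[i % len(healthy_nodes)]
    return new_owner
```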
Optionally, the fault recovery module includes:
a judging module, configured to judge whether the fault nodes include the local node;
and a new master node determining module, configured to determine a new master node if the fault nodes include the local node, so that the new master node performs the fault recovery.
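The source does not say how the new master node is chosen. The sketch below assumes, purely for illustration, that the lowest-numbered healthy node takes over. `choose_recovery_master` is a hypothetical name.

```python
def choose_recovery_master(current_master, fault_nodes, all_nodes):
    """Return the node that should drive fault recovery.

    Assumption: if the current (local) master is among the fault nodes,
    the lowest-numbered healthy node becomes the new master.
    """
    if current_master not in fault_nodes:
        return current_master  # the local master is healthy and keeps the role
    healthy = [n for n in sorted(all_nodes) if n not in fault_nodes]
    if not healthy:
        raise RuntimeError("no healthy node can become master")
    return healthy[0]
```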
Optionally, the connection establishing module includes:
a reading unit, configured to read the cluster IPs and ICFS cluster network cards of the other nodes from the configuration file when CTDB starts;
a processing unit, configured to determine the local node cluster IP and execute bind and listen on it in sequence, so as to monitor whether other nodes establish connections with the local node;
an agreement unit, configured to, after reading connection requests from other nodes, send agreement information to those nodes so as to establish message transmission queues with them;
and an establishing unit, configured to connect to the other nodes, establish message transmission queues with them after receiving their agreement information, and send the ICFS heartbeat information to the corresponding nodes.
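The bind, listen, agree, and send sequence can be illustrated on a single machine over loopback. This is a sketch with several assumptions. The agreement message `AGREE\n` and the heartbeat payload are invented. A `queue.Queue` stands in for the message transmission queue. The listener binds an ephemeral port on 127.0.0.1 instead of a real cluster IP. Reading the configuration file is omitted.

```python
import queue
import socket
import threading

AGREE = b"AGREE\n"  # hypothetical agreement message


def recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed early")
        buf += chunk
    return buf


def serve_one(listener, inbox):
    """Accept one connection request, agree, then queue what the peer sends."""
    conn, _ = listener.accept()
    with conn:
        conn.sendall(AGREE)  # agreement information: the queue is now established
        chunks = []
        while True:
            chunk = conn.recv(1024)
            if not chunk:  # peer closed after sending its heartbeat
                break
            chunks.append(chunk)
        inbox.put(b"".join(chunks))


def connect_and_send_heartbeat(ip, port, heartbeat):
    """Connect to a peer, wait for agreement, then send one ICFS heartbeat."""
    with socket.create_connection((ip, port), timeout=5) as sock:
        if recv_exact(sock, len(AGREE)) != AGREE:
            raise ConnectionError("peer did not agree")
        sock.sendall(heartbeat)


def run_loopback_demo(ip="127.0.0.1"):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((ip, 0))  # bind: port 0 picks any free port for the demo
    listener.listen()       # listen for other nodes' connection requests
    port = listener.getsockname()[1]
    inbox = queue.Queue()
    server = threading.Thread(target=serve_one, args=(listener, inbox))
    server.start()
    connect_and_send_heartbeat(ip, port, b"ICFS-HB node=1\n")
    server.join(timeout=5)
    listener.close()
    return inbox.get(timeout=5)
```

In a real cluster each node would run both roles at once: it listens on its own cluster IP and connects out to every peer listed in the configuration file.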
Since the embodiment of the ICFS cluster network card monitoring device corresponds to the embodiment of the ICFS cluster network card monitoring method, reference is made to the description of the method embodiment, and details are not repeated here.
The electronic device provided by an embodiment of the present application is introduced below. The electronic device described below corresponds to the ICFS cluster network card monitoring method described above.
The present embodiment provides an electronic device, including:
a memory for storing a computer program;
and a processor, configured to implement the steps of the ICFS cluster network card monitoring method described above when executing the computer program.
Since the embodiment of the electronic device corresponds to the embodiment of the ICFS cluster network card monitoring method, reference is made to the description of the method embodiment, and details are not repeated here.
A computer-readable storage medium provided by an embodiment of the present application is introduced below. The computer-readable storage medium described below corresponds to the ICFS cluster network card monitoring method described above.
The present embodiment provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the steps of the ICFS cluster network card monitoring method described above.
Since the embodiment of the computer-readable storage medium corresponds to the embodiment of the ICFS cluster network card monitoring method, reference is made to the description of the method embodiment, and details are not repeated here.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the other embodiments, and the embodiments may be referred to one another for their same or similar parts. Because the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, the device is described briefly, and relevant details can be found in the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The ICFS cluster network card monitoring method, the ICFS cluster network card monitoring device, the electronic device, and the computer-readable storage medium provided by the present application are described in detail above. The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.

Claims (10)

1. An ICFS cluster network card monitoring method is characterized by comprising the following steps:
establishing TCP connections with other nodes over the cluster IP by using the CTDB service;
sending first ICFS heartbeat information to the other nodes at a preset period, and receiving second ICFS heartbeat information sent by the other nodes;
and determining the running state of the ICFS cluster network card according to the ICFS cluster network card state of the local node, the node mark set, and the second ICFS heartbeat information.
2. The ICFS cluster network card monitoring method according to claim 1, wherein determining the running state of the ICFS cluster network card according to the ICFS cluster network card state of the local node, the node mark set, and the second ICFS heartbeat information comprises:
acquiring a node mark of a target node from the node mark set;
if the node mark of the target node is not connectable, determining, after receiving the second ICFS heartbeat information of the target node, that the ICFS cluster network card of the target node is normal, and changing the node mark of the target node in the node mark set to connectable;
and reading the ICFS cluster network card state of the local node by executing a script event, and if the ICFS cluster network card state of the local node is normal and the local node mark is not connectable, setting the local node mark to connectable.
3. The ICFS cluster network card monitoring method of claim 2, wherein if the node mark of the target node is connectable and the second ICFS heartbeat information of the target node has not been received for a preset number of consecutive times, the ICFS cluster network card state of the local node is read by executing the script event;
if the ICFS cluster network card of the local node is in a fault state and the local node mark is connectable, setting the local node mark to not connectable and determining that the ICFS cluster network card of the local node has failed;
if the ICFS cluster network card state of the local node is normal, determining that the ICFS cluster network card of the target node has failed, and changing the node mark of the target node in the node mark set to not connectable.
4. The ICFS cluster network card monitoring method according to claim 3, wherein after the node mark of the target node in the node mark set is changed to not connectable, the method further comprises:
using a timer to attempt the TCP connection with the target node at preset time intervals until the connection succeeds.
5. The ICFS cluster network card monitoring method of claim 1, wherein after determining the running state of the ICFS cluster network card according to the ICFS cluster network card state of the local node, the node mark set, and the second ICFS heartbeat information, the method further comprises:
determining fault nodes according to the node mark set and an original node mark set;
performing fault recovery on the fault nodes, wherein the fault recovery comprises database synchronization, node fault notification, and virtual IP allocation;
wherein the local node is the master node.
6. The ICFS cluster network card monitoring method of claim 5, wherein performing fault recovery on the fault nodes comprises:
judging whether the fault nodes include the local node;
and if the fault nodes include the local node, determining a new master node so that the new master node performs the fault recovery.
7. The ICFS cluster network card monitoring method of claim 1, wherein establishing TCP connections with other nodes over the cluster IP by using the CTDB service comprises:
when CTDB starts, reading the cluster IPs and ICFS cluster network cards of the other nodes from a configuration file;
determining the local node cluster IP, and executing bind and listen on the local node cluster IP in sequence so as to monitor whether other nodes establish connections with the local node;
if connection requests from other nodes are read, sending agreement information to those nodes so as to establish message transmission queues with them;
and connecting to the other nodes, establishing message transmission queues with them after receiving their agreement information, and sending the ICFS heartbeat information to the corresponding nodes.
8. An ICFS cluster network card monitoring device is characterized by comprising:
a connection establishing module, configured to establish TCP connections with other nodes over the cluster IP by using the CTDB service;
a heartbeat information receiving and sending module, configured to send first ICFS heartbeat information to the other nodes at a preset period and to receive second ICFS heartbeat information sent by the other nodes;
and a running state determining module, configured to determine the running state of the ICFS cluster network card according to the ICFS cluster network card state of the local node, the node mark set, and the second ICFS heartbeat information.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the ICFS cluster network card monitoring method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the ICFS cluster network card monitoring method according to any one of claims 1 to 7.
CN201911082326.2A 2019-11-07 2019-11-07 ICFS cluster network card monitoring method, device and equipment and medium Pending CN110933142A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911082326.2A CN110933142A (en) 2019-11-07 2019-11-07 ICFS cluster network card monitoring method, device and equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911082326.2A CN110933142A (en) 2019-11-07 2019-11-07 ICFS cluster network card monitoring method, device and equipment and medium

Publications (1)

Publication Number Publication Date
CN110933142A true CN110933142A (en) 2020-03-27

Family

ID=69852560

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911082326.2A Pending CN110933142A (en) 2019-11-07 2019-11-07 ICFS cluster network card monitoring method, device and equipment and medium

Country Status (1)

Country Link
CN (1) CN110933142A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103067242A (en) * 2012-12-04 2013-04-24 中国电信股份有限公司云计算分公司 Virtual machine system used for providing network service
US20150019671A1 (en) * 2012-03-30 2015-01-15 Fujitsu Limited Information processing system, trouble detecting method, and information processing apparatus
CN105302661A (en) * 2014-06-04 2016-02-03 北京云端时代科技有限公司 System and method for implementing virtualization management platform high availability
CN105763471A (en) * 2014-12-16 2016-07-13 中兴通讯股份有限公司 Link management method, device and system in virtual machine environment
CN107995106A (en) * 2017-12-04 2018-05-04 山东超越数控电子股份有限公司 A kind of interchanger redundant system of data storing platform
CN108989476A (en) * 2018-06-12 2018-12-11 新华三技术有限公司 A kind of address distribution method and device
CN109213507A (en) * 2018-08-27 2019-01-15 郑州云海信息技术有限公司 A kind of upgrade method and server
CN109218141A (en) * 2018-11-20 2019-01-15 郑州云海信息技术有限公司 A kind of malfunctioning node detection method and relevant apparatus
CN109831341A (en) * 2019-03-19 2019-05-31 中国电子科技集团公司第三十六研究所 A kind of fast switch over method and device of redundancy double netcard

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111885097A (en) * 2020-06-01 2020-11-03 视联动力信息技术股份有限公司 Network card processing method and device, electronic equipment and storage medium
CN112769652A (en) * 2021-01-14 2021-05-07 苏州浪潮智能科技有限公司 Node service monitoring method, device, equipment and medium
CN112866408A (en) * 2021-02-09 2021-05-28 山东英信计算机技术有限公司 Service switching method, device, equipment and storage medium in cluster
CN114363150A (en) * 2021-12-28 2022-04-15 浪潮通信技术有限公司 Network card connectivity monitoring method and device for server cluster
CN114363150B (en) * 2021-12-28 2024-05-14 浪潮通信技术有限公司 Network card connectivity monitoring method and device of server cluster
CN114826892A (en) * 2022-04-28 2022-07-29 济南浪潮数据技术有限公司 Cluster node control method, device, equipment and medium
CN115118638A (en) * 2022-06-29 2022-09-27 济南浪潮数据技术有限公司 Method, device and medium for monitoring back-end network card
CN115102887A (en) * 2022-07-15 2022-09-23 济南浪潮数据技术有限公司 Cluster node monitoring method and related equipment

Similar Documents

Publication Publication Date Title
CN110933142A (en) ICFS cluster network card monitoring method, device and equipment and medium
CN104506392B (en) A kind of delay machine detection method and equipment
CN111176873B (en) Automatic micro-service offline method and device, computer equipment and storage medium
CN108737574B (en) Node offline judgment method, device, equipment and readable storage medium
CN107483260B (en) Fault processing method and device and electronic equipment
CN107404522B (en) Cross-node virtual machine cluster high-availability implementation method and device
CN108430116A (en) Suspension reconnection method, medium, device and computing device
CN108429629A (en) Equipment fault restoration methods and device
CN112506702B (en) Disaster recovery method, device, equipment and storage medium for data center
CN110618864A (en) Interrupt task recovery method and device
CN112636979B (en) Cluster alarm method and related device
CN112769652B (en) Node service monitoring method, device, equipment and medium
CN111565135A (en) Method for monitoring operation of server, monitoring server and storage medium
CN114615310B (en) Method and device for maintaining TCP connection and electronic equipment
CN114268565A (en) Terminal device, heartbeat packet transmission interval detection method thereof and storage medium
CN110809262A (en) Internet of things equipment operation and maintenance management method based on COAP protocol
JP6421516B2 (en) Server device, redundant server system, information takeover program, and information takeover method
CN109194521B (en) Flow forwarding method and equipment
CN110224872B (en) Communication method, device and storage medium
CN112787918B (en) Data center addressing and master-slave switching method based on service routing tree
CN110597672A (en) Method and device for main/standby switching of ATCA switching system
CN114422335B (en) Communication method, device, server and storage medium
CN110895521A (en) OSD and MON connection method, device, equipment and storage medium
CN113098978B (en) Data transmission method, device and medium
CN112367386A (en) Ignite-based automatic operation and maintenance method, apparatus and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200327