CN113542052A - Node fault determination method and device and server - Google Patents

Info

Publication number
CN113542052A
Authority
CN
China
Prior art keywords
cluster
node
port
detection
ports
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110629101.5A
Other languages
Chinese (zh)
Inventor
曾军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New H3C Information Technologies Co Ltd
Original Assignee
New H3C Information Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New H3C Information Technologies Co Ltd
Priority to CN202110629101.5A
Publication of CN113542052A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Health & Medical Sciences (AREA)
  • Cardiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

This specification provides a node fault determination method, a node fault determination device, and a server, and relates to the technical field of communications. The node fault determination method is applied to a cluster node and comprises the following steps: sending detection messages outward through at least two ports, and receiving the feedback messages that other cluster nodes in the cluster send in response to the detection messages; if no feedback message from a port of another cluster node is received within a preset period, determining that the port has a fault; if all ports of another cluster node are determined to have faults, determining that this other cluster node is abnormal; and if the node's own election identifier meets a preset condition, acting as the monitoring node and raising an alarm for the failed cluster nodes. The method can improve the reliability of fault detection.

Description

Node fault determination method and device and server
Technical Field
The present disclosure relates to the field of communications technologies, and in particular, to a method, an apparatus, and a server for determining a node fault.
Background
Big data applications are increasingly widespread, the distributed systems that serve massive data keep growing in scale, and keeping a distributed system environment stable and reliable is becoming more and more challenging. A distributed system uses multiple servers working cooperatively to solve computing, storage, transmission, and other problems that a single server cannot solve. The most common form of distributed system is the cluster, in which multiple servers are deployed as cluster nodes. Each server acts as a node in the distributed system and handles a portion of the tasks carried by the system.
In a cluster, to determine the working state of each cluster node, the cluster nodes exchange heartbeat messages through a designated port, and one node in the cluster acts as a monitoring node that collects the states of the other nodes and raises alarms about them. However, a cluster node generally has multiple ports for data interaction. If the port used for exchanging heartbeat messages fails while the other ports work normally, the cluster node can no longer send heartbeat messages and is regarded as a failed cluster node. This produces false alarms and reduces the reliability of fault detection in the cluster.
Disclosure of Invention
In order to overcome the problems in the related art, the present specification provides a node failure determination method, an apparatus, and a server.
According to a first aspect of the embodiments of this specification, the present application provides a node fault determination method, applied to a cluster node, including:
sending detection messages outward through at least two ports, and receiving the feedback messages that other cluster nodes in the cluster send in response to the detection messages;
if no feedback message from a port of another cluster node is received within a preset period, determining that the port has a fault;
if all ports of another cluster node are determined to have faults, determining that this other cluster node is abnormal;
and if the node's own election identifier meets a preset condition, acting as the monitoring node and raising an alarm for the failed cluster nodes.
Optionally, the election identifier is an IP address, and the preset condition is being the maximum IP address or the minimum IP address; or,
the election identifier is a device identifier, and the preset condition is being the maximum device identifier or the minimum device identifier.
Optionally, after sending the detection messages outward through at least two ports and receiving the feedback messages that other cluster nodes in the cluster send in response to the detection messages, the method further includes:
and if none of the at least two ports receives a feedback message from any other cluster node, determining that the node itself has a fault.
Optionally, the sending of the detection packet to the outside through at least two ports, and receiving of the feedback packet sent by other cluster nodes in the cluster for the detection packet, include:
generating a detection message through at least two threads, wherein one thread corresponds to one port;
based on a thread, sending the generated detection message to other cluster nodes in the cluster through a port corresponding to the thread;
and receiving feedback messages sent by other cluster nodes in the cluster through a port corresponding to a thread based on the thread.
Optionally, the detection message and the feedback message are generated based on an ICMP protocol.
According to a second aspect of the embodiments of this specification, the present application provides a node fault determination apparatus, applied to a cluster node, including:
the interaction unit is used for sending the detection message to the outside through at least two ports and receiving feedback messages sent by other cluster nodes in the cluster aiming at the detection message;
the port fault detection unit is used for determining that a port has a fault if a feedback message sent by the port is not received in a preset period;
the node fault detection unit is used for determining that other cluster nodes are abnormal if all ports of the other cluster nodes are determined to have faults;
and the alarm unit is used for acting as the monitoring node and raising an alarm for the failed cluster nodes if the node's own election identifier meets the preset condition.
Optionally, the election identifier is an Internet Protocol (IP) address, and the preset condition is being the maximum IP address or the minimum IP address; or,
the election identifier is a device identifier, and the preset condition is being the maximum device identifier or the minimum device identifier.
Optionally, the apparatus further includes:
and the self-checking unit is used for determining that the node itself has a fault if none of the at least two ports receives a feedback message from any other cluster node.
Optionally, the sending of the detection packet to the outside through at least two ports, and receiving of the feedback packet sent by other cluster nodes in the cluster for the detection packet, include:
generating a detection message through at least two threads, wherein one thread corresponds to one port;
based on a thread, sending the generated detection message to other cluster nodes in the cluster through a port corresponding to the thread;
and receiving feedback messages sent by other cluster nodes in the cluster through a port corresponding to a thread based on the thread.
Optionally, the detection message and the feedback message are generated based on an ICMP protocol.
According to a third aspect of the embodiments of this specification, the present application provides a server, applied in a cluster, including: a processor, a machine-readable storage medium, and at least two ports;
the machine-readable storage medium stores machine-executable instructions executable by the processor, and the machine-executable instructions cause the processor to implement any of the above method steps.
The technical solutions provided by the embodiments of this specification can have the following beneficial effects:
In the embodiments of this specification, a cluster node sends detection messages through at least two of its ports and receives the feedback messages that other cluster nodes send in response. If the feedback message from a port of another cluster node is not received within a preset period, that port is considered faulty. Only when all ports of another cluster node are determined to be faulty is the whole node determined to be faulty and an alarm raised. This avoids the situation in which the monitoring node declares another cluster node faulty when only one of its ports has failed, which would interrupt all services of that node, and thereby improves the reliability of cluster fault detection.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the specification.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present specification and together with the description, serve to explain the principles of the specification.
FIG. 1 is a flow chart of a node fault determination method according to the present application;
FIG. 2 is a schematic diagram of a cluster according to the present application;
FIG. 3 is a schematic diagram of the interaction of detection messages and feedback messages in a node fault determination method according to the present application;
FIG. 4 is a schematic structural diagram of a node fault determination apparatus according to the present application;
FIG. 5 is a schematic diagram of a server according to the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present specification.
The application provides a node fault determination method, which is applied to cluster nodes, and as shown in fig. 1, the method includes:
S100, sending detection messages outward through at least two ports, and receiving the feedback messages that other cluster nodes in the cluster send in response to the detection messages.
As shown in FIG. 2, a cluster in a network includes a plurality of cluster nodes. Each cluster node is provided with at least two ports, and the ports of one node belong to different network cards, different network segments, and different network devices (e.g., routers or switches). The following description takes a cluster of 3 cluster nodes (cluster node 1, cluster node 2, and cluster node 3) as an example. Each cluster node is provided with a plurality of network cards (cluster node 1 includes network card 10 and network card 11, cluster node 2 includes network card 20 and network card 21, and cluster node 3 includes network card 30 and network card 31), and each network card may include one port (network card 10 includes port 100, network card 11 includes port 110, network card 20 includes port 200, network card 21 includes port 210, network card 30 includes port 300, and network card 31 includes port 310). It should be noted that this is merely an example and does not limit the number of cluster nodes in the cluster, the number of network cards in each cluster node, or the number of ports on each network card.
After the cluster is started, the platform software running on the cluster nodes may publish the address information of each cluster node, such as an IP (Internet Protocol) address and/or a MAC (Media Access Control) address, as deployed in the cluster. After a cluster node receives the IP addresses of the other cluster nodes, it can create one thread per port to generate the detection messages for that port's network segment and send them outward through that port. In the cluster shown in FIG. 2, cluster node 1 may generate two detection messages and send them through port 100 of network card 10 to port 200 of cluster node 2 and port 300 of cluster node 3, which are connected to network device 1; cluster node 1 may also generate another two detection messages and send them through port 110 of network card 11 to port 210 of cluster node 2 and port 310 of cluster node 3, which are connected to network device 2.
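The fan-out described above can be sketched as a small planning step that maps each local port to the peer ports on the same network segment. This is only an illustration: the function name `build_probe_plan`, the dictionary layout, and the second subnet `192.168.2.0/24` are assumptions made for the example and do not appear in the specification.

```python
import ipaddress

def build_probe_plan(local_ports, peer_ports):
    """Map each local port to the peer ports that share its network segment.

    local_ports: {port_name: "ip/prefix"} for this node (hypothetical layout).
    peer_ports:  {(node_name, port_name): "ip"} for the other cluster nodes.
    """
    plan = {}
    for local_name, cidr in local_ports.items():
        segment = ipaddress.ip_interface(cidr).network
        plan[local_name] = sorted(
            peer for peer, addr in peer_ports.items()
            if ipaddress.ip_address(addr) in segment
        )
    return plan

# Cluster node 1 with two ports; addresses are illustrative assumptions.
local = {"port100": "192.168.1.4/24", "port110": "192.168.2.4/24"}
peers = {
    ("node2", "port200"): "192.168.1.5",
    ("node2", "port210"): "192.168.2.5",
    ("node3", "port300"): "192.168.1.6",
    ("node3", "port310"): "192.168.2.6",
}
plan = build_probe_plan(local, peers)
```

With these inputs, `port100` is paired with ports 200 and 300, and `port110` with ports 210 and 310, which mirrors the two groups of detection messages in the FIG. 2 example.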
When another cluster node in the cluster receives a detection message, it parses the detection message, constructs the corresponding feedback message, and returns the feedback message to the sending cluster node through the port on which the detection message was received. In this way, one cluster node can detect whether a port of another cluster node has failed.
Optionally, the detection message and the feedback message are generated based on ICMP (Internet Control Message Protocol); that is, the detection message may be an ICMP echo request (a Ping message), and the feedback message may be an ICMP echo reply. Of course, the detection message and the feedback message may also be generated based on other communication protocols, for example as KeepAlive messages in TCP/IP (Transmission Control Protocol/Internet Protocol), or as heartbeat messages within the cluster.
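For the ICMP option, a minimal sketch of how a detection message (echo request, type 8) could be built and how a feedback message (echo reply, type 0) could be matched to it is given below. The helper names are hypothetical, and the sketch only constructs and validates bytes; actually sending them would require a raw socket and suitable privileges, which are left out.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """One's-complement sum over 16-bit words, as used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Build an ICMP echo request; the checksum is computed with its field zeroed."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload

def is_echo_reply_for(packet: bytes, identifier: int, sequence: int) -> bool:
    """Return True if packet is a valid echo reply matching identifier and sequence."""
    if len(packet) < 8:
        return False
    msg_type, code, _checksum, ident, seq = struct.unpack("!BBHHH", packet[:8])
    return (
        msg_type == 0
        and code == 0
        and ident == identifier
        and seq == sequence
        and internet_checksum(packet) == 0  # a valid packet sums to zero
    )

request = build_echo_request(0x1234, 1, b"probe")
```

Using a distinct identifier or sequence number per port is one way (an assumption, not something the specification states) to tell which probe a given reply answers.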
S101, if a feedback message sent by one port of other cluster nodes is not received in a preset period, determining that the port has a fault.
For example, cluster node 1 sends detection messages outward through the ports on its network cards, as shown in FIG. 3. Cluster node 1 sends the detection messages through port 100 on network card 10 and port 110 on network card 11 and starts a separate timer for each detection message sent. Assume that the detection messages are addressed to cluster node 2 and cluster node 3. Each of these nodes parses a detection message it receives and constructs the corresponding feedback message. If port 210 on cluster node 2 has failed, it cannot receive the detection message, so cluster node 2 can only return a feedback message to port 100 through port 200 on network card 20.
Cluster node 1 receives the feedback message sent from port 200 and determines that port 200 of cluster node 2 is normal. It never receives a feedback message from port 210 within the preset period, so it determines that port 210 has failed.
Cluster node 3 is assumed to be powered off, so port 300 and port 310 have both failed and cannot receive the detection messages, and therefore cannot send feedback messages to cluster node 1. The timers that cluster node 1 started for port 300 and port 310 time out, and cluster node 1 determines that both ports on cluster node 3 have failed. In FIG. 3, ports marked in white are working ports, and ports marked in black are failed ports.
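The per-message timing in this example can be sketched as a small tracker that records when each detection message was sent and marks a peer port as failed once the preset period elapses without feedback. The class `ProbeTracker`, its method names, and the one-second period are assumptions for illustration; the clock is injectable so the FIG. 3 scenario can be replayed without waiting.

```python
import time

class ProbeTracker:
    """Tracks outstanding detection messages and derives per-port status."""

    def __init__(self, period_s, clock=time.monotonic):
        self.period_s = period_s  # the preset period (assumed value)
        self.clock = clock
        self.sent_at = {}         # peer port -> time the probe was sent
        self.port_ok = {}         # peer port -> True (normal) / False (failed)

    def on_probe_sent(self, peer_port):
        self.sent_at[peer_port] = self.clock()

    def on_feedback(self, peer_port):
        # Feedback that arrives while the probe is still pending marks the port normal.
        if peer_port in self.sent_at:
            del self.sent_at[peer_port]
            self.port_ok[peer_port] = True

    def expire(self):
        # Any probe still pending after the preset period marks its port failed.
        now = self.clock()
        for peer_port, sent in list(self.sent_at.items()):
            if now - sent >= self.period_s:
                del self.sent_at[peer_port]
                self.port_ok[peer_port] = False
        return self.port_ok

# Replay of the FIG. 3 scenario with a fake clock.
now = [0.0]
tracker = ProbeTracker(period_s=1.0, clock=lambda: now[0])
for port in ("200", "210", "300", "310"):
    tracker.on_probe_sent(port)
now[0] = 0.2
tracker.on_feedback("200")   # only port 200 answers
now[0] = 1.5
status = tracker.expire()
```

Here only port 200 ends up marked normal; ports 210, 300, and 310 are marked failed after the period expires.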
S102, if all the ports of one other cluster node are determined to be in fault, determining that the other cluster node is abnormal.
At this time, cluster node 1 may determine that cluster node 2 is in a working state, because port 200 works normally and cluster node 2 therefore has at least one normal port.
Cluster node 1 may determine that port 300 and port 310 on cluster node 3 are failed, and that cluster node 3 has no normal ports and is in an abnormal state.
Finally, the cluster node 1 may determine that the cluster node 2 is in a normal state and the cluster node 3 is in an abnormal state.
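The rule in S102 reduces to "a node is abnormal only if none of its ports is normal". A minimal sketch, assuming port status is held in a dictionary keyed by hypothetical `(node, port)` pairs:

```python
def classify_nodes(port_ok):
    """port_ok: {(node, port): bool}. A node is abnormal only if every port failed."""
    any_port_ok = {}
    for (node, _port), ok in port_ok.items():
        any_port_ok[node] = any_port_ok.get(node, False) or ok
    return {
        node: ("normal" if ok else "abnormal")
        for node, ok in any_port_ok.items()
    }

# Port status as determined by cluster node 1 in the example.
states = classify_nodes({
    ("node2", "200"): True,
    ("node2", "210"): False,
    ("node3", "300"): False,
    ("node3", "310"): False,
})
```

Cluster node 2 is classified as normal because port 200 still works, while cluster node 3 is classified as abnormal.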
S103, if the node's own election identifier meets the preset condition, the node acts as the monitoring node and raises an alarm for the failed cluster nodes.
When the cluster node 1 determines that the cluster node 2 is in a normal state and the cluster node 3 is in an abnormal state, the cluster node 1 may determine whether the preset condition is met according to the election identification of the cluster node 1.
Optionally, the election identifier is an IP address, and the preset condition is an IP address maximum value or an IP address minimum value; or, the election identifier is an equipment identifier, and the preset condition is an equipment identifier maximum value or an equipment identifier minimum value. The election identifier is used to select one cluster node from a plurality of cluster nodes of the cluster as a monitoring node, and the monitoring node may report the cluster node in a fault state to platform software configured in the cluster, so that the platform software can manage the cluster nodes in the cluster based on the reported state, for example, isolate the cluster nodes in an abnormal state and the like.
For example, when an IP address is used as the election identifier, the minimum value of the IP address is used as the preset condition. The IP address of cluster node 1 is 192.168.1.4, the IP address of cluster node 2 is 192.168.1.5, and the IP address of cluster node 3 is 192.168.1.6. The comparison can show that the IP address of the cluster node 1 is the minimum value of the IP addresses of the cluster nodes in the cluster, so that the cluster node 1 can be used as a monitoring node to report the fault of the cluster node 3 to the platform software, so that the cluster node 3 is isolated.
Of course, the cluster node 1 may also report the port 210 failure of the cluster node 2 to prompt the worker to check the port 210 of the cluster node 2.
In addition, the election identifier may also be another node identifier, such as a MAC address, which is not limited here.
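The IP-based election can be sketched as a numeric comparison of addresses. The function name and the `prefer` parameter are hypothetical. The specification does not say whether abnormal nodes are excluded from the comparison, so the sketch simply compares whatever addresses it is given.

```python
import ipaddress

def is_monitoring_node(own_ip, cluster_ips, prefer="min"):
    """Return True if own_ip is the minimum (or maximum) address in the cluster."""
    addresses = [ipaddress.ip_address(ip) for ip in cluster_ips]
    chosen = min(addresses) if prefer == "min" else max(addresses)
    return ipaddress.ip_address(own_ip) == chosen

cluster = ["192.168.1.4", "192.168.1.5", "192.168.1.6"]
node1_is_monitor = is_monitoring_node("192.168.1.4", cluster)
node2_is_monitor = is_monitoring_node("192.168.1.5", cluster)
```

Parsing the addresses before comparing them matters: as plain strings, "192.168.1.10" would sort before "192.168.1.9", whereas the numeric comparison orders them correctly.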
In addition, in the foregoing process, when another cluster node is able to send a feedback message in response to the detection message, the method further includes: S104, if a feedback message sent from a port of another cluster node is received within the preset period, determining that the port has no fault.
In the embodiments of this specification, a cluster node sends detection messages through at least two of its ports and receives the feedback messages that other cluster nodes send in response. If the feedback message from a port of another cluster node is not received within a preset period, that port is considered faulty. Only when all ports of another cluster node are determined to be faulty is the whole node determined to be faulty and an alarm raised. This avoids the situation in which the monitoring node declares another cluster node faulty when only one of its ports has failed, which would interrupt all services of that node, and thereby improves the reliability of cluster fault detection.
Optionally, after step S100 of sending the detection messages outward through at least two ports and receiving the feedback messages that other cluster nodes in the cluster send in response, the method further includes:
S105, if none of the at least two ports receives a feedback message from any other cluster node, determining that the node itself has a fault.
After cluster node 1 sends the detection messages and the timers expire, if no other cluster node in the cluster has sent a feedback message to cluster node 1, cluster node 1 can be considered to have a fault itself. If cluster node 1 can still interact with the platform software at this time, it may notify the platform software of its own fault so that the platform software can isolate cluster node 1.
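The self-check in S105 can be sketched as a test over the feedback counts seen on each local port during one period. The function name and the count-per-port dictionary are assumptions for illustration.

```python
def node_is_self_faulty(feedback_counts):
    """feedback_counts: {local_port: number of feedback messages received in the period}.

    The node suspects itself only when it has ports and none of them received anything.
    """
    if not feedback_counts:
        raise ValueError("at least one local port is required")
    return all(count == 0 for count in feedback_counts.values())

# No peer answered on either port: cluster node 1 suspects itself.
isolated = node_is_self_faulty({"port100": 0, "port110": 0})
# One answer on port 100 is enough to rule out a self fault.
healthy = node_is_self_faulty({"port100": 1, "port110": 0})
```

A cluster in which every peer is genuinely down would produce the same observation, so a real deployment would likely need an extra signal to tell the two cases apart; the specification does not address this.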
Optionally, step S100, sending a detection packet to the outside through at least two ports, and receiving a feedback packet sent by other cluster nodes in the cluster for the detection packet, includes:
S100A, generating detection messages through at least two threads.
S100B, based on a thread, sends the generated detection packet to other cluster nodes in the cluster through the port corresponding to the thread.
When a cluster node detects other cluster nodes in the cluster, the service processing and the detection may share the same thread. In that case, if the service load of the cluster node is heavy, the processor may be occupied with service messages for a period of time, so that a detection message or feedback message is received by the cluster node but cannot be processed in time, which reduces the efficiency with which the cluster node determines faults.
To address this, cluster node 1 may create at least two threads in the processor, where each thread is created for one port and is independent of the threads used by the service; that is, one thread corresponds to one port. Each thread can construct the detection messages that need to be sent outward through its port, and these detection messages are sent to the other cluster nodes in the cluster through the same port and the same network device.
S100C, based on a thread, receives a feedback packet sent by another cluster node in the cluster through a port corresponding to the thread.
The cluster node that receives a detection message constructs a feedback message and sends it back. Cluster node 1 then receives, on the port, the feedback messages that the other cluster nodes send in response to the detection messages. Because the feedback messages are received and processed by the thread dedicated to each port, cluster node 1 avoids the detection flow being blocked by service processing when the two share a thread, which improves the efficiency of fault detection.
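The one-thread-per-port arrangement of S100A to S100C can be sketched with Python's `threading` module. The transport is deliberately abstracted: `probe` is a hypothetical callable that sends one detection message through a local port and reports whether feedback arrived within the timeout. The example plugs in a fake that imitates the FIG. 3 scenario.

```python
import threading

def run_port_probes(plan, probe, timeout_s=1.0):
    """Run one detection thread per local port.

    plan:  {local_port: [peer_port, ...]}
    probe: callable(local_port, peer_port, timeout_s) -> bool (hypothetical transport)
    Returns {peer_port: True if feedback was received, else False}.
    """
    results = {}
    lock = threading.Lock()

    def worker(local_port, peer_ports):
        # Each thread sends and receives only through its own port.
        for peer_port in peer_ports:
            ok = probe(local_port, peer_port, timeout_s)
            with lock:
                results[peer_port] = ok

    threads = [
        threading.Thread(target=worker, args=(local_port, peer_ports),
                         name=f"probe-{local_port}", daemon=True)
        for local_port, peer_ports in plan.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results

def fake_probe(local_port, peer_port, timeout_s):
    # Stand-in transport: only port 200 of cluster node 2 answers.
    return peer_port == "port200"

probe_plan = {"port100": ["port200", "port300"], "port110": ["port210", "port310"]}
probe_results = run_port_probes(probe_plan, fake_probe)
```

Because the detection threads are separate from any service threads, a long-running service task does not delay the handling of feedback messages in this arrangement.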
Correspondingly, the present application provides a node failure determining apparatus, applied to a cluster node, as shown in fig. 4, including:
the interaction unit is used for sending the detection message to the outside through at least two ports and receiving feedback messages sent by other cluster nodes in the cluster aiming at the detection message;
the port fault detection unit is used for determining that a port has a fault if a feedback message sent by the port is not received in a preset period;
the node fault detection unit is used for determining that other cluster nodes are abnormal if all ports of the other cluster nodes are determined to have faults;
and the alarm unit is used for acting as the monitoring node and raising an alarm for the failed cluster nodes if the node's own election identifier meets the preset condition.
Optionally, the election identifier is an IP address, and the preset condition is being the maximum IP address or the minimum IP address; or,
the election identifier is a device identifier, and the preset condition is being the maximum device identifier or the minimum device identifier.
Optionally, the apparatus further includes:
and the self-checking unit is used for determining that the node itself has a fault if none of the at least two ports receives a feedback message from any other cluster node.
Optionally, the sending of the detection packet to the outside through at least two ports, and receiving of the feedback packet sent by other cluster nodes in the cluster for the detection packet, include:
generating a detection message through at least two threads, wherein one thread corresponds to one port;
based on a thread, sending the generated detection message to other cluster nodes in the cluster through a port corresponding to the thread;
and receiving feedback messages sent by other cluster nodes in the cluster through a port corresponding to a thread based on the thread.
Optionally, the detection message and the feedback message are generated based on an ICMP protocol.
Correspondingly, the present application provides a server, applied in a cluster, as shown in fig. 5, including: a processor, a machine-readable storage medium, and at least two ports;
the machine-readable storage medium stores machine-executable instructions executable by the processor, and the machine-executable instructions cause the processor to implement any of the above method steps.
The technical solutions provided by the embodiments of this specification can have the following beneficial effects:
In the embodiments of this specification, a cluster node sends detection messages through at least two of its ports and receives the feedback messages that other cluster nodes send in response. If the feedback message from a port of another cluster node is not received within a preset period, that port is considered faulty. Only when all ports of another cluster node are determined to be faulty is the whole node determined to be faulty and an alarm raised. This avoids the situation in which the monitoring node declares another cluster node faulty when only one of its ports has failed, which would interrupt all services of that node, and thereby improves the reliability of cluster fault detection.
It will be understood that the present description is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof.
The above description is only for the purpose of illustrating the preferred embodiments of the present disclosure and is not to be construed as limiting the present disclosure, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (11)

1. A node fault determination method is applied to cluster nodes and comprises the following steps:
sending a detection message to the outside through at least two ports, and receiving feedback messages sent by other cluster nodes in a cluster aiming at the detection message;
if a feedback message sent by one port of other cluster nodes is not received in a preset period, determining that the port has a fault;
if all ports of one other cluster node are determined to have faults, determining that the other cluster node is abnormal;
and if the election identification of the node per se meets the preset condition, the node serves as a monitoring node to alarm other failed cluster nodes.
2. The method of claim 1, wherein the election identifier is an Internet Protocol (IP) address, and the preset condition is an IP address maximum value or an IP address minimum value; or,
the election identification is an equipment identification, and the preset condition is an equipment identification maximum value or an equipment identification minimum value.
3. The method according to claim 1, wherein after the sending out the detection packet through the at least two ports and receiving the feedback packet sent by other cluster nodes in the cluster for the detection packet, the method further comprises:
and if the at least two ports do not receive the feedback messages sent by other cluster nodes, determining that the ports have faults.
4. The method of claim 1, wherein the sending the detection packet outside through the at least two ports and receiving the feedback packet sent by other cluster nodes in the cluster for the detection packet comprises:
generating a detection message through at least two threads, wherein one thread corresponds to one port;
based on a thread, sending the generated detection message to other cluster nodes in the cluster through a port corresponding to the thread;
and receiving feedback messages sent by other cluster nodes in the cluster through a port corresponding to a thread based on the thread.
5. The method according to any of claims 1-4, wherein the detection message and the feedback message are generated based on ICMP protocol.
6. A node fault determination device, applied to a cluster node, comprising:
an interaction unit, configured to send a detection message outward through at least two ports and to receive feedback messages sent by other cluster nodes in the cluster in response to the detection message;
a port fault detection unit, configured to determine that a port has a fault if no feedback message sent by that port is received within a preset period;
a node fault detection unit, configured to determine that another cluster node is abnormal if all ports of that cluster node are determined to have faults;
and an alarm unit, configured to act as the monitoring node and raise an alarm for the other cluster nodes that have failed if the node's own election identifier meets the preset condition.
7. The apparatus of claim 6, wherein the election identifier is an Internet Protocol (IP) address, and the preset condition is the maximum IP address value or the minimum IP address value; or,
the election identifier is a device identifier, and the preset condition is the maximum device identifier value or the minimum device identifier value.
8. The apparatus of claim 6, further comprising:
a self-checking unit, configured to determine that the node itself has a fault if none of the at least two ports receives a feedback message sent by other cluster nodes.
9. The apparatus according to claim 6, wherein sending the detection message outward through the at least two ports and receiving the feedback messages sent by other cluster nodes in the cluster in response to the detection message comprises:
generating detection messages through at least two threads, wherein each thread corresponds to one port;
for each thread, sending the detection message generated by the thread to other cluster nodes in the cluster through the port corresponding to the thread;
and for each thread, receiving the feedback messages sent by other cluster nodes in the cluster through the port corresponding to the thread.
10. The apparatus according to any one of claims 6-9, wherein the detection message and the feedback message are generated based on the Internet Control Message Protocol (ICMP).
11. A server, applied in a cluster, comprising: a processor, a machine-readable storage medium, and at least two ports;
wherein the machine-readable storage medium stores machine-executable instructions executable by the processor, and the machine-executable instructions cause the processor to carry out the method steps of any one of claims 1 to 5.
CN202110629101.5A 2021-06-07 2021-06-07 Node fault determination method and device and server Pending CN113542052A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110629101.5A CN113542052A (en) 2021-06-07 2021-06-07 Node fault determination method and device and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110629101.5A CN113542052A (en) 2021-06-07 2021-06-07 Node fault determination method and device and server

Publications (1)

Publication Number Publication Date
CN113542052A true CN113542052A (en) 2021-10-22

Family

ID=78124583

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110629101.5A Pending CN113542052A (en) 2021-06-07 2021-06-07 Node fault determination method and device and server

Country Status (1)

Country Link
CN (1) CN113542052A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116684256A (en) * 2023-08-01 2023-09-01 苏州浪潮智能科技有限公司 Node fault monitoring method, device and system, electronic equipment and storage medium

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101217402A (en) * 2008-01-15 2008-07-09 杭州华三通信技术有限公司 A method to enhance the reliability of the cluster and a high reliability communication node
CN102571635A (en) * 2012-01-18 2012-07-11 浪潮(北京)电子信息产业有限公司 Message transmission method and equipment
CN104506625A (en) * 2014-12-22 2015-04-08 国云科技股份有限公司 Method for improving reliability of metadata nodes of cloud databases
US20160147592A1 (en) * 2014-11-25 2016-05-26 Intel Corporation Header parity error handling
CN105700967A (en) * 2016-01-08 2016-06-22 华为技术有限公司 PCIe (Peripheral Component Interconnect Express) equipment and detection method thereof
CN106301853A (en) * 2015-06-05 2017-01-04 华为技术有限公司 The fault detection method of group system interior joint and device
CN106331098A (en) * 2016-08-23 2017-01-11 东方网力科技股份有限公司 Server cluster system
CN107360025A (en) * 2017-07-07 2017-11-17 郑州云海信息技术有限公司 A kind of distributed memory system cluster monitoring method and apparatus
CN107995029A (en) * 2017-11-28 2018-05-04 紫光华山信息技术有限公司 Elect control method and device, electoral machinery and device
CN108494585A (en) * 2018-02-28 2018-09-04 新华三技术有限公司 Elect control method and device
CN108769199A (en) * 2018-05-29 2018-11-06 郑州云海信息技术有限公司 A kind of distributed file storage system host node management method and device
CN109218141A (en) * 2018-11-20 2019-01-15 郑州云海信息技术有限公司 A kind of malfunctioning node detection method and relevant apparatus
CN111885097A (en) * 2020-06-01 2020-11-03 视联动力信息技术股份有限公司 Network card processing method and device, electronic equipment and storage medium

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101217402A (en) * 2008-01-15 2008-07-09 杭州华三通信技术有限公司 A method to enhance the reliability of the cluster and a high reliability communication node
CN102571635A (en) * 2012-01-18 2012-07-11 浪潮(北京)电子信息产业有限公司 Message transmission method and equipment
US20160147592A1 (en) * 2014-11-25 2016-05-26 Intel Corporation Header parity error handling
CN104506625A (en) * 2014-12-22 2015-04-08 国云科技股份有限公司 Method for improving reliability of metadata nodes of cloud databases
CN106301853A (en) * 2015-06-05 2017-01-04 华为技术有限公司 The fault detection method of group system interior joint and device
CN105700967A (en) * 2016-01-08 2016-06-22 华为技术有限公司 PCIe (Peripheral Component Interconnect Express) equipment and detection method thereof
CN106331098A (en) * 2016-08-23 2017-01-11 东方网力科技股份有限公司 Server cluster system
CN107360025A (en) * 2017-07-07 2017-11-17 郑州云海信息技术有限公司 A kind of distributed memory system cluster monitoring method and apparatus
CN107995029A (en) * 2017-11-28 2018-05-04 紫光华山信息技术有限公司 Elect control method and device, electoral machinery and device
CN108494585A (en) * 2018-02-28 2018-09-04 新华三技术有限公司 Elect control method and device
CN108769199A (en) * 2018-05-29 2018-11-06 郑州云海信息技术有限公司 A kind of distributed file storage system host node management method and device
CN109218141A (en) * 2018-11-20 2019-01-15 郑州云海信息技术有限公司 A kind of malfunctioning node detection method and relevant apparatus
CN111885097A (en) * 2020-06-01 2020-11-03 视联动力信息技术股份有限公司 Network card processing method and device, electronic equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116684256A (en) * 2023-08-01 2023-09-01 苏州浪潮智能科技有限公司 Node fault monitoring method, device and system, electronic equipment and storage medium
CN116684256B (en) * 2023-08-01 2023-11-03 苏州浪潮智能科技有限公司 Node fault monitoring method, device and system, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US6978302B1 (en) Network management apparatus and method for identifying causal events on a network
CA2493525C (en) Method and apparatus for outage measurement
US7991867B2 (en) Server checking using health probe chaining
US6813634B1 (en) Network fault alerting system and method
US7756046B2 (en) Apparatus and method for locating trouble occurrence position in communication network
US8134928B1 (en) Technique for identifying a failed network interface card within a team of network interface cards
US7545741B1 (en) Technique for identifying a failed network interface card within a team of network interface cards
WO2005062698A2 (en) System and method for managing protocol network failures in a cluster system
CN110908872B (en) Method and system for detecting server state
JP2006501717A (en) Telecom network element monitoring
CN104883360A (en) ARP spoofing fine-grained detecting method and system
CN104901953A (en) Distributed detection method and system for ARP (Address Resolution Protocol) cheating
US8559317B2 (en) Alarm threshold for BGP flapping detection
CN113542052A (en) Node fault determination method and device and server
CN110932921B (en) Method for determining route oscillation information and related equipment thereof
CN106713038B (en) remote transmission line quality detection method and system
WO2012070274A1 (en) Communication system and network malfunction detection method
CN114363150B (en) Network card connectivity monitoring method and device of server cluster
KR101207219B1 (en) Method for protecting DDS network overload
JP2002318735A (en) Method for monitoring abnormality in communication terminal of specified class on network, network managing system and network managing program
US8463940B2 (en) Method of indicating a path in a computer network
CN112600733A (en) Computer health monitoring method
CN112636999A (en) Port detection method and network monitoring system
CN106603334B (en) A kind of IP address monitoring method and device
KR100279660B1 (en) Redundancy Monitoring of Fault Monitoring Devices Using Internet Control Message Protocol (ICMP)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211022