US20090092054A1 - Method for providing notifications of a failing node to other nodes within a computer network - Google Patents

Method for providing notifications of a failing node to other nodes within a computer network

Info

Publication number
US20090092054A1
Authority
US
United States
Prior art keywords
node
nodes
notification packet
error notification
error
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/869,370
Inventor
Matthew C. Compton
Andrew G. Hourselt
Michael R. Maletich
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Priority to US11/869,370
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION. Assignment of assignors interest (see document for details). Assignors: HOURSELT, ANDREW G., COMPTON, MATTHEW C., MALETICH, MICHAEL R.
Publication of US20090092054A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/02 Topology update or discovery
    • H04L45/20 Hop count for routing purposes, e.g. TTL
    • H04L45/28 Routing or path finding of packets in data switching networks using route fault recovery
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/40 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass for recovering from a failure of a protocol instance or entity, e.g. service redundancy protocols, protocol state redundancy or protocol service redirection

Landscapes

  • Engineering & Computer Science
  • Computer Networks & Wireless Communication
  • Signal Processing
  • Computer Security & Cryptography
  • Data Exchanges In Wide-Area Networks

Abstract

A method for providing failure notifications to dependent nodes within a computer network is disclosed. A first node monitors data traffic within a computer network. If the data traffic includes data exchanged between the first node and a second node, the first node adds the second node to a list of interested nodes stored within the first node. If the first node experiences an error, the first node generates an error notification packet that includes a hop limit value that corresponds to a pre-defined level of nodes within the computer network that the error notification packet may propagate. The first node sends the error notification packet with the hop limit value to the second node and other nodes within the list of interested nodes. After receiving the error notification packet, the second node decrements the hop limit, performs one or more actions, and if the hop limit value is greater than zero, the second node also forwards the error notification packet to each node within its list of interested nodes.

Description

    BACKGROUND OF THE INVENTION
  • 1. Technical Field
  • The present invention relates to computer networks in general, and more particularly, to a method for providing notifications of a failing node to other nodes within a computer network.
  • 2. Description of Related Art
  • High-availability computer networks typically include multiple interconnected nodes (or computer systems). Since the processing load of a computer network may be distributed across multiple nodes, the nodes within a high-availability computer network are becoming increasingly interdependent. If one node within a computer network experiences a failure, the problem can impair the performance of other nodes within the computer network.
  • In a conventional high-availability computer network, a failing node is aware of its own failure and can send a failure notification to service personnel when a problem occurs. However, a node that depends on the failing node will continue to operate normally (i.e., without any knowledge of the failure) until it attempts to contact the failing node. Upon learning of the failure, the dependent node must handle the unexpected failure in a reactive manner. Furthermore, the dependent node typically does not have the ability to determine the details of a failure occurring on another node. Thus, considerable time and resources can be spent determining the cause, severity, and potential corrective actions for a failing node.
  • Consequently, it would be desirable to provide an improved method for supplying notifications of a failing node to other nodes within a computer network.
  • SUMMARY OF THE INVENTION
  • In accordance with a preferred embodiment of the present invention, a first node monitors data traffic within a computer network. If the data traffic includes data exchanged between the first node and a second node, the first node adds the second node to a list of interested nodes stored within the first node. If the first node experiences an error, the first node generates an error notification packet that includes a hop limit value that corresponds to a pre-defined level of nodes within the computer network that the error notification packet may propagate. The first node sends the error notification packet with the hop limit value to the second node and other nodes within the list of interested nodes. After receiving the error notification packet, the second node decrements the hop limit, performs one or more actions, and if the hop limit value is greater than zero, the second node also forwards the error notification packet to each node within its list of interested nodes.
  • All features and advantages of the present invention will become apparent in the following detailed written description.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The invention itself, as well as a preferred mode of use, further objects, and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:
  • FIG. 1 is a block diagram of a computer network in which a preferred embodiment of the present invention is incorporated;
  • FIG. 2A is a diagram of a memory of a node within the computer network of FIG. 1, in accordance with a preferred embodiment of the present invention;
  • FIG. 2B is a diagram of an error notification packet, in accordance with a preferred embodiment of the present invention;
  • FIG. 3A is a high-level logic flow diagram of a method for generating lists of interested nodes within the computer network of FIG. 1, in accordance with a preferred embodiment of the present invention;
  • FIG. 3B is a high-level logic flow diagram of a method for providing notifications of a failing node to other nodes within the computer network of FIG. 1, in accordance with a preferred embodiment of the present invention; and
  • FIG. 3C is a high-level logic flow diagram of a method for reacting to notifications of a failing node within the computer network of FIG. 1, in accordance with a preferred embodiment of the present invention.
  • DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT
  • With reference now to the drawings, and in particular to FIG. 1, there is depicted a block diagram of a computer network in which a preferred embodiment of the present invention is incorporated. As shown, a computer network 100 includes multiple nodes 105A through 105G. As utilized herein, a node refers to a computer system or other processing device within computer network 100. Computer network 100 may be a local-area network (LAN), a wide-area network (WAN), or a distributed network, such as the Internet.
  • For the present embodiment, each of nodes 105A-105G within computer network 100 is similarly configured and includes a processor, a memory, and an input/output (I/O) interface. For example, node 105A includes a processor 10A, which is coupled to a memory 115A and an I/O interface 120A. I/O interface 120A enables node 105A to communicate with one or more other nodes, such as node 105B and node 105C, within computer network 100.
  • With reference now to FIG. 2A, there is illustrated a block diagram of memory 115A within node 105A from FIG. 1, in accordance with a preferred embodiment of the present invention. As shown, memory 115A includes a list of interested nodes 200 and a hop limit counter 205. Interested nodes list 200 includes one or more nodes that node 105A has previously communicated with (i.e., sent data to and/or received data from). Node 105A updates interested nodes list 200 according to the process illustrated in FIG. 3A, which will be discussed in detail below.
  • With the present invention, a node that experiences an error sends an error notification packet to one or more interested nodes, each of which may in turn send its own error notification packet to the nodes on its own list of interested nodes. A hop limit counter, such as hop limit counter 205, contains a pre-defined value that determines how far out within a computer network an error notification packet will propagate, and each error notification packet contains the value from the hop limit counter of the node that sends the error notification packet.
  • For example, if node 105A experiences an error, node 105A will send an error notification packet to other nodes. Since interested nodes list 200 includes node B, node C, node E, and node N, node 105A will send an error notification packet to nodes B, C, E, and N, each of which will, in turn, send its own error notification packet to other nodes according to its respective interested nodes list. Since the value within hop limit counter 205 is 1, the error notification packet can only propagate to exactly one more level of nodes, and each of nodes B, C, E, and N will only forward its own error notification packet to nodes on its interested nodes list.
  • Referring now to FIG. 2B, there is illustrated a block diagram of an error notification packet, in accordance with a preferred embodiment of the present invention. As shown, an error notification packet 210 includes an error location field 215, an error type field 220, an error status field 225, and a hop limit value field 230. Error location field 215 contains the node from which error notification packet 210 was generated. Error type field 220 provides information corresponding to the nature of the error (e.g., hardware failure, software failure, connectivity failure, or data integrity error). Error status field 225 provides information corresponding to the status of the error (e.g., unresolved, repair in progress, or resolved). Hop limit value field 230 includes a hop limit value from the hop limit counter of a sending node. A node may send an initial error notification packet when an error occurs, and the node may subsequently send a second error notification packet after the error has been resolved.
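The four fields of error notification packet 210 can be sketched as a small record type. This is only an illustrative sketch: the class and member names, the string values, and the use of a node name as the error location are assumptions made for the example and are not specified in the description.

```python
from dataclasses import dataclass, replace
from enum import Enum


class ErrorType(Enum):
    # Example values taken from the description of error type field 220.
    HARDWARE_FAILURE = "hardware failure"
    SOFTWARE_FAILURE = "software failure"
    CONNECTIVITY_FAILURE = "connectivity failure"
    DATA_INTEGRITY_ERROR = "data integrity error"


class ErrorStatus(Enum):
    # Example values taken from the description of error status field 225.
    UNRESOLVED = "unresolved"
    REPAIR_IN_PROGRESS = "repair in progress"
    RESOLVED = "resolved"


@dataclass(frozen=True)
class ErrorNotificationPacket:
    error_location: str        # field 215: node that generated the packet
    error_type: ErrorType      # field 220: nature of the error
    error_status: ErrorStatus  # field 225: status of the error
    hop_limit: int             # field 230: copied from the sender's hop limit counter


# An initial packet sent when the error occurs (hypothetical values) ...
initial = ErrorNotificationPacket(
    "105A", ErrorType.HARDWARE_FAILURE, ErrorStatus.UNRESOLVED, hop_limit=1
)
# ... and a second packet sent after the error has been resolved.
followup = replace(initial, error_status=ErrorStatus.RESOLVED)
```

Modeling the follow-up as a copy with only the status changed reflects the statement that a node may send a second packet once the error has been resolved; whether the other fields stay the same in practice is an assumption.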
  • Referring now to FIG. 3A, there is illustrated a high-level logic flow diagram of a method for generating lists of interested nodes within a computer network, in accordance with a preferred embodiment of the present invention. Starting at block 300, a node (such as node 105A from FIG. 1) monitors data traffic in a computer network, as depicted in block 305. A determination is then made whether or not the node has detected data traffic to and/or from another node, as shown in block 310. If the node has not detected any data traffic to and/or from another node, the process returns to block 305 to continue monitoring data traffic. Otherwise, if the node has detected data traffic to and/or from another node, the node adds the node corresponding to the data traffic to a list of interested nodes (such as interested nodes list 200 from FIG. 2A), as depicted in block 315, and the process terminates at block 317.
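The list-building step of FIG. 3A can be sketched as follows. The `Node` class, the `observe_traffic` method, and the representation of traffic as (source, destination) name pairs are hypothetical choices for illustration; the description does not say how traffic is observed or how the list is stored.

```python
class Node:
    """Minimal sketch of a node that maintains a list of interested nodes."""

    def __init__(self, name):
        self.name = name
        # A set is used so that repeated traffic with the same peer
        # adds that peer only once (an assumption of this sketch).
        self.interested_nodes = set()

    def observe_traffic(self, source, destination):
        """Record the peer if the traffic is to or from this node."""
        if source == self.name and destination != self.name:
            self.interested_nodes.add(destination)  # data sent to a peer
        elif destination == self.name and source != self.name:
            self.interested_nodes.add(source)       # data received from a peer
        # Otherwise the traffic does not involve this node; keep monitoring.


node_a = Node("105A")
observed = [("105A", "105B"), ("105C", "105A"), ("105D", "105E"), ("105A", "105B")]
for src, dst in observed:
    node_a.observe_traffic(src, dst)
```

After this run, only the two peers that actually exchanged data with node 105A are on its list; traffic between 105D and 105E is ignored.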
  • Referring now to FIG. 3B, there is illustrated a high-level logic flow diagram of a method for providing notifications of a failing node to other nodes within a computer network, in accordance with a preferred embodiment of the present invention. Starting at block 319, a determination is made whether or not a node has detected any error within its own operation (after the node has performed a local health check), as shown in block 320. If the node has not detected any errors within its own operation (i.e., the node is operating normally), the process returns to block 320. Otherwise, if the node has detected one or more errors within its own operation, the node generates an error notification packet (such as error notification packet 210 from FIG. 2B) having a hop limit value, and the node sends the error notification packet to each node on the list of interested nodes, as shown in block 325. The process subsequently terminates at block 327.
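A minimal sketch of the sending side of FIG. 3B is shown below. The function name, the dictionary packet layout, and the `health_check` and `send` callables are hypothetical stand-ins; the description does not define how the health check reports an error or how packets are transmitted.

```python
def notify_on_error(node_name, interested_nodes, hop_limit_counter, health_check, send):
    """Run one health check and notify interested nodes if an error is found.

    health_check() returns None when the node is healthy, or an
    (error_type, error_status) tuple when an error is detected (assumed shape).
    send(peer, packet) delivers a packet to a peer (assumed transport hook).
    Returns the number of packets sent.
    """
    error = health_check()
    if error is None:
        return 0  # operating normally; the caller would check again later

    error_type, error_status = error
    packet = {
        "error_location": node_name,
        "error_type": error_type,
        "error_status": error_status,
        "hop_limit": hop_limit_counter,  # value taken from the sender's counter
    }
    for peer in sorted(interested_nodes):
        send(peer, dict(packet))  # each peer gets its own copy
    return len(interested_nodes)


sent = []
count = notify_on_error(
    "105A",
    {"B", "C", "E", "N"},
    1,
    health_check=lambda: ("hardware failure", "unresolved"),
    send=lambda peer, packet: sent.append((peer, packet)),
)
```

In a real node the check would run repeatedly, as the loop back to block 320 suggests; the sketch performs a single pass so that it stays easy to test.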
  • Referring now to FIG. 3C, there is illustrated a high-level logic flow diagram of a method for reacting to notifications of a failing node within a computer network, in accordance with a preferred embodiment of the present invention. The process begins at block 328. Each node on the list of interested nodes receives an error notification packet sent by a failing node, decrements the hop limit value of the error notification packet by one, and performs one or more actions based on factors that include the values of error type and error status, as depicted in block 330. Possible actions that can be performed by a node that receives an error notification packet may include, but are not limited to, the following:
      • a. calling a central service center on behalf of the malfunctioning node (e.g., if the malfunctioning node is experiencing a connectivity error);
      • b. forwarding the error notification packet to all nodes within the list of interested nodes on behalf of the malfunctioning node (e.g., if a grid connection or some other component of a distributed network is down);
      • c. sharing one or more resources with the malfunctioning node (e.g., if the notified node includes a duplicate copy of a database that has become corrupted in the malfunctioning node); and/or
      • d. entering a read-only and/or off-line state for a pre-defined time period (e.g., if the failure may impair the data integrity of neighboring nodes).
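One way a receiving node might choose among actions a through d is sketched below. The policy is purely hypothetical: the description only gives examples of when each action could apply, so the mapping, the action names, the `has_replica` flag, and the decision to take no action for a resolved error are all assumptions of this sketch.

```python
def choose_actions(error_type, error_status, has_replica=False):
    """Return a list of action names for a received error notification.

    error_type and error_status are the strings used as examples in the
    description; has_replica is a hypothetical flag meaning the notified
    node holds a duplicate copy of data held by the failing node.
    """
    if error_status == "resolved":
        return []  # assumption: nothing to do once the error is resolved

    actions = []
    if error_type == "connectivity failure":
        # The failing node may be unable to reach anyone else itself.
        actions.append("call_service_center")   # action a
        actions.append("forward_on_behalf")     # action b
    elif error_type == "data integrity error":
        if has_replica:
            actions.append("share_resources")   # action c
        actions.append("enter_read_only")       # action d
    return actions


connectivity = choose_actions("connectivity failure", "unresolved")
integrity = choose_actions("data integrity error", "repair in progress", has_replica=True)
```

Other error types return an empty list here only because the description gives no example for them; a real policy would likely cover them as well.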
  • Next, a determination is made whether or not the node that received the error notification packet has previously received the error notification packet, as shown in block 332. If the node that received the error notification packet has previously received the error notification packet, the process terminates at block 345. Otherwise, if the node that received the error notification packet has not previously received the error notification packet, another determination is made whether or not the hop limit value included in the error notification packet is greater than 0, as shown in block 335. If the hop limit value is not greater than 0, the node that received the error notification packet will not forward the error notification packet, and the process terminates at block 345. Otherwise, if the hop limit value is greater than 0, the node that received the error notification packet forwards the error notification packet to each node on its corresponding list of interested nodes, as depicted in block 340, and the process returns to block 330. As mentioned above, the maximum number of error notification packets that can be forwarded to other nodes is dictated by the value of the hop limit value in the first error notification packet.
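The receive-and-forward logic of FIG. 3C can be simulated over a small topology. This sketch assumes the literal order given above (decrement the hop limit, check for a previous receipt, then forward only if the value is still greater than 0) and additionally assumes that the originating node ignores its own notification when it comes back. The function name and the dictionary-of-lists topology are hypothetical.

```python
from collections import deque


def propagate(interested, origin, hop_limit):
    """Return the nodes that receive the notification, in order of first receipt.

    interested maps each node name to the names on its interested nodes list.
    """
    seen = {origin}  # assumption: the origin does not react to its own packet
    notified = []
    # The origin sends one packet, carrying hop_limit, to each interested node.
    queue = deque((peer, hop_limit) for peer in sorted(interested.get(origin, ())))

    while queue:
        node, hops = queue.popleft()
        hops -= 1                # the receiver decrements the hop limit value
        if node in seen:         # previously received: stop processing
            continue
        seen.add(node)
        notified.append(node)    # first receipt; the node's actions would run here
        if hops > 0:             # forward only while the hop limit is above 0
            for peer in sorted(interested.get(node, ())):
                queue.append((peer, hops))
    return notified


# Hypothetical topology: A talks to B and C, both talk to D, and D talks to E.
topology = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
reach_1 = propagate(topology, "A", 1)
reach_2 = propagate(topology, "A", 2)
reach_3 = propagate(topology, "A", 3)
```

Under these assumptions each extra unit of hop limit extends the notification by one level of nodes, and the previous-receipt check keeps D from being processed twice even though both B and C forward to it.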
  • As has been described, the present invention provides an improved method for providing notifications of a failing node to other nodes within a computer network.
  • While an illustrative embodiment of the present invention has been described in the context of a fully functional storage system, those skilled in the art will appreciate that the software aspects of an illustrative embodiment of the present invention are capable of being distributed as a program product in a variety of forms, and that an illustrative embodiment of the present invention applies equally regardless of the particular type of media used to actually carry out the distribution. Examples of the types of media include recordable type media such as thumb drives, floppy disks, hard drives, CD ROMs, DVDs, and transmission type media such as digital and analog communication links.
  • While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.

Claims (6)

1. A method for providing notifications of a failing node to other nodes within a computer network, said method comprising:
generating an interested node list in a node, wherein said interested node list includes any other node that has previously communicated with said node;
in response to a determination that said node is experiencing an error, sending an error notification packet from said node to each node on said interested nodes list; and
after the receipt of said error notification packet, performing one or more actions by a node on said interested nodes list.
2. The method of claim 1, wherein said method further includes forwarding said error notification packet by said node on said interested nodes list to a node on a local interested nodes list stored within said node on said interested nodes list according to a hop limit value, wherein said hop limit value corresponds to a pre-defined level of nodes within said computer network that said error notification packet may propagate, wherein said hop limit is decremented by said node on said interested nodes list.
3. The method of claim 1, wherein said error notification packet includes a hop limit value field for containing a hop limit value from a hop limit counter of said node.
4. The method of claim 1, wherein nature of error includes hardware failure, software failure, connectivity failure, or data integrity error.
5. The method of claim 1, wherein status of error field includes unresolved, repair in progress, or resolved.
6. The method of claim 1, wherein said one or more actions include:
calling a central service center on behalf of said node;
forwarding said error notification packet to all nodes on said interested nodes list on behalf of said node;
sharing one or more resources with said node;
entering a read-only state for a first pre-defined time period; and
entering an offline state for a second pre-defined time period.

Priority Applications (1)

Application Number Priority Date Filing Date Title
US11/869,370 US20090092054A1 (en) 2007-10-09 2007-10-09 Method for providing notifications of a failing node to other nodes within a computer network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US11/869,370 US20090092054A1 (en) 2007-10-09 2007-10-09 Method for providing notifications of a failing node to other nodes within a computer network

Publications (1)

Publication Number Publication Date
US20090092054A1 true US20090092054A1 (en) 2009-04-09

Family

ID=40523153

Family Applications (1)

Application Number Title Priority Date Filing Date
US11/869,370 Abandoned US20090092054A1 (en) 2007-10-09 2007-10-09 Method for providing notifications of a failing node to other nodes within a computer network

Country Status (1)

Country Link
US (1) US20090092054A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5687308A (en) * 1995-06-07 1997-11-11 Tandem Computers Incorporated Method to improve tolerance of non-homogeneous power outages
US5835482A (en) * 1995-09-22 1998-11-10 Mci Communications Corporation Communication system and method providing optimal restoration of failed paths

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100153770A1 (en) * 2008-12-16 2010-06-17 Industrial Technology Research Institute Real-time image monitoring and recording system and method
US8341682B2 (en) * 2008-12-16 2012-12-25 Industrial Technology Research Institute Real-time image monitoring and recording system and method
US8990631B1 (en) * 2011-03-03 2015-03-24 Netlogic Microsystems, Inc. Packet format for error reporting in a content addressable memory
US20130107724A1 (en) * 2011-10-31 2013-05-02 Itron, Inc Quick advertisement of a failure of a network cellular router
WO2013066378A1 (en) * 2011-10-31 2013-05-10 Itron, Inc. Quick advertisement of a failure of a network cellular router
US9007923B2 (en) * 2011-10-31 2015-04-14 Itron, Inc. Quick advertisement of a failure of a network cellular router
US20150172152A1 (en) * 2013-12-12 2015-06-18 International Business Machines Corporation Alerting Service Desk Users of Business Services Outages
US20150347219A1 (en) * 2013-12-12 2015-12-03 International Business Machines Corporation Alerting Service Desk Users of Business Services Outages
US9830212B2 (en) * 2013-12-12 2017-11-28 International Business Machines Corporation Alerting service desk users of business services outages
US9921901B2 (en) * 2013-12-12 2018-03-20 International Business Machines Corporation Alerting service desk users of business services outages
US20170302504A1 (en) * 2015-01-05 2017-10-19 Huawei Technologies Co., Ltd. Method for Processing Forwarding Device Fault, Device, and Controller
US10756958B2 (en) * 2015-01-05 2020-08-25 Huawei Technologies Co., Ltd. Method, device, and controller for processing forwarding device faults received from forwarding devices on a forwarding path
US11496355B2 (en) 2015-01-05 2022-11-08 Huawei Technologies Co., Ltd. Method for processing forwarding device fault, device, and controller
US11075829B2 (en) * 2018-11-30 2021-07-27 Sap Se Distributed monitoring in clusters with self-healing
US11438250B2 (en) 2018-11-30 2022-09-06 Sap Se Distributed monitoring in clusters with self-healing

Similar Documents

Publication Publication Date Title
US20090092054A1 (en) Method for providing notifications of a failing node to other nodes within a computer network
US9819733B2 (en) Peer-to-peer exchange of data resources in a control system
US9917741B2 (en) Method and system for processing network activity data
US20070078809A1 (en) Robust data availability system having decentralized storage and multiple access paths
JP2005209190A (en) Reporting of multi-state status for high-availability cluster node
JP2001249856A (en) Method for processing error in storage area network(san) and data processing system
JP2004086792A (en) Obstruction information collecting program and obstruction information collecting device
US9231779B2 (en) Redundant automation system
US10732873B1 (en) Timeout mode for storage devices
CN112217847A (en) Micro service platform, implementation method thereof, electronic device and storage medium
JP3924247B2 (en) Software-based fault-tolerant network using a single LAN
US11563671B2 (en) Routing engine switchover based on health determined by support vector machine
US20050022048A1 (en) Fault tolerance in networks
CN115550287B (en) Method for establishing remote copy relationship and related device
JP2011203941A (en) Information processing apparatus, monitoring method and monitoring program
JP5922127B2 (en) Fault processing method, computer-readable storage medium, and computer system
JP7474168B2 (en) Monitoring system and fault monitoring method
JP2017028539A (en) Communication device, control device and communication system
JP2006260400A (en) Method of monitoring computer device condition
Matić et al. Health monitoring and auto-scaling RabbitMQ queues within the smart home system
JP6670877B2 (en) Failure determination device, failure determination system, failure determination method, and program
EP3355530A1 (en) Method, apparatus and device for processing service failure
JP4863984B2 (en) Monitoring processing program, method and apparatus
JP2007272328A (en) Computer system
US11947431B1 (en) Replication data facility failure detection and failover automation

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:COMPTON, MATTHEW C.;HOURSELT, ANDREW G.;MALETICH, MICHAEL R.;REEL/FRAME:019935/0491;SIGNING DATES FROM 20071008 TO 20071009

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION