US20090092054A1 - Method for providing notifications of a failing node to other nodes within a computer network - Google Patents
- Publication number: US20090092054A1
- Authority: US (United States)
- Prior art keywords: node, nodes, notification packet, error notification, error
- Legal status: Abandoned (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- H: ELECTRICITY > H04: ELECTRIC COMMUNICATION TECHNIQUE > H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
  - H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    - H04L41/06: Management of faults, events, alarms or notifications
      - H04L41/069: Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
  - H04L45/00: Routing or path finding of packets in data switching networks
    - H04L45/02: Topology update or discovery
    - H04L45/20: Hop count for routing purposes, e.g. TTL
    - H04L45/28: Routing or path finding of packets in data switching networks using route fault recovery
  - H04L69/00: Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    - H04L69/40: Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass for recovering from a failure of a protocol instance or entity, e.g. service redundancy protocols, protocol state redundancy or protocol service redirection
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer Security & Cryptography (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
A method for providing failure notifications to dependent nodes within a computer network is disclosed. A first node monitors data traffic within a computer network. If the data traffic includes data exchanged between the first node and a second node, the first node adds the second node to a list of interested nodes stored within the first node. If the first node experiences an error, the first node generates an error notification packet that includes a hop limit value corresponding to a pre-defined level of nodes within the computer network to which the error notification packet may propagate. The first node sends the error notification packet with the hop limit value to the second node and other nodes within the list of interested nodes. After receiving the error notification packet, the second node decrements the hop limit value, performs one or more actions, and, if the hop limit value is greater than zero, also forwards the error notification packet to each node within its own list of interested nodes.
Description
- 1. Technical Field
- The present invention relates to computer networks in general, and more particularly, to a method for providing notifications of a failing node to other nodes within a computer network.
- 2. Description of Related Art
- High-availability computer networks typically include multiple interconnected nodes (or computer systems). Since the processing load of a computer network may be distributed across multiple nodes, the nodes within a high-availability computer network are becoming increasingly interdependent. If one node within a computer network experiences a failure, the problem can impair the performance of other nodes within the computer network.
- In a conventional high-availability computer network, a failing node is aware of its own failure and can send a failure notification to service personnel when a problem occurs. However, a node that depends on the failing node will continue to operate normally (i.e., without any knowledge of the failure) until it attempts to contact the failing node. Upon learning of the failure, the dependent node must handle the unexpected failure in a reactive manner. Furthermore, the dependent node typically cannot determine the details of a failure occurring on another node. Thus, a large amount of time and resources may be spent determining the cause, severity, and potential corrective actions for a failing node.
- Consequently, it would be desirable to provide an improved method for supplying notifications of a failing node to other nodes within a computer network.
- In accordance with a preferred embodiment of the present invention, a first node monitors data traffic within a computer network. If the data traffic includes data exchanged between the first node and a second node, the first node adds the second node to a list of interested nodes stored within the first node. If the first node experiences an error, the first node generates an error notification packet that includes a hop limit value corresponding to a pre-defined level of nodes within the computer network to which the error notification packet may propagate. The first node sends the error notification packet with the hop limit value to the second node and other nodes within the list of interested nodes. After receiving the error notification packet, the second node decrements the hop limit value, performs one or more actions, and, if the hop limit value is greater than zero, also forwards the error notification packet to each node within its own list of interested nodes.
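The originating side of the method summarized above (building the list of interested nodes, then sending an error notification packet) can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the class and method names (`Node`, `observe_traffic`, `report_error`) are hypothetical, node names reuse the FIG. 1 labels, the packet fields follow the FIG. 2B description given in the detailed description, and "sending" is modelled by returning (destination, packet) pairs rather than by any real network transport.

```python
from dataclasses import dataclass, field
from enum import Enum


class ErrorType(Enum):
    """Example values for error type field 220 (as listed for FIG. 2B)."""
    HARDWARE = "hardware failure"
    SOFTWARE = "software failure"
    CONNECTIVITY = "connectivity failure"
    DATA_INTEGRITY = "data integrity error"


class ErrorStatus(Enum):
    """Example values for error status field 225."""
    UNRESOLVED = "unresolved"
    REPAIR_IN_PROGRESS = "repair in progress"
    RESOLVED = "resolved"


@dataclass(frozen=True)
class ErrorNotificationPacket:
    error_location: str        # field 215: node that generated the packet
    error_type: ErrorType      # field 220
    error_status: ErrorStatus  # field 225
    hop_limit: int             # field 230: copied from the sender's hop limit counter


@dataclass
class Node:
    name: str
    hop_limit_counter: int = 1  # hop limit counter 205 (pre-defined value)
    interested_nodes: list[str] = field(default_factory=list)  # list 200

    def observe_traffic(self, src: str, dst: str) -> None:
        """FIG. 3A sketch: add the peer of any traffic to or from this node."""
        if src == self.name and dst != self.name:
            peer = dst
        elif dst == self.name and src != self.name:
            peer = src
        else:
            return  # traffic that does not involve this node is ignored
        if peer not in self.interested_nodes:
            self.interested_nodes.append(peer)

    def report_error(self, error_type: ErrorType,
                     status: ErrorStatus = ErrorStatus.UNRESOLVED):
        """FIG. 3B sketch: build one packet and address it to every interested node.

        Returns (destination, packet) pairs instead of using a real transport.
        """
        packet = ErrorNotificationPacket(self.name, error_type, status,
                                         self.hop_limit_counter)
        return [(peer, packet) for peer in self.interested_nodes]
```

For example, after `observe_traffic("105A", "105B")` and `observe_traffic("105C", "105A")`, a `Node("105A")` holds the interested nodes `["105B", "105C"]`, and `report_error(ErrorType.HARDWARE)` addresses one packet with hop limit 1 to each of them. Keeping the list free of duplicates means repeated traffic with the same peer does not lead to repeated notifications.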
- All features and advantages of the present invention will become apparent in the following detailed written description.
- The invention itself, as well as a preferred mode of use, further objects, and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:
- FIG. 1 is a block diagram of a computer network in which a preferred embodiment of the present invention is incorporated;
- FIG. 2A is a diagram of a memory of a node within the computer network of FIG. 1, in accordance with a preferred embodiment of the present invention;
- FIG. 2B is a diagram of an error notification packet, in accordance with a preferred embodiment of the present invention;
- FIG. 3A is a high-level logic flow diagram of a method for generating lists of interested nodes within the computer network of FIG. 1, in accordance with a preferred embodiment of the present invention;
- FIG. 3B is a high-level logic flow diagram of a method for providing notifications of a failing node to other nodes within the computer network of FIG. 1, in accordance with a preferred embodiment of the present invention; and
- FIG. 3C is a high-level logic flow diagram of a method for reacting to notifications of a failing node within the computer network of FIG. 1, in accordance with a preferred embodiment of the present invention.
- With reference now to the drawings, and in particular to FIG. 1, there is depicted a block diagram of a computer network in which a preferred embodiment of the present invention is incorporated. As shown, a computer network 100 includes multiple nodes 105A through 105G. As utilized herein, a node refers to a computer system or other processing device within computer network 100. Computer network 100 may be a local-area network (LAN), a wide-area network (WAN), or a distributed network, such as the Internet.
- For the present embodiment, each of nodes 105A-105G within computer network 100 is similarly configured and includes a processor, a memory, and an input/output (I/O) interface. For example, node 105A includes a processor 110A, which is coupled to a memory 115A and an I/O interface 120A. I/O interface 120A enables node 105A to communicate with one or more other nodes, such as node 105B and node 105C, within computer network 100.
- With reference now to FIG. 2A, there is illustrated a block diagram of memory 115A within node 105A from FIG. 1, in accordance with a preferred embodiment of the present invention. As shown, memory 115A includes a list of interested nodes 200 and a hop limit counter 205. Interested nodes list 200 includes one or more nodes that node 105A has previously communicated with (i.e., sent data to and/or received data from). Node 105A updates interested nodes list 200 according to the process illustrated in FIG. 3A, which will be discussed in detail below.
- With the present invention, a node that experiences an error sends an error notification packet to one or more interested nodes, each of which may in turn send its own error notification packet to the nodes on its own list of interested nodes. A hop limit counter, such as hop limit counter 205, contains a pre-defined value that determines how far an error notification packet will propagate within a computer network, and each error notification packet contains the value from the hop limit counter of the node that sends it.
- For example, if node 105A experiences an error, node 105A will send an error notification packet to other nodes. Since interested nodes list 200 includes node B, node C, node E, and node N, node 105A will send an error notification packet to nodes B, C, E, and N, each of which will, in turn, send its own error notification packet to other nodes according to its respective interested nodes list. Since the value within hop limit counter 205 is 1, the error notification packet can only propagate to exactly one more level of nodes, and each of nodes B, C, E, and N will only forward its own error notification packet to nodes on its interested nodes list.
- Referring now to FIG. 2B, there is illustrated a block diagram of an error notification packet, in accordance with a preferred embodiment of the present invention. As shown, an error notification packet 210 includes an error location field 215, an error type field 220, an error status field 225, and a hop limit value field 230. Error location field 215 identifies the node from which error notification packet 210 was generated. Error type field 220 provides information on the nature of the error (e.g., hardware failure, software failure, connectivity failure, or data integrity error). Error status field 225 provides information on the status of the error (e.g., unresolved, repair in progress, or resolved). Hop limit value field 230 includes a hop limit value from the hop limit counter of the sending node. A node may send an initial error notification packet when an error occurs, and may subsequently send a second error notification packet after the error has been resolved.
- Referring now to FIG. 3A, there is illustrated a high-level logic flow diagram of a method for generating lists of interested nodes within a computer network, in accordance with a preferred embodiment of the present invention. Starting at block 300, a node (such as node 105A from FIG. 1) monitors data traffic in a computer network, as depicted in block 305. A determination is then made whether or not the node has detected data traffic to and/or from another node, as shown in block 310. If the node has not detected any data traffic to and/or from another node, the process returns to block 305 to continue monitoring data traffic. Otherwise, if the node has detected data traffic to and/or from another node, the node adds that other node to a list of interested nodes (such as interested nodes list 200 from FIG. 2A), as depicted in block 315, and the process terminates at block 317.
- Referring now to FIG. 3B, there is illustrated a high-level logic flow diagram of a method for providing notifications of a failing node to other nodes within a computer network, in accordance with a preferred embodiment of the present invention. Starting at block 319, a determination is made whether or not a node has detected any error within its own operation (after the node has performed a local health check), as shown in block 320. If the node has not detected any errors within its own operation (i.e., the node is operating normally), the process returns to block 320. Otherwise, if the node has detected one or more errors within its own operation, the node generates an error notification packet (such as error notification packet 210 from FIG. 2B) having a hop limit value, and sends the error notification packet to each node on its list of interested nodes, as shown in block 325. The process subsequently terminates at block 327.
- Referring now to FIG. 3C, there is illustrated a high-level logic flow diagram of a method for reacting to notifications of a failing node within a computer network, in accordance with a preferred embodiment of the present invention. The process begins at block 328. Each node on the list of interested nodes receives an error notification packet sent by a failing node, decrements the hop limit value of the error notification packet by one, and performs one or more actions based on factors that include the values of the error type and error status fields, as depicted in block 330. Possible actions that can be performed by a node that receives an error notification packet include, but are not limited to, the following:
- a. calling a central service center on behalf of the malfunctioning node (e.g., if the malfunctioning node is experiencing a connectivity error);
- b. forwarding the error notification packet to all nodes within the list of interested nodes on behalf of the malfunctioning node (e.g., if a grid connection or some other component of a distributed network is down);
- c. sharing one or more resources with the malfunctioning node (e.g., if the notified node includes a duplicate copy of a database that has become corrupted in the malfunctioning node); and/or
- d. entering a read-only and/or off-line state for a pre-defined time period (e.g., if the failure may impair the data integrity of neighboring nodes).
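The list above pairs each action with an example trigger, and block 330 bases the choice on factors that include the error type and error status. One way a receiving node might turn those fields into actions is a small policy function. The mapping below is a hypothetical illustration, not a rule stated in the patent: in particular, ignoring packets whose status is "resolved", mapping data integrity errors to a read-only state, and the `holds_replica` and `component_down` flags are all assumptions made for the sketch.

```python
from enum import Enum, auto


class ErrorType(Enum):
    HARDWARE = auto()
    SOFTWARE = auto()
    CONNECTIVITY = auto()
    DATA_INTEGRITY = auto()


class ErrorStatus(Enum):
    UNRESOLVED = auto()
    REPAIR_IN_PROGRESS = auto()
    RESOLVED = auto()


class Action(Enum):
    CALL_SERVICE_CENTER = "a"  # call a central service center for the failing node
    FORWARD_ON_BEHALF = "b"    # forward the packet to this node's interested nodes
    SHARE_RESOURCES = "c"      # e.g. offer a duplicate copy of a corrupted database
    ENTER_READ_ONLY = "d"      # read-only and/or off-line for a pre-defined period


def choose_actions(error_type: ErrorType, status: ErrorStatus, *,
                   holds_replica: bool = False,
                   component_down: bool = False) -> list[Action]:
    """Hypothetical policy mapping (error type, error status) to actions a-d."""
    if status is ErrorStatus.RESOLVED:
        return []  # assumption: a "resolved" follow-up packet needs no reaction
    actions = []
    if error_type is ErrorType.CONNECTIVITY:
        actions.append(Action.CALL_SERVICE_CENTER)  # example trigger for (a)
    if component_down:
        actions.append(Action.FORWARD_ON_BEHALF)    # example trigger for (b)
    if error_type is ErrorType.DATA_INTEGRITY:
        if holds_replica:
            actions.append(Action.SHARE_RESOURCES)  # example trigger for (c)
        actions.append(Action.ENTER_READ_ONLY)      # example trigger for (d)
    return actions
```

Returning a list rather than a single action reflects the "one or more actions" wording: a single notification may justify, for example, both sharing a replica and going read-only.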
- Next, a determination is made whether or not the node that received the error notification packet has previously received it, as shown in block 332. If the node has previously received the error notification packet, the process terminates at block 345. Otherwise, another determination is made whether or not the hop limit value included in the error notification packet is greater than 0, as shown in block 335. If the hop limit value is not greater than 0, the node that received the error notification packet will not forward it, and the process terminates at block 345. Otherwise, the node that received the error notification packet forwards it to each node on its own list of interested nodes, as depicted in block 340, and the process returns to block 330. As mentioned above, the maximum number of error notification packets that can be forwarded to other nodes is dictated by the hop limit value in the first error notification packet.
- As has been described, the present invention provides an improved method for providing notifications of a failing node to other nodes within a computer network.
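The receive-and-forward loop of FIG. 3C (blocks 330 through 345), driven by the initial send of block 325, can be sketched as a small Python simulation. This is a minimal sketch under stated assumptions: the function name `propagate` and the dictionary of interested-node lists are hypothetical; a breadth-first queue stands in for real network delivery; two packets are treated as "the same" for block 332 when their location, type, and status match (the patent does not define packet identity); and a node never forwards the packet back to the failing node.

```python
from collections import deque
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ErrorNotificationPacket:
    error_location: str  # field 215: node that generated the packet
    error_type: str      # field 220
    error_status: str    # field 225
    hop_limit: int       # field 230


def propagate(interested: dict[str, list[str]],
              packet: ErrorNotificationPacket) -> list[tuple[str, int]]:
    """Simulate block 325 followed by FIG. 3C at every receiving node.

    Returns (receiving node, hop limit after decrement) in delivery order.
    """
    origin = packet.error_location
    # Block 325: the failing node sends the packet to each interested node.
    queue = deque((peer, packet) for peer in interested.get(origin, []))
    seen: dict[str, set] = {}  # node -> identities of packets already received
    deliveries = []
    while queue:
        node, pkt = queue.popleft()
        # Block 330: decrement the hop limit; actions a-d would run here.
        pkt = replace(pkt, hop_limit=pkt.hop_limit - 1)
        deliveries.append((node, pkt.hop_limit))
        # Block 332: stop if this node already received this notification.
        # Assumption: identity = (location, type, status), ignoring hop limit.
        key = (pkt.error_location, pkt.error_type, pkt.error_status)
        if key in seen.setdefault(node, set()):
            continue  # block 345
        seen[node].add(key)
        # Block 335: forward only while the decremented hop limit is above 0.
        if pkt.hop_limit > 0:
            for peer in interested.get(node, []):  # block 340
                if peer != origin:  # assumption: never echo back to the failing node
                    queue.append((peer, pkt))
    return deliveries
```

With `interested = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B", "E"], "E": ["D"]}` and a packet from `"A"`, a starting hop limit of 1 reaches only B and C under this decrement-then-check reading of blocks 330 and 335, while a starting hop limit of 2 also reaches D. The duplicate check of block 332 is what stops a packet that loops back (for example, D forwarding to B) from being forwarded again.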
- While an illustrative embodiment of the present invention has been described in the context of a fully functional storage system, those skilled in the art will appreciate that the software aspects of an illustrative embodiment of the present invention are capable of being distributed as a program product in a variety of forms, and that an illustrative embodiment of the present invention applies equally regardless of the particular type of media used to actually carry out the distribution. Examples of the types of media include recordable type media such as thumb drives, floppy disks, hard drives, CD ROMs, DVDs, and transmission type media such as digital and analog communication links.
- While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.
Claims (6)
1. A method for providing notifications of a failing node to other nodes within a computer network, said method comprising:
generating an interested node list in a node, wherein said interested node list includes any other node that has previously communicated with said node;
in response to a determination that said node is experiencing an error, sending an error notification packet from said node to each node on said interested nodes list; and
after the receipt of said error notification packet, performing one or more actions by a node on said interested nodes list.
2. The method of claim 1 , wherein said method further includes forwarding said error notification packet by said node on said interested nodes list to a node on a local interested nodes list stored within said node on said interested nodes list according to a hop limit value, wherein said hop limit value corresponds to a pre-defined level of nodes within said computer network that said error notification packet may propagate, wherein said hop limit is decremented by said node on said interested nodes list.
3. The method of claim 1 , wherein said error notification packet includes a hop limit value field for containing a hop limit value from a hop limit counter of said node.
4. The method of claim 1 , wherein nature of error includes hardware failure, software failure, connectivity failure, or data integrity error.
5. The method of claim 1 , wherein status of error field includes unresolved, repair in progress, or resolved.
6. The method of claim 1 , wherein said one or more actions include:
calling a central service center on behalf of said node;
forwarding said error notification packet to all nodes on said interested nodes list on behalf of said node;
sharing one or more resources with said node;
entering a read-only state for a first pre-defined time period; and
entering an offline state for a second pre-defined time period.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/869,370 US20090092054A1 (en) | 2007-10-09 | 2007-10-09 | Method for providing notifications of a failing node to other nodes within a computer network |
Publications (1)
Publication Number | Publication Date |
---|---|
US20090092054A1 | 2009-04-09 |
Family ID: 40523153
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
US11/869,370 (US20090092054A1) | Abandoned | 2007-10-09 | 2007-10-09 | Method for providing notifications of a failing node to other nodes within a computer network |
Country Status (1)
Country | Link |
---|---|
US (1) | US20090092054A1 (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100153770A1 (en) * | 2008-12-16 | 2010-06-17 | Industrial Technology Research Institute | Real-time image monitoring and recording system and method |
US20130107724A1 (en) * | 2011-10-31 | 2013-05-02 | Itron, Inc | Quick advertisement of a failure of a network cellular router |
WO2013066378A1 (en) * | 2011-10-31 | 2013-05-10 | Itron, Inc. | Quick advertisement of a failure of a network cellular router |
US8990631B1 (en) * | 2011-03-03 | 2015-03-24 | Netlogic Microsystems, Inc. | Packet format for error reporting in a content addressable memory |
US20150172152A1 (en) * | 2013-12-12 | 2015-06-18 | International Business Machines Corporation | Alerting Service Desk Users of Business Services Outages |
US20170302504A1 (en) * | 2015-01-05 | 2017-10-19 | Huawei Technologies Co., Ltd. | Method for Processing Forwarding Device Fault, Device, and Controller |
US11075829B2 (en) * | 2018-11-30 | 2021-07-27 | Sap Se | Distributed monitoring in clusters with self-healing |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5687308A (en) * | 1995-06-07 | 1997-11-11 | Tandem Computers Incorporated | Method to improve tolerance of non-homogeneous power outages |
US5835482A (en) * | 1995-09-22 | 1998-11-10 | Mci Communications Corporation | Communication system and method providing optimal restoration of failed paths |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100153770A1 (en) * | 2008-12-16 | 2010-06-17 | Industrial Technology Research Institute | Real-time image monitoring and recording system and method |
US8341682B2 (en) * | 2008-12-16 | 2012-12-25 | Industrial Technology Research Institute | Real-time image monitoring and recording system and method |
US8990631B1 (en) * | 2011-03-03 | 2015-03-24 | Netlogic Microsystems, Inc. | Packet format for error reporting in a content addressable memory |
US20130107724A1 (en) * | 2011-10-31 | 2013-05-02 | Itron, Inc | Quick advertisement of a failure of a network cellular router |
WO2013066378A1 (en) * | 2011-10-31 | 2013-05-10 | Itron, Inc. | Quick advertisement of a failure of a network cellular router |
US9007923B2 (en) * | 2011-10-31 | 2015-04-14 | Itron, Inc. | Quick advertisement of a failure of a network cellular router |
US20150172152A1 (en) * | 2013-12-12 | 2015-06-18 | International Business Machines Corporation | Alerting Service Desk Users of Business Services Outages |
US20150347219A1 (en) * | 2013-12-12 | 2015-12-03 | International Business Machines Corporation | Alerting Service Desk Users of Business Services Outages |
US9830212B2 (en) * | 2013-12-12 | 2017-11-28 | International Business Machines Corporation | Alerting service desk users of business services outages |
US9921901B2 (en) * | 2013-12-12 | 2018-03-20 | International Business Machines Corporation | Alerting service desk users of business services outages |
US20170302504A1 (en) * | 2015-01-05 | 2017-10-19 | Huawei Technologies Co., Ltd. | Method for Processing Forwarding Device Fault, Device, and Controller |
US10756958B2 (en) * | 2015-01-05 | 2020-08-25 | Huawei Technologies Co., Ltd. | Method, device, and controller for processing forwarding device faults received from forwarding devices on a forwarding path |
US11496355B2 (en) | 2015-01-05 | 2022-11-08 | Huawei Technologies Co., Ltd. | Method for processing forwarding device fault, device, and controller |
US11075829B2 (en) * | 2018-11-30 | 2021-07-27 | Sap Se | Distributed monitoring in clusters with self-healing |
US11438250B2 (en) | 2018-11-30 | 2022-09-06 | Sap Se | Distributed monitoring in clusters with self-healing |
Similar Documents
Publication | Title |
---|---|
US20090092054A1 (en) | Method for providing notifications of a failing node to other nodes within a computer network |
US9819733B2 (en) | Peer-to-peer exchange of data resources in a control system |
US9917741B2 (en) | Method and system for processing network activity data |
US20070078809A1 (en) | Robust data availability system having decentralized storage and multiple access paths |
JP2005209190A (en) | Reporting of multi-state status for high-availability cluster node |
JP2001249856A (en) | Method for processing error in storage area network(san) and data processing system |
JP2004086792A (en) | Obstruction information collecting program and obstruction information collecting device |
US9231779B2 (en) | Redundant automation system |
US10732873B1 (en) | Timeout mode for storage devices |
CN112217847A (en) | Micro service platform, implementation method thereof, electronic device and storage medium |
JP3924247B2 (en) | Software-based fault-tolerant network using a single LAN |
US11563671B2 (en) | Routing engine switchover based on health determined by support vector machine |
US20050022048A1 (en) | Fault tolerance in networks |
CN115550287B (en) | Method for establishing remote copy relationship and related device |
JP2011203941A (en) | Information processing apparatus, monitoring method and monitoring program |
JP5922127B2 (en) | Fault processing method, computer-readable storage medium, and computer system |
JP7474168B2 (en) | Monitoring system and fault monitoring method |
JP2017028539A (en) | Communication device, control device and communication system |
JP2006260400A (en) | Method of monitoring computer device condition |
Matić et al. | Health monitoring and auto-scaling RabbitMQ queues within the smart home system |
JP6670877B2 (en) | Failure determination device, failure determination system, failure determination method, and program |
EP3355530A1 (en) | Method, apparatus and device for processing service failure |
JP4863984B2 (en) | Monitoring processing program, method and apparatus |
JP2007272328A (en) | Computer system |
US11947431B1 (en) | Replication data facility failure detection and failover automation |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: COMPTON, MATTHEW C.; HOURSELT, ANDREW G.; MALETICH, MICHAEL R.; REEL/FRAME: 019935/0491; SIGNING DATES FROM 20071008 TO 20071009 |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |