US8467301B2 - Router misconfiguration diagnosis - Google Patents

Router misconfiguration diagnosis

Info

Publication number
US8467301B2
Authority
US
United States
Prior art keywords
management information
interface
information value
node
notification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US11/446,914
Other versions
US20070280120A1 (en)
Inventor
Kam C. Wong
Peter C. Zwetkof
David M. Rhodes
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hewlett Packard Enterprise Development LP
Original Assignee
Hewlett Packard Development Co LP
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett Packard Development Co LP filed Critical Hewlett Packard Development Co LP
Priority to US11/446,914
Assigned to HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P. (Assignors: RHODES, DAVID M.; WONG, KAM C.; ZWETKOF, PETER C.)
Publication of US20070280120A1
Application granted
Publication of US8467301B2
Assigned to HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP (Assignor: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.)
Status: Active; adjusted expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/08 Configuration management of networks or network elements
    • H04L 41/0866 Checking the configuration
    • H04L 41/0869 Validating the configuration within one network element
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 45/00 Routing or path finding of packets in data switching networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 45/00 Routing or path finding of packets in data switching networks
    • H04L 45/28 Routing or path finding of packets in data switching networks using route fault recovery
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/02 Standardisation; Integration
    • H04L 41/0213 Standardised network management protocols, e.g. simple network management protocol [SNMP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/06 Management of faults, events, alarms or notifications
    • H04L 41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis

Definitions

  • Routing protocols provide reachability information and network path preference information for transmission of data packets across communications networks. Routing protocols include, but are not limited to, routing protocol families such as Interior Gateway Protocol (IGP) and Exterior Gateway Protocol (EGP). Examples of IGP protocols include Intermediate-System to Intermediate-System (IS-IS), Open Shortest Path First (OSPF), and Enhanced Interior Gateway Routing Protocol (EIGRP). Examples of EGP protocols include Border Gateway Protocol (BGP) and BGP4.
  • Route listening technologies can monitor the data packets that flow between routers, using routing protocols. Route listening technologies are able to detect route failures and anomalies. Such technologies are able to provide near real-time reporting of routing symptoms that may indicate that a component of the communications network has gone awry.
  • In some cases, route failures are caused by physical network failures that are reported by a network monitoring service.
  • In other cases, route failures are caused by a protocol misconfiguration in a router. Troubleshooting in such cases typically requires manual comparison of protocol configuration values, and logging on to affected routers to perform a set of pertinent diagnostic commands. The process is time-consuming and requires expert protocol knowledge to evaluate a multitude of possible configuration mishaps. This may lead to protracted delays, and to high mean-time-to-repair statistics.
  • MIB: Management Information Base
  • Because polling generally requires a relatively long cycle of time to gather data from a large number of devices, it is not always feasible to gather up-to-date information on routes in a large routed environment via polling. Polling also adds overhead to both network links and network system resources, thereby causing a negative impact on scalability.
  • FIG. 1 is a block diagram of an exemplary computing environment in accordance with an implementation of the herein described systems and methods.
  • FIG. 2 is a block diagram showing the cooperation of exemplary components of an exemplary data communications architecture, in accordance with an embodiment.
  • FIG. 3 is a diagram illustrating transmission of a notification from an exemplary routing analyzer to an exemplary management station, in a network environment for practicing an embodiment of the invention.
  • FIG. 4A is a diagram illustrating an interface having a management information base for practicing an embodiment of the invention.
  • FIG. 4B depicts an illustrative notification, according to an embodiment of the invention.
  • FIG. 5 is a flow chart of a first exemplary method for router misconfiguration diagnosis according to an embodiment of the present invention.
  • FIG. 6 is a flow chart of a simplified exemplary method for router misconfiguration diagnosis according to a further embodiment of the present invention.
  • FIG. 7 shows an exemplary user interface for management software according to an embodiment of the invention.
  • aspects of the present invention provide a tool which, used with a network management service having a route listening service, provides a network engineer with evidence of what parameters, if any, are misconfigured for a reported route failure that is not explained by a physical network failure.
  • the route failure causes the generation of a notification (e.g., a symptomatic alarm or trap).
  • the tool can perform live Simple Network Management Protocol (SNMP) queries to a router identified in the notification, to obtain analysis information on its configuration values and states.
  • the analysis can show what configuration parameters (i.e., management information values) are checked and can highlight any parameters that are misconfigured.
  • the list of parameters and values that are checked can help the network engineer further narrow the possible cause of the problems. The mean time to repair such route failures can thereby be reduced.
  • An embodiment of the present invention can provide near real-time immediacy in alerting a network engineer of router failures, by using a routing analyzer (e.g., a route listening service) that monitors route traffic. Further aspects of the invention can identify the cause of a route failure as misconfiguration, providing accurate, specific details so that the network engineer can quickly correct the problem. Such details may, in some embodiments, include displaying all protocol configuration parameter-value pairs that have been checked, thereby providing information to help narrow down a problem whose cause may not be obvious.
  • aspects of the invention provide enhanced accuracy in detecting route failures, compared to solutions that indirectly determine the health of the routing protocol layer based solely on the use of polling, or Simple Network Management Protocol (SNMP) traps, or syslog notifications.
  • Authoritative information about a routing failure can be obtained by monitoring the network at its routing control plane, rather than at a higher-level network layer; accordingly, when monitoring of the routing control plane indicates there is a problem with routing, there is little doubt that a routing service is impaired.
  • FIG. 1 depicts an exemplary computing system 100 for practicing aspects of the invention, in accordance with herein described systems and methods.
  • the computing system 100 is capable of executing a variety of computing applications 180 .
  • Computing application 180 can comprise a computing application, a computing applet, a computing program and other instruction set operative on computing system 100 to perform at least one function, operation, and/or procedure.
  • Exemplary computing system 100 is controlled primarily by computer readable instructions, which can be in the form of software.
  • the computer readable instructions can contain instructions for computing system 100 for storing and accessing the computer readable instructions themselves.
  • Such software can be executed within central processing unit (CPU) 110 to cause the computing system 100 to do work.
  • CPU 110 can be implemented by a micro-electronic chip called a microprocessor.
  • computing environment 100 can comprise a number of CPUs 110 . Additionally computing environment 100 can exploit the resources of remote CPUs (not shown) through communications network 160 or some other data communications means (not shown).
  • the CPU 110 fetches, decodes, and executes instructions, and transfers information to and from other resources via the computer's main data-transfer path, system bus 105 .
  • Such a system bus 105 connects the components in the computing system 100 and defines the medium for data exchange.
  • Components that can be connected to the system bus 105 include extension cards, controllers such as a peripherals controller and a memory controller, memory devices such as random access memory (RAM) and read only memory (ROM), and CPU 110 .
  • the computing system 100 can contain network adaptor 170 which can be used to connect the computing system 100 to an external communication network 160 by a communication link 121 .
  • a communications network 160 may, for example, be any of, or a combination of a wired or wireless local area network (LAN), wide area network (WAN), intranet, extranet, peer-to-peer network, the Internet, or other communications network.
  • the communications network 160 can comprise two or more subnetworks such as communications networks 161 , 162 interconnected by one or more routers 150 .
  • the router 150 has interfaces (IFs) 155 A, 155 B (collectively, interfaces 155 ), through which the router 150 interconnects communications networks 161 , 162 by communication links 122 , 123 . While the exemplary router 150 shown in FIG. 1 has two interfaces 155 A, 155 B, a router 150 is not limited to two interfaces 155 , and can have one or more interfaces 155 .
  • the communications networks 160 - 162 can provide computer users with connections for communicating and transferring software and information electronically. Additionally, communications networks 160 - 162 can provide distributed processing, which involves several computers and the sharing of workloads or cooperative efforts in performing a task. Communication links 121 - 123 may, for example, include wired connections, wireless connections, optical connections, and the like. It will be appreciated that the network connections shown are exemplary and other means of establishing a communication link between computers may be used.
  • a router 150 in general, can be defined as a network device (which in some embodiments can comprise a dedicated computer 100 ) that is used to connect two or more communication networks 161 , 162 together and to route data packets between them.
  • Router 150 is configured to determine a path for forwarding the data packets, and can be adapted to use a protocol to communicate with other routers 150 ; examples of such protocols include, but are not limited to, Internet Control Message Protocol (ICMP) and routing protocols such as Open Shortest Path First (OSPF).
  • Router 150 is able to directly receive data packets over a communication network 161 , 162 from one or more adjacent nodes (such as computing system 100 , other computing systems 100 , other routers 150 , and other network devices).
  • Router 150 can be configured to determine an optimum route between two nodes.
  • exemplary computer system 100 is merely illustrative of a computing environment in which the herein described systems and methods may operate, and does not limit their implementation to computing environments having particular components and configurations; the inventive concepts described herein may be implemented in various computing environments having various components and configurations.
  • FIG. 2 depicts an illustrative networked computing environment 200 , with a server in communication with client computers via a communications network, in which the herein described apparatus and methods may be employed. While an exemplary client-server system is illustrated in FIG. 2 , any of numerous configurations may be used with aspects of the invention, including peer-to-peer and other network configurations.
  • server 205 can be one or more dedicated computing environment servers operable to process and communicate data to and from exemplary client computing environments 220 .
  • numerous computing systems 100 can be connected to the communications network 160 , and a particular computing system 100 may function as a server 205 , as a client 220 , or as both.
  • a user, such as a network engineer, may interact with a computing application running on a client computing environment 220 to obtain desired data and/or computing applications.
  • the data and/or computing applications may be stored on server computing environment 205 and communicated to cooperating users through exemplary client computing environments 220 , over exemplary communications network 160 .
  • server 205 may be interconnected via a communications network 160 (which may be any of, or a combination of, a wired or wireless LAN, WAN, intranet, extranet, peer-to-peer network, the Internet, or other communications network) with a number of exemplary client computing environments such as computing system 100 , personal digital assistant 225 , wired or mobile telephone (not shown), networked storage devices, printing devices, and other network appliances (not shown), and management station 230 (collectively, client computing environments 220 ).
  • Server 205 , client computing environments 220 , and a routing analyzer 210 are connected with communications network 160 (such as by a communication link 121 ).
  • the management station 230 is operable to monitor nodes of the communications network 160 ; for example, management station 230 can monitor a protocol (e.g., Internet Protocol (IP)) used in the communications network 160 .
  • management station 230 comprises a computing system 100 equipped with a computing application 180 such as network management software for monitoring devices connected to the communications network 160 .
  • FIG. 3 is a diagram illustrating transmission of a notification 330 from an exemplary routing analyzer 210 to an exemplary management station 230 , in a network environment for practicing an embodiment of the invention.
  • the routing analyzer 210 is operable to provide a route listening service 320 for monitoring the communications network 160 .
  • Routing analyzer 210 can be, for example, a network appliance such as Route Explorer, commercially available from Packet Design Inc., or OpenView Route Analytics Management System (RAMS), commercially available from Hewlett-Packard Company.
  • Routing analyzer 210 is operable to monitor a routing protocol used in the communications network 160 .
  • Routing protocols include, but are not limited to, routing protocol families such as Interior Gateway Protocol (IGP) and Exterior Gateway Protocol (EGP). Examples of IGP protocols include Intermediate-System to Intermediate-System (IS-IS), Open Shortest Path First (OSPF), Enhanced Interior Gateway Routing Protocol (EIGRP), and the like.
  • Routing analyzer 210 is able to detect events (such as routing failure 331 ) on the communications network 160 , and is able to generate notifications (e.g., asynchronous event reports, or traps) for reporting events over the communications network 160 .
  • the communications network 160 comprises a plurality of routers 150 (e.g., routers 150 A, 150 B, 150 C), which connect a plurality of nodes 310 (e.g., nodes 311 , 312 ).
  • Exemplary nodes 310 may include one or more of computing system 100 , server 205 , client computing environment 220 , or any network-connected system, device, appliance, or the like.
  • the routing analyzer 210 is able to detect a routing protocol failure condition of one or more of the routers 150 ; for example, routing failure 331 .
  • In one example of routing failure 331 , packets are dropped and routes are not advertised.
  • a further example of routing failure 331 is lost adjacency; e.g., loss of adjacency between two of the routers 150 or between two of the nodes 310 .
  • Routing analyzer 210 generates notification 330 , such as by using SNMP to generate a trap which is transmitted over communications network 160 .
  • Management station 230 is able to receive the notification 330 over communications network 160 .
  • Management station 230 is equipped with network management software 340 for monitoring devices connected to the communications network 160 .
  • Network management software 340 may, for example, send and receive network messages, e.g., by using Simple Network Management Protocol (SNMP).
  • the management station 230 is able to receive a notification 330 , such as a notification 330 generated by the routing analyzer 210 or by a router 150 .
  • Management station 230 is also able to interact with a user (not shown), such as a network engineer, by displaying information to the user and receiving inputs from the user.
  • web browsing software can be provided on management station 230 to provide interactivity with the user.
  • network management software 340 may be configured to provide interactivity with the user.
  • FIG. 4A is a diagram illustrating an interface 155 having a management information base 400 for practicing an embodiment of the invention.
  • An exemplary router 150 has an interface 155 , for connecting the router 150 to a communications network 160 .
  • the interface 155 is associated with data elements for describing aspects of the interface 155 ; for example, an interface index 411 (ifIndex), an interface administrative status 412 (ifAdminStatus), and an interface maximum transmission unit size 413 (ifMTU) that represents the maximum amount of data (e.g., packet size) that can be transferred in one physical frame.
  • the data elements 411 - 413 may, in some embodiments, be included as entries in an interface table. In further exemplary embodiments, data elements 411 - 413 may be included in management information base 400 .
  • Management information base 400 is associated with the interface 155 .
  • the management information base 400 comprises a plurality of management information values 420 .
  • the management information base 400 comprises an OSPF interface table, and the OSPF interface table includes entries (such as management information values 420 ) associated with the OSPF routing protocol.
  • Illustrative examples of management information values 420 include an OSPF interface administrative status 421 (ospfIfAdminStat), an OSPF interface area identifier 422 (ospfIfAreaId), an OSPF interface type 423 (ospfIfType), an OSPF interface hello interval value 424 (ospfIfHelloInterval), and an OSPF interface router dead interval value 425 (ospfIfRtrDeadInterval).
  • the OSPF interface administrative status 421 (ospfIfAdminStat) may, for example, have a value representing an enabled status, or a disabled status.
  • the OSPF interface area identifier 422 may, for example, be a 32-bit integer uniquely identifying the area to which the interface 155 connects.
  • the OSPF interface type 423 may, for example, have a value representing broadcast LANs (e.g., Ethernet and IEEE 802.5), a value representing X.25 and similar technologies, and values representing links that are point-to-point, or point-to-multipoint.
  • the OSPF interface hello interval value 424 may, for example, represent a length of time, in seconds, between “Hello” packets that the router 150 sends on the interface 155 .
  • the OSPF interface router dead interval value 425 may, for example, represent the number of seconds during which the router 150's “Hello” packets have not been seen before neighboring routers 150 declare the adjacency between themselves and router 150 to be down.
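  • For reference, the management information values 420 described above correspond to columns of the ospfIfEntry row in the standard OSPF-MIB (RFC 1850). The sketch below (illustrative, not part of the patent) lists those column OIDs and shows how an instance OID for a given interface address might be formed:

```python
# Column OIDs under ospfIfEntry (1.3.6.1.2.1.14.7.1), per the OSPF-MIB (RFC 1850).
# The selection mirrors the management information values 420 discussed above.
OSPF_IF_ENTRY = "1.3.6.1.2.1.14.7.1"

OSPF_IF_COLUMNS = {
    "ospfIfAreaId":          f"{OSPF_IF_ENTRY}.3",
    "ospfIfType":            f"{OSPF_IF_ENTRY}.4",
    "ospfIfAdminStat":       f"{OSPF_IF_ENTRY}.5",
    "ospfIfHelloInterval":   f"{OSPF_IF_ENTRY}.9",
    "ospfIfRtrDeadInterval": f"{OSPF_IF_ENTRY}.10",
}

def instance_oid(column_oid: str, if_ip: str) -> str:
    """ospfIfEntry rows are indexed by {ospfIfIpAddress, ospfAddressLessIf};
    for a numbered interface the second index component is 0."""
    return f"{column_oid}.{if_ip}.0"
```

For example, `instance_oid(OSPF_IF_COLUMNS["ospfIfAdminStat"], "10.0.0.1")` yields the OID that an SNMP GET would use to fetch the administrative status 421 of the interface at 10.0.0.1.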
  • FIG. 4B depicts an illustrative notification 330 , according to an embodiment of the invention.
  • the notification 330 comprises a plurality of data elements.
  • Source IP address 451 is a data element comprising a first IP address for a source node 310 (e.g., first node 311 ).
  • Destination IP address 452 is a data element comprising a second IP address for a destination node 310 (e.g., second node 312 ).
  • Alarm type 453 is a data element comprising an identifier (e.g., a numeric value, text, enumerator, constant, or the like) representing a purpose or subject matter of the notification.
  • alarm type 453 may comprise an identifier that indicates lost adjacency between source IP address 451 and destination IP address 452 .
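  • For illustration only, the notification's data elements 451 - 453 might be modeled in memory as follows (the class and field names are assumptions for this sketch, not taken from an actual trap definition):

```python
from dataclasses import dataclass

@dataclass
class Notification:
    """Hypothetical in-memory form of notification 330."""
    source_ip: str       # data element 451: first IP address, source node
    destination_ip: str  # data element 452: second IP address, destination node
    alarm_type: str      # data element 453: purpose of the notification

    def is_lost_adjacency(self) -> bool:
        # "LOST_ADJACENCY" is an illustrative identifier for alarm type 453.
        return self.alarm_type == "LOST_ADJACENCY"
```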
  • FIG. 5 shows a first exemplary method 500 for router misconfiguration diagnosis according to an embodiment of the present invention.
  • the method 500 begins at start block 501 , and proceeds to block 510 .
  • a notification 330 of a routing failure 331 (e.g., lost adjacency) between a first node 311 and a second node 312 is received, such as by management software 340 running on management station 230 .
  • a user selection is made, thereby causing the management software 340 to undertake or launch a diagnostic routine (e.g., routing protocol diagnosis) for the routing failure 331 .
  • a user at management station 230 may select a representation 710 of the notification 330 (e.g., a lost adjacency alarm) from a user interface 700 (e.g., web application, menu, browser, screen, or other interface) of the management software 340 .
  • An example of such a representation 710 is illustrated in FIG. 7 , discussed below.
  • a first interface 155 associated with the first node 311 is identified, and a second interface 155 associated with the second node 312 is identified.
  • the identification is accomplished by extracting a source IP address 451 and a destination IP address 452 from the notification 330 .
  • two instances of an interface index 411 associated with the first and second interfaces 155 are then determined; for instance, one or more SNMP queries are initiated to find the value of an interface index 411 for the first interface 155 at source IP address 451 , and to find the value of an interface index 411 for the second interface 155 at the destination IP address 452 .
  • SNMP queries may, for example, be encoded in an executable file, or in some embodiments, may be encoded in a Perl script for enhanced platform portability, re-use of tools, customizability, and reasonably fast prototyping turnaround.
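  • As a rough sketch of the ifIndex lookup described above: the interface index 411 for a given address can be read from ipAdEntIfIndex in the standard IP address table (RFC 1213). Here `snmp_get` is a hypothetical helper standing in for whatever SNMP library or script performs the query:

```python
IP_AD_ENT_IF_INDEX = "1.3.6.1.2.1.4.20.1.2"  # ipAdEntIfIndex (RFC 1213 ipAddrTable)

def find_if_index(snmp_get, agent: str, ip_addr: str):
    """Resolve the ifIndex 411 of the interface that owns ip_addr by querying
    ipAdEntIfIndex.<ip_addr> on the router's SNMP agent. `snmp_get` is a
    hypothetical callable (agent, oid) -> value-or-None; returns None when the
    query fails or the address is unknown (cf. the checks at blocks 521/526)."""
    value = snmp_get(agent, f"{IP_AD_ENT_IF_INDEX}.{ip_addr}")
    return int(value) if value is not None else None
```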
  • a check takes place, evaluating the response, if any, to the SNMP query or queries of block 520 . If there was an error or no response, the method 500 proceeds to block 550 A, discussed below. In some embodiments, if there was a valid response, the values returned from the SNMP query or queries may be saved into a table. If there was a valid response, the method 500 proceeds to block 525 .
  • interface data is found.
  • one or more data elements 412 - 413 associated with the first interface 155 for the source IP address 451 are determined. For instance, one or more SNMP queries are initiated to find the value of an ifAdminStatus 412 and an ifMTU 413 for the first interface 155 .
  • one or more data elements 412 - 413 associated with the second interface 155 at the destination IP address 452 are determined. For instance, one or more SNMP queries are initiated to find the value of an ifAdminStatus 412 and an ifMTU 413 for the second interface 155 .
  • a check takes place, evaluating the response, if any, to the SNMP query or queries of block 525 . If there was an error or no response, the method 500 proceeds to block 550 A, discussed below. In some embodiments, if there was a valid response, the values returned from the SNMP query or queries may be saved into a table. If there was a valid response, the method 500 proceeds to block 530 .
  • a first management information value 420 for the first interface 155 and a second management information value 420 for the second interface 155 are determined.
  • the determination is made using queries that are specific to a routing protocol; for example, SNMP queries to the MIB 400 associated with the OSPF routing protocol.
  • SNMP queries may be used to retrieve the relevant set of management information values 420 from a MIB 400 associated with router 150 .
  • the first management information value 420 is the OSPF interface administrative status 421 for the first interface 155 (e.g., the source interface), and the second management information value 420 is the OSPF interface administrative status 421 for the second interface 155 (e.g., the destination interface).
  • the value of ospfIfAdminStat 421 may, for example, indicate an enabled status, or a disabled status.
  • additional management information values 420 are determined for the first and second interfaces 155 .
  • management information values 420 may also be determined for an OSPF interface area identifier 422 (ospfIfAreaId), an OSPF interface type 423 (ospfIfType), an OSPF interface hello interval value 424 (ospfIfHelloInterval), and an OSPF interface router dead interval value 425 (ospfIfRtrDeadInterval).
  • a check takes place, evaluating the response, if any, to the SNMP query or queries of block 530 . If there was an error or no response, the method 500 proceeds to block 550 A, discussed below. In some embodiments, if there was a valid response, the values returned from the SNMP query or queries may be saved into a table. If there was a valid response, the method 500 proceeds to block 535 .
  • the interface status (such as the value of ospfIfAdminStat 421 ) is checked for the first and second interfaces 155 .
  • Each value of ospfIfAdminStat 421 may, for example, indicate an enabled status, or a disabled status. If the ospfIfAdminStat 421 for the first interface 155 is disabled, or if the ospfIfAdminStat 421 for the second interface 155 is disabled, or both, the method 500 proceeds to block 550 B, discussed below. If neither is disabled, the method 500 proceeds to block 540 .
  • a matching status is determined between the first management information value 420 for the first (source) interface 155 and the corresponding second management information value 420 for the second (destination) interface 155 .
  • a mismatch may be identified between the two management information values 420 , or a match may be identified.
  • the matching status is checked. If one or more mismatches were identified at block 540 , the method 500 proceeds to block 550 C, discussed below. If no mismatches were identified at block 540 , the method 500 proceeds to block 550 D, discussed below.
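  • A minimal sketch of the matching-status determination of blocks 540 - 545 , assuming the per-interface values have already been retrieved into plain dictionaries (the function name and dictionary layout are illustrative, not from the patent):

```python
def compare_parameters(src: dict, dst: dict, keys=None):
    """Return {parameter: (source_value, destination_value)} for every checked
    parameter whose values differ between the first (source) interface and the
    second (destination) interface; an empty dict means no mismatch was found."""
    keys = keys if keys is not None else sorted(set(src) | set(dst))
    return {k: (src.get(k), dst.get(k)) for k in keys if src.get(k) != dst.get(k)}
```

For example, comparing `{"ospfIfHelloInterval": 10}` against `{"ospfIfHelloInterval": 30}` reports the hello-interval mismatch that would cause lost adjacency, steering the method to block 550 C rather than 550 D.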
  • an error message is generated; for example, a message may be generated with error text returned from the SNMP query or queries.
  • An illustrative example of such an error message is shown in Table 1.
  • the error message may advise the user to check for events (e.g., APA events) that may indicate physical failure of a device.
  • the method 500 proceeds to block 555 .
  • an error message is generated, responsive to the notification 330 , indicating that a routing protocol (e.g., OSPF) is disabled for one or both of the interfaces 155 , and identifying the disabled interface(s) 155 .
  • An illustrative example of such an error message is shown in Table 2. The method 500 proceeds to block 555 .
  • a message is generated, responsive to the notification 330 , indicating that a mismatch or misconfiguration has been found, and identifying the mismatched data elements 411 - 413 and/or management information values 420 .
  • the message may, in some embodiments, include a table or display identifying the data elements 411 - 413 and/or management information values 420 that were queried, together with the corresponding values thereof.
  • An illustrative example of such an error message is shown in Table 3.
  • the method 500 proceeds to block 555 .
  • a diagnostic message is generated responsive to the notification 330 ; for example, a message indicating that no mismatch or misconfiguration has been found.
  • the message may, in some embodiments, include a table or display identifying the data elements 411 - 413 and/or management information values 420 that were queried, together with the corresponding values thereof.
  • the method 500 proceeds to block 555 .
  • An illustrative example of such an error message is shown in Table 4.
  • Illustrative values checked for the source and destination interfaces (source : destination):

        IpIfAdminStatus                up : up
        IpIfMtu                      1500 : 1500
        OSPF IfIpAdminStat              1 : 1
        OSPF IfAreaId             0.0.0.1 : 0.0.0.1
        OSPF IfType                     1 : 1
        OSPF IfHelloInterval           10 : 10
        OSPF IfRouterDeadInterval      40 : 40
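  • A rough sketch of how such a side-by-side listing could be rendered from the queried values (the layout is an assumption modeled on Table 4; `rows` maps parameter names to (source, destination) pairs):

```python
def format_diagnosis(rows: dict) -> str:
    """Render a Table 4-style 'source : destination' listing of every
    parameter-value pair that was checked."""
    width = max(map(len, rows)) if rows else 0
    return "\n".join(f"{name:<{width}}  {s} : {d}" for name, (s, d) in rows.items())
```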
  • the message generated at any of blocks 550 A- 550 D (e.g., an error message or diagnostic message) is displayed to the user; for example, by a web browser page or a pop-up window displaying the error message.
  • the method 500 concludes at block 599 .

Abstract

Router misconfiguration diagnosis is disclosed. A notification of a routing failure between a first node and a second node is received. A first interface associated with the first node is identified, and a second interface associated with the second node is identified. A first management information value and a second management information value, specific to a routing protocol, are determined. Matching status is determined between the first and second management information values. A diagnostic message is generated responsive to the notification.

Description

BACKGROUND
Routing protocols provide reachability information and network path preference information for transmission of data packets across communications networks. Routing protocols include, but are not limited to, routing protocol families such as Interior Gateway Protocol (IGP) and Exterior Gateway Protocol (EGP). Examples of IGP protocols include Intermediate-System to Intermediate-System (IS-IS), Open Shortest Path First (OSPF), and Enhanced Interior Gateway Routing Protocol (EIGRP). Examples of EGP protocols include Border Gateway Protocol (BGP) and BGP4.
Route listening technologies can monitor the data packets that flow between routers, using routing protocols. Route listening technologies are able to detect route failures and anomalies. Such technologies are able to provide near real-time reporting of routing symptoms that may indicate that a component of the communications network has gone awry.
In some cases, route failures are caused by physical network failures that are reported by a network monitoring service. However, route failures are often caused by a protocol misconfiguration in a router. Troubleshooting in such cases typically requires manual comparison of protocol configuration values, and logging on to affected routers to perform a set of pertinent diagnostic commands. The process is time-consuming and requires expert protocol knowledge to evaluate a multitude of possible configuration mishaps. This may lead to protracted delays, and to high mean-time-to-repair statistics.
Existing solutions are able to detect a misconfiguration by polling a router's Management Information Base (MIB) for a given network protocol, and are able to alert the user of the misconfiguration in an alarm. Such polling takes place periodically, such as at preset time intervals. However, since such polling requires an amount of time or a polling cycle to determine when an adverse routing condition occurs, there can be delays in detecting and reporting the misconfiguration. The speed at which a network can be polled may depend on a number of factors, including the number of nodes, the availability of bandwidth and the response times of those nodes. Since polling generally requires a relatively long cycle of time to gather data from a large number of devices, it is not always feasible to gather up-to-date information on routes in a large routed environment via polling. Polling also adds overhead to both network links and network system resources, thereby causing a negative impact on scalability.
BRIEF DESCRIPTION OF THE DRAWINGS
For the purpose of illustrating the invention, there is shown in the drawings a form that is presently exemplary; it being understood, however, that this invention is not limited to the precise arrangements and instrumentalities shown.
FIG. 1 is a block diagram of an exemplary computing environment in accordance with an implementation of the herein described systems and methods;
FIG. 2 is a block diagram showing the cooperation of exemplary components of an exemplary data communications architecture, in accordance with an embodiment;
FIG. 3 is a diagram illustrating transmission of a notification from an exemplary routing analyzer to an exemplary management station, in a network environment for practicing an embodiment of the invention;
FIG. 4A is a diagram illustrating an interface having a management information base for practicing an embodiment of the invention;
FIG. 4B depicts an illustrative notification, according to an embodiment of the invention;
FIG. 5 is a flow chart of a first exemplary method for router misconfiguration diagnosis according to an embodiment of the present invention;
FIG. 6 is a flow chart of a simplified exemplary method for router misconfiguration diagnosis according to a further embodiment of the present invention; and
FIG. 7 shows an exemplary user interface for management software according to an embodiment of the invention.
DETAILED DESCRIPTION
Overview
Aspects of the present invention provide a tool which, used with a network management service having a route listening service, provides a network engineer with evidence of what parameters, if any, are misconfigured for a reported route failure that is not explained by a physical network failure. The route failure causes the generation of a notification (e.g., a symptomatic alarm or trap). The tool can perform live Simple Network Management Protocol (SNMP) queries to a router identified in the notification, to obtain analysis information on its configuration values and states. The analysis can show what configuration parameters (i.e., management information values) are checked and can highlight any parameters that are misconfigured. In the event that no values are found to be misconfigured, the list of parameters and values that are checked can help the network engineer further narrow the possible cause of the problem. The mean time to repair such route failures can thereby be reduced.
An embodiment of the present invention can provide near real-time immediacy in alerting a network engineer of router failures, by using a routing analyzer (e.g., a route listening service) that monitors route traffic. Further aspects of the invention can identify the cause of a route failure as misconfiguration, providing accurate, specific details so that the network engineer can quickly correct the problem. Such details may, in some embodiments, include displaying all protocol configuration parameter-value pairs that have been checked, thereby providing information to help narrow down a problem whose cause may not be obvious.
Aspects of the invention provide enhanced accuracy in detecting route failures, compared to solutions that indirectly determine the health of the routing protocol layer based solely on the use of polling, or Simple Network Management Protocol (SNMP) traps, or syslog notifications. Authoritative information about a routing failure can be obtained by monitoring the network at its routing control plane, rather than at a higher-level network layer; accordingly, when monitoring of the routing control plane indicates there is a problem with routing, there is little doubt that a routing service is impaired.
Illustrative Computing Environment
Referring to the drawings, in which like reference numerals indicate like elements, FIG. 1 depicts an exemplary computing system 100 for practicing aspects of the invention, in accordance with herein described systems and methods. The computing system 100 is capable of executing a variety of computing applications 180. Computing application 180 can comprise a computing application, a computing applet, a computing program and other instruction set operative on computing system 100 to perform at least one function, operation, and/or procedure. Exemplary computing system 100 is controlled primarily by computer readable instructions, which can be in the form of software. The computer readable instructions can contain instructions for computing system 100 for storing and accessing the computer readable instructions themselves. Such software can be executed within central processing unit (CPU) 110 to cause the computing system 100 to do work. In many known computer servers, workstations, and personal computers, CPU 110 is implemented by a micro-electronic chip called a microprocessor.
It is appreciated that although the illustrative computing environment is shown comprising a single CPU 110, such description is merely illustrative, as computing environment 100 can comprise a number of CPUs 110. Additionally, computing environment 100 can exploit the resources of remote CPUs (not shown) through communications network 160 or some other data communications means (not shown).
In operation, the CPU 110 fetches, decodes, and executes instructions, and transfers information to and from other resources via the computer's main data-transfer path, system bus 105. Such a system bus connects the components in the computing system 100 and defines the medium for data exchange. Components that can be connected to the system bus 105 include extension cards, controllers such as a peripherals controller and a memory controller, memory devices such as random access memory (RAM) and read only memory (ROM), and CPU 110.
Further, the computing system 100 can contain network adaptor 170 which can be used to connect the computing system 100 to an external communication network 160 by a communication link 121.
A communications network 160 may, for example, be any of, or a combination of a wired or wireless local area network (LAN), wide area network (WAN), intranet, extranet, peer-to-peer network, the Internet, or other communications network. In an exemplary embodiment, the communications network 160 can comprise two or more subnetworks such as communications networks 161, 162 interconnected by one or more routers 150. The router 150 has interfaces (IFs) 155A, 155B (collectively, interfaces 155), through which the router 150 interconnects communications networks 161, 162 by communication links 122, 123. While the exemplary router 150 shown in FIG. 1 has two interfaces 155A, 155B, a router 150 is not limited to two interfaces 155, and can have one or more interfaces 155.
The communications networks 160-162 can provide computer users with connections for communicating and transferring software and information electronically. Additionally, communications networks 160-162 can provide distributed processing, which involves several computers and the sharing of workloads or cooperative efforts in performing a task. Communication links 121-123 may, for example, include wired connections, wireless connections, optical connections, and the like. It will be appreciated that the network connections shown are exemplary and other means of establishing a communication link between computers may be used.
A router 150, in general, can be defined as a network device (which in some embodiments can comprise a dedicated computer 100) that is used to connect two or more communication networks 161, 162 together and to route data packets between them. Router 150 is configured to determine a path for forwarding the data packets, and can be adapted to use a protocol to communicate with other routers 150; examples of such protocols include, but are not limited to, Internet Control Message Protocol (ICMP) and routing protocols such as Open Shortest Path First (OSPF). Router 150 is able to directly receive data packets over a communication network 161, 162 from one or more adjacent nodes (such as computing system 100, other computing systems 100, other routers 150, and other network devices). Router 150 can be configured to determine an optimum route between two nodes.
It is appreciated that the exemplary computer system 100 is merely illustrative of a computing environment in which the herein described systems and methods may operate and does not limit the implementation of the herein described systems and methods in computing environments having differing components and configurations as the inventive concepts described herein may be implemented in various computing environments having various components and configurations.
Illustrative Computer Network Environment
Computing system 100, described above, can be deployed as part of a computer network. In general, the above description for computing environments applies to both server computers and client computers deployed in a network environment. FIG. 2 illustrates an illustrative networked computing environment 200, with a server in communication with client computers via a communications network, in which the herein described apparatus and methods may be employed. While an exemplary client-server system is illustrated in FIG. 2, any of numerous configurations may be used with aspects of the invention, including peer-to-peer and other network configurations.
In a network environment 200 in which the communications network 160 is the Internet, for example, server 205 can be one or more dedicated computing environment servers operable to process and communicate data to and from exemplary client computing environments 220. In some embodiments of the network environment 200, numerous computing systems 100 can be connected to the communications network 160, and a particular computing system 100 may function as a server 205, as a client 220, or as both. In operation, a user (not shown), such as a network engineer, may interact with a computing application running on a client computing environment 220 to obtain desired data and/or computing applications. The data and/or computing applications may be stored on server computing environment 205 and communicated to cooperating users through exemplary client computing environments 220, over exemplary communications network 160.
As shown in FIG. 2, server 205 may be interconnected via a communications network 160 (which may be any of, or a combination of, a wired or wireless LAN, WAN, intranet, extranet, peer-to-peer network, the Internet, or other communications network) with a number of exemplary client computing environments such as computing system 100, personal digital assistant 225, wired or mobile telephone (not shown), networked storage devices, printing devices, and other network appliances (not shown), and management station 230 (collectively, client computing environments 220). Server 205, client computing environments 220, and a routing analyzer 210 are connected with communications network 160 (such as by a communication link 121).
The management station 230 is operable to monitor nodes of the communications network 160; for example, management station 230 can monitor a protocol (e.g., Internet Protocol (IP)) used in the communications network 160. In some embodiments, management station 230 comprises a computing system 100 equipped with a computing application 180 such as network management software for monitoring devices connected to the communications network 160.
Illustrative Data Flow
FIG. 3 is a diagram illustrating transmission of a notification 330 from an exemplary routing analyzer 210 to an exemplary management station 230, in a network environment for practicing an embodiment of the invention.
The routing analyzer 210 is operable to provide a route listening service 320 for monitoring the communications network 160. Routing analyzer 210 can be, for example, a network appliance such as Route Explorer, commercially available from Packet Design Inc., or OpenView Route Analytics Management System (RAMS), commercially available from Hewlett-Packard Company. Routing analyzer 210 is operable to monitor a routing protocol used in the communications network 160. Routing protocols include, but are not limited to, routing protocol families such as Interior Gateway Protocol (IGP) and Exterior Gateway Protocol (EGP). Examples of IGP protocols include Intermediate-System to Intermediate-System (IS-IS), Open Shortest Path First (OSPF), Enhanced Interior Gateway Routing Protocol (EIGRP), and the like. Examples of EGP protocols include Border Gateway Protocol (BGP), BGP4, and the like. Routing analyzer 210 (for example, a route analysis appliance) is able to detect events (such as routing failure 331) on the communications network 160, and is able to generate notifications (e.g., asynchronous event reports, or traps) for reporting events over the communications network 160.
The communications network 160 comprises a plurality of routers 150 (e.g., routers 150A, 150B, 150C), which connect a plurality of nodes 310 (e.g., nodes 311, 312). Exemplary nodes 310 may include one or more of computing system 100, server 205, client computing environment 220, or any network-connected system, device, appliance, or the like.
Using the listening service 320 for monitoring the communications network 160, the routing analyzer 210 is able to detect a routing protocol failure condition of one or more of the routers 150; for example, routing failure 331. In an illustrative example of routing failure 331, packets are dropped and not advertised. A further example of routing failure 331 is lost adjacency; e.g., loss of adjacency between two of the routers 150 or between two of the nodes 310. Routing analyzer 210 generates notification 330, such as by using SNMP to generate a trap which is transmitted over communications network 160.
Management station 230 is able to receive the notification 330 over communications network 160. Management station 230 is equipped with network management software 340 for monitoring devices connected to the communications network 160. Network management software 340 may, for example, send and receive network messages, e.g., by using Simple Network Management Protocol (SNMP). The management station 230 is able to receive a notification 330, such as a notification 330 generated by the routing analyzer 210 or by a router 150. Management station 230 is also able to interact with a user (not shown), such as a network engineer, by displaying information to the user and receiving inputs from the user. In an illustrative example, web browsing software can be provided on management station 230 to provide interactivity with the user. In a further illustrative example, network management software 340 may be configured to provide interactivity with the user.
Exemplary Data Elements
FIG. 4A is a diagram illustrating an interface 155 having a management information base 400 for practicing an embodiment of the invention. An exemplary router 150 has an interface 155, for connecting the router 150 to a communications network 160. The interface 155 is associated with data elements for describing aspects of the interface 155; for example, an interface index 411 (ifIndex), an interface administrative status 412 (ifAdminStatus), and an interface maximum transmission unit size 413 (ifMTU) that represents the maximum amount of data (e.g., packet size) that can be transferred in one physical frame. For example, the data elements 411-413 may, in some embodiments, be included as entries in an interface table. In further exemplary embodiments, data elements 411-413 may be included in management information base 400.
Management information base 400 (MIB) is associated with the interface 155. The management information base 400 comprises a plurality of management information values 420. In an illustrative example, the management information base 400 comprises an OSPF interface table, and the OSPF interface table includes entries (such as management information values 420) associated with the OSPF routing protocol.
Illustrative examples of management information values 420 include an OSPF interface administrative status 421 (ospfIfAdminStat), an OSPF interface area identifier 422 (ospfIfAreaId), an OSPF interface type 423 (ospfIfType), an OSPF interface hello interval value 424 (ospfIfHelloInterval), and an OSPF interface router dead interval value 425 (ospfIfRtrDeadInterval). The OSPF interface administrative status 421 (ospfIfAdminStat) may, for example, have a value representing an enabled status, or a disabled status. The OSPF interface area identifier 422 may, for example, be a 32-bit integer uniquely identifying the area to which the interface 155 connects. The OSPF interface type 423 may, for example, have a value representing broadcast LANs (e.g., Ethernet and IEEE 802.5), a value representing X.25 and similar technologies, or values representing links that are point-to-point or point-to-multipoint. The OSPF interface hello interval value 424 may, for example, represent a length of time, in seconds, between “Hello” packets that the router 150 sends on the interface 155. The OSPF interface router dead interval value 425 may, for example, represent the number of seconds that the router 150's “Hello” packets have not been seen before neighboring routers 150 declare the adjacency between themselves and router 150 to be down.
FIG. 4B depicts an illustrative notification 330, according to an embodiment of the invention. The notification 330 comprises a plurality of data elements. Source IP address 451 is a data element comprising a first IP address for a source node 310 (e.g., first node 311). Destination IP address 452 is a data element comprising a second IP address for a destination node 310 (e.g., second node 312). Alarm type 453 is a data element comprising an identifier (e.g., a numeric value, text, enumerator, constant, or the like) representing a purpose or subject matter of the notification. For example, alarm type 453 may comprise an identifier that indicates lost adjacency between source IP address 451 and destination IP address 452.
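The notification's data elements can be modeled concretely. Below is a minimal Python sketch; the class and field names are hypothetical, since the patent names the elements 451-453 but does not prescribe an encoding:

```python
from dataclasses import dataclass

# Hypothetical representation of notification 330; field names are
# illustrative, not taken from the patent or any SNMP trap format.
@dataclass(frozen=True)
class Notification:
    source_ip: str       # source IP address (data element 451)
    destination_ip: str  # destination IP address (data element 452)
    alarm_type: str      # e.g., "AdjacencyLost" (data element 453)

n = Notification(source_ip="15.6.96.34",
                 destination_ip="15.6.96.33",
                 alarm_type="AdjacencyLost")
```

The alarm type drives the choice of diagnostic routine, while the two addresses identify which routers to query.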
First Exemplary Method
FIG. 5 shows a first exemplary method 500 for router misconfiguration diagnosis according to an embodiment of the present invention. The method 500 begins at start block 501, and proceeds to block 510. At block 510, a notification 330 of a routing failure 331 (e.g., lost adjacency) between a first node 311 and a second node 312 is received, such as by management software 340 running on management station 230.
At block 515, a user selection is made, thereby causing the management software 340 to undertake or launch a diagnostic routine (e.g., routing protocol diagnosis) for the routing failure 331. In an illustrative example, a user at management station 230 may select a representation 710 of the notification 330 (e.g., a lost adjacency alarm) from a user interface 700 (e.g., web application, menu, browser, screen, or other interface) of the management software 340. An example of such a representation 710 is illustrated in FIG. 7, discussed below.
At block 520, a first interface 155 associated with the first node 311 is identified, and a second interface 155 associated with the second node 312 is identified. In an illustrative example, the identification is accomplished by extracting a source IP address 451 and a destination IP address 452 from the notification 330. In a further illustrative example, two instances of an interface index 411 associated with the first and second interfaces 155 are then determined; for instance, one or more SNMP queries are initiated to find the value of an interface index 411 for the first interface 155 at source IP address 451, and to find the value of an interface index 411 for the second interface 155 at the destination IP address 452.
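The interface-identification step can be sketched in Python, with the live SNMP queries replaced by an in-memory stub. The helper names, stub tables, and index values below are hypothetical; the OID prefix shown is ipAdEntIfIndex from the standard IP address table, which maps an interface's IP address to its ifIndex:

```python
# OID prefix of ipAdEntIfIndex (MIB-II ipAddrTable); appending an IP
# address yields the instance holding that address's ifIndex.
IP_AD_ENT_IF_INDEX = "1.3.6.1.2.1.4.20.1.2"

def snmp_get(agent_table, oid):
    """Stand-in for an SNMP GET against a router's agent.

    A real implementation would issue the request over the network;
    here the agent is modeled as a dictionary of OID -> value.
    """
    return agent_table.get(oid)

def interface_index(agent_table, ip_address):
    """Find the ifIndex (interface index 411) for an IP address."""
    return snmp_get(agent_table, f"{IP_AD_ENT_IF_INDEX}.{ip_address}")

# Hypothetical agent contents for the source and destination routers.
source_router = {f"{IP_AD_ENT_IF_INDEX}.15.6.96.34": 2}
dest_router = {f"{IP_AD_ENT_IF_INDEX}.15.6.96.33": 5}

src_ifindex = interface_index(source_router, "15.6.96.34")
dst_ifindex = interface_index(dest_router, "15.6.96.33")
```

A missing entry returns None, which corresponds to the no-response branch checked at block 521.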
SNMP queries, together with diagnostic steps, may, for example, be encoded in an executable file, or in some embodiments, may be encoded in a Perl script for enhanced platform portability, re-use of tools, customizability, and reasonably fast prototyping turnaround.
At block 521, a check takes place, evaluating the response, if any, to the SNMP query or queries of block 520. If there was an error or no response, the method 500 proceeds to block 550A, discussed below. In some embodiments, if there was a valid response, the values returned from the SNMP query or queries may be saved into a table. If there was a valid response, the method 500 proceeds to block 525.
At block 525, interface data is found. In an illustrative example, using the value of an interface index 411 for the source IP address 451, one or more data elements 412-413 associated with the first interface 155 for the source IP address 451 are determined. For instance, one or more SNMP queries are initiated to find the value of an ifAdminStatus 412 and an ifMTU 413 for the first interface 155. Continuing the same illustrative example, using the value of an interface index 411 for the destination IP address 452, one or more data elements 412-413 associated with the second interface 155 at the destination IP address 452 are determined. For instance, one or more SNMP queries are initiated to find the value of an ifAdminStatus 412 and an ifMTU 413 for the second interface 155.
At block 526, a check takes place, evaluating the response, if any, to the SNMP query or queries of block 525. If there was an error or no response, the method 500 proceeds to block 550A, discussed below. In some embodiments, if there was a valid response, the values returned from the SNMP query or queries may be saved into a table. If there was a valid response, the method 500 proceeds to block 530.
At block 530, a first management information value 420 for the first interface 155 and a second management information value 420 for the second interface 155 are determined. The determination is made using queries that are specific to a routing protocol; for example, SNMP queries to the MIB 400 associated with the OSPF routing protocol. In an illustrative example, SNMP queries may be used to retrieve the relevant set of management information values 420 from a MIB 400 associated with router 150.
In an illustrative embodiment, the first management information value 420 is the OSPF interface administrative status 421 for the first interface 155 (e.g., the source interface), and the second management information value 420 is the OSPF interface administrative status 421 for the second interface 155 (e.g., the destination interface). The value of ospfIfAdminStat 421 may, for example, indicate an enabled status, or a disabled status.
In some embodiments, additional management information values 420 are determined for the first and second interfaces 155. For example, management information values 420 may also be determined for an OSPF interface area identifier 422 (ospfIfAreaId), an OSPF interface type 423 (ospfIfType), an OSPF interface hello interval value 424 (ospfIfHelloInterval), and an OSPF interface router dead interval value 425 (ospfIfRtrDeadInterval).
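A minimal sketch of block 530, assuming the five OSPF objects named above and replacing the SNMP retrieval with a plain dictionary lookup (the helper name and sample values are illustrative):

```python
# MIB object names for the management information values 420; each
# would be fetched from the router's OSPF interface table via SNMP.
OSPF_OBJECTS = (
    "ospfIfAdminStat",        # 421: 1 = enabled, 2 = disabled
    "ospfIfAreaId",           # 422
    "ospfIfType",             # 423
    "ospfIfHelloInterval",    # 424
    "ospfIfRtrDeadInterval",  # 425
)

def collect_ospf_values(mib):
    """Gather the protocol-specific values for one interface."""
    return {name: mib[name] for name in OSPF_OBJECTS}

# Hypothetical MIB contents for the source interface.
source_mib = {"ospfIfAdminStat": 1, "ospfIfAreaId": "0.0.0.1",
              "ospfIfType": 1, "ospfIfHelloInterval": 10,
              "ospfIfRtrDeadInterval": 40}
values = collect_ospf_values(source_mib)
```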
At block 531, a check takes place, evaluating the response, if any, to the SNMP query or queries of block 530. If there was an error or no response, the method 500 proceeds to block 550A, discussed below. In some embodiments, if there was a valid response, the values returned from the SNMP query or queries may be saved into a table. If there was a valid response, the method 500 proceeds to block 535.
At block 535, the interface status (such as the value of ospfIfAdminStat 421) is checked for the first and second interfaces 155. Each value of ospfIfAdminStat 421 may, for example, indicate an enabled status, or a disabled status. If the ospfIfAdminStat 421 for the first interface 155 is disabled, or if the ospfIfAdminStat 421 for the second interface 155 is disabled, or both, the method 500 proceeds to block 550B, discussed below. If neither is disabled, the method 500 proceeds to block 540.
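Block 535 reduces to a small predicate. The sketch below assumes the encoding visible in Table 2, where ospfIfAdminStat = 1 means enabled and 2 means disabled:

```python
DISABLED = 2  # ospfIfAdminStat value for a disabled interface

def disabled_interfaces(source_values, dest_values):
    """Return which sides, if any, have OSPF administratively disabled."""
    out = []
    if source_values.get("ospfIfAdminStat") == DISABLED:
        out.append("source")
    if dest_values.get("ospfIfAdminStat") == DISABLED:
        out.append("destination")
    return out

# Mirrors the Table 2 scenario: the destination has OSPF disabled.
print(disabled_interfaces({"ospfIfAdminStat": 1},
                          {"ospfIfAdminStat": 2}))
# prints ['destination']
```

A non-empty result routes the method to block 550B; an empty one proceeds to block 540.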
At block 540, for the values of management information value 420 previously determined, a matching status is determined between the first management information value 420 for the first (source) interface 155 and the corresponding second management information value 420 for the second (destination) interface 155. For example, for pairs of corresponding management information values 420, a mismatch may be identified between the two management information values 420, or a match may be identified.
At block 541, the matching status is checked. If one or more mismatches were identified at block 540, the method 500 proceeds to block 550C, discussed below. If no mismatches were identified at block 540, the method 500 proceeds to block 550D, discussed below.
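Blocks 540 and 541 can be sketched as a pairwise comparison that collects mismatched values; the helper name is hypothetical, and the sample timers mirror the mismatch reported in Table 3:

```python
def find_mismatches(source_values, dest_values):
    """Compare corresponding management information values and return
    each mismatched pair as name -> (source value, destination value)."""
    return {name: (source_values[name], dest_values[name])
            for name in source_values
            if name in dest_values and source_values[name] != dest_values[name]}

src = {"ospfIfHelloInterval": 15, "ospfIfRtrDeadInterval": 60}
dst = {"ospfIfHelloInterval": 10, "ospfIfRtrDeadInterval": 40}
mismatches = find_mismatches(src, dst)
# mismatches holds the two disagreeing timer pairs, as in Table 3
```

An empty result corresponds to the no-mismatch branch (block 550D); a non-empty one to the mismatch report of block 550C.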
At block 550A, an error message is generated; for example, a message may be generated with error text returned from the SNMP query or queries. An illustrative example of such an error message is shown in Table 1.
TABLE 1
RAMS Protocol Diagnosis : “AdjacencyLost” “15.6.96.34” “15.6.96.33”
RAMS Protocol Diagnosis Results
Diagnosing AdjacencyLost related symptoms between source 15.6.96.34
and destination 15.6.96.33
Unable to proceed with diagnosis on 15.6.96.34.
 snmpget: No response arrived before timeout.
 snmpget: Possible causes include invalid community name, agent is
not running, or the node is unaccessible.
Unable to proceed with diagnosis on 15.6.96.33.
 snmpget: No response arrived before timeout.
 snmpget: Possible causes include invalid community name, agent is
not running, or the node is unaccessible.
 Probable cause:
 network failure.
 Check if any APA events are correlated under this Adjacency Lost.
In some embodiments, in the event of no response to a SNMP query, the error message may advise the user to check for events (e.g., APA events) that may indicate physical failure of a device. The method 500 proceeds to block 555.
At block 550B, an error message is generated, responsive to the notification 330, indicating that a routing protocol (e.g., OSPF) is disabled for one or both of the interfaces 155, and identifying the disabled interface(s) 155. An illustrative example of such an error message is shown in Table 2. The method 500 proceeds to block 555.
TABLE 2
RAMS Protocol Diagnosis : “AdjacencyLost” “15.6.96.50” “15.6.96.49”
RAMS Protocol Diagnosis Results
Diagnosing AdjacencyLost related symptoms between source 15.6.96.50
and destination 15.6.96.49
Found values configured for ip 15.6.96.49 :
  ospfIfAdminStat = 2
  ospfIfAreaId = 0.0.0.0
  ospfIfType = 1
  ospfIfHelloInterval = 10
  ospfIfRtrDeadInterval = 40
Probable cause:
 IP Address 15.6.96.49 is disabled (ospfIfAdminStat = 2) for OSPF
(IGP) protocol.
Check OSPF configuration for the IP Address on the router.
At block 550C, a message is generated, responsive to the notification 330, indicating that a mismatch or misconfiguration has been found, and identifying the mismatched data elements 411-413 and/or management information values 420. The message may, in some embodiments, include a table or display identifying the data elements 411-413 and/or management information values 420 that were queried, together with the corresponding values thereof. An illustrative example of such an error message is shown in Table 3. The method 500 proceeds to block 555.
TABLE 3
RAMS Protocol Diagnosis : “AdjacencyLost” “15.6.96.49” “15.6.96.50”
RAMS Protocol Diagnosis Results
Diagnosing AdjacencyLost related symptoms between source 15.6.96.49
and destination 15.6.96.50
 Probable cause:
 Mismatched protocol value(s) configured between source 15.6.96.49
and destination 15.6.96.50 :
  source ospfIfHelloInterval value = 15
  destination ospfIfHelloInterval value = 10
  source ospfIfRtrDeadInterval value = 60
  destination ospfIfRtrDeadInterval value = 40
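The block 550C comparison can be sketched as a walk over the routing-protocol values retrieved from each router, collecting every value that differs, as in the Table 3 listing. The function name and output format below are illustrative assumptions.

```python
# Sketch of block 550C: compare management information values from the
# source and destination routers and list any mismatches (Table 3 style).
# Names are illustrative.

def find_mismatches(source_values, dest_values):
    """Return {name: (source, destination)} for every differing value."""
    mismatches = {}
    for name in source_values:
        if name in dest_values and source_values[name] != dest_values[name]:
            mismatches[name] = (source_values[name], dest_values[name])
    return mismatches

source = {"ospfIfHelloInterval": 15, "ospfIfRtrDeadInterval": 60,
          "ospfIfAreaId": "0.0.0.1"}
dest   = {"ospfIfHelloInterval": 10, "ospfIfRtrDeadInterval": 40,
          "ospfIfAreaId": "0.0.0.1"}

for name, (s, d) in sorted(find_mismatches(source, dest).items()):
    print(f"  source {name} value = {s}")
    print(f"  destination {name} value = {d}")
```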
At block 550D, a diagnostic message is generated responsive to the notification 330; for example, a message indicating that no mismatch or misconfiguration has been found. The message may, in some embodiments, include a table or display identifying the data elements 411-413 and/or management information values 420 that were queried, together with the corresponding values thereof. An illustrative example of such a diagnostic message is shown in Table 4. The method 500 proceeds to block 555.
TABLE 4
RAMS Protocol Diagnosis : “AdjacencyLost” “15.6.96.49” “15.6.96.50”
RAMS Protocol Diagnosis Results
Diagnosing AdjacencyLost related symptoms between source 15.6.96.49
and destination 15.6.96.50
Cannot determine probable cause - no mismatched configuration found
between routers.
Values configured for
 source IP Address 15.6.96.49 : destination IP Address 15.6.96.50
IpIfAdminStatus up : up
IpIfMtu 1500 : 1500
OSPF IfIpAdminStat 1 : 1
OSPF IfAreaId 0.0.0.1 : 0.0.0.1
OSPF IfType 1 : 1
OSPF IfHelloInterval 10 : 10
OSPF IfRouterDeadInterval 40 : 40
At block 555, the message generated at any of blocks 550A-550D (e.g., an error message or diagnostic message) is displayed to the user, for example in a web browser page or a pop-up window. In some embodiments, a tool (such as webappmon) can be used to invoke a diagnostic script, to capture the standard output of its results, and to display the output as a web page to the user. From block 555, the method 500 concludes at block 599.
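The block 555 display step can be sketched as follows: invoke a diagnostic command, capture its standard output, and wrap the output in a web page. This is a minimal stand-in for a tool like webappmon; a trivial inline script takes the place of the real diagnosis script, whose name and location are not given in the patent, and all function names are illustrative.

```python
# Sketch of block 555: run a diagnostic script, capture stdout, and wrap
# the captured text in a minimal HTML page for display. Names are
# illustrative; a trivial inline command stands in for the real script.

import subprocess
import sys

def run_diagnostic(argv):
    """Run a diagnostic command and capture its standard output."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.stdout

def as_web_page(title, body):
    """Wrap captured output in a minimal HTML page."""
    return (f"<html><head><title>{title}</title></head>"
            f"<body><pre>{body}</pre></body></html>")

stdout = run_diagnostic(
    [sys.executable, "-c", "print('RAMS Protocol Diagnosis Results')"])
page = as_web_page("RAMS Protocol Diagnosis", stdout)
print(page)
```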
Simplified Exemplary Method
FIG. 6 is a flow chart of a simplified exemplary method 600 for router misconfiguration diagnosis according to a further embodiment of the present invention. It should be noted that FIG. 6 includes blocks having identical reference numbers to corresponding blocks shown in FIG. 5. Such blocks represent steps of method 600 that correspond to steps of method 500.
The method 600 begins at start block 501, and proceeds to block 510. At block 510, a notification 330 of a routing failure 331 (e.g., lost adjacency) between a first node 311 and a second node 312 is received.
At block 520, a first interface 155 associated with the first node 311 is identified, and a second interface 155 associated with the second node 312 is identified.
At block 530, a first management information value 420 and a second management information value 420, specific to a routing protocol, are determined. For example, SNMP queries may be used to retrieve the relevant set of management information values 420 from a MIB 400 associated with router 150.
At block 540, matching status is determined between the first management information value 420 and the second management information value 420. For example, a mismatch may be identified between the two management information values 420, or a match may be identified.
At block 550, a diagnostic message is generated responsive to the notification. For example, in some embodiments, a tool (such as webappmon) can be used to invoke a diagnostic script, to capture the standard output of its results, and to display the output as a web page to the user. The method 600 concludes at block 599.
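The simplified method 600 can be sketched end to end: receive a lost-adjacency notification, obtain the management information values for each interface, determine their matching status, and generate a diagnostic message. In a real implementation the values would be retrieved with SNMP queries against each router's MIB; here they are passed in directly, and all names and message wording are illustrative assumptions.

```python
# Sketch of method 600 (blocks 510-550): notification in, diagnostic
# message out. SNMP retrieval is replaced by pre-fetched value dicts;
# all names are illustrative.

def diagnose(notification, source_values, dest_values):
    """Return a diagnostic message for a routing-failure notification."""
    src, dst = notification["source"], notification["destination"]
    header = (f"Diagnosing {notification['failure']} related symptoms "
              f"between source {src} and destination {dst}\n")
    # Block 550A analog: no SNMP response from one or both nodes.
    if source_values is None or dest_values is None:
        return header + "Unable to proceed with diagnosis: no SNMP response."
    # Block 540: determine matching status between the value sets.
    mismatched = [k for k in source_values
                  if source_values[k] != dest_values.get(k)]
    if mismatched:
        return header + ("Probable cause: mismatched protocol value(s): "
                         + ", ".join(sorted(mismatched)))
    return header + ("Cannot determine probable cause - no mismatched "
                     "configuration found between routers.")

note = {"failure": "AdjacencyLost",
        "source": "15.6.96.49", "destination": "15.6.96.50"}
print(diagnose(note,
               {"ospfIfHelloInterval": 15},
               {"ospfIfHelloInterval": 10}))
```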
Exemplary Interfaces
FIG. 7 shows an exemplary user interface 700 for management software 340 according to an embodiment of the invention. The user interface 700 displays a plurality of representations of alarms, each associated with a notification 330. Representation 710 is a representation of a selected alarm indicating “Lost Adjacency,” and showing information derived from a notification 330 of lost adjacency between a source IP address 451 and a destination IP address 452.
Although exemplary implementations of the invention have been described in detail above, those skilled in the art will readily appreciate that many additional modifications are possible in the exemplary embodiments without materially departing from the novel teachings and advantages of the invention. Accordingly, these and all such modifications are intended to be included within the scope of this invention.

Claims (31)

What is claimed is:
1. A method for router misconfiguration diagnosis, comprising:
receiving a notification of a routing failure between a first node and a second node,
identifying a first interface associated with the first node and a second interface associated with the second node,
determining a first management information value associated with the first interface and a second management information value associated with the second interface, the first management information value and the second management information value being specific to a routing protocol,
determining that the routing failure was caused by the first and second management information values being mismatched, and
generating a diagnostic message responsive to the notification;
wherein the first management information value and the second management information value are selected from the group of variables consisting of ospfIfAdminStat, ospfIfAreaId, ospfIfType, ospfIfHelloInterval, and ospfIfRtrDeadInterval.
2. The method of claim 1 wherein the routing failure is lost adjacency.
3. The method of claim 1 further comprising
extracting from the notification a first IP address for the first node and a second IP address for the second node.
4. The method of claim 1 further comprising
receiving a user selection identifying the notification for diagnosis.
5. The method of claim 1 wherein identifying the first and second interfaces further comprises:
querying for a first interface index associated with the first node,
querying for a second interface index associated with the second node, and
saving the first and second interface indices.
6. The method of claim 1 further comprising:
querying for a first interface status associated with the first interface,
querying for a second interface status associated with the second interface, and
saving the first and second interface statuses.
7. The method of claim 1 further comprising querying using a network management protocol.
8. The method of claim 7 wherein the network management protocol is SNMP.
9. The method of claim 1 wherein the routing protocol comprises OSPF.
10. The method of claim 1 wherein the first management information value is selected from a first management information base associated with the first interface, and the second management information value is selected from a second management information base associated with the second interface.
11. The method of claim 10 wherein the first and second management information bases comprise OSPF interface tables.
12. The method of claim 1 further comprising displaying the diagnostic message.
13. The method of claim 1 wherein the diagnostic message comprises an error message returned from a network management protocol query.
14. The method of claim 1 wherein the diagnostic message comprises an indication of advice to check for an event indicative of physical failure.
15. The method of claim 1 wherein the diagnostic message comprises an indication that the routing protocol is disabled for an interface associated with at least one of the first and second nodes.
16. The method of claim 1 wherein the diagnostic message comprises the first management information value and the second management information value.
17. The method of claim 1 wherein the diagnostic message comprises an indication of a router misconfiguration.
18. The method of claim 1 wherein the diagnostic message comprises identification of a mismatch between the first management information value and the second management information value.
19. A system for router misconfiguration diagnosis in a communications network, comprising:
a management station able to receive from a routing analyzer a notification of a routing failure between a first node of the communications network and a second node of the communications network,
the management station being adapted to identify a first interface associated with the first node and a second interface associated with the second node, to determine a first management information value associated with the first interface and a second management information value associated with the second interface, the first management information value and the second management information value being specific to a routing protocol, to determine that the routing failure was caused by the first and second management information values being mismatched, and to generate a diagnostic message responsive to the notification;
wherein the first management information value and the second management information value are selected from the group of variables consisting of ospfIfAdminStat, ospfIfAreaId, ospfIfType, ospfIfHelloInterval, and ospfIfRtrDeadInterval.
20. The system of claim 19 wherein the routing failure is lost adjacency.
21. The system of claim 19 wherein the notification comprises a first IP address for the first node and a second IP address for the second node.
22. The system of claim 19 wherein the management station is further adapted to receive a user selection identifying the notification for diagnosis.
23. The system of claim 19 wherein the first management information value is selected from a first management information base associated with the first interface, and the second management information value is selected from a second management information base associated with the second interface.
24. A non-transitory computer-readable medium comprising storage and a set of instructions located on the storage, for router misconfiguration diagnosis, which when the instructions are executed by a computer, cause the computer to perform a process comprising:
receiving a notification of a routing failure between a first node and a second node,
identifying a first interface associated with the first node and a second interface associated with the second node,
determining a first management information value associated with the first interface and a second management information value associated with the second interface, the first management information value and the second management information value being specific to a routing protocol,
determining that the routing failure was caused by the first and second management information values being mismatched, and
generating a diagnostic message responsive to the notification;
wherein the first management information value and the second management information value are selected from the group of variables consisting of ospfIfAdminStat, ospfIfAreaId, ospfIfType, ospfIfHelloInterval, and ospfIfRtrDeadInterval.
25. The computer-readable medium of claim 24 wherein the routing failure is lost adjacency.
26. The computer-readable medium of claim 24, wherein the set of instructions further comprises steps for:
extracting from the notification a first IP address for the first node and a second IP address for the second node.
27. The computer-readable medium of claim 24, wherein the set of instructions further comprises steps for:
receiving a user selection identifying the notification for diagnosis.
28. The computer-readable medium of claim 24, wherein identifying the first and second interfaces further comprises:
querying for a first interface index associated with the first node,
querying for a second interface index associated with the second node, and
saving the first and second interface indices.
29. A system for router misconfiguration diagnosis, comprising:
a computing environment arranged to receive a notification of a routing failure between a first node and a second node,
a computing system operatively associated with the computing environment for identifying a first interface associated with the first node and a second interface associated with the second node,
a computing application operatively associated with the computing system for determining a first management information value associated with the first interface and a second management information value associated with the second interface, the first management information value and the second management information value being specific to a routing protocol, and for determining that the routing failure was caused by the first and second management information values being mismatched, and
a message generator operatively associated with the computing system for generating a diagnostic message responsive to the notification,
wherein the first management information value and the second management information value are selected from the group of variables consisting of ospfIfAdminStat, ospfIfAreaId, ospfIfType, ospfIfHelloInterval, and ospfIfRtrDeadInterval.
30. The method of claim 1, wherein the first management information value comprises an entry in an OSPF interface table.
31. The method of claim 1, wherein the first management information value comprises an administrative status for the first interface.
US11/446,914 2006-06-05 2006-06-05 Router misconfiguration diagnosis Active 2029-07-17 US8467301B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US11/446,914 US8467301B2 (en) 2006-06-05 2006-06-05 Router misconfiguration diagnosis


Publications (2)

Publication Number Publication Date
US20070280120A1 US20070280120A1 (en) 2007-12-06
US8467301B2 true US8467301B2 (en) 2013-06-18

Family

ID=38789998

Family Applications (1)

Application Number Title Priority Date Filing Date
US11/446,914 Active 2029-07-17 US8467301B2 (en) 2006-06-05 2006-06-05 Router misconfiguration diagnosis

Country Status (1)

Country Link
US (1) US8467301B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110082923A1 (en) * 2009-10-02 2011-04-07 Canon Kabushiki Kaisha Communication apparatus having a plurality of network interfaces, method for controlling the communication apparatus, and storage medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7911976B2 (en) * 2008-12-19 2011-03-22 At&T Intellectual Property I, L.P. Method and apparatus for managing routing in a network
CN107870832B (en) * 2016-09-23 2021-06-18 伊姆西Ip控股有限责任公司 Multi-path storage device based on multi-dimensional health diagnosis method
CN108390790B (en) * 2018-03-16 2021-08-03 迈普通信技术股份有限公司 Fault diagnosis method and device for routing equipment

Citations (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5982753A (en) * 1997-06-09 1999-11-09 Fluke Corporation Method of testing a switched local area network
US6292472B1 (en) * 1998-10-22 2001-09-18 Alcatel Reduced polling in an SNMPv1-managed network
US6393486B1 (en) * 1995-06-23 2002-05-21 Cisco Technology, Inc. System and method using level three protocol information for network centric problem analysis and topology construction of actual or planned routed network
US20030023716A1 (en) 2001-07-25 2003-01-30 Loyd Aaron Joel Method and device for monitoring the performance of a network
US20030037136A1 (en) * 2001-06-27 2003-02-20 Labovitz Craig H. Method and system for monitoring control signal traffic over a computer network
US20030211842A1 (en) * 2002-02-19 2003-11-13 James Kempf Securing binding update using address based keys
US20040006619A1 (en) * 2002-07-02 2004-01-08 Fujitsu Network Communications, Inc. Structure for event reporting in SNMP systems
US6697970B1 (en) * 2000-07-14 2004-02-24 Nortel Networks Limited Generic fault management method and system
US6711152B1 (en) * 1998-07-06 2004-03-23 At&T Corp. Routing over large clouds
US20040221025A1 (en) 2003-04-29 2004-11-04 Johnson Ted C. Apparatus and method for monitoring computer networks
US20040221296A1 (en) 2003-03-18 2004-11-04 Renesys Corporation Methods and systems for monitoring network routing
US20040221026A1 (en) 2003-04-30 2004-11-04 Dorland Chia-Chu S. Method and system for managing a network
US20050083855A1 (en) 2003-10-20 2005-04-21 Srikanth Natarajan Method and system for identifying the health of virtual routers
US20050102423A1 (en) * 1995-06-23 2005-05-12 Pelavin Richard N. Analyzing an access control list for a router to identify a subsumption relation between elements in the list
US20060092941A1 (en) * 2004-11-01 2006-05-04 Kazuhiro Kusama Communication path monitoring system and communication network system
US7095738B1 (en) * 2002-05-07 2006-08-22 Cisco Technology, Inc. System and method for deriving IPv6 scope identifiers and for mapping the identifiers into IPv6 addresses
US7106740B1 (en) * 2002-01-02 2006-09-12 Juniper Networks, Inc. Nexthop to a forwarding table
US7155500B2 (en) * 2001-03-16 2006-12-26 Telefonaktiebolaget Lm Ericsson (Publ) IP address ownership verification mechanism
US20070058631A1 (en) * 2005-08-12 2007-03-15 Microsoft Corporation Distributed network management
US20070230482A1 (en) * 2006-03-31 2007-10-04 Matsushita Electric Industrial Co., Ltd. Method for on demand distributed hash table update
US20080043633A1 (en) * 2006-04-03 2008-02-21 Padula Richard A Method and system for performing emta loop diagnostics
US20080060082A1 (en) * 2006-05-24 2008-03-06 International Business Machines Corporation Validating routing of client requests to appropriate servers hosting specific stateful web service instances
US7633942B2 (en) * 2001-10-15 2009-12-15 Avaya Inc. Network traffic generation and monitoring systems and methods for their use in testing frameworks for determining suitability of a network for target applications
US7702810B1 (en) * 2003-02-03 2010-04-20 Juniper Networks, Inc. Detecting a label-switched path outage using adjacency information


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
EMC Corporation, "EMC Smarts Network Protocol Manager," Data Sheet S0002, May 2005.



Similar Documents

Publication Publication Date Title
US7957295B2 (en) Ethernet performance monitoring
US10142203B2 (en) Ethernet fault management systems and methods
US8811193B2 (en) Network path discovery and analysis
US8451745B2 (en) Auto probing endpoints for performance and fault management
US7986632B2 (en) Proactive network analysis system
US8774010B2 (en) System and method for providing proactive fault monitoring in a network environment
US11509552B2 (en) Application aware device monitoring correlation and visualization
CN111934936B (en) Network state detection method and device, electronic equipment and storage medium
EP2586158B1 (en) Apparatus and method for monitoring of connectivity services
CN108449210B (en) Network routing fault monitoring system
US10742672B2 (en) Comparing metrics from different data flows to detect flaws in network data collection for anomaly detection
US8971195B2 (en) Querying health of full-meshed forwarding planes
US11032124B1 (en) Application aware device monitoring
US8467301B2 (en) Router misconfiguration diagnosis
US7792045B1 (en) Method and apparatus for configuration and analysis of internal network routing protocols
US20070061663A1 (en) Method and system for identifying root cause of network protocol layer failures
CN107707429B (en) Method and system for discovering IP route interruption
CN115955690A (en) Wireless signal strength based detection of poor network link performance
JP5968829B2 (en) Evaluation method, evaluation apparatus, and evaluation program
Luwemba Practical analysis of flows with IPFIX
WO2017156675A1 (en) Operation administration maintenance system

Legal Events

Date Code Title Description
AS Assignment

Owner name: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P., TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:WONG, KAM C.;ZWETKOF, PETER C.;RHODES, DAVID M.;REEL/FRAME:017980/0526

Effective date: 20060531

STCF Information on status: patent grant

Free format text: PATENTED CASE

CC Certificate of correction
AS Assignment

Owner name: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP, TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.;REEL/FRAME:037079/0001

Effective date: 20151027

FPAY Fee payment

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8