JP2008061139A

JP2008061139A - Network monitoring device, network monitoring method, and computer program

Info

Publication number: JP2008061139A
Application number: JP2006238203A
Authority: JP
Inventors: Yuichiro Hei; 雄一郎屏; Tomohiko Ogishi; 智彦大岸; Shigehiro Ano; 茂浩阿野
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2006-09-01
Filing date: 2006-09-01
Publication date: 2008-03-13
Anticipated expiration: 2026-09-01
Also published as: JP4620019B2

Abstract

PROBLEM TO BE SOLVED: To provide a network monitoring device and a network monitoring method that improve precision in monitoring the route control state, and a computer program for realizing the network monitoring device on a computer.

SOLUTION: The network monitoring device is provided with: an LSA collection part 11 which collects link-state advertisement messages of a routing protocol from an IP network; an LSDB management part 13 which prepares a link-state database having a state value for managing difference information of the collected link-state advertisement messages; and an error/restoration judging part 14 which refers to the link-state database and judges an error using a criterion based on the arrival pattern of the link-state advertisement messages corresponding to an assumed error. The error/restoration judging part 14 performs a two-stage error judgment with a time difference that takes into account the arrival delay of link-state advertisement messages.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a network monitoring device, a network monitoring method, and a computer program.

Conventionally, in a communication network using IP (Internet Protocol) (hereinafter referred to as an "IP network"), a routing protocol controls which route each packet is transmitted through (route control). By monitoring this route control state, operators try to grasp the occurrence of route failures in the IP network and their recovery as promptly as possible, and thereby keep the IP network operating stably.

OSPF (Open Shortest Path First) is known as a typical routing protocol used in IP networks. In OSPF, each router in the IP network advertises its own link state (the connection state of its communication links, the link costs, and so on) to the entire IP network using link-state advertisement messages (LSAs). A router receives the LSAs originated by other routers and builds a link-state database (LSDB) from them. Based on the LSDB, it then computes the path tree that minimizes the cost to each destination in the IP network (hereinafter referred to as the "shortest path tree") and derives its route table from that tree.

In OSPF, a router sends its link state periodically in LSAs (refresh LSAs). In addition, it sends an LSA whenever its link state changes, for example when it detects that a communication link connected to it has gone down. When a router receives an LSA whose content differs from the previously received LSA, it rewrites the LSDB and regenerates the topology and the route table. Therefore, by collecting and monitoring LSAs in an IP network running OSPF, changes in the route control state of the IP network can be detected.

For example, in the conventional network monitoring technique described in Patent Document 1, a monitoring terminal device connected to the IP network collects LSAs and creates an LSDB and a shortest path tree that includes all routers in the IP network. When it receives a changed LSA, it regenerates the LSDB and the shortest path tree, compares the LSDB before and after the change (or the shortest path tree before and after the change), and explicitly shows the changed parts on a display or the like.
Patent Document 1: JP 2006-140834 A

However, the conventional network monitoring technique described above has the following problems.
When a failure occurs in the IP network, a router that detects the failure advertises an LSA with its changed link state to the entire IP network. If several routers detect the same failure, every one of them advertises an LSA with its changed link state. Because of differences in detection time and LSA propagation delay, the LSAs related to a single failure may arrive at the monitoring terminal device spread over several tens of seconds. The LSDB and the shortest path tree then change several times, once per LSA. Even if each of these changes is displayed, the network operator cannot tell whether they stem from a single failure or from multiple failures.

Furthermore, when a failure partitions the IP network, LSAs from routers located beyond the partition point, as seen from the monitoring terminal device, never reach the monitoring terminal device. Detecting changes in the LSDB or the path tree from the received LSAs alone therefore does not give an accurate picture of the failure.

The present invention has been made in view of these circumstances, and its object is to provide a network monitoring device and a network monitoring method that can improve the accuracy of monitoring the route control state.

Another object of the present invention is to provide a computer program for realizing the network monitoring device of the present invention using a computer.

To solve the above problems, a network monitoring device according to the present invention comprises: message collection means for collecting link-state advertisement messages of a routing protocol from an IP network; database management means for creating a link-state database having a state value for managing difference information of the collected link-state advertisement messages; and determination means for referring to the link-state database and performing failure determination using determination criteria based on the arrival pattern of link-state advertisement messages corresponding to each assumed failure. The determination means performs the failure determination in two stages separated by a time difference that takes the arrival delay of link-state advertisement messages into account.

The network monitoring device according to the present invention may comprise standby message recording means that, from the arrival pattern of the link-state advertisement messages collected at the time of the earlier failure determination, records the link-state advertisement messages expected to arrive late as standby link-state advertisement messages. The determination means performs the later failure determination when a recorded standby link-state advertisement message is received.

In the network monitoring device according to the present invention, the determination means may perform failure determination for each type of failure that can occur, following a predetermined determination order.

In the network monitoring device according to the present invention, when the result of the earlier failure determination differs from the result of the later failure determination, the determination means may adopt the result of the later failure determination.

The network monitoring device according to the present invention may comprise network partition determination means that determines, at the time of the failure determination, whether the network has been partitioned.

The network monitoring device according to the present invention may comprise failure recovery determination means that determines that a failure has been recovered when link information deleted because of the failure is received again in a link-state advertisement message.

A network monitoring method according to the present invention includes: a step of collecting link-state advertisement messages of a routing protocol from an IP network; a step of creating a link-state database having a state value for managing difference information of the collected link-state advertisement messages; a step of referring to the link-state database and performing failure determination using determination criteria based on the arrival pattern of link-state advertisement messages corresponding to each assumed failure; and a step of performing the failure determination in two stages separated by a time difference that takes the arrival delay of link-state advertisement messages into account.

A computer program according to the present invention causes a computer to realize: a function of collecting link-state advertisement messages of a routing protocol from an IP network; a function of creating a link-state database having a state value for managing difference information of the collected link-state advertisement messages; a function of referring to the link-state database and performing failure determination using determination criteria based on the arrival pattern of link-state advertisement messages corresponding to each assumed failure; and a function of performing the failure determination in two stages separated by a time difference that takes the arrival delay of link-state advertisement messages into account.
As a result, the network monitoring device described above can be realized using a computer.

According to the present invention, it becomes easier to identify the location and nature of a failure accurately, and the accuracy of monitoring the route control state can be improved. In addition, by performing network partition determination together with failure determination, the network partition status can be grasped along with the failure status. When the network is partitioned, the operator thus knows that the failure determination result may be wrong because of the partition, and can respond to the failure efficiently by using the result with the effect of the partition in mind.

An embodiment of the present invention is described below with reference to the drawings.
FIG. 1 shows an example configuration of a monitored network according to an embodiment of the present invention. The network shown in FIG. 1 is an IP network that uses OSPF as its routing protocol.

In FIG. 1, network N0 is a local area network (LAN) such as a corporate LAN. Network N5 is an external network that does not belong to network N0, for example the Internet. The external network N5 is connected to network N0 via router R4.

Network N0 consists of ten routers R1 to R10, stub networks N1, N3, and N4, and transit network N2. Routers R1 to R10 perform route control according to OSPF. Network N0 is divided into three areas: 0, 1, and 2. Area 0 is the backbone area.

Routers R1 to R4 belong to area 0. Routers R7 and R8 belong to area 1. Routers R9 and R10 belong to area 2. Router R5 belongs to both area 0 and area 1. Router R6 belongs to both area 0 and area 2.

Stub network N1 is connected to router R3. Transit network N2 is connected to routers R2, R4, and R6. External network N5 is connected to router R4. Router R4 is an AS (Autonomous System) boundary router that receives the route information of external network N5 and redistributes it within network N0.

In OSPF, to reduce the volume of routing messages, a large network is divided into multiple areas including a backbone area, as shown in FIG. 1. An LSA (link-state advertisement message) sent by a router in a given area can be received by all routers in the same area, so a router can obtain the complete link state of its own area. However, the area border routers located at the boundaries between areas (routers R5 and R6 in FIG. 1) send only summarized LSAs across areas, so a router in one area cannot obtain the complete link state of another area. In FIG. 1, for example, router R5 sends LSAs summarizing the link state of area 0 to routers R7 and R8 in area 1, and sends LSAs summarizing the state of area 1 to routers R1 to R4 in area 0.

For this reason, a network monitoring device is installed in each area in order to collect the LSAs of the entire network and obtain all link states. In FIG. 1, a network monitoring device 10 is connected to router R2 in area 0, to router R8 in area 1, and to router R10 in area 2. Each network monitoring device 10 collects LSAs from the router it is connected to, in accordance with OSPF. It then generates an LSDB from the collected LSAs and performs failure determination and failure recovery determination.

FIG. 2 is a block diagram showing the configuration of the network monitoring device 10 according to an embodiment of the present invention. In FIG. 2, the network monitoring device 10 includes an LSA collection unit 11, an LSA management unit 12, an LSDB management unit 13, a failure/recovery determination unit 14, a failure management table 15, a determination result storage unit 16, and a network partition determination unit 17.

The LSA collection unit 11 receives the LSAs sent from the router to which the network monitoring device 10 is connected. The LSA management unit 12 records the LSAs received by the LSA collection unit 11. The LSDB management unit 13 creates and holds an LSDB (link-state database) from the LSAs recorded by the LSA management unit 12.

FIG. 3 shows an example of the LSDB held by the LSDB management unit 13 shown in FIG. 2. The example in FIG. 3 is that of the network monitoring device 10 placed in area 0 of FIG. 1. As shown in FIG. 3, the LSDB manages the connection relationships between routers and networks in matrix form. Where a connection relationship exists between two points (from "from" to "to" in FIG. 3), the LSDB holds data 30 comprising the LS type, link type, link cost, LS age, and state value. In the example of FIG. 3, a connection relationship exists between router R3 and router R5, and its data 30 is illustrated. In data 30, the "state value" is information for managing LSA difference information and takes one of four values: "Del", "Alive", "NG", and "NC". "Del" indicates that the connection relationship has been deleted. "Alive" indicates that the connection relationship has been restored. "NG" indicates an element that has already been referenced in the failure determination process described later. "NC" indicates no change.
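The matrix-style LSDB with per-cell state values can be pictured with a small data structure. The following is only a minimal sketch, not the patent's implementation: the class and field names (`Lsdb`, `LinkEntry`, `State`) and the sample cost and age values are assumptions made for illustration, and the matrix is stored sparsely as a dictionary keyed by `(from, to)`.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    DEL = "Del"      # connection relationship deleted
    ALIVE = "Alive"  # connection relationship restored
    NG = "NG"        # element already referenced during failure determination
    NC = "NC"        # no change


@dataclass
class LinkEntry:
    """Per-cell data (the 'data 30' of FIG. 3); field names are illustrative."""
    ls_type: int
    link_type: str
    cost: int
    ls_age: int
    state: State = State.NC


class Lsdb:
    """Sparse from/to matrix: only connected pairs hold an entry."""

    def __init__(self):
        self._cells = {}

    def set(self, src, dst, entry):
        self._cells[(src, dst)] = entry

    def get(self, src, dst):
        return self._cells.get((src, dst))

    def with_state(self, state):
        """Return the (from, to) pairs whose state value equals `state`."""
        return sorted(k for k, e in self._cells.items() if e.state is state)


# Hypothetical sample values for the R3/R5 connection.
lsdb = Lsdb()
lsdb.set("R3", "R5", LinkEntry(ls_type=1, link_type="PtoP", cost=10, ls_age=120))
lsdb.set("R5", "R3", LinkEntry(ls_type=1, link_type="PtoP", cost=10, ls_age=95,
                               state=State.DEL))
```

A determination routine could then call `lsdb.with_state(State.DEL)` to list the deleted connection relationships it still has to explain.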

The failure/recovery determination unit 14 performs determination processing whenever an LSA that is not a refresh LSA is received. In this processing, it refers to the LSDB held by the LSDB management unit 13 to perform failure determination and identify the failure location. It likewise refers to the LSDB to perform failure recovery determination and identify the recovered location. The failure/recovery determination unit 14 registers the determination results in the failure management table 15 and also outputs them to the determination result storage unit 16.

The network partition determination unit 17 refers to the LSDB held by the LSDB management unit 13 and determines whether the network has been partitioned. The network partition determination unit 17 outputs its determination result to the determination result storage unit 16.
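This passage does not spell out how the partition check is carried out. One plausible reading, shown here purely as an assumption, is a reachability test over the links currently present in the LSDB: if some known node cannot be reached from the router the monitor is attached to, the network is treated as partitioned. Links are treated as undirected in this sketch, and the function names are hypothetical.

```python
from collections import deque


def reachable(links, start):
    """Breadth-first search over undirected links; returns the reachable nodes."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_partitioned(nodes, links, start):
    """True if any known node is unreachable from `start` (assumed criterion)."""
    return bool(set(nodes) - reachable(links, start))
```

For example, with nodes R1 to R4 and only the links R1-R2 and R3-R4 remaining, `is_partitioned(nodes, links, "R1")` reports a partition because R3 and R4 are unreachable from R1.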

The failure management table 15 stores the failure locations and recovered locations identified by the failure/recovery determination unit 14. The determination result storage unit 16 also stores the failure locations and recovered locations identified by the failure/recovery determination unit 14, as well as the network partition determination results produced by the network partition determination unit 17.

Next, the operation of the network monitoring device 10 shown in FIG. 2 is described.

First, the network monitoring device 10 runs the OSPF function to establish an adjacency with the router it is connected to, and receives LSAs from that router. The received LSAs are held by the LSA management unit 12. The LSA management unit 12 compares each newly received LSA with the LSA it already holds and checks whether the link state has changed. If it has, the unit creates differential link information describing the change and holds it for a fixed period. For example, if link A existed in the held LSA but has been removed in the newly received LSA, differential link information indicating "link A deleted" is created and held for a fixed period. If the same LSA (a refresh LSA) is not received for a held LSA within a fixed period, the LSA management unit 12 deletes that LSA.
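The comparison between a held LSA and a newly received one can be sketched as a set difference. This is a simplified illustration under stated assumptions: each LSA is reduced to the set of link identifiers it advertises, and the function names `diff_links` and `is_refresh` are hypothetical; costs, ages, and the retention period are ignored.

```python
def diff_links(old_links, new_links):
    """Return differential link information between two versions of an LSA.

    Both arguments are sets of link identifiers advertised by one router.
    """
    diffs = [("delete", link) for link in sorted(old_links - new_links)]
    diffs += [("add", link) for link in sorted(new_links - old_links)]
    return diffs


def is_refresh(old_links, new_links):
    """A refresh LSA carries the same link state as the held copy."""
    return not diff_links(old_links, new_links)
```

With a held LSA advertising links A and B and a new LSA advertising only B, `diff_links({"A", "B"}, {"B"})` yields the "link A deleted" record described in the text.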

FIG. 4 is a flowchart showing the overall procedure for failure determination in this embodiment. The operation is described below with reference to FIG. 4.
In step S1, the LSA management unit 12 monitors whether each LSA received by the LSA collection unit 11 is a refresh LSA. In step S2, when a changed LSA (one that is not a refresh LSA) is received, the LSA management unit 12 starts a timer with a timeout of X seconds. The timeout X depends on factors such as the size of the monitored network, but a value of a few seconds is normally sufficient.

In step S3, the LSA management unit 12 checks whether another changed LSA is received before the timeout (within X seconds). In step S4, if another changed LSA has been received, the LSA management unit 12 resets the timer and returns to step S3, where it again watches for another changed LSA within X seconds. If the timer expires instead, the process proceeds to step S5.

In step S5, that is, when no changed LSA has been received for X seconds, the LSDB management unit 13 initializes the LSDB and regenerates it from the LSAs held by the LSA management unit 12. In step S6, the failure/recovery determination unit 14 performs failure determination processing based on the regenerated LSDB.
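The timer handling in steps S1 to S5 amounts to collecting changed LSAs until the network has been quiet for X seconds. A minimal offline sketch of that grouping is shown below; it works on a list of arrival timestamps rather than a live timer, the function name is hypothetical, and an arrival exactly X seconds after the previous one is assumed to start a new batch.

```python
def batch_changed_lsas(arrival_times, x_seconds):
    """Group changed-LSA arrival times into batches separated by quiet periods.

    Each batch corresponds to one pass through steps S1 to S5: the timer is
    reset by every arrival and the batch closes once X seconds pass silently.
    """
    batches = []
    for t in sorted(arrival_times):
        if batches and t - batches[-1][-1] < x_seconds:
            batches[-1].append(t)   # arrived before the timeout: timer reset
        else:
            batches.append([t])     # first LSA, or the timer had expired
    return batches
```

With X = 3 seconds, arrivals at 0.0, 1.5, and 2.0 seconds form one batch that triggers a single determination, while an arrival at 10.0 seconds starts a new one.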

FIG. 5 is a flowchart showing the procedure of the failure determination process (step S6) in FIG. 4. In the failure determination process, failures are judged type by type in the order shown in FIG. 5. The determination criteria for each failure type are described below, step by step. These criteria are based on LSA arrival patterns.

Step S21: In an autonomous router failure, the failing router itself sends an LSA that deletes all the link information it held before the failure. When the network monitoring device 10 receives that LSA, it determines that an autonomous router failure has occurred at that router. A router restart is one example of an autonomous router failure.

Step S22: In a router failure, the failing router itself sends no LSA, but the routers that were connected to it detect the failure and send LSAs that delete their link information toward the failed router. When the network monitoring device 10 has received, from all routers that were connected to the failed router, LSAs in which the link to the failed router has been deleted, it determines that a router failure has occurred at that router. An unexpected power loss is one example of a router failure.

Step S23: In a transit network failure, the designated router (DR) of the transit network sends an LSA that deletes the failed network. In addition, the routers that were connected to the failed transit network send LSAs that delete their link information toward the failed network. When the network monitoring device 10 receives these LSAs, it determines that a transit network failure has occurred. A failure of a layer 2 switch in the network is one example of a transit network failure.

Step S24: In a PtoP (point-to-point, router-to-router) link failure, the routers at both ends of the failed link send LSAs that delete that link information. When the network monitoring device 10 receives the LSA from either one of the two routers, it determines that a PtoP link failure has occurred.

Step S25: A transit network link failure is a failure of the link by which a router connects to a transit network. If the router is the designated router of the transit network, it sends an LSA that deletes its link information toward the transit network, together with an LSA that deletes the transit network itself. If the router is not the designated router, it sends an LSA that deletes its link information toward the transit network, and the designated router sends an LSA in which that router has been removed from the transit network. In either case, when the network monitoring device 10 receives these LSAs, it determines that a transit network link failure has occurred.

Step S26: A stub network failure is a failure of a router's link to a stub network; the router sends an LSA that deletes its link information toward the stub network. When the network monitoring device 10 receives that LSA, it determines that a stub network failure has occurred.

Step S27: An out-of-area network failure is a failure detected by an area border router; the area border router sends an LSA that deletes a network outside the area. When the network monitoring device 10 receives that LSA, it determines that an out-of-area network failure has occurred.

Step S28: An external network failure is a failure detected by an AS boundary router; the AS boundary router sends an LSA that deletes an external network. When the network monitoring device 10 receives that LSA, it determines that an external network failure has occurred.
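The ordered, pattern-based checks can be illustrated with a reduced sketch that covers only three of the eight steps (S21, S22, and S24) on a topology of PtoP links. Everything here is an assumption for illustration: the input encoding (a neighbor map taken before the change and a set of `(advertiser, neighbor)` deletions), the function name `classify`, and the rule that deletions explained by an earlier step are not reused by a later one.

```python
def classify(before, deleted):
    """Classify link deletions in a fixed order (S21, then S22, then S24).

    before  -- dict: router -> set of neighbors prior to the change
    deleted -- set of (advertiser, neighbor) link deletions seen in changed LSAs
    Returns a list of (failure_type, location) tuples.
    """
    remaining = set(deleted)
    results = []

    # S21: the router itself withdrew every link it had.
    for r, nbrs in sorted(before.items()):
        own = {(r, n) for n in nbrs}
        if own and own <= remaining:
            results.append(("autonomous_router_failure", r))
            remaining -= own
            remaining -= {(n, r) for n in nbrs}  # neighbors' reports are explained

    # S22: every neighbor withdrew its link to r, while r itself stayed silent.
    for r, nbrs in sorted(before.items()):
        toward = {(n, r) for n in nbrs}
        if toward and toward <= remaining:
            results.append(("router_failure", r))
            remaining -= toward

    # S24: a deletion reported by either end of a link is enough.
    for a, b in sorted(remaining):
        link = tuple(sorted((a, b)))
        if ("ptop_link_failure", link) not in results:
            results.append(("ptop_link_failure", link))
    return results
```

On a triangle of routers R1, R2, and R3, deletions advertised by R1 and R3 toward R2 are classified as a router failure at R2, whereas a single deletion advertised by R1 toward R2 falls through to a PtoP link failure. A router with only one neighbor is inherently ambiguous under these rules, which is one reason late-arriving LSAs matter.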

The description now returns to FIG. 4. In the failure determination process of step S6, the changed LSAs collected so far (steps S1 to S3) are examined to determine which type of failure they correspond to. The failure determination result is recorded in the failure management table 15 and is also recorded as a log in the determination result storage unit 16.

Not all LSAs sent in connection with a given failure necessarily arrive at the network monitoring device 10 at about the same time. Each LSA reaches the network monitoring device 10 by being relayed through a different set of routers. If, for example, a heavily loaded router on the way delays relaying, the LSAs related to one failure will arrive at the network monitoring device 10 spread over some interval. Likewise, a router that detects the failure late sends its LSA later than the other routers, which again spreads out the arrival of the LSAs related to that failure. Therefore, in step S7, the failure/recovery determination unit 14 uses the arrival pattern of the LSAs collected by the network monitoring device 10 at the time of failure determination to identify the LSAs that are expected to arrive late, treats them as standby LSAs, and records them in the failure management table 15. If it determines that there is no standby LSA, the process of FIG. 4 ends.

In step S8, that is, when a standby LSA exists, the LSA management unit 12 collects changed LSAs for a further Y seconds. The LSDB management unit 13 reflects the collection result in the LSDB and thereby updates it. In step S9, it is determined whether a standby LSA was received during those Y seconds. If a standby LSA was received, the process proceeds to step S10; if not, the process of FIG. 4 ends.

In step S10, the same failure determination process as in step S6 is performed again (post-failure determination). If the determination in step S10, which includes the late-arriving standby LSAs, yields a result different from the earlier one (step S6), a log cancelling the earlier failure is recorded in the determination result storage unit 16, and the failure determination result of the later determination is also recorded there as a log. The standby LSA waiting time "Y seconds" must be set in consideration of the failure detection delay, the LSA propagation delay, and the processing delay; normally a value of about several tens of seconds suffices.
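The flow of steps S6 to S10 can be sketched as follows. This is a minimal illustration and not the implementation of the embodiment: the function names, the `RAISE`/`CLEAR` log format, and the toy criterion in `toy_judge` are assumptions introduced here, and the callable `collect_for_y_seconds` stands in for the Y-second collection window of step S8.

```python
from typing import Callable, FrozenSet, List, Set, Tuple

Link = Tuple[str, str]  # (from_router, to_router) whose link state was deleted


def two_stage_determination(
    deleted: Set[Link],
    standby: Set[Link],
    judge: Callable[[Set[Link]], FrozenSet[str]],
    collect_for_y_seconds: Callable[[], Set[Link]],
) -> List[str]:
    """Return log lines for steps S6 to S10 (hypothetical log format)."""
    log: List[str] = []
    first = judge(deleted)                              # step S6: earlier determination
    log += [f"RAISE {name}" for name in sorted(first)]
    if not standby:                                     # step S7: no standby LSA
        return log
    late = collect_for_y_seconds() & standby            # steps S8 and S9
    if not late:
        return log
    second = judge(deleted | late)                      # step S10: later determination
    if second != first:
        log += [f"CLEAR {name}" for name in sorted(first - second)]
        log += [f"RAISE {name}" for name in sorted(second - first)]
    return log


def toy_judge(deleted: Set[Link]) -> FrozenSet[str]:
    """Toy criterion (assumption): R2 is down once R1, R3 and R6 all dropped their link to it."""
    if {("R1", "R2"), ("R3", "R2"), ("R6", "R2")} <= deleted:
        return frozenset({"router R2"})
    return frozenset(f"link {a}-{b}" for a, b in deleted)


log = two_stage_determination(
    deleted={("R3", "R2"), ("R6", "R2")},
    standby={("R1", "R2")},
    judge=toy_judge,
    collect_for_y_seconds=lambda: {("R1", "R2")},  # the late LSA arrives within Y seconds
)
```

In a real collector, `collect_for_y_seconds` would block for the configured waiting time; here it returns immediately so that the sketch stays self-contained.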

The failure determination process according to the present embodiment will now be described using specific examples.

[Specific example 1: Autonomous router failure]
In the network shown in FIG. 6, all routers are connected by PtoP links, and the network monitoring device 10 is connected to router R5. Assume that an autonomous router failure has occurred at router R2 in this network.

First, because an autonomous router failure has occurred, router R2 transmits an LSA that deletes all of its own link information. In addition, routers R1, R3, and R5, which were connected to router R2, detect that the connection to router R2 has been lost, and each transmits an LSA that deletes its link information to router R2. Here we consider the case in which all LSAs related to the failure are collected by the network monitoring device 10 at almost the same time.

As a result, the LSDB of the network monitoring device 10 registers that the link states from R2 (from) to R1, R3, and R5 (to) have been deleted (Del) and that the link states from R1, R3, and R5 (from) to R2 (to) have been deleted (Del). The LSDB of the network monitoring device 10 is then in the state shown in FIG. 7, that is, the LSDB before failure determination. FIG. 7 shows only the "state value" information in the LSDB.

Next, the failure determination process first refers to the LSDB to investigate whether an autonomous router failure has occurred (FIG. 5, step S21). As shown in FIG. 7, all link states from R2 are "Del", so it is determined that an autonomous router failure has occurred at R2. The link states related to R2 that are "Del" are then rewritten to "NG", which denotes an element that has already been examined in the failure determination process. The LSDB of the network monitoring device 10 is then in the state shown in FIG. 8, that is, the LSDB after failure determination. FIG. 8 shows only the "state value" information in the LSDB.

Next, it is investigated whether any other failure has occurred. If "Del" remained in the LSDB, another failure would be assumed to have occurred. In the LSDB of FIG. 8, however, no "Del" remains and there is no standby LSA, so the failure determination process ends.
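The check in specific example 1 can be sketched as follows, assuming the LSDB state values are held in a dictionary keyed by (from, to) router pairs. The dictionary layout and the function name are assumptions made for illustration, and the links that do not involve R2 are hypothetical, since the topology of FIG. 6 is not reproduced in the text.

```python
from typing import Dict, List, Tuple

Lsdb = Dict[Tuple[str, str], str]  # (from, to) -> "NC", "Del" or "NG"


def detect_autonomous_router_failures(lsdb: Lsdb) -> List[str]:
    """Report routers whose outgoing link states are all "Del" and mark them "NG"."""
    failed: List[str] = []
    for router in sorted({src for src, _ in lsdb}):
        outgoing = [key for key in lsdb if key[0] == router]
        if all(lsdb[key] == "Del" for key in outgoing):
            failed.append(router)
            # Mark every deleted link that involves the failed router as examined.
            for key in lsdb:
                if router in key and lsdb[key] == "Del":
                    lsdb[key] = "NG"
    return failed


# State before determination, following the description of FIG. 7.
lsdb: Lsdb = {
    ("R2", "R1"): "Del", ("R2", "R3"): "Del", ("R2", "R5"): "Del",
    ("R1", "R2"): "Del", ("R3", "R2"): "Del", ("R5", "R2"): "Del",
    # Hypothetical unchanged links, added only so the other routers keep a live link.
    ("R1", "R4"): "NC", ("R4", "R1"): "NC",
    ("R3", "R4"): "NC", ("R4", "R3"): "NC",
    ("R4", "R5"): "NC", ("R5", "R4"): "NC",
}

failed = detect_autonomous_router_failures(lsdb)
remaining_del = [key for key, state in lsdb.items() if state == "Del"]
```

An empty `remaining_del` corresponds to the condition under which the text ends the determination: no "Del" entry is left to explain.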

[Specific example 2: Simultaneous router failure and PtoP link failure]
In the network shown in FIG. 9, all routers are connected by PtoP links, and the network monitoring device 10 is connected to router R6. Assume that, in this network, a router failure has occurred at router R2 and a PtoP link failure has occurred on the link between router R3 and router R4.

First, the network monitoring device 10 collects LSAs, and as a result its LSDB is in the state shown in FIG. 10, that is, the LSDB before failure determination. FIG. 10 shows only the "state value" information in the LSDB. In FIG. 10, because R2 does not transmit an LSA, the link states from R2 remain "NC", which denotes no change.

The routers that were connected to R2 detect that the connection to R2 has been lost and then transmit LSAs that delete their link information to R2. Assume, however, that at this point the LSAs from R3 and R6 have arrived at the network monitoring device 10 but the LSA from R1 has not. In FIG. 10, therefore, the deletion (Del) of the link states from R3 and R6 (from) to R2 (to) is registered, while the link state from R1 (from) to R2 (to) remains "NC".

In addition, an LSA from R3 that deletes its link information to R4 and an LSA from R4 that deletes its link information to R3 have both arrived at the network monitoring device 10. Accordingly, FIG. 10 registers that the link state from R3 (from) to R4 (to) has been deleted (Del) and that the link state from R4 (from) to R3 (to) has been deleted (Del).

Next, the failure determination process refers to the LSDB and investigates the occurrence of failures in the order shown in FIG. 5. It first investigates whether an autonomous router failure has occurred (FIG. 5, step S21). No router has all of its outgoing link states set to "Del", so it is determined that no autonomous router failure has occurred.

Next, it is investigated whether a router failure has occurred (FIG. 5, step S22). Each row of the LSDB is examined, and if there is a router for which all link states are "Del", a router failure is determined to have occurred at that router. The LSDB of FIG. 10 contains no such router, so it is determined that no router failure has occurred at this point.

Next, because the network of FIG. 9 contains no transit network, it is investigated whether a PtoP link failure has occurred (FIG. 5, step S24). First, the link state from R3 to R4 and the link state from R4 to R3 are both "Del", so it is determined that a PtoP link failure has occurred between R3 and R4, and those link states are rewritten to "NG". Next, the link state from R3 to R2 and the link state from R6 to R2 are "Del", so it is determined that a PtoP link failure between R3 and R2 and a PtoP link failure between R6 and R2 have occurred, and those link states are rewritten to "NG".
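The PtoP link check of step S24, as it plays out in this example, can be sketched as follows. The dictionary layout and function name are assumptions for illustration, and the sample data contains only the entries of FIG. 10 that the text mentions. Note that in this example a "Del" entry in a single direction (R3 to R2, R6 to R2) is already treated as a PtoP link failure.

```python
from typing import Dict, List, Tuple

Lsdb = Dict[Tuple[str, str], str]  # (from, to) -> "NC", "Del" or "NG"


def detect_ptop_link_failures(lsdb: Lsdb) -> List[Tuple[str, ...]]:
    """Report each router pair with a "Del" entry and rewrite that entry to "NG"."""
    failed = set()
    for (src, dst), state in lsdb.items():
        if state == "Del":
            failed.add(tuple(sorted((src, dst))))  # one failure per undirected pair
            lsdb[(src, dst)] = "NG"                # mark as already examined
    return sorted(failed)


# Entries of FIG. 10 as described in the text (other links omitted).
fig10: Lsdb = {
    ("R3", "R4"): "Del", ("R4", "R3"): "Del",
    ("R3", "R2"): "Del", ("R6", "R2"): "Del",
    ("R1", "R2"): "NC",
    ("R2", "R1"): "NC", ("R2", "R3"): "NC", ("R2", "R6"): "NC",
}

link_failures = detect_ptop_link_failures(fig10)
```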

Considering LSA arrival delay at this point, two possibilities exist: the LSA from R2 is delayed, or R2 has failed and the LSA from R1, which is connected to R2, is delayed. The following four LSAs are therefore registered in the failure management table 15 as standby LSAs: (1) the LSA deleting the link state from R2 to R1, (2) the LSA deleting the link state from R2 to R3, (3) the LSA deleting the link state from R2 to R6, and (4) the LSA deleting the link state from R1 to R2.
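One way to arrive at these four standby LSAs is to list every link entry that involves the suspected router and is still unchanged. The sketch below does only that; the function name is hypothetical, and how the suspect router is chosen is left to the caller because the text does not spell out a general rule.

```python
from typing import Dict, List, Tuple

Lsdb = Dict[Tuple[str, str], str]  # (from, to) -> "NC", "Del" or "NG"


def standby_lsas(lsdb: Lsdb, suspect: str) -> List[Tuple[str, str]]:
    """List link deletions involving the suspect router that have not arrived yet."""
    return sorted(
        key for key, state in lsdb.items()
        if suspect in key and state == "NC"
    )


# State after the earlier determination; only entries mentioned in the text.
after_first: Lsdb = {
    ("R3", "R4"): "NG", ("R4", "R3"): "NG",
    ("R3", "R2"): "NG", ("R6", "R2"): "NG",
    ("R1", "R2"): "NC",
    ("R2", "R1"): "NC", ("R2", "R3"): "NC", ("R2", "R6"): "NC",
}

waiting = standby_lsas(after_first, "R2")
```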

As a result of the earlier failure determination process described so far, the LSDB of the network monitoring device 10 is in the state shown in FIG. 11, that is, the LSDB after failure determination. FIG. 11 shows only the "state value" information in the LSDB. As FIG. 11 shows, no "Del" remains in the LSDB, so the failure determination process ends for the time being. Because standby LSAs exist, however, changed LSAs are collected again (FIG. 4, step S8). Assume that the LSA deleting the link state from R1 to R2 now arrives late at the network monitoring device 10. This LSA is a standby LSA, so the post-failure determination process is performed (FIG. 4, step S10). The LSDB at this time is in the state shown in FIG. 12, which registers that the link state from R1 (from) to R2 (to) has been deleted (Del).

Next, the post-failure determination process investigates whether a router failure has occurred. Examining each row of the LSDB shows that every link state to R2 is either "Del" or "NG", so all link states to R2 can be regarded as deleted. From this, it is determined that a router failure has occurred at R2. The PtoP link failure between R3 and R2 and the PtoP link failure between R6 and R2 determined earlier can reasonably be attributed to the router failure at R2. The earlier result of two PtoP link failures is therefore changed to a single router failure, and the link state from R1 to R2 is rewritten to "NG". The LSDB of the network monitoring device 10 is then in the state shown in FIG. 13, that is, the LSDB after post-failure determination.

On the basis of the post-failure determination result so far, a log cancelling the PtoP link failure between R3 and R2 and the PtoP link failure between R6 and R2, and a log stating that a router failure has occurred at R2, are recorded in the determination result storage unit 16.
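The post-failure determination of this example can be sketched as follows. The function name and return shape are assumptions for illustration. The R4-R5 entries in the sample data are hypothetical: they stand in for the other links that R4 has in FIG. 9, without which this simplified rule would also flag R4.

```python
from typing import Dict, List, Tuple

Lsdb = Dict[Tuple[str, str], str]  # (from, to) -> "NC", "Del" or "NG"
Pair = Tuple[str, str]


def post_failure_determination(
    lsdb: Lsdb, earlier_link_failures: List[Pair]
) -> Tuple[List[str], List[Pair]]:
    """Return (failed routers, earlier link failures superseded by them)."""
    failed_routers: List[str] = []
    for router in sorted({dst for _, dst in lsdb}):
        incoming = [key for key in lsdb if key[1] == router]
        # Router failure: every link state towards the router is "Del" or "NG".
        if all(lsdb[key] in ("Del", "NG") for key in incoming):
            failed_routers.append(router)
            for key in incoming:
                lsdb[key] = "NG"
    superseded = [
        pair for pair in earlier_link_failures
        if any(router in pair for router in failed_routers)
    ]
    return failed_routers, superseded


# State of FIG. 12 as described in the text, plus hypothetical R4-R5 entries.
fig12: Lsdb = {
    ("R1", "R2"): "Del", ("R3", "R2"): "NG", ("R6", "R2"): "NG",
    ("R3", "R4"): "NG", ("R4", "R3"): "NG",
    ("R2", "R1"): "NC", ("R2", "R3"): "NC", ("R2", "R6"): "NC",
    ("R4", "R5"): "NC", ("R5", "R4"): "NC",
}
earlier = [("R2", "R3"), ("R2", "R6"), ("R3", "R4")]

routers_down, cancelled = post_failure_determination(fig12, earlier)
```

The entries in `cancelled` correspond to the cancellation logs, and `routers_down` to the new router failure log; the R3-R4 link failure is untouched and remains in effect.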

This concludes the description of the operation related to failure determination.
In the recovery determination process performed by the failure/recovery determination unit 14, a failure is judged to have been recovered when link information that was deleted because of the failure is transmitted again as link information in an LSA and that LSA is received by the network monitoring device 10.
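A minimal sketch of such a recovery check is shown below. It assumes that each open failure is tracked together with the set of link entries deleted because of it, and that the failure counts as recovered once every one of those entries has been advertised again. Both the bookkeeping structure and the "all links" condition are assumptions; the text only states that re-advertised link information triggers the recovery judgment.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]  # (from_router, to_router)


def recovered_failures(
    open_failures: Dict[str, Set[Link]], readvertised: Set[Link]
) -> List[str]:
    """Remove re-advertised links from each open failure and report those now empty."""
    done: List[str] = []
    for name in sorted(open_failures):       # sorted() copies the keys, so deletion is safe
        open_failures[name] -= readvertised
        if not open_failures[name]:
            done.append(name)
            del open_failures[name]
    return done


open_failures = {"link R3-R4": {("R3", "R4"), ("R4", "R3")}}
first_pass = recovered_failures(open_failures, {("R3", "R4")})   # one direction restored
second_pass = recovered_failures(open_failures, {("R4", "R3")})  # other direction restored
```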

Next, the operation of the network partition determination unit 17 in determining a network partition will be described with reference to FIG. 14. FIG. 14 is a flowchart showing the procedure of the network partition determination process according to the present embodiment.
At the time of failure determination, the network partition determination unit 17 refers to the LSDB and determines whether a network partition has occurred. By referring to the LSDB, it is possible to check whether a route exists from any given router in an area to every other router in the same area.

In FIG. 14, step S41 monitors whether the failure determination process has been performed, and when it has, the process proceeds to step S42. In step S42, the LSDB is referred to in order to calculate a path tree that connects all routers in the area, taking as its root the router to which the network monitoring device 10 itself is connected. In step S43, it is determined whether the calculated path tree includes all routers in the area.

If the path tree includes all routers in the area, the process proceeds to step S44, where it is determined that no network partition has occurred. If the path tree does not include all routers in the area, the process proceeds to step S45.

In step S45, it is determined that a network partition has occurred, and in step S46 this result is recorded in the determination result storage unit 16 together with the failure determination result.
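Steps S42 to S45 amount to a reachability test from the router to which the monitoring device is attached. In the sketch below a plain breadth-first search stands in for the path-tree calculation, which the text does not specify in detail. Treating only "NC" entries as usable links, the function name, and the three-router chain used as sample data are all assumptions for illustration.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Lsdb = Dict[Tuple[str, str], str]  # (from, to) -> "NC", "Del" or "NG"


def unreachable_routers(lsdb: Lsdb, routers: Iterable[str], start: str) -> Set[str]:
    """Return the routers in the area that cannot be reached from `start`."""
    reached = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for (src, dst), state in lsdb.items():
            # Follow only links whose state is unchanged (assumed to mean "up").
            if src == current and state == "NC" and dst not in reached:
                reached.add(dst)
                queue.append(dst)
    return set(routers) - reached


area = ["R1", "R2", "R3"]
chain: Lsdb = {
    ("R1", "R2"): "NC", ("R2", "R1"): "NC",
    ("R2", "R3"): "NG", ("R3", "R2"): "NG",   # failed link between R2 and R3
}

missing = unreachable_routers(chain, area, start="R1")
partitioned = bool(missing)  # step S45 when routers are missing, step S44 otherwise
```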

According to the embodiment described above, failure determination is performed using determination criteria based on the LSA arrival patterns that correspond to assumed failures. In addition, failure determination is performed in two stages separated by a time difference that takes LSA arrival delay into account. This makes it easier to identify the failure location and its circumstances accurately, and improves the accuracy of monitoring the route control state. As a result, the time needed to identify the location and cause of a failure can be shortened, and network operation managers and others can respond to failures efficiently.

Further, performing the network partition determination together with the failure determination makes it possible to grasp the network partition status along with the failure status. When a network partition occurs, this reveals the possibility that the failure determination result is erroneous because of the partition. Network operation managers and others can then use the failure determination result while taking the effect of the partition into account, which enables efficient failure handling and shortens the time it takes.

The network monitoring device 10 according to the present embodiment may be realized by dedicated hardware. Alternatively, it may be configured as a computer system such as a personal computer that realizes the functions of the network monitoring device 10 shown in FIG. 2 by executing a program that implements them.

The network monitoring process may also be performed by recording a program for realizing the steps shown in FIGS. 4, 5, and 14 on a computer-readable recording medium and having a computer system read and execute the program recorded on that medium. The "computer system" here may include an OS and hardware such as peripheral devices.
The "computer-readable recording medium" refers to a flexible disk, a magneto-optical disk, a ROM, a writable nonvolatile memory such as a flash memory, a portable medium such as a CD-ROM, or a storage device such as a hard disk built into a computer system.

The "computer-readable recording medium" further includes media that hold a program for a certain period of time, such as the volatile memory (for example, DRAM (Dynamic Random Access Memory)) inside a computer system that serves as a server or client when the program is transmitted over a network such as the Internet or a communication line such as a telephone line.
The program may be transmitted from a computer system that stores it in a storage device or the like to another computer system via a transmission medium, or by transmission waves in a transmission medium. Here, the "transmission medium" that transmits the program refers to a medium having the function of transmitting information, such as a network (communication network) like the Internet or a communication line such as a telephone line.
The program may realize only part of the functions described above. It may also be a so-called difference file (difference program), which realizes the functions described above in combination with a program already recorded in the computer system.

The embodiment of the present invention has been described in detail above with reference to the drawings. The specific configuration is not limited to this embodiment, however, and design changes and the like that do not depart from the gist of the present invention are also included.
For example, although the network monitoring device 10 is configured as a standalone device in the embodiment described above, the network monitoring device 10 and a router may be integrated into a single device. For instance, the functions of the network monitoring device 10 and the functions of a router may both be implemented on a personal computer.

FIG. 1 is a diagram showing an example of the configuration of a network to be monitored according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the network monitoring device 10 according to an embodiment of the present invention.
FIG. 3 is a configuration example of the LSDB held by the LSDB management unit 13 shown in FIG. 2.
FIG. 4 is a flowchart showing the overall processing procedure for failure determination according to an embodiment of the present invention.
FIG. 5 is a flowchart showing the procedure of the failure determination process (step S6) shown in FIG. 4.
FIG. 6 is an explanatory diagram showing a first specific example for describing the failure determination process according to an embodiment of the present invention.
FIG. 7 is an explanatory diagram for describing the first state of the LSDB in the specific example of FIG. 6.
FIG. 8 is an explanatory diagram for describing the second state of the LSDB in the specific example of FIG. 6.
FIG. 9 is an explanatory diagram showing a second specific example for describing the failure determination process according to an embodiment of the present invention.
FIG. 10 is an explanatory diagram for describing the first state of the LSDB in the specific example of FIG. 9.
FIG. 11 is an explanatory diagram for describing the second state of the LSDB in the specific example of FIG. 9.
FIG. 12 is an explanatory diagram for describing the third state of the LSDB in the specific example of FIG. 9.
FIG. 13 is an explanatory diagram for describing the fourth state of the LSDB in the specific example of FIG. 9.
FIG. 14 is a flowchart showing the procedure of the network partition determination process according to an embodiment of the present invention.

Explanation of symbols

10: network monitoring device, 11: LSA collection unit, 12: LSA management unit, 13: LSDB management unit, 14: failure/recovery determination unit, 15: failure management table, 16: determination result storage unit, 17: network partition determination unit

Claims

1. A network monitoring device comprising:
message collection means for collecting link state advertisement messages of a routing protocol from an IP network;
database management means for creating a link state database having state values for managing difference information of the collected link state advertisement messages; and
determination means for referring to the link state database and performing failure determination using a determination criterion based on an arrival pattern of link state advertisement messages corresponding to an assumed failure,
wherein the determination means performs two-stage failure determination with a time difference that takes into account the arrival delay of the link state advertisement messages.

2. The network monitoring device according to claim 1, further comprising standby message recording means for recording, as a standby link state advertisement message, a link state advertisement message that is expected to arrive late, as determined from the arrival pattern of the link state advertisement messages collected at the time of the earlier failure determination,
wherein the determination means performs the later failure determination when the recorded standby link state advertisement message is received.

3. The network monitoring device according to claim 1, wherein the determination means performs failure determination according to a predetermined determination order for each type of failure that may occur.

4. The network monitoring device according to claim 1, wherein, when the result of the earlier failure determination and the result of the later failure determination differ, the determination means adopts the result of the later failure determination.

5. The network monitoring device according to claim 1, further comprising a network partitioning determination unit configured to determine whether network partitioning has occurred at the time of the failure determination.

6. The network monitoring device according to any of the preceding claims, further comprising failure recovery determination means for determining that a failure has been recovered when link information deleted due to the failure is received again in a link state advertisement message.

7. A network monitoring method comprising:
a step of collecting link state advertisement messages of a routing protocol from an IP network;
a step of creating a link state database having state values for managing difference information of the collected link state advertisement messages;
a step of referring to the link state database and performing failure determination using a determination criterion based on an arrival pattern of link state advertisement messages corresponding to an assumed failure; and
a step of performing the failure determination in two stages with a time difference that takes into account the arrival delay of the link state advertisement messages.

8. A computer program for causing a computer to realize:
a function of collecting link state advertisement messages of a routing protocol from an IP network;
a function of creating a link state database having state values for managing difference information of the collected link state advertisement messages;
a function of referring to the link state database and performing failure determination using a determination criterion based on an arrival pattern of link state advertisement messages corresponding to an assumed failure; and
a function of performing the failure determination in two stages with a time difference that takes into account the arrival delay of the link state advertisement messages.