JP2012191526A

JP2012191526A - Network failure detection system, network failure detection method, and network failure detection program

Info

Publication number: JP2012191526A
Application number: JP2011054778A
Authority: JP
Inventors: Hiroki Toshima; 宏樹戸嶋; Hiroyuki Seki; 博幸関
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2011-03-11
Filing date: 2011-03-11
Publication date: 2012-10-04

Abstract

PROBLEM TO BE SOLVED: To efficiently and quickly detect a failure in a wide-area Ethernet network. SOLUTION: A network failure detection system 100 comprises: packet capture means 110 connected to a monitoring domain different from the existing domains in a wide-area Ethernet network 10; and failure determination means 111 which counts, among the packets received by the packet capture means 110 within a predetermined time, the number of packets addressed to addresses other than its own, and which, when the counted number exceeds a predetermined threshold, transmits a failure detection notification to a predetermined terminal 200 whose address is stored in a storage unit 101 in advance.

Description

The present invention relates to a network failure detection system, a network failure detection method, and a network failure detection program, and more specifically to a technique for efficiently and quickly detecting a failure in a wide-area Ethernet network.

Network devices are commonly monitored by ping checks from a monitoring device and by collecting information via SNMP (Simple Network Management Protocol). Ping relies on the rule that a node receiving an ICMP (Internet Control Message Protocol) echo request returns an echo reply. An IP packet is sent to the host whose reachability is to be checked, and a failure is detected by checking whether the corresponding reply is received. If the ping command completes normally, the network between the hosts is judged to be working.

SNMP is a UDP/IP-based network monitoring protocol. For network devices it monitors, for example, the number of packets sent and received on each port, the number of error packets, and the port status. An SNMP agent is placed in the monitored device in advance, and the state of the device is determined from the information the agent collects (MIB: Management Information Base).

Other network failure monitoring techniques have also been proposed. One is a network management system that includes a plurality of management devices connected to one another via a network, each managing failures in one domain of the network. When one of the management devices detects a failure in the domain it manages, it attempts to identify the cause; if it cannot identify the cause within its own domain, it cooperates with another management device in the network and attempts to identify the cause in that device's domain (see Patent Document 1).

Another proposal is a communication network formed by a plurality of relay devices, each with a plurality of terminals connected, in which one of the terminals functions as the management terminal of the network. The network comprises: means for circulating a first packet from the management terminal through the terminals in a predetermined order; detection means for detecting whether the first packet returns to the management terminal after the circulation; sending means for sending a second packet to each of the terminals when the first packet does not return; means for receiving, at the management terminal, the response packets from the terminals to the second packet; and means for identifying the location of the failure in the network based on how the response packets are received at the management terminal (see Patent Document 2).

A further proposal is a LAN failure monitoring device comprising: a first LAN transmission/reception control device set to network monitor mode; a second LAN transmission/reception control device set to normal LAN transmission/reception mode, which watches for a failure notification message; a logging memory that accumulates the messages on the LAN; and a CPU that, when the second LAN transmission/reception control device receives the failure notification message, stops the network monitor mode of the first LAN transmission/reception control device and reads the accumulated messages from the logging memory as logging data (see Patent Document 3).

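Patent Document 1: Japanese Patent Laid-Open No. 10-340235 (JP-A-10-340235)
Patent Document 2: Japanese Patent Laid-Open No. 9-83556 (JP-A-9-83556)
Patent Document 3: Japanese Patent Laid-Open No. 6-132960 (JP-A-6-132960)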

Consider the case where the network to be monitored is a wide-area Ethernet network. Suppose that something in one domain, such as a loop failure, causes flooding or broadcasts, and that the resulting abnormal traffic increases the load in the network. Even if conventional methods such as ping or SNMP are applied in this situation, it is difficult to run detection across every one of the enormous number of domains in the wide-area Ethernet network. Nor is it realistic for the administrator of each domain to monitor the entire wide-area Ethernet network in addition to their own domain. In either case, the failure is hard to detect until a user who notices a problem in their own domain reports it.

An object of the present invention is therefore to provide a technique for efficiently and quickly detecting a failure in a wide-area Ethernet network.

To solve the above problem, the network failure detection system of the present invention comprises: packet capture means connected to a monitoring domain different from the existing domains in a wide-area Ethernet network; and failure determination means that counts, among the packets received by the packet capture means within a fixed time, the number of packets not addressed to its own address and, when the counted number exceeds a predetermined threshold, transmits a failure detection notification to a predetermined terminal whose address is held in advance in a storage unit.

In the network failure detection method of the present invention, an information processing device communicably connected to a wide-area Ethernet network executes: a process of capturing packets in a monitoring domain different from the existing domains in the wide-area Ethernet network; and a process of counting, among the packets received within a fixed time by the capture process, the number of packets not addressed to its own address and, when the counted number exceeds a predetermined threshold, transmitting a failure detection notification to a predetermined terminal whose address is held in advance in a storage unit.

The network failure detection program of the present invention causes an information processing device communicably connected to a wide-area Ethernet network to execute: a process of capturing packets in a monitoring domain different from the existing domains in the wide-area Ethernet network; and a process of counting, among the packets received within a fixed time by the capture process, the number of packets not addressed to its own address and, when the counted number exceeds a predetermined threshold, transmitting a failure detection notification to a predetermined terminal whose address is held in advance in a storage unit.

According to the present invention, a failure in a wide-area Ethernet network can be detected efficiently and quickly.

FIG. 1 is a network configuration diagram including the network failure detection system of this embodiment.
FIG. 2 is a block diagram showing a configuration example of the packet capture device in this embodiment.
FIG. 3 is an explanatory diagram showing an example of the packet structure.
FIG. 4 is a diagram showing an example of the data structure of the notification destination table in this embodiment.
FIG. 5 is a flow diagram showing failure example 1 in this embodiment.
FIG. 6 is a flow diagram showing failure example 2 in this embodiment.
FIG. 7 is a flow diagram showing an example of the processing procedure of the network failure detection method in this embodiment.

--- System configuration ---
Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a network configuration diagram including the network failure detection system of this embodiment. The network failure detection system shown in FIG. 1 is a computer system that efficiently and quickly detects failures in the wide-area Ethernet network 10. This embodiment assumes, as one example, that the network failure detection system 100 is realized by a packet capture device. The following description therefore treats the packet capture device 100 as the network failure detection system.

The wide-area Ethernet network 10 to which the packet capture device 100 is connected transparently transfers the Ethernet frames (packets) of each user in each domain and can connect geographically remote sites. In the example of FIG. 1 there are domains for sites A to C, and packets generated at each site are sent and received across sites via the wide-area Ethernet network 10. Because a single wide-area Ethernet network 10 carries the packets of multiple users, the users' packets must be kept separate, and VLANs (virtual LANs) can be used for this purpose. Within the wide-area Ethernet network 10, user packets are separated by IEEE 802.1Q VLAN tags.
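As an illustration of how an IEEE 802.1Q tag identifies the VLAN of a frame, the following minimal sketch reads the VLAN ID from a raw Ethernet frame. The function name `vlan_id`, the sample MAC addresses, and the VLAN ID 42 are assumptions made for this example and do not come from the patent.

```python
import struct

TPID_8021Q = 0x8100  # TPID value that identifies an IEEE 802.1Q tag


def vlan_id(frame: bytes):
    """Return the VLAN ID of an 802.1Q-tagged Ethernet frame, or None if untagged."""
    if len(frame) < 16:
        return None  # too short to hold both MAC addresses and a tag
    # Bytes 0-5: destination MAC, bytes 6-11: source MAC,
    # bytes 12-13: TPID, bytes 14-15: tag control information (TCI).
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_8021Q:
        return None
    return tci & 0x0FFF  # the low 12 bits of the TCI carry the VLAN ID


# Hypothetical frames: broadcast destination, made-up source MAC, IPv4 payload type.
tagged = bytes.fromhex("ffffffffffff" "020000000001" "8100" "002a" "0800") + bytes(46)
untagged = bytes.fromhex("ffffffffffff" "020000000001" "0800") + bytes(46)
```

Here `vlan_id(tagged)` yields the VLAN ID carried in the tag, while `vlan_id(untagged)` yields `None` because bytes 12-13 hold an ordinary EtherType instead of the 802.1Q TPID.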

Because it transfers packets transparently, the wide-area Ethernet network 10 places no restrictions on the routing protocol used between sites. If an existing network uses RIP (Routing Information Protocol) or OSPF (Open Shortest Path First), dynamic routing between sites remains possible after migrating to wide-area Ethernet, with almost no change to the routing protocol configuration.

In a wide-area Ethernet network 10 with these characteristics, abnormal traffic caused by, for example, a loop failure at one site generates a large volume of broadcast and flooding traffic within the network, and this traffic propagates to every site connected to the network. The packet capture device 100 of this embodiment is therefore connected to a predetermined monitoring domain in the wide-area Ethernet network 10 to watch for such abnormal conditions.

For example, assume that the router 300 at site A ("Router A" in the figure) is assigned the address "192.168.0.1", the router 300 at site B ("Router B" in the figure) the address "192.168.0.2", and the router 300 at site C ("Router C" in the figure) the address "192.168.0.3". The packet capture device 100 is connected, at the address "192.168.0.10", to a monitoring domain that is separate from the domains of sites A to C.

In this case, a router or other device that receives a packet addressed to site A looks up the packet's destination IP address (see FIG. 3) and forwards the packet to the address "192.168.0.1". Likewise, packets addressed to site B are forwarded to "192.168.0.2", and packets addressed to site C to "192.168.0.3".

Consequently, the routing information in the wide-area Ethernet network contains no entry that forwards packets to the packet capture device 100 ("192.168.0.10"). In other words, packets other than those addressed to the packet capture device 100 do not normally reach it. If such packets do arrive, they can be judged to be broadcast or flooding traffic.

The packet capture function of the packet capture device 100 can use, for example, the mirroring function of a LAN switch or similar device. Mirroring copies the packets sent and received on a specific port of the LAN switch to another port. Existing packet capture functions also allow several mirrored ports to be specified as capture sources, or the capture target to be specified per VLAN. A NIC normally checks the destination MAC address of each packet and is configured not to accept packets other than those addressed to itself; the packet capture device 100 of this embodiment is, naturally, configured to accept packets not addressed to itself as well.

Next, the configuration of the packet capture device 100 is described. FIG. 2 is a block diagram showing a configuration example of the packet capture device 100 in this embodiment. The packet capture device 100 can be, for example, a general-purpose personal computer with a packet capture application installed. In that case, the packet capture device 100 comprises: a storage unit 101, which is a nonvolatile storage device such as an HDD and stores a program 102 and tables; a memory 103, which is a volatile storage device such as RAM; an arithmetic unit 104 such as a CPU, which reads the program 102 stored in the storage unit 101 into the memory 103 and executes it; a communication unit 105 such as a NIC for connecting to the wide-area Ethernet network 10; an input unit 106, which accepts instructions from the user via a keyboard, mouse, or the like; and an output unit 107, which controls a display, speaker, or the like that outputs processing results. These components are connected by a bus, a bridge, or the like.

The packet capture device 100 includes a packet capture unit 110 that captures packets flowing into the monitoring domain "192.168.0.10". As described above, the packet capture unit 110 should normally have no occasion to receive packets other than those addressed to itself, but when flooding or a similar event occurs at another site it receives the packets that result. Any existing packet capture technique may be used as the capture method.

The packet capture device 100 also includes a failure determination unit 111. Among the packets received by the packet capture unit 110 within a fixed time, it counts those not addressed to the device's own address and, when the count exceeds a predetermined threshold, transmits a failure detection notification to the administrator terminal 200, whose address is held in advance in the notification destination table 125 of the storage unit 101. The functions 110 and 111 described here are implemented, for example, by executing the program 102 of the packet capture device 100.

--- Data structure example ---
Next, an example of the data structure of the table used by the packet capture device 100, which is the network failure detection system of this embodiment, is described. FIG. 4 shows an example of the data structure of the notification destination table 125 in this embodiment. The notification destination table 125 is used by the failure determination unit 111 of the packet capture device 100 to identify where to send failure detection notifications. It associates, for example, each notification destination, that is, each person to be notified of a detected failure ("Administrator 1", "Administrator 2", and so on in the figure), with that person's e-mail address. The notification destination is, of course, not limited to an e-mail address; any other means of notification over a network may be used. When several notification recipients are set, as in the notification destination table 125 shown in the figure, the failure determination unit 111 may, for example, select the recipient according to how far the counted number of packets exceeds the predetermined threshold (for example, selecting "Administrator 1" when the threshold is exceeded by 10% or more and "Administrator 2" when it is exceeded by 20% or more) and send the notification to that person's address.
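The recipient selection described above can be sketched as follows. This is only an illustration under stated assumptions: the e-mail addresses are placeholders, the function name `select_recipient` is hypothetical, and the choices to notify only the highest matching tier and to return `None` below the first tier are assumptions, since the text gives only the 10% and 20% examples.

```python
# Hypothetical contents of notification destination table 125:
# (minimum excess over the threshold in percent, recipient, address).
NOTIFICATION_TABLE = [
    (10, "Administrator 1", "admin1@example.com"),
    (20, "Administrator 2", "admin2@example.com"),
]


def select_recipient(count, threshold):
    """Pick the recipient for the highest tier that the packet count reaches."""
    chosen = None
    for min_pct, name, address in NOTIFICATION_TABLE:
        # Integer comparison equivalent to: (count - threshold) / threshold >= min_pct / 100
        if (count - threshold) * 100 >= min_pct * threshold:
            chosen = (name, address)
    return chosen
```

With a threshold of 500 packets, a count of 550 selects "Administrator 1" and a count of 600 selects "Administrator 2". Integer arithmetic is used so that counts sitting exactly on a tier boundary are classified without floating-point rounding.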

--- Example of flooding ---
Next, examples of the flooding detected by the packet capture device 100 of this embodiment are described. FIG. 5 is a flow diagram showing failure example 1 in this embodiment, and FIG. 6 is a flow diagram showing failure example 2. In the first example, two lines are laid to site A as a precaution against line failure. If the routing design for site A is wrong, the outbound and return paths may differ for communication with other sites. For example, if outbound traffic uses line a and return traffic uses line b, communication still works, but the FDB (forwarding database) entries in the wide-area Ethernet are never created, no matter how much time passes. As a result, all communication between the domain of site A and the domain of site B is flooded. Traffic flooded within the network is also sent to site C, which has nothing to do with it. If the lines of sites A and B have a bandwidth of, for example, 100 Mbps and the line of site C has a bandwidth of 1 Mbps, the line of site C is consumed by the flooding that originates from site A, and communication becomes unstable or impossible.
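The link between a missing FDB entry and flooding can be illustrated with a generic MAC-learning switch. The sketch below models standard learning-bridge behaviour and is not the patent's network: the class `LearningSwitch`, the port names, and the MAC labels are hypothetical.

```python
class LearningSwitch:
    """Minimal model of a MAC-learning L2 switch with a forwarding database (FDB)."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.fdb = {}  # MAC address -> port on which it was last seen as a source

    def forward(self, in_port, src_mac, dst_mac):
        """Learn the source address, then return the list of output ports."""
        self.fdb[src_mac] = in_port
        out_port = self.fdb.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress port.
            return [p for p in self.ports if p != in_port]
        return [] if out_port == in_port else [out_port]


switch = LearningSwitch(["A", "B", "C"])
first = switch.forward("A", "mac-a", "mac-b")     # mac-b unknown: flooded to B and C
repeat = switch.forward("A", "mac-a", "mac-b")    # still unknown: flooded again
reply = switch.forward("B", "mac-b", "mac-a")     # the switch now learns mac-b
learned = switch.forward("A", "mac-a", "mac-b")   # delivered to port B only
```

As long as the switch never sees a frame sourced from `mac-b`, which is what happens when the return traffic takes a different path, every frame toward `mac-b` keeps being flooded to the unrelated port C as well.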

In the second example, site A and site B share the same segment, "Seg. D" (for example, because they use local addresses that cannot be routed over the wide-area Ethernet). Seg. D is extended across the wide-area Ethernet network 10 by TagVLAN 500. If a loop failure occurs in Seg. D at site A, broadcasts are replicated in large numbers. The replicated broadcasts flow into the wide-area Ethernet network 10 through the TagVLAN. The L2 switches in the wide-area Ethernet network 10 in turn forward these broadcast packets throughout the network as broadcasts, which makes the network unstable.

--- Processing procedure example ---
The actual procedure of the network failure detection method of this embodiment is described below with reference to the drawings. The operations corresponding to the method are assumed to be implemented by the program 102, which the packet capture device 100 serving as the network failure detection system reads into the memory 103 and executes. The program 102 consists of code for performing the operations described below. The functions of the packet capture device 100 may, of course, also be implemented in hardware.

FIG. 7 is a flow diagram showing an example of the processing procedure of the network failure detection method in this embodiment. The packet capture unit 110 of the packet capture device 100 receives a packet arriving from the wide-area Ethernet network 10 (s100). The failure determination unit 111 of the packet capture device 100 then checks the IP header of the packet received by the packet capture unit 110 (s101) and determines whether the destination IP address is the IP address of the packet capture device 100, "192.168.0.10" (s102).
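Steps s101 and s102 can be sketched as a check of the destination address field of a raw IPv4 header. This is a minimal sketch that assumes IPv4 and a header already separated from the Ethernet frame; the function names and the helper `build_header`, which only fabricates test data, are hypothetical.

```python
import socket
import struct

CAPTURE_DEVICE_IP = "192.168.0.10"  # address of packet capture device 100 in the text


def destination_ip(ip_header: bytes) -> str:
    """Return the destination address of a raw IPv4 header (bytes 16-19)."""
    if len(ip_header) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    return socket.inet_ntoa(ip_header[16:20])


def is_for_capture_device(ip_header: bytes) -> bool:
    """s102: is the packet addressed to the capture device itself?"""
    return destination_ip(ip_header) == CAPTURE_DEVICE_IP


def build_header(src: str, dst: str) -> bytes:
    """Build a minimal 20-byte IPv4 header for testing (checksum left at zero)."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20, 0, 0, 64, 17, 0,  # version/IHL, TOS, length, ID, flags, TTL, protocol, checksum
        socket.inet_aton(src), socket.inet_aton(dst),
    )
```

A header built with `build_header("192.168.0.1", "192.168.0.2")` is not addressed to the capture device, so in the flow of FIG. 7 it would proceed to the rate check instead of returning to s100.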

If the determination in step s102 finds that the destination IP address of the received packet is that of the packet capture device 100 (s103: Yes), the packet capture device 100 returns the process to step s100.

If, on the other hand, the determination in step s102 finds that the destination IP address of the received packet is not that of the packet capture device 100 (s103: No), the failure determination unit 111 of the packet capture device 100 counts the number of packets received per fixed period, such as the past one second, that is, the packet flow rate (packets per second), and determines whether this value exceeds a predetermined threshold held in advance in the storage unit 101 (s104). A threshold of 500 pps (packets per second) is one possible example, but the appropriate value depends on the scale of the network.

When the wide-area Ethernet network 10 contains many domains that communicate frequently, the packet flow rate is high even in normal operation. Conversely, when it contains few domains that communicate only infrequently, the packet flow rate is low even during a failure. In any case, broadcasts and flooding occur all the time, so care is needed to avoid judging normal operation to be a failure. For example, the packet capture device 100 may keep a history of past failures in the storage unit 101, periodically calculate the average packet flow rate observed when those failures occurred, and set the threshold to the calculated value.
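The threshold update described above amounts to averaging the flow rates recorded at past failures. A minimal sketch follows; the function name `calibrate_threshold` and the fallback to the 500 pps example when no history exists are assumptions for illustration.

```python
from statistics import mean

DEFAULT_THRESHOLD_PPS = 500  # example threshold value given in the text


def calibrate_threshold(failure_rates_pps, default=DEFAULT_THRESHOLD_PPS):
    """Return the mean packet flow rate (pps) observed at past failures.

    Falls back to the default when the history is empty (an assumption of
    this sketch; the text does not say what to do without any history).
    """
    if not failure_rates_pps:
        return default
    return mean(failure_rates_pps)
```

For a hypothetical history of 400, 600, and 800 pps, the recalculated threshold is 600 pps.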

If the determination in step s104 finds that the number of packets received per fixed period does not exceed the predetermined threshold (s105: No), the packet capture device 100 returns the process to step s100. If, on the other hand, the number of packets received per fixed period exceeds the predetermined threshold (s105: Yes), the failure determination unit 111 of the packet capture device 100 refers to the notification destination table 125 in the storage unit 101 and sends a failure detection notification (for example, an e-mail) to the address set in that table (s106). As described above, the notification recipient may be selected from the notification destination table 125 according to how far the number of packets received per fixed period exceeds the predetermined threshold, with the notification sent to that person's address.
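The flow from s100 to s106 can be sketched end to end as follows. This is a sketch under stated assumptions rather than the patented implementation: packets are represented as already-parsed `(timestamp, destination_ip)` pairs in time order, the sliding one-second window is one possible reading of "the past one second", and instead of sending a notification the generator simply yields an alert for the caller to deliver.

```python
from collections import deque


def detect_failures(packets, own_ip="192.168.0.10", threshold_pps=500, window=1.0):
    """Yield (timestamp, count) whenever the windowed packet count exceeds the threshold.

    packets: iterable of (timestamp_seconds, destination_ip), in time order.
    """
    recent = deque()  # timestamps of packets not addressed to own_ip
    for timestamp, dst_ip in packets:               # s100: receive a packet
        if dst_ip == own_ip:                        # s101-s103: addressed to us, ignore
            continue
        recent.append(timestamp)
        while recent[0] <= timestamp - window:      # keep only the last `window` seconds
            recent.popleft()
        if len(recent) > threshold_pps:             # s104-s105: threshold exceeded?
            yield timestamp, len(recent)            # s106: caller sends the notification


# Hypothetical traffic: 600 flooded packets to another site within 0.6 seconds.
burst = [(i / 1000, "192.168.0.2") for i in range(600)]
alerts = list(detect_failures(burst))
```

In this sketch an alert is produced for every packet received while the rate stays above the threshold, so the caller decides how often a notification is actually sent.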

This failure notification may be implemented with an SNMP trap by configuring an SNMP agent on the packet capture device 100. An SNMP trap is sent to the SNMP manager, here the administrator terminal 200, when a pre-specified event occurs or a threshold is reached on the terminal running the SNMP agent.

The administrator terminal 200 that receives the failure detection notification shows it on a display or the like for the administrator to read. The administrator reads the notification, recognizes that a failure has occurred, and carries out the necessary response.

The best mode for carrying out the present invention has been described above in detail, but the present invention is not limited to it and can be modified in various ways without departing from its gist.

According to this embodiment, a failure in a wide-area Ethernet network can be detected efficiently and quickly.

Description of symbols

10 Wide-area Ethernet network
100 Network failure detection system (packet capture device)
101 Storage unit
102 Program
103 Memory
104 Arithmetic unit
105 Communication unit
106 Input unit
107 Output unit
110 Packet capture unit (packet capture means)
111 Failure determination unit (failure determination means)
125 Notification destination table
200 Administrator terminal (predetermined terminal)
300 Router

Claims

A network failure detection system comprising:
packet capture means connected to a monitoring domain different from the existing domains in a wide-area Ethernet network; and
failure determination means that counts, among the packets received by the packet capture means within a fixed time, the number of packets not addressed to its own address and, when the counted number of packets exceeds a predetermined threshold, transmits a failure detection notification to a predetermined terminal whose address is held in advance in a storage unit.

A network failure detection method in which an information processing device communicably connected to a wide-area Ethernet network executes:
a process of capturing packets in a monitoring domain different from the existing domains in the wide-area Ethernet network; and
a process of counting, among the packets received within a fixed time by the capture process, the number of packets not addressed to its own address and, when the counted number of packets exceeds a predetermined threshold, transmitting a failure detection notification to a predetermined terminal whose address is held in advance in a storage unit.

A network failure detection program that causes an information processing device communicably connected to a wide-area Ethernet network to execute:
a process of capturing packets in a monitoring domain different from the existing domains in the wide-area Ethernet network; and
a process of counting, among the packets received within a fixed time by the capture process, the number of packets not addressed to its own address and, when the counted number of packets exceeds a predetermined threshold, transmitting a failure detection notification to a predetermined terminal whose address is held in advance in a storage unit.