JP5367002B2 - Monitoring server and monitoring program

Info

Publication number: JP5367002B2
Application number: JP2011072257A
Authority: JP
Inventors: 崇史鈴木; 峰樹尾形; 達也相原; 哲文花岡; 利広高橋; 香井筒
Original assignee: Nippon Telegraph and Telephone East Corp
Current assignee: Nippon Telegraph and Telephone East Corp
Priority date: 2011-03-29
Filing date: 2011-03-29
Publication date: 2013-12-11
Anticipated expiration: 2031-03-29
Also published as: JP2012209666A

Description

The present invention relates to a monitoring server and a monitoring program that inspect devices in a communication system for silent failures.

With the recent growth of communication networks, identifying failed communication devices and responding to them has become an important issue. Identifying a failed device early and dealing with it allows the communication network to be provided stably.

One way to confirm that a communication device is operating normally is to transmit inspection data to it (see, for example, Non-Patent Document 1). For example, a command such as Ethernet (registered trademark) loopback or ping is transmitted to the communication device under inspection, and the device is judged to be normal if it responds.

Non-Patent Document 1: ITU-T Recommendation Y.1731, OAM functions and mechanisms for Ethernet based networks

However, although the above method can confirm that communication is possible, it cannot determine whether data is being transmitted and received correctly.

In general, a switch contains logic circuits, SRAM, and similar components. When high-energy neutrons strike these semiconductor substrates and the heavy ions released by the collision cause current pulses, data held in these semiconductor substrates may be inverted.

Even when such an event has occurred, a command such as Ethernet loopback or ping still receives a normal response, so the memory error inside the switch cannot be detected. In this specification, such a memory error inside a communication device is called a silent failure.

Accordingly, an object of the present invention is to provide a monitoring server and a monitoring program that inspect devices in a communication system for silent failures.

To solve the above problem, a first feature of the present invention relates to a monitoring server in a communication system that includes a plurality of devices and a monitoring server that inspects the devices for silent failures. The monitoring server according to the first feature comprises: a device data storage unit that stores device data including the identifiers of the devices to be inspected for silent failures; an inspection data storage unit that stores inspection data to be transmitted to the devices; error inspection means that transmits a plurality of inspection data to a device listed in the device data within a predetermined time so as to put that device into a high-traffic state, acquires the response data in which the device has copied and returned the inspection data, determines whether the inspection data transmitted to the device matches the response data, and, if they do not match, stores the identifier of the device in result data; and detailed error inspection means that, when the error inspection means has determined a mismatch for any device, transmits inspection data to the upstream-side and downstream-side MIPs of the target device for which the mismatch was determined and of the devices adjacent to the target device, acquires the response data in which the target device and the adjacent devices have copied and returned the inspection data, determines whether the inspection data transmitted to each upstream side and downstream side matches the corresponding response data, and, when the determination results differ between the upstream side and the downstream side, identifies the device in which the silent failure has occurred on the basis of the identifier of that MIP.

The monitoring server may further comprise a topology data storage unit that stores topology data indicating the topology of the communication system together with the device identifiers, and display means that displays the network configuration of the communication system on a display device on the basis of the topology data, extracts the device identifiers contained in the result data, and displays a warning on the display device on the basis of the extracted identifiers.

The inspection data storage unit may store a plurality of different inspection data, and the error inspection means may transmit each of the plurality of inspection data stored in the inspection data storage unit to the device under inspection.

The detailed error inspection means may refer to the result data and the topology data and, when the identifier of one of two adjacent devices is contained in the result data, transmit inspection data to the MIPs of those adjacent devices.

When the information system includes active devices and standby devices, the error inspection means may transmit inspection data to the standby devices.

A second feature of the present invention relates to a monitoring program corresponding to any of the forms of the first feature of the present invention.

According to the present invention, a monitoring server and a monitoring program that inspect devices in a communication system for silent failures can be provided.

FIG. 1 is a configuration diagram of a communication system according to an embodiment of the present invention.
FIG. 2 is a sequence diagram illustrating a monitoring method in the communication system according to the embodiment of the present invention.
FIG. 3 is a functional block diagram illustrating the monitoring server according to the embodiment of the present invention.
FIG. 4 is a diagram illustrating how the errors detected depend on frame length.
FIG. 5 is a diagram illustrating how the errors detected depend on traffic.
FIG. 6 is a diagram illustrating the data structure of the device data according to the embodiment of the present invention, with example data.
FIG. 7 is an example of a result display screen displayed by the monitoring server according to the embodiment of the present invention.
FIG. 8 is a diagram illustrating the processing of the detailed error inspection means according to the embodiment of the present invention.
FIG. 9 is a flowchart illustrating the processing of the monitoring server according to the embodiment of the present invention.
FIG. 10 is a diagram illustrating a standby system that the monitoring server according to the embodiment of the present invention inspects for silent failures.

Next, embodiments of the present invention will be described with reference to the drawings. In the following description of the drawings, identical or similar parts are denoted by identical or similar reference numerals.

(Communication system)
A communication system 3 according to the embodiment of the present invention will be described with reference to FIG. 1. The communication system 3 includes a plurality of devices and a monitoring server 1. The example shown in FIG. 1 describes the case where the devices are twelve switches, from a first switch 2a to a twelfth switch 2l. The number of devices is not limited to this, and the devices need not be switches. In the present embodiment, where there is no need to distinguish among these switches, they may simply be referred to as switch 2.

In the example shown in FIG. 1, a link between switches 2 indicates that the two switches are adjacent in the communication network. For example, the first switch 2a is connected adjacent to the second switch 2b and the third switch 2c. The second switch 2b is connected adjacent to the first switch 2a and the fourth switch 2d. In this way, the switches 2 shown in FIG. 1 are interconnected in a mesh by a network capable of bidirectional communication. In the example shown in FIG. 1, the monitoring server 1 is connected only to the first switch 2a, but the connection form is not limited to this.

The monitoring server 1 inspects devices such as the switches 2 for silent failures. The monitoring server 1 transmits predetermined inspection data to the switch 2 under inspection. Using the echo request function of ICMP (Internet Control Message Protocol), the switch 2 under inspection returns the inspection data unchanged, and the monitoring server 1 receives the inspection data from that switch as an echo reply. The monitoring server 1 compares the transmitted inspection data with the received inspection data and determines whether every bit matches. If they do not match, it determines that a silent failure has occurred in one of the switches on the path from the monitoring server 1 to the switch 2 under inspection. The monitoring server 1 transmits inspection data in the same way to all switches 2 under inspection and identifies the switches 2 for which the transmitted inspection data and the received response data did not match. The monitoring server 1 then compares these results with the network configuration; when a mismatch was found for only one of two adjacent switches 2, the monitoring server 1 identifies these adjacent switches 2 as a suspected section of the silent failure.

In the embodiment of the present invention, inspection data is then transmitted to the MIPs of the switches 2 in the suspected section using Ethernet loopback. This makes it possible to detect which switch 2 has the silent failure.

When a switch 2 receives data containing inspection data, it forwards the data toward the destination switch. In doing so, the switch 2 stores the inspection data contained in the received data in its own memory, reads the inspection data back from that memory, and forwards it toward the destination switch. The memory in which the inspection data is stored may, however, be defective. In that case, even if the switch 2 reads the data from the address at which it stored the inspection data, the data read may differ from the data written. This is because, when a particular bit of the memory is defective, data stored in that bit is not retained correctly, so the data read differs from the data written. The monitoring server 1 according to the embodiment of the present invention can detect such silent failures.

A monitoring method according to the embodiment of the present invention will be described with reference to FIG. 2. FIG. 2 illustrates the case where the monitoring server 1 transmits inspection data to a destination switch 2n via the first switch 2a and the second switch 2b. Here, the destination switch 2n corresponds, in the example shown in FIG. 1, to the fourth switch 2d, which is connected to the monitoring server 1 via the first switch 2a and the second switch 2b.

First, in step S1, the monitoring server 1 uses the ping command to transmit data D1 addressed to the destination switch 2n. The data D1 contains a header indicating the destination and the source, inspection data C1, and an FCS (frame check sequence: error detection data) for the header and the inspection data C1. The FCS is calculated by the monitoring server 1 for error detection and included in the data D1.

When the first switch 2a receives the data D1, it stores the received data D1 in memory in step S2. The switch 2a strips the FCS from the data D1 and calculates a new FCS over the header and the inspection data C2 read from memory. In step S3, the first switch 2a transmits data D2, which contains the inspection data C2 read from memory and the FCS calculated from the header and the inspection data C2.

When the second switch 2b receives the data D2, it stores the received data D2 in memory in step S4. The switch 2b strips the FCS from the data D2 and calculates a new FCS over the header and the inspection data C3 read from memory. In step S5, the second switch 2b transmits data D3, which contains the inspection data C3 read from memory and the FCS calculated from the header and the inspection data C3.
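The effect of steps S2 to S5 can be illustrated with a minimal sketch. It assumes CRC-32 from Python's `zlib` as a stand-in for the FCS, and the `forward` function and its `stuck_bit` parameter are hypothetical names introduced only for this illustration. Because each switch recalculates the FCS over whatever it reads back from memory, a bit inverted in memory still leaves the switch with a valid FCS.

```python
import zlib
from typing import Optional, Tuple


def fcs(frame: bytes) -> int:
    """CRC-32 used here as a stand-in for the frame check sequence."""
    return zlib.crc32(frame)


def forward(header: bytes, payload: bytes,
            stuck_bit: Optional[int] = None) -> Tuple[bytes, bytes, int]:
    """Model one hop: store the payload, read it back, recompute the FCS.

    stuck_bit (hypothetical) is the index, counted from the first bit of the
    payload, of a memory cell that reads back inverted.
    """
    stored = bytearray(payload)
    if stuck_bit is not None:
        stored[stuck_bit // 8] ^= 0x80 >> (stuck_bit % 8)
    read_back = bytes(stored)
    # The FCS is computed over the data as read, so it hides the inversion.
    return header, read_back, fcs(header + read_back)


header = b"\x00\x01"                 # placeholder header bytes
c1 = bytes([0b10101010])             # inspection data sent by the server
h, c2, new_fcs = forward(header, c1, stuck_bit=5)

fcs_ok = new_fcs == fcs(h + c2)      # the next hop sees a valid FCS
payload_ok = c2 == c1                # but the inspection data has changed
```

In this sketch `fcs_ok` is true while `payload_ok` is false, which is why the comparison has to be made on the inspection data itself at the monitoring server rather than on the FCS.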

When the destination switch 2n receives the data D3, in step S6 it uses ICMP to swap the source and destination addresses, recalculates the FCS, and transmits data D4 as the response to the ping command.

In steps S7 to S9, the inspection data is transmitted to the monitoring server 1 via the second switch 2b and the first switch 2a. When the monitoring server 1 receives data D6 from the first switch 2a as the response to the ping command, it compares the inspection data C6 contained in the data D6 with the inspection data C1 transmitted in step S1.

If no error occurs in the memory of any of the first switch 2a, the second switch 2b, and the destination switch 2n that relayed the data, the inspection data C1 of the data D1 matches the inspection data C6 of the data D6. However, if a memory error has occurred in any of these switches and the inspection data was stored in the affected storage area, C1 and C6 do not match. For example, when the inspection data C1 is "1010101" and the inspection data C6 is "1010111", the monitoring server 1 can determine that a memory error has occurred at bit 5 (counting from 0 at the leftmost bit) of the area that held the inspection data in one of the first switch 2a, the second switch 2b, and the destination switch 2n.
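The comparison of C1 and C6 can be sketched as follows. The function name `differing_bits` is hypothetical, and indexing the bits from 0 at the left is an assumption about how the text counts bit positions.

```python
from typing import List


def differing_bits(sent: str, received: str) -> List[int]:
    """Return the zero-based positions (from the left) where the bits differ."""
    if len(sent) != len(received):
        raise ValueError("inspection data and response differ in length")
    return [i for i, (s, r) in enumerate(zip(sent, received)) if s != r]


mismatches = differing_bits("1010101", "1010111")   # [5]
```

An empty list means the echoed inspection data matched bit for bit; any non-empty list marks the destination as a candidate for the result data.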

After this, the monitoring server 1 similarly transmits inspection data to the MIPs of the devices suspected of silent failure, using the Ethernet loopback command. The monitoring server 1 compares the transmitted inspection data with the inspection data received as the Ethernet loopback response, determines whether they match, and can thereby identify the device in which the memory error has occurred.
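One way to read this localization step is sketched below. It assumes that each suspected device has already been probed at its upstream-side and downstream-side MIP and that the device whose two results disagree is the one holding the faulty memory; the tuple layout and the name `locate_faulty_device` are hypothetical.

```python
from typing import List, Optional, Tuple

# (device identifier, upstream MIP matched?, downstream MIP matched?)
MipResult = Tuple[str, bool, bool]


def locate_faulty_device(results: List[MipResult]) -> Optional[str]:
    """Return the first device whose upstream and downstream results differ."""
    for device, upstream_ok, downstream_ok in results:
        if upstream_ok != downstream_ok:
            return device
    return None


# Illustrative results for a suspected section between switches 2c and 2g.
probe_results = [("2c", True, True), ("2g", True, False)]
faulty = locate_faulty_device(probe_results)   # "2g"
```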

(Monitoring server)
The monitoring server 1 according to the embodiment of the present invention will be described with reference to FIG. 3. The monitoring server 1 is a general-purpose computer comprising a storage device 10, a central processing control device 20, a communication control device 30, and a display device 40. The monitoring server 1 is realized by the central processing control device 20 reading and executing the monitoring program stored in the storage device 10. The communication control device 30 is, for example, a LAN board and controls the communication of the monitoring server 1. The display device 40 is an ordinary display.

The storage device 10 includes an inspection data storage unit 11, a device data storage unit 12, a result data storage unit 13, a detailed result data storage unit 14, and a topology data storage unit 15.

The inspection data storage unit 11 is the storage area of the storage device 10 in which inspection data 11a is stored. The inspection data 11a is data transmitted to a device under inspection such as a switch 2. The inspection data 11a is, for example, frame data in which "1" or "0" is repeated for a predetermined number of bits.

The inspection data storage unit 11 may store a plurality of different inspection data 11a, 11b, 11c, and so on. A silent failure in the memory of a switch 2 can occur at any position in the memory area. In the embodiment of the present invention, therefore, various inspection data, such as short-frame inspection data and long-frame inspection data, are stored in the inspection data storage unit 11 in advance. By transmitting such varied inspection data to the device under inspection, the monitoring server 1 can inspect for silent failures regardless of the position of the bit in memory.

For example, FIG. 4(a) schematically shows the memory of a switch 2 when the inspection data is a short frame. A short frame occupies only a small memory area. Consequently, a silent failure in a bit near the upper end of the memory area may go undetected. As shown in FIG. 4(b), however, a long frame uses many bits of the memory area. Even when a silent failure has occurred in a bit near the upper end, the frame therefore passes through the faulty bit, and the monitoring server 1 can detect the silent failure.
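The difference between FIG. 4(a) and FIG. 4(b) can be sketched with a toy memory model. The assumptions here are that a frame is written into consecutive cells starting at cell 0, that a single cell reads back inverted, and that short and long frames are 64 and 1518 bytes; the names and sizes are illustrative and do not come from the text.

```python
def read_back(frame_bits: str, faulty_cell: int) -> str:
    """Store the frame from cell 0 upward and read it back.

    The cell at index faulty_cell returns the inverse of what was written.
    """
    cells = list(frame_bits)
    if faulty_cell < len(cells):
        cells[faulty_cell] = "0" if cells[faulty_cell] == "1" else "1"
    return "".join(cells)


SHORT_FRAME = "1" * (64 * 8)      # assumed short frame, all ones
LONG_FRAME = "1" * (1518 * 8)     # assumed long frame, all ones
FAULTY_CELL = 4000                # assumed position of the faulty bit

short_detected = read_back(SHORT_FRAME, FAULTY_CELL) != SHORT_FRAME
long_detected = read_back(LONG_FRAME, FAULTY_CELL) != LONG_FRAME
```

With these values the short frame never touches the faulty cell, so only the long frame comes back altered.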

The inspection data storage unit 11 may also store a set of inspection data that mixes short frames and long frames. Sending these inspection data within a predetermined time creates a high-traffic state. Frames then also pass through memory areas that are used only under high traffic, so those areas can be inspected for silent failures.

For example, FIG. 5(a) schematically shows the memory of a switch 2 when traffic is low. Under low traffic, little data accumulates in the buffer of the switch 2. Consequently, a silent failure in a bit that is used only when the buffer is heavily filled may go undetected. As shown in FIG. 5(b), however, under high traffic the buffer fills up and many bits of the memory area are used. Even when a silent failure has occurred in a bit that is used only when the buffer is heavily filled, the frame therefore passes through the faulty bit, and the monitoring server 1 can detect the silent failure.
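The buffer behavior described here can be approximated with a simple queue model. This is only a sketch under assumed parameters: frames arrive and drain at fixed rates per tick, and a buffer slot is exercised only if the queue depth reaches it. The function name and all numbers are hypothetical.

```python
def peak_queue_depth(arrivals_per_tick: int, drained_per_tick: int,
                     ticks: int) -> int:
    """Return the largest number of frames held in the buffer at once."""
    depth = peak = 0
    for _ in range(ticks):
        depth += arrivals_per_tick            # frames arriving this tick
        peak = max(peak, depth)
        depth = max(0, depth - drained_per_tick)  # frames forwarded
    return peak


FAULTY_SLOT = 20   # assumed buffer slot containing the faulty bit

low_peak = peak_queue_depth(arrivals_per_tick=1, drained_per_tick=1, ticks=10)
high_peak = peak_queue_depth(arrivals_per_tick=5, drained_per_tick=1, ticks=10)

low_reaches_fault = low_peak > FAULTY_SLOT     # slot never used
high_reaches_fault = high_peak > FAULTY_SLOT   # slot used during the burst
```

Only the burst fills the buffer deeply enough to store a frame in the assumed faulty slot, which mirrors the reason for sending many inspection frames within a short time.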

The queue in which frames accumulate in the switch 2 also differs depending on priority. Inspection data with different priorities may therefore be stored in the inspection data storage unit 11.

In this way, the inspection data storage unit 11 stores the various inspection data, differing in frame length, priority, traffic, and so on, that the monitoring server 1 needs in order to inspect for silent failures.

The device data storage unit 12 is the storage area of the storage device 10 in which device data 12a is stored. The device data 12a is data containing the identifiers of the devices that the monitoring server 1 inspects for silent failures.

As shown in FIG. 6, for example, the device data 12a associates the identifier of each switch 2 with its IP address and MIP identifier. The device names used in FIG. 1 serve as the switch identifiers here, but IP addresses may be used instead. The IP address is referred to by the error inspection means 21 and the display means 23 described later. The MIP identifier is referred to by the detailed error inspection means 22 described later.
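A record of the device data 12a might be represented as below. The field names, the IP addresses (taken from the 192.0.2.0/24 documentation range), and the MIP identifier strings are all placeholders, since the actual values of FIG. 6 are not reproduced in the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DeviceRecord:
    switch_id: str     # switch identifier, e.g. the name used in FIG. 1
    ip_address: str    # used by the error inspection and display means
    mip_id: str        # used by the detailed error inspection means


# Placeholder entries for illustration only.
DEVICE_DATA: List[DeviceRecord] = [
    DeviceRecord("2a", "192.0.2.1", "MIP-2a"),
    DeviceRecord("2b", "192.0.2.2", "MIP-2b"),
    DeviceRecord("2c", "192.0.2.3", "MIP-2c"),
]

# Index by switch identifier for lookup during inspection.
by_id: Dict[str, DeviceRecord] = {d.switch_id: d for d in DEVICE_DATA}
```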

The result data storage unit 13 is the storage area of the storage device 10 in which result data 13a is stored. The result data 13a holds the inspection results of the error inspection means 21 described later. For example, the result data 13a stores the identifier of a switch 2 for which an error occurred in association with the inspection data for which the error occurred. If, for example, long-frame inspection data is transmitted to the seventh switch 2g and the inspection data transmitted by the monitoring server 1 differs from the inspection data received by the monitoring server 1, the result data 13a stores the identifier of the seventh switch 2g together with the fact that the inspection data for which the error occurred was a long frame.

The detailed result data storage unit 14 is the storage area of the storage device 10 in which detailed result data 14a is stored. The detailed result data 14a holds the inspection results of the detailed error inspection means 22 described later, namely the identifier of the device that the detailed error inspection means 22 has identified as having the silent failure.

The topology data storage unit 15 is the storage area of the storage device 10 in which topology data 15a is stored. The topology data 15a is data indicating the topology of the communication system 3 together with the identifiers of devices such as the switches 2. The topology data 15a is referred to when a network configuration such as that shown in FIG. 1 is displayed on the display device 40. The topology data 15a contains the identifier of each switch 2 and the connection information of that switch 2.

The central processing control device 20 includes error inspection means 21, detailed error inspection means 22, and display means 23.

The error inspection means 21 transmits the inspection data 11a to a switch 2 listed for inspection in the device data 12a, acquires the response data in which that switch 2 has copied and returned the inspection data, and determines whether the inspection data 11a transmitted to the switch 2 matches the response data. If they do not match, the error inspection means 21 stores the identifier of that switch 2 in the result data 13a.

When a plurality of inspection data are stored in the inspection data storage unit 11, the error inspection means 21 transmits each of the inspection data 11a, 11b, 11c, and so on to the switch 2 listed for inspection in the device data 12a, and determines whether each inspection data matches the response data returned for it. If they do not match, the error inspection means 21 stores the identifier of that switch 2 in the result data 13a in association with the inspection data that was transmitted.

Furthermore, the error inspection means 21 transmits a plurality of inspection data within a predetermined time to create a high-traffic state, and determines whether each inspection data transmitted in the high-traffic state matches the corresponding response data received in that state. If they do not match, the error inspection means 21 stores in the result data 13a the identifier of the switch 2 for which the error occurred, in association with the fact that the error occurred in the high-traffic state.
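The inspection loop of the error inspection means 21 can be sketched as follows. The actual transmission (ICMP echo) is abstracted behind an `echo` callable so that the sketch stays self-contained; `run_error_inspection`, `fake_echo`, the pattern names, and the frame sizes are all hypothetical, and the high-traffic burst is not modeled.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def run_error_inspection(
    devices: Iterable[str],
    patterns: Dict[str, bytes],
    echo: Callable[[str, bytes], bytes],
) -> List[Tuple[str, str]]:
    """Send every pattern to every device and record each mismatch.

    Returns (device identifier, pattern name) pairs, i.e. the result data.
    """
    result_data: List[Tuple[str, str]] = []
    for device in devices:
        for name, data in patterns.items():
            if echo(device, data) != data:
                result_data.append((device, name))
    return result_data


def fake_echo(device: str, data: bytes) -> bytes:
    """Stand-in transport: corrupts long frames addressed to switch 2g."""
    if device == "2g" and len(data) > 64:
        corrupted = bytearray(data)
        corrupted[-1] ^= 0x01
        return bytes(corrupted)
    return data


PATTERNS = {"short": b"\xff" * 64, "long": b"\xff" * 1518}
result_data = run_error_inspection(["2a", "2g"], PATTERNS, fake_echo)
```

Here `result_data` contains a single entry for switch 2g and the long frame, matching the kind of record described for the result data 13a. An empty list corresponds to the case where the inspection ends without finding a silent failure.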

If no data has been stored in the result data 13a, all inspection data was transmitted and received correctly, and the inspection for silent failures by the monitoring server 1 ends. If, on the other hand, any data has been stored in the result data 13a, a silent failure is considered to have occurred in one of the switches 2 of the communication system 3. The result data 13a is displayed on the display device 40 by the display means 23.

The display means 23 displays the network configuration of the communication system 3 on the display device 40 on the basis of the topology data 15a, extracts the identifiers of the switches 2 contained in the result data 13a, and displays a warning on the display device 40 on the basis of the extracted identifiers.

The display means 23 displays, for example, the result display screen P101 shown in FIG. 7 on the display device 40. The result display screen P101 shows the topology of the communication system 3, with the icons of the switches 2 for which the error inspection means 21 detected an error shown hatched.

For example, consider a case where the result data 13a includes the identifiers of the seventh switch 2g through the twelfth switch 2l. In this case, the icons corresponding to the seventh switch 2g through the twelfth switch 2l are hatched on the result display screen P101. The operator can thereby recognize that a mismatch in the inspection data was found in the inspections addressed to the seventh switch 2g through the twelfth switch 2l, and that a silent failure has occurred in the vicinity of these switches. In the example shown in FIG. 7, the icons of the switches in which an error occurred are hatched, but the display is not limited to this. Any other warning display, such as blinking, may be used as long as it shows the operator that an error has occurred.

The display means 23 may further display, on the result display screen P101, the suspected section in which the silent failure has occurred. When only one of two adjacent switches 2 is included in the result data 13a, the display means 23 can identify the section between these adjacent switches as a suspected section. In the example shown in FIG. 7, the section between the third switch 2c and the seventh switch 2g and the section between the fourth switch 2d and the eighth switch 2h are identified as suspected sections. The display means 23 may therefore display these two sections as suspected sections on the result display screen P101.
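The rule for identifying suspected sections (a link is suspected when exactly one of its two endpoints appears in the result data) can be sketched as follows. The link list is an assumed fragment chosen for illustration, not the full topology of FIG. 7, and `find_suspect_sections` is a hypothetical name.

```python
from typing import Iterable, List, Set, Tuple

Link = Tuple[str, str]


def find_suspect_sections(links: Iterable[Link], failed: Set[str]) -> List[Link]:
    """Keep the links where exactly one endpoint is listed in the result data."""
    return [(a, b) for a, b in links if (a in failed) != (b in failed)]


# Assumed fragment of the topology data 15a (illustrative only).
links = [("2a", "2c"), ("2c", "2g"), ("2d", "2h"), ("2g", "2i")]
# Switch identifiers recorded in the result data 13a in the example above.
failed = {"2g", "2h", "2i", "2j", "2k", "2l"}

suspects = find_suspect_sections(links, failed)
print(suspects)
```

Links whose endpoints both passed or both failed are skipped, because the boundary between a passing and a failing inspection is where the fault is expected to lie.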

When an error is detected by the error inspection means 21, the detailed error inspection means 22 identifies the switch 2 in which the silent failure has occurred.

When the error inspection means 21 detects any error, the detailed error inspection means 22 identifies the device in which the silent failure has occurred. Specifically, the detailed error inspection means 22 transmits inspection data to the MIP of a device in which an error was detected, specifically a switch adjacent to a switch 2 included in the result data 13a, and acquires the response data in which the inspection data has been copied and returned by that switch 2. The detailed error inspection means 22 determines whether the inspection data transmitted to the switch 2 matches the response data. If they do not match, it identifies the switch 2 in which the silent failure has occurred based on the identifier of the MIP. Once that switch 2 is identified, the detailed error inspection means 22 stores its identifier in the detailed result data 14a.

Here, the detailed error inspection means 22 reads the result data 13a to obtain the identifiers of the switches 2 in which an error occurred, and refers to the topology data 15a. When the identifier of only one of two adjacent devices is included in the result data 13a, it transmits inspection data to the MIPs of the adjacent switches using the Ethernet loopback command.

In the example shown in FIG. 7, the third switch 2c and the seventh switch 2g are adjacent, connected by a single link; the identifier of the third switch 2c is not included in the result data 13a, whereas the identifier of the seventh switch 2g is. Similarly, the identifier of the fourth switch 2d is not included in the result data 13a, whereas the identifier of the eighth switch 2h is. The detailed error inspection means 22 therefore extracts, as a suspected section, each section in which one of two adjacent switches is included in the result data 13a and the other is not, and transmits inspection data to the MIPs of the switches 2 in the extracted suspected sections. In the example shown in FIG. 7, inspection data is transmitted to the MIPs of the third switch 2c, the fourth switch 2d, the seventh switch 2g, and the eighth switch 2h.

The detailed error inspection means 22 can identify the switch 2 in which the silent failure has occurred based on whether the transmitted inspection data matches the data returned from each MIP.

This will be described with reference to FIG. 8. In the inspection by the error inspection means 21, no error occurred when the monitoring server 1 transmitted data addressed to the third switch 2c, but an error occurred when it transmitted data addressed to the seventh switch 2g. The section between the third switch 2c and the seventh switch 2g is therefore a suspected section, in which a silent failure may have occurred in either switch. The detailed error inspection means 22 accordingly transmits inspection data to the MIPs of the third switch 2c and the MIPs of the seventh switch 2g.

FIG. 8 shows the case where a silent failure has occurred in the third switch 2c. When the monitoring server 1 transmits inspection data to the upstream MIP of the third switch 2c, the transmitted data and the received data match. When it transmits inspection data to the downstream MIP of the third switch 2c, however, the returned data has been stored once in the memory of the third switch 2c, so the transmitted data and the received data do not match. Likewise, when inspection data is transmitted to the upstream MIP of the seventh switch 2g, the transmitted data and the received data do not match. Accordingly, when the inspection results differ between the upstream MIP and the downstream MIP of a single switch 2, the detailed error inspection means 22 can determine that a silent failure has occurred in the memory of that switch 2.
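The decision rule of FIG. 8 can be written as a short sketch: a switch is identified as faulty when the loopback to its upstream MIP matches and the loopback to its downstream MIP does not. The dictionary layout and the function name are assumptions for illustration. The downstream result for switch 2g is not stated in the description and is assumed here to be a mismatch.

```python
from typing import Dict, List

# For each switch: did the returned data match the sent data at each MIP?
MipResults = Dict[str, Dict[str, bool]]


def locate_faulty_switches(mip_results: MipResults) -> List[str]:
    """A switch is faulty when its upstream MIP matches but its downstream MIP does not."""
    return [
        switch_id
        for switch_id, result in mip_results.items()
        if result["upstream"] and not result["downstream"]
    ]


# Results corresponding to the FIG. 8 scenario (2g downstream is an assumption).
mip_results = {
    "2c": {"upstream": True, "downstream": False},
    "2g": {"upstream": False, "downstream": False},
}

faulty = locate_faulty_switches(mip_results)
print(faulty)
```

Switch 2g is not reported even though both of its loopbacks fail, because the data is already corrupted before it reaches the upstream MIP of 2g.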

When the result data 13a includes the type of inspection data with which the error occurred, the detailed error inspection means 22 transmits inspection data of the same type to the MIPs. For example, if the error occurred when the error inspection means 21 transmitted long-frame inspection data, the detailed error inspection means 22 likewise transmits long-frame inspection data to the MIPs. Similarly, if the error occurred in the high-traffic state, the detailed error inspection means 22 transmits the inspection data to the MIPs in the high-traffic state.

When the detailed error inspection means 22 has identified the switch 2 in which the silent failure has occurred in this way, it stores the identifier of that switch 2 in the detailed result data 14a. Based on the identifier of the switch 2 included in the detailed result data 14a, the display means 23 indicates on the display device 40 that the device is one in which a silent failure has occurred.

The processing of the monitoring server 1 according to the embodiment of the present invention will be described with reference to FIG. 9.

When it is time to inspect the communication system 3 for silent failures, steps S101 through S103 are repeated for all target devices and all inspection data. Here, all target devices means the first switch 2a through the twelfth switch 2l of the communication system 3 shown in FIG. 1. All inspection data means the inspection data stored in the inspection data storage unit 11, for example short-frame inspection data, long-frame inspection data, and inspection data with different priorities. A set of multiple inspection data items for creating a high-traffic state may also be included.

First, in step S101, the monitoring server 1 transmits inspection data to the target device using ping and receives the response data. In step S102, the monitoring server 1 compares the inspection data with the response data and, if they do not match, stores the identifier of the target device and the inspection data in the result data 13a.

When the processing of steps S101 through S103 has been completed for all target devices and all inspection data, the monitoring server 1 identifies, in step S104, the suspected section in which the silent failure has occurred. The monitoring server 1 refers to the topology data 15a and, when only one of two adjacent switches 2 is included in the result data 13a, can identify the section connecting those two switches as a suspected section.

In step S105, the monitoring server 1 displays the topology of the communication system 3 on the display device 40 based on the topology data 15a. Based on the result data 13a, the monitoring server 1 also marks as errors the switches 2 to which the mismatched inspection data was addressed, and displays the suspected section identified in step S104 on the display device 40. If no data is stored in the result data 13a, the monitoring server 1 may display on the display device 40 in step S105 that no error has occurred.

In step S106, the monitoring server 1 determines whether the result data 13a contains any record, in order to decide whether to perform the detailed error inspection. If there is no record, no silent failure has occurred in any target device of the communication system 3, and the processing ends. If the result data 13a contains a record, the processing of steps S107 and S108 is repeated for each MIP of the devices in the suspected section identified in step S104.

First, in step S107, the monitoring server 1 transmits inspection data to each MIP by Ethernet loopback. In step S108, the monitoring server 1 determines whether the inspection data matches the response data. Once inspection data has been transmitted to every MIP, the suspected device in which the silent failure has occurred is identified in step S109. Specifically, when the inspection data addressed to the upstream MIP of a device matches and the inspection data addressed to the downstream MIP does not, the silent failure can be identified as having occurred in that device.

When the suspected device has been identified, the monitoring server 1 displays the name or other details of the identified device on the display device 40 in step S110. Here, the monitoring server 1 may display the icon of the suspected device as a warning together with the topology data.
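The flow of steps S101 through S109 can be simulated end to end under simplifying assumptions. The sketch assumes a linear chain of three switches and a fault model in which a frame is corrupted whenever it passes through the memory of the faulty switch. A ping to a switch and a loopback to its upstream MIP are assumed to traverse only the memories of the preceding switches, while a loopback to its downstream MIP also traverses the switch's own memory. All names are hypothetical.

```python
from typing import List, Tuple

CHAIN = ["2a", "2c", "2g"]  # assumed linear path from the monitoring server
FAULTY = {"2c"}             # simulated fault: this switch's memory corrupts frames


def traverse(payload: bytes, memories: List[str]) -> bytes:
    """Pass the payload through each switch memory, corrupting it at a faulty one."""
    for switch_id in memories:
        if switch_id in FAULTY:
            payload = b"\x00" + payload[1:]
    return payload


def ping(target: str, payload: bytes) -> bytes:
    """Stage 1 probe (S101): echo from the target switch."""
    return traverse(payload, CHAIN[: CHAIN.index(target)])


def loopback(target: str, side: str, payload: bytes) -> bytes:
    """Stage 2 probe (S107): echo from the upstream or downstream MIP."""
    idx = CHAIN.index(target)
    end = idx if side == "upstream" else idx + 1
    return traverse(payload, CHAIN[:end])


def inspect(payload: bytes) -> Tuple[List[str], List[Tuple[str, str]], List[str]]:
    # S101 to S103: record every switch whose echo differs from what was sent.
    result_data = [sw for sw in CHAIN if ping(sw, payload) != payload]
    if not result_data:  # S106: nothing recorded, so no detailed inspection
        return [], [], []
    # S104: a link is suspected when exactly one endpoint is in the result data.
    links = list(zip(CHAIN, CHAIN[1:]))
    suspects = [(a, b) for a, b in links if (a in result_data) != (b in result_data)]
    # S107 to S109: compare upstream and downstream MIP results per switch.
    candidates = sorted({sw for link in suspects for sw in link}, key=CHAIN.index)
    faulty = []
    for sw in candidates:
        up_ok = loopback(sw, "upstream", payload) == payload
        down_ok = loopback(sw, "downstream", payload) == payload
        if up_ok and not down_ok:
            faulty.append(sw)
    return result_data, suspects, faulty


result_data, suspects, faulty = inspect(b"\xA5" * 64)
print(result_data, suspects, faulty)
```

In this simulation the first stage flags only switch 2g, the suspected section is the link between 2c and 2g, and the second stage attributes the fault to 2c, which mirrors the situation described for FIG. 8.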

(Application example)
As shown in FIG. 10, it is effective to use the monitoring server 1 according to the embodiment of the present invention to detect silent failures in standby devices. The system shown in FIG. 10(a) includes a standby A switch 200a, an active B switch 200b, and a C switch 200c that distributes processing to the A switch 200a or the B switch 200b.

In this case, there is traffic on the second instance 201b, which connects the C switch 200c to the B switch 200b, but no traffic on the first instance 201a, which connects the C switch 200c to the A switch 200a. However, as shown in FIG. 10(b), the standby system starts operating when the active system fails, and a problem may arise if a silent failure has occurred in the standby system.

It is therefore preferable that the monitoring server 1 according to the embodiment of the present invention inspects the devices of the standby system for silent failures while the active system is operating normally. The standby system can then stand by in a fully sound state.

As described above, the monitoring server 1 according to the embodiment of the present invention transmits inspection data and compares it with the response data, and can thereby discover even silent failures caused by memory defects in the devices of the communication system 3. This makes it possible to provide a high-quality communication system 3.

The monitoring server 1 can also detect silent failures of devices in the communication system 3 in advance by computer processing. This can, for example, reduce the number of failures that are discovered only passively through user reports. In addition, by inspecting the standby communication system for silent failures, its normal operation can be confirmed on a continuing basis.

Furthermore, because the device in which the silent failure has occurred can be identified, the duration of the failure can be shortened. In addition, displaying the results on the display device 40 using the topology data allows the operator to locate the fault quickly and speeds up recovery actions.

The monitoring server 1 according to the embodiment of the present invention identifies the suspected section using ICMP, and then identifies the location of the silent failure by using Ethernet loopback toward the MIPs of the devices in that suspected section. Narrowing down the failure location in stages in this way shortens the time required for the silent failure inspection.

Thus, the monitoring server 1 according to the embodiment of the present invention makes it possible to provide a high-quality communication system 3 and to increase users' trust in it.

(Other embodiments)
Although the present invention has been described above by way of an embodiment, the descriptions and drawings that form part of this disclosure should not be understood as limiting the invention. Various alternative embodiments, examples, and operational techniques will be apparent to those skilled in the art from this disclosure.

For example, the monitoring server 1 described in the embodiment of the present invention may be configured on a single piece of hardware as shown in FIG. 3, or on a plurality of pieces of hardware depending on its functions and the number of processes.

The present invention naturally includes various embodiments not described here. Accordingly, the technical scope of the present invention is defined only by the matters specifying the invention according to the claims that are reasonable in view of the above description.

DESCRIPTION OF SYMBOLS
1 Monitoring server
2 Switch
3 Communication system
10 Storage device
11 Inspection data storage unit
12 Device data storage unit
13 Result data storage unit
14 Detailed result data storage unit
15 Topology data storage unit
20 Central processing control device
21 Error inspection means
22 Detailed error inspection means
23 Display means
30 Communication control device
40 Display device

Claims

A monitoring server in a communication system comprising a plurality of devices and the monitoring server, which inspects the devices for silent failures, the monitoring server comprising:
a device data storage unit that stores device data including identifiers of the devices to be inspected for silent failure;
an inspection data storage unit that stores inspection data to be transmitted to the devices;
error inspection means that transmits a plurality of the inspection data within a predetermined time to a device to be inspected listed in the device data so as to place the device in a high-traffic state, acquires response data in which the inspection data has been copied and returned by the device, determines whether the inspection data transmitted to the device matches the response data, and, if they do not match, stores the identifier of the device in result data; and
detailed error inspection means that, when the error inspection means determines a mismatch for any of the devices, transmits the inspection data to the MIPs on the upstream side and the downstream side of each of the target device determined to have the mismatch and a device adjacent to the target device, acquires response data in which the inspection data has been copied and returned by the target device and the adjacent device, determines whether the inspection data transmitted to each of the upstream and downstream sides of the target device and the adjacent device matches the response data, and, if the determination results differ between the upstream side and the downstream side, identifies the device in which the silent failure has occurred based on the identifier of the MIP.

The monitoring server according to claim 1, further comprising:
a topology data storage unit that stores topology data indicating the topology of the communication system together with the identifiers of the devices; and display means that displays the network configuration of the communication system on a display device based on the topology data, extracts the identifier of a device included in the result data, and displays a warning on the display device based on the extracted identifier.

The monitoring server according to claim 1, wherein the inspection data storage unit stores a plurality of different inspection data, and
the error inspection means transmits each of the plurality of inspection data stored in the inspection data storage unit to the device to be inspected.

The monitoring server according to any one of claims 1 to 3, wherein the detailed error inspection means refers to the result data and the topology data and, when the identifier of only one of two adjacent devices is included in the result data, transmits the inspection data to the MIPs of the adjacent devices.

The monitoring server according to any one of claims 1 to 4, wherein, when the information system includes an active device and a standby device,
the error inspection means transmits the inspection data to the standby device.

A monitoring program for causing a computer to function as the monitoring server according to any one of claims 1 to 5.