JP5230311B2

JP5230311B2 - Failure analysis system and failure analysis method

Info

Publication number: JP5230311B2
Application number: JP2008230609A
Authority: JP
Inventors: 昌克森井; 恵志伊加田; 佳孝濱口
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2008-09-09
Filing date: 2008-09-09
Publication date: 2013-07-10
Anticipated expiration: 2028-09-09
Also published as: JP2010068075A

Description

本発明は、ネットワークに接続された機器の障害を解析する手法に関するものである。 The present invention relates to a technique for analyzing a failure of a device connected to a network.

ネットワークの発達および情報通信機器の普及により、様々な機器がネットワークに接続され、情報を交換し合うようになってきている。このような状況は、一般家庭にも徐々に浸透し始め、各家庭内でネットワークを構築したホームネットワークという言葉も生まれた。
また、このホームネットワークに接続される機器も、従来のパーソナルコンピュータの他、テレビや冷蔵庫といった一般的な家電製品や、人間の存在を検知するセンサといったものまで含まれるようになってきている。 With the development of networks and the spread of information communication devices, various devices are connected to the network and exchange information. This situation has gradually spread to ordinary households, and the term “home network” has been created.
Also, devices connected to the home network include not only conventional personal computers but also general home appliances such as televisions and refrigerators, and sensors for detecting the presence of human beings.

しかし、一般家庭におけるネットワークは、従来の専門家によって管理されてきたインターネットやイントラネットとは異なり、プライバシ等の問題から、外部の人間が無断でネットワークにアクセスできるようにすることは好ましくない。
そのため、ホームネットワーク内で障害が発生した場合、ユーザ自身がそれを発見して対処する必要がある。しかし、専門家ではないユーザが自らこれらを全て行うことは困難である。 However, unlike the Internet and intranets that have been managed by conventional experts, it is not preferable for a network in a general home to allow an external person to access the network without permission, due to problems such as privacy.
Therefore, when a failure occurs in the home network, it is necessary for the user to find out and deal with it. However, it is difficult for a non-expert user to do all of this.

そこで、パーソナルコンピュータやその他の通信機器をはじめとした各メーカ等は、サポートセンターやコールセンターを設け、ホームネットワーク上での障害に対する対処を行っている。
ユーザは、メーカ等のコールセンター等に電話をかける。コールセンター等のオペレータは、ユーザから発生状況を直接聞き取り、障害状況や原因、対処方法を、過去の事例や専門家の経験・勘などを元に導き出す。 Thus, manufacturers of personal computers and other communication devices have established support centers and call centers to deal with failures on home networks.
The user telephones the call center of a manufacturer or the like. The call center operator hears the circumstances of the failure directly from the user and derives the failure status, cause, and remedy from past cases and from the experience and intuition of experts.

一方、ネットワークの障害診断に関し、『ネットワークに発生する障害と障害の兆候を示すイベントとの因果関係に基づいて障害を特定するネットワーク障害診断装置において、管理対象ネットワークとの間のトラフィックを削減すること。』を目的とした技術として、『因果関係テーブル１０４が障害とイベントの因果関係を記憶し、監視イベント選択部１０５が、因果関係テーブル１０４を参照し、障害を特定するために必要最低限のイベントを抽出して監視イベントに設定し、取得イベント選択部１０７が、最新の障害候補に基づいて因果関係テーブル１０４からイベントを選択し、選択した各イベントに対して障害を効率よく特定できる順番に優先度を設定し、イベント取得部１０２が、設定された優先度の順番にイベントを要求し、要求に対して応答されるイベントをイベント受信部１０３が受信し、順次受信されるイベントをもとに障害判定部１０８が障害の候補を絞り込むよう構成する。』というものが提案されている（特許文献１）。 On the other hand, regarding network fault diagnosis, a technique has been proposed (Patent Document 1) whose purpose is “to reduce traffic to and from the managed network in a network fault diagnosis device that identifies faults based on the causal relationship between faults occurring in the network and events indicating signs of those faults.” It is configured so that “the causal relationship table 104 stores the causal relationships between faults and events; the monitoring event selection unit 105 refers to the causal relationship table 104, extracts the minimum events necessary for identifying a fault, and sets them as monitoring events; the acquisition event selection unit 107 selects events from the causal relationship table 104 based on the latest fault candidates and assigns each selected event a priority in the order in which the fault can be identified efficiently; the event acquisition unit 102 requests events in the order of the assigned priority; the event reception unit 103 receives the events returned in response to the requests; and the fault determination unit 108 narrows down the fault candidates based on the sequentially received events.”

また、障害予測に関し、『予測対象装置で生じたイベントの種類やその発生順序に基づいて障害発生の予測をすることができる障害予測システム等を提供すること』を目的とした技術として、『障害予測システム１は、予測対象装置１０に生じたイベントに関するイベントログ３５に対しデータマイニングを実施して、たとえばイベントの発生順序によって特定される前兆パターンを抽出し、解析対象ログに前兆パターンが検出されたときに予測対象装置１０に障害が発生すると予測するログ解析部３９を備えている。』というものが提案されている（特許文献２）。 In addition, regarding failure prediction, a technique has been proposed (Patent Document 2) whose purpose is “to provide a failure prediction system or the like that can predict the occurrence of a failure based on the types of events that occurred in the prediction target device and the order in which they occurred.” In it, “the failure prediction system 1 includes a log analysis unit 39 that performs data mining on the event log 35 of events that occurred in the prediction target device 10, extracts a precursor pattern specified by, for example, the order of event occurrence, and predicts that a failure will occur in the prediction target device 10 when the precursor pattern is detected in the log under analysis.”

Patent Document 1: 特開２００７−９６７９６号公報 (JP 2007-96796 A)
Patent Document 2: 特開２００７−１７２１３１号公報 (JP 2007-172131 A)

上記特許文献１や特許文献２に記載の技術では、ホームネットワーク上での機器障害を検知してユーザに通知することはできるが、ユーザがその通知を受けて障害に自ら対処することは一般に困難である。 With the techniques described in Patent Document 1 and Patent Document 2, it is possible to detect a device failure on the home network and notify the user, but it is generally difficult for the user, on receiving that notification, to deal with the failure themselves.

また、ユーザがメーカ等のコールセンター等に障害復旧を依頼する場合でも、一般にユーザは機器や障害分析の知識をもっておらず、障害状況や原因をオペレータが判断するために必要な情報を的確に伝えることは困難である。したがって、オペレータが障害状況等を把握して障害復旧を完了するまでに時間がかかる。 Also, even when a user requests a failure recovery from a call center of a manufacturer, etc., the user generally does not have knowledge of equipment or failure analysis, and accurately conveys information necessary for the operator to determine the failure status and cause. It is difficult. Therefore, it takes time for the operator to grasp the failure status and complete the failure recovery.

一方、コールセンター側で、ホームネットワーク内に設置した機器からログ等の動作情報を取得して、障害状況の解析に用いることも考えられるが、プライバシ等の観点から家庭内の機器に関する情報を全てコールセンターに送信することは好ましくない。
さらには、仮に全ての情報を送信するとしても、送信のための通信量が膨大になってしまう懸念がある。 On the other hand, the call center could obtain operation information such as logs from the devices installed in the home network and use it to analyze the failure. However, from the viewpoint of privacy and the like, it is not preferable to send all information about the devices in the home to the call center.
Furthermore, even if all information is transmitted, there is a concern that the amount of communication for transmission becomes enormous.

そのため、障害分析を行うために必要な最小限の情報を収集して解析先へ送信することのできる障害分析手法が望まれていた。 Therefore, there has been a demand for a failure analysis method that can collect and transmit the minimum information necessary for failure analysis to an analysis destination.

本発明に係る障害分析システムは、第１ネットワークに接続されたゲートウェイ装置と、前記第１ネットワークとは異なる第２ネットワークに接続された解析装置と、を有し、前記ゲートウェイ装置は、前記第１ネットワーク上の１ないし複数の機器の障害を検知する障害検知部と、前記機器間で通信が行われた際に各通信相手の識別情報を記録する通信状況記録部と、前記機器の動作情報を取得して前記解析装置に送信する動作情報取得部と、を備え、前記解析装置は、前記動作情報を用いて前記機器の障害解析を行う障害解析部を備え、前記動作情報取得部は、前記通信状況記録部が記録している識別情報を用いて、前記障害検知部が障害を検知した機器が過去に通信を行った相手機器を特定し、その通信相手の動作情報を取得して前記解析装置に送信するものである。 A failure analysis system according to the present invention includes a gateway device connected to a first network and an analysis device connected to a second network different from the first network. The gateway device includes a failure detection unit that detects a failure of one or more devices on the first network, a communication status recording unit that records identification information of each communication partner when communication is performed between the devices, and an operation information acquisition unit that acquires operation information of the devices and transmits it to the analysis device. The analysis device includes a failure analysis unit that performs failure analysis of the devices using the operation information. The operation information acquisition unit uses the identification information recorded by the communication status recording unit to identify the partner devices with which the device whose failure was detected by the failure detection unit has communicated in the past, acquires the operation information of those communication partners, and transmits it to the analysis device.

本発明に係る障害分析システムは、障害が発生した機器が過去に通信を行った相手機器が障害に関係しているという想定の下、その相手機器の動作情報を解析装置に送信する。
即ち、ネットワーク上の全ての機器の動作情報を解析装置に送信することになるので、障害に関係していると思われる機器の動作情報のみを解析装置に送信し、通信量を抑えることができる。
また、障害解析に必要ない情報を送信せずに済み、送信する情報を必要最低限に抑えることができるので、情報漏えい・プライバシーの保護の観点からも好適に用いることができる。 The fault analysis system according to the present invention transmits the operation information of the counterpart device to the analysis device under the assumption that the counterpart device with which the faulty device has communicated in the past is related to the fault.
That is, the operation information of all the devices on the network is not transmitted to the analysis device; only the operation information of devices that appear to be related to the failure is transmitted, so the amount of communication can be reduced.
In addition, since it is not necessary to transmit information that is not necessary for failure analysis and the information to be transmitted can be suppressed to the minimum necessary, it can be suitably used from the viewpoint of information leakage and privacy protection.

実施の形態１．
図１は、本発明の実施の形態１に係る障害解析システムの構成図である。
本実施の形態１に係る障害解析システムは、ゲートウェイ装置２００、解析サーバ３００を有する。以下、各装置等の構成を説明し、その後に本実施の形態１に係る障害解析システムの動作を説明する。 Embodiment 1.
FIG. 1 is a configuration diagram of a failure analysis system according to Embodiment 1 of the present invention.
The failure analysis system according to the first embodiment includes a gateway device 200 and an analysis server 300. Hereinafter, the configuration of each device will be described, and then the operation of the failure analysis system according to the first embodiment will be described.

ゲートウェイ装置２００と解析サーバ３００は、ネットワーク６００を介して接続されている。また、ゲートウェイ装置２００の配下には、ローカルネットワーク１００が敷設されている。
図１では、記載の簡易の観点から、ネットワーク６００と解析サーバ３００が直接接続されているように記載したが、解析サーバ３００は、ローカルネットワーク１００と同様に組織内ネットワークに接続されていてもよい。 Gateway device 200 and analysis server 300 are connected via network 600. A local network 100 is laid under the gateway device 200.
In FIG. 1, for simplicity, the network 600 and the analysis server 300 are shown as directly connected, but the analysis server 300 may be connected to an intra-organization network, as is the case with the local network 100.

ローカルネットワーク１００は、ある組織内で閉じたネットワークである。例えば、家庭内のネットワーク（ホームネットワーク）がこれに相当する。 The local network 100 is a closed network within an organization. For example, a home network (home network) corresponds to this.

ゲートウェイ装置２００は、ローカルネットワーク１００とネットワーク６００の接続点に設置され、配下には機器４０１〜４０４が接続されている。
ゲートウェイ装置２００は、機器４０１〜４０４同士、または機器４０１〜４０４と解析サーバ３００の間の通信を仲介するルータとしての機能を備えている。また、ゲートウェイ装置２００自身も、機器４０１〜４０４、および解析サーバ３００と通信する機能を備える。 The gateway device 200 is installed at a connection point between the local network 100 and the network 600, and devices 401 to 404 are connected under the gateway device 200.
The gateway device 200 has a function as a router that mediates communication between the devices 401 to 404 or between the devices 401 to 404 and the analysis server 300. The gateway device 200 itself also has a function of communicating with the devices 401 to 404 and the analysis server 300.

ゲートウェイ装置２００は、障害検知部２０１、通信観測部２０２、通信記録部２０３、動作情報取得部２０４を備える。 The gateway device 200 includes a failure detection unit 201, a communication observation unit 202, a communication recording unit 203, and an operation information acquisition unit 204.

障害検知部２０１は、例えば特許文献１〜２に記載されているような技術を用いて、機器４０１〜４０４で発生する障害を検知する。
通信観測部２０２は、後述の図４で説明する手順を用いて、機器４０１〜４０４間の通信を観測し、その状況を通信記録部２０３に格納する。 The failure detection unit 201 detects a failure that occurs in the devices 401 to 404 using a technique such as that described in Patent Literatures 1 and 2, for example.
The communication observation unit 202 observes communication between the devices 401 to 404 using a procedure described later with reference to FIG. 4 and stores the situation in the communication recording unit 203.

通信記録部２０３は、機器４０１〜４０４間の通信状況を記録する。通信記録の具体例については、後述の図４で改めて説明する。
動作情報取得部２０４は、機器４０１〜４０４がそれぞれ保持している動作情報５０１〜５０４を取得する。動作情報取得部２０４の詳細動作については、後述の図３で改めて説明する。 The communication recording unit 203 records the communication status between the devices 401 to 404. A specific example of the communication record will be described later with reference to FIG. 4.
The operation information acquisition unit 204 acquires the operation information 501 to 504 held by the devices 401 to 404, respectively. The detailed operation of the operation information acquisition unit 204 will be described later with reference to FIG. 3.

解析サーバ３００は、ローカルネットワーク１００とは異なるネットワークに属するサーバ装置であり、機器４０１〜４０４で発生した障害を解析する役割を有する。解析サーバ３００は、障害解析部３０１、動作情報要求部３０２を備える。 The analysis server 300 is a server device that belongs to a network different from the local network 100 and has a role of analyzing a failure that has occurred in the devices 401 to 404. The analysis server 300 includes a failure analysis unit 301 and an operation information request unit 302.

障害解析部３０１は、動作情報５０１〜５０４を解析してその機器に発生した障害の原因を絞り込む。
動作情報要求部３０２は、機器４０１〜４０４からそれぞれの動作情報５０１〜５０４を取得して解析サーバ３００に送信するよう、ゲートウェイ装置２００に要求する。 The failure analysis unit 301 analyzes the operation information 501 to 504 to narrow down the cause of the failure that has occurred in the device.
The operation information requesting unit 302 requests the gateway device 200 to acquire the respective operation information 501 to 504 from the devices 401 to 404 and transmit them to the analysis server 300.

機器４０１〜４０４は、相互に通信する機能を有する。図１では４台構成の例を示したが、台数はこれに限られるものではない。また、機器４０１〜４０４は、内部プロセスのログを出力する機能や、当該機器の外部にプロセス一覧を出力する機能を備える。
動作情報５０１〜５０４は、機器４０１〜４０４がそれぞれ記録または出力する、各機器の動作状況を表す情報である。 The devices 401 to 404 have a function of communicating with each other. Although FIG. 1 shows an example of a four-unit configuration, the number is not limited to this. Further, the devices 401 to 404 have a function of outputting an internal process log and a function of outputting a process list to the outside of the device.
The operation information 501 to 504 is information representing the operation status of each device, which is recorded or output by the devices 401 to 404, respectively.

ネットワーク６００は、ローカルネットワーク１００と、解析サーバ３００が属するネットワークを接続する、例えばインターネット等のネットワークである。 The network 600 is a network such as the Internet that connects the local network 100 and the network to which the analysis server 300 belongs.

障害検知部２０１、通信観測部２０２、動作情報取得部２０４、障害解析部３０１、動作情報要求部３０２は、これらの機能を実現する回路デバイスのようなハードウェアで構成することもできるし、マイコンやＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）のような演算装置とその動作を規定するソフトウェアで構成することもできる。また、必要な通信インターフェース等を適宜備える。 The failure detection unit 201, the communication observation unit 202, the operation information acquisition unit 204, the failure analysis unit 301, and the operation information request unit 302 can be configured by hardware such as circuit devices that realize these functions, or by an arithmetic unit such as a microcomputer or CPU (Central Processing Unit) together with software that defines its operation. Necessary communication interfaces and the like are also provided as appropriate.

通信記録部２０３は、ＨＤＤ（ＨａｒｄＤｉｓｋＤｒｉｖｅ）のような記憶装置で構成することができる。 The communication recording unit 203 can be configured by a storage device such as an HDD (Hard Disk Drive).

本実施の形態１における「通信状況記録部」は、通信観測部２０２、通信記録部２０３が相当する。 The “communication status recording unit” in the first embodiment corresponds to the communication observation unit 202 and the communication recording unit 203.

以上、本実施の形態１に係る障害解析システムの各装置等の構成を説明した。
次に、本実施の形態１に係る障害解析システムの動作を説明する。 The configuration of each device and the like of the failure analysis system according to the first embodiment has been described above.
Next, the operation of the failure analysis system according to the first embodiment will be described.

本実施の形態１に係る障害解析システムは、全体としては後述の図２で説明する動作を行う。また、ゲートウェイ装置２００は、障害解析システムの全体動作と並行して、後述の図４で説明する通信記録動作を行う。
以下では、まず始めに障害解析システムの全体動作を図２で説明し、個別の動作については図３〜図５で説明する。 The failure analysis system according to the first embodiment performs the operation described with reference to FIG. 2 as a whole. Further, the gateway device 200 performs a communication recording operation described in FIG. 4 to be described later in parallel with the overall operation of the failure analysis system.
In the following, the overall operation of the failure analysis system will first be described with reference to FIG. 2, and the individual operations will then be described with reference to FIGS. 3 to 5.

図２は、本実施の形態１に係る障害解析システムの動作フローである。以下、図２の各ステップについて説明する。 FIG. 2 is an operation flow of the failure analysis system according to the first embodiment. Hereinafter, each step of FIG. 2 will be described.

（Ｓ２０１）
図１の機器４０１で障害が発生したものと仮定する。
（Ｓ２０２）
ゲートウェイ装置２００の障害検知部２０１は、機器４０１で発生した障害を検知する。検知する手法としては、例えば動作情報取得部２０４が取得した動作情報５０１を解析する、機器４０１のプロセスを監視する、といった手法が考えられる。また、特許文献１〜２に記載されているような公知の手法を用いてもよい。 (S201)
Assume that a failure has occurred in the device 401 of FIG.
(S202)
The failure detection unit 201 of the gateway device 200 detects a failure that has occurred in the device 401. Possible detection methods include, for example, analyzing the operation information 501 acquired by the operation information acquisition unit 204 or monitoring the processes of the device 401. A known method such as those described in Patent Documents 1 and 2 may also be used.

（Ｓ２０３）
障害検知部２０１は、機器４０１に障害が発生した旨を、ネットワーク６００を介して解析サーバ３００に送信する。あるいは、障害検知部２０１は、機器４０１に障害が発生した旨を適当な手法でユーザに通知し、ユーザはその通知を受けて電話や電子メールでコールセンターにその旨を連絡する。
なおここでは、コールセンターのオペレータは、解析サーバ３００に対し操作指示を行うことができる端末等を有しているものと仮定する。 (S203)
The failure detection unit 201 transmits information that a failure has occurred in the device 401 to the analysis server 300 via the network 600. Alternatively, the failure detection unit 201 notifies the user that a failure has occurred in the device 401 by an appropriate method, and the user receives the notification and notifies the call center by telephone or e-mail.
Here, it is assumed that the call center operator has a terminal or the like that can give an operation instruction to the analysis server 300.

（Ｓ２０４）
コールセンターのオペレータは、ステップＳ２０３で受けた障害発生の通知に基づき、障害原因の分析等を行うために必要な動作情報を取得するよう、解析装置３００に指示する。
なお、コールセンターのオペレータではなく、障害発生の通知に基づき、動作情報を取得するように自動的に指示する装置にしてもよい。
解析装置３００の動作情報要求部３０２は、その取得要求が、障害発生機器（ここでは機器４０１）の動作情報５０１を取得する要求であるか、それとも障害に関連する機器（例えば４０２、４０３、４０４）の動作情報（例えば５０２、５０３、５０４）を取得する要求であるかを判定する。
障害発生機器４０１についての取得要求であればステップＳ２０５へ進み、関連機器４０２等についての取得要求であればステップＳ２１１へ進む。 (S204)
Based on the failure occurrence notification received in step S203, the call center operator instructs the analysis device 300 to acquire the operation information necessary for analyzing the cause of the failure and the like.
Note that, instead of the call center operator, a device that automatically issues the instruction to acquire operation information based on the failure occurrence notification may be used.
The operation information request unit 302 of the analysis device 300 determines whether the acquisition request is a request for the operation information 501 of the faulty device (here, the device 401) or a request for the operation information (for example, 502, 503, 504) of devices related to the failure (for example, 402, 403, 404).
If it is an acquisition request for the faulty device 401, the process proceeds to step S205; if it is an acquisition request for the related device 402 or the like, the process proceeds to step S211.

（Ｓ２０５）
解析装置３００の動作情報要求部３０２は、ゲートウェイ装置２００に対し、機器４０１の動作情報５０１を送信するよう要求する。 (S205)
The operation information request unit 302 of the analysis device 300 requests the gateway device 200 to transmit the operation information 501 of the device 401.

（Ｓ２０６）
ゲートウェイ装置２００の動作情報取得部２０４は、ステップＳ２０５の要求を受け取ったときは、機器４０１の動作情報５０１を取得する。また、ステップＳ２１１の要求を受け取ったときは、関連機器の動作情報を取得する。本ステップの詳細は、後述の図３で改めて説明する。
以下のステップＳ２０７〜Ｓ２１０の説明では、本ステップでステップＳ２０５の要求を受け取ったものと仮定する。 (S206)
When the operation information acquisition unit 204 of the gateway device 200 receives the request in step S205, the operation information acquisition unit 204 acquires the operation information 501 of the device 401. When the request in step S211 is received, the operation information of the related device is acquired. Details of this step will be described later with reference to FIG.
In the following description of steps S207 to S210, it is assumed that the request of step S205 has been received in this step.

（Ｓ２０７）
動作情報取得部２０４は、ステップＳ２０６で取得した動作情報５０１を、ネットワーク６００を介して解析サーバ３００に送信する。
このとき、動作情報取得部２０４は、ＡＥＳ（ＡｄｖａｎｃｅｄＥｎｃｒｙｐｔｉｏｎＳｔａｎｄａｒｄ）のような共通鍵暗号や、ＲＳＡのような公開鍵暗号を用いて送信データを暗号化したり、ＳＳＬ（ＳｅｃｕｒｅＳｏｃｋｅｔＬａｙｅｒ）やＩＰｓｅｃのような通信路暗号化技術を用いたりして、送信する内容に何らかのセキュリティ対策を施す。 (S207)
The operation information acquisition unit 204 transmits the operation information 501 acquired in step S206 to the analysis server 300 via the network 600.
At this time, the operation information acquisition unit 204 encrypts transmission data using a common key encryption such as AES (Advanced Encryption Standard), or a public key encryption such as RSA, or uses SSL (Secure Socket Layer) or IPsec. Some security measures are applied to the content to be transmitted, for example, using such a channel encryption technology.

（Ｓ２０８）
解析サーバ３００の障害解析部３０１は、機器４０１の動作情報５０１を受信する。次に、障害解析部３０１は、その動作情報５０１を解析し、障害原因などの分析を行う。
（Ｓ２０９）
障害解析部３０１は、より詳細な解析を行うために、動作情報５０１以外の動作情報（図１の例では５０２〜５０４）が更に必要であるか否かを判断する。必要であると判断するときはステップＳ２０４へ戻り、必要でないと判断するときはステップＳ２１０へ進む。 (S208)
The failure analysis unit 301 of the analysis server 300 receives the operation information 501 of the device 401. Next, the failure analysis unit 301 analyzes the operation information 501 and analyzes the cause of the failure.
(S209)
The failure analysis unit 301 determines whether or not operation information other than the operation information 501 (502 to 504 in the example of FIG. 1) is further required in order to perform more detailed analysis. When it is determined that it is necessary, the process returns to step S204, and when it is determined that it is not necessary, the process proceeds to step S210.

（Ｓ２１０）
解析サーバ３００の障害解析部３０１は、障害分析を完了し、コールセンターのオペレータに結果を通知する。オペレータは、その結果に基づき、ユーザにアドバイスを行うなどの対処を取る。これらを自動で行ってもよいが、セキュリティ等の観点から、このような手法用いる方が望ましい。
（Ｓ２１１）
一方、上記ステップ２０４において、関連機器４０２等についての取得要求であれば、
解析装置３００の動作情報要求部３０２は、ゲートウェイ装置２００に対し、機器４０１の障害発生と関連のある動作情報を送信するよう要求し、上記Ｓ２０６に進む。 (S210)
The failure analysis unit 301 of the analysis server 300 completes the failure analysis and notifies the call center operator of the result. Based on the result, the operator takes measures such as giving advice to the user. These steps may be performed automatically, but from the viewpoint of security and the like, it is preferable to have the operator handle them as described here.
(S211)
On the other hand, if the request in step S204 above is an acquisition request for the related device 402 or the like, the operation information request unit 302 of the analysis device 300 requests the gateway device 200 to transmit the operation information related to the failure of the device 401, and the process proceeds to step S206.
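
The loop through steps S204 to S211 can be sketched as follows. This is a minimal illustration in Python, not the patented implementation: the function name `analyze_failure`, the callables `request_from_gateway` and `analyze`, and the scope labels "faulty" and "related" are all hypothetical. The sketch also assumes that the loop ends once the related devices have been requested, which the description does not state.

```python
def analyze_failure(request_from_gateway, analyze, faulty_device):
    """Sketch of the S204-S211 loop (all names are hypothetical).

    request_from_gateway(scope, device) returns a dict of operation
    information; analyze(info) returns (result, needs_more).
    """
    collected = {}
    scope = "faulty"  # first pass: S204 -> S205 (faulty device itself)
    while True:
        # S205 or S211 (request), then S206-S207 (gateway acquires and sends)
        collected.update(request_from_gateway(scope, faulty_device))
        # S208 (analysis) and S209 (is more operation information needed?)
        result, needs_more = analyze(collected)
        # Assumption: stop after the related devices were requested once,
        # so the sketch always terminates.
        if not needs_more or scope == "related":
            return result  # S210: report the result to the operator
        scope = "related"  # next pass: S204 -> S211 (related devices)
```

Starting with the faulty device alone and widening the request only when the analysis asks for more reflects the stated aim of sending no more information than the analysis needs.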

以上、本実施の形態１に係る障害解析システムの動作フローを説明した。
次に、ステップＳ２０６の詳細動作を説明する。 The operation flow of the failure analysis system according to the first embodiment has been described above.
Next, the detailed operation of step S206 will be described.

図３は、ステップＳ２０６の詳細動作を説明する動作フローである。以下、図３の各ステップについて説明する。なお、図３では、図２と同様に機器４０１に障害が発生した場合を想定する。 FIG. 3 is an operation flow for explaining the detailed operation of step S206. Hereinafter, each step of FIG. 3 will be described. In FIG. 3, it is assumed that a failure has occurred in the device 401 as in FIG.

（Ｓ３０１）
ゲートウェイ装置２００の動作情報取得部２０４は、図２のステップＳ２０５またはＳ２１１で、機器４０１〜４０４のいずれかの動作情報を取得するよう要求を受け取る。次に、この取得要求が、障害発生機器（図２の例では機器４０１）についての動作情報取得要求であるか否かを判定する。
障害発生機器についての動作情報取得要求であればステップＳ３０４へ進み、それ以外の場合はステップＳ３０２へ進む。 (S301)
The operation information acquisition unit 204 of the gateway device 200 receives a request to acquire operation information of any of the devices 401 to 404 in step S205 or S211 of FIG. Next, it is determined whether or not this acquisition request is an operation information acquisition request for a faulty device (device 401 in the example of FIG. 2).
If it is an operation information acquisition request for a faulty device, the process proceeds to step S304. Otherwise, the process proceeds to step S302.

（Ｓ３０２）
動作情報取得部２０４は、通信記録部２０３が格納している通信記録の中から、機器４０１の通信記録を検索する。次に、機器４０１と過去に通信を行った機器（例えば、４０２、４０３、４０４）を、その検索結果に基づき抽出する。通信記録の具体例は後述の図５で示す。
本ステップは、機器４０１と過去に通信を行った機器が、機器４０１の障害発生に関連しているだろうとの想定の下、それらの機器を通信記録の中から検索する意義がある。 (S302)
The operation information acquisition unit 204 searches for the communication record of the device 401 from the communication records stored in the communication recording unit 203. Next, devices that have communicated with the device 401 in the past (for example, 402, 403, and 404) are extracted based on the search results. A specific example of the communication record is shown in FIG. 5.
This step has the significance of searching for the devices in the communication record under the assumption that the devices that have communicated with the device 401 in the past will be related to the occurrence of the failure of the device 401.

（Ｓ３０３）
動作情報取得部２０４は、ステップＳ３０２で検索した機器（例えば、４０２、４０３、４０４）の動作情報（例えば５０２、５０３、５０４）を取得する。このとき、当該機器のＩＰアドレス等、機器の個体識別を行うことのできる情報を併せて取得して動作情報に含めてもよい。ステップＳ３０４でも同様である。
（Ｓ３０４）
動作情報取得部２０４は、機器４０１の動作情報５０１を取得する。 (S303)
The operation information acquisition unit 204 acquires operation information (for example, 502, 503, 504) of the device (for example, 402, 403, 404) searched in step S302. At this time, information that enables individual identification of the device, such as the IP address of the device, may be acquired together and included in the operation information. The same applies to step S304.
(S304)
The operation information acquisition unit 204 acquires operation information 501 of the device 401.
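
The branch in steps S301 to S304 can be sketched as a single function. This is only an illustration under stated assumptions: `acquire_operation_info`, `partners_of`, and `fetch` are hypothetical names, and operation information is represented as a plain string. The result is keyed by device address, following the remark in step S303 that identifying information such as the IP address may be included.

```python
from typing import Callable, Dict, Iterable


def acquire_operation_info(
    request_is_for_faulty_device: bool,
    faulty: str,
    partners_of: Callable[[str], Iterable[str]],
    fetch: Callable[[str], str],
) -> Dict[str, str]:
    """Sketch of S301-S304 in the gateway (all names are hypothetical)."""
    if request_is_for_faulty_device:
        # S301 -> S304: the request concerns the faulty device itself
        targets = [faulty]
    else:
        # S301 -> S302: look up past communication partners instead
        targets = list(partners_of(faulty))
    # S303 / S304: fetch operation information, keyed by device address
    return {address: fetch(address) for address in targets}
```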

以上、図２のステップＳ２０６の詳細動作について説明した。
次に、ゲートウェイ装置２００が機器４０１〜４０４の通信状況を通信記録部２０３に記録する動作を説明する。 The detailed operation of step S206 in FIG. 2 has been described above.
Next, an operation in which the gateway device 200 records the communication status of the devices 401 to 404 in the communication recording unit 203 will be described.

図４は、ゲートウェイ装置２００が常時行っている、機器４０１〜４０４の通信状況を記録する動作のフローチャートである。ゲートウェイ装置２００は、図４の動作を、例えば所定時間間隔で実行し、機器４０１〜４０４の通信状況を定常的に記録する。
以下、図４の各ステップについて説明する。なお、通信記録の記録形式の例については、後述の図５で改めて説明する。 FIG. 4 is a flowchart of an operation of recording the communication status of the devices 401 to 404, which is always performed by the gateway device 200. The gateway device 200 performs the operation of FIG. 4 at predetermined time intervals, for example, and regularly records the communication status of the devices 401 to 404.
Hereinafter, each step of FIG. 4 will be described. An example of the recording format of the communication record will be described later with reference to FIG. 5.

（Ｓ４００）
ゲートウェイ装置２００の通信観測部２０２は、図２〜図３で説明した各部の動作と並行して、機器４０１〜４０４間の通信を常時観測している。
（Ｓ４０１）
通信観測部２０２は、機器４０１〜４０４間で通信が行われると、その通信パケットを捕捉する。 (S400)
The communication observation unit 202 of the gateway device 200 constantly observes communication between the devices 401 to 404 in parallel with the operation of each unit described with reference to FIGS. 2 to 3.
(S401)
When communication is performed between the devices 401 to 404, the communication observation unit 202 captures the communication packet.

（Ｓ４０２）
通信観測部２０２は、ステップＳ４０１で捕捉したパケットから、送信元アドレスと送信先アドレスのペアを抽出する。ここではＩＰアドレスを抽出するものとする。
（Ｓ４０３）
通信観測部２０２は、ステップＳ４０１で抽出した送信元アドレスと送信先アドレスのペアが、通信記録部２０３に既に記録済みであるか否かを判定する。記録済みであればステップＳ４０５へ進み、記録済みでなければステップＳ４０４へ進む。 (S402)
The communication observation unit 202 extracts a source address / destination address pair from the packet captured in step S401. Here, it is assumed that the IP address is extracted.
(S403)
The communication observation unit 202 determines whether or not the pair of the transmission source address and the transmission destination address extracted in step S402 has already been recorded in the communication recording unit 203. If already recorded, the process proceeds to step S405, and if not recorded, the process proceeds to step S404.

（Ｓ４０４）
通信観測部２０２は、ステップＳ４０２で抽出した送信元アドレスと送信先アドレスのペアを、抽出時刻とともに通信記録部２０３に格納する。
（Ｓ４０５）
通信観測部２０２は、ステップＳ４０２で抽出した送信元アドレスと送信先アドレスに該当する通信記録部２０３内の通信記録を、ステップＳ４０２の抽出時刻で時刻のみ更新する。 (S404)
The communication observation unit 202 stores the pair of the transmission source address and the transmission destination address extracted in step S402 in the communication recording unit 203 together with the extraction time.
(S405)
For the communication record in the communication recording unit 203 that matches the transmission source address and transmission destination address extracted in step S402, the communication observation unit 202 updates only the time, setting it to the extraction time of step S402.

（Ｓ４０６）
通信観測部２０２は、現在時刻よりも所定時間以上前に通信記録部２０３に記録された通信記録を削除する。具体的には、現在時刻と、通信記録部２０３に記録されている通信記録の記録時刻とを比較し、所定時間以上前の通信記録を削除する。これにより、最新の通信状況だけが通信記録部２０３に残り、障害に関連する機器だけの情報が残せるとともに、通信記録部２０３の記憶領域が膨大なることを防ぐことができる。 (S406)
The communication observation unit 202 deletes communication records that were recorded in the communication recording unit 203 a predetermined time or more before the current time. Specifically, the current time is compared with the recording time of each communication record in the communication recording unit 203, and records older than the predetermined time are deleted. As a result, only the latest communication status remains in the communication recording unit 203, so that only information on devices related to the failure is retained and the storage area of the communication recording unit 203 is prevented from growing excessively.

以上、通信観測部２０２の動作を説明した。本動作フローを繰り返し実行することにより、機器４０１〜４０４間の通信記録が通信記録部２０３に追加更新されていく。 The operation of the communication observation unit 202 has been described above. By repeatedly executing this operation flow, the communication record between the devices 401 to 404 is additionally updated in the communication recording unit 203.
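
The recording flow of steps S402 to S406 can be sketched with an in-memory table. This is a minimal sketch, not the patented implementation: the class `CommunicationRecord`, its methods, and the `retention` parameter (standing in for the "predetermined time") are hypothetical, and packet capture (S400 to S401) is left out, so the caller passes the already extracted addresses.

```python
from datetime import datetime, timedelta


class CommunicationRecord:
    """Hypothetical in-memory stand-in for the communication recording unit 203."""

    def __init__(self, retention: timedelta):
        self.retention = retention  # assumed form of the "predetermined time"
        self.pairs = {}  # (source, destination) -> time of latest observation

    def observe(self, source: str, destination: str, now: datetime) -> None:
        # S402-S405: one dict assignment covers both branches of S403.
        # A new pair is stored with its time (S404); a known pair only
        # has its time refreshed (S405).
        self.pairs[(source, destination)] = now
        # S406: drop records that are too old
        self.prune(now)

    def prune(self, now: datetime) -> None:
        # Keep only records newer than (now - retention)
        cutoff = now - self.retention
        self.pairs = {pair: t for pair, t in self.pairs.items() if t > cutoff}
```

Keying the table by the address pair means the "already recorded?" check in S403 needs no separate search; whether the actual recording unit is organized this way is not stated in the description.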

図５は、通信記録部２０３が格納する通信記録の形式例である。ここではテーブル形式で記録する例を示したが、記録形式はこれに限られるものではない。 FIG. 5 shows a format example of the communication record stored in the communication recording unit 203. Although an example of recording in the table format is shown here, the recording format is not limited to this.

通信記録は、「ＩＰアドレス」列、「記録時刻」列を有する。
「ＩＰアドレス」列には、通信観測部２０２が捕捉した通信パケットの送信元ＩＰアドレスと送信先ＩＰアドレスのペアが格納される。本実施の形態１における「識別情報」は本列の値がこれに相当する。
「記録時刻」列には、「ＩＰアドレス」列のアドレスペアの通信を記録した最新時刻が格納される。 The communication record has an “IP address” column and a “recording time” column.
In the “IP address” column, a pair of a transmission source IP address and a transmission destination IP address of a communication packet captured by the communication observation unit 202 is stored. The “identification information” in the first embodiment corresponds to the value in this column.
The “recording time” column stores the latest time when the communication of the address pair in the “IP address” column is recorded.

以下、図５のデータ例において、機器４０１のＩＰアドレスを「１９２．１６８．０．５」であると仮定し、図３のステップＳ３０２〜Ｓ３０３における動作例を説明する。 Hereinafter, in the data example of FIG. 5, assuming that the IP address of the device 401 is “192.168.0.5”, an operation example in steps S302 to S303 of FIG. 3 will be described.

（Ｓ３０２＿１）
図３のステップＳ３０２において、動作情報取得部２０４は、ＩＰアドレス「１９２．１６８．０．５」をキーにして、機器４０１の通信記録を検索する。図５のデータ例では、１行目と２行目のデータが検索にヒットする。 (S302_1)
In step S302 of FIG. 3, the operation information acquisition unit 204 searches for the communication records of the device 401 using the IP address “192.168.0.5” as a key. In the data example of FIG. 5, the data in the first and second rows match the search.

(S302_2)
In step S302 of FIG. 3, the operation information acquisition unit 204 acquires the address of the counterpart device from each row found in the preceding step. In the data example of FIG. 5, it acquires “192.168.0.7” and “192.168.0.9”.
This step identifies the addresses of the devices that communicated with, and thus operated in cooperation with, the device 401 in the past.

(S303)
In step S303 of FIG. 3, the operation information acquisition unit 204 acquires the operation information of the devices at the addresses “192.168.0.7” and “192.168.0.9” obtained in step (S302_2).
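Steps S302_1 and S302_2 can be sketched as a lookup over the communication record. The rows below are hypothetical stand-ins for FIG. 5 (only the address of the device 401 and its two partner addresses come from the text; the third row and all timestamps are invented for illustration), and the function names are assumptions.

```python
from datetime import datetime

# Hypothetical communication record modelled on FIG. 5: each row holds an
# address pair and the latest time at which that pair was observed.
COMM_RECORD = [
    (("192.168.0.5", "192.168.0.7"), datetime(2008, 9, 1, 10, 0, 0)),
    (("192.168.0.9", "192.168.0.5"), datetime(2008, 9, 1, 10, 5, 0)),
    (("192.168.0.7", "192.168.0.8"), datetime(2008, 9, 1, 10, 7, 0)),
]

def find_rows(record, address):
    """S302_1: return the rows whose address pair contains `address`."""
    return [row for row in record if address in row[0]]

def partner_addresses(record, address):
    """S302_2: return the counterpart address of each matching row, in row order."""
    partners = []
    for (src, dst), _recorded_at in find_rows(record, address):
        peer = dst if src == address else src
        if peer not in partners:  # avoid listing the same partner twice
            partners.append(peer)
    return partners

# Addresses whose operation information would be fetched in S303.
print(partner_addresses(COMM_RECORD, "192.168.0.5"))
```

With these sample rows, the first and second rows match the key and the third does not, which mirrors the behaviour described for the data example of FIG. 5.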

The operation of the failure analysis system and its devices according to the first embodiment has been described above.
In step S204 of FIG. 2, the device whose operation information is requested is chosen according to whether operation information for the same device has already been requested. However, the procedure for making this choice is not limited to this.

For example, the analysis server 300 may always request operation information about the failed device (the device 401 in the first embodiment), and the gateway device 200 may decide which device's operation information to send based on whether it has already transmitted the operation information 501 of the device 401.
Specifically, the following method can be considered.

(Example of a method for determining, on the gateway device 200 side, whether transmission has already taken place)
The operation information acquisition unit 204 of the gateway device 200 holds, for a certain period of time, a list of the devices whose operation information it has transmitted. When operation information for the same device is requested again within that period, it determines that the operation information for that device has already been transmitted.
When the operation information acquisition unit 204 determines that the operation information for the device has already been transmitted, it searches the communication record of the communication recording unit 203 for the addresses of devices related to that device and transmits the operation information of those devices instead.
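The gateway-side decision above can be sketched as a small time-limited list. The class name `SentList`, the retention parameter, and the injectable clock are assumptions introduced for illustration; the specification only says the list is held for “a certain period of time”.

```python
import time

class SentList:
    """Remembers, for a limited time, which devices' operation info was sent."""

    def __init__(self, retention_seconds, clock=time.monotonic):
        self._retention = retention_seconds
        self._clock = clock          # injectable so the sketch can be tested
        self._sent_at = {}           # device address -> time of transmission

    def mark_sent(self, address):
        self._sent_at[address] = self._clock()

    def already_sent(self, address):
        sent = self._sent_at.get(address)
        if sent is None:
            return False
        if self._clock() - sent > self._retention:
            del self._sent_at[address]   # entry expired, forget it
            return False
        return True

def choose_targets(sent_list, requested, related):
    """Return the addresses whose operation information should be sent.

    If info for `requested` was already sent within the retention period,
    the related devices are returned instead.
    """
    if sent_list.already_sent(requested):
        return list(related)
    sent_list.mark_sent(requested)
    return [requested]
```

Here `related` would come from the communication record lookup; the sketch leaves that lookup to the caller.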

As described above, the failure analysis system according to the first embodiment stores the communication record between the devices 401 to 404 in the communication recording unit 203 for a certain period of time.
Further, when a failure occurs in, for example, the device 401, the operation information acquisition unit 204 uses the communication record to identify the devices that most recently communicated with the device 401. It thereby narrows down the devices presumed to be related to the failure, and then transmits the operation information of those related devices to the analysis server 300.

As a result, the analysis server 300 can analyze the operation information not only of the failed device 401 but also of the devices considered to be related to the failure, so the failure can be analyzed in more detail.

In addition, according to the first embodiment, operation information is not transmitted to the analysis server 300 from every device on the local network 100. Only the operation information of the devices considered to be related to the failure is transmitted, so the amount of communication can be reduced.
At the same time, operation information unrelated to the failure analysis is kept from flowing outside the local network 100, which is preferable from the viewpoint of privacy.

Further, according to the first embodiment, the amount of operation information that the analysis server 300 must analyze is likewise reduced, so the time the failure analysis unit 301 spends on analysis can be shortened.

Embodiment 2.
In the first embodiment, the communication observation unit 202 of the gateway device 200 observes all packets on the local network 100 and extracts the addresses of the communication partner devices. The following address extraction methods are also conceivable.

(Address extraction method 1)
The communication observation unit 202 samples packets on the local network 100 as appropriate and extracts the source address and the destination address from the sampled packets.
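Address extraction method 1 can be illustrated with a simple every-n-th-packet sampler. The sampling policy, the function name, and the dictionary representation of a packet are all assumptions; the specification leaves the sampling scheme open (“as appropriate”).

```python
def sample_address_pairs(packets, every_nth):
    """Extract (source, destination) pairs from every n-th packet only.

    `packets` is an iterable of dicts with "src" and "dst" keys, a
    hypothetical stand-in for captured packets.
    """
    pairs = set()
    for index, packet in enumerate(packets):
        if index % every_nth == 0:   # sample: skip the packets in between
            pairs.add((packet["src"], packet["dst"]))
    return pairs
```

Sampling trades completeness for load: a pair that communicates rarely may be missed, whereas pairs that exchange many packets are very likely to be recorded.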

(Address extraction method 2)
The communication observation unit 202 captures packets that flow on the local network 100 when communication starts between the devices 401 to 404, such as SIP (Session Initiation Protocol) packets, and extracts the source address and the destination address from those packets.

(Address extraction method 3)
The communication observation unit 202 captures packets for discovering devices in the network, as defined by home network communication standards such as DLNA (Digital Living Network Alliance) and ECHONET, and acquires the addresses of the devices 401 to 404 on the local network 100 from those packets.

Embodiment 3.
FIG. 6 is a configuration diagram of a failure analysis system according to Embodiment 3 of the present invention.
In the third embodiment, the gateway device 200 includes a filtering unit 205 and a filter rule storage unit 206 in addition to the configuration described with reference to FIG. 1 of the first embodiment. The configurations of the other devices are the same as in the first and second embodiments.

The filtering unit 205 filters the operation information acquired by the operation information acquisition unit 204 according to the filter rules stored in the filter rule storage unit 206 and extracts only the items of high importance.
The filter rule storage unit 206 stores the rules that the filtering unit 205 uses for filtering. A specific example is described later with reference to FIG. 8.

The filtering unit 205 can be implemented in hardware, such as a circuit device that realizes its function, or as an arithmetic device such as a microcomputer or CPU together with software that defines its operation.

The filter rule storage unit 206 can be implemented with a storage device such as an HDD.

The filtering unit 205 may be integrated with the operation information acquisition unit 204. Likewise, the filter rule storage unit 206 may be integrated with the communication recording unit 203.

FIG. 7 is an operation flow of the failure analysis system according to the third embodiment. Each step of FIG. 7 is described below. As in FIG. 2, it is assumed that a failure has occurred in the device 401.

(S701) to (S706)
These are the same as steps S201 to S206 in FIG. 2.
(S707)
The filtering unit 205 filters the operation information acquired by the operation information acquisition unit 204 in step S706 and extracts the items of high importance. The details of this step are described later with reference to FIG. 10.
(S708) to (S712)
These are the same as steps S207 to S211 in FIG. 2.

The operation flow of the failure analysis system according to the third embodiment has been described above.
Next, the filtering is described in detail.

FIG. 8 shows an example of the filter rules stored in the filter rule storage unit 206. The example stores them as a table, but the format of the filter rules is not limited to this.

The filter rules have an “extraction rule” column and an “importance” column.
The “extraction rule” column stores a rule for extracting items from the operation information.
The “importance” column specifies the importance of the operation information items extracted by the rule in the “extraction rule” column. In FIG. 8, importance “1” is the most important, and the importance decreases as the number increases.
Importance is roughly synonymous with the severity of the operation information item. That is, an operation information item of high importance is likely to indicate a serious failure or a precursor of one.

For convenience of explanation, FIG. 8 gives the value of each column in Japanese. In practice, expressing each column in a machine-readable form such as a regular expression is more convenient for processing.

FIG. 9 shows a data example of the operation information 501. In this example, the log of an internal process of the device 401 serves as the operation information 501.
Each line of the operation information 501 represents one log item. A log item contains a character string indicating its importance (INFO, ERROR, and so on in FIG. 9) and a character string describing the content of the log.

FIG. 10 is a flow of the filtering process performed by the filtering unit 205. Each step of FIG. 10 is described below. The description uses the filter rules of FIG. 8 and the operation information 501 of FIG. 9 as data examples.

(S1001)
The filtering unit 205 selects filter rules from the filter rule storage unit 206 in descending order of importance. In the filter rule example of FIG. 8, the first execution of this step selects the filter rules in the first and fourth rows, and the second execution selects the filter rule in the second row.
(S1002)
The filtering unit 205 applies the filter rules selected in step S1001 to the operation information 501 of FIG. 9.

(S1003)
From the operation information items listed on the lines of the operation information 501 of FIG. 9, the filtering unit 205 extracts those that match the filter rules selected in step S1001.
(S1004)
The filtering unit 205 checks whether the above steps have been executed for all the filter rules of FIG. 8. If unprocessed filter rules remain, it returns to step S1001 and repeats the process. If all rules have been processed, this operation flow ends.

FIG. 11 shows the filtered operation information 501 obtained as the result of the operation flow of FIG. 10.
Applying the filter rules of FIG. 8 to the operation information 501 of FIG. 9 extracts the log items whose “importance” is one of the three types “FATAL”, “ERROR”, and “WARN”, so that the three lines of FIG. 11 finally remain.
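The flow of FIG. 10 can be sketched with regular-expression rules, as the text suggests for machine-readable rules. The rules and log lines below are hypothetical and do not reproduce FIG. 8 or FIG. 9; returning the extracted lines in their original log order is also an assumption.

```python
import re

# Hypothetical filter rules: (extraction rule as a regex, importance).
# Importance 1 is the most important, as in FIG. 8.
FILTER_RULES = [
    (r"\bFATAL\b", 1),
    (r"\bERROR\b", 2),
    (r"\bWARN\b", 3),
]

# Hypothetical operation information: one log item per line, as in FIG. 9.
LOG = [
    "INFO  process started",
    "WARN  retrying connection",
    "ERROR connection refused",
    "INFO  heartbeat",
    "FATAL process terminated",
]

def filter_operation_info(lines, rules):
    """Apply rules from most to least important and keep matching lines."""
    extracted = set()
    # S1001: pick the rules of each importance level, most important first.
    for importance in sorted({imp for _, imp in rules}):
        selected = [re.compile(p) for p, imp in rules if imp == importance]
        # S1002 and S1003: apply the selected rules and extract matches.
        for index, line in enumerate(lines):
            if any(rule.search(line) for rule in selected):
                extracted.add(index)
    # S1004: the loop ends once every importance level has been processed.
    return [lines[i] for i in sorted(extracted)]

for line in filter_operation_info(LOG, FILTER_RULES):
    print(line)
```

With these sample rules the two INFO lines are dropped and the WARN, ERROR, and FATAL lines remain, analogous to the three lines left in FIG. 11.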

FIGS. 8 to 11 describe an example in which all the filter rules are applied, but only some of the filter rules stored in the filter rule storage unit 206 may be applied to the operation information.

As described above, according to the third embodiment, the filtering unit 205 uses the filter rules stored in the filter rule storage unit 206 to extract the items of high importance from the operation information acquired by the operation information acquisition unit 204, and the extracted items are then transmitted to the analysis server 300.
The gateway device 200 can thus narrow the operation information down to what is useful for the failure analysis by the analysis server 300, that is, the items of high importance, before transmitting it, so the amount of communication can be reduced further.
In addition, the information transmitted outside the local network 100 is kept to a minimum, which is also preferable from the viewpoint of privacy.

Embodiment 4.
The third embodiment assumes that the filter rule storage unit 206 already holds the filter rules, but the filter rules can also be generated by the methods described below.
In every method, the filter rules are generated so that items useful for failure analysis are assigned a high importance.

(Filter rule generation method 1)
Call center operators, engineers, and the like draw on their experience of past failure analyses to extract common operation information patterns that proved useful for failure analysis. The extracted patterns are shaped into the filter rule format and transmitted from the analysis server 300 to the filter rule storage unit 206 of the gateway device 200.

(Filter rule generation method 2)
Based on failure analysis results, the failure analysis unit 301 of the analysis server 300 assigns importance values to the operation information items that proved useful for failure analysis. The failure analysis unit 301 then generates filter rules from those importance values and operation information items, and transmits them to the filter rule storage unit 206 of the gateway device 200.

(Filter rule generation method 3)
The analysis server 300 is provided with an important information instruction unit that attaches a mark to operation information items that proved useful for failure analysis. Specifically, for example, an operator checks the operation information and the failure analysis result on a screen, visually selects the important operation information items, and marks them, for instance by clicking them on the screen.
The failure analysis unit 301 of the analysis server 300 compiles statistics on the marks given by operators, assigns a high importance to the operation information items that received many marks, and creates filter rules. It then transmits the created filter rules to the filter rule storage unit 206 of the gateway device 200.
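The mark statistics of filter rule generation method 3 can be sketched as a frequency count turned into importance levels. The ranking scheme (one importance level per distinct mark count) and the function name are assumptions; the specification only requires that frequently marked items receive a higher importance.

```python
from collections import Counter

def rules_from_marks(marked_items):
    """Map each marked item to an importance level; 1 is the most important.

    `marked_items` lists one entry per operator mark (hypothetical input).
    Items marked more often receive a smaller, i.e. more important, number.
    """
    counts = Counter(marked_items)
    # Distinct mark counts, highest first, become importance 1, 2, 3, ...
    levels = sorted(set(counts.values()), reverse=True)
    rank = {count: position + 1 for position, count in enumerate(levels)}
    return {item: rank[count] for item, count in counts.items()}
```

Items with the same number of marks share an importance level under this scheme, which is one of several reasonable tie-breaking choices.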

(Filter rule generation method 4)
When the failure detection unit 201 of the gateway device 200 detects a failure of any of the devices 401 to 404, it notifies the failure analysis unit 301 of the analysis server 300 of the time of the failure.
The failure analysis unit 301 later generates filter rules using the operation information. In doing so, it generates the filter rules so that operation information items farther in time from the failure occurrence time have a lower importance.

(Filter rule generation method 5)
The filter rules can also be configured so that operation information whose time is farther from the failure occurrence time has a lower importance.
In this case, the filter rules can be generated by the analysis server 300 after the gateway device 200 notifies it of the failure occurrence time, or they can be generated by the gateway device 200 itself.
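The time-based weighting of methods 4 and 5 can be illustrated as follows. The linear step function, the one-minute default step, and the function name are assumptions chosen for the sketch; the specification only states that importance falls as the time distance from the failure grows.

```python
from datetime import datetime, timedelta

def importance_by_time(item_time, failure_time, step=timedelta(minutes=1)):
    """Return an importance level, where 1 is the most important.

    Items within one `step` of the failure get importance 1, and each
    further `step` of distance adds 1 (lower importance).
    """
    distance = abs(item_time - failure_time)
    return 1 + int(distance / step)
```

Taking the absolute distance treats log items shortly before and shortly after the failure alike; a variant could weight only the items preceding the failure.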

The above (filter rule generation method 1) to (filter rule generation method 5) can also be used in combination as appropriate.

Embodiment 5.
The first to fourth embodiments describe the operation information request unit 302 of the analysis server 300 requesting the gateway device 200 to transmit operation information about devices related to the failed device. Such a request suggests that the operation information transmitted earlier was not sufficient for the failure analysis.

However, when only the operation information of high importance is extracted by filtering and transmitted to the analysis server in the first place, as in the third and fourth embodiments, it may be sufficient to transmit to the analysis server 300 the operation information that the filtering excluded.

Therefore, when the operation information request unit 302 of the analysis server 300 requests further operation information, the operation information acquisition unit 204 of the gateway device 200 may send the analysis server 300 the operation information that the filtering unit 205 excluded in the filtering process.
If still more operation information is requested after that, the operation information of the related devices may be acquired as described in the first embodiment and transmitted to the analysis server 300.
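The staged response of this embodiment can be sketched as a small state holder on the gateway side. The class name `Escalation`, the list-based payloads, and the behaviour of repeating the last stage on further requests are assumptions for illustration.

```python
class Escalation:
    """Answers successive requests with progressively broader information.

    Stage 1: the filtered (high-importance) operation information.
    Stage 2: the operation information excluded by the filtering.
    Stage 3: the operation information of related devices (Embodiment 1).
    """

    def __init__(self, filtered, excluded, fetch_related):
        # `fetch_related` is a callable so that related devices are only
        # queried if the analysis server actually asks a third time.
        self._stages = [lambda: filtered, lambda: excluded, fetch_related]
        self._next = 0

    def next_response(self):
        # Assumption: requests beyond the third repeat the last stage.
        stage = min(self._next, len(self._stages) - 1)
        self._next += 1
        return self._stages[stage]()
```

Deferring the related-device lookup keeps the extra traffic and the exposure of other devices' information to the cases where the failed device's own log proves insufficient.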

Embodiment 6.
The analysis server 300 described in the first to fifth embodiments may also be provided with a recovery request unit 303 (not shown) that requests the gateway device 200 to attempt recovery of whichever of the devices 401 to 404 a failure has been detected in.

The recovery request unit 303 can be implemented in hardware, such as a circuit device that realizes its function, or as an arithmetic device such as a microcomputer or CPU together with software that defines its operation. It also includes the necessary communication interfaces and the like as appropriate.

On receiving a recovery request for the failed device from the recovery request unit 303, the gateway device 200 attempts recovery, for example by restarting the device. The operation information acquisition unit 204 then acquires the operation information of that device again and transmits it to the analysis server 300.
The analysis server 300 receives the operation information and can thereby learn whether the recovery succeeded. Based on the result, the operator takes the next action.

FIG. 1 is a configuration diagram of the failure analysis system according to Embodiment 1.
FIG. 2 is an operation flow of the failure analysis system according to Embodiment 1.
FIG. 3 is an operation flow explaining the detailed operation of step S206.
FIG. 4 is a flowchart of the operation in which the gateway device 200 records the operation information 501 to 504 of the devices 401 to 404.
FIG. 5 is an example format of the communication record stored in the communication recording unit 203.
FIG. 6 is a configuration diagram of the failure analysis system according to Embodiment 3.
FIG. 7 is an operation flow of the failure analysis system according to Embodiment 3.
FIG. 8 is an example of the filter rules stored in the filter rule storage unit 206.
FIG. 9 is a data example of the operation information 501.
FIG. 10 is a flow of the filtering process performed by the filtering unit 205.
FIG. 11 shows the filtered operation information 501 obtained as the result of the operation flow of FIG. 10.

Explanation of symbols

100 local network, 200 gateway device, 201 failure detection unit, 202 communication observation unit, 203 communication recording unit, 204 operation information acquisition unit, 205 filtering unit, 206 filter rule storage unit, 300 analysis server, 301 failure analysis unit, 302 operation information request unit, 303 recovery request unit, 401 to 404 devices, 501 to 504 operation information, 600 network.

Claims

A failure analysis system comprising:
a gateway device connected to a first network; and
an analysis device connected to a second network different from the first network,
wherein the gateway device comprises:
a failure detection unit that detects a failure of one or more devices on the first network;
a communication status recording unit that records identification information of each communication partner when communication is performed between the devices; and
an operation information acquisition unit that acquires operation information of the devices and transmits the operation information to the analysis device,
wherein the analysis device comprises a failure analysis unit that performs a failure analysis of the devices using the operation information, and
wherein the operation information acquisition unit uses the identification information recorded by the communication status recording unit to identify a partner device with which the device whose failure was detected by the failure detection unit communicated in the past, acquires the operation information of that partner device, and transmits it to the analysis device.

The communication status recording unit
The failure analysis system according to claim 1, wherein the identification information is created and recorded by extracting a transmission source address and a transmission destination address from packets of all communications performed between the devices.

The communication status recording unit
Sampling packets for communication between the devices,
The failure analysis system according to claim 1, wherein the identification information is created and recorded by extracting a transmission source address and a transmission destination address from the sampled packet.

The communication status recording unit
Capture a session start packet for communication between the devices,
The failure analysis system according to claim 1, wherein the identification information is created and recorded by acquiring the address of the device from the session start packet.

The communication status recording unit
Capturing a device discovery packet flowing on the first network;
The failure analysis system according to claim 1, wherein the identification information is created and recorded by extracting a source address and a destination address using the device discovery packet.

The operation information acquisition unit
The failure analysis system according to any one of claims 1 to 5, wherein the acquired operation information is narrowed down to items of high importance according to a predetermined importance and then transmitted to the analysis device.

The failure analysis unit
transmits in advance, to the gateway device, correspondence rule information that associates the content of the operation information with its importance,
The operation information acquisition unit
The failure analysis system according to claim 6, wherein the acquired operation information is narrowed down to items of high importance according to the correspondence rule information and then transmitted to the analysis device.

The failure analysis unit
Giving importance to the operation information based on the failure analysis result of the device,
The failure analysis system according to claim 7, wherein the correspondence rule information is created using the importance and transmitted to the gateway device.

The analysis device includes:
An important information instruction unit that gives a mark to the operation information that is significant when the failure analysis unit performs the failure analysis,
The failure analysis unit
The failure analysis system according to claim 8, wherein the correspondence rule information is created and transmitted to the gateway device so that operation information marked more frequently by the important information instruction unit is given a higher importance.

The correspondence rule information is
The failure analysis system according to any one of claims 6 to 9, wherein the correspondence rule information is configured so that operation information whose time is farther from the failure occurrence time has a lower importance.

The analysis device includes:
An operation information requesting unit for requesting the operation information to the gateway device;
The operation information acquisition unit
In accordance with the request of the operation information request unit, narrows down the operation information to items of high importance and transmits them to the analysis device,
When the operation information has already been transmitted,
The failure analysis system according to any one of claims 6 to 10, wherein operation information having lower importance than the previously transmitted operation information is transmitted.

The analysis device includes:
A recovery request unit that requests the gateway device to try to recover a failed device;
The operation information acquisition unit
After attempting to recover the device requested by the recovery request unit,
The failure analysis system according to any one of claims 1 to 11, wherein operation information of the device is acquired and transmitted to the analysis device.

The operation information acquisition unit
The failure analysis system according to any one of claims 1 to 12, wherein a list of processes operating in the device, a log of the processes, or both are acquired as the operation information.

The operation information acquisition unit
The failure analysis system according to any one of claims 1 to 13, wherein when the operation information is acquired, individual identification information of the device is acquired and included in the operation information.

A failure analysis method for analyzing a failure of one or more devices connected to a first network, wherein an analysis device is connected to a second network different from the first network, the method comprising:
a communication status recording step of recording identification information of each communication partner when communication is performed between the devices;
a failure detection step of detecting a failure of the devices; and
an operation information acquisition step of acquiring operation information of the devices and transmitting it to the analysis device,
wherein the analysis device executes a failure analysis step of performing a failure analysis of the devices using the operation information, and
wherein, in the operation information acquisition step, the identification information recorded in the communication status recording step is used to identify a partner device with which the device whose failure was detected in the failure detection step communicated in the past, and the operation information of that partner device is acquired and transmitted to the analysis device.

In the communication status recording step,
The failure analysis method according to claim 15, wherein the identification information is created and recorded by extracting a transmission source address and a transmission destination address from packets of all communications performed between the devices.

In the communication status recording step,
Sampling packets for communication between the devices,
The failure analysis method according to claim 15, wherein the identification information is created and recorded by extracting a transmission source address and a transmission destination address from the sampled packet.

In the communication status recording step,
Capture a session start packet for communication between the devices,
The failure analysis method according to claim 15, wherein the identification information is created and recorded by acquiring the address of the device from the session start packet.

The failure analysis method according to claim 15, wherein, in the communication status recording step,
a device discovery packet flowing on the first network is captured, and
the identification information is created and recorded by extracting a source address and a destination address from the device discovery packet.

The failure analysis method according to any one of claims 15 to 19, wherein, in the operation information acquisition step,
the acquired operation information is narrowed down to items of high importance according to a predetermined importance and then transmitted to the analysis device.

The failure analysis method according to claim 20, wherein,
in the failure analysis step, correspondence rule information that associates the content of the operation information with its importance is transmitted to the gateway device in advance, and
in the operation information acquisition step, the acquired operation information is narrowed down to items of high importance according to the correspondence rule information and then transmitted to the analysis device.

The failure analysis method according to claim 21, wherein, in the failure analysis step,
an importance is assigned to the operation information based on the failure analysis result of the device, and
the correspondence rule information is created using that importance and transmitted to the gateway device.

The failure analysis method according to claim 22, wherein
the analysis device executes an important information instruction step of adding a mark to the part of the operation information that was significant in performing the failure analysis in the failure analysis step, and
in the failure analysis step, the correspondence rule information is generated so that operation information given the mark in the important information instruction step is assigned a higher importance, and is transmitted to the gateway device.

The failure analysis method according to any one of claims 20 to 23, wherein
the correspondence rule information is such that the importance is lower the further the operation information is from the failure occurrence time.
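
One way such a rule could be realized is sketched below. The exponential decay, the ten-minute half-life, the base importance values, and the threshold are all assumptions made for illustration; the claim only requires that importance decrease with distance from the failure occurrence time.

```python
from datetime import datetime, timedelta


def importance(base, entry_time, failure_time, half_life=timedelta(minutes=10)):
    """Scale a base importance down as the entry moves away from the failure time.

    The importance halves for every `half_life` of distance (an assumed policy).
    """
    distance = abs(entry_time - failure_time)
    return base * 0.5 ** (distance / half_life)


def narrow_down(entries, failure_time, threshold):
    """Keep only entries whose time-adjusted importance reaches the threshold."""
    return [
        entry for entry in entries
        if importance(entry["base"], entry["time"], failure_time) >= threshold
    ]


failure_time = datetime(2008, 9, 9, 12, 0)
entries = [
    {"time": datetime(2008, 9, 9, 11, 59), "base": 1.0, "text": "connection reset"},
    {"time": datetime(2008, 9, 9, 11, 0), "base": 1.0, "text": "service started"},
    {"time": datetime(2008, 9, 9, 12, 10), "base": 0.8, "text": "retry failed"},
]

kept = narrow_down(entries, failure_time, threshold=0.3)
kept_texts = [entry["text"] for entry in kept]
```

Here the entry recorded an hour before the failure is dropped even though its base importance equals that of the entry recorded one minute before.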

The failure analysis method according to any one of claims 20 to 24, wherein
the analysis device executes an operation information request step of requesting the operation information from the gateway device, and
in the operation information acquisition step,
in response to the request of the operation information request step, the operation information is narrowed down to items of high importance and transmitted to the analysis device, and
when operation information has already been transmitted, operation information having a lower importance than the previously transmitted operation information is transmitted.
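
The request-driven behaviour of this claim can be sketched as a buffer that hands out operation information in descending order of importance, one batch per request. The class name `GatewayBuffer`, the fixed batch size, and the `(importance, text)` tuples are hypothetical; the claim does not state how much information is sent per request.

```python
class GatewayBuffer:
    """Hypothetical gateway-side buffer of operation information."""

    def __init__(self, entries, batch_size):
        # entries: list of (importance, text) tuples.
        # Sort once so the most important items are sent first.
        self._pending = sorted(entries, key=lambda e: e[0], reverse=True)
        self._batch_size = batch_size

    def handle_request(self):
        """Return the next batch; each call yields less important items."""
        batch = self._pending[:self._batch_size]
        self._pending = self._pending[self._batch_size:]
        return batch


gateway = GatewayBuffer(
    [
        (0.4, "disk usage 71%"),
        (0.9, "process crashed"),
        (0.1, "heartbeat ok"),
        (0.7, "connection timeout"),
    ],
    batch_size=2,
)

first = gateway.handle_request()   # most important items
second = gateway.handle_request()  # items of lower importance than `first`
third = gateway.handle_request()   # nothing left to send
```

With this arrangement the analysis device can ask again only when the first batch was not sufficient for the analysis, which keeps the amount of transmitted data small in the common case.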

The failure analysis method according to any one of claims 15 to 25, wherein
the analysis device executes a recovery request step of requesting the gateway device to attempt to recover the failed device, and
in the operation information acquisition step,
after the recovery of the device requested in the recovery request step is attempted, operation information of the device is acquired and transmitted to the analysis device.

The failure analysis method according to any one of claims 15 to 26, wherein, in the operation information acquisition step,
a list of processes operating in the device, a log of the processes, or both are acquired as the operation information.

The failure analysis method according to any one of claims 15 to 27, wherein, in the operation information acquisition step,
when the operation information is acquired, individual identification information of the device is also acquired and included in the operation information.