JP6648058B2

JP6648058B2 - Information processing apparatus, information processing method, and program

Info

Publication number: JP6648058B2
Application number: JP2017041721A
Authority: JP
Inventors: 喜芳近藤; 崇行後藤; 輝之長谷川; 徳広福元
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2017-03-06
Filing date: 2017-03-06
Publication date: 2020-02-14
Anticipated expiration: 2037-03-06
Also published as: JP2018148408A

Description

The present invention relates to an information processing apparatus, an information processing method, and a program, and more particularly to a technique for learning how to handle a failure that has occurred in a network.

Techniques are known for estimating and narrowing down the suspected location of a failure when a failure occurs in a network system. Patent Literature 1 discloses a technique that assumes a tree-type network and extracts the suspected failure location by sequentially polling each interface on the path to each failed node.

Patent Literature 1: JP 2014-53658 A

Network structures are not limited to the tree type and vary widely. In a large and complex network, when a failure occurs at one node, its effects spread to the surrounding nodes. Therefore, even if there is a node from which information cannot be obtained by polling, that node is not necessarily the cause of the failure, because some nodes become unable to send and receive information under the influence of the node that is the true cause. There is thus room for improvement in techniques for estimating the cause of a failure in a network and, in turn, for improving the accuracy of determining how to handle the failure.

The present invention has been made in view of these points, and improves the accuracy of determining a handling method for a failure in a network.

A first aspect of the present invention is an information processing apparatus. The apparatus includes: an information acquisition unit that acquires (1) topology information indicating the connection relationships among a plurality of nodes constituting a network, (2) failure handling information that associates each of one or more failures that have occurred in the network with a failure occurrence time and a handling method, the handling method including the cause node that caused the failure and the corrective action for that cause, and (3) event information that is time-series data of events that have occurred at the nodes; a related node topology specifying unit that specifies, for each of the one or more failures, a related node topology whose elements are the nodes that generated events due to that failure; and a learning unit that generates, by machine learning, a learner for outputting the likelihood of a handling method for a failure in the network, using as training data the time-series data of events generated by the nodes constituting the related node topology and the handling method for the failure.

The related node topology specifying unit may include: an event extraction unit that extracts, for each of the one or more failures, the nodes that generated an event within a predetermined period including the occurrence time of that failure; and a node extraction unit that extracts, as a connected-subgraph related node topology, the set of nodes among those extracted by the event extraction unit for which every node on the path to the cause node has generated an event.

The information processing apparatus may further include a related failure extraction unit that extracts, as related failures, those failures among the one or more failures that have the same cause node, the same corrective action for the cause node, and the same nodes constituting the related node topology. The learning unit may use as training data the time-series data of events generated by the nodes constituting the related node topology of the related failures and the handling method for the failure.

The node extraction unit may extract related node topologies as different related node topologies if the manufacturers of the devices constituting the nodes differ.

Even if the manufacturers of the devices constituting the nodes are the same, the node extraction unit may extract related node topologies as different related node topologies if the types of the devices differ.

Even if the manufacturers and the types of the devices constituting the nodes are the same, the node extraction unit may extract related node topologies as different related node topologies if the versions of the firmware installed in the devices differ.

The information processing apparatus may further include a determination unit that outputs the likelihood of a handling method for a failure based on the events generated due to the failure in the network and on the learner generated by the learning unit.

For each of one or more learners generated for a related node topology that includes the set of nodes that generated the events, the determination unit may output the likelihood of the handling method associated with that learner, and may determine the handling method for the failure based on the one or more output likelihoods.

A second aspect of the present invention is an information processing method. In this method, a processor executes the steps of: acquiring topology information indicating the connection relationships among a plurality of nodes constituting a network; acquiring failure handling information that associates each of one or more failures that have occurred in the network with a handling method and a failure occurrence time; acquiring event information that is time-series data of events that have occurred at the nodes; specifying, for each of the one or more failures, a related node topology whose elements are the nodes that generated events due to that failure; and generating, by machine learning, a determiner for determining the likelihood of a handling method for a failure, using as training data the time-series data of events generated by the nodes constituting the related node topology and the handling method for the failure.

A third aspect of the present invention is a program. The program causes a computer to realize: a function of acquiring topology information indicating the connection relationships among a plurality of nodes constituting a network; a function of acquiring failure handling information that associates each of one or more failures that have occurred in the network with a handling method and a failure occurrence time; a function of acquiring event information that is time-series data of events that have occurred at the nodes; a function of specifying, for each of the one or more failures, a related node topology whose elements are the nodes that generated events due to that failure; and a function of generating, by machine learning, a determiner for determining the likelihood of a handling method for a failure, using as training data the time-series data of events generated by the nodes constituting the related node topology and the handling method for the failure.

According to the present invention, the accuracy of determining how to handle a failure in a network can be improved.

FIG. 1 is a diagram schematically showing nodes constituting a network and events that have occurred at the nodes.
FIG. 2 is a diagram for explaining a hidden Markov model.
FIG. 3 is a diagram schematically showing the functional configuration of the information processing apparatus according to the embodiment.
FIG. 4 is a diagram for explaining the topology information according to the embodiment.
FIG. 5 is a diagram schematically showing the data structure of the failure handling information according to the embodiment.
FIG. 6 is a diagram schematically showing the data structure of the event information according to the embodiment.
FIG. 7 is a diagram schematically showing the data structure of a database that stores the related failures extracted by the related failure extraction unit according to the embodiment.
FIG. 8 is a diagram schematically showing the data structure of a learner database that stores the learners generated by the learning unit according to the embodiment.
FIG. 9 is a flowchart for explaining the flow of information processing executed by the information processing apparatus according to the embodiment.

<Outline of Embodiment>
An outline of the embodiment will be described with reference to FIGS. 1 and 2.
FIGS. 1(a) and 1(b) are diagrams schematically showing nodes constituting a network and events that have occurred at the nodes. Specifically, FIG. 1(a) shows the nodes constituting the network, and FIG. 1(b) schematically shows time-series data of the events that have occurred at the nodes. Here, an "event" means the occurrence of an operation that is not part of the normal operating state of a node, such as the raising of an alarm.

In FIG. 1(a), the numbered white circles indicated by reference sign N represent nodes constituting the network. To avoid clutter, only the white circle numbered 1 is given a reference sign, but all the white circles numbered 1 through 9 in FIG. 1(a) are nodes constituting the network. For convenience, the node numbered 1 is referred to in this specification as node N1, and likewise for the other numbers; for example, the node numbered 9 is referred to as node N9.

In FIG. 1(a), a seven-pointed star indicated by reference sign A shows that an event has occurred at the adjacent node. To avoid clutter, only the star adjacent to node N5 is labeled A, but the stars adjacent to nodes N2, N3, N4, and N8 in FIG. 1(a) likewise show that events have occurred at those nodes.

FIG. 1(b) shows that events occurred at the nodes in the order N2, N4, N3, N2, N5, N8, N5, N2. The rectangle shown adjacent to the event occurrence at node N8 in FIG. 1(b) indicates that a corrective action was taken for the failure that occurred at node N8.

When a failure occurs at one of the nodes constituting a network, events such as alarms may also occur at other nodes as a result of that failure. Events caused by a failure are expected to occur close in time to the failure occurrence time. On the other hand, not every event that occurs close to the occurrence time of a failure is caused by that failure. The information processing apparatus 1 according to the embodiment therefore classifies the events that occurred at the nodes of the network according to the failure that caused them.

In the embodiment, a "node attributable to a failure" is a node that satisfies the following three conditions.
(Condition 1) The node generates an event within a predetermined period D that includes the time at which the failure occurred at the node that caused the failure (hereinafter referred to as the "cause node").
(Condition 2) Every node on the path from the node to the cause node generates an event.
(Condition 3) The corrective action for the cause node is the same.

In FIG. 1(a), node N8 is the cause node. In FIG. 1(b), the arrow indicated by reference sign D is the predetermined period D of Condition 1. From FIG. 1(b), the nodes that generate events within the predetermined period D are nodes N2, N3, N4, N5, and N8, with node N5 generating events both before and after the event at node N8. From FIG. 1(a), the set of nodes for which every node on the path to the cause node N8 generates an event consists of nodes N2, N5, and N8. Nodes N3 and N4 each generate an event within the predetermined period D and therefore satisfy Condition 1, but they do not satisfy Condition 2 because node N6 generates no event.

Hereinafter, in this specification, a set of nodes satisfying the above three conditions, together with their connection relationships, is referred to as a "related node topology". The set of nodes enclosed by the broken line in FIG. 1(a) (the set whose elements are nodes N2, N5, and N8, together with their connection relationships) is a related node topology. The information processing apparatus according to the embodiment generates a learner for outputting the likelihood of a handling method for a failure by performing machine learning in units of related node topologies, each of which is a subset of the nodes constituting the network, with the event information (time-series data of events), the cause node, and the corrective action as training data.

Because the information processing apparatus according to the embodiment generates learners in units of related node topologies rather than for the entire network, the event information of nodes not attributable to the failure can be excluded from the training data. This allows the apparatus to improve the accuracy of estimating failures in the network.

The embodiment performs machine learning using a known technique such as a hidden Markov model (hereinafter referred to as "HMM"), a neural network, or a support vector machine. The HMM is described below as one example of machine learning.

[HMM model]
FIG. 2 is a diagram for explaining the HMM, and schematically shows an HMM of the event generation process in the related node topology shown in FIG. 1(a). In an HMM, each event observed at each node is explained by an unobservable "state". In the HMM shown in FIG. 2, each event that occurs in the related node topology whose elements are nodes N2, N5, and N8 is expressed by output probabilities from the internal states S1, S2, and S3.

In FIG. 2, a1, a2, and a3 (where a1 + a2 + a3 = 1.0) are the probabilities that state S1, state S2, and state S3, respectively, is the initial state of the HMM. Further, aij (i, j = 1, 2, 3) is the transition probability from state Si to state Sj. For example, the transition probability from state S1 to state S2 is a12, and the transition probability from state S3 to state S2 is a32. In the example HMM shown in FIG. 2, every state always transitions to some other state. Because state S1 transitions only to state S2, the transition probability a12 is 1.0 (100%). Because state S2 transitions to either state S1 or state S3, a21 + a23 = 1.0.

In FIG. 2, bij (i = 1, 2, 3; j = 2, 5, 8) is the probability that an event occurs at node Nj when the HMM is in state Si. For example, the probability that an event occurs at node N5 when the HMM is in state S2 is b25, and the probability that an event occurs at node N8 when the HMM is in state S3 is b38.

In the related node topology consisting of nodes N2, N5, and N8 shown in FIG. 1(a), events occur in the order N2, N5, N8, N5, as shown in FIG. 1(b). According to the HMM shown in FIG. 2, events occur in the order N2, N5, N8, N5 in one of two ways: the states transition in the order S1, S2, S3, S2, or they transition in the order S2, S3, S2, S1.

The occurrence probability P1 of the former is a1 × b12 × a12 × b25 × a23 × b38 × a32 × b25. The occurrence probability P2 of the latter is a2 × b22 × a23 × b35 × a32 × b28 × a21 × b15. Therefore, in the HMM shown in FIG. 2, the likelihood that events occur in the order N2, N5, N8, N5 is P1 + P2.
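A sum over state paths of this kind can be computed without listing the paths by hand, using the standard forward algorithm. The following is a minimal sketch, not the patented implementation. The parameter values are hypothetical and are not taken from FIG. 2, so the set of paths with nonzero probability may differ from the figure. The sketch checks the forward result against a brute-force sum over every state path.

```python
from itertools import product


def forward_likelihood(init, trans, emit, obs):
    """Likelihood of an observation sequence under a discrete HMM (forward algorithm)."""
    n = len(init)
    # alpha[i]: probability of the observations so far, ending in state i
    alpha = [init[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
    return sum(alpha)


def brute_force_likelihood(init, trans, emit, obs):
    """Same quantity, obtained by summing the probability of every state path."""
    n = len(init)
    total = 0.0
    for path in product(range(n), repeat=len(obs)):
        p = init[path[0]] * emit[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
        total += p
    return total


# Hypothetical values for illustration only.
# Rows are states S1..S3; emission columns are nodes N2, N5, N8.
INIT = [0.5, 0.3, 0.2]            # a1, a2, a3
TRANS = [[0.0, 1.0, 0.0],         # S1 -> S2 only (a12 = 1.0)
         [0.4, 0.0, 0.6],         # S2 -> S1 or S3 (a21 + a23 = 1.0)
         [0.0, 1.0, 0.0]]         # S3 -> S2 only
EMIT = [[0.7, 0.3, 0.0],          # b12, b15, b18
        [0.2, 0.5, 0.3],          # b22, b25, b28
        [0.0, 0.4, 0.6]]          # b32, b35, b38
OBS = [0, 1, 2, 1]                # events at N2, N5, N8, N5

likelihood = forward_likelihood(INIT, TRANS, EMIT, OBS)
```

The forward pass costs time proportional to the sequence length times the square of the number of states, whereas the brute-force sum grows exponentially with the sequence length.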

The information processing apparatus according to the embodiment computes an HMM learner for each of a plurality of different related node topologies. Specifically, it computes the HMM parameters ai, aij, and bij using, for example, the known Baum-Welch algorithm. The learning is based on the following three pieces of information.

(Information 1) Topology information indicating the connection relationships among the plurality of nodes constituting the network.
(Information 2) Failure handling information that associates each of one or more failures that have occurred in the network with a failure occurrence time and a handling method, the handling method including the cause node that caused the failure and the corrective action for that cause.
(Information 3) Event information that is time-series data of the events that have occurred at the nodes.

When an event occurs in the network, the information processing apparatus according to the embodiment selects one or more learners whose related node topology includes the node that generated the event, and computes the occurrence probability of the event for each selected learner. Each learner is tied to a handling method for a cause node. The event occurrence probability computed with each learner serves as the likelihood of the cause node behind the event and of its handling method. The information processing apparatus according to the embodiment determines the handling method for the failure based on, for example, the likelihoods computed with the learners.
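The selection and scoring step described above might be sketched as follows. This is an illustration under stated assumptions, not the apparatus's actual code: the `Learner` class, its field names, and `rank_handling_methods` are hypothetical, the likelihood functions are stubs returning fixed numbers in place of a trained model, and picking the highest likelihood is only one possible decision rule.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Sequence, Tuple


@dataclass(frozen=True)
class Learner:
    """Hypothetical container for one trained learner."""
    nodes: FrozenSet[str]                          # related node topology used for training
    handling_method: str                           # cause node and corrective action
    likelihood: Callable[[Sequence[str]], float]   # e.g. an HMM forward pass


def rank_handling_methods(learners: List[Learner],
                          event_nodes: Sequence[str]) -> List[Tuple[float, str]]:
    """Score every learner whose topology covers the observed nodes, best first."""
    observed = set(event_nodes)
    candidates = [lr for lr in learners if observed <= lr.nodes]
    scored = [(lr.likelihood(event_nodes), lr.handling_method) for lr in candidates]
    return sorted(scored, reverse=True)


# Stub learners; the likelihood values are made up for illustration.
learners = [
    Learner(frozenset({"N2", "N5", "N8"}), "N8: replace device", lambda seq: 0.02),
    Learner(frozenset({"N2", "N5"}), "N5: reboot", lambda seq: 0.05),
    Learner(frozenset({"N2", "N5", "N8", "N9"}), "N9: update firmware", lambda seq: 0.005),
]

ranking = rank_handling_methods(learners, ["N2", "N5", "N8", "N5"])
best_likelihood, best_method = ranking[0]
```

The second learner is skipped because its related node topology does not contain N8, so only two learners are scored.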

Although FIG. 2 shows an example of an HMM in which no state transition occurs between state S1 and state S3, an HMM in which state transitions occur between state S1 and state S3 may be adopted. An HMM in which transitions to the same state (for example, from state S2 to state S2) also occur may be adopted as well. Furthermore, in the HMM shown in FIG. 2 there are events that do not occur in certain states (for example, no event occurs at node N8 in state S1), but an HMM in which every event can occur in every state may be adopted. Which HMM to adopt may be determined by experiment, taking into account the network structure, computational cost, and the like.

<Functional configuration of information processing apparatus>
The information processing apparatus according to the embodiment will now be described in more detail with reference to FIG. 3.
FIG. 3 is a diagram schematically showing the functional configuration of the information processing apparatus 1 according to the embodiment. The information processing apparatus 1 according to the embodiment includes a storage unit 10 and a control unit 20. The control unit 20 includes an information acquisition unit 21, a related node topology specifying unit 22, a related failure extraction unit 23, a learning unit 24, and a determination unit 25.

The storage unit 10 comprises a ROM that stores the BIOS (Basic Input Output System) and the like of the computer that implements the information processing apparatus 1, a RAM (Random Access Memory) that serves as the work area of the information processing apparatus 1, and a mass storage device such as an HDD (Hard Disk Drive) or SSD (Solid State Drive) that stores the OS (Operating System), application programs, and various information referred to when the application programs are executed.

The control unit 20 is a processor of the information processing apparatus 1, such as a CPU (Central Processing Unit) or GPU (Graphics Processing Unit). By executing the program stored in the storage unit 10, it functions as the information acquisition unit 21, the related node topology specifying unit 22 (an event extraction unit 221 and a node extraction unit 222), the related failure extraction unit 23, the learning unit 24, and the determination unit 25.

The information acquisition unit 21 acquires the three pieces of information described above (topology information, failure handling information, and event information), which are stored in the storage unit 10. The topology information, failure handling information, and event information are described below with reference to FIGS. 4, 5, and 6.

[Topology information]
FIGS. 4(a) and 4(b) are diagrams for explaining the topology information according to the embodiment. Specifically, FIG. 4(a) schematically shows the connection relationships of the network, and FIG. 4(b) schematically shows the data structure of the topology information representing the connection relationships shown in FIG. 4(a). The connection relationships of the network shown in FIG. 4(a) are the same as those of the network shown in FIG. 1.

As shown in FIG. 4(b), the topology information according to the embodiment is managed by assigning each node constituting the network a node ID (identification) that identifies it. FIG. 4(b) shows an example in which NID0005 is assigned as the node ID of node N5. Similarly, each node Ni (i = 1, 2, ..., 9) is assigned NID000i.

As shown in FIG. 4(a), node N5 is connected to nodes N1, N2, and N8. Accordingly, as shown in FIG. 4(b), the node whose node ID is NID0005 is associated with NID0001, NID0002, and NID0008 as its connected nodes. In addition, the topology information associates each node ID with node type information, which includes the equipment type, model, and firmware version of the node, and with a node installation area ID assigned to the area in which the node is installed.
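One way to hold topology information of this shape in memory is a mapping from node ID to a record, sketched below. Only the connections of NID0005 come from the description above. The class name, field names, and the equipment type, model, firmware, and area ID values are placeholders introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class NodeRecord:
    """Hypothetical record layout for one row of the topology information."""
    connected: Tuple[str, ...]   # node IDs of directly connected nodes
    equipment_type: str          # placeholder value below
    model: str                   # placeholder value below
    firmware: str                # placeholder value below
    area_id: str                 # node installation area ID (placeholder below)


TOPOLOGY: Dict[str, NodeRecord] = {
    "NID0005": NodeRecord(
        connected=("NID0001", "NID0002", "NID0008"),
        equipment_type="router",
        model="model-A",
        firmware="1.0",
        area_id="AREA01",
    ),
}


def neighbors(topology: Dict[str, NodeRecord], node_id: str) -> Set[str]:
    """Return the set of node IDs directly connected to node_id."""
    return set(topology[node_id].connected)
```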

[Failure handling information]
FIG. 5 is a diagram schematically showing the data structure of the failure handling information according to the embodiment. As shown in FIG. 5, the failure handling information according to the embodiment is managed by assigning each failure a failure ID that uniquely identifies it. FIG. 5 illustrates the failure handling information for a failure whose cause node is the node with node ID NID0008 (that is, node N8). FIG. 5 shows that node N8 is a switch and that the device was replaced as the corrective action for the failure. The failure occurred at 11:54:32 a.m. on February 23, 2017, was handled at 1:46:57 p.m. on the same day, and was recovered from at 2:01:23 p.m. on the same day.

[Event information]
FIG. 6 is a diagram schematically showing the data structure of the event information according to the embodiment. As shown in FIG. 6, the event information according to the embodiment is managed by assigning each event an event ID that uniquely identifies an event that has occurred at a node. FIG. 6 shows that an alarm was raised at node N5 because the temperature of the device rose. At node N5, the alarm was raised at 10:11:22 a.m. on February 23, 2017, and recovery was at 2:01:23 p.m. on the same day.

[Specifying the related node topology]
Next, the method of specifying the related node topology will be described.
The related node topology specifying unit 22 refers to the topology information, the failure handling information, and the event information, and specifies, for each of one or more failures, a related node topology whose elements are the nodes that generated events due to that failure. To do this, the related node topology specifying unit 22 includes an event extraction unit 221 and a node extraction unit 222.

The event extracting unit 221 extracts, for each of the one or more failures, the nodes that generated an event within a predetermined period D that includes the occurrence time of that failure. Here, the "predetermined period D" is the span of time over which a single failure has an effect. The predetermined period D may be determined experimentally in consideration of the scale of the network, the types of its components, and the like, and is, for example, 10 hours. The event extracting unit 221 first refers to the failure handling information to select one failure. It then refers to the event information and identifies the nodes that generated an event within the predetermined period D that includes the occurrence time of the selected failure.
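
The extraction by time window can be sketched as follows. The description only states that the period D includes the failure occurrence time, so centering the window on that time is an assumption of this sketch, as are the function name and the sample events for the nodes N8 and N3.

```python
from datetime import datetime, timedelta


def nodes_with_events_in_window(events, failure_time, period):
    """Return the IDs of nodes that raised an event inside the window.

    Assumption: the window of length `period` is centered on `failure_time`.
    `events` is an iterable of (node_id, raised_at) pairs.
    """
    half = period / 2
    start, end = failure_time - half, failure_time + half
    return {node_id for node_id, raised_at in events if start <= raised_at <= end}


events = [
    ("N5", datetime(2017, 2, 23, 10, 11, 22)),  # alarm described for FIG. 6
    ("N8", datetime(2017, 2, 23, 11, 54, 30)),  # hypothetical event
    ("N3", datetime(2017, 2, 22, 1, 0, 0)),     # hypothetical, outside the window
]

selected = nodes_with_events_in_window(
    events,
    failure_time=datetime(2017, 2, 23, 11, 54, 32),
    period=timedelta(hours=10),
)
```

With a 10-hour period, the window here runs from 6:54:32 AM to 4:54:32 PM on February 23, so the event at the node N3 on the previous day is excluded.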

The node extracting unit 222 refers to the topology information and extracts, as the related node topology, the set of those nodes extracted by the event extracting unit 221 for which every node on the path leading to the cause node has generated an event. In this way, the related node topology specifying unit 22 can extract, from the set of nodes configuring the network, a related node topology composed of the nodes related to a given failure. Because the related node topology is the unit of learning for generating a learning device, restricting it to the nodes that generated events related to a given failure improves the discrimination accuracy of the learning device.
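
One way to read this rule is as a breadth-first search that starts at the cause node and only steps onto nodes that generated an event. The sketch below follows that reading; the function name, the sample adjacency list, and the choice to always include the cause node itself are assumptions, not details given in the description.

```python
from collections import deque


def related_node_topology(adjacency, event_nodes, cause_node):
    """Return the nodes reachable from the cause node via event nodes only.

    Assumption: the cause node is always part of the result, even if it
    raised no event of its own.
    """
    related = {cause_node}
    queue = deque([cause_node])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, ()):
            # Only continue through neighbors that generated an event.
            if neighbor in event_nodes and neighbor not in related:
                related.add(neighbor)
                queue.append(neighbor)
    return related


# Hypothetical topology: N3 - N1 - N2 - N5 - N8
adjacency = {
    "N1": ["N2", "N3"],
    "N2": ["N1", "N5"],
    "N3": ["N1"],
    "N5": ["N2", "N8"],
    "N8": ["N5"],
}
event_nodes = {"N2", "N3", "N5", "N8"}  # N1 raised no event

topology = related_node_topology(adjacency, event_nodes, cause_node="N8")
```

In this example the node N3 raised an event but is left out, because its only path to the cause node N8 passes through the node N1, which did not.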

Here, if the devices constituting the nodes come from different manufacturers, the events they generate at the time of a failure may also differ. The node extracting unit 222 may therefore extract different related node topologies for nodes whose devices come from different manufacturers. This allows the node extracting unit 222 to limit the variation in the events generated by the nodes configuring a related node topology. Because this amounts to narrowing down the events that a learning device has to handle, it improves the discrimination accuracy of the learning device.

Further, even if the devices constituting the nodes come from the same manufacturer, the events generated at the time of a failure may differ if the equipment types of the devices differ. The node extracting unit 222 may therefore extract different related node topologies when the device types differ, even if the manufacturer is the same. This allows the node extracting unit 222 to limit the variation in the events generated by the nodes configuring a related node topology. Because this amounts to narrowing down the events that a learning device has to handle, it improves the discrimination accuracy of the learning device.

Further, even if the manufacturer and the equipment type of the devices constituting the nodes are the same, the events generated at the time of a failure may differ if the versions of the firmware installed in the devices differ. The node extracting unit 222 may therefore extract different related node topologies when the firmware versions differ, even if the manufacturer and the device type are the same. This allows the node extracting unit 222 to limit the variation in the events generated by the nodes configuring a related node topology. Because this amounts to narrowing down the events that a learning device has to handle, it improves the discrimination accuracy of the learning device.
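
The three refinements above (manufacturer, device type, firmware version) can be viewed as grouping nodes by a progressively longer attribute key. The sketch below shows one possible interpretation; the function name, the attribute names, and the sample values such as `VendorA` are hypothetical, and the description does not state how the split is carried out.

```python
from collections import defaultdict


def split_by_device_attributes(node_ids, attributes, keys=("manufacturer",)):
    """Partition nodes so that each group shares the same attribute values."""
    groups = defaultdict(set)
    for node_id in node_ids:
        key = tuple(attributes[node_id][name] for name in keys)
        groups[key].add(node_id)
    return dict(groups)


# Hypothetical device attributes for the nodes of one related node topology.
attributes = {
    "N2": {"manufacturer": "VendorA", "device_type": "router", "firmware": "1.0"},
    "N5": {"manufacturer": "VendorA", "device_type": "switch", "firmware": "1.0"},
    "N8": {"manufacturer": "VendorA", "device_type": "switch", "firmware": "1.1"},
}
nodes = ["N2", "N5", "N8"]

by_maker = split_by_device_attributes(nodes, attributes)
by_type = split_by_device_attributes(nodes, attributes, ("manufacturer", "device_type"))
by_firmware = split_by_device_attributes(
    nodes, attributes, ("manufacturer", "device_type", "firmware")
)
```

Each added attribute can only split groups further, which matches the idea that the variation of events within one related node topology becomes smaller with each refinement.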

Returning to the description of FIG. 3, the related failure extracting unit 23 extracts, as related failures, those of the one or more failures included in the failure handling information for which the cause node is the same, the corresponding action for the cause node is the same, and the nodes configuring the related node topology are the same.

FIG. 7 is a diagram schematically illustrating the data structure of a related failure database that stores the related failures extracted by the related failure extracting unit 23 according to the embodiment. The related failure database is held by the storage unit 10. As shown in FIG. 7, the related failure database stores a related node topology ID, node information on the nodes configuring the related node topology, and a plurality of failure IDs in association with one another. The related node topology ID is an identifier assigned to each related node topology in order to identify it uniquely. The node information includes the node ID of each node configuring the related node topology and a node shape indicating the connection relationship within the related node topology.

Even if the corresponding action for the cause node is the same and the nodes configuring the related node topology are the same, the event occurrence patterns may differ. FIG. 7 shows event occurrence patterns for the related node topology composed of the node N2, the node N5, and the node N8, in the case where the node N8 is the cause node and the coping method is replacement of the device.

That is, even if the corresponding action for the cause node is the same and the nodes configuring the related node topology are the same, on one occasion events may occur in the order of, for example, node N2, node N5, node N8, and node N5, while on another occasion events occur in the order of node N8, node N2, node N5, and node N8. The related failures are used as teacher data for the machine learning executed by the learning unit 24.
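
The grouping of failures into related failures can be sketched as a dictionary keyed by the three conditions (cause node, corresponding action, set of nodes). The function name, the record layout, the failure IDs, and the third failure with a different action are assumptions made for illustration; the two event orders are the ones given above.

```python
from collections import defaultdict


def group_related_failures(failures):
    """Group failures sharing cause node, action, and related node set."""
    groups = defaultdict(list)
    for failure in failures:
        key = (failure["cause_node"], failure["action"], frozenset(failure["nodes"]))
        groups[key].append(failure)
    return dict(groups)


failures = [
    {"failure_id": "FID0001", "cause_node": "N8", "action": "replace device",
     "nodes": ["N2", "N5", "N8"], "event_order": ["N2", "N5", "N8", "N5"]},
    {"failure_id": "FID0002", "cause_node": "N8", "action": "replace device",
     "nodes": ["N2", "N5", "N8"], "event_order": ["N8", "N2", "N5", "N8"]},
    # Hypothetical failure with a different action: it forms its own group.
    {"failure_id": "FID0003", "cause_node": "N8", "action": "restart device",
     "nodes": ["N2", "N5", "N8"], "event_order": ["N8", "N5"]},
]

groups = group_related_failures(failures)
key = ("N8", "replace device", frozenset({"N2", "N5", "N8"}))
training_sequences = [f["event_order"] for f in groups[key]]
```

Each group then supplies several event sequences with one shared label (the coping method), which is the form of teacher data the learning unit 24 is described as using.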

The learning unit 24 generates a learning device by machine learning, using as teacher data the time-series data of the events generated by each node configuring the related node topology and the coping method of the failure. More specifically, the learning unit 24 takes as teacher data the time-series data of the events generated by each node configuring the related node topology in the related failures shown in FIG. 7, together with the coping method of the failure, and generates the learning device using an HMM.

FIG. 8 is a diagram schematically illustrating the data structure of a learning device database that stores the learning devices generated by the learning unit 24 according to the embodiment. The learning device database is held by the storage unit 10. Because the learning unit 24 generates a separate learning device for each set of related failures, a plurality of learning devices are registered in the learning device database even when they all concern the same network. As shown in FIG. 8, the learning device database assigns to each learning device a learning device ID that uniquely identifies it, and manages the information under that ID.

As shown in FIG. 8, the learning device database holds, in association with one another, the learning device ID, the related node topology ID identifying the related node topology that the learning device evaluates, the node information of each node configuring that related node topology, and the coping method of the failure.

For example, the related node topology handled by the learning device whose learning device ID is LID0003 is identified by TID0008, and its elements are the node N2, the node N5, and the node N8. This learning device takes as input the event information generated in a related node topology composed of the node N2, the node N5, and the node N8, or a subset of them, and outputs the likelihood of the coping method of the failure. That is, the learning device whose learning device ID is LID0003 outputs the probability that a failure has occurred at the node N8 and that the corresponding action is replacement of the device.

The determination unit 25 is responsible for outputting the likelihood of a coping method by using the learning devices. The determination unit 25 outputs the likelihood of a coping method for the failure on the basis of the events generated as a result of a failure in the network and the learning devices generated by the learning unit 24. More specifically, the determination unit 25 acquires, from the learning device database held by the storage unit 10, the one or more learning devices generated for related node topologies that include the set of nodes that generated the events.
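
Selecting the candidate learning devices amounts to a superset test between each registered related node topology and the set of nodes that generated events. The sketch below assumes a list-of-dictionaries layout for the learning device database; the function name, the field names, and the second entry (`LID0001`, `TID0002`) are hypothetical, while LID0003 and TID0008 follow the example given for FIG. 8.

```python
def candidate_learners(learner_db, observed_nodes):
    """Return entries whose related node topology contains all observed nodes."""
    observed = set(observed_nodes)
    return [entry for entry in learner_db if observed <= set(entry["nodes"])]


learner_db = [
    {"learner_id": "LID0003", "topology_id": "TID0008",
     "nodes": ["N2", "N5", "N8"], "cause_node": "N8", "action": "replace device"},
    # Hypothetical second entry covering a different part of the network.
    {"learner_id": "LID0001", "topology_id": "TID0002",
     "nodes": ["N1", "N3"], "cause_node": "N1", "action": "restart device"},
]

candidates = candidate_learners(learner_db, observed_nodes=["N5", "N8"])
```

Here events were observed only at the nodes N5 and N8, which is a subset of the topology TID0008, so only the learning device LID0003 is selected.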

For each of the one or more acquired learning devices, the determination unit 25 outputs the likelihood of the coping method associated with that learning device. The determination unit 25 then determines the coping method for the failure on the basis of the one or more output likelihoods. For example, the determination unit 25 outputs the coping method with the highest of the output likelihoods. In this way, the determination unit 25 can output the statistically most plausible coping method on the basis of information on past events in the network.
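
As an illustration of scoring and selection, the sketch below evaluates an observed event sequence against several discrete HMMs with the forward algorithm and returns the coping method with the highest likelihood. The model parameters are made up, training is omitted, and using the node ID of each event as the observation symbol is a simplifying assumption; the description does not specify the HMM's states or observations.

```python
def sequence_likelihood(model, observations):
    """Forward algorithm for a discrete HMM: P(observations | model).

    `observations` must be a non-empty list of symbols.
    """
    start, trans, emit = model["start"], model["trans"], model["emit"]
    states = range(len(start))
    # Initialization with the first observed symbol.
    alpha = [start[s] * emit[s].get(observations[0], 0.0) for s in states]
    # Induction over the remaining symbols.
    for symbol in observations[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in states) * emit[s].get(symbol, 0.0)
            for s in states
        ]
    return sum(alpha)


def most_likely_action(learners, observations):
    """Return (likelihood, action) for the best-scoring learning device."""
    scored = [(sequence_likelihood(l["model"], observations), l["action"])
              for l in learners]
    return max(scored, key=lambda pair: pair[0])


# Hypothetical, hand-written parameters standing in for trained models.
learners = [
    {"action": "replace device",
     "model": {"start": [0.6, 0.4],
               "trans": [[0.7, 0.3], [0.4, 0.6]],
               "emit": [{"N8": 0.5, "N5": 0.5}, {"N2": 0.9, "N5": 0.1}]}},
    {"action": "restart device",
     "model": {"start": [1.0],
               "trans": [[1.0]],
               "emit": [{"N8": 0.1, "N5": 0.1, "N2": 0.8}]}},
]

likelihood, action = most_likely_action(learners, ["N8", "N5"])
```

In practice the likelihoods would usually be computed in log space to avoid underflow on long event sequences; plain probabilities are used here to keep the sketch short.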

<Processing flow of the information processing executed by the information processing apparatus 1>
FIG. 9 is a flowchart illustrating the flow of the information processing executed by the information processing apparatus 1 according to the embodiment. The processing in this flowchart starts, for example, when the information processing apparatus 1 is powered on.

The information acquisition unit 21 refers to the storage unit 10 and acquires topology information indicating the connection relationship between the plurality of nodes configuring the network (S2). The information acquisition unit 21 also refers to the storage unit 10 and acquires failure handling information that associates each of the one or more failures that have occurred in the network with a coping method, which includes the cause node that caused the failure and the corresponding action for the cause, and with the failure occurrence time (S4).

The information acquisition unit 21 further refers to the storage unit 10 and acquires event information, which is time-series data of the events that have occurred at the nodes (S6). The related node topology specifying unit 22 extracts a related node topology for each cause node included in the failure handling information (S8). The related failure extracting unit 23 extracts, as related failures, the failures for which the cause node is the same, the corresponding action for the cause node is the same, and the nodes configuring the related node topology are the same (S10).

For each set of related failures extracted by the related failure extracting unit 23, the learning unit 24 uses those related failures as teacher data to generate a learning device that outputs the likelihood of a coping method for the failure (S12). When the learning unit 24 has generated learning devices for all of the related failures, the processing in this flowchart ends.

<Effects of the information processing apparatus 1>
As described above, the information processing apparatus 1 according to the embodiment can improve the accuracy of determining a coping method for a failure in a network. In particular, the information processing apparatus 1 according to the embodiment generates a learning device using as teacher data the time-series data of the events generated by each node configuring the related node topology and the coping method of the failure. Because information unrelated to the failure can thereby be excluded from the teacher data, the determination accuracy of the learning device is improved.

In addition, even if the structure of the network is complex, the information processing apparatus 1 according to the embodiment generates learning devices in units of related node topologies, each of which is a subset of the network. The information processing apparatus 1 according to the embodiment can therefore learn the order in which events occur for each failure type, independently of the shape of the network topology.

The present invention has been described above using an embodiment, but the technical scope of the present invention is not limited to the scope described in that embodiment, and various modifications and changes are possible within the scope of its gist. For example, the specific form in which the apparatus is distributed or integrated is not limited to the above embodiment, and all or part of the apparatus may be functionally or physically distributed or integrated in arbitrary units. New embodiments produced by arbitrary combinations of a plurality of embodiments are also included in the embodiments of the present invention. A new embodiment produced by such a combination has the effects of the original embodiments as well.

DESCRIPTION OF SYMBOLS
1 ... Information processing apparatus
10 ... Storage unit
20 ... Control unit
21 ... Information acquisition unit
22 ... Related node topology specifying unit
23 ... Related failure extracting unit
24 ... Learning unit
25 ... Determination unit
221 ... Event extracting unit
222 ... Node extracting unit

Claims

An information processing apparatus comprising:
an information acquisition unit that acquires (1) topology information indicating a connection relationship between a plurality of nodes configuring a network, (2) failure handling information that associates each of one or more failures that have occurred in the network with a coping method, which includes a cause node that caused the failure and a corresponding action for the cause, and with a failure occurrence time, and (3) time-series data of events that have occurred at the nodes;
a related node topology specifying unit that specifies, for each of the one or more failures associated in the failure handling information, by referring to the topology information, a related node topology whose elements are the nodes that generated an event within a predetermined period including the occurrence time of that failure; and
a learning unit that generates, by machine learning, a learning device for outputting a likelihood of a coping method for a failure in the network, using teacher data composed of the time-series data, among the time-series data, of the events generated by each node configuring the related node topology and the coping method, in the failure handling information, of the failure that caused those events.

The information processing apparatus according to claim 1, wherein the related node topology specifying unit includes:
an event extracting unit that extracts, for each of the one or more failures, the nodes that generated an event within the predetermined period including the occurrence time of that failure; and
a node extracting unit that extracts, as the related node topology forming a connected subgraph, the set of those nodes extracted by the event extracting unit for which every node on the path leading to the cause node has generated an event.

The information processing apparatus according to claim 2, further comprising a related failure extracting unit that extracts, as related failures, those of the one or more failures for which the cause node is the same, the corresponding action for the cause node is the same, and the nodes configuring the related node topology are the same,
wherein the learning unit uses, as teacher data, the time-series data of the events generated by each node configuring the related node topology in the related failures and the coping method of the failure.

The information processing apparatus according to claim 2, wherein the node extracting unit extracts different related node topologies when the manufacturers of the devices constituting the nodes differ.

The information processing apparatus according to claim 2, wherein the node extracting unit extracts different related node topologies when the types of the devices constituting the nodes differ, even if the manufacturer of the devices is the same.

The information processing apparatus according to claim 2, wherein the node extracting unit extracts different related node topologies when the versions of the firmware installed in the devices constituting the nodes differ, even if the manufacturer and the type of the devices are the same.

The information processing apparatus according to claim 1, further comprising a determination unit that outputs a likelihood of a coping method for a failure on the basis of events generated as a result of the failure in the network and the learning device generated by the learning unit.

The information processing apparatus according to claim 7, wherein the determination unit outputs, for each of one or more learning devices generated for a related node topology that includes the set of nodes that generated the events, the likelihood of the coping method associated with that learning device, and determines the coping method for the failure on the basis of the one or more output likelihoods.

An information processing method in which a processor executes the steps of:
acquiring topology information indicating a connection relationship between a plurality of nodes configuring a network;
acquiring failure handling information that associates each of one or more failures that have occurred in the network with a coping method and a failure occurrence time of that failure;
acquiring time-series data of events that have occurred at the nodes;
specifying, for each of the one or more failures associated in the failure handling information, by referring to the topology information, a related node topology whose elements are the nodes that generated an event within a predetermined period including the occurrence time of that failure; and
generating, by machine learning, a determiner for determining a likelihood of a coping method for a failure, using teacher data composed of the time-series data, among the time-series data, of the events generated by each node configuring the related node topology and the coping method, in the failure handling information, of the failure that caused those events.

A program that causes a computer to realize:
a function of acquiring topology information indicating a connection relationship between a plurality of nodes configuring a network;
a function of acquiring failure handling information that associates each of one or more failures that have occurred in the network with a coping method and a failure occurrence time of that failure;
a function of acquiring time-series data of events that have occurred at the nodes;
a function of specifying, for each of the one or more failures associated in the failure handling information, by referring to the topology information, a related node topology whose elements are the nodes that generated an event within a predetermined period including the occurrence time of that failure; and
a function of generating, by machine learning, a determiner for determining a likelihood of a coping method for a failure, using teacher data composed of the time-series data, among the time-series data, of the events generated by each node configuring the related node topology and the coping method, in the failure handling information, of the failure that caused those events.