JP7720005B2

JP7720005B2 - Anomaly location estimation device, anomaly location estimation method, and program

Info

Publication number: JP7720005B2
Application number: JP2024505678A
Authority: JP
Inventors: 洋一松尾; 敬志郎渡辺
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2022-03-07
Filing date: 2022-03-07
Publication date: 2025-08-07
Anticipated expiration: 2042-03-07
Also published as: WO2023170760A1; JPWO2023170760A1

Description

The present invention relates to a technology for estimating the location of an anomaly in a communication network from logs collected from that network.

For telecommunications carriers, it is important to grasp the state of any anomaly that occurs within their communication networks and to respond to it quickly. For this reason, research is being conducted both on detecting anomalies in communication networks at an early stage and on estimating where an anomaly has occurred.

As a method for estimating the location of an anomaly, a method has been proposed that uses a Bayesian network to model the relationship between an anomaly location and the changes it causes in data within the communication network (called observed data), and then estimates the anomaly location from the observed data at the time of the anomaly (Non-Patent Document 1). The resulting model is called a causal model.

The communication network is operated using Interior Gateway Protocol (IGP) communication protocols, such as iBGP (https://datatracker.ietf.org/doc/html/rfc4271) and OSPF (https://datatracker.ietf.org/doc/html/rfc5340), which perform routing within an autonomous system (AS). When routers can no longer communicate with each other, a syslog message is generated indicating that communication with the opposing router has been lost. When an anomaly occurs, expert operators can therefore use the link-down syslog messages generated by routers to determine whether each router is normal or abnormal.

The conventional technology relies on an assumption, drawn from the knowledge of expert operators, that a router anomaly affects only the observed data of the abnormal router itself and the observed data of its adjacent routers. On that basis, a causal model is constructed for the devices in the communication network, consisting of device nodes that represent the state of each device and observation nodes that represent whether a link-down syslog message has been generated by that device, and the anomaly location is determined from this model.

Non-Patent Document 1: Srikanth Kandula, Dina Katabi, and Jean-Philippe Vasseur. Shrink: A tool for failure diagnosis in IP networks. Proceedings of the 2005 ACM SIGCOMM workshop on Mining network data, pages 173-178, 2005.

In the conventional technology, the causal model is created on the assumption that a router anomaly affects only the observed data of the abnormal router and the observed data of its adjacent routers. However, when an anomaly occurs in a communication network, the abnormal router is not always able to generate a syslog message indicating a link down.

For example, if a CPU chip fails, the router in which the failure occurred can no longer execute programs and therefore can no longer generate syslog messages. As a result, with the conventional technology, the input (observed data) to the causal model may contradict the assumption: link-down syslog messages are generated by the adjacent routers, but none is generated by the abnormal router. This lowers the accuracy of anomaly location estimation.

The present invention has been made in view of the above points, and aims to improve the accuracy of anomaly location estimation in a technology that estimates anomaly locations in a communication network using logs collected from that network.

According to the disclosed technology, there is provided an anomaly location estimation device that estimates an anomaly location in a communication network having a plurality of devices, the anomaly location estimation device comprising:
an observation data collection unit that collects a log, generated by a second device, indicating that communication with a first device has become impossible; and
a causal model inference unit that, in a causal model consisting of device nodes representing the state of each device and observation nodes representing the observation result of each device, determines an input value to the observation node corresponding to the first device based on the log, and estimates an anomaly location from the causal model to which the determined input value is applied.

According to the disclosed technology, the accuracy of anomaly location estimation can be improved in a technology that estimates anomaly locations in a communication network using logs collected from that network.

FIG. 1 is a configuration diagram of the anomaly location estimation device.
FIG. 2 is a diagram showing an example of the hardware configuration of the device.
FIG. 3 is a diagram showing an example of the configuration of a communication network.
FIG. 4 is a diagram showing a causal model.
FIG. 5 is a diagram showing inputs to the causal model.
FIG. 6 is a diagram showing inputs to the causal model.

An embodiment of the present invention (the present embodiment) is described below with reference to the drawings. The embodiment described below is merely an example, and the embodiments to which the present invention can be applied are not limited to it.

(Device configuration example)
FIG. 1 shows an example of the configuration of the anomaly location estimation device 100 according to the present embodiment. As shown in FIG. 1, the anomaly location estimation device 100 includes a causal model construction engine 110, a causal model inference engine 120, an observation data collection engine 130, an observation data DB 140, and an output interface 150.

The causal model construction engine 110, the causal model inference engine 120, and the observation data collection engine 130 may also be referred to as the causal model construction unit 110, the causal model inference unit 120, and the observation data collection unit 130, respectively. They may likewise be referred to as the causal model construction circuit 110, the causal model inference circuit 120, and the observation data collection circuit 130, respectively. The operation of the anomaly location estimation device 100 is outlined below.

The observation data collection engine 130 collects observed data (such as logs generated by devices) from the communication network and stores the occurrence status of link-down logs in the observation data DB 140. In the remainder of the present embodiment, syslog is used as the example of a log.

The causal model construction engine 110 takes expert knowledge and similar information as input and constructs a causal model based on the communication network information acquired from the observation data collection engine 130. The causal model inference engine 120 determines the values of the observation nodes based on the occurrence status of link-down syslog messages stored in the observation data DB 140, estimates the anomaly location, and outputs the estimated anomaly location to the output interface 150.

The output interface 150 displays to the user the location of the anomaly in the communication network together with information such as the corresponding maximum posterior probability. In addition, when, for example, a new machine is added to the operational system, the output interface 150 can add a node to the causal graph and can let the user correct the resulting changes in the causal relationships.

(Hardware configuration example)
The anomaly location estimation device 100 can be realized, for example, by causing a computer to execute a program. This computer may be a physical computer or a virtual machine in the cloud.

In other words, the anomaly location estimation device 100 can be realized by using hardware resources such as a CPU and memory built into a computer to execute a program corresponding to the processing performed by the anomaly location estimation device 100. The program can be recorded on a computer-readable recording medium (such as portable memory) and then stored or distributed. The program can also be provided over a network, for example via the Internet or email.

FIG. 2 is a diagram showing an example of the hardware configuration of the computer. The computer in FIG. 2 has a drive device 1000, an auxiliary storage device 1002, a memory device 1003, a CPU 1004, an interface device 1005, a display device 1006, an input device 1007, an output device 1008, and so on, which are interconnected by a bus BS.

The program that realizes the processing on the computer is provided by a recording medium 1001, such as a CD-ROM or memory card. When the recording medium 1001 storing the program is set in the drive device 1000, the program is installed from the recording medium 1001 to the auxiliary storage device 1002 via the drive device 1000. However, the program does not necessarily have to be installed from the recording medium 1001; it may instead be downloaded from another computer via a network. The auxiliary storage device 1002 stores the installed program as well as necessary files, data, and the like.

When an instruction to start the program is received, the memory device 1003 reads the program from the auxiliary storage device 1002 and stores it. The CPU 1004 realizes the functions of the anomaly location estimation device 100 in accordance with the program stored in the memory device 1003. The interface device 1005 is used as an interface for connecting to a network. The display device 1006 displays a GUI (Graphical User Interface) or the like provided by the program. The input device 1007 consists of a keyboard and mouse, buttons, a touch panel, or the like, and is used to input various operating instructions. The output device 1008 outputs computation results.

(Operation example)
The operation of the anomaly location estimation device 100 is described below using a more specific example. Although the present embodiment shows a communication network composed of routers, this is merely an example. The present invention is applicable regardless of the type of nodes that make up the communication network.

<About the causal model>
FIG. 3 shows an example of a communication network from which the observation data collection engine 130 collects observed data. As shown in FIG. 3, this communication network is a network in which routers 1 to 6 are connected as illustrated. For example, router 1 and router 2 are directly connected and are therefore adjacent to each other. Router 1 and router 4 are not directly connected and are therefore not adjacent.

Based on the knowledge of expert operators and similar information, the causal model construction engine 110 constructs the causal model shown in FIG. 4 for the communication network shown in FIG. 3. For the devices (routers) in the communication network, the causal model consists of device nodes that represent the state of each device and observation nodes that represent whether a link-down syslog message has been generated by that device. In other words, the observation nodes represent the observation result of each device. The causal model may also be called a Bayesian network.

The causal model for the communication network in FIG. 3 is as shown in FIG. 4. For example, in the causal model of FIG. 4, the device node for router 1 is connected to the observation nodes for routers 1 and 2. This indicates that an anomaly in router 1 may affect the observed data of router 1 and the observed data of router 2.

Similarly, in the causal model of FIG. 4, the device node for router 2 is connected to the observation nodes for routers 1, 2, 3, and 6. This indicates that an anomaly in router 2 may affect the observed data of each of routers 1, 2, 3, and 6.
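The structure described above, in which each device node is linked to its own observation node and to those of its adjacent routers, can be sketched as follows. This is only an illustration: the function name `build_causal_edges` and the link list are hypothetical. FIG. 3 is not reproduced in the text, so only the adjacencies of routers 1 and 2 come from the description; the remaining links (3-4, 4-5, 5-6) are assumptions made to complete a six-router example.

```python
from typing import Dict, List, Set, Tuple


def build_causal_edges(links: List[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Map each device node to the observation nodes it may affect.

    A device node is connected to its own observation node and to the
    observation node of every directly connected (adjacent) router.
    """
    edges: Dict[int, Set[int]] = {}
    for a, b in links:
        edges.setdefault(a, {a}).add(b)
        edges.setdefault(b, {b}).add(a)
    return edges


# Links 1-2, 2-3 and 2-6 follow the text; 3-4, 4-5 and 5-6 are assumed.
LINKS = [(1, 2), (2, 3), (2, 6), (3, 4), (4, 5), (5, 6)]
model = build_causal_edges(LINKS)

print(sorted(model[1]))  # [1, 2]
print(sorted(model[2]))  # [1, 2, 3, 6]
```

With this link list, the device node for router 1 maps to observation nodes 1 and 2, and the device node for router 2 maps to observation nodes 1, 2, 3, and 6, matching the two examples given for FIG. 4.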

<About the input to the causal model>
In the present embodiment, the accuracy of anomaly location estimation is improved by defining the input to the observation nodes of the causal model in a way that takes into account the contents of the syslog messages generated by the IGP protocol. The details are as follows.

The present embodiment is described using the IGP protocols iBGP and OSPF as examples, but it can be implemented in the same way for other protocols. Likewise, the present embodiment is described using the syslog messages generated by iBGP and OSPF as examples. In communication network monitoring, however, messages may be normalized on the basis of the generated syslog and issued to the operator as a new log, or liveness monitoring may be performed with a tool such as ping and the result reported to the operator as an alarm. The technology of the present invention can also be applied to such alarms, as long as the message contains information about the opposing router (another router adjacent to a given router).

First, the syslog messages of iBGP and OSPF are described. In iBGP and OSPF, a syslog message is generated when a router can no longer reach an adjacent router because of an anomaly in the communication network. An example of such a syslog message is shown below.

2021-12-21 13:00:00 Router1 192.168.10.1 OSPF neighbor down (Router2 192.168.10.2)

The syslog message differs depending on the iBGP/OSPF version and on whether it is an alarm generated by processing the iBGP/OSPF syslog. As shown above, however, it contains a timestamp, a host name, host information (such as an IP address), and information about the opposing router that can no longer be reached (such as the host name and IP address of the opposing router).
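Extracting the reporting router and the opposing router from such a message could look like the sketch below. The regular expression is written only for the sample line shown above; real devices use vendor-specific formats, and the `BGP neighbor down` variant is an assumption, since the text shows only the OSPF form. The names `NeighborDown` and `parse_neighbor_down` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class NeighborDown(NamedTuple):
    timestamp: str
    reporter: str     # host that generated the message
    reporter_ip: str
    peer: str         # opposing router that can no longer be reached
    peer_ip: str


# Pattern for the sample format only; the "BGP" alternative is assumed.
PATTERN = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<host_ip>\S+) "
    r"(?:OSPF|BGP) neighbor down "
    r"\((?P<peer>\S+) (?P<peer_ip>[^)\s]+)\)$"
)


def parse_neighbor_down(line: str) -> Optional[NeighborDown]:
    """Return the parsed fields, or None if the line is not a neighbor-down message."""
    m = PATTERN.match(line.strip())
    if m is None:
        return None
    return NeighborDown(m["ts"], m["host"], m["host_ip"], m["peer"], m["peer_ip"])


sample = "2021-12-21 13:00:00 Router1 192.168.10.1 OSPF neighbor down (Router2 192.168.10.2)"
record = parse_neighbor_down(sample)
print(record.reporter, "->", record.peer)  # Router1 -> Router2
```

The `peer` field is the part that matters for the approach described here, because it identifies which router the message is about rather than which router sent it.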

In the present embodiment, the problem is solved by defining the values of the observation nodes on the basis of the opposing-router information.

Here, in the causal model of the system (communication network) that is the target of anomaly location estimation, let the device nodes be x_i and the observation nodes be y_i, where i ∈ {1, ..., N} and N is the number of devices.

Each x_i takes the value 0 (normal state) or 1 (abnormal state). Instead of the two values 0 and 1, x_i may take three or more values. In that case it is defined, for example, so that the minimum value means the normal state, the maximum value means the abnormal state, and an intermediate value c means that the device is abnormal to the degree given by the ratio "c / (maximum value - minimum value)".

Each y_i takes the value 0 or 1. If a BGP/OSPF syslog message indicating that the i-th router can no longer be reached has been generated by a router other than the i-th router, y_i is set to 1; otherwise it is set to 0. Instead of the two values 0 and 1, y_i may take three or more values. In that case it is defined, for example, as the number of link-down syslog messages about the i-th router that were generated at other nodes.

The causal model inference engine 120 determines (computes) these input values to the causal model from the syslog messages read from the observation data DB 140. Alternatively, the observation data collection engine 130 may determine the input values from the collected syslog messages and store them in the observation data DB 140. In that case, the causal model inference engine 120 can use the values read from the observation data DB 140 directly as the values of y_i.

The difference between the conventional technology (Non-Patent Document 1) and the technology of the present invention with respect to the input to the causal model is explained using FIG. 5 and FIG. 6. The case considered here is one in which routers 1, 3, and 6 each generate a syslog message indicating that the opposing router (router 2) cannot be reached. Among the observation nodes in FIG. 5 and FIG. 6, shaded nodes have the value 1 (abnormal state) and unshaded nodes have the value 0 (normal state).

FIG. 5 shows the input to the causal model in the conventional technology. As shown in FIG. 5, the input values of the observation nodes for routers 1, 3, and 6, which generated the syslog messages, are 1, while the input value of the observation node for router 2, where the anomaly is most likely to have occurred, is 0.

FIG. 6 shows the input to the causal model in the technology of the present invention. As shown in FIG. 6, the input values of the observation nodes for routers 1, 3, and 6, which generated the syslog messages, are 0, while the input value of the observation node for router 2, where the anomaly is most likely to have occurred, is 1. Because the input values match the event that most likely actually occurred, the estimation accuracy can be improved.
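The two input rules compared in FIG. 5 and FIG. 6 can be placed side by side in a short sketch. Each log is reduced here to a hypothetical (reporting router, opposing router) pair, and the function names are illustrative rather than taken from the source.

```python
from typing import Dict, Iterable, List, Tuple

Log = Tuple[int, int]  # (reporting router, opposing router it cannot reach)


def conventional_inputs(logs: Iterable[Log], routers: Iterable[int]) -> Dict[int, int]:
    """y_i = 1 if router i itself generated a link-down message (FIG. 5 rule)."""
    reporters = {reporter for reporter, _peer in logs}
    return {r: int(r in reporters) for r in routers}


def proposed_inputs(logs: Iterable[Log], routers: Iterable[int]) -> Dict[int, int]:
    """y_i = 1 if a router other than i reported that router i is unreachable (FIG. 6 rule)."""
    unreachable = {peer for reporter, peer in logs if reporter != peer}
    return {r: int(r in unreachable) for r in routers}


ROUTERS: List[int] = [1, 2, 3, 4, 5, 6]
# Routers 1, 3 and 6 each report that router 2 cannot be reached.
LOGS: List[Log] = [(1, 2), (3, 2), (6, 2)]

print(conventional_inputs(LOGS, ROUTERS))  # {1: 1, 2: 0, 3: 1, 4: 0, 5: 0, 6: 1}
print(proposed_inputs(LOGS, ROUTERS))      # {1: 0, 2: 1, 3: 0, 4: 0, 5: 0, 6: 0}
```

Under the second rule, a router that has failed silently still receives the value 1, because the evidence comes from its neighbors rather than from the router itself.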

<Inference using the causal model>
The inference using the causal model is itself the same as in the conventional technology (for example, Non-Patent Document 1): a prior probability P(x_i) and a conditional probability P(y_j | x_i) are specified, and inference is performed. An overview of the inference processing using the causal model is given below.

The causal model shown in FIG. 4 (information indicating which nodes are connected by edges) is created by the causal model construction engine 110 based on information obtained from the communication network and is passed to the causal model inference engine 120. Alternatively, the causal model (information indicating which nodes are connected by edges) may be created in advance and stored in a storage unit (memory or the like) provided in the causal model inference engine 120.

The prior probability P(x_i) is determined in advance and stored, for example, in a storage unit (memory or the like) provided in the causal model inference engine 120.

Here, let X = (x_1, x_2, ..., x_N) with x_i ∈ {0, 1}, and Y = (y_1, y_2, ..., y_N) with y_i ∈ {0, 1}. X represents the device nodes, that is, the estimation target, and Y represents the observation nodes, that is, the values of the observation results obtained from the logs.

The causal model inference engine 120 uses the observation results (the input values Y to the causal model) to obtain X' given by the following equation. The argmax in the equation is taken over X, and X' is the X that maximizes the posterior probability P(X | Y).

X' = argmax_X P(X | Y) = argmax_X ( P(Y | X) P(X) )

The conditional probability P(y_j | X) may be calculated by any method that behaves, for example, as follows: if all device nodes connected to observation node y_j are in the normal state, the probability that y_j is 0 (normal) is approximately 1; and if only some of the device nodes connected to y_j are in the normal state, the probability that y_j is 0 (normal) is a value that depends on how many of those device nodes are normal.
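As a rough illustration of the argmax above, the sketch below enumerates every X in {0, 1}^N and scores it with log P(Y | X) + log P(X). The source leaves the conditional probability open, so the noisy-OR form used here is a swapped-in assumption, as are all numeric parameters (`prior`, `leak`, `hit_self`, `hit_neighbor`), the function name `map_estimate`, and the links 3-4, 4-5, 5-6 of the example topology. The self-edge is given a larger weight than the neighbor edges on the assumption that, under the input rule of the present embodiment, y_i mainly reflects the state of router i itself.

```python
from itertools import product
from math import log
from typing import Dict, Set


def map_estimate(parents: Dict[int, Set[int]], y: Dict[int, int],
                 prior: float = 0.01, leak: float = 0.001,
                 hit_self: float = 0.9, hit_neighbor: float = 0.05) -> Dict[int, int]:
    """Brute-force MAP estimate of X under an assumed noisy-OR model.

    parents[j] is the set of device nodes connected to observation node y_j.
    All probabilities are illustrative assumptions, not values from the source.
    """
    routers = sorted(parents)
    best_x: Dict[int, int] = {}
    best_score = float("-inf")
    for bits in product((0, 1), repeat=len(routers)):
        x = dict(zip(routers, bits))
        # log P(X): independent priors on each device node
        score = sum(log(prior if x[i] else 1.0 - prior) for i in routers)
        # log P(Y | X): P(y_j = 0) shrinks with each abnormal parent of y_j
        for j in routers:
            p_zero = 1.0 - leak
            for i in parents[j]:
                if x[i]:
                    p_zero *= 1.0 - (hit_self if i == j else hit_neighbor)
            score += log(p_zero if y[j] == 0 else 1.0 - p_zero)
        if score > best_score:
            best_x, best_score = x, score
    return best_x


# Device nodes connected to each observation node (partly assumed topology).
PARENTS = {1: {1, 2}, 2: {1, 2, 3, 6}, 3: {2, 3, 4},
           4: {3, 4, 5}, 5: {4, 5, 6}, 6: {2, 5, 6}}
Y = {1: 0, 2: 1, 3: 0, 4: 0, 5: 0, 6: 0}  # neighbors report router 2 unreachable

print(map_estimate(PARENTS, Y))  # {1: 0, 2: 1, 3: 0, 4: 0, 5: 0, 6: 0}
```

Exhaustive enumeration costs 2^N evaluations, so it is only workable for very small N; the result also depends on the chosen parameters, which would need to be set from operational knowledge or data.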

Regarding the estimation result obtained by the causal model inference engine 120, the output interface 150 may output each device whose value is 1 as an estimated fault location, or it may output the link between a device whose value is 1 and the opposing device connected to it as an estimated fault location.

(Effects)
As described above, the input value of the observation node for a router i (the value of y_i) is determined from a log, generated by another router, indicating that router i cannot be reached. Therefore, even when the i-th router enters an abnormal state and cannot generate a link-down syslog message, the anomaly location can be estimated from information provided by the opposing routers that are in a normal state, and the accuracy of anomaly location estimation can be improved.

(Supplementary notes)
The following supplementary notes are further disclosed in relation to the above embodiment.
(Supplementary note 1)
An anomaly location estimation device that estimates an anomaly location in a communication network having a plurality of devices, the anomaly location estimation device comprising:
a memory; and
at least one processor connected to the memory,
wherein the processor:
collects a log, generated by a second device, indicating that communication with a first device has become impossible; and
in a causal model consisting of device nodes representing the state of each device and observation nodes representing the observation result of each device, determines an input value to the observation node corresponding to the first device based on the log, and estimates an anomaly location from the causal model to which the determined input value is applied.
(Supplementary note 2)
The anomaly location estimation device according to supplementary note 1, wherein the processor determines a value indicating an anomaly as the input value to the observation node corresponding to the first device.
(Supplementary note 3)
An anomaly location estimation method executed by a computer used as an anomaly location estimation device that estimates an anomaly location in a communication network having a plurality of devices, the method comprising:
collecting a log, generated by a second device, indicating that communication with a first device has become impossible; and
in a causal model consisting of device nodes representing the state of each device and observation nodes representing the observation result of each device, determining an input value to the observation node corresponding to the first device based on the log, and estimating an anomaly location from the causal model to which the determined input value is applied.
(Supplementary note 4)
A non-transitory storage medium storing a program for causing a computer to execute each process of the anomaly location estimation device according to supplementary note 1 or 2.

The present embodiment has been described above, but the present invention is not limited to this specific embodiment, and various modifications and changes are possible within the scope of the gist of the present invention as set forth in the claims.

100 Anomaly location estimation device
110 Causal model construction engine
120 Causal model inference engine
130 Observation data collection engine
140 Observation data DB
150 Output interface
1000 Drive device
1001 Recording medium
1002 Auxiliary storage device
1003 Memory device
1004 CPU
1005 Interface device
1006 Display device
1007 Input device
1008 Output device

Claims

1. An anomaly location estimation device that estimates an anomaly location in a communication network having a plurality of devices, the anomaly location estimation device comprising:
an observation data collection unit that collects a log, generated by a second device, indicating that communication with a first device has become impossible; and
a causal model inference unit that, in a causal model consisting of device nodes representing the state of each device and observation nodes representing the observation result of each device, determines an input value to the observation node corresponding to the first device based on the log, and estimates an anomaly location from the causal model to which the determined input value is applied.

2. The anomaly location estimation device according to claim 1, wherein the causal model inference unit determines a value indicating an anomaly as the input value to the observation node corresponding to the first device.

3. An anomaly location estimation method executed by a computer used as an anomaly location estimation device that estimates an anomaly location in a communication network having a plurality of devices, the method comprising:
collecting a log, generated by a second device, indicating that communication with a first device has become impossible; and
in a causal model consisting of device nodes representing the state of each device and observation nodes representing the observation result of each device, determining an input value to the observation node corresponding to the first device based on the log, and estimating an anomaly location from the causal model to which the determined input value is applied.

4. A program for causing a computer to function as each unit of the anomaly location estimation device according to claim 1 or 2.