JP7325557B2 - Abnormality diagnosis method and abnormality diagnosis device

Info

Publication number: JP7325557B2
Application number: JP2022003718A
Authority: JP
Inventors: 達海大庭
Original assignee: Panasonic Intellectual Property Corp of America
Current assignee: Panasonic Intellectual Property Corp of America
Priority date: 2017-09-27
Filing date: 2022-01-13
Publication date: 2023-08-14
Anticipated expiration: 2037-09-27
Also published as: JP2022036261A

Description

The present invention relates to an abnormality diagnosis method and an abnormality diagnosis device for identifying the variables that contribute to an abnormality in multidimensional information.

Conventionally, as disclosed in Patent Documents 1 to 4 and Non-Patent Document 1, it is known to detect an abnormality by diagnosing whether the observed values of a plurality of variables are abnormal.

Patent Document 1: Japanese Patent No. 6076751
Patent Document 2: Japanese Patent No. 5858839
Patent Document 3: Japanese Patent No. 5811683
Patent Document 4: Japanese Patent No. 5108116

Non-Patent Document 1: Sommer, Robin, and Vern Paxson. "Outside the closed world: On using machine learning for network intrusion detection." Security and Privacy (SP), 2010 IEEE Symposium on. IEEE, 2010.
Non-Patent Document 2: M. Tavallaee, E. Bagheri, W. Lu, and A. Ghorbani, "A Detailed Analysis of the KDD CUP 99 Data Set," Submitted to Second IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA), 2009.
Non-Patent Document 3: Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in neural information processing systems. 2013.
Non-Patent Document 4: Yang, Sen, et al. "Feature grouping and selection over an undirected graph." Graph Embedding for Pattern Analysis. Springer New York, 2013. 27-43.
Non-Patent Document 5: Song, Jingping, Zhiliang Zhu, and Chris Price. "Feature grouping for intrusion detection system based on hierarchical clustering." International Conference on Availability, Reliability, and Security. Springer International Publishing, 2014.

The present disclosure provides an abnormality diagnosis method and an abnormality diagnosis device that can effectively identify the cause of an abnormality.

An abnormality diagnosis method according to one aspect of the present disclosure is executed by an abnormality diagnosis device that diagnoses whether an observed value is abnormal, the observed value being obtained by observing the state of a monitoring target and being composed of the values of a plurality of variables indicating that state. The abnormality diagnosis device includes a processor and a memory, and the memory stores an anomaly detection model generated by learning using a plurality of the observed values. The processor acquires group information indicating one or more groups, each composed of a combination of at least two mutually related variables among the plurality of variables; acquires the observed value; reads the anomaly detection model from the memory and calculates a score by inputting the observed value into the anomaly detection model; determines, using the score, whether the observed value is abnormal; and, when the observed value is determined to be abnormal, identifies the group that is the cause of the abnormality among the one or more groups of the observed value, using a loss function defined based on the score and the one or more groups indicated by the acquired group information.

An abnormality diagnosis device according to one aspect of the present disclosure diagnoses whether an observed value is abnormal, the observed value being obtained by observing the state of a monitoring target and being composed of the values of a plurality of variables indicating that state. The device includes a processor and a memory, and the memory stores an anomaly detection model generated by learning using a plurality of the observed values. The processor acquires group information indicating one or more groups, each composed of a combination of at least two mutually related variables among the plurality of variables; acquires the observed value; reads the anomaly detection model from the memory and calculates a score by inputting the observed value into the anomaly detection model; determines, using the score, whether the observed value is abnormal; and, when the observed value is determined to be abnormal, identifies the group that is the cause of the abnormality among the one or more groups of the observed value, using a loss function defined based on the score and the one or more groups indicated by the acquired group information.

An abnormality diagnosis method according to another aspect of the present disclosure is executed by an abnormality diagnosis device that diagnoses whether an observed value is abnormal, the observed value being obtained by observing the state of a monitoring target and being composed of the values of a plurality of variables indicating that state. The abnormality diagnosis device includes a processor and a memory, and the memory stores an anomaly detection model generated by learning using a plurality of the observed values. The processor acquires group information indicating one or more groups, each composed of a combination of at least two mutually related variables among the plurality of variables; acquires the observed value; reads the anomaly detection model from the memory and determines, using the read anomaly detection model, whether the observed value is abnormal; and, when the observed value is determined to be abnormal, identifies the group that is the cause of the abnormality among the one or more groups of the observed value, based on the observed value and the one or more groups indicated by the acquired group information.

An abnormality diagnosis device according to another aspect of the present disclosure diagnoses whether an observed value is abnormal, the observed value being obtained by observing the state of a monitoring target and being composed of the values of a plurality of variables indicating that state. The device includes a processor and a memory, and the memory stores an anomaly detection model generated by learning using a plurality of the observed values. The processor acquires group information indicating one or more groups, each composed of a combination of at least two mutually related variables among the plurality of variables; acquires the observed value; reads the anomaly detection model from the memory and determines, using the read anomaly detection model, whether the observed value is abnormal; and, when the observed value is determined to be abnormal, identifies the group that is the cause of the abnormality among the one or more groups of the observed value, based on the acquired observed value and the one or more groups indicated by the acquired group information.

These general or specific aspects may be realized by a system, a device, an integrated circuit, a computer program, or a recording medium such as a computer-readable CD-ROM, or by any combination of systems, devices, integrated circuits, computer programs, and recording media.

The abnormality diagnosis method and abnormality diagnosis device according to the present disclosure can effectively identify the cause of an abnormality.

FIG. 1 is a schematic diagram of an abnormality diagnosis system according to the embodiment.
FIG. 2 is a block diagram showing an example of the hardware configuration of the abnormality diagnosis device according to the embodiment.
FIG. 3 is a diagram showing an example of a monitoring target in the abnormality diagnosis system according to the embodiment.
FIG. 4 is a block diagram showing an example of the functional configuration of the abnormality diagnosis device according to the embodiment.
FIG. 5 is a diagram showing an example of observation data acquired from a monitoring target.
FIG. 6A is a diagram showing an example of a UI that the abnormality diagnosis device according to the embodiment displays on the display for setting groups.
FIG. 6B is a diagram showing an example of a UI that the abnormality diagnosis device according to the embodiment displays on the display for setting groups.
FIG. 7 is a diagram showing an autoencoder according to the embodiment.
FIG. 8 is a flowchart showing an example of processing for generating an anomaly detection model according to the embodiment.
FIG. 9 is a flowchart showing an example of determination processing using the anomaly detection model according to the embodiment.
FIG. 10 is a flowchart showing an example of identification processing for identifying the group that is the cause of an abnormality.

(Knowledge on which the present invention is based)
In recent years, IT adoption and automation have advanced considerably in facilities such as factories. Accurate sensing and appropriate feedback are indispensable for automating equipment efficiently and safely. For this reason, anomaly detection, which detects abnormalities caused by failures, cyberattacks, and the like, is an extremely important technology for avoiding, in advance, major losses due to equipment faults.

On the other hand, even when an abnormality is detected correctly, tracking down what is causing it is a separate difficulty. Typical anomaly detection methods in machine learning only detect events that are unlikely to occur in the normal state, and yield no information about the cause of the abnormality. As a result, anomaly detection is currently not used very effectively, particularly in fields such as networking, where a departure from the normal state can mean many different things (Non-Patent Document 1).

To identify the cause of an abnormality, it is useful to identify which variables of the observed value contribute most strongly to the abnormality. For example, when a sensor fails, only the observed value of the failed sensor should show an abnormal value. Therefore, once a particular observed value is identified as abnormal, the cause, namely the failure of the sensor from which that observed value was obtained, can be reached quickly (Patent Documents 1 and 2).

However, abnormality diagnosis in practice is not always aimed at identifying a failed sensor as described above. In reality, a single underlying phenomenon or cause often affects the observed values of a plurality of sensors or the like. For example, scores on a mathematics test and scores on a physics test are both influenced by the underlying factor of mathematical thinking ability, so the two scores can be said to be highly correlated. In this way, an abnormal phenomenon that affects an underlying factor ends up affecting a plurality of variables at the same time.

In the technique of Patent Document 3, when a certain variable is judged to be possibly abnormal, a plurality of variables that are strongly correlated with that variable are extracted. Strongly correlated variables are likely to be influenced by a common factor, so this method is also useful for abnormality diagnosis. The technique of Patent Document 4 similarly groups strongly correlated variables together as collinear data items.

However, the methods described above were aimed at physical systems, such as failure diagnosis of equipment or sensors in a chemical plant. Consequently, when the abnormality diagnosis model is more complex, for example when the goal is to detect invalid parameters in packets flowing through a network, or when parameters are manipulated maliciously, these methods may fail to identify the abnormal variables effectively.

Specifically, as cyberattacks, malware, and the like continue to evolve, it has become essential to prevent attacks before they happen by deploying an intrusion detection system (IDS) or an intrusion prevention system (IPS). An intrusion detection system or intrusion prevention system is a system that detects or prevents unauthorized activity against a computer or network.

However, as attacks have diversified in recent years, the signature-based detection systems used so far can no longer catch all unauthorized communications. This has created a need for anomaly-based detection systems. Whereas signature-based detection applies only to known patterns, anomaly-based detection may also detect unknown attack patterns. However, even when an abnormality is detected, it is unclear what exactly is abnormal, and because a huge number of anomaly detection alerts are issued, users are overwhelmed by the work of handling them.

To solve these problems, an abnormality diagnosis method according to one embodiment of the present disclosure is executed by an abnormality diagnosis device that diagnoses whether an observed value is abnormal, the observed value being obtained by observing the state of a monitoring target and being composed of the values of a plurality of variables indicating that state. The abnormality diagnosis device includes a processor and a memory, and the memory stores an anomaly detection model generated by learning using a plurality of the observed values. The processor acquires group information indicating one or more groups, each composed of a combination of at least two mutually related variables among the plurality of variables; acquires the observed value; reads the anomaly detection model from the memory and determines, using the read anomaly detection model, whether the observed value is abnormal; and, when the observed value is determined to be abnormal, identifies the group that is the cause of the abnormality among the one or more groups of the observed value, based on the observed value and the one or more groups indicated by the acquired group information.

This makes it possible to identify, with high accuracy and for a wide range of problems, the group that is the cause of an abnormality.

The anomaly detection model may be a model generated from the plurality of observed values by at least one of an autoencoder, a variational autoencoder, and a one-class support vector machine, used as the learning.

This makes it possible to determine whether an observed value is abnormal using anomaly detection techniques that are already known to perform well. In addition, the score output by the anomaly detection makes it easy to perform abnormality diagnosis with a natural loss function.

In determining whether the observed value is abnormal, a score may be calculated by inputting the observed value into the anomaly detection model; the acquired observed value may be determined to be abnormal when the calculated score is equal to or greater than a predetermined first threshold, and determined not to be abnormal when the calculated score is less than the first threshold.

This allows an abnormality in the observed value to be detected effectively.
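The threshold rule above can be illustrated with a minimal sketch. The function names, the stand-in scoring function (squared distance from a fixed "normal" mean), and the numeric values are assumptions made for illustration only; the disclosure does not fix a particular scoring function at this point.

```python
from typing import Callable, Sequence

# Hypothetical "normal" operating point used by the stand-in score below.
NORMAL_MEAN = [1.0, 2.0, 3.0]


def squared_distance_score(observation: Sequence[float]) -> float:
    """Stand-in anomaly score (assumption): squared distance from NORMAL_MEAN."""
    return sum((x - m) ** 2 for x, m in zip(observation, NORMAL_MEAN))


def is_abnormal(
    observation: Sequence[float],
    score_fn: Callable[[Sequence[float]], float],
    first_threshold: float,
) -> bool:
    """Return True when the score is equal to or greater than the first threshold."""
    return score_fn(observation) >= first_threshold
```

Note that a score exactly equal to the first threshold is treated as abnormal, matching the "equal to or greater than" wording.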

Alternatively, the anomaly detection model may be a model generated based on the probability density ratio between a plurality of normal observed values for learning and a plurality of observed values for diagnosis. In this case, a plurality of observed values are acquired when acquiring the observed value, and in determining whether the observed values are abnormal, a score is calculated using the anomaly detection model read from the memory and the acquired plurality of observed values; the acquired plurality of observed values are determined to be abnormal when the calculated score is equal to or greater than a predetermined first threshold, and determined not to be abnormal when the calculated score is less than the first threshold.

This makes it possible to judge whether the data under diagnosis is abnormal as a set, even when modeling is difficult, for example because little data is available from the learning period, and, if it is abnormal, to identify which variable group contains the abnormality.
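As a rough illustration of scoring a whole set of diagnostic observations by a density ratio, the sketch below handles a single variable and, as a simplifying assumption not taken from the disclosure, fits a Gaussian to each data set. The score is the mean log density ratio over the diagnostic samples. The function names are hypothetical, and practical systems would more likely use a direct density-ratio estimator over all variables.

```python
import math
from statistics import fmean, pvariance
from typing import Sequence


def gaussian_log_pdf(x: float, mean: float, var: float) -> float:
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def density_ratio_score(train: Sequence[float], diag: Sequence[float]) -> float:
    """Mean of log(q(x) / p(x)) over the diagnostic samples.

    p is fitted to the normal learning data and q to the diagnostic data.
    Assumes both data sets have non-zero variance.
    """
    p_mean, p_var = fmean(train), pvariance(train)
    q_mean, q_var = fmean(diag), pvariance(diag)
    log_ratios = [
        gaussian_log_pdf(x, q_mean, q_var) - gaussian_log_pdf(x, p_mean, p_var)
        for x in diag
    ]
    return fmean(log_ratios)
```

Under these assumptions the score is zero when the diagnostic data has the same mean and variance as the learning data, and grows as the two distributions diverge, so it can be compared against the first threshold in the same way as a per-observation score.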

In identifying the group, a displacement vector may be calculated at which a loss function takes a local minimum, the loss function being the sum of the calculated score and a group regularization term obtained by regularizing over each of the one or more groups indicated by the group information; among the one or more groups, a group containing a variable whose value in the calculated displacement vector is equal to or greater than a second threshold, which is less than 1, may then be identified as the group that is the cause of the abnormality.

This makes it possible to effectively identify the group that is the cause of the abnormality.
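One way to write down a loss of this shape is sketched below. The notation is an assumption introduced here for illustration: $\mathbf{x}$ is the observed value, $\boldsymbol{\delta}$ the displacement vector, $S$ the score produced by the anomaly detection model, $G$ the set of groups, $\boldsymbol{\delta}_g$ the components of $\boldsymbol{\delta}$ belonging to group $g$, and $\lambda > 0$ a regularization weight. The choice of norm and the sign convention of the displacement are likewise illustrative.

```latex
\[
  L(\boldsymbol{\delta})
    = S(\mathbf{x} + \boldsymbol{\delta})
    + \lambda \sum_{g \in G} \lVert \boldsymbol{\delta}_g \rVert ,
  \qquad
  \hat{\boldsymbol{\delta}} \in \operatorname*{arg\,min}_{\boldsymbol{\delta}} L(\boldsymbol{\delta})
\]
```

Under this reading, groups whose components in $\hat{\boldsymbol{\delta}}$ stay at zero need no correction to lower the score, while a group with a component at or above the second threshold is reported as the cause.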

The group regularization term may be an Lp regularization with 0 < p ≤ 1 between groups and p > 1 within groups.
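A minimal sketch of group identification with such a regularizer follows, taking p = 1 between groups and p = 2 within groups (the group-lasso penalty). Everything specific here is an assumption for illustration: the score is a stand-in squared distance from a known normal mean, the groups do not overlap, the displacement is compared to the second threshold by absolute value, and the function and parameter names are hypothetical. With these choices the loss ||x + d - mean||^2 + lam * sum_g ||d_g||_2 has a closed-form minimizer per group (block soft-thresholding), so no iterative optimizer is needed.

```python
import math
from typing import Dict, List, Sequence, Tuple


def identify_cause_groups(
    x: Sequence[float],
    normal_mean: Sequence[float],
    groups: Dict[str, List[int]],
    lam: float,
    second_threshold: float,
) -> Tuple[List[float], List[str]]:
    """Minimize ||x + d - normal_mean||^2 + lam * sum_g ||d_g||_2 over d.

    Returns the minimizing displacement d and the names of groups that contain
    a component with |d_i| >= second_threshold. Groups must not overlap.
    """
    residual = [xi - mi for xi, mi in zip(x, normal_mean)]
    delta = [0.0] * len(x)
    for indices in groups.values():
        norm = math.sqrt(sum(residual[i] ** 2 for i in indices))
        # Block soft-thresholding: small residual groups are left at exactly zero.
        shrink = max(0.0, 1.0 - lam / (2.0 * norm)) if norm > 0.0 else 0.0
        for i in indices:
            delta[i] = -shrink * residual[i]
    causes = [
        name
        for name, indices in groups.items()
        if any(abs(delta[i]) >= second_threshold for i in indices)
    ]
    return delta, causes
```

For a learned model such as an autoencoder, the score is not quadratic and the minimum would instead have to be found numerically, for example by proximal gradient descent. The sparsity between groups and shared shrinkage within a group are what the mixed regularization is meant to provide.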

Each of the embodiments described below represents a specific example of the present invention. The numerical values, shapes, constituent elements, steps, order of steps, and the like shown in the following embodiments are examples and are not intended to limit the present invention. Among the constituent elements in the following embodiments, those not recited in the independent claims, which represent the broadest concept, are described as optional constituent elements. The contents of all the embodiments may also be combined with one another.

(Embodiment)
[1. Configuration of abnormality diagnosis system]
First, the schematic configuration of the abnormality diagnosis system according to the present embodiment is described.

FIG. 1 is a schematic diagram of the abnormality diagnosis system according to the embodiment.

Specifically, in FIG. 1, the abnormality diagnosis system 1 includes an abnormality diagnosis device 100, a server 200, and a monitoring target 300. In the abnormality diagnosis system 1, observation data, which is data obtained by observing the state of the monitoring target 300, is transmitted from the monitoring target 300 to the server 200. The observation data received by the server 200 is accumulated in the server 200. The abnormality diagnosis device 100 acquires the observation data accumulated in the server 200. For example, the abnormality diagnosis device 100 may acquire the accumulated observation data periodically or in real time.

Using the acquired observation data, the abnormality diagnosis device 100 determines whether the observation data contains an abnormal observed value. In this way, the abnormality diagnosis device 100 performs abnormality diagnosis of the monitoring target.

The monitoring target 300 is the system subject to abnormality diagnosis. The monitoring target 300 is, for example, a chemical plant, a control system, or an in-vehicle network system. Observation data can be obtained from the monitoring target 300 by observation. The observation data is data indicating observed values, each composed of the values of a plurality of variables indicating the state of the monitoring target 300. The observation data consists of a plurality of types of data sequences indicating the observed values acquired from the monitoring target 300. A data sequence of observed values is represented by, for example, a time series of vectors, in which each dimension is a value obtained from a sensor or the like included in the monitoring target 300. In other words, observed values are obtained at each of a plurality of different timings. An observed value is, for example, a value obtained by observation during one predetermined unit of processing, and is obtained at the timing at which that processing ends.
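The shape of the observation data described above can be sketched as a small data structure: one vector per timing, with one dimension per variable. The class name, field names, and sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Observation:
    """One observed value: the values of all variables at a single timing."""

    timestamp: float          # timing at which the unit of processing ended
    values: Sequence[float]   # one entry per variable (e.g. per sensor)


def variable_series(observations: Sequence[Observation], index: int) -> List[float]:
    """Extract the time series of a single variable from the observation data."""
    return [obs.values[index] for obs in observations]


# Hypothetical observation data: two variables observed at two timings.
observation_data = [
    Observation(timestamp=0.0, values=(1.0, 20.5)),
    Observation(timestamp=1.0, values=(1.1, 20.7)),
]
```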

In the abnormality diagnosis system 1, the abnormality diagnosis device 100, the server 200, and the monitoring target 300 are communicably connected to one another. For example, the abnormality diagnosis device 100 and the server 200 may be connected via a general-purpose network such as the Internet, or via a dedicated network. Likewise, the server 200 and the monitoring target 300 may be connected via a general-purpose network such as the Internet, or via a dedicated network.

Note that the abnormality diagnosis system 1 need not include the server 200. That is, the abnormality diagnosis system may consist of the abnormality diagnosis device 100 and the monitoring target 300. In this case, observation data obtained by observing the state of the monitoring target 300 is transmitted directly from the monitoring target 300 to the abnormality diagnosis device 100. In an abnormality diagnosis system without the server 200, the abnormality diagnosis device 100 and the monitoring target 300 are communicably connected to each other, for example via a general-purpose network such as the Internet or via a dedicated network.

[2. Hardware configuration]
[2-1. Configuration of abnormality diagnosis device]
Next, the hardware configuration of the abnormality diagnosis device 100 is described with reference to FIG. 2.

FIG. 2 is a block diagram showing an example of the hardware configuration of the abnormality diagnosis device according to the embodiment.

As shown in FIG. 2, the abnormality diagnosis device 100 includes, as its hardware configuration, a CPU (Central Processing Unit) 101, a main memory 102, a storage 103, a communication IF (Interface) 104, an input IF (Interface) 105, and a display 106.

The CPU 101 is an example of a processor that executes a control program stored in the storage 103 or the like.

The main memory 102 is a volatile storage area used as a work area when the CPU 101 executes the control program; that is, it is an example of a memory.

The storage 103 is a non-volatile storage area that holds the control program, content, and the like; that is, it is an example of a memory.

The communication IF 104 is a communication interface that communicates with the server 200 via a communication network. The communication IF 104 is, for example, a wired LAN interface, but may instead be a wireless LAN interface. The communication IF 104 is not limited to a LAN interface and may be any communication interface capable of establishing a communication connection with the communication network.

The input IF 105 is an input device such as a numeric keypad, a keyboard, or a mouse.

The display 106 is a display device that displays the results of processing by the CPU 101. The display 106 is, for example, a liquid crystal display or an organic EL display.

[2-2. Configuration of monitoring target]
Next, a case in which observation data is obtained from a control system 310, as an example of the monitoring target 300 in the abnormality diagnosis system 1, is described.

FIG. 3 is a diagram showing an example of a monitoring target in the abnormality diagnosis system according to the present embodiment. In FIG. 3, the monitoring target 300 in the abnormality diagnosis system 1 is, for example, the control system 310. Observation data obtained by observation in the control system 310 is transmitted to the server 200. The observation data accumulated in the server 200 is then transmitted to the abnormality diagnosis device 100 and used there for abnormality diagnosis of the observed values.

The control system 310 includes a router 311, a switching hub 312, a management terminal 313, a server 314, a PLC (Programmable Logic Controller) 315, and a sensor 316.

The router 311 is a communication device that relays the transmission and reception of data between the control system 310 and other networks. The router 311 analyzes received data and performs data transfer control, such as selecting a transfer route based on the result of the analysis.

The switching hub 312 is communicatively connected to the router 311, the management terminal 313, the server 314, the PLC 315, and the sensor 316, and forwards received data to whichever connected device is indicated by the destination information included in that data. The switching hub 312 has, for example, a mirror port that outputs a copy of the received data, and is connected to the server 200 at this mirror port. The observation data is extracted via the mirror port of the switching hub 312 and transmitted to the server 200.

The management terminal 313 is, for example, a PC (Personal Computer), a tablet terminal, or a smartphone.

The server 314 is a computer that provides, for example, predetermined functions and data to the management terminal 313.

The PLC 315 is a control device for controlling various machines.

The sensor 316 includes various sensors and is, for example, a device that converts various physical quantities into electrical signals.

[3. Functional Configuration of the Abnormality Diagnosis System]
Next, the functional configuration of the abnormality diagnosis system 1 will be described with reference to FIG. 4.

FIG. 4 is a block diagram showing an example of the functional configuration of the abnormality diagnosis device according to the present embodiment.

The observation data 210 accumulated in the server 200 includes learning data 211 and diagnostic data 212.

The learning data 211 is the part of the acquired observation data 210 that is used to generate an anomaly detection model by machine learning. The diagnostic data 212 is the part of the acquired observation data 210 that is subjected to abnormality diagnosis, in which the generated anomaly detection model is used to determine whether the observation data 210 obtained from the monitoring target 300 is abnormal. The learning data 211 does not have to consist only of normal data; acquired observation data 210 that also contains abnormal data can be used. For example, the learning data 211 may be data acquired during a predetermined period at the beginning of the observation data 210, and the diagnostic data 212 may be data acquired in a period after that predetermined period.
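The time-based split described above can be illustrated with a minimal Python sketch. The `Observation` class, its field names, and the `learning_period_end` parameter are hypothetical names introduced only for this illustration; the description does not specify how acquisition times are recorded.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Observation:
    """One observed value; the field names are hypothetical."""
    acquired_at: float      # acquisition time, e.g. seconds since monitoring began
    values: List[float]     # one entry per monitored variable


def split_observation_data(
    observations: Sequence[Observation], learning_period_end: float
) -> Tuple[List[Observation], List[Observation]]:
    """Split observation data by acquisition time into (learning, diagnostic)."""
    # Data from the initial predetermined period becomes learning data.
    learning = [o for o in observations if o.acquired_at < learning_period_end]
    # Data acquired after that period becomes diagnostic data.
    diagnostic = [o for o in observations if o.acquired_at >= learning_period_end]
    return learning, diagnostic
```

Other splits are equally compatible with the description, which presents the time-based split only as one example.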

The abnormality diagnosis device 100 includes a first acquisition unit 110, a group information DB (Database) 120, a second acquisition unit 130, a generation unit 140, an anomaly detection model DB (Database) 150, a determination unit 160, and an identification unit 170.

The first acquisition unit 110 acquires group information indicating one or more groups, each of which is a combination of at least two mutually related variables among the plurality of variables indicating the state of the monitoring target 300. The first acquisition unit 110 acquires the group information by, for example, accepting an input from the user that specifies the groups. The first acquisition unit 110 is implemented by, for example, the input IF 105 and the display 106.

The group information DB 120 stores the group information acquired by the first acquisition unit 110. The group information DB 120 is implemented by, for example, the storage 103.

The second acquisition unit 130 acquires the observed values included in the observation data 210 by acquiring the observation data 210 from the server 200. The second acquisition unit 130 is implemented by, for example, the communication IF 104.

The generation unit 140 generates an anomaly detection model by machine learning using the learning data 211 within the observation data 210 acquired by the second acquisition unit 130. Generating an anomaly detection model by machine learning is usually formulated as an unsupervised learning problem. The generation unit 140 generates the anomaly detection model, that is, a model for anomaly detection, by using the learning data 211 to estimate, for example, the probability density distribution of the data. The learning data 211 includes, for example, a large amount of unlabeled data obtained by observing the monitoring target 300 for a predetermined period. A specific example of the processing by which the generation unit 140 generates the anomaly detection model is described later. The generation unit 140 is implemented by, for example, the CPU 101, the main memory 102, and the storage 103.

The anomaly detection model DB 150 stores the anomaly detection model generated by the generation unit 140. The anomaly detection model DB 150 is implemented by, for example, storage.

The determination unit 160 reads the anomaly detection model stored in the anomaly detection model DB 150. Using the model it has read, the determination unit 160 determines whether the diagnostic data 212 within the observation data 210 acquired by the second acquisition unit 130 is abnormal. The determination unit 160 may make this determination for each of the observed values included in the diagnostic data 212 individually, or for a plurality of observed values together. The determination unit 160 determines whether the diagnostic data 212 is abnormal according to whether the probability density of the diagnostic data 212 under the anomaly detection model exceeds a predetermined threshold. For example, when the logarithm of the obtained probability density is negative, the determination unit 160 determines that the diagnostic data 212 for which that probability density was obtained contains an anomaly. The determination method of the determination unit 160 is not limited to this. A specific example of the determination processing by the determination unit 160 is described later. The determination unit 160 is implemented by, for example, the CPU 101, the main memory 102, and the storage 103.

When the determination unit 160 determines that the diagnostic data 212 is abnormal, the identification unit 170 identifies, among the one or more groups of that diagnostic data 212, the group that is the cause of the anomaly. It does so based on the diagnostic data 212 determined to be abnormal and on the one or more groups indicated by the group information stored in the group information DB 120. In other words, the identification unit 170 performs abnormality diagnosis using the diagnostic data 212 determined to be abnormal and the group information. Abnormality diagnosis here means diagnosing which variable group of the diagnostic data 212 determined to be abnormal by the determination unit 160 is abnormal. A specific example of the abnormality diagnosis by the identification unit 170 is described later.

The identification unit 170 may generate in advance an abnormality diagnosis model for abnormality diagnosis, which extracts which variable group is abnormal, using the anomaly detection model in the anomaly detection model DB 150 and the group information in the group information DB 120. In this case, the identification unit 170 performs abnormality diagnosis by applying the previously generated abnormality diagnosis model to the diagnostic data 212 determined to be abnormal by the determination unit 160.

The identification unit 170 is implemented by, for example, the CPU 101, the main memory 102, and the storage 103.

Although the abnormality diagnosis device 100 has been described as including the generation unit 140 that generates the anomaly detection model, it does not have to include the generation unit 140. That is, the device that generates the anomaly detection model from the learning data may be a device separate from the abnormality diagnosis device 100, and the abnormality diagnosis device 100 may acquire the anomaly detection model from that separate device.

The abnormality diagnosis system 1 does not have to include the monitoring target 300 as long as the abnormality diagnosis device 100 can acquire the observation data 210.

[4. Example of Observation Data]
Next, details of the observation data 210 will be described with reference to FIG. 5.

FIG. 5 is a diagram showing an example of observation data acquired from the monitoring target.

In FIG. 5, the observation data 210 consists of observed values 1 to 7, each observed for one of a plurality of sessions, across the following items describing the state of the monitoring target 300: session duration, protocol type, service name, flag, server transmission data amount, server reception data amount, server most recent communication count, server most recent error rate, login status, root shell flag, and number of instructions executed by root. In other words, the items shown in FIG. 5 are the plurality of variables indicating the state of the monitoring target, and observed values 1 to 7 are observed values obtained at different timings. For example, each of observed values 1 to 7 is the set of values obtained by observing, in one session, the state of the monitoring target 300 indicated by each of the above items. Accordingly, the values in observed value n (where n is a natural number) are the values observed during the processing of session n.

The items of the observed values of the observation data 210 in FIG. 5 are examples of feature quantities obtained by observing a network. These feature quantities are a subset of those provided in the NSL-KDD data set of Non-Patent Document 2. They are obtained by observing variables of various types, with information such as the packets flowing through the network as the monitoring target.

The observation data 210 in FIG. 5 contains raw observed values of several kinds: real numbers, observed for the session duration, server transmission data amount, server reception data amount, server most recent communication count, server most recent error rate, and number of instructions executed by root; category information, observed for the protocol type, service name, and flag; and flag information, observed for the login status and root shell flag. Some of these observed values, category information for example, are difficult for the abnormality diagnosis device 100 to use as raw values in processing such as anomaly detection and abnormality diagnosis. Observed values such as category information are therefore converted into vector values using 1-of-N encoding. The conversion into vector values may be performed in the server 200 or in the abnormality diagnosis device 100.

Here, 1-of-N encoding is an encoding that converts observed values such as category information into vector values. One dimension is assigned to each of the N kinds of observed value, and a given observed value is represented by setting the dimension assigned to it to 1 and all other dimensions to 0.

For example, when 1-of-N encoding is applied to an observed value that takes one of the three values sunny, rainy, and cloudy, N = 3. Assigning the observed values to three dimensions, for example sunny to (1, 0, 0), rainy to (0, 1, 0), and cloudy to (0, 0, 1), yields a vector value for each observed value.

An observed value that is a binary flag such as ON/OFF may be converted into a number by representing ON as the feature quantity 1 and OFF as the feature quantity 0.
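The two conversions just described can be sketched in Python as follows. The function names and the English category labels are assumptions made for this illustration; only the sunny/rainy/cloudy example and the ON/OFF mapping come from the description.

```python
from typing import List, Sequence

# Category order fixes which dimension each category is assigned to.
WEATHER = ("sunny", "rainy", "cloudy")


def one_of_n_encode(value: str, categories: Sequence[str]) -> List[int]:
    """Return a vector with 1 in the dimension assigned to `value`, 0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == category else 0 for category in categories]


def encode_binary_flag(value: str) -> int:
    """Represent a binary flag as a single feature: ON -> 1, OFF -> 0."""
    if value not in ("ON", "OFF"):
        raise ValueError(f"unknown flag value: {value!r}")
    return 1 if value == "ON" else 0
```

For instance, `one_of_n_encode("rainy", WEATHER)` gives `[0, 1, 0]`, matching the assignment in the example above.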

[5. Example of Group Setting]
Next, the method of setting groups will be described in detail with reference to FIGS. 6A and 6B.

FIGS. 6A and 6B are diagrams showing examples of the UI displayed on the display for setting groups in the abnormality diagnosis device according to the embodiment.

In FIG. 6A, the display 106 shows a UI 111 for setting groups, each composed of a combination of at least two variables. The UI 111 consists of items 112, a group addition button 113, and a group deletion button 114. The items 112 are the plurality of items included in the observation data 210 acquired from the monitoring target 300. Specifically, the items 112 are the session duration, protocol type, service name, flag, server transmission data amount, server reception data amount, server most recent communication count, server most recent error rate, login status, root shell flag, and number of instructions executed by root. The user sets groups for these items 112 by making inputs to the UI 111 using the input IF 105.

FIG. 6A shows the initial state of the UI 111, in which check boxes are displayed for setting each of the items 112 to "no group" 115, indicating that the item does not belong to any group.

To perform grouping, the user can add a group by pressing the group addition button 113 and can then set the items that belong to the added group.

FIG. 6B shows the UI 111a that is displayed when "group 1" and "group 2" have been added as groups by pressing the group addition button 113.

In the UI 111a, "session duration", "server transmission data amount", and "server reception data amount" are set as "group 1". The session duration becomes longer when the amount of transmitted and received data is large, so these three items are related to one another and are therefore grouped together.

In the UI 111a, "root shell flag" and "number of instructions executed by root" are set as "group 2". When the root shell flag is ON, the number of instructions executed by root is 1 or more, so these two items are related to each other and are therefore grouped together.

In setting groups for the items 112 using the UIs 111 and 111a, each item is set so as to belong to exactly one group, where "no group" counts as one of the groups. No item belongs to two or more groups.
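The grouping constraints above (each group combines at least two variables, and no item belongs to two groups) can be checked with a short Python sketch. The dictionary representation of the group information and the function name are assumptions for this illustration; the description does not specify how the group information DB 120 stores groups.

```python
from typing import Dict, List, Sequence

# Items and groups as set in the example of FIG. 6B.
ITEMS = [
    "session duration", "protocol type", "service name", "flag",
    "server transmission data amount", "server reception data amount",
    "server most recent communication count", "server most recent error rate",
    "login status", "root shell flag", "number of instructions executed by root",
]
GROUPS = {
    "group 1": ["session duration", "server transmission data amount",
                "server reception data amount"],
    "group 2": ["root shell flag", "number of instructions executed by root"],
}


def ungrouped_items(groups: Dict[str, List[str]], items: Sequence[str]) -> List[str]:
    """Validate the grouping and return the items left in "no group"."""
    owner: Dict[str, str] = {}
    for name, members in groups.items():
        if len(members) < 2:
            raise ValueError(f"{name} must combine at least two variables")
        for item in members:
            if item not in items:
                raise ValueError(f"{item!r} is not an observed item")
            if item in owner:
                raise ValueError(f"{item!r} is in both {owner[item]} and {name}")
            owner[item] = name
    # Everything not claimed by a group stays in "no group".
    return [item for item in items if item not in owner]
```

With the FIG. 6B example, the six items outside group 1 and group 2 are returned as belonging to "no group".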

In the items 112, "protocol type" is treated as a single observed value. If it is instead treated as three observed values, "tcp", "udp", and "icmp", that is, if the three items "tcp", "udp", and "icmp" are used in place of the item "protocol type", these three items may be set as one group.

To delete a group that has been set, the user selects the group to be deleted and presses the group deletion button 114. In the UIs 111 and 111a, the selection of a group may be indicated by, for example, highlighting the group indicated by an input that selects one of "group 1", "group 2", and "no group" when that input is accepted.

[6. Generation Processing]
Next, the processing by which the generation unit 140 generates the anomaly detection model will be described in detail.

As one example of generating the anomaly detection model, the use of an autoencoder will be described with reference to FIG. 7.

FIG. 7 is a diagram showing the autoencoder in the embodiment.

As shown in FIG. 7, an autoencoder is a type of neural network that extracts, through dimensionality reduction, the information content of the features representing the input. It is a machine learning technique that, in a neural network of three or more layers having an intermediate layer of lower dimension than the input and output layers, determines the weights so that the output vector takes the same values as the input feature vector.

FIG. 8 is a flowchart showing an example of the processing for generating the anomaly detection model according to the embodiment.

The generation unit 140 acquires the hyperparameters of the autoencoder (S11). The hyperparameters are, for example, the number of nodes in the intermediate layer of the autoencoder, the learning rate, and the parameters of the dropout function. Dropout is a technique that randomly sets the values of intermediate nodes to 0 while the autoencoder is being trained. The generation unit 140 may acquire the hyperparameters by accepting the user's input to the input IF 105, may read hyperparameters stored in advance in the storage 103, or may acquire them from an external device using the communication IF 104.

The generation unit 140 acquires, from the second acquisition unit 130, the learning data 211 on which the autoencoder is to be trained (S12).

The generation unit 140 trains the autoencoder and, by adjusting its weight parameters, obtains weight parameters with which the learning data 211 can be appropriately compressed and restored (S13). The generation unit 140 generates the anomaly detection model by applying the obtained weight parameters to the autoencoder configured with the hyperparameters acquired in step S11.
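Steps S11 to S13 can be illustrated with a small pure-Python sketch of a three-layer autoencoder. This is a minimal sketch under stated assumptions, not the embodiment's implementation: the tanh hidden layer, linear output layer, per-sample gradient descent on the mean squared reconstruction error, and all function and parameter names are choices made here for illustration, and dropout is omitted for brevity.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Model = Dict[str, list]


def forward(model: Model, x: Vector) -> Tuple[Vector, Vector]:
    """Compress x to the hidden layer, then restore it; returns (hidden, output)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(model["w1"], model["b1"])]
    output = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(model["w2"], model["b2"])]
    return hidden, output


def train_autoencoder(data: List[Vector], hidden_nodes: int,
                      learning_rate: float = 0.05, epochs: int = 200,
                      seed: int = 0) -> Model:
    """S11: hyperparameters are the arguments; S12: `data`; S13: weight fitting."""
    rng = random.Random(seed)
    m = len(data[0])
    model: Model = {
        "w1": [[rng.uniform(-0.1, 0.1) for _ in range(m)] for _ in range(hidden_nodes)],
        "b1": [0.0] * hidden_nodes,
        "w2": [[rng.uniform(-0.1, 0.1) for _ in range(hidden_nodes)] for _ in range(m)],
        "b2": [0.0] * m,
    }
    for _ in range(epochs):
        for x in data:
            hidden, output = forward(model, x)
            # Gradient of the mean squared error with respect to each output.
            g_out = [2.0 * (o - xi) / m for o, xi in zip(output, x)]
            # Back-propagate to the hidden layer before w2 is updated.
            g_hid = [sum(g_out[i] * model["w2"][i][j] for i in range(m))
                     * (1.0 - hidden[j] ** 2) for j in range(hidden_nodes)]
            for i in range(m):
                for j in range(hidden_nodes):
                    model["w2"][i][j] -= learning_rate * g_out[i] * hidden[j]
                model["b2"][i] -= learning_rate * g_out[i]
            for j in range(hidden_nodes):
                for i in range(m):
                    model["w1"][j][i] -= learning_rate * g_hid[j] * x[i]
                model["b1"][j] -= learning_rate * g_hid[j]
    return model


def mean_reconstruction_error(model: Model, data: List[Vector]) -> float:
    """Average over the data of the mean squared input/output difference."""
    total = 0.0
    for x in data:
        _, output = forward(model, x)
        total += sum((o - xi) ** 2 for o, xi in zip(output, x)) / len(x)
    return total / len(data)
```

On data whose variables are strongly related, a hidden layer narrower than the input can still restore the input well, which is what makes the reconstruction error usable as an anomaly score.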

[7. Determination Processing]
Next, the determination processing performed by the determination unit 160 using the generated anomaly detection model will be described in detail.

FIG. 9 is a flowchart showing an example of the determination processing using the anomaly detection model according to the embodiment. Here, the anomaly determination is performed using the autoencoder.

The determination unit 160 reads the anomaly detection model from the anomaly detection model DB 150 (S21). For example, the determination unit 160 reads the trained autoencoder from the anomaly detection model DB 150.

The determination unit 160 acquires a first threshold used to determine that a score is abnormal (S22). The determination unit 160 may acquire the first threshold by calculating it using, for example, cross-validation. In cross-validation, the learning data 211 is divided, for example, in a ratio of 4:1; the larger part (the 4 of 4:1) is used for training and the remainder (the 1 of 4:1) is used for validation. The autoencoder is trained on the training part of the divided learning data 211, anomaly determination is then performed on the validation part using the autoencoder obtained by that training, and the determination unit 160 may calculate the first threshold so that the probability of data being determined to be abnormal is about 0.01%. Alternatively, the determination unit 160 may acquire the first threshold by accepting the user's input to the input IF 105, may read a first threshold stored in advance in the storage 103, or may acquire it from an external device using the communication IF 104.
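One way to turn validation scores into a first threshold is sketched below in Python. The function names and the quantile-style rule are assumptions for this illustration; the description states only the 4:1 split and the target of roughly 0.01% of validation data being determined abnormal. The sketch assumes that a score greater than or equal to the threshold means abnormal, and requires Python 3.9 or later for `math.nextafter`.

```python
import math
from typing import List, Sequence, Tuple


def split_four_to_one(samples: Sequence) -> Tuple[list, list]:
    """Divide the learning data 4:1 into (training part, validation part)."""
    cut = len(samples) * 4 // 5
    return list(samples[:cut]), list(samples[cut:])


def first_threshold(validation_scores: Sequence[float],
                    target_rate: float = 0.0001) -> float:
    """Pick a threshold so that about `target_rate` of validation scores reach it."""
    ordered: List[float] = sorted(validation_scores, reverse=True)
    # Number of validation samples that may be determined abnormal.
    allowed = int(target_rate * len(ordered))
    if allowed == 0:
        # Too few samples to flag any: place the threshold just above the maximum.
        return math.nextafter(ordered[0], math.inf)
    # The allowed-th largest score; it and all larger scores reach the threshold.
    return ordered[allowed - 1]
```

With a target rate of 0.01%, at least 10,000 validation scores are needed before any validation sample falls at or above the threshold.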

The determination unit 160 obtains the diagnostic data 212 acquired by the second acquisition unit 130 (S23).

The determination unit 160 calculates a score by inputting the diagnostic data 212 acquired by the second acquisition unit 130 into the trained autoencoder (S24). The determination unit 160 calculates the score using, for example, Equation 1.

In Equation 1, x denotes the input vector to the autoencoder, x_i denotes the i-th dimension of x, and \hat{x}_i denotes the i-th dimension of the output vector of the autoencoder. In addition, m denotes the number of dimensions of the input vector and the output vector. The score J(x) is the mean squared error between the diagnostic data input to the autoencoder and the resulting output vector.
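Equation 1 itself does not survive in this text. From the definitions given for it (a mean squared error over the m dimensions between the input and the output of the autoencoder), it can be reconstructed as follows; the hat notation for the output vector is an assumption.

```latex
J(x) = \frac{1}{m} \sum_{i=1}^{m} \left( x_i - \hat{x}_i \right)^2
```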

The determination unit 160 determines whether the calculated score is greater than or equal to the first threshold (S25).

If the score is greater than or equal to the first threshold (Yes in S25), the determination unit 160 determines that the input diagnostic data is abnormal (S26) and notifies the identification unit 170 of the determination result.

If the score is less than the first threshold (No in S25), the determination unit 160 determines that the input diagnostic data is normal (S27).

The identification unit 170 performs identification processing that identifies the group causing the anomaly, using the calculated score and the group information (S28). The details of the identification processing are described later.
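The score calculation and threshold comparison of steps S24 to S27 can be sketched in Python as follows. The function names are hypothetical, and the autoencoder output `x_hat` is assumed to have been computed elsewhere.

```python
from typing import Sequence


def score(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """S24: mean squared error between the input and the autoencoder output."""
    if len(x) != len(x_hat):
        raise ValueError("input and output vectors must have the same dimension")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def is_abnormal(score_value: float, first_threshold: float) -> bool:
    """S25 to S27: abnormal when the score is at or above the first threshold."""
    return score_value >= first_threshold
```

A score exactly equal to the first threshold is treated as abnormal, matching the "greater than or equal to" condition of step S25.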

[8. Identification Processing]
Next, the identification processing that identifies the group causing the anomaly will be described in detail.

FIG. 10 is a flowchart showing an example of the identification processing that identifies the group causing the anomaly according to the embodiment.

The identification unit 170 performs the identification processing, which identifies the group causing the anomaly, using the score of the diagnostic data 212 determined to be abnormal by the determination unit 160. Equation 2 is the loss function used to identify the group causing the anomaly.

In Equation 2, δ denotes a displacement vector having the same number of dimensions as the input vector, λ denotes the regularization parameter, n_g denotes the number of variable groups, and d_i denotes the number of variables belonging to variable group g_i. J(x) denotes the score.

As shown in Equation 2, the loss function is defined as the sum of the score and a group regularization term, which is obtained by summing over the groups the L2 norm of the displacement δ from the original input value.

The group regularization term in Equation 2 satisfies Lp regularization with p = 1 between groups and p = 2 within each group. The group regularization term is not limited to that of Equation 2; it need only satisfy Lp regularization with 0 < p ≤ 1 between groups and p > 1 within each group.

In Equation 2, the group regularization term is weighted by multiplying it by the parameter λ.
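Equation 2 itself does not survive in this text. A form consistent with the symbols defined for it is the group-lasso style loss below. Two points are assumptions rather than statements from the description: that the score is evaluated at the displaced input x + δ, and that each group norm is weighted by the square root of d_i, which is the conventional way the group size enters such a term.

```latex
L(\delta) = J(x + \delta) + \lambda \sum_{i=1}^{n_g} \sqrt{d_i} \, \left\lVert \delta_{g_i} \right\rVert_2
```

Here \delta_{g_i} denotes the components of δ that belong to variable group g_i. The outer sum over groups corresponds to p = 1 between groups, and the L2 norm corresponds to p = 2 within each group.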

The identification unit 170 acquires the parameter λ (S31). The identification unit 170 may acquire an arbitrary value as the parameter λ, or may acquire it by calculating it using cross-validation. The identification unit 170 may also acquire the parameter λ by accepting the user's input to the input IF 105, may read a parameter λ stored in advance in the storage 103, or may acquire it from an external device using the communication IF 104.

The identification unit 170 calculates the displacement vector δ at which the value of the loss function of Equation 2 is a local minimum, by searching with a gradient method starting from the initial value 0 (S32). In other words, the identification unit 170 calculates the displacement vector δ at which the loss function takes a local minimum, where the loss function is the sum of the score determined by the determination unit 160 to be greater than or equal to the first threshold and the group regularization term obtained by regularizing for each of the one or more groups indicated by the group information.

Among the plurality of variables of the observed value, the identification unit 170 identifies as the group causing the anomaly any group containing a variable whose component of the calculated displacement vector δ is non-zero or, given a very small positive number ε (the second threshold), has an absolute value greater than or equal to ε (S33). The second threshold is a value greater than 0 and less than 1.
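Steps S31 to S33 can be illustrated with the following Python sketch. It rests on several assumptions that are not in the description: the loss is taken to be the score at x + δ plus λ times the sum over groups of sqrt(d_i) times the L2 norm of that group's displacement; the search is a proximal variant of the gradient method (a gradient step on the score followed by group-wise shrinkage), chosen because the group norm is not differentiable at 0; the gradient is estimated by central differences; and every variable index is assumed to appear in exactly one group, with ungrouped variables passed as single-member groups. All names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def identify_groups(
    x: Sequence[float],
    score_fn: Callable[[List[float]], float],
    groups: Dict[str, List[int]],
    lam: float,                      # S31: regularization parameter
    epsilon: float = 1e-6,           # second threshold
    step: float = 1.0,
    iterations: int = 200,
) -> Tuple[List[str], List[float]]:
    """Return (names of groups identified as causes, displacement vector)."""
    m = len(x)
    delta = [0.0] * m                # S32: the search starts from delta = 0
    h = 1e-6                         # step width for central differences
    for _ in range(iterations):
        shifted = [xi + di for xi, di in zip(x, delta)]
        grad = []
        for k in range(m):
            up, down = list(shifted), list(shifted)
            up[k] += h
            down[k] -= h
            grad.append((score_fn(up) - score_fn(down)) / (2.0 * h))
        # Gradient step on the score term.
        delta = [d - step * g for d, g in zip(delta, grad)]
        # Group-wise shrinkage for the regularization term.
        for members in groups.values():
            norm = math.sqrt(sum(delta[k] ** 2 for k in members))
            shrink = step * lam * math.sqrt(len(members))
            scale = max(0.0, 1.0 - shrink / norm) if norm > 0.0 else 0.0
            for k in members:
                delta[k] *= scale
    # S33: a group is a cause if any of its displacement components reaches epsilon.
    causes = [name for name, members in groups.items()
              if any(abs(delta[k]) >= epsilon for k in members)]
    return causes, delta
```

The shrinkage step sets the displacement of a whole group to exactly zero unless moving that group lowers the score enough, so the non-zero components concentrate in the groups that explain the anomaly.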

[9. Effects, etc.]
According to the abnormality diagnosis method of the present embodiment, groups of variables are defined in advance for observed values composed of the values of a plurality of observed variables, so the group that causes an anomaly can be identified with high accuracy across a wide range of problems.

In addition, according to the abnormality diagnosis method, whether an observed value is abnormal can be determined using an anomaly detection technique that is already known to perform well, such as an autoencoder. Moreover, the score output by the anomaly detection makes it easy to perform abnormality diagnosis with a natural loss function.

In addition, according to the abnormality diagnosis method, a group that includes a variable equal to or greater than the second threshold in the displacement vector at which the loss function takes a local minimum is identified as the group that is the cause of the anomaly, so that group can be identified effectively.

In addition, according to the abnormality diagnosis method, the group regularization term satisfies Lp regularization with 0 < p ≤ 1 between groups and p > 1 within each group, so the group that is the cause of the anomaly can be identified effectively.

[10. Other Modifications]
Although the present invention has been described based on the above embodiment, the present invention is of course not limited to that embodiment. The following cases are also included in the present invention.

(1)
In the above embodiment, the configuration of a control system was described as the monitoring target, and abnormality determination was described as being performed using observation data obtained from the switching hub; however, the present invention is not limited to this. For example, observed values obtained from various sensors in a chemical plant may be acquired as observation data, and abnormality determination may be performed using that observation data.

(2)
In the abnormality diagnosis device 100 according to the above embodiment, 1-of-N encoding is used to convert values into vectors, but the encoding method is not limited to this. For example, methods that convert a value into a real-valued vector with a fixed number of dimensions are conceivable, such as a method using an embedding as described in Non-Patent Document 3 or a method of encoding using random numbers. This allows the anomaly detection model to be generated, and abnormality determination to be performed, with higher accuracy. Furthermore, even when a new category appears, it can be vectorized without increasing the number of dimensions of the variables.
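The random-number encoding mentioned above can be illustrated with a short sketch. The details are assumptions made for illustration only: the function name `encode_category`, the dimension of 8, and the use of a hash of the label to seed the random number generator (so that the same label always maps to the same vector) do not come from the embodiment.

```python
import hashlib
import numpy as np

def encode_category(value: str, dim: int = 8) -> np.ndarray:
    """Map a category label to a fixed-length real-valued vector."""
    # Derive a reproducible seed from the label (assumption: SHA-256, first 8 bytes).
    seed = int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    return rng.standard_normal(dim)

tcp = encode_category("tcp")      # a category seen during learning
quic = encode_category("quic")    # a category that appears later
print(tcp.shape, quic.shape)      # both vectors have the same number of dimensions
```

Unlike 1-of-N encoding, whose vector length grows with the number of categories, the vector length here is fixed by `dim`, so a newly appearing category does not change the dimensionality of the input.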

(3)
In the abnormality diagnosis device 100 according to the above embodiment, the grouping of variables is set based on user input to the input IF 105, but the present invention is not limited to this. A display showing the mutual information between variables may additionally be provided in the UIs 111 and 111a. Also, when an input indicating the desired number of groups is received from the user at the input IF 105, the variables may be grouped automatically by the method described in Non-Patent Document 4 or Non-Patent Document 5 according to the number of groups indicated by that input.

As for the grouping of variables, when a function that takes some variable as its argument is itself another variable, for example when the identity function of a variable is another variable, or when the trigonometric functions sin θ and cos θ of an arbitrary angle θ are separate variables, these variables may be grouped according to the strength of the relationship between them.
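The sin θ and cos θ case shows why mutual information, rather than linear correlation, is a useful measure of the relationship between variables. The following sketch is an illustration under assumptions: the histogram-based estimator, the bin count of 16, and the function name `mutual_information` are choices made here, not part of the embodiment.

```python
import numpy as np

def mutual_information(a, b, bins=16):
    """Histogram-based estimate of the mutual information (in nats) of two variables."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)   # marginal of a, shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)   # marginal of b, shape (1, bins)
    mask = p_ab > 0                         # skip empty cells to avoid log(0)
    return float(np.sum(p_ab[mask] * np.log(p_ab[mask] / (p_a @ p_b)[mask])))

rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 5000)
sin_t, cos_t = np.sin(theta), np.cos(theta)
noise = rng.standard_normal(5000)           # a variable unrelated to theta

mi_related = mutual_information(sin_t, cos_t)
mi_unrelated = mutual_information(sin_t, noise)
corr = np.corrcoef(sin_t, cos_t)[0, 1]
print(mi_related, mi_unrelated, corr)
```

The linear correlation between sin θ and cos θ is close to zero, yet their estimated mutual information is much larger than that between sin θ and the unrelated variable, which supports placing sin θ and cos θ in the same group.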

(4)
In the above embodiment, abnormality determination is performed using an autoencoder, but other methods may be used. For example, a method may be used in which a probability density function is estimated by kernel density estimation and an observed value is determined to be abnormal when its estimated density falls below a predetermined threshold, or a one-class support vector machine or a variational autoencoder may be used instead of the autoencoder.
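The kernel density estimation variant can be sketched as follows. The Gaussian kernel, the bandwidth of 0.5, the density threshold of 1e-3, and the synthetic data are assumptions chosen for illustration; the embodiment does not specify them.

```python
import numpy as np

def kde_density(train, query, bandwidth=0.5):
    """Gaussian kernel density estimate at each query point (rows are samples)."""
    train = np.atleast_2d(train)
    query = np.atleast_2d(query)
    d = train.shape[1]
    # Squared distance between every query point and every training point.
    sq_dist = ((query[:, None, :] - train[None, :, :]) ** 2).sum(axis=2)
    norm = (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2)).mean(axis=1) / norm

rng = np.random.default_rng(0)
normal_data = rng.standard_normal((500, 2))      # stand-in for learning data
queries = np.array([[0.0, 0.0], [6.0, 6.0]])     # one typical point, one far away
density = kde_density(normal_data, queries)
density_threshold = 1e-3                         # hypothetical predetermined threshold
is_abnormal = density < density_threshold
print(is_abnormal)
```

Note that the direction of the comparison is reversed relative to a score-based determination: here a low density indicates an anomaly, so a score such as the negative log density would be needed to reuse the "score equal to or greater than the first threshold" rule.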

When a one-class support vector machine is used, abnormality determination is performed by mapping the input vector into a feature space using the kernel method and determining, in that feature space, a boundary surface that separates normal from abnormal.

When a variational autoencoder is used, the likelihood that the input data is generated is calculated, so the loss function used to identify the variable group that causes the anomaly can be, for example, the negative log-likelihood instead of the mean squared error.
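As an illustration of a negative log-likelihood score, the sketch below assumes that the decoder of the variational autoencoder outputs a per-variable mean and variance of a diagonal Gaussian; this decoder form and the function name `gaussian_nll` are assumptions, and the marginalization over the latent variable is omitted.

```python
import numpy as np

def gaussian_nll(x, mean, var):
    """Negative log-likelihood of x under a diagonal Gaussian N(mean, var)."""
    x, mean, var = (np.asarray(v, dtype=float) for v in (x, mean, var))
    return float(0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var))

# Hypothetical decoder outputs for a two-variable observation.
mean = [0.0, 0.0]
var = [1.0, 1.0]
print(gaussian_nll([0.0, 0.0], mean, var))   # observation at the predicted mean
print(gaussian_nll([1.0, 2.0], mean, var))   # observation away from the mean
```

With unit variance, this quantity equals half the squared error plus a constant, so it generalizes the squared-error score while allowing the model to weight each variable by its predicted variance.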

Alternatively, an anomaly may be determined using density ratio estimation, which estimates the ratio of probability densities between a group of samples that can be assumed to be normal and a group of samples subject to abnormality determination. In this case, in the abnormality diagnosis device 100, the generation unit 140 generates, from a plurality of normal observation data for learning and a plurality of observation data for diagnosis, an anomaly detection model based on the probability density ratio between the learning data and the diagnostic data. That is, in the above embodiment the generation unit 140 uses the observation data 210, which includes abnormal data, as the learning data 211; generating an anomaly detection model based on the probability density ratio differs in that the learning data 211 consists only of normal data, with abnormal data removed, and in that the diagnostic data 212 is additionally used. The determination unit 160 calculates a score for the plurality of observation data for diagnosis using the anomaly detection model based on the probability density ratio generated by the generation unit 140. The determination unit 160 then determines that the acquired plurality of observed values are abnormal when the calculated score is equal to or greater than a predetermined first threshold, and determines that they are not abnormal when the calculated score is less than the first threshold.

(5)
In the above embodiment, group regularization is used to identify the variable group that causes an anomaly, but various other methods are conceivable. For example, a variable group for which the mean squared error of the autoencoder's output values exceeds a predetermined value may be identified as the variable group that causes the anomaly. Alternatively, the variables outside a particular variable group may be held fixed, with only the variables of that group treated as free, the values that locally minimize the score may be found by a gradient method, and the variable group that yields the smallest score may be taken as the variable group that causes the anomaly.
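The first alternative, comparing the per-group mean squared error of the autoencoder output against a predetermined value, can be sketched as follows. The reconstruction `x_hat` is a hand-written stand-in for an autoencoder output, and the function name `groups_over_threshold` and the threshold of 1.0 are hypothetical.

```python
import numpy as np

def groups_over_threshold(x, reconstruction, groups, threshold):
    """Return indices of groups whose mean squared reconstruction error exceeds threshold."""
    err = (np.asarray(x, dtype=float) - np.asarray(reconstruction, dtype=float)) ** 2
    return [g for g, idx in enumerate(groups) if err[idx].mean() > threshold]

x = np.array([0.1, 0.2, 5.0, 4.0])       # observed value
x_hat = np.array([0.0, 0.2, 1.0, 1.0])   # hypothetical autoencoder output
groups = [[0, 1], [2, 3]]                # group information as index lists
print(groups_over_threshold(x, x_hat, groups, threshold=1.0))   # prints [1]
```

This variant needs no optimization, but it attributes the anomaly only from the reconstruction error of each group in isolation, whereas the group-regularized search accounts for how changing one group affects the score as a whole.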

In each of the above embodiments, each component may be configured with dedicated hardware or may be realized by executing a software program suited to that component. Each component may be realized by a program execution unit, such as a CPU or processor, reading and executing a software program recorded on a recording medium such as a hard disk or semiconductor memory. The software that realizes the abnormality diagnosis method and the like of each of the above embodiments is the following program.

That is, this program causes a computer to execute an abnormality diagnosis method executed by an abnormality diagnosis device that diagnoses whether an observed value is abnormal, using the observed value, which is obtained by observing a state of a monitoring target and is composed of values of a plurality of variables indicating the state. The abnormality diagnosis device includes a processor and a memory, and the memory stores an anomaly detection model generated by learning using a plurality of the observed values. The processor acquires group information indicating one or more groups, each composed of a combination of at least two mutually related variables among the plurality of variables; acquires the observed value; reads the anomaly detection model from the memory and determines, using the anomaly detection model that was read, whether the observed value is abnormal; and, when the observed value is determined to be abnormal, identifies, based on the observed value and the one or more groups indicated by the acquired group information, the group that is the cause of the anomaly among the one or more groups of the observed value.

The abnormality diagnosis method and abnormality diagnosis device according to one or more aspects of the present invention have been described above based on the embodiment, but the present invention is not limited to this embodiment. As long as they do not depart from the gist of the present invention, forms obtained by applying various modifications conceivable to those skilled in the art to the present embodiment, and forms constructed by combining components of different embodiments, may also be included within the scope of one or more aspects of the present invention.

The present disclosure is useful for identifying variables that contribute to anomalies in multidimensional information.

1 Abnormality diagnosis system
100 Abnormality diagnosis device
101 CPU
102 Main memory
103 Storage
104 Communication IF
105 Input IF
106 Display
110 First acquisition unit
111, 111a UI
112 Item
113 Group add button
114 Group delete button
115 No group
120 Group information DB
130 Second acquisition unit
140 Generation unit
150 Anomaly detection model DB
160 Determination unit
170 Identification unit
200 Server
210 Observation data
211 Learning data
212 Diagnostic data
300 Monitoring target
310 Control system
311 Router
312 Switching hub
313 Management terminal
314 Server
315 PLC
316 Sensor

Claims

An abnormality diagnosis method executed by an abnormality diagnosis device that diagnoses whether an observed value is abnormal, using the observed value, which is obtained by observing a state of a monitoring target and is composed of values of a plurality of variables indicating the state, wherein
the abnormality diagnosis device includes a processor and a memory,
the memory stores an anomaly detection model generated by learning using a plurality of the observed values, and
the processor:
acquires group information indicating one or more groups, each composed of a combination of at least two mutually related variables among the plurality of variables;
acquires the observed value;
reads the anomaly detection model from the memory and calculates a score by inputting the observed value into the anomaly detection model;
determines whether the observed value is abnormal using the score; and
when the observed value is determined to be abnormal, identifies, among the one or more groups of the observed value, the group that is the cause of the anomaly, using a loss function defined based on the score and the one or more groups indicated by the acquired group information,
wherein the anomaly detection model is a model generated, as the learning using the plurality of the observed values, by at least one of an autoencoder, a variational autoencoder, and a one-class support vector machine.

An abnormality diagnosis method executed by an abnormality diagnosis device that diagnoses whether an observed value is abnormal, using the observed value, which is obtained by observing a state of a monitoring target and is composed of values of a plurality of variables indicating the state, wherein
the abnormality diagnosis device includes a processor and a memory,
the memory stores an anomaly detection model generated by learning using a plurality of the observed values, and
the processor:
acquires group information indicating one or more groups, each composed of a combination of at least two mutually related variables among the plurality of variables;
acquires the observed value;
reads the anomaly detection model from the memory and calculates a score by inputting the observed value into the anomaly detection model;
determines whether the observed value is abnormal using the score; and
when the observed value is determined to be abnormal, identifies, among the one or more groups of the observed value, the group that is the cause of the anomaly, using a loss function defined based on the score and the one or more groups indicated by the acquired group information,
wherein, when the anomaly detection model is a model generated by an autoencoder as the learning using the plurality of the observed values, the score is calculated as the mean squared error between the input vector and the output vector of the autoencoder.

An abnormality diagnosis method executed by an abnormality diagnosis device that diagnoses whether an observed value is abnormal, using the observed value, which is obtained by observing a state of a monitoring target and is composed of values of a plurality of variables indicating the state, wherein
the abnormality diagnosis device includes a processor and a memory,
the memory stores an anomaly detection model generated by learning using a plurality of the observed values, and
the processor:
acquires group information indicating one or more groups, each composed of a combination of at least two mutually related variables among the plurality of variables;
acquires the observed value;
reads the anomaly detection model from the memory and calculates a score by inputting the observed value into the anomaly detection model;
determines whether the observed value is abnormal using the score; and
when the observed value is determined to be abnormal, identifies, among the one or more groups of the observed value, the group that is the cause of the anomaly, using a loss function defined based on the score and the one or more groups indicated by the acquired group information,
wherein, when the anomaly detection model is a model generated by a variational autoencoder as the learning using the plurality of the observed values, the loss function is defined using a negative log-likelihood.

An abnormality diagnosis method executed by an abnormality diagnosis device that diagnoses whether an observed value is abnormal, using the observed value, which is obtained by observing a state of a monitoring target and is composed of values of a plurality of variables indicating the state, wherein
the abnormality diagnosis device includes a processor and a memory,
the memory stores an anomaly detection model generated by learning using a plurality of the observed values, and
the processor:
acquires group information indicating one or more groups, each composed of a combination of at least two mutually related variables among the plurality of variables;
acquires the observed value;
reads the anomaly detection model from the memory and calculates a score by inputting the observed value into the anomaly detection model;
determines whether the observed value is abnormal using the score; and
when the observed value is determined to be abnormal, identifies, among the one or more groups of the observed value, the group that is the cause of the anomaly, using a loss function defined based on the score and the one or more groups indicated by the acquired group information,
wherein, in identifying the group that is the cause of the anomaly, the processor:
obtains a displacement vector that reduces the value of the loss function, using the observed value as an initial value; and
identifies, as the cause of the anomaly, a group that includes a non-zero variable of the displacement vector.

The abnormality diagnosis method according to any one of claims 1 to 4, wherein, in determining whether the observed value is abnormal,
the acquired observed value is determined to be abnormal when the calculated score is equal to or greater than a predetermined first threshold, and
the acquired observed value is determined not to be abnormal when the calculated score is less than the first threshold.

An abnormality diagnosis device that diagnoses whether an observed value is abnormal, using the observed value, which is obtained by observing a state of a monitoring target and is composed of values of a plurality of variables indicating the state, the abnormality diagnosis device comprising:
a processor and a memory, wherein
the memory stores an anomaly detection model generated by learning using a plurality of the observed values, and
the processor:
acquires group information indicating one or more groups, each composed of a combination of at least two mutually related variables among the plurality of variables;
acquires the observed value;
reads the anomaly detection model from the memory and calculates a score by inputting the observed value into the anomaly detection model;
determines whether the observed value is abnormal using the score; and
when the observed value is determined to be abnormal, identifies, among the one or more groups of the observed value, the group that is the cause of the anomaly, using a loss function defined based on the score and the one or more groups indicated by the acquired group information,
wherein the anomaly detection model is a model generated, as the learning using the plurality of the observed values, by at least one of an autoencoder, a variational autoencoder, and a one-class support vector machine.

An abnormality diagnosis device that diagnoses whether an observed value is abnormal, using the observed value, which is obtained by observing a state of a monitoring target and is composed of values of a plurality of variables indicating the state, the abnormality diagnosis device comprising:
a processor and a memory, wherein
the memory stores an anomaly detection model generated by learning using a plurality of the observed values, and
the processor:
acquires group information indicating one or more groups, each composed of a combination of at least two mutually related variables among the plurality of variables;
acquires the observed value;
reads the anomaly detection model from the memory and calculates a score by inputting the observed value into the anomaly detection model;
determines whether the observed value is abnormal using the score; and
when the observed value is determined to be abnormal, identifies, among the one or more groups of the observed value, the group that is the cause of the anomaly, using a loss function defined based on the score and the one or more groups indicated by the acquired group information,
wherein, when the anomaly detection model is a model generated by an autoencoder as the learning using the plurality of the observed values, the score is calculated as the mean squared error between the input vector and the output vector of the autoencoder.

An abnormality diagnosis device that diagnoses whether an observed value is abnormal, using the observed value, which is obtained by observing a state of a monitoring target and is composed of values of a plurality of variables indicating the state, the abnormality diagnosis device comprising:
a processor and a memory, wherein
the memory stores an anomaly detection model generated by learning using a plurality of the observed values, and
the processor:
acquires group information indicating one or more groups, each composed of a combination of at least two mutually related variables among the plurality of variables;
acquires the observed value;
reads the anomaly detection model from the memory and calculates a score by inputting the observed value into the anomaly detection model;
determines whether the observed value is abnormal using the score; and
when the observed value is determined to be abnormal, identifies, among the one or more groups of the observed value, the group that is the cause of the anomaly, using a loss function defined based on the score and the one or more groups indicated by the acquired group information,
wherein, when the anomaly detection model is a model generated by a variational autoencoder as the learning using the plurality of the observed values, the loss function is defined using a negative log-likelihood.

An abnormality diagnosis device that diagnoses whether an observed value is abnormal, using the observed value, which is obtained by observing a state of a monitoring target and is composed of values of a plurality of variables indicating the state, the abnormality diagnosis device comprising:
a processor and a memory, wherein
the memory stores an anomaly detection model generated by learning using a plurality of the observed values, and
the processor:
acquires group information indicating one or more groups, each composed of a combination of at least two mutually related variables among the plurality of variables;
acquires the observed value;
reads the anomaly detection model from the memory and calculates a score by inputting the observed value into the anomaly detection model;
determines whether the observed value is abnormal using the score; and
when the observed value is determined to be abnormal, identifies, among the one or more groups of the observed value, the group that is the cause of the anomaly, using a loss function defined based on the score and the one or more groups indicated by the acquired group information,
wherein, in identifying the group that is the cause of the anomaly, the processor:
obtains a displacement vector that reduces the value of the loss function, using the observed value as an initial value; and
identifies, as the cause of the anomaly, a group that includes a non-zero variable of the displacement vector.