JP2017215765A

JP2017215765A - Abnormality detector, abnormality detection method and abnormality detection program

Info

Publication number: JP2017215765A
Application number: JP2016109033A
Authority: JP
Inventors: Akihiro Yamanaka (山中 章裕); Yoshitaka Nakamura (中村 吉孝); Yoshifumi Takeichi (武市 祥史); Hirohito Maruyama (丸山 弘仁); Keiichiro Nakagawa (中川 慶一郎); Akinori Matsuo (松尾 明典)
Original assignee: Nippon Telegraph and Telephone Corp; NTT Data Mathematical Systems Inc; NTT TechnoCross Corp
Current assignee: Nippon Telegraph and Telephone Corp; NTT Data Mathematical Systems Inc; NTT TechnoCross Corp
Priority date: 2016-05-31
Filing date: 2016-05-31
Publication date: 2017-12-07
Anticipated expiration: 2036-05-31
Also published as: JP6538615B2

Abstract

PROBLEM TO BE SOLVED: To accurately detect an abnormality in detection target data based on the relationship among data. SOLUTION: An abnormality detector 10 includes: a divergence value vector calculation unit 122 that calculates, based on the relationship among data, a divergence value vector indicating the divergence of detection data (the detection target) from that relationship and a divergence value vector indicating the divergence of comparison data (the comparison target) from that relationship; an abnormality degree calculation unit 123 that calculates, as a degree indicating abnormality, the discrete degree of the divergence value vector of the detection data from the set of divergence value vectors of the comparison data; and an abnormality determination unit 124 that determines that the detection data is abnormal when the discrete degree exceeds a predetermined threshold. SELECTED DRAWING: Figure 1

Description

The present invention relates to an abnormality detection device, an abnormality detection method, and an abnormality detection program.

In a system composed of many devices such as servers and routers, operations require detecting hardware abnormalities in components such as the hard disks and memory of each device. Beyond information and communication equipment, there is also demand for detecting machine abnormalities from sensor data such as temperature and acceleration in industrial machines and automobiles that carry many sensors. Abnormality detection technology is also used outside of equipment monitoring, for example to detect fraudulent use by analyzing credit card usage, and in information security to detect DDoS (Distributed Denial of Service) attacks and malware.

The simplest abnormality detection method is for a person to inspect the data directly. However, knowing which trends across a large amount of data indicate an abnormality requires considerable expertise, so judging abnormality by eye is difficult. Even where it is possible, detection depends on particular skilled workers and can no longer be performed once they are gone. Abnormality detection therefore needs to be carried out mechanically in some way.

The simplest mechanical method sets a threshold on the range of the data and determines that an abnormality exists when the threshold is exceeded. Although this can work well in some cases, choosing a reasonable threshold is generally difficult: raising the threshold risks missing events that should be found as abnormal, while lowering it increases the number of events judged abnormal although they are not.

LOF has therefore been proposed, and is widely used, as an abnormality detection method that finds patterns that have not appeared before and determines them to be abnormal (see, for example, Non-Patent Document 1). LOF is a method that calculates local density in the data space.

Specifically, when newly obtained data lie in a high-density region of the space of data obtained so far, LOF outputs a small value for the degree of abnormality. In other words, when the new data are similar to data obtained so far, LOF determines that the new data are not abnormal. When the new data lie in a low-density region, LOF outputs a large value for the degree of abnormality. In other words, when the new data are not similar to data obtained so far, LOF determines that the new data are abnormal.
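The density comparison described above can be illustrated with a minimal sketch of the LOF score. This is a simplified version under stated assumptions, not the reference implementation from Non-Patent Document 1: distance ties are cut off at exactly k neighbours, no two points coincide, and the function names, the value k = 3, and the grid data are all hypothetical.

```python
import math


def k_nearest(points, i, k):
    """Return the k nearest neighbours of points[i] as (distance, index) pairs."""
    candidates = [
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    ]
    return sorted(candidates)[:k]


def lof_scores(points, k=3):
    """Simplified LOF: distance ties are cut off at exactly k neighbours."""
    n = len(points)
    neighbours = [k_nearest(points, i, k) for i in range(n)]
    k_distance = [nb[-1][0] for nb in neighbours]

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], d) for d, j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbours divided by the point's own density.
    return [
        sum(lrd[j] for _, j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]


# Hypothetical data: a 5 x 5 grid of past points plus one isolated new point.
grid = [(float(x), float(y)) for x in range(5) for y in range(5)]
scores = lof_scores(grid + [(10.0, 10.0)], k=3)
```

In this sketch the isolated point receives the largest score, while a point in the middle of the grid scores close to 1, which corresponds to the "high density, small value" and "low density, large value" behaviour described in the text.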

There is also a method that, when some relationship exists among data, detects an abnormality from the breakdown of that relationship. One proposed method analyzes whether the correlation among multiple types of data is maintained and finds abnormalities from the result. In this method, the data are taken two types at a time, a function that predicts one from the other is constructed, for example by simple regression, and the correlation is regarded as broken when the predicted value is farther than a certain amount from the observed value.
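A minimal sketch of this pairwise check follows, assuming ordinary least squares for the simple regression. The function names, the sample data, and the tolerance of 1.0 are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a * x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def correlation_broken(x, y, a, b, tolerance):
    """True when the observed y is farther than `tolerance` from a * x + b."""
    return abs(y - (a * x + b)) > tolerance


# Hypothetical normal-state data lying close to y = x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.0, 4.1, 4.9]
a, b = fit_line(xs, ys)

# An observation such as (2, 5) leaves the fitted line by more than the
# tolerance, while (6.5, 6.5) stays close to it.
broken_far = correlation_broken(2.0, 5.0, a, b, tolerance=1.0)
broken_on_line = correlation_broken(6.5, 6.5, a, b, tolerance=1.0)
```

The check looks only at the distance from the fitted line, so an observation that follows the line is never flagged, however far it is from the data seen so far.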

M. Breunig, H. Kriegel, R. Ng, and J. Sander, "LOF: Identifying Density-Based Local Outliers", SIGMOD, Volume 29 Issue 2, 93-104, 2000

However, when the data are correlated, LOF has the problem that it detects as abnormal any data lying away from the bulk of the data set, even data that follow the correlation and are in fact normal. The problems of the prior art are described concretely with reference to FIGS. 13 to 15. FIGS. 13 to 15 are diagrams for explaining abnormality detection methods according to the prior art. Each plots pairs of X and Y, given as data, on a coordinate plane.

In FIG. 13, 200 points (white circles) following a two-dimensional normal distribution with mean vector (3, 3) and covariance matrix [[1, 0.9], [0.9, 1]] are plotted as pairs of X and Y. The point Pb at the upper right is (6.5, 6.5), and the point Pr at the upper left is (2, 5).
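For readers who want to reproduce data resembling FIG. 13, the following is a minimal sketch that draws points from the stated two-dimensional normal distribution using a Cholesky factor. The function names and the random seed are hypothetical, and the sampled points will not match the figure's actual points.

```python
import math
import random


def sample_bivariate_normal(mean, var_x, var_y, cov_xy, n, rng):
    """Draw n points from a 2-D normal distribution via a Cholesky factor."""
    l11 = math.sqrt(var_x)
    l21 = cov_xy / l11
    l22 = math.sqrt(var_y - l21 ** 2)
    points = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        points.append((mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2))
    return points


def correlation(points):
    """Pearson correlation coefficient of the X and Y coordinates."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # arbitrary seed, used only for repeatability
normal_points = sample_bivariate_normal((3.0, 3.0), 1.0, 1.0, 0.9, 200, rng)
p_b, p_r = (6.5, 6.5), (2.0, 5.0)  # the two detection targets from FIG. 13
```

The sample correlation of `normal_points` should come out near the population value of 0.9, which is what produces the elongated cloud along the diagonal.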

FIG. 13 plots the pairs of X and Y with the following kind of correlated data in mind. Suppose, for example, that X and Y are the CPU (Central Processing Unit) usage rates of application server A and application server B, respectively, and that the two application servers receive requests from a Web server in equal shares.

In this case, when the number of accesses to the Web server increases and requests to the application servers increase, both X and Y are expected to rise. Conversely, when the number of accesses to the Web server decreases, both X and Y are expected to fall. In FIG. 13, the white circles are an example of normal-state data. The point Pb is an example of data that take values not seen before while maintaining the correlation. The point Pr is an example in which the correlation has broken down. The following considers the situation in which the white circles, the normal-state data, are given first, and the points Pb and Pr are then given as abnormality detection target data.

Of these, the point Pb maintains the correlation that holds in the normal state, so there are cases in which it should be determined not to be abnormal. The point Pr, in contrast, does not follow the correlation that holds in the normal state, so there are cases in which it should be determined to be abnormal. However, because LOF does not consider the relationship among data, it outputs a large abnormality value for both Pb and Pr, each of which lies where the density of white circles is low. That is, the point Pb, which should not be determined to be abnormal, is determined to be abnormal.

On the other hand, the method that analyzes whether the correlation among multiple types of data is maintained and finds abnormalities from the result has the problem that abnormalities are difficult to detect when, as shown in FIGS. 14 and 15, there are several relationships among the data and the relationships are complex.

For example, the white circles in FIG. 14 are 200 points (data group Rb) following a two-dimensional normal distribution with mean vector (3, 3) and covariance matrix [[1, 0.99], [0.99, 1]], together with the points obtained by rotating them counterclockwise by π/5 (data group Rb'). In this example there are two relationships among the data, and it is difficult to decide which correlation the detection data should be compared with. Moreover, the correlation coefficient of the white circles in FIG. 14 is only 0.07. A method that detects abnormalities from the breakdown of a relationship would therefore conclude that no correlation exists even among the normal-state data, and so it is difficult for it to perform abnormality detection properly.

FIG. 15 illustrates a case in which the normal state, viewed as data, forms three separate groups. The white circles in FIG. 15 are normal-state data, and the point Pg is the data subject to abnormality determination. The straight line Lb is the simple regression line calculated from the normal white circles only (specifically, Y = 0.9175X + 0.1081) and plotted in FIG. 15. The group of white circles at the lower left (data group Rc) consists of 100 points whose X and Y are random numbers each following a normal distribution with mean 0.5 and standard deviation 0.1. The group on the right above the line Lb (data group Rd) consists of 20 points whose X follows a normal distribution with mean 3 and standard deviation 0.1 and whose Y follows a normal distribution with mean 4 and standard deviation 0.1. The group on the right below the line Lb (data group Re) consists of 20 points whose X follows a normal distribution with mean 4 and standard deviation 0.1 and whose Y follows a normal distribution with mean 3 and standard deviation 0.1. The correlation coefficient of the white circles is high, at 0.9231.

Here, the point Pg shown in FIG. 15 is clearly far from the three data groups Rc to Re of white circles. However, because it lies close to the simple regression line Lb, comparing the point Pg with the line Lb alone cannot establish that Pg has departed from the correlation represented by Lb. That is, the point Pg lies far from the normal-state data, yet a method that detects abnormalities from the breakdown of a relationship cannot detect Pg as abnormal.
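The situation can be checked numerically with a short sketch. The regression line and the group centres are taken from the description of FIG. 15, but the coordinates of Pg are not stated in the text, so the value (2.0, 1.95) below is purely a hypothetical choice of a point near the line and between the groups.

```python
import math

# Regression line Lb and group centres as stated in the text for FIG. 15.
A, B = 0.9175, 0.1081
centres = {"Rc": (0.5, 0.5), "Rd": (3.0, 4.0), "Re": (4.0, 3.0)}
STD = 0.1  # standard deviation of every group in both coordinates

# Hypothetical coordinates for Pg; the text does not state them.
pg = (2.0, 1.95)

# Vertical distance from the regression line: very small for this point.
residual = pg[1] - (A * pg[0] + B)

# Distance to the nearest group centre: many standard deviations away.
nearest = min(math.dist(pg, c) for c in centres.values())
nearest_in_std = nearest / STD
```

For this assumed point the residual against Lb is well under 0.01, while the nearest group centre is more than 20 standard deviations away, which is the mismatch the paragraph above describes.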

The present invention has been made in view of the above, and its object is to provide an abnormality detection device, an abnormality detection method, and an abnormality detection program that can accurately detect abnormalities in detection target data based on the relationship among data.

To solve the problems described above and achieve the object, an abnormality detection device according to the present invention includes: a divergence value vector calculation unit that calculates, based on the relationship among data, a divergence value vector representing the divergence of detection data (the detection target) from that relationship and a divergence value vector representing the divergence of comparison data (the comparison target) from that relationship; an abnormality degree calculation unit that calculates, as a degree indicating abnormality, the discrete degree of the divergence value vector of the detection data from the set of divergence value vectors of the comparison data; and an abnormality determination unit that determines that the detection data is abnormal when the discrete degree exceeds a predetermined threshold.

According to the present invention, abnormalities in detection target data can be detected accurately based on the relationship among data.

FIG. 1 is a block diagram illustrating an example of the configuration of the abnormality detection device according to Embodiment 1.
FIG. 2 is a diagram illustrating an example of the structure of the data to be processed in Embodiment 1.
FIG. 3 is a diagram illustrating an example of the divergence value vector calculation process performed by the divergence value vector calculation unit shown in FIG. 1.
FIG. 4 is a diagram illustrating another example of the divergence value vector calculation process performed by the divergence value vector calculation unit shown in FIG. 1.
FIG. 5 is a diagram for explaining the abnormality degree calculation process performed by the abnormality degree calculation unit shown in FIG. 1.
FIG. 6 is a diagram for explaining another example of the abnormality degree calculation process performed by the abnormality degree calculation unit shown in FIG. 1.
FIG. 7 is a flowchart showing the procedure of the abnormality detection process executed by the abnormality detection device shown in FIG. 1.
FIG. 8 is a diagram for explaining the abnormality detection process of Embodiment 1.
FIG. 9 is a diagram for explaining the abnormality degree calculation process according to Embodiment 3.
FIG. 10 is a diagram for explaining the abnormality degree calculation process according to Embodiment 3.
FIG. 11 is a diagram for explaining the abnormality degree calculation process and the abnormality determination process according to Embodiment 3.
FIG. 12 is a diagram illustrating an example of a computer that implements the abnormality detection device by executing a program.
FIG. 13 is a diagram for explaining an abnormality detection method according to the prior art.
FIG. 14 is a diagram for explaining an abnormality detection method according to the prior art.
FIG. 15 is a diagram for explaining an abnormality detection method according to the prior art.

An embodiment of the present invention is described in detail below with reference to the drawings. The present invention is not limited by this embodiment. In the drawings, identical parts are denoted by identical reference numerals.

[Embodiment 1]
First, Embodiment 1 is described. The following describes the configuration of the abnormality detection device according to Embodiment 1 and the flow of processing performed by the device.

[Configuration of the abnormality detection device]
FIG. 1 is a block diagram illustrating an example of the configuration of the abnormality detection device 10 according to Embodiment 1. As shown in FIG. 1, the abnormality detection device 10 according to Embodiment 1 includes a communication processing unit 11, a control unit 12, and a storage unit 13.

The communication processing unit 11 controls communication of the various kinds of information exchanged with the connected terminal device 20. For example, the communication processing unit 11 receives from the terminal device 20 the data to be compared, the data to be detected, and a request for abnormality detection processing on the data to be detected. The communication processing unit 11 also, for example, transmits the result of the abnormality detection processing to the terminal device 20.

The control unit 12 has an internal memory for storing programs that define various processing procedures and the data they require, and executes various processes using them. For example, the control unit 12 is an electronic circuit such as a CPU or an MPU (Micro Processing Unit). The control unit 12 includes a relationship estimation unit 121, a divergence value vector calculation unit 122, an abnormality degree calculation unit 123, and an abnormality determination unit 124.

The relationship estimation unit 121 estimates the relationship among data and calculates parameters indicating that relationship. For example, when the relationship among the given data is provided as an equation but its parameters are undetermined, the unit estimates the parameters. Specifically, when the relationship between data X and data Y is known to be the simple regression Y = aX + b but a and b are unknown, the relationship estimation unit 121 estimates a and b from the given data. In this case, it is desirable that the relationship estimation unit 121 use, for parameter estimation, data output while the device or other source was in a normal state, in other words data that do not include abnormal-state data.

The divergence value vector calculation unit 122 calculates, based on the relationship among data, a divergence value vector representing the divergence from that relationship. In Embodiment 1, the divergence value vector calculation unit 122 calculates, based on the relationship among data, a divergence value vector representing the divergence of the detection data (the detection target) from that relationship. It likewise calculates a divergence value vector representing the divergence of the comparison data (the comparison target) from that relationship. That is, when some relationship is found among the data, the divergence value vector calculation unit 122 calculates values indicating the divergence from that relationship for both the detection data and the comparison data. The divergence value vector calculation unit 122 may store the divergence value vectors in the divergence value vector storage unit 131 of the storage unit 13 (described later). The divergence value vector calculation unit 122 applies the parameters calculated by the relationship estimation unit 121 to the relationship among data and then calculates the divergence value vectors. The relationship among data may also be given in advance.

The abnormality degree calculation unit 123 calculates the discrete degree of the divergence value vector of the detection data from the set of divergence value vectors of the comparison data. Specifically, the abnormality degree calculation unit 123 calculates the discrete degree based on, for example, the spatial distance of the detection data's divergence value vector from the set of comparison data divergence value vectors, or the density around it. The discrete degree calculated by the abnormality degree calculation unit 123 is used as the abnormality degree, which indicates abnormality, in the determination made by the abnormality determination unit 124 (described later). The abnormality degree calculation unit 123 may calculate the discrete degree using the divergence value vectors stored in the divergence value vector storage unit 131 (described later).

The abnormality determination unit 124 determines that the detection data is abnormal when the discrete degree calculated by the abnormality degree calculation unit 123 exceeds a predetermined threshold, and determines that the detection data is normal when the discrete degree is equal to or less than the threshold. The determination result is output as the abnormality detection result, for example to the terminal device 20 via the communication processing unit 11.
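The degree calculation and the threshold rule can be sketched together as follows. The text leaves the exact measure open (distance, density, and the like), so the distance to the k-th nearest comparison vector used here is only one assumed choice. The function names, k = 3, the threshold of 1.0, and the sample vectors are all hypothetical.

```python
import math


def discrete_degree(target, comparison, k=3):
    """Distance from `target` to its k-th nearest comparison vector."""
    distances = sorted(math.dist(target, v) for v in comparison)
    return distances[k - 1]


def is_abnormal(target, comparison, threshold, k=3):
    """Abnormal only when the degree strictly exceeds the threshold."""
    return discrete_degree(target, comparison, k) > threshold


# Hypothetical divergence value vectors of comparison data (all near zero).
comparison = [(0.1, -0.1), (-0.2, 0.2), (0.0, 0.0), (0.15, -0.15), (-0.05, 0.05)]

# A detection vector near zero is judged normal; one far away is abnormal.
near_result = is_abnormal((0.0, 0.0), comparison, threshold=1.0)
far_result = is_abnormal((-3.0, 3.0), comparison, threshold=1.0)
```

Using a strict comparison mirrors the rule in the text, under which a degree equal to the threshold is still treated as normal.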

The storage unit 13 is implemented by a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or by a storage device such as a hard disk or an optical disk. It stores the processing program that operates the abnormality detection device 10 and the data used while the processing program runs. The storage unit 13 includes a divergence value vector storage unit 131 that stores the divergence value vectors calculated by the divergence value vector calculation unit 122.

[Example of data to be processed]
Next, an example of the data processed by the abnormality detection device 10 is described. FIG. 2 is a diagram illustrating an example of the structure of the data processed by the abnormality detection device 10.

As shown in FIG. 2, suppose for example that N types of data, X^1 to X^N, are given as detection data. In FIG. 2, x^n_m is the observed value of the n-th data at m. The subscript m denotes the location or time of observation. For example, when X^1 to X^N represent user 1 to user N and each data element represents whether product m was purchased, data with the same subscript refer to the purchase of the same product. Alternatively, when X^1 to X^N represent the CPU usage rates of server 1 to server N and each data element represents the CPU usage rate at time m, data with the same subscript were observed at the same time.

[Processing of the relationship estimation unit]
The relationship estimation unit 121 estimates the relationship that holds among X^1 to X^N and calculates parameters indicating that relationship.

In this case, it is desirable that the relationship estimation unit 121 use, for parameter calculation, data output while the device or other source was in a normal state, in other words data that do not include abnormal-state data. Whether data are normal-state data may be judged from the fact that the device was operating normally. It may also be judged by a person who inspects the data and confirms, for example, that they contain nothing corresponding to abnormal values. The reason is that abnormality detection in the abnormality determination unit 124 uses the criterion that "different from normal" is regarded as "abnormal". In addition, estimating the relational expression that holds for normal-state data from normal data only can be expected to improve the sensitivity of abnormality detection.

For example, when the relationship among data is given as an equation but its parameters are undetermined, the relationship estimation unit 121 estimates those parameters. Specifically, when the relationship between data X and data Y is known to be the simple regression Y = aX + b but a and b are unknown, the relationship estimation unit 121 estimates a and b from the given data.

When the relationship among data is given as, for example, a multiple regression equation with one of X^1 to X^N as the objective variable and the rest as explanatory variables, the relationship estimation unit 121 obtains the parameters of that multiple regression equation. When the relationship among data is given as simple regression equations in which X^1 to X^N are taken two at a time, with one of each pair as the objective variable and the other as the explanatory variable, the relationship estimation unit 121 obtains the parameters of those simple regression equations.

Alternatively, when each of X^1 to X^N is series data and the relationship among data is given as an autoregressive or vector autoregressive equation that predicts the future from past data, the relationship estimation unit 121 may obtain the parameters of that equation. This applies when the subscripts 1, 2, ..., m, ... are ordered, as in the earlier example in which the data are server CPU usage rates.
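As an illustration of the autoregressive case, the following minimal sketch fits a first-order model x[t] = c + phi * x[t-1] by least squares. The order of the model, the function names, and the sample series are assumptions made for illustration; the text does not fix the order of the autoregressive equation.

```python
def fit_ar1(series):
    """Least squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_p = sum(prev) / n
    mean_c = sum(curr) / n
    spp = sum((p - mean_p) ** 2 for p in prev)
    spc = sum((p - mean_p) * (q - mean_c) for p, q in zip(prev, curr))
    phi = spc / spp
    return mean_c - phi * mean_p, phi


def one_step_forecast(c, phi, last_value):
    """Predict the next value of the series from its most recent value."""
    return c + phi * last_value


# Hypothetical CPU usage series generated exactly by x[t] = 10 + 0.5 * x[t-1].
series = [60.0, 40.0, 30.0, 25.0, 22.5, 21.25]
c, phi = fit_ar1(series)
next_value = one_step_forecast(c, phi, series[-1])
```

Because the sample series follows the model exactly, the fit recovers c = 10 and phi = 0.5 up to rounding. With real observations the difference between the forecast and the observed value would serve as the divergence from the relationship.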

The relationship among data may also be modeled by a mixture distribution model, or expressed by an equation representing a more complex nonlinear relationship. For example, the relationship among data may be expressed using some probability distribution. When the relationship among data is expressed by a mixture distribution model with K clusters, for example, the relationship estimation unit 121 obtains the parameters of the relational expression shown in equation (1).

なお、「Ｘ^１」〜「Ｘ^Ｎ」の間に成り立つ関係性が予め与えられている場合には、本実施の形態では、関係性推定部１２１の算出処理を省略することができる。 Note that, in the present embodiment, when the relationship that holds among “X^1” to “X^N” is given in advance, the calculation processing of the relationship estimation unit 121 can be omitted.

［乖離値ベクトル計算部の処理］
続いて、乖離値ベクトル計算部１２２は、「Ｘ^１」〜「Ｘ^Ｎ」および「Ｘ^１」〜「Ｘ^Ｎ」の間に成り立つ関係性から、乖離値ベクトルを計算する。図３は、乖離値ベクトル計算部１２２が行う乖離値ベクトルの計算処理の例を示す図である。 [Processing of deviation value vector calculation unit]
Subsequently, the divergence value vector calculation unit 122 calculates divergence value vectors from “X^1” to “X^N” and the relationship that holds among “X^1” to “X^N”. FIG. 3 is a diagram illustrating an example of the divergence value vector calculation process performed by the divergence value vector calculation unit 122.

図３に示す例では、乖離値ベクトルは、添字「ｍ」ごとに計算されるものとしている。また、この例では、図３に示すデータ「Ｘ^１」〜「Ｘ^Ｎ」が与えられ、この「Ｘ^１」〜「Ｘ^Ｎ」の間に成り立つ関係性を、「Ｆ（Ｘ^１，Ｘ^２，・・・，Ｘ^Ｎ)＝０」としている。この例では、関係式が、「Ｘ^１」〜「Ｘ^Ｎ」のいずれかを目的変数、残りを説明変数とする重回帰式であることをイメージしている。 In the example illustrated in FIG. 3, a divergence value vector is calculated for each subscript “m”. In this example, the data “X^1” to “X^N” shown in FIG. 3 are given, and the relationship that holds among “X^1” to “X^N” is written as “F(X^1, X^2, ..., X^N) = 0”. The relational expression here is envisioned as a multiple regression equation in which one of “X^1” to “X^N” is the objective variable and the rest are explanatory variables.

具体的には、図３における「Ｘ^１の乖離値」は、添字「ｍ」における観測値「ｘ^１ _ｍ」と、「ｍ」において観測されたデータから、関係性を用いて推定される「Ｘ^１」の値「Ｆ（ｘ^１ _ｍ，ｘ^２ _ｍ，・・・，ｘ^Ｎ _ｍ)」と、の差である。また、「Ｘ^２の乖離値」は、添字「ｍ」における観測値「ｘ^２ _ｍ」と、「ｍ」において観測されたデータから、関係性を用いて推定される「Ｘ^２」の値「Ｆ（ｘ^１ _ｍ，ｘ^２ _ｍ，・・・，ｘ^Ｎ _ｍ)」と、の差である。乖離値ベクトル計算部１２２は、図３に示す計算処理を行うことによって、「Ｘ^１」〜「Ｘ^Ｎ」の各乖離値を計算する。そして、図３に示すように、乖離値ベクトル計算部１２２は、共通の添字「ｍ」を持つ複数種類のデータから計算した乖離値を、乖離値ベクトルとして出力する。 Specifically, the “divergence value of X^1” in FIG. 3 is the difference between the observed value “x^1_m” at subscript “m” and the value of “X^1”, “F(x^1_m, x^2_m, ..., x^N_m)”, that is estimated through the relationship from the data observed at “m”. Similarly, the “divergence value of X^2” is the difference between the observed value “x^2_m” at subscript “m” and the value of “X^2”, “F(x^1_m, x^2_m, ..., x^N_m)”, that is estimated through the relationship from the data observed at “m”. The divergence value vector calculation unit 122 calculates the divergence value of each of “X^1” to “X^N” by performing the calculation process illustrated in FIG. 3. Then, as shown in FIG. 3, the divergence value vector calculation unit 122 outputs the divergence values calculated from the multiple types of data sharing a common subscript “m” as a divergence value vector.

また、データ間の関係性は「Ｆ（Ｘ^１，Ｘ^２，・・・，Ｘ^Ｎ)＝０」のように、全てのデータに対して一つの関係性が与えられる場合だけでなく、データの組に対して与えられる場合もある。例えば、「Ｘ^１」〜「Ｘ^Ｎ」の中から２組ずつを選択し、２組ごとに一方を目的変数、他方を説明変数とする単回帰式で関連性が与えられる場合である。そこで、図４を参照して、この場合における乖離値ベクトルの計算処理について説明する。図４は、乖離値ベクトル計算部１２２が行う乖離値ベクトルの計算処理の他の例を示す図である。 Further, the relationship between data is not always given as a single relationship over all the data, as in “F(X^1, X^2, ..., X^N) = 0”; it may instead be given for pairs of data. For example, pairs may be selected from “X^1” to “X^N”, with the relationship for each pair given by a simple regression equation in which one is the objective variable and the other is the explanatory variable. The divergence value vector calculation process for this case is described with reference to FIG. 4. FIG. 4 is a diagram illustrating another example of the divergence value vector calculation process performed by the divergence value vector calculation unit 122.

図４に示すように、例えば、「Ｘ^１のＸ^２から見た乖離値」は、添字「ｍ」における観測値「ｘ^１ _ｍ」と、「ｍ」において観測されたデータから、「Ｘ^１」と「Ｘ^２」の関係性「Ｆ^１２」を用いて推定される「Ｘ^１」の値「Ｆ^１２(ｘ^１ _ｍ，ｘ^２ _ｍ)」と、の差である。乖離値ベクトル計算部１２２は、図４に示す計算処理を行うことによって、「Ｘ^１のＸ^２から見た乖離値」〜「Ｘ^Ｎ−１のＸ^Ｎから見た乖離値」を計算する。そして、乖離値ベクトル計算部１２２は、図４に示すように、共通の添字「ｍ」を持つ複数種類のデータから計算した乖離値を、乖離値ベクトルとして出力する。 As shown in FIG. 4, for example, the “divergence value of X^1 as seen from X^2” is the difference between the observed value “x^1_m” at subscript “m” and the value of “X^1”, “F^12(x^1_m, x^2_m)”, that is estimated from the data observed at “m” using the relationship “F^12” between “X^1” and “X^2”. By performing the calculation process shown in FIG. 4, the divergence value vector calculation unit 122 calculates the “divergence value of X^1 as seen from X^2” through the “divergence value of X^(N-1) as seen from X^N”. Then, as shown in FIG. 4, the divergence value vector calculation unit 122 outputs the divergence values calculated from the multiple types of data sharing a common subscript “m” as a divergence value vector.
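The pairwise calculation of FIG. 4 can be illustrated with a short sketch. It assumes that each relationship F^ij is a least-squares line fitted on normal-state data, and that one residual is taken per pair i < j; the function names, the dictionary layout, and the sample values are hypothetical and only serve the illustration.

```python
from itertools import combinations

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def fit_pairwise(normal_data):
    """Fit X^i ~ a * X^j + b for every pair i < j; returns {(i, j): (a, b)}."""
    return {
        (i, j): fit_line(normal_data[j], normal_data[i])
        for i, j in combinations(sorted(normal_data), 2)
    }

def divergence_vector(models, sample):
    """Residual of each X^i as seen from X^j, for one common subscript m."""
    return [sample[i] - (a * sample[j] + b) for (i, j), (a, b) in models.items()]

# Hypothetical normal-state data: X2 = 2 * X1 and X3 = 4 - X1.
normal_data = {
    "X1": [1.0, 2.0, 3.0, 4.0],
    "X2": [2.0, 4.0, 6.0, 8.0],
    "X3": [3.0, 2.0, 1.0, 0.0],
}
models = fit_pairwise(normal_data)

# One observation that keeps the relationships, and one that breaks X2.
normal_vector = divergence_vector(models, {"X1": 5.0, "X2": 10.0, "X3": -1.0})
broken_vector = divergence_vector(models, {"X1": 5.0, "X2": 4.0, "X3": -1.0})
```

In this sketch the observation that follows all fitted lines yields a vector near zero even though its raw values lie outside the training range, while the observation with a broken X2 produces nonzero entries only in the pairs that involve X2.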

なお、図４では特に指定していないが、データ間の関係性が一部の組み合わせに対してのみ成り立つと考えてもよい。例えば、「Ｘ^１」と「Ｘ^２」の間は、相関係数が大きく相関関係が認められるのに対し、「Ｘ^１」と「Ｘ^３」の間は、相関係数が小さく相関関係が認められないような場合である。このような場合、乖離値ベクトル計算部１２２は、関係性が認められるものに対してのみ乖離値を計算すればよい。 Although not specifically indicated in FIG. 4, the relationship between data may be regarded as holding only for some combinations. For example, the correlation coefficient between “X^1” and “X^2” may be large, so that a correlation is recognized, whereas the correlation coefficient between “X^1” and “X^3” is small and no correlation is recognized. In such a case, the divergence value vector calculation unit 122 need only calculate divergence values for the combinations in which a relationship is recognized.

また、データ間の関係性が（１）式に示す混合分布モデルで表現される場合には、乖離値ベクトル計算部１２２は、以下の（２）式で定義される各クラスタへの帰属度「ｍ_ｋ」を計算する。帰属度が大きい程、そのクラスタへの帰属度が高い、すなわち、そのクラスタの中心点に近いと言える。言い換えると、帰属度が大きい程、そのクラスタへの乖離度が小さいと言える。 Further, when the relationship between data is expressed by the mixture distribution model shown in equation (1), the divergence value vector calculation unit 122 calculates the degree of membership “m_k” in each cluster, defined by equation (2) below. The larger the degree of membership, the more strongly the data belongs to that cluster, that is, the closer it is to the center point of the cluster. In other words, the larger the degree of membership, the smaller the divergence from that cluster.

この場合、乖離値ベクトル計算部１２２は、（２）式を用いて計算した帰属度「ｍ_ｋ」に対し、乖離値ベクトルとして、「（ｍ_１，ｍ_２，・・・，ｍ_Ｋ）」を求める。或いは、乖離値ベクトル計算部１２２は、（２）式を用いて計算した帰属度「ｍ_ｋ」に対し、乖離値ベクトルとして、「−logΣπ^ｋP（ｘ｜θ^ｋ）」のように、負の対数尤度を計算してもよい。なお、この場合には、乖離値ベクトルは、１次元となる。 In this case, the divergence value vector calculation unit 122 obtains “(m_1, m_2, ..., m_K)” as the divergence value vector from the degrees of membership “m_k” calculated using equation (2). Alternatively, the divergence value vector calculation unit 122 may calculate the negative log-likelihood, “−logΣπ^k P(x|θ^k)”, as the divergence value vector. In this case, the divergence value vector is one-dimensional.
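Equations (1) and (2) are not reproduced in this text, so the following sketch rests on assumptions: the mixture density is taken to be Σ_k π^k P(x|θ^k), consistent with the negative log-likelihood quoted above, the components are taken to be one-dimensional normal distributions, and the degree of membership is taken to be the usual posterior share π^k P(x|θ^k) / Σ_j π^j P(x|θ^j). All names and parameter values are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def memberships(x, weights, means, stds):
    """Assumed form of equation (2): share of each weighted component at x."""
    joint = [w * gaussian_pdf(x, mu, s) for w, mu, s in zip(weights, means, stds)]
    total = sum(joint)
    return [j / total for j in joint]

def negative_log_likelihood(x, weights, means, stds):
    """One-dimensional alternative: -log sum_k pi^k P(x | theta^k)."""
    mixture = sum(w * gaussian_pdf(x, mu, s) for w, mu, s in zip(weights, means, stds))
    return -math.log(mixture)

# Hypothetical two-cluster model (K = 2); every parameter value is illustrative.
weights, means, stds = [0.5, 0.5], [0.0, 10.0], [1.0, 1.0]

m_near = memberships(0.2, weights, means, stds)   # close to the first cluster
m_mid = memberships(5.0, weights, means, stds)    # halfway between the clusters
nll_near = negative_log_likelihood(0.2, weights, means, stds)
nll_mid = negative_log_likelihood(5.0, weights, means, stds)
```

Under these assumptions a point near a cluster center receives a membership close to 1 for that cluster and a small negative log-likelihood, while a point between clusters receives split memberships and a larger negative log-likelihood.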

乖離値ベクトル計算部１２２は、上記に示したような計算処理を行うことによって、データ間の関係性に基づいて検知データ及び比較データのデータ間の関係性からの乖離を表す乖離値ベクトルを計算する。なお、一般的には、比較データは複数存在する。もちろん、比較データは、一つでもよい。 By performing the calculation processing described above, the divergence value vector calculation unit 122 calculates, based on the relationship between data, divergence value vectors that represent how far the detection data and the comparison data deviate from that relationship. In general, there are multiple comparison data, although there may of course be only one.

［異常度計算部の処理］
次に、異常度計算部１２３の処理について説明する。異常度計算部１２３は、乖離値ベクトル計算部１２２が、検知データから計算した乖離値ベクトルと、比較データから計算した乖離値ベクトルと、を用いて、離散度合を計算する。 [Processing of abnormality level calculation part]
Next, the processing of the abnormality degree calculation unit 123 will be described. The degree-of-abnormality calculation unit 123 calculates a discrete degree using the divergence value vector calculated from the detection data by the divergence value vector calculation unit 122 and the divergence value vector calculated from the comparison data.

具体的には、異常度計算部１２３は、検知データの乖離値ベクトルが、比較データの乖離値ベクトルの集合から、空間的にどのくらい離れているかを計算する。この場合、異常度計算部１２３は、例えばｋ−ＮＮ（k−nearest neighbor method）法を用いて、検知データの乖離値ベクトルが、比較データの乖離値ベクトルの集合から、空間的にどのくらい離れているかを計算する。図５を参照して、異常度計算部１２３が、ｋ−ＮＮ法を用いて、離散度合を計算した場合について説明する。 Specifically, the degree-of-abnormality calculation unit 123 calculates how far the deviation value vector of the detection data is spatially separated from the set of deviation value vectors of the comparison data. In this case, the degree-of-abnormality calculation unit 123 uses, for example, a k-NN (k-nearest neighbor method) method to determine how far the deviation value vector of the detection data is spatially separated from the set of deviation value vectors of the comparison data. To calculate. With reference to FIG. 5, the case where the degree-of-abnormality calculation unit 123 calculates the discrete degree using the k-NN method will be described.

図５は、異常度計算部１２３による異常度計算処理を説明する図である。図５は、乖離値ベクトルの次元１〜次元３に対し、乖離値ベクトル計算部１２２が比較データ及び検知データから計算した各乖離値ベクトルをプロットした図である。図５において、原点近傍に位置する白丸のデータ群Ｒ１は、比較データから計算された乖離値ベクトルに対応する。点Ｐ１は、検知データから計算した乖離値ベクトルに対応する（図５の（１）参照）。ｋ−ＮＮ法では、検知データの乖離値に対応する点Ｐ１から見て、ｋ番目に近い点Ｐｋまでの距離を、異常度（離散度合）として計算する（図５の（２）参照）。ここで、「ｋ」は、パラメータであり、ヒューリスティックスを用いて設定される。 FIG. 5 is a diagram for explaining the abnormality degree calculation processing by the abnormality degree calculation unit 123. FIG. 5 plots, along dimensions 1 to 3 of the divergence value vector, the divergence value vectors that the divergence value vector calculation unit 122 calculated from the comparison data and the detection data. In FIG. 5, the data group R1 of white circles located near the origin corresponds to the divergence value vectors calculated from the comparison data. The point P1 corresponds to the divergence value vector calculated from the detection data (see (1) in FIG. 5). In the k-NN method, the distance from the point P1, which corresponds to the divergence value of the detection data, to the k-th nearest point Pk is calculated as the degree of abnormality (discrete degree) (see (2) in FIG. 5). Here, “k” is a parameter that is set heuristically.
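The k-NN calculation described for FIG. 5 can be sketched in a few lines. The Euclidean metric, the function name `knn_distance`, and the sample points are assumptions made for illustration; the text does not fix the distance measure.

```python
import math

def knn_distance(query, reference, k):
    """Distance from the query vector to its k-th nearest reference vector."""
    distances = sorted(math.dist(query, r) for r in reference)
    return distances[k - 1]

# Hypothetical three-dimensional divergence value vectors of comparison data.
comparison_vectors = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 1.0)]

# Divergence value vector of the detection data (point P1 in FIG. 5).
p1 = (3.0, 4.0, 0.0)

# With k = 2, the score is the distance to the second-nearest comparison vector.
score = knn_distance(p1, comparison_vectors, k=2)
```

A larger `k` makes the score less sensitive to a single stray comparison vector lying close to the query, which is one reason the parameter is left to heuristics.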

また、異常度計算部１２３は、検知データの乖離値ベクトルが、比較データの乖離値ベクトルの集合から、どのくらい空間的に疎な位置に存在するかを計算してもよい。この場合、異常度計算部１２３は、例えば、ＬＯＦを用いて検知データの乖離値ベクトルが、比較データの乖離値ベクトルの集合から、空間的にどのくらい離れているかを計算する。 In addition, the degree of abnormality calculation unit 123 may calculate how spatially sparse the divergence value vectors of the detection data exist from the set of divergence value vectors of the comparison data. In this case, the degree-of-abnormality calculation unit 123 calculates, for example, how far the divergence value vector of the detection data is spatially separated from the set of divergence value vectors of the comparison data using LOF.

図６は、異常度計算部１２３による異常度計算処理の他の例を説明する図である。ＬＯＦは、空間内での局所的密度を計算する手法である。図６の白丸のデータ群Ｒ１は、比較データの乖離値ベクトルに対応する点の集まりであり、点Ｐ１は、検知データから計算した乖離値ベクトルに対応する点である（図６の（１）参照）。 FIG. 6 is a diagram for explaining another example of the abnormality degree calculation processing by the abnormality degree calculation unit 123. LOF is a technique for calculating local density in space. The data group R1 of white circles in FIG. 6 is the collection of points corresponding to the divergence value vectors of the comparison data, and the point P1 is the point corresponding to the divergence value vector calculated from the detection data (see (1) in FIG. 6).

具体的には、検知データの乖離値（点Ｐ１）から見て、ｋ番目までに近い点Ｐｋまでの距離の平均を、それらｋ番目の点Ｐｋから見てｍ番目までに近い点Ｐｍまでの距離の平均で割った値を異常度（離散度合）として計算する（図６の（２）参照）。例えば、データ群Ｒ１の密度の高い位置に、検知データの乖離値ベクトルに対応する点Ｐ１があった場合には、点Ｐ１からｋ番目に近い点Ｐｋまでの距離の平均が小さくなり、点Ｐｋから点Ｐｍまでの距離も小さくなるため、離散度合は小さくなる。一方、データ群Ｒ１の密度の低い位置に点Ｐ１があった場合には、点Ｐ１からｋ番目に近い点Ｐｋまでの距離の平均が大きくなるため、離散度合は大きくなる。なお、「ｋ」及び「ｍ」は、パラメータであり、ヒューリスティックスを用いて設定される。 Specifically, the average distance from the divergence value of the detection data (point P1) to its k nearest points Pk is divided by the average distance from those points Pk to their m nearest points Pm, and the quotient is calculated as the degree of abnormality (discrete degree) (see (2) in FIG. 6). For example, when the point P1 corresponding to the divergence value vector of the detection data lies at a high-density position in the data group R1, the average distance from the point P1 to the k nearest points Pk is small and the distances from the points Pk to the points Pm are also small, so the discrete degree is small. On the other hand, when the point P1 lies at a low-density position in the data group R1, the average distance from the point P1 to the k nearest points Pk becomes large, so the discrete degree is large. “k” and “m” are parameters that are set heuristically.
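The ratio described for FIG. 6 can be sketched as below. Note that this follows the simplified description in the text (a ratio of two average neighbour distances) rather than the full LOF definition based on reachability distances; the function names, the Euclidean metric, and the grid of sample points are assumptions.

```python
import math

def mean_knn_distance(point, others, k):
    """Average distance from point to its k nearest vectors in others."""
    distances = sorted(math.dist(point, o) for o in others)
    return sum(distances[:k]) / k

def density_ratio_score(query, reference, k, m):
    """Average distance to the k nearest points Pk, divided by the average
    distance from those points Pk to their own m nearest points Pm."""
    order = sorted(range(len(reference)), key=lambda i: math.dist(query, reference[i]))
    nearest = order[:k]
    numerator = sum(math.dist(query, reference[i]) for i in nearest) / k
    local = []
    for i in nearest:
        # Exclude Pk itself by index so duplicate coordinates are handled safely.
        others = [reference[j] for j in range(len(reference)) if j != i]
        local.append(mean_knn_distance(reference[i], others, m))
    denominator = sum(local) / k  # zero only if neighbours coincide exactly
    return numerator / denominator

# Hypothetical comparison vectors on a 3 x 3 grid with unit spacing.
grid = [(float(x), float(y)) for x in range(3) for y in range(3)]

dense_score = density_ratio_score((1.0, 1.5), grid, k=2, m=2)     # inside the grid
sparse_score = density_ratio_score((10.0, 10.0), grid, k=2, m=2)  # far outside
```

A score well below or around 1 indicates that the query sits in a region at least as dense as its neighbours, whereas a score far above 1 indicates a sparse position.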

また、比較データは、正常な状態のデータに限定することで、後述する異常判定部１２４の異常検知精度を高めることができる。「正常な状態のデータ」の定義は前述の通りである。また、正常な状態のデータのみを比較データとした場合、乖離値ベクトルは、空間的には局所に集中することに注意しておく。例えば図５及び図６において説明した方法を用いて乖離値ベクトルを計算すると、空間的には原点近傍に乖離値ベクトルが集中する。また、（２）式に示す帰属度「ｍ_ｋ」に対し、乖離値ベクトルを（ｍ_１，ｍ_２，・・・，ｍ_Ｋ）で定義した場合は、乖離値ベクトルは、空間的にはＫ個のクラスタに集中する。 Further, by limiting the comparison data to data in a normal state, it is possible to improve the abnormality detection accuracy of the abnormality determination unit 124 described later. The definition of “normal state data” is as described above. In addition, when only normal data is used as comparison data, it should be noted that the divergence value vectors are spatially concentrated locally. For example, when the divergence value vector is calculated using the method described with reference to FIGS. 5 and 6, the divergence value vector is spatially concentrated near the origin. In addition, when the divergence value vector is defined as (m ₁ , m ₂ ,..., M _K ) with respect to the degree of membership “m _k ” shown in equation (2), the divergence value vector is spatially Concentrate on K clusters.

また、検知データと比較データとが別々に与えられる場合がある。例えば、ある特定の過去１日分の複数のサーバのＣＰＵ使用率を比較データとして異常検知装置１０に届き、検知データは、異常検知装置１０の運用時に逐次的に届くような場合である。このような場合、比較データの乖離値ベクトルを、検知対象のデータが届くたびに計算し直すことは計算リソース上、効率的ではない。そこで、記憶部１３の乖離値ベクトル記憶部１３１は、このような場合に比較データの乖離値を再計算する必要がないように、比較データの乖離値ベクトルを記憶しておく。乖離値ベクトル記憶部１３１を利用する場合、異常度計算部１２３は、検知データの乖離値ベクトルと、乖離値ベクトル記憶部１３１が記憶する乖離値ベクトルと、を比較する。 Further, the detection data and the comparison data may be given separately. For example, the CPU usage rates of a plurality of servers for a specific past day reach the abnormality detection device 10 as comparison data, and the detection data sequentially arrives when the abnormality detection device 10 is operated. In such a case, it is not efficient in terms of calculation resources to recalculate the divergence value vector of the comparison data each time the detection target data arrives. Therefore, the divergence value vector storage unit 131 of the storage unit 13 stores the divergence value vector of the comparison data so that it is not necessary to recalculate the divergence value of the comparison data in such a case. When the deviation value vector storage unit 131 is used, the abnormality degree calculation unit 123 compares the deviation value vector of the detected data with the deviation value vector stored in the deviation value vector storage unit 131.

そして、乖離値ベクトル記憶部１３１を利用する場合も、比較データとして、正常な状態のデータから計算した乖離値ベクトルのみを記憶させることで、異常検知精度を高めることができる。 Even when the divergence value vector storage unit 131 is used, the abnormality detection accuracy can be improved by storing only the divergence value vector calculated from the normal state data as the comparison data.

［異常判定部の処理］
異常判定部１２４は、異常度計算部１２３が計算した離散度合が所定の閾値を超えた場合に、検知データは異常であることを判定する。異常判定部１２４は、離散度合が所定の閾値以下である場合に、検知データは正常であることを判定する。 [Abnormality judgment unit processing]
The abnormality determination unit 124 determines that the detected data is abnormal when the discrete degree calculated by the abnormality degree calculation unit 123 exceeds a predetermined threshold. The abnormality determination unit 124 determines that the detection data is normal when the discrete degree is equal to or less than a predetermined threshold.

ここで、判定の基準となる閾値は、予め設定されたものである。或いは、テストデータがある場合は、テストデータ中の特定のデータ、すなわち、異常が発生した際のデータにおける異常度を閾値として設定してもよい。または、テストデータにおける異常度が、適当な確率分布に従うと考え、その上位５％或いは上位１％などの値を閾値として設定してもよい。異常判定部１２４による判定結果は、異常検知結果として、通信処理部１１を介して、例えば、端末装置２０に出力される。 Here, the threshold value used as the criterion for determination is set in advance. Alternatively, when there is test data, the degree of abnormality in specific data in the test data, that is, data when an abnormality occurs may be set as a threshold value. Alternatively, the degree of abnormality in the test data may be considered to follow an appropriate probability distribution, and a value such as the top 5% or the top 1% may be set as the threshold value. The determination result by the abnormality determination unit 124 is output to the terminal device 20 through the communication processing unit 11 as an abnormality detection result, for example.
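The threshold selection from test data can be sketched as follows. As a simplification, this sketch uses the empirical distribution of the test scores instead of fitting a probability distribution, which is a swapped-in technique and not what the text prescribes; the function names and the synthetic scores are hypothetical.

```python
def percentile_threshold(test_scores, top_percent):
    """Largest test score that roughly top_percent of the scores exceed."""
    ordered = sorted(test_scores)
    # Integer arithmetic avoids floating-point rounding in the cut position.
    index = len(ordered) * (100 - top_percent) // 100 - 1
    return ordered[max(index, 0)]

def is_abnormal(score, threshold):
    """Abnormal only when the score strictly exceeds the threshold."""
    return score > threshold

# Hypothetical degrees of abnormality observed on test data: 1.0, 2.0, ..., 100.0.
test_scores = [float(i) for i in range(1, 101)]

threshold = percentile_threshold(test_scores, top_percent=5)
flagged = [s for s in test_scores if is_abnormal(s, threshold)]
```

The strict comparison in `is_abnormal` mirrors the rule of the abnormality determination unit 124: a score equal to the threshold is still treated as normal.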

［異常検知処理の流れ］
次に、異常検知装置１０が実行する異常検知処理について説明する。図７は、異常検知装置１０が実行する異常検知処理の処理手順を示すフローチャートである。 [Flow of error detection processing]
Next, the abnormality detection process executed by the abnormality detection device 10 will be described. FIG. 7 is a flowchart showing a processing procedure of an abnormality detection process executed by the abnormality detection device 10.

まず、異常検知装置１０では、関係性推定部１２１が、入力されたデータに対して、データ間の関係性を推定し、データ間の関係性を示すパラメータを算出する関係性推定処理を行う（ステップＳ１）。関係性推定部１２１は、それを出力した機器等が正常な状態のデータ、言い換えると、異常な状態のデータを含まないデータを、パラメータ推定のために用いる。データ間に成り立つ関係性が予め与えられている場合には、本ステップＳ１を省略することができる。 First, in the abnormality detection apparatus 10, the relationship estimation unit 121 performs a relationship estimation process for estimating the relationship between data with respect to input data and calculating a parameter indicating the relationship between data ( Step S1). The relationship estimation unit 121 uses data in which the device or the like that outputs the data is in a normal state, in other words, data that does not include data in an abnormal state for parameter estimation. When the relationship that holds between the data is given in advance, this step S1 can be omitted.

そして、乖離値ベクトル計算部１２２は、データ間の関係性に基づいて、検知対象である検知データの集合及び比較データの集合におけるデータ間の乖離値ベクトルを計算する乖離値ベクトル計算処理を実行する（ステップＳ２）。ここで、乖離値ベクトル計算部１２２は、比較データが予め与えられている場合、該比較データの集合におけるデータ間の乖離値ベクトルを計算して、乖離値ベクトル記憶部１３１に記憶する。 Then, the divergence value vector calculation unit 122 executes divergence value vector calculation processing for calculating a divergence value vector between the data in the set of detection data to be detected and the set of comparison data based on the relationship between the data. (Step S2). Here, when the comparison data is given in advance, the divergence value vector calculation unit 122 calculates a divergence value vector between data in the set of comparison data, and stores it in the divergence value vector storage unit 131.

続いて、異常度計算部１２３は、検知データの乖離値ベクトルの、比較データの乖離値ベクトルの集合からの離散度合を、異常度として計算する異常度計算処理を行う（ステップＳ３）。なお、異常度計算部１２３は、比較データの乖離値ベクトルが予め計算されて乖離値ベクトル記憶部１３１に記憶されている場合、乖離値ベクトル記憶部１３１から比較データの乖離値ベクトルを読み出して、比較データの乖離値ベクトルの集合を取得する。 Subsequently, the abnormality degree calculation unit 123 performs an abnormality degree calculation process for calculating the degree of discreteness of the deviation value vector of the detection data from the set of deviation value vectors of the comparison data as an abnormality degree (step S3). When the deviation value vector of the comparison data is calculated in advance and stored in the deviation value vector storage unit 131, the abnormality degree calculation unit 123 reads the deviation value vector of the comparison data from the deviation value vector storage unit 131, and A set of deviation value vectors of comparison data is acquired.

そして、異常判定部１２４は、異常度計算部１２３が計算した離散度合を基に、検知データが異常であるか否かを判定する異常判定処理を行う（ステップＳ４）。この場合、異常判定部１２４は、異常度計算部１２３が計算した離散度合が所定の閾値を超えた場合に、検知データは異常であることを判定する。一方、異常判定部１２４は、離散度合が所定の閾値以下である場合に、検知データは正常であることを判定する。異常判定部１２４は、判定結果を異常検知結果として、通信処理部１１を介して端末装置２０に出力し、異常検知処理を終了する。 And the abnormality determination part 124 performs the abnormality determination process which determines whether detection data are abnormal based on the discrete degree which the abnormality degree calculation part 123 calculated (step S4). In this case, the abnormality determination unit 124 determines that the detected data is abnormal when the discrete degree calculated by the abnormality degree calculation unit 123 exceeds a predetermined threshold. On the other hand, the abnormality determination unit 124 determines that the detection data is normal when the discrete degree is equal to or less than a predetermined threshold. The abnormality determination unit 124 outputs the determination result as an abnormality detection result to the terminal device 20 via the communication processing unit 11, and ends the abnormality detection process.

［異常検知処理の具体例］
図８は、実施の形態１の異常検知処理を説明する図である。図８は、データとして、Ｘ及びＹの組が与えられたとして、座標平面上にその組をプロットしたものである。図８の白丸は、正常な状態の比較データに対応する。また、点Ｐｂは、相関関係を維持したままで、それまでには存在していなかった値をとった場合の例である。点Ｐｒは、相関関係が崩れた場合の例である。また、正常である比較データ（図８の白丸）を基に、Ｘ及びＹの関係性として、直線Ｌｔで示される「Ｙ＝ａＸ＋ｂ」という単回帰が与えられている。 [Specific examples of abnormality detection processing]
FIG. 8 is a diagram for explaining the abnormality detection process of the first embodiment. FIG. 8 plots the set on the coordinate plane assuming that a set of X and Y is given as data. White circles in FIG. 8 correspond to comparison data in a normal state. Moreover, the point Pb is an example in the case of taking a value that did not exist until then while maintaining the correlation. The point Pr is an example when the correlation is broken. Further, based on normal comparison data (white circles in FIG. 8), a single regression “Y = aX + b” indicated by a straight line Lt is given as the relationship between X and Y.

図８の示す点Ｐｂは、直線Ｌｔ上に位置し、正常である場合に成り立つ相関関係を維持しているため、正常であることが想定される。ここで、従来用いられていたＬＯＦでは、データ間の関係性を考慮しておらず、白丸の密度が低い点に存在する点Ｐｂ及び点Ｐｒは、いずれも異常であると検知される。 Since the point Pb shown in FIG. 8 is located on the straight line Lt and maintains the correlation that is established when it is normal, it is assumed to be normal. Here, in the conventional LOF, the relationship between data is not taken into consideration, and the points Pb and Pr existing at the point where the density of white circles is low are detected as abnormal.

これに対し、本実施の形態１では、データ間の関係性に基づいて、検知データの集合におけるデータ間の乖離値ベクトルと、比較データの集合におけるデータ間の乖離値ベクトルと、を計算し、検知データの乖離値ベクトルの、比較データの乖離値ベクトルの集合からの離散度合を、異常度として異常判定を行う。 On the other hand, in the first embodiment, based on the relationship between the data, the divergence value vector between the data in the set of detection data and the divergence value vector between the data in the set of comparison data are calculated, Anomaly determination is performed with the degree of discreteness of the deviation value vector of the detection data from the set of deviation value vectors of the comparison data as the degree of abnormality.

例えば、点Ｐｂが検知データである場合を例とする。この点Ｐｂは、直線Ｌｔ上に位置するため、点Ｐｂに示す「Ｘ，Ｙ」は、直線Ｌｔで示される「Ｙ＝ａＸ＋ｂ」の関係を有していると言える。したがって、この点Ｐｂに示す「Ｘ，Ｙ」について、直線Ｌｔで示される「Ｙ＝ａＸ＋ｂ」に対する乖離値ベクトルを計算し、その乖離値ベクトルの、正常である比較データ（白丸）の乖離値ベクトルの集合からの離散度合を計算すると、ほぼ０となり、点Ｐｂは正常であることを検知できる。 For example, a case where the point Pb is detection data is taken as an example. Since the point Pb is located on the straight line Lt, it can be said that “X, Y” indicated by the point Pb has a relationship of “Y = aX + b” indicated by the straight line Lt. Therefore, a divergence value vector with respect to “Y = aX + b” indicated by the straight line Lt is calculated for “X, Y” indicated by this point Pb, and the divergence value vector of normal comparison data (white circle) of the divergence value vector. When the degree of discreteness from the set is calculated, it becomes almost 0, and it can be detected that the point Pb is normal.

一方、点Ｐｒが検知データである場合について説明する。この点Ｐｒは、直線Ｌｔから離れているため、点Ｐｒに示す「Ｘ，Ｙ」は、「Ｙ＝ａＸ＋ｂ」の関係を有していないと言える。したがって、この点Ｐｒに示す「Ｘ，Ｙ」の、直線Ｌｔで示される「Ｙ＝ａＸ＋ｂ」に対する乖離値ベクトルを計算し、その乖離値ベクトルの、正常である比較データ（白丸）の乖離値ベクトルの集合からの離散度合を計算すると、その値は大きくなり、Ｐｒは異常であることを検知できる。 On the other hand, a case where the point Pr is detection data will be described. Since this point Pr is away from the straight line Lt, it can be said that “X, Y” shown at the point Pr does not have a relationship of “Y = aX + b”. Therefore, the divergence value vector of “X, Y” indicated by the point Pr with respect to “Y = aX + b” indicated by the straight line Lt is calculated, and the divergence value vector of the normal comparison data (white circle) of the divergence value vector is calculated. When the degree of discreteness from the set is calculated, the value becomes large, and it can be detected that Pr is abnormal.
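The contrast between the points Pb and Pr in FIG. 8 can be reproduced numerically with a small sketch. The coefficients of the line, the noise values, the coordinates of both points, and the choice of the k-NN distance as the discrete degree are all assumptions chosen for illustration.

```python
import math

def knn_distance(query, reference, k):
    """Distance from query to its k-th nearest vector in reference."""
    return sorted(math.dist(query, r) for r in reference)[k - 1]

# Assumed relationship Y = aX + b and hypothetical normal comparison data.
a, b = 2.0, 1.0
noise = [0.1, -0.1, 0.05, -0.05, 0.0]
comparison = [(float(x), a * x + b + e) for x, e in zip(range(1, 6), noise)]

point_pb = (20.0, 41.0)  # on the line, but far from the comparison data
point_pr = (3.0, 12.0)   # within the X range, but off the line

def divergence(point):
    """One-dimensional divergence value vector: residual of Y from aX + b."""
    x, y = point
    return (y - (a * x + b),)

reference = [divergence(p) for p in comparison]

# Discrete degree measured in divergence space.
score_pb = knn_distance(divergence(point_pb), reference, k=2)
score_pr = knn_distance(divergence(point_pr), reference, k=2)

# For contrast: the same k-NN distance measured directly in the (X, Y) plane.
raw_pb = knn_distance(point_pb, comparison, k=2)
raw_pr = knn_distance(point_pr, comparison, k=2)
```

In divergence space Pb scores close to zero and Pr scores high, whereas in the raw plane Pb is the more distant of the two, which matches the false detection attributed to the conventional approach.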

このように、異常検知装置１０は、乖離値ベクトルという概念を導入し、検知対象のデータの乖離値ベクトルと、正常である比較データの乖離値ベクトルの集合との空間的な距離や密度に基づき離散度合（異常度）を計算し、検知データの異常の有無を判定する。したがって、異常検知装置１０は、データ間に相関がある場合に、相関に乗っているが、比較データの集合から外れた、正常であると想定できるデータ（例えば、点Ｐｂ）を、正常であると検知することができる。 As described above, the abnormality detection device 10 introduces the concept of the divergence value vector, calculates the discrete degree (degree of abnormality) based on the spatial distance or density between the divergence value vector of the detection target data and the set of divergence value vectors of normal comparison data, and determines whether the detection data is abnormal. Therefore, when the data are correlated, the abnormality detection device 10 can detect as normal any data that follows the correlation but lies outside the set of comparison data and can be assumed to be normal (for example, the point Pb).

また、図１４や図１５のように、データ間に単なる相関関係でない、複雑な関係性が見られる場合であっても、データ間の関係性からの乖離値ベクトルという概念により、異常検知を精度よく実行することができる。 Moreover, even when the data show a complicated relationship that is not a simple correlation, as in FIG. 14 and FIG. 15, abnormality detection can be performed accurately through the concept of the divergence value vector, which measures deviation from the relationship between data.

［実施の形態１の効果］
上記のように、実施の形態１では、乖離値ベクトルという概念を導入し、データ間の関係性に基づいて計算した、検知データの乖離値ベクトルと、正常である比較データの乖離値ベクトルの集合との空間的な距離や密度に基づき離散度合（異常度）を計算し、検知データの異常の有無を判定するため、データ間の関係性に基づいた検知対象データの異常検知を精度よく実行することができる。 [Effect of Embodiment 1]
As described above, the first embodiment introduces the concept of the divergence value vector. The discrete degree (degree of abnormality) is calculated based on the spatial distance or density between the divergence value vector of the detection data, calculated based on the relationship between data, and the set of divergence value vectors of normal comparison data, and the presence or absence of an abnormality in the detection data is determined from it. Abnormality detection of the detection target data based on the relationship between data can therefore be performed accurately.

［実施の形態２］
次に、実施の形態２について説明する。実施の形態２では、離散度合として、検知データ及び比較データの乖離値ベクトルに基づいたマハラノビス距離を計算し、異常の有無を判定する。なお、実施の形態２に係る異常検知装置は、図１に示す異常検知装置１０と同等の構成を有する。 [Embodiment 2]
Next, a second embodiment will be described. In the second embodiment, the Mahalanobis distance based on the deviation value vector of the detection data and the comparison data is calculated as the discrete degree, and the presence / absence of an abnormality is determined. Note that the abnormality detection device according to the second embodiment has a configuration equivalent to that of the abnormality detection device 10 shown in FIG.

実施の形態２では、実施の形態１と同様に、乖離値ベクトル計算部１２２が、データ間の関係性に基づいて、検知対象である検知データの集合におけるデータ間の乖離値ベクトルを計算する。そして、乖離値ベクトル計算部１２２は、比較データの集合におけるデータ間の乖離値ベクトルを計算する。なお、実施の形態１と同様に、乖離値ベクトル計算部１２２は、比較データが予め与えられている場合、該比較データの集合におけるデータ間の乖離値ベクトルを計算して、乖離値ベクトル記憶部１３１に記憶してもよい。 In the second embodiment, as in the first embodiment, the divergence value vector calculation unit 122 calculates a divergence value vector between data in a set of detection data to be detected based on the relationship between data. Then, the divergence value vector calculation unit 122 calculates a divergence value vector between data in the set of comparison data. As in the first embodiment, the divergence value vector calculation unit 122 calculates the divergence value vector between data in the set of comparison data when the comparison data is given in advance, and the divergence value vector storage unit 131 may be stored.

そして、異常度計算部１２３は、比較データの乖離値ベクトルが多次元正規分布に従うと仮定し、それらの乖離値ベクトルの平均と共分散行列とを計算する。続いて、異常度計算部１２３は、（３）式で定義されるマハラノビス距離を計算し、このマハラノビス距離を離散度合（異常度）として出力する。 Then, the degree-of-abnormality calculation unit 123 assumes that the divergence value vectors of the comparison data follow a multidimensional normal distribution, and calculates the mean and the covariance matrix of these divergence value vectors. Subsequently, the abnormality degree calculation unit 123 calculates the Mahalanobis distance defined by equation (3), and outputs this Mahalanobis distance as the discrete degree (degree of abnormality).

異常判定部１２４は、異常度計算部１２３が計算したマハラノビス距離が一定の閾値を超えた場合に、検知データは異常であることを判定する。一方、異常判定部１２４は、異常度計算部１２３が計算したマハラノビス距離が一定の閾値以下である場合には、検知データは正常であることを判定する。なお、本実施の形態２では、比較データの乖離値ベクトルが多次元正規分布に従うと仮定しており、この場合、乖離値ベクトルのマハラノビス距離は近似的にχ二乗分布に従うため、χ二乗分布に基づき閾値を決定することができる。このような方法は、ホテリングのＴ^２検定と呼ばれている（「竹内啓，統計学辞典 P112，東洋経済新聞社，1989」参照）。 The abnormality determination unit 124 determines that the detection data is abnormal when the Mahalanobis distance calculated by the abnormality degree calculation unit 123 exceeds a certain threshold. On the other hand, the abnormality determination unit 124 determines that the detection data is normal when the Mahalanobis distance calculated by the abnormality degree calculation unit 123 is equal to or less than that threshold. In the second embodiment, the divergence value vectors of the comparison data are assumed to follow a multidimensional normal distribution. In this case, the Mahalanobis distance of a divergence value vector approximately follows a chi-squared distribution, so the threshold can be determined based on the chi-squared distribution. This method is called Hotelling's T^2 test (see “Kei Takeuchi, Dictionary of Statistics, p. 112, Toyo Keizai Shimbunsha, 1989”).
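The Mahalanobis-based determination can be sketched for two-dimensional divergence value vectors. Equation (3) is not reproduced in this text, so the sketch assumes the standard squared form (v − μ)ᵀ Σ⁻¹ (v − μ), which is the quantity that approximately follows a chi-squared distribution. For two degrees of freedom the chi-squared quantile has the closed form −2 ln α, which keeps the example within the standard library; the function names, the sample vectors, and the significance level are hypothetical.

```python
import math

def mean_and_covariance(vectors):
    """Sample mean and 2 x 2 sample covariance of two-dimensional vectors."""
    n = len(vectors)
    mx = sum(v[0] for v in vectors) / n
    my = sum(v[1] for v in vectors) / n
    sxx = sum((v[0] - mx) ** 2 for v in vectors) / (n - 1)
    syy = sum((v[1] - my) ** 2 for v in vectors) / (n - 1)
    sxy = sum((v[0] - mx) * (v[1] - my) for v in vectors) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))

def mahalanobis_squared(v, mean, cov):
    """Squared Mahalanobis distance, using the explicit 2 x 2 inverse."""
    dx, dy = v[0] - mean[0], v[1] - mean[1]
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

def chi2_threshold_2dof(alpha):
    """Upper-alpha quantile of the chi-squared distribution with 2 degrees
    of freedom, whose CDF is 1 - exp(-x / 2)."""
    return -2.0 * math.log(alpha)

# Hypothetical divergence value vectors of normal comparison data.
comparison_vectors = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
mean, cov = mean_and_covariance(comparison_vectors)

threshold = chi2_threshold_2dof(0.05)
far_score = mahalanobis_squared((2.0, 0.0), mean, cov)
near_score = mahalanobis_squared((0.5, 0.0), mean, cov)
far_is_abnormal = far_score > threshold
near_is_abnormal = near_score > threshold
```

For divergence value vectors of other dimensions the closed-form quantile no longer applies, and a chi-squared table or a statistics library would be needed instead.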

［実施の形態２の効果］
このように、実施の形態２においては、離散度合として、マハラノビス距離を計算し、計算したマハラノビス距離と所定の閾値との比較結果によって、異常の有無を判定する。マハラノビス距離は、データ間の関係性に基づく検知データ及び比較データの乖離値ベクトルを基に計算されたものであるため、実施の形態２は、実施の形態１と同様に、データ間の関係性に基づいた検知対象データの異常検知を精度よく実行することができる。 [Effect of Embodiment 2]
As described above, in the second embodiment, the Mahalanobis distance is calculated as the discrete degree, and the presence or absence of abnormality is determined based on the comparison result between the calculated Mahalanobis distance and the predetermined threshold value. Since the Mahalanobis distance is calculated based on the deviation value vector of the detection data and the comparison data based on the relationship between the data, the second embodiment is similar to the first embodiment in the relationship between the data. It is possible to accurately detect abnormality of detection target data based on the above.

［実施の形態３］
次に、実施の形態３について説明する。この実施の形態３に係る異常検知装置は、図１に示す異常検知装置１０と同等の構成を有する。 [Embodiment 3]
Next, Embodiment 3 will be described. The abnormality detection device according to Embodiment 3 has a configuration equivalent to that of abnormality detection device 10 shown in FIG.

また、実施の形態１と同様に、乖離値ベクトル計算部１２２は、データ間の関係性に基づいて、検知対象である検知データの集合におけるデータ間の乖離値ベクトルを計算する。そして、乖離値ベクトル計算部１２２は、比較データの集合におけるデータ間の乖離値ベクトルを計算する。なお、実施の形態１と同様に、乖離値ベクトル計算部１２２は、比較データが予め与えられている場合、該比較データの集合におけるデータ間の乖離値ベクトルを計算して、乖離値ベクトル記憶部１３１に記憶してもよい。そこで、次に、異常度計算部１２３の処理を説明する。 Similarly to the first embodiment, the divergence value vector calculation unit 122 calculates a divergence value vector between data in a set of detection data to be detected based on the relationship between data. Then, the divergence value vector calculation unit 122 calculates a divergence value vector between data in the set of comparison data. As in the first embodiment, the divergence value vector calculation unit 122 calculates the divergence value vector between data in the set of comparison data when the comparison data is given in advance, and the divergence value vector storage unit 131 may be stored. Then, next, the process of the abnormality degree calculation part 123 is demonstrated.

[Processing of the abnormality degree calculation unit]
In the third embodiment, the abnormality degree calculation unit 123 further estimates a region containing the set of divergence value vectors of the comparison data, based on the concept of the One-class Support Vector Machine (hereinafter abbreviated as "One-class SVM"; for details, see B. Scholkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson, "Estimating the Support of a High-Dimensional Distribution", Neural Computation, 13(7):1443-1471, 2001).

Specifically, the process of obtaining the discrete degree (degree of abnormality) is described with reference to FIG. 9. FIG. 9 is a diagram explaining the abnormality degree calculation processing according to the third embodiment. FIG. 9 shows the divergence value vectors of the data mapped into a predetermined high-dimensional space.

Based on the One-class SVM, the abnormality degree calculation unit 123 maps the divergence value vectors of the comparison data, which are normal data, into a high-dimensional space (dimension 1 and dimension 2 of φ in FIG. 9). The abnormality degree calculation unit 123 then obtains a plane (hyperplane) that maximizes the distance (margin) from the origin to the points corresponding to the mapped divergence value vectors of the comparison data. In the example of FIG. 9, this plane is shown as the hyperplane Le. The hyperplane Le corresponds to the boundary of the set of divergence value vectors of the normal comparison data; in practice, the points representing the mapped divergence value vectors of the comparison data lie on the side of the hyperplane Le that is away from the origin.

Subsequently, the abnormality degree calculation unit 123 maps the divergence value vectors of the detection target data into the same high-dimensional space into which the comparison data were mapped. For example, as shown in FIG. 9, the points representing the mapped divergence value vectors of the detection data are divided into a group R2 on the origin side of the hyperplane Le and a group R3 not on the origin side. The abnormality degree calculation unit 123 calculates the discrete degree (degree of abnormality) based on whether the point corresponding to the mapped divergence value vector of the detection data is on the origin side as viewed from the hyperplane Le.

The calculation processing in the abnormality degree calculation unit 123 is described with reference to FIG. 10. FIG. 10 is a diagram explaining the abnormality degree calculation processing according to the third embodiment. First, let the divergence value vectors of the comparison data be "e_1, e_2, ..., e_M". For these divergence value vectors, the abnormality degree calculation unit 123 solves the minimization problem of minimizing the expression G shown in FIG. 10 (see expression (A)) under the conditions shown in expressions (B) and (C). The symbol "<,>" denotes an inner product.

In this problem, where "d_m" denotes the distance for each mapped data point "φ(vector e_m)", the hyperplane sought is the one that maximizes the smallest d_m. In other words, the problem seeks the hyperplane parameters "vector w" and "ρ" that maximize the distance to the data point closest to the hyperplane.

This minimization problem can be solved by the method of Lagrange multipliers, minimizing "L" shown in expression (4) below. The first line of expression (4) is G itself, the second line reflects the constraint shown in expression (B) of FIG. 10, and the third line reflects the constraint shown in expression (C) of FIG. 10.
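FIG. 10 and expression (4) are not reproduced in this text. As one plausible reading, the standard One-class SVM formulation from the cited paper by Scholkopf et al. is shown here. The slack variables ξ_m, the trade-off parameter ν in (0, 1], and the multipliers α_m and β_m are taken from that paper and are assumptions; the expressions in the drawings may differ in detail.

```latex
% Primal problem: minimize G over w, xi, rho subject to (B) and (C)
\begin{align}
G &= \frac{1}{2}\lVert \mathbf{w} \rVert^{2}
     + \frac{1}{\nu M}\sum_{m=1}^{M}\xi_m - \rho \tag{A} \\
\langle \mathbf{w}, \phi(\mathbf{e}_m) \rangle &\ge \rho - \xi_m,
     \quad m = 1, \dots, M \tag{B} \\
\xi_m &\ge 0, \quad m = 1, \dots, M \tag{C}
\end{align}

% Lagrangian: line 1 is G, line 2 carries (B), line 3 carries (C)
\begin{align}
L ={}& \frac{1}{2}\lVert \mathbf{w} \rVert^{2}
       + \frac{1}{\nu M}\sum_{m=1}^{M}\xi_m - \rho \notag \\
     & - \sum_{m=1}^{M}\alpha_m\bigl(\langle \mathbf{w}, \phi(\mathbf{e}_m) \rangle
       - \rho + \xi_m\bigr) \notag \\
     & - \sum_{m=1}^{M}\beta_m \xi_m,
       \qquad \alpha_m \ge 0,\ \beta_m \ge 0 \tag{4}
\end{align}
```

Under this formulation, setting the derivatives of L with respect to w, ξ_m, and ρ to zero gives w = Σ_m α_m φ(e_m), α_m = 1/(νM) - β_m, and Σ_m α_m = 1, so that the inner product of w with any mapped vector can be evaluated from inner products between mapped data points alone.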

FIG. 11 is a diagram explaining the abnormality degree calculation processing and the abnormality determination processing according to the third embodiment. Letting "e'" be the divergence value vector of the detection data, the abnormality degree calculation unit 123 calculates the degree of abnormality for "e'" using the expression f(e') shown in FIG. 11. The expression f(e') applies the parameters obtained through expression (4) and calculates a degree of abnormality based on the distance between the points obtained by mapping the divergence value vectors "e_m" of the comparison data into the high-dimensional space and the point obtained by mapping the divergence value vector "e'" of the detection data into the same space. By performing the calculation with f(e'), the abnormality degree calculation unit 123 obtains a degree of abnormality indicating whether the point corresponding to the mapped divergence value vector of the detection data is on the origin side as viewed from the hyperplane Le.

In this way, in accordance with the One-class SVM described above, the abnormality degree calculation unit 123 uses the expression f(e') to calculate whether the point representing the mapped divergence value vector of the detection data is on the origin side or not on the origin side as viewed from the plane (for example, the hyperplane Le).

[Processing of the abnormality determination unit]
In the third embodiment, the abnormality determination unit 124 determines that the detection data is abnormal when the point representing the mapped divergence value vector of the detection data is on the origin side as viewed from the plane. Conversely, the abnormality determination unit 124 determines that the detection data is normal when that point is not on the origin side as viewed from the plane. For example, among the points in FIG. 9 representing the mapped divergence value vectors of the detection data, the abnormality determination unit 124 determines that the detection data in the group R2, which is on the origin side as viewed from the hyperplane Le, is abnormal. The abnormality determination unit 124 determines that the detection data in the group R3, which is not on the origin side as viewed from the hyperplane Le, is normal (see the frame B1 in FIG. 9).

Here, the abnormality determination unit 124 determines whether the detection data is abnormal by comparing the degree of abnormality obtained for the detection data by the expression f(e') with the parameter (ρ tilde) corresponding to the hyperplane obtained by the method of Lagrange multipliers (expression (4)). That is, as shown in FIG. 11, when the degree of abnormality f(e') for the detection data is smaller than the parameter (ρ tilde) obtained from expression (4), the abnormality determination unit 124 judges that the detection data lies on the origin side of the hyperplane Le and determines that the detection data is abnormal. Conversely, when the degree of abnormality f(e') for the detection data is larger than the parameter (ρ tilde), the abnormality determination unit 124 judges that the detection data does not lie on the origin side of the hyperplane Le and determines that the detection data is normal.
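The comparison of f(e') with the parameter ρ tilde can be sketched as follows. FIG. 11 is not reproduced in this text, so the sketch assumes the kernel-expansion form f(e') = Σ_m α_m k(e_m, e') from the cited One-class SVM paper, with a Gaussian kernel. The function names, the kernel choice, and the hand-picked values of `alphas`, `rho`, and `gamma` are hypothetical; in practice the coefficients and ρ would come from solving the optimization problem.

```python
import math

def rbf_kernel(x, y, gamma):
    """Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2) (assumed kernel choice)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def abnormality_degree(e_prime, comparison_vectors, alphas, gamma):
    """f(e') = sum_m alpha_m * k(e_m, e'), the assumed kernel-expansion form."""
    return sum(alpha * rbf_kernel(e_m, e_prime, gamma)
               for alpha, e_m in zip(alphas, comparison_vectors))

def is_abnormal(e_prime, comparison_vectors, alphas, rho, gamma):
    """Abnormal when f(e') < rho, i.e. the mapped point is on the origin side."""
    return abnormality_degree(e_prime, comparison_vectors, alphas, gamma) < rho

# Hypothetical values for illustration only.
comparison_vectors = [(0.0, 0.0), (0.1, 0.0)]  # divergence value vectors e_m
alphas = [0.5, 0.5]                            # multipliers alpha_m (sum to 1)
rho = 0.5                                      # stands in for the parameter rho tilde
gamma = 1.0                                    # kernel width

near = (0.05, 0.0)  # close to the comparison data
far = (3.0, 3.0)    # far from the comparison data

print(is_abnormal(near, comparison_vectors, alphas, rho, gamma))  # False (normal)
print(is_abnormal(far, comparison_vectors, alphas, rho, gamma))   # True (abnormal)
```

With a Gaussian kernel, f(e') is large near the comparison data and decays toward zero far from it, which is why a value below ρ corresponds to the origin side of the hyperplane in this sketch.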

[Effects of Embodiment 3]
As described above, in the third embodiment, the divergence value vectors of the comparison data are mapped into a high-dimensional space, and the hyperplane that maximizes the distance from the origin is obtained. Then, when the divergence value vector of the detection data is mapped into the high-dimensional space, the degree of abnormality is calculated based on whether the point corresponding to the mapped divergence value vector is on the origin side as viewed from the hyperplane, and the presence or absence of an abnormality is determined. That is, in the third embodiment as in the first embodiment, the presence or absence of an abnormality in the detection data is determined from the distance between the divergence value vector of the detection data and the set of divergence value vectors of the normal comparison data, both calculated based on the relationship among the data. Abnormality detection of the detection target data based on the relationship among the data can therefore be performed accurately.

[System configuration of the embodiments]
Each component of the abnormality detection device 10 illustrated in FIG. 1 is functionally conceptual and does not necessarily need to be physically configured as illustrated. That is, the specific form of distribution and integration of the functions of the abnormality detection device 10 is not limited to the illustrated one, and all or part of the functions can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.

All or any part of the processes performed in the abnormality detection device 10 may be realized by a CPU (Central Processing Unit) and a program analyzed and executed by the CPU. The processes performed in the abnormality detection device 10 may also be realized as hardware using wired logic.

Among the processes described in the embodiments, all or part of the processes described as being performed automatically can also be performed manually. Alternatively, all or part of the processes described as being performed manually can be performed automatically by a known method. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters described above and shown in the drawings can be changed as appropriate unless otherwise specified.

[Program]
FIG. 12 is a diagram illustrating an example of a computer that realizes the abnormality detection device 10 by executing a program. The computer 1000 includes, for example, a memory 1010 and a CPU 1020. The computer 1000 also includes a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These units are connected by a bus 1080.

The memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1090. The disk drive interface 1040 is connected to a disk drive 1100. A removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100. The serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120. The video adapter 1060 is connected to, for example, a display 1130.

The hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, the program that defines each process of the abnormality detection device 10 is implemented as the program module 1093, in which code executable by the computer 1000 is described. The program module 1093 is stored in, for example, the hard disk drive 1090. For example, the program module 1093 for executing processing equivalent to the functional configuration of the abnormality detection device 10 is stored in the hard disk drive 1090. The hard disk drive 1090 may be replaced by an SSD (Solid State Drive).

The setting data used in the processing of the embodiments described above is stored as the program data 1094 in, for example, the memory 1010 or the hard disk drive 1090. The CPU 1020 reads the program module 1093 and the program data 1094 stored in the memory 1010 or the hard disk drive 1090 into the RAM 1012 as necessary and executes them.

The program module 1093 and the program data 1094 are not limited to being stored in the hard disk drive 1090. They may be stored in, for example, a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program module 1093 and the program data 1094 may be stored in another computer connected via a network (LAN, WAN, or the like) and read by the CPU 1020 from that computer via the network interface 1070.

Embodiments to which the invention made by the present inventors is applied have been described above, but the present invention is not limited by the description and drawings that form part of the disclosure of the present invention according to these embodiments. That is, other embodiments, examples, operational techniques, and the like made by those skilled in the art based on these embodiments are all included in the scope of the present invention.

Description of Symbols
10 Abnormality detection device
11 Communication processing unit
12 Control unit
13 Storage unit
121 Relationship estimation unit
122 Divergence value vector calculation unit
123 Abnormality degree calculation unit
124 Abnormality determination unit
131 Divergence value vector storage unit

Claims

An abnormality detection device comprising:
a divergence value vector calculation unit that calculates, based on a relationship among data, a divergence value vector representing the divergence of detection data, which is a detection target, from the relationship among the data, and a divergence value vector representing the divergence of comparison data, which is a comparison target, from the relationship among the data;
an abnormality degree calculation unit that calculates, as a degree indicating abnormality, a discrete degree of the divergence value vector of the detection data from a set of the divergence value vectors of the comparison data; and
an abnormality determination unit that determines that the detection data is abnormal when the discrete degree exceeds a predetermined threshold.

The abnormality detection device according to claim 1, further comprising a divergence value vector storage unit that stores the divergence value vector calculated by the divergence value vector calculation unit,
wherein the abnormality degree calculation unit calculates the discrete degree using the divergence value vector stored in the divergence value vector storage unit.

The abnormality detection device according to claim 1 or 2, further comprising a relationship estimation unit that estimates the relationship among the data and calculates a parameter indicating the relationship among the data,
wherein the divergence value vector calculation unit calculates the divergence value vector by applying the parameter calculated by the relationship estimation unit to the relationship among the data.

The abnormality detection device according to any one of claims 1 to 3, wherein the abnormality degree calculation unit calculates the discrete degree based on a spatial distance, a density, or the like of the divergence value vector of the detection data relative to the set of divergence value vectors of the comparison data.

The abnormality detection device, wherein the abnormality degree calculation unit calculates the mean of the divergence value vectors of the comparison data and the covariance matrix of the divergence value vectors of the comparison data, and calculates, as the discrete degree, the Mahalanobis distance obtained from the divergence value vector of the detection data, the mean of the divergence value vectors of the comparison data, and the covariance matrix of the divergence value vectors of the comparison data.

The abnormality detection device according to any one of claims 1 to 3, wherein the abnormality degree calculation unit obtains a plane that maximizes the distance from the origin to the points corresponding to the mapped divergence value vectors when the divergence value vectors of the comparison data are mapped into a high-dimensional space, and calculates the discrete degree based on whether the point corresponding to the mapped divergence value vector, when the divergence value vector of the detection data is mapped into the high-dimensional space, is on the origin side as viewed from the plane.

An abnormality detection method executed by an abnormality detection device that detects the presence or absence of an abnormality in a set of detection data to be detected, the method comprising:
calculating, based on a relationship among data, a divergence value vector between data in the set of detection data and a divergence value vector between data in a set of comparison data to be compared;
calculating, as a degree indicating abnormality, a discrete degree of the divergence value vector of the detection data from the set of divergence value vectors of the comparison data; and
determining that the detection data is abnormal when the discrete degree exceeds a predetermined threshold.

An abnormality detection program that causes a computer to execute:
calculating, based on a relationship among data, a divergence value vector between data in a set of detection data to be detected and a divergence value vector between data in a set of comparison data to be compared;
calculating, as a degree indicating abnormality, a discrete degree of the divergence value vector of the detection data from the distribution of the set of divergence value vectors of the comparison data; and
determining that the detection data is abnormal when the discrete degree exceeds a predetermined threshold.