JP2010092203A

JP2010092203A - Failure detection device and failure detection method

Info

Publication number: JP2010092203A
Application number: JP2008260420A
Authority: JP
Inventors: Toshiaki Hirose; 俊亮広瀬; Kenji Yamanishi; 健司山西
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2008-10-07
Filing date: 2008-10-07
Publication date: 2010-04-22

Abstract

PROBLEM TO BE SOLVED: To provide a failure detection device that properly detects failures.

SOLUTION: The failure detection device 100 includes a data-based failure score calculation part 122 and category-based overview score calculation parts 123-1 to 123-n, one for each of several different classification categories, each category being based on a combination of one or more data attributes. The data-based failure score calculation part 122 learns the probability distribution of a data sequence to be examined and calculates, for each data item in the sequence, a failure score showing the level of deviation between the data predicted from the learned probability distribution and the actual data. The category-based overview score calculation parts 123-1 to 123-n each learn, for every group obtained by classifying the data in the sequence according to the corresponding category, the conditional probability distribution of the failure scores calculated for the data belonging to that group, and calculate from the learned conditional probability distribution an overview score showing the level of failure.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a method and a device for detecting anomalies, and in particular to an anomaly detection device and method that learn the occurrence probability distribution of a data sequence and detect anomalies by comparing the data predicted from the learned distribution with the actual data.

Techniques for detecting anomalies in a data sequence, for example detecting unauthorized access from a Web access log, detecting anomalous user behavior (such as fraudulent actions) from a PC event log, or detecting a device anomaly (such as a failure) from logs output by the device (syslog and the like), are broadly divided into rule-based methods and non-rule-based methods.

A rule-based method, as in an IDS (Intrusion Detection System), defines in advance rules stating which kinds of behavior are anomalous, and detects data or behavior that matches a rule as an anomaly. A non-rule-based method, by contrast, learns the normal pattern of the data and detects data or behavior that deviates from that pattern as an anomaly. Whereas a rule-based method can detect only defined, that is, known anomalies, a non-rule-based method has the advantage of being able to detect unknown anomalies. The present invention relates to anomaly detection by such a non-rule-based method.

An example of an anomaly detection device using a non-rule-based method is described in Patent Document 1. The anomaly detection device described in Patent Document 1 (hereinafter, Related Art 1) learns the probability distribution of the normal pattern of the input data, calculates a score indicating how far newly input data differs probabilistically from the learned normal pattern, and judges the data anomalous when the score is equal to or greater than a preset threshold.

Another example of an anomaly detection device using a non-rule-based method is described in Patent Document 2. The anomaly detection device described in Patent Document 2 (hereinafter, Related Art 2) assigns an anomaly score to each data unit (for a Web access log, each single access) and judges per data unit whether it is anomalous, and in addition detects more global anomalies. Specifically, the score of the degree of deviation from the normal pattern (hereinafter, the outlier score) is itself learned again, and an outlier score of the outlier scores is calculated, thereby detecting change points in the time series. Intuitively, this amounts to regarding places where outlier scores are densely concentrated as change points of the time series. Because it considers not only the scores of individual data items but also the correlation (density) between data items, it is a more global anomaly detection method.

Patent Document 1: JP 2004-309998 A
Patent Document 2: JP 2004-54370 A

As described in Patent Document 1, a method that learns normal behavior patterns, calculates for each input unit (one command execution for an event log, one access for an access log) a score indicating the degree of deviation from the normal pattern, and regards high-scoring inputs as anomalies can detect unknown anomalies. It also has the advantage that, when an expert analyzes the data to identify the type and cause of an anomaly, using this scoring reduces the amount of work the analysis requires. However, anomaly detection based only on per-data-unit scores may lose accuracy because of false detections and missed detections. For example, suppose that, to detect suspicious users from PC usage histories, a user's behavior is flagged as suspicious when that user has taken high-scoring actions at least a predetermined number of times in the past. Even a normal user may occasionally act exceptionally, so if the high-scoring actions are separated by very long intervals, flagging them as suspicious is a false detection even though their count reaches the threshold. If, on the other hand, the actions are concentrated in time, the user is deliberately taking exceptional actions in succession, and it is desirable to detect this as anomalous behavior. Anomaly detection based on per-input-unit scores, as in Patent Document 1, cannot distinguish the two cases, so false detections and missed detections can occur.

In contrast, the technique described in Patent Document 2, which relearns the outlier scores and regards places where anomalies are concentrated as anomalies (change points), takes into account the correlation between data items that is ignored when only per-input-unit scores are used, and can therefore improve the accuracy of anomaly detection. In Patent Document 2, however, the only correlation between data items is correlation in time. Correlation in time alone does improve detection accuracy over methods based on per-data-unit scores, but to raise accuracy further, a non-rule-based anomaly detection technique is desired that can handle correlation between data items across a plurality of attributes, including attributes other than time.

The present invention has been proposed in view of these circumstances, and its object is to provide an anomaly detection device and method that can detect anomalies more accurately by taking into account the correlation between data items across a plurality of attributes.

The first anomaly detection device of the present invention comprises: data-specific anomaly score calculation means that learns the probability distribution of a data sequence subject to anomaly detection and calculates, for each data item in the sequence, an anomaly score representing the degree of deviation between the data predicted from the learned probability distribution and the actual data; and a plurality of per-criterion overview score calculation means, provided in correspondence with mutually different classification criteria each formed from a combination of one or more data attributes, each of which, for every group obtained by classifying the data items in the sequence according to the corresponding classification criterion, learns the conditional probability distribution of the anomaly scores calculated for the data belonging to that group and calculates from the learned conditional probability distribution an overview score representing the degree of anomaly.

The first anomaly detection method of the present invention includes: a first step in which data-specific anomaly score calculation means learns the probability distribution of a data sequence subject to anomaly detection and calculates, for each data item in the sequence, an anomaly score representing the degree of deviation between the data predicted from the learned probability distribution and the actual data; and a second step in which a plurality of per-criterion overview score calculation means, provided in correspondence with mutually different classification criteria each formed from a combination of one or more data attributes, each learn, for every group obtained by classifying the data items in the sequence according to the corresponding classification criterion, the conditional probability distribution of the anomaly scores calculated for the data belonging to that group, and calculate from the learned conditional probability distribution an overview score representing the degree of anomaly.

According to the present invention, anomalous data can be detected more accurately by taking into account the correlation between data items across a plurality of attributes.

Next, embodiments of the present invention will be described in detail with reference to the drawings.

[First Embodiment]
Referring to FIG. 1, an anomaly detection device 100 according to the first embodiment of the present invention includes a data processing device 101 that operates under program control, an input device 102 such as a keyboard, a data receiving device, or a file device, a display device 103 such as a liquid crystal display, and a plurality of storage units implemented on magnetic disks, semiconductor memory, or the like. The storage units are a log data storage unit 104, a probability distribution storage unit 105, a data-specific anomaly score storage unit 106, group storage units 107-1 to 107-n for n layers, conditional distribution storage units 108-1 to 108-n for n layers, overview score storage units 109-1 to 109-n for n layers, and a detection result storage unit 110.

The log data storage unit 104 stores the log data in which anomalies are to be detected. Examples of log data and the anomalies to be detected include detecting unauthorized access from a Web access log, detecting anomalous user behavior (such as fraudulent actions) from a PC (personal computer) event log, and detecting a device anomaly (such as a failure) from logs output by the device (syslog and the like). The following description takes as its example the detection of anomalous user behavior (such as fraudulent actions) from a PC event log. The present invention is not limited to this example, however, and can be applied to anomaly detection from a wide variety of log data.

Referring to FIG. 2, each PC event log stored in the log data storage unit 104 consists of a log ID, date and time information, a user name, and event data. The log ID is a number that uniquely identifies the individual event log. The date and time information indicates when the event log was collected. The user name is the identifier of the user who caused the event log. The event data is operation information such as the name of a command the user entered on the PC or the name of an executable file the user ran. The date and time information and the user name are examples of data attributes.

The probability distribution storage unit 105 temporarily stores the occurrence probability distribution of the log data in which anomalies are to be detected.

The data-specific anomaly score storage unit 106 temporarily stores the anomaly score of each log data item. Referring to FIG. 3, the data-specific anomaly score storage unit 106 stores, in association with each log ID, the anomaly score calculated for the log data item having that log ID.

The group storage units 107-1 to 107-n of the respective layers are provided in correspondence with mutually different classification criteria, each formed from a combination of one or more data attributes. In this embodiment, the classification criterion $x_i$ of the i-th layer is the classification criterion $x_{i-1}$ of the (i-1)-th layer with a new data attribute added. For example, if the first-layer criterion is "week", the second-layer criterion is "a certain day in that week" and the third-layer criterion is "a certain user on that day of that week".

The i-th layer group storage unit 107-i temporarily stores the anomaly scores of each group generated when the anomaly scores stored in the data-specific anomaly score storage unit 106 are classified according to the i-th layer classification criterion $x_i$. Referring to FIG. 4, the i-th layer group storage unit 107-i stores, for each group, a classification reference value and a list of the anomaly scores belonging to the group identified by that value. When the set of values that the i-th layer classification criterion $x_i$ can take is
$\chi_i = \{x'_{i,1}, \ldots, x'_{i,m_i}\}$  (1)
each of $x'_{i,1}, \ldots, x'_{i,m_i}$ is a classification reference value. For example, if the i-th layer criterion $x_i$ is "week", $x'_{i,1}$ denotes the first week, $x'_{i,2}$ denotes the second week, and $m_i$ is the number of weeks (= the number of groups). If $x_i$ is "the combination of week and date", $x'_{i,1}$ denotes one day in that week, $x'_{i,2}$ denotes another day in that week, and $m_i$ is the number of days (= the number of groups). If $x_i$ is "the combination of week, date, and user name", $x'_{i,1}$ denotes one user name on that day of that week, $x'_{i,2}$ denotes another user name on that day of that week, and $m_i$ is the number of users (= the number of groups).
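The layered grouping described above can be sketched as follows. This is only an illustration under stated assumptions: the record tuples, the `week_of` helper, the use of ISO week numbers, and the `group_scores` function are hypothetical and do not appear in the source, which specifies neither a data layout nor a week convention.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical records: (log ID, timestamp, user name, anomaly score).
records = [
    (1, datetime(2008, 10, 6, 9, 0), "alice", 0.4),
    (2, datetime(2008, 10, 6, 9, 5), "bob", 3.1),
    (3, datetime(2008, 10, 7, 14, 0), "alice", 0.6),
    (4, datetime(2008, 10, 14, 8, 30), "bob", 2.7),
]

def week_of(ts):
    """Return (ISO year, ISO week number); the week convention is an assumption."""
    iso = ts.isocalendar()
    return (iso[0], iso[1])

# Each layer's criterion extends the previous one with one more attribute:
# week -> (week, date) -> (week, date, user name).
criteria = [
    lambda ts, user: (week_of(ts),),
    lambda ts, user: (week_of(ts), ts.date()),
    lambda ts, user: (week_of(ts), ts.date(), user),
]

def group_scores(records, criterion):
    """Map each classification reference value to its list of anomaly scores."""
    groups = defaultdict(list)
    for _log_id, ts, user, score in records:
        groups[criterion(ts, user)].append(score)
    return dict(groups)

# layers[i] plays the role of the (i+1)-th layer group storage unit.
layers = [group_scores(records, c) for c in criteria]
```

Here the number of keys in each entry of `layers` corresponds to the group count $m_i$ of that layer, counted over the whole data set in this simplified sketch.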

The conditional distribution storage units 108-1 to 108-n of the respective layers temporarily store, for each group stored in the group storage unit 107-1 to 107-n of the corresponding layer, the conditional probability distribution of the anomaly scores belonging to that group. Referring to FIG. 5, the i-th layer conditional distribution storage unit 108-i stores, for each group belonging to that layer, a classification reference value and the conditional probability distribution of the anomaly scores belonging to the group identified by that value.

The conditional probability distribution of the anomaly scores is expressed as
$p_{i,j}(S \mid x'_{i,j}, \theta_{i,j}) \quad (i = 1, \ldots, n;\ j = 1, \ldots, m_i)$  (2)
For example, when the classification criterion $x_i$ is "week", $p_{i,j}$ is the anomaly score distribution of the j-th week. That is, $p_{i,\cdot}$ represents the week-by-week distributions of the anomaly scores.

The overview score storage units 109-1 to 109-n of the respective layers temporarily store the overview scores, which represent the degree of global anomaly and are obtained from the per-group conditional probability distributions of the anomaly scores stored in the conditional distribution storage unit 108-1 to 108-n of the corresponding layer. Referring to FIG. 6, the i-th layer overview score storage unit 109-i stores, for each group belonging to that layer, a classification reference value and the overview score obtained from the conditional probability distribution of the anomaly scores belonging to the group identified by that value.

The overview score is written $G_{i,j}$ $(i = 1, \ldots, n;\ j = 1, \ldots, m_i)$. The overview score $G_{i,j}$ is a quantity representing how anomalous the data is when the classification criterion $x_i$ takes the value $x'_{i,j}$. Methods of obtaining $G_{i,j}$ from the conditional probability distribution of the anomaly scores include, for example, the following.

Example 1:
$G_{i,j} = (\text{kurtosis of the conditional probability distribution } p_{i,j} \text{ of the anomaly scores})$  (3)
Kurtosis is an index of whether the tails of the distribution $p_{i,j}$ are heavy (long), so the larger the kurtosis (the heavier the tails), the greater the degree of anomaly.
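As a rough illustration of Example 1, the sketch below computes the kurtosis of a group's anomaly scores. It assumes, purely for simplicity, that the empirical (moment-based, non-excess) kurtosis of the raw scores stands in for the kurtosis of the learned conditional distribution; the function name and sample values are hypothetical.

```python
def kurtosis(scores):
    """Moment-based kurtosis m4 / m2**2 of a list of anomaly scores."""
    n = len(scores)
    if n == 0:
        raise ValueError("scores must not be empty")
    mean = sum(scores) / n
    m2 = sum((s - mean) ** 2 for s in scores) / n  # second central moment
    m4 = sum((s - mean) ** 4 for s in scores) / n  # fourth central moment
    if m2 == 0:
        return 0.0  # all scores identical: no tail at all (a convention chosen here)
    return m4 / (m2 * m2)

# A group with only small fluctuations versus a group containing one extreme score.
quiet = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 1.0]
spiky = quiet[:-1] + [9.0]
g_quiet = kurtosis(quiet)
g_spiky = kurtosis(spiky)
```

With these sample values the group containing the extreme score has the heavier tail and therefore the larger overview score.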

Example 2:
$G_{i,j} = (\text{value of the 99.9th percentile of the conditional probability distribution } p_{i,j} \text{ of the anomaly scores})$  (4)
A percentile point is the data value at the rank for which, when the data values are sorted by magnitude, the stated percentage of samples lie at or below it. The larger the value of the 99.9th percentile, therefore, the greater the degree of anomaly.
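Example 2 can be sketched in the same spirit. The code below uses the nearest-rank definition of a percentile on the raw scores of a group, which is an assumption: the source takes the percentile of the learned conditional distribution and does not fix an interpolation rule. The function name and sample data are hypothetical.

```python
import math

def percentile(scores, q):
    """Nearest-rank percentile: the smallest value with at least q% of samples at or below it."""
    if not scores:
        raise ValueError("scores must not be empty")
    ordered = sorted(scores)
    rank = max(1, math.ceil(q * len(ordered) / 100.0))
    return ordered[rank - 1]

# Two hypothetical groups: one calm, one containing several high anomaly scores.
calm = [0.2, 0.3, 0.1, 0.4, 0.2, 0.3, 0.2, 0.1, 0.3, 0.2]
busy = [0.2, 0.3, 5.1, 0.4, 6.3, 0.3, 5.8, 0.1, 0.3, 0.2]
g_calm = percentile(calm, 99.9)
g_busy = percentile(busy, 99.9)
```

For groups this small the 99.9th percentile coincides with the maximum; the measure only separates itself from the maximum once a group holds more than a thousand scores.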

Both expressions (3) and (4) define the overview score by how densely high anomaly scores occur when the classification criterion $x_i$ takes the value $x'_{i,j}$. Under this definition, if the criterion $x_i$ represents "week", a large $G_{i,j}$ corresponds to a week in which anomalies appeared densely. If $x_i$ represents "the combination of week and day", a large $G_{i,j}$ corresponds to a day of the week on which anomalies appeared densely. If $x_i$ represents "the combination of week, day, and user name", a large $G_{i,j}$ corresponds to a user who behaved anomalously on a certain day of the week.

Under this definition, a high overview score also indicates that anomalous behavior deviating from the normal pattern is concentrated on a particular classification criterion value (that is, on particular data attributes). In other words, a high overview score suggests that the exceptional behavior was probably carried out deliberately and in succession rather than arising by chance. The lower the layer, the stricter the conditions that define the classification criterion, so this inference becomes more reliable. For example, compared with a high overview score for a week under the criterion "week", a high overview score for a particular day of that week makes it more likely that exceptional behavior was carried out deliberately and in succession; and if the overview score of a particular user on that day is also high, it is even less likely that the exceptional behavior arose by chance and even more likely that it was deliberate. Consequently, using overview scores under hierarchical classification criteria improves anomaly detection accuracy compared with using an overview score under a single classification criterion.

The detection result storage unit 110 temporarily stores detection results, including whether an anomaly was found and the log data judged to be anomalous.

The data processing device 101, for its part, has a data input unit 121, a data-specific anomaly score calculation unit 122, per-criterion overview score calculation units 123-1 to 123-n for the respective layers, and an anomaly detection control unit 124.

The data input unit 121 receives log data from the input device 102 and stores it in the log data storage unit 104.

The data-specific anomaly score calculation unit 122 learns the occurrence probability distribution of the log data sequence stored in the log data storage unit 104 and calculates, for each log data item, an anomaly score representing the degree of deviation between the data predicted from the learned probability distribution and the actual data. The data-specific anomaly score calculation unit 122 consists of a distribution learning unit 131 and an anomaly score calculation unit 132.

The distribution learning unit 131 reads the log data from the log data storage unit 104 in order of occurrence date and time, learns the occurrence probability distribution of the event data in the log data as a statistical model defined by a finite number of parameters, and stores the learned probability distribution in the probability distribution storage unit 105. Let $y_t$ denote the t-th item of the log data stored in the log data storage unit 104. The distribution learning unit 131 learns the probability distribution $p_t(y \mid \theta_t)$ of the data input from the log data storage unit 104, where $\theta$ denotes the parameters of the probability distribution. The distribution learning unit 131 updates the probability distribution sequentially as each data item is input.

The anomaly score calculation unit 132 reads the probability distribution learned by the distribution learning unit 131 from the probability distribution storage unit 105, calculates the anomaly score $S_t$ of the data item $y_t$, and stores it in the data-specific anomaly score storage unit 106. The anomaly score is a score that takes a larger value the further a data item departs from the normal occurrence pattern. For example, as in the following expression, the logarithmic prediction loss incurred when the t-th data item $y_t$ is predicted using the probability distribution $p_{t-1}$ can be used as the anomaly score.
$S_t = -\log p_{t-1}(y_t \mid \theta_t)$  (5)
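The sequential learning and the logarithmic prediction loss can be illustrated with a very small model. The sketch below assumes a categorical distribution over a known event vocabulary with additive (Laplace) smoothing; the source only requires a statistical model with a finite number of parameters that is updated sequentially, so the model choice, the function name, and the `alpha` parameter are all assumptions.

```python
import math
from collections import Counter

def anomaly_scores(events, vocabulary, alpha=1.0):
    """Score each event by -log p_{t-1}(y_t), then update the model with y_t."""
    counts = Counter()
    total = 0
    k = len(vocabulary)
    scores = []
    for y in events:
        # Predict y_t with the distribution learned from y_1 .. y_{t-1}.
        p = (counts[y] + alpha) / (total + alpha * k)
        scores.append(-math.log(p))
        # Sequential update: fold y_t into the model only after scoring it.
        counts[y] += 1
        total += 1
    return scores

# A user who runs "ls" twenty times and then runs "rm" once.
events = ["ls"] * 20 + ["rm"]
scores = anomaly_scores(events, vocabulary={"ls", "rm"})
```

The final `rm` event is predicted with probability 1/22 by the model learned so far, so its score is far larger than the scores of the preceding `ls` events.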

The first-layer per-criterion overview score calculation unit 123-1 learns, for each group obtained by classifying the log data according to the first-layer classification criterion, the conditional probability distribution of the anomaly scores calculated by the anomaly score calculation unit 132 for the log data belonging to that group, and calculates an overview score from the learned conditional probability distribution. The first-layer per-criterion overview score calculation unit 123-1 consists of a conditional distribution learning unit 141 and an overview score calculation unit 142.

The conditional distribution learning unit 141 classifies the anomaly scores stored in the data-specific anomaly score storage unit 106 according to the first-layer classification criterion $x_1$, attaches a classification reference value to each resulting group, and stores the classification result in the first-layer group storage unit 107-1. Then, for each group stored in the first-layer group storage unit 107-1, it reads the anomaly scores belonging to that group, learns the conditional probability distribution of the anomaly scores
$p_{1,j}(S \mid x'_{1,j}, \theta_{1,j}) \quad (j = 1, \ldots, m_1)$  (6)
and stores the learned conditional probability distribution in the first-layer conditional distribution storage unit 108-1.

For example, when the first-layer classification criterion $x_1$ is "week", the conditional distribution learning unit 141 searches the log data storage unit 104 using the log ID attached to each anomaly score as the key, obtains the date and time information of the log data item corresponding to each anomaly score, classifies the anomaly scores by the week determined from that date and time information, attaches a classification reference value indicating which week each group corresponds to, and stores the result in the first-layer group storage unit 107-1. Next, the conditional distribution learning unit 141 learns, from the anomaly scores of each week, the weekly probability distribution of the anomaly scores p(S | week), and stores it in the first-layer conditional distribution storage unit 108-1.

For each group stored in the first-layer conditional distribution storage unit 108-1, the overview score calculation unit 142 reads the conditional probability distribution $p_{1,j}$ of the anomaly scores of that group, calculates from $p_{1,j}$ an overview score representing the degree of anomaly using expression (3) or expression (4) above, and stores it in the first-layer overview score storage unit 109-1. For example, when the first-layer classification criterion $x_1$ is "week", the overview score calculation unit 142 calculates the overview scores $G_{1,j}$ ($j = 1, 2, \ldots$, where $j$ indexes the week) representing the degree of anomaly of each week.

The second-layer classification-criterion-specific overhead score calculation unit 123-2 has the function of learning, for each group obtained by classifying the log data according to the second-layer classification criterion, the conditional probability distribution of the abnormality scores calculated by the abnormality score calculation unit 132 for the log data belonging to that group, and of calculating an overhead score from the learned conditional probability distribution. The unit 123-2 consists of a conditional distribution learning unit 151 and an overhead score calculation unit 152.

The conditional distribution learning unit 151 classifies, for each group stored in the first hierarchical group storage unit 107-1, the abnormality scores according to the second-layer classification criterion x₂, attaches a classification reference value to each group generated by this classification, and stores the classification result in the second hierarchical group storage unit 107-2. Then, for each group stored in the second hierarchical group storage unit 107-2, it reads the abnormality scores belonging to that group, learns the conditional probability distribution of the abnormality scores
p_{2,j}(S | x'_{2,j}, θ_{2,j})   (j = 1, …, m_i)   … (7)
and stores the learned conditional probability distribution in the second hierarchical conditional distribution storage unit 108-2.

For example, when the first-layer classification criterion x₁ is "week" and the second-layer classification criterion x₂ is "combination of week and day", the conditional distribution learning unit 151, for each weekly group stored in the first hierarchical group storage unit 107-1, searches the log data storage unit 104 using the log ID attached to each abnormality score as a key, obtains the date and time information of the corresponding log data, classifies the abnormality scores more finely into daily groups determined from that date and time information, attaches to each resulting group a classification reference value indicating which day of which week the group corresponds to, and stores the groups in the second hierarchical group storage unit 107-2. Next, the conditional distribution learning unit 151 learns, from the abnormality scores of each day of each week, the probability distribution p(S|week&day) of the abnormality scores and stores it in the second hierarchical conditional distribution storage unit 108-2.

The overhead score calculation unit 152 reads, for each group stored in the second hierarchical conditional distribution storage unit 108-2, the conditional probability distribution p_{2,j} of that group's abnormality scores, calculates from p_{2,j} an overhead score representing the degree of abnormality using equation (3) or equation (4) described above, and stores the overhead score in the second hierarchy overhead score storage unit 109-2. For example, when the second-layer classification criterion x₂ is "combination of week and day", the overhead score calculation unit 152 calculates overhead scores G_{2,j} (j = 1, 2, …, where j denotes a particular day of a particular week), each representing the degree of abnormality of one week-and-day combination.

The n-th-layer classification-criterion-specific overhead score calculation unit 123-n, which serves the final layer, has the function of learning, for each group obtained by classifying the log data according to the n-th-layer classification criterion, the conditional probability distribution of the abnormality scores calculated by the abnormality score calculation unit 132 for the log data belonging to that group, and of calculating an overhead score from the learned conditional probability distribution. The unit 123-n consists of a conditional distribution learning unit 161 and an overhead score calculation unit 162.

The conditional distribution learning unit 161 classifies, for each group stored in the (n−1)-th hierarchical group storage unit 107-(n−1), the abnormality scores according to the n-th-layer classification criterion x_n, attaches a classification reference value to each group generated by this classification, and stores the classification result in the n-th hierarchical group storage unit 107-n. Then, for each group stored in the n-th hierarchical group storage unit 107-n, it reads the abnormality scores belonging to that group, learns the conditional probability distribution of the abnormality scores
p_{n,j}(S | x'_{n,j}, θ_{n,j})   (j = 1, …, m_i)   … (8)
and stores the learned conditional probability distribution in the n-th hierarchical conditional distribution storage unit 108-n.

For example, when n = 3, the first-layer classification criterion x₁ is "week", the second-layer classification criterion x₂ is "combination of week and day", and the third-layer classification criterion x₃ is "combination of week, day, and user name", the conditional distribution learning unit 161, for each daily group stored in the second hierarchical group storage unit 107-2, searches the log data storage unit 104 using the log ID attached to each abnormality score as a key, obtains the user name of the log data corresponding to each abnormality score, classifies the abnormality scores more finely by that user name, attaches to each resulting group a classification reference value indicating which user name on which day of which week the group corresponds to, and stores the groups in the n-th hierarchical group storage unit 107-n. Next, the conditional distribution learning unit 161 learns, from the abnormality scores of each user on each day of each week, the probability distribution p(S|week&day&user) of the abnormality scores for each combination of week, day, and user name, and stores it in the n-th hierarchical conditional distribution storage unit 108-n.
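The successive refinement from "week" to "week and day" to "week, day, and user name" can be illustrated with composite keys, where each layer's key extends the key of the layer above. The sketch below is an illustration under assumptions: the record fields `timestamp` and `user`, the three key functions, and the helper `refine` are hypothetical names introduced for this example.

```python
from datetime import datetime


def week_key(rec):
    """Layer 1 key: (ISO year, ISO week)."""
    iso = rec["timestamp"].isocalendar()
    return (iso[0], iso[1])


def week_day_key(rec):
    """Layer 2 key: layer 1 key extended by the calendar date."""
    return week_key(rec) + (rec["timestamp"].date(),)


def week_day_user_key(rec):
    """Layer 3 key: layer 2 key extended by the user name."""
    return week_day_key(rec) + (rec["user"],)


def refine(parent_groups, records, key_fn):
    """Split every parent group into finer groups keyed by key_fn.

    parent_groups: dict key -> list of (log_id, score) pairs
    records:       dict log_id -> record (hypothetical log data store)
    """
    children = {}
    for members in parent_groups.values():
        for log_id, score in members:
            child_key = key_fn(records[log_id])
            children.setdefault(child_key, []).append((log_id, score))
    return children
```

Because each key contains the key of the layer above as a prefix, a group at a lower layer never mixes scores from different parent groups.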

The overhead score calculation unit 162 reads, for each group stored in the n-th hierarchical conditional distribution storage unit 108-n, the conditional probability distribution p_{n,j} of that group's abnormality scores, calculates from p_{n,j} an overhead score representing the degree of abnormality using equation (3) or equation (4) described above, and stores the overhead score in the n-th hierarchy overhead score storage unit 109-n. For example, when n = 3 and the third-layer classification criterion x₃ is "combination of week, day, and user name", the overhead score calculation unit 162 calculates overhead scores G_{3,j} (j = 1, 2, …, where j denotes a particular user on a particular day of a particular week), each representing the degree of abnormality of one combination of week, day, and user name.

The abnormality detection control unit 124 takes as input the overhead scores G_{i,j} (i = 1, …, n; j = 1, …, m_i) stored in the overhead score storage units 109-1 to 109-n of the respective layers and the per-log-data abnormality scores S_t (t = 1, 2, …) stored in the data-specific abnormality score storage unit 106, detects abnormalities, and stores the detection result in the detection result storage unit 110 while also displaying it on the display device 103.

In the present embodiment, the abnormality detection control unit 124 uses the following procedure, referred to as the hierarchical abnormality detection procedure, to detect abnormalities once data-specific abnormality scores have been calculated for a certain amount of log data.

<Hierarchical abnormality detection procedure>
Step 1: First, the first-layer classification-criterion-specific overhead score calculation unit 123-1 is activated. It classifies the abnormality scores stored in the data-specific abnormality score storage unit 106 according to the first-layer classification criterion and generates a conditional probability distribution and an overhead score for each group.

Step 2: Next, the units from the second-layer classification-criterion-specific overhead score calculation unit 123-2 through the final n-th-layer classification-criterion-specific overhead score calculation unit 123-n operate in order. Each unit waits until the unit one layer above has finished, then takes only those groups generated there whose calculated overhead score is equal to or greater than a preset threshold, classifies each such group more finely by the classification criterion of its own layer, and generates a conditional probability distribution and an overhead score for each group produced by this classification.

Step 3: If the calculation proceeds as far as the n-th-layer classification-criterion-specific overhead score calculation unit 123-n and at least one of the overhead scores generated there is equal to or greater than the preset threshold, a detection result indicating that an abnormality exists is generated. A detection result indicating no abnormality is generated in two cases: when the calculation does not reach the unit 123-n because no group with an overhead score equal to or greater than the threshold was generated in a higher layer, and when processing completes through the unit 123-n but none of the overhead scores calculated by the unit 123-n is equal to or greater than the threshold.

A detection result indicating an abnormality may include the n-th-layer overhead scores on which the determination was based as well as the overhead scores of the preceding layers. It may also include, as abnormal data, either all of the log data corresponding to the overhead scores of each layer determined to be abnormal, or only the log data whose abnormality score is equal to or greater than a threshold. By including information from multiple overhead scores in the output, for example which day of which week was abnormal, rather than only the magnitude of the abnormality scores, the nature of the abnormality becomes easier to understand; that is, readability is improved.
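Steps 1 to 3 of the hierarchical abnormality detection procedure can be sketched as a loop that descends one layer at a time and keeps only the groups whose overhead score reaches the threshold. The sketch below rests on several assumptions: equations (3) and (4) are not reproduced here, so `mean_overhead_score` is only a placeholder (the mean of the group's abnormality scores) and not the overhead score defined by those equations; a single threshold is used for all layers; and `hierarchical_detect`, `key_fns`, and the record layout are hypothetical names.

```python
from statistics import fmean


def mean_overhead_score(group_scores):
    """Placeholder overhead score (NOT equation (3) or (4)): the group mean."""
    return fmean(group_scores)


def hierarchical_detect(scores, records, key_fns, overhead_score, threshold):
    """Hierarchical drill-down over classification criteria.

    scores:   list of (log_id, abnormality_score) pairs
    records:  dict log_id -> record used by the key functions
    key_fns:  one key function per layer, ordered from layer 1 to layer n
    Returns (abnormal, flagged), where flagged[i] maps each group key of
    layer i+1 whose overhead score is >= threshold to that score.
    """
    current = {(): list(scores)}  # step 1 starts from all scores
    flagged_per_layer = []
    for key_fn in key_fns:
        # Classify only the groups carried over from the layer above.
        children = {}
        for members in current.values():
            for log_id, s in members:
                children.setdefault(key_fn(records[log_id]), []).append((log_id, s))
        # Keep the groups whose overhead score reaches the threshold.
        flagged = {}
        for key, members in children.items():
            g = overhead_score([s for _, s in members])
            if g >= threshold:
                flagged[key] = g
        if not flagged:
            return False, flagged_per_layer  # no abnormality detected
        flagged_per_layer.append(flagged)
        current = {key: children[key] for key in flagged}
    # Abnormal only if the final layer was reached with a flagged group.
    return bool(flagged_per_layer), flagged_per_layer
```

Returning the flagged groups of every layer, not only the final verdict, mirrors the point made above that the detection result may carry the overhead scores of the preceding layers so that the location of the abnormality can be read off directly.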

Next, the operation of the present embodiment will be described with reference to the flowchart of FIG. 7.

First, the data input unit 121 of the data processing device 101 receives from the input device 102 a series of log data to be subjected to abnormality detection and stores it in the log data storage unit 104 (step S101 in FIG. 7). As shown in FIG. 2, each log data record stored in the log data storage unit 104 consists of a log ID, date and time information, a user name, and event data.

Next, in the data-specific abnormality score calculation unit 122, the distribution learning unit 131 generates the occurrence probability distribution of the event data in the log data stored in the log data storage unit 104 and stores it in the probability distribution storage unit 105. The abnormality score calculation unit 132 then generates, based on that occurrence probability distribution, an abnormality score representing the degree of abnormality of each individual log data record and stores it in the data-specific abnormality score storage unit 106 (step S102). As shown in FIG. 3, each abnormality score stored in the data-specific abnormality score storage unit 106 has attached to it a log ID indicating which log data record the score belongs to.
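Step S102 can be illustrated with a deliberately simple model. The sketch below assumes that the event data is a single categorical value, that the occurrence probability distribution is an empirical frequency table with additive smoothing, and that the abnormality score is the negative log-probability of the observed event. These are assumptions chosen for illustration; the learning method and score definition actually used by units 131 and 132 are those given earlier in this description, and the names `learn_event_distribution` and `abnormality_scores` are hypothetical.

```python
import math
from collections import Counter


def learn_event_distribution(events, alpha=1.0):
    """Estimate p(event) from observed events with additive smoothing.

    Smoothing is applied over the observed event types only, so the
    returned probabilities sum to 1.
    """
    counts = Counter(events)
    total = len(events) + alpha * len(counts)
    return {event: (count + alpha) / total for event, count in counts.items()}


def abnormality_scores(log_records, dist):
    """Return (log_id, score) pairs; rarer events receive higher scores.

    The log ID is kept with each score so that the originating log data
    record can be looked up later.
    """
    return [(rec["log_id"], -math.log(dist[rec["event"]])) for rec in log_records]
```

Keeping the log ID alongside each score corresponds to the log ID attached to the abnormality scores in FIG. 3, which the later classification steps use as the search key.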

Next, the abnormality detection control unit 124 initializes the control variable i to 1 (step S103) and causes the first-layer classification-criterion-specific overhead score calculation unit 123-1, which corresponds to the value 1 of the control variable i, to perform its processing (step S104). Specifically, in the unit 123-1, the conditional distribution learning unit 141 first classifies the per-data abnormality scores stored in the data-specific abnormality score storage unit 106 according to the first-layer classification criterion and stores the abnormality scores of each group in the first hierarchical group storage unit 107-1, as shown in FIG. 4. The conditional distribution learning unit 141 then calculates, for each generated group, the conditional probability distribution of the series of abnormality scores belonging to that group and stores it in the first hierarchical conditional distribution storage unit 108-1, as shown in FIG. 5. Next, the overhead score calculation unit 142 calculates, for each generated group, an overhead score indicating the global degree of abnormality from the group's conditional probability distribution and stores it in the first hierarchy overhead score storage unit 109-1, as shown in FIG. 6.

Next, the abnormality detection control unit 124 reads the overhead score of each group from the first hierarchy overhead score storage unit 109-1, which corresponds to the value 1 of the control variable i, and determines whether any overhead score is equal to or greater than the threshold (step S105). If no group has an overhead score equal to or greater than the threshold (NO in step S106), the abnormality detection control unit 124 generates a detection result indicating no abnormality, stores it in the detection result storage unit 110, and displays it on the display device 103 (step S107).

If one or more groups have an overhead score equal to or greater than the threshold (YES in step S106), the abnormality detection control unit 124 increments the control variable i by 1 so that it becomes 2 (step S108), confirms that the value of i does not exceed the number of layers n (step S109), and causes the second-layer classification-criterion-specific overhead score calculation unit 123-2, which corresponds to the value 2 of the control variable i, to perform its processing (step S104). Specifically, in the unit 123-2, the conditional distribution learning unit 151 first takes, from among the groups stored in the first hierarchical group storage unit 107-1, only those groups whose overhead score was equal to or greater than the threshold, classifies the abnormality scores belonging to each such group according to the second-layer classification criterion, and stores the abnormality scores of each resulting group in the second hierarchical group storage unit 107-2, as shown in FIG. 4. The conditional distribution learning unit 151 then calculates, for each generated group, the conditional probability distribution of the series of abnormality scores belonging to that group and stores it in the second hierarchical conditional distribution storage unit 108-2, as shown in FIG. 5. Next, the overhead score calculation unit 152 calculates, for each generated group, an overhead score indicating the global degree of abnormality from the group's conditional probability distribution and stores it in the second hierarchy overhead score storage unit 109-2, as shown in FIG. 6.

Next, the abnormality detection control unit 124 reads the overhead score of each group from the second hierarchy overhead score storage unit 109-2, which corresponds to the value 2 of the control variable i, and determines whether any overhead score is equal to or greater than the threshold (step S105). Depending on the result, it either generates a detection result indicating no abnormality and ends the processing of FIG. 7, or proceeds to abnormality detection at the next lower layer.

When processing has advanced in this way to the n-th-layer classification-criterion-specific overhead score calculation unit 123-n and that processing is complete, the abnormality detection control unit 124 reads the overhead score of each group from the n-th hierarchy overhead score storage unit 109-n, which corresponds to the value n of the control variable i, and determines whether any overhead score is equal to or greater than the threshold (step S105). If the overhead score of any group is equal to or greater than the threshold, it generates a detection result indicating an abnormality, stores it in the detection result storage unit 110 while also displaying it on the display device 103 (step S110), and ends the processing of FIG. 7.

Next, the effects of the present embodiment will be described.

According to the present embodiment, intentional abnormalities can be detected with high accuracy, for the following reason. A high overhead score indicates that abnormal behavior deviating from the normal pattern is concentrated in the group of a specific classification reference value (that is, a specific data attribute). In other words, a high overhead score suggests that exceptional behavior did not merely appear sporadically but was very likely performed intentionally and repeatedly. Detecting abnormalities from the overhead score derived from the abnormality scores therefore detects intentional abnormalities more accurately than detecting them from individual abnormality scores alone.

According to the present embodiment, intentional abnormalities can be detected with even higher accuracy. The reason is that, in the present embodiment, the conditions defining the classification criterion become stricter at lower layers. For example, a high overhead score for a particular day within a week indicates a higher likelihood of intentional, repeated exceptional behavior than a high overhead score for that week classified by the criterion "week" alone. If, in addition, the overhead score of a particular user classified by user name on that day is high, the likelihood that the exceptional behavior appeared sporadically becomes lower still, and the likelihood that it was performed intentionally becomes higher. As a result, using overhead scores based on hierarchical classification criteria improves the accuracy of abnormality detection compared with using an overhead score based on a single classification criterion.

According to the present embodiment, intentional abnormalities can be detected efficiently and without wasted computation, because the hierarchical abnormality detection procedure is used. Specifically, each unit from the second-layer classification-criterion-specific overhead score calculation unit 123-2 through the final n-th-layer unit 123-n takes only those groups generated by the unit one layer above whose calculated overhead score is equal to or greater than the preset threshold, classifies each such group more finely by the classification criterion of its own layer, and generates a conditional probability distribution and an overhead score only for the groups produced by this classification.

The above description used three layers, "week", "combination of week and day", and "combination of week, day, and user name", as a concrete example. The present invention is not limited to three layers, however. It can be applied to two layers, such as "user name" and "combination of user name and day", or to four or more layers, such as "week", "combination of week and day", "combination of week, day, and user name", and "combination of week, day, user name, and file name". The classification criteria are likewise not limited to week, day, and user name. Any criterion that can classify log data into multiple groups may be used, such as "company", "combination of company and department", and "combination of company, department, and user name".

[Second Embodiment]
Referring to FIG. 8, the abnormality detection device 200 according to the second embodiment of the present invention differs from the abnormality detection device 100 according to the first embodiment shown in FIG. 1 in that the plurality of classification criteria are not given a hierarchical structure. In accordance with this difference, compared with the abnormality detection device 100 according to the first embodiment, the abnormality detection device 200 according to the present embodiment includes first to m-th group storage units 207-1 to 207-m in place of the first to n-th hierarchical group storage units 107-1 to 107-n, first to m-th conditional distribution storage units 208-1 to 208-m in place of the first to n-th hierarchical conditional distribution storage units 108-1 to 108-n, and first to m-th overhead score storage units 209-1 to 209-m in place of the first to n-th hierarchy overhead score storage units 109-1 to 109-n. It also includes first to m-th classification-criterion-specific overhead score calculation units 223-1 to 223-m in place of the first to n-th classification-criterion-specific overhead score calculation units 123-1 to 123-n, and an abnormality detection control unit 224 in place of the abnormality detection control unit 124.

The first to m-th classification-criterion-specific overhead score calculation units 223-1 to 223-m are provided in correspondence with m mutually different classification criteria, each based on a combination of one or more data attributes. In the present embodiment, there is no hierarchical relationship among the m classification criteria; each is independent of the others. For example, the first classification criterion, corresponding to the first classification-criterion-specific overhead score calculation unit 223-1, is "day"; the second classification criterion, corresponding to the second classification-criterion-specific overhead score calculation unit 223-2, is "user name"; and so on, with the m-th classification criterion, corresponding to the m-th classification-criterion-specific overhead score calculation unit 223-m, being "file name".
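In contrast to the hierarchical case, each of the m criteria here classifies the full set of abnormality scores on its own. A minimal sketch, assuming hypothetical record fields `day`, `user`, and `file` and a hypothetical helper `group_by_criteria`:

```python
def group_by_criteria(scores, records, criteria):
    """Classify the same abnormality scores under several independent criteria.

    scores:   list of (log_id, abnormality_score) pairs
    records:  dict log_id -> record (hypothetical log data store)
    criteria: dict mapping a criterion name to a key function on a record
    Returns a dict: criterion name -> {classification value: [scores]}.
    """
    result = {}
    for name, key_fn in criteria.items():
        groups = {}
        # Every criterion sees all scores; no criterion filters another.
        for log_id, score in scores:
            groups.setdefault(key_fn(records[log_id]), []).append(score)
        result[name] = groups
    return result
```

For instance, passing `{"day": lambda r: r["day"], "user": lambda r: r["user"], "file": lambda r: r["file"]}` as `criteria` yields three unrelated groupings of the same scores, one per classification criterion.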

The first classification-criterion-specific overhead score calculation unit 223-1 has the function of learning, for each group obtained by classifying the log data according to the first classification criterion, the conditional probability distribution of the abnormality scores calculated by the abnormality score calculation unit 132 for the log data belonging to that group, and of calculating an overhead score from the learned conditional probability distribution. The unit 223-1 consists of a conditional distribution learning unit 241 and an overhead score calculation unit 242.

The conditional distribution learning unit 241 classifies the abnormality scores stored in the data-specific abnormality score storage unit 106 according to the first classification criterion, attaches a classification reference value to each group, and stores the classification result in the first group storage unit 207-1. Then, for each group stored in the first group storage unit 207-1, it reads the abnormality scores belonging to that group, learns the conditional probability distribution of the abnormality scores, and stores the learned conditional probability distribution in the first conditional distribution storage unit 208-1.

For example, when the first classification criterion is "day", the conditional distribution learning unit 241 searches the log data storage unit 104 using the log ID attached to each abnormality score as a key, obtains the date and time information of the log data corresponding to each abnormality score, classifies the abnormality scores into daily groups determined from that date and time information, attaches to each group a classification reference value indicating which day the group corresponds to, and stores the groups in the first group storage unit 207-1. Next, the conditional distribution learning unit 241 learns, from the abnormality scores of each day, the per-day probability distribution p(S|day) of the abnormality scores and stores it in the first conditional distribution storage unit 208-1.

For each group stored in the first conditional distribution storage unit 208-1, the overview score calculation unit 242 reads out the conditional probability distribution of that group's anomaly scores, calculates from it an overview score representing the degree of anomaly using equation (3) or equation (4) described above, and stores the overview score in the first overview score storage unit 209-1. For example, when the first classification criterion is "day", the overview score calculation unit 242 calculates an overview score representing the degree of anomaly for each day.
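The classification and scoring described for the "day" criterion can be sketched as follows. This is only an illustrative sketch under stated assumptions: the record layout, the helper names (`group_scores_by`, `percentile_point`, `day_of`) and the sample values are hypothetical, the "learned" conditional distribution p(S|day) is stood in for by the empirical distribution of each group's scores, and the overview score is taken to be a percentile point of that distribution (one of the two options named in the claims). Equations (3) and (4) themselves are not reproduced here.

```python
from collections import defaultdict
from datetime import datetime


def group_scores_by(log_data, scores, key_fn):
    """Classify anomaly scores into groups keyed by a classification criterion value."""
    groups = defaultdict(list)
    for log_id, score in scores.items():
        # The log ID attached to each score is the lookup key into the log data.
        groups[key_fn(log_data[log_id])].append(score)
    return dict(groups)


def percentile_point(values, q):
    """Nearest-rank q-th percentile (integer q in 1..100) of the empirical distribution."""
    ordered = sorted(values)
    rank = max(1, -(-q * len(ordered) // 100))  # integer ceil(q * n / 100)
    return ordered[rank - 1]


def day_of(record):
    """Derive the classification criterion value 'day' from the date and time field."""
    return datetime.strptime(record["time"], "%Y-%m-%d %H:%M").date().isoformat()


# Hypothetical log data (log ID -> record) and per-record anomaly scores.
log_data = {
    1: {"time": "2008-10-01 09:00", "user": "alice"},
    2: {"time": "2008-10-01 13:30", "user": "bob"},
    3: {"time": "2008-10-02 10:15", "user": "alice"},
    4: {"time": "2008-10-02 10:20", "user": "bob"},
}
scores = {1: 0.5, 2: 0.7, 3: 4.0, 4: 6.0}

groups_by_day = group_scores_by(log_data, scores, day_of)
overview_by_day = {day: percentile_point(s, 90) for day, s in groups_by_day.items()}
```

Swapping `day_of` for a function that returns the user name or a file name gives the corresponding grouping for the other classification criteria without changing the rest of the sketch.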

The overview score calculation unit 223-2 for the second classification criterion works in the same way for the second classification criterion. For each group obtained by classifying the log data according to the second classification criterion, it learns the conditional probability distribution of the anomaly scores that the anomaly score calculation unit 132 calculated for the log data belonging to that group, and it calculates an overview score from the learned conditional probability distribution. The unit consists of a conditional distribution learning unit 251 and an overview score calculation unit 252.

The conditional distribution learning unit 251 classifies the anomaly scores stored in the data-specific anomaly score storage unit 106 according to the second classification criterion, attaches a classification criterion value to each group, and stores the classification result in the second group storage unit 207-2. Then, for each group stored in the second group storage unit 207-2, it reads out the anomaly scores belonging to that group, learns the conditional probability distribution of those anomaly scores, and stores the learned conditional probability distribution in the second conditional distribution storage unit 208-2.

For example, when the second classification criterion is "user name", the conditional distribution learning unit 251 searches the log data storage unit 104 using the log ID attached to each anomaly score as the key and obtains the user name of the corresponding log data. It classifies the anomaly scores by user name, attaches a classification criterion value indicating which user each group corresponds to, and stores the result in the second group storage unit 207-2. Next, the conditional distribution learning unit 251 learns, from the anomaly scores of each user, the per-user probability distribution of the anomaly score, p(S|user name), and stores it in the second conditional distribution storage unit 208-2.

For each group stored in the second conditional distribution storage unit 208-2, the overview score calculation unit 252 reads out the conditional probability distribution of that group's anomaly scores, calculates from it an overview score representing the degree of anomaly using equation (3) or equation (4) described above, and stores the overview score in the second overview score storage unit 209-2. For example, when the second classification criterion is "user name", the overview score calculation unit 252 calculates an overview score representing the degree of anomaly for each user.

The overview score calculation unit 223-m for the m-th classification criterion likewise handles the m-th classification criterion. For each group obtained by classifying the log data according to the m-th classification criterion, it learns the conditional probability distribution of the anomaly scores that the anomaly score calculation unit 132 calculated for the log data belonging to that group, and it calculates an overview score from the learned conditional probability distribution. The unit consists of a conditional distribution learning unit 261 and an overview score calculation unit 262.

The conditional distribution learning unit 261 classifies the anomaly scores stored in the data-specific anomaly score storage unit 106 according to the m-th classification criterion, attaches a classification criterion value to each group, and stores the classification result in the m-th group storage unit 207-m. Then, for each group stored in the m-th group storage unit 207-m, it reads out the anomaly scores belonging to that group, learns the conditional probability distribution of those anomaly scores, and stores the learned conditional probability distribution in the m-th conditional distribution storage unit 208-m.

For example, when the m-th classification criterion is "file name", the conditional distribution learning unit 261 searches the log data storage unit 104 using the log ID attached to each anomaly score as the key and obtains the file name from the event data of the corresponding log data. It classifies the anomaly scores by file name, attaches a classification criterion value indicating which file name each group corresponds to, and stores the result in the m-th group storage unit 207-m. Next, the conditional distribution learning unit 261 learns, from the anomaly scores of each file, the per-file probability distribution of the anomaly score, p(S|file name), and stores it in the m-th conditional distribution storage unit 208-m.

For each group stored in the m-th conditional distribution storage unit 208-m, the overview score calculation unit 262 reads out the conditional probability distribution of that group's anomaly scores, calculates from it an overview score representing the degree of anomaly using equation (3) or equation (4) described above, and stores the overview score in the m-th overview score storage unit 209-m. For example, when the m-th classification criterion is "file name", the overview score calculation unit 262 calculates an overview score representing the degree of anomaly for each file.

The anomaly detection control unit 224 takes as input the overview scores stored in the first to m-th overview score storage units 209-1 to 209-m and the per-log-data anomaly scores stored in the data-specific anomaly score storage unit 106. It detects anomalies, stores the detection result in the detection result storage unit 110, and at the same time displays the result on the display device 103.

In the present embodiment, the anomaly detection control unit 224 uses the following procedure to detect anomalies once data-specific anomaly scores have been calculated for a certain amount of log data. This procedure is called the non-hierarchical anomaly detection procedure.

<Non-hierarchical anomaly detection procedure>
Procedure 1: The overview score calculation units 223-1 to 223-m for the first to m-th classification criteria are all started at once. The anomaly scores stored in the data-specific anomaly score storage unit 106 are classified according to the first to m-th classification criteria, and a conditional probability distribution and an overview score are generated for each group.

Procedure 2: If at least one of the overview scores generated by the overview score calculation units 223-1 to 223-m for the first to m-th classification criteria is equal to or greater than a preset threshold, a detection result indicating that an anomaly is present is generated. If none of the overview scores calculated by the units 223-1 to 223-m is equal to or greater than the threshold, a detection result indicating that no anomaly is present is generated.

A detection result indicating an anomaly may include the overview score on which the judgment was based. It may also include, as anomalous data, either all of the log data corresponding to the overview score judged anomalous or only those log data whose anomaly score is equal to or greater than a threshold.
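Procedure 2 and the optional evidence in the detection result can be sketched as below. This is a minimal sketch, assuming the overview scores have already been computed and collected into a nested dictionary; the function name `detect_non_hierarchical`, the result layout, and the sample scores are hypothetical and not taken from the patent.

```python
def detect_non_hierarchical(overview_by_criterion, threshold):
    """Flag an anomaly if any group under any criterion reaches the threshold.

    overview_by_criterion maps a criterion name to {group value: overview score}.
    """
    evidence = [
        (criterion, group, score)
        for criterion, groups in overview_by_criterion.items()
        for group, score in groups.items()
        if score >= threshold
    ]
    # The evidence list records the overview scores the judgment was based on.
    return {"anomaly": bool(evidence), "evidence": evidence}


# Hypothetical overview scores for two classification criteria.
overview = {
    "day": {"2008-10-01": 0.7, "2008-10-02": 6.0},
    "user name": {"alice": 2.0, "bob": 3.1},
}

result = detect_non_hierarchical(overview, threshold=5.0)
quiet = detect_non_hierarchical(overview, threshold=10.0)
```

Because the criteria are treated independently, the order in which they are evaluated does not affect the outcome; the evidence list simply records every group that reached the threshold.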

Next, the operation of the present embodiment is described with reference to the flowchart of FIG. 9.

First, the data input unit 121 of the data processing device 101 receives from the input device 102 the series of log data that is the target of anomaly detection and stores it in the log data storage unit 104 (step S201 in FIG. 9). As shown in FIG. 2, each log data record stored in the log data storage unit 104 consists of a log ID, date and time information, a user name, and event data.

Next, the data-specific anomaly score calculation unit 122 uses the distribution learning unit 131 to generate the occurrence probability distribution of the event data in the log data stored in the log data storage unit 104 and stores it in the probability distribution storage unit 105. It then uses the anomaly score calculation unit 132 to generate, on the basis of that occurrence probability distribution, an anomaly score representing the degree of anomaly of each individual log data record, and stores the scores in the data-specific anomaly score storage unit 106 (step S202). As shown in FIG. 3, each anomaly score stored in the data-specific anomaly score storage unit 106 carries the log ID of the log data record it belongs to.
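Step S202 can be illustrated with a small sketch. The scoring formula used by the anomaly score calculation unit 132 is not given in this passage, so the sketch makes two assumptions: the occurrence probability distribution is a simple relative-frequency table over event types, and the anomaly score is the negative log-probability of the observed event. The function names, the `floor` parameter for unseen events, and the sample events are all hypothetical.

```python
import math
from collections import Counter


def learn_distribution(events):
    """Estimate event occurrence probabilities as relative frequencies (assumed model)."""
    counts = Counter(events)
    total = sum(counts.values())
    return {event: count / total for event, count in counts.items()}


def anomaly_score(distribution, event, floor=1e-6):
    """Score an event by its negative log-probability; unseen events get a small floor."""
    return -math.log(distribution.get(event, floor))


# Hypothetical event data keyed by log ID.
event_log = {i: "read" for i in range(1, 9)}
event_log[9] = "write"
event_log[10] = "delete"

distribution = learn_distribution(event_log.values())
# Each score stays attached to the log ID of the record it was computed for.
scores_by_log_id = {
    log_id: anomaly_score(distribution, event) for log_id, event in event_log.items()
}
```

Rare events receive higher scores than frequent ones under this assumed model, which is the property the later grouping stages rely on.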

Next, the anomaly detection control unit 224 runs the processing of the overview score calculation units 223-1 to 223-m for the first to m-th classification criteria all at once (step S203). Specifically, the units 223-1 to 223-m use the conditional distribution learning units 241 to 261 to classify the data-specific anomaly scores stored in the data-specific anomaly score storage unit 106 according to the first to m-th classification criteria and, as in FIG. 4, store the anomaly scores of each group in the first to m-th group storage units 207-1 to 207-m. The conditional distribution learning units 241 to 261 then calculate, for each generated group, the conditional probability distribution of the series of anomaly scores belonging to that group and, as in FIG. 5, store it in the first to m-th conditional distribution storage units 208-1 to 208-m. Finally, the overview score calculation units 242 to 262 calculate, for each generated group, an overview score indicating the global degree of anomaly from its conditional probability distribution and, as in FIG. 6, store it in the first to m-th overview score storage units 209-1 to 209-m.

Next, the anomaly detection control unit 224 reads out the overview score of each group from the first to m-th overview score storage units 209-1 to 209-m and determines whether any overview score is equal to or greater than the threshold (step S204). If no group has an overview score equal to or greater than the threshold (NO in step S205), the anomaly detection control unit 224 generates a detection result indicating no anomaly, stores it in the detection result storage unit 110, and displays it on the display device 103 (step S206). If one or more groups have an overview score equal to or greater than the threshold (YES in step S205), the anomaly detection control unit 224 generates a detection result indicating an anomaly, stores it in the detection result storage unit 110, and displays it on the display device 103 (step S207).

According to the present embodiment, intentional anomalies can be detected accurately, for the following reason. A high overview score indicates that anomalous behavior deviating from the normal pattern is concentrated in the group for a particular classification criterion value. In other words, a high overview score suggests that exceptional behavior did not merely appear by chance but was most likely carried out intentionally and repeatedly. Detecting anomalies from the overview score derived from the anomaly scores therefore detects intentional anomalies more accurately than detecting them from individual anomaly scores alone.

According to the present embodiment, intentional anomalies can be detected with even higher accuracy. The reason is that an overview score representing the degree of anomaly is calculated for each group under each of several different classification criteria, and an anomaly is detected when the overview score of any one of those groups is equal to or greater than the threshold. For example, if intentional misconduct is carried out by a single user, the overview score calculated by the overview score calculation unit whose classification criterion is "user name" reaches the threshold, and the misconduct is detected. If, on the other hand, several users divide the misconduct among themselves to avoid discovery, each user performs only a few improper actions, and the overview score calculated under the "user name" criterion may not reach the threshold. In that case, if the users all act improperly on the same day, the overview score calculated under the "day" criterion reaches the threshold and the misconduct is detected. Likewise, if several users act improperly against the same target file on different days, the overview score calculated under the "file name" criterion reaches the threshold and the misconduct is detected. As a result, using overview scores based on several classification criteria leaves fewer anomalies undetected than using overview scores based on a single classification criterion, and so improves detection accuracy.
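The divided-misconduct scenario can be made concrete with a small synthetic example. Everything in it is an assumption chosen for illustration: three hypothetical users, four days, five records per user per day, fixed anomaly scores of 0.2 (normal) and 5.0 (improper), a nearest-rank 90th percentile as the overview score, and a threshold of 3.0. Each user performs one improper action, all on the same day.

```python
from collections import defaultdict


def percentile_point(values, q):
    """Nearest-rank q-th percentile (integer q in 1..100)."""
    ordered = sorted(values)
    rank = max(1, -(-q * len(ordered) // 100))  # integer ceil(q * n / 100)
    return ordered[rank - 1]


def overview_scores(records, attribute, q=90):
    """Group scores by one attribute and take a percentile point per group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[attribute]].append(record["score"])
    return {group: percentile_point(s, q) for group, s in groups.items()}


# Synthetic records: one improper action per user, all on day "d3".
records = []
for user in ("u1", "u2", "u3"):
    for day in ("d1", "d2", "d3", "d4"):
        for i in range(5):
            improper = day == "d3" and i == 0
            records.append({"user": user, "day": day, "score": 5.0 if improper else 0.2})

THRESHOLD = 3.0
by_user = overview_scores(records, "user")
by_day = overview_scores(records, "day")
flag_by_user = any(score >= THRESHOLD for score in by_user.values())
flag_by_day = any(score >= THRESHOLD for score in by_day.values())
```

With these values, each user's single improper record is only 1 of 20, so the per-user percentile stays at 0.2, while day "d3" holds 3 improper records out of 15 and its percentile rises to 5.0. Only the "day" criterion raises the flag, which is the behavior the paragraph describes.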

The description above used "day", "user name", and "file name" as specific examples of classification criteria, but the present invention is not limited to these. Any criterion can be used as long as it allows the log data to be classified into multiple groups on the basis of their data attributes.

Embodiments of the present invention have been described above, but the present invention is not limited to these embodiments, and various other additions and modifications are possible. For example, the first and second embodiments may be combined: two or more overview score calculation units are provided for classification criteria that have no hierarchical relationship with one another, and at least one of those units is treated as the top level, below which one or more hierarchical overview score calculation units are provided whose classification criteria are layered from upper to lower levels. The functions of the anomaly detection device of the present invention can of course be implemented in hardware, and they can also be implemented with a computer and a program. The program is provided on a computer-readable recording medium such as a magnetic disk or semiconductor memory, is read by the computer at startup or a similar time, and controls the operation of the computer. It thereby causes the computer to function as the data-specific anomaly score calculation unit 122, the first to n-th hierarchical overview score calculation units 123-1 to 123-n, and the anomaly detection control unit 124 of the first embodiment, or as the data-specific anomaly score calculation unit 122, the first to m-th overview score calculation units 223-1 to 223-m, and the anomaly detection control unit 224 of the second embodiment.

One conceivable use of the present invention is in computer security, for purposes such as deterring information leaks and detecting unauthorized operations. Specific application examples include the following.
(1) Detection of Web attacks
Unauthorized access can be detected by detecting access that differs from normal access. Detecting global anomalies with the overview score makes it possible to detect unauthorized access that is hard to detect from the anomaly degree of a single access alone. For example, it can detect unauthorized access in which no single access is especially anomalous but the same access is repeated densely in time (such as an SQL injection attack).

(2) Detection of suspicious behavior
Suspicious behavior can be detected by detecting unusual behavior in a user's behavior history (PC operation logs, file server access logs, and so on). Looking at global anomalies with the overview score makes it possible to detect suspicious behavior that is hard to detect from the anomaly degree of a single action alone. One example is the same user repeatedly failing to log in.

FIG. 1 is a block diagram of the first embodiment of the present invention.
FIG. 2 is a diagram showing an example of the format of the log data stored in the log data storage unit in the first embodiment of the present invention.
FIG. 3 is an explanatory diagram of the storage structure of the data-specific anomaly score storage unit in the first embodiment of the present invention.
FIG. 4 is an explanatory diagram of the storage structure of the i-th layer group storage unit in the first embodiment of the present invention.
FIG. 5 is an explanatory diagram of the storage structure of the i-th layer conditional distribution storage unit in the first embodiment of the present invention.
FIG. 6 is an explanatory diagram of the storage structure of the i-th layer overview score storage unit in the first embodiment of the present invention.
FIG. 7 is a flowchart showing the flow of processing in the first embodiment of the present invention.
FIG. 8 is a block diagram of the second embodiment of the present invention.
FIG. 9 is a flowchart showing the flow of processing in the second embodiment of the present invention.

Explanation of symbols

100, 200 … anomaly detection device
101 … data processing device
102 … input device
103 … display device
104 … log data storage unit
105 … probability distribution storage unit
106 … data-specific anomaly score storage unit
107-1 to 107-n … first to n-th layer group storage units
108-1 to 108-n … first to n-th layer conditional distribution storage units
109-1 to 109-n … first to n-th layer overview score storage units
110 … detection result storage unit
121 … data input unit
122 … data-specific anomaly score calculation unit
123-1 to 123-n … first to n-th hierarchical overview score calculation units (one per classification criterion)
124 … anomaly detection control unit
131 … distribution learning unit
132 … anomaly score calculation unit
141, 151, 161 … conditional distribution learning units
142, 152, 162 … overview score calculation units
207-1 to 207-m … first to m-th group storage units
208-1 to 208-m … first to m-th conditional distribution storage units
209-1 to 209-m … first to m-th overview score storage units
223-1 to 223-m … overview score calculation units for the first to m-th classification criteria
224 … anomaly detection control unit
241, 251, 261 … conditional distribution learning units
242, 252, 262 … overview score calculation units

Claims

1. An anomaly detection device comprising: data-specific anomaly score calculation means for learning the probability distribution of a series of data that is the target of anomaly detection and for calculating, for each piece of data in the series, an anomaly score representing the degree of deviation between the data predicted from the learned probability distribution and the actual data; and a plurality of classification-criterion-specific overview score calculation means, each provided for a different classification criterion defined by a combination of one or more data attributes, for learning, for each group obtained by classifying the individual data in the series according to the corresponding classification criterion, the conditional probability distribution of the anomaly scores calculated for the data belonging to that group, and for calculating an overview score representing the degree of anomaly from the learned conditional probability distribution.

2. The anomaly detection device according to claim 1, wherein the plurality of different classification criteria have a hierarchical relationship.

3. The anomaly detection device according to claim 2, wherein the classification-criterion-specific overview score calculation means corresponding to a lower-level classification criterion performs classification according to its classification criterion, learning of the conditional probability distribution, and calculation of the overview score only for those groups, among the plurality of groups handled by the classification-criterion-specific overview score calculation means corresponding to the upper-level classification criterion, whose calculated overview score is equal to or greater than a preset threshold.

4. The anomaly detection device according to claim 3, further comprising anomaly detection control means for detecting an anomaly by comparing the overview score calculated by the classification-criterion-specific overview score calculation means corresponding to the lowest-level classification criterion with a preset threshold.

5. The anomaly detection device according to claim 1, wherein the plurality of different classification criteria have a non-hierarchical relationship.

6. The anomaly detection device according to claim 5, further comprising anomaly detection control means for detecting an anomaly by comparing the overview scores calculated by all of the classification-criterion-specific overview score calculation means with a preset threshold.

7. The anomaly detection device according to any one of claims 1 to 6, wherein the classification-criterion-specific overview score calculation means calculates the kurtosis of the learned conditional probability distribution as the overview score.

8. The anomaly detection device according to any one of claims 1 to 6, wherein the classification-criterion-specific overview score calculation means calculates the value of a predetermined percentile point of the learned conditional probability distribution as the overview score.

9. An anomaly detection method comprising: a first step in which data-specific anomaly score calculation means learns the probability distribution of a series of data that is the target of anomaly detection and calculates, for each piece of data in the series, an anomaly score representing the degree of deviation between the data predicted from the learned probability distribution and the actual data; and a second step in which a plurality of classification-criterion-specific overview score calculation means, each provided for a different classification criterion defined by a combination of one or more data attributes, learn, for each group obtained by classifying the individual data in the series according to the corresponding classification criterion, the conditional probability distribution of the anomaly scores calculated for the data belonging to that group, and calculate an overview score representing the degree of anomaly from the learned conditional probability distribution.

10. A program for causing a computer to function as: data-specific anomaly score calculation means for learning the probability distribution of a series of data that is the target of anomaly detection and for calculating, for each piece of data in the series, an anomaly score representing the degree of deviation between the data predicted from the learned probability distribution and the actual data; and a plurality of classification-criterion-specific overview score calculation means, each provided for a different classification criterion defined by a combination of one or more data attributes, for learning, for each group obtained by classifying the individual data in the series according to the corresponding classification criterion, the conditional probability distribution of the anomaly scores calculated for the data belonging to that group, and for calculating an overview score representing the degree of anomaly from the learned conditional probability distribution.