JP2007173907A

JP2007173907A - Abnormal traffic detection method and device

Info

Publication number: JP2007173907A
Application number: JP2005364484A
Authority: JP
Inventors: Keisuke Ishibashi; 圭介石橋; Yutaka Hirokawa; 裕廣川; Junji Kobayashi; 淳史小林; Koyo Yamamoto; 公洋山本; Ryoichi Kawahara; 亮一川原; Tatsuya Mori; 達哉森
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2005-12-19
Filing date: 2005-12-19
Publication date: 2007-07-05
Anticipated expiration: 2025-12-19
Also published as: JP4112584B2

Abstract

PROBLEM TO BE SOLVED: To provide an abnormal traffic detection method that can perform monitoring with a small memory capacity by monitoring only the data of the top N field values, which are important monitoring targets, and their counter values, and that detects anomalies in that data over time.

SOLUTION: The method includes a first step of acquiring traffic data; a second step of designating at least one field to be monitored and the type of counter for that field; a third step of calculating, from the acquired traffic data at a preset interval, the top N (N ≥ 2) field values for the designated counter of the designated field, together with their counter values; a fourth step of storing the calculated top N field values and their counter value data and calculating the degree of similarity between the newly calculated data and past data; and a fifth step of issuing an alarm indicating an abnormality when the calculated degree of similarity is smaller than a preset threshold, and extracting the field value estimated to be the cause of the decrease in similarity.

COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to an abnormal traffic detection method and apparatus, and more particularly to a method and apparatus for detecting anomalies in computer network traffic.

With an increase in abnormal traffic such as attacks on specific servers on the Internet, there is an increasing demand for a method for detecting abnormal traffic.
Current anomalous traffic detection methods are roughly classified into those based on signatures and those based on statistical information.
In the former, traffic patterns determined in advance to be abnormal are prepared as signatures, and whether the observed traffic is abnormal is determined by comparing it with those signatures.
While this method can reliably determine whether the observed traffic is abnormal, it is difficult to prepare every abnormal traffic pattern; in particular, the method cannot detect new abnormal traffic for which no signature has yet been obtained.

The method based on statistical information builds statistics from the history of past observed traffic and judges whether newly observed traffic is abnormal according to whether it deviates statistically from past traffic. The monitored quantity is typically the total traffic volume, such as the number of packets.
However, statistical anomaly detection based on total traffic volume has difficulty detecting anomalies caused by changes in part of the traffic that do not show up as changes in the total, and it is also difficult to identify the cause of an anomaly once it has been detected.
On the other hand, methods that, instead of monitoring the total traffic volume, keep a designated counter (for example, the number of packets or the number of bytes per source IP address) for each field value of a designated field (for example, the source IP address field) and detect anomalies from the statistics of these counter values do not have these problems: they can detect a change in the number of packets from some source IP addresses and identify the source IP addresses that caused the change.
However, such methods require as many counters as there are observed field values, so applying them to traffic that produces a large number of field values requires a huge amount of memory, which makes them difficult to realize.

Prior art documents related to the invention of the present application include the following.
Patent Document 1: JP 2005-65294 A
Non-Patent Document 1: Robert Schweller, Ashish Gupta, Elliot Parsons, Yan Chen, “Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams,” Proceedings of the ACM SIGCOMM Internet Measurement Conference (IMC), October 2004.
Non-Patent Document 2: C. Estan and G. Varghese, “New directions in traffic measurement and accounting,” in Proc. ACM SIGCOMM, August 2002.
Non-Patent Document 3: G. Cormode and S. Muthukrishnan, “What's hot and what's not: Tracking most frequent items dynamically,” in Proceedings of ACM Principles of Database Systems, June 2003.
Non-Patent Document 4: RFC 3577, “Introduction to the Remote Monitoring (RMON) Family of MIB Modules.”
Non-Patent Document 5: Benoit Claise, “NetFlow features update,” Proceedings of Sampling 2005, July 2005.

To solve the problems described above, Patent Document 1 proposes a method of monitoring counters for a large number of field values with a fixed amount of memory: instead of keeping a counter for each field value, it keeps a counter for each value obtained by mapping the field value into a fixed range with a hash function or the like (a structure called a sketch).
However, because the field values are transformed by the hash function, it is difficult with this method to identify the field value that caused an anomaly after the anomaly has been detected. Non-Patent Document 1 proposes a way to mitigate this difficulty, but it requires iterative computation to identify the cause of the anomaly, so the computational cost remains an issue.
Meanwhile, Non-Patent Documents 2 and 3 propose methods for extracting, with a limited amount of memory, only the field values with large counter values, which are the ones important for traffic monitoring.
In addition, as described in Non-Patent Documents 4 and 5, some network devices implement a function that outputs the top N field values in descending order of counter value, together with their counter values.

However, these methods of extracting the top N field values and counter values only provide a snapshot of the top-N field value and counter value data at a given point in time; they do not detect anomalies in the top-N field value data over time.
The present invention has been made to solve the above problems of the prior art. An object of the present invention is to provide an abnormal traffic detection method and apparatus that can perform monitoring with a small memory capacity by monitoring the data of the top N field values, which are important monitoring targets, and their counter values, and that detect anomalies in that data over time.
The above and other objects and novel features of the present invention will become apparent from the description of this specification and the accompanying drawings.

The outline of representative inventions among those disclosed in this application is briefly described as follows.
To achieve the above object, the present invention provides an abnormal traffic detection method for detecting anomalies in time-series traffic data, comprising: a first step of acquiring traffic data; a second step of designating at least one field to be monitored and the type of counter for that field; a third step of calculating, from the acquired traffic data at predetermined time intervals, the top N (N ≥ 2) field values for the designated counter of the designated field, together with their counter values; a fourth step of accumulating the calculated top N field values and their counter value data, and calculating the similarity between newly calculated data and past data; and a fifth step of issuing an alarm indicating an abnormality if the calculated similarity is below a predetermined threshold, and extracting the field values presumed to have caused the decrease in similarity.
Further, in the present invention, the similarity calculated in the fourth step is based on a correlation coefficient between the newly calculated counter values of the top N field values and the past counter values of those field values.

Further, in the present invention, the similarity calculated in the fourth step is based on a correlation coefficient between the counter values of the top N field values in a certain past monitoring period and the newly calculated counter values of those field values.
Further, in the present invention, the similarity calculated in the fourth step is based on a correlation coefficient between the newly calculated counter values of the top N field values and the counter values at the same ranks among the top N field values calculated in the past.
Further, in the present invention, in each of the above methods, the similarity calculated in the fourth step is based on the sum of the distances between counter values instead of the correlation coefficient.
Further, in the present invention, the similarity calculated in the fourth step is based on a correlation coefficient between the ranks of the newly calculated top N field values and the past ranks of those field values.
Further, in the present invention, the similarity calculated in the fourth step is based on a correlation coefficient between the ranks of the top N field values in a certain past monitoring period and the ranks of those field values in the newly calculated counter values.
Further, in the present invention, the similarity calculated in the fourth step is based on the error between a time-series prediction of the counter value of each field value, computed from previously calculated data, and the newly calculated counter values of the top N field values.

Further, in the present invention, in the fifth step, the top N field values are sorted in descending order of the distance between their counter values; starting from the top of this order, their counter values are omitted from the similarity calculation one at a time, and the field values whose omission makes the similarity stop falling below the threshold are estimated to be the field values that caused the decrease in similarity.
Further, in the present invention, in the fifth step, the top N field values are sorted in descending order of the distance between their ranks; starting from the top of this order, their ranks are omitted from the similarity calculation one at a time, and the field values whose omission makes the similarity stop falling below the threshold are estimated to be the field values that caused the decrease in similarity.
Further, in the present invention, in the fifth step, the top N field values are sorted in descending order of the error distance between the time-series prediction of the counter value of each top-N field value and its newly calculated counter value; starting from the top of this order, their counter values are omitted from the similarity calculation one at a time, and the field values whose omission makes the similarity stop falling below the threshold are estimated to be the field values that caused the decrease in similarity.
The present invention also provides an abnormal traffic detection apparatus that executes each of the abnormal traffic detection methods described above.

The effects obtained by representative inventions among those disclosed in this application are briefly described as follows.
According to the present invention, by monitoring only the top N field values and their counter value data, which can be monitored with a small amount of memory, it is possible to detect anomalies in that data and to identify the field values that cause the anomalies.

Embodiments of the present invention will be described in detail below with reference to the drawings.
In all the drawings for explaining the embodiments, components having the same function are given the same reference numerals, and repeated description of them is omitted.
FIG. 1 is a block diagram showing the basic configuration of an abnormal traffic detection apparatus according to an embodiment of the present invention. The abnormal traffic detection apparatus of this embodiment is deployed in an IP network.
The traffic acquisition unit 10 acquires information on traffic transferred over links in the network and within nodes in the network, and passes the acquired traffic information to the top-N field value calculation unit 12.
The field and counter type designation unit 11 designates the field and the counter type to the top-N field value calculation unit 12. Examples of designated fields include header information such as the source IP address, the destination IP address, and the tuple {source IP address, destination IP address, protocol, source port number, destination port number} (usually called a flow), as well as payload information such as URIs and DNS domain names.
Examples of designated counter types include the number of packets and the number of bytes; for source and destination IP address field values, the number of flows; and for field values such as URIs and DNS domain names, the number of source IP addresses.
The top-N field value calculation unit 12 receives the traffic information from the traffic acquisition unit 10 and calculates the pairs of top N field values and counter values for the designated counter of the designated field, as instructed by the field and counter type designation unit 11.
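As a minimal sketch of what the top-N field value calculation unit 12 produces for one interval, the following Python assumes a hypothetical packet record of (source IP address, destination IP address, byte count) and uses the source IP address as the designated field. It keeps an exact counter per field value purely for illustration; the memory saving described above presupposes that the top-N list comes from a bounded-memory method such as those of Non-Patent Documents 2 and 3, or from a device's built-in top-N output as in Non-Patent Documents 4 and 5. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical packet record: (source IP, destination IP, byte count).
Packet = Tuple[str, str, int]


def top_n_field_values(packets: Iterable[Packet], n: int,
                       counter: str = "packets") -> List[Tuple[str, int]]:
    """Return the top-n source IP addresses for one interval.

    counter="packets" counts packets per source IP address;
    counter="bytes" sums byte counts per source IP address.
    Exact counting is used here only for illustration.
    """
    counts: Counter = Counter()
    for src, _dst, size in packets:
        counts[src] += 1 if counter == "packets" else size
    # most_common(n) yields (field value, counter value) pairs,
    # sorted by counter value in descending order.
    return counts.most_common(n)
```

Running it once per output interval (5 minutes in the FIG. 2 example) gives one row of top-N (field value, counter value) pairs per time slot.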

FIG. 2 shows an example of the output of the top-N field value calculation unit 12 when the source IP address is designated as the field and the number of packets as the counter. Here, N = 5 and the output interval is 5 minutes.
The similarity calculation unit 13 compares the output data of the top N field values and their counter values with past data and calculates how similar the latest data is to the past data.
Various methods of calculating the similarity are conceivable; in this embodiment, the correlation coefficient with past data is used.
An example of the similarity calculation method is described below.
Let f(i, t) be the field value ranked i-th in time slot t, and let c(f, t) be the counter value of field value f in time slot t.
Thus, in the example of FIG. 2, f(1, 1) = 192.168.0.1 and f(1, 2) = 192.168.0.2, and c(f(1, 1), 1) = 5000 and c(f(1, 1), 2) = 6000.
In the second method of this embodiment (the method according to claim 2 of the present application), the correlation coefficient r(t, t−1) between the counter value data of the top N (N a natural number, N ≥ 2) field values of time slot t and the data of the immediately preceding time slot t−1 is calculated using the following equation (1).

Here, c_avg(t) and c_avg(t−1) are the averages of the counter values of the field values f(i, t) in time slots t and t−1, respectively, and are given by the following equations (2) and (3).
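Equations (1) to (3) are not reproduced in this text. The following is a reconstruction that assumes they take the standard Pearson form implied by the definitions above, with every sum running over the N field values f(i, t) ranked in time slot t; the published notation may differ.

```latex
r(t,t-1)=\frac{\sum_{i=1}^{N}\bigl(c(f(i,t),t)-c_{\mathrm{avg}}(t)\bigr)\bigl(c(f(i,t),t-1)-c_{\mathrm{avg}}(t-1)\bigr)}
{\sqrt{\sum_{i=1}^{N}\bigl(c(f(i,t),t)-c_{\mathrm{avg}}(t)\bigr)^{2}}\;\sqrt{\sum_{i=1}^{N}\bigl(c(f(i,t),t-1)-c_{\mathrm{avg}}(t-1)\bigr)^{2}}} \qquad (1)

c_{\mathrm{avg}}(t)=\frac{1}{N}\sum_{i=1}^{N}c(f(i,t),t) \qquad (2)

c_{\mathrm{avg}}(t-1)=\frac{1}{N}\sum_{i=1}^{N}c(f(i,t),t-1) \qquad (3)
```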

As shown by equation (1), the correlation coefficient takes values from −1 to 1; if the top N field values and their counter values are identical in time slots t and t−1, the correlation coefficient r(t, t−1) is 1. Hence, the closer the correlation coefficient is to 1, the more similar the two data sets are.
A field value that appears in the top N in time slot t may not be within the top N in time slot t−1, in which case no corresponding counter value is available. An example is the counter value in time slot 2 for the field value “10.0.0.1”, which ranks first in time slot 3 in FIG. 2. In such a case, one possible approach is to calculate the correlation coefficient with that counter value set to 0.
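The second method, including the convention of treating a missing counter value as 0, can be sketched in Python as follows. The data layout (a dictionary from field value to counter value, in rank order) and the function names are illustrative assumptions, and returning NaN when one side has zero variance is a choice made here, not something the text specifies.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return math.nan  # correlation is undefined if one side is constant
    return cov / (sx * sy)


def similarity_second_method(top_now: Dict[str, float],
                             prev_counts: Dict[str, float]) -> float:
    """r(t, t-1) over the top-N field values of slot t.

    top_now maps f(i, t) -> c(f(i, t), t) in rank order; prev_counts maps
    field values to their counters in slot t-1. A field value with no
    counter in slot t-1 is treated as 0.
    """
    fields = list(top_now)
    now = [top_now[f] for f in fields]
    prev = [prev_counts.get(f, 0) for f in fields]
    return pearson(now, prev)
```

A newcomer with a large counter in slot t and none in slot t−1 pulls r(t, t−1) down sharply, which is the kind of drop the following discussion of FIG. 4 describes.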
FIG. 3 plots the counter values of time slot 2 against those of time slot 3 in the example of FIG. 2.
In normal traffic, the top N field values calculated at regular intervals and their counter value data are expected, by the law of large numbers, to change less than other field values and their counter values, and therefore to be similar from one interval to the next.
FIG. 4 shows the time series of the correlation coefficient r(t, t−1) between adjacent time slots for the top 100 field values of a certain 24 hours of traffic data.
As expected, at most times the top-100 data are similar and the correlation coefficient lies between 0.95 and 1, but sharp drops in the correlation coefficient are seen around 3:00 and 20:00.
This is because, in these periods, field values that ranked low in other periods suddenly appeared near the top.

On the other hand, to detect anomalies in which field values that appeared in the top N of the past data are absent from the new data, the correlation coefficient must be calculated for counter values based on the past top N field values. The third method of this embodiment (the method according to claim 3 of the present application) calculates such a correlation coefficient.
For fields such as flows, which start and end within a short time, the top N field values are expected to fluctuate strongly while the corresponding counter values fluctuate relatively little. In this case, rather than fixing the field values and computing the correlation between the corresponding past and new counter values, it may be more effective to fix the ranks and compute the correlation between the past and new counter values at each rank.
In such a case, the fourth method of this embodiment (the method according to claim 4 of the present application) uses the correlation coefficient of the counter values corresponding to each rank.
In the fifth method of this embodiment (the method according to claim 5 of the present application), the second, third, and fourth methods above use, instead of the correlation coefficient, the reciprocal of the sum of the absolute differences between the counter values in the past and new data. This method is useful when anomaly detection should emphasize the absolute differences between counter values rather than a similarity normalized to the range −1 to 1, as the correlation coefficient is.
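A sketch of the third, fourth, and fifth methods under the same illustrative data layout follows: the third method indexes by the past top-N field values, the fourth aligns counters by rank regardless of which field value holds each rank, and the fifth replaces the correlation coefficient with the reciprocal of the summed absolute differences. The function names are hypothetical, and returning infinity for identical data in the fifth method is an assumption.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return math.nan  # undefined when one side is constant
    return cov / (sx * sy)


def similarity_third_method(past_top: Dict[str, float],
                            now_counts: Dict[str, float]) -> float:
    """Index by the past top-N field values; a value absent now counts as 0."""
    fields = list(past_top)
    return pearson([past_top[f] for f in fields],
                   [now_counts.get(f, 0) for f in fields])


def similarity_fourth_method(past_ranked: List[float],
                             now_ranked: List[float]) -> float:
    """Rank-aligned: i-th largest past counter vs i-th largest new counter
    (both lists have length N and are sorted in descending order)."""
    return pearson(past_ranked, now_ranked)


def reciprocal_l1(x: List[float], y: List[float]) -> float:
    """Fifth method: 1 / sum(|x_i - y_i|); infinity if the data are identical."""
    dist = sum(abs(a - b) for a, b in zip(x, y))
    return math.inf if dist == 0 else 1.0 / dist
```

For the fifth method, reciprocal_l1 is fed the same pairs of lists that the second, third, or fourth method would pass to pearson.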

When the top N field values are available but their counter values are not, or when the counter values are available but fluctuate strongly, a method that uses changes in rank instead of counter values may also be considered.
In such cases, the sixth method of this embodiment (the method according to claim 6 of the present application) detects anomalies based on ranks, without using the counter values.
Let f(i, t) be the field value ranked i-th in time slot t, and let r(f, t) be the rank of field value f among the top N (N a natural number, N ≥ 2) field values in time slot t. The rank correlation coefficient r_c(t, t−1) is then calculated by the following equation (4).
Furthermore, to detect anomalies in which field values that appeared in the top N of the past data are absent from the new data, the rank correlation coefficient must be calculated based on the past top N field values. The seventh method of this embodiment (the method according to claim 7 of the present application) calculates such a rank correlation coefficient.
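Equation (4) is not reproduced in this text. The sketch below assumes a Spearman-style rank correlation, 1 − 6Σd²/(N(N²−1)), where d is the difference between a field value's rank in slot t and its rank in slot t−1, and it assigns rank N+1 to a field value absent from the other top-N list. Both choices are assumptions, and with absent values the result is no longer confined to [−1, 1]. The function name is hypothetical.

```python
from typing import List


def rank_similarity(ranked_now: List[str], ranked_prev: List[str]) -> float:
    """Spearman-style rank correlation between two top-N rankings.

    ranked_now[i] is f(i+1, t); ranked_prev is the top-N list of slot t-1.
    A field value missing from ranked_prev gets rank N+1 (assumption).
    Requires N >= 2, as in the text.
    """
    n = len(ranked_now)
    prev_rank = {f: i + 1 for i, f in enumerate(ranked_prev)}
    d2 = sum((i + 1 - prev_rank.get(f, n + 1)) ** 2
             for i, f in enumerate(ranked_now))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))
```

Calling it with the arguments swapped, rank_similarity(ranked_prev, ranked_now), indexes by the past top-N field values, in the spirit of the seventh method.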

In the eighth method of this embodiment (the method according to claim 8 of the present application), a time-series prediction is generated for each top-N field value from the observed past counter values of the top field values, using, for example, a linear prediction model such as an ARIMA model, and the reciprocal of the sum of the absolute differences between the new counter values of the top N field values and their predicted values is calculated as the similarity.
Although this method requires more computation to build the linear prediction model, it is expected to keep trends, periodic variations, and the like from being flagged as anomalies.
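For the eighth method the text suggests a linear prediction model such as ARIMA. To keep the sketch self-contained, the code below substitutes a much simpler predictor, a per-field least-squares straight line extrapolated one step ahead; it is a stand-in, not ARIMA. Forecasting 0 for a field value with no history, and returning infinity for a perfect forecast, are assumptions, as are the function names.

```python
import math
from typing import Dict, List


def linear_trend_forecast(history: List[float]) -> float:
    """Extrapolate one step ahead with a least-squares straight line
    (a simple stand-in for an ARIMA-type model)."""
    n = len(history)
    if n == 1:
        return float(history[0])
    xs = range(n)
    mx = (n - 1) / 2
    my = sum(history) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, history))
             / sum((x - mx) ** 2 for x in xs))
    return my + slope * (n - mx)  # value of the fitted line at x = n


def prediction_similarity(histories: Dict[str, List[float]],
                          top_now: Dict[str, float]) -> float:
    """Reciprocal of the summed absolute error between the new top-N
    counters and their per-field forecasts (no history -> forecast 0)."""
    error = 0.0
    for field, count in top_now.items():
        hist = histories.get(field)
        forecast = linear_trend_forecast(hist) if hist else 0.0
        error += abs(count - forecast)
    return math.inf if error == 0 else 1.0 / error
```

A steadily growing counter is forecast along its trend and therefore adds little error, which is the behavior the text expects from excluding trends.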

The abnormality determination and abnormality cause identification unit 14 checks whether the similarity output by the similarity calculation unit 13 is below a predetermined threshold and, if it is, issues an alarm.
One conceivable way to set the threshold is to build a statistical distribution from past similarity data and use its 99% value as the threshold. Further, when the similarity is below the threshold, the unit extracts the field values estimated to be its cause.
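A sketch of the threshold and alarm logic follows. Because a low similarity indicates an anomaly, it reads the "99% value" as the level that about 99% of past similarities lie at or above, computed as a nearest-rank empirical 1st percentile; that reading and the function names are assumptions.

```python
import math
from typing import List


def lower_threshold(past_similarities: List[float], p: float = 0.01) -> float:
    """Nearest-rank empirical p-quantile of past similarities, so that
    roughly (1 - p) of past values lie at or above the threshold."""
    ordered = sorted(past_similarities)
    k = max(0, math.ceil(p * len(ordered)) - 1)
    return ordered[k]


def is_anomalous(similarity: float, threshold: float) -> bool:
    """True means issue an alarm: the similarity is below the threshold."""
    return similarity < threshold
```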
As an example of the extraction method, in the ninth method of this embodiment (the method according to claim 9 of the present application), the field values are sorted in descending order of the absolute difference between their past and new counter values; starting from the top of this order, their counter values are omitted from the similarity calculation one at a time, and the field values whose omission makes the similarity stop falling below the threshold are estimated to be the field values that caused the decrease in similarity.
In the example of FIG. 3, the calculation therefore starts with the field value farthest from the line of slope 1; that is, it first checks whether the field value “10.0.0.1” is the cause of the decrease in similarity.
Further, in the tenth method of this embodiment (the method according to claim 10 of the present application), the top N field values are sorted in descending order of the distance between their ranks; starting from the top of this order, their ranks are omitted from the similarity calculation one at a time, and the field values whose omission makes the similarity stop falling below the threshold are estimated to be the field values that caused the decrease in similarity.
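The cause-extraction loop of the ninth method can be sketched as a greedy procedure: sort the field values by |new − past| and drop them one at a time from the front until the similarity is no longer below the threshold. The function name, the default Pearson similarity, the treatment of a missing past value as 0, and the min_points guard (which stops before too few points remain for a meaningful correlation) are assumptions made for this sketch.

```python
import math
from typing import Callable, Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return math.nan  # undefined when one side is constant
    return cov / (sx * sy)


def find_causes(now: Dict[str, float], past: Dict[str, float],
                threshold: float,
                similarity: Callable[[List[float], List[float]], float] = pearson,
                min_points: int = 3) -> List[str]:
    """Greedy cause extraction in the manner of the ninth method.

    Field values are ordered by |now - past| (descending; a missing past
    value counts as 0) and dropped one at a time until the similarity of
    the remaining values is no longer below the threshold.
    """
    order = sorted(now, key=lambda f: abs(now[f] - past.get(f, 0)), reverse=True)
    kept = list(order)
    causes: List[str] = []
    for f in order:
        sim = similarity([now[g] for g in kept], [past.get(g, 0) for g in kept])
        # NaN compares False, so an undefined similarity keeps dropping values.
        if sim >= threshold or len(kept) <= min_points:
            break
        kept.remove(f)
        causes.append(f)
    return causes
```

The same loop could be adapted to the tenth method by passing ranks instead of counter values, and to the eleventh by passing forecasts in place of past counters.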

Furthermore, in the eleventh method of this embodiment (the method according to claim 11 of the present application), the top N field values are sorted in descending order of the error distance between the time-series prediction of the counter value of each top-N field value and its newly calculated counter value; starting from the top of this order, their counter values are omitted from the similarity calculation one at a time, and the field values whose omission makes the similarity stop falling below the threshold are estimated to be the field values that caused the decrease in similarity.
In normal traffic, the top N field values calculated at regular intervals and their counter value data are expected, by the law of large numbers, to change less than other field values and their counter values, and therefore to be similar from one interval to the next.
Therefore, according to this embodiment, by monitoring the similarity with past data, an anomaly in the top N field values and their counter value data can be detected relatively easily as a statistical anomaly through the drop in similarity, and the field values responsible for it can also be identified easily.
Although the invention made by the present inventors has been specifically described above based on the embodiments, the present invention is not limited to those embodiments, and various modifications can of course be made without departing from the gist of the invention.

FIG. 1 is a block diagram showing the basic configuration of an abnormal traffic detection apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of top-N field values and their counter values.
FIG. 3 is a graph showing the relationship between the counter values of time slot 2 and time slot 3 in the example shown in FIG. 2.
FIG. 4 is a diagram showing an example of the correlation coefficient between adjacent time slots for the top-N field values and their counter values.

Explanation of symbols

10 Traffic acquisition unit
11 Field and counter type designation unit
12 Top N field value calculation unit
13 Similarity calculation unit
14 Abnormality determination and abnormality cause identification unit

Claims

An abnormal traffic detection method for detecting anomalies in time-series traffic data, comprising:
A first step of obtaining traffic data;
A second step of specifying at least one field to be monitored and a counter type for the field;
A third step of calculating, from the acquired traffic data at predetermined time intervals, the top N (N ≥ 2) field values for the designated counter of the designated field, together with their counter values;
A fourth step of accumulating the calculated top N field values and the counter value data, and calculating the similarity between the newly calculated data and past data;
A fifth step of issuing an alarm indicating an abnormality when the calculated similarity falls below a predetermined threshold, and extracting a field value presumed to be the cause of the decrease in similarity.

The abnormal traffic detection method according to claim 1, wherein the similarity calculated in the fourth step is based on a correlation coefficient between the newly calculated counter values of the top N field values and the past counter values of the same field values.

The abnormal traffic detection method according to claim 1, wherein the similarity calculated in the fourth step is based on a correlation coefficient between the counter values of the top N field values over a certain past monitoring period and the newly calculated counter values of those field values.

The abnormal traffic detection method according to claim 1, wherein the similarity calculated in the fourth step is based on a correlation coefficient between the newly calculated counter values of the top N field values and the counter values of the same ranks among the top N field values calculated in the past.

The abnormal traffic detection method according to claim 1, wherein the similarity calculated in the fourth step is based on a sum of distances between counter values instead of a correlation coefficient.
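The same-rank correlation (comparing counter values rank by rank, regardless of which field value holds each rank) and the distance-sum variant could be sketched as follows. Using the absolute difference as the distance, treating a field value absent from the past data as 0, and all function names and example addresses are assumptions for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx and vy else 0.0


def same_rank_similarity(current, past):
    """Correlate the k-th largest current counter value with the k-th largest past one."""
    xs = sorted(current.values(), reverse=True)
    ys = sorted(past.values(), reverse=True)
    m = min(len(xs), len(ys))
    return pearson(xs[:m], ys[:m])


def distance_sum(current, past):
    """Sum of absolute counter differences per field value; larger means less similar."""
    return sum(abs(v - past.get(k, 0)) for k, v in current.items())


past = {"192.0.2.1": 100, "192.0.2.2": 80, "192.0.2.3": 60}
current = {"192.0.2.1": 110, "192.0.2.2": 75, "192.0.2.3": 65}
print(same_rank_similarity(current, past), distance_sum(current, past))
```

With a distance sum, the alarm condition is naturally inverted (distance above a threshold rather than similarity below one). The same-rank form ignores which field value occupies each rank, so it reacts to changes in the shape of the top N distribution rather than to a particular field value changing places.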

The abnormal traffic detection method according to claim 1, wherein the similarity calculated in the fourth step is based on a correlation coefficient between the ranks of the newly calculated top N field values and the past ranks of the same field values.

The abnormal traffic detection method according to claim 1, wherein the similarity calculated in the fourth step is based on a correlation coefficient between the ranks of the top N field values over a certain past monitoring period and the ranks of those field values by the newly calculated counter values.
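Rank-based similarity can be sketched in the same way. Ranks are computed within each top N list (1 = largest counter value); giving a field value absent from the past top N the rank N + 1, breaking ties by sort order, and the function names and example addresses are assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx and vy else 0.0


def ranks(top):
    """Map each field value to its rank within a top-N dict (1 = largest counter value)."""
    ordered = sorted(top, key=top.get, reverse=True)
    return {field: i + 1 for i, field in enumerate(ordered)}


def rank_similarity(current, past):
    """Correlate current ranks with the past ranks of the same field values."""
    cur_r, past_r = ranks(current), ranks(past)
    missing = len(past) + 1  # hypothetical rank for field values not in the past top N
    fields = list(current)
    return pearson([cur_r[f] for f in fields], [past_r.get(f, missing) for f in fields])


past = {"192.0.2.1": 100, "192.0.2.2": 80, "192.0.2.3": 60}
print(rank_similarity({"192.0.2.1": 110, "192.0.2.2": 75, "192.0.2.3": 65}, past))
print(rank_similarity({"198.51.100.7": 500, "192.0.2.3": 70, "192.0.2.2": 60}, past))
```

Ranks discard the magnitude of the counters, so this form is insensitive to uniform growth or shrinkage of traffic and reacts only to reordering of the top N field values.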

The abnormal traffic detection method according to claim 1, wherein the similarity calculated in the fourth step is based on the error between a time-series prediction of the counter value of each field value, computed from previously calculated data, and the newly calculated counter values of the top N field values.

The abnormal traffic detection method according to claim 2, wherein, in the fifth step, the top N field values are arranged in descending order of the distance between their counter values, their counter values are omitted from the similarity calculation one at a time starting from the top, and the field values omitted up to the point at which the similarity no longer falls below the threshold are presumed to be the field values causing the decrease in similarity.
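The cause-extraction step of this claim can be sketched as follows, using as the similarity the Pearson correlation between current counter values and the past counter values of the same field values. Treating absent past field values as 0, stopping once fewer than three field values remain (a correlation over two points is always ±1), and the function names and example addresses are assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx and vy else 0.0


def similarity(current, past, fields):
    """Correlation between current and past counter values over the given field values."""
    return pearson([current[f] for f in fields], [past.get(f, 0) for f in fields])


def suspected_causes(current, past, threshold):
    """Omit field values in descending order of counter distance until similarity recovers."""
    dist = {f: abs(v - past.get(f, 0)) for f, v in current.items()}
    remaining = list(current)
    removed = []
    for field in sorted(dist, key=dist.get, reverse=True):
        if len(remaining) < 3 or similarity(current, past, remaining) >= threshold:
            break
        remaining.remove(field)
        removed.append(field)
    return removed


past = {"192.0.2.1": 100, "192.0.2.2": 80, "192.0.2.3": 60, "192.0.2.4": 40}
current = {"198.51.100.7": 500, "192.0.2.1": 105, "192.0.2.2": 78, "192.0.2.3": 62}
print(suspected_causes(current, past, threshold=0.8))
```

Sorting by distance first means the field values most likely to be responsible are tested first, so typically only a few similarity recomputations are needed.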

The abnormal traffic detection method according to claim 6, wherein, in the fifth step, the top N field values are arranged in descending order of rank distance, their ranks are omitted from the similarity calculation one at a time starting from the top, and the field values omitted up to the point at which the similarity no longer falls below the threshold are presumed to be the field values causing the decrease in similarity.

The abnormal traffic detection method according to claim 8, wherein, in the fifth step, the top N field values are arranged in descending order of the error between the time-series prediction of their counter values and their newly calculated counter values, their counter values are omitted from the similarity calculation one at a time starting from the top, and the field values omitted up to the point at which the similarity no longer falls below the threshold are presumed to be the field values causing the decrease in similarity.

An abnormal traffic detection device for detecting anomalies in time-series traffic data, comprising:
Means for obtaining traffic data;
Means for designating at least one field to be monitored and a counter type for the field;
Means for calculating, from the acquired traffic data at predetermined time intervals, the top N (N ≥ 2) field values for a designated counter of a designated field, together with their counter values;
Means for accumulating the calculated top N field values and their counter value data, and calculating the similarity between the newly calculated data and past data;
Means for issuing an alarm indicating an abnormality when the calculated similarity falls below a predetermined threshold, and for extracting a field value presumed to be the cause of the decrease in similarity.