JP2008118242A

JP2008118242A - Method and device for detecting abnormal traffic, and program

Info

Publication number: JP2008118242A
Application number: JP2006297383A
Authority: JP
Inventors: Ryoichi Kawahara; 亮一川原; Tatsuya Mori; 達哉森; Kensho Kamiyama; 憲昭上山; Shigeaki Harada; 薫明原田; Keisuke Ishibashi; 圭介石橋
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2006-11-01
Filing date: 2006-11-01
Publication date: 2008-05-22
Anticipated expiration: 2026-11-01
Also published as: JP4324189B2

Abstract

PROBLEM TO BE SOLVED: To provide an abnormal traffic detection technique capable of detecting abnormal traffic appropriately even when packet sampling is performed.

SOLUTION: An abnormal traffic detector 20 has a section 31 for determining the number of division groups, a group classification section 32, a time series analysis section 33, and an abnormal traffic detection section 34. The section 31 determines the number of division groups. The group classification section 32 divides an observed amount of traffic N_t into M groups according to predetermined rules, sets the amount of traffic in the j-th group to N_t(j) (in this case, Σ_{j=1}^{M} N_t(j) = N_t), and notifies the time series analysis section 33 of N_t(j). The time series analysis section 33 analyzes the time series of N_t(j) in each group j and, when a temporal change in N_t(j) is detected, detects that abnormal traffic has occurred in that group. The abnormal traffic detection section 34 notifies an operator of the detection of the abnormal traffic.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a technique for managing traffic in an IP network.

As IP networks have come into wide use, demand for guaranteed communication quality over IP networks has grown. At the same time, various kinds of abnormal traffic, such as DDoS attacks and network scans, occur and degrade users' communication quality. It is therefore necessary to monitor the network, detect such abnormal traffic quickly, and take appropriate measures. However, as networks become larger and faster, capturing and analyzing every packet does not scale. For this reason, attention has recently focused on a method in which each router performs packet sampling, a network monitoring device aggregates statistics on the sampled flows, and wide-area traffic is monitored on a per-flow basis. Here, a flow is a group of packets that share the same source IP address, destination IP address, protocol number, source port number, and destination port number.

Methods for analyzing sampled flow information have been studied in Non-Patent Documents 1, 2, and 3. Non-Patent Document 1 proposes a method for accurately obtaining statistics on flows with a large flow size. Non-Patent Document 2 proposes a method that uses the number of sampled SYN packets (SYN is a TCP flag that indicates the start of a connection) to estimate the total number of flows, including unsampled ones, and their flow sizes. Non-Patent Document 3 makes it possible to identify flows that are large or that occupy a high share of bandwidth, and to quickly isolate users who generate such flows excessively. However, all of these are methods for analyzing flow characteristics in a steady state, not for detecting changes in flow characteristics caused by abnormal traffic. Non-Patent Document 4 proposes a method for characterizing changes in traffic patterns caused by abnormal traffic, but does not consider how packet sampling affects its performance.

Non-Patent Document 1: C. Estan and G. Varghese, “New Directions in Traffic Measurement and Accounting”, ACM SIGCOMM 2002, Aug. 2002.
Non-Patent Document 2: N. Duffield, C. Lund, and M. Thorup, “Properties and Prediction of Flow Statistics From Sampled Packet Streams”, ACM SIGCOMM Internet Measurement Conference 2002, Nov. 2002.
Non-Patent Document 3: T. Mori et al., “Identifying elephant flows through periodically sampled packets”, ACM SIGCOMM Internet Measurement Conference, 2004.
Non-Patent Document 4: A. Lakhina, M. Crovella, and C. Diot, “Mining Anomalies Using Traffic Feature Distributions”, Proc. ACM SIGCOMM 2005, Sep. 2005.
Non-Patent Document 5: Takeda, Ota, Kato, and Nemoto, “Unauthorized Access Detection and Tracking Method Using Traffic Patterns”, IEICE Transactions B (in Japanese), Vol. J84-B, No. 8, pp. 1464-1473, Aug. 2001.
Non-Patent Document 6: A. Gunnar, M. Johansson, and T. Telkamp, “Traffic matrix estimation on a large IP backbone: a comparison on real data”, ACM SIGCOMM IMC, Oct. 2004.

FIG. 1 shows measurement results from an Internet backbone link. FIG. 1(a) shows the time series of the number of flows when packet sampling is not performed, that is, when the sampling probability p = 1. FIGS. 1(b) and 1(c) show the same time series for p = 0.01 and p = 0.001, respectively. The horizontal axis represents time and the vertical axis represents the number of flows. In FIG. 1(a), points where the number of flows rises sharply can be seen. Detailed analysis of the actual data showed that these spikes were SYN flooding, in which a large number of TCP SYN packets are sent from various spoofed source IP addresses to several destination IP addresses. In FIGS. 1(b) and 1(c), however, the spikes caused by this abnormal traffic shrink as the sampling probability p decreases. This is because such abnormal flows are unlikely to be sampled under packet sampling. Abnormal traffic such as network scans, port scans, and worms is characterized by generating a large number of flows, each with only a few packets. Because an abnormal flow has fewer packets than a normal flow, the probability that it is sampled (that is, the probability that at least one of its packets is sampled) becomes smaller, relative to the probability that a normal flow is sampled, as p decreases. As a result, the abnormal flows are buried, as in FIGS. 1(b) and 1(c).
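The effect can be made concrete with a small calculation. The following Python sketch assumes that each packet is sampled independently with probability p, so that a flow of k packets is sampled with probability 1 - (1 - p)^k. The flow sizes (1 packet for a scan-like flow, 100 packets for a normal flow) and the function name are illustrative assumptions, not values taken from the measurement.

```python
def prob_flow_sampled(packets: int, p: float) -> float:
    """Probability that at least one of `packets` packets is sampled,
    assuming independent per-packet sampling with probability p."""
    return 1.0 - (1.0 - p) ** packets


for p in (1.0, 0.01, 0.001):
    scan = prob_flow_sampled(1, p)      # hypothetical 1-packet scan/SYN flow
    normal = prob_flow_sampled(100, p)  # hypothetical 100-packet normal flow
    # The ratio shrinks as p decreases: abnormal flows are thinned out
    # more strongly than normal flows.
    print(f"p={p}: scan={scan:.4f} normal={normal:.4f} ratio={scan / normal:.4f}")
```

Under these assumptions the ratio of the two probabilities falls as p falls, which matches the shrinking spikes described for FIGS. 1(b) and 1(c).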

In view of the above problem, an object of the present invention is to provide an abnormal traffic detection method, a device therefor, and a program that can detect abnormal traffic appropriately even when packet sampling is performed.

A representative aspect of the invention disclosed in this specification is outlined briefly as follows.

In the present invention, the probability of detecting abnormal traffic is increased by dividing the monitored traffic into several groups. Assume that traffic is monitored per monitoring unit (for example, per link or per origin-destination pair), that the amount of traffic generated in each predetermined time interval t_0 is measured, that the amount of traffic in the t-th measurement interval is denoted N_t, and that abnormal traffic is detected from the change of N_t over time. For example, the source IP address of each flow that makes up the monitored traffic is examined, flows with the same IP address are mapped to the same group, and N_t is thereby divided into N_t(j) (j = 1 to M). Here, Σ_{j=1}^{M} N_t(j) = N_t.

Abnormal traffic such as a scan generates a large number of flows from one particular host toward many destination IP addresses and destination port numbers, so such abnormal flows are likely to be concentrated in one particular group. Grouping by source IP address therefore spreads normal traffic across the groups while concentrating abnormal traffic in a specific group. This raises the ratio of abnormal flows to normal flows in that group and so raises the probability of detecting the anomaly. Attack traffic aimed at a specific destination host or set of hosts, such as DDoS, is instead expected to become easier to detect when flows are grouped by destination IP address.
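The grouping step can be sketched in a few lines of Python. The mapping rule used here (integer value of the source address modulo M) is only an illustrative assumption; the text requires only that flows with the same source address land in the same group. The addresses are made-up examples, and group indices are 0-based in the code whereas the text uses j = 1 to M.

```python
from collections import Counter
from ipaddress import ip_address


def group_of(src_ip: str, num_groups: int) -> int:
    """Map a source IP address to a group index in 0..num_groups-1.
    Address-modulo-M is a hypothetical rule chosen for illustration."""
    return int(ip_address(src_ip)) % num_groups


def split_flow_counts(src_ips, num_groups):
    """Return [N_t(0), ..., N_t(M-1)], the per-group flow counts."""
    counts = Counter(group_of(ip, num_groups) for ip in src_ips)
    return [counts.get(j, 0) for j in range(num_groups)]


# 400 normal flows from 400 distinct sources, plus 100 flows from one scanner.
normal = [f"10.0.{i // 256}.{i % 256}" for i in range(400)]
scanner = ["192.0.2.7"] * 100
per_group = split_flow_counts(normal + scanner, 4)
print(per_group, sum(per_group))
```

In this toy input the normal flows spread evenly over the four groups while all scanner flows fall into one group, so the abnormal share in that group is higher than in the undivided total.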

In a first method of the present invention, the observed traffic volume N_t is divided into M groups according to a predetermined rule, with the traffic volume of the j-th group denoted N_t(j) (so that Σ_{j=1}^{M} N_t(j) = N_t). The time series of N_t(j) is analyzed for each group j, and when a temporal change in N_t(j) is detected, abnormal traffic is detected as having occurred in that group.

Detecting abnormal traffic from temporal changes in a time series is itself a known technique, described for example in Non-Patent Document 5.

In a second method of the present invention, when the time series is analyzed in the first method, abnormal traffic is detected as having occurred when N_t(j) exceeds the value m + α×σ, obtained by adding α times the standard deviation σ to the moving average m.

For example, with α = 3, the probability of a false detection when no abnormal traffic is present is held to about 0.13% (assuming that N_t(j) follows a normal distribution). Using the mean plus α times the standard deviation as a threshold is itself a known technique.
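The 0.13% figure is the one-sided tail of a standard normal distribution beyond α = 3. A minimal Python check, assuming normality as the text does (the function names are illustrative):

```python
import math


def false_alarm_rate(alpha: float) -> float:
    """P[Z > alpha] for a standard normal Z (one-sided upper tail)."""
    return 0.5 * math.erfc(alpha / math.sqrt(2.0))


def threshold(mean: float, std: float, alpha: float) -> float:
    """Detection threshold m + alpha * sigma."""
    return mean + alpha * std


print(f"{false_alarm_rate(3.0):.5f}")  # about 0.00135, i.e. roughly 0.13%
```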

In a third method of the present invention, the number of bytes generated, the number of packets generated, or the number of flows generated is used as the traffic volume N_t observed in the first method.

In the first method, the rule for dividing into M groups may be to examine the source IP address of each flow that makes up N_t and map flows with the same IP address to the same group, thereby dividing N_t into N_t(j) (j = 1 to M). Grouping may also be done by destination IP address instead of source IP address.

In a fourth method of the present invention, the number of division groups M in the first method is determined as follows when the number of flows is monitored as in the third method. For normal flows, the mean m and variance σ^2 of the number of flows before packet sampling are determined in advance by analysis of actual data. Assume that the number of normal flows Y follows the normal distribution N(m, σ^2) with mean m and variance σ^2, and let F(y, m, σ^2) = P[Y < y] (that is, the probability that Y falls below y, computed from N(m, σ^2)). The number of abnormal flows to be detected, d, is set to d = argmin{d | F(m - d + α×σ, m, σ^2) < ε}, where ε is a predetermined target value for the non-detection rate of abnormal traffic (0 < ε < 1). Next, the mean number of flows m(p) and the variance σ(p)^2 under packet sampling with sampling probability p are measured. When this traffic is divided into M groups, the number of flows in group j is taken as m(p, j) = m(p)/M and the variance as σ(p, j)^2 = σ(p)^2/M. With these preparations, the number of division groups is set to M = argmin{M | F(m(p, j) + α×σ(p, j), m(p, j) + d×p, σ(p, j)^2 + d×p×(1 - p)) < ε}.

The above method is explained here. The basic idea is to determine the number of division groups M needed so that abnormal traffic that could be detected with a non-detection rate below ε without sampling is still detected with a non-detection rate below ε after sampling.

First, the mean m and variance σ^2 of the original number of flows before sampling must be determined. One approach is to capture all packets temporarily in advance. Another is to estimate the original flow statistics from the sampled flow statistics obtained by packet sampling. The mean m of the original number of flows can be estimated, for example, by the method of Non-Patent Document 2.

The variance σ^2 of the original number of flows is estimated as follows. Suppose sampled flow information is being collected with packet sampling probability p_0. Sampled flow information is then generated for smaller probabilities p_1, p_2, ... (p_1, p_2, ... < p_0). For example, if p_0 = 0.01 and p_1 = 0.001, the sampled flows obtained with p_0 = 0.01 are thinned further with a packet sampling probability of 1/10. Next, the mean m(p_i) and variance σ(p_i)^2 of the number of sampled flows are calculated for each probability p_i. Assuming that the relationship σ^2 = a × m^c holds between the mean and the variance, a and c are then calculated so that the measurements (m(p_i), σ(p_i)^2) (i = 0, 1, 2, ...) fit this relationship. Specifically, taking the logarithm of the relationship gives log σ^2 = log a + c × log m. With X_i = log m(p_i) as the explanatory variable and Y_i = log σ(p_i)^2 as the response variable, the relationship between X_i and Y_i is approximated by a straight line, and the slope c and Y-intercept log a that minimize the approximation error are obtained. Once a and c are known, the original variance is estimated by σ^2 = a × m^c. The mean-variance relationship σ^2 = a × m^c used here follows Non-Patent Document 6 and relies on the fact that such a relationship holds between the mean m and the variance σ^2 of a traffic time series.
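The log-log fit described here can be sketched with ordinary least squares. The Python below is a minimal illustration, assuming natural logarithms and treating the intercept of the fitted line as log a. The sample means and variances are synthetic values generated from a = 2 and c = 1.5 purely to exercise the code; they are not measurements.

```python
import math


def fit_power_law(means, variances):
    """Least-squares fit of log(var) = log(a) + c * log(mean).
    Returns (a, c) for the assumed relationship var = a * mean**c."""
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    c = sxy / sxx                     # slope
    a = math.exp(y_bar - c * x_bar)   # intercept is log(a)
    return a, c


def estimate_original_variance(m, a, c):
    """Extrapolate the variance at the original (unsampled) mean m."""
    return a * m ** c


# Synthetic (m(p_i), sigma(p_i)^2) pairs for three sampling probabilities.
sample_means = [10.0, 100.0, 1000.0]
sample_vars = [2.0 * m ** 1.5 for m in sample_means]
a_hat, c_hat = fit_power_law(sample_means, sample_vars)
print(a_hat, c_hat, estimate_original_variance(1.0e4, a_hat, c_hat))
```

At least two distinct sampling probabilities are needed for the slope to be defined; more points make the fit less sensitive to noise in any single measurement.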

With these preparations, the number of abnormal flows to be detected is set to d = argmin{d | F(m - d + α×σ, m, σ^2) < ε}. Here F(m - d + α×σ, m, σ^2) = P[Y + d < m + α×σ], that is, the probability that the total stays below the detection threshold m + α×σ even though abnormal traffic d has been added, so that the anomaly goes undetected (the non-detection rate). Setting d in this way defines the abnormal traffic to be detected as the smallest d for which the non-detection rate is below ε.
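As a worked illustration of this definition, the following Python sketch searches for the smallest integer d whose non-detection rate falls below ε. It uses `statistics.NormalDist` from the standard library for F. The numeric inputs (m = 1000, σ = 50, α = 3, ε = 0.05) are hypothetical, and restricting d to integers is an assumption made for the example.

```python
from statistics import NormalDist


def non_detection_rate(d, m, sigma, alpha):
    """F(m - d + alpha*sigma, m, sigma^2) = P[Y + d < m + alpha*sigma]
    for Y ~ N(m, sigma^2)."""
    return NormalDist(mu=m, sigma=sigma).cdf(m - d + alpha * sigma)


def min_detectable_flows(m, sigma, alpha, eps):
    """Smallest integer d with non-detection rate below eps."""
    d = 0
    while non_detection_rate(d, m, sigma, alpha) >= eps:
        d += 1
    return d


d = min_detectable_flows(m=1000.0, sigma=50.0, alpha=3.0, eps=0.05)
print(d)
```

Because the non-detection rate equals Φ(α - d/σ) for the standard normal distribution function Φ, the search is equivalent to finding the smallest d above σ×(α - Φ^{-1}(ε)).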

The traffic is then divided so that such abnormal traffic d can still be detected after packet sampling. The number of divisions M needed to detect it with a non-detection rate below ε is determined by M = argmin{M | F(m(p, j) + α×σ(p, j), m(p, j) + d×p, σ(p, j)^2 + d×p×(1 - p)) < ε}. Here it is assumed that the abnormal traffic generates d abnormal flows and that, when these flows are sampled with probability p, the number of sampled flows has mean d×p and variance d×p×(1 - p). It is also assumed that the abnormal flows remain concentrated in a single group after division into M groups.
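The search for M can be sketched in the same way. The Python below assumes an even split, m(p, j) = m(p)/M and σ(p, j)^2 = σ(p)^2/M, and a linear scan over M with an arbitrary upper bound `max_groups`. That bound and all numeric inputs are hypothetical choices for illustration. If the sampled anomaly d×p is too small, no M may satisfy the condition, in which case the function returns None.

```python
import math
from statistics import NormalDist


def non_detection_after_split(M, m_p, var_p, d, p, alpha):
    """Non-detection rate in the group that holds the anomaly,
    after splitting the sampled traffic evenly into M groups."""
    m_j = m_p / M
    var_j = var_p / M
    threshold = m_j + alpha * math.sqrt(var_j)
    mean_with_anomaly = m_j + d * p
    var_with_anomaly = var_j + d * p * (1.0 - p)
    dist = NormalDist(mean_with_anomaly, math.sqrt(var_with_anomaly))
    return dist.cdf(threshold)


def min_groups(m_p, var_p, d, p, alpha, eps, max_groups=10_000):
    """Smallest M whose non-detection rate is below eps, or None."""
    for M in range(1, max_groups + 1):
        if non_detection_after_split(M, m_p, var_p, d, p, alpha) < eps:
            return M
    return None


# Hypothetical sampled statistics: m(p) = 200, sigma(p)^2 = 400, p = 0.1.
M = min_groups(m_p=200.0, var_p=400.0, d=233, p=0.1, alpha=3.0, eps=0.05)
print(M)
```

Scanning upward from M = 1 returns the smallest qualifying value, which is in line with the stated aim of keeping M as small as the target non-detection rate allows.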

If M is made too large, the number of false detections when no abnormal traffic is actually present may rise sharply and increase the burden on the operator. A larger M also increases the amount of data that must undergo time series analysis, and with it the processing cost. M is therefore set as small as possible within the range in which the non-detection rate of abnormal traffic can be kept at or below the target value ε.

In a fifth method of the present invention, instead of taking the number of flows in group j as m(p, j) = m(p)/M as in the fourth method, the flow-count distribution ratios w(i) for division into M groups (Σ w(i) = 1, 0 < w(i) < 1) are determined in advance by analysis of actual data. For the group j with the largest number of flows after division (that is, the group with the largest w(j)), the number of flows is taken as m(p, j) = m(p) × w(j), and the variance is calculated as σ(p, j)^2 = a × m(p, j)^c, where a and c are coefficients that are fixed in advance or determined beforehand by analysis of actual data. The number of division groups M is then determined by the same procedure as in the fourth method.

The fourth method assumes that traffic is divided evenly. Here, the number of normal flows is instead assumed to be unevenly distributed after division, and attention is placed on the group with the most flows. The more normal flows a group has, the more likely an anomaly is to be buried, so this ensures that the non-detection rate stays below ε even in such a group.

The flow-count distribution ratios needed here are determined as follows. For example, all packets are captured temporarily in advance (sampled flow information collected during some past period may be used instead), the distribution ratios w(i) (i = 1 to M) are calculated for a given number of division groups M, and the ratio w(j) of the group j with the largest number of flows is stored. This is repeated for different values of M to prepare a table that maps each value of M to the distribution ratio w(j) of the group with the most flows.
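Building such a table from a recorded set of flows might look like the following Python sketch. The grouping rule (source address modulo M), the candidate values of M, and the toy flow list are all assumptions made for illustration.

```python
from collections import Counter
from ipaddress import ip_address


def max_share(src_ips, num_groups):
    """Largest distribution ratio w(j) when the flows are split into
    num_groups groups by source address modulo M (hypothetical rule)."""
    counts = Counter(int(ip_address(ip)) % num_groups for ip in src_ips)
    return max(counts.values()) / len(src_ips)


def build_share_table(src_ips, candidate_ms):
    """Table mapping each candidate M to the largest w(j) for that M."""
    return {M: max_share(src_ips, M) for M in candidate_ms}


# Toy trace: ten flows from three sources with a skewed distribution.
flows = ["10.0.0.1"] * 6 + ["10.0.0.2"] * 3 + ["10.0.0.3"] * 1
table = build_share_table(flows, [1, 2, 4])
print(table)
```

With real traffic the largest share generally stays above 1/M, which is why the even-split assumption of the fourth method can underestimate the number of normal flows in the busiest group.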

According to the present invention, abnormal traffic can be detected appropriately even when packet sampling is performed.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 2 is a configuration diagram showing an example of the basic configuration of an IP network to which the present invention is applied. In FIG. 2, reference numeral 20 denotes an abnormal traffic detection device and 21 to 24 denote routers. As shown in FIG. 2, each of the routers 21 to 24 that make up the IP network outputs sampled flow information, obtained by packet sampling with sampling probability p, to the abnormal traffic detection device 20, which detects abnormal traffic on the basis of that information.

FIG. 3 is a block diagram showing a configuration example of the abnormal traffic detection device according to the present invention. The abnormal traffic detection device 20 collects and analyzes information on flows sampled by packet sampling, measures the traffic volume generated in each predetermined time interval t_0 for each predetermined monitoring unit, denotes the traffic volume in the t-th measurement interval by N_t, and detects abnormal traffic from the change of N_t over time.

As shown in FIG. 3, the abnormal traffic detection device 20 includes a division group number determination unit 31, a group classification unit 32, a time series analysis unit 33, and an abnormal traffic detection unit 34. The processing performed by each unit is outlined as follows. The division group number determination unit 31 determines the number of division groups. The group classification unit 32 divides the observed traffic volume N_t into M groups according to a predetermined rule, denotes the traffic volume of the j-th group by N_t(j) (so that Σ_{j=1}^{M} N_t(j) = N_t), and notifies the time series analysis unit 33 of N_t(j). The time series analysis unit 33 analyzes the time series of N_t(j) for each group j and, when it detects a temporal change in N_t(j), detects that abnormal traffic has occurred in that group. On receiving notification of an anomaly from the time series analysis unit 33, the abnormal traffic detection unit 34 notifies the operator that abnormal traffic has been detected.

The abnormal traffic detection device 20 of this embodiment monitors traffic for each predetermined monitoring unit (for example, per link or per origin-destination pair), as described in the first method of the present invention. Of the traffic items listed in the third method, it monitors the number of flows. It determines the number of division groups M by the fourth method, divides the monitored traffic into M groups by source IP address, and analyzes the traffic time series of each group by the second method to detect abnormal traffic.

As shown in FIG. 3, the group classification unit 32 determines which group each piece of flow information arriving from a router belongs to (this procedure is performed for each monitoring unit). The flow information contains the source IP address, destination IP address, protocol number, source port number, and destination port number that define the flow, together with its packet count and byte count. As one example, the address prefix is looked up from the source IP address, and flows with the same prefix are mapped to the same group. As another example, the AS (autonomous system) number to which the prefix belongs may be looked up, and the flow mapped to group j if the remainder of dividing the AS number by M is j. Once the group has been determined, the time series analysis unit 33 is notified.

The time series analysis unit 33 keeps a flow counter for each group. When notified by the group classification unit 32 that a flow has arrived for group j, it increments the flow counter of group j: N_t(j) ← N_t(j) + 1. When the predetermined time interval t_0 has elapsed, it compares N_t(j) with the threshold m(j) + α×σ(j), where m(j) is the mean of N_t(j), σ(j) is its standard deviation (both calculated as described next), and α is a predetermined coefficient. If N_t(j) exceeds the threshold, it notifies the abnormal traffic detection unit 34. It then updates the mean by m(j) ← (1 - γ)×m(j) + γ×N_t(j), with smoothing coefficient γ, calculates σ(j) as the standard deviation of the difference between N_t(j) and m(j) over the past n intervals, and clears N_t(j) to 0. On receiving notification of an anomaly from the time series analysis unit 33, the abnormal traffic detection unit 34 reports the group number to the operator.
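The per-group counter and threshold check can be sketched as a small Python class. Several details are assumptions made so that the sketch runs: the class and method names, the default values of α, γ, and n, an initial mean supplied by the caller, a warm-up rule that suppresses alarms until two differences have been recorded, and the choice to take the difference N_t(j) - m(j) after the mean update and feed it to the population standard deviation.

```python
from collections import deque
from statistics import pstdev


class GroupMonitor:
    """Flow counter and threshold test for one group j (illustrative)."""

    def __init__(self, alpha=3.0, gamma=0.1, history=20, initial_mean=0.0):
        self.alpha = alpha                   # threshold coefficient
        self.gamma = gamma                   # smoothing coefficient
        self.mean = initial_mean             # m(j)
        self.sigma = 0.0                     # sigma(j)
        self.diffs = deque(maxlen=history)   # last n values of N_t(j) - m(j)
        self.count = 0                       # N_t(j) for the current interval

    def on_flow(self):
        """Called when a sampled flow is classified into this group."""
        self.count += 1

    def end_interval(self):
        """Called every t_0; returns True if the threshold was exceeded."""
        n_t = self.count
        warmed_up = len(self.diffs) >= 2     # assumption: no alarm before this
        alarm = warmed_up and n_t > self.mean + self.alpha * self.sigma
        self.mean = (1.0 - self.gamma) * self.mean + self.gamma * n_t
        self.diffs.append(n_t - self.mean)
        if len(self.diffs) >= 2:
            self.sigma = pstdev(self.diffs)
        self.count = 0                       # clear N_t(j)
        return alarm


monitor = GroupMonitor(initial_mean=100.0)
alarms = []
for flows_in_interval in (100, 102, 98, 101, 99, 300):
    for _ in range(flows_in_interval):
        monitor.on_flow()
    alarms.append(monitor.end_interval())
print(alarms)
```

One instance would be kept per group and per monitoring unit. Note that, as in the described procedure, the anomalous interval itself is folded into the mean after the comparison, so a sustained anomaly gradually raises the threshold.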

一方、分割グループ数決定部３１は、以下の手順に従って分割グループ数Ｍを決定しておく。
正常フローに関して、パケットサンプリングを行う前のフロー数の平均ｍと分散σ＾２を与えておく。例えば、事前に一時的に全パケットキャプチャをして求めておく。あるいは、パケットサンプリングにより得られたサンプルフロー統計から、元のフロー数の平均ｍを推定する（例えば非特許文献２の方法を用いる）。次に、元のフロー数の分散σ＾２を以下の手順で推定する。サンプリング確率ｐ０でパケットサンプリングを行ってサンプルフロー情報を収集しているとし、ｐ０より小さい確率ｐ１，ｐ２，…（ｐ１，ｐ２，…＜ｐ０）でサンプリングした場合のサンプルフロー情報を生成する。例えばｐ０＝０．０１、ｐ１＝０．００１であれば、ｐ０＝０．０１で得られたサンプルフローを更に１／１０のパケットサンプリング確率で間引いてやればよい。次に、各確率ｐｉに対するサンプルフロー数の平均ｍ（ｐｉ）と分散σ（ｐｉ）＾２を計算する。その後、平均ｍと分散σ＾２の間にσ＾２＝ａ×ｍ＾ｃの関係が成り立つとして、測定結果（ｍ（ｐｉ），σ＾２（ｐｉ））（ｉ＝０，１，２，…）がこの関係式に合うようなａとｃを計算する。具体的には、関係式のｌｏｇを取り、ｌｏｇ｛σ＾２｝＝ａ＋ｃ×ｌｏｇ｛ｍ｝としておき、測定結果のｌｏｇを取った値ｌｏｇ｛ｍ（ｐｉ）｝を説明変数Ｘｉ、ｌｏｇ｛σ（ｐｉ）＾２｝を被説明変数ＹｉとしてＸｉとＹｉの関係をＹ＝ａ＋ｃＸと線形近似して、その近似誤差が最小となる傾きｃとＹ切片ａを求める。ａとｃが求まったら、元の分散σ＾２を、σ＾２＝ａ×ｍ＾ｃにより推定する。なお、Ａ＾ＢはＡのＢ乗を意味する。ここで、正常フロー数Ｙが平均ｍ、分散σ＾２の正規分布Ｎ（ｍ，σ＾２）に従うとし、Ｆ（ｙ，ｍ，σ＾２）＝Ｐ［Ｙ＜ｙ］とし（つまりＹがｙ以下となる確率を正規分布Ｎ（ｍ，σ＾２）を用いて計算する）、検出すべき異常フロー数ｄをｄ＝ａｒｇｍｉｎ｛ｄ｜Ｆ（ｍ−ｄ＋α×σ，ｍ，σ＾２）＜ε｝とする。なお、ａｒｇｍｉｎ｛Ａ｜Ｂ｝はＢという条件を満たすＡのうち最小の値を意味する。また、εは異常トラヒック非検知率に対する予め定めた目標値である（０＜ε＜１）。ここで、サンプリング確率ｐでパケットサンプリングを行ったときの平均フロー数ｍ（ｐ）と分散σ（ｐ）＾２を測定しておき、これをＭ個のグループに分割したときのグループｊでのフロー数ｍ（ｐ，ｊ）をｍ（ｐ，ｊ）＝ｍ（ｐ）／Ｍとし、分散σ（ｐ，ｊ）＾２をσ（ｐ，ｊ）＾２＝σ（ｐ）＾２／Ｍにより計算する。以上の準備の下、分割グループ数Ｍを、Ｍ＝ａｒｇｍｉｎ｛Ｍ｜Ｆ（ｍ（ｐ，ｊ）＋α×σ（ｐ，ｊ），ｍ（ｐ，ｊ）＋ｄ×ｐ，σ（ｐ，ｊ）＾２＋ｄ×ｐ×（１−ｐ））＜εと設定する。 On the other hand, the division group number determination unit 31 determines the division group number M according to the following procedure.
Regarding the normal flow, the average m of the number of flows before packet sampling and the variance σ ^ 2 are given. For example, it is obtained by capturing all packets temporarily in advance. Alternatively, the average m of the original number of flows is estimated from the sample flow statistics obtained by packet sampling (for example, the method of Non-Patent Document 2 is used). Next, the variance σ ^ 2 of the original number of flows is estimated by the following procedure. Assume that sample flow information is collected by performing packet sampling with a sampling probability p0, and sample flow information is generated when sampling is performed with a probability p1, p2,... (P1, p2,... <P0) smaller than p0. For example, if p0 = 0.01 and p1 = 0.001, the sample flow obtained with p0 = 0.01 may be further thinned out with a packet sampling probability of 1/10. Next, the average m (pi) and variance σ (pi) ^ 2 of the number of sample flows for each probability pi are calculated. Thereafter, assuming that a relationship of σ ^ 2 = a × m ^ c holds between the average m and the variance σ ^ 2, the measurement result (m (pi), σ ^ 2 (pi)) (i = 0, 1, 2 ,...) Are calculated such that a and c match this relational expression. Specifically, the log of the relational expression is taken, log {σ ^ 2} = a + c × log {m}, and the value log {m (pi)} obtained by taking the log of the measurement result is used as the explanatory variables Xi, log { The relationship between Xi and Yi is linearly approximated to Y = a + cX with σ (pi) ^ 2} as the explained variable Yi, and the slope c and Y intercept a that minimize the approximation error are obtained. When a and c are obtained, the original variance σ ^ 2 is estimated by σ ^ 2 = a × m ^ c. A ^ B means A to the Bth power. 
Here, the number of normal flows Y is assumed to follow a normal distribution N(m, σ^2) with mean m and variance σ^2, and F(y, m, σ^2) = P[Y < y] is defined (that is, the probability that Y is at most y is calculated using the normal distribution N(m, σ^2)). The number of abnormal flows d to be detected is set to d = argmin{d | F(m - d + α × σ, m, σ^2) < ε}, where argmin{A | B} denotes the smallest value of A that satisfies condition B, and ε is a predetermined target value for the abnormal traffic non-detection rate (0 < ε < 1). Next, the mean number of flows m(p) and the variance σ(p)^2 under packet sampling with sampling probability p are measured. When these flows are divided into M groups, the number of flows m(p, j) in group j is taken as m(p, j) = m(p) / M, and the variance σ(p, j)^2 is calculated as σ(p, j)^2 = σ(p)^2 / M. With these preparations, the number of divided groups M is set to M = argmin{M | F(m(p, j) + α × σ(p, j), m(p, j) + d × p, σ(p, j)^2 + d × p × (1 - p)) < ε}.
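The two argmin searches above can be sketched as follows, under stated assumptions. The function names, the integer step of 1 in the search for d, and the upper bound `max_groups` are all hypothetical choices made for illustration; the text itself only defines the conditions that d and M must satisfy. All variances are assumed to be strictly positive.

```python
from statistics import NormalDist


def normal_cdf(y, mean, var):
    """F(y, mean, var) = P[Y < y] for Y ~ N(mean, var); requires var > 0."""
    return NormalDist(mean, var ** 0.5).cdf(y)


def min_detectable_anomaly(m, var, alpha, eps):
    """Smallest integer d >= 0 with F(m - d + alpha*sigma, m, var) < eps."""
    sigma = var ** 0.5
    d = 0
    while normal_cdf(m - d + alpha * sigma, m, var) >= eps:
        d += 1
    return d


def group_miss_probability(M, m_p, var_p, d, p, alpha):
    """Probability that a group carrying d abnormal flows stays under m + alpha*sigma."""
    m_j = m_p / M        # m(p, j) = m(p) / M
    var_j = var_p / M    # sigma(p, j)^2 = sigma(p)^2 / M
    threshold = m_j + alpha * var_j ** 0.5
    return normal_cdf(threshold, m_j + d * p, var_j + d * p * (1 - p))


def choose_group_count(m_p, var_p, d, p, alpha, eps, max_groups=1024):
    """Smallest M <= max_groups whose miss probability is below eps, else None."""
    for M in range(1, max_groups + 1):
        if group_miss_probability(M, m_p, var_p, d, p, alpha) < eps:
            return M
    return None
```

The `None` return covers the case in which no M up to the bound meets the target ε; how such a case should be handled is not stated in the text.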

In the second embodiment, the number of divided groups M is determined by the fourth method of the present invention. Instead of setting the number of flows m(p, j) in group j after division into M groups to m(p, j) = m(p) / M as in the first embodiment, the flow distribution ratios w(i) for division into M groups (Σ w(i) = 1, 0 < w(i) < 1) are determined in advance by analysis of actual data, and the number of flows m(p, j) = m(p) × w(j) in the group j that has the largest number of flows after division (that is, the group with the largest w(j)) is used. The variance σ(p, j)^2 is calculated as σ(p, j)^2 = a × m(p, j)^c, where a and c are coefficients that are either predetermined or determined in advance by analysis of actual data. Other points are the same as in the first embodiment.
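As a rough illustration of the difference from the first embodiment, the statistics of the busiest group can be computed as below. The function name and the input validation are hypothetical; the weights w(i) and the coefficients a and c are assumed to have been obtained beforehand, as the text requires.

```python
def largest_group_stats(m_p, weights, a, c):
    """Return (m(p, j), sigma(p, j)^2) for the group j with the largest w(j)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if any(not 0.0 < w < 1.0 for w in weights):
        raise ValueError("each weight must satisfy 0 < w < 1")
    w_max = max(weights)
    m_pj = m_p * w_max        # m(p, j) = m(p) * w(j)
    var_pj = a * m_pj ** c    # sigma(p, j)^2 = a * m(p, j)^c
    return m_pj, var_pj
```

These two values would then take the place of m(p) / M and σ(p)^2 / M in the condition that determines M.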

In the third embodiment, instead of grouping by source IP address as in the first embodiment, flows are grouped by destination IP address. Alternatively, flows may be grouped by protocol number, by source port number, or by destination port number. These may also be combined; for example, flows having the same pair of source IP address and protocol number may be assigned to a common group. Other points are the same as in the first embodiment.
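One way to realize such a configurable grouping rule is sketched below. The text only requires a predetermined rule, so the hash-based mapping (CRC32 of the selected fields, modulo the number of groups) is an assumption made for illustration, as are the `Flow` type and the function name.

```python
import zlib
from typing import NamedTuple


class Flow(NamedTuple):
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int


def group_index(flow, key_fields, num_groups):
    """Map a flow to one of num_groups groups using only the chosen header fields.

    Flows that agree on every field in key_fields always land in the same group.
    """
    key = "|".join(str(getattr(flow, name)) for name in key_fields)
    return zlib.crc32(key.encode("utf-8")) % num_groups
```

For example, `key_fields=("dst_ip",)` groups by destination IP address, and `key_fields=("src_ip", "protocol")` groups by the pair of source IP address and protocol number.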

In the fourth embodiment, instead of monitoring the number of flows as in the first embodiment, abnormal traffic is detected using the number of packets, the number of bytes, or both. When the number of packets is monitored, the time series analysis unit 33, upon being notified that a flow has arrived from group j, updates the packet counter of group j as Pkt_t(j) ← Pkt_t(j) + {number of packets of the flow}. When the number of bytes is monitored, it updates B_t(j) ← B_t(j) + {number of bytes of the flow}. The number of divided groups M for the packet-count and byte-count cases is determined as follows.

In the case of the number of packets, the mean and variance of the number of packets arriving from normal flows in the time interval t0 are measured or estimated in advance. These are taken as the mean m and variance σ^2 of the total number of packets arriving from normal flows before sampling. Here, the number of normal packets Y is assumed to follow a normal distribution N(m, σ^2) with mean m and variance σ^2, and F(y, m, σ^2) = P[Y < y] is defined (that is, the probability that Y is at most y is calculated using the normal distribution N(m, σ^2)). The number of abnormal packets d to be detected is set to d = argmin{d | F(m - d + α × σ, m, σ^2) < ε}, where ε is a predetermined target value for the abnormal traffic non-detection rate (0 < ε < 1). Next, the mean number of packets m(p) and the variance σ(p)^2 under packet sampling with sampling probability p are measured. When these packets are divided into M groups, the number of packets m(p, j) in group j is taken as m(p, j) = m(p) / M, and the variance σ(p, j)^2 is calculated as σ(p, j)^2 = σ(p)^2 / M. With these preparations, the number of divided groups M is set to M = argmin{M | F(m(p, j) + α × σ(p, j), m(p, j) + d × p, σ(p, j)^2 + d × p × (1 - p)) < ε}.

In the case of the number of bytes, m is taken as the mean number of bytes arriving from all normal flows, and the same procedure as described above for the number of packets is carried out.
Other points are the same as in the first embodiment.
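The per-group counter updates of the fourth embodiment can be sketched as follows. The class and method names are hypothetical, and the `end_interval` method, which hands over the counts and resets them at the end of each measurement interval t0, is an assumption about how the counters would be cycled.

```python
from collections import defaultdict


class GroupCounters:
    """Per-group packet and byte counters for one measurement interval."""

    def __init__(self):
        self.packets = defaultdict(int)      # Pkt_t(j)
        self.byte_counts = defaultdict(int)  # B_t(j)

    def on_flow_arrival(self, j, flow_packets, flow_bytes):
        """Apply Pkt_t(j) <- Pkt_t(j) + packets and B_t(j) <- B_t(j) + bytes."""
        self.packets[j] += flow_packets
        self.byte_counts[j] += flow_bytes

    def end_interval(self):
        """Return this interval's counts as plain dicts and reset the counters."""
        snapshot = (dict(self.packets), dict(self.byte_counts))
        self.packets.clear()
        self.byte_counts.clear()
        return snapshot
```

The snapshot returned for each interval would form the time series that is compared against the threshold for each group.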

The results of evaluating the effectiveness of the present invention are reported below. FIG. 4 shows the results of analyzing actual traffic. The horizontal axis represents time, and the vertical axis represents the number of flows [flows/min]. FIG. 4(a) is the time series of the total number of flows before packet sampling (sampling probability p = 1), FIG. 4(b) is that of the total number of sample flows with sampling probability p = 0.001, and FIG. 4(c) is that of the number of sample flows in each group with p = 0.001 and the number of divided groups M = 5. FIG. 4(a) shows a point at which the number of flows increases sharply. Detailed analysis of the data revealed that this was SYN flooding, in which a certain host sends SYN packets to a very large number of destination IP addresses. As a comparison of the total flow count in FIG. 4(b) (p = 0.001) with FIG. 4(c) shows, dividing the traffic makes visible the abnormal traffic that was buried when the traffic was viewed as a whole. Here, the traffic is divided into five groups based on the source IP address. Group D (class D in the figure) contained the SYN flooding described with reference to FIG. 4(a).

As a comparison of FIGS. 4(b) and 4(c) makes clear, the sharp increase in the number of flows caused by SYN flooding is not very distinct in FIG. 4(b), where the traffic is not divided into groups, whereas in FIG. 4(c), where it is divided into groups, the increase is clearly visible in class D.

As described above, according to the present invention, appropriately dividing the monitored traffic makes it possible to properly detect even abnormal traffic that would otherwise be buried by packet sampling.

The abnormal traffic detection device described above has means for realizing its functions and operations, and these means can be implemented by a computer and a program. Hardware may also be used in place of part or all of the program.

The invention made by the present inventors has been specifically described above based on the embodiments. However, the present invention is not limited to these embodiments, and it goes without saying that various modifications can be made without departing from the gist of the invention.

A graph showing an example in which abnormal traffic is buried when packet sampling is performed.
A configuration diagram showing a basic configuration example of a network to which the present invention is applied.
A block diagram showing a configuration example of the abnormal traffic detection device of the present invention.
Evaluation results showing the effect of the present invention.

Explanation of symbols

20 ... abnormal traffic detection device, 21 to 24 ... routers, 31 ... division group number determination unit, 32 ... group classification unit, 33 ... time series analysis unit, 34 ... abnormal traffic detection unit

Claims

An abnormal traffic detection method in an abnormal traffic detection device that collects and analyzes information about flows, measures the amount of traffic generated in each predetermined time interval t0 for each predetermined monitoring unit, denotes the traffic amount in the t-th measurement interval by N_t, and detects abnormal traffic from the time variation of N_t, wherein
the abnormal traffic detection device comprises group classification means and time series analysis means, and
the method comprises:
a step in which the group classification means divides the observed traffic amount N_t into M groups according to a predetermined rule, denotes the traffic amount in the j-th group by N_t(j) (where Σ_{j = 1 to M} N_t(j) = N_t), and notifies the time series analysis means of N_t(j); and
a step in which the time series analysis means analyzes the time series of N_t(j) in each group j and, upon detecting a temporal change in N_t(j), detects that abnormal traffic has occurred in that group.

The abnormal traffic detection method according to claim 1, comprising
a step in which the time series analysis means, when analyzing the time series, detects that abnormal traffic has occurred when N_t(j) exceeds the value m + α × σ obtained by adding α times the standard deviation σ to the moving average m.

The abnormal traffic detection method according to claim 1, wherein
any one of the number of generated bytes, the number of generated packets, and the number of generated flows is used as the observed traffic amount N_t.

The abnormal traffic detection method according to claim 3, wherein
the abnormal traffic detection device further comprises division group number determination means, and
when the number of flows is monitored, the division group number determination means determines the number of divided groups M by:
a step of determining in advance, by analysis of actual data, the mean m and variance σ^2 of the number of normal flows before packet sampling;
a step of assuming that the number of normal flows Y follows a normal distribution N(m, σ^2) with mean m and variance σ^2, defining F(y, m, σ^2) = P[Y < y] (that is, the probability that Y is at most y, calculated using the normal distribution N(m, σ^2)), and setting the number of abnormal flows to be detected to d = argmin{d | F(m - d - α × σ, m, σ^2) < ε} (where ε is a predetermined target value for the abnormal traffic non-detection rate, 0 < ε < 1);
a step of measuring the mean number of flows m(p) and variance σ(p)^2 under packet sampling with sampling probability p, taking the number of flows m(p, j) in group j after division into M groups as m(p, j) = m(p) / M, and calculating the variance σ(p, j)^2 as σ(p, j)^2 = σ(p)^2 / M; and
a step of setting, with the preparations of the above steps, the number of divided groups M to M = argmin{M | F(m(p, j) + α × σ(p, j), m(p, j) + d × p, σ(p, j)^2 + d × p × (1 - p)) < ε}.

The abnormal traffic detection method according to claim 4, wherein
the division group number determination means determines the number of divided groups M by,
instead of setting the number of flows m(p, j) in group j after division into M groups to m(p, j) = m(p) / M,
determining in advance, by analysis of actual data, the flow distribution ratios w(i) for division into M groups (Σ w(i) = 1, 0 < w(i) < 1), setting the number of flows in the group j that has the largest number of flows after division (that is, the group with the largest w(j)) to m(p, j) = m(p) × w(j), and calculating the variance σ(p, j)^2 as σ(p, j)^2 = a × m(p, j)^c (where a and c are coefficients that are predetermined or determined in advance by analysis of actual data).

An abnormal traffic detection device that collects and analyzes information about flows sampled by packet sampling, measures the amount of traffic generated in each predetermined time interval t0 for each predetermined monitoring unit, denotes the traffic amount in the t-th measurement interval by N_t, and detects abnormal traffic from the time variation of N_t, the device comprising:
group classification means for dividing the observed traffic amount N_t into M groups according to a predetermined rule, denoting the traffic amount in the j-th group by N_t(j) (where Σ_{j = 1 to M} N_t(j) = N_t), and notifying time series analysis means of N_t(j); and
the time series analysis means for analyzing the time series of N_t(j) in each group j and, upon detecting a temporal change in N_t(j), detecting that abnormal traffic has occurred in that group.

A program for causing a computer to execute the steps according to any one of claims 1 to 5.