JP5511562B2

JP5511562B2 - Traffic fluctuation amount estimation device, traffic management device, traffic distribution device, and method

Info

Publication number: JP5511562B2
Application number: JP2010162171A
Authority: JP
Inventors: 亮一川原; 達哉森; 憲昭上山; 治久長谷川; 哲哉滝根
Original assignee: Nippon Telegraph and Telephone Corp; Osaka University NUC
Current assignee: Nippon Telegraph and Telephone Corp; Osaka University NUC
Priority date: 2010-07-16
Filing date: 2010-07-16
Publication date: 2014-06-04
Anticipated expiration: 2030-07-16
Also published as: JP2012023687A

Description

The present invention relates to a technique for managing traffic in an IP network.

As IP networks become more widely used, traffic must be measured and the results reflected in network design and operation. For example, when dimensioning network bandwidth, it is important to know not only the average traffic volume but also its fluctuation (for example, the 95th percentile of the traffic volume).

It may also be necessary to estimate the number of flows passing through the network. Here, a flow is a group of packets sharing the same 5-tuple: source IP address (srcIP), destination IP address (dstIP), source port number (srcPort), destination port number (dstPort), and protocol (protocol). For example, routers implement technologies such as NetFlow and sFlow that measure network traffic at the flow level. The flow information observed at a router is exported to a monitoring device called a collector and analyzed there to grasp the state of the network. In this case, it is necessary to estimate how many flows will arrive at the collector, and again not only the average but also the fluctuation must be known so that a collector with adequate processing capacity can be provisioned. Similarly, when communication is controlled per session (flow), as with SIP, the number of sessions at the SIP server must be known.

Furthermore, abnormal traffic such as DDoS (Distributed Denial of Service) attacks and scans has increased rapidly in recent years, and so has the need to detect it through measurement. One proposed approach (for example, Non-Patent Document 1) accurately estimates how much traffic fluctuates under normal conditions and declares that abnormal traffic has occurred when traffic surges beyond that range. From this standpoint as well, it is important to grasp the amount of traffic fluctuation appropriately.

Meanwhile, techniques that measure and analyze traffic through packet sampling have attracted attention in recent years as a way to keep up with increasing line speeds (Non-Patent Document 2). For example, one packet in every N is inspected periodically, and flow information on the sampled packets (how many packets were sampled from which flow) is collected and analyzed. It has been proposed to aggregate such sampled flow information and, at fixed intervals (for example, every 5 minutes), to create and analyze time-series data on the number of packets, bytes, and flows generated per monitoring unit (per link, per router, or per destination area) (Non-Patent Document 3).
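The measurement described above can be sketched as follows. This is a minimal illustration, assuming packets are available as (timestamp, 5-tuple) pairs; the function names `sample_one_in_n` and `sampled_flow_counts` and the 300-second default period are hypothetical choices for the example, not taken from the cited documents.

```python
from collections import defaultdict

def sample_one_in_n(packets, n):
    """Periodic sampling: keep every n-th packet of the stream."""
    return [pkt for i, pkt in enumerate(packets) if i % n == n - 1]

def sampled_flow_counts(sampled_packets, period=300.0):
    """Count, per measurement period, the flows with at least one sampled packet.

    Each packet is (timestamp_seconds, flow_key), where flow_key is the
    5-tuple (srcIP, dstIP, srcPort, dstPort, protocol). Timestamps are
    assumed to be non-negative.
    """
    flows_per_bin = defaultdict(set)
    for ts, flow_key in sampled_packets:
        flows_per_bin[int(ts // period)].add(flow_key)
    if not flows_per_bin:
        return []
    last_bin = max(flows_per_bin)
    # Periods with no sampled packet contribute a count of zero.
    return [len(flows_per_bin.get(b, ())) for b in range(last_bin + 1)]
```

The resulting list is the kind of flow-count time series whose mean and fluctuation the rest of the description is concerned with.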

[Non-Patent Document 1] 原田薫明, 川原亮一, 森達哉, 上山憲昭, 廣川裕, 山本公洋, "Abnormal Traffic Occurrence Detection and Termination Judgment Technique" (in Japanese), IEICE Technical Report, vol. 106, no. 420, IN2006-133, pp. 115-120, Dec. 2006.
[Non-Patent Document 2] [PSAMP] RFC 5474, A Framework for Packet Selection and Reporting, http://www.faqs.org/rfcs/rfc5474.html
[Non-Patent Document 3] 大倉 et al., IEICE Society Conference, B-7-70, 2006.
[Non-Patent Document 4] T. Mori, T. Takine, J. Pan, R. Kawahara, M. Uchida, and S. Goto, "Identifying Heavy-Hitter Flows from Sampled Flow Statistics," IEICE Transactions on Communications, Vol. E90-B, No. 11, pp. 3061-3072, Nov. 2007.
[Non-Patent Document 5] C. Estan and G. Varghese, "New Directions in Traffic Measurement and Accounting," ACM SIGCOMM 2002, Aug. 2002.
[Non-Patent Document 6] N. Duffield, C. Lund, and M. Thorup, "Properties and Prediction of Flow Statistics from Sampled Packet Streams," ACM SIGCOMM Internet Measurement Conference 2002, Nov. 2002.
[Non-Patent Document 7] N. Duffield, C. Lund, and M. Thorup, "Estimating Flow Distributions from Sampled Flow Statistics," in Proceedings of ACM SIGCOMM, pp. 325-336, Aug. 2003.
[Non-Patent Document 8] R. Kawahara, T. Mori, N. Kamiyama, S. Harada, and S. Asano, "A study on detecting network anomalies using sampled flow statistics," SAINT 2007 Workshop on Internet measurement technology and its applications to building next generation Internet, Jan. 2007.

However, because such techniques rely on packet sampling, necessary information may be lost, and the original flow statistics must be estimated. Non-Patent Document 4 proposes a method that uses packet sampling to identify flows occupying a large share of the link bandwidth. Non-Patent Document 5 proposes a method for accurately obtaining statistics of flows with a large flow size. Non-Patent Documents 6 and 7 propose methods that use the number of sampled SYN packets (SYN is a TCP flag indicating the start of a connection) to estimate the number of flows generated in the original, unsampled traffic, as well as the mean and distribution of flow sizes.

However, none of these proposed techniques makes it possible to estimate the amount of fluctuation in a traffic time series.

The present invention has been made in view of the above problems. Its object is to provide a technique that makes it possible to estimate, from observable data, traffic fluctuation that is difficult to observe, and to provide a technique for realizing traffic management using the estimated traffic fluctuation.

To solve the above problems, the present invention can be configured as a traffic fluctuation amount estimation device that receives sample flow information collected by packet sampling at a sampling rate p in a communication network and estimates a traffic fluctuation amount from the sample flow information, the device comprising: time-series data generation means for generating, based on the sample flow information, time-series data of the number of flows as the traffic amount; approximate expression calculation means for calculating, from the time-series flow counts generated by the time-series data generation means, the parameters α(p) and β(p) in the approximate expression y = α(p)x² + β(p)x, which expresses the relationship between the expected value y of the sample variance and the expected value x of the sample mean of the traffic amount; and traffic estimation means for estimating the traffic fluctuation amount using the approximate expression y = α(p)x² + β(p)x with the calculated parameters α(p) and β(p). The device further comprises flow classification means for dividing the flows into M groups (M is a natural number) using the source IP address of each flow in the sample flow information as a key, and the flow classification means classifies the flows in two ways, into M₁ groups and into M₂ groups (M₁ and M₂ are natural numbers). The approximate expression calculation means calculates the sample mean X^(m)(p) and the sample variance Vx^(m)(p) from the time-series flow counts of each group m (m = 1 to M) generated by the time-series data generation means, and computes the expected value of the sample mean over all groups as S_E^(M)(p) = (1/M) ΣX^(m)(p) and the expected value of the sample variance as S_V^(M)(p) = (1/M) ΣVx^(m)(p) (each sum taken over m = 1 to M). This processing is performed for both M = M₁ and M = M₂, and, taking the pair of S_E^(Mi)(p) and S_V^(Mi)(p) for M = M_i (i = 1, 2) as (x_i, y_i), the parameters α(p) and β(p) are calculated by the following expression.

[equation missing in source]

The present invention can also be configured as a traffic management device that receives sample flow information collected by packet sampling at a sampling rate p in a communication network and determines, from the sample flow information, a number of flow divisions suitable for abnormal traffic monitoring, the device comprising: time-series data generation means for generating, based on the sample flow information, time-series data of the number of flows as the traffic amount; abnormality detection means for receiving the time-series flow counts from the time-series data generation means and determining that an abnormality has occurred when a flow count exceeds a predetermined abnormality detection threshold; approximate expression calculation means for calculating the parameters α(p) and β(p) in the approximate expression y = α(p)x² + β(p)x, which expresses the relationship between the expected value y of the sample variance and the expected value x of the sample mean of the traffic amount, from the time-series flow counts of normal periods that the abnormality detection means has not determined to be abnormal; and division number determination means. The division number determination means takes the (mean, variance) obtained when abnormal flows are added to the normal flows to be (x + m_d, α(p)x² + β(p)x + v_d), where the number of abnormal flows has mean m_d and variance v_d. Let Fx(u) be the probability distribution having this mean and variance, th the abnormality determination threshold, and ε a predetermined miss rate. The division number determination means calculates the normal average number of flows x* that satisfies Fx(th) < ε, obtains the sample mean X^(j)(p) of the number of flows in each group j when the flows are divided into M groups, and compares the sample mean over all groups, S_E^(M)(p) = (1/M) ΣX^(m)(p) (sum taken over m = 1 to M), with x*. If x*/S_E^(M)(p) is smaller than a predetermined threshold, it increases the number of divisions above M, thereby determining a number of flow divisions suitable for abnormal traffic monitoring.

The present invention can also be configured as a traffic distribution device that is connected to L servers (L is a natural number) via a communication network and distributes traffic flows to the L servers, the device comprising: flow classification means for classifying the flows into L groups and forwarding the flows of each group to the corresponding server; time-series data generation means for generating, as the traffic amount, time-series data of the number of flows in each group classified by the flow classification means; approximate expression calculation means for calculating, from the time-series flow counts generated by the time-series data generation means, the parameters α(p) and β(p) in the approximate expression y = α(p)x² + β(p)x, which expresses the relationship between the expected value y of the sample variance and the expected value x of the sample mean of the traffic amount; and distribution destination server determination means for determining whether to change the servers to which flows are distributed. With the flows distributed to server j (j = 1 to L) regarded as the flows of group j, the approximate expression calculation means calculates the sample mean X^(j)(p) and the sample variance Vx^(j)(p) of the number of flows in group j and obtains the parameters α(p) and β(p) by fitting the pairs (X^(j)(p), Vx^(j)(p)), taken as (x_j, y_j), to the approximate expression y = α(p)x² + β(p)x.

The distribution destination server determination means uses the approximate expression to calculate the upper X-percentile number of flows, assuming that the number of flows follows a predetermined distribution whose parameters are the mean x and the variance α(p)x² + β(p)x. It then calculates the mean x_target_k at which this upper X-percentile number of flows equals B_k × a, where B_k is the processing capacity of server k and a is a coefficient less than 1. Within the range satisfying X^(1)(p) + X^(2)(p) + … + X^(J)(p) <= x_target_k, it reassigns the flows of groups 1, 2, …, J to server k, and it repeats this calculation for each server until all remaining groups have been assigned to some server. If, as a result, a server is no longer needed, it decides to change the servers to which flows are distributed.
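The reassignment calculation of the traffic distribution device can be sketched as below. This is only an illustration under stated assumptions: the "predetermined distribution" is taken to be normal, the percentile is above 50%, α(p) and β(p) are non-negative, and x_target_k is found by bisection. The function names `upper_percentile`, `target_mean`, and `repack` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def upper_percentile(x, alpha, beta, z):
    """Upper percentile of a normal model with mean x and variance alpha*x^2 + beta*x."""
    return x + z * sqrt(max(alpha * x * x + beta * x, 0.0))

def target_mean(capacity, a, alpha, beta, percentile=0.95, tol=1e-6):
    """Mean x_target at which the upper percentile reaches a * capacity (bisection)."""
    z = NormalDist().inv_cdf(percentile)
    limit = a * capacity
    lo, hi = 0.0, limit  # the percentile at x = limit is at least limit when z >= 0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if upper_percentile(mid, alpha, beta, z) <= limit:
            lo = mid
        else:
            hi = mid
    return lo

def repack(group_means, capacities, a, alpha, beta, percentile=0.95):
    """Fill servers in order with consecutive groups; return (assignment, unused_servers).

    Returns (None, []) when the groups do not all fit, in which case the
    current distribution would be kept.
    """
    assignment = {}
    next_group = 0
    for k, capacity in enumerate(capacities):
        budget = target_mean(capacity, a, alpha, beta, percentile)
        load = 0.0
        while next_group < len(group_means) and load + group_means[next_group] <= budget:
            load += group_means[next_group]
            assignment[next_group] = k
            next_group += 1
    if next_group < len(group_means):
        return None, []
    used = set(assignment.values())
    return assignment, [k for k in range(len(capacities)) if k not in used]
```

A non-empty list of unused servers corresponds to the case in which the device decides to change the distribution destinations.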

The present invention can also be configured as method inventions corresponding to the processing performed by each of the above devices.

According to the present invention, a technique can be provided that makes it possible to estimate, from observable data, traffic fluctuation that is difficult to observe, together with a technique that uses the estimated traffic fluctuation to realize traffic management such as determining the number of flow divisions and changing flow distribution destinations.

A diagram explaining the relationship among the abnormal traffic detection threshold, the false detection rate, and the miss rate.
A diagram showing the system configuration in Example 1 of the present invention.
A block diagram showing a configuration example of the traffic fluctuation amount estimation device 10 in Example 1 of the present invention.
A flowchart showing the operation procedure of the traffic fluctuation amount estimation device 10 in Example 1 of the present invention.
A block diagram showing a configuration example of the traffic management device 30 in Example 3 of the present invention.
A flowchart showing the operation procedure of the traffic management device 30 in Example 3 of the present invention.
A diagram showing the system configuration in Example 4 of the present invention.
A block diagram showing a configuration example of the traffic distribution device 40 in Example 4 of the present invention.
A flowchart showing the operation procedure of the traffic distribution device 40 in Example 4 of the present invention.
A graph showing the results of an experiment on the validity of the approximate expression used in the technique according to the present invention.

First, the first to seventh methods, which are used as the processing methods executed by the devices in the embodiments of the present invention, are described.

(First method)
The first method is a traffic fluctuation amount estimation method for a communication network in which a flow is defined as a group of packets sharing the same 5-tuple of source IP address (srcIP), destination IP address (dstIP), source port number (srcPort), destination port number (dstPort), and protocol number (Protocol). Traffic data on flows is collected by packet sampling at a sampling rate p, and the amount of traffic fluctuation that is difficult to observe is estimated from the data obtained by observation.

In the first method, the traffic amount is the time series of the number of sampled flows (flows from which at least one packet was sampled) observed in each fixed period. For the sample mean and sample variance of this time series, the relationship between the expected value y of the sample variance and the expected value x of the sample mean is approximated by y = α(p)x² + β(p)x, and the parameters α(p) and β(p) of this approximate expression are estimated from the observed data.

This makes it possible to use the approximate expression to estimate the variance in a region that is difficult to observe, that is, the variance that would result if the average traffic volume became larger or smaller than in the observed region.

The derivation of this approximate expression is described below.

<Derivation of the approximate expression>
(1) Problem setting
Packets are sampled at random over K unit times using random packet sampling. A hash function is applied to the aggregated flow index (for example, the source IP address) of each sampled packet to classify it into one of M groups. In the following, sampling is performed over the time interval (0, K], and the interval (k-1, k] (k = 1, 2, …, K) is called interval k. The sampling rate is denoted by p.
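The grouping step in this problem setting can be sketched as follows. The sketch assumes SHA-256 as the hash function and numbers groups from 0 to M-1 (the text numbers them from 1 to M); the names `group_of` and `flow_count_series` are illustrative only.

```python
import hashlib
import math
from collections import defaultdict

def group_of(aggregate_key: str, num_groups: int) -> int:
    """Map an aggregated flow index (e.g. a source IP string) to a group 0..M-1."""
    digest = hashlib.sha256(aggregate_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_groups

def flow_count_series(sampled_packets, num_groups, num_intervals):
    """Return series[m][k-1]: flows of group m with at least one sampled packet in interval k.

    sampled_packets: iterable of (time, src_ip, flow_key) with 0 < time <= K.
    Interval k covers (k-1, k], so a packet at time t falls in interval ceil(t).
    """
    seen = defaultdict(set)
    for t, src_ip, flow_key in sampled_packets:
        k = math.ceil(t)
        seen[(group_of(src_ip, num_groups), k - 1)].add(flow_key)
    return [
        [len(seen.get((m, k), ())) for k in range(num_intervals)]
        for m in range(num_groups)
    ]
```

Each inner list is one per-group time series of sampled flow counts over the K intervals.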

Let F be the total number of flows in the population and F_A the number of distinct aggregated flow indices. Let F^(m) be the number of flows classified into the m-th group (m = 1, 2, …, M) and F_A^(m) the number of distinct aggregated flow indices classified into the m-th group. By definition, the following holds.

[equation missing in source]

The indicator function J_Aj(m) for the event that the j-th (j = 1, 2, 3, …, F_A) aggregated flow index is classified into the m-th (m = 1, 2, …, M) group is defined as follows: J_Aj(m) = 1 if the j-th aggregated flow index is classified into the m-th group, and J_Aj(m) = 0 otherwise.

Let F_j denote the total number of flows having the j-th (j = 1, 2, 3, …, F_A) aggregated flow index. Then the number F_A^(m) of aggregated flow indices classified into the m-th group and the total number F^(m) of flows classified into the m-th group are given by

F_A^(m) = ΣJ_Aj(m), F^(m) = ΣJ_Aj(m) F_j (each sum taken over j = 1 to F_A).

In the following, the flows belonging to each group are numbered so that individual flows can be distinguished.

The indicator function Y_i^(m)(k|p) (k = 1, 2, …, K) for the event that at least one packet of the i-th (i = 1, 2, 3, …, F^(m)) flow in the m-th (m = 1, 2, …, M) group (hereinafter, flow (m, i)) is sampled in interval k is defined as follows: Y_i^(m)(k|p) = 1 if at least one packet of flow (m, i) is sampled in interval k, and Y_i^(m)(k|p) = 0 otherwise.

Further, the number X^(m)(k|p) of flows in group m from which at least one packet is sampled in interval k is defined as

X^(m)(k|p) = ΣY_i^(m)(k|p) (sum taken over i = 1 to F^(m)).

The sample mean and the sample variance of X^(m)(k|p) over the K unit intervals, obtained as the result of the sampling experiment, are then given by the following equations.

[equation missing in source]
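For concreteness, a sample mean and sample variance over the K per-interval counts can be written as below. The symbols \bar{X}^{(m)}(p) and V_X^{(m)}(p) and the 1/(K-1) normalization are assumptions made for illustration; the text does not state which normalization is used.

```latex
\bar{X}^{(m)}(p) = \frac{1}{K} \sum_{k=1}^{K} X^{(m)}(k \mid p), \qquad
V_X^{(m)}(p) = \frac{1}{K-1} \sum_{k=1}^{K}
  \left( X^{(m)}(k \mid p) - \bar{X}^{(m)}(p) \right)^{2}
```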

By definition, their expected values are given by the following equations.

[equation missing in source]

The following discusses the relationship between the expected value of the sample mean and the expected value of the sample variance.

(2) Model
This section makes the following assumptions.

Assumption 1: The indicator function J_Aj(m) for the event that the j-th (j = 1, 2, 3, …, F_A) aggregated flow index is classified into the m-th (m = 1, 2, …, M) group is independent of other events, and its expected value is given by the following equation.

[equation missing in source]

Assumption 2: The total numbers F_j of flows having the j-th (j = 1, 2, 3, …, F_A) aggregated flow index are independent and identically distributed.

When the number F_A of distinct aggregated flow indices in the population is treated as an (unknown) deterministic value,

[equation missing in source]

holds, and from Equation (1), Equation (2), Assumption 1, and Assumption 2, the mean and variance of F^(m) are given by

[equation missing in source]

where C is given by

[equation missing in source]

Assumption 3: The set of indicator functions {Y_i^(m)(k|p); k = 1, 2, …, K} for the event that at least one packet of flow (m, i) (m = 1, 2, …, M; i = 1, 2, …, F^(m)) is sampled in interval k has the same joint distribution regardless of the values of m and i.

By Assumption 3, a set of random variables {Y(k|p); k = 1, 2, …, K} can be defined that follows the same joint distribution as {Y_i^(m)(k|p); k = 1, 2, …, K} for an arbitrary flow (m, i). Using this, the mean E[X^(m)(p)] and the variance Var[X^(m)(p)] of the number of flows X^(m)(p) in group m in a time interval chosen at random within the observation period are given by the following equations.

[equation missing in source]

By Jensen's inequality for convex functions, the bracketed term appearing in Equation (9) satisfies

[equation missing in source]

Note that equality holds only when E[Y(k|p)] is constant regardless of k.

From Equation (8),

[equation missing in source]

is obtained, and from this and Equation (6),

[equation missing in source]

holds. Substituting these into Equation (9) and rearranging gives

[equation missing in source]

Here, A(p) and B(p) are values determined solely by the random variables {Y(k|p); k = 1, 2, …, K} that characterize an arbitrary flow, and are given by

[equation missing in source]

Equation (10) shows that the variance Var[X^(m)(p)] of the number of flows classified into the m-th group is given by a quadratic function of the mean E[X^(m)(p)]. That is, the mean-variance curve is the quadratic curve

y = α(p)x² + β(p)x (12)

containing the unknown constants α(p) and β(p).

Now suppose that, for a given M, the sample mean and the sample variance of the data X^(m)(k|p) are known. First, the sample mean S_E^(M)(p) and the sample variance S_V^(M)(p) corresponding to E[X^(m)(p)] and Var[X^(m)(p)] (m = 1, 2, …, M) in each classification are calculated as

S_E^(M)(p) = (1/M) ΣX^(m)(p), S_V^(M)(p) = (1/M) ΣVx^(m)(p) (each sum taken over m = 1 to M),

where X^(m)(p) and Vx^(m)(p) denote the sample mean and the sample variance of group m.

From Equation (3) and Equation (8),

[equation missing in source]

holds, so S_E^(M)(p) is an unbiased estimator of the mean E[X^(m)(p)]. On the other hand, by Equation (4), the expected value of the sample variance contains a term involving the covariance within the K time intervals and is therefore biased. The following assumption is made from here on.

Assumption 4: The term involving the covariance in the expected value of the sample variance is negligible. That is,

[equation missing in source]

holds.

Therefore, the quadratic relationship of Equation (12) holds between the expected value of the sample mean and the expected value of the sample variance.

(Second method)
Next, the second method is described. In the second method, the srcIP of each flow observed in the first method is used as the key input to a hash function prepared in advance, and the flows are divided into M groups based on the hash value.

For example, flows whose hash values leave the same remainder when divided by M are classified into the same group. For the time series of the number of flows belonging to the m-th group (m = 1 to M), the sample mean X^(m)(p) and the sample variance Vx^(m)(p) are calculated. The expected value of the sample mean over all groups is estimated as S_E^(M)(p) = (1/M) ΣX^(m)(p), and the expected value of the sample variance as S_V^(M)(p) = (1/M) ΣVx^(m)(p) (each sum taken over m = 1 to M). This is done for two values, M = M₁ and M = M₂, and, taking the pair of S_E^(Mi)(p) and S_V^(Mi)(p) for M = M_i (i = 1, 2) as (x_i, y_i), the parameters α(p) and β(p) of the first method are estimated by the following equations.

[equation missing in source]
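One way to carry out this estimation is sketched below. It assumes that α(p) and β(p) are obtained by requiring the curve y = αx² + βx to pass through the two points (x₁, y₁) and (x₂, y₂), which gives α = (y₁/x₁ - y₂/x₂)/(x₁ - x₂) and β = y₁/x₁ - αx₁; this closed form is derived here from that two-point condition and is not quoted from the source. The function names are hypothetical, and `statistics.variance` uses the 1/(K-1) normalization.

```python
from statistics import mean, variance

def mean_variance_point(series_by_group):
    """Return (S_E, S_V): per-group sample means and variances averaged over groups.

    series_by_group[m] is the flow-count time series of group m
    (at least two intervals per group are required for the variance).
    """
    s_e = mean(mean(ts) for ts in series_by_group)
    s_v = mean(variance(ts) for ts in series_by_group)
    return s_e, s_v

def fit_alpha_beta(point1, point2):
    """Solve y = alpha*x**2 + beta*x through two (x, y) points."""
    (x1, y1), (x2, y2) = point1, point2
    if x1 == x2 or x1 == 0 or x2 == 0:
        raise ValueError("need two distinct, non-zero sample means")
    alpha = (y1 / x1 - y2 / x2) / (x1 - x2)
    beta = y1 / x1 - alpha * x1
    return alpha, beta
```

In use, `mean_variance_point` would be called once with the series for M₁ groups and once with the series for M₂ groups, and the two resulting points passed to `fit_alpha_beta`.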

(Third method)
Next, the third method is described. In the third method, instead of grouping by srcIP as in the second method, the flows are classified into groups using dstIP as the key, or using a key defined as a combination of some of the elements of the 5-tuple {source IP address (srcIP), destination IP address (dstIP), source port number (srcPort), destination port number (dstPort), protocol number (Protocol)}.

How the key is defined for grouping can be chosen according to the purpose at hand. For example, the sixth method described later is intended for abnormal traffic monitoring. In that case, grouping with srcIP as the key is suited to finding anomalies such as scans, in which a specific srcIP spreads traffic over many dstIPs and dstPorts. This is because, when grouping is keyed on srcIP, all such abnormal traffic is aggregated into the same group, which makes it easier to detect (Non-Patent Document 8).

(Fourth method)
Next, the fourth method will be described. In the fourth method, the parameters obtained by the second method are used to derive the relational expression y = α(p)x² + β(p)x between the expected value y of the sample variance and the expected value x of the sample mean. Assuming that the future average traffic volume becomes x_est, the variance is then estimated as y_est = α(p)(x_est)² + β(p)(x_est).

The traffic is then assumed to follow a predetermined distribution (such as a normal distribution) with mean x_est and variance y_est as parameters, and its upper X percentile is derived. This value is regarded as the traffic volume offered at that future time, and the necessary equipment is designed accordingly.
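
As an illustration of the fourth method, the following sketch evaluates the predicted variance and a percentile under a normal distribution. It assumes Python 3.8 or later for `statistics.NormalDist`; the function names and the `quantile` argument (for example 0.95 for the 95% value) are hypothetical and not taken from the source.

```python
from statistics import NormalDist


def predicted_variance(alpha, beta, x_est):
    """y_est = alpha * x_est^2 + beta * x_est."""
    return alpha * x_est * x_est + beta * x_est


def design_value(alpha, beta, x_est, quantile):
    """Value below which the traffic stays with probability `quantile`,
    assuming a normal distribution with mean x_est and variance y_est."""
    y_est = predicted_variance(alpha, beta, x_est)
    return NormalDist(mu=x_est, sigma=y_est ** 0.5).inv_cdf(quantile)
```

For example, `design_value(alpha, beta, x_est, 0.95)` would give the volume that the predicted traffic exceeds only 5% of the time under the normality assumption.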

(Fifth method)
Next, the fifth method will be described. In the fifth method, it is assumed that the traffic offered to the network is monitored after being divided into M groups. The flows classified into each group are further divided into M' subgroups using a hash function, which realizes the classification into the two group counts M₁ and M₂ of the second method. The parameters α(p) and β(p) of the first method are then estimated by the procedure of the second method.
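
The two-level grouping can be sketched as follows. This is only an illustration: the use of SHA-256, the salt string, and the numbering of the fine groups as g × M' + s (which gives M × M' fine groups in total) are assumptions, since the source does not specify the hash function or the numbering.

```python
import hashlib


def stable_hash(key: str, salt: str = "") -> int:
    """Deterministic 64-bit hash of a flow key (hypothetical helper)."""
    digest = hashlib.sha256((salt + key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def coarse_group(key: str, m: int) -> int:
    """Existing monitoring group, in the range 0 .. m-1."""
    return stable_hash(key) % m


def fine_group(key: str, m: int, m_sub: int) -> int:
    """Split each coarse group into m_sub subgroups with a second hash."""
    g = coarse_group(key, m)
    s = stable_hash(key, salt="sub") % m_sub
    return g * m_sub + s
```

Because every fine group lies entirely inside one coarse group, the flow counts for both group counts can be obtained from a single pass over the flows.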

(Sixth method)
Next, the sixth method will be described. In the sixth method, it is assumed that abnormal traffic is detected by dividing the flows into M groups, observing the flow-count time series of each group m (m = 1 to M), and monitoring it for abrupt changes in the number of flows. When the current traffic is judged to contain no abnormal traffic, the parameters α(p) and β(p) are estimated by the fifth method, and the relational expression y = α(p)x² + β(p)x between the expected value y of the sample variance and the expected value x of the sample mean of the number of normal flows is derived in advance.

The number of flows is assumed to follow a predetermined distribution (such as a normal distribution) with mean x and variance y as parameters. Let (m_d, v_d) be the (mean number of flows, variance of the number of flows) of the abnormal traffic to be detected; these are predetermined values. An appropriate number of divisions M* is then determined as follows, so that if this abnormal traffic is applied it can be detected with a miss rate no greater than a predetermined value ε.
When the abnormal flows are added to the normal flows, the (mean, variance) is taken to be (x + m_d, α(p)x² + β(p)x + v_d), and the number of normal flows plus abnormal flows is regarded as following a probability distribution Fx(u) with this mean and variance. Here, Fx(u) is the probability that the random variable U, the number of normal flows plus the number of abnormal flows, is at most u when the mean number of normal flows is x; that is, Fx(u) = P[U ≤ u]. The value x* that satisfies Fx(th) < ε under this distribution is calculated. Here th is the threshold for judging an anomaly and is set to th = x + γ√{α(p)x² + β(p)x} using a predetermined parameter γ (for example, γ = 3).

Then, the sample mean over all groups under the current division into M groups, S_E^(M)(p) = (1/M) ΣX^(m)(p) (sum over m = 1 to M), is compared with x*. If x*/S_E^(M)(p) is smaller than a predetermined threshold, the number of divisions is made larger than M. Specifically, the number of divisions is updated to INT[S_E^(M)(p)/x* × M], where INT[a] denotes a rounded up to the next integer.
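
The search for x* and the update of the number of divisions can be sketched as below. The sketch assumes a normal distribution, α(p) ≥ 0, β(p) ≥ 0, m_d > 0 and v_d > 0, under which the miss rate grows with x, so that x* can be taken as the largest mean satisfying Fx(th) < ε and found by bisection. The function names and the search bound `x_max` are hypothetical.

```python
import math
from statistics import NormalDist


def miss_rate(x, alpha, beta, m_d, v_d, gamma=3.0):
    """Fx(th): probability that normal + abnormal flows stay at or below th."""
    normal_var = alpha * x * x + beta * x
    th = x + gamma * math.sqrt(normal_var)
    mixed = NormalDist(mu=x + m_d, sigma=math.sqrt(normal_var + v_d))
    return mixed.cdf(th)


def max_mean_for_miss_rate(alpha, beta, m_d, v_d, eps, gamma=3.0, x_max=1e9):
    """Largest x in [0, x_max] with miss_rate(x) < eps, or None if none exists."""
    if miss_rate(0.0, alpha, beta, m_d, v_d, gamma) >= eps:
        return None
    if miss_rate(x_max, alpha, beta, m_d, v_d, gamma) < eps:
        return x_max
    lo, hi = 0.0, x_max  # lo always satisfies the constraint, hi never does
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if miss_rate(mid, alpha, beta, m_d, v_d, gamma) < eps:
            lo = mid
        else:
            hi = mid
    return lo


def updated_division_count(s_e, x_star, m):
    """INT[S_E / x* * M], with INT rounding up to the next integer."""
    return math.ceil(s_e / x_star * m)
```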

The basic idea behind the above method is described below.

When traffic is observed on a backbone, the traffic volume is so large that abnormal traffic mixed into it is difficult to find. It is therefore conceivable to divide the monitored traffic into an appropriate number of groups and to detect abnormal traffic by monitoring each group. If the normal traffic volume within a group becomes small and the abnormal traffic alone can be concentrated in a specific group, the abnormal traffic becomes easier to detect.

In general, abnormal traffic is detected by monitoring the traffic time series and judging that abnormal traffic has occurred when the traffic volume exceeds a threshold (that is, when it increases sharply). In making this judgment, the probability of exceeding the threshold under normal conditions (the false detection rate) must be kept as small as possible, and at the same time the miss rate when an anomaly does occur must be kept as small as possible.

FIG. 1 shows the distributions of the number of normal flows and of the number of normal flows plus abnormal flows, together with their relationship to the false detection rate and the miss rate. In general, the threshold th is set to "mean x + γ√(variance)". For a normal distribution, for example, setting γ = 3 keeps the probability of exceeding the threshold, that is, the false detection rate, to about 0.13%. When abnormal traffic is mixed in, on the other hand, the number of flows follows a probability distribution Fx(u) whose (mean, variance) is (x + m_d, α(p)x² + β(p)x + v_d). Here, Fx(u) is the probability that the random variable U, the number of normal flows plus the number of abnormal flows, is at most u when the mean number of normal flows is x (that is, Fx(u) = P[U ≤ u]). The probability of falling below the threshold when an anomaly occurs, that is, the miss rate, is then given by Fx(th) under this distribution.
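
The 0.13% figure can be checked numerically. The short sketch below assumes a normal distribution and Python 3.8 or later; the function name is hypothetical.

```python
from statistics import NormalDist


def false_detection_rate(gamma):
    """P[N > mean + gamma * sd] for a normally distributed flow count N."""
    return 1.0 - NormalDist().cdf(gamma)
```

With γ = 3 this evaluates to roughly 0.00135, which matches the value of about 0.13% stated above.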

Accordingly, it suffices to set the number of divisions so that Fx(th) < ε is satisfied (that is, so that the mean normal traffic volume is small enough to satisfy this inequality). The mean normal traffic x* satisfying Fx(th) < ε is therefore calculated and compared with the sample mean over all groups under the current division into M groups, S_E^(M)(p) = (1/M) ΣX^(m)(p) (sum over m = 1 to M). If x*/S_E^(M)(p) is smaller than a predetermined threshold, the number of divisions is made larger than M. Specifically, the number of divisions is updated to INT[S_E^(M)(p)/x* × M], where INT[a] denotes a rounded up to the next integer.

(Seventh method)
Next, the seventh method will be described. In the seventh method, it is assumed that there are L servers that process flows, and the processing capacity of server j is denoted B_j [flow/s]. It is also assumed that there is a device that distributes flows to these servers. The flows currently distributed to server j are called the flows of group j, and the sample mean X^(j)(p) and sample variance Vx^(j)(p) of the number of flows of group j are calculated. Taking the pairs (X^(j)(p), Vx^(j)(p)) (j = 1 to L) as (x_j, y_j), they are fitted to y = α(p)x² + β(p)x to obtain the parameters α(p) and β(p). In the seventh method, this relational expression is used to decide, by the following procedure, whether to change the servers to which flows are distributed.

Assuming that the traffic follows a predetermined distribution (such as a normal distribution) with mean x and variance α(p)x² + β(p)x as parameters, the upper X percentile is derived, and the mean x_target_k at which this upper X percentile equals B_k × a, the processing capacity of server k scaled by a coefficient a set to less than 1, is calculated. The flows of groups 1, 2, …, J are then reassigned to server k within the range satisfying X^(1)(p) + X^(2)(p) + … + X^(J)(p) ≤ x_target_k. This procedure is repeated until all remaining groups have been accommodated on some server. If, as a result, some servers are no longer needed, those servers are put into the sleep state and the flows are actually reassigned. Thereafter, if for some active server j the upper X percentile determined from the variance of the number of flows to server j exceeds a × B_j, a sleeping server is made active and part of the flows accommodated on server j is reassigned to it.
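
The consolidation step can be sketched as follows. The sketch assumes a normal distribution, α(p) ≥ 0, β(p) ≥ 0 and a quantile of at least 0.5, so that the percentile grows with the mean and x_target_k can be found by bisection. Filling servers and groups in index order is also an assumption, since the source does not fix the order. All names are hypothetical, and the later step of waking a sleeping server is not covered.

```python
import math
from statistics import NormalDist


def upper_value(x, alpha, beta, quantile):
    """Percentile of a normal distribution with mean x, variance alpha*x^2 + beta*x."""
    z = NormalDist().inv_cdf(quantile)
    return x + z * math.sqrt(alpha * x * x + beta * x)


def target_mean(capacity, a, alpha, beta, quantile):
    """Largest mean whose percentile does not exceed a * capacity."""
    limit = a * capacity
    lo, hi = 0.0, limit  # the percentile is at least the mean, so x_target <= limit
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if upper_value(mid, alpha, beta, quantile) <= limit:
            lo = mid
        else:
            hi = mid
    return lo


def repack(group_means, capacities, a, alpha, beta, quantile):
    """Assign groups to servers in order; an empty list marks a server that can sleep."""
    assignment = [[] for _ in capacities]
    next_group = 0
    for k, capacity in enumerate(capacities):
        budget = target_mean(capacity, a, alpha, beta, quantile)
        used = 0.0
        while (next_group < len(group_means)
               and used + group_means[next_group] <= budget):
            assignment[k].append(next_group)
            used += group_means[next_group]
            next_group += 1
    if next_group < len(group_means):
        raise ValueError("the servers cannot accommodate all groups")
    return assignment
```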

The above method assumes that the load is distributed across multiple servers. When traffic has decreased to a level that can be handled without keeping all servers active, the traffic is consolidated onto some of the servers and the remaining servers are put into the sleep state, thereby reducing power consumption.

Examples implementing the method according to the present invention described above are explained below with reference to the drawings.

First, Example 1 will be described. FIG. 2 is a configuration diagram showing an example of the basic configuration of the system according to Example 1. As shown in FIG. 2, this system includes a plurality of routers 2 constituting an IP network 1, and a traffic fluctuation amount estimation device 10.

In this system, as shown in FIG. 2, the traffic fluctuation amount estimation device 10 collects the sample flow information measured at each router 2 at packet sampling rate p. The sample flow information is a set consisting of a flow ID and statistical information. The flow ID comprises the source IP address, destination IP address, source port number, destination port number, and protocol number; the statistical information comprises the number of sampled packets and the total number of bytes of the sampled packets.

The traffic fluctuation amount estimation device 10 performs traffic fluctuation estimation using the first to fourth methods. FIG. 3 shows the functional configuration of the traffic fluctuation amount estimation device 10. As shown in FIG. 3, the device includes a flow classification unit 11, a time-series data generation unit 12, an approximate expression calculation unit 13, and a traffic prediction unit 14. The operation of the traffic fluctuation amount estimation device 10 with this configuration is described below following the processing procedure shown in FIG. 4.

Step 101) The flow classification unit 11 receives the sample flow information arriving from each router, reads the (source IP address, destination IP address, source port number, destination port number, protocol number) of each flow, refers to rules registered in advance in a memory or the like of the flow ID collating means, and classifies the flows into a plurality of groups according to those rules.

As an example, the flow classification unit 11 feeds srcIP as the key into a hash function and classifies flows into M = 2^8 groups according to the first 8 bits of the hash value. Although srcIP is used as the key in this example, the key may instead be dstIP, or a combination of some of the five fields {source IP address (srcIP), destination IP address (dstIP), source port number (srcPort), destination port number (dstPort), protocol number (Protocol)}. Here, the flow classification unit 11 performs the classification into M groups for two values, M = M₁ and M = M₂. For example, M₁ = 2 and M₂ = 4 are set.
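
Classification by the leading bits of a hash value can be sketched as follows. SHA-256 is used here only as a stand-in, since the source does not name a hash function, and the function name is hypothetical.

```python
import hashlib


def group_from_top_bits(key: str, bits: int) -> int:
    """Group index taken from the leading `bits` bits (1 to 32) of the key's hash."""
    if not 1 <= bits <= 32:
        raise ValueError("bits must be between 1 and 32")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    value = int.from_bytes(digest[:4], "big")  # first 32 bits of the digest
    return value >> (32 - bits)
```

With `bits=8` this yields 2^8 groups, and with `bits=1` and `bits=2` it yields the M₁ = 2 and M₂ = 4 groupings from a single hash computation per flow.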

After classification, the flow information is sent to the time-series data generation unit 12 together with the group numbers m₁ and m₂ to which the flow was mapped (m₁ being the group number when M = M₁ and m₂ the group number when M = M₂).

Step 102) The time-series data generation unit 12 counts the number of flows in each group m at a fixed interval (for example, every minute). This is done over a predetermined measurement period (for example, 1 minute × 180 times), and at the end of the measurement period the time-series data of the number of flows is transferred to the approximate expression calculation unit 13.
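
The counting in this step can be sketched as below, using the interval and period given as examples in the text. The record layout (a timestamp in seconds and a group index per flow) and the function name are assumptions made for illustration.

```python
def flow_count_series(records, num_groups, interval_s=60, num_intervals=180, start_s=0):
    """Count flows per group in each measurement interval.

    records: iterable of (timestamp_in_seconds, group_index) pairs, one per flow.
    Returns a list of num_groups lists, each holding num_intervals counts.
    """
    series = [[0] * num_intervals for _ in range(num_groups)]
    for timestamp, group in records:
        index = int((timestamp - start_s) // interval_s)
        if 0 <= index < num_intervals:  # ignore flows outside the measurement period
            series[group][index] += 1
    return series
```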

Step 103) The approximate expression calculation unit 13 calculates the sample mean X^(m)(p) and sample variance Vx^(m)(p) for each group number m, and estimates the expected value of the sample mean over all groups as S_E^(M)(p) = (1/M) ΣX^(m)(p) and the expected value of the sample variance as S_V^(M)(p) = (1/M) ΣVx^(m)(p), where each sum is taken over m = 1 to M. This is carried out for the two values M = M₁ and M = M₂. Taking the pair (S_E^(Mi)(p), S_V^(Mi)(p)) for M = M_i (i = 1, 2) as (x_i, y_i), the parameters α(p) and β(p) are calculated using the following equations.

Once the parameters have been determined, they are notified to the traffic prediction unit 14.

Step 104) Using the parameters determined by the approximate expression calculation unit 13, the traffic prediction unit 14 derives the relational expression y = α(p)x² + β(p)x between the expected value y of the sample variance and the expected value x of the sample mean, and estimates the variance under the assumption that the future average traffic volume becomes x_est as y_est = α(p)(x_est)² + β(p)(x_est). The traffic is then assumed to follow a predetermined distribution (such as a normal distribution) with mean x_est and variance y_est as parameters, the upper X percentile is derived, and this value is output as the predicted traffic volume offered at that future time.

The traffic volume predicted by the traffic prediction unit 14 is output, for example, to an operator's terminal. Alternatively, the traffic prediction unit 14 may output the variance estimated as described above, and the traffic volume may be predicted by other means.

The traffic fluctuation amount estimation device 10 can be realized, for example, by having a general-purpose computer equipped with a CPU, memory, a hard disk, and the like execute programs corresponding to the respective functional units. The programs may be distributed on a computer-readable recording medium such as portable memory, or downloaded from a server on a network. When the traffic fluctuation amount estimation device 10 is configured from a computer and programs in this way, the CPU proceeds with processing by reading and writing processing data, such as the number of flows, from and to memory in accordance with the program instructions, and each functional unit of the traffic fluctuation amount estimation device 10 is thereby realized.

The traffic fluctuation amount estimation device 10 can also be realized using hardware logic circuits in which the processing functions of the functional units are embedded.

The fact that the device can be realized with a computer and programs, or with hardware logic circuits, as described above applies equally to the devices in the other examples.

The device configuration in Example 2 is the same as in Example 1. In Example 2, instead of performing the two preset group classifications as the flow classification unit 11 of Example 1 does, grouping is performed as follows.

Here, it is assumed that the traffic flowing through the network is already classified into M groups. For example, as in Example 3 described later, the traffic is assumed to be monitored in M groups (let this number be M₁) for the purpose of monitoring abnormal traffic. The flow classification unit 11 further divides the flows classified into each group into M' subgroups using a hash function, thereby obtaining M₂ groups overall. This realizes the classification into the two group counts M₁ and M₂ described in Example 1, and the traffic fluctuation amount is then estimated by the same procedure as in Example 1.

Next, Example 3 will be described. FIG. 5 shows the functional configuration of the traffic management device 30 in this example. The traffic management device 30 is assumed to be provided and operated in place of the traffic fluctuation amount estimation device 10 in the system shown in FIG. 2. Alternatively, the functional units needed to realize the functions of the traffic management device 30 may be added to the traffic fluctuation amount estimation device 10.

As shown in FIG. 5, the traffic management device 30 includes a flow classification unit 31, a time-series data generation unit 32, an abnormality detection unit 33, an approximate expression calculation unit 34, and a group number determination unit 35. The operation of the traffic management device 30 with this configuration is described below following the processing procedure shown in FIG. 6.

In step 301, the flow classification unit 31 classifies the flows into M groups, and in step 302, the time-series data generation unit 32 generates the flow-count time series of each group. The processing up to this point is the same as steps 101 and 102 in Example 1.

Step 303) Each time time-series data is added at the fixed interval, the time-series data generation unit 32 transfers that data to the abnormality detection unit 33.

Step 304) The abnormality detection unit 33 calculates the mean and variance of the number of flows of each group m from past time-series data, sets the anomaly detection threshold to "mean + γ × √(variance)", and judges that an anomaly has occurred when the current number of flows exceeds this threshold. When an anomaly is judged to have occurred, the approximate expression calculation unit 34 is notified accordingly.
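
The threshold test in this step can be sketched as follows. The use of the 1/n form of the variance and the function name are assumptions; the source does not say which variance estimator the abnormality detection unit uses.

```python
import math
import statistics


def is_anomalous(history, current, gamma=3.0):
    """True when the current flow count exceeds mean + gamma * sqrt(variance).

    history: past flow counts of one group. Assumption: 1/n variance is used.
    """
    mean = statistics.fmean(history)
    variance = statistics.pvariance(history)
    return current > mean + gamma * math.sqrt(variance)
```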

Step 305) When the abnormality detection unit 33 has judged that the current traffic contains no abnormal traffic, the approximate expression calculation unit 34 estimates the parameters α(p) and β(p) by the method of Example 2 and derives the relational expression y = α(p)x² + β(p)x between the expected value y of the sample variance and the expected value x of the sample mean of the number of normal flows. The result is then notified to the group number determination unit 35.

Step 306) The group number determination unit 35 determines the number of groups (number of divisions) suitable for monitoring abnormal traffic by the following procedure.

First, the number of flows is assumed to follow a predetermined distribution (such as a normal distribution) with mean x and variance y as parameters. Let (m_d, v_d) be the (mean number of flows, variance of the number of flows) of the abnormal traffic to be detected; these are predetermined values. An appropriate number of divisions M* is then determined as follows, so that if this abnormal traffic is applied the miss rate is no greater than a predetermined value ε.

When the abnormal flows are added to the normal flows, the (mean, variance) is taken to be (x + m_d, α(p)x² + β(p)x + v_d), and the number of normal flows plus abnormal flows is regarded as following a probability distribution Fx(u) with this mean and variance. Here, Fx(u) is the probability that the random variable U, the number of normal flows plus the number of abnormal flows, is at most u when the mean number of normal flows is x; that is, Fx(u) = P[U ≤ u]. The value x* that satisfies Fx(th) < ε under this distribution is calculated. Here th is the threshold for judging an anomaly and is set to th = x + γ√{α(p)x² + β(p)x} using a predetermined parameter γ (for example, γ = 3). The sample mean over all groups under the current division into M groups, S_E^(M)(p) = (1/M) ΣX^(m)(p) (sum over m = 1 to M), is compared with x*, and if x*/S_E^(M)(p) is smaller than a predetermined threshold, the number of divisions is made larger than M. Specifically, the number of divisions is determined by updating it to INT[S_E^(M)(p)/x* × M], where INT[a] denotes a rounded up to the next integer.

Next, Example 4 will be described. As shown in FIG. 7, the system in this example has a configuration in which a traffic distribution device 40 and L servers (50) are connected via a network.

In the configuration of FIG. 7, the traffic distribution device 40 distributes flows to the L servers. The processing capacity of server j is denoted B_j [flow/s]. The flows currently distributed to server j are called the flows of group j.

FIG. 8 shows the functional configuration of the traffic distribution device 40 in this example. As shown in FIG. 8, the traffic distribution device 40 includes a flow classification unit 41, a time-series data generation unit 42, an approximate expression calculation unit 43, and a distribution destination server determination unit 44. The operation of the traffic distribution device 40 with this configuration is described below following the processing procedure shown in FIG. 9.

Step 401) The flow classification unit 41 classifies the flows into L groups and forwards each flow to the server with the corresponding group number. At the same time, it notifies the time-series data generation unit 42 of the number of sample flows in each group and of which flow was assigned to which group number.

Step 402) The time-series data generation unit 42 generates the flow-count time series of each group. This procedure is the same as step 102 in Example 1. That is, the time-series data generation unit 42 counts the number of flows of each group at a fixed interval (for example, every minute). This is done over a predetermined measurement period (for example, 1 minute × 180 times), and at the end of the measurement period the time-series data is transferred to the approximate expression calculation unit 43.

Step 403) The approximate expression calculation unit 43 calculates the sample mean X^(j)(p) and sample variance Vx^(j)(p) of the number of flows of group j. Taking the pairs (X^(j)(p), Vx^(j)(p)) (j = 1 to L) as (x_j, y_j), it fits them to the approximate expression y = α(p)x² + β(p)x to obtain the parameters α(p) and β(p). The result is notified to the distribution destination server determination unit 44.
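
The source does not state how the fit is performed. As one possibility, the sketch below uses ordinary least squares for the model y = αx² + βx without an intercept, solving the 2 × 2 normal equations directly. The choice of least squares and the function name are assumptions.

```python
def fit_alpha_beta(points):
    """Least-squares fit of y = alpha*x^2 + beta*x to a list of (x, y) pairs."""
    s4 = sum(x ** 4 for x, _ in points)
    s3 = sum(x ** 3 for x, _ in points)
    s2 = sum(x ** 2 for x, _ in points)
    t2 = sum(x * x * y for x, y in points)
    t1 = sum(x * y for x, y in points)
    # Normal equations: alpha*s4 + beta*s3 = t2 and alpha*s3 + beta*s2 = t1
    det = s4 * s2 - s3 * s3
    if det == 0:
        raise ValueError("need at least two distinct non-zero x values")
    alpha = (t2 * s2 - t1 * s3) / det
    beta = (s4 * t1 - s3 * t2) / det
    return alpha, beta
```

Here the points would be the per-server pairs (x_j, y_j) for j = 1 to L.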

Step 404) Using the above approximate expression, the distribution destination server determination unit 44 decides whether to change the servers to which flows are distributed, by the following procedure.

ここでは、トラヒックは平均x、分散α(p)x²+β(p)xをパラメータにもつあらかじめ定めた分布（正規分布等）に従うとして上位Xパーセンタイルを導出し、その上位Xパーセンタイルがサーバkの処理能力B_k×a(aは係数で1未満に設定)となるような、平均x_target_kを算出する。そして、X⁽¹⁾(p)+ X⁽²⁾(p)+…+ X^(J)(p)<= x_target_kを満たす範囲で、グループ1, 2, …, Jのフローをサーバkに収容替えする計算処理を行う。この手順を残りのグループ全てがいずれかのサーバに収容されるまで繰り返し実施し、その結果、もし不要となったサーバが存在したら、該サーバをsleep状態にして、上記計算処理の結果に従って、実際にフローの収容替えを実施する。一方、activeなサーバjの中で、該サーバjへのフロー数の分散から決まる上位Xパーセンタイルの値がa×B_jを超えた場合には、sleepになっているサーバをactiveにして、該サーバjで収容中のフローの一部をsleepサーバへ収容替えをする。 Here, the upper X-th percentile is derived on the assumption that the traffic follows a predetermined distribution (such as a normal distribution) whose parameters are the mean x and the variance α(p)x² + β(p)x, and the mean x_target_k is calculated such that this upper X-th percentile equals B_k × a, the processing capacity B_k of server k multiplied by a (a is a coefficient set to less than 1). Then, a calculation is performed that reassigns the flows of groups 1, 2, …, J to server k within the range satisfying X^(1)(p) + X^(2)(p) + … + X^(J)(p) <= x_target_k. This procedure is repeated until all remaining groups have been accommodated in some server. If, as a result, a server is no longer needed, that server is put into the sleep state and the flows are actually reassigned according to the result of the calculation. Conversely, if, for an active server j, the upper X-th percentile value determined from the variance of the number of flows to server j exceeds a × B_j, a sleeping server is made active and some of the flows accommodated by server j are reassigned to that server.

（実施の形態の効果） (Effect of the embodiment)

以上、各実施例を用いて説明した本発明に係る技術によれば、観測されているサンプルデータから、直接観測が困難なトラヒックの変動量を推定することが可能になるとともに、その変動量を用いて、分割数の決定や、振り分け先サーバの収容替え制御等のトラヒック管理を実現することが可能になる。 As described above, the technique of the present invention explained through the embodiments makes it possible to estimate, from observed sample data, the amount of traffic fluctuation that is difficult to observe directly, and to use that fluctuation amount to realize traffic management such as determining the number of divisions and controlling the reassignment of flows among distribution destination servers.
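The reassignment calculation of step 404 can be sketched as follows. This is a minimal illustration, not the embodiment's implementation. It assumes a normal distribution, reads the "upper X-th percentile" as the q-quantile with q >= 0.5 (for example q = 0.95), and fills servers one at a time in group order. The function names, the bisection search, and the example capacities are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def upper_quantile(x, alpha, beta, q):
    """q-quantile of a normal distribution with mean x and variance alpha*x^2 + beta*x."""
    z = NormalDist().inv_cdf(q)
    return x + z * sqrt(alpha * x * x + beta * x)


def target_mean(capacity, a, alpha, beta, q):
    """Mean x_target whose q-quantile equals a * capacity (assumes q >= 0.5, alpha, beta >= 0)."""
    limit = a * capacity
    lo, hi = 0.0, limit  # for q >= 0.5 the quantile is at least x, so x_target <= limit
    for _ in range(100):  # fixed number of bisection steps
        mid = (lo + hi) / 2
        if upper_quantile(mid, alpha, beta, q) <= limit:
            lo = mid
        else:
            hi = mid
    return lo


def pack_groups(group_means, capacities, a, alpha, beta, q):
    """Assign consecutive groups to each server while the summed mean stays within x_target."""
    assignment = {}  # group index -> server index
    g = 0
    for k, capacity in enumerate(capacities):
        x_target = target_mean(capacity, a, alpha, beta, q)
        total = 0.0
        while g < len(group_means) and total + group_means[g] <= x_target:
            total += group_means[g]
            assignment[g] = k
            g += 1
    used = set(assignment.values())
    sleep_candidates = [k for k in range(len(capacities)) if k not in used]
    return assignment, sleep_candidates


if __name__ == "__main__":
    # Hypothetical example: three servers of capacity 1000 flows, four groups of mean 300.
    result = pack_groups([300, 300, 300, 300], [1000, 1000, 1000],
                         a=0.8, alpha=0.002910, beta=0.626320, q=0.95)
    print(result)
```

Groups that fit on no server are simply left out of the returned assignment in this sketch; a real controller would need a policy for that case, such as waking a sleeping server as the text describes.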

ここで、本発明において提案する近似式が正しく実際のトラヒック変動を推定できていることを実データを用いて示す。インターネットバックボーンで測定されたパケットキャプチャデータ（１７１分）を用いて以下の実験を行った。 Here, it is shown using actual data that the approximate expression proposed in the present invention correctly estimates the actual traffic fluctuation. The following experiment was performed using packet capture data (171 minutes) measured on the Internet backbone.

まず、パケットをサンプリング率p=1/1024でサンプリングし、少なくとも１パケットサンプルされたフローについて、グループ数M=1、2、4、8、16、32、64、128、256、512、1024の１１パタンで分類し、１分周期で各グループのサンプルフロー数の時系列を作成した。そして、各グループの（標本平均、標本分散）を計算し、それらをプロットした結果を図１０に示す。X軸が平均、Y軸が分散である。 First, packets were sampled at a sampling rate of p = 1/1024, and the flows for which at least one packet was sampled were classified using 11 patterns of the number of groups, M = 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, and 1024. For each pattern, a time series of the number of sample flows in each group was created at 1-minute intervals. The (sample mean, sample variance) of each group was then calculated, and the resulting plot is shown in FIG. 10. The X axis is the mean and the Y axis is the variance.
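The per-group measurement described above can be sketched as follows. This is an illustration under stated assumptions rather than the experiment's actual tooling: the record layout `(timestamp_seconds, flow_key, src_ip)`, the CRC32-based grouping of source IP addresses, and all function names are hypothetical. The averaging over groups follows the definitions S_E^(M)(p) and S_V^(M)(p) given in the claims.

```python
import zlib
from statistics import mean, variance


def group_of(src_ip, num_groups):
    """Map a source IP address to one of num_groups groups (hash choice is an assumption)."""
    return zlib.crc32(src_ip.encode()) % num_groups


def flow_count_series(records, num_groups, num_bins, bin_seconds=60):
    """Count distinct sampled flows per group and per time bin.

    records: iterable of (timestamp_seconds, flow_key, src_ip), one per sampled packet.
    Returns series[m][t] = number of distinct flows of group m seen in bin t.
    """
    seen = [[set() for _ in range(num_bins)] for _ in range(num_groups)]
    for ts, flow_key, src_ip in records:
        t = int(ts // bin_seconds)
        if 0 <= t < num_bins:
            seen[group_of(src_ip, num_groups)][t].add(flow_key)
    return [[len(flows) for flows in row] for row in seen]


def averaged_point(series):
    """Average the per-group sample mean and sample variance over all groups.

    Needs at least two time bins, because statistics.variance is the sample variance.
    """
    points = [(mean(row), variance(row)) for row in series]
    return mean(p[0] for p in points), mean(p[1] for p in points)


if __name__ == "__main__":
    # Tiny made-up input: flow "a" is sampled twice in the first minute.
    records = [(0, "a", "10.0.0.1"), (10, "a", "10.0.0.1"),
               (20, "b", "10.0.0.2"), (70, "c", "10.0.0.3")]
    series = flow_count_series(records, num_groups=1, num_bins=3)
    print(series, averaged_point(series))
```

Running the same records through several values of `num_groups` yields one (mean, variance) point per pattern, which is the kind of data plotted in FIG. 10.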

同時に、M=1とM=2のときの結果を用いて、近似式y=α(p)x²+β(p)xを計算した。このときα=0.002910、β=0.626320となった。その近似曲線も図１０に示す。これより、本発明の技術を用いることで、M=４以上の領域について近似できていることが確認できる。 At the same time, the approximate expression y = α(p)x² + β(p)x was calculated using the results for M = 1 and M = 2, which gave α = 0.002910 and β = 0.626320. The approximate curve is also shown in FIG. 10. This confirms that the technique of the present invention, fitted only to M = 1 and M = 2, also approximates the region of M = 4 and above.
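Because the approximate expression has two parameters and no constant term, two (mean, variance) points determine it. The sketch below solves the two simultaneous equations y_i = α x_i² + β x_i directly. It is a reconstruction from the form of the approximate expression, not a quotation of the patent's own formula, and the function names and the synthetic input points are hypothetical.

```python
def fit_two_points(p1, p2):
    """Solve y = alpha*x^2 + beta*x through two points (x1, y1) and (x2, y2).

    Dividing each equation by x gives y/x = alpha*x + beta, a straight line in x,
    so alpha is its slope and beta its intercept.
    """
    (x1, y1), (x2, y2) = p1, p2
    if x1 == 0 or x2 == 0 or x1 == x2:
        raise ValueError("need two points with distinct, non-zero means")
    alpha = (y1 / x1 - y2 / x2) / (x1 - x2)
    beta = y1 / x1 - alpha * x1
    return alpha, beta


def predicted_variance(x, alpha, beta):
    """Variance given by the approximate expression for a mean of x."""
    return alpha * x * x + beta * x


if __name__ == "__main__":
    # Synthetic points generated from the parameters reported in the text,
    # used only to check that the solver recovers them.
    a0, b0 = 0.002910, 0.626320
    pts = [(x, a0 * x * x + b0 * x) for x in (1000.0, 500.0)]
    print(fit_two_points(*pts))
```

With more than two points, as in the traffic distribution device where one point per server is available, a least-squares fit of y/x against x would be a natural replacement for this two-point solve.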

IPネットワークにおけるトラヒック変動量を推定することを含むトラヒックの管理を行うための技術に適用できる。 The present invention can be applied to a technology for managing traffic including estimating traffic fluctuation in an IP network.

本発明は、上記の実施の形態に限定されることなく、特許請求の範囲内において、種々変更・応用が可能である。 The present invention is not limited to the above-described embodiments, and various modifications and applications are possible within the scope of the claims.

DESCRIPTION OF SYMBOLS
１ IPネットワーク 1 IP network
２ルータ 2 Router
１０トラヒック変動量推定装置 10 Traffic fluctuation amount estimation device
１１、３１、４１フロー分類部 11, 31, 41 Flow classification unit
１２、３２、４２時系列データ生成部 12, 32, 42 Time-series data generation unit
１３、３４、４３近似式算出部 13, 34, 43 Approximate expression calculation unit
１４トラヒック予測部 14 Traffic prediction unit
３０トラヒック管理装置 30 Traffic management device
３３異常検出部 33 Abnormality detection unit
３５グループ数決定部 35 Group number determination unit
４０トラヒック分配装置 40 Traffic distribution device
４４振り分け先サーバ決定部 44 Distribution destination server determination unit
５０サーバ 50 Server

Claims

A traffic fluctuation amount estimation device that receives sample flow information collected by packet sampling at a sampling rate p in a communication network and estimates a traffic fluctuation amount from the sample flow information, the device comprising:
time-series data generating means for generating, based on the sample flow information, time-series data of the number of flows as a traffic amount;
approximate expression calculating means for calculating the parameters α(p) and β(p) of the approximate expression y = α(p)x² + β(p)x, which expresses the relationship between the expected value y of the sample variance and the expected value x of the sample mean of the traffic amount, from the time-series numbers of flows generated by the time-series data generating means; and
traffic estimation means for estimating the traffic fluctuation amount using the approximate expression y = α(p)x² + β(p)x having the parameters α(p) and β(p) calculated by the approximate expression calculating means,
wherein the traffic fluctuation amount estimation device further comprises flow classification means for dividing the flows into M groups (M is a natural number) using the source IP address of each flow in the sample flow information as a key,
the flow classification means classifies the flows in two ways, into M₁ groups and into M₂ groups (M₁ and M₂ are each natural numbers), and
the approximate expression calculating means calculates, from the time-series numbers of flows for each group m (m = 1 to M) generated by the time-series data generating means, the sample mean X^(m)(p) and the sample variance Vx^(m)(p); performs, for each of M = M₁ and M = M₂, the processing of calculating the expected value of the sample mean over all groups as S_E^(M)(p) = (1/M) ΣX^(m)(p) (summed over m = 1 to M) and the expected value of the sample variance as S_V^(M)(p) = (1/M) ΣVx^(m)(p) (summed over m = 1 to M); and, with the pair of S_E^(Mi)(p) and S_V^(Mi)(p) obtained when M = M_i (i = 1, 2) denoted (x_i, y_i), calculates the parameters α(p) and β(p) by the following expression.

The traffic fluctuation amount estimation device according to claim 1, wherein the flow classification means classifies the flows into the groups using as the key, instead of the source IP address, either the destination IP address or any combination of the five items consisting of the source IP address, the destination IP address, the source port number, the destination port number, and the protocol number.

The traffic fluctuation amount estimation device according to claim 1 or 2, wherein the traffic estimation means estimates, as the traffic fluctuation amount, the variance y_est that results when the future average traffic amount is assumed to become x_est, using the approximate expression y_est = α(p)(x_est)² + β(p)(x_est).

The traffic fluctuation amount estimation device according to claim 1, wherein the flow classification means performs the two classifications, into M₁ groups and into M₂ groups, by further dividing each of the M₁ groups of classified flows into M′ groups so that M₂ groups are obtained in total.

A traffic management device that receives sample flow information collected by packet sampling at a sampling rate p in a communication network and determines, from the sample flow information, a flow division number suitable for abnormal traffic monitoring, the device comprising:
time-series data generating means for generating, based on the sample flow information, time-series data of the number of flows as a traffic amount;
abnormality detection means for receiving the time-series numbers of flows from the time-series data generating means and determining that an abnormality has occurred when the number of flows exceeds a predetermined abnormality detection threshold;
approximate expression calculating means for calculating the parameters α(p) and β(p) of the approximate expression y = α(p)x² + β(p)x, which expresses the relationship between the expected value y of the sample variance and the expected value x of the sample mean of the traffic amount, from the normal time-series numbers of flows that the abnormality detection means has not determined to be abnormal; and
division number determination means for determining a flow division number suitable for abnormal traffic monitoring by: taking the (mean, variance) of the number of flows obtained when abnormal flows with mean m_d and variance v_d are added to the normal flows as (x + m_d, α(p)x² + β(p)x + v_d); calculating, for the probability distribution Fx(u) having that mean and variance, with th denoting the abnormality determination threshold and ε denoting a predetermined abnormality miss rate, the normal average number of flows x* that satisfies Fx(th) < ε; obtaining the sample mean X^(j)(p) of the number of flows in group j when the flows are divided with a division number of M; comparing the average of these sample means, S_E^(M)(p) = (1/M) ΣX^(m)(p) (summed over m = 1 to M), with x*; and making the division number larger than M if x*/S_E^(M)(p) is smaller than a predetermined threshold.

A traffic distribution device that is connected to L servers (L is a natural number) via a communication network and distributes traffic flows to the L servers, the device comprising:
flow classification means for classifying the flows into L groups and transferring the flows of each group to the corresponding server;
time-series data generating means for generating, as a traffic amount, time-series data of the number of flows of each group classified by the flow classification means;
approximate expression calculating means for calculating the parameters α(p) and β(p) of the approximate expression y = α(p)x² + β(p)x, which expresses the relationship between the expected value y of the sample variance and the expected value x of the sample mean of the traffic amount, from the time-series numbers of flows generated by the time-series data generating means; and
distribution destination server determining means for determining whether to change the distribution destination servers of the flows,
wherein, with the flows distributed to server j (j = 1 to L) being the flows of group j, the approximate expression calculating means calculates the sample mean X^(j)(p) and the sample variance Vx^(j)(p) of the number of flows of group j and obtains the parameters α(p) and β(p) by fitting the pairs (X^(j)(p), Vx^(j)(p)), taken as (x_j, y_j), to the approximate expression y = α(p)x² + β(p)x, and
the distribution destination server determining means uses the approximate expression to calculate the number of flows at the upper X-th percentile on the assumption that the number of flows follows a predetermined distribution whose parameters are the mean x and the variance α(p)x² + β(p)x; calculates the mean x_target_k at which the number of flows at the upper X-th percentile equals B_k × a, the processing capacity B_k of server k multiplied by a (a is a coefficient less than 1); repeats for each server, until all remaining groups have been accommodated in some server, the calculation of reassigning the flows of groups 1, 2, …, J to server k within the range satisfying X^(1)(p) + X^(2)(p) + … + X^(J)(p) <= x_target_k; and determines to change the distribution destination servers of the flows if, as a result, a server is no longer needed.

A traffic fluctuation amount estimation method executed by a traffic fluctuation amount estimation device that receives sample flow information collected by packet sampling at a sampling rate p in a communication network and estimates a traffic fluctuation amount from the sample flow information, the method comprising:
a time-series data generation step of generating, based on the sample flow information, time-series data of the number of flows as a traffic amount;
an approximate expression calculating step of calculating the parameters α(p) and β(p) of the approximate expression y = α(p)x² + β(p)x, which expresses the relationship between the expected value y of the sample variance and the expected value x of the sample mean of the traffic amount, from the time-series numbers of flows generated in the time-series data generation step; and
a traffic estimation step of estimating the traffic fluctuation amount using the approximate expression y = α(p)x² + β(p)x having the parameters α(p) and β(p) calculated in the approximate expression calculating step,
wherein the traffic fluctuation amount estimation method further comprises a flow classification step of dividing the flows into M groups (M is a natural number) using the source IP address of each flow in the sample flow information as a key,
in the flow classification step, the traffic fluctuation amount estimation device classifies the flows in two ways, into M₁ groups and into M₂ groups (M₁ and M₂ are each natural numbers), and
in the approximate expression calculating step, the traffic fluctuation amount estimation device calculates, from the time-series numbers of flows for each group m (m = 1 to M) generated in the time-series data generation step, the sample mean X^(m)(p) and the sample variance Vx^(m)(p); performs, for each of M = M₁ and M = M₂, the processing of calculating the expected value of the sample mean over all groups as S_E^(M)(p) = (1/M) ΣX^(m)(p) (summed over m = 1 to M) and the expected value of the sample variance as S_V^(M)(p) = (1/M) ΣVx^(m)(p) (summed over m = 1 to M); and, with the pair of S_E^(Mi)(p) and S_V^(Mi)(p) obtained when M = M_i (i = 1, 2) denoted (x_i, y_i), calculates the parameters α(p) and β(p) by the following expression.

A flow division number determination method executed by a traffic management device that receives sample flow information collected by packet sampling at a sampling rate p in a communication network and determines, from the sample flow information, a flow division number suitable for abnormal traffic monitoring, the method comprising:
a time-series data generation step of generating, based on the sample flow information, time-series data of the number of flows as a traffic amount;
an abnormality detection step of receiving the time-series numbers of flows generated in the time-series data generation step and determining that an abnormality has occurred when the number of flows exceeds a predetermined abnormality detection threshold;
an approximate expression calculating step of calculating the parameters α(p) and β(p) of the approximate expression y = α(p)x² + β(p)x, which expresses the relationship between the expected value y of the sample variance and the expected value x of the sample mean of the traffic amount, from the normal time-series numbers of flows not determined to be abnormal in the abnormality detection step; and
a division number determination step of determining a flow division number suitable for abnormal traffic monitoring by: taking the (mean, variance) of the number of flows obtained when abnormal flows with mean m_d and variance v_d are added to the normal flows as (x + m_d, α(p)x² + β(p)x + v_d); calculating, for the probability distribution Fx(u) having that mean and variance, with th denoting the abnormality determination threshold and ε denoting a predetermined abnormality miss rate, the normal average number of flows x* that satisfies Fx(th) < ε; obtaining the sample mean X^(j)(p) of the number of flows in group j when the flows are divided with a division number of M; comparing the average of these sample means, S_E^(M)(p) = (1/M) ΣX^(m)(p) (summed over m = 1 to M), with x*; and making the division number larger than M if x*/S_E^(M)(p) is smaller than a predetermined threshold.

A distribution destination server determination method executed by a traffic distribution device that is connected to L servers (L is a natural number) via a communication network and distributes traffic flows to the L servers, the method comprising:
a flow classification step of classifying the flows into L groups and transferring the flows of each group to the corresponding server;
a time-series data generation step of generating, as a traffic amount, time-series data of the number of flows of each group classified in the flow classification step;
an approximate expression calculating step of calculating the parameters α(p) and β(p) of the approximate expression y = α(p)x² + β(p)x, which expresses the relationship between the expected value y of the sample variance and the expected value x of the sample mean of the traffic amount, from the time-series numbers of flows generated in the time-series data generation step; and
a distribution destination server determination step of determining whether to change the distribution destination servers of the flows,
wherein, in the approximate expression calculating step, with the flows distributed to server j (j = 1 to L) being the flows of group j, the sample mean X^(j)(p) and the sample variance Vx^(j)(p) of the number of flows of group j are calculated, and the parameters α(p) and β(p) are obtained by fitting the pairs (X^(j)(p), Vx^(j)(p)), taken as (x_j, y_j), to the approximate expression y = α(p)x² + β(p)x, and
in the distribution destination server determination step, the approximate expression is used to calculate the number of flows at the upper X-th percentile on the assumption that the number of flows follows a predetermined distribution whose parameters are the mean x and the variance α(p)x² + β(p)x; the mean x_target_k is calculated at which the number of flows at the upper X-th percentile equals B_k × a, the processing capacity B_k of server k multiplied by a (a is a coefficient less than 1); the calculation of reassigning the flows of groups 1, 2, …, J to server k within the range satisfying X^(1)(p) + X^(2)(p) + … + X^(J)(p) <= x_target_k is repeated for each server until all remaining groups have been accommodated in some server; and it is determined to change the distribution destination servers of the flows if, as a result, a server is no longer needed.