JP5545674B2

JP5545674B2 - Simultaneous flow number variation estimation method and apparatus

Info

Publication number: JP5545674B2
Application number: JP2011148569A
Authority: JP
Inventors: Ryoichi Kawahara; Noriaki Kamiyama; Tatsuya Mori; Tetsuya Takine
Original assignee: Nippon Telegraph and Telephone Corp; Osaka University NUC
Current assignee: Nippon Telegraph and Telephone Corp; Osaka University NUC
Priority date: 2011-07-04
Filing date: 2011-07-04
Publication date: 2014-07-09
Anticipated expiration: 2031-07-04
Also published as: JP2013017055A

Description

The present invention relates to a method and apparatus for estimating the variation in the number of simultaneous flows. In particular, it concerns estimating that variation, which is needed to size the flow table in network systems that manage per-flow state. Examples include flow routers (see http://anagran.com/) and the large scale NAT (LSN) (see T. Nishitani et al., "Common functions of large scale NAT (LSN)," draft-nishitani-cgn-02, IETF, May 2009) that an ISP (Internet Service Provider) installs when it builds its network with NAT (Network Address Translation) as a countermeasure against IPv4 address exhaustion.

As IP networks have come into wide use, the ways they are used and the applications running on them have diversified, and so has the traffic they carry. Traffic also keeps growing year by year, so it has become very important to estimate the impact that traffic will have on the network. For example, increases in video-sharing traffic and in traffic from rich-content distribution such as video and TV have been pointed out (see, for example, Non-Patent Document 1).

Besides the large traffic volume per communication noted above, it has been pointed out that in recent applications a single host often generates many flows simultaneously (see, for example, Non-Patent Document 2).

Anomalous traffic such as SYN flooding and network scans can also generate a large number of flows. When traffic with such large numbers of flows enters the network, its impact on large scale NAT (LSN) is expected to grow. LSN is a scheme considered as a countermeasure against IPv4 address exhaustion. As shown in FIG. 1(a), an ISP installs an LSN at the boundary between its internal network and the external network. Inside the internal network, the ISP assigns its own addresses to internal hosts; when a host accesses the Internet, the LSN converts the host's internal address into a global IPv4 address, enabling communication with hosts on the Internet.

For example, schemes called NAT444 and Dual-Stack Lite (DS-Lite) have been proposed (see, for example, Non-Patent Document 3). In NAT444, shown in FIG. 1(a), an end host in the home sends packets to the customer premises equipment (CPE) NAT using a private IPv4 address X used within the home. The CPE-NAT converts address X into an address Y used within the ISP. For this ISP-internal address, the so-called "ISP shared address" (see, for example, Non-Patent Document 4) can be used; this is a proposal for the Internet Assigned Numbers Authority (IANA) to designate unallocated IPv4 addresses as addresses that can be shared among ISPs. A packet carrying address Y is forwarded to the LSN, where the address is converted into a global IPv4 address before the packet is forwarded to the Internet.

In DS-Lite, shown in FIG. 1(b), the ISP uses IPv6 internally. As in FIG. 1(a), a packet from the end host is forwarded with private address X to the DS-Lite home gateway (HGW), which then forwards it to the LSN through an IPv4-over-IPv6 tunnel. The LSN converts the packet's address into a global IPv4 address.

In each of the above schemes, flow state must be managed inside the LSN. Here, a flow is a group of packets that share the same 5-tuple {srcIP, dstIP, srcPort, dstPort, Protocol}.
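The flow definition above can be illustrated with a minimal Python sketch. It assumes packets have already been parsed into dictionaries; the field names (src_ip, dst_ip, src_port, dst_port, protocol) and the helper group_into_flows are hypothetical, chosen for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, NamedTuple


class FlowKey(NamedTuple):
    """The 5-tuple {srcIP, dstIP, srcPort, dstPort, Protocol} that identifies a flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # IP protocol number, e.g. 6 for TCP (assumed encoding)


def group_into_flows(packets: Iterable[Mapping]) -> Dict[FlowKey, List[Mapping]]:
    """Group parsed packets into flows: packets with an identical 5-tuple share a flow."""
    flows: Dict[FlowKey, List[Mapping]] = defaultdict(list)
    for pkt in packets:
        key = FlowKey(pkt["src_ip"], pkt["dst_ip"], pkt["src_port"],
                      pkt["dst_port"], pkt["protocol"])
        flows[key].append(pkt)
    return dict(flows)
```

Changing any one of the five fields, for example the source port, yields a distinct flow and therefore a distinct table entry.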

FIG. 2 shows an example of flow state management. In this example, external host X1 (a host on the Internet, 129.1.1.1) communicates with internal host Y1 (private address:port = 192.168.0.1:1024), and external host X2 (129.2.2.2) communicates with internal host Y2 (192.168.0.1:1025). Ordinary NAPT assigns a different {global IPv4 address, port number} pair to each of the two internal endpoints: for example, 192.168.0.1:1024 ⇔ 129.0.0.1:1000 and 192.168.0.1:1025 ⇔ 129.0.0.1:1001, so the two private address:port pairs map to the two global address:port pairs 129.0.0.1:1000 and 129.0.0.1:1001.

As FIG. 2 illustrates, a system that must manage flow state needs a flow management table of appropriate size. Non-Patent Documents 5 to 7 report evaluations of the impact of traffic on LSN. They compute, from measured data, the average number of flows generated per host and its distribution. For example, Non-Patent Document 5 shows that some hosts generate a large number of flows.
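The NAPT mapping in the FIG. 2 example can be mimicked with a toy table. This is a sketch under stated assumptions: the class NaptTable is hypothetical, and allocating global ports sequentially from 1000 is chosen only so that the output matches the example addresses; real LSN port allocation policies are not described in the text.

```python
from typing import Dict, Tuple

Endpoint = Tuple[str, int]  # (IPv4 address, port)


class NaptTable:
    """Toy NAPT: one global (address, port) per private (address, port).

    Sequential port allocation from `first_port` is an illustrative assumption.
    """

    def __init__(self, global_ip: str, first_port: int = 1000):
        self.global_ip = global_ip
        self.next_port = first_port
        self.out: Dict[Endpoint, Endpoint] = {}   # private -> global
        self.back: Dict[Endpoint, Endpoint] = {}  # global -> private (return traffic)

    def translate(self, private: Endpoint) -> Endpoint:
        """Return the global endpoint for `private`, allocating one on first use."""
        if private not in self.out:
            public = (self.global_ip, self.next_port)
            self.next_port += 1
            self.out[private] = public
            self.back[public] = private
        return self.out[private]

    def __len__(self) -> int:
        return len(self.out)  # entries the device must hold


table = NaptTable("129.0.0.1")
print(table.translate(("192.168.0.1", 1024)))  # ('129.0.0.1', 1000)
print(table.translate(("192.168.0.1", 1025)))  # ('129.0.0.1', 1001)
```

Every concurrently active mapping occupies one entry, which is why the table size tracks the number of simultaneous flows.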

Non-Patent Document 1: Cisco Visual Networking Index: Forecast and Methodology, 2008-2013, http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/white_paper_c11-481360.pdf
Non-Patent Document 2: S. Miyakawa, "From IPv4 only To v4/v6 Dual Stack," IETF72 IAB Technical Plenary, Aug. 2008. http://www.nttv6.jp/~miyakawa/IETF72/IETF-IAB-TECH-PLENARY-NTT-miyakawa.pdf
Non-Patent Document 3: T. Nishitani et al., "Common functions of large scale NAT (LSN)," draft-nishitani-cgn-02, IETF, May 2009.
Non-Patent Document 4: Y. Shirasaki et al., "NAT444 with ISP shared address," draft-shirasaki-nat444-isp-shared-addr-01 (work in progress), March 2009.
Non-Patent Document 5: G. Maier et al., "On dominant characteristics of residential broadband Internet traffic," ACM IMC 2009.
Non-Patent Document 6: http://www.janog.gr.jp/meeting/janog24/program/d2p5.html
Non-Patent Document 7: Tsuji, "Evaluation of user influence by introducing NAT into ISP," Journal of the IEICE, Vol. 93, No. 6, pp. 473-478, June 2010.

However, Non-Patent Documents 5 to 7 only report the distribution of the number of flows generated per host, or analyze how many flows each host generates on average when all hosts are multiplexed. To design the size of a flow management table, by contrast, one must know how much the number of entries in the flow table (the number of simultaneous flows) fluctuates, that is, its variance, when many hosts are multiplexed. For example, even if the mean number of simultaneous flows with the hosts multiplexed is 100, the required flow table size differs greatly depending on whether the variance is 100 or 10000. With a variance of 100, the standard deviation (its square root) is 10. If, to allow for fluctuation in the number of simultaneous flows, a table with mean + 3 × standard deviation = 130 entries is provided, the entry success rate is 99.87% when the number of simultaneous flows is normally distributed. With a variance of 10000, achieving the same success rate requires mean + 3 × standard deviation = 400 entries. The fluctuation (variance) of the number of simultaneous flows must therefore be estimated correctly.
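The sizing arithmetic in the preceding paragraph can be reproduced in a few lines. This sketch assumes, as the text does, that the number of simultaneous flows is normally distributed; the function names are hypothetical.

```python
import math


def table_size(mean: float, variance: float, k: float = 3.0) -> float:
    """Entries needed to cover mean + k standard deviations."""
    return mean + k * math.sqrt(variance)


def normal_success_rate(k: float = 3.0) -> float:
    """P(X <= mean + k*sigma) for a normally distributed X (one-sided)."""
    return 0.5 * (1.0 + math.erf(k / math.sqrt(2.0)))


print(table_size(100, 100))                      # 130.0
print(table_size(100, 10000))                    # 400.0
print(round(normal_success_rate(3.0) * 100, 2))  # 99.87
```

The same mean of 100 thus leads to table sizes that differ by a factor of about three, purely because of the variance.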

In view of the above problem, an object of the present invention is to provide a method and apparatus that use statistics measurable in the network to estimate the fluctuation (variance) of the number of simultaneous flows, in order to determine the flow table size required in a network device that manages flow state.

In the first method of the present invention, a group of packets sharing the same 5-tuple {srcIP, dstIP, srcPort, dstPort, Protocol} is defined as a flow, and the number of flow entries required in a network device that manages per-flow state is estimated as follows.
Information on each flow i passing through the network is collected over a predetermined measurement period T and stored in storage means. The information for flow i consists of the flow key i = {srcIP, dstIP, srcPort, dstPort, Protocol}, the start time Tfirst_i and end time Tlast_i of flow i, and the flow duration d_i = Tlast_i - Tfirst_i computed from them. Suppose N flows with d_i > 0 are collected, and let the mean flow duration be avg_d = Σd_i/N. Two flows i and j are selected at random from the collected flows and Hmin = min(d_i, d_j) is computed; this is repeated a predetermined number M of times. Letting Hmin(m) be the value of Hmin in the m-th trial, the average avg_Hmin = ΣHmin(m)/M is calculated.
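The two duration statistics can be computed as in the following sketch. It assumes the durations d_i are already available as a list and that the two randomly chosen flows i and j are distinct (the text does not say whether the same flow may be drawn twice); duration_stats is a hypothetical name.

```python
import random
from typing import Sequence, Tuple


def duration_stats(durations: Sequence[float], m_trials: int,
                   seed: int = 0) -> Tuple[float, float]:
    """Return (avg_d, avg_Hmin) as described for the first method."""
    d = [x for x in durations if x > 0]  # only flows with d_i > 0 are used
    if len(d) < 2:
        raise ValueError("need at least two flows with positive duration")
    avg_d = sum(d) / len(d)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(m_trials):
        i, j = rng.sample(range(len(d)), 2)  # two distinct flows (assumption)
        total += min(d[i], d[j])             # Hmin for this trial
    return avg_d, total / m_trials           # avg_Hmin = sum of Hmin(m) / M
```

avg_Hmin is a Monte Carlo estimate of E[min(H1, H2)]; larger M reduces its sampling error.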

Next, statistics on the number of flows generated by each source IP srcIP#x are obtained from the flow information. Specifically, the measurement period T is first divided into intervals of length τ, smaller than T (producing L = T/τ intervals), and the number of flows generated by srcIP#x in the k-th interval is denoted F(x,k). From F(x,k), the mean E and variance V of the number of flows generated by a source IP are calculated.
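A sketch of how F(x,k) might be tabulated. Two assumptions are not stated in the text: a flow is counted in the interval that contains its start time Tfirst_i, and T is an integer multiple of τ. The function flow_counts is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def flow_counts(flows: Iterable[Tuple[str, float]], t_start: float,
                period_t: float, tau: float) -> Dict[Tuple[str, int], int]:
    """Count F(x, k): flows from source IP x whose start time falls in interval k.

    `flows` yields (src_ip, t_first). Intervals are numbered k = 1..L with
    L = T / tau (assumed integral); flows starting outside the period are skipped.
    """
    n_intervals = round(period_t / tau)
    counts: Counter = Counter()
    for src_ip, t_first in flows:
        k = int((t_first - t_start) // tau) + 1  # 1-based interval index
        if 1 <= k <= n_intervals:
            counts[(src_ip, k)] += 1
    return dict(counts)
```

Only nonzero counts are stored, which matches the second method's use of F(x,k) > 0 values.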

The number A of source IPs arriving per unit time is then calculated as A = N/T/E.

With the above preparation, the variance VarL of the number of simultaneous flows is calculated by the following equation.

VarL = A×(V+E²-E)×avg_Hmin + A×E×avg_d   (1)

In the second method of the present invention, E and V in the first method are calculated as follows. Let w_s be the set of source IPs that appear. From the values F(x,k) collected for x∈w_s and k∈[1,L], only those satisfying F(x,k) > 0 are extracted. Letting LW be the number of extracted values,
E = ΣF(x,k)/LW, V = ΣF(x,k)²/LW - E²
where the sums run over the extracted values.
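A minimal Python sketch of the second method followed by equation (1). It assumes the counts F(x,k) are held in a dictionary keyed by (source IP, interval index) and that N, T, avg_d and avg_Hmin have already been computed; the function names are hypothetical.

```python
from typing import Dict, Tuple


def batch_moments(F: Dict[Tuple[str, int], int]) -> Tuple[float, float]:
    """E and V of the per-host flow count, using only F(x, k) > 0 (second method)."""
    values = [f for f in F.values() if f > 0]
    lw = len(values)
    if lw == 0:
        raise ValueError("no positive F(x, k) values")
    e = sum(values) / lw
    v = sum(f * f for f in values) / lw - e * e
    return e, v


def var_simultaneous_flows(n_flows: int, period_t: float, e: float, v: float,
                           avg_d: float, avg_hmin: float) -> float:
    """Equation (1): VarL = A(V+E^2-E)avg_Hmin + A*E*avg_d with A = N/T/E."""
    a = n_flows / period_t / e
    return a * (v + e * e - e) * avg_hmin + a * e * avg_d
```

When every host brings exactly one flow (E = 1, V = 0) the first term vanishes and VarL equals the mean A×E×avg_d, the Poisson result; the first term is the extra variance caused by batch arrivals.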

In the third method of the present invention, whereas the first method calculates A from the collected flow information, the variance of the number of simultaneous flows under the assumption that A at some future time becomes a predetermined value A* is estimated by
VarL = A*×(V+E²-E)×avg_Hmin + A*×E×avg_d

In the fourth method of the present invention, when the variance of the number of simultaneous flows is estimated by the first or third method, flows whose destination port number is one of specific numbers (port numbers often seen in anomalous traffic such as scans) are removed from the flow information used for the estimation before the variance is estimated.
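The filtering step can be sketched as below. The text does not list which destination ports to exclude, so the set {135, 445} is a purely hypothetical example, as is the helper name exclude_anomalous.

```python
from typing import FrozenSet, List, Sequence, Tuple

FiveTuple = Tuple[str, str, int, int, int]  # (srcIP, dstIP, srcPort, dstPort, Protocol)

# Hypothetical example set: the text does not name the ports to exclude.
EXCLUDED_DST_PORTS: FrozenSet[int] = frozenset({135, 445})


def exclude_anomalous(flows: Sequence[FiveTuple],
                      ports: FrozenSet[int] = EXCLUDED_DST_PORTS) -> List[FiveTuple]:
    """Keep only flows whose destination port (index 3) is not in `ports`."""
    return [f for f in flows if f[3] not in ports]
```

The remaining flows are then passed unchanged to the first- or third-method computations.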

In the fifth method of the present invention, whereas the first method calculates A from the collected flow information, let Z be the number of users whose flow information is collected. The variance of the number of simultaneous flows when the number of accommodated users is instead Z' is estimated by
VarL = A/Z×Z'×(V+E²-E)×avg_Hmin + A/Z×Z'×E×avg_d
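The third and fifth methods differ from equation (1) only in the host arrival rate that is plugged in: the fifth method amounts to the third with A* = A×Z'/Z. A minimal sketch, with hypothetical function names:

```python
def var_with_arrival_rate(a: float, e: float, v: float,
                          avg_d: float, avg_hmin: float) -> float:
    """VarL for a given host arrival rate (A* in the third method)."""
    return a * (v + e * e - e) * avg_hmin + a * e * avg_d


def var_for_user_count(a: float, z: int, z_new: int, e: float, v: float,
                       avg_d: float, avg_hmin: float) -> float:
    """Fifth method: scale the measured A by Z'/Z, then apply the same formula."""
    return var_with_arrival_rate(a / z * z_new, e, v, avg_d, avg_hmin)
```

Because VarL is linear in the arrival rate, doubling the number of accommodated users doubles the variance, so the standard deviation grows only by a factor of about 1.41.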

In the sixth method of the present invention, avg_d and avg_Hmin from the first method are used to compute
α = avg_Hmin/avg_d
in advance. Suppose that flow information has also been collected separately over a measurement period T' shorter than the measurement period T of the first method. Statistics on the number of flows generated by each srcIP#x are obtained from this flow information. Specifically, the measurement period T' is first divided into intervals of length τ, smaller than T' (producing L' = T'/τ intervals), and the number of flows generated by srcIP#x in the k-th interval is denoted F(x,k). From F(x,k), the mean E and variance V of the number of flows generated by a source IP are calculated. The number A of source IPs arriving per unit time is calculated by
A = N/T'/E
with N now the number of flows collected during T'.

In addition, the mean number of simultaneous flows avgL' during the measurement period T' is calculated from the collected flow information.

With the above preparation, the variance VarL of the number of simultaneous flows is estimated by the following equation.

VarL = A×(V+E²-E)×avgL'/(A×E)×α + avgL'
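One way to read the sixth-method equation: by Little's formula, avgL'/(A×E) estimates the mean flow duration during T', multiplying it by α = avg_Hmin/avg_d (taken from the longer measurement) estimates E[min(H1, H2)], and avgL' itself stands in for E[L]. A minimal sketch, with a hypothetical function name:

```python
def var_from_short_measurement(a: float, e: float, v: float,
                               avg_l: float, alpha: float) -> float:
    """Sixth method: VarL = A(V+E^2-E) * avgL'/(A*E) * alpha + avgL'."""
    avg_d_est = avg_l / (a * e)          # Little's formula: avg_d = E[L] / (A*E)
    avg_hmin_est = alpha * avg_d_est     # alpha converts avg_d into avg_Hmin
    return a * (v + e * e - e) * avg_hmin_est + avg_l
```

With A = 1, E = 3, V = 1, avgL' = 6 and α = 0.25, this gives 9.5, the same value equation (1) yields for avg_d = 2 and avg_Hmin = 0.5.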

As described above, the present invention makes it possible to estimate, from statistics measurable in the network, the fluctuation (variance) of the number of simultaneous flows, which determines the flow table size required in a network device that manages flow state.

FIG. 1 is an explanatory diagram of the large scale NAT (LSN) targeted by the present invention.
FIG. 2 shows an example of the flow management table in the LSN targeted by the present invention.
FIG. 3 is a basic configuration diagram of an IP network to which the present invention is applied.
FIG. 4 is a configuration diagram of the estimation apparatus in the first embodiment of the present invention.
FIG. 5 is a flowchart of flow management table management in the first embodiment of the present invention.
FIG. 6 is a graph comparing standard deviations estimated under the assumption of Poisson arrivals with measured values.
FIG. 7 is a system configuration diagram applied to the second embodiment of the present invention.
FIG. 8 shows results of analyzing real data with the present invention.

Embodiments of the present invention are described below with reference to the drawings.

[First embodiment]
FIG. 3 is a basic configuration diagram showing an example of an IP network to which the present invention is applied.

As shown in FIG. 3, the estimation apparatus 10 is inserted into the link between nodes 1 and 2. Alternatively, a node (switch or router) may mirror packets to a port, with the estimation apparatus attached to that port. As a further alternative, packets may first be captured with the configuration of FIG. 3 and later read back from the beginning in post-processing, applying the procedure of this embodiment (described below).

FIG. 4 shows the configuration of the estimation apparatus according to the first embodiment of the present invention.

The estimation apparatus 10 shown in the figure estimates the variance of the number of simultaneous flows by the first method described above, computing by the second method the mean and variance of the number of flows generated per host that this estimate requires. It comprises a packet analysis unit 11, a flow management unit 12, a flow statistics collection unit 13, and a simultaneous-flow-count variance estimation unit 14.

FIG. 5 shows the flow of the flow management table operations of the estimation apparatus.

In FIG. 4, when a packet arrives from upstream node 1, the packet analysis unit 11 reads the packet's flow key = {srcIP#i, dstIP#i, srcPort#i, dstPort#i, Protocol#i} and passes it to the flow management unit 12. The packet is then forwarded to downstream node 2 (step 101 in FIG. 5).

The flow management unit 12 maintains a flow management table (not shown), internally or externally, which stores for each flow the flow key i = {srcIP#i, dstIP#i, srcPort#i, dstPort#i, Protocol#i} together with the arrival times of its first and last packets, Tfirst_i and Tlast_i. The unit checks whether the packet's flow key i is already in the flow management table (step 102 in FIG. 5). If the flow is new and the packet is a TCP packet with the FIN or RST flag set (step 103 in FIG. 5), nothing is done. Otherwise, a new entry is created in the flow management table with Tfirst_i ← Tnow (the current time) and Tlast_i ← Tnow (step 104 in FIG. 5).

If the flow is already in the table (step 102 in FIG. 5, Yes), the unit checks whether the packet is a TCP packet with the FIN or RST flag set (step 105 in FIG. 5). If so (step 105 in FIG. 5, Yes), it sets Tlast_i ← Tnow, outputs the flow's information (flow key i, Tfirst_i, Tlast_i) to the flow statistics collection unit 13, and deletes the entry from the flow management table (step 106 in FIG. 5). If not (step 105 in FIG. 5, No), it compares the current time Tnow with Tlast_i+Tout, where Tout is a predetermined timer value (step 107 in FIG. 5). If Tnow is larger (step 107 in FIG. 5, Yes), it updates Tlast_i in the flow management table to Tlast_i+Tout, outputs the flow's information (flow key i, Tfirst_i, Tlast_i) to the flow statistics collection unit 13, and deletes the entry from the flow management table (step 108 in FIG. 5).

The flow management unit 12 performs the above throughout the measurement period T. At the end of the measurement period, flow information still remaining in the flow management table is output to the flow statistics collection unit 13. Before output, Tlast_i is updated to Tlast_i+Tout if Tlast_i+Tout < Tnow; otherwise Tlast_i is set to null.
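The table operations of FIG. 5 can be sketched as a small Python class. Two behaviors are assumptions not spelled out in the text and are marked in the comments: an ordinary packet that does not trigger a timeout refreshes Tlast_i to Tnow, and the packet that reveals a timeout opens a new entry. Class and method names are hypothetical; the exported list stands in for the flow statistics collection unit 13.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

FlowKey = Tuple[str, str, int, int, int]  # (srcIP, dstIP, srcPort, dstPort, Protocol)


@dataclass
class FlowRecord:
    key: FlowKey
    t_first: float
    t_last: Optional[float]  # None plays the role of "null"


class FlowManager:
    """Sketch of the flow management unit (steps 102-108 of FIG. 5)."""

    def __init__(self, t_out: float):
        self.t_out = t_out
        self.table: Dict[FlowKey, FlowRecord] = {}
        self.exported: List[FlowRecord] = []  # stands in for unit 13

    def on_packet(self, key: FlowKey, now: float, tcp_fin_or_rst: bool) -> None:
        rec = self.table.get(key)
        if rec is None:                      # step 102: not yet in the table
            if not tcp_fin_or_rst:           # steps 103-104: open a new entry
                self.table[key] = FlowRecord(key, now, now)
            return
        if tcp_fin_or_rst:                   # steps 105-106: FIN/RST ends the flow
            rec.t_last = now
            self._export(key)
        elif now > rec.t_last + self.t_out:  # steps 107-108: idle timeout
            rec.t_last += self.t_out
            self._export(key)
            # Assumption: the current packet starts a new flow entry.
            self.table[key] = FlowRecord(key, now, now)
        else:
            # Assumption: an ordinary packet refreshes the last-arrival time.
            rec.t_last = now

    def finish(self, now: float) -> None:
        """End of measurement period T: flush every remaining entry."""
        for key in list(self.table):
            rec = self.table[key]
            if rec.t_last + self.t_out < now:
                rec.t_last += self.t_out
            else:
                rec.t_last = None
            self._export(key)

    def _export(self, key: FlowKey) -> None:
        self.exported.append(self.table.pop(key))
```

Timeouts are only detected when a packet of the same flow arrives or at the end of the period, mirroring the procedure above rather than a periodic sweep.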

The flow statistics collection unit 13 stores the flow information output from the flow management unit 12 in a memory (not shown), extracts only the flows for which Tlast_i is not null and Tfirst_i is at least Tout after the start of measurement, and outputs them to the simultaneous-flow-count variance estimation unit 14. The reason is that a flow whose Tfirst_i lies within Tout of the start of measurement may have been communicating since before the measurement began, so its Tfirst_i is not necessarily the true start time.
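The extraction rule reduces to a simple filter. In this sketch a flow is a hypothetical (t_first, t_last) record with None standing for null, and t_start is the start of measurement.

```python
from typing import List, NamedTuple, Optional


class Flow(NamedTuple):
    t_first: float
    t_last: Optional[float]  # None stands for "null"


def usable_flows(flows: List[Flow], t_start: float, t_out: float) -> List[Flow]:
    """Keep flows with a known end time that started at least t_out after t_start."""
    return [f for f in flows
            if f.t_last is not None and f.t_first - t_start >= t_out]
```

For t_start = 0 and Tout = 10, a flow first seen at time 5 is dropped because it may have begun before measurement, and a flow with a null end time is dropped because its duration is unknown.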

The simultaneous-flow-count variance estimation unit 14 calculates the flow duration d_i = Tlast_i - Tfirst_i from the start time Tfirst_i and end time Tlast_i of each flow i. Suppose N flows with d_i > 0 are collected, and let the mean flow duration be avg_d = Σd_i/N. Two flows i and j are selected at random from the collected flows and Hmin = min(d_i, d_j) is computed; this is repeated a predetermined number M of times. Letting Hmin(m) be the value of Hmin in the m-th trial, the average
avg_Hmin = ΣHmin(m)/M
is calculated and stored in a memory (not shown).

Next, the flow statistics collection unit 13 obtains statistics on the number of flows generated by each srcIP#x from the flow information. Specifically, the measurement period T is first divided into intervals of length τ, smaller than T (producing L = T/τ intervals), and the number of flows generated by srcIP#x in the k-th interval is denoted F(x,k). The simultaneous-flow-count variance estimation unit 14 uses the flow counts F(x,k) to calculate the mean E and variance V of the number of flows generated by a source IP, and then calculates the number A of source IPs arriving per unit time as A = N/T/E.

With the above preparation, the simultaneous-flow-count variance estimation unit 14 calculates the variance VarL of the number of simultaneous flows by the following equation.

VarL = A×(V+E²-E)×avg_Hmin + A×E×avg_d   (1)
Equation (1) is explained below, starting with the experimental results in FIG. 6. These results come from a previous study by the present inventors (Reference 1: R. Kawahara, T. Mori, T. Yada, "Analysis of impact of traffic on large-scale NAT," IEICE Technical Report, vol. 110, no. 224, IN2010-78, pp. 75-80, Oct. 2010).

In the graph of FIG. 6, the line labeled "Poisson" is the standard deviation (the square root of the variance) of the number of simultaneous flows, estimated under the assumption that flows are generated according to a Poisson process, a stochastic process widely used in traffic theory.

By contrast, "data set A" and "data set B" in the figure plot the mean and standard deviation of the number of simultaneous flows computed from measured Internet traffic data. The measured results deviate greatly from the Poisson line.

Let λ be the number of flows arriving per unit time and E[H] the mean flow duration H. The mean number of simultaneous flows L is then E[L] = λ×E[H]. This relation, known as Little's formula, holds whether or not flow generation is Poisson (see Reference 2: Takine, Ito, Nishio, "Network Design Theory," Iwanami Lecture Series). If flow generation is Poisson, applying the M/G/∞ queueing model gives the variance of the number of customers in the system (that is, the number of simultaneous flows) as Var[L] = E[L], so the standard deviation is sqrt(E[L]). The Poisson line in FIG. 6 is therefore the curve y = sqrt(x).

One possible reason why the measured values in FIG. 6 deviate so far from the theoretical values is that flows do not follow a Poisson process: several flows are generated at the same time. Indeed, Non-Patent Document 2 points out that recent applications generate dozens of flows at once.

The present invention therefore considers a batch-arrival model of flows, specifically the M[X]/G/∞ batch-arrival queueing model. In this model, hosts (= srcIPs) arrive according to a Poisson process, and each host generates several flows simultaneously on arrival. The variance Var[L] of the number of simultaneous flows is then given by the following equation (see Reference 3: J. Keilson and A. Seidmann, "M/G/∞ with batch arrivals," Operations Research Letters, Oct. 1988).

Var[L] = λE[K(K-1)]E[min(H1,H2)] + E[L]   (2)
Here, λ is the arrival rate of hosts (batches), K is a random variable representing the number of flows generated when a host arrives, and H1 and H2 are independent random variables representing flow durations.
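Equation (2) can be checked numerically. The following Monte Carlo sketch assumes a fixed batch size k and exponential durations with mean 1/mu; neither assumption is made in the text, they simply give closed-form values for E[K(K-1)] = k(k-1) and E[min(H1,H2)] = 1/(2 mu). Function names are hypothetical.

```python
import random
import statistics


def simulate_batch_var(lam: float, k: int, mu: float, window: float,
                       reps: int, seed: int = 1) -> float:
    """Monte Carlo estimate of Var[L] in an M[X]/G/inf system.

    Hosts arrive as a Poisson process of rate `lam`, each bringing exactly `k`
    flows with i.i.d. exponential durations of mean 1/mu. L is the number of
    flows still active at time `window`, starting from an empty system.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(reps):
        t, active = 0.0, 0
        while True:
            t += rng.expovariate(lam)        # next host arrival
            if t > window:
                break
            for _ in range(k):               # the host's batch of flows
                if t + rng.expovariate(mu) > window:
                    active += 1              # flow still alive at `window`
        samples.append(active)
    return statistics.variance(samples)


def batch_var_formula(lam: float, k: int, mu: float) -> float:
    """Equation (2) for fixed batch size k and exponential durations."""
    e_l = lam * k / mu               # Little's formula with flow rate lam*k
    e_min = 1.0 / (2.0 * mu)         # min of two i.i.d. Exp(mu) is Exp(2 mu)
    return lam * k * (k - 1) * e_min + e_l


print(batch_var_formula(2.0, 5, 1.0))  # 30.0
```

With λ = 2, K = 5 and mean duration 1, E[L] = 10, so the Poisson model would predict Var[L] = 10, whereas equation (2) gives 30; the simulated variance should land near 30, illustrating the gap seen in FIG. 6.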

In the present invention, E[K(K-1)] and E[min(H1,H2)] in equation (2) are determined from statistics that can be measured in the network.

Specifically, the duration d_i of each flow i is calculated from the collected flow information. Two flows i and j are then selected at random from these flows and Hmin = min(d_i, d_j) is computed; this is repeated a predetermined number M of times. Letting Hmin(m) be the value of Hmin in the m-th trial, the average
avg_Hmin = ΣHmin(m) / M
is calculated, and E[min(H1, H2)] in Equation (2) is estimated by avg_Hmin.
Next, the measurement period T is divided into measurement intervals of length τ smaller than T, yielding T/τ = L intervals (here L denotes the number of intervals, not the number of simultaneous flows). The number of flows generated by srcIP #x in the k-th interval is denoted F(x, k), and F(x, k) is used to calculate the mean E and the variance V of the number of flows generated by a single srcIP. E[K(K-1)] in Equation (2) is then estimated from E and V as V + E² - E. This follows from
E[K(K-1)] = E[K²] - E[K] = Var[K] + E[K]² - E[K]
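A minimal Python sketch of the two estimators above. Three points are assumptions not fixed by the text: flows are binned into intervals by their start time, intervals are indexed from 0, and E and V are taken over all (x, k) cells for the srcIPs seen, with cells that have no flows counted as F = 0. The text does not state how zero cells are handled in this embodiment; the Third Embodiment describes the alternative of using only cells with F(x, k) > 0.

```python
import random
from collections import Counter


def estimate_avg_hmin(durations, m, seed=0):
    """avg_Hmin: mean of min(d_i, d_j) over M random pairs of distinct flows."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(m):
        i, j = rng.sample(range(len(durations)), 2)
        total += min(durations[i], durations[j])
    return total / m


def per_host_counts(flows, tau, num_intervals):
    """F(x, k): number of flows from srcIP x whose start time falls in interval k.

    `flows` is an iterable of (src_ip, t_first) pairs; intervals are 0-based here.
    """
    counts = Counter()
    for src_ip, t_first in flows:
        k = int(t_first // tau)
        if 0 <= k < num_intervals:
            counts[(src_ip, k)] += 1
    return counts


def host_mean_var(counts, num_hosts, num_intervals):
    """E and V over all (x, k) cells, counting cells with no flows as F = 0."""
    cells = num_hosts * num_intervals
    e = sum(counts.values()) / cells
    v = sum(c * c for c in counts.values()) / cells - e * e
    return e, v


flows = [("a", 0.1), ("a", 0.4), ("a", 0.9), ("b", 1.2)]
F = per_host_counts(flows, tau=1.0, num_intervals=2)
E, V = host_mean_var(F, num_hosts=2, num_intervals=2)
print(E, V, V + E * E - E)  # the last value estimates E[K(K-1)]
```

In the toy data, the four cells hold F = 3, 0, 0, 1, so V + E² - E equals the direct average of F(F-1), as the identity above requires.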

E[L] in Equation (2) is estimated using Little's formula (E[L] = flow arrival rate × E[H]). The flow arrival rate is estimated as A × E, the mean number A of hosts arriving per unit time multiplied by the mean number E of flows generated per host arrival. Multiplying this by the mean flow duration avg_d gives the estimate E[L] = A × E × avg_d.
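Putting the pieces together, the following Python sketch computes A = N / T / E, the mean A × E × avg_d, and the variance VarL = A × (V + E² - E) × avg_Hmin + A × E × avg_d (the formula of claim 1). The input values are illustrative assumptions only.

```python
def simultaneous_flow_stats(n_flows, period_t, e, v, avg_hmin, avg_d):
    """Mean and variance of the number of simultaneous flows (first embodiment).

    A = N / T / E is the srcIP (host) arrival rate; the mean follows Little's
    formula, and VarL = A(V + E^2 - E) * avg_Hmin + A * E * avg_d.
    """
    a = n_flows / period_t / e
    mean_l = a * e * avg_d
    var_l = a * (v + e * e - e) * avg_hmin + mean_l
    return a, mean_l, var_l


a, mean_l, var_l = simultaneous_flow_stats(
    n_flows=40_000, period_t=2_000.0, e=5.0, v=20 / 3, avg_hmin=5 / 6, avg_d=1.0)
print(a, mean_l, var_l, var_l ** 0.5)
```

Because the second term equals the mean, VarL always exceeds the Poisson value E[L] whenever V + E² - E > 0, that is, whenever hosts can generate more than one flow.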

[Second Embodiment]
In the first embodiment, the estimation apparatus is inserted between nodes 1 and 2, as shown in FIG. 3. Alternatively, as shown in FIG. 7, each router 3 in the network may export flow information, which the apparatus then collects. The configuration of the estimation apparatus itself is the same as in FIG. 4, and its processing is substantially the same as in the first embodiment. However, the flow export function of a typical router can also output flows with duration d_i = 0 (for example, a flow consisting of a single packet carrying a TCP FIN or RST flag), so only flows with d_i > 0 must be considered.
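A minimal Python sketch of the d_i > 0 filter for router-exported records. The FlowRecord layout (the 5-tuple flow key plus start and end times in seconds) is an assumed representation, not a specific export format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """One exported flow: the 5-tuple flow key plus start/end times (seconds)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int
    t_first: float
    t_last: float

    @property
    def duration(self) -> float:
        return self.t_last - self.t_first


def usable_flows(records):
    """Keep only flows with d_i > 0 (drops e.g. single-packet FIN/RST records)."""
    return [r for r in records if r.duration > 0]


records = [
    FlowRecord("10.0.0.1", "192.0.2.7", 50000, 443, 6, 10.0, 12.5),
    FlowRecord("10.0.0.2", "192.0.2.7", 50001, 80, 6, 11.0, 11.0),  # single packet
]
print(len(usable_flows(records)))
```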

[Third Embodiment]
The configuration of the estimation apparatus in this embodiment is the same as in FIG. 4.

This embodiment uses a different method for calculating the mean E and the variance V of the first embodiment. Letting w_s be the set of srcIPs that appear, the simultaneous flow number variance estimation unit 14 extracts, from the values F(x, k) collected for x ∈ w_s and k ∈ [1, L], only those satisfying F(x, k) > 0. Letting LW be the number of extracted values, it calculates the mean E and the variance V as
E = ΣF(x, k) / LW,  V = ΣF(x, k)² / LW - E²
where the sums run over the extracted values only.
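A Python sketch of the third embodiment's estimator, assuming the counts are held in a mapping from (srcIP, interval) to F(x, k). Cells with F(x, k) = 0 are skipped, so LW is the number of positive cells.

```python
def host_mean_var_positive(counts):
    """Third embodiment: E and V over the LW cells with F(x, k) > 0 only.

    `counts` maps (src_ip, k) -> F(x, k); zero-valued cells are ignored.
    """
    positive = [c for c in counts.values() if c > 0]
    lw = len(positive)
    e = sum(positive) / lw
    v = sum(c * c for c in positive) / lw - e * e
    return e, v


counts = {("a", 0): 3, ("a", 1): 0, ("b", 0): 0, ("b", 1): 1}
print(host_mean_var_positive(counts))
```

For these four cells, the result is E = 2.0 and V = 1.0. Averaging over all four cells, zeros included, would instead give E = 1.0 and V = 1.5. Dropping idle cells therefore treats each active (srcIP, interval) pair as one host arrival.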

[Fourth Embodiment]
The configuration of the estimation apparatus in this embodiment is the same as in FIG. 4, but the operation of the simultaneous flow number variance estimation unit 14 differs.

In the first embodiment, A is calculated from the collected flow information. In this embodiment, the simultaneous flow number variance estimation unit 14 instead assumes that A will take a predetermined value A* at a future point in time, and estimates the variance of the number of simultaneous flows under that assumption as
VarL = A* × (V + E² - E) × avg_Hmin + A* × E × avg_d

This makes it possible to estimate the variation in the number of simultaneous flows for a future demand A*. The mean number of simultaneous flows when the demand changes to A* is calculated as
A* × E × avg_d
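A Python sketch of the fourth embodiment, assuming E, V, avg_Hmin and avg_d are held fixed while only A changes to A*. Because both terms are linear in A*, the variance scales in proportion to demand. The parameter values are illustrative.

```python
def forecast(a_star, e, v, avg_hmin, avg_d):
    """Mean and variance of simultaneous flows for a future host arrival rate A*."""
    mean_l = a_star * e * avg_d
    var_l = a_star * (v + e * e - e) * avg_hmin + mean_l
    return mean_l, var_l


base_mean, base_var = forecast(4.0, 5.0, 20 / 3, 5 / 6, 1.0)
new_mean, new_var = forecast(8.0, 5.0, 20 / 3, 5 / 6, 1.0)
# Both terms are linear in A*, so doubling demand doubles the mean and the
# variance, while the standard deviation grows only by a factor of sqrt(2).
print(new_mean / base_mean, new_var / base_var, (new_var / base_var) ** 0.5)
```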

[Fifth Embodiment]
The configuration of the estimation apparatus in this embodiment is the same as in FIG. 4, but the operation of the simultaneous flow number variance estimation unit 14 differs from that in the first and fourth embodiments.

In this embodiment, the simultaneous flow number variance estimation unit 14 estimates the variance of the number of simultaneous flows by the method of the first or fourth embodiment. Before doing so, it removes from the flow information used for estimation every flow whose destination port number is one of a set of specific numbers (port numbers commonly seen in abnormal traffic such as scans).

As described in Reference 1, traffic from hosts that generate large numbers of flows may include abnormal traffic such as scans. If the LSN is provided with a function that removes such traffic, the flow management table should be sized for the traffic that remains after removal. This embodiment therefore estimates the variation in the number of simultaneous flows after removing scan traffic, for example traffic to ports 135 and 445.
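A minimal Python sketch of the pre-filter. Only ports 135 and 445 come from the text. The tuple layout (src_ip, dst_port, t_first, t_last) and the idea of passing a larger port set are assumptions.

```python
SCAN_PORTS = frozenset({135, 445})  # ports named in the text; other ports are deployment-specific


def exclude_scan_flows(flows, scan_ports=SCAN_PORTS):
    """Drop flows whose destination port is a typical scan target.

    `flows` is an iterable of (src_ip, dst_port, t_first, t_last) tuples.
    """
    return [f for f in flows if f[1] not in scan_ports]


flows = [("10.0.0.1", 443, 0.0, 3.0),
         ("10.0.0.9", 445, 0.1, 0.2),
         ("10.0.0.9", 135, 0.2, 0.3)]
print(exclude_scan_flows(flows))
```

The filtered list then feeds the same E, V, avg_Hmin and avg_d estimators as before, so a scanning host no longer inflates the per-host flow counts F(x, k).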

[Sixth Embodiment]
The configuration of the estimation apparatus in this embodiment is the same as in FIG. 4, but the operation of the simultaneous flow number variance estimation unit 14 differs from that in the first embodiment.

In the first embodiment, the number A of srcIPs arriving per unit time is calculated from the collected flow information. In this embodiment, the simultaneous flow number variance estimation unit 14 lets Z be the number of users covered by the flow information collection. It then estimates the variance of the number of simultaneous flows when the number of accommodated users is hypothetically Z' as
VarL = A/Z × Z' × (V + E² - E) × avg_Hmin + A/Z × Z' × E × avg_d

For the number of users Z, either the number of users accommodated in the monitored network or the number of distinct srcIPs appearing in the collected flow information may be used.
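A Python sketch of the sixth embodiment. It uses the second option above for Z, the count of distinct srcIPs. The sample addresses and the target Z' = 30 are illustrative assumptions.

```python
def scale_to_users(a, z, z_new, e, v, avg_hmin, avg_d):
    """Sixth embodiment: scale the host arrival rate A by Z'/Z, then apply VarL."""
    a_scaled = a / z * z_new
    var_l = a_scaled * (v + e * e - e) * avg_hmin + a_scaled * e * avg_d
    return a_scaled, var_l


def count_users(src_ips):
    """One option in the text: Z = number of distinct srcIPs observed."""
    return len(set(src_ips))


z = count_users(["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3"])  # 3 users
print(scale_to_users(a=4.0, z=z, z_new=30, e=5.0, v=20 / 3, avg_hmin=5 / 6, avg_d=1.0))
```

Scaling A by Z'/Z assumes that each accommodated user behaves like the measured ones, so per-user arrival rate, flows per arrival, and duration statistics carry over unchanged.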

[Seventh Embodiment]
The configuration of the estimation apparatus in this embodiment is the same as in FIG. 4, but the operations of the flow statistics information collection unit 13 and the simultaneous flow number variance estimation unit 14 differ.

Using the mean flow duration avg_d and avg_Hmin of the first embodiment,
α = avg_Hmin / avg_d
is calculated in advance. Suppose further that flow information has been collected separately over a measurement period T' shorter than the measurement period T of the first embodiment. In this embodiment, the flow statistics information collection unit 13 uses that flow information to obtain statistics on the number of flows generated by each srcIP #x.

Specifically, the measurement period T' is first divided into measurement intervals of length τ smaller than T', yielding T'/τ = L' intervals. The number of flows generated by srcIP #x in the k-th interval is denoted F(x, k). The simultaneous flow number variance estimation unit 14 uses F(x, k) to calculate the mean E and the variance V of the number of flows generated by a single srcIP. It also calculates the number A of srcIPs arriving per unit time as
A = N / T' / E
where N is the number of flows collected during the period T'.

In addition, the mean number avgL' of simultaneous flows during the measurement period T' is calculated from the flow information collected by the flow statistics information collection unit 13 of FIG. 4. The procedure is as follows.

For convenience, the data for the measurement period T' are taken to span time 0 to time T'. T' is divided into intervals of length τ. For each time point kτ (k = 1, ..., L'), the number of flows i satisfying
Tfirst_i < kτ < Tlast_i
is counted and taken as the number of simultaneous flows L(k) at time kτ. The mean is then calculated as
avgL' = ΣL(k) / L'
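A direct Python transcription of the avgL' procedure, assuming each flow is given as a (Tfirst_i, Tlast_i) pair measured from time 0. The strict inequalities follow the text.

```python
def mean_simultaneous_flows(intervals, tau, num_points):
    """avgL': average over k = 1..L' of L(k), the number of flows i with
    Tfirst_i < k*tau < Tlast_i (strict inequalities, as in the text).

    `intervals` is a list of (t_first, t_last) pairs measured from time 0.
    """
    total = 0
    for k in range(1, num_points + 1):
        t = k * tau
        total += sum(1 for t_first, t_last in intervals if t_first < t < t_last)
    return total / num_points


print(mean_simultaneous_flows([(0.0, 2.5), (1.5, 3.5)], tau=1.0, num_points=3))
```

This scan costs O(L' × N). For large flow sets, sorting the start and end times and using binary search at each kτ would give the same counts more cheaply.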
With these quantities prepared, the simultaneous flow number variance estimation unit 14 estimates the variance VarL of the number of simultaneous flows by the following equation.

VarL = A × (V + E² - E) × avgL' / (A × E) × α + avgL'
The method above is explained further here. The mean avgL' of the number of simultaneous flows is calculated directly from the data in the measurement period T'. The variance, in contrast, is estimated with the above formula, on the assumption that the period T' is too short to estimate the variance directly. The first embodiment uses avg_Hmin in the first term on the right-hand side. However, avg_Hmin depends on the mean flow duration avg_d at the time it was measured, and the current mean flow duration (that is, during the period T') may differ from the separately measured avg_d. In this embodiment, therefore, the current mean flow duration is estimated from the current mean number of simultaneous flows avgL' by Little's formula as avgL' / (A × E). Multiplying this value by α gives an estimate of the current value of avg_Hmin.
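A Python sketch of the seventh embodiment's variance formula, with the two intermediate estimates made explicit. The example values are illustrative assumptions. When avgL' equals A × E × avg_d, the result reduces to the first embodiment's VarL.

```python
def var_short_period(a, e, v, avg_l_prime, alpha):
    """Seventh embodiment: VarL = A(V+E^2-E) * avgL'/(A*E) * alpha + avgL'.

    avgL'/(A*E) is the current mean duration from Little's formula, and
    multiplying by alpha = avg_Hmin/avg_d turns it into a current avg_Hmin.
    """
    current_avg_d = avg_l_prime / (a * e)
    current_avg_hmin = alpha * current_avg_d
    return a * (v + e * e - e) * current_avg_hmin + avg_l_prime


# alpha = avg_Hmin / avg_d measured earlier over the long period T.
alpha = (5 / 6) / 1.0
print(var_short_period(4.0, 5.0, 20 / 3, 20.0, alpha))
```

If durations in T' are, for example, twice as long, avgL' doubles and both terms double with it. The ratio α is assumed to stay constant across the two periods.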

The operations of the components of the estimation apparatus in the first to seventh embodiments can also be implemented as a program, which is installed and executed on a computer used as the estimation apparatus.

The present invention is not limited to the embodiments described above, and various modifications and applications are possible within the scope of the claims.

Description of Reference Signs
1 Upstream node
2 Downstream node
10 Estimation apparatus
11 Packet analysis unit
12 Flow management unit
13 Flow statistics information collection unit
14 Simultaneous flow number variance estimation unit

Claims

A simultaneous flow number variation estimation method for estimating the number of flow entries required in a network device that manages state for each flow, where a flow is defined as a group of packets having the same flow key {srcIP, dstIP, srcPort, dstPort, Protocol}, the method comprising:
a flow collection step in which flow management means collects, during a predetermined measurement period T, N pieces of information on flows i passing through the network, each including the flow key i = {srcIP, dstIP, srcPort, dstPort, Protocol}, the start time Tfirst_i and end time Tlast_i of flow i, and the flow duration d_i = Tlast_i - Tfirst_i calculated from them, and stores the information in storage means;
an average calculation step in which flow statistics information collection means obtains the mean flow duration avg_d = Σd_i / N over the N flows with d_i > 0 collected in the flow collection step, repeats a predetermined number M of times a process of randomly selecting two flows i and j from the flows stored in the storage means and setting Hmin = min(d_i, d_j), and, letting Hmin(m) be the value of Hmin in the m-th trial, calculates the average
avg_Hmin = ΣHmin(m) / M;
a statistical information calculation step in which the flow statistics information collection means uses the flow information stored in the storage means to obtain statistics on the number of flows generated by each srcIP #x, and obtains the number A of srcIPs arriving per unit time as A = N / T / E; and
a simultaneous flow number variance estimation step in which simultaneous flow number variance estimation means calculates the variance VarL of the number of simultaneous flows as
VarL = A × (V + E² - E) × avg_Hmin + A × E × avg_d,
wherein, in the statistical information calculation step,
the measurement period T is divided into measurement intervals of length τ smaller than T (yielding T/τ = L intervals), the number of flows generated by srcIP #x in the k-th interval is denoted F(x, k), and the mean E and the variance V of the number of flows generated by a single srcIP are calculated using F(x, k).

The simultaneous flow number variation estimation method according to claim 1, wherein, in the statistical information calculation step,
when calculating the mean E and the variance V of the number of flows,
letting w_s be the set of srcIPs that appear, only the values satisfying F(x, k) > 0 are extracted from the generated flow numbers F(x, k) for x ∈ w_s and k ∈ [1, L], and,
letting LW be the number of extracted values, the mean E and the variance V are calculated as
E = ΣF(x, k) / LW,  V = ΣF(x, k)² / LW - E².

The simultaneous flow number variation estimation method according to claim 1, wherein, in the statistical information calculation step,
the variance of the number of simultaneous flows, assuming that the number A of srcIPs arriving per unit time at a future point in time takes a predetermined value A*, is estimated as
VarL = A* × (V + E² - E) × avg_Hmin + A* × E × avg_d.

The simultaneous flow number variation estimation method according to claim 1 or 3, wherein, in the simultaneous flow number variance estimation step,
when estimating the variance of the number of simultaneous flows, flows whose destination port number is a specific number (a port number commonly seen in abnormal traffic such as scans) are first excluded from the flow information used for estimation, and the variance is then estimated.

The simultaneous flow number variation estimation method according to claim 1, wherein, in the simultaneous flow number variance estimation step,
letting Z be the number of users covered by the flow information collection, the variance of the number of simultaneous flows when the number of accommodated users is Z' is estimated as
VarL = A/Z × Z' × (V + E² - E) × avg_Hmin + A/Z × Z' × E × avg_d.

The simultaneous flow number variation estimation method according to claim 1, wherein:
in the flow collection step,
flow information is collected over a measurement period T' shorter than the measurement period T;
in the average calculation step,
α = avg_Hmin / avg_d is calculated using the averages avg_d and avg_Hmin;
in the statistical information calculation step,
statistics on the number of flows generated by each srcIP #x are obtained using the flow information, and
the mean avgL' of the number of simultaneous flows during the measurement period T' is calculated from the collected flow information;
in the simultaneous flow number variance estimation step,
the variance VarL of the number of simultaneous flows is estimated as
VarL = A × (V + E² - E) × avgL' / (A × E) × α + avgL'; and
in the statistical information calculation step,
the measurement period T' is divided into measurement intervals of length τ smaller than T' (yielding T'/τ = L' intervals), the number of flows generated by srcIP #x in the k-th interval is denoted F(x, k), the mean E and the variance V of the number of flows generated by a single srcIP are calculated using F(x, k), and
the number A of srcIPs arriving per unit time is calculated as A = N / T' / E.

A simultaneous flow number variation estimation apparatus for estimating the number of flow entries required in a network device that manages state for each flow, where a flow is defined as a group of packets having the same flow key {srcIP, dstIP, srcPort, dstPort, Protocol}, the apparatus comprising:
flow management means that collects, during a predetermined measurement period T, N pieces of information on flows i, each including the flow key i = {srcIP, dstIP, srcPort, dstPort, Protocol}, the start time Tfirst_i and end time Tlast_i of flow i, and the flow duration d_i = Tlast_i - Tfirst_i calculated from them, and stores the information in storage means;
average calculation means that obtains the mean flow duration avg_d = Σd_i / N over the N flows with d_i > 0 collected by the flow management means, repeats a predetermined number M of times a process of randomly selecting two flows i and j from the flows stored in the storage means and setting Hmin = min(d_i, d_j), and, letting Hmin(m) be the value of Hmin in the m-th trial, calculates the average
avg_Hmin = ΣHmin(m) / M;
statistical information calculation means that uses the flow information stored in the storage means to obtain statistics on the number of flows generated by each srcIP #x, and obtains the number A of srcIPs arriving per unit time as A = N / T / E; and
simultaneous flow number variance estimation means that calculates the variance VarL of the number of simultaneous flows as
VarL = A × (V + E² - E) × avg_Hmin + A × E × avg_d,
wherein
the statistical information calculation means includes
means for dividing the measurement period T into measurement intervals of length τ smaller than T (yielding T/τ = L intervals), denoting by F(x, k) the number of flows generated by srcIP #x in the k-th interval, and calculating the mean E and the variance V of the number of flows generated by a single srcIP using F(x, k).