JP4420114B2

JP4420114B2 - Counting method, counting program, counting device

Info

Publication number: JP4420114B2
Application number: JP2008007359A
Authority: JP
Inventors: 恵志伊加田
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2008-01-16
Filing date: 2008-01-16
Publication date: 2010-02-24
Anticipated expiration: 2028-01-16
Also published as: JP2009171252A

Description

The present invention relates to a method for counting the appearance frequency of each value in a data series.

A means of extracting values that appear at or above a certain frequency from large-scale, continuously flowing time-series data is needed in many situations. For example, the IP addresses for which errors such as packet loss were detected in traffic may be treated as a data series, the number of detected errors tallied per IP address, and the IP addresses or routes responsible for at least a certain share of the errors identified.

In such a case, the simplest counting method is to keep a counter for each IP address and increment it each time an error occurs. However, the IP address space is enormous, and counters are needed for at least the number of connected terminals, so a large memory capacity is required.
In addition, a large memory space must be scanned to find the counters that are at or above a certain frequency.

In this regard, Non-Patent Document 1 below discusses a technique that counts, with a small amount of memory, the data appearing at or above a certain frequency together with their frequencies, by deleting low-frequency data as needed, either probabilistically or so that the result stays within a fixed error range.
The technique described in Section 4.2, "Lossy Counting Algorithm", of Non-Patent Document 1 can be summarized as follows.

(1) An area for counting N items of the data series is secured and divided into sections, each with a capacity equal to the reciprocal of the allowable error rate ε. The sections are numbered starting from 0.

(2) N items of data are set in the area, and counting starts in order from the first item (= the first item of the first section). Each time a new value appears, the section number Δ at that time is recorded together with the value.
When an already recorded value appears again, its count f is incremented by 1.

(3) Each time a section boundary is reached, data with a small count f are deleted from the tally according to the following criteria.
(3.1) Data for which f + Δ ≤ the current section number are deleted from the tally.
(3.2) Data that do not satisfy the above inequality are left as they are.

When the above steps are repeated up to the last item (the N-th item), it is mathematically guaranteed that all data with at least a certain count are retained, while data with small counts are deleted in step (3.1) and do not remain.
Therefore, only the memory needed to count the important, frequently occurring data is required, and the appearance frequency of each value in a large data series can be counted with less memory.
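The steps (1) to (3) above can be sketched in Python as follows. This is a minimal illustration and not the implementation from Non-Patent Document 1: the function name `lossy_count`, the dictionary layout, the integer parameter `section_size` (standing in for 1/ε), and the choice to treat the number of completed sections as the "current section number" at a boundary are all assumptions made for the example.

```python
def lossy_count(stream, section_size):
    """Count values in `stream`, pruning rare values at each section boundary.

    `section_size` plays the role of 1/ε (items per section).
    Returns {value: (f, delta)} for the values that survive pruning.
    """
    entries = {}  # value -> [f, delta]
    for index, value in enumerate(stream):
        section = index // section_size      # 0-based section number
        if value in entries:
            entries[value][0] += 1           # step (2): value seen before
        else:
            entries[value] = [1, section]    # step (2): new value, record Δ
        if (index + 1) % section_size == 0:
            # Step (3): section boundary. Assumption: the "current section
            # number" is the number of sections completed so far.
            current = section + 1
            doomed = [v for v, (f, delta) in entries.items() if f + delta <= current]
            for v in doomed:
                del entries[v]               # step (3.1)
    return {v: (f, delta) for v, (f, delta) in entries.items()}


survivors = lossy_count(["a", "b", "a", "c", "a", "d", "a", "e"], section_size=4)
print(survivors)  # {'a': (4, 0)}
```

In this small run, the values that appear only once per section are dropped at each boundary, while "a" is kept with its full count.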

Gurmeet Singh Manku, Rajeev Motwani, "Approximate Frequency Counts over Data Streams", VLDB 2002 (28th VLDB), pp. 346-357, August 2002

In situations such as traffic observation, where data flows continuously without end and the IP addresses actually communicating change frequently, the IP addresses that appear frequently in the observed data series often change over time.
In this case, in order to grasp the changes over time in the frequently appearing IP addresses as close to real time as possible, it is desirable for traffic observation that high-frequency data can be extracted continuously, without interruption.

However, in the technique described in Non-Patent Document 1, processing completes as one unit only when N items of data have been processed, so even when high-frequency data is to be extracted continuously, one must wait until the next N items have been processed.
For example, it is not possible to extract high-frequency data from the 100,000 items obtained in time series each time 100 new items arrive; one must wait until the counting of N items is complete.

To achieve the above without waiting for the counting of N items to complete, multiple processes would have to run in parallel, each offset by the desired number of items.
In that case, each parallel process needs its own memory area and data processing capacity, which consumes a great deal of resources.

There has therefore been a demand for a counting method that keeps memory usage down by deleting data with a statistically low appearance frequency, and that can provide intermediate counting results successively without running separate parallel processes.

The counting method according to the present invention groups equivalent items in an input data series and counts, for each group, the appearance frequency of the equivalent data. A storage device having a storage area that stores a predetermined number of items of the input data series is provided, and the storage area is divided into sections according to the reciprocal of the allowable counting error. The method comprises: a counting step of grouping equivalent data while inputting the data series into the storage area and counting the appearance frequency of the equivalent data for each group; a discarding step of discarding all or part of the counting results counted in the initial stage when the number of items input to the storage area reaches the maximum size of the storage area; a low-frequency data deletion step of deleting those groups whose appearance frequency is low; and an update step of additionally inputting a number of items corresponding to one section and updating the counting results. In the counting step, the counting result of each group and the allowable counting error of that group are stored as a pair in the storage device. In the discarding step, the allowable counting error of a group whose counting result is discarded is updated using that group's counting result before the discard, so that the counting error across the discard stays within the range of the allowable counting error.

According to the counting method of the present invention, data with a statistically low appearance frequency are deleted from the counting results, so the memory capacity needed to store the counting results can be kept down.
In the update step, one section's worth of data is additionally input to update the counting results, so counting results can be obtained successively.
On the view that old counting results are not important, when the number of input items reaches the maximum size, all or part of the counting results counted in the initial stage are deleted in the discarding step, which saves memory.
When the initial counting results are deleted in the discarding step, the allowable counting error is updated based on the counting results before the discard, so even after old counting results are deleted the counting results stay within a fixed error range and counting accuracy is maintained.

Embodiment 1.
FIG. 1 is a functional block diagram of a counting device 100 according to Embodiment 1 of the present invention.
The counting device 100 receives a data series and counts the appearance frequency of identical data values appearing in the series. It includes a counting unit 110, a storage unit 120, a discard processing unit 130, a low-frequency data deletion unit 140, and an update unit 150.

The counting unit 110 receives the data series, counts the appearance frequency of identical data values, and stores the counting results in the storage unit 120.
The storage unit 120 is a storage device that stores the counting results for the data series.
The discard processing unit 130 deletes, from the counting results stored in the storage unit 120, those counted in the initial stage.
The low-frequency data deletion unit 140 deletes, from the counting results stored in the storage unit 120, those with a low appearance frequency.
The update unit 150 successively receives additional data after the counting unit 110 has finished counting a fixed number of items, and stores the counting results in the storage unit 120. The counting results already stored in the storage unit 120 are updated with the results for the additional data.
The specific processing procedures of these units are described later with reference to FIGS. 3 to 6.

The counting unit 110, the discard processing unit 130, the low-frequency data deletion unit 140, and the update unit 150 can be implemented in hardware, such as circuit devices that realize these functions, or as a combination of an arithmetic device such as a CPU (Central Processing Unit) or microcomputer and software that defines its operation.
All or some of these components can also be implemented as a single unit.
From the viewpoint of capacity, the storage unit 120 is preferably a relatively large-capacity storage device such as an HDD (Hard Disk Drive).

FIG. 2 explains how the data series counted by the counting unit 110 is divided into sections.
When counting the data series, the counting unit 110 sets up an input buffer of a fixed size in the storage unit 120, reads the data series into the buffer, and then performs the count. The following description assumes a buffer large enough to count N items of data.
For convenience in the processing procedure described later, this input buffer is divided into multiple sections. The division criteria are as follows.

(1) Let ε be the error allowed when counting the data series (the allowable error).
(2) The number of data items in one section is (1/ε).
(3) The total number of sections is εN. Each section is given a section number starting from 0.
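The division criteria above amount to simple integer arithmetic, sketched below. The function names and the sample values (1,000 items per section, N = 50,000) are hypothetical and chosen only so that εN = 50; the section size is passed as an integer (1/ε) to avoid floating-point rounding.

```python
def section_layout(section_size, n):
    """Return (number of sections, allowable error ε) for a buffer of n items."""
    if n % section_size != 0:
        raise ValueError("n must be a whole number of sections")
    num_sections = n // section_size   # equals εN
    epsilon = 1 / section_size         # allowable error ε
    return num_sections, epsilon


def section_number(index, section_size):
    """0-based section number of the item at 0-based position `index`."""
    return index // section_size


num_sections, epsilon = section_layout(section_size=1000, n=50_000)
print(num_sections, epsilon)          # 50 0.001
print(section_number(49_999, 1000))   # 49
```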

The counting unit 110 counts the first N items of data, that is, the data up to "section number = εN−1". The update unit 150 counts the data in the subsequent sections.
The counting unit 110 performs its count after N items of data have been input to the buffer, whereas the update unit 150 thereafter counts one section's worth of data at a time. Subsequent counting results are therefore obtained successively, one section at a time.
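The division of work between the two units can be sketched as a generator that yields one snapshot after the first N items and then one per section. This is a simplified illustration under stated assumptions: the name `incremental_counts` is hypothetical, and the discarding of old results and the low-frequency deletion are deliberately left out.

```python
from collections import Counter
from itertools import islice


def incremental_counts(stream, n, section_size):
    """Yield the counts after the first n items, then after every further section.

    Simplification (assumption): plain counting only; discarding of old
    results and low-frequency deletion are left out.
    """
    items = iter(stream)
    counts = Counter(islice(items, n))               # role of counting unit 110
    yield dict(counts)
    while True:
        section = list(islice(items, section_size))  # role of update unit 150
        if len(section) < section_size:
            return                                   # incomplete section: stop
        counts.update(section)
        yield dict(counts)


snapshots = list(incremental_counts("aabbabab", n=4, section_size=2))
print(snapshots)
# [{'a': 2, 'b': 2}, {'a': 3, 'b': 3}, {'a': 4, 'b': 4}]
```

A consumer of the generator sees a fresh result after each section instead of waiting for another N items.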

For comparison with the counting device 100 according to Embodiment 1, the counting technique of the prior art described in Non-Patent Document 1 is explained here. The description then returns to the operation of the counting device 100 according to Embodiment 1.

FIG. 8 shows an example of how counting results are stored in the prior art, here in table form. The data series is assumed to be divided into multiple sections and counted in the same way as described with reference to FIG. 2.

The table in FIG. 8 has a "data value" column, an "appearance frequency" column, and a "counting start position" column.
The "data value" column stores the values of data that appeared in the data series.
The "appearance frequency" column stores the appearance frequency of the data indicated by the "data value" column.
The "counting start position" column stores the section number at which the data indicated by the "data value" column first appeared in the data series. The value in this column also represents the allowable error in counting that data value. This is because the number of data items in one section is (1/ε).

In the example of FIG. 8, the data with "data value = D1" first appeared at "section number = 0", and 410 items with that value have been counted so far.
Similarly, the data with "data value = D2" first appeared at "section number = 10", and 320 items with that value have been counted so far.

In the prior art described in Non-Patent Document 1, data whose counting result satisfies Equation 1 below are deleted as low-frequency data. The aim is to keep only the frequently appearing data, which are considered more important, and thereby reduce memory usage.

appearance frequency + counting start position ≤ current section number ... (Equation 1)
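As a small worked example, Equation 1 can be applied to the two rows given for FIG. 8. The function name and the current section number 330 are hypothetical, chosen only so that one row is kept and the other is deleted.

```python
def is_low_frequency(appearance_frequency, counting_start_position, current_section):
    """Equation 1: True when the row should be deleted as low-frequency data."""
    return appearance_frequency + counting_start_position <= current_section


# (appearance frequency, counting start position) as in the FIG. 8 example
table = {"D1": (410, 0), "D2": (320, 10)}
current_section = 330  # hypothetical value for illustration

kept = {v: row for v, row in table.items()
        if not is_low_frequency(*row, current_section)}
print(kept)  # {'D1': (410, 0)}
```

D2 satisfies 320 + 10 ≤ 330 and is deleted, while D1 (410 + 0 > 330) remains.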

However, in the prior art described in Non-Patent Document 1, the input buffer is sized for N items of data, so other applications that use the counting results are kept waiting until the count of N items is complete.
This is inconvenient for applications that need counting results in real time or close to it. The counting device 100 according to Embodiment 1 therefore proposes a technique in which, once the first N items have been counted, the counting results can be updated successively.

However, when the counting results are updated successively, the population being counted changes, so Equation 1, which presupposes a population of N items, can no longer be used as is.
The counting device 100 according to Embodiment 1 therefore updates the allowable error each time so that Equation 1 can still be used. The details are described later with reference to FIGS. 3 to 6.

This concludes the explanation of the prior-art counting technique described in Non-Patent Document 1.
The description now returns to the operation of the counting device 100 according to Embodiment 1.

FIG. 3 explains the counting process executed by the counting unit 110.
When counting the appearance frequency of data in the input data series, the counting unit 110 records in the storage unit 120, together with the appearance frequency, the point at which the appearance frequency exceeded a predetermined threshold. FIG. 3 is drawn with the threshold set to "εN = 50" as an example.
The threshold set here serves as the basis for the new allowable error. Concrete examples are given later with reference to FIGS. 4 to 6.
When counting the appearance frequency of the data with "data value = D1", the counting unit 110 records the current section number in the storage unit 120 each time the appearance frequency exceeds the threshold of 50. The storage format is described later with reference to FIG. 4.

In the example of FIG. 3, the appearance frequency exceeds 50 at "section number = 5" and exceeds 50 again 12 sections later, at "section number = 17". The counting unit 110 records these section numbers in the storage unit 120 so that they can be identified later.

FIG. 4 shows an example of how the counting results of the counting unit 110 are stored in the storage unit 120.
FIG. 4(a) shows the structure and sample data of the count result table, and FIG. 4(b) those of the threshold position table.

The count result table stores the counting results for the data series. In addition to the columns of the prior-art table described with reference to FIG. 8, it has a new "allowable error" column.
In the prior art the allowable error is constant regardless of how far the count has progressed, but in Embodiment 1 the allowable error changes successively through the procedure of FIG. 5 described later, so this column has been added.

In the example of FIG. 4(a), the values in the "allowable error" column are the same as those in the "counting start position" column. This is because the allowable error does not change until the procedure of FIG. 5 described later is executed, and remains equal to its initial state (= the value of the counting start position).

The threshold position table successively stores the positions, described with reference to FIG. 3, at which the counting result exceeded the threshold (50). It has a "data value" column, a "section distance until next update" column, an "allowable error update value" column, an "initial interval frequency value" column, and a "data value serial number" column.

The "data value" column stores the values of data that appeared in the data series.
The "section distance until next update" column stores the number of sections it took from when the counting result last exceeded the threshold (50) until it next exceeded the threshold (50).
The "allowable error update value" column stores the counting result from when the counting result exceeded the threshold (50) up to the section immediately before it next exceeded the threshold (50).
Note that in the first section (section number 0), or when the counting result exceeds the threshold again within the same section after values have already been stored in the columns of the threshold position table because the threshold was exceeded, "0" is stored in the "section distance until next update" column and the counting result up to the current section is stored in the "allowable error update value" column.
The "initial interval frequency value" column stores the appearance frequency of the data within only the section in which the counting result exceeded the threshold (50).
The "data value serial number" column holds a number newly assigned each time the counting result exceeds the threshold (50). It is provided for convenience, to identify the order in which rows with the same "data value" were recorded.

In the example of FIG. 4(b), the data in the first and second rows correspond to the example described with reference to FIG. 3.
That is, counting started from the first section and the counting result first exceeded 50 in the sixth section, so the first row has "section distance until next update = 6". The counting result up to the immediately preceding section is 43, so "allowable error update value = 43". The threshold position table held no data for "D1" at that point, so "initial interval frequency value = 0".
The counting result next exceeded 50 a further 12 sections later, so the second row has "section distance until next update = 12". The counting result up to the immediately preceding section is 32, so "allowable error update value = 32". The data appeared 8 times within just the section in which 50 was previously exceeded (section number = 5), so "initial interval frequency value = 8".
The first and second rows are threshold positions for the same "data value = D1", so the values in the "data value serial number" column are assigned in order starting from 1 and increasing by 1.
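The two rows of this example can be reproduced with a short sketch. It is a simplified reading of the text, not the patented procedure: it works on hypothetical per-section totals for a single data value (chosen to give 43 occurrences in sections 0 to 4, 8 in section 5, and 32 in sections 6 to 16), it assumes the count restarts from the beginning of the section in which the threshold was exceeded, and it does not model the same-section special case. The function name and dictionary keys are illustrative.

```python
def build_threshold_rows(section_counts, threshold):
    """Build threshold-position rows for ONE data value.

    section_counts[i] is the number of occurrences of the value in section i.
    Simplification (assumption): works on per-section totals, so the
    same-section special case described in the text is not modeled.
    """
    rows = []
    chunk_start = 0   # section in which the threshold was last exceeded
    running = 0       # occurrences counted since the start of that section
    for section, count in enumerate(section_counts):
        running += count
        if running > threshold:
            initial = section_counts[chunk_start] if rows else 0
            rows.append({
                # First row: ordinal of the crossing section (6 for section 5).
                # Later rows: sections advanced since the previous crossing.
                "distance": section - chunk_start + (0 if rows else 1),
                "error_update": running - count - initial,
                "initial_section_freq": initial,
                "serial": len(rows) + 1,
            })
            chunk_start = section
            running = count
    return rows


# Hypothetical per-section counts for D1, sections 0 through 17.
counts = [9, 9, 9, 8, 8] + [8] + [3] * 10 + [2] + [11]
for row in build_threshold_rows(counts, threshold=50):
    print(row)
# {'distance': 6, 'error_update': 43, 'initial_section_freq': 0, 'serial': 1}
# {'distance': 12, 'error_update': 32, 'initial_section_freq': 8, 'serial': 2}
```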

In Embodiment 1, εN corresponds to the "allowable counting error".
The value in the "allowable error update value" column of the threshold position table corresponds to the "allowable counting error update value".

FIG. 5 explains the processing procedure of the discard processing unit 130, described below in the order of the steps in FIG. 5. In FIG. 5, the colored double-headed arrows indicate the range being counted at each step.

(1) Count N items
The counting unit 110 counts the first N items of data and stores the counting results in the storage unit 120. Here it is assumed that the same counting results as described with reference to FIGS. 3 and 4 have been obtained.

(2) Delete the initial counting results
The discard processing unit 130 executes the following processing.
(2.1) The discard processing unit 130 deletes all counting results for data D1 in sections 0 to 4.
(2.2) Deleting these counting results introduces an error of the same amount into the counting result for data D1, so the allowable error value for data D1 is updated by that amount using the threshold position table described with reference to FIG. 4(b). The update procedure is described later with reference to FIG. 6.

(3) Shift by one section
The range being counted is advanced by one section, and the update unit 150 takes over the counting of subsequent data.
Here, the data in the section with "section number = εN" are counted. The update unit 150 continues in the same way, advancing one section at a time and successively storing the counting results in the storage unit 120. Other applications can obtain the counting results that are stored after each section.

(4) Delete the initial counting results
When the left end of the range being counted reaches the section in which the counting result for data D1 exceeded 50 (section number = 5), the discard processing unit 130 executes the following processing.
(4.1) The discard processing unit 130 deletes all counting results for data D1 in sections 5 to 16.
(4.2) Deleting these counting results introduces an error of the same amount into the counting result for data D1, so the allowable error value for data D1 is updated by that amount using the threshold position table described with reference to FIG. 4(b). The update procedure is described later with reference to FIG. 6.

In the same way, each time the left end of the range being counted reaches a section in which the counting result for data D1 exceeded 50, the discard processing unit 130 discards the old counting results and updates the allowable error value for data D1 by the same amount.

FIG. 6 explains how the data stored in the threshold position table are used. The data values are the same as those described with reference to FIG. 4.
FIG. 6(1) shows the state before the discard processing unit 130 performs the discard processing of FIG. 5(2).
At each section boundary, the discard processing unit 130 checks the values in the "counting start position" column of the count result table and looks for data whose value in that column is "0". Here, data D1 has appeared since section number 0, so its value in that column is "0".

Next, the discard processing unit 130 searches the threshold position table for rows with "data value = D1" and takes the one with the smallest "data value serial number". Here, this is the first row of FIG. 6(1)(b). If no such row is found, the counting result for "data value = D1" is below the allowable error "εN = 50", so the counting result for "D1" is deleted from the count result table.

Next, the discard processing unit 130 uses the first row of FIG. 6(1)(b) to update the row for data D1 in the count result table. Specifically, it performs the following processing.

(1) The discard processing unit 130 subtracts, from the value in the "appearance frequency" column of the count result table, the sum of the values in the "allowable error update value" and "initial interval frequency value" columns of the threshold position table. This deletes the counting results before section number 5 and corresponds to step (2) of FIG. 5.

(2) The discard processing unit 130 updates the value in the "allowable error" column of the count result table with the value in the "allowable error update value" column of the threshold position table. However, if the value in the "section distance until next update" column is 0, the "allowable error" column is set to 0. This compensates for the deletion of the counting results before section number 5, in view of the fact that the counting result now contains an error equal to the deleted amount.

（３）破棄処理部１３０は、計数結果テーブルの「計数開始位置」列の値を、閾値位置テーブルの「次回更新までの区間距離」列の値から１減算した値で更新する。ただし、「次回更新までの区間距離」列の値が０の場合は、「係数開始位置」列の値を０のままにする（つまり、更新しない）。これは、図５のステップ（４）に備えるための処理である。
以後、更新部１５０は、計数対象区間が１進む毎に、計数結果テーブルの全ての「計数開始位置」列の値を１ずつ減算し、同列の値が０になったデータに対し、その時点で上述の（１）〜（２）と同様の破棄処理が実行される。
例えばデータＤ１における次回の破棄処理の実行は、「計数開始位置」列の値が０になる時点であるから、５区間先、即ち図５のステップ（４）の時点である。 (3) The discard processing unit 130 sets the “counting start position” column of the count result table to the value of the “section distance until next update” column of the threshold position table minus 1. However, if the value in the “section distance until next update” column is 0, the “counting start position” column is left at 0 (that is, it is not updated). This prepares for step (4) in FIG. 5.
Thereafter, each time the counting target section advances by one, the update unit 150 subtracts 1 from every value in the “counting start position” column of the count result table, and for any data whose value in that column reaches 0, the same discard processing as in (1) to (2) above is executed at that point.
For example, the next discard processing for data D1 runs when its “counting start position” value reaches 0, which is five sections ahead, that is, at step (4) in FIG. 5.
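The discard steps (1) to (3) and the per-section countdown can be sketched as follows. This is a minimal illustration, not the patented implementation: the class and field names, the dictionary and list used as tables, and the sample numbers are all assumptions made for the example, and the figures' actual values are not reproduced. Treating a value that is already 0 as not counting down further is also an assumption.

```python
from dataclasses import dataclass

@dataclass
class CountRow:                # one row of the count result table (hypothetical layout)
    frequency: int             # "appearance frequency"
    allowable_error: int       # "allowable error"
    start_position: int        # "counting start position"

@dataclass
class ThresholdRow:            # one row of the threshold position table (hypothetical layout)
    data_value: str
    serial: int                # "data value serial number"
    error_update: int          # "allowable error update value"
    initial_frequency: int     # "initial interval frequency value"
    distance_to_next: int      # "section distance until next update"

def discard(counts, thresholds, value):
    """Apply discard steps (1)-(3) to one data value."""
    rows = [r for r in thresholds if r.data_value == value]
    if not rows:
        # No threshold row: the count is below the allowable error, so drop it.
        counts.pop(value, None)
        return
    row = min(rows, key=lambda r: r.serial)
    entry = counts[value]
    # (1) remove the early counts from the appearance frequency
    entry.frequency -= row.error_update + row.initial_frequency
    # (2) record the removed amount as the allowable error (0 if distance is 0)
    entry.allowable_error = 0 if row.distance_to_next == 0 else row.error_update
    # (3) schedule the next discard; leave the position unchanged if distance is 0
    if row.distance_to_next != 0:
        entry.start_position = row.distance_to_next - 1
    thresholds.remove(row)     # the used threshold row is deleted

def advance_section(counts, thresholds):
    """Count down once per section and discard where the countdown reaches 0."""
    for value in list(counts):
        entry = counts[value]
        if entry.start_position > 0:
            entry.start_position -= 1
            if entry.start_position == 0:
                discard(counts, thresholds, value)

# Made-up numbers for illustration only.
counts = {"D1": CountRow(frequency=120, allowable_error=0, start_position=0)}
thresholds = [ThresholdRow("D1", serial=1, error_update=40,
                           initial_frequency=10, distance_to_next=6)]
discard(counts, thresholds, "D1")
print(counts["D1"])
```

With these sample values the next discard for D1 is scheduled five sections ahead, which mirrors the timing described in the text.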

図６（２）は、破棄処理部１３０が以上の処理を行った後の各テーブルの状態である。
計数結果テーブルの１行目のデータが更新されて古い計数結果が削除されるとともに、「許容誤差」列の値が更新され、さらに閾値位置テーブルの１行目のデータが削除されていることが分かる。 FIG. 6(2) shows the state of each table after the discard processing unit 130 has performed the above processing.
It can be seen that the first row of the count result table has been updated so that the old count results are removed, that the value in the “allowable error” column has been updated, and that the first row of the threshold position table has been deleted.

以後、破棄処理部１３０は、区間の境界に達する毎に同様の手順を繰り返す。これにより、計数結果テーブルのデータは、古い計数結果が削除されて新しいデータのみが保持されるとともに、「許容誤差」列の値は、破棄した分のデータに相当する値で更新されるため、メモリ容量を節約するとともに、計数結果の精度も一定レベルに維持される。 Thereafter, the discard processing unit 130 repeats the same procedure each time a section boundary is reached. As a result, old count results are removed from the count result table and only new data is kept, while the “allowable error” column is updated with a value corresponding to the discarded data. This saves memory capacity and keeps the accuracy of the count results at a certain level.

なお、古い計数結果を削除しているのは、次々に最新のデータが計数装置１００に到達するようなデータ系列を計数対象とする場合、昔のデータであればあるほど、現在の状態を統計的に把握するためには不要となることに鑑みたものである。
図６に即して説明すると、「計数開始位置」列の値が０となっているようなデータは、計数のごく初期段階で出現したデータであり、これを一定数削除しても、現在の同データの状態を統計的に知る上では、あまり影響がないものと考えられるからである。 Old count results are deleted because, when the counting target is a data series in which new data keeps arriving at the counting device 100, the older a piece of data is, the less it is needed to grasp the current state statistically.
In terms of FIG. 6, data whose “counting start position” value is 0 appeared at the very beginning of counting, and deleting a certain number of those counts is considered to have little effect on statistically knowing the current state of that data.

なお、破棄処理部１３０が図５〜図６で説明した破棄処理を行うことに加えて、非特許文献１に記載の従来技術と同様に、低頻度データ削除部１４０は、上記（式１）を用いて出現頻度の少ないデータを計数結果テーブルから削除する。
これら２重の削除処理により、消費するメモリ容量を効果的に抑えることができ、かつ更新部１５０が逐次データ更新を行うので、１区間分のデータ更新に要する時間間隔で計数結果を逐次得ることができる。 In addition to the discard processing that the discard processing unit 130 performs as described with reference to FIGS. 5 and 6, the low-frequency data deletion unit 140 uses (Equation 1) above to delete data with a low appearance frequency from the count result table, as in the conventional technique described in Non-Patent Document 1.
These two deletion processes together effectively limit the memory capacity consumed, and because the update unit 150 updates the data successively, count results can be obtained successively at the interval needed to update one section of data.
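The low-frequency deletion can be sketched as below. (Equation 1) is defined earlier in the description and is not reproduced here, so this sketch assumes the condition as worded in the claims: a group is deleted when the sum of its count result and its allowable count error is equal to or less than the number of sections. The function name, the tuple layout, and the sample values are hypothetical.

```python
def delete_low_frequency(counts, section_count):
    """Keep only the groups whose count plus allowable error exceeds section_count.

    counts maps a data value to a (frequency, allowable_error) pair.
    The deletion condition follows the wording of the claims and is an
    assumption about what (Equation 1) expresses.
    """
    return {
        value: (frequency, allowable_error)
        for value, (frequency, allowable_error) in counts.items()
        if frequency + allowable_error > section_count
    }

# Made-up values for illustration only.
table = {"a": (12, 2), "b": (3, 1)}
print(delete_low_frequency(table, section_count=10))
```

Returning a new dictionary rather than mutating the input keeps the sketch easy to test; an in-place deletion would serve equally well where memory is the main concern.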

以上をまとめると、本実施の形態１に係る計数装置１００の動作手順は、概ね以下の通りである。 In summary, the operation procedure of the counting device 100 according to the first embodiment is roughly as follows.

（１）計数部１１０は、最初のＮ個分のデータをバッファ中にセットした上で計数し、計数結果を計数結果テーブルに格納するとともに、閾値位置テーブルに逐次値をセットしていく。 (1) The counting unit 110 performs counting after setting the first N pieces of data in the buffer, stores the counting results in the counting result table, and sequentially sets values in the threshold position table.

（２）破棄処理部１３０は、計数部１１０がＮ個分のデータをカウントした後、区間の境界毎に、古いデータの破棄を行うか否かを、「統計開始位置」列の値が０となっているか否かで判定し、図５で説明したような破棄処理を行う。 (2) After the counting unit 110 has counted N pieces of data, the discard processing unit 130 decides at each section boundary whether to discard old data, based on whether the value in the “counting start position” column is 0, and performs the discard processing described with reference to FIG. 5.

（３）低頻度データ削除部１４０は、１区間毎に上記（式１）の条件を満たす低頻度データを削除する。
（４）更新部１５０は、計数部１１０がＮ個分のデータをカウントした後、１区間毎に追加データのカウント結果を計数結果テーブルおよび閾値位置テーブルに格納する。 (3) The low frequency data deletion unit 140 deletes the low frequency data that satisfies the condition of (Equation 1) for each section.
(4) After the counting unit 110 counts N pieces of data, the updating unit 150 stores the count result of the additional data for each section in the counting result table and the threshold position table.

本実施の形態１では、計数結果等の格納形式として、図４のようなテーブル形式を例として説明したが、これに限られるものではなく、例えばリスト形式など、任意の格納形式を用いることができる。 In the first embodiment, the table format shown in FIG. 4 was used as an example of the storage format for count results and related data, but the storage format is not limited to this, and any format, such as a list format, can be used.

以上のように、本実施の形態１に係る計数装置１００では、計数部１１０は、出現頻度が閾値（εＮ＝５０）を超えた時点の区間番号を閾値位置テーブルに記録しておき、破棄処理部１３０は、Ｎ個のカウントが完了した以後は、区間の境界毎に閾値位置テーブルを参照し、所定条件の下で初期段階の計数結果を削除する。
これにより、計数結果を格納するためのメモリ容量を節約することができる。
また、初期段階の計数結果は最新の状態を統計的に知る観点からはあまり重要でないため、このような削除処理を行っても、計数結果の精度は一定レベルに維持される。 As described above, in the counting device 100 according to the first embodiment, the counting unit 110 records in the threshold position table the section number at which the appearance frequency exceeded the threshold (εN = 50). After the count of N items is complete, the discard processing unit 130 refers to the threshold position table at each section boundary and deletes the initial-stage count results under a predetermined condition.
Thereby, the memory capacity for storing the counting result can be saved.
Further, since the count result at the initial stage is not so important from the viewpoint of statistically knowing the latest state, the accuracy of the count result is maintained at a certain level even if such deletion processing is performed.

また、破棄処理部１３０が初期段階の計数結果を削除する際には、閾値位置テーブルの「許容誤差更新値」、即ち計数結果が設定した閾値（許容誤差）を超える直前までの計数結果を用いて、計数結果テーブルの「許容誤差」列の値を更新するので、古い計数結果を削除しても、一定の誤差範囲内の計数結果が維持され、計数結果の精度が維持される。 When the discard processing unit 130 deletes the counting result at the initial stage, the “allowable error update value” in the threshold position table, that is, the counting result until immediately before the counting result exceeds the set threshold (allowable error) is used. Since the value in the “allowable error” column of the counting result table is updated, the counting result within a certain error range is maintained even when the old counting result is deleted, and the accuracy of the counting result is maintained.
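How the counting unit might record the threshold crossing can be sketched as follows. The sketch keeps, for each value, the section number at which its count first exceeded the threshold and the count accumulated up to the previous section, which the claims describe as the allowable count error update value. Recording only the first crossing, numbering sections from 1, and all names are simplifying assumptions; the threshold position table in the description holds more columns than this.

```python
from collections import Counter

def record_threshold_crossings(sections, threshold):
    """Count values section by section and note where each first exceeds threshold.

    Returns (counts, crossings), where crossings maps a value to
    (section_number, count_before_that_section).
    """
    counts = Counter()
    crossings = {}
    for section_number, section in enumerate(sections, start=1):
        before = counts.copy()           # counts up to the previous section
        counts.update(section)           # add this section's occurrences
        for value in section:
            if value not in crossings and counts[value] > threshold:
                crossings[value] = (section_number, before[value])
    return counts, crossings

# Made-up data: three short sections and a threshold of 3.
sections = [["a", "a", "b"], ["a", "a", "c"], ["a", "b"]]
counts, crossings = record_threshold_crossings(sections, threshold=3)
print(crossings)
```

Here "a" exceeds the threshold during the second section, so its recorded update value is the count it had at the end of the first section.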

また、更新部１５０は、計数部１１０がＮ個分のデータをカウントした後は、１区間分のデータのカウントを行う毎に計数結果を計数結果テーブルに逐次格納するので、他のアプリケーションは、計数結果をリアルタイムまたはこれに近い時間間隔で得ることができる。
これにより、計数結果を逐次利用したいアプリケーションが、計数の完了まで処理を待たされることがなくなり、処理の迅速に資する。 In addition, after the counting unit 110 counts N pieces of data, the update unit 150 sequentially stores the count result in the count result table every time the data for one section is counted. The counting result can be obtained in real time or at a time interval close to this.
As a result, an application that wants to use the counting results as they become available does not have to wait until counting is complete, which speeds up processing.

また、低頻度データ削除部１４０は、統計上の出現頻度が少ないデータを計数結果から削除するので、計数結果を格納するメモリ容量を抑えることができる。 Moreover, since the low frequency data deletion unit 140 deletes data with a low statistical appearance frequency from the count result, the memory capacity for storing the count result can be reduced.

実施の形態２．
図７は、本発明の実施の形態２に係るパケット収集装置２００の機能ブロック図である。パケット収集装置２００は、ネットワーク上を流れる通信パケットを収集し、パケット中の所定の情報、例えば送受信アドレスをカウントして、その結果を記憶するための装置である。
パケット収集装置２００は、実施の形態１の図１で説明した計数装置１００の構成に加え、新たにパケット収集部２１０ａ、２１０ｂを備える。図７では、計数部１１０と更新部１５０にそれぞれパケット収集部２１０を接続したが、これらは共通化してよい。 Embodiment 2.
FIG. 7 is a functional block diagram of the packet collection device 200 according to Embodiment 2 of the present invention. The packet collection device 200 is a device for collecting communication packets flowing on the network, counting predetermined information in the packets, for example, transmission / reception addresses, and storing the results.
The packet collection device 200 is further provided with packet collection units 210a and 210b in addition to the configuration of the counting device 100 described in FIG. 1 of the first embodiment. In FIG. 7, the packet collection unit 210 is connected to the counting unit 110 and the updating unit 150, respectively, but these may be shared.

パケット収集部２１０ａ、２１０ｂは、ネットワークに接続され、通信パケットを収集して、計数対象の情報、例えばパケットの送受信アドレスなどを抽出し、それぞれ計数部１１０、更新部１５０に出力する。
最初のＮ個分の送受信アドレスはパケット収集部２１０ａが収集し、以後はパケット収集部２１０ｂが収集する。
パケットを収集した以後の処理は、実施の形態１と同様であるため、説明を省略する。 The packet collection units 210a and 210b are connected to the network, collect communication packets, extract information to be counted, for example, transmission / reception addresses of packets, and output them to the counting unit 110 and the updating unit 150, respectively.
The first N transmission / reception addresses are collected by the packet collection unit 210a, and thereafter collected by the packet collection unit 210b.
Since the processing after collecting the packets is the same as in the first embodiment, the description thereof is omitted.

ネットワークの通信パケットの送受信アドレスは、膨大な数が存在するため、これを計数対象とする場合、メモリ容量を多く必要とする。
そこで、実施の形態１で説明したような計数装置１００の構成を用いて、送受信アドレスを計数することにより、少ないメモリ容量で、効率的に送受信アドレスの計数を行うことができる。 Since there are an enormous number of transmission / reception addresses of communication packets on the network, a large memory capacity is required when these addresses are counted.
Therefore, by counting the transmission / reception addresses using the configuration of the counting device 100 described in the first embodiment, the transmission / reception addresses can be efficiently counted with a small memory capacity.

実施の形態３．
以上の実施の形態１〜２では、データ系列中に出現する同一の値、例えば同一の送受信アドレスをカウントする手法について説明したが、カウントする対象は、必ずしも同一のデータに限る必要はなく、カウントの目的に合致するのであれば、「等価」な値の個数をカウントするようにしてもよい。 Embodiment 3.
In the first and second embodiments described above, the method of counting the same value appearing in the data series, for example, the same transmission / reception address, has been described. However, the object to be counted is not necessarily limited to the same data. The number of “equivalent” values may be counted as long as they meet the purpose.

例えば、実施の形態２におけるパケット収集装置２００において、同一のネットワークアドレスに関して送受信されるパケットをカウントしたいような場合は、計数部１１０がカウントを行う際に、サブネットマスクを掛け合わせて同値となるアドレスは、同じ値としてカウントするようにしてもよい。 For example, when the packet collection device 200 of the second embodiment is to count packets sent to or received from the same network address, the counting unit 110 may, when counting, treat addresses that yield the same value after the subnet mask is applied as the same value.
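Grouping addresses by subnet mask can be sketched with Python's standard `ipaddress` module. The function names, the /24 prefix length, and the sample addresses (taken from documentation address ranges) are assumptions for illustration; the description does not fix a particular mask.

```python
import ipaddress
from collections import Counter

def network_key(address: str, prefix_len: int = 24) -> str:
    """Return the network address obtained by applying the subnet mask."""
    # strict=False lets a host address be given; the host bits are masked off.
    network = ipaddress.ip_network(f"{address}/{prefix_len}", strict=False)
    return str(network.network_address)

def count_by_network(addresses, prefix_len: int = 24) -> Counter:
    """Count addresses, treating those in the same network as equivalent."""
    return Counter(network_key(a, prefix_len) for a in addresses)

# Sample addresses from documentation ranges.
seen = ["192.0.2.1", "192.0.2.200", "198.51.100.7"]
print(count_by_network(seen))
```

The two addresses in 192.0.2.0/24 fall into one group, so the counter holds two groups rather than three distinct addresses.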

実施の形態１に係る計数装置１００の機能ブロック図である。 FIG. 1 is a functional block diagram of the counting device 100 according to Embodiment 1.
計数部１１０がカウントするデータ系列の分割区分を説明するものである。 FIG. 2 illustrates how the data series counted by the counting unit 110 is divided into sections.
計数部１１０が実行するカウント処理を説明するものである。 FIG. 3 illustrates the counting process executed by the counting unit 110.
計数部１１０のカウント結果の記憶部１２０への格納例を示すものである。 FIG. 4 shows an example of how the count results of the counting unit 110 are stored in the storage unit 120.
破棄処理部１３０の処理手順を説明するものである。 FIG. 5 illustrates the processing procedure of the discard processing unit 130.
閾値位置テーブルに格納されているデータの使用手順である。 FIG. 6 shows the procedure for using the data stored in the threshold position table.
実施の形態２に係るパケット収集装置２００の機能ブロック図である。 FIG. 7 is a functional block diagram of the packet collection device 200 according to Embodiment 2.
従来技術における計数結果の格納例を示すものである。 FIG. 8 shows an example of how count results are stored in the prior art.

Explanation of symbols

１００計数装置、１１０計数部、１２０記憶部、１３０破棄処理部、１４０低頻度データ削除部、１５０更新部、２００パケット収集装置、２１０ａ〜２１０ｂパケット収集部。 100 counting device, 110 counting unit, 120 storage unit, 130 discard processing unit, 140 low-frequency data deletion unit, 150 update unit, 200 packet collection device, 210a to 210b packet collection units.

Claims

A method of grouping equivalents of input data series and counting the appearance frequency of the equivalent data for each group,
A storage device having a storage area for storing a predetermined number of input data series is provided, and
Dividing the storage area into a number of intervals equal to the allowable counting error;
A counting step of grouping equivalent data while inputting the data series in the storage area and counting the appearance frequency of the equivalent data for each group;
When the number of the data series input to the storage area reaches the maximum size of the storage area, a discarding step for discarding all or part of the counting result counted in the initial stage;
A low-frequency data deletion step of deleting each of the groups with a low frequency of appearance;
An update step of updating the counting result by additionally inputting the number corresponding to one section of the data series;
Have
In the counting step,
A set of the counting result of each group and the allowable counting error of the group is stored in the storage device,
In the discarding step,
The counting method is characterized in that the count error of the group whose count result is discarded is updated with the count result of that group before the discard, so that the count error before and after the discard stays within the range of the allowable count error.

In the counting step,
The count result up to the previous section of the section at the time when the appearance frequency count result exceeds the allowable count error is stored in the storage device as the allowable count error update value of the group for each group. Every
In the discarding step,
The counting method according to claim 1, wherein an allowable counting error of a group for discarding a counting result is updated with the updated allowable counting error value.

In the discarding step,
When updating the allowable count error of the group for discarding the count result with the allowable count error update value,
The counting method according to claim 2, wherein a count result of the appearance frequency of the group is subtracted by the allowable count error update value.

Repeating the discarding step, the infrequent data deleting step, and the updating step until the data series is completed,
In the counting step,
The number of the section at the time when the appearance frequency count result exceeds the allowable count error is stored in the storage device,
In the discarding step,
When updating the allowable count error of the group for discarding the count result with the allowable count error update value,
The counting method according to claim 2, wherein the section number stored in the storage device in the counting step is stored in the storage device in combination with the counting result of the group.

Every time the update step is executed,
1 is subtracted from the number of the section stored in the storage device in the discarding step,
The counting method according to claim 4, wherein the discarding step is executed only when the number is 0.

In the low frequency data deletion step,
Find the sum of the counting results for each group and the allowable counting error for that group,
The counting method according to any one of claims 1 to 5, wherein when the sum is equal to or less than the number of the sections, the counting result of the group is deleted.

A counting program for causing a computer to execute the counting method according to any one of claims 1 to 6.

An apparatus that groups equivalents of input data series and counts the appearance frequency of the equivalent data for each group,
A storage device having a storage area for storing a predetermined number of input data series;
A counting unit that groups equivalent data while inputting the data series in the storage area and counts the appearance frequency of the equivalent data for each group;
When the number of the data series input to the storage area reaches the maximum size of the storage area, a discarding unit that discards all or part of the counting result counted in the initial stage;
A low-frequency data deletion unit that deletes each of the groups with a low appearance frequency;
An update unit that updates the counting result by additionally inputting the number corresponding to one section of the data series;
With
The storage area is divided into a number of intervals equal to the allowable counting error,
The counting unit is
A set of the counting result of each group and the allowable counting error of the group is stored in the storage device;
The discarding unit
The counting device is characterized in that the count error of the group whose count result is discarded is updated with the count result of that group before the discard, so that the count error before and after the discard stays within the range of the allowable count error.

The counting unit is
The count results up to the previous section of the section at the time when the appearance frequency count result exceeds the allowable count error are stored in the storage device as the allowable count error update value of the group for each group,
The discarding unit
The counting device according to claim 8, wherein an allowable counting error of a group that discards the counting result is updated with the allowable counting error update value.

The discarding unit
When updating the allowable count error of the group for discarding the count result with the allowable count error update value,
The counting device according to claim 9, wherein a count result of the appearance frequency of the group is subtracted by the allowable count error update value.

Each process of the discard unit, the low frequency data deletion unit, and the update unit is repeated until the data series ends,
The counting unit is
The storage device stores the number of the section at the time when the appearance frequency count result exceeds the allowable count error,
The discarding unit
When updating the allowable count error of the group for discarding the count result with the allowable count error update value,
The counting device according to claim 9, wherein the number of the section stored in the storage device by the counting unit is stored in the storage device in combination with the count result of the group.

The update unit
Each time the processing of the update unit is executed, 1 is subtracted from the section number that the discarding unit stored in the storage device,
The discarding unit
The counting device according to claim 11, wherein the processing of the discarding unit is executed only when the number is 0.

The infrequent data deletion unit is
Find the sum of the counting results for each group and the allowable counting error for that group,
The counting device according to any one of claims 8 to 12, wherein when the sum is equal to or less than the number of the sections, the counting result of the group is deleted.