JP4688083B2 - Reference value prediction method, system and program

Info

Publication number: JP4688083B2
Application number: JP2007155066A
Authority: JP
Inventors: 薫明原田; 亮一川原; 達哉森; 憲昭上山; 秀明吉野
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2007-06-12
Filing date: 2007-06-12
Publication date: 2011-05-25
Anticipated expiration: 2027-06-12
Also published as: JP2008311720A

Description

The present invention relates to a technique for analyzing periodic traffic time-series data that is observed moment by moment. In particular, it relates to a technique suited to using that periodicity to predict, with high accuracy, the reference value (base traffic, baseline) used to judge whether observed network traffic or the like is abnormal.

Against the background of ever-growing network demand, malicious activities that consume large amounts of network resources, such as DDoS (Distributed Denial of Service) attacks, have been increasing. When network bandwidth is wasted by such activities, the communication quality experienced by ordinary users degrades significantly. Network administrators are therefore required to detect abnormal traffic such as a DDoS attack as early as possible and to take countermeasures.

One technique for detecting abnormal traffic focuses on changes in the volume of observed traffic. Here, observed traffic means the time evolution of a data quantity, such as the number of packets or bytes, measured at fixed time intervals. This technique judges that a traffic anomaly has occurred when the observed traffic deviates from a reference value by a certain amount or more. Conventionally, this "reference value (base traffic)" and this "certain amount (threshold)" were determined from the operator's experience.

However, when close monitoring of a large-scale network is assumed, the number of monitored targets becomes enormous, and managing all of them by hand is nearly impossible. This is because the normal traffic volume and its range of fluctuation differ from target to target, so the base traffic (reference value) and the threshold (certain amount) must be set individually for each target. It is therefore desirable to automate the setting of both.

One technique for automating threshold setting uses statistics of the observed traffic, such as its mean and standard deviation, and applies statistically grounded outlier detection. In particular, as described in Non-Patent Document 1, the technique known as Bollinger Bands, which uses a moving average as the base traffic (reference value) and the standard deviation as the threshold (certain amount), is widely used.
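
The Bollinger Band approach described above can be sketched roughly as follows. This is a minimal illustration rather than the method of Non-Patent Document 1: the function name `bollinger_flags`, the window length, the multiplier `k`, and the choice to compute the band from the observations preceding each point are all assumptions made for the example.

```python
from statistics import mean, pstdev


def bollinger_flags(series, window=20, k=2.0):
    """Flag observations that leave the band: moving average +/- k * moving std.

    The band for time t is computed from the `window` observations before t
    (an assumption of this sketch), so each new value is judged against
    recent history only.
    """
    flags = []
    for t, value in enumerate(series):
        history = series[max(0, t - window):t]
        if len(history) < window:
            flags.append(False)  # not enough history to form a band yet
            continue
        base = mean(history)        # plays the role of the base traffic
        band = k * pstdev(history)  # plays the role of the threshold
        flags.append(abs(value - base) > band)
    return flags


# Illustrative data: steady traffic followed by one large spike.
sample = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10, 50]
sample_flags = bollinger_flags(sample, window=10, k=2.0)
```

Because the band depends only on the most recent window, a level shift that recurs at the same time in every cycle would still be flagged, which is the limitation discussed in the following paragraphs.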

However, applying Bollinger Bands directly to traffic anomaly detection rarely yields accurate detection. Bollinger Bands can detect a change in the statistical properties of the observed traffic, but in real traffic a change in statistical properties is not necessarily an anomaly.

Specifically, traffic generation depends on the human life cycle, so traffic data can exhibit periodicity on a daily, weekly, or yearly basis.

For example, network traffic shows the following patterns. On a daily basis, traffic volume is low from late night to morning, when people are less active, and rises from midday to night, when people are active. On a weekly basis, traffic volume is high on weekdays and lower on holidays. On a yearly basis, traffic volume is low during long vacation periods and rises once the vacation ends.

The degree of variation in the observed traffic also rises and falls periodically along with the traffic volume.

Thus, when traffic volume is periodic, even a dramatic change in the statistical properties of traffic in a particular time period can be considered normal if the same change occurs in every cycle.

However, a moving average that uses only recent observations, as in Bollinger Bands, cannot follow periodic changes in the statistical properties of the base traffic. The same problem arises for the standard deviation, which expresses the variation of the observed traffic volume, if it too is computed only from recent observations.

Because Bollinger Bands have no mechanism for taking periodicity into account, accurate anomaly detection is difficult when the traffic is periodic.

On the other hand, a technique using the Holt-Winters method, described for example in Non-Patent Document 2, is known as an anomaly detection technique that does consider periodicity. In addition to the most recent past information, this technique uses information from the more distant past and achieves anomaly detection through baseline prediction that accounts for periodicity.

However, this technique requires several parameters to be tuned empirically to the traffic, which makes it difficult to use for anomaly detection across many observed traffic streams.

Non-Patent Document 1: John A. Bollinger, supervised by Shintaro Nagao, translated by Tsuneo Iida, "ボリンジャーバンド入門" (Introduction to Bollinger Bands), Pan Rolling Co., 2002, pp. 95-117, 351.

Non-Patent Document 2: Jake D. Brutlag, "Aberrant Behavior Detection in Time Series for Network Monitoring", 14th Systems Administration Conference (LISA 2000), Dec. 2000, pp. 138-146.

The problems to be solved are as follows. The conventional Bollinger Band technique has no mechanism for considering periodicity, so accurate anomaly detection is difficult for periodic traffic. The conventional technique based on the Holt-Winters method requires several parameters to be tuned empirically to the traffic, so it is difficult to use for anomaly detection across many observed traffic streams.

An object of the present invention is to solve these problems of the prior art and to make it possible to detect anomalies in traffic, such as the amount of data sent and received over a network, accurately and efficiently.

To achieve the above object, the present invention uses a programmed computer to predict the reference value used to judge the normality (or abnormality) of a data quantity that is aggregated as a time series at predetermined time intervals and that rises and falls periodically. It divides one cycle into slots and handles the statistical information at fine granularity, thereby improving the accuracy of base traffic prediction for time-series data whose statistical properties change periodically. Specifically, selection means, a plurality of parameter estimation means, and prediction means are provided as processing means executed by the programmed computer. The selection means divides one cycle of data chronologically into preset time slots and outputs it. Each parameter estimation means estimates the parameters for predicting the reference value from the data in its time slot output by the selection means, using a first time-series analysis method such as the EM algorithm, and computes the weighted average of this estimate and the estimate for the same time slot in past cycles as the parameter for the next cycle. The prediction means computes the reference value by a second time-series analysis method that uses a time-series analysis model, such as a Kalman filter, with the next-cycle parameters computed by the parameter estimation means.

According to the present invention, dividing one cycle into slots and handling the statistical information at fine granularity makes it possible to improve the accuracy of base traffic prediction for time-series data whose statistical properties change periodically.

The best mode for carrying out the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing a configuration example of the reference value prediction system according to the present invention. FIG. 2, FIG. 3, and FIG. 4 are explanatory diagrams showing the first, second, and third examples, respectively, of slot allocation in the reference value prediction process according to the present invention. FIG. 5 is a flowchart showing an example of the reference value prediction processing operation according to the present invention.

The reference value prediction system 1 in FIG. 1 is a computer comprising a CPU (Central Processing Unit), main memory, a display device, an input device, and an external storage device. Programs and data recorded on a storage medium such as a CD-ROM are installed into the external storage device via an optical disk drive or the like, then loaded from the external storage device into main memory and processed by the CPU. The system thereby predicts the reference value used to judge anomalies in a data quantity that is aggregated as a time series at predetermined time intervals and that rises and falls periodically. As processing functions of the programmed computer, it has a distributor 2, a selector 3, a plurality of parameter estimators 4a to 4n, a plurality of mixers 5a to 5n, and predictors 6a to 6n.

In this example, the data quantity aggregated as a time series at predetermined time intervals is traffic on a network, and the reference value used to judge that an anomaly has occurred in the aggregated network traffic is the base traffic. The operation of predicting this base traffic is described below.

The traffic assumed in this example is the number of packets, bytes, or the like sent and received on the network, as observed by a traffic observation device and aggregated at fixed intervals. For example, with aggregation at one-minute intervals, one hour of observed traffic forms a time series of 60 data points.
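
As a rough illustration of this aggregation, the following sketch bins packet arrival timestamps into fixed intervals. The function name `aggregate_counts`, the use of seconds as the time unit, and the sample timestamps are assumptions made for the example and are not part of the described system.

```python
def aggregate_counts(timestamps, interval=60, duration=3600):
    """Count events per fixed interval over [0, duration) seconds."""
    n_bins = -(-duration // interval)  # ceiling division
    bins = [0] * n_bins
    for ts in timestamps:
        if 0 <= ts < duration:
            bins[int(ts // interval)] += 1
    return bins


# Four hypothetical packet arrival times (seconds from the start of the hour).
per_minute = aggregate_counts([0.0, 59.9, 60.0, 3599.0])
```

With a one-minute interval and a one-hour duration, `per_minute` holds 60 values, matching the 60-point time series mentioned above.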

Anything that can be observed, such as the number of flows or a router's memory usage, may also be treated as traffic.

The goal in this example is to predict the reference level of normal traffic volume (the base traffic). The base traffic must be predicted so that a newly received observation can be judged quickly as anomalous or not. To make predictions that follow changes in the traffic trend, this example refers to information from past cycles.

Specifically, the observed traffic is handled cycle by cycle, and the traffic trend at the corresponding time in past cycles is used to predict the base traffic at the current time.

This base traffic prediction also simplifies parameter setting.

In this example, therefore, for observations such as the number of flows or bytes measured on the network, the change over time of the observations measured during a predetermined fixed period is called traffic. The timing at which an observation is recorded is called a time point.

For this traffic, the example first determines the data length of one cycle and then divides one cycle of data into smaller units called slots. Each slot contains one or more time points.

The length of one cycle and the slot division pattern are defined in advance as configuration information that controls the system. The cycle length can take various values depending on the monitored target, but for traffic that depends on human activity, one day, one week, or one year is reasonable.

Because parameters are estimated per slot in this example, a slot should be large enough for the traffic trend to be estimated. A size of 30 minutes, one hour, or two hours, for example, is reasonable.
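
The division of one cycle into slots can be sketched as follows. The function name `split_into_slots`, the representation of the division pattern as a list of slot sizes, and the one-day, one-minute configuration are hypothetical choices for illustration.

```python
def split_into_slots(cycle_data, slot_sizes):
    """Split one cycle of observations into consecutive slots.

    slot_sizes lists the number of time points in each slot, in time order.
    """
    if any(size < 1 for size in slot_sizes):
        raise ValueError("each slot must contain at least one time point")
    if sum(slot_sizes) != len(cycle_data):
        raise ValueError("slot sizes must add up to the cycle length")
    slots, start = [], 0
    for size in slot_sizes:
        slots.append(cycle_data[start:start + size])
        start += size
    return slots


# Hypothetical configuration: a one-day cycle sampled every minute,
# divided into 24 one-hour slots.
one_day = list(range(24 * 60))
hourly_slots = split_into_slots(one_day, [60] * 24)
```

Passing unequal sizes, for example `[360, 60, 60, 960]`, gives a non-uniform division of the same cycle.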

Traffic data is divided in this way to make it easier to exploit the periodic change in the statistical properties of traffic. This matters because the base traffic prediction in this example relies on statistics-based prediction methods and on time-series analysis methods such as the Kalman filter.

All such base traffic prediction methods need accurate input parameters to predict traffic accurately. In a statistics-based prediction method, for example, the parameter is the rate of increase, which indicates how much the next observation will rise or fall relative to the current one. In a time-series analysis method, the parameters are one or more input values that control the algorithm.

The more closely these parameters reflect the statistical properties of the traffic at the time of prediction, the more accurate they are, and accurate parameters yield accurate base traffic prediction.

In this example, parameters are estimated slot by slot on the assumption that the traffic trend in a slot of a past cycle is close to the trend in the slot at the same position in future cycles. For example, if traffic in the corresponding slot of past cycles was increasing, the same trend is expected in that slot in the future.

Because parameter estimation uses the data within a slot, it takes place after all the data in that slot has been observed. Parameter estimation that exploits periodicity in this way gives more accurate input parameters and improves the prediction accuracy of each prediction method.

In this example, statistical properties are estimated per slot for three reasons: (a) to improve the estimation accuracy of the statistics in each cycle by securing a sufficient number of samples; (b) to divide one cycle at a granularity fine enough to estimate changes in the statistical properties of traffic precisely; and (c) to tolerate cycle-to-cycle shifts in the timing at which the statistical properties of traffic change.

Parameter estimation becomes more accurate as more data is available, so the slot size should be reasonably large. However, when the statistical properties of traffic change over time, too large a slot makes it impossible to account for that change and lowers parameter estimation accuracy as a result.

Parameter estimation accuracy therefore degrades when the slot is either too large or too small, but for traffic that depends on the human life cycle a slot size of one or two hours is reasonable.

The parameter estimation accuracy of a slot depends on how many time points (data) it contains. To improve the accuracy of the estimate, this example adopts, as the parameter used in the next cycle, the weighted average of the parameter estimate for the slot in past cycles and the latest parameter estimate for the slot in the current cycle.

Because past data is thus used indirectly in parameter estimation, accurate parameter setting is possible even when the slot size is small.

Taking a weighted average of the past estimate and the current estimate both limits the loss of estimation accuracy caused by incidental changes in the traffic trend and allows sustained changes in the trend to be learned.

As for the weights given to the past and current estimates in this weighted average, the weight of the past estimate should be made larger when the periodic pattern is stable, and the weight of the current estimate should be made larger when it is unstable.
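
The weighted-average update of a slot parameter can be sketched as below. The function name `next_cycle_parameter`, the convention that the two weights sum to one, and the numeric values are assumptions made for illustration; the text does not fix a particular parameterization.

```python
def next_cycle_parameter(past_estimate, current_estimate, past_weight):
    """Blend the past-cycle estimate with the current-cycle estimate for one slot.

    past_weight close to 1 suits a stable periodic pattern; a smaller value
    lets the parameter follow the current cycle more quickly.
    """
    if not 0.0 <= past_weight <= 1.0:
        raise ValueError("past_weight must lie in [0, 1]")
    return past_weight * past_estimate + (1.0 - past_weight) * current_estimate


# A single incidental jump (10 -> 20) moves the parameter only slightly ...
after_one_cycle = next_cycle_parameter(10.0, 20.0, past_weight=0.8)

# ... while a sustained shift is learned over successive cycles.
param = 10.0
for _ in range(30):
    param = next_cycle_parameter(param, 20.0, past_weight=0.8)
```

In this sketch one update moves the parameter a fifth of the way toward the new estimate, and repeated updates with the same current estimate converge to it, which mirrors the two effects described above.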

Giving slots a certain width also means that, even if the statistical properties of successive cycles do not match exactly when viewed time point by time point, they can be regarded as nearly identical when viewed slot by slot.

Coarsening the granularity at which the time variation of statistical properties is estimated thus results in more accurate parameter estimation.

As described above, three configuration items are defined in advance in this example: (1) the length of one cycle, (2) the slot division pattern, and (3) the weight used in the weighted average with past parameters.

However, these configuration items need not be set as precisely as the parameters of the Holt-Winters method. Because changes in the traffic trend occur continuously, a slight deviation in the cycle length or the slot division pattern has little effect on the parameter estimates.

Moreover, when the observed traffic is periodic, the statistical properties within a slot are similar from cycle to cycle. The choice of weight for the weighted average in parameter estimation therefore also has little effect on prediction accuracy.

A further way to improve parameter estimation accuracy is to take a weighted average of the parameter estimates obtained under several configurations. This is effective, for example, when the traffic has a composite periodic structure, meaning a structure in which smaller cycles appear inside larger ones, such as daily, weekly, and yearly cycles.

For traffic with such a composite periodic structure, estimating parameters under several configurations and combining the resulting estimates at each time point gives more detailed parameter settings for base traffic prediction that reflect short-term, medium-term, and long-term trends.

This approach is also considered effective for following traffic variations at a time granularity finer than the slot size while keeping the slot large enough to ensure per-cycle parameter estimation accuracy.

For example, as shown in FIG. 4, combining two parameter estimates whose slot boundaries are offset by half a slot gives parameter settings at a time granularity of half the slot size.
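
The effect of combining two half-shifted divisions can be illustrated with the sketch below. It assumes uniform slots of even size, a cyclic layout in which the shifted division wraps around the cycle boundary, and equal weights; the function name `combined_parameter` and the parameter values are hypothetical and are not taken from FIG. 4.

```python
def combined_parameter(offset, params_a, params_b, slot_size, weight_a=0.5):
    """Combine per-slot parameters from two divisions of the same cycle.

    Division A has slot boundaries at multiples of slot_size; division B is
    shifted by half a slot. `offset` is the time index within the cycle.
    """
    if slot_size % 2:
        raise ValueError("this sketch assumes an even slot size")
    half = slot_size // 2
    idx_a = (offset // slot_size) % len(params_a)
    # Floor division keeps offsets before the first B boundary in B's last slot.
    idx_b = ((offset - half) // slot_size) % len(params_b)
    return weight_a * params_a[idx_a] + (1.0 - weight_a) * params_b[idx_b]


# Cycle of 12 time points, slot size 4, so each division has 3 slots.
params_a = [1.0, 2.0, 3.0]
params_b = [10.0, 20.0, 30.0]
combined = [combined_parameter(t, params_a, params_b, slot_size=4) for t in range(12)]
```

In `combined`, the value changes every two time points, that is, at half the slot size, even though each individual division only changes every four.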

In either case, the following method can be used to automate the weighting of the parameter estimates.

First, for the prediction at a given time based on the parameter estimates of a given configuration, the prediction error is defined as the difference between the observed value and the predicted value. The smaller the cumulative prediction error, which is the prediction error accumulated over time, the higher the prediction accuracy.

Next, a quantity inversely proportional to the cumulative prediction error is used as the weight of the parameter estimate of each configuration, so that accurate parameter estimates receive large weights. Taking the weighted average with these weights gives base traffic prediction that strongly reflects the accurate parameters.
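
A minimal sketch of this automatic weighting is given below. Accumulating the absolute value of the prediction error, normalizing the weights to sum to one, and adding a small constant `eps` to avoid division by zero are assumptions of this sketch; the text only states that the weight is inversely proportional to the cumulative prediction error. All names and numbers are illustrative.

```python
def cumulative_abs_error(observed, predicted):
    """Accumulate |observed - predicted| over time (absolute value is an assumption)."""
    return sum(abs(o - p) for o, p in zip(observed, predicted))


def inverse_error_weights(cumulative_errors, eps=1e-9):
    """Weights proportional to 1 / cumulative error, normalized to sum to 1."""
    inverse = [1.0 / (err + eps) for err in cumulative_errors]
    total = sum(inverse)
    return [value / total for value in inverse]


def mix_parameters(estimates, weights):
    """Weighted average of the parameter estimates from several configurations."""
    return sum(w * e for w, e in zip(weights, estimates))


# Two hypothetical configurations predicting the same three observations.
observed = [10.0, 12.0, 11.0]
errors = [
    cumulative_abs_error(observed, [10.0, 11.5, 11.5]),  # configuration 1
    cumulative_abs_error(observed, [11.0, 13.0, 12.0]),  # configuration 2
]
weights = inverse_error_weights(errors)
mixed = mix_parameters([0.2, 0.6], weights)
```

Here configuration 1 has one third of the cumulative error of configuration 2, so it receives three times the weight.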

Thus, the reference value prediction system 1 in FIG. 1 predicts the reference value used to judge the normality of a data quantity that is aggregated as a time series at predetermined time intervals and that rises and falls periodically. By dividing one cycle into slots and handling the statistical information at fine granularity, it improves the accuracy of base traffic prediction for time-series data whose statistical properties change periodically.

That is, the reference value prediction system 1 includes, as processing means executed by the programmed computer, the distributor 2, the selector 3, the parameter estimators 4a to 4n, the mixers 5a to 5n, and the predictors 6a to 6n. The distributor 2 outputs the input observations (the data quantity aggregated as a time series at predetermined time intervals) to the selector 3 and to the predictors 6a to 6n. The selector 3 divides one cycle of the input observations chronologically into preset time slots and outputs each slot's data to the parameter estimator 4a to 4n provided for that time slot.

Each of the parameter estimators 4a to 4n estimates the parameters for predicting the reference value from the data in its time slot output by the selector 3. For this it uses a first time-series analysis method such as the "EM algorithm" described in R. H. Shumway and D. S. Stoffer, "Dynamic Linear Models With Switching", Journal of the American Statistical Association, September 1991, Vol. 86, No. 415. It then computes the weighted average of this estimate and the estimate for the same time slot in past cycles as the parameter for the next cycle. The predictors 6a to 6n compute the reference value by a second time-series analysis method that uses a time-series analysis model, such as a Kalman filter, with the next-cycle parameters computed by the parameter estimators 4a to 4n.
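
To make the role of the parameters in the prediction step concrete, the following sketch runs a scalar Kalman filter on a generic local-level model (a random-walk level observed with noise). This model, the function name `kalman_baseline`, and the parameter names `q` (level noise variance) and `r` (observation noise variance) are assumptions chosen for illustration; they are not the model or notation of this document.

```python
def kalman_baseline(observations, q, r, x0=0.0, p0=1e6):
    """One-step-ahead baseline predictions from a scalar local-level Kalman filter.

    q: variance of the level's random-walk noise (assumed parameter)
    r: variance of the observation noise (assumed parameter)
    x0, p0: initial level and its variance; a large p0 means "unknown level".
    """
    x, p = x0, p0
    predictions = []
    for y in observations:
        # Predict: the level carries over, its uncertainty grows by q.
        p = p + q
        predictions.append(x)
        # Update: correct the level with the new observation.
        gain = p / (p + r)
        x = x + gain * (y - x)
        p = (1.0 - gain) * p
    return predictions


# Hypothetical steady traffic: the predicted baseline settles near the level 5.
baseline = kalman_baseline([5.0] * 50, q=0.01, r=1.0)
```

In a setup like the one described above, values such as `q` and `r` would be the quantities supplied per slot by the parameter estimators, while the filter itself produces the predicted reference value at each time point.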

To improve the parameter estimation accuracy of the parameter estimators 4a to 4n, the reference value prediction system 1 in FIG. 1 takes a weighted average of the parameter estimates obtained under several configurations. This is particularly effective when the traffic has a composite structure in which smaller cycles appear inside larger ones, such as daily, weekly, and yearly cycles.

For traffic with such a composite periodic structure, the parameter estimators 4a to 4n estimate parameters under several configurations, and the mixers 5a to 5n combine the estimated parameters at each time point. This gives more detailed parameter settings for base traffic prediction that reflect short-term, medium-term, and long-term trends.

Taking a weighted average of the parameter estimates from several configurations also makes it possible, for example, to combine two parameter estimates whose slot boundaries are offset by half a slot, as in FIG. 4, and so to set parameters at a time granularity of half the slot size. In other words, it is also effective for following traffic variations at a time granularity finer than the slot size while keeping the slot large enough to ensure per-cycle parameter estimation accuracy.

The operation of the reference value prediction system 1 is described in detail below.

For traffic data y_1, y_2, ... with period T, the data for one period is divided into small units of predetermined size called slots (time slots). The slots are numbered 1, 2, ..., N in time order.

Let i denote the current period, and let the traffic data belonging to period i be expressed as in Equation 1. The data y_t belonging to each slot, shown in Equation 2, then has the following properties, as expressed by Equations 3 to 5: (a) the slots do not overlap, (b) data from different slots are not mixed, and (c) no data is left out.

Furthermore, as shown in Equation 6, (d) the size of each slot may be chosen arbitrarily, but the slot sizes do not change from period to period.

Multiple such divisions may be prepared for a single observed traffic series. For example, FIG. 2 shows the case where all slots have the same size.

When the changes in the statistical properties of the observed traffic are dense at some times and sparse at others, dividing the period as shown in FIG. 3 can reduce the number of statistics that must be retained.

Even for traffic whose statistical properties vary over time, dividing it into small slots allows the statistical properties of the data within a slot to be regarded as approximately constant.

In addition, when the traffic is periodic, the statistical properties of the corresponding slots in past periods also match, so statistics from past periods can be used.
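The slot division described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes integer time indices starting at 0 and slot boundaries given as start offsets within one period. The function name `locate_slot`, its arguments, and the example boundaries are hypothetical.

```python
from bisect import bisect_right


def locate_slot(t, period, slot_starts):
    """Map a 0-based time index t to (period index i, slot number j).

    period      -- number of samples in one period (T in the text)
    slot_starts -- strictly increasing start offsets of the slots within
                   one period; the first must be 0 so no sample is left out
    Slot numbers are 1-based, following the numbering 1, 2, ..., N.
    """
    starts = list(slot_starts)
    if not starts or starts[0] != 0:
        raise ValueError("slot_starts must begin with 0")
    if starts != sorted(set(starts)) or starts[-1] >= period:
        raise ValueError("slot_starts must be strictly increasing and < period")
    i, offset = divmod(t, period)
    # Number of slot starts <= offset gives the 1-based slot number.
    j = bisect_right(starts, offset)
    return i, j


# Hypothetical example: hourly samples, a one-day period (T = 24),
# and uneven slots (one long night slot, shorter daytime slots).
starts = [0, 8, 12, 18]
first = locate_slot(0, 24, starts)        # first sample of period 0
later = locate_slot(24 + 19, 24, starts)  # 19:00 on the second day
```

Because the same `slot_starts` list is used for every period, the slot sizes stay fixed across periods, and each time index maps to exactly one slot.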

Next, a concrete traffic prediction method is described for the case where a Kalman filter is used.

The observed traffic can be regarded as the composition of base traffic, which is a component with a large period, and instantaneous fluctuation, which is a component with a small period.

However, these two periodic components cannot be observed separately. The base traffic is therefore predicted with a Kalman filter, treating the base traffic as the system evolution and the instantaneous fluctuation as the observation error.

Let t denote the current time. To use the Kalman filter, the traffic is first modeled. In this example, the traffic model for period i and slot j is expressed by the observation equation (Equation 7) and the system state evolution equation (Equation 8), in terms of the observed value y_t and the system state value x_t, which cannot be observed explicitly.

The growth rate is denoted as in Equation 9, the observation error variance as in Equation 10, and the system error variance as in Equation 11. The observation error v_t and the system error w_t are assumed to follow mutually independent normal distributions with mean 0 and the variances shown in Equation 12.

The growth rate, the observation error variance, and the system error variance are the true parameters of the traffic model for period i and slot j and cannot be observed. They are estimated automatically, for example by a method known as the EM algorithm.

In addition to the observed value y_t, the Kalman filter uses the system state estimate shown in Equation 13.

Here, the notation shown in Equation 14 denotes the system state estimate at time i given the observation sequence up to time t.

The parameter estimates obtained by the EM algorithm for period i and slot j are denoted as in Equation 15.

Each time it receives an observation, the Kalman filter alternates between a state estimation step and a state prediction step.

Specifically, on receiving the observed traffic y_t at time t, the filter feeds it back into the system state estimate (Equation 16) and the model variance estimate (Equation 17) through the following update equations. The model variance is an index of how good the estimated traffic model is. Time t is assumed to belong to period i and slot j.

Here, k_t is a quantity called the Kalman gain and is given by Equation 18.

The state prediction of the Kalman filter is computed from the system state estimate x_{t|t} according to the system state evolution equation, as shown in Equation 19.

Accordingly, the baseline prediction in this example (Equation 20) is given by Equation 21.

The model variance p_t also evolves over time and is computed by Equation 22.
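The alternating estimation and prediction steps can be illustrated with a scalar Kalman filter. The equations referenced above are not reproduced in this text, so the sketch below assumes the standard scalar linear-Gaussian form: y_t = x_t + v_t and x_{t+1} = a x_t + w_t, with growth rate a, observation error variance r, and system error variance q. The function names and the `params_for` callback (which returns the parameters of the slot containing time t) are hypothetical.

```python
def kalman_step(y, x_pred, p_pred, a, r, q):
    """One estimation step followed by one prediction step (scalar case).

    Assumed model (an assumption, not copied from the source equations):
        y_t     = x_t + v_t,      v_t ~ N(0, r)
        x_{t+1} = a * x_t + w_t,  w_t ~ N(0, q)
    Returns the predicted state and model variance for the next time,
    plus the Kalman gain used in the update.
    """
    if p_pred + r <= 0:
        raise ValueError("p_pred + r must be positive")
    k = p_pred / (p_pred + r)            # Kalman gain
    x_filt = x_pred + k * (y - x_pred)   # state estimate after seeing y
    p_filt = (1.0 - k) * p_pred          # model variance after seeing y
    x_next = a * x_filt                  # state prediction for t + 1
    p_next = a * a * p_filt + q          # model variance prediction for t + 1
    return x_next, p_next, k


def predict_baseline(observations, params_for, x0, p0):
    """Return the one-step-ahead baseline prediction for each time index.

    params_for(t) -> (a, r, q) for the slot containing time t (hypothetical).
    x0, p0        -- initial state prediction and model variance.
    """
    baseline = []
    x_pred, p_pred = x0, p0
    for t, y in enumerate(observations):
        baseline.append(x_pred)  # prediction made before y_t is seen
        a, r, q = params_for(t)
        x_pred, p_pred, _ = kalman_step(y, x_pred, p_pred, a, r, q)
    return baseline
```

In an anomaly detector, each observation would be compared with the corresponding entry of `baseline`, and a deviation beyond a threshold would be flagged.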

Next, the method of computing the statistics of each slot is described.

The statistics required to perform base traffic prediction with the Kalman filter are the input parameters shown in Equation 24.

These parameters can be estimated by the EM algorithm once all the observation data in the slot are available, but they cannot be estimated in advance, at the time the base traffic is to be predicted.

However, because this example assumes that the observed traffic is periodic, the parameters estimated in past periods can be used as these input parameters.

When the number of data points in a slot is small, the estimation accuracy degrades. The accuracy is improved by estimating the parameters as shown in Equation 25, using a predetermined parameter η together with the statistics estimated in the past.

Here, the quantity shown in Equation 26 is the parameter estimate obtained by the EM algorithm for the observation time series of the same slot in the previous period (shown in Equation 27).

Mixing past estimates with new estimates both suppresses the influence of chance changes in traffic trends and learns traffic trends that persist over a long time.

Moreover, only the estimated statistics need to be retained; the entire past series does not have to be kept.
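The mixing of past and new estimates can be sketched as below. Equation 25 is not reproduced in this text, so the convex combination used here (η times the carried-over value plus 1 − η times the fresh estimate) is an assumption about its form, and `blend_parameters` and the dictionary keys are hypothetical names.

```python
def blend_parameters(previous, fresh, eta):
    """Blend the carried-over slot parameters with a fresh estimate.

    previous -- dict of parameters kept from past periods for this slot,
                or None if the slot has no history yet
    fresh    -- dict of parameters just estimated (for example by EM)
                from the latest period's data for the same slot
    eta      -- predetermined mixing parameter, assumed to lie in [0, 1];
                larger values give more weight to the past
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must be in [0, 1]")
    if previous is None:
        return dict(fresh)
    return {
        name: eta * previous[name] + (1.0 - eta) * fresh[name]
        for name in fresh
    }


# Hypothetical values: growth rate a, observation variance r, system variance q.
kept = {"a": 1.0, "r": 4.0, "q": 2.0}
new = {"a": 2.0, "r": 0.0, "q": 2.0}
next_period = blend_parameters(kept, new, eta=0.75)
```

Only the blended dictionary has to be stored per slot, which matches the point that the past series themselves need not be kept.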

Next, a method that uses several kinds of division together is described.

In statistics estimation, accuracy generally deteriorates when few samples are available. Improving the accuracy therefore requires enlarging the slots to increase the number of samples, but larger slots make it impossible to handle temporal changes in the statistics precisely.

Several slot division patterns are prepared to solve this problem. Mixing the statistics estimated under each division pattern achieves both improved estimation accuracy and precise estimation of temporal changes.

For example, for slot division at one-hour intervals, one division starting at 12:00 and another starting at 12:30 are prepared. Each division pattern alone can only estimate changes hour by hour, but combining the statistics estimated under each makes it possible to estimate changes every 30 minutes (see FIG. 4).

More generally, let S_n denote a slot division pattern, and let S(t, S_n) denote the slot in S_n to which the observation at time t belongs.

When N different division patterns S_n are prepared, the statistic θ_t at time t is computed from the statistics θ_{S(t, S_n)} estimated in the slots S(t, S_n) containing time t, as shown in Equation 28.

Here, (u_1, ..., u_N) are predetermined weighting coefficients.

Using multiple division patterns together in this way enables statistics estimation that is finer-grained and yields stable estimates.
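The 12:00 and 12:30 example can be made concrete with a short sketch. Equation 28 is not reproduced in this text; the code assumes it is a weighted sum of the per-pattern statistics, normalized here by the sum of the weights. All names, the one-day period expressed in minutes, and the sample statistics are hypothetical.

```python
MINUTES_PER_DAY = 1440


def slot_index(minute, slot_len, offset):
    """Index of the slot containing `minute` for a division whose slot
    boundaries lie at offset + k * slot_len (slot_len must divide 1440)."""
    return ((minute - offset) % MINUTES_PER_DAY) // slot_len


def mixed_statistic(minute, patterns, weights):
    """Weighted mix of the statistics of the slots that contain `minute`.

    patterns -- list of (slot_len, offset, stats); stats[k] is the statistic
                estimated for slot k of that division pattern
    weights  -- predetermined weights (u_1, ..., u_N), normalized here
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    value = 0.0
    for (slot_len, offset, stats), u in zip(patterns, weights):
        value += u * stats[slot_index(minute, slot_len, offset)]
    return value / total


# Two hourly divisions, one on the hour and one shifted by 30 minutes.
on_hour = (60, 0, [float(h) for h in range(24)])
half_past = (60, 30, [h + 0.5 for h in range(24)])
patterns = [on_hour, half_past]

at_1210 = mixed_statistic(12 * 60 + 10, patterns, [1, 1])
at_1240 = mixed_statistic(12 * 60 + 40, patterns, [1, 1])
```

With these sample statistics the mixed value differs between 12:10 and 12:40 even though each pattern on its own changes only once per hour.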

Next, an example of the processing procedure of the reference value prediction system 1 of FIG. 1 is described with reference to FIG. 5.

First, a processing means (not shown) reads configuration information (the data length of one period of network traffic, the time slot size, and so on) that has been recorded on a recording medium in advance or entered from an input device, and stores it in the storage device as a configuration file (step S501).

Each time observation data is input via the distributor 2, the selector 3 refers to the configuration file. So that one period of observation data can be handled as per-slot data, it sequentially associates the input observation data with the plurality of time slots set in advance by dividing one period, and outputs the result to the storage device (step S502).

The parameter estimators 4a to 4n, provided for the respective time slots, read the data of each time slot output by the selector 3 from the per-slot data and, for each, estimate the parameters for predicting the reference value using a first time-series analysis method such as the EM algorithm (step S503). They then calculate the weighted average of this parameter estimate and the parameter estimate for the same time slot in past periods as the parameter for the next period, and output it to the storage device as a parameter file (step S504).

The predictors 6a to 6n read from the parameter file the next-period parameters calculated by the parameter estimators 4a to 4n. Using these parameters and the observation data input via the distributor 2, they calculate the reference value by a second time-series analysis method based on a time-series analysis model such as a Kalman filter, and output it to the storage device as a predicted base traffic file (step S505).

When, in the processing of step S501, a weighted average is to be taken over parameter estimates from multiple configurations in order to improve the accuracy of parameter estimation by the parameter estimators 4a to 4n, a set of parameter estimators 4a to 4n is provided for each configuration. In step S502, the selector 3 divides each period of observation data into time slots and outputs it to the parameter estimators 4a to 4n of each configuration. In steps S503 and S504, the parameter estimators 4a to 4n estimate parameters under the multiple configurations, and the mixers 5a to 5n combine the estimated parameters at each time. In step S505, the predictors 6a to 6n calculate the reference value using the next-period parameters that were calculated by the parameter estimators 4a to 4n of each configuration and combined (mixed) by the mixers 5a to 5n.

As described above with reference to FIGS. 1 to 5, this example predicts the base traffic, that is, the reference value used to determine that a traffic anomaly has occurred when, for example, the observed network traffic deviates from the reference value by a certain amount or more. To do so, the data length of one period of the observed network traffic is determined, and the data for one period is divided into small units called slots. For the data in each slot, parameters for predicting the base traffic are estimated using a time-series analysis method such as the EM algorithm. The weighted average of the estimated parameter and the parameter estimate for the corresponding slot in past periods is obtained as the parameter for the next period, and the base traffic is calculated by a time-series analysis method such as a Kalman filter using this next-period parameter.

By dividing one period into slots and handling the statistics precisely in this way, this example can improve the accuracy of base traffic prediction for time-series data whose statistical properties change periodically.

The weighted average is also taken over parameter estimates obtained under multiple configurations, where a configuration is a way of specifying the data length of one period and the slots. When a weighted average is taken over parameter estimates from multiple configurations, the prediction accuracy of each configuration is defined as the difference between the observed value and the predicted value, and a larger weight is given to the parameter estimates from configurations with better prediction accuracy.
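The text specifies only that configurations with better prediction accuracy receive larger weights, not a particular formula. The sketch below is one possible choice, assuming weights proportional to the inverse of each configuration's mean absolute prediction error; the function name and the small constant `eps` are hypothetical.

```python
def accuracy_weights(observed, predictions_by_config, eps=1e-9):
    """Return one normalized weight per configuration.

    observed              -- list of observed values
    predictions_by_config -- one list of predicted values per configuration,
                             aligned with `observed`
    eps                   -- guard against division by zero for a perfect fit
    Weights are proportional to 1 / (mean absolute error + eps); this rule
    is an illustrative assumption, not taken from the source.
    """
    if not observed:
        raise ValueError("observed must not be empty")
    inverse_errors = []
    for preds in predictions_by_config:
        if len(preds) != len(observed):
            raise ValueError("each prediction list must match observed")
        mae = sum(abs(o - p) for o, p in zip(observed, preds)) / len(observed)
        inverse_errors.append(1.0 / (mae + eps))
    total = sum(inverse_errors)
    return [w / total for w in inverse_errors]


# Hypothetical example: the first configuration predicts more accurately.
weights = accuracy_weights([10.0, 10.0], [[9.0, 11.0], [8.0, 12.0]])
```

The resulting weights sum to 1 and could serve as the coefficients used when mixing the per-configuration parameter estimates.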

Taking a weighted average of parameter estimates from multiple configurations in this way allows, for example as shown in FIG. 4, two parameter estimates whose slot boundaries are offset by half a slot to be combined so that parameters are set at a time granularity of half the slot size. It is thus also effective for following temporal variations in traffic at a granularity finer than the slot size while keeping the slot size large enough to secure the parameter estimation accuracy for each period.

The present invention is not limited to the example described with reference to FIGS. 1 to 5 and can be modified in various ways without departing from its gist. For example, although this example uses a Kalman filter as the time-series analysis model, the scope of the invention is not limited to it. For instance, the time-series data within each slot may be approximated by a straight line, for example by least-squares estimation, and the lines joined slot by slot. In that case, the parameter retained is the slope of the approximating line of each slot.
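The least-squares alternative mentioned above can be sketched as follows. The slope formula is ordinary least squares with sample positions 0, 1, ..., n − 1 within the slot; the way the lines are joined (each slot's line starting where the previous one ended) is an assumption, as are all names.

```python
def slot_slope(values):
    """Least-squares slope of the samples in one slot, using the sample
    positions 0, 1, ..., n - 1 as the time axis."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a slope")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def piecewise_baseline(start_level, slopes, slot_len):
    """Join the per-slot lines end to end, starting from start_level.

    slopes   -- retained slope of each slot, in slot order
    slot_len -- number of samples per slot (uniform slots assumed here)
    """
    level = start_level
    baseline = []
    for slope in slopes:
        for _ in range(slot_len):
            baseline.append(level)
            level += slope
    return baseline


slope = slot_slope([1.0, 3.0, 5.0])
line = piecewise_baseline(0.0, [1.0, -1.0], slot_len=2)
```

Only one number per slot (the slope) is stored, which corresponds to the retained parameter named in the text.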

Although this example has been described for communication traffic in a network, the invention can also be applied to, for example, measuring traffic volume on roads or counting visitors to a facility.

As for the computer configuration, a computer without a keyboard or an optical disk drive may be used, and the recording medium is not limited to an optical disk; an FD (Flexible Disk) or the like may be used. The program may also be installed by downloading it over a network via a communication device.

FIG. 1 is a block diagram showing a configuration example of the reference value prediction system according to the present invention.
FIG. 2 is an explanatory diagram showing a first example of slot allocation in the reference value prediction process according to the present invention.
FIG. 3 is an explanatory diagram showing a second example of slot allocation in the reference value prediction process according to the present invention.
FIG. 4 is an explanatory diagram showing a third example of slot allocation in the reference value prediction process according to the present invention.
FIG. 5 is a flowchart showing an example of the reference value prediction processing operation according to the present invention.

Explanation of symbols

1: reference value prediction system, 2: distributor, 3: selector, 4a to 4n: parameter estimators, 5a to 5n: mixers, 6a to 6n: predictors.

Claims

A reference value prediction method in which a programmed computer predicts a reference value used to determine whether a data amount is normal, the data amount being aggregated in time series at predetermined time intervals and increasing and decreasing periodically,
the computer comprising, as processing means, a selection means, a plurality of parameter estimation means, and a prediction means, wherein
the selection means executes a first procedure of, each time aggregated data amount information is input, recording the input data amount information in a storage device in association with a plurality of time slots set in advance by dividing one period;
the plurality of parameter estimation means execute a second procedure of reading the data amount information in each time slot recorded by the selection means, estimating, for each piece of data amount information, a parameter for predicting the reference value using a first time-series analysis method, and calculating, as the parameter for the next period, a weighted average of the estimated parameter and the parameter estimate for the same time slot in a past period; and
the prediction means executes a third procedure of calculating the reference value by a second time-series analysis method using the input data amount information and the parameter for the next period calculated by the parameter estimation means.

The reference value prediction method according to claim 1,
wherein the prediction means uses a time-series analysis model when calculating the base traffic in the third procedure.

The reference value prediction method according to claim 2,
wherein the prediction means uses a Kalman filter as the time-series analysis model used in the third procedure.

The reference value prediction method according to any one of claims 1 to 3,
wherein the calculation of the weighted average of the parameter estimates in the second procedure by the parameter estimation means is executed for each of a plurality of periods set as a plurality of pieces of configuration information, and
the weighted averages calculated for the respective periods are combined by a mixing means provided as a processing means of the programmed computer.

The reference value prediction method according to any one of claims 1 to 3,
wherein the calculation of the weighted average of the parameter estimates in the second procedure by the parameter estimation means is executed for each of a plurality of time slots set as a plurality of pieces of configuration information, and
the weighted averages calculated for the respective time slots are combined by a mixing means provided as a processing means of the programmed computer.

The reference value prediction method according to any one of claims 1 to 3,
wherein the calculation of the weighted average of the parameter estimates in the second procedure by the parameter estimation means is executed for each of a plurality of periods and each of a plurality of time slots set as a plurality of pieces of configuration information, and
the weighted averages calculated for the plurality of periods and the plurality of time slots are combined by a mixing means provided as a processing means of the programmed computer.

The reference value prediction method according to any one of claims 4 to 6,
wherein, in calculating the weighted average of the parameter estimates in the second procedure by the parameter estimation means, the difference between the observed value and the predicted value under each piece of configuration information is obtained as the prediction accuracy, and a larger weight is given to the parameter estimates based on configuration information with better prediction accuracy.

The reference value prediction method according to any one of claims 1 to 7,
wherein the periodically increasing and decreasing data amount aggregated in time series at the predetermined time intervals is network traffic.

A reference value prediction system in which a programmed computer predicts a reference value used to determine whether a data amount is normal, the data amount being aggregated in time series at predetermined time intervals and increasing and decreasing periodically,
the system comprising, as processing means of the programmed computer, means for executing each procedure according to claim 1.

A program for causing a computer to execute each procedure of the reference value prediction method according to any one of claims 1 to 8.