JP2012253445A

JP2012253445A - Traffic prediction method, device and program

Info

Publication number: JP2012253445A
Application number: JP2011122599A
Authority: JP
Inventors: Megumi Takeshita; 恵竹下; Masayuki Tsujino; 雅之辻野; Haruhisa Hasegawa; 治久長谷川; Naohisa Komatsu; 尚久小松; Masatsugu Ichino; 将嗣市野
Original assignee: Waseda University; Nippon Telegraph and Telephone Corp
Current assignee: Waseda University; Nippon Telegraph and Telephone Corp
Priority date: 2011-05-31
Filing date: 2011-05-31
Publication date: 2012-12-20
Anticipated expiration: 2031-05-31
Also published as: JP5628745B2

Abstract

PROBLEM TO BE SOLVED: To enable traffic prediction within a practical time by increasing prediction accuracy and reducing the processing scale through sampling. SOLUTION: The time at which each packet flows through a communication network and the packet size are acquired and stored in packet data storage means. External factor information is extracted from external factor information storage means, which stores external factor information including factors that affect traffic volume. Packet data is extracted from the packet data storage means. A traffic volume prediction formula is obtained using a given time granularity to be predicted. External factor data for the period to be predicted is acquired from external factor prediction information storage means, which stores prediction information on future external factors, and is applied to the traffic volume prediction formula to calculate the predicted traffic volume.

Description

The present invention relates to a traffic prediction method, apparatus, and program, and in particular to a traffic prediction method, apparatus, and program that estimate future traffic volume using the volume of traffic flowing over a communication network together with external factors (such as weather information and event information), which are data that cannot be obtained from the communication network.

The volume of traffic flowing over a communication network fluctuates greatly. It is determined by a variety of factors, including factors that decide whether each user uses the network (calendar information such as weekday/holiday and time of day, weather, events, and so on) and the spread of applications, and is therefore very complex.

As a conventional technique, there are studies that ignore these factors and simply predict future traffic volume from time-series traffic data by regression analysis or similar methods. For example, one method takes the most recent time-series fluctuation of traffic as input and uses a support vector machine to estimate the traffic volume at the next time step (see, for example, Non-Patent Document 1). The only input to this method is traffic volume, so the factors above that cause traffic to fluctuate are included only indirectly.

Another technique uses the concept of an attractor, a steady state, to determine from time-series traffic data the range over which traffic fluctuates, and thereby predicts future traffic volume. Which data to supply as the traffic for deriving the attractor is decided on the basis of calendar information (see, for example, Patent Document 1).

A further technique takes calendar information and weather information as input, analyzes how they affect traffic volume, and estimates traffic volume by multiple regression analysis (see, for example, Non-Patent Document 2).

Patent Document 1: JP-A-11-96135

Non-Patent Document 1: Bermolen, P. et al., "Support vector regression for link load prediction," in Proceedings of the Telecommunication Networking Workshop on QoS in Multiservice IP Networks, 2008, pp. 268-273.
Non-Patent Document 2: Kazuaki Akinaga, Shigeru Kaneda, Junteru Shinagawa, Akira Miura, "Proposal of Traffic Prediction Method Using Call Characteristics and Surrounding Environment Characteristics," IEICE Technical Report, NS2005-6, pp. 21-24, April 2005.

Many of the factors that decide whether a user uses a communication network are nonlinear (for example, it is easy to imagine that the increase in traffic volume caused by a 1-degree rise in temperature depends on the temperature itself). The factors also influence one another in complex ways. For example, in a case such as "traffic volume increases on rainy nights," the increase in traffic volume is larger when the two factors ("night" and "rain") occur together than when each occurs on its own. Prediction accuracy therefore cannot be improved unless multiple nonlinear factors are considered simultaneously. Multiple regression analysis, which is used in many prediction methods including that of Non-Patent Document 2, assumes linearity, so accurate prediction with it is difficult.
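The "rainy night" case can be written out to show what a purely additive linear model misses. The indicator variables n and r and the coefficients β below are illustrative notation introduced here, not symbols taken from the specification.

```latex
% Illustrative notation (assumption, not from the specification):
%   n = 1 at night, 0 otherwise;  r = 1 when it rains, 0 otherwise.
\begin{align}
  y_{\text{additive}}    &= \beta_0 + \beta_n n + \beta_r r \\
  y_{\text{interaction}} &= \beta_0 + \beta_n n + \beta_r r + \beta_{nr}\, n r
\end{align}
% For n = r = 1 the additive model can only predict an increase of
% \beta_n + \beta_r over the baseline \beta_0, whereas the behaviour
% described in the text corresponds to \beta_n + \beta_r + \beta_{nr}
% with \beta_{nr} > 0.
```

A model that is linear in the raw factors has no term playing the role of β_nr, which is the kind of joint effect the text says must be captured.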

The present invention has been made in view of the above points, and its object is to provide a traffic prediction method, apparatus, and program that improve the accuracy of traffic prediction, reduce the processing scale through sampling, and enable traffic prediction within a practical time.

To solve the above problems, the present invention (claim 1) is a traffic prediction method for estimating future traffic volume, comprising:
a traffic data collection step in which traffic data collection means acquires the time at which each packet flowed through a communication network and the size of the packet, and stores them in packet data storage means as packet data;
a learning step in which learning means extracts external factor information from external factor information storage means that stores external factor information including factors affecting traffic volume, extracts the packet data from the packet data storage means, and obtains a traffic volume prediction formula using a given time granularity to be predicted; and
a prediction step in which prediction means acquires external factor data for the period to be predicted from external factor prediction information storage means that stores prediction information on future external factors, and applies the data to the traffic volume prediction formula to calculate a predicted traffic volume.

In the present invention (claim 2), the learning step includes:
a feature amount calculation step of forming, for the packet data, a feature vector from the traffic volume obtained per unit of the time granularity and the external factors obtained from the external factor information, the total number of these quantities being the dimension of the vector;
a sample reduction step of applying the K-means method to the feature vectors to reduce the number of samples; and
a traffic prediction formula creation step of applying kernel regression analysis, using the external factor information and the traffic volume, to the samples reduced in the sample reduction step to obtain the traffic volume prediction formula.

The present invention proposes a method for predicting traffic volume with high accuracy. By using external factors such as weather and calendar information, it predicts traffic volume from several days to several weeks ahead. If traffic volume can be predicted, the possibility that the communication network will become congested and quality will degrade can be foreseen. The effects of congestion can then be suppressed in advance by countermeasures such as traffic engineering techniques or adding servers. Clarifying the conditions under which traffic increases also contributes to future facility design planning.

FIG. 1 is a block diagram of the traffic prediction apparatus in one embodiment of the present invention.
FIG. 2 is a block diagram of the traffic data collection unit in one embodiment of the present invention.
FIG. 3 is a block diagram of the learning unit of the traffic prediction unit in one embodiment of the present invention.
FIG. 4 is a flowchart of the feature amount calculation unit of the learning unit in one embodiment of the present invention.
FIG. 5 is an illustration of traffic volume aggregation at the specified granularity in one embodiment of the present invention.
FIG. 6 is an illustration of dummy variables in one embodiment of the present invention.
FIG. 7 is a flowchart of the sample number reduction unit of the learning unit in one embodiment of the present invention.
FIG. 8 is a flowchart of the kernel-regression prediction formula creation unit of the learning unit in one embodiment of the present invention.
FIG. 9 is a block diagram of the prediction unit of the traffic prediction unit in one embodiment of the present invention.
FIG. 10 shows the distribution in the feature space in one embodiment of the present invention.

Embodiments of the present invention are described below with reference to the drawings.

The present invention achieves highly accurate traffic prediction by combining the following three elements.

(1) Input data for prediction:
In addition to the traffic volume information obtained within the communication network, the input data include factors that affect traffic volume, such as weather information, calendar information, and event information, drawn from both inside and outside the network. This improves the accuracy of traffic prediction. In the traffic prediction apparatus described later, these factors are assumed to be stored in the external factor DB 101.

(2) Reduction of processing scale by sampling:
By making predictions using only representative points chosen from the many sample points obtained, the method can handle the computationally expensive nonlinear processing described in (3). Specifically, it uses the technique of "Reference 1: Masatsugu Ichino, Hitoshi Sakano, Naohisa Komatsu, 'A method for reducing the processing cost of the kernel nonlinear mutual subspace method using clustering,' IEICE Transactions D, J90-D, no. 8, pp. 2168-2181 (2007.8)," which allows a large reduction in processing for other methods based on the kernel method as well. This technique is applied in the sample number reduction unit 132 of the learning unit 130 described later.

(3) Traffic prediction by kernel regression:
To build a prediction formula that accounts for the nonlinearity of traffic and the correlation among multiple factors, the data are mapped to a higher-dimensional space by the kernel method. Performing a linear regression, such as multiple regression analysis, in the high-dimensional space and then returning to the original dimension makes nonlinear processing possible. Processing in a high-dimensional space is computationally very expensive, but the sample reduction described in (2) makes prediction within a practical time possible. This processing is applied in the kernel-regression prediction formula creation unit 133 of the learning unit 130 described later.

This is described in detail below.

The following embodiment is described as realizing Internet traffic prediction from weather information, calendar information, and traffic volume. Weather information and calendar information are referred to below as external factors.

FIG. 1 shows the configuration of the traffic prediction apparatus in one embodiment of the present invention.

The traffic prediction apparatus shown in the figure comprises a traffic data collection unit 110, a traffic prediction unit 120, a predicted traffic volume display unit 150, an external factor DB 101, and a future external factor prediction DB 103.

The traffic data collection unit 110 has a packet data DB 102 and stores in it the traffic data acquired via the communication network. The external factor DB 101 is created, for example in the case of weather information, by obtaining the data from the Japan Meteorological Agency or a similar source.

Each component of the traffic prediction apparatus is described below.

・Traffic data collection unit:
The traffic data collection unit 110 is a functional unit that measures in advance the volume of traffic flowing at the location to be predicted, so that the relationship between traffic volume and external factors can be learned. FIG. 2 shows a block diagram of the traffic data collection unit 110. To obtain the traffic volume, the packet data acquisition unit 112 records, for each packet (the unit into which data is divided) flowing through the communication network (Internet) 111, the time at which the packet flowed and the size of the packet, and stores these sequentially in the packet data DB 102.
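The per-packet record described here (observation time plus packet size, appended sequentially to a store) can be sketched as a small data structure. This is a minimal sketch under stated assumptions: `PacketRecord`, `PacketDataDB`, `store`, and `between` are hypothetical names, and an in-memory list stands in for the packet data DB 102.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class PacketRecord:
    """One observed packet (hypothetical record layout)."""
    timestamp: datetime  # time at which the packet flowed
    size_bytes: int      # size of the packet


class PacketDataDB:
    """In-memory stand-in for the packet data DB 102 (assumption)."""

    def __init__(self) -> None:
        self._records: List[PacketRecord] = []

    def store(self, timestamp: datetime, size_bytes: int) -> None:
        """Append one packet observation, in arrival order."""
        self._records.append(PacketRecord(timestamp, size_bytes))

    def between(self, start: datetime, end: datetime) -> List[PacketRecord]:
        """Return records with start <= timestamp < end."""
        return [r for r in self._records if start <= r.timestamp < end]


db = PacketDataDB()
db.store(datetime(2011, 5, 31, 12, 0, 0), 1500)
db.store(datetime(2011, 5, 31, 12, 0, 1), 64)
db.store(datetime(2011, 6, 1, 0, 0, 0), 512)
may_31 = db.between(datetime(2011, 5, 31), datetime(2011, 6, 1))
```

Keeping only the timestamp and size per packet is enough to recompute traffic volume later at any time granularity, which is what the learning stage needs.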

・Traffic prediction unit:
The traffic prediction unit 120 is divided into a learning unit 130, which designs a formula for predicting traffic volume from existing data, and a prediction unit 140, which uses the prediction formula to actually predict future traffic volume.

First, the learning unit 130 is described. FIG. 3 shows a block diagram of the learning unit 130. The learning unit 130 comprises a feature amount calculation unit 131, a sample number reduction unit 132, and a prediction formula creation unit 133. Given the time granularity to be predicted, it extracts data from the external factor DB 101 and the packet data DB 102 and designs the traffic volume prediction formula.
The feature amount calculation unit 131 calculates feature amounts from the traffic data. It aggregates the data from the external factor DB 101 and the packet data DB 102 so that traffic volume can be predicted at the specified prediction granularity. FIG. 4 shows its operation.

Step 101) First, the feature amount calculation unit 131 acquires the past year of data from the external factor DB 101 and the packet data DB 102 for use in prediction (one year is used in this example, but in practice an appropriate period must be chosen in light of computational cost and accuracy).

Step 102) Next, the traffic volume and external factors in the year of data are aggregated in units of the specified prediction granularity. For example, to predict the daily traffic load, the traffic volume is summed for each day, as in a, b, and c of FIG. 5. For external factors that can be expressed as quantities, the value corresponding to each extraction interval is calculated. External factors that cannot be expressed as quantities are represented by dummy variables, as shown in FIG. 6.

Step 103) The traffic volume and external factors obtained in this way are output as feature vectors. The total number N of traffic volume and external factor components is the dimension of each vector. The number of vectors is one year divided by the prediction granularity (when predicting traffic volume per day, 365 N-dimensional vectors are calculated).
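Steps 101 to 103 can be sketched for a daily granularity as follows. This is a minimal sketch, not the specification's implementation: the sample packets, the external factor values, the `temp_c` and `weather` field names, and the `WEATHER_LEVELS` category set are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, datetime

# Hypothetical packet records: (time the packet flowed, size in bytes).
packets = [
    (datetime(2011, 5, 1, 9, 0, 0), 1500),
    (datetime(2011, 5, 1, 21, 30, 0), 400),
    (datetime(2011, 5, 2, 8, 15, 0), 1000),
]

# Hypothetical external factors per day: one numeric, one categorical.
external = {
    date(2011, 5, 1): {"temp_c": 18.5, "weather": "rain"},
    date(2011, 5, 2): {"temp_c": 22.0, "weather": "sunny"},
}

WEATHER_LEVELS = ["sunny", "cloudy", "rain"]  # assumed category set


def daily_traffic(packet_list):
    """Sum packet sizes per calendar day (step 102, granularity of 1 day)."""
    totals = defaultdict(int)
    for timestamp, size in packet_list:
        totals[timestamp.date()] += size
    return dict(totals)


def dummy_encode(value, levels):
    """Represent a non-numeric factor as 0/1 dummy variables."""
    return [1.0 if value == level else 0.0 for level in levels]


def feature_vectors(packet_list, factors_by_day):
    """Build one vector per day: [traffic, temperature, weather dummies...]."""
    totals = daily_traffic(packet_list)
    vectors = {}
    for day in sorted(factors_by_day):
        factors = factors_by_day[day]
        vec = [float(totals.get(day, 0)), factors["temp_c"]]
        vec += dummy_encode(factors["weather"], WEATHER_LEVELS)
        vectors[day] = vec
    return vectors


vectors = feature_vectors(packets, external)
```

Here each vector has N = 5 components (one traffic volume, one numeric factor, three dummy variables); a full year at daily granularity would yield 365 such vectors.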

Next, the sample number reduction unit 132 of the learning unit 130 uses the K-means method to reduce the number of feature vectors (samples). This makes prediction within a practical time possible.

FIG. 7 shows the operation of the sample number reduction unit 132.

Let K be the number of cluster centers (this is the final, reduced number of samples). Each sample is assigned to one of the K classes. Let $N_j$ ($j = 1, \dots, K$) be the number of samples assigned to class $C_j$, and let $\mathbf{c}_j$ be the cluster center of class $C_j$ ($j = 1, \dots, K$). The following procedure yields K cluster centers suitable for representing the given samples, where $\mathbf{x}_i$ denotes the i-th sample.

Step 201) Choose K cluster centers at random from the set of m samples $\{\mathbf{x}_1, \dots, \mathbf{x}_m\}$.

Step 202) Assign each of the m samples $\mathbf{x}_i$ to the cluster $C_j$ that minimizes $\|\mathbf{x}_i - \mathbf{c}_j\|^2$ (the squared distance between the sample and the class center).

Step 203) Compute $J = \sum_{j=1}^{K} \sum_{\mathbf{x}_i \in C_j} \|\mathbf{x}_i - \mathbf{c}_j\|^2$, an index of whether the sample data as a whole are divided into appropriate classes (the squared distance to the class center, summed over all samples; smaller is better).

Step 204) If the value obtained in step 203 is at most the threshold ε, appropriate cluster centers are considered to have been obtained and the process proceeds to step 206; if it is greater than ε, the process proceeds to step 205 to obtain better cluster centers.

Step 205) For each cluster, recompute the cluster center $\mathbf{c}_j = \frac{1}{N_j} \sum_{\mathbf{x}_i \in C_j} \mathbf{x}_i$, update the cluster centers, and return to step 203.

Step 206) Store the cluster centers $\mathbf{c}_1, \dots, \mathbf{c}_K$ obtained in step 203 in a memory (not shown) as learning data.

Step 207) Output the N-dimensional cluster-center vectors, the samples having been reduced to the number K specified by user input.

When the above processing ends, K points $\mathbf{c}_1, \dots, \mathbf{c}_K$ representing the m data have been obtained, and these are used as the reduced samples.
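The sample reduction of steps 201 to 207 can be sketched in plain Python as follows. This is a sketch under stated assumptions rather than the specification's implementation: the function names, the fixed random seed, the iteration cap, and the extra "no further improvement" stopping check are additions made here so that the loop always terminates, and samples are reassigned to their nearest center on every pass, as in the standard K-means procedure.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def reduce_samples(samples, k, eps=1e-9, max_iter=100, seed=0):
    """Return k cluster centers that represent `samples`."""
    rng = random.Random(seed)
    # Step 201: pick k of the samples at random as initial centers.
    centers = [list(s) for s in rng.sample(samples, k)]
    prev_cost = float("inf")
    for _ in range(max_iter):
        # Step 202: assign every sample to its nearest center.
        clusters = [[] for _ in range(k)]
        for x in samples:
            nearest = min(range(k), key=lambda j: sq_dist(x, centers[j]))
            clusters[nearest].append(x)
        # Step 203: total squared distance to the assigned centers.
        cost = sum(sq_dist(x, centers[j])
                   for j in range(k) for x in clusters[j])
        # Step 204: stop at the threshold; the improvement check is an
        # added safeguard (assumption) for data where cost stays above eps.
        if cost <= eps or prev_cost - cost <= eps:
            break
        prev_cost = cost
        # Step 205: recompute centers; an empty cluster keeps its old center.
        centers = [mean_vector(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    # Steps 206 and 207: the centers are the reduced samples.
    return centers


samples = [[0, 0], [0, 1], [10, 10], [10, 11]]
centers = reduce_samples(samples, k=2)
```

Because the traffic volume is one component of each feature vector, each returned center also carries the mean traffic volume of the samples in its cluster.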
Next, the kernel-regression traffic prediction formula creation unit 133 of the learning unit 130 is described.

Using the traffic volume and external factors measured in advance in step 102, a regression equation expressing the relationship between traffic volume (the objective variable) and the external factors (the explanatory variables) is obtained.

In the space whose axes are the traffic volume and the external factors (hereinafter the "feature space"), the traffic distribution can be expected to be nonlinear. This is because many states are possible in the feature space depending on the external factors, and the degrees of freedom grow further as the number of external factors used for prediction increases. FIG. 10 shows an example of the distribution in the feature space.

The present invention therefore proposes a traffic prediction method that incorporates kernel regression analysis. Kernel regression analysis maps each point into a higher-dimensional space with a nonlinear function and performs regression analysis there, which amounts to nonlinear regression analysis in the original space.

FIG. 8 shows the flow of the kernel-regression traffic prediction formula creation unit 133.

Step 301) Input the sample-reduced feature vectors (the K cluster centers) produced by the sample number reduction unit 132.

Step 302) Store the input sample-reduced feature vectors in a memory (not shown).

Step 303) Select the kernel function k to be used.

Step 304) Set the values of the kernel parameters in the kernel function.

Step 305) Generate and output the prediction formula using the sample-reduced feature vectors held in step 302 and the kernel parameters.

Specifically, the regression model can be expressed as prediction formula (1).

[Formula (1)]

The formula gives the predicted value at a given input point in terms of the kernel function and, for each cluster, the average traffic volume of the previously measured samples that belong to that cluster.
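The quantities named in this passage (a predicted value at an input point, a kernel function, and a per-cluster average traffic volume) match the ingredients of a kernel-weighted average, so one possible reading is sketched below. Since prediction formula (1) itself is not reproduced in this text, the estimator form (a Nadaraya-Watson style weighted mean), the Gaussian kernel, the bandwidth `sigma`, and all function names are assumptions, not the specification's formula.

```python
import math


def gaussian_kernel(x, c, sigma):
    """Gaussian kernel value for two equal-length vectors (assumed kernel)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, c))
    return math.exp(-sq / (2.0 * sigma ** 2))


def make_predictor(centers_x, mean_traffic, sigma=1.0):
    """Build a prediction function from the reduced samples.

    centers_x:    external-factor part of each cluster center
    mean_traffic: average traffic volume of the samples in each cluster
    sigma:        kernel parameter (step 304), chosen by the user
    """
    def predict(x):
        weights = [gaussian_kernel(x, c, sigma) for c in centers_x]
        total = sum(weights)
        if total == 0.0:
            # All kernel values underflowed: fall back to the overall mean
            # (a choice made for this sketch only).
            return sum(mean_traffic) / len(mean_traffic)
        return sum(w * y for w, y in zip(weights, mean_traffic)) / total

    return predict


# Two hypothetical clusters on a single external factor.
predict = make_predictor([[0.0], [10.0]], [100.0, 300.0], sigma=1.0)
midpoint = predict([5.0])   # equally close to both centers
at_first = predict([0.0])   # dominated by the first cluster
```

The cost of each prediction grows with the number of centers K rather than with the number of original samples m, which is why the preceding sample reduction keeps the computation practical.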

Next, the prediction unit 140 of the traffic prediction unit 120 is described.

FIG. 9 shows the configuration of the prediction unit 140.

The prediction unit 140 comprises a feature amount calculation unit 141 and a prediction processing unit 142.

The feature amount calculation unit 141 acquires external factor data for the period to be predicted from the future external factor prediction DB 103. After acquisition, for factors that can be expressed as quantities, it calculates the feature amount corresponding to each extraction interval. External factors that cannot be expressed as quantities are represented by dummy variables, as shown in FIG. 6. The prediction processing unit 142 then substitutes the feature amounts into prediction formula (1) obtained by the learning unit 130 and calculates the predicted value.

The operations of the traffic data collection unit 110 and the traffic prediction unit 120 in the above embodiment can also be built as a program, which can be installed and executed on a computer used as the traffic prediction apparatus or distributed via a network.

The present invention is not limited to the above embodiment, and various modifications and applications are possible within the scope of the claims.

Description of symbols
100 Traffic prediction apparatus
101 External factor DB
102 Packet data DB
103 Future external factor prediction DB
110 Traffic data collection unit
120 Traffic prediction unit
130 Learning unit
131 Feature amount calculation unit
132 Sample number reduction unit
133 Prediction formula creation unit by kernel regression
140 Prediction unit
141 Feature amount calculation unit
142 Prediction processing unit
150 Predicted traffic volume display unit

Claims

1. A traffic prediction method for estimating future traffic volume, the method comprising:
a traffic data collection step in which traffic data collection means acquires the time at which each packet flowed through a communication network and the size of the packet, and stores them in packet data storage means as packet data;
a learning step in which learning means extracts external factor information from external factor information storage means that stores external factor information including factors affecting traffic volume, extracts the packet data from the packet data storage means, and obtains a traffic volume prediction formula using a given time granularity to be predicted; and
a prediction step in which prediction means acquires external factor data for the period to be predicted from external factor prediction information storage means that stores prediction information on future external factors, and applies the data to the traffic volume prediction formula to calculate a predicted traffic volume.

2. The traffic prediction method according to claim 1, wherein the learning step comprises:
a feature amount calculation step of forming, for the packet data, a feature vector from the traffic volume obtained per unit of the time granularity and the external factors obtained from the external factor information, the total number of these quantities being the dimension of the vector;
a sample reduction step of applying the K-means method to the feature vectors to reduce the number of samples; and
a traffic prediction formula creation step of applying kernel regression analysis, using the external factor information and the traffic volume, to the samples reduced in the sample reduction step to obtain the traffic volume prediction formula.

3. A traffic prediction apparatus for estimating future traffic volume, the apparatus comprising:
external factor information storage means that stores external factor information including factors affecting traffic volume;
external factor prediction information storage means that stores prediction information on future external factors;
traffic data collection means that acquires the time at which each packet flowed through a communication network and the size of the packet, and stores them in packet data storage means as packet data;
learning means that extracts the external factor information from the external factor information storage means, extracts the packet data from the packet data storage means, and obtains a traffic volume prediction formula using a given time granularity to be predicted; and
prediction means that acquires external factor data for the period to be predicted from the external factor prediction information storage means and applies the data to the traffic volume prediction formula to calculate a predicted traffic volume.

4. The traffic prediction apparatus according to claim 3, wherein the learning means comprises:
feature amount calculation means that forms, for the packet data, a feature vector from the traffic volume obtained per unit of the time granularity and the external factors obtained from the external factor information, the total number of these quantities being the dimension of the vector;
sample reduction means that applies the K-means method to the feature vectors to reduce the number of samples; and
traffic prediction formula creation means that applies kernel regression analysis, using the external factor information and the traffic volume, to the samples reduced by the sample reduction means to obtain the traffic volume prediction formula.

5. A traffic prediction program for causing a computer to function as each means of the traffic prediction apparatus according to claim 3 or 4.