JP2015075896A

JP2015075896A - Flow aggregation device and method

Info

Publication number: JP2015075896A
Application number: JP2013211333A
Authority: JP
Inventors: 高橋 洋介 (Yosuke Takahashi); 石橋 圭介 (Keisuke Ishibashi); 塩本 公平 (Kohei Shiomoto); 大下 裕一 (Yuichi Oshita); 村田 正幸 (Masayuki Murata)
Original assignee: Nippon Telegraph and Telephone Corp; Osaka University NUC
Current assignee: Nippon Telegraph and Telephone Corp; Osaka University NUC
Priority date: 2013-10-08
Filing date: 2013-10-08
Publication date: 2015-04-20
Anticipated expiration: 2033-10-08
Also published as: JP6021111B2

Abstract

PROBLEM TO BE SOLVED: To provide a flow aggregation device that speeds up the search function (slice function) for any combination of dimensions in large-scale multidimensional data. SOLUTION: In the flow aggregation device according to the present invention, the unique members of each dimension in a multidimensional attribute table are stored in a per-dimension dimension member table; the multidimensional member information is unified and stored in a member index conversion table; the member index conversion table is referenced to obtain the member index of each member; and a member index / data index conversion table is created from the data indexes of the data table and the obtained member indexes. The member indexes and data indexes are then read from the member index / data index conversion table, and the data indexes are converted into a single data sequence. A wavelet tree is created from the data sequence, and a search wavelet tree DB is created that comprises the combination of the member indexes, the data indexes, and the wavelet tree.

Description

The present invention relates to a flow aggregation apparatus and method, and more particularly to a flow aggregation apparatus and method that make flow aggregation based on flow data analysis results more efficient in a flow data analysis apparatus and a flow aggregation apparatus.

Monitoring traffic on a network is indispensable for designing network resources appropriately and for detecting and controlling abnormal traffic. This requires fine-grained traffic monitoring.

In NetFlow, the flow attribute information includes, in addition to the 5-tuple of source address (SrcIP), destination address (DstIP), source port (SrcPort), destination port (DstPort), and protocol (Proto), the source AS (Autonomous System) number, the destination AS number, the input and output numbers of the router interfaces used for forwarding, the ToS (Type of Service) value, the TCP (Transmission Control Protocol) flag value, and so on.

Moreover, the volume of forwarded traffic grows daily, and the number of flows in observed NetFlow data is increasing explosively. As a result, flexible access to NetFlow data with conventional methods is becoming difficult.

Data structures for holding multidimensional flow data include a one-dimensional hash table that uses the FlowID (the 5-tuple flow information) as its key, and a multidimensional hash table in which a hash table is created for each tuple element and the hash table keys are connected by a linked list (see, for example, Non-Patent Document 1).

However, because the one-dimensional hash table holds the 5-tuple information as is, it cannot answer queries that freely combine tuple elements. The multidimensional hash table, for its part, has a long search time when an aggregate flow is searched for with particular tuple elements specified as wildcards.

There is also a multidimensional database based on a multidimensional logical data space bitmap (see, for example, Patent Document 1). For a specific combination of dimensions, it prepares in advance a bitmap that indicates the coordinates of the data in the multidimensional logical data space.

In a multidimensional flow database, a flow search that specifies a combination of particular dimensions (a slice search) can extract characteristic flows such as abnormal traffic. For example, searching the multidimensional database with the pair of the source IP address of a host suspected of worm infection and the destination port number used to spread the worm yields the flows sent as the infection spreads, together with the IP addresses of the hosts being infected.

Patent Document 1: JP 09-265479 A

Non-Patent Document 1: Yan Hu, Dah-Ming Chiu, John C. S. Lui, "Entropy Based Adaptive Flow Aggregation", IEEE/ACM Transactions on Networking, Vol. 17, No. 3 (2009), pp. 698-711.

The multidimensional database described above can be accessed quickly, but the storage needed to hold the bitmaps grows with the number of dimension members, the number of dimensions, and the number of data items, so for large-scale multidimensional data the memory required by the database becomes enormous. In addition, the combinations of dimensions that can be accessed are limited, and access by an arbitrary combination of dimensions is difficult. If the database is instead built on the fly for an arbitrary combination each time a query arrives, the construction time and the memory required become enormous, and the access overhead is large.

When the characteristics of the abnormal traffic to be analyzed are known in advance, and the data is below a certain number of dimensions and scale, fast access is possible by creating a multidimensional logical data space bitmap as in Patent Document 1. When the characteristics are unknown, however, the analyst must explore them while running slice searches over the multidimensional database with multiple patterns of arbitrary dimension combinations. With the prior art, slice searches over freely chosen dimension combinations are difficult.

As described above, with one-dimensional and multidimensional hash tables it is difficult to access aggregate flows that freely combine flow attribute information. The multidimensional logical data space bitmap of a multidimensional flow database allows fast access, but the number of combinations explodes when all combinations of dimensions are considered, so the space required by the database becomes enormous and the approach is impractical.

The present invention has been made in view of the above, and its object is to provide a flow aggregation apparatus and method that can speed up the search function (slice function) for arbitrary combinations of dimensions in large-scale multidimensional data.

According to one aspect, there is provided a flow aggregation device that, for high-dimensional, large-scale flow data, constructs a database for searching aggregate flows that combine multidimensional data, the device comprising:
a multidimensional attribute table that stores the multidimensional attributes of the read multidimensional data;
a data index table that stores index numbers for the multidimensional data;
a data table that stores flow information of the multidimensional data; and
database construction means for constructing the database for the search,
wherein the database construction means includes:
means for storing, for each dimension, the unique members of that dimension in the multidimensional attribute table in a dimension member table;
means for assigning index numbers to the members of the dimension member table such that the index numbers are unique across dimensions, thereby unifying the multidimensional member information, and storing the result in a member index conversion table;
means for converting the attributes of the multidimensional attribute table into member indexes by referring to the member index conversion table to obtain the member index of each member, and creating a member index / data index conversion table from the data indexes of the data table and the obtained member indexes;
means for converting the member indexes of the member index / data index conversion table into indexes in the root sequence of a wavelet tree, and creating a wavelet tree index conversion table; and
means for obtaining the member indexes and the data indexes from the member index / data index conversion table, converting the data indexes into a single data sequence, creating a wavelet tree from the data sequence, and creating a search wavelet tree DB comprising the combination of the member indexes, the data indexes, and the wavelet tree.

According to one aspect, large-scale, high-dimensional flow data can be managed within a realistic amount of memory, and a target flow can be searched for quickly, with a time complexity logarithmic in the number of flows.

The drawings, each relating to an embodiment of the present invention, are as follows.
FIG. 1: Configuration example of the multidimensional database configuration device.
FIG. 2: Examples of the data index table, the multidimensional attribute table, and the data table.
FIG. 3: Example of the dimension member table.
FIG. 4: Example of the member index / data index conversion table.
FIG. 5: Example of the member index conversion table.
FIG. 6: Method of conversion into the search wavelet tree.
FIG. 7: Example of a wavelet tree.
FIG. 8: Example of the wavelet tree index conversion table.
FIG. 9: Flowchart (part 1) of the processing of the database access unit.
FIG. 10: Example of the common subset operation using a wavelet tree.
FIG. 11: Flowchart (part 2) of the processing of the database access unit.
FIG. 12: Flowchart (part 3) of the processing of the database access unit.
FIG. 13: Example of the Patricia tree DB.

Embodiments of the present invention are described below with reference to the drawings.

The present invention provides a flow aggregation device (multidimensional database configuration device) that enables fast access to aggregate flows of any combination, for high-dimensional, large-scale flow data.

FIG. 1 shows a configuration example of a multidimensional database configuration device according to an embodiment of the present invention.

The multidimensional database configuration device 100 shown in the figure includes a NetFlow data reading unit 110, a table creation unit 120, a DB construction unit 130, a user query analysis unit 140, a database access unit 150, a data output unit 160, an intermediate DB 170, a DB 180, a data table 101, a multidimensional attribute table 102, and a data index table 103.

The intermediate DB 170 is held in memory and includes a dimension member table 171 and a member index / data index conversion table 172. Once the DB 180 has been built, the memory of the intermediate DB 170 is released.

The DB 180 includes a search wavelet tree DB 181, a wavelet tree index conversion table 182, a member index conversion table 183, and a Patricia tree DB 184.

The multidimensional data reading unit 110 reads multidimensional data such as flow data and passes it to the table creation unit 120 and the DB construction unit 130. In this example, it reads time-series traffic information for five-dimensional IP flow information.

The table creation unit 120 stores the read multidimensional data in the in-memory tables 101, 102, and 103, as shown in FIG. 2. The data index table 101 holds the index number assigned to each item of five-dimensional information that was read. The multidimensional attribute table 102 stores the SrcIP, DstIP, SrcPort, DstPort, and Proto corresponding to each index number. The data table 103 holds, in time series, the flow information (traffic volume, number of packets, and so on) corresponding to each index number. The data table 103 has one row per flow, and the multidimensional attribute table 102 returns a pointer to each row of the data table 103, so the information for a given flow in the data table 103 can be obtained by fetching the pointer through the multidimensional attribute table 102.

The DB construction unit 130 builds the intermediate DB 170 and the DB 180 as follows.

First, while reading the multidimensional data, the DB construction unit 130 stores the unique members that appear in the read data in the dimension member table 171, dimension by dimension, as shown in FIG. 3. For SrcPort, DstPort, and Proto, the members are stored in ascending order. For SrcIP and DstIP, the members are stored in ascending order of IP address using the already built Patricia tree DB 184.

The DB construction unit 130 then generates the member index / data index conversion table 172.

As shown in FIG. 4, the member index / data index conversion table 172 consists of the member indexes of the dimension member table 171 and the data indexes obtained from the data table 103.

The DB construction unit 130 generates the member index / data index conversion table 172 by the following procedure.

(1) Using the multidimensional attribute table 102 and the member index conversion table 183 shown in FIG. 5, the attributes of the multidimensional attribute table 102 are converted into member indexes. For example, the first row of the multidimensional attribute table 102 in FIG. 2,
"SrcIP:10.0.0.1, DstIP:20.0.0.1, SrcPort:10, DstPort:20, Proto:6"
is converted with the member index conversion table 183 of FIG. 5 into
"SrcIP:0, DstIP:1000, SrcPort:2000, DstPort:3000, Proto:4000"
(the result is referred to here as multidimensional attribute table-2).

(2) A particular row of multidimensional attribute table-2 can be obtained from the corresponding Data Index in the data index table 101. For example, accessing Data Index 1 yields
"SrcIP:0, DstIP:1000, SrcPort:2000, DstPort:3000, Proto:4000"
and the member index / data index conversion table 172 is created from it as follows.

Member index 0 belongs to Data Index 1, so 0 → 1;
Member index 1000 belongs to Data Index 1, so 1000 → 1;
Member index 2000 belongs to Data Index 1, so 2000 → 1;
Member index 3000 belongs to Data Index 1, so 3000 → 1;
Member index 4000 belongs to Data Index 1, so 4000 → 1;
Repeating this process for every Data Index creates the member index / data index conversion table 172.
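The procedure above can be sketched in Python. This is a minimal illustration rather than the patented implementation: the function name `build_member_to_data_index` and the dictionary representation of "multidimensional attribute table-2" are assumptions made for the example.

```python
from collections import defaultdict


def build_member_to_data_index(attribute_rows):
    """Invert 'data index -> member indexes' into 'member index -> data indexes'.

    attribute_rows is a hypothetical stand-in for multidimensional attribute
    table-2: a dict mapping each Data Index to the tuple of member indexes
    (one per dimension) of that flow.
    """
    table = defaultdict(list)
    for data_index in sorted(attribute_rows):
        for member_index in attribute_rows[data_index]:
            table[member_index].append(data_index)
    # Return the rows ordered by member index, as a plain dict.
    return dict(sorted(table.items()))


# Two example flows; the second differs from the first only in its DstIP member.
rows = {
    1: (0, 1000, 2000, 3000, 4000),
    2: (0, 1001, 2000, 3000, 4000),
}
member_to_data = build_member_to_data_index(rows)
```

Each member index ends up with the list of data indexes of the flows that contain it, which is the shape that the later flattening into a single data sequence relies on.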

The Patricia tree DB 184 holds, for the SrcIP and DstIP dimensions, Patricia trees built from the members that appear in the data as the multidimensional data is read. As shown in FIG. 13 (described later), a tree is built for each of SrcIP and DstIP, and it stores every SrcIP address and DstIP address present in the read multidimensional data.

The member index conversion table 183 holds the member index corresponding to each item of member information. The DB construction unit 130 refers to the dimension member table 171 and assigns an index number to each member. At the same time, it creates a table that converts member information into index numbers, as shown in FIG. 5. Here, index numbers that are unique across dimensions are assigned to the members, which flattens the multidimensional member information into one dimension (this is needed because the wavelet tree must be built from a one-dimensional list).
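A small Python sketch of this index assignment follows. It is illustrative only. The fixed offset of 1000 per dimension is an assumption inferred from the example values quoted in the text (0, 1000, 2000, 3000, 4000), and a plain sort with the standard `ipaddress` module stands in for the Patricia tree ordering of IP members.

```python
from ipaddress import IPv4Address

DIMENSIONS = ["SrcIP", "DstIP", "SrcPort", "DstPort", "Proto"]


def build_member_index_table(flows, stride=1000):
    """Map (dimension, member) to an index number that is unique across dimensions.

    stride is an assumed per-dimension offset; it must exceed the number of
    unique members in any single dimension.
    """
    table = {}
    for d, dim in enumerate(DIMENSIONS):
        members = {flow[dim] for flow in flows}
        if dim.endswith("IP"):
            # Ascending IP address order (the text uses a Patricia tree for this).
            ordered = sorted(members, key=IPv4Address)
        else:
            ordered = sorted(members)
        if len(ordered) > stride:
            raise ValueError(f"too many members in dimension {dim}")
        for i, member in enumerate(ordered):
            table[(dim, member)] = d * stride + i
    return table


flows = [
    {"SrcIP": "10.0.0.1", "DstIP": "20.0.0.1", "SrcPort": 10, "DstPort": 20, "Proto": 6},
    {"SrcIP": "10.0.0.2", "DstIP": "20.0.0.5", "SrcPort": 10, "DstPort": 20, "Proto": 17},
]
member_index = build_member_index_table(flows)
```

Because every dimension gets its own disjoint index range, a member index alone identifies both the dimension and the member.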

The search wavelet tree DB 181 is generated by mapping the multidimensional data to one dimension using the member indexes (Member index in FIG. 4) and data indexes (Data Index in FIG. 4) of the member index / data index conversion table 172, and converting the result into a multi-index structure (wavelet tree) as shown in FIG. 6, with reference to the wavelet tree index conversion table 182.

Specifically, the member index / data index conversion table 172 is read in order from the top, and the Data Index values (FIG. 4) are converted into a single data sequence. In the example of FIG. 4, the sequence is 1,3,2,4,6,5,…,1,2,3,5,4,6…. A wavelet tree is then created from this data sequence. An existing technique can be used to generate the wavelet tree from the data sequence (for example, Non-Patent Document 2: http://www.slideshare.net/pfi/ss-15916040). FIG. 7 shows an example of a wavelet tree. The wavelet tree is a complete binary tree in which each node carries a bit string. Each leaf corresponds to a value, and each internal node corresponds to the range of its descendant leaves.
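As a rough illustration of the data structure, the following Python sketch builds a pointer-based wavelet tree over an integer sequence and answers rank queries. It is not the construction of Non-Patent Document 2. For simplicity each node stores prefix counts (how many of its first i elements go to the left child) in place of a bit string with a succinct rank dictionary, and the class and method names are assumptions.

```python
class WaveletTree:
    """Wavelet tree over a non-empty integer sequence (illustrative, not succinct)."""

    def __init__(self, seq, lo=None, hi=None):
        seq = list(seq)
        self.lo = min(seq) if lo is None else lo
        self.hi = max(seq) if hi is None else hi
        self.left = self.right = None
        # prefix[i] = number of elements among the first i that are <= mid,
        # i.e. the rank of bit 0 in this node's implicit bit string.
        self.prefix = [0]
        if self.lo == self.hi or not seq:
            return  # leaf, or an empty node that needs no children
        mid = (self.lo + self.hi) // 2
        for v in seq:
            self.prefix.append(self.prefix[-1] + (v <= mid))
        self.left = WaveletTree([v for v in seq if v <= mid], self.lo, mid)
        self.right = WaveletTree([v for v in seq if v > mid], mid + 1, self.hi)

    def rank(self, value, pos):
        """Return the number of occurrences of value in seq[0:pos]."""
        if not self.lo <= value <= self.hi:
            return 0
        node = self
        while node is not None and node.lo < node.hi:
            mid = (node.lo + node.hi) // 2
            if value <= mid:
                pos = node.prefix[pos]        # map position into the left child
                node = node.left
            else:
                pos = pos - node.prefix[pos]  # map position into the right child
                node = node.right
        return pos


# The digit sequence used in the FIG. 10 example of the text.
tree = WaveletTree([int(c) for c in "0721436725047263"])
```

Each rank query walks one root-to-leaf path, so its cost depends on the height of the tree rather than on the length of the sequence.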

With this wavelet tree, the subset of data indexes that share common member indexes can be obtained. To this end, for example, a succinct dictionary called a rank dictionary, which can be built with little memory, is constructed in advance, and the common subset can then be found quickly, with a time complexity of O(log n) in the number of data items, where n is the number of flows (see, for example, Non-Patent Document 3: T. Gagie, G. Navarro, S. J. Puglisi, "New algorithms on wavelet trees and applications to information retrieval", Theoretical Computer Science 426-427 (2012), pp. 25-41). The rank dictionary has an index structure; a dictionary that supports the following operations on a bit string B[0…n] is called a fully indexable dictionary (FID).

- rank_b(B, pos): returns the number of occurrences of b in B[0, pos);
- select_b(B, ind): returns the position of the (ind+1)-th occurrence of b;
For example, in FIG. 7, rank_1(6) = 2 means that "1" occurs twice in B[0, 6), and select_0(4) = 8 means that the (4+1)-th "0" occurs at position 8.
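The two operations can be illustrated with a plain, non-succinct Python sketch. A real FID answers them in constant time with far less extra memory; here prefix sums and position lists are used instead. The bit string below is a hypothetical one chosen so that it reproduces the two values quoted above, and it is not claimed to be the bit string of FIG. 7.

```python
from itertools import accumulate


class RankSelect:
    """Naive rank/select over a bit sequence, using O(n) extra space."""

    def __init__(self, bits):
        self.bits = list(bits)
        # prefix_ones[i] = number of 1 bits in bits[0:i]
        self.prefix_ones = [0] + list(accumulate(self.bits))
        # positions[b] = sorted positions at which bit b occurs
        self.positions = {0: [], 1: []}
        for i, b in enumerate(self.bits):
            self.positions[b].append(i)

    def rank(self, b, pos):
        """Number of occurrences of bit b in bits[0:pos]."""
        ones = self.prefix_ones[pos]
        return ones if b == 1 else pos - ones

    def select(self, b, ind):
        """Position of the (ind+1)-th occurrence of bit b."""
        return self.positions[b][ind]


rs = RankSelect([0, 1, 0, 0, 1, 0, 1, 1, 0])  # hypothetical bit string
```

With this bit string, `rs.rank(1, 6)` counts the 1 bits among the first six positions and `rs.select(0, 4)` locates the fifth 0 bit.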

The wavelet tree index conversion table 182 is used at search time and indicates where in the wavelet tree the data being searched for is located. It converts the member indexes (Member index) of the member index conversion table 183 into indexes in the root sequence of the wavelet tree, and it is created at the same time as the member index conversion table 183. The DB construction unit 130 creates the wavelet tree index conversion table 182 from the member index / data index conversion table 172. Specifically, in the example of FIG. 6, member index "0" occupies positions 0 to 1 of the data sequence and member index "3" occupies positions 4 to 6; this information is recorded, as shown in FIG. 8, as a wavelet start index (Wavelet_start_Index) and a wavelet last index (Wavelet_last_Index) for each member index.
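A minimal Python sketch of this step is shown below. It flattens a member index to data index mapping into one sequence and records the inclusive start and last positions of each member index. The function name and the sample data are assumptions; the sample is chosen only so that member index 0 lands on positions 0 to 1 and member index 3 on positions 4 to 6, as in the description.

```python
def flatten_with_ranges(member_to_data):
    """Return (sequence, ranges) for a mapping member index -> list of data indexes.

    sequence is the root sequence of the wavelet tree; ranges[m] is the pair
    (Wavelet_start_Index, Wavelet_last_Index), both inclusive. Every member
    index is assumed to have at least one data index.
    """
    sequence = []
    ranges = {}
    for member_index in sorted(member_to_data):
        start = len(sequence)
        sequence.extend(member_to_data[member_index])
        ranges[member_index] = (start, len(sequence) - 1)
    return sequence, ranges


# Illustrative data only, not taken from the figures.
sequence, ranges = flatten_with_ranges({0: [1, 3], 1: [2], 2: [4], 3: [6, 5, 1]})
```

Because the data indexes of one member index are contiguous in the sequence, a member index, or a run of consecutive member indexes, always corresponds to a single interval of the root sequence.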

The query analysis unit 140 analyzes the input user query, which consists of dimension names and their member information, and generates a database access query for accessing the database 180.

The database access unit 150 converts the database access query into data indexes by the following procedure.

(1) The database access query is converted into a set of indexes in the wavelet tree root sequence by referring to the wavelet tree index conversion table 182.

(2) The common subset is computed on the search wavelet tree DB 181.

Because the data in this example is five-dimensional, a function that finds the common subset of up to five sets is prepared. Specifically, the algorithm for obtaining the common subset of two sets described in Chapter 3 ("3. New algorithms") of Non-Patent Document 3 is used. Section 3.3 ("3.3 Range intersection") of Non-Patent Document 3 states that the function can be extended to obtain the common subset of three or more sets.

The specific operation of the database access unit 150 is described below.

First, the case where all of SrcIP, DstIP, SrcPort, DstPort, and Proto are specified, as user query X, is described.

FIG. 9 is a flowchart (part 1) of the processing of the database access unit in an embodiment of the present invention.

In this example, when the user query X
"SrcIP=10.0.0.1, DstIP=20.0.0.5, SrcPort=10, DstPort=20, Proto=6"
is input (step 101), the member index conversion table 183 (FIG. 5) is looked up with the user query X to obtain the corresponding member index X
"0, 1001, 2000, 3000, 4001"
(step 102).

Next, the wavelet tree index conversion table 182 (FIG. 8) is looked up with the member index X to obtain the corresponding wavelet tree index X
"[0,1], [40,46], [60,72], [89,94], [130,132]"
(step 103).

The common subset operation of Non-Patent Document 3 is applied to the wavelet tree index X obtained above to obtain the data index (1), which is output to the data output unit 160 (step 104).

The common subset operation using the wavelet tree in step 104 works as follows. Taking the sequence "0721436725047263" of FIG. 10 as an example, the common subset of the two sets (214) and (250) is found with the wavelet tree. A Rank0 operation moves to the left child (solid line) and a Rank1 operation moves to the right child (broken line). The Rank operations are performed only at the start point and the end point of each set, regardless of the length of the set. Each Rank operation is assumed to take constant time. When the two sets arrive at the same value, that value belongs to the common subset.
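The following Python sketch illustrates the idea on the same digit sequence. It is a simplified rendering in the spirit of the range intersection of Non-Patent Document 3, not a reproduction of it. Nodes are tuples holding prefix counts that play the role of the Rank0 results, ranges are half-open [start, end) rather than inclusive, and all names are assumptions.

```python
def build(seq, lo, hi):
    """Build a node (lo, hi, prefix, left, right) for values in [lo, hi].

    prefix[i] counts the values <= mid among the first i elements, which is
    the result of a Rank0 operation at position i of the node's bit string.
    """
    if lo == hi or not seq:
        return (lo, hi, None, None, None)
    mid = (lo + hi) // 2
    prefix = [0]
    for v in seq:
        prefix.append(prefix[-1] + (v <= mid))
    left = build([v for v in seq if v <= mid], lo, mid)
    right = build([v for v in seq if v > mid], mid + 1, hi)
    return (lo, hi, prefix, left, right)


def intersect(node, ranges):
    """Return the values that occur in every half-open range of the root sequence."""
    lo, hi, prefix, left, right = node
    if any(start >= end for start, end in ranges):
        return []  # some range is empty here, so nothing below can be common
    if lo == hi:
        return [lo]  # every range reached the same leaf value
    # Only the two endpoints of each range are mapped, whatever its length.
    left_ranges = [(prefix[s], prefix[e]) for s, e in ranges]
    right_ranges = [(s - prefix[s], e - prefix[e]) for s, e in ranges]
    return intersect(left, left_ranges) + intersect(right, right_ranges)


seq = [int(c) for c in "0721436725047263"]
tree = build(seq, 0, 7)
# "214" occupies positions 2..4 and "250" occupies positions 8..10.
common = intersect(tree, [(2, 5), (8, 11)])
```

The same function accepts three or more ranges, which corresponds to the extension to more than two sets mentioned for Section 3.3 of Non-Patent Document 3.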

Next, the case where only DstIP and Proto are specified, as user query Y, is described.

FIG. 11 is a flowchart (part 2) of the processing of the database access unit in an embodiment of the present invention.

In this example, when the user query Y
"SrcIP=*, DstIP=20.0.0.1, SrcPort=*, DstPort=*, Proto=6"
is input (step 201), the member index conversion table 183 (FIG. 5) is looked up with the user query Y to obtain the corresponding member index Y
"1000, 4000"
(step 202).

Next, the wavelet tree index conversion table 182 (FIG. 8) is looked up with the member index Y to obtain the corresponding wavelet tree index Y
"[30,40], [120,130]"
(step 203).

The common subset operation of Non-Patent Document 3 is applied to the wavelet tree index Y obtained above to obtain the data indexes (1, 2, 5), which are output to the data output unit 160 (step 204). The common subset operation is the same as described above.
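The flow of steps 201 to 204 can be sketched end to end in Python on a toy dataset. To keep the example short, ordinary set intersection over slices of the root sequence stands in for the wavelet tree operation, so this shows the lookup chain, not the logarithmic-time search. The tables, their contents, and the function name are all assumptions made for illustration.

```python
# Toy tables for three flows and three dimensions (illustrative values only).
MEMBER_INDEX = {
    ("SrcIP", "10.0.0.1"): 0,
    ("SrcIP", "10.0.0.2"): 1,
    ("DstIP", "20.0.0.1"): 1000,
    ("Proto", 6): 4000,
    ("Proto", 17): 4001,
}
SEQUENCE = [1, 3, 2, 1, 2, 3, 1, 2, 3]  # root sequence of data indexes
RANGES = {0: (0, 1), 1: (2, 2), 1000: (3, 5), 4000: (6, 7), 4001: (8, 8)}


def slice_search(query):
    """Return the data indexes that match every non-wildcard dimension of query."""
    result = None
    for dim, value in query.items():
        if value == "*":
            continue  # a wildcard dimension places no constraint on the result
        start, last = RANGES[MEMBER_INDEX[(dim, value)]]  # inclusive range
        data = set(SEQUENCE[start:last + 1])
        result = data if result is None else result & data
    if result is None:  # every dimension was a wildcard
        result = set(SEQUENCE)
    return sorted(result)


matches = slice_search({"SrcIP": "*", "DstIP": "20.0.0.1", "Proto": 6})
```

Wildcard dimensions simply contribute no range, so the number of lookups depends only on how many dimensions the query actually constrains.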

Next, the case where only SrcIP and DstIP are specified, as user query Z, is described.

FIG. 12 is a flowchart (part 3) of the processing of the database access unit in an embodiment of the present invention.

In this example, when the user query Z
"SrcIP=10.0.0.4/30, DstIP=20.0.0.1"
is input (step 301), the SrcIP prefix of the query is resolved to an address range: the addresses within [10.0.0.4/30] that are contained in the member index conversion table 183 are 10.0.0.1 to 10.0.0.6, so the range is set to [10.0.0.1, 10.0.0.6], with reference to the Patricia tree DB 184 shown in FIG. 13 (step 302).

Based on SrcIP=[10.0.0.1] and SrcIP=[10.0.0.6] of user query Z, the member index conversion table 183 is referenced to obtain the member index range [0, 3], and member index "1000" is obtained based on DstIP=20.0.0.1. Here, the pair consisting of the first and last member indexes of SrcIP_first is used as the query (step 303).

Based on the user query Z ([0,3], 1000) obtained in step 303, the wavelet tree index conversion table 182 (FIG. 8) is referenced, and the wavelet tree index Z ([0,6], [40,46]) is acquired (step 304). In the example of the wavelet tree index conversion table 182 in FIG. 8, the Wavelet_start_Index of member index "0", the first member in the first set, is "0", and the Wavelet_last_Index of member index "3", the last member in that set, is "6".
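
The conversion in step 304 can be sketched as a table lookup. The rows below are hypothetical and are not the contents of FIG. 8; only the start index 0 of member 0, the last index 6 of member 3, and the range [40,46] of member 1000 come from the example above. The sketch assumes that consecutive member indexes occupy consecutive positions in the root sequence, so that a member range maps to a single position range. The names `WAVELET_INDEX` and `to_wavelet_ranges` are assumptions.

```python
# Hypothetical excerpt of the wavelet tree index conversion table 182:
# member index -> (Wavelet_start_Index, Wavelet_last_Index), both inclusive.
WAVELET_INDEX = {
    0: (0, 1),       # start index 0 is from the example above
    1: (2, 3),       # assumed row
    2: (4, 4),       # assumed row
    3: (5, 6),       # last index 6 is from the example above
    1000: (40, 46),  # range from the example above
}


def to_wavelet_ranges(terms, table=WAVELET_INDEX):
    """Map member indexes, or (first, last) member ranges, to position ranges."""
    ranges = []
    for term in terms:
        # A single member index is treated as a range of length one.
        first, last = term if isinstance(term, tuple) else (term, term)
        # Take the start position of the first member and the last position
        # of the last member.
        ranges.append((table[first][0], table[last][1]))
    return ranges


print(to_wavelet_ranges([(0, 3), 1000]))  # [(0, 6), (40, 46)]
```

Only the two end rows of a member range are read, so a prefix query costs the same number of lookups regardless of how many members the prefix covers.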

The intersection (common subset) operation of Non-Patent Document 3 is applied to the wavelet tree index Z acquired above to obtain the data indexes (1, 2, 3, 5), which are output to the data output unit 160 (step 305). The intersection operation is the same as described above.

As described above, the multi-index structure of the present invention allows any combination of tuples to be accessed in uniform time. In addition, combining it with the wavelet tree structure reduces the time complexity of access from O(n) to O(log n).

The operation of each component of the multidimensional database configuration apparatus described in the above embodiment can be implemented as a program, which can be installed and executed on a computer used as the multidimensional database configuration apparatus, or distributed via a network.

The present invention is not limited to the above embodiment, and various modifications and applications are possible within the scope of the claims.

DESCRIPTION OF SYMBOLS
100 Multidimensional database configuration apparatus
101 Data index table
102 Multidimensional attribute table
103 Data table
110 Multidimensional data reading unit
120 Table creation unit
130 DB construction unit
140 User query analysis unit
150 Database access unit
160 Data output unit
170 Intermediate DB
171 Dimension member table
172 Member index / data index conversion table
180 DB
181 Search wavelet tree DB
182 Wavelet tree index conversion table
183 Member index conversion table
184 Patricia tree

Claims

A flow aggregation device that constructs a database for searching aggregate flows, which combine multidimensional data, over high-dimensional and large-scale flow data, the device comprising:
a multi-dimensional attribute table that stores the multi-dimensional attributes of the read multidimensional data;
a data index table that stores index numbers for the multidimensional data;
a data table that stores flow information of the multidimensional data; and
database construction means for constructing the database for the search,
wherein the database construction means includes:
means for storing, for each dimension, the unique members of that dimension in the multi-dimensional attribute table in a dimension member table;
means for assigning index numbers to the members of the dimension member tables such that each member has an index number unique across dimensions, unifying the multi-dimensional member information, and storing it in a member index conversion table;
means for converting the attributes of the multi-dimensional attribute table into member indexes by referring to the member index conversion table to obtain the member index for each member, and creating a member index / data index conversion table from the data index of the data table and the acquired member index;
means for converting the member indexes of the member index / data index conversion table into indexes in the root sequence of a wavelet tree, and creating a wavelet tree index conversion table; and
means for acquiring the member index and the data index from the member index / data index conversion table, converting the data indexes into a single data string, creating a wavelet tree from the data string, and creating a search wavelet tree DB comprising the combination of the member index, the data index, and the wavelet tree.

The flow aggregation device according to claim 1, wherein the database construction means includes:
means for storing, when creating the dimension member tables, the members of the source port (SrcPort), destination port (DstPort), and protocol (Proto) dimensions in the dimension member tables in ascending order; and
means for storing the members of the source address (SrcIP) and destination address (DstIP) dimensions in the dimension member tables in ascending order of IP address using a Patricia tree.

The flow aggregation device according to claim 1, further comprising:
query analysis means for, when a user query consisting of a dimension name and member information is input, analyzing the user query and generating a database access query for accessing the database; and
database access means for converting the database access query into a set of indexes in the root sequence of the wavelet tree by referring to the wavelet tree index conversion table, and obtaining the common subset by referring to the search wavelet tree DB.

A flow aggregation method for constructing a database for searching aggregate flows, which combine multidimensional data, over high-dimensional and large-scale flow data, in a device having:
a multi-dimensional attribute table that stores the multi-dimensional attributes of the read multidimensional data;
a data index table that stores index numbers for the multidimensional data;
a data table that stores flow information of the multidimensional data; and
database construction means for constructing the database for the search,
wherein the database construction means performs the steps of:
storing, for each dimension, the unique members of that dimension in the multi-dimensional attribute table in a dimension member table;
assigning index numbers to the members of the dimension member tables such that each member has an index number unique across dimensions, unifying the multi-dimensional member information, and storing it in a member index conversion table;
converting the attributes of the multi-dimensional attribute table into member indexes by referring to the member index conversion table to obtain the member index for each member, and creating a member index / data index conversion table from the data index of the data table and the acquired member index;
converting the member indexes of the member index / data index conversion table into indexes in the root sequence of a wavelet tree, and creating a wavelet tree index conversion table; and
acquiring the member index and the data index from the member index / data index conversion table, converting the data indexes into a single data string, creating a wavelet tree from the data string, and creating a search wavelet tree DB comprising the combination of the member index, the data index, and the wavelet tree.

The flow aggregation method according to claim 4, wherein creating the dimension member tables comprises the steps of:
storing the members of the source port (SrcPort), destination port (DstPort), and protocol (Proto) dimensions in the dimension member tables in ascending order; and
storing the members of the source address (SrcIP) and destination address (DstIP) dimensions in the dimension member tables in ascending order of IP address using a Patricia tree.

The flow aggregation method according to claim 4 or 5, performed in an apparatus further comprising query analysis means and database access means, the method further comprising the steps of:
the query analysis means, upon receiving a user query consisting of a dimension name and member information, analyzing the user query and generating a database access query for accessing the database; and
the database access means converting the database access query into a set of indexes in the root sequence of the wavelet tree by referring to the wavelet tree index conversion table, and obtaining the common subset by referring to the search wavelet tree DB.