JP2011170461A

JP2011170461A - Information accumulation retrieval method and information accumulation retrieval program

Info

Publication number: JP2011170461A
Application number: JP2010031795A
Authority: JP
Inventors: Yutaka Arakawa; 豊荒川; Takayuki Nakamura; 隆幸中村; Motonori Nakamura; 元紀中村
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2010-02-16
Filing date: 2010-02-16
Publication date: 2011-09-01
Anticipated expiration: 2030-02-16
Also published as: JP5470082B2

Abstract

<P>PROBLEM TO BE SOLVED: To perform efficient retrieval with respect to various kinds of multidimensional tuples. <P>SOLUTION: In an information accumulation retrieval method, a retrieval key is indexed from a tuple that is a unit of information of an accumulation retrieval target, and a retrieval tree is constructed. When an entry B is inserted into an existing entry A already included by an operation target node, when an existing entry having a smallest "penalty when inserting an entry X into an existing entry already included by the operation target node" is selected in the operation target node, when a slave node or the entry X comprising a tuple identifier and a key is inserted into a node of an index in construction of the index, penalty is defined as a value indicating increment of a value corresponding to probability that the entry A is retrieved, when the entry A prior to an entry B is inserted and the entry A after the insertion, when the entry B is inserted into the existing entry A already included by the operation target node. <P>COPYRIGHT: (C)2011,JPO&INPIT

Description

本発明は、多次元かつ多種類のデータの中から、目的のデータを高速に検索するためのインデキシング方式の構築の際に用いて好適な情報蓄積検索方法及び情報蓄積検索プログラムに関するものである。 The present invention relates to an information storage search method and an information storage search program suitable for use in the construction of an indexing method for searching for target data at a high speed from multi-dimensional and various types of data.

データベース技術の分野においては、多量のデータから目的のデータを高速に検索するために、様々なインデキシング技術が考案されている。以下、個々のデータを、リレーショナルデータベース（ＲＤＢ）における呼び名と同様にタプルと呼ぶ。タプルは、１つ以上の属性と値のペア（ＡＶペア（属性−値ペア））から構成されるものを指すものとする。また、タプルの集合を表（テーブル）として表したものが、タプル表である。 In the field of database technology, various indexing technologies have been devised in order to retrieve target data from a large amount of data at high speed. Hereinafter, each piece of data is referred to as a tuple, as is a name in a relational database (RDB). A tuple is assumed to be composed of one or more attribute-value pairs (AV pairs (attribute-value pairs)). A tuple table represents a set of tuples as a table.

現在最も広く使われているＲＤＢにおいては、Ｂ木と呼ばれるデータ構造がインデックスとして用いられることが多い。特に、実用化されているＲＤＢにおいてはＢ木の改良型であるＢ＋木やＢ＊木を使うことが多い。これらの技術はファイルシステムにおいても活用されている。 In the most widely used RDB at present, a data structure called a B-tree is often used as an index. In particular, in RDB that is put into practical use, B + trees and B * trees, which are improved types of B trees, are often used. These technologies are also used in file systems.

これらは各タプル内の１種類の属性に対する属性値（ＲＤＢで言えばある１つのカラムに入る値）の集合に対して構築される木の形をしたデータ構造（木構造）であり、検索木とも呼ばれる。検索時に検索条件を指定したい属性毎に、この検索木を構築することで、高速な検索が可能となる。 These are data structures (tree structures) in the form of a tree constructed for a set of attribute values (values that fall within one column in RDB) for one type of attribute in each tuple. Also called. By constructing this search tree for each attribute for which a search condition is desired at the time of search, high-speed search is possible.

これらＢ木ないしその改良型は１つの属性に対して構築されるインデックスであるが、他にも、２つ以上の属性に対して構築されるインデックスが存在し、２つ以上の属性の値もしくは値の範囲を指定した検索を高速に実現するのに利用される。こうしたインデックスは複数の属性をもつタプル、すなわち多次元データを含むタプルに対するものであるため、多次元インデックスと呼ばれる。多次元インデックスの最も代表的なものはＲ木である。Ｒ木はＢ木と同様に木構造であり、Ｒ＊木、Ｒ＋木など多くの改良型が存在する。 These B-trees or their improved types are indexes built for one attribute, but there are other indexes built for two or more attributes and the values of two or more attributes or It is used to realize a search with a specified range of values at high speed. Since such an index is for a tuple having a plurality of attributes, that is, a tuple containing multidimensional data, it is called a multidimensional index. The most typical multidimensional index is an R-tree. The R tree has a tree structure similar to the B tree, and there are many improved types such as an R * tree and an R + tree.

これら従来の検索木の多くは、非特許文献１に記載されているように、共通して以下の特徴をもつ。 Many of these conventional search trees have the following characteristics in common as described in Non-Patent Document 1.

・１つ以上のノードを含み、
該ノードは０から予め定められた閾値までの個数のエントリを含み、
該ノードはリーフノードまたはインナーノードであり、
該リーフノードの該エントリは、
タプルの識別子とタプルのキーの組から構成され、
該インナーノードの該エントリは、
他のノードへのポインタと、
該他のノードが含むエントリ集合のキーの和をとったキーの組とから構成され、
該キーは、１以上の長さ（すなわち１以上のデータ列の長さ；以下同様）を有する属性−値または属性−値範囲の組の並びであり、
ルートノードと呼ぶ第１階層のノードから、
該ポインタによってノード間が階層的に接続された
木構造の検索木である。 Contains one or more nodes,
The node includes a number of entries from 0 to a predetermined threshold;
The node is a leaf node or an inner node,
The entry of the leaf node is
It consists of a tuple identifier and a tuple key pair.
The entry of the inner node is
A pointer to another node,
A key set obtained by summing the keys of the entry set included in the other node,
The key is a sequence of attribute-value or attribute-value range pairs having one or more lengths (ie, one or more data string lengths; the same shall apply hereinafter);
From the first layer node called the root node,
This is a tree-structured search tree in which nodes are hierarchically connected by the pointer.

また同様に、非特許文献１に記載されているように、これらの検索木へのタプル挿入方法、すなわちタプルのインデキシング方法は、共通して以下の手順を含む。 Similarly, as described in Non-Patent Document 1, the tuple insertion method to these search trees, that is, the tuple indexing method, includes the following procedures in common.

・非特許文献１においてＳｐｌｉｔと呼ばれる手順：
検索木内の１つのノードと、１つのエントリに対し、
該ノード内のキー集合と該エントリ内のキーを含めたキー集合を２つのグループに分割し、そのうち１つを新しいノードに入れ、該新しいノードを該ノードの親ノードに挿入するノード分割手順。 A procedure called Split in Non-Patent Document 1:
For one node and one entry in the search tree,
A node splitting procedure in which a key set in the node and a key set including a key in the entry are divided into two groups, one of them is put into a new node, and the new node is inserted into the parent node of the node.

・非特許文献１においてＡｄｊｕｓｔＫｅｙｓと呼ばれる手順：
検索木内の１つのノードに対し、
該ノードが含むキー集合情報を親ノードに伝達し親ノードに子ノードである
該ノードが含むキー集合情報を保持させる操作を該ノードからルートノードまで再帰的に行うキー集合情報調整手順。 A procedure called “AdjustKeys” in Non-Patent Document 1:
For one node in the search tree,
A key set information adjustment procedure for recursively performing an operation for transmitting the key set information included in the node to the parent node and holding the key set information included in the node as a child node from the node to the root node.

・非特許文献１においてＣｈｏｏｓｅＳｕｂｔｒｅｅと呼ばれる手順：
１つのエントリＸと、レベル数Ｌに対し、
ルートノードから、レベル数Ｌのノードに至るまでの各ノードにおいて、
該ノードが含むエントリのうち、該ノードが含むエントリを既存エントリ、
該エントリＸを挿入するエントリとした場合のペナルティが
最も小さい該ノードが含むエントリを選出し、
該エントリに含まれるポインタが指し示す子ノードを次の操作対象とする操作を再帰的に行い、
最終的にレベル数Ｌのノードを１つ選択する部分木選択手順。 A procedure called ChooseSubtree in Non-Patent Document 1:
For one entry X and number of levels L,
In each node from the root node to the node of level number L,
Among the entries included in the node, the entry included in the node is an existing entry,
When the entry X is an entry to be inserted, an entry included in the node having the smallest penalty is selected,
Recursively perform the operation that sets the child node indicated by the pointer included in the entry as the next operation target,
A subtree selection procedure for finally selecting one node of level number L.

ここで、ペナルティとは、インデキシングにおける各手順を経て構築された検索木を用いて所望の検索条件を満たすタプルを見つける際の効率を表す指標であり、部分木選択手順では、良い選択か否かを表す指標となる。 Here, the penalty is an index indicating the efficiency in finding a tuple satisfying a desired search condition using a search tree constructed through each procedure in indexing. It becomes an index showing.

・非特許文献１においてＩｎｓｅｒｔと呼ばれる手順：
１つのエントリと、レベル数Ｌに対し、
該エントリと該レベル数Ｌに対する部分木選択手順を実施し、
選択されたレベル数Ｌのノードに対し、
該ノードが予め決められた数未満のエントリを含む場合には、
該ノードに該エントリを加え、
該ノードが予め決められた数以上のエントリを含む場合には、
該ノードと、該エントリに対するノード分割手順を実施し、
該ノードに対するキー集合情報調整手順を実施するエントリ挿入手順。 A procedure called Insert in Non-Patent Document 1:
For one entry and the number of levels L,
Performing a subtree selection procedure for the entry and the level number L;
For the selected number of level L nodes,
If the node contains less than a predetermined number of entries,
Add the entry to the node,
If the node contains more than a predetermined number of entries,
Perform node splitting procedure for the node and the entry,
An entry insertion procedure for performing a key set information adjustment procedure for the node.

また同様に、非特許文献１に記載されているように、これらの検索木からのタプル検索方法は、共通して以下の手順を含む。 Similarly, as described in Non-Patent Document 1, the tuple search method from these search trees includes the following procedures in common.

・非特許文献１においてＳｅａｒｃｈと呼ばれる手順：
検索対象とする１つのノードと、検索式としてのキーである検索キーに対し、
該ノードがリーフノードでなければ、該ノードに含まれる各エントリについて、
該エントリのキーが検索キーに含まれる各属性を全て含みかつ
検索キーに含まれる各属性に対応する値または値範囲と該エントリのキーに含まれる該属性に対応する値または値範囲とが一致または部分的に一致するかを調べ、
一致または部分的に一致する場合には該エントリのポインタで接続されたノードを検索対象、該検索キーを検索キーとしてノード検索手順を再帰的に行い、
該ノードがリーフノードであれば、該ノードに含まれる各エントリについて、
該エントリのキーが検索キーに含まれる各属性を全て含みかつ
検索キーに含まれる各属性に対応する値または値範囲に該エントリのキーに含まれる該属性に対応する値が含まれるかを調べ、
含まれる場合には該エントリのタプルの識別子を検索結果タプル識別子集合に加えるノード検索手順。 A procedure called “Search” in Non-Patent Document 1:
For one node to be searched and a search key that is a key as a search expression,
If the node is not a leaf node, for each entry contained in the node:
The key of the entry includes all the attributes included in the search key, and the value or value range corresponding to each attribute included in the search key matches the value or value range corresponding to the attribute included in the key of the entry Or check for partial matches,
If there is a match or partial match, the node connected by the entry pointer is searched, the node search procedure is performed recursively using the search key as a search key,
If the node is a leaf node, for each entry contained in the node,
Check whether the key of the entry includes all the attributes included in the search key and whether the value or value range corresponding to each attribute included in the search key includes the value corresponding to the attribute included in the key of the entry ,
If included, a node search procedure for adding the tuple identifier of the entry to the search result tuple identifier set.

前記ペナルティとして、例えばＲ木においては、一般に最小包囲矩形、あるいは最小外接矩形（ＭｉｎｉｍｕｍＢｏｕｎｄｉｎｇＲｅｃｔａｎｇｌｅ；ＭＢＲ）と呼ばれるオブジェクトの大きさの増加量が用いられる。３次元空間内の座標データを示すタプル集合であれば、各エントリのキーも３次元となり、エントリの最小外接矩形は、該エントリが下位の階層に含むタプル集合を全て含む各面がいずれかの軸に平行な最小の直方体となる。タプル挿入においては、前記部分木選択手順により該直方体の体積増加が最小となるノードをタプル挿入先として選択することになる。これは、３次元空間内で近いタプル同士を同じノードへ入れていくクラスタリングに相当し、これにより、検索条件を満たすタプルを効率的に見つけることができるようになる。また、ペナルティは、前記ノード分割手順においても、良い分割か否かの指標として用いられる。 As the penalty, for example, in an R-tree, an increase amount of an object size generally called a minimum bounding rectangle or a minimum bounding rectangle (MBR) is used. If it is a tuple set indicating coordinate data in a three-dimensional space, the key of each entry is also three-dimensional, and the minimum circumscribed rectangle of the entry is any surface that contains all the tuple sets included in the lower hierarchy. The smallest cuboid parallel to the axis. In tuple insertion, a node that minimizes the volume increase of the rectangular parallelepiped is selected as a tuple insertion destination by the subtree selection procedure. This corresponds to clustering in which tuples that are close to each other in the three-dimensional space are put into the same node, and thereby, tuples that satisfy the search condition can be found efficiently. Also, the penalty is used as an index of whether or not the division is good in the node division procedure.

ＪｏｓｅｐｈＭ．Ｈｅｌｌｅｒｓｔｅｉｎ，ＪｅｆｆｒｅｙＦ．ＮａｕｇｈｔｏｎａｎｄＡｖｉＰｆｅｆｆｅｒ， “ＧｅｎｅｒａｌｉｚｅｄＳｅａｒｃｈＴｒｅｅｓｆｏｒＤａｔａｂａｓｅＳｙｓｔｅｍｓ，” Ｐｒｏｃ．２１ｓｔＩｎｔ’ｌＣｏｎｆ．ｏｎＶｅｒｙＬａｒｇｅＤａｔａＢａｓｅｓ，Ｚｕｅｒｉｃｈ，Ｓｅｐｔｅｍｂｅｒ１９９５，ｐｐ．５６２−５７３．Joseph M.M. Hellerstein, Jeffrey F .; Naughton and Avi Pfeffer, “Generalized Search Trees for Database Systems,” Proc. 21st Int'l Conf. on Very Large Data Bases, Zürich, September 1995, pp. 562-573.

ＭＥＭＳ（Ｍｉｃｒｏ−Ｅｌｅｃｔｒｏ−ＭｅｃｈａｎｉｃａｌＳｙｓｔｅｍｓ）技術、蓄電技術、通信技術などの発達により、様々な小型センサデバイスが安価に入手可能となりつつある。身の回りにこうした多様なセンサが多数配置され、それらを利用した様々なアプリケーションが我々の生活を支援する、そうしたユビキタス環境の実現が期待される。こうしたセンサデバイスから出力されるセンサデータは、通常、前記タプルの形式で表現することが可能である。 With the development of MEMS (Micro-Electro-Mechanical Systems) technology, power storage technology, communication technology, and the like, various small sensor devices are becoming available at low cost. It is expected to realize such a ubiquitous environment where many such various sensors are placed around us and various applications using them support our lives. Sensor data output from such a sensor device can usually be expressed in the form of the tuple.

様々なセンサやアプリケーションが存在するユビキタス環境においては、それだけ様々な属性を含んだセンサデータ（タプル）が蓄積され、また様々な属性を指定した検索が行われるようになる。たとえばｘ，ｙ方向の２次元の加速度センサから得られるタプルと、ｘ，ｙ，ｚ方向の３次元の加速度センサから得られるタプルが混在する場合、ｘ，ｙ方向の加速度を知りたいアプリケーションは、どちらの種類のタプルも活用できるべきである。すなわち、「ｘ，ｙ方向の加速度データ」を検索条件とした場合、どちらの種類のタプルも検索する必要がある。このような、次元数や次元種類に依存しない、タプルの横断的検索が必要とされ、また特に、数値で示されることが多いセンサデータの検索においては、範囲検索が重要となる。こうした多種多次元なタプルに対する範囲検索をなるべく効率的に実現する場合、従来のインデックス技術を利用する場合にはＢ木（あるいはその改良型）を使う方法と、Ｒ木（あるいはその改良型）を使う方法が考えられる。 In a ubiquitous environment in which various sensors and applications exist, sensor data (tuples) including various attributes are accumulated, and a search specifying various attributes is performed. For example, when a tuple obtained from a two-dimensional acceleration sensor in the x, y direction and a tuple obtained from a three dimensional acceleration sensor in the x, y, z direction are mixed, an application that wants to know the acceleration in the x, y direction is: Both types of tuples should be available. That is, when “acceleration data in x and y directions” is used as a search condition, it is necessary to search for both types of tuples. In such a search of sensor data that is often indicated by a numerical value, a range search is important. In order to realize range search for such multi-dimensional tuples as efficiently as possible, when using conventional index technology, a method using a B-tree (or an improved version thereof) and an R-tree (or an improved version thereof) are used. The method of using can be considered.

Ｂ木を使う方法は、各属性に対してＢ木によるインデックスを構築しておき、検索条件で指定される各属性に対し、当該属性に対応するＢ木インデックスを使用して当該属性の条件を満たすタプルを検索した後、全結果のＡＮＤ（論理積）をとり（すなわち、どの結果にも含まれているタプルのみを抽出し）、それを最終的な検索結果とする、というものである。 In the method using the B-tree, an index based on the B-tree is constructed for each attribute, and for each attribute specified by the search condition, the condition of the attribute is set using the B-tree index corresponding to the attribute. After searching for the tuples that satisfy the condition, an AND (logical product) of all the results is obtained (that is, only the tuples included in any result are extracted), and this is used as the final search result.

しかしながら、Ｂ木を使う方法は、複数のＢ木にアクセスするため、検索時のアクセスノード数の総数が大きくなってしまう。アクセスノード数は、検索の処理量や速度を決める重要なパラメータである。またＡＮＤをとる処理の処理量は、最終的な検索結果の量ではなく、各Ｂ木での検索で得られた中間結果の量に依存するため、タプル数に従って大きくなりやすい。つまりＢ木を使う方法には、検索処理全体の処理量や処理時間が大きくなりやすいという問題がある。 However, since the method using the B-tree accesses a plurality of B-trees, the total number of access nodes at the time of search increases. The number of access nodes is an important parameter that determines the search processing amount and speed. Further, the amount of processing of AND processing depends not on the amount of final search results but on the amount of intermediate results obtained by searching with each B-tree, and therefore tends to increase according to the number of tuples. In other words, the method using the B-tree has a problem that the processing amount and processing time of the entire search process are likely to increase.

Ｒ木を使う方法は、タプル種類毎に、つまり前記の加速度センサの例であればｘ，ｙ２次元のタプルとｘ，ｙ，ｚ３次元のタプルそれぞれに対して、Ｒ木によるインデックスを構築しておき、検索条件で指定される全属性を含むタプル種類に対応するＲ木インデックスを使用して検索条件を満たすタプルを検索した後、全結果のＯＲ（論理和）をとり（各結果をまとめ）、それを最終的な検索結果とする、というものである。 The method using the R-tree is to construct an index by the R-tree for each tuple type, that is, in the case of the acceleration sensor described above, for each of the x, y2 dimensional tuple and the x, y, z3 dimensional tuple. After searching for tuples that satisfy the search condition using the R-tree index corresponding to the tuple type including all attributes specified by the search condition, OR (logical sum) of all results is taken (summarizing each result) This is the final search result.

しかしながら、Ｒ木を使う方法においても、やはり複数のＲ木にアクセスするため、タプル種類数が大きくなった場合には、Ｒ木検索時のアクセスノード数の総数が大きくなってしまう。またどのＲ木を検索するべきかを判断する前処理も必要である。つまりこの方法にも、検索処理全体の処理量や処理時間が大きくなりやすいという問題がある。 However, in the method using the R-tree, too, since a plurality of R-trees are accessed, when the number of tuple types increases, the total number of access nodes at the time of R-tree search increases. In addition, preprocessing for determining which R-tree should be searched is also necessary. In other words, this method also has a problem that the processing amount and processing time of the entire search process tends to increase.

そこで、１つの検索木に多種多次元のタプル集合、すなわち属性の数や種類が異なるタプル集合を挿入することが考えられるが、前記従来の木構造のインデックスは、対象とするタプル集合が、すべて同次元であることが前提とされている。つまり、属性の数と種類が同じタプルの集合に対してインデックスを構築することを前提としている。例えば、２次元空間内の座標データを示すタプル集合をＲ木によってインデキシングする場合を考える。このとき、該タプル集合を下位の階層に含むエントリの最小外接矩形は、該タプル集合を全て含む最小の長方形で表される。多種多次元のタプルをインデキシングする場合、ここに新たに３次元のタプルが挿入される場合もあり、この場合、最小外接矩形は直方体となる。ペナルティは、定義上、直方体の体積から長方形の面積を引いた値となるが、次元の異なる値同士の引き算による値は論理的意味をもたず、良いクラスタリングの指標とはなりえない。結果、多種多次元のタプル集合に対し、Ｒ木では良い検索効率を実現できない。別の表現をすれば、従来の木構造インデックスにおいては、２つのエントリを入力とするペナルティの計算において、該エントリのキーが含む全次元について、次元毎に大小関係を判別可能である必要があるが、多種多次元のタプルを下位に含むエントリ間では、そのキーが含む次元も異なるために、次元毎の大小関係を必ずしも判別できず、多種多次元のタプルに対する効率的なインデックスを構築することができなかった。 Therefore, it is conceivable to insert various multi-dimensional tuple sets, that is, tuple sets having different numbers and types of attributes, into one search tree. However, the conventional tree structure index includes all target tuple sets. It is assumed that they are the same dimension. In other words, it is assumed that an index is constructed for a set of tuples having the same number and type of attributes. For example, consider a case where a tuple set indicating coordinate data in a two-dimensional space is indexed by an R-tree. At this time, the minimum circumscribed rectangle of the entry including the tuple set in the lower hierarchy is represented by the minimum rectangle including all the tuple sets. When indexing a multi-dimensional tuple, a new three-dimensional tuple may be inserted here. In this case, the minimum circumscribed rectangle is a rectangular parallelepiped. By definition, the penalty is a value obtained by subtracting the rectangular area from the volume of the rectangular parallelepiped, but the value obtained by subtracting values of different dimensions has no logical meaning and cannot be a good clustering index. As a result, good search efficiency cannot be realized with the R-tree for various multidimensional tuple sets. In other words, in the conventional tree structure index, it is necessary to be able to determine the magnitude relation for each dimension for all dimensions included in the key of the entry in calculating the penalty with two entries as inputs. However, because the dimension that the key contains is different between the entries that contain multi-dimensional tuples at the lower level, the size relationship for each dimension cannot always be determined, and an efficient index for multi-dimensional multi-tuples must be constructed. I could not.

一方、従来から、次元数が非常に大きい多次元データに対する検索においては、効率的な検索ができなくなる問題（いわゆる「次元の呪い」と呼ばれる問題）が存在することが知られている。「次元の呪い」問題の具体例であるが、ｎ次元空間中に点が一様に分布している場合、ある点から他の各点までの距離は、ｎが大きくなるほど、距離の差が小さくなっていく。こうした現象により、Ｒ−ｔｒｅｅなどの手法によりクラスタリングを行っても、互いに大きく重なりあったクラスタばかりができてしまい、クラスタリングの効果が発揮されない。つまり、検索時に検索木上の多くのノードを辿る必要が生じ、効率的な検索ができない。 On the other hand, it is conventionally known that there is a problem that a search for multidimensional data having a very large number of dimensions cannot be performed efficiently (a problem called “dimension curse”). This is a specific example of the “dimensional curse” problem. When the points are uniformly distributed in the n-dimensional space, the distance from one point to the other points increases as n increases. It gets smaller. Due to such a phenomenon, even when clustering is performed by a technique such as R-tree, only clusters that are largely overlapped with each other are formed, and the effect of clustering is not exhibited. That is, it is necessary to trace many nodes on the search tree at the time of search, and efficient search cannot be performed.

ユビキタス環境においては多様なデバイス（センサやアクチュエータ）が使用され、それに応じてデータの種類も多様となり、次元数は非常に大きくなると想定される。つまり、ユビキタス環境を対象とした検索では、多様なデバイスに応じて検索データの次元数が増加することから、「次元の呪い」問題が発生しうると考えられる。 In the ubiquitous environment, various devices (sensors and actuators) are used, and accordingly, the types of data are varied, and the number of dimensions is expected to be very large. In other words, in the search for the ubiquitous environment, the number of dimensions of the search data increases according to various devices, so it is considered that the “dimension curse” problem may occur.

本発明は、このような事情を考慮し、上記の問題を解決すべくなされたもので、その目的は、多種多次元のタプルに対する検索を、タプル数や含まれる属性種類数が大きくなった場合にも、効率的に実現することにある。また、本発明の他の目的は、多種多次元のタプルに対する検索において、「次元の呪い」の問題が発生することを回避しつつ、効率的な検索を可能とすることにある。 The present invention has been made to solve the above-mentioned problems in consideration of such circumstances, and its purpose is to search for various multi-dimensional tuples when the number of tuples and the number of included attribute types are large. In addition, it is to realize efficiently. Another object of the present invention is to enable an efficient search while avoiding the problem of “curse of dimension” in a search for various multidimensional tuples.

上記課題を解決するため、請求項１記載の発明は、蓄積検索対象の情報の単位であるタプルは、属性−値の組の並びを少なくとも含む１以上の長さを有するものであり、該属性の種類や該並びの長さが同じまたは異なる１つ以上の該タプルが複数記憶されるものであり、該タプルから検索キーがインデックス化され検索木が構築されるものであり、該検索木はノードを階層的に有する構成であり、該ノードの内最下層のリーフノードはエントリとして該タプルを識別するタプル識別子とキーを有するものであり、該ノードの内該リーフノードより上位のインナーノードはエントリとして下位ノードである子ノードの位置情報であるポインタとキーを有するものであり、該インデックスの構築において、エントリをノードに挿入する際、１つの挿入すべきエントリＸと、該エントリを挿入すべきレベル数Ｌ（前記リーフノードを０として階層が上がるごとに１増える階層数）が与えられた際、はじめにルートノードを操作対象ノードとし、操作対象ノードにおいて、「該操作対象ノードがすでに含む既存エントリに対し、該エントリＸを挿入した場合のペナルティ」に基づいて１つの該既存エントリを選出し、該既存エントリに含まれるポインタが指し示す子ノードを次の操作対象ノードと選定し、該次の操作対象ノードのレベル数がＬより大きい場合は、該次の操作対象ノードからさらに次の操作対象ノードを選定することを再帰的に繰り返し、該次の操作対象ノードのレベル数がＬに到達した場合は、このノードを選択ノードに決定するステップと、前記選択ノードが許容できる最大数のエントリを持っていない場合において、前記選択ノードに対し、前記エントリＸを追加するステップと、前記選択ノードが既に許容できる最大数のエントリを持っている場合において、前記エントリＸを追加しつつ前記選択ノードの分割を行うステップと、前記分割を行った場合において、前記選択ノードの上位ノードのエントリを更新するステップと、を含むことを特徴とする情報蓄積検索方法である。 In order to solve the above problem, the invention according to claim 1 is characterized in that a tuple as a unit of information to be stored and searched has one or more lengths including at least an array of attribute-value pairs. A plurality of one or more tuples having the same type or the same or different length are stored, a search key is indexed from the tuples, and a search tree is constructed. The node has a hierarchical structure, and the leaf node at the lowest layer of the node has a tuple identifier and key for identifying the tuple as an entry, and an inner node higher than the leaf node of the node It has a pointer and a key which are position information of child nodes which are lower nodes as an entry, and when the entry is inserted into the node in the construction of the index, one entry When the entry X to be entered and the number L of levels to insert the entry (the number of hierarchies incremented by 1 each time the hierarchy is increased with the leaf node set to 0), the root node is first set as the operation target node, and the operation target In the node, one existing entry is selected based on “the penalty when the entry X is inserted into the existing entry already included in the operation target node”, and the child node indicated by the pointer included in the existing entry is selected. When the next operation target node is selected, and the number of levels of the next operation target node is greater than L, it is recursively repeated to select the next operation target node from the next operation target node. When the number of levels of the operation target node reaches L, the step of determining this node as a selection node and the selection node are acceptable In the case of not having a large number of entries, the step of adding the entry X to the selected node; and in the case where the selected node already has the maximum allowable number of entries, the entry X is added. An information storage and retrieval method, comprising: dividing the selected node; and updating an entry of an upper node of the selected node when the division is performed.

請求項２記載の発明は、請求項１に記載の分割を行うステップが、前記選択ノードが含む複数のエントリに前記エントリＸを追加した各エントリを２つのグループに順次、振り分ける際に、各グループがすでに含むエントリの和をとったエントリを既存エントリＡとし、該振り分けられるエントリをエントリＢとしてグループ毎に求めた各前記ペナルティに基づいて振り分け先のグループを選択して振り分けを行うステップと、該２つのグループうちの１つのグループに振り分けられた各エントリを前記選択ノードのエントリとするとともに、もう１つのグループに振り分けられた各エントリを新たに生成したノードのエントリとするステップと、該新たに生成したノードのエントリを該ノードの親ノードに挿入するステップと、を含むことを特徴とする情報蓄積検索方法である。 According to a second aspect of the present invention, in the step of performing the division according to the first aspect, when each entry obtained by adding the entry X to a plurality of entries included in the selection node is sequentially allocated to two groups, each group Selecting the distribution destination group based on each penalty obtained for each group, with the entry that is the sum of the entries already included as the existing entry A and the entry to be distributed as the entry B; Each entry distributed to one of the two groups as an entry of the selected node, and each entry distributed to another group as an entry of a newly generated node; Inserting the generated node entry into the parent node of the node; The information storage retrieval method characterized.

請求項３記載の発明は、請求項１ないし２に記載の選択ノードを決定するステップが、前記操作対象ノードにおいて、前記ペナルティが最も小さい前記既存エントリを選出することを特徴とする情報蓄積検索方法である。 The invention according to claim 3 is characterized in that the step of determining the selection node according to claim 1 or 2 selects the existing entry having the smallest penalty in the operation target node. It is.

請求項４記載の発明は、請求項３に記載のペナルティが、前記操作対象ノードがすでに含む既存エントリＡに対し、エントリＢを挿入する際、該エントリＢを挿入する前の該エントリＡと、挿入した後の該エントリＡとを比べた場合の、該エントリＡが検索される確率に対応した値の増分を示す値として定義され、該エントリＡが検索される確率に対応した値として、該エントリＡに含まれるキーに含まれる各属性に対する重みづけ付き正規化被検索面積の和を用い、該重みづけ付き正規化被検索面積は、正規化被検索面積に、該属性が検索式に用いられる確率を掛け合わせた値であり、該正規化被検索面積は、数式（（ｑ−ａ）（ｂ−ｐ）−（ｑ−ｐ）^２／２）／（（ｂ−ａ）^２／２）（但し、ｐ、ｑはそれぞれ、該キーに含まれる該属性の最小値、最大値。ａ、ｂはそれぞれ、それまで該検索木に挿入された全タプル集合に含まれる該属性の最小値、最大値。）で規定される、ことを特徴とする情報蓄積検索方法である。 In the invention according to claim 4, when the penalty according to claim 3 inserts the entry B into the existing entry A already included in the operation target node, the entry A before the entry B is inserted; When compared with the entry A after insertion, it is defined as a value indicating an increment of a value corresponding to the probability that the entry A is searched, and as a value corresponding to the probability that the entry A is searched, The sum of the weighted normalized searched areas for the attributes included in the key included in the entry A is used. The weighted normalized searched area is used as the normalized searched area and the attribute is used as the search expression. a value obtained by multiplying the probabilities, the normalized target search area, the formula ((q-a) (b -p) - (q-p) 2/2) / ((b-a) 2/2 ) (Where p and q are each the genus included in the key) Minimum and maximum values of the sex, a and b are defined by the minimum and maximum values of the attributes included in all tuple sets inserted in the search tree until then. It is an accumulation search method.

請求項５記載の発明は、請求項３に記載のペナルティが、前記操作対象ノードがすでに含む既存エントリＡに対し、エントリＢを挿入する際、該エントリＢを挿入する前の該エントリＡと、挿入した後の該エントリＡとを比べた場合の、該エントリＡに含まれるキーに含まれる属性種類数の増分を示す値であることを特徴とする情報蓄積検索方法である。 In the invention according to claim 5, when the penalty according to claim 3 inserts the entry B into the existing entry A already included in the operation target node, the entry A before the entry B is inserted; The information storage and retrieval method is characterized in that the value indicates an increase in the number of attribute types included in the key included in the entry A when compared with the entry A after insertion.

請求項６記載の発明は、請求項１ないし２に記載の選択ノードを決定するステップが、２種の前記ペナルティを用い、前記操作対象ノードにおいて、第一のペナルティが最も小さい前記既存エントリを選出し、このとき該最も小さい既存エントリが複数存在する場合には、さらに、該最も小さい既存エントリのうち、第二のペナルティが最も小さい既存エントリを選出することを特徴とする情報蓄積検索方法である。 According to a sixth aspect of the present invention, the step of determining the selection node according to the first or second aspect uses the two types of penalties, and selects the existing entry having the smallest first penalty in the operation target node. At this time, when there are a plurality of the smallest existing entries, the information storage and retrieval method is characterized in that, among the smallest existing entries, an existing entry having the smallest second penalty is selected. .

請求項７記載の発明は、請求項６において、前記第一のペナルティが、前記操作対象ノードがすでに含む既存エントリＡに対し、エントリＢを挿入する際、該エントリＢを挿入する前の該エントリＡと、挿入した後の該エントリＡとを比べた場合の、該エントリＡに含まれるキーに含まれる属性種類数の増分を示す値であり、前記第二のペナルティが、前記操作対象ノードがすでに含む既存エントリＡに対し、エントリＢを挿入する際、該エントリＢを挿入する前の該エントリＡと、挿入した後の該エントリＡとを比べた場合の、該エントリＡが検索される確率に対応した値の増分を示す値であることを特徴とする情報蓄積検索方法である。 According to a seventh aspect of the present invention, in the sixth aspect, the entry before the entry B is inserted when the first penalty inserts the entry B into the existing entry A already included in the operation target node. A is a value indicating an increase in the number of attribute types included in the key included in the entry A when comparing the entry A after insertion, and the second penalty indicates that the operation target node Probability that the entry A is searched when the entry A is inserted into the existing entry A that is already included and the entry A before the entry B is inserted is compared with the entry A after the insertion. This is an information storage and retrieval method characterized by being a value indicating an increment of a value corresponding to.

請求項８記載の発明は、請求項７において、前記エントリが検索される確率に対応した値として、前記エントリに含まれるキーに含まれる各属性に対する前記重みづけ付き正規化被検索面積の和を用いることを特徴とする情報蓄積検索方法である。 The invention according to claim 8 provides the sum of the weighted normalized search areas for each attribute included in the key included in the entry as a value corresponding to the probability that the entry is searched. This is an information storage and retrieval method characterized by being used.

請求項９記載の発明は、請求項４または８において、前記属性が検索式に用いられる確率として、属性出現回数をタプル総数で除した値を用い、該属性出現回数は、それまでに該検索木に挿入された全タプル集合における該属性の出現回数であり、該タプル総数は、それまでに該検索木に挿入された全タプル集合の要素数である、ことを特徴とする情報蓄積検索方法である。 The invention according to claim 9 uses the value obtained by dividing the number of occurrences of the attribute by the total number of tuples as the probability that the attribute is used in the search formula according to claim 4 or 8, and the number of occurrences of the attribute is The number of appearances of the attribute in all tuple sets inserted in the tree, and the total number of tuples is the number of elements in all tuple sets inserted in the search tree so far It is.

請求項１０記載の発明は、請求項１ないし９に記載の前記検索木を用いた検索において、検索式としての前記キーである検索キーに対し、前記ルートノードからその下位の前記ノードへと検索対象を変えながら、１つの該ノードを検索対象とし、該ノードが前記インナーノードであれば、該ノードに含まれる前記各エントリについて、該エントリのキーが前記検索キーに含まれる各属性を全て含み、かつ、該検索キーに含まれる各属性に対応する値または値範囲と該エントリのキーに含まれる該属性に対応する値または値範囲とが一致または部分的に一致する（一方の値または値範囲が、他方の値範囲に含まれるまたは一部重なる）かを調べる検索手順を行い、一致または部分的に一致する場合には、該エントリの前記ポインタで接続されたノードを検索対象、該検索キーを検索キーとして前記ノード検索手順を再帰的に行い、また、該ノードが前記リーフノードであれば、該ノードに含まれる前記各エントリについて、該エントリのキーが前記検索キーに含まれる各属性を全て含み、かつ、前記検索キーに含まれる各属性に対応する値または値範囲に該エントリのキーに含まれる該属性に対応する値が含まれるかを調べ、含まれる場合には該エントリの前記タプル識別子を検索結果タプル識別子集合に加えるノード検索ステップ、を有することを特徴とする情報蓄積検索方法である。 A tenth aspect of the present invention is the retrieval using the retrieval tree according to any one of the first to ninth aspects, wherein the retrieval key which is the key as a retrieval formula is retrieved from the root node to the lower node. While changing the target, if one of the nodes is a search target and the node is the inner node, for each entry included in the node, the key of the entry includes all the attributes included in the search key. And the value or value range corresponding to each attribute included in the search key and the value or value range corresponding to the attribute included in the key of the entry match or partially match (one value or value) Search procedure to find out if the range is included or partially overlaps the other value range, and if it matches or partially matches, it is connected by the pointer of the entry The node search procedure is recursively performed using a search key as a search target and the search key as a search key. If the node is the leaf node, the key of the entry is set for each entry included in the node. Check whether all the attributes included in the search key are included, and whether the value corresponding to each attribute included in the search key or the value range corresponding to the attribute included in the key of the entry is included in the value range, And a node search step of adding the tuple identifier of the entry to a search result tuple identifier set if it is included.

請求項１１記載の発明は、蓄積検索対象の情報の単位であるタプルは、属性−値の組の並びを少なくとも含む１以上の長さを有するものであり、該属性の種類や該並びの長さが同じまたは異なる１つ以上の該タプルが複数記憶されるものであり、該タプルから検索キーがインデックス化され検索木が構築されるものであり、該検索木はノードを階層的に有する構成であり、該ノードの内最下層のリーフノードはエントリとして該タプルを識別するタプル識別子とキーを有するものであり、該ノードの内該リーフノードより上位のインナーノードはエントリとして下位ノードである子ノードの位置情報であるポインタとキーを有するものであり、該インデックスの構築において、エントリをノードに挿入する際、１つの挿入すべきエントリＸと、該エントリを挿入すべきレベル数Ｌ（前記リーフノードを０として階層が上がるごとに１増える階層数）が与えられた際、はじめにルートノードを操作対象ノードとし、操作対象ノードにおいて、「該操作対象ノードがすでに含む既存エントリに対し、該エントリＸを挿入した場合のペナルティ」に基づいて１つの該既存エントリを選出し、該既存エントリに含まれるポインタが指し示す子ノードを次の操作対象ノードと選定し、該次の操作対象ノードのレベル数がＬより大きい場合は、該次の操作対象ノードからさらに次の操作対象ノードを選定することを再帰的に繰り返し、該次の操作対象ノードのレベル数がＬに到達した場合は、このノードを選択ノードに決定するステップと、前記選択ノードが許容できる最大数のエントリを持っていない場合において、前記選択ノードに対し、前記エントリＸを追加するステップと、前記選択ノードが既に許容できる最大数のエントリを持っている場合において、前記エントリＸを追加しつつ前記選択ノードの分割を行うステップと、前記分割を行った場合において、前記選択ノードの上位ノードのエントリを更新するステップと、をコンピュータを用いて実行するための情報蓄積検索プログラムである。 According to the eleventh aspect of the present invention, a tuple which is a unit of information to be stored and searched has one or more lengths including at least an array of attribute-value pairs. The type of attribute and the length of the array A plurality of one or more tuples having the same or different lengths are stored, a search key is indexed from the tuples, and a search tree is constructed, and the search tree has nodes hierarchically The leaf node at the lowest layer of the node has a tuple identifier and key for identifying the tuple as an entry, and an inner node higher than the leaf node of the node is a child that is a lower node as an entry. A pointer and a key that are position information of the node, and when the entry is inserted into the node in the construction of the index, one entry X to be inserted; When the number L of levels to insert the entry (the number of hierarchies that increases by 1 every time the hierarchy is increased with the leaf node being 0) is given, the root node is first set as the operation target node. One existing entry is selected based on the penalty when the entry X is inserted for an existing entry already included in the node, and the child node indicated by the pointer included in the existing entry is selected as the next operation target node If the number of levels of the next operation target node is greater than L, recursively repeating the selection of the next operation target node from the next operation target node, and the number of levels of the next operation target node If L reaches L, the step of determining this node as the selected node and having the maximum number of entries allowed by the selected node If there is not, the step of adding the entry X to the selected node; and if the selected node already has the maximum allowable number of entries, dividing the selected node while adding the entry X An information storage / retrieval program for executing, using a computer, a step of performing and a step of updating an entry of an upper node of the selected node when the division is performed.

請求項１２記載の発明は、請求項１１に記載の分割を行うステップが、前記選択ノードが含む複数のエントリに前記エントリＸを追加した各エントリを２つのグループに順次、振り分ける際に、各グループがすでに含むエントリの和をとったエントリを既存エントリＡとし、該振り分けられるエントリをエントリＢとしてグループ毎に求めた各前記ペナルティに基づいて振り分け先のグループを選択して振り分けを行うステップと、該２つのグループうちの１つのグループに振り分けられた各エントリを前記選択ノードのエントリとするとともに、もう１つのグループに振り分けられた各エントリを新たに生成したノードのエントリとするステップと、該新たに生成したノードのエントリを該ノードの親ノードに挿入するステップと、を含むことを特徴とする情報蓄積検索プログラムである。 According to a twelfth aspect of the present invention, in the step of performing the division according to the eleventh aspect, when each entry obtained by adding the entry X to a plurality of entries included in the selection node is sequentially allocated to two groups, each group Selecting the distribution destination group based on each penalty obtained for each group, with the entry that is the sum of the entries already included as the existing entry A and the entry to be distributed as the entry B; Each entry distributed to one of the two groups as an entry of the selected node, and each entry distributed to another group as an entry of a newly generated node; Inserting the generated node entry into the parent node of the node. The information storage retrieval program characterized and.

請求項１３記載の発明は、請求項１１ないし１２に記載の選択ノードを決定するステップが、前記操作対象ノードにおいて、前記ペナルティが最も小さい前記既存エントリを選出することを特徴とする情報蓄積検索プログラムである。 According to a thirteenth aspect of the present invention, the step of determining a selection node according to the eleventh to twelfth aspects selects the existing entry having the smallest penalty in the operation target node. It is.

請求項１４記載の発明は、請求項１３に記載のペナルティが、前記操作対象ノードがすでに含む既存エントリＡに対し、エントリＢを挿入する際、該エントリＢを挿入する前の該エントリＡと、挿入した後の該エントリＡとを比べた場合の、該エントリＡが検索される確率に対応した値の増分を示す値として定義され、該エントリＡが検索される確率に対応した値として、該エントリＡに含まれるキーに含まれる各属性に対する重みづけ付き正規化被検索面積の和を用い、該重みづけ付き正規化被検索面積は、正規化被検索面積に、該属性が検索式に用いられる確率を掛け合わせた値であり、該正規化被検索面積は、数式（（ｑ−ａ）（ｂ−ｐ）−（ｑ−ｐ）^２／２）／（（ｂ−ａ）^２／２）（但し、ｐ、ｑはそれぞれ、該キーに含まれる該属性の最小値、最大値。ａ、ｂはそれぞれ、それまで該検索木に挿入された全タプル集合に含まれる該属性の最小値、最大値。）で規定される、ことを特徴とする情報蓄積検索プログラムである。 In the invention described in claim 14, when the penalty described in claim 13 inserts the entry B into the existing entry A already included in the operation target node, the entry A before the entry B is inserted; When compared with the entry A after insertion, it is defined as a value indicating an increment of a value corresponding to the probability that the entry A is searched, and as a value corresponding to the probability that the entry A is searched, The sum of the weighted normalized searched areas for the attributes included in the key included in the entry A is used. The weighted normalized searched area is used as the normalized searched area and the attribute is used as the search expression. a value obtained by multiplying the probabilities, the normalized target search area, the formula ((q-a) (b -p) - (q-p) 2/2) / ((b-a) 2/2 ) (However, p and q are included in the key. Minimum value and maximum value of the attribute, a and b are defined by the minimum value and maximum value of the attribute included in all tuple sets inserted in the search tree until then. This is an information storage and retrieval program.

請求項１５記載の発明は、請求項１３に記載のペナルティが、前記操作対象ノードがすでに含む既存エントリＡに対し、エントリＢを挿入する際、該エントリＢを挿入する前の該エントリＡと、挿入した後の該エントリＡとを比べた場合の、該エントリＡに含まれるキーに含まれる属性種類数の増分を示す値であることを特徴とする情報蓄積検索プログラムである。 In the invention of claim 15, when the penalty of claim 13 inserts the entry B into the existing entry A already included in the operation target node, the entry A before the entry B is inserted; An information storage and retrieval program characterized in that the value indicates an increment of the number of attribute types included in a key included in the entry A when compared with the entry A after insertion.

請求項１６記載の発明は、請求項１１ないし１２に記載の選択ノードを決定するステップが、２種の前記ペナルティを用い、前記操作対象ノードにおいて、第一のペナルティが最も小さい前記既存エントリを選出し、このとき該最も小さい既存エントリが複数存在する場合には、さらに、該最も小さい既存エントリのうち、第二のペナルティが最も小さい既存エントリを選出することを特徴とする情報蓄積検索プログラムである。 According to a sixteenth aspect of the present invention, the step of determining the selection node according to the eleventh to twelfth aspects uses the two types of penalties and selects the existing entry having the smallest first penalty in the operation target node. At this time, when there are a plurality of the smallest existing entries, the information storage and retrieval program is characterized in that, among the smallest existing entries, an existing entry having the smallest second penalty is selected. .

請求項１７記載の発明は、請求項１６において、前記第一のペナルティが、前記操作対象ノードがすでに含む既存エントリＡに対し、エントリＢを挿入する際、該エントリＢを挿入する前の該エントリＡと、挿入した後の該エントリＡとを比べた場合の、該エントリＡに含まれるキーに含まれる属性種類数の増分を示す値であり、前記第二のペナルティが、前記操作対象ノードがすでに含む既存エントリＡに対し、エントリＢを挿入する際、該エントリＢを挿入する前の該エントリＡと、挿入した後の該エントリＡとを比べた場合の、該エントリＡが検索される確率に対応した値の増分を示す値であることを特徴とする情報蓄積検索プログラムである。 According to a seventeenth aspect of the present invention, in the sixteenth aspect, the entry before the entry B is inserted when the first penalty inserts the entry B into the existing entry A already included in the operation target node. A is a value indicating an increase in the number of attribute types included in the key included in the entry A when comparing the entry A after insertion, and the second penalty indicates that the operation target node Probability that the entry A is searched when the entry A is inserted into the existing entry A that is already included and the entry A before the entry B is inserted is compared with the entry A after the insertion. An information storage and retrieval program characterized by being a value indicating an increment of a value corresponding to.

請求項１８記載の発明は、請求項１７において、前記エントリが検索される確率に対応した値として、前記エントリに含まれるキーに含まれる各属性に対する前記重みづけ付き正規化被検索面積の和を用いることを特徴とする情報蓄積検索プログラムである。 The invention according to claim 18 is the sum of the weighted normalized search areas for each attribute included in the key included in the entry as a value corresponding to the probability that the entry is searched. An information storage and retrieval program characterized by being used.

請求項１９記載の発明は、請求項１４または１８において、前記属性が検索式に用いられる確率として、属性出現回数をタプル総数で除した値を用い、該属性出現回数は、それまでに該検索木に挿入された全タプル集合における該属性の出現回数であり、該タプル総数は、それまでに該検索木に挿入された全タプル集合の要素数である、ことを特徴とする情報蓄積検索プログラムである。 The invention according to claim 19 uses the value obtained by dividing the number of occurrences of attributes by the total number of tuples as the probability that the attribute is used in the search expression according to claim 14 or 18, wherein the number of occurrences of attributes An information storage and retrieval program characterized in that the number of occurrences of the attribute in all tuple sets inserted in the tree, and the total number of tuples is the number of elements in all tuple sets inserted in the search tree so far It is.

請求項２０記載の発明は、請求項１１ないし１９に記載の前記検索木を用いた検索において、検索式としての前記キーである検索キーに対し、前記ルートノードからその下位の前記ノードへと検索対象を変えながら、１つの該ノードを検索対象とし、該ノードが前記インナーノードであれば、該ノードに含まれる前記各エントリについて、該エントリのキーが前記検索キーに含まれる各属性を全て含み、かつ、該検索キーに含まれる各属性に対応する値または値範囲と該エントリのキーに含まれる該属性に対応する値または値範囲とが一致または部分的に一致する（一方の値または値範囲が、他方の値範囲に含まれるまたは一部重なる）かを調べる検索手順を行い、一致または部分的に一致する場合には、該エントリの前記ポインタで接続されたノードを検索対象、該検索キーを検索キーとして前記ノード検索手順を再帰的に行い、また、該ノードが前記リーフノードであれば、該ノードに含まれる前記各エントリについて、該エントリのキーが前記検索キーに含まれる各属性を全て含み、かつ、前記検索キーに含まれる各属性に対応する値または値範囲に該エントリのキーに含まれる該属性に対応する値が含まれるかを調べ、含まれる場合には該エントリの前記タプル識別子を検索結果タプル識別子集合に加えるノード検索ステップ、を有することを特徴とする情報蓄積検索プログラムである。 The invention according to claim 20 is the search using the search tree according to any one of claims 11 to 19, wherein the search key which is the key as a search expression is searched from the root node to the node below it. While changing the target, if one of the nodes is a search target and the node is the inner node, for each entry included in the node, the key of the entry includes all the attributes included in the search key. And the value or value range corresponding to each attribute included in the search key and the value or value range corresponding to the attribute included in the key of the entry match or partially match (one value or value) The search procedure is performed to check whether the range is included or partially overlaps the other value range. If the range matches or partially matches, the pointer of the entry is connected. If the node is a search target and the search key is a search key, the node search procedure is recursively performed. If the node is the leaf node, the key of the entry is set for each entry included in the node. Check whether all the attributes included in the search key are included, and whether the value corresponding to each attribute included in the search key or the value range corresponding to the attribute included in the key of the entry is included in the value range, If it is included, a node search step of adding the tuple identifier of the entry to a search result tuple identifier set is an information storage search program.

本発明によれば、多種多次元のタプルに対し、統一的に１つの木構造インデックスを構築する際に、操作対象ノードにおいて、「該操作対象ノードがすでに含む既存エントリに対し、該エントリＸを挿入した場合のペナルティ」に基づいて１つの既存エントリを選出してエントリをノードに挿入するようにした。このペナルティとして、例えば、エントリに含まれるキーに含まれる属性種類数の増加量や、エントリ挿入前後での検索される確率に対応する値（すなわち検索される確率ないし検索される確率の近似値）の増加量を用いることで、ペナルティを、次元毎に大小関係を判別する必要はなく、エントリが含むキーの次元数や種類が異なっても定義することができる。これにより、多種多次元のタプル集合を蓄積・検索する際の、複数のインデックスを用いることによるオーバーヘッド、すなわち処理量や記憶容量、処理速度が大きくなることを抑え、また多種多次元のタプル集合に対して検索効率を向上させるクラスタリングを実現することが可能となる。 According to the present invention, when constructing a single tree structure index uniformly for various multidimensional tuples, in the operation target node, “the entry X is assigned to the existing entry already included in the operation target node. One existing entry is selected based on the “penalty for insertion” and the entry is inserted into the node. As this penalty, for example, an increase in the number of attribute types included in the key included in the entry, and a value corresponding to the search probability before and after the entry insertion (that is, a search probability or an approximate value of the search probability) By using this increase amount, it is not necessary to determine the magnitude relationship for each dimension, and the penalty can be defined even if the number of key dimensions and types included in the entry are different. This suppresses the overhead caused by using multiple indexes when accumulating / retrieving various multi-dimensional tuple sets, that is, the amount of processing, storage capacity, and processing speed, and increases the number of multi-dimensional tuple sets. On the other hand, it is possible to realize clustering that improves search efficiency.

また、ペナルティの算出において、エントリが検索される確率に対応した値として、エントリに含まれるキーに含まれる各属性に対する重みづけ付き正規化被検索面積の和を用い、該重みづけ付き正規化被検索面積は、該正規化被検索面積に、該属性が検索式に用いられる確率を掛け合わせた値であり、該正規化被検索面積は、数式（（ｑ−ａ）（ｂ−ｐ）−（ｑ−ｐ）^２／２）／（（ｂ−ａ）^２／２）（但し、ｐ、ｑはそれぞれ、該キーに含まれる該属性の最小値、最大値。ａ、ｂはそれぞれ、それまで該検索木に挿入された全タプル集合に含まれる該属性の最小値、最大値。）で規定されるようにすることで、あるいは、ペナルティとして、操作対象ノードがすでに含む既存エントリＡに対し、エントリＢを挿入する際、該エントリＢを挿入する前の該エントリＡと、挿入した後の該エントリＡとを比べた場合の、該エントリＡに含まれるキーに含まれる属性種類数の増分を示す値でとすることで、あるエントリにそれまで含まれていなかった属性がタプルの挿入により該エントリに新たに追加された場合であっても、ペナルティが発生することとなるため、属性種類数が増加する状態遷移、すなわちエントリの次元増加を抑制し、「次元の呪い」の問題が発生することを回避することができる。 In calculating the penalty, the sum of the weighted normalized search areas for each attribute included in the key included in the entry is used as a value corresponding to the probability that the entry is searched, and the weighted normalized search target is used. The search area is a value obtained by multiplying the normalized search area by the probability that the attribute is used in the search formula, and the normalized search area is expressed by the formula ((q−a) (b−p) − ^{(q-p) 2/2} ) / ((b-a) 2/2) ( where, p, respectively q is the attribute minimum value of which is included in the key, the maximum value .a, b, respectively, it Until it is specified by the minimum and maximum values of the attributes included in all the tuple sets inserted in the search tree until, or as a penalty for the existing entry A already included in the operation target node When entry B is inserted, entry B When the entry A before the insertion is compared with the entry A after the insertion, a value indicating the increment of the number of attribute types included in the key included in the entry A is obtained. Even if an attribute that was not included in the previous is newly added to the entry by inserting a tuple, a penalty will occur, so the state transition that increases the number of attribute types, that is, the dimension of the entry The increase can be suppressed and the problem of “curse of dimension” can be avoided.

本発明が特徴とするペナルティの算出手法を説明するための模式図である。It is a schematic diagram for demonstrating the penalty calculation method characterized by the present invention. 本発明の情報蓄積検索方法を実行するための情報蓄積検索システム全体図を示す図である。It is a figure which shows the information storage search system whole figure for performing the information storage search method of this invention. 図２のシステムで用いられるタプルの例を示す図である。It is a figure which shows the example of the tuple used with the system of FIG. 図２のシステムで用いられる検索式の例を示す図である。It is a figure which shows the example of the search formula used with the system of FIG. 図２のサーバ・コンピュータ装置２０１の構成例を示すブロック図である。FIG. 3 is a block diagram illustrating a configuration example of a server computer apparatus 201 in FIG. 2. 図２のシステムで用いられるインデックスのデータ構造の例を示す図である。It is a figure which shows the example of the data structure of the index used with the system of FIG. 図２のシステムで用いられるタプル表の例を示す図である。It is a figure which shows the example of the tuple table used with the system of FIG. 図２のシステムで用いられるリーフノードのエントリ表の例を示す図である。It is a figure which shows the example of the entry table of the leaf node used with the system of FIG. 図２のシステムで用いられるインナーノードのエントリ表の例を示す図である。It is a figure which shows the example of the entry table of the inner node used with the system of FIG. 図２のシステムで用いられる属性表の例を示す図である。It is a figure which shows the example of the attribute table used with the system of FIG. 図２のクライアント・コンピュータ装置３０１の構成例を示すブロック図である。FIG. 3 is a block diagram illustrating a configuration example of a client computer device 301 in FIG. 2. 図２のシステムにおけるエントリ挿入手順（Ｉｎｓｅｒｔ）のフローチャートである。3 is a flowchart of an entry insertion procedure (Insert) in the system of FIG. 図２のシステムにおける部分木選択手順（ＣｈｏｏｓｅＳｕｂｔｒｅｅ）のフローチャートである。3 is a flowchart of a subtree selection procedure (ChooseSubtree) in the system of FIG. 2. 図１５および図１６とあわせて図２のシステムにおけるノード分割手順（Ｓｐｌｉｔ）を示すフローチャートである。17 is a flowchart showing a node division procedure (Split) in the system of FIG. 2 together with FIGS. 15 and 16. 図１４および図１６とあわせて図２のシステムにおけるノード分割手順（Ｓｐｌｉｔ）を示すフローチャートである。17 is a flowchart showing a node division procedure (Split) in the system of FIG. 2 together with FIGS. 14 and 16. 図１４および図１５とあわせて図２のシステムにおけるノード分割手順（Ｓｐｌｉｔ）を示すフローチャートである。16 is a flowchart showing a node division procedure (Split) in the system of FIG. 2 together with FIGS. 14 and 15. 図２のシステムにおけるノード分割手順によるノード分割例１を示す図である。It is a figure which shows the node division example 1 by the node division | segmentation procedure in the system of FIG. 図２のシステムにおけるノード分割手順によるノード分割例２を示す図である。It is a figure which shows the node division example 2 by the node division | segmentation procedure in the system of FIG. 図２のシステムにおけるキー集合情報調整手順（ＡｄｊｕｓｔＫｅｙｓ）のフローチャートである。It is a flowchart of the key set information adjustment procedure (AdjustKeys) in the system of FIG. 図２のシステムにおけるノード検索手順（Ｓｅａｒｃｈ）のフローチャートである。It is a flowchart of the node search procedure (Search) in the system of FIG.

以下、図面を参照して本発明の実施形態について説明する。本発明の情報蓄積検索方法は、検索木を構築する際に用いられるペナルティが、ノード選択処理等の場合に操作対象となるノードがすでに含む既存エントリＡに対し、エントリＢを挿入するとき、該エントリＢを挿入する前の該エントリＡと、挿入した後の該エントリＡとを比べたときの、該エントリＡが検索される確率に対応した値の増分を示す値として定義される、ことを特徴としている。本実施形態では、このエントリが検索される確率に対応した値の増分を示す値として、エントリに含まれるキーに含まれる各属性に対する重みづけ付き正規化被検索面積の和の増加量を用いることとしている。そこで、まず、本実施形態におけるペナルティの算出手法について図１を参照して詳細に説明する。 Hereinafter, embodiments of the present invention will be described with reference to the drawings. In the information storage and retrieval method of the present invention, when the penalty used when constructing the retrieval tree is to insert the entry B into the existing entry A that is already included in the node to be operated in the case of node selection processing or the like, It is defined as a value indicating an increment of a value corresponding to the probability that the entry A is searched when the entry A before the entry B is inserted and the entry A after the insertion B are compared. It is a feature. In the present embodiment, as the value indicating the increment of the value corresponding to the probability that this entry is searched, an increase amount of the sum of the weighted normalized search area for each attribute included in the key included in the entry is used. It is said. First, the penalty calculation method in this embodiment will be described in detail with reference to FIG.

本実施形態においては、検索木のペナルティとして、エントリに含まれるキーに含まれる各属性に対する重みづけ付き正規化被検索面積の和の増加量を用いることとした。このエントリに含まれるキーに含まれる各属性に対する重みづけ付き正規化被検索面積の和の増加量について、数式を示して説明する。 In this embodiment, as the penalty of the search tree, an increase amount of the sum of the weighted normalized search areas for each attribute included in the key included in the entry is used. An increase amount of the sum of the weighted normalized search areas for the attributes included in the key included in the entry will be described with reference to mathematical expressions.

（数式の定義）
今、ある属性ｘａの定義域がＲＡからＲＢまでであり、あるエントリＥが保持するキーに属性ｘａが値範囲をｐからｑとして含まれていた場合を考える。この場合、属性ｘａの値範囲を指定した検索条件は、指定する値範囲が属性ｘａの定義域内であるとすると、図１の三角形内の一点に対応する。図１は、エントリＥが属性ｘａの値範囲を指定して検索される確率を模式的に説明するための座標系であり、属性ｘａを指定した検索条件について、横軸に値範囲の始点、縦軸に値範囲の終点をとってその検索条件に対応する座標を表している。例えばｘａ１〜ｘａ２の値範囲を指定した検索条件であれば、図１中の点Ｘａに対応する。このとき、該エントリＥ以下に該当するタプルを含みうる検索条件、すなわちエントリＥにアクセスする必要がある検索条件は、図１の斜線部内の一点に対応する。 (Formula definition)
Consider a case where the domain of a certain attribute xa is from RA to RB, and the attribute xa is included in a key held by a certain entry E as a value range from p to q. In this case, the search condition specifying the value range of the attribute xa corresponds to one point in the triangle of FIG. 1 if the specified value range is within the domain of the attribute xa. FIG. 1 is a coordinate system for schematically explaining the probability that an entry E is searched by specifying a value range of the attribute xa. The search condition specifying the attribute xa is the start point of the value range on the horizontal axis. The vertical axis represents the end point of the value range, and represents the coordinates corresponding to the search condition. For example, if the search condition specifies the value range of xa1 to xa2, it corresponds to the point Xa in FIG. At this time, a search condition that can include a tuple corresponding to the entry E or lower, that is, a search condition that needs to access the entry E corresponds to one point in the shaded portion in FIG.

ここで、数式Ｉを、以下の通り定義する。 Here, Formula I is defined as follows.

（（ｑ−ａ）（ｂ−ｐ）−（ｑ−ｐ）^２／２）／（（ｂ−ａ）^２／２）（式Ｉ） ((Q-a) (b -p) - (q-p) 2/2) / ((b-a) 2/2) ( Formula I)

ｐ、ｑはそれぞれ、キーに含まれる該属性の最小値、最大値であり、また、ａ、ｂはそれぞれ、それまで検索木に挿入された全タプル集合に含まれる属性の最小値、最大値を表すものである。数式Ｉは、正規化被検索面積を表す。 p and q are the minimum value and maximum value of the attribute included in the key, respectively, and a and b are the minimum value and maximum value of the attribute included in all tuple sets inserted in the search tree until then. Is expressed. Formula I represents the normalized search area.

属性ｘａを指定する検索が行われる際、その値範囲がｘｓｔａｒｔ〜ｘｅｎｄである確率が、確率密度関数Ｘｐ（ｘｓｔａｒｔ、ｘｅｎｄ）で与えられるとすると、属性ｘａを指定する検索においてエントリＥがアクセスされる確率は、斜線部に渡り確率密度関数Ｘｐを積分した値に等しい。 When a search specifying the attribute xa is performed, if the probability that the value range is xstart to xend is given by the probability density function Xp (xstart, xend), the entry E is accessed in the search specifying the attribute xa Is equal to the value obtained by integrating the probability density function Xp across the shaded area.

しかしながら、通常、確率度密度分布Ｘｐを事前に知ることは難しい。そこで、図１の三角形内のどの点に対応する検索条件も等しい確率で発生しうると仮定すると、エントリＥがアクセスされる確率は、（（斜線部の面積）／（三角形の面積））となる。すなわち、該確率は次式で与えられる。 However, it is usually difficult to know the probability density distribution Xp in advance. Therefore, assuming that the search condition corresponding to any point in the triangle in FIG. 1 can occur with an equal probability, the probability that the entry E is accessed is ((shaded area) / (triangle area)). Become. That is, the probability is given by

（（ｑ−ＲＡ）（ＲＢ−ｐ）−（ｑ−ｐ）^２／２）／（（ＲＢ−ＲＡ）^２／２） ((Q-RA) (RB -p) - (q-p) 2/2) / ((RB-RA) 2/2)

また、定義域ＲＡ、ＲＢが不明な場合には、それまでにインデックスに登録されたタプル集合における、属性ｘａについての最小値ａ、最大値ｂをそれぞれＲＡ、ＲＢの代わりに用いることが考えられる。このとき、斜線部の面積は次式Ｓで与えられる。 When the definition areas RA and RB are unknown, it is considered that the minimum value a and the maximum value b for the attribute xa in the tuple set registered in the index so far are used instead of RA and RB, respectively. . At this time, the area of the shaded area is given by the following equation S.

（（ｑ−ａ）（ｂ−ｐ）−（ｑ−ｐ）^２／２）／（（ｂ−ａ）^２／２）（式Ｓ） ((Q-a) (b -p) - (q-p) 2/2) / ((b-a) 2/2) ( Equation S)

式Ｓが該数式Ｉに等しい式であり、これにより求められる値を正規化被検索面積と呼ぶ。 The expression S is an expression equal to the expression I, and a value obtained by this is called a normalized search area.

（効果の説明）
ある属性が検索式に用いられ、さらに該検索式において該属性の値範囲を指定した範囲条件が用いられている場合、該値範囲の始点、終点を示す２値が該属性の定義域内で一様の確率で選ばれると仮定すると、該エントリにアクセスする必要がある確率は数式Ｉで近似される。数式Ｉで求められる値が該正規化被検索面積である。重みづけ付き正規化被検索面積は、さらに該属性が検索式に用いられる確率を正規化被検索面積に乗じたものである。 (Explanation of effect)
When a certain attribute is used in the search expression and a range condition specifying the value range of the attribute is used in the search expression, two values indicating the start point and end point of the value range are identical in the definition area of the attribute. Assuming that it is chosen with a similar probability, the probability that the entry needs to be accessed is approximated by Equation I. The value obtained by Expression I is the normalized search area. The weighted normalized search area is obtained by multiplying the normalized search area by the probability that the attribute is used in the search expression.

従って、エントリに含まれるキーに含まれる各属性に対する重みづけ付き正規化被検索面積の和は、範囲検索時に、検索条件に適合するタプルを探すために該エントリにアクセスしなければならない確率の近似値に相当する。該重み付け付き正規化被検索面積の和をペナルティとして用い、ペナルティをなるべく小さくする戦略の下、検索木を構築することにより、検索時にアクセスしなければならないエントリ数を総じて小さくすることが可能となる。すなわち、検索の処理量、処理速度を向上させることが可能となる。 Therefore, the sum of the weighted normalized search area for each attribute included in the key included in the entry is an approximation of the probability that the entry must be accessed in order to search for a tuple that matches the search condition. Corresponds to the value. By using the sum of the weighted normalized search areas as a penalty and constructing a search tree under the strategy of reducing the penalty as much as possible, the number of entries that must be accessed during the search can be reduced as a whole. . That is, it is possible to improve the search processing amount and processing speed.

また数式Ｉにおいて、ｐ＝ｑの場合、すなわちエントリＥが保持する属性ｘａの値範囲の始点、終点が等しい場合を考えると、 In Equation I, when p = q, that is, when the start point and end point of the value range of the attribute xa held by the entry E are equal,

（（ｑ−ａ）（ｂ−ｐ）−（ｑ−ｐ）^２／２）／（（ｂ−ａ）^２／２）
＝（ｐ−ａ）（ｂ−ｐ）／（（ｂ−ａ）^２／２）（式Ｔ） ((Q-a) (b -p) - (q-p) 2/2) / ((b-a) 2/2)
= (P-a) (b -p) / ((b-a) 2/2) ( Equation T)

となる。通常、ｂ＞ａであり、また多くの場合ｐ＞ａかつｂ＞ｐとであるから、多くの場合（式Ｔの右辺）＞０となる。すなわち、属性ｘａの値範囲の始点、終点が等しい場合でも、当該属性の存在によりペナルティが発生することとなる。 It becomes. Usually, b> a, and in many cases p> a and b> p, so in many cases (right side of Expression T)> 0. That is, even when the start point and the end point of the value range of the attribute xa are equal, a penalty occurs due to the presence of the attribute.

あるエントリにそれまで含まれていなかった属性が、タプルの挿入により該エントリに新たに追加された場合、該タプルの該属性が値範囲でなく値である場合には、該エントリは該属性の値範囲の始点、終点が等しい状態となる。この状態への変化には、上述の通りペナルティの発生を伴うため、本発明においては、このような属性種類数が増加する状態遷移、すなわちエントリの次元増加は抑制される。 If an attribute that was not previously included in an entry is newly added to the entry by inserting a tuple, and if the attribute of the tuple is a value rather than a value range, the entry The start point and end point of the value range are equal. Since the change to this state is accompanied by a penalty as described above, in the present invention, such a state transition in which the number of attribute types increases, that is, an increase in the dimension of an entry is suppressed.

一方、Ｒ木で用いられるペナルティは、エントリが保持する属性の値範囲の長さの増分、あるいはそれを一辺とする超立方の体積の増分であるため、値範囲の始点、終点が等しい場合にはペナルティは０となる。すなわち、挿入するタプルの属性が値範囲でなく値をもつ場合には、次元増加への抑制力が全く働かない。 On the other hand, the penalty used in the R-tree is an increase in the length of the value range of the attribute held by the entry, or an increase in the volume of a hypercube with one side as a side, so that the start point and end point of the value range are equal. Will have a penalty of zero. That is, when the attribute of the tuple to be inserted has a value rather than a value range, the suppression force to the dimension increase does not work at all.

このように本発明は、各エントリの次元増加を抑制し、これにより次元数が大きくなって「次元の呪い」の問題が発生することを防ぐことを可能とする。 As described above, the present invention can suppress the increase in the dimension of each entry, thereby preventing the problem of “curse of dimension” due to an increase in the number of dimensions.

また、以下の実施形態では、検索木のペナルティとして、重みづけ付き正規化被検索面積の和の増加量に基づくペナルティに加えて、エントリに含まれるキーに含まれる属性種類数の増加量を用いることとした。これにより、該重みづけ付き正規化被検索面積の和の増加量のみをペナルティとした場合以上に、さらに強力にエントリの次元増加が抑えられ、「次元の呪い」の問題を抑制することができる。 Further, in the following embodiment, as the penalty of the search tree, in addition to the penalty based on the increase amount of the sum of the weighted normalized search areas, the increase amount of the attribute type included in the key included in the entry is used. It was decided. As a result, the dimensionality of the entry can be suppressed more strongly and the problem of “curse of dimension” can be suppressed more than when only the increased amount of the sum of the weighted normalized search areas is used as a penalty. .

以下の実施形態では、検索木のペナルティを１つ、あるいは、２つ使用する。すなわち、第一のペナルティとして、該「エントリに含まれるキーに含まれる属性種類数の増加量」を用い、第二のペナルティとして、エントリ挿入前後での検索される確率ないし検索される確率の近似値の増加量、すなわち該「エントリに含まれるキーに含まれる各属性に対する重みづけ付き正規化被検索面積の和の増加量」を用いる。第一のペナルティは強い次元増加低減効果を、第二のペナルティはある程度の次元増加低減効果を有するので、部分木選択手順ないしノード分割手順において、いずれかのペナルティを使用するだけでも、「次元の呪い」問題を抑圧することができる。さらには、部分木選択手順において、第一のペナルティが最も小さいエントリが複数存在する場合に、それらの間で第二のペナルティを比較し、第二のペナルティが最も小さいエントリを選ぶこととし、またノード分割手順において、第一のペナルティを分類の指標としてエントリを分類し、第一のペナルティの指標が同じ値になる場合にはさらに第二のペナルティを分類の指標としてエントリを分類することとし、具体的には、各エントリについて、グループＧ１へ入れた場合の第一のペナルティとグループＧ２へ入れた場合の第一のペナルティの差分が最も大きいエントリから順に、第一のペナルティが小さいグループへ分類し、差分が最も大きいエントリが複数存在する場合に、それらの間で第一のペナルティが小さいグループへ分類した場合の第二のペナルティを比較し、第二のペナルティが最も小さいエントリから順に、第一のペナルティが小さいグループへ分類することとすれば、第一のペナルティによる次元増加低減効果と、第二のペナルティによる検索効率化効果を合わせ持った、検索木構築方法となる。特に、第二のペナルティとして該「エントリに含まれるキーに含まれる各属性に対する重みづけ付き正規化被検索面積の和の増加量」を用いた場合には、第一のペナルティが属性の種類だけを考慮するのに対し、第二のペナルティによる「各属性（次元）の値までを考慮することによる検索効率化効果」を合わせ持った、検索木構築方法となると言えるので、より効果的である。 In the following embodiments, one or two search tree penalties are used. That is, as the first penalty, the “increase in the number of attribute types included in the key included in the entry” is used, and as the second penalty, the search probability before and after the entry insertion or an approximation of the search probability The amount of increase in value, that is, “the amount of increase in the sum of the weighted normalized search areas for each attribute included in the key included in the entry” is used. Since the first penalty has a strong dimensional increase reduction effect and the second penalty has a certain dimensional increase reduction effect, even if any penalty is used in the subtree selection procedure or node splitting procedure, It can suppress the “curse” problem. Further, in the subtree selection procedure, when there are a plurality of entries having the smallest first penalty, the second penalty is compared among them, and the entry having the smallest second penalty is selected. In the node splitting procedure, the entry is classified using the first penalty as the classification index, and if the first penalty index is the same value, the entry is further classified using the second penalty as the classification index. Specifically, for each entry, the entry having the largest difference between the first penalty when entering the group G1 and the first penalty when entering the group G2 is classified into the group having the smallest first penalty. If there are multiple entries with the largest difference, classify them into groups with the smallest first penalty between them. If the second penalty is compared, and the entries with the smallest second penalty are sorted into the group with the smallest first penalty, the dimensional increase reduction effect by the first penalty and the second penalty This is a search tree construction method that combines the search efficiency improvement effects of the penalty. In particular, when the “increase in the sum of weighted normalized search areas for each attribute included in the key included in the entry” is used as the second penalty, the first penalty is only the attribute type. Is more effective because it can be said to be a search tree construction method that combines the "search efficiency improvement effect by considering the value of each attribute (dimension)" by the second penalty. .

図２は、本発明の一実施形態による情報蓄積検索システムを示す概略ブロック図である。情報蓄積検索システムは、ネットワーク１０１を介して接続された、情報、すなわちタプルの蓄積・検索を行うサーバ・コンピュータ装置２０１と、タプルないし検索式の作成とサーバ・コンピュータ装置２０１とのタプルないし検索式ないし検索結果の送受信を行うクライアント・コンピュータ装置３０１とからなる。ネットワーク１０１は、インターネットなどの公衆網、ＬＡＮ（ＬｏｃａｌＡｒｅａＮｅｔｗｏｒｋ）、専用線などの私設網からなる。なお、図２では簡単のためクライアント・コンピュータ装置３０１を１台のみ記載したが、それに限らず複数台設けても良い。 FIG. 2 is a schematic block diagram showing an information storage / retrieval system according to an embodiment of the present invention. The information storage / retrieval system includes a server / computer device 201 that stores and retrieves information, that is, tuples, and a tuple / search formula between the server / computer device 201 and the creation of tuples / search formulas. Or a client computer 301 that transmits and receives search results. The network 101 includes a public network such as the Internet, a private network such as a LAN (Local Area Network), a dedicated line. In FIG. 2, only one client computer device 301 is shown for simplicity, but a plurality of client computer devices 301 may be provided.

なお、サーバ・コンピュータ装置２０１およびクライアント・コンピュータ装置３０１は、コンピュータ及びその周辺装置と、そのコンピュータによって実行されるプログラムとを用いて実現することができる。また、そのプログラムは、コンピュータ読み取り可能な記録媒体や通信回線を介して提供することが可能である。 The server computer device 201 and the client computer device 301 can be realized using a computer and its peripheral devices and a program executed by the computer. The program can be provided via a computer-readable recording medium or a communication line.

本実施形態では、タプルを蓄積し、検索式によりタプルを検索する。タプルは、１つ以上の属性と値のペア（ＡＶペア）から構成される。蓄積するタプルのＡＶペアの数やその属性の種類は様々である。図３にタプルの例を示す。図３（ａ）は、３次元データのタプルの例であり、この例では、「機器種別」、「使用者」および「使用開始年月日」の各属性に、「ＰＣ」、「○○太郎」および「２００５／０４／０１」の各値がそれぞれ対（あるいは組）をなすことでタプルが構成されている。図３（ｂ）は、７次元データのタプルの例であり、この例では、「機器種別」、「使用者」、「緯度」、「経度」、「センシング時刻」、「センサＩＤ」、および「センサ値」の各属性に、「温度センサ」、「○○太郎」、「１２６０３０」、「５０１０１８」、「２００８／１１／１１１２：３４：５６」、「１２３４５６」および「２３．５」の各値がそれぞれ対をなすことでタプルが構成されている。ここで、図３（ｂ）のタプルでは、緯度、経度の単位は秒であり、北緯と東経が正、南緯と西経が負で表現されることとしている。 In the present embodiment, tuples are accumulated, and tuples are searched using a search formula. A tuple is composed of one or more attribute / value pairs (AV pairs). The number of tuple AV pairs to be stored and the types of their attributes vary. FIG. 3 shows an example of a tuple. FIG. 3A is an example of a tuple of three-dimensional data. In this example, “PC”, “XX” are assigned to the “device type”, “user”, and “use start date” attributes. Tuples are configured by pairs (or sets) of values of “Taro” and “2005/04/01”. FIG. 3B is an example of a 7-dimensional data tuple. In this example, “device type”, “user”, “latitude”, “longitude”, “sensing time”, “sensor ID”, and For each attribute of “sensor value”, “temperature sensor”, “Taro XX”, “126030”, “501018”, “2008/11/11 12:34:56”, “123456”, and “23.5” Tuples are formed by pairs of values. Here, in the tuple of FIG. 3B, the unit of latitude and longitude is seconds, and north latitude and east longitude are expressed as positive, and south latitude and west longitude are expressed as negative.

また、検索式は、１つ以上の属性と値ないし値の範囲のペアから構成される。図４に検索式の例を示す。検索式は、検索条件を満たすタプルを検索結果として得るための記述であり、図４に示す例では属性「機器種別」の値が「温度センサ」であり、属性「緯度」の値が値の範囲「１２６０００〜１２７０００」内にあり、属性「経度」の値が値の範囲「５０１０００〜５０１１００」内にあるタプルを検索結果とするものである。検索時には、検索式に含まれる全ての属性を含み、かつ該属性の値が検索式に含まれる該属性の値ないし値の範囲内にあるタプルを検索結果とする。例えば、図４の検索式による検索においては、図３の（ｂ）のタプルが検索式に適合し、検索結果に含まれることになる。 The search expression is composed of one or more attributes and a value or value range pair. FIG. 4 shows an example of a search expression. The search formula is a description for obtaining a tuple satisfying the search condition as a search result. In the example shown in FIG. 4, the value of the attribute “device type” is “temperature sensor”, and the value of the attribute “latitude” is a value. Tuples that are in the range “126000 to 127000” and whose attribute “longitude” value is in the value range “501000 to 501100” are used as the search results. When searching, tuples that include all the attributes included in the search expression and whose attribute value is within the attribute value or value range included in the search expression are used as search results. For example, in the search using the search formula of FIG. 4, the tuple of FIG. 3B matches the search formula and is included in the search result.

次に、サーバ・コンピュータ装置２０１の内部構成について説明する。
図５は、サーバ・コンピュータ装置２０１の構成を示すブロック図である。サーバ・コンピュータ装置２０１は、クライアント・コンピュータ装置３０１からのタプル蓄積要求／タプル検索要求の受信または検索結果の送信を行う通信部２０２と、クライアント・コンピュータ装置３０１からの要求に応じて、インデックス構築を含む蓄積処理やインデックスを用いた検索処理を行う演算部２０３と、インデックスと属性表とタプル総数とタプル表とを保持する記憶部２０４とを備える。 Next, the internal configuration of the server computer apparatus 201 will be described.
FIG. 5 is a block diagram showing a configuration of the server computer apparatus 201. The server computer apparatus 201 receives a tuple accumulation request / tuple search request from the client computer apparatus 301 or transmits a search result, and builds an index in response to a request from the client computer apparatus 301. And a storage unit 204 that stores an index, an attribute table, a total number of tuples, and a tuple table.

また、本実施形態では、記憶部２０４内における、検索式で示された検索条件を満たすタプルの格納位置を決定する際に、インデックスを用いたファイルアクセス法が採用されている。インデックスのデータ構造としては、背景技術で述べたような種々の検索木を用いることができる。本実施形態で用いられるインデックスのデータ構造の一例を図６に示す。インデックスのデータ構造は複数のノードと、階層的にそれらをつなぐ枝で表される木構造である。最も上位のノードをルートノードと呼び、最も下位のノードをリーフノードと呼ぶ。またリーフノード以外のノードをインナーノードと呼ぶ。図６に示す例では、４個のインナーノード４０２と、７個のリーフノード４０３とから検索木が構成されている。インナーノード４０２の最上位のノードがルートノード４０１である。各インナーノード４０２には枝で繋がれた他のノードへのポインタと、後述するエントリ表５０２とが保持されている。各リーフノード４０３には枝で繋がれた他のインナーノード４０２へのポインタと、後述するエントリ表５０３とが保持されている。 Further, in the present embodiment, a file access method using an index is adopted when determining the storage position of the tuple satisfying the search condition indicated by the search formula in the storage unit 204. As the data structure of the index, various search trees as described in the background art can be used. An example of an index data structure used in this embodiment is shown in FIG. The data structure of the index is a tree structure represented by a plurality of nodes and branches that connect them hierarchically. The highest node is called a root node, and the lowest node is called a leaf node. Nodes other than leaf nodes are called inner nodes. In the example illustrated in FIG. 6, a search tree is configured by four inner nodes 402 and seven leaf nodes 403. The top node of the inner node 402 is the root node 401. Each inner node 402 holds pointers to other nodes connected by branches and an entry table 502 described later. Each leaf node 403 holds a pointer to another inner node 402 connected by a branch and an entry table 503 described later.

次に、記憶部２０４に格納されているタプル表の例を図７に示す。タプル表には、タプルＩＤ（タプル識別子）とそのタプルの情報が格納される。図７に示すタプル表５０１には、２つのタプルＩＤ「００００００００１」および「００００００００２」と、各タプルＩＤにそれぞれ対応する２つのタプル「ｄｅｖｉｃｅ＿ｔｙｐｅ＝“ＰＣ”，ｕｓｅｒ＝“ｍｉｋｅ”，ｄａｔｅ＿ｏｆ＿ｆｉｒｓｔ＿ｕｓｅ＝“２００５／０４／０１”」および「ｄｅｖｉｃｅ＿ｔｙｐｅ＝“ｔｅｍｐｅｒａｔｕｒｅｓｅｎｓｏｒ”，ｕｓｅｒ＝“ｍｉｋｅ”，ｌａｔ＝“１２６０３０”，ｌｏｎｇ＝“５０１０１８”，ｔｉｍｅ＝“２００８／１１／１１１２：３４：５６”，ｓｅｎｓｏｒ＿ｉｄ＝“１２３４５６”，ｔｅｍｐｅｒａｔｕｒｅ＿ｖａｌｕｅ＝“２３．５”」の情報とが格納されている。 Next, an example of a tuple table stored in the storage unit 204 is shown in FIG. The tuple table stores a tuple ID (tuple identifier) and information on the tuple. The tuple table 501 shown in FIG. 7 includes two tuple IDs “000000001” and “000000002”, and two tuples “device_type =“ PC ”, user =“ mike ”, date_of_first_use =“ 2005 ”respectively corresponding to each tuple ID. / 04/01 ”and“ device_type = “temperature sensor”, user = “mike”, lat = “126030”, long = “501018”, time = “2008/11/11 12:34:56”, sensor_id = Information of “123456” and temperature_value = “23.5” is stored.

リーフノード４０３は、１つ上位の階層のノード（親ノード）へのポインタと、図８に示すリーフノードのエントリ表５０３とを保持する。ノードへのポインタは当該ノードの蓄積アドレスを意味し、各ノード間を繋げる枝に相当する。リーフノード４０３のエントリはタプルＩＤとタプルのキーの組から構成され、該キーは、１以上の長さを有する属性−値の組の並びから成る。すなわち、１個のエントリは、エントリ表５０３の縦の１列（１カラム）に該当する。リーフノードのエントリ表５０３は、Ｔ１からＴ２までの個数のリーフノードのエントリを保持する。Ｔ１、Ｔ２は、予め定められた定数であり、どちらも正の整数、またＴ１＜Ｔ２である。 The leaf node 403 holds a pointer to a node (parent node) in the hierarchy one level higher and a leaf node entry table 503 shown in FIG. A pointer to a node means a storage address of the node, and corresponds to a branch connecting the nodes. The entry of the leaf node 403 is composed of a tuple ID and tuple key pair, and the key is composed of a sequence of attribute-value pairs having a length of 1 or more. That is, one entry corresponds to one vertical column (one column) in the entry table 503. The leaf node entry table 503 holds the number of leaf node entries from T1 to T2. T1 and T2 are predetermined constants, both of which are positive integers, and T1 <T2.

図８に示す例では、エントリ表５０３に、タプルＩＤ「０００００８１２９」とそのタプルＩＤに対応するタプルのキーの組とからなるエントリと、タプルＩＤ「０００００００４５」とそのタプルＩＤに対応するタプルのキーの組とからなるエントリと、タプルＩＤ「００００００８０２」とそのタプルＩＤに対応するタプルのキーの組とからなるエントリと、タプルＩＤ「０００００３５８１」とそのタプルＩＤに対応するタプルのキーの組とからなるエントリとの、４個のエントリが含まれている。この場合、例えば、タプルＩＤ「０００００８１２９」に対応するタプルのキーの組は、「ＰＣ」（属性「ｄｅｖｉｃｅ＿ｔｙｐｅ」に対する値）、「ｍｉｋｅ」（属性「ｕｓｅｒ」に対する値）、「２００７／１１／２３」（属性「ｄａｔｅ＿ｏｆ＿ｆｉｒｓｔ＿ｕｓｅ」に対する値）、「１０３４５６」（属性「ｌａｔ」に対する値）および「２１８２３４」（属性「ｌｏｎｇ」に対する値）である。また、例えば、タプルＩＤ「０００００００４５」に対応するタプルのキーの組は、「ｔｅｍｐｅｒａｔｕｒｅｓｅｎｓｏｒ」（属性「ｄｅｖｉｃｅ＿ｔｙｐｅ」に対する値）、「ｍｉｋｅ」（属性「ｕｓｅｒ」に対する値）、「１３２２１０」（属性「ｌａｔ」に対する値）、「５０１２２２」（属性「ｌｏｎｇ」に対する値）、「２００２／０１／０１１２：３４：５６」（属性「ｔｉｍｅ」に対する値）、「１１９８３」（属性「ｓｅｎｓｏｒ＿ｉｄ」に対する値）および「１５．４」（属性「ｓｅｎｓｏｒ＿ｖａｌｕｅ」に対する値）である。また、記号「ＮＵＬＬ」（空値）はエントリの終わりを示している。また、記号「−」は属性に対するキーが空値であることを示している。 In the example illustrated in FIG. 8, the entry table 503 includes an entry including a tuple ID “00000008129” and a tuple key pair corresponding to the tuple ID, and a tuple ID “000000045” and a tuple key corresponding to the tuple ID. , An entry including a tuple ID “0000000082” and a tuple key set corresponding to the tuple ID, and a tuple key set corresponding to the tuple ID “000003581” and the tuple ID. 4 entries are included. In this case, for example, the tuple key pair corresponding to the tuple ID “0000008129” is “PC” (value for the attribute “device_type”), “mike” (value for the attribute “user”), “2007/11/23”. (Value for attribute “date_of_first_use”), “103456” (value for attribute “lat”), and “218234” (value for attribute “long”). In addition, for example, the tuple key pair corresponding to the tuple ID “000000045” includes “temperature sensor” (value for the attribute “device_type”), “mike” (value for the attribute “user”), “132210” (attribute “ lat ”),“ 501222 ”(value for attribute“ long ”),“ 2002/01/01 12:34:56 ”(value for attribute“ time ”),“ 11983 ”(value for attribute“ sensor_id ”) And “15.4” (value for the attribute “sensor_value”). The symbol “NULL” (null value) indicates the end of the entry. The symbol “-” indicates that the key for the attribute is a null value.

またインナーノード４０２は、親ノードへのポインタと、図９に示すインナーノードのエントリ表５０２を保持する。インナーノードのエントリは子ノードへのポインタと子ノードのキーの組から構成され、該キーは、１以上の長さを有する属性−値範囲の組の並びから成り、該並びは該子ノードが下位に含むタプル集合における各属性についての最小値から最大値までを表す。インナーノードのエントリ表は、Ｔ１からＴ２までの個数のインナーノードのエントリを保持する。エントリ表５０２において、１個のエントリは、エントリ表５０２の縦の１列（１カラム）に該当する。例えば、図９において、「子ノード１のアドレス」と「子ノード２のアドレス」の「ｄｅｖｉｃｅ＿ｔｙｐｅ」は、それぞれ「“ａ”〜“ｋ”」、「“ｈ”〜“ｑ”」となっている。これは、「ｄｅｖｉｃｅ＿ｔｙｐｅ」の頭文字が“ａ”〜“ｋ”で始まるタプルは子ノード１に格納されている可能性があることを、“ｈ”〜“ｑ” で始まるタプルは子ノード２に格納されている可能性があることを示している。すなわち、「ｄｅｖｉｃｅ＿ｔｙｐｅ」の頭文字が“ｈ”〜“ｋ”で始まるタプルについては、子ノード１または子ノード２のいずれかに格納されていることとなる。インナーノードのエントリ表は、Ｔ１以上Ｔ２以下の個数のインナーノードのエントリを保持する。また、この場合、エントリ表５０２には、子ノードへのポインタ「子ノード１のアドレス」とその子ノードのキーの組とからなるエントリと、子ノードへのポインタ「子ノード２のアドレス」とその子ノードのキーの組とからなるエントリと、子ノードへのポインタ「子ノード３のアドレス」とその子ノードのキーの組とからなるエントリと、子ノードへのポインタ「子ノード４のアドレス」とその子ノードのキーの組とからなるエントリとの、４個のエントリが含まれている。図９において、「Ｍ１〜Ｍ２」の形式の表記は、「Ｍ１」が最小値、「Ｍ２」が最大値を示している。 The inner node 402 holds a pointer to the parent node and the entry table 502 of the inner node shown in FIG. An entry of the inner node is composed of a pair of a pointer to the child node and a key of the child node, and the key is composed of a sequence of attribute-value range pairs having a length of 1 or more. Represents the minimum value to the maximum value for each attribute in the tuple set included in the lower level. The inner node entry table holds the number of inner node entries from T1 to T2. In the entry table 502, one entry corresponds to one vertical column (one column) of the entry table 502. For example, in FIG. 9, “device_type” of “address of child node 1” and “address of child node 2” are “a” to “k” and “h” to “q”, respectively. Yes. This means that tuples starting with “device_type” starting with “a” to “k” may be stored in child node 1, and tuples starting with “h” to “q” are child node 2. Indicates that it may be stored. That is, tuples whose initial letters “device_type” begin with “h” to “k” are stored in either the child node 1 or the child node 2. The inner node entry table holds the number of inner node entries from T1 to T2. In this case, the entry table 502 includes an entry including a pointer “child node 1 address” to the child node and a key set of the child node, and a pointer “child node 2 address” to the child node and its child. An entry consisting of a node key pair, a child node pointer "child node 3 address" and its child node key pair entry, a child node pointer "child node 4 address" and its children Four entries are included, an entry consisting of a key set of nodes. In FIG. 9, the notation of the format “M1 to M2” indicates that “M1” is the minimum value and “M2” is the maximum value.

図６においてルートノード４０１はインナーノードであるが、ルートノード４０１は特殊なノードであり、インデックス全体が一つのノードのみからなる場合には、ルートノード４０１は最下位のノードでもあるためリーフノードとなる。ルートノード４０１がリーフノードの場合には、エントリ表に０からＴ２の個数のリーフノードのエントリを保持する。またルートノード４０１がインナーノードの場合には、２からＴ２の個数のインナーノードのエントリを保持する。ルートノード４０１は親ノードをもたないため、親ノードへのポインタを保持しない。 In FIG. 6, the root node 401 is an inner node, but the root node 401 is a special node. When the entire index is composed of only one node, the root node 401 is also the lowest node, Become. If the root node 401 is a leaf node, 0 to T2 leaf node entries are held in the entry table. When the root node 401 is an inner node, the number of inner node entries from 2 to T2 is held. Since the root node 401 does not have a parent node, it does not hold a pointer to the parent node.

次に、記憶部２０４に格納されている属性表を図１０に示す。属性表には、それまでに蓄積したタプル集合に含まれる全属性について、含まれていた回数と最小値と最大値とを保持する。図１０に示す属性表５０４には、タプルの各属性「ｄｅｖｉｃｅ＿ｔｙｐｅ」、「ｕｓｅｒ」、「ｄａｔｅ＿ｏｆ＿ｆｉｒｓｔ＿ｕｓｅ」、「ｌａｔ」、「ｌｏｎｇ」、「ｔｉｍｅ」、「ｓｅｎｓｏｒ＿ｉｄ」および「ｓｅｎｓｏｒ＿ｖａｌｕｅ」の出現頻度ならびに各属性の値の最小値および最大値が保持されている。なお、タプルの値の最小値および最大値は、例えば、タプルの値を、属性毎にあらかじめ設定された所定の変換処理によって、所定の数値に変換することで求めること等ができる。変換処理としては、例えば、タプルの値が文字列の場合は文字列を文字コードに変換した後に四則演算等の所定の演算処理を行って数値を得るようにしたり、時、分、秒で表される値を秒の単位で表するようにして数値を得るようにしたりする処理を用いることができる。 Next, the attribute table stored in the storage unit 204 is shown in FIG. The attribute table holds the number of times included, the minimum value, and the maximum value for all attributes included in the tuple set accumulated so far. The attribute table 504 illustrated in FIG. 10 includes the occurrence frequencies of tuple attributes “device_type”, “user”, “date_of_first_use”, “lat”, “long”, “time”, “sensor_id”, and “sensor_value”. The minimum and maximum attribute values are preserved. The minimum value and the maximum value of the tuple values can be obtained, for example, by converting the tuple values into predetermined numerical values by a predetermined conversion process set in advance for each attribute. For example, when the tuple value is a character string, the conversion processing is performed by converting the character string into a character code and then performing a predetermined arithmetic processing such as four arithmetic operations to obtain a numerical value, or by expressing in hours, minutes, and seconds. A process for obtaining a numerical value by expressing the value to be expressed in units of seconds can be used.

次に、クライアント・コンピュータ装置３０１の内部構成について説明する。
図１１は、クライアント・コンピュータ装置３０１の構成を示すブロック図である。クライアント・コンピュータ装置３０１は、サーバ・コンピュータ装置２０１へのタプル蓄積要求／タプル検索要求の送信または検索結果の受信を行う通信部３０２と、サーバ・コンピュータ装置２０１へ送信するタプル蓄積要求に含まれるタプルやタプル検索要求に含まれる検索式の作成、サーバ・コンピュータ装置２０１から受信した検索結果であるタプル集合を表示するタプル／検索式作成表示部３０３とを備える。タプル／検索式の作成には、入力デバイスを用いてユーザが入力する方法や、センサデバイスから得られたセンサデータから作成する方法がある。 Next, the internal configuration of the client computer device 301 will be described.
FIG. 11 is a block diagram showing the configuration of the client computer device 301. The client computer device 301 includes a communication unit 302 that transmits a tuple accumulation request / tuple search request to the server computer device 201 or receives a search result, and a tuple included in the tuple accumulation request that is transmitted to the server computer device 201. And a tuple / search formula creation display unit 303 for displaying a tuple set that is a search result received from the server / computer apparatus 201 and a search formula included in the tuple search request. There are two methods for creating a tuple / search expression: a method in which a user inputs data using an input device, and a method in which data is created from sensor data obtained from a sensor device.

本実施形態において、タプルを蓄積する際には、クライアント・コンピュータ装置３０１から受信したタプル蓄積要求に対し、サーバ・コンピュータ装置２０１の演算部２０３は、まず記憶部２０４が保持するタプル表５０１を参照し、未使用の新しいタプルＩＤを決定し、該タプルＩＤとタプル蓄積要求に含まれるタプルをタプル表５０１に追加する。また属性表５０４を更新するとともにタプル総数（すなわち蓄積しているタプルの全個数）を１増やす。さらに、記憶部２０４が保持するインデックスに該タプルに対応するエントリを挿入する。このときのエントリ挿入手順フローを図１２に示す。図１２に示す処理における引数のエントリＸ、レベル数Ｌとしては、それぞれ該エントリ、０とする。ここでレベル数は、リーフノードの階層のレベル数を０とし、１つ上位の階層に上がるごとに、レベル数は１増えるものとする。このエントリＸは、例えば図８のエントリ表５０３に示される４個のエントリのいずれかに対応するようなデータである。 In this embodiment, when accumulating tuples, in response to a tuple accumulation request received from the client computer apparatus 301, the arithmetic unit 203 of the server computer apparatus 201 first refers to the tuple table 501 held in the storage unit 204. Then, an unused new tuple ID is determined, and the tuple ID and the tuple included in the tuple accumulation request are added to the tuple table 501. Further, the attribute table 504 is updated and the total number of tuples (that is, the total number of tuples accumulated) is increased by one. Further, an entry corresponding to the tuple is inserted into the index held by the storage unit 204. The entry insertion procedure flow at this time is shown in FIG. The entry X of the argument and the number of levels L in the processing shown in FIG. Here, the number of levels is assumed to be 0 for the level of the leaf node hierarchy, and the level number is increased by 1 each time the level is increased by one. The entry X is data corresponding to one of the four entries shown in the entry table 503 in FIG. 8, for example.

図１２に示すように、まず、該エントリ、０を入力として部分木選択手順を実施する（ステップＳ１１−１）。部分木選択手順（ステップＳ１１−１）では、エントリＸを挿入すべきレベル数Ｌのノードが選択ノードとして選択される（すなわちステップＳ１１−１のサブルーチンから戻る際に、挿入すべきノードを示すアドレスが返り値として返される）。この部分木選択手順については後述する。 As shown in FIG. 12, first, a subtree selection procedure is performed with the entry 0 as an input (step S11-1). In the subtree selection procedure (step S11-1), the node having the level number L where the entry X is to be inserted is selected as a selection node (that is, an address indicating the node to be inserted when returning from the subroutine of step S11-1) Is returned as the return value). This subtree selection procedure will be described later.

次に、ステップＳ１１−１で選択されたノードのエントリ表に記載されたエントリ数が当該エントリ表のエントリの個数の最大値Ｔ２未満かどうかを判断する（ステップＳ１１−２）。Ｔ２未満である場合には、該ノードのエントリ表に該エントリを追加する（ステップＳ１１−３）。ステップＳ１１−２においてＴ２未満でなかった場合には、該ノードと該エントリを入力とするノード分割手順を実施する（ステップＳ１１−４）。ノード分割手順については後述する。次に、該ノードを入力とするキー集合情報調整手順を実施し（ステップＳ１１−５）、処理を終える。キー集合情報調整手順については後述する。 Next, it is determined whether the number of entries described in the entry table of the node selected in step S11-1 is less than the maximum value T2 of the number of entries in the entry table (step S11-2). If it is less than T2, the entry is added to the entry table of the node (step S11-3). If it is not less than T2 in step S11-2, a node division procedure is performed in which the node and the entry are input (step S11-4). The node division procedure will be described later. Next, a key set information adjustment procedure using the node as an input is performed (step S11-5), and the process ends. The key set information adjustment procedure will be described later.

図１２のステップＳ１１−１で呼び出される部分木選択手順フローを図１３に示す。部分木選択手順では、まず、変数ｃｕｒｒｅｎｔ（カレント）にルートノードのアドレスを代入し、変数ｌｖにルートノードのレベル数を代入し、リストｍｉｎ＿ｃｈｉｌｄ＿ｌｉｓｔを空にする（ステップＳ１２−１）。変数ｃｕｒｒｅｎｔは、操作対象のノードのアドレスが代入される変数である。この場合、ステップＳ１２−１でルートノードのアドレスが最初に代入される。その後、順次、操作対象のノードのレベル数ｌｖが変化する度に、ステップＳ１２−１８においてその子ノードのアドレスが代入されることで、更新されることになる。また、リストｍｉｎ＿ｃｈｉｌｄ＿ｌｉｓｔは、後述する関数Ｐ１が最低値となるエントリの番号が１または複数個格納される変数である。 FIG. 13 shows a subtree selection procedure flow called in step S11-1 in FIG. In the subtree selection procedure, first, the address of the root node is substituted into the variable current (current), the number of levels of the root node is substituted into the variable lv, and the list min_child_list is emptied (step S12-1). The variable current is a variable to which the address of the operation target node is substituted. In this case, the address of the root node is first substituted in step S12-1. Thereafter, each time the level number lv of the operation target node changes, it is updated by substituting the address of the child node in step S12-18. The list min_child_list is a variable in which one or a plurality of entry numbers having the lowest value of a function P1 described later are stored.

次に、ｌｖと入力されたレベル数Ｌの大きさを比較し（ステップＳ１２−２）、ｌｖの方が小さければ、変数ｃｈｉｌｄ＿ｎｕｍに１を代入し、リストｍｉｎ＿ｃｈｉｌｄ＿ｌｉｓｔにｃｈｉｌｄ＿ｎｕｍに入った数を追加する（ステップＳ１２−３）。次に、ｃｕｒｒｅｎｔノードのｃｈｉｌｄ＿ｎｕｍ番目のエントリと入力されたエントリＸを引数とする関数Ｐ１によりペナルティ値を計算し、変数ｍｉｎ＿ｐｅｎａｌｔｙに代入する（ステップＳ１２−４）。 Next, lv is compared with the input level number L (step S12-2). If lv is smaller, 1 is substituted into variable child_num, and the number in child_num is added to list min_child_list. (Step S12-3). Next, a penalty value is calculated by the function P1 having the child node's child_num-th entry and the input entry X as arguments, and is substituted into the variable min_penalty (step S12-4).

関数Ｐ１は以下のように定義される。ただし、Ａ，Ｂはエントリを示す。 The function P1 is defined as follows. A and B indicate entries.

Ｐ１（Ａ，Ｂ）＝Ｑ１（Ａ＋Ｂ）−Ｑ１（Ａ） P1 (A, B) = Q1 (A + B) -Q1 (A)

ここでＱ１（Ｃ）は、エントリＣに含まれるキーに含まれる属性の種類数を示す。またＡ＋Ｂは、エントリＡのキーとエントリＢのキーの和をとった新たなキーを持つエントリを示す。ここで２つのキーの和をとったキーとは、該２つのキーに含まれる属性を全て含み、各属性の値として、該属性が該２つのキーのどちらか一方に含まれる場合には、該一方のキーに含まれる該属性の値もしくは値範囲と同じ値もしくは値範囲をもち、該属性が該２つのキーの両方に含まれる場合には、該両方のキーに含まれる該属性の２つの値もしくは値範囲のどちらをも含む最小限の値もしくは値範囲をもつキーを意味する。例えばキー（ａ＝３、ｂ＝２〜４）とキー（ａ＝４〜５、ｂ＝３〜５、ｃ＝１）の和をとったキーとは、（ａ＝３〜５、ｂ＝２〜５、ｃ＝１）となる。ただし、本発明はこれに限るものではなく、例えば、２つのキーの和をとったキーに含まれる属性の値は、該２つのキーの該属性の値もしくは値範囲のどちらをも含む値もしくは値範囲とし、必ずしも最小限の値もしくは値範囲としないことも可能である。 Here, Q1 (C) indicates the number of attribute types included in the key included in the entry C. A + B indicates an entry having a new key obtained by adding the key of the entry A and the key of the entry B. Here, the key obtained by summing two keys includes all the attributes included in the two keys, and when the attribute is included in one of the two keys as the value of each attribute, If the attribute has the same value or value range as the attribute or value range included in the one key, and the attribute is included in both of the two keys, the attribute 2 included in both keys Means a key with a minimum value or value range that contains both values or value ranges. For example, a key obtained by summing a key (a = 3, b = 2-4) and a key (a = 4-5, b = 3-5, c = 1) is (a = 3-5, b = 2-5, c = 1). However, the present invention is not limited to this. For example, the value of the attribute included in the key obtained by summing two keys is a value including both the value of the attribute or the value range of the two keys. It is possible to use a value range and not necessarily a minimum value or value range.

上述したように、ステップＳ１２−４では、変数ｍｉｎ＿ｐｅｎａｌｔｙにＰ１（（ｃｈｉｌｄ＿ｎｕｍ番目のエントリ），Ｘ）（＝Ｐ１（Ａ，Ｂ））の値が代入される。すなわち、変数ｃｕｒｒｅｎｔでアドレスが指定されるノードのｃｈｉｌｄ＿ｎｕｍ番目のエントリに対し、エントリＸを挿入する際、エントリＸを挿入する前のｃｈｉｌｄ＿ｎｕｍ番目のエントリに含まれるキーに含まれる属性の種類数と、挿入した後のｃｈｉｌｄ＿ｎｕｍ番目のエントリ（すなわちｃｈｉｌｄ＿ｎｕｍ番目のエントリとエントリＸの和をとった新たなキーを持つエントリ）に含まれるキーに含まれる属性の種類数とを比べた場合の差分が、変数ｍｉｎ＿ｐｅｎａｌｔｙに代入されることになる。 As described above, in step S12-4, the value of P1 ((child_num-th entry), X) (= P1 (A, B)) is substituted into the variable min_penalty. That is, when the entry X is inserted into the child_num-th entry of the node whose address is specified by the variable current, the number of attribute types included in the key included in the child_num-th entry before the entry X is inserted; The difference when comparing the number of attribute types included in the key included in the child_num-th entry after insertion (that is, the entry having a new key obtained by adding the child_num-th entry and the entry X) is a variable. It will be assigned to min_penalty.

ステップＳ１２−４の後、ｃｕｒｒｅｎｔに対応するノード（ｃｕｒｒｅｎｔノード）のｃｈｉｌｄ＿ｎｕｍ番目のエントリが存在するかを判定し（ステップＳ１２−５）、存在する場合には、ｃｕｒｒｅｎｔノードのｃｈｉｌｄ＿ｎｕｍ番目のエントリと入力されたエントリＸを引数とする該関数Ｐ１によりペナルティ値を計算し、該ペナルティ値とｍｉｎ＿ｐｅｎａｌｔｙの大きさを比較する（ステップＳ１２−６）。 After step S12-4, it is determined whether or not the child_num-th entry of the node corresponding to the current (current node) exists (step S12-5). If there is, the child_num-th entry of the current node is input. The penalty value is calculated by the function P1 using the entry X as an argument, and the penalty value is compared with the magnitude of min_penalty (step S12-6).

ｍｉｎ＿ｐｅｎａｌｔｙの方が大きいまたは等しい場合には、再度、ｃｕｒｒｅｎｔノードのｃｈｉｌｄ＿ｎｕｍ番目のエントリと入力されたエントリＸを引数とする該関数Ｐ１によりペナルティ値を計算し、該ペナルティ値とｍｉｎ＿ｐｅｎａｌｔｙの大きさを比較し（ステップＳ１２−７）、等しくない場合には、リストｍｉｎ＿ｃｈｉｌｄ＿ｌｉｓｔを空にし、ｃｕｒｒｅｎｔノードのｃｈｉｌｄ＿ｎｕｍ番目のエントリと入力されたエントリＸを引数として該関数Ｐ１より計算したペナルティ値をｍｉｎ＿ｐｅｎａｌｔｙに代入する（ステップＳ１２−８）。 If min_penalty is greater or equal, the penalty value is again calculated by the function P1 using the child_num-th entry of the current node and the input entry X as an argument, and the magnitude of the penalty value and min_penalty are compared. However, if they are not equal, the list min_child_list is emptied, and the penalty value calculated from the function P1 is substituted for min_penalty using the child node's child_num-th entry and the input entry X as arguments (step S12-7). Step S12-8).

さらに、リストｍｉｎ＿ｃｈｉｌｄ＿ｌｉｓｔにｃｈｉｌｄ＿ｎｕｍの値を追加し（ステップＳ１２−９）、ｃｈｉｌｄ＿ｎｕｍを１つ増加させ（ステップＳ１２−１０）、ステップＳ１２−５に戻る。 Further, the value of child_num is added to the list min_child_list (step S12-9), child_num is incremented by 1 (step S12-10), and the process returns to step S12-5.

ステップＳ１２−６において、ｍｉｎ＿ｐｅｎａｌｔｙの方が小さい場合には、ステップＳ１２−１０に進む。 In step S12-6, if min_penalty is smaller, the process proceeds to step S12-10.

ステップＳ１２−７において、比較の結果、等しい場合には、ステップＳ１２−９に進む。 In step S12-7, if they are equal as a result of the comparison, the process proceeds to step S12-9.

これらのステップＳ１２−５〜Ｓ１２−１０を繰り返し実行することで、ｃｕｒｒｅｎｔノードのすべてのエントリに対して関数Ｐ１を求めるとともに、リストｍｉｎ＿ｃｈｉｌｄ＿ｌｉｓｔに関数Ｐ１が最低値となるエントリの番号が１または複数個格納されることになる。 By repeatedly executing these steps S12-5 to S12-10, the function P1 is obtained for all entries of the current node, and the number of entries having the lowest function P1 in the list min_child_list is one or more. Will be stored.

ステップＳ１２−５において、存在しない場合には、変数ｉに１を代入し、変数ｃｈｉｌｄ＿ｎｕｍにリストｍｉｎ＿ｃｈｉｌｄ＿ｌｉｓｔのｉ番目（すなわち、１番目）の数を代入し、変数ｍｉｎ＿ｃｈｉｌｄにｃｈｉｌｄ＿ｎｕｍを代入する（ステップＳ１２−１１）。 If it does not exist in step S12-5, 1 is substituted for variable i, i-th (that is, first) number of list min_child_list is substituted for variable child_num, and child_num is substituted for variable min_child (step S12). -11).

ｃｕｒｒｅｎｔノードのｃｈｉｌｄ＿ｎｕｍ番目のエントリと入力されたエントリＸを引数とする関数Ｐ２によりペナルティ値を計算し、変数ｍｉｎ＿ｐｅｎａｌｔｙに代入する（ステップＳ１２−１２）。 A penalty value is calculated by the function P2 using the child_num-th entry of the current node and the input entry X as an argument, and is substituted for the variable min_penalty (step S12-12).

関数Ｐ２は以下のように定義される。ただし、Ａ，Ｂはエントリを示す。 The function P2 is defined as follows. A and B indicate entries.

Ｐ２（Ａ，Ｂ）＝Ｑ２（Ａ＋Ｂ）−Ｑ２（Ａ） P2 (A, B) = Q2 (A + B) -Q2 (A)

ここでＱ２（Ｃ）は、エントリＣに含まれるキーに含まれる各属性に対する重みづけ付き正規化被検索面積の和であり、エントリＣが検索される確率を示す。またＡ＋Ｂは、エントリＡのキーとエントリＢのキーの和をとった新たなキーを持つエントリを示す。この関数Ｐ２は、「該操作対象ノードがすでに含む既存エントリに対し、該エントリＸを挿入した場合のペナルティ」を表す関数であり、該操作対象ノードがすでに含む既存エントリＡに対し、エントリＢを挿入する際、該エントリＢを挿入する前の該エントリＡと、挿入した後の該エントリＡとを比べた場合の、該エントリＡが検索される確率に対応した値（すなわち検索される確率ないし検索される確率の近似値）の増分を示す値して定義されるものである。 Here, Q2 (C) is the sum of the weighted normalized search area for each attribute included in the key included in entry C, and indicates the probability that entry C is searched. A + B indicates an entry having a new key obtained by adding the key of the entry A and the key of the entry B. This function P2 is a function representing “a penalty when the entry X is inserted with respect to the existing entry already included in the operation target node”, and the entry B is set to the existing entry A already included in the operation target node. At the time of insertion, the value corresponding to the probability that the entry A is searched when the entry A before the insertion of the entry B is compared with the entry A after the insertion (that is, the probability or It is defined as a value indicating an increase in the approximate value of the probability to be searched).

またＱ２（Ｃ）は以下のように定義される。 Q2 (C) is defined as follows.

Ｑ２（Ｃ）＝Σ_Ｃｉ（Ｓ（Ｃｉ）・Ｒ（Ｃｉ）） Q2 (C) = _ΣCi (S (Ci) · R (Ci))

ＣｉはエントリＣが持つキーに含まれる属性を示す。Σ_Ｃｉ（Ｄ）は、すべてのＣｉについてのＤの総和を意味する。Ｓ（Ｃｉ）は、正規化被検索面積であり、前記数式Ｉと同じく、次式で定義される。 Ci indicates an attribute included in the key of entry C. Σ _Ci (D) means the sum of D for all Ci. S (Ci) is a normalized search area, and is defined by the following equation as in Equation I.

Ｓ（Ｃｉ）＝（（ｑＣｉ−ａＣｉ）（ｂＣｉ−ｐＣｉ）−（ｑＣｉ−ｐＣｉ）^２／２）／（（ｂＣｉ−ａＣｉ）^２／２） S (Ci) = ((qCi -aCi) (bCi-pCi) - (qCi-pCi) 2/2) / ((bCi-aCi) 2/2)

該ｐＣｉ、ｑＣｉはそれぞれ、該キーに記述された属性Ｃｉの最小値、最大値であり、該ａＣｉ、ｂＣｉはそれぞれ、それまで該検索木に挿入された全タプル集合Ｙにおける属性Ｃｉの最小値、最大値である。ａＣｉ、ｂＣｉは属性表から得ることができる。ただし、ｂＣｉ−ａＣｉ＝０となる場合には、Ｓ（Ｃｉ）はあらかじめ設定された定数値をとるものとする。 The pCi and qCi are the minimum value and maximum value of the attribute Ci described in the key, respectively. The aCi and bCi are the minimum value of the attribute Ci in the entire tuple set Y that has been inserted into the search tree so far. The maximum value. aCi and bCi can be obtained from the attribute table. However, when bCi−aCi = 0, S (Ci) takes a preset constant value.

Ｒ（Ｃｉ）は、属性Ｃｉが検索式に用いられる確率であり、以下のように定義される。 R (Ci) is a probability that the attribute Ci is used in the search formula, and is defined as follows.

Ｒ（Ｃｉ）＝（全タプル集合ＹにおけるＣｉの出現回数）／（全タプル集合Ｙの要素数） R (Ci) = (Number of occurrences of Ci in all tuple sets Y) / (number of elements in all tuple sets Y)

全タプル集合ＹにおけるＣｉの出現回数は、属性表から得ることができる。 The number of occurrences of Ci in all tuple sets Y can be obtained from the attribute table.

ステップＳ１２−１２では、変数ｍｉｎ＿ｐｅｎａｌｔｙにＰ２（（ｃｈｉｌｄ＿ｎｕｍ番目のエントリ），Ｘ）の値が代入される。すなわち、変数ｃｕｒｒｅｎｔでアドレスが指定されるノードのｃｈｉｌｄ＿ｎｕｍ番目のエントリに対し、エントリＸを挿入する際、エントリＸを挿入する前のｃｈｉｌｄ＿ｎｕｍ番目のエントリと、挿入した後のｃｈｉｌｄ＿ｎｕｍ番目のエントリとを比べた場合の、ｃｈｉｌｄ＿ｎｕｍ番目のエントリが検索される確率に対応した値の増分を示す値が、変数ｍｉｎ＿ｐｅｎａｌｔｙに代入されることになる。ただし、ステップＳ１２−１１で変数ｉに１が代入され、変数ｃｈｉｌｄ＿ｎｕｍにリストｍｉｎ＿ｃｈｉｌｄ＿ｌｉｓｔのｉ番目の数すなわち１番目の数が代入されているので、ステップＳ１２−１２で関数Ｐ２を求める際には、変数ｃｕｒｒｅｎｔでアドレスが指定されるノードのリストｍｉｎ＿ｃｈｉｌｄ＿ｌｉｓｔの１番目の数が示す番号のエントリに対し、エントリＸを挿入する際、エントリＸを挿入する前の当該エントリと、挿入した後の当該エントリとを比べた場合の、当該エントリが検索される確率に対応した値の増分を示す値が、変数ｍｉｎ＿ｐｅｎａｌｔｙに代入されることになる。 In step S12-12, the value of P2 ((child_num-th entry), X) is substituted into the variable min_penalty. That is, when the entry X is inserted into the child_num-th entry of the node whose address is specified by the variable current, the child_num-th entry before the entry X is inserted is compared with the child_num-th entry after the insertion. In this case, a value indicating the increment of the value corresponding to the probability that the child_num-th entry is searched is substituted into the variable min_penalty. However, since 1 is substituted into the variable i in step S12-11 and the i-th number of the list min_child_list, that is, the first number is substituted into the variable child_num, when obtaining the function P2 in step S12-12, When the entry X is inserted into the entry having the number indicated by the first number in the node list min_child_list whose address is specified by the variable current, the entry before the entry X is inserted, and the entry after the insertion , The value indicating the increment of the value corresponding to the probability that the entry is searched is substituted into the variable min_penalty.

次に、リストｍｉｎ＿ｃｈｉｌｄ＿ｌｉｓｔのｉ番目の数が存在するか否かを判定し（ステップＳ１２−１３）、存在する場合には、リストｍｉｎ＿ｃｈｉｌｄ＿ｌｉｓｔのｉ番目の数をｃｈｉｌｄ＿ｎｕｍに代入し（ステップＳ１２−１４）、ｃｕｒｒｅｎｔノードのｃｈｉｌｄ＿ｎｕｍ番目のエントリと入力されたエントリＸを引数として関数Ｐ２により計算したペナルティ値とｍｉｎ＿ｐｅｎａｌｔｙの大きさを比較し（ステップＳ１２−１５）、ｍｉｎ＿ｐｅｎａｌｔｙの方が大きいまたは等しい場合には、ｃｈｉｌｄ＿ｎｕｍの値をｍｉｎ＿ｃｈｉｌｄに代入し、ｃｕｒｒｅｎｔノードのｃｈｉｌｄ＿ｎｕｍ番目のエントリと入力されたエントリＸを引数として関数Ｐ２により計算したペナルティ値をｍｉｎ＿ｐｅｎａｌｔｙに代入する（ステップＳ１２−１６）。そしてｉを１増加させ（ステップＳ１２−１７）、ステップＳ１２−１３に戻る。 Next, it is determined whether or not the i-th number of the list min_child_list exists (step S12-13). If it exists, the i-th number of the list min_child_list is substituted for child_num (step S12-14). , The child node's child_num-th entry and the input entry X as an argument are compared with the penalty value calculated by the function P2 and the magnitude of min_penalty (step S12-15). If min_penalty is greater or equal, The value of child_num is substituted for min_child, and the penalty value calculated by the function P2 using the child node's child_num-th entry and the input entry X as an argument is substituted for min_penalty. (Step S12-16). Then, i is incremented by 1 (step S12-17), and the process returns to step S12-13.

ステップＳ１２−１５において、ｍｉｎ＿ｐｅｎａｌｔｙの方が小さい場合には、ステップＳ１２−１７に進む。 In step S12-15, if min_penalty is smaller, the process proceeds to step S12-17.

このステップＳ１２−１６を繰り返し行うことで、変数ｍｉｎ＿ｐｅｎａｌｔｙが最も小さい関数Ｐ２の計算結果で更新されるとともに、変数ｍｉｎ＿ｃｈｉｌｄにその最も小さい関数Ｐ２の計算結果が得られたエントリの番号が代入されることになる。 By repeating this step S12-16, the variable min_penalty is updated with the calculation result of the smallest function P2, and the number of the entry from which the calculation result of the smallest function P2 is obtained is substituted into the variable min_child. become.

ステップＳ１２−１３において、存在しない場合には、ｃｕｒｒｅｎｔノードのｍｉｎ＿ｃｈｉｌｄ番目のエントリに含まれる子ノードのアドレスをｃｕｒｒｅｎｔに代入し、該子ノードを新たなｃｕｒｒｅｎｔノードとし、ｌｖを１減らし（ステップＳ１２−１８）、ステップＳ１２−２に戻る。 In step S12-13, if it does not exist, the address of the child node included in the min_child-th entry of the current node is substituted into current, the child node becomes a new current node, and lv is reduced by 1 (step S12- 18) Return to step S12-2.

上記のステップＳ１２−１３、Ｓ１２−１４、Ｓ１２−１５、Ｓ１２−１６、Ｓ１２−１７を繰り返し行うことで、ｃｕｒｒｅｎｔノードに含まれる全エントリのうち、関数Ｐ１を用いた判定処理によってリストｍｉｎ＿ｃｈｉｌｄ＿ｌｉｓｔにリストアップされたすべてのエントリについて、エントリＸを挿入した場合の関数Ｐ２の値が求められる。そして、そのリストアップされたすべてのエントリについて計算が終了した時点で、変数ｍｉｎ＿ｃｈｉｌｄに最も小さい関数Ｐ２の計算結果が得られたエントリの番号が代入されることになる。そして、そのリストアップされたすべてのエントリについての関数Ｐ２の計算および比較が終了すると、ステップＳ１２−１８でｃｕｒｒｅｎｔノードのｍｉｎ＿ｃｈｉｌｄ番目の子ノードのアドレスが変数ｃｕｒｒｅｎｔに代入され、ｌｖの大きさが１減らされ、ステップＳ１２−２へ戻ることになる。したがって、その後、関数Ｐ２の値が最も小さくなるエントリで指定された子ノードに対して、同様に、関数Ｐ１による挿入後のエントリに含まれるキーに含まれる属性の種類数が小さいエントリのリストアップと、リストアップされた全てのエントリに対する関数Ｐ２の計算と、その計算結果に基づいて関数Ｐ２の値が最も小さくなるエントリを求める処理が行われることになる。 By repeatedly performing the above steps S12-13, S12-14, S12-15, S12-16, and S12-17, the list min_child_list is listed in the list min_child_list by the determination process using the function P1 among all the entries included in the current node. For all entries that have been uploaded, the value of the function P2 when the entry X is inserted is obtained. Then, when the calculation for all the entries listed is completed, the number of the entry from which the calculation result of the smallest function P2 is obtained is substituted for the variable min_child. When the calculation and comparison of the function P2 for all the listed entries are completed, the address of the min_child-th child node of the current node is substituted into the variable current in step S12-18, and the size of lv is 1 It is decremented and it returns to step S12-2. Therefore, for the child node designated by the entry having the smallest value of the function P2, the list of entries having the small number of attribute types included in the key included in the entry after the insertion by the function P1 is similarly applied. Then, the function P2 is calculated for all the entries listed, and the process for obtaining the entry having the smallest value of the function P2 is performed based on the calculation result.

ステップＳ１２−２において、Ｌの方が大きいまたは等しい場合には、ｃｕｒｒｅｎｔノードを選択することとし、ｃｕｒｒｅｎｔを返り値とし（ステップＳ１２−１９）、処理を終了する。この例では、Ｌ＝０なので（すなわちＬにはリーフノードのレベル数が代入されているので）、ｃｕｒｒｅｎｔノードのレベル数ｌｖが０に到達したことろで、すなわち変数ｃｕｒｒｅｎｔでアドレスが指定されるノードがリーフノードとなったところで処理が終了する。 In step S12-2, if L is larger or equal, the current node is selected, and current is returned as the return value (step S12-19), and the process is terminated. In this example, since L = 0 (that is, the number of leaf node levels is substituted for L), the address is specified by the variable current when the level number lv of the current node has reached 0. The process ends when the node becomes a leaf node.

次に、図１２のステップＳ１１−４で呼び出されるノード分割手順フローを図１４〜図１６に示す。このノード分割手順は、分割対象のノード（ノードＮおよびそのアドレスをＮとする。）とエントリＸとを引数として呼び出される処理である。なお、図１４〜図１６のフローは、結合子ＳまたはＴによって互いに結合されている。 Next, FIG. 14 to FIG. 16 show the node division procedure flow called in step S11-4 in FIG. This node division procedure is a process called by using a node to be divided (node N and its address as N) and entry X as arguments. 14 to 16 are connected to each other by a connector S or T.

まず、入力されたノードアドレスＮに対応するノード（ノードＮ）のエントリ表に、入力されたエントリＸを追加する（ステップＳ１３−１）。図１２のステップＳ１１−４から呼び出される場合、分割対象ノードはリーフノードなので、リーフノードのエントリ表にエントリＸが追加されることになる。 First, the input entry X is added to the entry table of the node (node N) corresponding to the input node address N (step S13-1). When called from step S11-4 in FIG. 12, since the node to be divided is a leaf node, an entry X is added to the entry table of the leaf node.

変数ｉ、ｊにそれぞれ１、２を代入し（ステップＳ１３−２）、下記式によりｄ＿ｍａｘの初期値を計算する（ステップＳ１３−３）。ここで、変数ｉおよびｊは、分割対象ノードにおける処理対象エントリの番号を一時的に記憶するための変数である。変数ｄ＿ｍａｘは関数Ｐ１等を用いた演算結果の最大値を一時的に記憶するための変数である。このステップＳ１３−２〜Ｓ１３−３では、それらの変数が初期化されている。 1 and 2 are substituted for variables i and j, respectively (step S13-2), and an initial value of d_max is calculated by the following equation (step S13-3). Here, the variables i and j are variables for temporarily storing the number of the processing target entry in the division target node. The variable d_max is a variable for temporarily storing the maximum value of the calculation result using the function P1 or the like. In these steps S13-2 to S13-3, those variables are initialized.

ｄ＿ｍａｘ＝Ｐ１（ノードＮのｉ番目のエントリ，ノードＮのｊ番目のエントリ）−Ｑ１（ノードＮのｊ番目のエントリ） d_max = P1 (i-th entry of node N, j-th entry of node N) −Q1 (j-th entry of node N)

上述した関数Ｐ１の定義から、変数ｄ＿ｍａｘは、ｄ＿ｍａｘ＝Ｐ１（ノードＮのｉ番目のエントリ，ノードＮのｊ番目のエントリ）−Ｑ１（ノードＮのｊ番目のエントリ）＝Ｑ１（ノードＮのｉ番目のエントリ＋ノードＮのｊ番目のエントリ）−Ｑ１（ノードＮのｉ番目のエントリ）−Ｑ１（ノードＮのｊ番目のエントリ）と表される。 From the definition of the function P1 described above, the variable d_max is defined as d_max = P1 (i-th entry of node N, j-th entry of node N) −Q1 (j-th entry of node N) = Q1 (i of node N -Th entry + j-th entry of node N) -Q1 (i-th entry of node N) -Q1 (j-th entry of node N).

さらに、リストｄ＿ｍａｘ＿ｐａｉｒｓを空にし（ステップＳ１３−４）、ノードＮのエントリ表のｊ番目のエントリが存在するか否かを判定する（ステップＳ１３−５）。このリストｄ＿ｍａｘ＿ｐａｉｒｓは、ｄ＿ｍａｘが最大となる１又は複数のエントリの組の番号（上記のｉおよびｊの組）が格納される変数である。 Further, the list d_max_pairs is emptied (step S13-4), and it is determined whether or not the jth entry in the entry table of the node N exists (step S13-5). This list d_max_pairs is a variable in which the number of a set of one or a plurality of entries that maximizes d_max (the set of i and j described above) is stored.

ステップＳ１３−５において、存在する場合には、ｉとｊの大きさを比較し（ステップＳ１３−６）、等しくない場合には、下記ｄを計算する（ステップＳ１３−７）。 In step S13-5, if it exists, the magnitudes of i and j are compared (step S13-6). If they are not equal, the following d is calculated (step S13-7).

ｄ＝Ｐ１（ノードＮのｉ番目のエントリ，ノードＮのｊ番目のエントリ）−Ｑ１（ノードＮのｊ番目のエントリ） d = P1 (i-th entry of node N, j-th entry of node N) −Q1 (j-th entry of node N)

上述した関数Ｐ１の定義から、変数ｄは、変数ｄ＿ｍａｘと同様に、ｄ＝Ｐ１（ノードＮのｉ番目のエントリ，ノードＮのｊ番目のエントリ）−Ｑ１（ノードＮのｊ番目のエントリ）＝Ｑ１（ノードＮのｉ番目のエントリ＋ノードＮのｊ番目のエントリ）−Ｑ１（ノードＮのｉ番目のエントリ）−Ｑ１（ノードＮのｊ番目のエントリ）と表される。 From the definition of the function P1, the variable d is d = P1 (i-th entry of node N, j-th entry of node N) −Q1 (j-th entry of node N) = Q1 (i-th entry of node N + j-th entry of node N) −Q1 (i-th entry of node N) −Q1 (j-th entry of node N)

次に、ｄとｄ＿ｍａｘの大きさを比較し（ステップＳ１３−８）、ｄ＿ｍａｘの方が小さいまたは等しい場合には、再度ｄとｄ＿ｍａｘの大きさを比較し（ステップＳ１３−９）、等しくない場合には、リストｄ＿ｍａｘ＿ｐａｉｒｓを空にし、ｄをｄ＿ｍａｘに代入する（ステップＳ１３−１０）。さらに、リストｄ＿ｍａｘ＿ｐａｉｒｓに（ｉ，ｊ）の値のペアを追加し（ステップＳ１３−１１）、ｉを１増加させ（ステップＳ１３−１２）、ステップＳ１３−６に戻る。 Next, the magnitudes of d and d_max are compared (step S13-8). If d_max is smaller or equal, the magnitudes of d and d_max are compared again (step S13-9). The list d_max_pairs is emptied and d is substituted for d_max (step S13-10). Further, a pair of values (i, j) is added to the list d_max_pairs (step S13-11), i is incremented by 1 (step S13-12), and the process returns to step S13-6.

ステップＳ１３−８において、ｄ＿ｍａｘの方が大きい場合には、ステップＳ１３−１２に進む。 In step S13-8, when d_max is larger, the process proceeds to step S13-12.

ステップＳ１３−９において、比較の結果、等しい場合には、ステップＳ１３−１１に進む。 In step S13-9, if the result of comparison is equal, the process proceeds to step S13-11.

ステップＳ１３−６において、比較の結果、等しい場合には、ｉに１を代入し（ステップＳ１３−１３）、さらにｊを１増加させ（ステップＳ１３−１４）、ステップＳ１３−５に戻る。 In step S13-6, if they are equal as a result of the comparison, 1 is substituted for i (step S13-13), j is further increased by 1 (step S13-14), and the process returns to step S13-5.

これらのステップＳ１３−６〜Ｓ１３−１４の処理によって、分割対象のノードＮの全エントリ（すなわちノードＮの既存のエントリにエントリＸを追加したエントリの集合）から、上記ｄの値を最大とする２つのエントリが１または複数組、選択され、リストｄ＿ｍａｘ＿ｐａｉｒｓに選択されたエントリのペアを示す値（番号）が格納される。 Through the processing of these steps S13-6 to S13-14, the value of d is maximized from all entries of the node N to be divided (that is, a set of entries in which the entry X is added to the existing entries of the node N). One or plural sets of two entries are selected, and a value (number) indicating a pair of selected entries is stored in the list d_max_pairs.

ステップＳ１３−５において、存在しない場合には、変数ｋ、ｄ＿ｍａｘ、ｉ＿ｍａｘ、ｊ＿ｍａｘに、それぞれ１、０、０、０を代入する（ステップＳ１３−１５）。ここで、変数ｋは、リストｄ＿ｍａｘ＿ｐａｉｒｓ内の値（要素）の格納位置を示す番号を一時的に記憶するための変数である。変数ｉ＿ｍａｘおよびｊ＿ｍａｘは関数Ｐ２等を用いた演算結果が最大となるエントリの番号の組を一時的に記憶するための変数である。変数ｄ＿ｍａｘは関数Ｐ２等を用いた演算結果の最大値を一時的に記憶するための変数である。このステップＳ１３−１５では、それらの変数が初期化されている。 If it does not exist in step S13-5, 1, 0, 0, and 0 are assigned to variables k, d_max, i_max, and j_max, respectively (step S13-15). Here, the variable k is a variable for temporarily storing a number indicating a storage position of a value (element) in the list d_max_pairs. Variables i_max and j_max are variables for temporarily storing a set of entry numbers having the maximum calculation result using the function P2 or the like. The variable d_max is a variable for temporarily storing the maximum value of the calculation result using the function P2 or the like. In step S13-15, these variables are initialized.

なお、ステップＳ１３−４〜ステップＳ１３−１５の手順は、「ノードＮが含むエントリのうち、どちらのエントリにも含まれる属性の種類数が最も少ない２つのエントリのペア」を調べ、その２つのエントリの番号をリストｄ＿ｍａｘ＿ｐａｉｒｓに入れる手順と言えるが、本発明の範囲はそれに限らず、この手順を「ノードＮが含むエントリのうち、一方のエントリにしか含まれない属性の種類数が最も多い２つのエントリのペア」を調べ、その２つのエントリの番号をリストｄ＿ｍａｘ＿ｐａｉｒｓに入れる手順に差し替えても良い。そのためには、ステップＳ１３−７で用いたｄのかわりに、下記ｄを用いれば良い。 The procedure from step S13-4 to step S13-15 is performed by examining “a pair of two entries having the smallest number of attribute types included in both entries among the entries included in node N”. Although it can be said that the number of the entry is entered in the list d_max_pairs, the scope of the present invention is not limited to this, and this procedure is defined as “the largest number of attribute types included in only one of the entries included in the node N 2. One entry pair "may be examined, and the number of the two entries may be replaced with a procedure for entering the list d_max_pairs. For this purpose, the following d may be used instead of d used in step S13-7.

ｄ＝Ｐ１（ノードＮのｉ番目のエントリ，ノードＮのｊ番目のエントリ）＋Ｐ１（ノードＮのｊ番目のエントリ，ノードＮのｉ番目のエントリ） d = P1 (i-th entry of node N, j-th entry of node N) + P1 (j-th entry of node N, i-th entry of node N)

次にリストｄ＿ｍａｘ＿ｐａｉｒｓのｋ番目の要素が存在するか否かを判定し（ステップＳ１３−１６）、存在する場合には、リストｄ＿ｍａｘ＿ｐａｉｒｓのｋ番目の要素であるペアを構成する２つの数をそれぞれｉ、ｊに代入する（ステップＳ１３−１７）。 Next, it is determined whether or not the k-th element of the list d_max_pairs exists (step S13-16). If there is, the two numbers constituting the pair that is the k-th element of the list d_max_pairs are respectively set to i. , J is substituted (step S13-17).

さらに下記ｄを計算する（ステップＳ１３−１８）。 Further, the following d is calculated (step S13-18).

ｄ＝Ｐ２（ノードＮのｉ番目のエントリ，ノードＮのｊ番目のエントリ）−Ｑ２（ノードＮのｊ番目のエントリ） d = P2 (i-th entry of node N, j-th entry of node N) −Q2 (j-th entry of node N)

上述した関数Ｐ２の定義から、変数ｄは、ｄ＝Ｐ２（ノードＮのｉ番目のエントリ，ノードＮのｊ番目のエントリ）−Ｑ２（ノードＮのｊ番目のエントリ）＝Ｑ２（ノードＮのｉ番目のエントリ＋ノードＮのｊ番目のエントリ）−Ｑ２（ノードＮのｉ番目のエントリ）−Ｑ２（ノードＮのｊ番目のエントリ）と表される。 From the definition of the function P2 described above, the variable d is d = P2 (i-th entry of node N, j-th entry of node N) −Q2 (j-th entry of node N) = Q2 (i of node N -Th entry + j-th entry of node N) -Q2 (i-th entry of node N) -Q2 (j-th entry of node N).

ｄとｄ＿ｍａｘの大きさを比較し（ステップＳ１３−１９）、ｄ＿ｍａｘの方が小さいまたは等しい場合には、ｉ＿ｍａｘ、ｊ＿ｍａｘ、ｄ＿ｍａｘにそれぞれｉ、ｊ、ｄの値を代入する（ステップＳ１３−２０）。さらに、ｋを１増加させ（ステップＳ１３−２１）、ステップＳ１３−１６に戻る。 The magnitudes of d and d_max are compared (step S13-19), and if d_max is smaller or equal, the values of i, j, and d are substituted for i_max, j_max, and d_max, respectively (step S13-20). . Further, k is incremented by 1 (step S13-21), and the process returns to step S13-16.

ステップＳ１３−１９において、ｄ＿ｍａｘの方が大きい場合には、ステップＳ１３−２１に進む。 In step S13-19, when d_max is larger, the process proceeds to step S13-21.

これらのステップＳ１３−６〜Ｓ１３−２１の処理を繰り返し実行することで、リストｄ＿ｍａｘ＿ｐａｉｒｓに格納されているすべてのエントリのペアにおいて、上記ｄを最大とする１対のエントリの番号が変数ｉ＿ｍａｘおよびｊ＿ｍａｘに代入されることになる。 By repeatedly executing the processing of these steps S13-6 to S13-21, in all the entry pairs stored in the list d_max_pairs, the number of a pair of entries maximizing the above d is the variables i_max and j_max. Will be assigned to.

ステップＳ１３−１６において、存在しない場合には、ノードＮのｉ＿ｍａｘ番目のエントリをエントリのグループＧ１に入れ、ｊ＿ｍａｘ番目のエントリをグループＧ２に入れ、さらに両者ともノードＮのエントリ表から削除する（ステップＳ１３−２２）。ここで、Ｇ１とＧ２が含むエントリ集合は、初期状態として空集合であり、ステップＳ１３−２２が終わった時点でそれぞれ１つずつエントリを含むこととなる。 If it does not exist in step S13-16, the i_maxth entry of node N is put in the group G1 of entries, the j_maxth entry is put in group G2, and both are deleted from the entry table of node N (step S13-16). S13-22). Here, the entry set included in G1 and G2 is an empty set as an initial state, and includes one entry each when step S13-22 is completed.

ノードＮにおけるエントリ表のエントリの個数の最小値Ｔ１からＧ１内のエントリ数を引いた値が、ノードＮのエントリ表内に残っているエントリ数に比べて等しいまたは大きいか否かを判定し（ステップＳ１３−２３）、等しいか大きい場合には、ノードＮのエントリ表内に残っているエントリを全てグループＧ１に入れ、エントリ表内からは削除する（ステップＳ１３−２４）。 It is determined whether the value obtained by subtracting the number of entries in G1 from the minimum value T1 of the number of entries in the entry table at node N is equal to or greater than the number of entries remaining in the entry table of node N ( Step S13-23) If equal or larger, all entries remaining in the entry table of the node N are put into the group G1 and deleted from the entry table (step S13-24).

ステップＳ１３−２３において、否と判定された場合には、ステップＳ１３−２４を実施しない。 If it is determined as NO in step S13-23, step S13-24 is not performed.

さらにＴ１からＧ２内のエントリ数を引いた値が、ノードＮのエントリ表内に残っているエントリ数に比べて等しいまたは大きいか否かを判定し（ステップＳ１３−２５）、等しいか大きい場合には、ノードＮのエントリ表内に残っているエントリを全てグループＧ２に入れ、エントリ表内からは削除する（ステップＳ１３−２６）。 Further, it is determined whether or not the value obtained by subtracting the number of entries in G2 from T1 is equal to or greater than the number of entries remaining in the entry table of node N (step S13-25). Puts all entries remaining in the entry table of the node N into the group G2 and deletes them from the entry table (step S13-26).

ステップＳ１３−２５において、否と判定された場合には、ステップＳ１３−２６を実施しない。 If it is determined as NO in step S13-25, step S13-26 is not performed.

これらのステップＳ１３−２３〜Ｓ１３−２６の処理では、ノードＮのエントリ表から、グループＧ１およびグループＧ２に代入される（振り分けられる）エントリ集合の個数が、各ノードにおけるエントリの最小値Ｔ１を下回らないようにするための処理である。後述するステップＳ１３−４７またはＳ１３−４８では、ノードＮのエントリ表に含まれるエントリが順次、グループＧ１またはグループＧ２に振り分けられるとともに、ノードＮのエントリ表からは削除されることになる。その際、例えば一方のグループに振り分けが偏ってしまうような場合があったとしても、ステップＳ１３−２３〜Ｓ１３−２６の処理によって各グループＧ１、Ｇ２には少なくともＴ１個のエントリが代入されるようになっている。 In the processing of these steps S13-23 to S13-26, the number of entry sets to be assigned (distributed) to the group G1 and the group G2 from the entry table of the node N falls below the minimum entry value T1 at each node. This is a process for avoiding this. In step S13-47 or S13-48, which will be described later, the entries included in the entry table of node N are sequentially distributed to group G1 or group G2 and deleted from the entry table of node N. At this time, for example, even if there is a case where the distribution is biased to one group, at least T1 entries are assigned to the groups G1 and G2 by the processing of steps S13-23 to S13-26. It has become.

ｉ、ｄ＿ｍａｘにそれぞれ１、０を代入し、リストｍａｓ＿ｅｎｔｒｙを空にし（ステップＳ１３−２７）、ノードＮのエントリ表内のｉ番目のエントリＥｉが存在するか否かを判定する（ステップＳ１３−２８）。ここで、変数ｉは、分割対象ノードにおける処理対象エントリの番号を一時的に記憶するための変数である。変数ｄ＿ｍａｘは関数Ｐ１等を用いたステップＳ１３−２９での演算結果の最大値を一時的に記憶するための変数である。リストｍａｓ＿ｅｎｔｒｙは、変数ｄ＿ｍａｘに対応するエントリの番号を１または複数記憶する変数である。このステップＳ１３−２７では、それらの変数が初期化されている。 1 and 0 are assigned to i and d_max, respectively, the list mas_entry is emptied (step S13-27), and it is determined whether or not the i-th entry Ei in the entry table of the node N exists (step S13-28). ). Here, the variable i is a variable for temporarily storing the number of the processing target entry in the division target node. The variable d_max is a variable for temporarily storing the maximum value of the calculation result in step S13-29 using the function P1 or the like. The list mas_entry is a variable that stores one or a plurality of entry numbers corresponding to the variable d_max. In step S13-27, these variables are initialized.

ステップＳ１３−２８において存在する場合には、下記ｄを計算する（ステップＳ１３−２９）。ただし（Ｇ１）、（Ｇ２）は、グループＧ１、Ｇ２に含まれる全エントリのキーの和をとった新たなキーを持つエントリを示す。 If it exists in step S13-28, the following d is calculated (step S13-29). However, (G1) and (G2) indicate entries having new keys obtained by summing the keys of all the entries included in the groups G1 and G2.

ｄ＝｜Ｐ１（（Ｇ１），Ｅｉ）−Ｐ１（（Ｇ２），Ｅｉ）｜ d = | P1 ((G1), Ei) −P1 ((G2), Ei) |

このｄの式は、上記で定義した関数Ｐ１において、グループＧ１に含まれる全エントリのキーの和をとった新たなキーを持つエントリを既存エントリＡとし、エントリＥｉを挿入するエントリＢとして求めた関数Ｐ１の値と、グループＧ２に含まれる全エントリのキーの和をとった新たなキーを持つエントリを既存エントリＡとし、エントリＥｉを挿入するエントリＢとして求めた関数Ｐ１の値との差の絶対値を変数ｄに代入する演算である。すなわち、ノードＮのエントリ表にまだ残されているエントリのうちのｉ番目のエントリについて、グループＧ１に挿入する場合のペナルティＰ１とグループＧ２に挿入する場合のペナルティＰ１との差の大きさが変数ｄに代入されることになる。 This expression of d is obtained as an entry B having a new key obtained by summing the keys of all entries included in the group G1 in the function P1 defined above as an entry B into which the entry Ei is inserted. The difference between the value of the function P1 and the value of the function P1 obtained as the entry B having the new key obtained by adding the keys of all the entries included in the group G2 as the existing entry A and the entry B into which the entry Ei is inserted. This is an operation for substituting the absolute value into the variable d. That is, the magnitude of the difference between the penalty P1 when inserting into the group G1 and the penalty P1 when inserting into the group G2 for the i-th entry still remaining in the entry table of the node N is a variable. will be assigned to d.

ｄとｄ＿ｍａｘの大きさを比較し（ステップＳ１３−３０）、ｄ＿ｍａｘの方が小さいまたは等しい場合には、再度ｄとｄ＿ｍａｘの大きさを比較し（ステップＳ１３−３１）、等しくない場合には、リストｍａｘ＿ｅｎｔｒｙを空にし、ｄ＿ｍａｘにｄを代入し（ステップＳ１３−３２）、さらにリストｍａｘ＿ｅｎｔｒｙにｉを追加する（ステップＳ１３−３３）。ｉを１増加させ（ステップＳ１３−３４）、ステップＳ１３−２８に戻る。ステップＳ１３−３０において、ｄ＿ｍａｘの方が大きい場合には、ステップＳ１３−３４に進む。 The magnitudes of d and d_max are compared (step S13-30). If d_max is smaller or equal, the magnitudes of d and d_max are compared again (step S13-31). The list max_entry is emptied, d is substituted for d_max (step S13-32), and i is further added to the list max_entry (step S13-33). i is increased by 1 (step S13-34), and the process returns to step S13-28. In step S13-30, if d_max is larger, the process proceeds to step S13-34.

ステップＳ１３−３１において、等しい場合には、ステップＳ１３−３３に進む。 If equal in step S13-31, the process proceeds to step S13-33.

これらのステップＳ１３−２８〜Ｓ１３−３４の処理を繰り返し実行することで、エントリＮのエントリ表内のすべてのエントリのうち上記ｄを最大とするエントリの番号が１または複数個、リストｍａｘ＿ｅｎｔｒｙに代入されることになる。 By repeatedly executing the processes in steps S13-28 to S13-34, one or a plurality of entries having the maximum d is assigned to the list max_entry among all entries in the entry table of entry N. Will be.

ステップＳ１３−２８において、存在しない場合には、まず、変数ｉ、ｉ＿ｍａｘ、ｄ＿ｍａｘにそれぞれ、１、１、０を代入する（ステップＳ１３−３５）。ここで、変数ｉは、リストｍａｘ＿ｅｎｔｒｙ内の値（要素）の格納位置を示す番号を一時的に記憶するための変数である。変数ｉ＿ｍａｘは関数Ｐ２等を用いたステップＳ１３−３８での演算結果が最大となるエントリの番号を一時的に記憶するための変数である。変数ｄ＿ｍａｘは関数Ｐ２等を用いたステップＳ１３−３８での演算結果の最大値を一時的に記憶するための変数である。このステップＳ１３−３５では、それらの変数が初期化されている。 If it does not exist in step S13-28, first, 1, 1, and 0 are assigned to variables i, i_max, and d_max, respectively (step S13-35). Here, the variable i is a variable for temporarily storing a number indicating a storage position of a value (element) in the list max_entry. The variable i_max is a variable for temporarily storing the number of the entry having the maximum calculation result in step S13-38 using the function P2 or the like. The variable d_max is a variable for temporarily storing the maximum value of the calculation result in step S13-38 using the function P2 or the like. In step S13-35, these variables are initialized.

次にリストｍａｘ＿ｅｎｔｒｙのｉ番目の要素（数）が存在するか否かを判定する（ステップＳ１３−３６）。存在する場合には、リストｍａｘ＿ｅｎｔｒｙのｉ番目の数を変数ｃに代入する（ステップＳ１３−３７）。 Next, it is determined whether or not the i-th element (number) of the list max_entry exists (step S13-36). If it exists, the i-th number in the list max_entry is substituted into the variable c (step S13-37).

さらに、下記ｄを計算する（ステップＳ１３−３８）。ただし、ＥｃはノードＮのエントリ表のｃ番目のエントリを示す。 Further, the following d is calculated (step S13-38). Ec indicates the c-th entry in the entry table of node N.

ｄ＝｜Ｐ２（（Ｇ１），Ｅｃ）−Ｐ２（（Ｇ２），Ｅｃ）｜ d = | P2 ((G1), Ec) −P2 ((G2), Ec) |

このｄの式は、上記で定義した関数Ｐ２において、グループＧ１に含まれる全エントリのキーの和をとった新たなキーを持つエントリを既存エントリＡとし、エントリＥｃを挿入するエントリＢとして求めた関数Ｐ２の値と、グループＧ２に含まれる全エントリのキーの和をとった新たなキーを持つエントリを既存エントリＡとし、エントリＥｃを挿入するエントリＢとして求めた関数Ｐ２の値との差の絶対値を変数ｄに代入する演算である。すなわち、ノードＮのエントリ表にまだ残されているエントリのうちのｃ番目のエントリについて、グループＧ１に挿入する場合のペナルティＰ２とグループＧ２に挿入する場合のペナルティＰ２との差の大きさが変数ｄに代入されることになる。 This expression of d is obtained as an entry B having the new key obtained by summing the keys of all the entries included in the group G1 in the function P2 defined above as an entry B into which the entry Ec is inserted. The difference between the value of the function P2 and the value of the function P2 obtained as the entry B having the new key obtained by adding the keys of all the entries included in the group G2 as the existing entry A and the entry B into which the entry Ec is inserted. This is an operation for substituting the absolute value into the variable d. That is, the difference between the penalty P2 when inserting into the group G1 and the penalty P2 when inserting into the group G2 for the cth entry among the entries still remaining in the entry table of the node N is a variable. will be assigned to d.

次に、ｄとｄ＿ｍａｘの大きさを比較し（ステップＳ１３−３９）、ｄ＿ｍａｘの方が小さいまたは等しい場合には、ｉ＿ｍａｘ、ｄ＿ｍａｘにそれぞれｉ、ｄを代入し（ステップＳ１３−４０）、ｉを１増加させて（ステップＳ１３−４１）ステップＳ１３−３６に戻る。 Next, the magnitudes of d and d_max are compared (step S13-39). If d_max is smaller or equal, i and d are substituted for i_max and d_max, respectively (step S13-40). It is incremented by 1 (step S13-41) and the process returns to step S13-36.

ステップＳ１３−３９において、ｄ＿ｍａｘの方が大きい場合には、ステップＳ１３−４１に進む。 In step S13-39, when d_max is larger, the process proceeds to step S13-41.

これらのステップＳ１３−３６〜Ｓ１３−４１の処理を繰り返し行うことで、リストｍａｘ＿ｅｎｔｒｙに格納されているすべてのエントリの番号について、グループＧ１に振り分けた場合とグループＧ２に振り分けた場合でペナルティＰ２の値の差が最も大きくなるエントリの番号のリストｍａｘ＿ｅｎｔｒｙ中の格納位置（番号）が変数ｉ＿ｍａｘに記憶されることになる。 By repeatedly performing these steps S13-36 to S13-41, the value of the penalty P2 for all the entry numbers stored in the list max_entry is assigned to the group G1 and assigned to the group G2. The storage position (number) in the list max_entry of the entry number with the largest difference is stored in the variable i_max.

ステップＳ１３−３６において、存在しない場合には、リストｍａｘ＿ｅｎｔｒｙのｉ＿ｍａｘ番目の要素が存在する否かを判定し（ステップＳ１３−４２）、存在する場合には、リストｍａｘ＿ｅｎｔｒｙのｉ＿ｍａｘ番目の数を変数ｃに代入する（ステップＳ１３−４３）。さらに、Ｐ１（（Ｇ１），Ｅｃ）とＰ１（（Ｇ２），Ｅｃ）の大きさを比較し（ステップＳ１３−４４）、Ｐ１（（Ｇ２），Ｅｃ）の方が大きいか等しい場合には、再度Ｐ１（（Ｇ１），Ｅｃ）とＰ１（（Ｇ２），Ｅｃ）の大きさを比較する（ステップＳ１３−４５）。 In step S13-36, if it does not exist, it is determined whether or not the i_max-th element of the list max_entry exists (step S13-42). If it exists, the i_max-th number of the list max_entry is set as a variable c. (Step S13-43). Further, the magnitudes of P1 ((G1), Ec) and P1 ((G2), Ec) are compared (step S13-44), and if P1 ((G2), Ec) is greater or equal, The magnitudes of P1 ((G1), Ec) and P1 ((G2), Ec) are compared again (step S13-45).

Ｐ１（（Ｇ２），Ｅｃ）の方が小さいか等しい場合には、さらにＰ２（（Ｇ１），Ｅｃ）とＰ２（（Ｇ２），Ｅｃ）の大きさを比較する（ステップＳ１３−４６）。Ｐ２（（Ｇ２），Ｅｃ）の方が大きいまたは等しい場合には、ＥｃをＧ１に入れ、エントリ表からは削除し（ステップＳ１３−４７）、ステップＳ１３−２３に戻る。 If P1 ((G2), Ec) is smaller or equal, the magnitudes of P2 ((G1), Ec) and P2 ((G2), Ec) are further compared (step S13-46). If P2 ((G2), Ec) is greater or equal, Ec is entered into G1, deleted from the entry table (step S13-47), and the process returns to step S13-23.

ステップＳ１３−４４において、Ｐ１（（Ｇ２），Ｅｃ）の方が小さい場合には、ＥｃをＧ２に入れ、エントリ表からは削除し（ステップＳ１３−４８）、ステップＳ１３−２３に戻る。 In step S13-44, if P1 ((G2), Ec) is smaller, Ec is entered in G2, deleted from the entry table (step S13-48), and the process returns to step S13-23.

ステップＳ１３−４５において、Ｐ１（（Ｇ２），Ｅｃ）の方が大きい場合には、ステップＳ１３−４７に進む。 In step S13-45, if P1 ((G2), Ec) is larger, the process proceeds to step S13-47.

ステップＳ１３−４６において、Ｐ２（（Ｇ２），Ｅｃ）の方が小さい場合には、ステップＳ１３−４８に進む。 In step S13-46, if P2 ((G2), Ec) is smaller, the process proceeds to step S13-48.

ステップＳ１３−４３〜Ｓ１３−４８では、リストｍａｘ＿ｅｎｔｒｙのｉ＿ｍａｘ番目の要素の値をｃとして、ノードＮのエントリ表のｃ番目のエントリＥｃを挿入されるエントリＢ、グループＧ１またはＧ２に含まれる全エントリのキーの和をとった新たなキーを持つエントリをエントリＡとして、関数Ｐ１の値が小さくなる方のいずれかのグループにエントリＥｃが振り分けられ、エントリ表から削除される。すなわち、グループＧ２とエントリＥｃとから求められる関数Ｐ１の値が、グループＧ１とエントリＥｃとから求められる関数Ｐ１の値よりも小さい場合には（ステップＳ１３−４４で「ＹＥＳ」の場合には）、当該エントリＥｃがグループＧ２に入れられる。他方、グループＧ１とエントリＥｃとから求められる関数Ｐ１の値が、グループＧ２とエントリＥｃとから求められる関数Ｐ１の値よりも小さい場合には（ステップＳ１３−４５で「ＹＥＳ」の場合には）、当該エントリＥｃがグループＧ１に入れられる。また、グループＧ１とエントリＥｃとから求められる関数Ｐ１の値と、グループＧ２とエントリＥｃとから求められる関数Ｐ１の値が等しい場合には（ステップＳ１３−４５で「ＮＯ」の場合には）、ノードＮのエントリ表のｃ番目のエントリＥｃを挿入されるエントリＢ、グループＧ１またはＧ２に含まれる全エントリのキーの和をとった新たなキーを持つエントリをエントリＡとして、関数Ｐ２の値が小さくなる方のいずれかのグループにエントリＥｃが振り分けられ、エントリ表から削除される。すなわち、グループＧ２とエントリＥｃとから求められる関数Ｐ２の値が、グループＧ１とエントリＥｃとから求められる関数Ｐ２の値よりも小さい場合には（ステップＳ１３−４６で「ＹＥＳ」の場合には）、当該エントリＥｃがグループＧ２に入れられる。他方、そうでない場合には（ステップＳ１３−４６で「ＮＯ」の場合には）、当該エントリＥｃがグループＧ１に入れられる。 In steps S13-43 to S13-48, the value of the i_max-th element of the list max_entry is c, and all entries included in the entry B, group G1, or G2 into which the c-th entry Ec of the entry table of the node N is inserted Entry Ec is assigned to one of the groups with the smaller value of function P1 with entry A having a new key that is the sum of the two keys as entry A, and deleted from the entry table. That is, when the value of the function P1 obtained from the group G2 and the entry Ec is smaller than the value of the function P1 obtained from the group G1 and the entry Ec (in the case of “YES” in step S13-44). The entry Ec is put into the group G2. On the other hand, when the value of the function P1 obtained from the group G1 and the entry Ec is smaller than the value of the function P1 obtained from the group G2 and the entry Ec (in the case of “YES” in step S13-45). The entry Ec is put into the group G1. Further, when the value of the function P1 obtained from the group G1 and the entry Ec is equal to the value of the function P1 obtained from the group G2 and the entry Ec (in the case of “NO” in step S13-45), The entry P with the c-th entry Ec in the entry table of the node N and the entry having a new key obtained by summing the keys of all entries included in the group G1 or G2 are set as the entry A, and the value of the function P2 is The entry Ec is distributed to one of the smaller groups and deleted from the entry table. That is, when the value of the function P2 obtained from the group G2 and the entry Ec is smaller than the value of the function P2 obtained from the group G1 and the entry Ec (in the case of “YES” in step S13-46). The entry Ec is put into the group G2. On the other hand, if not (if “NO” in step S13-46), the entry Ec is placed in the group G1.

ステップＳ１３−４２において、存在しない場合には、ノードＮのエントリ表にＧ１のエントリ集合を入れ、新しいノードＮ´を生成し、そのエントリ表にＧ２のエントリ集合を入れる（ステップＳ１３−４９）。ノードＮのすべてのエントリをグループＧ１またはグループＧ２に振り分けた後は、ノードＮのエントリ表にはエントリが存在しないことになる。この場合、ステップＳ１３−２８、Ｓ１３−３６、およびＳ１３−４２はいずれも判定結果が「ＮＯ」となるので、ステップＳ１３−４９へ進み、ノードＮのエントリ表にグループＧ１に振り分けられたすべてのエントリが入れられ、新たに生成されたノードＮ´のエントリ表にグループＧ２に振り分けられたすべてのエントリが入れられることになる。 If it does not exist in step S13-42, the G1 entry set is entered into the node N entry table, a new node N 'is generated, and the G2 entry set is entered into the entry table (step S13-49). After all the entries of the node N are distributed to the group G1 or the group G2, no entry exists in the entry table of the node N. In this case, since the determination results in steps S13-28, S13-36, and S13-42 are all “NO”, the process proceeds to step S13-49, and all of the nodes assigned to the group G1 in the entry table of node N are processed. An entry is entered, and all entries distributed to the group G2 are entered in the entry table of the newly generated node N ′.

次に、ノードＮがルートノードか否かを判定し（ステップＳ１３−５０）、ルートノードでない場合には、ノードＮの親ノードが保持するエントリ表内のエントリ数がＴ２未満かどうかを判定する（ステップＳ１３−５１）。 Next, it is determined whether or not the node N is a root node (step S13-50). If it is not the root node, it is determined whether or not the number of entries in the entry table held by the parent node of the node N is less than T2. (Step S13-51).

Ｔ２未満である場合には、ノードＮの親ノードのエントリ表に、新しいノードＮ´に対応するエントリ、すなわちキーとしてノードＮ´のエントリ表内の全エントリのキーの和をとったキーをもち、またノードＮ´へのポインタをもつエントリを追加する（ステップＳ１３−５２）。 If it is less than T2, the entry table of the parent node of the node N has an entry corresponding to the new node N ′, that is, a key obtained by summing the keys of all the entries in the entry table of the node N ′ as a key. In addition, an entry having a pointer to the node N ′ is added (step S13-52).

ステップＳ１３−５０において、ノードＮがルートノードである場合には、新しいノードＮ´´を生成し、ノードＮの親ノードとする。すなわち、ノードＮに親ノードとしてノードＮ´´へのポインタを保持させる。さらに、ノードＮに対応するエントリをノードＮ´´のエントリ表に追加し（ステップＳ１３−５３）、ステップＳ１３−５２へ進む。 In step S13-50, if node N is the root node, a new node N ″ is generated and set as the parent node of node N. That is, the node N holds a pointer to the node N ″ as a parent node. Further, an entry corresponding to the node N is added to the entry table of the node N ″ (step S13-53), and the process proceeds to step S13-52.

ステップＳ１３−５２の後、ノードＮの親ノードを入力とするキー集合情報調整手順を実施し（ステップＳ１３−５５）、処理を終える。 After step S13-52, a key set information adjustment procedure using the parent node of node N as an input is performed (step S13-55), and the process ends.

ステップＳ１３−５１において、Ｔ２未満でない場合には、ノードＮの親ノードと、Ｎ´に対応するエントリを入力とするノード分割手順を実施し（すなわち図１４〜図１６に示すノード分割手順を再帰呼び出しし）（ステップＳ１３−５４）、ステップＳ１３−５５に進む。 In step S13-51, if it is not less than T2, a node division procedure is performed in which the parent node of node N and the entry corresponding to N ′ are input (that is, the node division procedure shown in FIGS. 14 to 16 is recursively performed). Calling) (step S13-54), the process proceeds to step S13-55.

該ノード分割手順によりノードが分割される様子の例を図１７に示す。この例では、Ｔ１＝２、Ｔ２＝４とする。図１７左に示すように、今、３つのノードＮ１〜Ｎ３からなる検索木において、各ノードのエントリ表にエントリＥ１〜Ｅ１０が保持されている状態を考える。 An example of how a node is divided by the node dividing procedure is shown in FIG. In this example, T1 = 2 and T2 = 4. As shown on the left side of FIG. 17, let us consider a state in which entries E1 to E10 are held in the entry table of each node in a search tree composed of three nodes N1 to N3.

エントリＥ１０は最後に該検索木に挿入され、これにより、ノードＮ３のエントリ表に保持しているエントリ数がＴ２を超える５となった場合、エントリ数を減らすために、ノード分割手順が実施される（図１２のステップＳ１１−２、ステップＳ１１−４）。この結果、図１７右に示すように、ノードＮ３がＮ３と新たなノードＮ４に分割され、Ｎ３が保持していたエントリの一部は、Ｎ４へ移る（図１４〜図１５のステップＳ１３−１〜ステップＳ１３−４９）。これに伴い、新たなノードＮ４に対応するエントリＥ１１がＮ３の親ノードであるＮ１のエントリ表に追加される（図１６のステップＳ１３−５２）。 The entry E10 is finally inserted into the search tree. When the number of entries held in the entry table of the node N3 becomes 5, which exceeds T2, the node division procedure is performed to reduce the number of entries. (Steps S11-2 and S11-4 in FIG. 12). As a result, as shown in the right of FIG. 17, the node N3 is divided into N3 and a new node N4, and a part of the entries held by N3 moves to N4 (steps S13-1 in FIGS. 14 to 15). -Step S13-49). Accordingly, the entry E11 corresponding to the new node N4 is added to the entry table of N1 which is the parent node of N3 (step S13-52 in FIG. 16).

該ノード分割手順によりノードが分割される様子のもう１つの例を図１８に示す。図１７に示した例のように、ノード分割手順においては、分割対象のノードの親ノードのエントリ表に新しいエントリが追加される。これにより、親ノードが保持するエントリ数がＴ２以上となる場合、さらに該親ノードがノード分割対象となる（図１６のステップＳ１３−５１、ステップＳ１３−５４）。 FIG. 18 shows another example of how a node is divided by the node dividing procedure. As in the example shown in FIG. 17, in the node division procedure, a new entry is added to the entry table of the parent node of the node to be divided. As a result, when the number of entries held by the parent node is T2 or more, the parent node is further subject to node division (steps S13-51 and S13-54 in FIG. 16).

図１８は、該親ノードがルートノードだった場合の例である。図１８に示すように、ルートノードＮ１が分割される場合には、Ｎ１と新たなノードＮ７に分割されるだけでなく、新たなノードＮ８がルートノードとして生成され、Ｎ１、Ｎ７に対応するエントリをエントリ表に保持する（図１６のステップＳ１３−５０、ステップＳ１３−５３、ステップＳ１３−５２）。このとき、検索木の階層が１つ増え、ルートノードのレベル数は１増えることとなる。 FIG. 18 shows an example where the parent node is the root node. As shown in FIG. 18, when the root node N1 is divided, not only is it divided into N1 and a new node N7, but a new node N8 is generated as a root node, and entries corresponding to N1 and N7 Are stored in the entry table (steps S13-50, S13-53, and S13-52 in FIG. 16). At this time, the hierarchy of the search tree increases by one, and the number of levels of the root node increases by one.

次に、図１２のステップＳ１１−５または図１６のステップＳ１３−５５で呼び出されるキー集合情報調整手順フローを図１９に示す。まず引数として入力されたノードアドレスＮが示すノード（ノードＮ）がルートノードかどうかを判定する（ステップＳ１７−１）。 Next, FIG. 19 shows a key set information adjustment procedure flow called in step S11-5 of FIG. 12 or step S13-55 of FIG. First, it is determined whether or not the node (node N) indicated by the node address N input as an argument is a root node (step S17-1).

ルートノードでない場合には、ノードＮの親ノードのエントリ表内の、ノードＮに対応するエントリ、すなわちポインタとしてノードＮのアドレスを保持するエントリのキーを、その時点でのノードＮに対応するキー、すなわちノードＮのエントリ表内のエントリ集合のキーの和をとったキーで上書きする（ステップＳ１７−２）。 If it is not the root node, the key of the entry corresponding to the node N in the entry table of the parent node of the node N, that is, the entry holding the address of the node N as a pointer, is the key corresponding to the node N at that time That is, the key is overwritten with the key obtained by summing the keys of the entry set in the entry table of the node N (step S17-2).

さらに、ノードＮの親ノードを入力とするキー集合情報調整手順を実施し（再帰呼び出しし）（ステップＳ１７−３）、処理を終える。 Further, a key set information adjustment procedure using the parent node of the node N as an input is performed (recursively called) (step S17-3), and the process ends.

ステップＳ１７−１において、ルートノードであった場合には、そのまま処理を終える。 In step S17-1, if it is the root node, the process is finished as it is.

なお、本実施形態においては、リーフノードのエントリにタプルＩＤを含ませ、タプルそのものはタプル表に格納することとしたが、本発明の範囲はそれに限らず、リーフノードのエントリにタプルそのものを含ませることも可能である。 In this embodiment, the tuple ID is included in the leaf node entry and the tuple itself is stored in the tuple table. However, the scope of the present invention is not limited to this, and the tuple itself is included in the leaf node entry. It is also possible to

また本実施形態においては、各属性が検索式に用いられる確率の近似値として、属性出現回数をタプル総数で除した値を用いたが、本発明の範囲はそれに限らず、各属性が検索式に用いられる確率について、予めデータベース管理者が入力することとしたり、実際にユーザが用いた検索式の統計情報に基づいて算出したりすることも可能である。 In this embodiment, the value obtained by dividing the number of appearances of the attribute by the total number of tuples is used as an approximate value of the probability that each attribute is used in the search formula. However, the scope of the present invention is not limited thereto, and each attribute is searched by the search formula. It is also possible for the database manager to input the probabilities used in the above in advance, or to calculate based on the statistical information of the search formula actually used by the user.

また本実施形態においては、該関数Ｐ２によるペナルティの計算において、入力される２つのエントリのキーが含む全属性について考慮したが、本発明の範囲はそれに限らず、検索式に用いられる確率が高い属性のうち例えば上位１０個の属性についての項のみを考慮することも可能である。すなわち、他の属性の項については０とみなしてもよい。 In the present embodiment, in the penalty calculation by the function P2, all attributes included in the keys of the two entries to be input are considered. However, the scope of the present invention is not limited thereto, and the probability of being used in a search expression is high. It is also possible to consider only the terms for the top 10 attributes among the attributes. That is, other attribute terms may be regarded as 0.

また本実施形態では、各属性が検索式に用いられる確率は独立として扱ったが、本発明の範囲はそれに限らず、各属性の共起性、すなわち１つの検索式に同時に現れる確率を考慮したペナルティの計算を行うことも可能である。たとえば、属性ａＡと属性ａＢを条件に含む検索式が用いられる確率をＲとすると、その属性の組み合わせに対する重みづけ付き正規化被検索面積を（属性ａＡの正規化被検索面積）×（属性ａＢの正規化被検索面積）×Ｒと定義し、全属性の全組合せに対するこのような重みづけ付き正規化被検索面積の和の増加量をペナルティとすればよい。 In this embodiment, the probability that each attribute is used in the search expression is treated as independent. However, the scope of the present invention is not limited to this, and the co-occurrence of each attribute, that is, the probability of appearing simultaneously in one search expression is considered. It is also possible to calculate a penalty. For example, when the probability that a search expression including the attribute aA and the attribute aB is used as a condition is R, the weighted normalized search area for the combination of attributes is (normalized search area of the attribute aA) × (attribute aB Normalization searched area) × R, and an increase in the sum of such weighted normalized searched areas for all combinations of all attributes may be used as a penalty.

また本実施形態では、ある属性が検索式に用いられた場合、指定される値範囲の始点、終点を示す２値が該属性の定義域内で一様の確率で選ばれると仮定した、すなわち、該始点、該終点を入力、その値範囲が検索式内で指定される確率を出力とする確率密度関数の出力値が、該属性の定義域内の点を入力とする場合に一定と仮定したが、本発明の範囲はそれに限らず、該確率密度関数を予めデータベース管理者が入力することとしたり、実際にユーザが用いた検索式の統計情報に基づいて算出したりすることも可能である。また例えば時刻属性について、現在時刻に近い値ほど検索されやすい場合には、現在時刻に近い値範囲を入力するほど高い値を出力する確率密度関数を用いるなど、別の確率密度関数を仮定することも可能である。 Further, in this embodiment, when a certain attribute is used in the search expression, it is assumed that two values indicating the start point and end point of the specified value range are selected with a uniform probability within the domain of the attribute. It is assumed that the output value of the probability density function that inputs the start point and the end point and outputs the probability that the value range is specified in the search formula is constant when the point in the domain of the attribute is input. The scope of the present invention is not limited thereto, and the probability density function can be input in advance by the database administrator, or can be calculated based on the statistical information of the search formula actually used by the user. Also, for example, if a value closer to the current time is more easily searched for a time attribute, another probability density function is assumed, such as using a probability density function that outputs a higher value as a value range closer to the current time is input. Is also possible.

次に、本実施形態のシステムにおける検索処理について説明する。検索処理では、クライアント・コンピュータ装置３０１から受信したタプル検索要求に対し、サーバ・コンピュータ装置２０１の演算部２０３は、まず記憶部２０４が保持するインデックスから、受信した検索式が意味する検索条件を満たすタプルのＩＤ集合を取得する。具体的には、検索結果タプルＩＤリストを空にした後、記憶部２０４が保持するインデックスのルートノードを検索対象ノードとしてノード検索手順を実行することにより、検索結果タプルＩＤリスト内に該ＩＤ集合を得ることができる。さらに、記憶部２０４が保持するタプル表を参照し、該ＩＤ集合に対応するタプル集合を抽出し、検索結果として該タプル集合をタプル検索要求元であるクライアント・コンピュータ装置３０１に送信する。 Next, search processing in the system of this embodiment will be described. In the search processing, in response to the tuple search request received from the client computer apparatus 301, the calculation unit 203 of the server computer apparatus 201 first satisfies the search condition indicated by the received search expression from the index held by the storage unit 204. Get the tuple ID set. Specifically, after the search result tuple ID list is emptied, a node search procedure is executed with the root node of the index held in the storage unit 204 as a search target node, whereby the ID set is included in the search result tuple ID list. Can be obtained. Further, the tuple table stored in the storage unit 204 is referred to, a tuple set corresponding to the ID set is extracted, and the tuple set is transmitted as a search result to the client computer apparatus 301 that is the tuple search request source.

ノード検索手順フローを図２０に示す。図２０に示すフローは、上記検索木を用いたノード検索手順のフローであって、検索式としてのキーである検索キーに対し、ルートノードからその下位のノードへと検索対象を変えながら、１つのノードを検索対象ノードとして検索を行う際に呼び出されるものであるとともに、ステップＳ１８−７において再帰呼び出しされるものである。 A node search procedure flow is shown in FIG. The flow shown in FIG. 20 is a flow of a node search procedure using the above search tree. For a search key that is a key as a search expression, while changing the search target from a root node to a lower node, 1 This is called when a search is performed using one node as a search target node, and is recursively called in step S18-7.

まず、検索対象ノードがリーフノードか否かを判定し（ステップＳ１８−１）、リーフノードである場合には、検索対象ノードのエントリ表に記載されたエントリのうち、検索条件を満たすものを検索結果タプルＩＤリストに追加し（ステップＳ１８−２）、処理を終える。 First, it is determined whether or not the search target node is a leaf node (step S18-1). If the search target node is a leaf node, the entry satisfying the search condition is searched among the entries described in the entry table of the search target node. It adds to a result tuple ID list (step S18-2), and complete | finishes a process.

ステップＳ１８−１において、リーフノードでない場合には、まず変数ｉに１を代入し（ステップＳ１８−３）、検索対象ノードに含まれるエントリ表内のｉ番目のエントリが存在するか否かを判定する（ステップＳ１８−４）。存在しない場合には、処理を終える。 If it is not a leaf node in step S18-1, first, 1 is substituted into variable i (step S18-3), and it is determined whether or not the i-th entry in the entry table included in the search target node exists. (Step S18-4). If not, the process ends.

ステップＳ１８−４において、ｉ番目のエントリが存在する場合には、該エントリに含まれるポインタが指し示す子ノード以下のノードに検索条件を満たすタプルが存在し得るか否かを、該エントリに含まれるキーより判定する（ステップＳ１８−５）。存在しえない場合には、変数ｉを１増加させ（ステップＳ１８−６）、ステップＳ１８−４に戻る。 In step S18-4, if the i-th entry exists, whether or not a tuple satisfying the search condition can exist in a node below the child node indicated by the pointer included in the entry is included in the entry. It is determined from the key (step S18-5). If not, the variable i is incremented by 1 (step S18-6), and the process returns to step S18-4.

ステップＳ１８−５において、存在し得る場合には、該エントリのポインタが指し示す子ノードを検索対象ノードとして再帰的にノード検索手順を実施する（ステップＳ１８−７）。 In step S18-5, if it can exist, the node search procedure is recursively performed with the child node pointed to by the entry pointer as the search target node (step S18-7).

本実施形態によれば、多種多次元のタプルに対し、統一的に１つの木構造インデックスを構築し、そのペナルティとして、エントリ挿入前後での検索される確率に対応する値（すなわち検索される確率ないし検索される確率の近似値）の増加量が用いられている。このペナルティであれば、次元毎に大小関係を判別する必要はなく、エントリが含むキーの次元数や種類が異なっても定義可能となる。これにより、多種多次元のタプル集合を蓄積・検索する際の、複数のインデックスを用いることによるオーバーヘッド、すなわち処理量や記憶容量、処理速度が大きくなることを抑え、また多種多次元のタプル集合に対して検索効率を向上させるクラスタリングを実現することが可能となる。 According to the present embodiment, a single tree structure index is uniformly constructed for various multi-dimensional tuples, and as a penalty, a value corresponding to a search probability before and after entry insertion (that is, search probability) Or an approximate value of the probability of being searched). With this penalty, it is not necessary to determine the magnitude relationship for each dimension, and it can be defined even if the number of key dimensions and types included in the entry are different. This suppresses the overhead caused by using multiple indexes when accumulating / retrieving various multi-dimensional tuple sets, that is, the amount of processing, storage capacity, and processing speed, and increases the number of multi-dimensional tuple sets. On the other hand, it is possible to realize clustering that improves search efficiency.

また、本実施形態では、ペナルティの算出において、エントリが検索される確率に対応した値として、エントリに含まれるキーに含まれる各属性に対する重みづけ付き正規化被検索面積の和を用い、該重みづけ付き正規化被検索面積が、該正規化被検索面積に、該属性が検索式に用いられる確率を掛け合わせた値であり、該正規化被検索面積が、数式（（ｑ−ａ）（ｂ−ｐ）−（ｑ−ｐ）^２／２）／（（ｂ−ａ）^２／２）（但し、ｐ、ｑはそれぞれ、該キーに含まれる該属性の最小値、最大値。ａ、ｂはそれぞれ、それまで該検索木に挿入された全タプル集合に含まれる該属性の最小値、最大値。）（数式Ｉ）で規定される。 In the present embodiment, in calculating the penalty, the weighted normalized search area for each attribute included in the key included in the entry is used as a value corresponding to the probability that the entry is searched, and the weight is calculated. The normalized normalized search area is a value obtained by multiplying the normalized search area by the probability that the attribute is used in the search expression, and the normalized search area is expressed by the formula ((q−a) ( ^{b-p) - (q-} p) 2/2) / ((b-a) 2/2) ( where, p, respectively q is the minimum value of the attribute included in the key, the maximum value .a, b is defined by (Expression I)) (minimum value and maximum value of the attribute included in all tuple sets that have been inserted into the search tree so far).

ある属性が検索式に用いられ、さらに該検索式において該属性の値範囲を指定した範囲条件が用いられている場合、該値範囲の始点、終点を示す２値が該属性の定義域内で一様の確率で選ばれると仮定すると、該エントリにアクセスする必要がある確率は数式Ｉで近似される。数式Ｉで求められる値が該正規化被検索面積である。重みづけ付き正規化被検索面積は、さらに該属性が検索式に用いられる確率を正規化被検索面積に乗じたものである。従って、エントリに含まれるキーに含まれる各属性に対する重みづけ付き正規化被検索面積の和は、範囲検索時に、検索条件に適合するタプルを探すために該エントリにアクセスしなければならない確率ないし確率の近似値に相当する。該重み付け付き正規化被検索面積の和をペナルティとして用い、ペナルティをなるべく小さくする戦略の下、検索木を構築することにより、検索時にアクセスしなければならないエントリ数を総じて小さくすることが可能となる。すなわち、検索の処理量、処理速度を向上させることが可能となる。 When a certain attribute is used in the search expression and a range condition specifying the value range of the attribute is used in the search expression, two values indicating the start point and end point of the value range are identical in the definition area of the attribute. Assuming that it is chosen with a similar probability, the probability that the entry needs to be accessed is approximated by Equation I. The value obtained by Expression I is the normalized search area. The weighted normalized search area is obtained by multiplying the normalized search area by the probability that the attribute is used in the search expression. Therefore, the sum of the weighted normalized search area for each attribute included in the key included in the entry is the probability or probability that the entry must be accessed in order to search for a tuple that matches the search condition. Is an approximate value. By using the sum of the weighted normalized search areas as a penalty and constructing a search tree under the strategy of reducing the penalty as much as possible, the number of entries that must be accessed during the search can be reduced as a whole. . That is, it is possible to improve the search processing amount and processing speed.

なお、本発明が特徴とする構成と、上記実施形態における構成との対応関係は次のとおりである。 The correspondence relationship between the configuration characterized by the present invention and the configuration in the above embodiment is as follows.

本発明の特徴は、蓄積検索対象の情報の単位であるタプル（図３）は、属性−値の組の並びを少なくとも含む１以上の長さを有するものであり、該属性の種類や該並びの長さが同じまたは異なる１つ以上の該タプルが複数記憶されるものであり、該タプルから検索キーがインデックス化され検索木（図６）が構築されるものであり、該検索木はノード（図６のルートノード４０１、インナーノード４０２、リーフノード４０３）を階層的に有する構成であり、該ノードの内最下層のリーフノードはエントリとして該タプルを識別するタプル識別子（タプルＩＤ）とキーを有するものであり（図８）、該ノードの内該リーフノードより上位のインナーノードはエントリとして下位ノードである子ノードの位置情報であるポインタとキーを有するものであり（図９）、該インデックスの構築において、エントリをノードに挿入する際、１つの挿入すべきエントリＸと、該エントリを挿入すべきレベル数Ｌ（リーフノードを０として階層が上がるごとに１増える階層数）が与えられた際（図１２のステップＳ１１−１、図１３のフロー）、はじめにルートノードを操作対象ノードとし（図１３のステップＳ１２−１）、操作対象ノードにおいて、「該操作対象ノードがすでに含む既存エントリに対し、該エントリＸを挿入した場合のペナルティ」が最も小さい該既存エントリを選出し（図１３のステップＳ１２−１３、Ｓ１２−１４、Ｓ１２−１５、Ｓ１２−１６、Ｓ１２−１７）、該既存エントリに含まれるポインタが指し示す子ノードを次の操作対象ノードと選定し（図１３のステップＳ１２−１８）、該次の操作対象ノードのレベル数がＬより大きい場合は（図１３のステップＳ１２−２で「ＮＯ」）、該次の操作対象ノードからさらに次の操作対象ノードを選定することを再帰的に繰り返し（図１３のステップＳ１２−１３、Ｓ１２−１４、Ｓ１２−１５、Ｓ１２−１６、Ｓ１２−１７、Ｓ１２−１８）、該次の操作対象ノードのレベル数がＬに到達した場合は（図１３のステップＳ１２−２で「ＹＥＳ」）、このノードを選択ノードとするステップ（図１３のステップＳ１２−１９）と、選択ノードが許容できる最大数のエントリを持っていない場合において（図１２のステップＳ１１−２で「ＹＥＳ」）、選択ノードに対し、エントリＸを追加するステップと（図１２のステップＳ１１−３）、選択ノードが既に許容できる最大数のエントリを持っている場合において（図１２のステップＳ１１−２で「ＮＯ」）、エントリＸを追加しつつ選択ノードの分割を行うステップ（図１２のステップＳ１１−４、図１４〜図１５のステップＳ１３−１およびＳ１３−２以降のステップ）と、分割を行った場合において、選択ノードの上位ノードのエントリを更新するステップ（図１２のステップＳ１１−５、図１９のフロー）と、を含むことを特徴とする。このペナルティは、例えば、該操作対象ノードがすでに含む既存エントリＡに対し、エントリＢを挿入する際、該エントリＢを挿入する前の該エントリＡと、挿入した後の該エントリＡとを比べた場合の、該エントリＡが検索される確率に対応した値の増分を示す値として定義する（関数Ｐ２、図１３のステップＳ１２−１２、１２−１５、Ｓ１２−１６）ことができる。 A feature of the present invention is that a tuple (FIG. 3), which is a unit of information to be stored and searched, has one or more lengths including at least an array of attribute-value pairs. A plurality of one or more tuples having the same or different lengths are stored, a search key is indexed from the tuples, and a search tree (FIG. 6) is constructed. (The root node 401, the inner node 402, and the leaf node 403 in FIG. 6) are hierarchically arranged, and the leaf node at the lowest layer among the nodes is a tuple identifier (tuple ID) and a key for identifying the tuple as an entry. (FIG. 8), and an inner node higher than the leaf node among the nodes has a pointer and a key as position information of a child node which is a lower node as an entry. (FIG. 9), in the construction of the index, when an entry is inserted into a node, one entry X to be inserted and the number of levels L to insert the entry (each time the hierarchy goes up with leaf nodes set to 0) Is given (step S11-1 in FIG. 12, flow in FIG. 13), the root node is first set as the operation target node (step S12-1 in FIG. 13). The existing entry having the smallest penalty when the entry X is inserted is selected from the existing entries already included in the operation target node (steps S12-13, S12-14, S12-15, S12-16 in FIG. 13). , S12-17), the child node indicated by the pointer included in the existing entry is selected as the next operation target node (step of FIG. 13). S12-18) If the number of levels of the next operation target node is larger than L (“NO” in step S12-2 in FIG. 13), the next operation target node is further selected from the next operation target node. This is repeated recursively (steps S12-13, S12-14, S12-15, S12-16, S12-17, S12-18 in FIG. 13), and the level number of the next operation target node has reached L. In this case (“YES” in step S12-2 in FIG. 13), the step of setting this node as the selected node (step S12-19 in FIG. 13) and the case where the selected node does not have the maximum number of allowable entries (“YES” in step S11-2 of FIG. 12), a step of adding entry X to the selected node (step S11-3 of FIG. 12), and the selected node is already allowed When the maximum number of entries is possible (“NO” in step S11-2 in FIG. 12), the step of dividing the selected node while adding entry X (step S11-4 in FIG. 12, FIG. 14 to FIG. 14) Steps after Steps S13-1 and S13-2 in FIG. 15), and the step of updating the entry of the upper node of the selected node in the case of division (Steps S11-5 in FIG. 12, the flow in FIG. 19) , Including. For example, when the entry B is inserted into the existing entry A already included in the operation target node, the penalty is compared between the entry A before the entry B is inserted and the entry A after the insertion. In this case, it can be defined as a value indicating an increment of a value corresponding to the probability that the entry A is searched (function P2, steps S12-12, 12-15, and S12-16 in FIG. 13).

また、他の発明は、上記分割を行うステップが、選択ノードが含む複数のエントリにエントリＸを追加した各エントリを２つのグループに順次、振り分ける際に（図１５のステップＳ１３−４７、Ｓ１３−４８）、各グループがすでに含むエントリの和をとったエントリ（（Ｇ１）、（Ｇ２））を既存エントリＡとし、該振り分けられるエントリ（Ｅｃ）をエントリＢとしてグループ毎に求めた各ペナルティに基づいて振り分け先のグループを選択して振り分けを行うステップ（図１５のステップＳ１３−３６〜Ｓ１３−４８）と、該２つのグループうちの１つのグループに振り分けられた各エントリを選択ノードのエントリとするとともに、もう１つのグループに振り分けられた各エントリを新たに生成したノードのエントリとするステップ（図１５のステップＳ１３−４９）と、該新たに生成したノードのエントリを該ノードの親ノードに挿入するステップ（図１６のステップＳ１３−５２）と、を含むことを特徴とする。 In another invention, the step of performing the above-described division is performed when each entry obtained by adding the entry X to a plurality of entries included in the selected node is sequentially allocated to two groups (steps S13-47 and S13- in FIG. 15). 48) Based on each penalty obtained for each group with the entry ((G1), (G2)) that is the sum of the entries already included in each group as the existing entry A and the distributed entry (Ec) as the entry B The step of selecting a group to be distributed (steps S13-36 to S13-48 in FIG. 15) and each entry distributed to one of the two groups as the entry of the selected node At the same time, each entry assigned to the other group is set as a newly created node entry. And (step S13-49 in FIG. 15), a step (step S13-52 in FIG. 16) for inserting an entry for the newly created node in the parent node of said node, characterized in that it comprises a.

また、他の発明は、上記選択ノードを決定するステップが、操作対象ノードにおいて、ペナルティが最も小さい既存エントリを選出する（図１３のステップＳ１２−１８）ことを特徴とする。 Another invention is characterized in that the step of determining the selected node selects an existing entry having the smallest penalty in the operation target node (step S12-18 in FIG. 13).

また、他の発明は、上記ペナルティが、操作対象ノードがすでに含む既存エントリＡに対し、エントリＢを挿入する際、エントリＢを挿入する前のエントリＡと、挿入した後のエントリＡとを比べた場合の、エントリＡが検索される確率に対応した値の増分を示す値として定義され（関数Ｐ２、図１３のステップＳ１２−１２、１２−１５、Ｓ１２−１６）、エントリＡが検索される確率に対応した値（Ｑ２（Ｃ））として、エントリＡに含まれるキーに含まれる各属性に対する重みづけ付き正規化被検索面積の和（Σ_Ｃｉ（Ｓ（Ｃｉ）・Ｒ（Ｃｉ）））を用い、該重みづけ付き正規化被検索面積は、正規化被検索面積（Ｓ（Ｃｉ））に、該属性が検索式に用いられる確率（Ｒ（Ｃｉ））を掛け合わせた値であり、正規化被検索面積は、数式（（ｑ−ａ）（ｂ−ｐ）−（ｑ−ｐ）^２／２）／（（ｂ−ａ）^２／２）（但し、ｐ、ｑはそれぞれ、該キーに含まれる該属性の最小値、最大値。ａ、ｂはそれぞれ、それまで該検索木に挿入された全タプル集合に含まれる該属性の最小値、最大値。）で規定される、ことを特徴とする。なお、該キーに含まれる属性の最小値はｐＣｉ、最大値はｑＣｉに、検索木に挿入された全タプル集合に含まれる該属性の最小値はａＣｉに、最大値はｂＣｉに、それぞれ対応するものでもある。 In another invention, when the entry B is inserted into the existing entry A already included in the operation target node, the penalty is compared between the entry A before the entry B and the entry A after the insertion. In this case, the entry A is defined as a value indicating an increment of a value corresponding to the probability that the entry A is searched (function P2, steps S12-12, 12-15, and S12-16 in FIG. 13), and the entry A is searched. As a value corresponding to the probability (Q2 (C)), a sum of weighted normalized search areas for each attribute included in the key included in entry A (Σ _Ci (S (Ci) · R (Ci))) The weighted normalized searched area is a value obtained by multiplying the normalized searched area (S (Ci)) by the probability that the attribute is used in the search formula (R (Ci)). Normalized search area is a formula ((Q-a) (b -p) - (q-p) 2/2) / ((b-a) 2/2) ( where, p, respectively q is the attribute minimum contained in the key Value and maximum value, a and b are respectively defined by the minimum value and the maximum value of the attribute included in all tuple sets inserted in the search tree. Note that the minimum value of the attribute included in the key corresponds to pCi, the maximum value corresponds to qCi, the minimum value of the attribute included in all tuple sets inserted in the search tree corresponds to aCi, and the maximum value corresponds to bCi. It is also a thing.

また、他の発明は、上記ペナルティが、操作対象ノードがすでに含む既存エントリＡに対し、エントリＢを挿入する際、エントリＢを挿入する前のエントリＡと、挿入した後のエントリＡとを比べた場合の、エントリＡに含まれるキーに含まれる属性種類数（Ｑ１（Ｃ））の増分（関数Ｐ１）を示す値であることを特徴とする。 In another invention, when the entry B is inserted into the existing entry A already included in the operation target node, the penalty is compared between the entry A before the entry B and the entry A after the insertion. In this case, it is a value indicating the increment (function P1) of the number of attribute types (Q1 (C)) included in the key included in the entry A.

また、他の発明は、選択ノードを決定するステップが、２種のペナルティ（関数Ｐ１および関数Ｐ２）を用い、操作対象ノードにおいて、第一のペナルティ（関数Ｐ１）が最も小さい既存エントリを選出し（図１３のステップＳ１２−３〜１２−１０）、このとき該最も小さい既存エントリが複数存在する場合には、さらに、該最も小さい既存エントリのうち、第二のペナルティ（関数Ｐ２）が最も小さい既存エントリを選出する（図１３のステップＳ１２−１１〜１２−１８）ことを特徴とする。 In another invention, the step of determining the selected node uses two types of penalties (function P1 and function P2), and selects an existing entry having the smallest first penalty (function P1) at the operation target node. (Steps S12-3 to 12-10 in FIG. 13) At this time, when there are a plurality of the smallest existing entries, the second penalty (function P2) is the smallest among the smallest existing entries. An existing entry is selected (steps S12-11 to 12-18 in FIG. 13).

また、他の発明は、第一のペナルティ（関数Ｐ１）が、操作対象ノードがすでに含む既存エントリＡに対し、エントリＢを挿入する際、該エントリＢを挿入する前の該エントリＡと、挿入した後の該エントリＡとを比べた場合の、該エントリＡに含まれるキーに含まれる属性種類数（Ｑ１（Ｃ））の増分を示す値であり、第二のペナルティが、操作対象ノードがすでに含む既存エントリＡに対し、エントリＢを挿入する際、該エントリＢを挿入する前の該エントリＡと、挿入した後の該エントリＡとを比べた場合の、該エントリＡが検索される確率に対応した値（Ｑ２（Ｃ））の増分を示す値（関数Ｐ２）であることを特徴とする。 According to another invention, when the first penalty (function P1) inserts the entry B into the existing entry A already included in the operation target node, the entry A before the entry B is inserted, Is a value indicating the increment of the number of attribute types (Q1 (C)) included in the key included in the entry A when compared with the entry A after the operation is performed. Probability that the entry A is searched when the entry A is inserted into the existing entry A that is already included and the entry A before the entry B is inserted is compared with the entry A after the insertion. It is a value (function P2) indicating the increment of the value (Q2 (C)) corresponding to.

また、他の発明は。エントリが検索される確率に対応した値（Ｑ２（Ｃ））として、エントリに含まれるキーに含まれる各属性に対する重みづけ付き正規化被検索面積の和（Σ_Ｃｉ（Ｓ（Ｃｉ）・Ｒ（Ｃｉ）））を用いることを特徴とする。 Also, other inventions? As a value (Q2 (C)) corresponding to the probability that the entry is searched, the sum of the weighted normalized search areas (Σ _Ci (S (Ci) · R ( Ci))) is used.

また、他の発明は、属性が検索式に用いられる確率（Ｒ（Ｃｉ））として、属性出現回数をタプル総数で除した値を用い、該属性出現回数は、それまでに該検索木に挿入された全タプル集合における該属性の出現回数であり、該タプル総数は、それまでに該検索木に挿入された全タプル集合の要素数（（全タプル集合ＹにおけるＣｉの出現回数）／（全タプル集合Ｙの要素数））である、ことを特徴とする。 Another invention uses a value obtained by dividing the number of attribute appearances by the total number of tuples as the probability (R (Ci)) that the attribute is used in the search expression, and the number of attribute appearances has been inserted into the search tree so far. The total number of tuples is the number of elements of all tuple sets that have been inserted in the search tree so far ((number of occurrences of Ci in all tuple sets Y) / (all The number of elements of the tuple set Y)).

また、他の発明は、上記検索木を用いた検索において（図２０のフロー）、検索式としてのキーである検索キーに対し、ルートノードからその下位のノードへと検索対象を変えながら、１つの該ノードを検索対象とし、該ノードがインナーノードであれば（図２０のステップＳ１８−１で「ＮＯ」）、該ノードに含まれる各エントリについて、該エントリのキーが検索キーに含まれる各属性を全て含み、かつ、該検索キーに含まれる各属性に対応する値または値範囲と該エントリのキーに含まれる該属性に対応する値または値範囲とが一致または部分的に一致する（一方の値または値範囲が、他方の値範囲に含まれるまたは一部重なる）かを調べる検索手順を行い（ステップＳ１８−４、Ｓ１８−５）、一致または部分的に一致する場合には（ステップＳ１８−５で「ＹＥＳ」）、該エントリのポインタで接続されたノードを検索対象、該検索キーを検索キーとしてノード検索手順を再帰的に行い（ステップＳ１８−７）、また、該ノードがリーフノードであれば（ステップＳ１８−１で「ＹＥＳ」）、該ノードに含まれる各エントリについて、該エントリのキーが検索キーに含まれる各属性を全て含み、かつ、検索キーに含まれる各属性に対応する値または値範囲に該エントリのキーに含まれる該属性に対応する値が含まれるかを調べ、含まれる場合には該エントリのタプル識別子を検索結果タプル識別子集合に加える（ステップＳ１８−２）、ノード検索ステップ、を有することを特徴とする。 Further, in another search using the above-described search tree (the flow of FIG. 20), the search target is changed from a root node to a lower node with respect to a search key that is a key as a search expression. If one of the nodes is a search target and the node is an inner node (“NO” in step S18-1 in FIG. 20), for each entry included in the node, the key of the entry is included in the search key. A value or value range corresponding to each attribute included in the search key and a value or value range corresponding to the attribute included in the key of the entry match or partially match (including one attribute) A search procedure is performed to check whether the value or value range is included or partially overlaps the other value range (steps S18-4 and S18-5). "YES" in step S18-5), the node search procedure is performed recursively using the node connected by the pointer of the entry as a search target and the search key as a search key (step S18-7). If it is a leaf node (“YES” in step S18-1), for each entry included in the node, the key of the entry includes all the attributes included in the search key, and each attribute included in the search key Whether the value corresponding to the attribute or the value corresponding to the attribute included in the key of the entry is included, and if included, add the tuple identifier of the entry to the search result tuple identifier set (step S18- 2) and a node search step.

また、本発明は、他の次のような態様としてとらえることもできる。 Further, the present invention can also be regarded as other aspects as follows.

本発明の他の態様は、検索木のペナルティとして、重みづけ付き正規化被検索面積の和の増加量（関数Ｐ２）に基づくペナルティに加えて、エントリに含まれるキーに含まれる属性種類数の増加量（関数Ｐ１）を用いることとした。これにより、該重みづけ付き正規化被検索面積の和の増加量のみをペナルティとした場合以上に、さらに強力にエントリの次元増加が抑えられ、「次元の呪い」の問題を抑制することができる。すなわち、検索木のペナルティを２つ使用し、第一のペナルティとして、該「エントリに含まれるキーに含まれる属性種類数の増加量」（関数Ｐ１）を用い、第二のペナルティとして、エントリ挿入前後での検索される確率ないし検索される確率の近似値の増加量、すなわち該「エントリに含まれるキーに含まれる各属性に対する重みづけ付き正規化被検索面積の和の増加量」（関数Ｐ２）を用い、部分木選択手順において、第一のペナルティが最も小さいエントリが複数存在する場合に（図１３のステップＳ１２−１３で判定）、それらの間で第二のペナルティを比較し（図１３のステップＳ１２−１５）、第二のペナルティが最も小さいエントリを選ぶこととした（図１３のステップＳ１２−１８）。これにより、第一のペナルティによる次元増加低減効果と、第二のペナルティによる検索効率化効果を合わせ持った、検索木構築方法となる。特に、第二のペナルティとして該「エントリに含まれるキーに含まれる各属性に対する重みづけ付き正規化被検索面積の和の増加量」を用いた場合には、第一のペナルティが属性の種類だけを考慮するのに対し、第二のペナルティによる「各属性（次元）の値までを考慮することによる検索効率化効果」を合わせ持った、検索木構築方法となると言える。 According to another aspect of the present invention, as a penalty of a search tree, in addition to a penalty based on the sum of weighted normalized search areas (function P2), the number of attribute types included in a key included in an entry The increase amount (function P1) was used. As a result, the dimensionality of the entry can be suppressed more strongly and the problem of “curse of dimension” can be suppressed more than when only the increased amount of the sum of the weighted normalized search areas is used as a penalty. . That is, two search tree penalties are used, and the first penalty is the “increase in the number of attribute types included in the key included in the entry” (function P1), and the second penalty is entry insertion. The amount of increase in the search probability before and after or the approximate value of the search probability, that is, the “increase amount of the sum of the weighted normalized search areas for each attribute included in the key included in the entry” (function P2 ), In the subtree selection procedure, when there are a plurality of entries having the smallest first penalty (determined in step S12-13 in FIG. 13), the second penalty is compared between them (FIG. 13). In step S12-15), the entry having the smallest second penalty is selected (step S12-18 in FIG. 13). As a result, the search tree construction method has both the dimension increase reduction effect by the first penalty and the search efficiency improvement effect by the second penalty. In particular, when the “increase in the sum of weighted normalized search areas for each attribute included in the key included in the entry” is used as the second penalty, the first penalty is only the attribute type. In contrast, it can be said that this is a search tree construction method that combines “the effect of improving the search efficiency by considering the value of each attribute (dimension)” by the second penalty.

１０１ネットワーク
２０１サーバ・コンピュータ装置
２０２通信部
２０３演算部
３０４記憶部
３０１クライアント・コンピュータ装置
３０２通信部
３０３タプル／検索式作成表示部
４０１ルートノード
４０２インナーノード
４０３リーフノード
５０１タプル表
５０２インナーノードのエントリ表
５０３リーフノードのエントリ表
５０４属性表 DESCRIPTION OF SYMBOLS 101 Network 201 Server computer apparatus 202 Communication part 203 Operation part 304 Storage part 301 Client computer apparatus 302 Communication part 303 Tuple / retrieval expression display part 401 Root node 402 Inner node 403 Leaf node 501 Tuple table 502 Inner node entry table 503 Leaf node entry table 504 Attribute table

Claims

A tuple that is a unit of information to be stored and searched has one or more lengths including at least a list of attribute-value pairs, and one or more types of the attributes and the lengths of the arrays are the same or different. A plurality of the tuples are stored, a search key is indexed from the tuples, and a search tree is constructed. The search tree has a structure having nodes in a hierarchy. The lower leaf node has a tuple identifier and key for identifying the tuple as an entry, and an inner node higher than the leaf node of the node has a pointer which is position information of a child node which is a lower node as an entry. Has a key,
In building the index, when inserting an entry into a node:
Given one entry X to be inserted and the number L of levels to insert the entry (the number of hierarchies that increases by 1 each time the hierarchy goes up with the leaf node as 0),
First, the root node is the operation target node,
In the operation target node, one existing entry is selected based on “the penalty when the entry X is inserted for the existing entry already included in the operation target node”,
The child node indicated by the pointer included in the existing entry is selected as the next operation target node,
If the number of levels of the next operation target node is greater than L, recursively repeating the selection of the next operation target node from the next operation target node,
When the number of levels of the next operation target node has reached L, determining this node as a selected node;
Adding the entry X to the selected node if the selected node does not have the maximum number of allowable entries;
Splitting the selected node while adding the entry X when the selected node already has the maximum number of allowable entries;
In the case of performing the division, updating the entry of the upper node of the selected node;
An information storage and retrieval method comprising:

The step of performing the division according to claim 1,
When each entry obtained by adding the entry X to a plurality of entries included in the selected node is sequentially distributed to two groups, an entry obtained by summing the entries already included in each group is defined as an existing entry A, and the entries that are distributed Selecting a group as a distribution destination based on each penalty determined for each group as an entry B, and performing a distribution;
Each entry distributed to one of the two groups as an entry of the selected node, and each entry distributed to another group as an entry of a newly generated node;
Inserting the newly generated node entry into the parent node of the node;
An information storage and retrieval method comprising:

The step of determining a selection node according to claim 1 or 2,
An information storage and retrieval method, wherein the existing entry having the smallest penalty is selected in the operation target node.

The penalty of claim 3 is
When the entry B is inserted into the existing entry A already included in the operation target node, the entry A when the entry A before the entry B is inserted is compared with the entry A after the insertion. Is defined as a value indicating the increment in value corresponding to the probability of being searched,
As a value corresponding to the probability that the entry A is searched,
Using the sum of the weighted normalized search area for each attribute included in the key included in the entry A,
The weighted normalized search area is
It is a value obtained by multiplying the normalized search area by the probability that the attribute is used in the search formula,
The normalized search target area, the formula ((q-a) (b -p) - (q-p) 2/2) / ((b-a) 2/2)
(Where p and q are the minimum value and maximum value of the attribute included in the key, respectively. A and b are the minimum value of the attribute included in all tuple sets that have been inserted into the search tree so far. Maximum value.)
Stipulated in
An information storage and retrieval method characterized by the above.

The penalty of claim 3 is
When the entry B is inserted into the existing entry A already included in the operation target node, the entry A when the entry A before the entry B is inserted is compared with the entry A after the insertion. A method for storing and retrieving information, wherein the value indicates an increment of the number of attribute types included in a key included in the key.

The step of determining a selection node according to claim 1 or 2,
Using the two types of penalties,
In the operation target node, when the existing entry having the smallest first penalty is selected, and there are a plurality of the smallest existing entries at this time, the second penalty is further selected among the smallest existing entries. An information storage and retrieval method characterized by selecting the smallest existing entry.

In claim 6,
When the entry B is inserted into the existing entry A already included in the operation target node, the first penalty compares the entry A before the entry B is inserted with the entry A after the insertion. Value indicating the increment of the number of attribute types included in the key included in the entry A,
When the entry B is inserted into the existing entry A already included in the operation target node, the second penalty compares the entry A before the entry B is inserted with the entry A after the insertion. In this case, the information storage retrieval method is characterized in that the value indicates an increment of a value corresponding to the probability that the entry A is retrieved.

In claim 7,
As a value corresponding to the probability that the entry is searched,
A method for storing and retrieving information, wherein the sum of the weighted normalized search areas for each attribute included in the key included in the entry is used.

In claim 4 or 8,
As a probability that the attribute is used in the search formula, a value obtained by dividing the number of attribute appearances by the total number of tuples,
The number of appearances of the attribute is the number of appearances of the attribute in all tuple sets inserted in the search tree so far.
The total number of tuples is the number of elements of all tuple sets that have been inserted into the search tree so far.
An information storage and retrieval method characterized by the above.

In the search using the search tree according to claim 1 to 9,
For the search key that is the key as a search expression,
While changing the search target from the root node to the node below it,
One such node is the search target,
If the node is the inner node, for each entry included in the node, the key of the entry includes all the attributes included in the search key and corresponds to each attribute included in the search key The value or value range and the value or value range corresponding to the attribute included in the key of the entry match or partially match (one value or value range is included or partially overlaps the other value range) ), And if there is a match or partial match, the node search procedure is performed recursively using the node connected by the pointer of the entry as a search target and the search key as a search key. ,
If the node is the leaf node, for each entry included in the node, the key of the entry includes all the attributes included in the search key, and each attribute included in the search key includes A node search step of checking whether a corresponding value or value range includes a value corresponding to the attribute included in the key of the entry, and if included, adding the tuple identifier of the entry to a search result tuple identifier set;
An information storage and retrieval method characterized by comprising:

A tuple that is a unit of information to be stored and searched has one or more lengths including at least a list of attribute-value pairs, and one or more types of the attributes and the lengths of the arrays are the same or different. A plurality of the tuples are stored, a search key is indexed from the tuples, and a search tree is constructed. The search tree has a structure having nodes in a hierarchy. The lower leaf node has a tuple identifier and key for identifying the tuple as an entry, and an inner node higher than the leaf node of the node has a pointer which is position information of a child node which is a lower node as an entry. Has a key,
In building the index, when inserting an entry into a node:
Given one entry X to be inserted and the number L of levels to insert the entry (the number of hierarchies that increases by 1 each time the hierarchy goes up with the leaf node as 0),
First, the root node is the operation target node,
In the operation target node, one existing entry is selected based on “the penalty when the entry X is inserted for the existing entry already included in the operation target node”,
The child node indicated by the pointer included in the existing entry is selected as the next operation target node,
If the number of levels of the next operation target node is greater than L, recursively repeating the selection of the next operation target node from the next operation target node,
When the number of levels of the next operation target node has reached L, determining this node as a selected node;
Adding the entry X to the selected node if the selected node does not have the maximum number of allowable entries;
Splitting the selected node while adding the entry X when the selected node already has the maximum number of allowable entries;
In the case of performing the division, updating the entry of the upper node of the selected node;
Information storage and retrieval program for executing the program using a computer.

The step of dividing according to claim 11 comprises the steps of:
When each entry obtained by adding the entry X to a plurality of entries included in the selected node is sequentially distributed to two groups, an entry obtained by summing the entries already included in each group is defined as an existing entry A, and the entries that are distributed Selecting a group as a distribution destination based on each penalty determined for each group as an entry B, and performing a distribution;
Each entry distributed to one of the two groups as an entry of the selected node, and each entry distributed to another group as an entry of a newly generated node;
Inserting the newly generated node entry into the parent node of the node;
An information storage and retrieval program characterized by including:

The step of determining a selection node according to claim 11 to 12,
The information storage / retrieval program, wherein the existing entry having the smallest penalty is selected in the operation target node.

The penalty of claim 13 is
When the entry B is inserted into the existing entry A already included in the operation target node, the entry A when the entry A before the entry B is inserted is compared with the entry A after the insertion. Is defined as a value indicating the increment in value corresponding to the probability of being searched,
As a value corresponding to the probability that the entry A is searched,
Using the sum of the weighted normalized search area for each attribute included in the key included in the entry A,
The weighted normalized search area is
It is a value obtained by multiplying the normalized search area by the probability that the attribute is used in the search formula,
The normalized search target area, the formula ((q-a) (b -p) - (q-p) 2/2) / ((b-a) 2/2)
(Where p and q are the minimum value and maximum value of the attribute included in the key, respectively. A and b are the minimum value of the attribute included in all tuple sets that have been inserted into the search tree so far. Maximum value.)
Stipulated in
An information storage and retrieval program characterized by that.

The penalty of claim 13 is
When the entry B is inserted into the existing entry A already included in the operation target node, the entry A when the entry A before the entry B is inserted is compared with the entry A after the insertion. An information storage / retrieval program characterized by being a value indicating an increase in the number of attribute types included in a key included in.

The step of determining a selection node according to claim 11 to 12,
Using the two types of penalties,
In the operation target node, when the existing entry having the smallest first penalty is selected, and there are a plurality of the smallest existing entries at this time, the second penalty is further selected among the smallest existing entries. An information storage and retrieval program characterized by selecting the smallest existing entry.

In claim 16,
When the entry B is inserted into the existing entry A already included in the operation target node, the first penalty compares the entry A before the entry B is inserted with the entry A after the insertion. Value indicating the increment of the number of attribute types included in the key included in the entry A,
When the entry B is inserted into the existing entry A already included in the operation target node, the second penalty compares the entry A before the entry B is inserted with the entry A after the insertion. In this case, the information storage retrieval program is characterized in that the entry A is a value indicating an increment of a value corresponding to the probability of retrieval.

In claim 17,
As a value corresponding to the probability that the entry is searched,
A summation of the weighted normalized search area for each attribute included in the key included in the entry is used.

In claim 14 or 18,
As a probability that the attribute is used in the search formula, a value obtained by dividing the number of attribute appearances by the total number of tuples,
The number of appearances of the attribute is the number of appearances of the attribute in all tuple sets inserted in the search tree so far.
The total number of tuples is the number of elements of all tuple sets that have been inserted into the search tree so far.
An information storage and retrieval program characterized by that.

In the search using the search tree according to claim 11 to 19,
For the search key that is the key as a search expression,
While changing the search target from the root node to the node below it,
One such node is the search target,
If the node is the inner node, for each entry included in the node, the key of the entry includes all the attributes included in the search key and corresponds to each attribute included in the search key The value or value range and the value or value range corresponding to the attribute included in the key of the entry match or partially match (one value or value range is included or partially overlaps the other value range) ), And if there is a match or partial match, the node search procedure is performed recursively using the node connected by the pointer of the entry as a search target and the search key as a search key. ,
If the node is the leaf node, for each entry included in the node, the key of the entry includes all the attributes included in the search key, and each attribute included in the search key includes A node search step of checking whether a corresponding value or value range includes a value corresponding to the attribute included in the key of the entry, and if included, adding the tuple identifier of the entry to a search result tuple identifier set;
An information storage and retrieval program characterized by comprising: