JPH11232283A

JPH11232283A - Information retrieving method

Info

Publication number: JPH11232283A
Application number: JP10027928A
Authority: JP
Inventors: Yoshi Kitagawa; 歓北川; Hidetake Kikuhara; 英武菊原
Original assignee: Hitachi Ltd; Hitachi System Engineering Ltd
Current assignee: Hitachi Ltd; Hitachi System Engineering Ltd
Priority date: 1998-02-10
Filing date: 1998-02-10
Publication date: 1999-08-27

Abstract

PROBLEM TO BE SOLVED: To reorganize a file so that it suits the purpose of retrieval. The file is searched by a sequence of ranks, where each data item of the records in the file is divided into a plurality of ranks according to the range of values that item can take. SOLUTION: A search tree creating part 12 treats all records of the retrieval object file 11 as the root of a search (look-up) tree over rank sequences. Starting from that root, it consults a division condition parameter 14 to judge whether each record group should be divided. A record group that is to be divided is split, by referring to a data item comparison table 15, into record groups corresponding to the ranks of the highest-priority item not yet used for division. Record groups that are not divided further are stored in order in a file 18, and their rank sequences are stored in a search tree table 17. A file organization part 19 then creates a table 20 by appending an address to each rank sequence in table 17. It also creates an indexed sequential file 21 by storing each record group of file 18 in the area at its address.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a method of information retrieval using a computer. More particularly, it relates to an information retrieval method that searches a file by a given sequence of ranks. In such a file, each data item of the records is divided into a plurality of ranks according to the range of values that the data item can take.

[0002]

2. Description of the Related Art

A file or database, which is a set of data records, is generally shared by a plurality of applications or departments. However, its organization is not always suited to the search purpose of each application or department. The original files and databases are therefore often reorganized so that they are optimal for each search purpose. For example, some applications treat each data item of the records in a file as divided into a plurality of ranks according to the range of values the item can take. For such applications, a convenient search method accepts a rank sequence and extracts from the file the record group corresponding to that rank sequence. In this case, each data item is classified into ranks according to the search purpose, and the file search generally uses a plurality of keys.

[0003] A known method for speeding up retrieval of an indexed sequential file using a plurality of keys is described in "Database Specialist Text" (Central Information Processing Educational Research Institute, 1994), edited by the Japan Information Processing Development Association. In this method, the index for each key is kept resident in main storage (memory), and the target record or record group is searched using these multiple indexes. However, because several indexes must stay resident in memory, the method generally requires a large amount of memory. In addition, when target records are extracted by giving a rank sequence as described above, the candidate records are narrowed down by searching the indexes one at a time, which takes time. The method is therefore not optimal for this kind of search.

[0004] The "Database Specialist Text" cited above also describes a method of accessing a direct organization file by means of a hash function. The argument of the hash function is the value of the data item selected as the search key, and the function yields the address of the storage area that holds the target record. The record at that address in the direct organization file is then accessed directly. No index table is needed in memory, and the target record can be found quickly. However, only one search key can be used, so the method is unsuited to file searches with a plurality of keys, such as the rank sequences described above.

[0005] Another method is described in "Oracle 7.3 New Features Course" (Japan Oracle Corporation, 1996). It applies to the data items of a record whose values can be classified into ranks. For each such item it provides a bitmap index that represents membership in each rank with one bit per record. The record group corresponding to a given rank sequence is then found quickly by referring to these bitmap indexes. Even with the bitmap indexes kept resident in memory, this method needs less memory than the multiple indexes described above.

Extracting the record group for a given rank sequence takes three steps. First, for each specified data item, the method takes the logical OR of all bitmaps that match the search condition. Next, it takes the logical AND of these per-item results across all specified items. Finally, it extracts the matching records from the file being searched. The method has no notion of reorganizing the search target file itself to be optimal for an individual search purpose.
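The OR-within-item, AND-across-items evaluation of the prior-art bitmap index can be sketched as follows. This is a minimal illustration, not Oracle's implementation. It assumes each record has already been mapped to one rank name per data item and uses Python integers as bitmaps. The function names `build_bitmap_index` and `bitmap_search` are hypothetical.

```python
from typing import Dict, List, Sequence


def build_bitmap_index(ranks_per_record: Sequence[Sequence[str]]) -> List[Dict[str, int]]:
    """Return one dict per data item mapping rank name -> bitmap.

    Bit i of a bitmap is set when record i falls in that rank.
    """
    n_items = len(ranks_per_record[0]) if ranks_per_record else 0
    index: List[Dict[str, int]] = [{} for _ in range(n_items)]
    for rec_no, ranks in enumerate(ranks_per_record):
        for item, rank in enumerate(ranks):
            index[item][rank] = index[item].get(rank, 0) | (1 << rec_no)
    return index


def bitmap_search(index: List[Dict[str, int]],
                  condition: Sequence[Sequence[str]],
                  n_records: int) -> List[int]:
    """condition[item] lists the acceptable ranks; an empty list means "any rank"."""
    result = (1 << n_records) - 1          # start with every record selected
    for item, wanted in enumerate(condition):
        if not wanted:
            continue
        item_bits = 0
        for rank in wanted:                # logical OR within one data item
            item_bits |= index[item].get(rank, 0)
        result &= item_bits                # logical AND across data items
    # The hits must still be fetched from the (unreorganized) file.
    return [i for i in range(n_records) if (result >> i) & 1]


records = [("a", "A", "1"), ("b", "A", "2"), ("a", "B", "2"), ("a", "A", "2")]
idx = build_bitmap_index(records)
print(bitmap_search(idx, [["a"], ["A"], []], len(records)))  # [0, 3]
```

The final step returns scattered record numbers. This is the point the invention addresses: it stores each matching group contiguously instead.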

[0006]

Problem to Be Solved by the Invention

As described above, the prior art handles file searches with a plurality of keys, and in particular searches by a rank sequence that specifies a rank for each of several data items. However, it has no notion of restructuring both the index and the search target file so that they are optimal for the individual search purpose.

[0007] An object of the present invention concerns searches by a rank sequence that specifies ranks over a plurality of data items. For such searches, the invention reorganizes both the index and the search target file so that they are optimal for each individual search purpose.

[0008]

Means for Solving the Problem

The present invention regards the entire record group contained in a file as the root of a search tree over rank sequences. Starting from the record group at the root, it determines, according to preset division conditions, whether to divide a record group further.

If a record group is treated as the root or an internal node of the search tree and divided further, it is split into record groups corresponding to the ranks of one data item. That item is the one with the highest priority among the data items not yet used for division. If a record group is treated as a leaf of the search tree and is not divided further, the record group is stored in order in the search target file, and its rank sequence is stored in order in the search tree table.

As a result, the search tree table holds search tree information in order, beginning with the rank sequences of highest priority for the search purpose. The search target file holds the corresponding record groups in the same order as that search tree information.
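The division procedure above can be illustrated with a small breadth-first sketch. Records are processed level by level, so leaves nearer the root are emitted first. Several simplifications are assumptions rather than the patent's method:

- each record is already reduced to a tuple of rank names;
- the priority order of items is fixed in advance, whereas the embodiment chooses it per record group from standard deviations;
- the division condition is an arbitrary predicate;
- empty child groups are dropped.

The name `build_search_tree` is hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence, Tuple

# Hypothetical representation: one rank name per data item.
Record = Tuple[str, ...]


def build_search_tree(records: Sequence[Record],
                      rank_names: Sequence[Sequence[str]],
                      priority: Sequence[int],
                      should_divide: Callable[[List[Record]], bool]
                      ) -> Tuple[List[Tuple[str, int]], List[Record]]:
    """Return (search tree table, reordered search target file)."""
    n_items = len(rank_names)
    tree_table: List[Tuple[str, int]] = []   # (rank sequence, record count) per leaf
    target_file: List[Record] = []          # leaf record groups, in table order
    root = ("*",) * n_items                  # '*' = item not yet used for division
    queue: Deque[Tuple[Tuple[str, ...], List[Record], List[int]]] = deque(
        [(root, list(records), list(priority))])
    while queue:                             # breadth-first: upper levels first
        seq, group, remaining = queue.popleft()
        if remaining and should_divide(group):
            item = remaining[0]              # highest-priority unused item
            for rank in rank_names[item]:
                child = [r for r in group if r[item] == rank]
                if child:                    # assumption: empty groups are dropped
                    child_seq = seq[:item] + (rank,) + seq[item + 1:]
                    queue.append((child_seq, child, remaining[1:]))
        else:                                # leaf: emit rank sequence and records
            tree_table.append(("".join(seq), len(group)))
            target_file.extend(group)
    return tree_table, target_file


recs = [("a", "A", "1"), ("a", "A", "2"), ("b", "A", "1"),
        ("a", "B", "1"), ("b", "B", "2")]
table, ordered = build_search_tree(
    recs, [["a", "b"], ["A", "B"], ["1", "2"]], priority=[1, 0, 2],
    should_divide=lambda g: len(g) >= 3)
print(table)  # [('*B*', 2), ('aA*', 2), ('bA*', 1)]
```

In this example the group `*B*` becomes a leaf at level 2, because its two records fail the size condition. This mirrors the `*B*` node in FIG. 2.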

[0009] To create an indexed sequential file, storage areas are allocated in order to the record groups corresponding to the rank sequences in the search tree table. The allocation follows the storage area allocation method of the indexed sequential file. The start address of each area is appended to its rank sequence, which creates a search tree index table. Each record group is then stored, in order, in the storage area of the search target file that starts at the determined address.

When a rank sequence is given as a search condition for this indexed sequential file, the search tree index table is searched for the address corresponding to that rank sequence. That address is then used to access the target record group in the indexed sequential file. In this way, the record group matching the search condition can be extracted with a single sequential access, and file searches suited to the search purpose can be performed efficiently.
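The index-table construction and lookup described above can be sketched as follows. It assumes addresses are relative record numbers and that record groups are packed back to back. A real allocator would follow the file's own page or block rules. The names `organize_index_sequential` and `search` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def organize_index_sequential(tree_table: Sequence[Tuple[str, int]],
                              target_file: Sequence[str]
                              ) -> Tuple[Dict[str, Tuple[int, int]], List[str]]:
    """Build a search tree index table: rank sequence -> (start address, count)."""
    index: Dict[str, Tuple[int, int]] = {}
    address = 0
    for rank_seq, count in tree_table:
        index[rank_seq] = (address, count)    # start address appended to the row
        address += count                      # next group starts right after
    if address != len(target_file):
        raise ValueError("record counts do not match the search target file")
    return index, list(target_file)


def search(index: Dict[str, Tuple[int, int]], data: List[str], rank_seq: str) -> List[str]:
    """One index lookup, then one sequential read of a contiguous area."""
    if rank_seq not in index:
        return []
    start, count = index[rank_seq]
    return data[start:start + count]


index, data = organize_index_sequential(
    [("*B*", 2), ("aA*", 2), ("bA*", 1)], ["r3", "r4", "r0", "r1", "r2"])
print(index["bA*"])                 # (4, 1)
print(search(index, data, "aA*"))   # ['r0', 'r1']
```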

[0010] To create a direct organization file, a search tree index table is built by appending to each rank sequence in the search tree table the sequence number of the corresponding record group. A hash function takes this sequence number as its argument and computes the address of the storage area for the record group. Each record group of the search target file is then stored in the direct organization file at the computed address.

When a rank sequence is given as a search condition for this direct organization file, the search tree index table is searched for the sequence number corresponding to that rank sequence. The hash function computes the address of the corresponding record group from that sequence number. The record group for the specified rank sequence is then accessed directly in the direct organization file.
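The sequence-number-plus-hash scheme can be sketched as follows. The patent does not specify the hash function, so this sketch assumes the division method `seq_no mod n_slots`. With sequence numbers 1..N and at least N slots, this choice never collides, so no overflow handling is shown. All names here are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def hash_address(seq_no: int, n_slots: int) -> int:
    """Hypothetical hash function: the division method."""
    return seq_no % n_slots


def organize_direct(rank_seqs: Sequence[str], groups: Sequence[List[str]],
                    n_slots: int) -> Tuple[Dict[str, int], List[Optional[List[str]]]]:
    """Number each leaf group 1, 2, ... and store it at its hashed slot."""
    if n_slots < len(rank_seqs):
        raise ValueError("not enough slots for all record groups")
    index: Dict[str, int] = {}                # search tree index table: rank seq -> seq no
    slots: List[Optional[List[str]]] = [None] * n_slots
    for seq_no, (rank_seq, group) in enumerate(zip(rank_seqs, groups), start=1):
        index[rank_seq] = seq_no
        # Distinct seq_no values 1..N map to distinct slots when N <= n_slots.
        slots[hash_address(seq_no, n_slots)] = list(group)
    return index, slots


def search_direct(index: Dict[str, int], slots: List[Optional[List[str]]],
                  rank_seq: str) -> List[str]:
    """Rank sequence -> sequence number -> hashed address -> record group."""
    seq_no = index.get(rank_seq)
    if seq_no is None:
        return []
    return slots[hash_address(seq_no, len(slots))] or []


idx5, slots5 = organize_direct(["*B*", "aA*", "bA*"], [["r3", "r4"], ["r0", "r1"], ["r2"]], 5)
print(search_direct(idx5, slots5, "bA*"))   # ['r2']
```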

[0011]

DESCRIPTION OF THE PREFERRED EMBODIMENTS

One embodiment of the present invention is described in detail below with reference to the drawings.

[0012] FIG. 1 shows the computer processing that runs from creating search tree information suited to a search purpose, through organizing an indexed sequential file from that information, to performing information retrieval. The files and tables involved are as follows.

- The search target file 11 is the original file to be searched.
- The data item ranking table 13 ranks the values of the data items in the records of the search target file 11. Each rank is defined by a range bounded by an upper and a lower limit value.
- The division condition parameter 14 sets the conditions for deciding whether a record group with a specific rank sequence is divided further. Here, a rank sequence specifies, for the data items in a record, the rank of item 1, the rank of item 2, and so on.
- The data item comparison table 15 is consulted, when a record group is divided further, to select the data item that becomes the next priority item.
- The work file 16 is a working file used when dividing the record groups of the search target file 11. It consists of work file 16-1 and work file 16-2, which are used alternately as the input file and the output file.
- The search tree table 17 stores, as search tree information, the rank sequence corresponding to each record group.
- The search target file 18 is the search target file 11 rearranged to correspond to the search tree information in the search tree table 17.

The search target file 11, data item ranking table 13, division condition parameter 14, data item comparison table 15, work file 16, search tree table 17, and search target file 18 are stored in the storage device of the computer.

[0013] The search tree creating unit 12 is a program stored in the main storage of the computer. It reads the records of the search target file 11 in order and writes them to one of the work files 16, forming a record group that covers the whole search target file 11.

The unit then reads that record group from the work file 16. Referring to the data item ranking table 13, it ranks each data item according to the item values in the records. It counts the records for each item and rank and stores the counts in the data item comparison table 15. From these per-item, per-rank counts, it computes standard deviations based on the reference rank of the reference item and determines the priority item. It then consults the division condition parameter 14 to decide whether to divide the record group further.

If the record group is to be divided further, it is split by the ranks of the priority item and written to the other work file 16. If it is not to be divided further, its rank sequence is written to the search tree table 17 as search tree information, and its records are written in order to the search target file 18. Record groups are thus subdivided by rank sequence, and processing ends when every record group has been handled.

The result is a search tree of record groups. The search tree table 17 holds rank-sequence search tree information in order, starting from the upper levels of the tree. The search target file 18 holds the record group for each piece of search tree information in the same order.

[0014] The search tree index table 20 is the search tree table 17 with an address appended to each piece of search tree information. That address is where the corresponding record group is stored. The indexed sequential file 21 stores each record group of the search target file 18 at the address given for its search tree information in the search tree index table 20. The search tree index table 20 and the indexed sequential file 21 are stored in the storage device of the computer.

The file organization unit 19 is a program stored in the main storage of the computer. For each piece of search tree information in the search tree table 17, it performs the following steps:

1. It determines the start address at which the corresponding record group of the search target file 18 will be stored in the indexed sequential file 21.
2. It appends that address to the search tree information and stores the result in the search tree index table 20.
3. It reads the record group from the search target file 18 and stores it in the area of the indexed sequential file 21 that begins at the determined address.

[0015] Reference numeral 23 denotes an input device for entering the rank sequence to be searched. Numeral 25 denotes a search result stored in the storage device, 26 a display device that displays the search result, and 27 a printer that outputs it. The information search unit 24 is a program stored in the main storage of the computer. It uses the rank sequence entered from the input device 23 as a key to search the search tree index table 20 and obtain the corresponding address. It then retrieves the record group from the storage area of the indexed sequential file 21 that begins at that address and writes it to the search result 25. The result is displayed on the display device 26 or printed on the printer 27.

[0016] An information search system can be built on an information processing device such as a personal computer, workstation, or mainframe. The files, parameters, and tables described above are stored in the computer's storage device. The programs of the search tree creating unit 12, the file organization unit 19, and the information search unit 24 are loaded into main storage and executed by the processor. These programs can also be stored on a computer-readable storage medium and loaded into main storage for execution, either through a drive connected to the computer or by transmission.

[0017] FIG. 2 is a conceptual diagram of the search tree of the indexed sequential file. The entire set of data records to be searched sits at the root 31 of the search tree. The root 31 is divided into nodes 32 and leaves 33. A dotted frame marks the root 31 or a node 32, that is, a record group that branches further. A thick frame marks a leaf 33, that is, a record group where branching stops.

Each frame has three rows:

- The upper row is the rank sequence, giving the rank of each of the three data items in the records. A rank is a classification based on the range of values of a data item, and an asterisk (*) means the item is not yet classified. For example, the node *A* is the set of records whose second item is in rank A. The node aA2 is the set of records whose first item is in rank a, second item in rank A, and third item in rank 2.
- The middle row gives the address, within the file, of the node's first record. It is expressed as a relative address, such as a relative byte address or a relative block address. The root 31 and the nodes 32 have no relative address, because no actual records are assigned to them.
- The lower row gives the number of records in the node.

In this example the file contains 500 records in total. The node *B* contains 90 records. Under the record-count limit in the division conditions it is not subdivided further, so branching stops there.
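The wildcard semantics of rank sequences in FIG. 2 can be expressed as a membership test. This sketch assumes one character per rank, as in the figure's labels such as `aA2`. The `level` helper follows from each division fixing exactly one more item, with the root at level 1. This is an inference from the text, not a stated formula. Both function names are hypothetical.

```python
def matches(rank_sequence: str, record_ranks: str) -> bool:
    """True if a record's ranks fall under a node's rank sequence ('*' = unclassified)."""
    if len(rank_sequence) != len(record_ranks):
        raise ValueError("rank sequence and record have different item counts")
    return all(s == "*" or s == r for s, r in zip(rank_sequence, record_ranks))


def level(rank_sequence: str) -> int:
    """Tree level: the root ('***') is level 1; each division fixes one more item."""
    return 1 + sum(1 for s in rank_sequence if s != "*")


print(matches("*A*", "aA2"), matches("aA2", "aA1"))  # True False
print(level("***"), level("*B*"), level("aA2"))      # 1 2 4
```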

[0018] FIG. 3 shows an example of the data structure of the data item ranking table 13. The item name is the name of a data item in the record that is subject to ranking. The data type indicates whether the data item's value is a numeric string or a character string. The number of ranks gives how many ranks the data item has. The upper and lower limit values of each rank give the range of data item values that belong to that rank. For character-string items, however, each rank corresponds to one specific value rather than a range, so for convenience that value is stored as the lower limit.
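A lookup against the ranking table in FIG. 3 might look like the sketch below. Numeric items are matched against a lower and upper limit, and character-string items against the exact value stored as the lower limit. The item names, rank names, and limits are made-up examples, and treating both limits as inclusive is an assumption. The classes `Rank` and `RankedItem` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

Value = Union[int, float, str]


@dataclass
class Rank:
    name: str
    lower: Value                      # for string items: the exact value of the rank
    upper: Optional[Value] = None     # unused for string items


@dataclass
class RankedItem:
    item_name: str
    data_type: str                    # "numeric" or "string"
    ranks: List[Rank]

    @property
    def rank_count(self) -> int:      # the "number of ranks" column
        return len(self.ranks)


def rank_of(item: RankedItem, value: Value) -> Optional[str]:
    """Return the rank name for a value, or None if no rank covers it."""
    for rank in item.ranks:
        if item.data_type == "string":
            if value == rank.lower:
                return rank.name
        elif rank.lower <= value <= rank.upper:   # assumption: both limits inclusive
            return rank.name
    return None


score = RankedItem("score", "numeric", [Rank("a", 0, 59), Rank("b", 60, 79), Rank("c", 80, 100)])
region = RankedItem("region", "string", [Rank("A", "East"), Rank("B", "West")])
print(rank_of(score, 72), rank_of(region, "West"), rank_of(score, 101))  # b B None
```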

[0019] FIG. 4 shows an example of the data structure of the data item comparison table 15. Its fields are as follows.

- The item name is the name of a ranked data item in the record, other than the reference item.
- The rank name is the name of each rank of that data item.
- The record count is the number of records in that rank.
- The count of records matching reference rank r of reference item f is, among the records of the corresponding rank, the number whose reference item f falls in rank r. The reference item is the data item that matches the search purpose.
- The intermediate value for the standard deviation calculation is computed for each rank of each item.
- The standard deviation of each item is calculated from these intermediate values.
- The processing end flag is turned on when the item becomes the priority item. The item has then already been used to divide records, so it is excluded from later standard deviation calculations.

The item names, rank names, and processing end flags are fixed. The numeric fields, from the record count through the standard deviation, differ for each record group at a node of the search tree. One data item comparison table 15 is created for each record group corresponding to the root 31, a node 32, or a leaf 33. The level attached to the table is the level of the search tree to which it belongs. Its rank sequence gives the rank of each item for the corresponding record group.

Initially only one data item comparison table 15 exists, for the root 31. Its level is 1 and its rank sequence is that of the root 31. The item names and rank names are set as fixed data, and all other fields are initialized.
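Filling the count columns of one data item comparison table might look like this. It assumes records are already mapped to rank names per item, and that the caller passes only the unprocessed, non-reference items. The two-element `[count, reference count]` cell layout and the name `count_group` are hypothetical choices for illustration.

```python
from typing import Dict, List, Mapping, Sequence

# counts[item][rank] = [records in rank, of which in reference rank r of item f]
Counts = Dict[str, Dict[str, List[int]]]


def count_group(group: Sequence[Mapping[str, str]],
                items: Mapping[str, Sequence[str]],
                ref_item: str, ref_rank: str) -> Counts:
    """Fill the record-count columns of a data item comparison table."""
    counts: Counts = {item: {rank: [0, 0] for rank in ranks}
                      for item, ranks in items.items()}
    for rec in group:
        in_ref_rank = rec[ref_item] == ref_rank
        for item in items:
            cell = counts[item][rec[item]]
            cell[0] += 1                 # record count for this rank
            if in_ref_rank:
                cell[1] += 1             # count matching reference item f / rank r
    return counts


group = [{"f": "r", "x": "a", "y": "1"},
         {"f": "r", "x": "a", "y": "2"},
         {"f": "s", "x": "b", "y": "1"}]
counts = count_group(group, {"x": ["a", "b"], "y": ["1", "2"]}, "f", "r")
print(counts["x"])  # {'a': [2, 2], 'b': [1, 0]}
```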

[0020] FIG. 5 shows example data for the division condition parameter 14. The conditions are:

- "Number of records in the record group ≥ 140": a record group at a search tree node is divided further if its record count is at least the specified number.
- "Upper limit ≥ ratio of records matching reference rank r of reference item f ≥ lower limit": the record group is divided further if, within it, the ratio of records matching reference rank r of reference item f lies within the specified range. A condition with only an upper limit or only a lower limit may also be used. A condition such as "divide if fewer than 3 records have a reference item value of 90 points or more" may be used instead.
- "Data size of the record group ≥ storage page size": the record group is divided if its data size is at least the page size (for example, 4 KB).

A record group is divided further only when it satisfies all of the conditions. If any condition is not satisfied, division stops.
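The all-conditions-must-hold rule from FIG. 5 can be written as a single predicate. The record-count threshold of 140 and the 4 KB page size come from the text. The ratio bounds used in the example are placeholders, because the figure's actual values are not given here. `DivisionConditions` and `should_divide` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DivisionConditions:
    min_records: int = 140                 # "number of records >= 140"
    ratio_lower: Optional[float] = None    # hypothetical bounds on the reference ratio
    ratio_upper: Optional[float] = None
    page_bytes: int = 4096                 # storage page size, e.g. 4 KB


def should_divide(n_records: int, n_ref_records: int, data_bytes: int,
                  cond: DivisionConditions) -> bool:
    """Divide only if every condition holds; any failure stops division."""
    if n_records < cond.min_records:
        return False
    ratio = n_ref_records / n_records if n_records else 0.0
    if cond.ratio_lower is not None and ratio < cond.ratio_lower:
        return False
    if cond.ratio_upper is not None and ratio > cond.ratio_upper:
        return False
    return data_bytes >= cond.page_bytes


cond = DivisionConditions(ratio_lower=0.1, ratio_upper=0.9)
print(should_divide(500, 200, 50_000, cond))  # True
print(should_divide(90, 40, 9_000, cond))     # False: like node *B* in FIG. 2
```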

[0021] FIG. 6 shows an example of the data structure of the search tree table 17. The search tree table 17 stores the search tree information for each record group at a leaf 33 of the search tree. The information is ordered from the record group of highest priority, that is, from the record group at the highest level of the tree. Each piece of search tree information occupies one row and consists of the rank of each data item and the record count.

The search tree index table 20 is the search tree table 17 with an address appended to each row. The address is the start address of the corresponding record group. The search tree index table 28, described later, is the search tree table 17 with a sequence number appended to each row. The sequence number gives the position of the corresponding record group in the direct organization file.

【００２２】図７ａ及び図７ｂは、探索木作成部１２の
処理の流れを示すフローチャートである。探索木作成部
１２は、検索対象ファイル１１からレコードを順に入力
し、一方のワークファイル１６に出力する（ステップ４
１）。このワークファイル１６を入力側のワークファイ
ルとする。次にワークファイル１６の全レコードについ
て分割処理を終了したか否かを判定する（ステップ４
２）。入力側と出力側のワークファイル１６のいずれに
も有効レコードがなくなった時点が分割処理終了時点で
ある。処理終了していなければ、探索木の当該階層の処
理を終了したか否かを判定する（ステップ４３）。現在
の階層に対応するデータ項目比較テーブル１５がなくな
った時点が当該階層の処理終了時点である。当該階層の
処理を終了していなければ、次に処理対象とする入力側
のワークファイル１６上のレコード群及び対応するデー
タ項目比較テーブル１５に位置付ける（ステップ４
４）。次に対象とするデータ項目比較テーブル１５の処
理終了フラグがオフの項目、すなわち未処理の項目があ
るか否か判定する（ステップ４５）。未処理の項目があ
れば（ステップ４５ＹＥＳ）、入力側のワークファイル
１６の対象とするレコード群の各レコードを順に読み込
み、データ項目ランク付けテーブル１３を参照して対象
とするデータ項目比較テーブル１５の処理終了フラグが
オフの各項目についてランクを決定し、該当するランク
のレコード件数に１ずつ計数し、基準項目ｆの基準ラン
クｒに該当するレコード件数を計数し、データ項目比較
テーブル１５に格納する（ステップ４６）。次に処理終
了フラグがオフの各項目について基準項目ｆの基準ラン
クｒの影響度合いを表現する標準偏差を計算する（ステ
ップ４７）。次に未処理の項目が２項目以上あれば、各
項目の標準偏差を比較して最大の標準偏差をもつ項目を
優先項目とする（ステップ４８）。ここで最大の標準偏
差をもつ項目は、基準項目に対する影響度の最も高い項
目である。未処理の項目が１項目だけであれば、ステッ
プ４８をスキップする。次にデータ項目比較テーブル１
５の優先項目に対応する処理終了フラグをオンにセット
する（ステップ４９）。FIGS. 7A and 7B are flowcharts showing the processing flow of the search tree creating unit 12. The search tree creating unit 12 sequentially inputs records from the search target file 11 and outputs the records to one work file 16 (step 4).
1). This work file 16 is used as a work file on the input side. Next, it is determined whether the division processing has been completed for all records of the work file 16 (step 4).
2). The division process ends when neither the input-side nor the output-side work file 16 contains any valid record. If processing has not ended, it is determined whether processing of the current level of the search tree has been completed (step 43); a level is complete when no data item comparison table 15 corresponding to that level remains. If processing of the level has not been completed, the input-side work file 16 is positioned at the record group to be processed next, together with its corresponding data item comparison table 15 (step 44). Next, it is determined whether the target data item comparison table 15 contains any item whose processing end flag is off, that is, an unprocessed item (step 45). If there is an unprocessed item (YES in step 45), each record of the target record group in the input-side work file 16 is read in turn. By referring to the data item ranking table 13, the record's rank is determined for each item in the target data item comparison table 15 whose processing end flag is off, and the record count of that rank is incremented by one. The number of those records that fall in reference rank r of reference item f is also counted, and the counts are stored in the data item comparison table 15 (step 46). Next, for each item whose processing end flag is off, a standard deviation expressing the item's degree of influence on reference rank r of reference item f is calculated (step 47). If there are two or more unprocessed items, their standard deviations are compared and the item with the largest standard deviation is set as the priority item (step 48); that item is the one with the greatest influence on the reference item. If there is only one unprocessed item, step 48 is skipped. Finally, the processing end flag of the priority item in the data item comparison table 15 is set on (step 49).
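
The counting in step 46 can be sketched as follows. This is a minimal Python illustration, not the patent's implementation. Records are assumed to be dicts. The hypothetical `UPPER` table stands in for data item ranking table 13, holding the inclusive upper limit of each rank. `rank_of` and `tally_ranks` are illustrative names.

```python
import bisect
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical stand-in for data item ranking table 13: inclusive upper limit per rank.
UPPER: Dict[str, List[float]] = {
    "age": [29, 49, 200],
    "income": [300, 600, 10**9],
    "buy": [0, 1],  # rank 1 = no purchase, rank 2 = purchase
}

def rank_of(item: str, value: float) -> int:
    """Return the 1-based rank whose range contains value."""
    return bisect.bisect_left(UPPER[item], value) + 1

def tally_ranks(records: List[Dict[str, float]], unprocessed: List[str],
                ref_item: str, ref_rank: int) -> Dict[str, Tuple[Counter, Counter]]:
    """Step 46 sketch: for each unprocessed item, count records per rank and, per rank,
    how many of them fall in reference rank ref_rank of reference item ref_item."""
    tallies = {item: (Counter(), Counter()) for item in unprocessed}
    for rec in records:
        in_ref = rank_of(ref_item, rec[ref_item]) == ref_rank
        for item in unprocessed:
            rank = rank_of(item, rec[item])
            totals, ref_hits = tallies[item]
            totals[rank] += 1          # record count of this rank
            if in_ref:
                ref_hits[rank] += 1    # records of this rank that hit the reference rank
    return tallies
```

A single pass over the group fills the counters for every unprocessed item at once. This mirrors the patent's sequential read of the record group.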

[0023] Turning to FIG. 7B, the target data item comparison table 15 is checked against the division condition parameter 14 to determine whether the record group is to be divided further (step 50). If the record group is to be divided (YES in step 50), one data item comparison table 15 is generated for each rank of the priority item (step 51). Each generated data item comparison table 15 has a level equal to the current level + 1. Its rank sequence is the current rank sequence with the rank of the priority item replaced by the respective rank name. For the items whose processing end flag is off, the numerical fields from the record count through the standard deviation are initialized. The data item comparison table 15 from which the new tables were generated is then deleted.
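
Step 51 can be pictured as follows, as a hedged sketch. The dict layout of a comparison table is hypothetical. So is the convention that rank `0` in a rank sequence means "not yet restricted". The caller is assumed to delete the parent table afterwards.

```python
from typing import Dict, List

def child_tables(parent: Dict, items: List[str], priority: str, n_ranks: int) -> List[Dict]:
    """Step 51 sketch: one child comparison table per rank of the priority item."""
    pos = items.index(priority)
    children = []
    for rank in range(1, n_ranks + 1):
        seq = list(parent["rank_seq"])
        seq[pos] = rank                               # set the priority item's rank
        children.append({
            "level": parent["level"] + 1,             # current level + 1
            "rank_seq": tuple(seq),
            "done": parent["done"] | {priority},      # processing end flags carried over
            "counts": {},                             # counts and SDs re-initialised
        })
    return children
```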

[0024] Next, the record group is divided into record groups for the individual ranks of the priority item. If division has not been completed for all ranks (NO in step 52), the input-side work file 16 is positioned at the head of the record group. Only the records belonging to the rank of the priority item to be processed next are read in order. The extracted records are written in order to the output-side work file 16, in the area following the record groups written previously (step 53). In doing so, the rank division information of the data item ranking table 13 is applied to each record read, so that only the records in that rank of the priority item reach the output-side work file 16. The head address of the record group for that rank is saved so that subsequent record groups can be positioned. The rank of the priority item is then advanced to the next rank (step 54), and the process returns to step 52. When division has been completed for all ranks of the priority item (YES in step 52), the work files 16 are checked for input/output switching and switched if required (step 55). That is, the record group of the input-side work file 16 whose input processing has been completed is marked as invalid records, and it is determined whether any valid records remain in the input-side work file 16. If valid records remain, the process proceeds immediately to step 43. When no valid records remain in the input-side work file 16, the output-side work file 16 is made the input side and the input-side work file 16 is made the output side. The process then proceeds to step 43. Two cases leave a record group undivided: there is no unprocessed item (NO in step 45), or the record group is not to be divided further (NO in step 50). In either case, the target record group of the input-side work file 16 is read in order and written to the search target file 18, in the area following the record groups written previously (step 56), and the number of records written is counted. Next, search tree information for the record group is created (step 57). Its rank sequence is the rank sequence of the data item comparison table 15 corresponding to the record group, and its record count is the number of records counted. The search tree information is written as the next row of the search tree table 17 (step 58), and the process proceeds to step 55.
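
Steps 52 to 54 (splitting by rank while saving head addresses) and steps 56 to 58 (emitting a leaf) can be sketched as below. Python lists stand in for the work files, the search target file and the search tree table, and list offsets stand in for addresses. `UPPER`, `rank_of`, `split_group` and `emit_leaf` are hypothetical names.

```python
import bisect
from typing import Dict, List, Tuple

Record = Dict[str, float]
UPPER: Dict[str, List[float]] = {"age": [29, 49, 200]}  # hypothetical ranking-table stand-in

def rank_of(item: str, value: float) -> int:
    return bisect.bisect_left(UPPER[item], value) + 1

def split_group(group: List[Record], priority: str, n_ranks: int,
                out: List[Record]) -> List[int]:
    """Steps 52-54: one pass over the group per rank, appending to the output work file."""
    heads = []
    for rank in range(1, n_ranks + 1):
        heads.append(len(out))   # head address of this rank's group, saved for positioning
        out.extend(r for r in group if rank_of(priority, r[priority]) == rank)
    return heads

def emit_leaf(group: List[Record], rank_seq: Tuple[int, ...],
              search_file: List[Record], tree_table: List[Tuple[Tuple[int, ...], int]]) -> None:
    """Steps 56-58: append the leaf group and its (rank sequence, record count) row."""
    search_file.extend(group)
    tree_table.append((rank_seq, len(group)))
```

Within each rank the records keep their original relative order, because each rank's pass simply filters the group.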

[0025] When all record groups of the current level have been processed in this way (YES in step 43), the current level is incremented by one (step 59) and the process returns to step 42. When division has been completed for all record groups in the work files 16 (YES in step 42), processing by the search tree creating unit 12 ends.
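
The overall level-by-level flow of FIGS. 7A and 7B can be condensed into the following sketch. It is a simplification under stated assumptions, not the patent's program. `choose_priority` and `should_split` are hypothetical callbacks standing in for steps 45 to 49 and step 50, respectively. Each level's list of pending groups plays the role of the input-side work file, and the list built for the next level plays the output side.

```python
import bisect
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]
# Hypothetical stand-in for data item ranking table 13 (inclusive upper limits).
UPPER: Dict[str, List[float]] = {"age": [39, 200], "income": [499, 10**9]}
N_RANKS = {item: len(limits) for item, limits in UPPER.items()}

def rank_of(item: str, value: float) -> int:
    return bisect.bisect_left(UPPER[item], value) + 1

def build_tree(records: List[Record], items: List[str],
               choose_priority: Callable[[List[Record], List[str]], str],
               should_split: Callable[[List[Record], str], bool]):
    """Level-by-level sketch of the search tree creating unit's main loop."""
    search_file: List[Record] = []                       # search target file 18
    tree_table: List[Tuple[Tuple[int, ...], int]] = []   # search tree table 17
    # Input side: (group, rank sequence, processed items); rank 0 = unrestricted.
    level_in = [(records, (0,) * len(items), frozenset())]
    while level_in:                                      # step 42: valid groups remain
        level_out = []                                   # output side
        for group, seq, done in level_in:                # steps 43-44
            remaining = [i for i in items if i not in done]
            p = choose_priority(group, remaining) if remaining else None   # steps 45-49
            if p is not None and should_split(group, p):                   # step 50
                pos = items.index(p)
                for rank in range(1, N_RANKS[p] + 1):                      # steps 51-54
                    sub = [r for r in group if rank_of(p, r[p]) == rank]
                    level_out.append((sub, seq[:pos] + (rank,) + seq[pos + 1:], done | {p}))
            else:                                                          # steps 56-58
                search_file.extend(group)
                tree_table.append((seq, len(group)))
        level_in = level_out                             # steps 55 and 59: swap, next level
    return search_file, tree_table
```

Leaves are written in the order they are reached, level by level. As a result, the search target file holds each leaf's records contiguously, in the same order as the rows of the search tree table.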

[0026] FIG. 8 is a flowchart of the standard deviation calculation of step 47. The search tree creating unit 12 refers to the data item comparison table 15 of the target record group (step 71). It then determines whether the number of items whose processing end flag is off, that is, the number of unprocessed items, is one (step 72). If there are two or more unprocessed items (NO in step 72), the unit positions at the first item of the data item comparison table 15 (step 73). It then determines whether all items have been processed (step 74). If not (NO in step 74), it determines whether the processing end flag of the current item is on (step 75). If the flag is on, that is, the item has already been processed (YES in step 75), the process goes to step 77. If the flag is off (NO in step 75), the standard deviation of the item is calculated and stored in that item's standard deviation column of the data item comparison table 15 (step 76).

[0027] To calculate the standard deviation, first,

[0028]

(Equation 1)

[0029] a standard deviation calculation intermediate value is calculated for each rank of the item according to the above formula and stored in the intermediate value column of the data item comparison table 15. When the intermediate values have been calculated for all ranks of the item in this way,

[0030]

(Equation 2)

[0031] the standard deviation of the item is calculated according to the above formula and stored in the standard deviation column of the data item comparison table 15. Here, n is the number of ranks of the item, and Σ denotes the sum of intermediate value 1 through intermediate value n.
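
Equations 1 and 2 survive only as placeholders, so the formula below is an assumption and not the patent's definition. For each rank k of the item, it takes the share of that rank's records falling in reference rank r. The intermediate value is the squared deviation of that share from the mean share. The standard deviation is the square root of the sum of intermediate values 1 through n divided by n. Only the structure (per-rank intermediate values summed over n ranks) comes from the text. `influence_sd` and `choose_priority` are hypothetical names.

```python
import math
from typing import Dict

def influence_sd(totals: Dict[int, int], ref_hits: Dict[int, int], n_ranks: int) -> float:
    """Assumed reading of Equations 1-2: population standard deviation, over the item's
    n ranks, of the share of each rank's records that fall in the reference rank."""
    shares = [ref_hits.get(k, 0) / totals[k] if totals.get(k) else 0.0
              for k in range(1, n_ranks + 1)]
    mean = sum(shares) / n_ranks
    intermediates = [(s - mean) ** 2 for s in shares]   # assumed "intermediate values"
    return math.sqrt(sum(intermediates) / n_ranks)      # sum of intermediates 1..n, over n

def choose_priority(sds: Dict[str, float]) -> str:
    """Steps 48/78: a single unprocessed item wins outright; otherwise the largest SD."""
    if len(sds) == 1:
        return next(iter(sds))
    return max(sds, key=sds.get)
```

An item whose ranks have very different hit rates for the reference rank gets a large value. Under this reading, that item separates the reference rank best, which matches the text's description of the largest standard deviation as the greatest influence.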

[0032] When step 76 has been completed, the process positions at the next item (step 77) and returns to step 74. When all items have been processed (YES in step 74), the process proceeds to step 48. If there is only one unprocessed item (YES in step 72), that item is set as the priority item (step 78), and the process proceeds to step 48.

[0033] FIG. 9 is a flowchart showing step 50 in detail. The search tree creating unit 12 refers to the division condition parameter 14 (step 81). It determines whether the total number of records over all ranks of the priority item determined in step 48 is at least the record count set in the division condition parameter 14 (step 82). If it is (YES in step 82), the unit determines whether the data volume occupied by those records is at least the page capacity set in the division condition parameter 14 (step 83). If it is (YES in step 83), the unit checks the proportion of the item's records that fall in reference rank r of reference item f (the record group average value). It determines whether that proportion lies within the range set as a division condition in the division condition parameter 14 (step 84). The record group average value is calculated as follows:

Average value = (number of records in reference rank r of reference item f) ÷ (total number of records of the data item)

Here, the number of records in reference rank r of reference item f is the sum of the corresponding counts over all ranks of the data item. If the average value lies within the range (YES in step 84), the record group is to be divided and the process proceeds to step 51. The record group is not divided, and the process proceeds to step 56, in any of three cases:

- the record group contains fewer records than the division condition specifies (NO in step 82);
- its data volume is less than the page capacity (NO in step 83);
- its average value lies outside the range (NO in step 84).
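
The three tests of FIG. 9 can be sketched as a single predicate. The `DivisionParams` layout is a hypothetical model of division condition parameter 14. Records are assumed to be fixed-length, so that data volume equals record count times record size. The allowed range of the average value is assumed to be a closed interval.

```python
from dataclasses import dataclass

@dataclass
class DivisionParams:
    """Hypothetical layout of division condition parameter 14."""
    min_records: int     # record-count threshold (step 82)
    page_bytes: int      # page capacity (step 83)
    ratio_low: float     # allowed range of the record group average value (step 84)
    ratio_high: float

def should_divide(n_records: int, record_bytes: int, ref_hits: int, p: DivisionParams) -> bool:
    if n_records == 0 or n_records < p.min_records:      # step 82
        return False
    if n_records * record_bytes < p.page_bytes:          # step 83 (fixed-length records assumed)
        return False
    average = ref_hits / n_records                       # record group average value
    return p.ratio_low <= average <= p.ratio_high        # step 84
```

Failing any one check makes the group a leaf. This matches the three NO branches that lead to step 56.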

[0034] The file organization unit 19 reads one row of search tree information from the search tree table 17 and obtains its record count. It then reads that many records in order from the search target file 18. Following the index sequential file organization method, it stores them in order in a predetermined storage area of the index sequential organization file 21. It appends the relative address of the head of that storage area to the search tree information read and stores the result in the search tree index table 20. Processing by the file organization unit 19 ends when two conditions hold: address information has been appended to all search tree information in the search tree table 17 and stored in the search tree index table 20, and all records of the search target file 18 have been stored in the index sequential organization file 21.
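
A minimal sketch of this organization step follows. It models the index sequential organization file as a flat list and each relative address as a record offset. These are simplifying assumptions: a real index sequential file has its own index and overflow structures, which are not modelled here. `organize_indexed` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

RankSeq = Tuple[int, ...]

def organize_indexed(tree_table: Sequence[Tuple[RankSeq, int]], search_file: Sequence[object]):
    """Sketch of file organization unit 19: copy each leaf group into the output file and
    record (rank sequence, record count, relative address) in the index table."""
    index: List[Tuple[RankSeq, int, int]] = []   # search tree index table 20
    indexed_file: List[object] = []              # index sequential organization file 21
    pos = 0                                      # read position in search target file 18
    for rank_seq, count in tree_table:
        index.append((rank_seq, count, len(indexed_file)))   # head relative address
        indexed_file.extend(search_file[pos:pos + count])
        pos += count
    return index, indexed_file
```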

[0035] When a search key consisting of a rank sequence is entered through the input device 23, the information retrieval unit 24 searches the search tree index table 20. It obtains the address appended to the search tree information corresponding to the specified rank sequence, accesses the index sequential organization file 21, and reads the corresponding record group. The result is written to an external storage device as search result 25, displayed on the display device 26, or printed as a form on the printer 27. To speed up index searches, the search tree index table 20 may be kept resident in main memory.
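
Retrieval with a memory-resident index can be sketched as an exact-match dictionary lookup followed by one contiguous read. The index rows are assumed to be (rank sequence, record count, relative address) tuples. `make_resident_index` and `retrieve` are hypothetical names. The sketch handles only exact matches on a leaf's rank sequence.

```python
from typing import Dict, List, Sequence, Tuple

RankSeq = Tuple[int, ...]

def make_resident_index(index: Sequence[Tuple[RankSeq, int, int]]) -> Dict[RankSeq, Tuple[int, int]]:
    """Load search tree index table 20 into memory once: rank sequence -> (count, address)."""
    return {seq: (count, addr) for seq, count, addr in index}

def retrieve(resident: Dict[RankSeq, Tuple[int, int]], indexed_file: Sequence[object],
             rank_seq: RankSeq) -> List[object]:
    hit = resident.get(rank_seq)
    if hit is None:
        return []                                  # no leaf for this rank sequence
    count, addr = hit
    return list(indexed_file[addr:addr + count])   # one contiguous (sequential) read
```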

[0036] FIG. 10 shows a processing procedure in which a direct organization file is created from the search tree table 17 and the search target file 18, and information is then retrieved from that direct organization file. Processing by the search tree creating unit 12 is as described above; it outputs the search tree table 17 and the search target file 18 as its result. The search tree index table 28 and the direct organization file 29 are files stored in the storage device of the computer. The file organization unit 22 and the information retrieval unit 30 are programs stored in the main storage of the computer.

[0037] The file organization unit 22 reads one row of search tree information from the search tree table 17, appends a sequence number starting from 0, and stores it in the search tree index table 28. It then uses a hash function to calculate the relative address of the head of the storage area that will hold the corresponding record group. The relative address is calculated, for example, with the following hash function:

Relative address = hash function = (capacity of the search target file) × (sequence number of the record group) ÷ (total number of leaves)

Here the sequence number is the argument of the hash function. The file capacity can be calculated from the total record count, which is obtained by summing the record counts in the rows of the search tree table 17 over all record groups. The total number of leaves equals the number of rows of search tree information stored in the search tree table 17. If the file capacity is expressed in data bytes, the relative address is a relative byte address; if it is expressed in data blocks, the relative address is a relative block address. Next, the record group's records, as many as its record count, are read in order from the search target file 18. They are stored in order in the storage area of the direct organization file 29 that begins at the calculated relative address. The sequence number is incremented by one for each row in this way. Processing by the file organization unit 22 ends when two conditions hold: sequence numbers have been appended to all search tree information in the search tree table 17 and stored in the search tree index table 28, and all records of the search target file 18 have been stored in the direct organization file 29.
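
The hash function above and the resulting placement can be sketched as follows. It assumes that capacity is measured in record slots, so a relative address is a slot offset, and that the division is floored. The overflow check is an addition for illustration. The formula gives every group an equal-sized area of capacity ÷ leaves slots, and the source text does not describe what happens when a group is larger than its area. `relative_address` and `organize_direct` are hypothetical names.

```python
from typing import Dict, List, Optional, Sequence, Tuple

RankSeq = Tuple[int, ...]

def relative_address(seq_no: int, file_capacity: int, n_leaves: int) -> int:
    """Hash from [0037]: capacity x sequence number / number of leaves (floored)."""
    return file_capacity * seq_no // n_leaves

def organize_direct(tree_table: Sequence[Tuple[RankSeq, int]], search_file: Sequence[object],
                    file_capacity: int):
    """Sketch of file organization unit 22, with capacity measured in record slots."""
    n_leaves = len(tree_table)
    direct_file: List[Optional[object]] = [None] * file_capacity   # direct organization file 29
    seq_index: Dict[RankSeq, Tuple[int, int]] = {}                 # search tree index table 28
    pos = 0
    for seq_no, (rank_seq, count) in enumerate(tree_table):        # sequence numbers from 0
        start = relative_address(seq_no, file_capacity, n_leaves)
        end = relative_address(seq_no + 1, file_capacity, n_leaves)  # start of next area
        if count > end - start:
            raise ValueError(f"group {seq_no} needs {count} slots, area has {end - start}")
        direct_file[start:start + count] = search_file[pos:pos + count]
        seq_index[rank_seq] = (seq_no, count)
        pos += count
    return seq_index, direct_file
```

With a capacity of exactly the total record count, uneven groups overflow their equal-sized areas. The check in the sketch makes that visible instead of silently overwriting the next group.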

[0038] When a search key consisting of a rank sequence is entered through the input device 23, the information retrieval unit 30 searches the search tree index table 28. It obtains the sequence number appended to the search tree information corresponding to the specified rank sequence and calculates the corresponding relative address with the hash function above. It then accesses the direct organization file 29 and reads the corresponding record group. The result is written to an external storage device as search result 25, displayed on the display device 26, or printed as a form on the printer 27. To speed up index searches, the search tree index table 28 may be kept resident in main memory.
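
Retrieval from the direct organization file can be sketched as follows. The sequence number is looked up and the same hash recomputes the address. Assumptions: the index maps each leaf's rank sequence to (sequence number, record count), and capacity is in record slots. `retrieve_direct` is a hypothetical name.

```python
from typing import Dict, List, Optional, Sequence, Tuple

RankSeq = Tuple[int, ...]

def relative_address(seq_no: int, file_capacity: int, n_leaves: int) -> int:
    return file_capacity * seq_no // n_leaves        # same hash as at organization time

def retrieve_direct(seq_index: Dict[RankSeq, Tuple[int, int]],
                    direct_file: Sequence[Optional[object]], rank_seq: RankSeq,
                    file_capacity: int) -> List[object]:
    hit = seq_index.get(rank_seq)                     # search tree index table 28
    if hit is None:
        return []
    seq_no, count = hit
    start = relative_address(seq_no, file_capacity, n_leaves=len(seq_index))
    return list(direct_file[start:start + count])
```

Only the sequence number is stored in the index. The address is recomputed at query time, which is the difference from the index sequential variant.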

[0039] For the direct organization file, the search tree has the same structure as shown in FIG. 2, except that the position of each record group in the file is expressed by a sequence number instead of the relative address shown in FIG. 2.

[0040] FIG. 11 shows an example of another data item comparison table 15'. The item name is the name of a data item contained in the record; in general, the data item comparison table 15' also includes the reference item. The search frequency is the number of times the item has been used as a search key, as extracted from the past search history. The processing end flag is set on when the corresponding item becomes the priority item.

[0041] In steps 46 to 48 above, the search tree creating unit 12 determines the priority item from the data item comparison table 15. It counts records for the unprocessed items, calculates their standard deviations, and compares those standard deviations. Instead of steps 46 to 48, the unit may refer to the data item comparison table 15' and compare the search frequencies of the unprocessed items. The item with the highest search frequency is then set as the priority item.
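
The search-frequency alternative reduces to picking the maximum over unprocessed items, as in this sketch. It assumes the reference item is included in `done` so that it is never chosen. `priority_by_frequency` is a hypothetical name. Ties go to whichever item appears first in `search_freq`.

```python
from typing import AbstractSet, Dict

def priority_by_frequency(search_freq: Dict[str, int], done: AbstractSet[str]) -> str:
    """Alternative to steps 46-48: highest search frequency among unprocessed items."""
    candidates = {item: n for item, n in search_freq.items() if item not in done}
    if not candidates:
        raise ValueError("no unprocessed item left")
    return max(candidates, key=candidates.get)
```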

[0042] FIG. 12 is a flowchart of a rank division program. The program calculates the upper and lower limits of each rank for each item of the records in the search target file 11 and creates the data item ranking table 13. It displays the names of the data items making up a record on the display device. It then prompts the user to enter the number of ranks for each item and any user-specified upper/lower limits. Data items for which only a lower limit is set are likewise specified by the user. When the number of ranks for each item and the upper/lower limits for the user-specified ranks have been entered through the input device (step 91), the program positions at the first item (step 92). If not all items have been processed (NO in step 93), it determines whether user-specified upper/lower limits have been set for the item (step 94). If they have (YES in step 94), the specified limits are stored in the respective ranks of that item in the data item ranking table 13 (step 95), and the process goes to step 99. If no limits have been specified (NO in step 94), records are read in order from the search target file 11. They are sorted in ascending order of the item's value and written to a work file (step 96). Next, the sequence numbers of the records corresponding to the upper and lower limits of each rank are calculated (step 97). Ranks are assigned by dividing the records of the search target file 11 into groups of nearly equal record count, one per rank. When the specified number of ranks for the item is n, the sequence number of the record giving the lower limit of rank m (1 ≤ m ≤ n) is obtained by the following equation.

[0043]

(Equation 3)

[0044] Here the upward arrow denotes rounding the result up to the next integer, that is, raising any fractional part to a whole number. The sequence number of the record giving the upper limit of rank m is obtained by the following equation.

[0045]

(Equation 4)

[0046] Next, records are read in order from the work file while being counted. For each record whose sequence number matches a calculated one, the item value is obtained and stored as the upper or lower limit of the corresponding rank of that item in the data item ranking table 13 (step 98). The program then positions at the next item (step 99) and goes to step 93. When all items have been processed in this way (YES in step 93), the program ends.

[0047] With the method of FIG. 12, the upper limit of rank m can coincide with the lower limit of rank (m + 1). In that case, records having that data item value are assigned to rank m as its upper limit. The next different data item value encountered while reading the work file records in order becomes the lower limit of rank (m + 1).
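
Equations 3 and 4 survive only as placeholders. The sketch below therefore assumes the natural equal-count split that matches the text's "divide almost equally" and "round up" wording. For N sorted records, the lower limit of rank m is taken as the record with 1-based sequence number ⌈N(m−1)/n⌉ + 1, and the upper limit as the record with 1-based sequence number ⌈Nm/n⌉. These formulas are assumptions, not the patent's equations. The tie rule follows [0047]. Clamping each upper limit to at least its lower limit is an added safeguard. `rank_bounds` is a hypothetical name.

```python
from typing import List, Tuple

def ceil_div(a: int, b: int) -> int:
    return -(-a // b)

def rank_bounds(values: List[float], n: int) -> List[Tuple[float, float]]:
    """Sketch of steps 96-98 plus the tie rule of [0047]; returns (lower, upper) per rank."""
    v = sorted(values)                           # step 96: ascending sort
    N = len(v)
    if N < n:
        raise ValueError("need at least as many records as ranks")
    bounds: List[Tuple[float, float]] = []
    for m in range(1, n + 1):
        k = ceil_div(N * (m - 1), n)             # 0-based index of lower record (assumed Eq. 3)
        hi = ceil_div(N * m, n) - 1              # 0-based index of upper record (assumed Eq. 4)
        if bounds:
            # [0047]: values up to the previous rank's upper limit stay in that rank;
            # the next larger value becomes this rank's lower limit.
            while k < N and v[k] <= bounds[-1][1]:
                k += 1
            if k == N:
                raise ValueError(f"too few distinct values for {n} ranks")
        lo_val = v[k]
        hi_val = max(v[hi], lo_val)              # safeguard: keep upper >= lower
        bounds.append((lo_val, hi_val))
    return bounds
```

For example, six values 1, 2, 2, 2, 3, 4 split into two ranks give (1, 2) and (3, 4). The repeated 2s all stay in rank 1 instead of straddling the boundary.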

[0048] In the data item comparison table 15' of FIG. 11, a priority order specified by the user according to the search purpose may be set instead of the search frequency. The priority item for the next division of a record group is then determined according to that priority order.

[0049] Data having the data structure of the search tree index table 20 (search tree information with appended addresses) and of the index sequential organization file 21 may be stored on a computer-readable storage medium. It can then be used for information retrieval as described above. Likewise, data having the data structure of the search tree index table 28 (search tree information with appended sequence numbers) and of the direct organization file 29 may be stored on a computer-readable storage medium and used for information retrieval as described above.

[0050]

[Effects of the Invention] As described above, according to the present invention, the search tree table is narrowed down to only the rank sequences that suit the search purpose, so index search time is shorter than with a conventional index table. In addition, the record groups in the search target file are arranged in the order of the search tree information according to the search purpose. When a rank sequence is specified, the corresponding record group can therefore be extracted with a single sequential access, which shortens file search time.

[Brief description of the drawings]

FIG. 1 is a diagram showing the processing procedure of the information retrieval method according to the embodiment.

FIG. 2 is a diagram showing a configuration example of the search tree of an index sequential organization file.

FIG. 3 is a diagram showing the data configuration of the data item ranking table 13 according to the embodiment.

FIG. 4 is a diagram showing the data configuration of the data item comparison table 15 according to the embodiment.

FIG. 5 is a diagram showing an example of the data of the division condition parameter 14.

FIG. 6 is a diagram showing the data configuration of the search tree table 17 according to the embodiment.

FIG. 7A is a flowchart showing the processing flow of the search tree creating unit 12 according to the embodiment.

FIG. 7B is a flowchart (continued) showing the processing flow of the search tree creating unit 12 according to the embodiment.

FIG. 8 is a flowchart showing step 47 of the embodiment in more detail.

FIG. 9 is a flowchart showing step 50 of the embodiment in more detail.

FIG. 10 is a diagram showing the processing procedure of the information retrieval method according to another embodiment.

FIG. 11 is a diagram showing the data configuration of the data item comparison table 15' according to the embodiment.

FIG. 12 is a flowchart showing the processing flow of the rank division program according to the embodiment.

[Explanation of symbols]

12: search tree creating unit; 13: data item ranking table; 14: division condition parameter; 15: data item comparison table; 17: search tree table; 18: search target file; 19: file organization unit; 20, 28: search tree index table; 21: index sequential organization file; 24: information retrieval unit; 29: direct organization file; 30: information retrieval unit

Claims


1. An information retrieval method using a computer, for searching, by a given rank sequence, a file containing a plurality of records, each record consisting of a plurality of data items and each data item being divided into a plurality of ranks according to the range of values it can take, the method comprising: regarding the entire record group contained in the file as the root of a search tree over rank sequences and, starting from the record group corresponding to the root, determining on the basis of set division conditions whether each record group is to be further divided; when a record group is regarded as the root or a node of the search tree and is to be further divided, dividing it into record groups corresponding to the respective ranks of the highest-priority data item among the data items not yet involved in division; when a record group is regarded as a leaf of the search tree and is not to be further divided, storing the record group and its corresponding rank sequence sequentially in a search target file and a search tree table, respectively; creating a search tree index table by adding, to each rank sequence stored in the search tree table, the address of the storage area that holds the corresponding record group; storing each record group contained in the search target file in the storage area at its determined address to create an index sequential organization file; and, when a rank sequence is given, searching the search tree index table for the corresponding address and accessing the record group of the index sequential organization file that corresponds to the specified rank sequence.

2. The information retrieval method according to claim 1, wherein a data item that matches the search purpose among the plurality of data items is set as a reference item, and the data item having the greatest influence on the reference item among the data items not yet involved in division is set as the priority item.

3. The information retrieval method according to claim 1, wherein, among the data items not yet involved in division, the data item with the highest search frequency based on the past search history is set as the priority item.

4. The information retrieval method according to claim 1, wherein, instead of creating and searching the index sequential organization file: a search tree index table is created by adding, to each rank sequence stored in the search tree table, the sequence number of the corresponding record group; the address of the storage area holding each record group is calculated by a hash function taking the sequence number as its argument; each record group contained in the search target file is stored in the storage area at the determined address to create a direct organization file; and, when a rank sequence is given, the search tree index table is searched for the corresponding sequence number, the address of the corresponding record group is calculated by the hash function, and the record group of the direct organization file corresponding to the specified rank sequence is accessed.

5. A computer program embodied in a computer-readable storage medium, for searching, by a given rank sequence, a file containing a plurality of records, each record consisting of a plurality of data items and each data item being divided into a plurality of ranks according to the range of values it can take, the program comprising the steps of:
(a) regarding the entire record group contained in the file as the root of a search tree over rank sequences and, starting from the record group corresponding to the root, determining on the basis of set division conditions whether each record group is to be further divided; when a record group is to be further divided, dividing it into record groups corresponding to the respective ranks of the highest-priority data item among the data items not yet involved in division; and when a record group is regarded as a leaf of the search tree and is not to be further divided, storing the record group and its corresponding rank sequence sequentially in a search target file and a search tree table, respectively;
(b) creating a search tree index table by adding, to each rank sequence stored in the search tree table, the address of the storage area that holds the corresponding record group, and storing each record group contained in the search target file in the storage area at its determined address to create an index sequential organization file; and
(c) when a rank sequence is given, searching the search tree index table for the corresponding address and accessing the record group of the index sequential organization file that corresponds to the specified rank sequence.