JP2001117929A

JP2001117929A - Data retrieving method, data aligning method and data retrieving device

Info

Publication number: JP2001117929A
Application number: JP29441199A
Authority: JP
Inventors: Yasunori Tsukamoto; 康則塚本
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1999-10-15
Filing date: 1999-10-15
Publication date: 2001-04-27

Abstract

PROBLEM TO BE SOLVED: To perform exact-match search with no variation in search time, and fast fuzzy (partial-match) search, on a data structure whose search data is key data in records. SOLUTION: The device comprises a search target data storage unit 4 and a data search unit 5. The storage unit 4 stores the data to be searched in memory as one binary tree per digit. The search unit 5 finds the target data by comparing the data of each node with the search data, starting from the root of the stored binary-tree data.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention: The present invention relates to a data search method, a data sorting (alignment) method, and a data search device. The data to be searched is stored in memory as one binary tree per digit, and the search data is compared with it digit by digit, so that character strings can be searched and sorted at high speed.

[0002]

2. Description of the Related Art: As is well known, binary tree search represents the data to be searched as a binary tree. It finds the matching data by comparing the search data with the stored data in turn, starting from the root.

[0003] Here, a tree structure is a data structure for expressing hierarchical relationships among data. When an element B exists below an element (node) A, the two elements are ordered vertically to express a parent and a child. The following terms can then be defined:
- branch (path): the connection between a parent and a child;
- root: an element with no parent;
- leaf: an element with no child;
- internal node: an element with both a parent and a child.

With these definitions, a hierarchy such as that shown in FIG. 9 can be built. A binary tree is a tree structure in which each parent has at most two child elements.

[0004] In this tree structure, one element is taken as the root. Data smaller than the root is stored on the left and data larger than the root on the right. If the data are "a, b, c, d, e, f, g" with a < b < c < d < e < f < g, the tree structure shown in FIG. 10 results.
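The placement rule of [0004] can be illustrated with a textbook binary search tree. The following C sketch is not taken from the patent, and the names `Node`, `bst_insert`, `bst_search` and `bst_build` are hypothetical. It counts one comparison per node visited, so it also shows where the average and maximum figures in [0005] come from. Inserting the seven keys of FIG. 10 in the order d, b, f, a, c, e, g gives a balanced tree rooted at d (this order is an assumption), and every search then needs at most 3 comparisons. Inserting the same keys already sorted produces a chain that needs up to 7.

```c
#include <assert.h>
#include <stdlib.h>

/* A conventional binary search tree node (illustrative names). */
typedef struct Node {
    char key;
    struct Node *left;   /* keys smaller than this node */
    struct Node *right;  /* keys larger than this node  */
} Node;

/* Insert key: smaller keys go left, larger keys go right.
   Duplicate keys are ignored. Returns the (possibly new) root. */
Node *bst_insert(Node *root, char key) {
    if (root == NULL) {
        Node *n = malloc(sizeof *n);
        if (n == NULL) abort();
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->key)
        root->left = bst_insert(root->left, key);
    else if (key > root->key)
        root->right = bst_insert(root->right, key);
    return root;
}

/* Search from the root. Returns the number of comparisons made
   (one per node visited), or -1 if the key is absent. */
int bst_search(const Node *root, char key) {
    int comparisons = 0;
    while (root != NULL) {
        comparisons++;
        if (key == root->key) return comparisons;
        root = (key < root->key) ? root->left : root->right;
    }
    return -1;
}

/* Build a tree by inserting the characters of keys in the given order. */
Node *bst_build(const char *keys) {
    Node *root = NULL;
    assert(keys != NULL);
    for (; *keys != '\0'; keys++)
        root = bst_insert(root, *keys);
    return root;
}
```

For example, `bst_search(bst_build("dbfaceg"), 'g')` returns 3, whereas `bst_search(bst_build("abcdefg"), 'g')` returns 7. Nodes are never freed, which keeps the sketch short.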

[0005] In general, when the number of data items is n, a search in this binary tree needs the following numbers of data comparisons:
- average: $\log_2 n$ (for a well-balanced tree);
- maximum: $n$ (when the tree degenerates into a single chain).

There are two kinds of search: exact-match search and fuzzy search (also called partial-match search). In an exact-match search on 10-byte data, for example, all 10 bytes of the search data are compared with the data being searched. In a fuzzy search, for example when 20-byte data is searched with 10-byte search data, a record counts as a hit if those 10 bytes match somewhere in it.
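The two kinds of search defined in [0005] can be stated precisely at the byte level. This hypothetical C sketch uses invented names (`exact_match`, `partial_match`). It works on a single record with `memcmp` and says nothing about how the invention stores data; it only fixes what counts as a hit. The partial match slides the key one byte at a time over the record. This is the same windowing idea that the fuzzy search later applies digit by digit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Exact match: the record and key have the same length and every byte
   agrees (e.g. all 10 bytes of a 10-byte record). */
int exact_match(const char *record, size_t rlen, const char *key, size_t klen) {
    return rlen == klen && memcmp(record, key, klen) == 0;
}

/* Partial (fuzzy) match: the record is a hit if the key occurs at any
   position in it (e.g. a 10-byte key inside a 20-byte record).
   The key is slid over the record one byte at a time. */
int partial_match(const char *record, size_t rlen, const char *key, size_t klen) {
    assert(klen > 0);
    if (klen > rlen) return 0;
    for (size_t start = 0; start + klen <= rlen; start++)
        if (memcmp(record + start, key, klen) == 0) return 1;
    return 0;
}
```

A 20-byte record therefore offers 11 window positions for a 10-byte key.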

[0006]

Problems to be Solved by the Invention: In an exact-match search, however, the maximum number of data comparisons equals the number of data items n. Search time therefore varies with the number of data.

[0007] In a fuzzy search, moreover, searching for data made up of multiple bytes, such as character strings, takes a long time with a simple binary tree.

[0008] The present invention has been made in view of the above circumstances. Its object is to provide a data search method and device for a data structure whose search data is key data in records. The method and device achieve exact-match search with no variation in search time, and fast fuzzy-match search.

[0009] Another object of the present invention is to provide a data sorting method that can sort data easily and in a short time.

[0010]

Means for Solving the Problems: To achieve the above objects, the invention of claim 1 stores the data to be searched in memory as a binary tree for each digit. It then searches for the target data by comparing the search data with the stored binary-tree data digit by digit.

[0011] The invention of claim 2 is the data search method of claim 1, applied to a fuzzy search. The target data is first searched for by comparing the search data with the stored binary-tree data digit by digit. The search is then repeated with the starting digit shifted by one, up to the last digit.

[0012] The invention of claim 3 stores the data to be sorted in memory as a binary tree for each digit. A backward or forward search is then performed on the stored binary-tree data, in order from the upper or lower digit. This sorts the data in descending or ascending order.

[0013] The invention of claim 4 comprises two means:
- search target data storage means, which stores the data to be searched in memory as a binary tree for each digit;
- data search means, which searches for the target data by comparing the data of each node with the search data, starting from the root of the stored binary-tree data.

[0014] The invention of claim 5 is the data search device of claim 4, in which each node of the binary tree consists of:
- digit data, holding one digit of the data to be searched;
- a pointer to digit data smaller than the node's own digit data;
- a pointer to digit data larger than the node's own digit data;
- a pointer to the digit data of the next digit;
- a flag indicating the end of a datum.

[0015] The invention of claim 6 is the data search device of claim 4 or 5, in which the data search means includes a function used for fuzzy search. That function searches for the target data by comparing the search data with the stored binary-tree data digit by digit. It then repeats the search with the starting digit shifted by one, up to the last digit.

[0016] With the above configuration, the data to be searched is stored in memory as a binary tree for each digit. The target data is found by comparing the search data with the stored data digit by digit. For a data structure whose search data is key data in records, this achieves exact-match search with no variation in search time, and fast fuzzy-match search.

[0017] Likewise, the data to be sorted is stored in memory as a binary tree for each digit. A backward or forward search is performed on it in order from the upper or lower digit, which sorts the data in descending or ascending order. Sorting can therefore be done easily and in a short time.

[0018]

Description of the Embodiments: FIG. 1 is a block diagram showing the configuration of an embodiment of the data search device according to the present invention.

[0019] The data search device 1 shown in FIG. 1 is built, for example, on a personal computer. It comprises:
- a CPU 2, which executes data searches;
- a data storage unit 3, consisting of an external storage medium such as a hard disk, which holds the data to be searched.

[0020] The CPU 2 includes:
- a search target data storage unit 4, which stores the data to be searched in memory as a binary tree for each digit;
- a data search unit 5, which searches for the target data by comparing the data of each node with the search data, starting from the root of the binary-tree data stored by the search target data storage unit 4.

[0021] As shown in FIG. 2, suppose the data to be searched are $A_n A_{n-1} \cdots A_2 A_1, \ldots, X_n X_{n-1} \cdots X_2 X_1, Y_n Y_{n-1} \cdots Y_2 Y_1$. The search target data storage unit 4 then extracts the data of each digit as shown below, and stores it in memory as a binary tree for each digit.

[0022]
1st digit: $A_1, \ldots, X_1, Y_1$
2nd digit: $A_2, \ldots, X_2, Y_2$
...
(n−1)th digit: $A_{n-1}, \ldots, X_{n-1}, Y_{n-1}$
nth digit: $A_n, \ldots, X_n, Y_n$
In this case, as shown by the C-language structure in FIG. 3, the memory of the search target data storage unit 4 provides the following areas for each digit:
- the digit data;
- a pointer to the address of the buffer holding digit data smaller than this digit data;
- a pointer to the address of the buffer holding digit data larger than this digit data;
- a pointer to the address of the buffer holding the digit data of the next digit;
- a flag that is "ON" if this digit data is end data (the last digit of a datum) and "OF" (OFF) otherwise.
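The FIG. 3 node layout and the storing procedure of [0024] to [0027] can be sketched in C as below. The member and function names (`DigitNode`, `find_or_add`, `store_datum`) are hypothetical. The sketch assumes that each datum is stored from its lowest (rightmost) digit, as in the FIG. 4 example where "142" begins with the digit "2". It also assumes that digits are ordered by their character codes, so "1" < "9" < "a" < "b".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One node of a per-digit binary tree, mirroring the fields listed for
   FIG. 3 (the member names are hypothetical). */
typedef struct DigitNode {
    char digit;                 /* digit data                               */
    struct DigitNode *less;     /* smaller digit data within the same digit */
    struct DigitNode *greater;  /* larger digit data within the same digit  */
    struct DigitNode *next;     /* root of the tree for the next digit      */
    int end;                    /* 1 = "ON": a datum ends here; 0 = "OFF"   */
} DigitNode;

/* Find digit d in the tree hanging from *slot, creating a node if absent.
   Smaller digits go to `less`, larger ones to `greater`. */
static DigitNode *find_or_add(DigitNode **slot, char d) {
    while (*slot != NULL && (*slot)->digit != d)
        slot = (d < (*slot)->digit) ? &(*slot)->less : &(*slot)->greater;
    if (*slot == NULL) {
        DigitNode *n = malloc(sizeof *n);
        if (n == NULL) abort();
        n->digit = d;
        n->less = n->greater = n->next = NULL;
        n->end = 0;
        *slot = n;
    }
    return *slot;
}

/* Store one datum, lowest (rightmost) digit first: each digit goes into
   the tree reached through the `next` pointer of the previous digit. */
void store_datum(DigitNode **root, const char *datum) {
    size_t len = strlen(datum);
    DigitNode **tree = root;
    DigitNode *node = NULL;
    assert(len > 0);
    for (size_t i = len; i-- > 0; ) {
        node = find_or_add(tree, datum[i]);
        tree = &node->next;
    }
    node->end = 1;  /* highest digit of the datum: end flag ON */
}
```

The nine data can be stored in the order "83", "33", "13", "b3", "942", "142", "a42", "2", "5". This reproduces the digit values, the smaller/larger/next relations and the end flags listed in [0024] to [0027]; the buffer addresses themselves are not modeled. A different storing order gives differently shaped trees, as [0024] notes.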

[0023] Next, the operation of this embodiment will be described.

[0024] As shown in FIG. 4, assume there are nine data to be searched: "142", "a42", "942", "13", "83", "b3", "33", "2", and "5". The search target data storage unit 4 stores these nine data in memory as a binary tree for each digit. Specifically, as shown in FIG. 4, buffer address 1 of the first digit holds:
- digit data: "3";
- pointer to smaller digit data: "address 2";
- pointer to larger digit data: "address 3";
- pointer to the next digit data: "address 4";
- end-of-data flag: "OF" (short for OFF).

The tree structure depends on the order in which the data are stored. The example of FIG. 4 shows the case where storing starts from "3".

[0025] Similarly, address 2 holds:
- digit data: "2";
- pointers to smaller and larger digit data: "NL" (short for NULL pointer);
- pointer to the next digit data: "address 4";
- end flag: "ON".

Address 3 holds "5" as digit data, "NL" for the pointers to smaller, larger and next digit data, and "ON" as the end flag.

[0026] Next, the second-digit buffer holds the following:
- Address 3 holds "8" as digit data, "address 5" as the pointer to smaller digit data, "address 6" as the pointer to larger digit data, "NL" as the pointer to the next digit data, and "ON" as the end flag.
- Address 5 holds "3" as digit data, "address 7" as the pointer to smaller digit data, "NL" as the pointer to larger digit data, "NL" as the pointer to the next digit data, and "ON" as the end flag.
- Address 6 holds "b" as digit data, "NL" for all three pointers, and "ON" as the end flag.
- Address 7 holds "1" as digit data, "NL" for all three pointers, and "ON" as the end flag.
- Address 9 holds "4" as digit data, "NL" as the pointers to smaller and larger digit data, "address 10" as the pointer to the next digit data, and "OF" as the end flag.

[0027] Next, the third-digit buffer holds the following:
- Address 10 holds "9" as digit data, "address 11" as the pointer to smaller digit data, "address 12" as the pointer to larger digit data, "NL" as the pointer to the next digit data, and "ON" as the end flag.
- Address 11 holds "1" as digit data, "NL" for all three pointers, and "ON" as the end flag.
- Address 12 holds "a" as digit data, "NL" for all three pointers, and "ON" as the end flag.

[0028] Once the data to be searched has been stored in this way, the data search unit 5 compares the search data with the stored data digit by digit. It searches in order from the root, then finds and outputs the target data. The searches performed by the data search unit 5 are described below in two parts: exact-match search, and fuzzy search (partial-match search).

[0029] (1) Exact-match search
<Exact-match search method in this embodiment> The search starts from the root of the binary tree for the lowest digit. When the matching character is found, the search moves on to the characters of the next digit linked to it. This is repeated up to the most significant digit of the data to be searched. If a digit is not hit, the search ends without a match.

[0030] FIG. 5 shows data stored in turn as binary trees. Suppose "ahd" is searched for in the data structure of FIG. 5. The search begins by comparing the first digit of the key, "d", with "c", the root of the first-digit tree. Since "d" is larger than the root "c", the search proceeds to the right branch and compares "d" with "e". Since "d" is smaller than "e", it is then compared with "d" on the left, and the first digit is hit.

[0031] Next, the search for the second-digit key "h" starts with a comparison against the root "g". Since "h" is larger than "g", the search proceeds to the right branch, where "h" is hit.

[0032] For the third-digit key "a", the key is compared with "b", the data linked to the second-digit data "h". As a result, the search proceeds to the left branch and "a" is hit. In this way, the search data "ahd" is found quickly among the data stored as per-digit binary trees.
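The exact-match procedure of [0029] to [0032] can be sketched as follows. The sketch assumes the FIG. 3 node layout and lowest-digit-first storage. All names are hypothetical, and the storing helpers are repeated so that the block compiles on its own. One point is made explicit here: a key counts as found only if the node of its highest digit has the end flag ON.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Per-digit binary tree node with the FIG. 3 fields (names hypothetical). */
typedef struct DigitNode {
    char digit;
    struct DigitNode *less, *greater, *next;
    int end;                                  /* 1 = end flag ON */
} DigitNode;

/* Find digit d in one digit's tree, creating the node if it is absent. */
static DigitNode *find_or_add(DigitNode **slot, char d) {
    while (*slot != NULL && (*slot)->digit != d)
        slot = (d < (*slot)->digit) ? &(*slot)->less : &(*slot)->greater;
    if (*slot == NULL) {
        DigitNode *n = malloc(sizeof *n);
        if (n == NULL) abort();
        n->digit = d;
        n->less = n->greater = n->next = NULL;
        n->end = 0;
        *slot = n;
    }
    return *slot;
}

/* Store a datum lowest (rightmost) digit first, one tree per digit. */
void store_datum(DigitNode **root, const char *datum) {
    size_t len = strlen(datum);
    DigitNode **tree = root, *node = NULL;
    assert(len > 0);
    for (size_t i = len; i-- > 0; tree = &node->next)
        node = find_or_add(tree, datum[i]);
    node->end = 1;
}

/* Exact-match search: find the key's lowest digit in the first-digit tree,
   follow `next` to the tree of the next digit, and repeat up to the
   highest digit. Returns 1 only if every digit is hit and the last node
   reached has its end flag ON. */
int exact_search(const DigitNode *tree, const char *key) {
    size_t len = strlen(key);
    const DigitNode *node = NULL;
    assert(len > 0);
    for (size_t i = len; i-- > 0; ) {
        node = tree;
        while (node != NULL && node->digit != key[i])
            node = (key[i] < node->digit) ? node->less : node->greater;
        if (node == NULL) return 0;   /* digit not hit: the search ends */
        tree = node->next;
    }
    return node->end;
}
```

With the nine data of FIG. 4 stored, `exact_search(root, "142")` returns 1. `exact_search(root, "42")` returns 0: the path 2 → 4 exists, but its end flag is OFF because no datum is exactly "42". Each digit costs at most as many comparisons as that digit's tree has nodes, which is the point made in [0033].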

[0033] <Effect of exact-match search in this embodiment> In exact-match search, the number of comparisons is governed by the number of distinct values each digit can take, not by the total number of data items.

[0034] In general, suppose each datum $X_n X_{n-1} \cdots X_2 X_1$ is made up of digits taking $M_n, M_{n-1}, \ldots, M_2, M_1$ kinds of values respectively. The number of data items is then $M_n \times M_{n-1} \times \cdots \times M_2 \times M_1$, and the numbers of comparisons are as follows.

[0035]

[Table 1] Logarithms satisfy $\log_a XY = \log_a X + \log_a Y$. The average number of comparisons is therefore the same for a conventional binary tree and for this embodiment.

[0036] When there is a large amount of data, the values of M can be regarded as uniform. Setting $M_n = M_{n-1} = \cdots = M_2 = M_1 = M$ gives the following.

[0037]

[Table 2] The maximum number of comparisons is smaller in this embodiment. Search time therefore does not vary, and searching is stable.
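Tables 1 and 2 are not reproduced in this text. As a hedged worked reading of [0033] to [0037], assume that each per-digit tree holding $M_i$ distinct values behaves like an ordinary binary tree. It then needs about $\log_2 M_i$ comparisons on average and at most $M_i$. The patent's own tables may differ in detail. Summing over the n digits gives:

$$
\begin{aligned}
N &= \prod_{i=1}^{n} M_i, \\
\text{average} &\approx \sum_{i=1}^{n} \log_2 M_i = \log_2 \prod_{i=1}^{n} M_i = \log_2 N \quad (\text{same as a conventional binary tree}), \\
\text{maximum} &\le \sum_{i=1}^{n} M_i \quad \text{versus} \quad N = \prod_{i=1}^{n} M_i \ \text{for a conventional binary tree}.
\end{aligned}
$$

With uniform $M$, the average is $n \log_2 M = \log_2 M^n$ in both cases. The maximum is $nM$ here, against $M^n$ conventionally. For example, $n = 4$ and $M = 10$ give $N = 10{,}000$ data items, with a worst case of 40 comparisons instead of 10,000.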

[0038] (2) Fuzzy search
<Fuzzy search method in this embodiment> Next, fuzzy search is described. It determines whether a k-digit search character string occurs anywhere within n-digit data. A fuzzy search examines all the data to be searched and extracts every matching datum. Let the data to be searched be $X_n X_{n-1} \cdots X_{k+1} X_k \cdots X_3 X_2 X_1$ and the search data be $Y_k Y_{k-1} \cdots Y_3 Y_2 Y_1$. Storing the data to be searched in memory as a binary tree for each digit gives the tree structure shown in FIG. 6. The storing process is the same as for exact-match search. In FIG. 6, each boxed region marks the group of trees for one digit.

[0039] Consider this fuzzy search with 10-byte search data and 20-byte data to be searched. Every datum in which the 10 bytes are found is retrieved. A datum that matches at one position may also match at a later position, so the search cannot stop at the first hit. Therefore, when the search data has k digits, the search proceeds as follows:
1. A k-digit search is performed over the range shown by the dash-dot line in FIG. 6.
2. The starting digit is shifted by one, and the range enclosed by the broken line is searched.
3. The shift is repeated one digit at a time, up to the last digit of the data to be searched.

[0040] More concretely, consider n-digit data, each datum written $X_n X_{n-1} \cdots X_2 X_1$, whose digits take $M_n, M_{n-1}, \ldots, M_2, M_1$ kinds of values respectively. Here X denotes an arbitrary value.

[0041] A fuzzy search with an ordinary binary tree must examine all the data. It therefore needs as many searches as a sequential search.

[0042] The fuzzy search of the present invention uses the search ranges and search order shown in FIG. 7.

[0043] That is, take data to be searched $X_n X_{n-1} \cdots X_{k+1} X_k X_{k-1} \cdots X_2 X_1$. The first search covers digits 1 through k ($X_k \cdots X_2 X_1$). The second search is shifted by one digit and covers digits 2 through k+1 ($X_{k+1} X_k X_{k-1} \cdots X_2$). Searches continue in this way up to the (n−k+1)th search.
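The shifting search of [0038] to [0043] can be sketched on the same per-digit trees. This hypothetical C sketch invents all of its names. `search_tree` first tries the key starting at a given digit tree, which corresponds to one search range of FIG. 7. It then repeats the attempt one digit further on, in the next-digit tree under every node. When a match ends at some node, every datum that passes through that node contains the key. The sketch assumes lowest-digit-first storage, and to stay short it counts occurrences rather than listing the matching data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct DigitNode {                    /* FIG. 3 fields, names hypothetical */
    char digit;
    struct DigitNode *less, *greater, *next;
    int end;                                  /* 1 = end flag ON */
} DigitNode;

static DigitNode *find_or_add(DigitNode **slot, char d) {
    while (*slot != NULL && (*slot)->digit != d)
        slot = (d < (*slot)->digit) ? &(*slot)->less : &(*slot)->greater;
    if (*slot == NULL) {
        DigitNode *n = malloc(sizeof *n);
        if (n == NULL) abort();
        n->digit = d;
        n->less = n->greater = n->next = NULL;
        n->end = 0;
        *slot = n;
    }
    return *slot;
}

/* Store a datum lowest (rightmost) digit first, one tree per digit. */
void store_datum(DigitNode **root, const char *datum) {
    size_t len = strlen(datum);
    DigitNode **tree = root, *node = NULL;
    assert(len > 0);
    for (size_t i = len; i-- > 0; tree = &node->next)
        node = find_or_add(tree, datum[i]);
    node->end = 1;
}

/* Look up digit d within one digit's binary tree. */
static const DigitNode *find(const DigitNode *tree, char d) {
    while (tree != NULL && tree->digit != d)
        tree = (d < tree->digit) ? tree->less : tree->greater;
    return tree;
}

/* Number of data ending anywhere in `tree` or in the trees linked below it. */
static int count_data(const DigitNode *tree) {
    if (tree == NULL) return 0;
    return tree->end + count_data(tree->next)
         + count_data(tree->less) + count_data(tree->greater);
}

/* Match the k-digit key (lowest digit first) starting in `tree`; return how
   many stored data contain the key at this position. */
static int match_here(const DigitNode *tree, const char *key, size_t k) {
    const DigitNode *node = NULL;
    for (size_t i = k; i-- > 0; ) {
        node = find(tree, key[i]);
        if (node == NULL) return 0;
        tree = node->next;
    }
    return node->end + count_data(node->next);
}

static int walk_nodes(const DigitNode *n, const char *key, size_t k);

/* Try the key in this digit tree, then shift by one digit: try it again in
   the next-digit tree under every node of this tree. */
static int search_tree(const DigitNode *tree, const char *key, size_t k) {
    if (tree == NULL) return 0;
    return match_here(tree, key, k) + walk_nodes(tree, key, k);
}

static int walk_nodes(const DigitNode *n, const char *key, size_t k) {
    if (n == NULL) return 0;
    return search_tree(n->next, key, k)
         + walk_nodes(n->less, key, k) + walk_nodes(n->greater, key, k);
}

/* Fuzzy search: total occurrences of key in the stored data (a datum that
   contains the key at two positions is counted twice). */
int fuzzy_search(const DigitNode *root, const char *key) {
    size_t k = strlen(key);
    assert(k > 0);
    return search_tree(root, key, k);
}
```

With the nine data of FIG. 4 stored, `fuzzy_search(root, "42")` returns 3, one each for "142", "a42" and "942". `fuzzy_search(root, "3")` returns 5, because "33" contains the key twice.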

[0044] <Effect of fuzzy search in this embodiment> For the fuzzy search described above, the number of comparisons in each search range is as follows.

[0045] That is, the searches need the following numbers of comparisons:
- first search: $kM$;
- second search: $M \cdot kM\ (= kM^2)$;
- ...
- (n−k+1)th search: $M^{n-k} \cdot kM\ (= kM^{n-k+1})$.

[0046] When there is a large amount of data, the values of M can be regarded as uniform. Setting $M_n = M_{n-1} = \cdots = M_2 = M_1 = M$, the numbers of comparisons for k-digit search data are as follows.

[0047]

[Table 3] As Table 3 shows, the fuzzy search of this embodiment is faster than a search using a conventional binary tree.
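Table 3 is not reproduced in this text. As a hedged worked total, sum the per-search counts of [0045] for the uniform case. Those counts treat the s-th search range as reaching up to $M^{s-1}$ digit trees, each costing up to $kM$ comparisons:

$$
C_{\text{fuzzy}} = \sum_{s=1}^{n-k+1} kM^{s} = kM \, \frac{M^{\,n-k+1} - 1}{M - 1} \qquad (M > 1).
$$

For example, $n = 4$, $k = 2$ and $M = 10$ give $C_{\text{fuzzy}} = 2(10 + 100 + 1000) = 2{,}220$ comparisons. A sequential scan as described in [0041] must instead visit every one of the $M^n = 10{,}000$ data.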

[0048] (3) Data sorting
Data can be sorted by storing it in memory as a binary tree for each digit and then searching the stored binary-tree data from the upper digit.

[0049] The data to be sorted are stored in the tree structure in order from the upper digit, as shown in FIG. 8. FIG. 8 shows both sorting in ascending order and sorting in descending order.

[0050] To sort the data in ascending order, a backward search is performed along the scanning path marked 1 in the figure. This locates each stored datum. The data are then retrieved in order starting from the first digit, as indicated by the black dots in the figure.

[0051] To sort the data in descending order, a forward search is performed along the scanning path marked 2 in the figure. This locates each stored datum. The data then need only be retrieved in order from the upper digit, as indicated by the black dots in the figure.
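[0048] to [0051] produce sorted output by scanning trees that were built from the upper digit. The C code below is a hedged sketch of one reading of the "backward" and "forward" scans of FIG. 8. It treats them as in-order traversals that visit smaller digits first (ascending) or larger digits first (descending). This interpretation, the `MAXLEN` limit and all names are assumptions, since FIG. 8 is not reproduced here. Digits are ordered by character code, so the result is dictionary order: "142" sorts before "2".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAXLEN 16   /* assumed maximum datum length, including the '\0' */

typedef struct DigitNode {                    /* FIG. 3 fields, names hypothetical */
    char digit;
    struct DigitNode *less, *greater, *next;
    int end;                                  /* 1 = end flag ON */
} DigitNode;

static DigitNode *find_or_add(DigitNode **slot, char d) {
    while (*slot != NULL && (*slot)->digit != d)
        slot = (d < (*slot)->digit) ? &(*slot)->less : &(*slot)->greater;
    if (*slot == NULL) {
        DigitNode *n = malloc(sizeof *n);
        if (n == NULL) abort();
        n->digit = d;
        n->less = n->greater = n->next = NULL;
        n->end = 0;
        *slot = n;
    }
    return *slot;
}

/* For sorting, store each datum from its upper (leftmost) digit. */
void store_upper_first(DigitNode **root, const char *datum) {
    DigitNode **tree = root, *node = NULL;
    assert(strlen(datum) > 0 && strlen(datum) < MAXLEN);
    for (const char *p = datum; *p != '\0'; p++, tree = &node->next)
        node = find_or_add(tree, *p);
    node->end = 1;
}

/* Copy the path prefix[0..depth] into the next output slot. */
static void save(const char *prefix, size_t depth, char out[][MAXLEN], int *count) {
    memcpy(out[*count], prefix, depth + 1);
    out[*count][depth + 1] = '\0';
    (*count)++;
}

/* In-order walk. Ascending: smaller digits, the datum ending here, its longer
   continuations, then larger digits. Descending is the exact reverse. */
static void emit(const DigitNode *n, char *prefix, size_t depth, int asc,
                 char out[][MAXLEN], int *count) {
    if (n == NULL) return;
    emit(asc ? n->less : n->greater, prefix, depth, asc, out, count);
    prefix[depth] = n->digit;  /* set after the sibling walk reused this slot */
    if (asc && n->end) save(prefix, depth, out, count);   /* "14" before "142" */
    emit(n->next, prefix, depth + 1, asc, out, count);
    if (!asc && n->end) save(prefix, depth, out, count);  /* "142" before "14" */
    emit(asc ? n->greater : n->less, prefix, depth, asc, out, count);
}

/* Write all stored data to out in ascending (asc != 0) or descending order;
   out must have room for every datum. Returns the number written. */
int sort_data(const DigitNode *root, int asc, char out[][MAXLEN]) {
    char prefix[MAXLEN];
    int count = 0;
    emit(root, prefix, 0, asc, out, &count);
    return count;
}
```

Suppose "b3", "13", "2", "142" and "33" are stored. Calling `sort_data(root, 1, out)` then fills `out` with "13", "142", "2", "33", "b3"; with `asc = 0`, the same five come out in reverse. Numeric order for numbers of different lengths would require, for example, left-padding them with zeros to a common width first.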

[0052] As described above, this embodiment allows data to be sorted easily and in a short time.

[0053]

Effects of the Invention: As described above, the data search method and data search device of the present invention work on a data structure whose search data is key data in records. For such a structure, they achieve exact-match search with no variation in search time, and fast fuzzy-match search.

[0054] In addition, the data sorting method of the present invention allows data to be sorted easily and in a short time.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the data search device according to the present invention.

FIG. 2 is an explanatory diagram showing how a large number of data to be searched are grouped by digit.

FIG. 3 is an explanatory diagram showing, as a C-language structure, each element stored in the per-digit binary trees.

FIG. 4 is an explanatory diagram showing the data structure stored in the per-digit binary trees.

FIG. 5 is an explanatory diagram showing an example of the binary tree structure stored for each digit.

FIG. 6 is an explanatory diagram of a fuzzy search performed on the binary tree structure stored for each digit.

FIG. 7 is an explanatory diagram showing the search ranges and search order in a fuzzy search.

FIG. 8 is an explanatory diagram showing the data sorting method according to the present invention.

FIG. 9 is an explanatory diagram of a tree structure.

FIG. 10 is an explanatory diagram showing a data search in a tree structure.

[Explanation of symbols]

1 Data search device
2 CPU
3 Data storage unit
4 Search target data storage unit
5 Data search unit

Claims


1. A data search method comprising: storing data to be searched in memory as a binary tree for each digit; and searching for target data by comparing search data with the stored binary-tree data digit by digit.

2. The data search method according to claim 1, wherein, when a fuzzy search is performed, the target data is searched for by comparing the search data with the stored binary-tree data digit by digit, and the search is then repeated with the starting digit shifted by one, up to the last digit.

3. A data sorting method comprising: storing data to be sorted in memory as a binary tree for each digit; and performing a backward search or a forward search on the stored binary-tree data in order from the upper digit or the lower digit, thereby sorting the data to be sorted in descending or ascending order.

4. A data search device comprising: search target data storage means for storing data to be searched in memory as a binary tree for each digit; and data search means for searching for target data by comparing the data of each node with search data, starting from the root of the stored binary-tree data.

5. The data search device according to claim 4, wherein each node of the binary tree comprises: digit data holding one digit of the data to be searched; a pointer to digit data smaller than the node's own digit data; a pointer to digit data larger than the node's own digit data; a pointer to the digit data of the next digit; and a flag indicating the end of a datum.

6. The data search device according to claim 4 or 5, wherein the data search means includes a function used when a fuzzy search is performed. The function searches for the target data by comparing the search data with the stored binary-tree data digit by digit, and repeats the search with the starting digit shifted by one, up to the last digit.