JPH10177582A

JPH10177582A - Method and device for retrieving longest match

Info

Publication number: JPH10177582A
Application number: JP8338666A
Authority: JP
Inventors: Nobuo Muto; 信夫武藤; Masayoshi Sasaki; 優美佐々木
Original assignee: N T T COMMUN WEAR KK; Nippon Telegraph and Telephone Corp
Current assignee: N T T COMMUN WEAR KK; Nippon Telegraph and Telephone Corp
Priority date: 1996-12-18
Filing date: 1996-12-18
Publication date: 1998-06-30

Abstract

PROBLEM TO BE SOLVED: To provide a method and device for retrieving longest match which can realize fast and efficient longest match retrieval of a word dictionary, etc., in natural language processing. SOLUTION: Record groups are sorted in ascending order by key items (S1), and the record number indicating the record that precedes an arbitrary record and matches the key item of that record at the longest length is set as a pointer in every record (S2). The position of a retrieval keyword in the sort order of the record group is retrieved (S3), and the key in the record at the retrieved position is collated with the retrieval keyword over the length of the key (S4). When they match, the record is decided as the longest match solution (S5). When they do not match, the record of the next candidate is retrieved by using the pointer set in the record (S6).

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a longest match search method and apparatus, and in particular to the search of word dictionaries that are used frequently in, for example, morphological analysis in natural language processing. More specifically, it relates to a longest match search method and apparatus that can be used to retrieve the record whose key is the longest match for an input search keyword, or to retrieve the dictionary records that match from the beginning of a search keyword.

[0002]

[Prior art] Conventional ways to implement longest match search include the sequential search method, the trie method, and the binary search method. The sequential search method searches the data of a table containing key items in order from the beginning.

[0003] In the trie method, where k is the length of the character string, operations such as search and insertion can be performed with a computational cost of O(k). The binary search method divides a group of items into two according to some criterion and selects the half that contains the search target. It is effective when the data is sorted, and its computational cost is O(log n).

[0004]

[Problems to be solved by the invention] However, the conventional methods above have the following problems when used for high-speed longest match search of word dictionaries and the like, which are used frequently in natural language processing. In the conventional sequential search method, when the table size is n and the target data is at the end of the table, n comparisons are performed. The average search cost is O((n+1)/2), so the processing load becomes extremely large when the amount of data is large.
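The (n+1)/2 figure can be checked with a short calculation. It assumes (this is not stated in the source) that the target is equally likely to be at each of the n positions and that reaching position i costs i comparisons:

```latex
\frac{1}{n}\sum_{i=1}^{n} i \;=\; \frac{1}{n}\cdot\frac{n(n+1)}{2} \;=\; \frac{n+1}{2}
```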

[0005] The conventional trie method is effective for word dictionary search, but its management, such as the links between nodes, is complicated and wastes considerable storage. It also requires management different from that of the tables frequently used in databases and the like, and is therefore unsuitable for record-by-record management.

[0006] The conventional binary search method is effective for prefix-match keyword searches, makes dictionary maintenance easy, and can run at high speed. However, this method alone cannot implement longest match search. The present invention has been made in view of the above points. Its object is to solve the above conventional problems and to provide a longest match search method and apparatus that can implement fast and efficient longest match search of word dictionaries and the like in natural language processing.
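The limitation of binary search on its own can be illustrated with a small sketch. The dictionary contents and the helper name `predecessor` are hypothetical and chosen only for illustration; Python's standard `bisect` module stands in for the binary search.

```python
from bisect import bisect_right

# Hypothetical mini-dictionary, sorted ascending (a prefix sorts before its extensions).
KEYS = sorted(["ア", "アイオイ", "アオ", "アオキ", "アワジ", "イアイ"])

def predecessor(keyword):
    """Binary search: return the last key sorting at or before `keyword`, or None."""
    i = bisect_right(KEYS, keyword)
    return KEYS[i - 1] if i > 0 else None

candidate = predecessor("アンドウ")
# The record found by position alone is not a prefix of the keyword, although
# the dictionary does contain a prefix of the keyword ("ア").
print(candidate, "アンドウ".startswith(candidate))
```

In this sample the positional search lands on a record that does not match, which is the gap the pointers described later are meant to close.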

[0007] A further object of the present invention is to provide a longest match search method and apparatus that, even when candidates must be searched by backtracking as in word dictionary search, require only the management of pointers, and therefore need little storage and few dictionary lookups.

[0008]

[Means for solving the problems] FIG. 1 is a diagram for explaining the principle of the present invention. The present invention is a longest match search method for retrieving, from a record group having variable-length key items, the record whose key is the longest match for an arbitrary search keyword. The record group is sorted in ascending order by key item (step 1). In each record, the record number of the record that precedes it and is the longest match for its key item is set as a pointer (step 2). The position of the search keyword in the sort order of the record group is located (step 3), and the key in the record at the located position is compared with the search keyword over the length of the key (step 4). If they match, the record is taken as the longest match solution (step 5); if they do not match, the next candidate record is retrieved by means of the pointer set in the record (step 6).
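Steps 1 to 6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `set_pointers` and `longest_match`, the sample keys, and the use of `bisect` and plain code-point ordering are all hypothetical choices. Pointers are 1-based record numbers and 0 means "no pointer".

```python
from bisect import bisect_right

def set_pointers(keys):
    """Step 2: for each record, the 1-based number of the nearest earlier
    record whose key is a prefix of this record's key (0 if there is none)."""
    ptr = []
    for i, key in enumerate(keys):
        found = 0
        # In sorted order a shorter prefix precedes a longer one, so the first
        # prefix met while scanning backwards is the longest one.
        for j in range(i - 1, -1, -1):
            if key.startswith(keys[j]):
                found = j + 1
                break
        ptr.append(found)
    return ptr

def longest_match(keys, ptr, keyword):
    """Steps 3-6: return the record number of the longest match, or 0."""
    no = bisect_right(keys, keyword)            # step 3: position in sort order
    while no:
        if keyword.startswith(keys[no - 1]):    # step 4: compare over the key length
            return no                           # step 5: longest match solution
        no = ptr[no - 1]                        # step 6: next candidate via pointer
    return 0

# Step 1: sort the (hypothetical) dictionary keys in ascending order.
keys = sorted(["アオキ", "ア", "アオ", "アオキガオカ", "アイオイ", "アオバ", "アワジ", "イアイ"])
ptr = set_pointers(keys)
print(ptr)
print(longest_match(keys, ptr, "アオキガオカシタ"))
```

The backward scan used here follows the definition in step 2 directly and is quadratic in the worst case; it is meant only to make the definition concrete.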

[0009] The present invention also uses a binary search when locating the position of the search keyword in the sort order of the record group. In addition, in the present invention the pointer set in a record is a number relative to that record.

[0010] FIG. 2 is a diagram showing the principle configuration of the present invention. The present invention is a longest match search apparatus that searches a record group having variable-length key items for the record whose key is the longest match for an arbitrary search keyword. The apparatus comprises: a record group 10 in which the records are sorted in ascending order by key item and in which, for each record, the record number of the record that precedes it and is the longest match for its key is set as a pointer; record position search means 20 for locating a position in the sort order of the key items in the record group 10 and retrieving the longest match candidate record; longest match matching means 30 for comparing the search keyword with the key item of a record of the record group over the length of that key item; and next candidate search means 40 for retrieving the next candidate record using the pointer of the record.

[0011] The record position search means 20 uses a binary search. The pointers of the record group are numbers relative to the records of the record group. By using a binary search, the record position search means finds the position of the input keyword in the sort order and retrieves the longest match candidate record. The longest match matching means compares the key value of the retrieved record with the search keyword over the key length of that record and checks whether they match. In general, a binary search does not guarantee that the record obtained is the one whose key is the longest match for the search keyword. In the present invention, however, if the first character of the search keyword matches the first character of the record's key item, the longest match record can always be reached by following the pointers.

[0012] Accordingly, the next candidate search means retrieves the next candidate record on the basis of the pointer, and the longest match matching means performs the longest match comparison, which makes it possible to retrieve the record that is the longest match for the keyword. With the next candidate search means using pointers, the search takes at most a number of steps on the order of the length of the character string.

[0013] In word dictionary search for morphological analysis in natural language processing, it is necessary to extract not only the longest match candidate but also the words that are the next-longest matches, as with "あおき" (aoki), "あお" (ao), and "あ" (a). In the present invention, the pointer set in each record of the record group links it to the next candidate, so the next candidate can be obtained without any search processing.
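The pointer chain can be walked as in the following sketch. The `RECORDS` table and the function name `candidates` are hypothetical; the table maps a record number to a key and a next pointer, with 0 meaning that no further candidate exists.

```python
# Hypothetical generated dictionary: record number -> (key, next pointer).
RECORDS = {1: ("あ", 0), 2: ("あお", 1), 3: ("あおき", 2)}

def candidates(record_no):
    """Yield the key of `record_no`, then every shorter match, by following pointers only."""
    while record_no:
        key, record_no = RECORDS[record_no]
        yield key

print(list(candidates(3)))
```

Once the longest match has been found, no further comparison with the keyword is needed in this sketch, because every record on the chain is a prefix of the one before it.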

[0014]

[Embodiments of the invention] FIG. 3 shows the configuration of the longest match search apparatus of the present invention. The apparatus shown in the figure comprises a storage unit 10 having a stack that stores the record group, a record position search unit 20, a longest match matching unit 30, a next candidate search unit 40, a keyword input unit 50, and an output unit 60.

[0015] As shown in FIG. 4, the storage unit 10 stores in the stack records each consisting of a record number, a key item, a key length, and a next pointer, and the records are sorted in ascending order by key item. The next pointer of each record stored in the storage unit 10 is set as shown in FIG. 5. When the original dictionary data is input (step 10), it is sorted in ascending order by key item (gojūon, i.e. Japanese syllabary, order, then key length) (step 11). For each record, the record number of the record that is located before it and is the longest match for its key item is set as a pointer (step 12). The generated dictionary data, that is, a sorted record group with next pointers set, is thereby produced (step 13).

[0016] The record position search unit 20 locates the position, among the key items of the record group in the storage unit 10, that corresponds to the input keyword, and retrieves the record that is the longest match candidate. The longest match matching unit 30 compares the key of the retrieved record with the input keyword over the length of the key. If they match, it transfers the record to the output unit 60 as the longest match solution; if they do not match, it passes control to the next candidate search unit 40.

[0017] The next candidate search unit 40 retrieves the next candidate record by means of the next pointer of the record that did not match in the longest match matching unit 30, and transfers the information of the newly retrieved record to the longest match matching unit 30. The keyword input unit 50 accepts the input of a search keyword.

[0018] The output unit 60 outputs, as the search result, the longest match solution found by the longest match matching unit 30. Next, an outline of the operation of the present invention will be described. FIG. 6 is a flowchart showing the outline operation of the longest match search of the present invention.

[0019] Step 101) First, a search keyword is input from the keyword input unit 50.
Step 102) The record position search unit 20 performs a binary search for the input search keyword, locates the position of the search keyword in the sort order, and takes the longest match candidate record from the record group in the storage unit 10.

[0020] Step 103) The longest match matching unit 30 compares the search keyword with the key item of the longest match candidate record obtained by the record position search unit 20, over the length of the key item. For example, if the search keyword has n characters and the key item of the obtained record has m characters (where n ≥ m), the first m characters are compared.

[0021] Step 104) If the comparison results in a match, the process proceeds to step 105; if not, it proceeds to step 106.
Step 105) The key item of the record is output from the output unit 60 as the longest match solution.

[0022] Step 106) The next candidate search unit 40 determines whether the first character of the obtained record matches the first character of the search keyword. If they match, the process proceeds to step 107; if they do not, the output unit 60 outputs an indication that there is no solution.
Step 107) The next candidate search unit 40 refers to the pointer of the obtained record to obtain the next pointer, transfers that information to the longest match matching unit 30, and the process returns to step 103.
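The flow of steps 102 to 107 can be traced with the sketch below, which includes the first-character check of step 106 as an early exit. The tables `KEYS` and `POINTERS` and the function name `search` are hypothetical, and the keys are assumed to be non-empty and already sorted with their pointers set.

```python
from bisect import bisect_right

# Hypothetical generated dictionary, already sorted; POINTERS[i] is the next
# pointer (1-based record number, 0 = none) of record i + 1.
KEYS = ["ア", "アイオイ", "アオ", "アオキ", "アワジ", "イアイ"]
POINTERS = [0, 1, 1, 3, 1, 0]

def search(keyword):
    """Return the key of the longest match, or None when there is no solution."""
    no = bisect_right(KEYS, keyword)        # step 102: binary search for the position
    while no:
        key = KEYS[no - 1]
        if keyword[:len(key)] == key:       # steps 103-104: compare m characters
            return key                      # step 105: output the longest match
        if key[0] != keyword[0]:            # step 106: first characters differ,
            return None                     #           so there is no solution
        no = POINTERS[no - 1]               # step 107: follow the next pointer
    return None

print(search("アンドウ"), search("イア"))
```

The step 106 check only saves pointer traversals in this sketch; without it the loop would still end when a pointer of 0 is reached.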

[0023]

[Examples] Embodiments of the present invention will be described below with reference to the drawings.
[First embodiment] First, the method of setting the next pointers of the record group stored in the storage unit 10 will be described.

[0024] The record group stored in the storage unit 10 will be described with reference to FIG. 7. In the figure, the records are sorted in ascending order of key item, and the key length and next pointer are set for each. For example, for record number 6, "アオキガオカ" (Aokigaoka), the key "アオキ" (Aoki) of record number 5, which precedes record number 6, matches "アオキガオカ" from the beginning over its key length of three characters. The pointer "5", which points to record number 5, is therefore stored as the next pointer. For record number 5, whose key value is "アオキ", "アオ" (Ao) of record number 3 is the longest match, so "3" is set as the pointer.

[0025] Next, the operation of generating, from the original dictionary data, the generated dictionary data to be stored in the storage unit 10 will be described. FIG. 8 is a diagram for explaining the method of setting the next pointers of the record group in the first embodiment of the present invention.

[0026] When the original dictionary data shown in the figure, consisting of a record number, a key item, and a key length, is input, it is sorted by key item in gojūon (Japanese syllabary) order and by key length. In this example, "ア" (A), which comes first in gojūon order and has the shortest key length, is placed at the head of the generated dictionary data, and its next pointer is "0". Next in gojūon order comes the record "アイオイ" (Aioi). Its first character matches "ア" of the preceding record, so its next pointer is "1". The generated dictionary data is built by repeating this processing.

[0027] This procedure will now be described concretely with reference to the flowchart shown in FIG. 9. First, the first record to be processed (record number 1) is taken (step 201). The record has "ア" (A) as its key item, and the top of the stack in the storage unit 10 is referenced (steps 202, 203). Since the stack is empty (step 204), "0" is set as the next pointer (step 209). The key and record number "1" of the record are pushed onto the stack of the storage unit 10 (step 210), and record "1" is stored in the storage unit 10 with next pointer "0", as shown in FIG. 8 (step 211). The stack of the storage unit 10 is then in the state shown in FIG. 10(a).

[0028] Next, the following record (key item value "アイオイ" (Aioi)) is taken (step 201). The stack of the storage unit 10 is no longer empty (step 204, No), so "ア" (A) on the stack is compared with "アイオイ" over the key length of the stack entry, one character (step 205). They match (step 206, Yes), so record number "1", the record with key "ア" on the stack, is set as the next pointer of record "2" (step 207), and the "アイオイ" record is pushed onto the stack of the storage unit 10 (step 210). Record "2" is then output to the storage unit 10 (step 211). The stack of the storage unit 10 is then in the state shown in FIG. 10(b).

[0029] Next, the "アオ" (Ao) record is taken (step 201), and "アイオイ" (Aioi) at the top of the stack in the storage unit 10 is compared with "アオ" over the key length of the stack entry, four characters (step 205). They do not match, so the "アイオイ" record is removed from the stack of the storage unit 10 (step 208), and the process returns to step 203, where the key "ア" (A) is referenced. "ア" and "アオ" are then compared over one character (step 205). They match, so the record number of the record with key item value "ア" is set as the next pointer of the "アオ" record (step 207), and the "アオ" record is pushed onto the stack of the storage unit 10 (step 210). Record "3" of the generated dictionary data is then output (step 211). The stack of the storage unit 10 is then in the state shown in FIG. 10(c).
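The stack procedure walked through above can be sketched as follows. The function name `build_pointers` and the `trace` return value are hypothetical additions for illustration; the step numbers in the comments refer to the flowchart steps named in the text, and the input is assumed to be already sorted.

```python
def build_pointers(sorted_keys):
    """Set next pointers with a stack and record the stack after each record.

    Returns (pointers, trace); pointers are 1-based record numbers, 0 = none.
    """
    stack, pointers, trace = [], [], []
    for no, key in enumerate(sorted_keys, start=1):       # step 201: take the next record
        # steps 203-208: discard stack entries whose key is not a prefix of this key
        while stack and not key.startswith(sorted_keys[stack[-1] - 1]):
            stack.pop()
        pointers.append(stack[-1] if stack else 0)        # steps 207/209: set the pointer
        stack.append(no)                                  # step 210: push this record
        trace.append([sorted_keys[n - 1] for n in stack])
    return pointers, trace

pointers, trace = build_pointers(["ア", "アイオイ", "アオ"])
print(pointers)
for state in trace:
    print(state)
```

Each record is pushed once and popped at most once, so in this sketch the number of stack operations grows linearly with the number of records; the three traced states correspond to the stack contents described for records 1 to 3.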

[0030] The next pointers are set in this way, giving the generated dictionary data shown in FIG. 8. Next, the longest match search processing in this embodiment will be described concretely. FIG. 11 is a flowchart of the longest match search processing of the first embodiment of the present invention. The flowchart shows the case of searching the generated dictionary data in the storage unit 10 shown in FIG. 12 for the record that is the longest match for "アオキガオカシタ" (Aokigaokashita).

[0031] First, the record position search unit 20 finds the position in the sort order for the keyword "アオキガオカシタ" (Aokigaokashita) input from the keyword input unit 50. Because the record group in the storage unit 10 is sorted, a fast binary search algorithm can be applied as the search algorithm, and "アオキガオカシタ" is positioned between "アオキガオカ" (Aokigaoka) and "アオバ" (Aoba) (step 301).

[0032] Because of the sort condition, no record whose key matches "アオキガオカシタ" (Aokigaokashita) over the key's length exists at or after "アオバ" (Aoba). The "アオキガオカ" (Aokigaoka) record is therefore the longest match candidate. Next, the longest match matching unit 30 compares "アオキガオカ" with "アオキガオカシタ" over the six-character key length of "アオキガオカ" in the storage unit 10 (step 302), and a match is obtained (step 303). The record with the key "アオキガオカ", the longest match for "アオキガオカシタ", has thus been retrieved.

[0033] The next candidate search unit 40 finds that the next pointer points to "アオキ" (Aoki) (step 304). The next pointer of "アオキ" in turn points to "アオ" (Ao), and so on. In this way the next candidates, which match the input keyword "アオキガオカシタ" (Aokigaokashita) over four, three, and two characters, can be obtained without a search, simply by having the next candidate search unit 40 read the records that the pointers point to.

[0034] Next, another concrete example of the longest match search will be described with reference to FIG. 13, in which the keyword "アンドウ" (Ando) is input from the keyword input unit 50. As in the case of FIG. 12, the record position search unit 20 positions "アンドウ" between "アワジ" (Awaji) and "イアイ" (Iai). Nothing at or after "イアイ" can match, so "アワジ" is obtained as the candidate.

[0035] When the longest match matching unit 30 compares "アワジ" (Awaji) with "アンドウ" (Ando) over the three-character key length of "アワジ", they do not match, so the "アワジ" record is not the longest match solution. The first character of both is "ア", so the next candidate search unit 40 obtains the record that the next pointer of "アワジ" points to (record number 1). The longest match matching unit 30 then compares the keyword "アンドウ" with the next candidate "ア" (A) over one character, and a match is obtained. The record with key "ア" is therefore retrieved as the longest match for "アンドウ". The next pointer of the "ア" record is "0", which indicates that there are no further matching records.

[0036] A further concrete example of the longest match search will be described with reference to FIG. 14, in which the keyword "イア" (Ia) is input. First, the record position search unit 20 positions "イア" between "アワジ" (Awaji) and "イアイ" (Iai). Nothing at or after "イアイ" can match, so "アワジ" is obtained as the candidate. When the longest match matching unit 30 compares "アワジ" with "イア" over the three-character key length of "アワジ", they do not match, so the "アワジ" record is not the longest match solution. Because the first characters differ, no longest match solution would be found even if the next candidate were obtained through the next pointer of "アワジ". It follows that no record is a longest match for the input keyword "イア".

[0037] A further concrete example of the longest match search will be described with reference to FIG. 15, in which the keyword "アカサカ" (Akasaka) is input. The record position search unit 20 searches for a record that exactly matches "アカサカ". Where duplicate keys are allowed, it finds the last of the records with the same key and takes that "アカサカ" record (record number "12") as the candidate. The longest match matching unit 30 then obtains the exactly matching "アカサカ" record (record number "12"). The search for next candidates in the next candidate search unit 40 is the same as in FIGS. 12 and 13 above, so its description is omitted.
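Selecting the last of several records with the same key can be sketched with Python's `bisect` module. The key list below is hypothetical and its record numbers do not correspond to those in FIG. 15.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sorted keys containing a duplicated key.
KEYS = ["ア", "アカ", "アカサカ", "アカサカ", "アキ"]

first = bisect_left(KEYS, "アカサカ") + 1    # record number of the first duplicate
last = bisect_right(KEYS, "アカサカ")        # record number of the last duplicate
print(first, last)
```

Using the rightmost insertion point means that the same positional search serves both for exact matches and for keywords that are absent from the dictionary.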

[0038] [Second embodiment] Next, as a second embodiment, an example that uses a relative record number as the pointer will be described. FIG. 16 shows an example of the record group in the storage unit of the second embodiment of the present invention, in which a relative record number is set in the field used for the pointer in the first embodiment.

[0039] The record being pointed to is obtained by subtracting the relative record number from the record number of the record concerned. For example, the relative record number of the record with record number "12" is "3", so the record it points to is record number "9".
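The subtraction can be sketched as follows. The function names are hypothetical, and encoding "no pointer" as a relative number of 0 is an assumption made for this sketch; the source does not state how that case is represented.

```python
def to_relative(pointers):
    """Convert absolute next pointers (1-based, 0 = none) to relative numbers."""
    return [no - p if p else 0 for no, p in enumerate(pointers, start=1)]

def target(record_no, relative):
    """Record pointed to: own record number minus the relative number (0 = none)."""
    return record_no - relative if relative else 0

print(target(12, 3))
print(to_relative([0, 1, 1, 3]))
```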

[0040] With this setting, the same longest match search as in the first embodiment can be performed. The present invention is not limited to the above embodiments, and various modifications and applications are possible within the scope of the claims.

[0041]

[Effects of the invention] As described above, the longest match search method and apparatus of the present invention can implement longest match search of word dictionaries and the like in natural language processing at high speed. Combining the binary search method with pointers that link each record to its next candidate, set when the dictionary is created, enables fast and efficient longest match search.

[0042] Furthermore, because the next candidates are managed through the pointers, even when candidates must be searched by backtracking, as in word dictionary search, only pointers need to be managed, so little storage is required and the number of dictionary lookups is reduced. A relative record number can also be set as the pointer.

[Brief description of the drawings]

FIG. 1 is a diagram for explaining the principle of the present invention.

FIG. 2 is a principle configuration diagram of the present invention.

FIG. 3 is a configuration diagram of the longest match search apparatus of the present invention.

FIG. 4 is a diagram showing the structure of the record group stored in the storage unit of the present invention.

FIG. 5 is a flowchart of setting the record pointers in the storage unit of the present invention.

FIG. 6 is a flowchart of the outline operation of the longest match search of the present invention.

FIG. 7 is a diagram showing the method of creating the search target records in the first embodiment of the present invention.

FIG. 8 is a diagram for explaining the method of setting the next pointers of the record group in the first embodiment of the present invention.

FIG. 9 is a flowchart of the next pointer setting operation in the first embodiment of the present invention.

FIG. 10 is a diagram showing the transitions of the stack of the storage unit in the first embodiment of the present invention.

FIG. 11 is a flowchart of the longest match search processing in the first embodiment of the present invention.

FIG. 12 shows a first concrete example of the longest match search in the first embodiment of the present invention.

FIG. 13 shows a second concrete example of the longest match search in the first embodiment of the present invention.

FIG. 14 shows a third concrete example of the longest match search in the first embodiment of the present invention.

FIG. 15 shows a fourth concrete example of the longest match search in the first embodiment of the present invention.

FIG. 16 shows an example of the record group in the storage unit in the second embodiment of the present invention.

[Explanation of symbols]

10: search target record group, storage unit
20: record position search means, record position search unit
30: longest match matching means, longest match matching unit
40: next candidate search means, next candidate search unit
50: keyword input unit
60: output unit

Claims

[Claims]

1. A longest match search method for searching a record group having variable-length key items for a record having a key that is the longest match for an arbitrary search keyword, wherein: the record group is sorted in ascending order by key item; in each record, the record number of the record that precedes that record and is the longest match for the key item of that record is set as a pointer; the position of the search keyword in the sort order of the record group is located; the key in the record at the located position is compared with the search keyword over the length of the key; when they match, the record is determined to be the longest match solution; and when they do not match, the next candidate record is searched for by means of the pointer set in the record.

2. The longest match search method according to claim 1, wherein a binary search is used when searching for the position of the search keyword in the record group.

3. The longest match search method according to claim 1, wherein the pointer set in the record is a relative number from the record.

4. A longest match search apparatus for searching a record group having variable-length key items for a record having a key that is the longest match for an arbitrary search keyword, comprising: a record group in which the records are sorted in ascending order by key item and in which, for each record, the record number of the record that precedes that record and is the longest match for the key of that record is set as a pointer; record position search means for locating a position in the sort order of the key items in the record group and retrieving a longest match candidate record; longest match matching means for comparing the search keyword with the key item in a record of the record group over the length of the key item; and next candidate search means for searching for a next candidate record using the pointer of the record when the longest match matching means does not obtain a longest match for the record.

5. The longest match search device according to claim 4, wherein said record position search means uses a binary search.

6. The longest match search device according to claim 4, wherein each pointer of the record group is a relative number from a record of the record group.