JP3427679B2

JP3427679B2 - Computer-readable recording medium recording word search device and word search program

Info

Publication number: JP3427679B2
Application number: JP16145897A
Authority: JP
Inventors: 宏梅基; 昌一舘野
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 1997-06-18
Filing date: 1997-06-18
Publication date: 2003-07-22
Anticipated expiration: 2017-06-18
Also published as: JPH117451A

Description

Detailed Description of the Invention

【０００１】[0001]

【Technical Field of the Invention】 The present invention relates to a word search device that efficiently retrieves words from a set of words based on keys assigned to the words, and to a medium on which a word search program is recorded. In particular, it relates to a word search device that rapidly retrieves related words when a character at an arbitrary position is specified, and to a medium recording a word search program that causes a computer to perform such a search.

【０００２】[0002]

【Prior Art】 Retrieving a word from some key is a basic step in text information processing systems such as dictionary lookup and kana-kanji conversion. For that reason, the processing speed and storage capacity needed to retrieve a word from a key strongly affect the performance of the whole system. Making this processing faster while using less storage therefore yields a very large practical benefit.

【０００３】 Many systems that retrieve documents using words as keys already exist. Even in such document retrieval systems, the ability to output words as a search result, while not essential, can be very useful.

【０００４】 In a document retrieval system, the key words are in most cases either contained in the documents or selected in advance by the system administrator or the document authors. Users of the retrieval system therefore often do not know which words are registered as keys. To assist users, it must be possible to search for the keywords themselves, rather than for documents, by some method.

【０００５】 Concrete methods of searching for keywords include, for example, retrieving the written form of a word from its reading, retrieving words that contain a given character or character string, and retrieving words that match an arbitrary regular expression.

【０００６】 In keyword-based document retrieval systems, a tree structure called a trie (trie index) is often used as the index so that a desired document, or a pointer to it, can be obtained quickly from a word. A trie allows fast word lookup: retrieving a word requires a number of processing steps roughly proportional to the length of the input string. Because its data compression is also relatively good, a trie is well suited to storing a large number of index words. In addition, when a string at the beginning of a word is specified, all words starting with that string can be found by simple processing.
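The prefix lookup described above can be illustrated with a minimal Python sketch. This is a generic pointer-based trie, not the storage layout of the invention; the names `TrieNode`, `insert`, and `words_with_prefix` are hypothetical, and the sample words are the six words that the description itself uses as its example word set.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # label (one character) -> child TrieNode
        self.is_word = False  # True if a word ends at this node


def insert(root, word):
    """Add one word to the trie, creating nodes as needed."""
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_word = True


def words_with_prefix(root, prefix):
    """Return every stored word that starts with `prefix` (sorted)."""
    node = root
    for ch in prefix:              # one step per input character
        node = node.children.get(ch)
        if node is None:
            return []
    found = []
    stack = [(node, prefix)]
    while stack:                   # walk the whole subtree under the prefix
        cur, text = stack.pop()
        if cur.is_word:
            found.append(text)
        for ch, child in cur.children.items():
            stack.append((child, text + ch))
    return sorted(found)


root = TrieNode()
for w in ["解", "解析", "解像度", "解像力", "現像", "像"]:
    insert(root, w)
```

Calling `words_with_prefix(root, "解像")` returns the two words that begin with 解像. A query for a character that only occurs in the middle of words, such as 析, returns nothing, which is the limitation the next paragraph addresses.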

【０００７】 A trie, however, cannot quickly find words that contain a given character at an arbitrary position other than the beginning of the word. In such cases, an index separate from the trie representing the words is provided, and words containing a character at an arbitrary position are retrieved from that character through the separate index.

【０００８】 Consider the case where a character is used as a search key to retrieve the words that contain it at any position in their spelling. Because a word consists of several characters, a single word is associated with several search keys (characters). When a search target is linked to multiple keys in this way, less storage is needed if the set of search targets is stored in a single data structure and each search key is associated with pointer values into that data structure.

【０００９】 We therefore consider data structures that represent the word set in little storage and allow a word to be identified by a pointer. A common data structure for storing a set of words is a record structure that stores them as fixed-length or variable-length character strings. A word search method using this record structure is referred to below as the "first conventional example". With this record structure, any word can be accessed quickly, in roughly constant time regardless of the total number of words. Storing the word set in such a record structure therefore enables fast keyword search in a word search system.

【００１０】 Apart from the first conventional example, the trie index itself can be regarded as the word set data, with the identification number of the trie node corresponding to the end of a word used as the pointer to that word. This is referred to below as the "second conventional example". In a tree structure such as a trie, every node except the root has exactly one parent. Therefore, if every node holds link information to its parent, the path from any specified node back to the root is uniquely determined.
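As a rough illustration of the second conventional example, the sketch below recovers a word from the identification number of its final node by following parent links up to the root. The node table `NODES`, its numbering, and the function name `word_from_node` are assumptions made for this example only.

```python
# Hypothetical node table: node id -> (parent id, label on the arc from the parent).
# Node 0 is the root; the extra parent field is the per-node cost of this approach.
NODES = {
    0: (None, ""),
    1: (0, "解"), 2: (1, "析"), 3: (1, "像"), 4: (3, "度"),
    5: (3, "力"), 6: (0, "現"), 7: (6, "像"), 8: (0, "像"),
}


def word_from_node(nodes, node_id):
    """Follow parent links from `node_id` to the root and rebuild the word."""
    labels = []
    while node_id is not None:
        parent, label = nodes[node_id]
        labels.append(label)
        node_id = parent
    return "".join(reversed(labels))
```

Here `word_from_node(NODES, 5)` walks 力, 像, 解 upward and yields 解像力. The lookup is simple, but every node has to carry a parent link that the trie does not need for ordinary lookups.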

【００１１】[0011]

【Problems to Be Solved by the Invention】 However, when a trie index keyed by words exists and the words in that index are to be retrieved from some other key, the conventional methods described above have the following problems.

【００１２】 In the first conventional example, that is, the method in which record-structured data storing the word set as fixed-length or variable-length character strings is prepared separately from the index, fast retrieval from key to word can be achieved. However, new data is needed in addition to the word set already stored in trie form, and the storage this requires is too large to ignore.

【００１３】 In the second conventional example, that is, the method in which the trie index is regarded as the word set data and the identification number of the trie node corresponding to the end of a word is used as the pointer to the word, no separate data representing the word set is needed, so the key-to-word search itself requires less storage than in the first conventional example. However, link information to parent nodes, which the trie index would not otherwise need, must be added to it. Viewed across the retrieval system as a whole, the second conventional example therefore cannot be said to improve storage capacity significantly over the first.

【００１４】 The present invention has been made in view of these points, and its object is to provide a word search device that can retrieve words quickly with little storage capacity. Another object of the present invention is to provide a medium on which is recorded a word search program that causes a computer to retrieve words quickly with little storage capacity.

【００１５】[0015]

【Means for Solving the Problems】 To solve the above problems, the present invention provides a word search device that retrieves words from a word set, comprising: word storage means in which a set of words associated with nodes is stored according to a trie format whose nodes are recorded in depth-first order; word search means which, when the position of a node in the word storage means is input, traverses the trie of the word storage means from the root, finds the path to the node at the input position, acquires all words corresponding to the nodes reached by following every path that continues from the found path, and outputs the set of acquired words; key index storage means which stores keys corresponding to the words contained in the word storage means in association with the positions of the nodes constituting each word; and node position search means which, when any key in the key index storage means is input, acquires from the key index storage means the set of node positions corresponding to the input key and outputs the acquired set of node positions to the word search means; wherein the key index storage means associates all characters constituting the words contained in the word storage means with the positions of the nodes representing those characters, and associates, among the characters constituting the words contained in the word storage means, all characters other than the first and last characters of each word with the positions of the nodes representing those characters.

【００１６】 According to this word search device, when position information of a storage location in the word storage means is input, the word search means traverses the trie of the word storage means from the root, finds the path to the node at the input position, acquires all words corresponding to the nodes reached by following every path that continues from the found path, and outputs the set of acquired words. The key index storage means stores keys corresponding to the words contained in the word storage means in association with the positions of the nodes constituting each word. When any key in the key index storage means is input, the node position search means acquires from the key index storage means the set of node positions corresponding to the input key and outputs the acquired set to the word search means. Furthermore, the key index storage means associates all characters constituting the words contained in the word storage means with the positions of the nodes representing those characters, and associates, among those characters, all characters other than the first and last characters of each word with the positions of the nodes representing them. As a result, multiple words, including the word corresponding to the input position, can be retrieved quickly, and the word storage means requires little storage capacity.

【００１７】[0017]

【００１８】[0018]

【００１９】[0019]

【Embodiments of the Invention】 Embodiments of the present invention are described below with reference to the drawings. FIG. 1 shows the principle configuration of the present invention. The word search device according to the present invention has word storage means 1 and word search means 2.

【００２０】 The word storage means 1 stores a word set 1a in a trie format (trie index) whose nodes are recorded in depth-first order. Here, the depth of a node is the length of the path from the root of the trie to that node, and depth-first order is the order in which nodes are visited when the search always proceeds as far as possible in the depth direction. When a descendant of a node is compared with a younger sibling of that node, the descendant always comes earlier in the order (has the smaller address). Between siblings, the elder sibling comes earlier.

【００２１】 When position information of a storage location in the word storage means 1 is input, the word search means 2 traverses the trie of the word storage means 1 from the root, finds the path to the node at the input position, acquires all words corresponding to the nodes reached by following every path that continues from the found path, and outputs the acquired word set. The algorithm for obtaining, from a node position, the word or set of words containing that node is described below.

【００２２】 FIG. 2 is a flowchart of the algorithm for obtaining, from a node position, the word or set of words containing that node. This is the processing performed by the word search means 2 on receiving node position information.
[S1] Consider transitions starting from the start node of the trie in the word storage means 1. First, prepare an empty label stack to hold the sequence of labels along the trie path. Call the start node "node A", and call the node at the given node position "node X".
[S2] Let the first child of node A be "B1". Because the first child is located at the position immediately after node A, node B1 is found at once.
[S3] Let the adjacent younger sibling of node B1 be node B2. Because the trie is traversed from the root, the adjacent younger sibling B2 of any node is also found at once.
[S4] Determine whether node X is equal to node B1. If it is, go to step S8; otherwise go to step S5.
[S5] Determine whether node X comes before node B2. If it does, go to step S6; otherwise go to step S7.
[S6] If node X comes before node B2, node X is reached along the path through node B1. Push the label attached to the arc from node A to node B1 onto the label stack, assign node B1 to node A, and return to step S2.
[S7] If node X is at or after node B2, node X is not reached along the path through node B1. Assign node B2 to node B1 and return to step S3.
[S8] If node X is equal to node B1, follow every path from node B1 onward and obtain the label sequences attached to the arcs on those paths.
[S9] Concatenate the label sequence held in the label stack with the label sequences obtained from the paths from node B1 onward, and end the processing. The result gives the words sought.
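Steps S1 to S9 can be sketched in Python as follows. This is an illustrative reading of the flowchart under several assumptions: the trie is held as a list in depth-first order with the start node at address 0, each record stores the address of its adjacent younger sibling, and a missing younger sibling (`None`) is treated as "node X must lie under B1". The record layout, the addresses, and the names `Record`, `TRIE`, `collect`, and `words_through` are hypothetical. The label of B1 itself is included when the final words are assembled.

```python
from typing import List, NamedTuple, Optional


class Record(NamedTuple):
    sibling: Optional[int]  # address of the adjacent younger sibling, if any
    label: str              # character on the arc leading into this node
    is_end: bool            # True if a word ends at this node
    has_children: bool      # True if the first child sits at address + 1


# Hypothetical depth-first layout; list index = node address, 0 = start node.
TRIE = [
    Record(None, "", False, True),    # 0: start node
    Record(6, "解", True, True),       # 1
    Record(3, "析", True, False),      # 2
    Record(None, "像", False, True),   # 3
    Record(5, "度", True, False),      # 4
    Record(None, "力", True, False),   # 5
    Record(8, "現", False, True),      # 6
    Record(None, "像", True, False),   # 7
    Record(None, "像", True, False),   # 8
]


def collect(trie: List[Record], node: int, text: str, out: List[str]) -> None:
    """S8: gather every word on a path from `node` onward (`text` ends with its label)."""
    if trie[node].is_end:
        out.append(text)
    if trie[node].has_children:
        child: Optional[int] = node + 1        # first child follows its parent
        while child is not None:
            collect(trie, child, text + trie[child].label, out)
            child = trie[child].sibling


def words_through(trie: List[Record], x: int) -> List[str]:
    """Return all words whose path passes through the node at address x."""
    if not 1 <= x < len(trie):
        raise ValueError("x must be the address of a non-root node")
    stack: List[str] = []                      # S1: empty label stack
    a = 0                                      # S1: node A is the start node
    b1 = a + 1                                 # S2: first child of A
    while b1 != x:                             # S4
        b2 = trie[b1].sibling                  # S3
        if b2 is None or x < b2:               # S5 -> S6: X lies under B1
            stack.append(trie[b1].label)
            a = b1
            b1 = a + 1                         # back to S2
        else:                                  # S7: try the younger sibling
            b1 = b2
    out: List[str] = []
    collect(trie, b1, "".join(stack) + trie[b1].label, out)  # S8 and S9
    return out
```

With this layout, `words_through(TRIE, 3)` returns `['解像度', '解像力']` and `words_through(TRIE, 7)` returns `['現像']`. No record stores a parent address; the path is recovered purely by comparing addresses against younger-sibling addresses.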

【００２３】 By having the word search means 2 perform this processing, the set of words containing a node can be obtained from the node's position. Moreover, because the trie nodes are recorded in depth-first order and the adjacent younger-sibling arc of any arc can be identified by traversing the trie from the root, the path containing a specified trie node can be determined without using link information to parent nodes. The reason is explained below.

【００２４】 As a premise, let some node in the trie be node A, let the first child of node A be B1, let the adjacent younger sibling of node B1 be B2, and let the first child of node B1 be C. Suppose the position of a node X is specified and the path from node A to node X is to be found. Assume it is known that node X can be reached from node A.

【００２５】 In the trie according to the present invention, nodes are recorded in depth-first order, so node C lies between node B1 and node B2. If node X is equal to node C, then node X is after node B1 and before node B2, and node X is therefore reached from node B1. The same holds for every other descendant of node B1, since all of them are recorded between node B1 and node B2. It follows that when node X lies between node B1 and node B2, node X can be reached along a path that transitions from node A to node B1.

【００２６】 Conversely, when node X is located at or after node B2, node X is not on any path that transitions from node A to node B1. In that case, node X is reached from node A along a path through one of the younger siblings of node B1. Repeating this reasoning yields the path from node A to node X.

【００２７】 Consequently, when a node position is specified, the words containing that node can be obtained by transitioning from the root of the trie. Moreover, because words in the trie are referenced using node positions in the trie as pointers, no data representing the word set needs to be kept apart from the trie index, and less storage is required.

【００２８】 The device can also be arranged to output a set of words in response to the input of a search key (character) for words. Such a word search device is described below.

【００２９】 FIG. 3 shows the principle configuration of a word search device that takes a key to words as its input. This word search device comprises word storage means 11, key index storage means 12, node position search means 13, and word search means 14. The word storage means 11 and the word search means 14 have the same functions as the word storage means 1 and the word search means 2 in FIG. 1, so their description is omitted here.

【００３０】 The key index storage means 12 stores each key (character) to words in association with the position information of the nodes that carry the same label as the key. The key index storage means 12 is built from all such correspondences between keys and node positions. When multiple words correspond to a single key, that key is associated with a set of node positions in the key index storage means 12.

【００３１】 When a key to words is input, the node position search means 13 looks up the corresponding set of node positions in the key index storage means 12 and passes the resulting set of node positions to the word search means 14.

【００３２】 The word search means 14 then retrieves from the word storage means 11 the word set corresponding to each node position and outputs the word set. In this way, a character that forms part of a word can be input as a search key to obtain the set of words containing that character. That is, the set of words containing a given character at an arbitrary position other than the beginning of the word can be found.
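The key index and the node position lookup can be sketched as a simple mapping from a character to the addresses of the nodes labeled with it. The address-to-label table `LABELS` below is a hypothetical depth-first numbering of the description's six example words, and `build_key_index` and `find_node_positions` are illustrative names, not part of the source.

```python
from collections import defaultdict

# Hypothetical node addresses (depth-first order) and the label on each node.
LABELS = {1: "解", 2: "析", 3: "像", 4: "度", 5: "力", 6: "現", 7: "像", 8: "像"}


def build_key_index(labels):
    """Map each key character to the sorted list of node addresses labeled with it."""
    index = defaultdict(list)
    for address in sorted(labels):
        index[labels[address]].append(address)
    return dict(index)


def find_node_positions(index, key):
    """Node position search: return the node addresses for `key`, or an empty list."""
    return index.get(key, [])


KEY_INDEX = build_key_index(LABELS)
```

Here `find_node_positions(KEY_INDEX, "像")` yields `[3, 7, 8]`. Each of these addresses would then be handed to the word search step, which expands it into the words passing through that node, so the index stores only small integers instead of copies of the words.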

【００３３】 The key index storage means 12 may instead associate only those characters of the words contained in the word storage means 11 that are neither the first nor the last character of a word with the positions of the nodes representing them in the word storage means. This is because words that begin or end with the key character can be retrieved using conventional techniques, so they need not be managed in the key index storage means 12. Omitting the position information of the nodes corresponding to the first and last characters of words in this way further reduces the storage capacity required for the key index storage means 12.
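The effect of leaving out first and last characters can be seen in a small sketch. It assumes each word is known together with the addresses of the nodes on its path (the hypothetical table `WORD_PATHS`); `build_index` and its `skip_ends` flag are names invented for this illustration.

```python
# Hypothetical table: word -> addresses of the nodes along its path in the trie.
WORD_PATHS = {
    "解": [1], "解析": [1, 2], "解像度": [1, 3, 4],
    "解像力": [1, 3, 5], "現像": [6, 7], "像": [8],
}


def build_index(word_paths, skip_ends):
    """Build a character -> node address index, optionally dropping the
    first and last character of every word."""
    index = {}
    for word, path in word_paths.items():
        pairs = list(zip(word, path))
        if skip_ends:
            pairs = pairs[1:-1]        # keep interior characters only
        for ch, address in pairs:
            positions = index.setdefault(ch, [])
            if address not in positions:
                positions.append(address)
    return index


full_index = build_index(WORD_PATHS, skip_ends=False)
reduced_index = build_index(WORD_PATHS, skip_ends=True)
```

For this small word set the full index holds eight node positions, while the reduced index holds only the single entry mapping 像 to node 3, the interior character shared by 解像度 and 解像力. Words such as 現像 and 像, where 像 is a first or last character, would have to be found by the conventional means the paragraph mentions.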

【００３４】 The word storage means 11 may also store each word paired with a pointer to information corresponding to the word (for example, the documents containing the word), and the word search means 14 may output from the word storage means 11 the words represented by the respective nodes together with the set of pointers to the information corresponding to those words. In this way, inputting an arbitrary key (character) yields the words containing that key and, further, the information (documents and the like) corresponding to those words.

【００３５】 The above is the principle configuration underlying the present invention. An embodiment of a word search device that makes this configuration more concrete is described below. FIG. 4 is a block diagram of a first embodiment of the present invention. This is a word search device that takes a word as input and retrieves the set of words related to it.

【００３６】 The word search device according to this embodiment comprises a related word index unit 21, a node position search unit 22, and a word search unit 23. The related word index unit 21 holds the information of the word storage means 11 and the key index storage means 12 shown in the principle configuration of FIG. 3, and also holds the correspondence between each word and the set of words related to it.

【００３７】 On receiving a word as input, the node position search unit 22 obtains the set of node positions of the words related to that word (including the word itself) and passes it to the word search unit 23. The word search unit 23 searches the related word index unit 21 based on the input set of node positions, acquires the set of related words, and outputs it.

【００３８】 Before the embodiment is described in detail, the terminology relating to trees is defined again. A tree is a set of elements called nodes on which a hierarchical relationship is imposed. Below, the hierarchical relationships in a tree are expressed in terms of family relationships. A tree has exactly one node whose descendants comprise all nodes, itself included. This node is called the start node. Every node other than the start node has exactly one parent node. The link expressing a parent-child relationship between two nodes is called an arc. A node may be in a special state, the end state, which distinguishes it from nodes that are not in the end state. A node that has no descendants other than itself is taken to be in the end state.

【００３９】 A trie is a kind of tree in which each path from the start node to a node in the end state, connected by arcs through any number of intermediate nodes, corresponds to one word in the set. In this description, the characters that make up a word are assigned to the arcs of the trie. The nodes are arranged so that the arcs leaving any one node all carry different characters. In a trie, every node except the start node has exactly one incoming arc, so each such pair of a node and its incoming arc is treated as a single unit called an edge-node. An example of a trie representing a set of words is shown below.

【００４０】 FIG. 5 shows an example of a word set. This word set contains six words: 解 ("solution"), 解析 ("analysis"), 解像度 ("resolution"), 解像力 ("resolving power"), 現像 ("development"), and 像 ("image"). A pointer is associated with each word. A pointer indicates the location of the set of identifiers of the documents containing the word, but in this embodiment it is also used to indicate the set of node positions of related words. The pointer for 解 is "T1", the pointer for 解析 is "T2", the pointer for 解像度 is "T3", the pointer for 解像力 is "T4", the pointer for 現像 is "T5", and the pointer for 像 is "T6". From this word set, a trie whose nodes are recorded in depth-first order is generated.

【００４１】 FIG. 6 shows an example of a trie whose nodes are recorded in depth-first order. It represents the six words of FIG. 5 (解, 解析, 解像度, 解像力, 現像, 像) as a trie. In the figure, the single and double circles are nodes 30 to 38. A double circle denotes a node in the end state (a node for which a corresponding word exists). Node 30, the root, is the "start node" for word search. The symbol at the lower right of each end-state node is an identifier denoting the set of words related to the word indicated by that node. The value shown near each of nodes 31 to 38, that is, every node other than the start node, is the address (position) of that node. The arrows connecting the nodes are arcs 41 to 48, and the character above each arc (here, a single kanji) is its label.

【００４２】When the nodes of the trie in FIG. 6 are arranged in depth-first order, the sequence is: node 31, reached from root node 30 via arc 41 labeled "解"; node 32, reached from node 31 via arc 42 labeled "析"; node 33, reached from node 31 via arc 43 labeled "像"; and so on.

【００４３】In this way, the edge-nodes are recorded in depth-first order, and in such a manner that the position of the immediately adjacent younger sibling can be determined from any edge-node. FIG. 7 is a diagram showing an example of a trie index. For each edge-node it lists the address at which the edge-node's information is stored, the position (address) of its immediately adjacent younger sibling, the label of the edge-node, the state of the node, and the link information to the corresponding document set. In this example each label is a single kanji. The node state is set to "end", "continue", or both: "end" indicates that a word corresponding to the node exists, and "continue" indicates that the node has children.

【００４４】With the trie constructed in this way, whenever an edge-node A is specified, the first child of A and the immediately adjacent younger sibling of A can both be identified at once: under depth-first ordering the first child is the record stored directly after A, and the sibling's address is recorded in A itself.
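As a rough illustration of the record layout described above, the following Python sketch stores the trie of FIG. 6 as a flat table keyed by address. The addresses, sibling addresses, and links are reconstructed from the worked examples in this description rather than copied from FIG. 7 (the sibling address 33 for "現" in particular is inferred), and the names `EdgeNode`, `TRIE_INDEX`, `first_child`, and `next_sibling` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EdgeNode:
    sibling: Optional[int]  # address of the adjacent younger sibling, or None
    label: str              # character on the incoming arc
    end: bool               # "end": a word terminates at this node
    cont: bool              # "continue": the node has children
    link: Optional[str]     # link information such as "T3", or None


# Records in depth-first order, keyed by address (values are assumptions
# reconstructed from the walkthroughs in the text).
TRIE_INDEX = {
    0:  EdgeNode(24,   "解", True,  True,  "T1"),
    5:  EdgeNode(10,   "析", True,  False, "T2"),
    10: EdgeNode(None, "像", False, True,  None),
    14: EdgeNode(19,   "度", True,  False, "T3"),
    19: EdgeNode(None, "力", True,  False, "T4"),
    24: EdgeNode(33,   "現", False, True,  None),
    28: EdgeNode(None, "像", True,  False, "T5"),
    33: EdgeNode(None, "像", True,  False, "T6"),
}
ADDRESSES = sorted(TRIE_INDEX)


def first_child(addr: int) -> Optional[int]:
    """Depth-first layout: the first child is the very next record."""
    if not TRIE_INDEX[addr].cont:
        return None
    return ADDRESSES[ADDRESSES.index(addr) + 1]


def next_sibling(addr: int) -> Optional[int]:
    """The sibling address is stored explicitly in the record."""
    return TRIE_INDEX[addr].sibling


print(first_child(0), next_sibling(0))    # 5 24
print(first_child(10), next_sibling(10))  # 14 None
```

Neither lookup needs to scan the table, which is the property the paragraph above relies on.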

【００４５】The operation of this embodiment is now described using concrete example data. For simplicity, assume that the word set of FIG. 5 is given and that, within it, "解像度" and "解像力" are related to each other.

【００４６】First, a trie representing all the words is created, and the final node corresponding to the end of each word is given a link to the set of words related to that word. This is called trie 1. Given the word set of FIG. 5, the trie of FIG. 6 is constructed.

【００４７】Next, a table is prepared that maps the link information for a word set to the words themselves. This is called the related-word correspondence table. FIG. 8 is a diagram showing the related-word correspondence table. The table associates the link information from a word to its related-word set with a set of node positions. In this example, because "解像度" and "解像力" are related to each other, the link information "T3" and "T4" each correspond to the node positions "14, 19".

【００４８】Next, the search operation of this embodiment is described with an example. Consider the case in which a word is input as the search key and the set of words related to the input word is to be obtained. With the example data assumed above, suppose that "解像度" is input as the search key.

【００４９】First, the node position search unit 22 obtains the set of node positions corresponding to the search-key word "解像度", that is, the link information to its related-word set. Concretely, this is done as follows. Using the ordinary method of looking up a word in a trie, the link information "T3" is obtained from the trie index. Then the node position set "14, 19" corresponding to "T3" is obtained from the related-word correspondence table.
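The two lookups just described can be sketched in Python as follows. This is only an illustration under stated assumptions: the table contents are reconstructed from the worked examples in this description (not copied from FIG. 7 or FIG. 8), and `TRIE`, `RELATED`, `lookup_link`, and `related_positions` are hypothetical names.

```python
# address -> (sibling address, label, end?, continue?, link); assumed values
TRIE = {
    0:  (24,   "解", True,  True,  "T1"),
    5:  (10,   "析", True,  False, "T2"),
    10: (None, "像", False, True,  None),
    14: (19,   "度", True,  False, "T3"),
    19: (None, "力", True,  False, "T4"),
    24: (33,   "現", False, True,  None),
    28: (None, "像", True,  False, "T5"),
    33: (None, "像", True,  False, "T6"),
}
ADDRS = sorted(TRIE)

# Related-word correspondence table: link information -> node positions.
RELATED = {"T3": [14, 19], "T4": [14, 19]}


def lookup_link(word):
    """Ordinary trie lookup: return the link stored at the word's last node."""
    if not word:
        return None
    addr = ADDRS[0]
    for i, ch in enumerate(word):
        # Walk the sibling chain until the arc labelled ch is found.
        while addr is not None and TRIE[addr][1] != ch:
            addr = TRIE[addr][0]
        if addr is None:
            return None
        _sibling, _label, end, cont, link = TRIE[addr]
        if i == len(word) - 1:
            return link if end else None
        if not cont:
            return None
        addr = ADDRS[ADDRS.index(addr) + 1]  # first child = next record
    return None


def related_positions(word):
    """Node positions of the words related to `word` (empty if none)."""
    return RELATED.get(lookup_link(word), [])


print(lookup_link("解像度"))        # T3
print(related_positions("解像度"))  # [14, 19]
```

Words without an entry in the table, such as "像" (link "T6"), simply yield an empty list in this sketch.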

【００５０】Next, the word search unit 23 obtains the corresponding word set from the node position set obtained by the node position search unit 22. The procedure for retrieving the word that corresponds to a single node position used as the key is described below with reference to the trie of FIG. 6.

【００５１】FIG. 9 is a flowchart showing the word search procedure in the first embodiment.
[S11] Assume that the node address serving as the search key is given as "A". A label recording unit for recording the character string of the word is prepared. Initially, focus on the edge-node at the head of the trie. The edge-node in focus is called the "current edge-node", and the arc and node it contains are called the "current arc" and the "current node", respectively.
[S12] Determine whether the address of the current edge-node is equal to "A". If it is, proceed to step S13; otherwise proceed to step S14.
[S13] The address of the current edge-node is equal to "A": push (store) the label of the current arc into the label recording unit, output the contents of the label recording unit, and terminate normally.
[S14] Determine whether the current edge-node has a younger sibling. If it does, proceed to step S15; otherwise proceed to step S17.
[S15] Determine whether the address of the adjacent younger sibling is less than or equal to "A". If it is, proceed to step S16; otherwise proceed to step S17.
[S16] Focus on the adjacent younger sibling and proceed to step S12.
[S17] Determine whether the current edge-node has a child. If it does, proceed to step S18; otherwise terminate abnormally.
[S18] Determine whether the address of the first child is less than or equal to "A". If it is, proceed to step S19; otherwise terminate abnormally.
[S19] Push the label of the current arc into the label recording unit, focus on the first child, and proceed to step S12.
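Steps S11 to S19 can be sketched in Python as below. This is a minimal sketch rather than the patented implementation: the table values are reconstructed from the worked examples in this description, and `TRIE` and `word_at` are hypothetical names. Comments map each branch to the step it is meant to mirror.

```python
# address -> (sibling address, label, end?, continue?, link); assumed values
TRIE = {
    0:  (24,   "解", True,  True,  "T1"),
    5:  (10,   "析", True,  False, "T2"),
    10: (None, "像", False, True,  None),
    14: (19,   "度", True,  False, "T3"),
    19: (None, "力", True,  False, "T4"),
    24: (33,   "現", False, True,  None),
    28: (None, "像", True,  False, "T5"),
    33: (None, "像", True,  False, "T6"),
}
ADDRS = sorted(TRIE)


def word_at(target):
    """Recover the character string that ends at edge-node address `target`."""
    labels = []        # label recording unit (S11)
    cur = ADDRS[0]     # edge-node at the head of the trie (S11)
    while True:
        sibling, label, _end, cont, _link = TRIE[cur]
        if cur == target:                               # S12
            labels.append(label)                        # S13
            return "".join(labels)
        if sibling is not None and sibling <= target:   # S14, S15
            cur = sibling                               # S16
            continue
        # S17: in depth-first layout the first child is the next record
        child = ADDRS[ADDRS.index(cur) + 1] if cont else None
        if child is None or child > target:             # S17 / S18 failure
            raise ValueError(f"no edge-node at address {target}")
        labels.append(label)                            # S19
        cur = child


print(word_at(14))  # 解像度
print(word_at(19))  # 解像力
```

Because a sibling is followed only when its address does not exceed the target, the walk skips whole subtrees that cannot contain the target.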

【００５２】Through the above processing, the related word corresponding to a node position used as the key is output. As an example, suppose the address "14" has been obtained as the link information to a word; the corresponding related word is then found, following the above algorithm, from the trie recorded as shown in FIG. 7.

【００５３】First, in step S11, "14" is assigned to "A" and attention is placed on the edge-node at address "0". The label recording unit is emptied. Hereinafter an edge-node is identified by its address; that is, the edge-node at address "0" is called edge-node "0". Nodes and arcs are identified in the same way.

【００５４】In step S12, the current edge-node "0" is not equal to A (= 14), so the process proceeds to step S14. In step S14, the current edge-node "0" has a younger sibling, so the process proceeds to step S15. In step S15, the adjacent younger sibling is 24 and A is 14, so "adjacent younger sibling ≤ A" does not hold, and the process proceeds to step S17.

【００５５】In step S17, the current edge-node "0" has a child, so the process proceeds to step S18. In step S18, the first child is 5 and A is 14, so "first child ≤ A" holds, and the process proceeds to step S19.

【００５６】In step S19, the label "解" of the current arc "0" is pushed into the label recording unit and attention moves to the first child, edge-node "5". The process then proceeds to step S12. In step S12 again, the current edge-node "5" is not equal to A (= 14), so the process proceeds to step S14.

【００５７】In step S14, the current edge-node "5" has a younger sibling, so the process proceeds to step S15. In step S15, "adjacent younger sibling (= 10) ≤ A (= 14)" holds, so attention moves to the adjacent younger sibling "10" and the process proceeds to step S12.

【００５８】In step S12, the current edge-node "10" is not equal to A (= 14), so the process proceeds to step S14. In step S14, the current edge-node "10" has no younger sibling, so the process proceeds to step S17.

【００５９】In step S17, the current edge-node "10" has a child, so the process proceeds to step S18. In step S18, the first child is 14 and A is 14, so "first child ≤ A" holds, and the process proceeds to step S19.

【００６０】In step S19, the label "像" of the current arc "10" is pushed into the label recording unit and attention moves to the first child, edge-node "14". The process then proceeds to step S12.

【００６１】In step S12, the current edge-node "14" is equal to A (= 14), so the process proceeds to step S13. In step S13, the label "度" of the current arc "14" is pushed into the label recording unit, the contents of the label recording unit ("解", "像", "度") are output, and the process terminates normally.

【００６２】Through the above processing, the corresponding word "解像度" is obtained from the node position "14". In the same way, "解像力" is obtained from the node position "19". That is, for the input "解像度", the output "解像度, 解像力" has been obtained.

【００６３】In this way, using a trie makes it possible to search for related words at high speed with a small storage capacity. Next, the second embodiment is described. The second embodiment is a word search device that, given a character, retrieves the keywords containing that character.

【００６４】FIG. 10 is a diagram showing the schematic configuration of the second embodiment. The word search device according to this embodiment comprises a word storage unit 51, a character index unit 52, a node position search unit 53, and a word search unit 54. Of these components, the word storage unit 51, the node position search unit 53, and the word search unit 54 have the functions of the word storage means 11, the node position search means 14, and the word search means 12 shown in FIG. 3, respectively. The character index unit 52 is the key index of the key index storage means 13 of FIG. 3 implemented specifically as a character index.

【００６５】First, the set of index words is represented as a trie to form the word storage unit 51. The final node representing each index word is given a link through which the set of documents related to that index word can be reached. As in the first embodiment, a trie-based word index (trie index) as in FIG. 6 is constructed for the index words given in FIG. 5. This word index is stored in the word storage unit 51.

【００６６】In addition, a character index is constructed that leads from a character making up an index word to that index word. This character index is stored in the character index unit 52. For the index word set of FIG. 5, the set of all characters making up the index words is {解, 現, 析, 像, 度, 力}. For each character in this set, the positions of the character are found by following the trie of FIG. 6 from its head. The character index is created from the result.
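The construction just described might look like the following Python sketch. It is an illustration under assumptions: the trie table is reconstructed from the worked examples in this description, a plain dictionary stands in for the character index (the description itself stores the character index as a trie), and `TRIE` and `build_char_index` are hypothetical names.

```python
from collections import defaultdict

# address -> (sibling address, label, end?, continue?, link); assumed values
TRIE = {
    0:  (24,   "解", True,  True,  "T1"),
    5:  (10,   "析", True,  False, "T2"),
    10: (None, "像", False, True,  None),
    14: (19,   "度", True,  False, "T3"),
    19: (None, "力", True,  False, "T4"),
    24: (33,   "現", False, True,  None),
    28: (None, "像", True,  False, "T5"),
    33: (None, "像", True,  False, "T6"),
}


def build_char_index(trie):
    """Map each character to the addresses of the edge-nodes labelled with it.

    Visiting the records in address order follows the trie from its head,
    because the records are laid out in depth-first order.
    """
    index = defaultdict(list)
    for addr in sorted(trie):
        label = trie[addr][1]
        index[label].append(addr)
    return dict(index)


CHAR_INDEX = build_char_index(TRIE)
print(CHAR_INDEX["像"])  # [10, 28, 33]
print(CHAR_INDEX["解"])  # [0]
```

Each character maps to one address per edge-node that carries it, not one address per word, which keeps the index small.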

【００６７】FIG. 11 is a diagram showing an example of the character index. As shown in the figure, each character is associated with the index words that contain it. In this example, a trie 55 is adopted as the data structure of the character index.

【００６８】FIG. 5 shows that several index words contain the character "解" ("解", "解析", "解像度", and "解像力"), yet the link information to all of them is just the position "0" of the single node that represents the character "解". Thus, using the position of a character's node in the trie index makes the character index smaller than it would be if the node position at the end of each word were used as the pointer to the word.

【００６９】The search operation is now described with an example. Consider the case in which a single character is input as the search key and the index words containing that character, together with their links to document sets, are to be obtained. As a concrete example, assume that "像" is input as the search key, using the indexes of FIG. 11 and FIG. 6.

【００７０】First, the node position search unit 53 finds, from the character index unit 52, the links to the index words that contain the search-key character ("像" in this example). FIG. 11 shows that the link information "10", "28", and "33" is obtained.

【００７１】Next, the word search unit 54 uses the word index to obtain, from the link information, the index words and the document sets to which those index words point. The procedure is described below.

【００７２】FIGS. 12 to 14 are flowcharts showing the processing procedure for obtaining document sets in the second embodiment. FIG. 12 shows steps S21 to S27, FIG. 13 shows steps S31 to S36, and FIG. 14 shows steps S41 to S48. The processing in each step is described below.
[S21] Assume that the link information to the index words sought is given as a trie edge-node address "A". A label recording unit for recording the character string of an index word, a label stack for temporarily saving the character string recorded in the label recording unit, and an edge-node stack for keeping the edge-node associated with each saved string are prepared, and their contents are cleared. Move to the start node and focus on the edge-node at the head of the trie. The edge-node in focus is called the current edge-node.
[S22] Determine whether the current edge-node is equal to the given link information "A". If it is, proceed to step S23; otherwise proceed to step S31 (shown in FIG. 13).
[S23] The current edge-node is equal to "A": push the label of the current arc into the label recording unit, and proceed to step S24.
[S24] Determine whether the current node is in the end state. If it is, proceed to step S25; otherwise proceed to step S26.
[S25] The current node is in the end state: output the contents of the label recording unit and the link to the document set corresponding to the current end-state node.
[S26] Determine whether the current edge-node has a child. If it does, proceed to step S27; otherwise terminate normally.
[S27] Focus on the first child and proceed to step S41 (shown in FIG. 14).
[S31] Following a negative result in step S22, determine whether the current edge-node has a younger sibling. If it does, proceed to step S32; otherwise proceed to step S34.
[S32] Determine whether the address of the adjacent younger sibling is less than or equal to "A". If it is, proceed to step S33; otherwise proceed to step S34.
[S33] Focus on the adjacent younger sibling and proceed to step S22 (shown in FIG. 12).
[S34] Determine whether the current edge-node has a child. If it does, proceed to step S35; otherwise terminate abnormally.
[S35] Determine whether the address of the first child is less than or equal to "A". If it is, proceed to step S36; otherwise terminate abnormally.
[S36] Push the label of the current arc into the label recording unit and focus on the first child. Then proceed to step S22 (shown in FIG. 12).
[S41] Determine whether the current edge-node has a younger sibling. If it does, proceed to step S42; otherwise proceed to step S43.
[S42] Push the contents of the label recording unit onto the label stack and the adjacent younger sibling onto the edge-node stack. Then proceed to step S43.
[S43] Determine whether the current node is in the end state. If it is, proceed to step S44; otherwise proceed to step S45.
[S44] Output the contents of the label recording unit concatenated with the label of the current arc, together with the link to the document set corresponding to the current end-state node. Then proceed to step S45.
[S45] Determine whether the current edge-node has a child. If it does, proceed to step S46; otherwise proceed to step S47.
[S46] Push the label of the current arc into the label recording unit and focus on the first child. Then proceed to step S41.
[S47] Determine whether the label stack and the edge-node stack are empty. If they are not empty, proceed to step S48; if they are empty, terminate normally.
[S48] Pop the label stack into the label recording unit, and focus on the edge-node popped from the edge-node stack. Then proceed to step S41.
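Steps S21 to S48 can be sketched in Python as follows. This is a minimal sketch under assumptions, not the patented implementation: the table values are reconstructed from the worked examples in this description, the label stack and edge-node stack are merged into one stack of pairs for brevity, and `TRIE`, `first_child`, and `words_through` are hypothetical names.

```python
# address -> (sibling address, label, end?, continue?, link); assumed values
TRIE = {
    0:  (24,   "解", True,  True,  "T1"),
    5:  (10,   "析", True,  False, "T2"),
    10: (None, "像", False, True,  None),
    14: (19,   "度", True,  False, "T3"),
    19: (None, "力", True,  False, "T4"),
    24: (33,   "現", False, True,  None),
    28: (None, "像", True,  False, "T5"),
    33: (None, "像", True,  False, "T6"),
}
ADDRS = sorted(TRIE)


def first_child(addr):
    """Depth-first layout: the first child is the next record, if any."""
    return ADDRS[ADDRS.index(addr) + 1] if TRIE[addr][3] else None


def words_through(target):
    """List (word, link) for every word whose path passes through `target`."""
    labels, cur = [], ADDRS[0]                          # S21
    while cur != target:                                # S22
        sibling, label, _end, _cont, _link = TRIE[cur]
        if sibling is not None and sibling <= target:   # S31, S32
            cur = sibling                               # S33
            continue
        child = first_child(cur)                        # S34
        if child is None or child > target:             # S34 / S35 failure
            raise ValueError(f"no edge-node at address {target}")
        labels.append(label)                            # S36
        cur = child
    results = []
    _sibling, label, end, cont, link = TRIE[cur]
    labels.append(label)                                # S23
    if end:                                             # S24, S25
        results.append(("".join(labels), link))
    if not cont:                                        # S26
        return results
    cur = first_child(cur)                              # S27
    stack = []  # (saved labels, edge-node) pairs: both stacks in one
    while True:
        sibling, label, end, cont, link = TRIE[cur]
        if sibling is not None:                         # S41, S42
            stack.append((list(labels), sibling))
        if end:                                         # S43, S44
            results.append(("".join(labels) + label, link))
        if cont:                                        # S45, S46
            labels.append(label)
            cur = first_child(cur)
        elif stack:                                     # S47, S48
            labels, cur = stack.pop()
        else:
            return results


print(words_through(10))  # [('解像度', 'T3'), ('解像力', 'T4')]
```

Calling `words_through` for each of the addresses 10, 28, and 33 would collect every index word containing "像" in this sketch.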

【００７３】Through the above processing, the character string of each index word and the link from that index word to its document set are output. As an example, suppose "10" has been obtained as the link information to index words; the trie of FIG. 7 is then followed according to the above algorithm.

【００７４】First, in step S21, "10" is assigned to "A" and attention is placed on edge-node "0". The label recording unit, the label stack, and the edge-node stack are all emptied. In step S22, the current node "0" is not equal to A (= 10), so the process proceeds to step S31.

【００７５】In step S31, the current node "0" has a younger sibling, so the process proceeds to step S32. In step S32, the adjacent younger sibling is 24 and A is 10, so "adjacent younger sibling ≤ A" does not hold, and the process proceeds to step S34.

【００７６】In step S34, the current node "0" has a child, so the process proceeds to step S35. In step S35, the first child is 5 and A is 10, so "first child ≤ A" holds; accordingly, in step S36, the label "解" of the current arc "0" is pushed into the label recording unit and attention moves to the first child, edge-node "5". The process then proceeds to step S22.

【００７７】In step S22 again, the current edge-node "5" is not equal to A (= 10), so the process moves to step S31. In step S31, the current edge-node "5" has the younger sibling "10", so the process proceeds to step S32. In step S32, "adjacent younger sibling (= 10) ≤ A (= 10)" holds, so attention moves to the adjacent younger sibling "10" and the process proceeds to step S22.

【００７８】In step S22, the current edge-node "10" is equal to A (= 10), so, in step S23, the label "像" of the current arc "10" is pushed into the label recording unit. At this point the label recording unit contains "解像". The process proceeds to step S24. In step S24, the current node "10" is not in the end state, so the process proceeds to step S26. In step S26, the current edge-node "10" has a child, so attention moves to the first child, edge-node "14", and the process proceeds to step S41.

【００７９】In step S41, the current node "14" has a younger sibling, so the process proceeds to step S42. In step S42, "解像", the contents of the label recording unit, is pushed onto the label stack, and the adjacent younger sibling, edge-node "19", is pushed onto the edge-node stack. The process then proceeds to step S43.

【００８０】In step S43, the current node "14" is in the end state, so the process proceeds to step S44. In step S44, "解像度", formed by concatenating "解像" (the contents of the label recording unit) with "度" (the label of arc "14"), is output together with "T3", the link information to the document set corresponding to node "14". The process then proceeds to step S45.

【００８１】In step S45, the current node "14" has no child, so the process proceeds to step S47. In step S47, the label stack and the edge-node stack are not empty, so the process proceeds to step S48. In step S48, "解像" is popped from the label stack and assigned to the label recording unit, and edge-node "19" is popped from the edge-node stack and becomes the edge-node in focus. At this point the label stack and the edge-node stack are empty. The process then proceeds to step S41.

【００８２】In step S41 again, the current node "19" has no younger sibling, so the process proceeds to step S43. In step S43, the current node "19" is in the end state, so the process proceeds to step S44. In step S44, "解像力", formed by concatenating "解像" (the contents of the label recording unit) with "力" (the label of arc "19"), is output together with "T4", the link information to the document set corresponding to node "19". The process then proceeds to step S45.

【００８３】In step S45, the current node "19" has no child, so the process proceeds to step S47. In step S47, the label stack and the edge-node stack are empty, so the process terminates normally.

【００８４】Through the above processing, the two index words "解像度" and "解像力", and the link information to their document sets ("T3" and "T4"), can be obtained from "10", which is a link to the index words containing the character "像".

【００８５】Next, the third embodiment is described. FIG. 15 is a block diagram showing the schematic configuration of the third embodiment. This is a word search device that, given an arbitrary regular expression, retrieves the keywords that satisfy it.

【００８６】The word search device according to this embodiment comprises a word storage unit 61, a character index unit 62, a node position search unit 63, a word search unit 64, and a regular expression analysis unit 65. Of these, the word storage unit 61, the character index unit 62, the node position search unit 63, and the word search unit 64 have substantially the same functions as the word storage unit 51, the character index unit 52, the node position search unit 53, and the word search unit 54 shown in FIG. 10.

【００８７】When a regular-expression search key is input, the regular expression analysis unit 65 analyzes the search key and obtains the set of words that match it. In doing so, it passes character strings to the node position search unit 63 as needed and receives the node positions of each string as the return value. It also passes node positions to the word search unit 64 to obtain word sets.

【００８８】The operation of the regular expression analysis unit 65 when executing a search is described below. In a trie, when two nodes "N1" and "N2" are specified, the subtree of the trie that has "N1" as its start node and consists of the part that can be reached from node "N1" but cannot be reached from node "N2" is called the sub-trie defined by "N1" and "N2".

[0089] Here, the following three procedural functions "F1", "F2", and "F3" are defined. The processing of these functions is performed by the regular expression analysis unit 65. First, the function "F1" is described. The function "F1" takes a subtrie "T" and a regular expression "R" as arguments and returns a set of character strings "S" as its value.

[0090] FIGS. 16 and 17 are flowcharts showing the processing procedure of the function "F1". FIG. 16 shows steps S51 to S58, and FIG. 17 shows steps S61 to S68.
[S51] Determine whether the regular expression "R" begins with a fixed character string "B" or a character set "B". If it does, proceed to step S52; otherwise, proceed to step S61.
[S52] Let "R1" be the regular expression obtained by removing the portion "B" from "R", and perform the following processing.
[S53] Determine whether the subtrie "T" can be traversed with the character string or character set "B" as input. If it can, proceed to step S55; if it cannot, proceed to step S54.
[S54] Return an error as the value "S" of the function "F1" and end the processing.
[S55] Determine whether the regular expression "R1" is empty. If it is empty, proceed to step S56; otherwise, proceed to step S58.
[S56] Determine whether the node reached by traversing the subtrie "T" with the input "B" is in the end state. If it is, proceed to step S57; otherwise, proceed to step S54.
[S57] Return the character string "B" as the value "S" of the function "F1" and end the processing.
[S58] Let "T1" be the remaining subtrie that can be traversed further after the subtrie "T" has been traversed with "B" as input. Pass the subtrie "T1" and the regular expression "R1" to the function "F1" as arguments, concatenate the character string "B" with each element of the character string set returned by "F1", and return the resulting character string set as the value "S". Then end the processing.
[S61] Determine whether the regular expression "R" contains fixed characters "C1, C2, C3, ...". If it does, proceed to step S62; otherwise, proceed to step S68.
[S62] For each fixed character Ci (i = 1, 2, 3, ...), let Ai be the number of nodes in the trie that represent Ci. From the output of the node position search unit 63, determine the character "Cj" whose count is the smallest among A1, A2, A3, ....
[S63] For each node position "Pi" (i = 1, 2, 3, ...) that represents the character "Cj", let "R1" be the part of "R" that ends with the character "Cj", let "R2" be the remaining regular expression, construct a finite state automaton "M" that accepts "R1", and proceed to step S64. When all node positions "Pi" have been processed, proceed to step S67.
[S64] Pass the finite state automaton "M", the subtrie "Ti", and the node position "Pi" to the function "F2" as arguments, and obtain the character string "s" that is the value of "F2".
[S65] Determine whether the character string "s" returned by the function "F2" is a normal output. If it is, proceed to step S66; if it is not (that is, if "s" is an error), return to step S63.
[S66] Let "Ti" be the subtrie that can be traversed from the node position "Pi". Pass the subtrie "Ti" and the regular expression "R2" to the function "F1" as arguments, form the set of character strings obtained by concatenating "s" with each element of the character string set "S1" returned by "F1", and push it onto the character string set "S". Then return to step S63.
[S67] Output the character string set "S" and end the processing.
[S68] Determine the number of characters "N" that satisfies "R", pass the subtrie "T" and the number of characters "N" to the function "F3" as arguments, output the character string set returned by "F3" as the character string set "S", and end the processing.

[0091] Next, the function "F2" is described. The function "F2" takes a finite state automaton "M", a subtrie "T", and a node position "P" as arguments and returns a character string "s" as its value.

[0092] FIG. 18 is a flowchart showing the processing procedure of the function "F2".
[S71] Once a node position "P" is designated in the subtrie "T", the path leading to that node is uniquely determined. The label string "L" obtained by traversing the subtrie "T" up to the node position "P" is therefore used as the input to the finite state automaton "M".
[S72] Determine whether the label string "L" is accepted by the finite state automaton "M". If it is accepted, proceed to step S73; otherwise, proceed to step S74.
[S73] Output the label string "L" and end the processing.
[S74] Output an error and end the processing.

[0093] Next, the function "F3" is described. The function "F3" takes a subtrie "T" and a number of characters "N" as arguments and returns a set of character strings "S" as its value. FIG. 19 is a flowchart showing the processing procedure of the function "F3".
[S81] If the start node is in the end state and the number of characters "N" may be "0", push the empty character string onto the character string set "S".
[S82] Traverse every path of the subtrie "T" for "N" characters, obtain the set "L" of label strings whose destination is a node in the end state, and push "L" onto the character string set "S".
[S83] Determine whether "S" is empty. If "S" is empty, proceed to step S85; otherwise, proceed to step S84.
[S84] Output "S" and end the processing.
[S85] Output an error and end the processing.
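The steps of "F3" can be sketched in Python as follows. This is only an illustrative sketch under assumptions that are not in the specification: the subtrie is modeled as nested dictionaries, the key `END` is a hypothetical end-state marker, `n=None` stands for an unlimited number of characters, and an error is represented by `None`.

```python
END = "$"  # hypothetical marker key for a node in the end state

def build(words):
    """Build a nested-dict trie from a list of words."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def f3(node, n=None):
    """Return the label strings of n characters (any length if n is None)
    that lead from `node` to an end-state node, or None for an error."""
    out = set()

    def walk(cur, label):
        # S81/S82: collect the label when an end-state node is reached
        if cur.get(END) and (n is None or len(label) == n):
            out.add(label)
        if n is not None and len(label) >= n:
            return
        for ch, child in cur.items():
            if ch != END:
                walk(child, label + ch)

    walk(node, "")
    return out or None  # S83 to S85: an empty result is an error
```

With a trie holding "アイデア" and "アイデアリズム", calling `f3` on the node reached by "アイデア" with no limit yields the empty string and "リズム", which is the situation that arises later in the worked example.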

[0094] With the functions defined as above, the set of words that match a given regular expression can be obtained by passing the trie of the word index and an arbitrary regular expression to the function "F1" as arguments and evaluating "F1".
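The interplay of "F1", "F2", and "F3" can be sketched in Python as follows. This is a simplified illustration, not the implementation of the embodiment, and it rests on several assumptions: each trie node is identified by the prefix string that leads to it instead of a numeric node position, the only supported pattern elements are literal characters and the wildcard "？", the automaton "M" is replaced by Python's `re` module, an error is represented by `None`, and an empty remaining expression is taken to match only at a word end. The helper names `make_index` and `to_regex` are hypothetical.

```python
import re

WILD = "？"  # wildcard: zero or more arbitrary characters

def make_index(words):
    """Return the word set and the set of all prefixes (one per trie node)."""
    words = set(words)
    prefixes = {w[:i] for w in words for i in range(len(w) + 1)}
    return words, prefixes

def to_regex(pattern):
    return "".join(".*" if c == WILD else re.escape(c) for c in pattern)

def f3(words, root):
    """Unlimited-length case of F3: all completions of `root` that end a word."""
    out = {w[len(root):] for w in words if w.startswith(root)}
    return out or None

def f2(pattern, root, pos):
    """F2: the label string from `root` to node `pos`, if `pattern` accepts it."""
    label = pos[len(root):]
    return label if re.fullmatch(to_regex(pattern), label) else None

def f1(words, prefixes, root, R):
    if not R:  # assumption: an empty expression matches only a word end
        return {""} if root in words else None
    if R[0] != WILD:                                   # S51
        cut = R.find(WILD)
        B = R if cut < 0 else R[:cut]                  # S52
        R1 = R[len(B):]
        if root + B not in prefixes:                   # S53
            return None                                # S54
        if not R1:                                     # S55
            return {B} if root + B in words else None  # S56, S57
        rest = f1(words, prefixes, root + B, R1)       # S58
        return None if rest is None else {B + s for s in rest}
    fixed = [c for c in R if c != WILD]
    if not fixed:                                      # S61 -> S68
        return f3(words, root)
    # S62: choose the fixed character represented by the fewest nodes
    cj = min(fixed, key=lambda c: sum(1 for p in prefixes if p.endswith(c)))
    k = R.index(cj)
    R1, R2 = R[:k + 1], R[k + 1:]                      # S63
    S = set()
    candidates = sorted(p for p in prefixes
                        if p.startswith(root) and len(p) > len(root)
                        and p.endswith(cj))
    for pos in candidates:
        s = f2(R1, root, pos)                          # S64
        if s is None:                                  # S65
            continue
        S1 = f1(words, prefixes, pos, R2)              # S66
        if S1 is not None:
            S |= {s + x for x in S1}
    return S                                           # S67
```

Calling `f1(words, prefixes, "", "？アイデ？ア？")` on an index built from the five words of the worked example returns all five words. The point of step S62 is that the search starts from the rarest fixed character, so only a few candidate nodes need to be examined.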

[0095] As a concrete example, consider searching for words that match the regular expression "？アイデ？ア？". In this regular expression, "？" denotes a wildcard that stands for zero or more arbitrary characters. The intent of the search is to find words that contain a character string of the form "アイデ？ア", so that spelling variants such as "アイディア" for "アイデア" are also covered. Being able to search for words that match a regular expression in this way can be expected to reduce the number of missed results.

[0096] For example, consider the case where the following word index is stored in the word storage unit 61. FIG. 20 is a diagram showing an example of the trie 66 in the third embodiment. In this example, each edge is labeled with a single katakana character.

[0097] FIG. 21 is a diagram showing an example of the character index in the third embodiment. In this character index, each character is associated with its "number of occurrences" and its "corresponding node positions".
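A character index of this kind can be sketched in Python as follows. The sketch assumes a nested-dictionary trie and numbers the nodes 1, 2, 3, ... in depth-first preorder; this numbering is hypothetical and does not reproduce the node positions shown in FIG. 21. The function names are likewise illustrative.

```python
from collections import defaultdict

def build_trie(words):
    """Build a nested-dict trie; children keep insertion order."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
    return root

def char_index(root):
    """Map each character to the preorder positions of the nodes labeled with it."""
    index = defaultdict(list)
    position = 0

    def visit(node):
        nonlocal position
        for ch, child in node.items():
            position += 1              # number the node before its descendants
            index[ch].append(position)
            visit(child)

    visit(root)
    return dict(index)

def rarest(index, chars):
    """Return the character in `chars` that is represented by the fewest nodes."""
    return min(chars, key=lambda c: len(index.get(c, [])))
```

The number of occurrences of a character is simply the length of its position list, so the least frequent fixed character of a search key can be chosen with `rarest`.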

[0098] The procedure for evaluating the function "F1" when the trie "T" of the word index and the regular expression "？アイデ？ア？" (R) are passed to it as arguments is described below.

[0099] In step S51, the regular expression "R" does not begin with a fixed character string, so processing proceeds to step S61. In step S61, the regular expression "R" contains fixed characters ("ア", "イ", "デ"), so processing proceeds to step S62.

[0100] In step S62, the number of nodes representing each of the three fixed characters of "R", {C1, C2, C3} = {ア, イ, デ}, is obtained. From the character index of FIG. 21, these numbers are found to be "5", "2", and "2", respectively.

[0101] In step S63, "イ" is taken up as the character with the smallest number of occurrences among these. The following processing is then performed for all node positions {P1, P2} = {10, 120} that represent the character "イ".

[0102] In step S63, from the regular expression "？アイデ？ア？", "R1" is set to "？アイ" and "R2" is set to "デ？ア？", and a finite state automaton "M" that accepts "R1" is constructed.

[0103] FIG. 22 is a transition diagram of the finite state automaton 67 for the case where "イ" is taken up. As shown in the figure, when "ア" and "イ" appear consecutively partway through the character string (or at its beginning), the automaton transitions to the node in the end state.
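An automaton that accepts "？アイ" can be sketched as a three-state deterministic automaton in Python. The state numbering (0 = start, 1 = just read "ア", 2 = end state after "アイ") and the function names are assumptions made for illustration; FIG. 22 may draw the transitions differently.

```python
def step(state, ch):
    """Transition function of a DFA for strings that end in 'アイ'."""
    if ch == "ア":
        return 1          # an 'ア' always starts a possible 'アイ'
    if ch == "イ" and state == 1:
        return 2          # 'ア' followed by 'イ': end state
    return 0              # any other character resets the automaton

def accepts(label):
    """Return True if the label string drives the DFA into the end state."""
    state = 0
    for ch in label:
        state = step(state, ch)
    return state == 2
```

Both label strings that occur in the worked example, "アイ" and "ネオアイ", are accepted, whereas a label such as "アイデ" is rejected because the automaton leaves the end state again.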

[0104] In step S64, the finite state automaton "M", the subtrie "T", and the node position "P1" are passed to the function "F2" as arguments, and "F2" is evaluated. In step S71 of the processing of "F2" (shown in FIG. 18), the label string "L" obtained by traversing the subtrie "T" up to the node position "P1" (= 10) is "アイ".

[0105] The determination in step S72 finds that "L" is accepted by the finite state automaton "M". In step S73, the label string "L" (= "アイ") is output and the processing of the function "F2" is completed.

[0106] The value of the function "F2" is the character string "アイ". In step S64, this is therefore taken as "s". In step S65, the character string "s" (= "アイ") is not an error, so processing proceeds to step S66.

[0107] In step S66, the subtrie "T1" and the regular expression "R2" (= "デ？ア？") are passed to the function "F1" as arguments, and "F1" is evaluated. Hereinafter, this invocation of the function "F1" is written as the function "F1'", and its variables are marked in the same way.

[0108] FIG. 23 is a diagram showing the subtrie "T1" that can be traversed from the node position P1 (= 10). This subtrie 68 is obtained by extracting from the trie 66 of FIG. 20 all paths that correspond to descendants of the node position P1 (= 10).

[0109] In step S51 of the function "F1'", the regular expression "R'" begins with the fixed character "B'" (= "デ"). In step S52, "R1'" is set to "？ア？", and processing proceeds to step S53.

[0110] In step S53 of the function "F1'", the subtrie "T'" can be traversed with the character "B'" (= "デ") as input, so processing proceeds to step S55. In the determination of step S55 of "F1'", the regular expression "R1'" (= "？ア？") is not empty, so processing proceeds to step S58.

[0111] In step S58 of the function "F1'", the subtrie "T'" is traversed with the character "B'" (= "デ") as input, and the remaining subtrie that can be traversed further is taken as "T1'". The subtrie "T1'" and the regular expression "R1'" (= "？ア？") are passed to the function "F1" as arguments, and "F1" is evaluated. Hereinafter, this invocation of "F1" is written as "F1''".

[0112] In step S51 of the function "F1''", the regular expression "R''" (= "？ア？") does not begin with a fixed character string, so processing proceeds to step S61.

[0113] In step S61 of the function "F1''", the regular expression "R''" (= "？ア？") contains the fixed character "C1''" (= "ア"), so processing proceeds to step S62.

[0114] In step S62 of the function "F1''", "C1''" (= "ア") is selected as "Cj''", and the set of node positions {P1'', P2''} = {30, 80} is obtained.

[0115] For "C1''" (= "ア"), the processing from step S63 of the function "F1''" onward is performed. In step S63 of "F1''", "R1''" is set to "？ア" and "R2''" is set to "？". A finite state automaton "M''" that accepts "R1''" is constructed.

[0116] In step S64 of the function "F1''", the finite state automaton "M''", the subtrie "T''", and the node position "P1''" (= 30) are passed to the function "F2" as arguments, and "F2" is evaluated. Hereinafter, this invocation of the function "F2" is written as the function "F2''".

[0117] In step S71 of the function "F2''", the label string "L''" obtained by traversing the subtrie "T''" up to the node position "P1''" (= 30) is "ア", and it is input to the finite state automaton "M''".

[0118] In step S72, "L''" (= "ア") is found to be accepted by the finite state automaton "M''". In step S73 of the function "F2''", the label string "L''" (= "ア") is output and the processing of "F2''" is completed.

[0119] The value "ア" of the function "F2''" is a character string. In step S64 of the function "F1''", this is taken as "s''". In step S65 of "F1''", the character string "s''" (= "ア") is not an error, so processing proceeds to step S66.

[0120] In step S66 of the function "F1''", the subtrie that can be traversed from the node position "P1''" (= 30) is taken as "T1''", the subtrie "T1''" and the regular expression "R2''" (= "？") are passed to the function "F1" as arguments, and "F1" is evaluated. Hereinafter, this invocation of the function "F1" is written as the function "F1'''".

[0121] In step S51 of the function "F1'''", the regular expression "R'''" (= "？") does not begin with a fixed character string, so processing proceeds to step S61.

[0122] In step S61 of the function "F1'''", the regular expression "R'''" (= "？") contains no fixed character, so processing proceeds to step S68. In step S68 of "F1'''", the number of characters "N'''" that satisfies "R'''" (= "？") is unlimited. The subtrie "T'''" and the number of characters "N'''" are passed to the function "F3" as arguments; the character string set obtained by evaluating "F3" is {"", "リズム"}, and it is returned as the value of the function "F1'''".

[0123] The set of character strings obtained by concatenating the character string "s''" (= "ア") with each element of the character string set {"", "リズム"} returned by the function "F1'''" is {"ア", "アリズム"}. In step S66 of the function "F1''", this set is pushed onto the character string set "S''". At this point, the content of "S''" is {"ア", "アリズム"}. Processing then returns to step S63.

[0124] The processing from step S63 of the function "F1''" onward has now been completed for the node position "P1''" (= 30). The same processing is then performed for "P2''", after which processing returns to step S63 once more, and as the result "S''" = {"ア", "アリズム", "ィア"} is output.

[0125] Returning to step S58 of the function "F1'", the character "B'" (= "デ") is concatenated with each element of the value {"ア", "アリズム", "ィア"} of the function "F1''", and {"デア", "デアリズム", "ディア"} is returned as the value of the function "F1'".

[0126] Returning to step S66 of the function "F1", the character string "s" (= "アイ") is concatenated with each element of the value {"デア", "デアリズム", "ディア"} of the function "F1'", and {"アイデア", "アイデアリズム", "アイディア"} is pushed onto the character string set "S". At this point, the content of "S" is {"アイデア", "アイデアリズム", "アイディア"}. Processing returns to step S63.

[0127] In step S63 of the function "F1", the following processing is performed for the node position "P2" (= 120) that represents the character "イ". In step S64 of "F1", the finite state automaton "M", the subtrie "T", and the node position "P2" are passed to the function "F2" as arguments, and the character string "s" (= "ネオアイ") is obtained as the value of "F2".

[0128] In step S65 of the function "F1", the character string "s" (= "ネオアイ") is not an error, so processing proceeds to step S66. In step S66 of "F1", the subtrie that can be traversed from "P2" (= 120) is taken as "T2", the subtrie "T2" and the regular expression "R2" (= "デ？ア？") are passed to the function "F1" as arguments, and "F1" is evaluated. The character string set {"デア", "デアリズム"} is obtained from "F1", {"ネオアイデア", "ネオアイデアリズム"} is pushed onto "S", and processing returns to step S63. At this point, the content of "S" is {"アイデア", "アイデアリズム", "アイディア", "ネオアイデア", "ネオアイデアリズム"}.

[0129] In step S63 of the function "F1", processing has been completed for all node positions "P1" and "P2", so in step S67 the character string set "S" (= {"アイデア", "アイデアリズム", "アイディア", "ネオアイデア", "ネオアイデアリズム"}) is returned as the value of the function "F1", and the processing ends.

[0130] As described above, the word set {"アイデア", "アイデアリズム", "アイディア", "ネオアイデア", "ネオアイデアリズム"} is obtained from the trie of the word index and the regular expression "？アイデ？ア？".
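The expected result can be cross-checked with an ordinary regular expression engine. The sketch below is not the index-driven method of the embodiment: it simply scans a word list with Python's `re` module, translating the wildcard "？" into `.*`. The function names and the two extra non-matching words in the usage note are illustrative assumptions.

```python
import re

def wildcard_to_regex(pattern, wildcard="？"):
    """Translate a pattern with a zero-or-more wildcard into a compiled regex."""
    parts = (".*" if c == wildcard else re.escape(c) for c in pattern)
    return re.compile("".join(parts))

def search(words, pattern):
    """Return the words that match the whole pattern, in their original order."""
    rx = wildcard_to_regex(pattern)
    return [w for w in words if rx.fullmatch(w)]
```

For instance, scanning the five words above together with "アイス" and "イデア" for "？アイデ？ア？" returns exactly the five words. A linear scan of this kind examines every word, whereas the procedure described above examines only the nodes listed in the character index for the rarest fixed character.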

[0131] The following modifications of the above principle configuration and embodiments are conceivable. FIG. 24 is a block diagram showing the schematic configuration of a fourth embodiment. It provides a plurality of the related word index units 21 of the first embodiment (shown in FIG. 4).

[0132] In this embodiment, a related word index is stored in each of the two related word index units 71 and 72. When a word is input, the node position search unit 73 acquires sets of node positions from both related word index units 71 and 72. Each set of node positions is passed to the word search unit 74 together with information indicating from which of the related word index units 71 and 72 it was acquired.

[0133] Based on the sets of node positions received from the node position search unit 73, the word search unit 74 acquires the set of related words from the related word index units 71 and 72 and outputs it.

[0134] FIG. 25 is a block diagram showing the schematic configuration of a fifth embodiment. It is a concrete form of the fourth embodiment (shown in FIG. 24). In this embodiment, a reading index unit 71a and a notation index unit 72a are provided. The reading index unit 71a stores the set of notation words, that is, words expressed in their written form, in a trie format in which the nodes are recorded in depth-first order, and stores each notation word in association with the set of node positions of the reading words that constitute the character strings corresponding to that notation word. The notation index unit 72a stores the set of words expressed by their readings in a trie format in which the nodes are recorded in depth-first order, and stores each reading word in association with the set of node positions that constitute the character strings of the notation words corresponding to that reading word.

[0135] The node position search unit 73a and the word search unit 74a have the same functions as the node position search unit 73 and the word search unit 74 of the fourth embodiment. FIG. 26 is a block diagram showing the schematic configuration of a sixth embodiment. It further increases the number of related word index units 71 and 72 of the fourth embodiment.

[0136] In this embodiment, a large number of related word index units 81a, 81b, 81c, ... are provided. On receiving a word as input, the node position search unit 82 receives the sets of corresponding node positions from the related word index units 81a, 81b, 81c, .... Based on the sets of node positions received from the node position search unit 82, the word search unit 83 acquires the set of related words from the related word index units 81a, 81b, 81c, ... and outputs it.

[0137] As described above, in the present invention a set of words is stored in a trie format in which the nodes are recorded in depth-first order to form the word storage unit, and the position of the node that constitutes a word in the trie is used as a value that uniquely identifies that word in the trie. A path containing an arbitrary node can therefore be identified without using link information to parent nodes. Furthermore, because a word in the trie is referenced by using the position of its node in the trie as a pointer, no data representing the word set is needed apart from the trie index. As a result, the required storage capacity is significantly smaller than in the prior art.
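One way to see how a depth-first layout allows a word to be recovered from a node position without parent links is the following Python sketch. It is an assumption-laden illustration and not the layout of the embodiments: each node is stored as a hypothetical record `[label, subtree_size, is_end]` in preorder, and the subtree size is what lets the walk skip siblings whose range does not contain the target position.

```python
def flatten(words):
    """Return the trie as a preorder list of [label, subtree_size, is_end]."""
    tree = {}
    for word in words:
        node = tree
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = True  # the empty key marks a word end
    flat = []

    def emit(node):
        for ch, child in node.items():
            if ch == "":
                continue
            start = len(flat)
            flat.append([ch, 1, "" in child])
            emit(child)
            flat[start][1] = len(flat) - start  # nodes in this subtree

    emit(tree)
    return flat

def word_at(flat, pos):
    """Recover the label string that leads to node `pos`, walking top-down."""
    if not 0 <= pos < len(flat):
        raise IndexError(pos)
    out, i, end = [], 0, len(flat)
    while i < end:
        label, size, _ = flat[i]
        if pos < i + size:            # pos lies inside this node's subtree
            out.append(label)
            if pos == i:
                return "".join(out)
            i, end = i + 1, i + size  # descend to the first child
        else:
            i += size                 # skip to the next sibling
    raise IndexError(pos)
```

In this sketch a node position alone identifies a word, so a pointer table only needs to hold integers and no separate copy of the word text.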

[0138] The processing functions described above can be realized by a computer. In that case, the processing content of the functions that the system construction support device should have is described in a program recorded on a computer-readable recording medium, and the above processing is realized on the computer by executing this program. Computer-readable recording media include magnetic recording devices and semiconductor memories. For distribution on the market, the program can be stored on a portable recording medium such as a CD-ROM (Compact Disc Read Only Memory) or a floppy disk and distributed, or it can be stored in the storage device of a computer connected via a network and transferred to other computers through the network. For execution on a computer, the program is stored on a hard disk device or the like in the computer and loaded into main memory for execution.

【０１３９】[0139]

[Example] As a working example of the present invention, the storage capacity required by the fifth embodiment is compared quantitatively with that required by the prior art.

[0140] FIG. 27 is a diagram showing the correspondence between notation words and the sets of reading words corresponding to them. The "words expressed by notation" are shown on the left side of the figure, and the "corresponding readings" are shown on the right side. For example, the notation "Ａ" may be read "あるふぁ" in some cases and "えー" in others.

[0141] FIG. 28 is a diagram showing the correspondence between reading words and the sets of written words corresponding to them. The left side of the figure shows the "words expressed by reading" and the right side shows the "corresponding notations". For example, the reading "あ" (a) has many written forms, including "あ", "ア", "亜", "阿", "在", and "有".
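
The two correspondences of FIGS. 27 and 28 can be viewed as the two directions of one set of (written form, reading) pairs. The following is a minimal sketch of that idea only; the name `build_correspondences` and the use of plain dictionaries are assumptions made for illustration, and the pairs are limited to the examples quoted in the text above.

```python
from collections import defaultdict

# (written form, reading) pairs taken from the examples quoted in the text.
PAIRS = [
    ("Ａ", "あるふぁ"),
    ("Ａ", "えー"),
    ("あ", "あ"),
    ("ア", "あ"),
    ("亜", "あ"),
    ("阿", "あ"),
    ("在", "あ"),
    ("有", "あ"),
]


def build_correspondences(pairs):
    """Return (written -> readings, reading -> written forms) as dicts of lists.

    Hypothetical helper: it only illustrates the two lookup directions and
    does not use the trie-based storage described in the embodiments.
    """
    by_written = defaultdict(list)
    by_reading = defaultdict(list)
    for written, reading in pairs:
        by_written[written].append(reading)
        by_reading[reading].append(written)
    return dict(by_written), dict(by_reading)


by_written, by_reading = build_correspondences(PAIRS)
print(by_written["Ａ"])   # readings of the written form "Ａ"
print(by_reading["あ"])   # written forms of the reading "あ"
```

In the embodiments themselves, each side is stored as a trie and the correspondence is expressed through node positions rather than through strings.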

[0142] FIGS. 27 and 28 show only the first eight words; in total there are 93,452 written words and 68,819 reading words. FIG. 29 is a diagram showing the amount of information in the index part in the fifth embodiment. Using the present invention, an index from written words to sets of reading words and an index from reading words to written words were each created from the correspondence data between written words and reading words. As a result, the trie required 1,257,579.0 bytes, the pointer table 494,085.0 bytes, and the entire index 1,751,664.0 bytes of storage.

[0143] Next, the information needed to perform the same functions was stored using the first conventional example in the description of the prior art, that is, the method in which record-structured data storing the set of words as fixed-length or variable-length character strings is prepared separately from the index.

[0144] FIG. 30 is a diagram showing the amount of information in the first conventional example. In the first conventional example, the text and its reference table required 2,042,570.5 bytes, the trie 1,257,579.0 bytes, the pointer table 494,085.0 bytes, and the entire index 3,794,234.5 bytes of storage.

[0145] The information needed to perform the same functions was also stored using the second conventional example in the description of the prior art, that is, the method in which the trie index is regarded as the word set data and the identification number of the node in the trie corresponding to the end of a word is used as the pointer to that word.

[0146] FIG. 31 is a diagram showing the amount of information in the second conventional example. In the second conventional example, the trie index is expected to require about 2,469,996.5 bytes, the pointer table 494,085.0 bytes, and the index as a whole 2,964,081.5 bytes of storage. For this estimate, the data width required for a link to a parent node was assumed to be 2.5 bytes, judging from the approximate size of the trie data. With N denoting the number of nodes in the trie, the data capacity L required for the links to parent nodes was calculated as L = N × 2.5 (bytes).
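
The estimate L = N × 2.5 can be checked against the figures quoted above. The sketch below is illustrative only. It assumes that the estimated trie index of the second conventional example equals the trie of the fifth embodiment plus the parent links; the text does not state this decomposition explicitly, and the node count derived from it is therefore an inference, not a value given in the source.

```python
# Figures quoted in the text (bytes).
BYTES_PER_PARENT_LINK = 2.5            # assumed link width stated in the text
TRIE_BYTES = 1_257_579.0               # trie of the fifth embodiment
TRIE_INDEX_BYTES_2ND = 2_469_996.5     # estimated trie index, second conventional example
POINTER_TABLE_BYTES = 494_085.0        # pointer table (same in all cases)


def parent_link_bytes(node_count, link_width=BYTES_PER_PARENT_LINK):
    """L = N x link_width: storage needed for one parent link per node."""
    return node_count * link_width


# ASSUMPTION (not stated in the source): trie index = trie + parent links.
link_bytes = TRIE_INDEX_BYTES_2ND - TRIE_BYTES
implied_nodes = link_bytes / BYTES_PER_PARENT_LINK

# Total index of the second conventional example, as quoted in the text.
total_2nd = TRIE_INDEX_BYTES_2ND + POINTER_TABLE_BYTES

print(link_bytes, implied_nodes, total_2nd)
```

Under that assumption the parent links account for 1,212,417.5 bytes, which corresponds to a whole number of nodes and is at least consistent with the stated link width of 2.5 bytes.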

[0147] Based on the above results, the fifth embodiment was compared with the conventional examples. FIG. 32 is a diagram showing the result of comparing the amount of information between the fifth embodiment and the conventional techniques.

[0148] This comparison shows that, relative to the first conventional example, the fifth embodiment of the present invention requires only 19.5% of the storage capacity for the index excluding the trie and 46.2% for the entire index, and that, relative to the second conventional example, it requires only 50.9% for the trie and 59.1% for the entire index. The present invention can therefore be said to achieve a significant reduction in the required storage capacity.
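
The percentages in this comparison can be reproduced from the byte counts quoted for FIGS. 29 to 31. The following sketch is illustrative; the dictionary keys and the helper `percent` are hypothetical names, and it assumes that "the index excluding the trie" means the pointer table for the fifth embodiment and the text, reference table, and pointer table for the first conventional example, which is the reading that matches the stated 19.5%.

```python
# Byte counts quoted in the text for each configuration.
FIFTH = {"trie": 1_257_579.0, "pointer_table": 494_085.0}
FIRST = {
    "text_and_reference_table": 2_042_570.5,
    "trie": 1_257_579.0,
    "pointer_table": 494_085.0,
}
SECOND = {"trie_index": 2_469_996.5, "pointer_table": 494_085.0}


def percent(part, whole):
    """Ratio of part to whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


fifth_total = sum(FIFTH.values())
first_total = sum(FIRST.values())
second_total = sum(SECOND.values())

# ASSUMPTION: "index excluding the trie" = everything except the trie entry.
non_trie_vs_first = percent(fifth_total - FIFTH["trie"], first_total - FIRST["trie"])
total_vs_first = percent(fifth_total, first_total)
trie_vs_second = percent(FIFTH["trie"], SECOND["trie_index"])
total_vs_second = percent(fifth_total, second_total)

print(non_trie_vs_first, total_vs_first, trie_vs_second, total_vs_second)
```

The totals computed here also agree with the whole-index sizes quoted in the text for each of the three cases.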

【０１４９】[0149]

[Effects of the Invention] As described above, in the present invention a set of words is stored in a trie format in which nodes are recorded in depth-first order, and the position of a node constituting a word in the trie is used as a value that uniquely identifies the word in the trie. A path containing any given node can therefore be identified without using link information to parent nodes. In addition, because words in the trie are referenced using node positions in the trie as pointers, there is no need to prepare data representing the word set separately from the trie-format word set. As a result, the required storage capacity is significantly smaller than with the prior art. In particular, because the key index storage means associates, among the characters constituting the words included in the word storage means, all characters other than the first and last characters of a word with the positions of the nodes representing those characters, the storage capacity required for the key index storage means can be reduced.
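
The effect described above can be illustrated with a small sketch. It is not the encoding used in the embodiments: it assumes, purely for illustration, that each node of the depth-first array stores its character, an end-of-word flag, and the size of its subtree, and all function names are hypothetical. The key index here skips first-character nodes and leaf nodes (which can only be last characters), which is a simplification of the rule stated in the text.

```python
from collections import defaultdict


def build_trie(words):
    """Flatten a trie into a depth-first (preorder) list of
    [char, is_word_end, subtree_size] entries; index 0 is the root."""
    tree = {}
    for word in words:
        node = tree
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}  # the empty key marks the end of a word
    nodes = []

    def emit(ch, sub):
        pos = len(nodes)
        nodes.append([ch, "" in sub, 1])
        for c in sorted(k for k in sub if k):
            emit(c, sub[c])
        nodes[pos][2] = len(nodes) - pos  # number of nodes in this subtree

    emit("", tree)
    return nodes


def path_to(nodes, target):
    """Walk down from the root to `target` and return the prefix spelled.
    No parent links are used: subtree sizes select the child to enter."""
    prefix, pos = "", 0
    while pos != target:
        child = pos + 1
        while not (child <= target < child + nodes[child][2]):
            child += nodes[child][2]  # skip this sibling's whole subtree
        prefix += nodes[child][0]
        pos = child
    return prefix


def words_through(nodes, target):
    """All words whose path in the trie passes through node `target`."""
    out = []

    def walk(pos, text):
        if nodes[pos][1]:
            out.append(text)
        child, end = pos + 1, pos + nodes[pos][2]
        while child < end:
            walk(child, text + nodes[child][0])
            child += nodes[child][2]

    walk(target, path_to(nodes, target))
    return out


def build_key_index(nodes):
    """Map each character to the positions of nodes representing it,
    skipping first-character nodes (depth 1) and leaf nodes."""
    index = defaultdict(list)

    def walk(pos, parent_depth):
        child, end = pos + 1, pos + nodes[pos][2]
        while child < end:
            is_leaf = nodes[child][2] == 1
            if parent_depth >= 1 and not is_leaf:
                index[nodes[child][0]].append(child)
            walk(child, parent_depth + 1)
            child += nodes[child][2]

    walk(0, 0)
    return dict(index)


nodes = build_trie(["car", "cart", "cat", "dog"])
key_index = build_key_index(nodes)
for position in key_index["a"]:
    print(position, words_through(nodes, position))
```

In this sketch a node position serves both as the key index entry and as the pointer to the words, so no separate word table and no parent links are stored.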

【０１５０】[0150]

[Brief description of drawings]

FIG. 1 is a diagram showing the principle configuration of the present invention.

FIG. 2 is a flowchart showing an algorithm for obtaining, from the position of a node, the word or set of words that includes that node.

FIG. 3 is a diagram showing the principle configuration of a word search device that takes a key to a word as input.

FIG. 4 is a block diagram showing a first embodiment of the present invention.

FIG. 5 is a diagram showing an example of a set of words.

FIG. 6 is a diagram showing an example of a trie in which nodes are recorded in depth-first order.

FIG. 7 is a diagram showing an example of a trie index.

FIG. 8 is a diagram showing a related word correspondence table.

FIG. 9 is a flowchart showing the word search procedure in the first embodiment.

FIG. 10 is a diagram showing a schematic configuration of a second embodiment.

FIG. 11 is a diagram showing an example of a character index.

FIG. 12 is a flowchart (part 1) showing the processing procedure for obtaining a document set in the second embodiment.

FIG. 13 is a flowchart (part 2) showing the processing procedure for obtaining a document set in the second embodiment.

FIG. 14 is a flowchart (part 3) showing the processing procedure for obtaining a document set in the second embodiment.

FIG. 15 is a block diagram showing a schematic configuration of a third embodiment.

FIG. 16 is a flowchart (part 1) showing the processing procedure of the function F1.

FIG. 17 is a flowchart (part 2) showing the processing procedure of the function F1.

FIG. 18 is a flowchart showing the processing procedure of the function F2.

FIG. 19 is a flowchart showing the processing procedure of the function F3.

FIG. 20 is a diagram showing an example of a trie in the third embodiment.

FIG. 21 is a diagram showing an example of a character index in the third embodiment.

FIG. 22 is a transition diagram of the finite automaton for the case of the character "イ".

FIG. 23 is a diagram showing the sub-trie "T1" that can be reached from node position P1 (= 10).

FIG. 24 is a block diagram showing a schematic configuration of a fourth embodiment.

FIG. 25 is a block diagram showing a schematic configuration of a fifth embodiment.

FIG. 26 is a block diagram showing a schematic configuration of a sixth embodiment.

FIG. 27 is a diagram showing the correspondence between written words and the sets of reading words corresponding to them.

FIG. 28 is a diagram showing the correspondence between reading words and the sets of written words corresponding to them.

FIG. 29 is a diagram showing the amount of information in the index part in the fifth embodiment.

FIG. 30 is a diagram showing the amount of information in the first conventional example.

FIG. 31 is a diagram showing the amount of information in the second conventional example.

FIG. 32 is a diagram showing the result of comparing the amount of information between the fifth embodiment and the conventional techniques.

[Explanation of symbols]

1 word storage means
2 word search means
11 word storage means
12 key index storage means
13 node position search means
14 word search means

Continuation of the front page
(56) References: JP-A 63-12043 (JP, A); JP-A 1-214970 (JP, A); Masuichi Hiroshi and three others, "Full-text search system using morphological analysis and its application", IPSJ Research Report, Natural Language Processing NL-102-3, July 21, 1994, Vol. 94, No. 63, pp. 17-24
(58) Fields surveyed (Int. Cl.⁷, DB name): G06F 17/30

Claims

(57) [Claims]

1. A word search device for searching for a word from a word set, comprising: word storage means for storing a set of words, associated with nodes, in a trie format in which the nodes are recorded in depth-first order; word search means which, when the position of a node in the word storage means is input, traces the trie of the word storage means in order from the root to obtain the path to the node at the input position, acquires all the words corresponding to the nodes reached by tracing every path that follows the obtained path, and outputs the set of acquired words; key index storage means for storing keys corresponding to the words included in the word storage means in association with the positions of the nodes constituting each word; and node position search means which, when any key in the key index storage means is input, acquires from the key index storage means the set of node positions corresponding to the input key and outputs the acquired set of node positions to the word search means, wherein the key index storage means associates all the characters constituting the words included in the word storage means with the positions of the nodes representing those characters, and, among the characters constituting the words included in the word storage means, associates all characters other than the first and last characters of a word with the positions of the nodes representing those characters.

2. The word search device according to claim 1, further comprising regular expression analysis means which, when a regular expression is input, analyzes the input regular expression, passes a character in the regular expression to the node position search means and receives a set of node positions, passes the set of node positions to the word search means and receives a set of words, and outputs the set of words that match the input regular expression.

3. The word search device according to claim 1, wherein the word storage means stores each word as a pair with a pointer to information corresponding to that word, and the word search means outputs, from the word storage means, a set of pairs each consisting of the word represented by a node and the pointer to the information corresponding to that word.

4. The word search device according to claim 1, wherein the word storage means stores a set of words, associated with nodes, in a trie format in which the nodes are recorded in depth-first order, and stores each word in association with the set of node positions of related words that are related to that word, the device further comprising node position search means which, when a word is input, acquires from the word storage means the set of node positions of the words related to the input word and outputs the acquired set of node positions to the word search means.

5. The word search device according to claim 4, wherein the word storage means comprises: reading index storage means which stores a set of words represented by their written forms in a trie format in which nodes are recorded in depth-first order, and stores each written word in association with the set of positions of the nodes constituting the character strings of the reading words corresponding to that written word; and notation index storage means which stores a set of words represented by their readings in a trie format in which nodes are recorded in depth-first order, and stores each reading word in association with the set of positions of the nodes constituting the character strings of the written words corresponding to that reading word.

6. A computer-readable recording medium on which is recorded a word search program for causing a computer to search for a word from a word set, the program causing the computer to function as: word storage means for storing a set of words, associated with nodes, in a trie format in which the nodes are recorded in depth-first order; word search means which, when the position of a node in the word storage means is input, traces the trie of the word storage means in order from the root to obtain the path to the node at the input position, acquires all the words corresponding to the nodes reached by tracing every path that follows the obtained path, and outputs the set of acquired words; key index storage means which stores keys corresponding to the words included in the word storage means in association with the positions of the nodes constituting each word, associates all the characters constituting the words included in the word storage means with the positions of the nodes representing those characters, and, among the characters constituting the words included in the word storage means, associates all characters other than the first and last characters of a word with the positions of the nodes representing those characters; and node position search means which, when any key in the key index storage means is input, acquires from the key index storage means the set of node positions corresponding to the input key and outputs the acquired set of node positions to the word search means.