JP3452353B2

JP3452353B2 - Recording medium on which a dictionary data structure is recorded, dictionary lookup method, phrase acquisition method, dictionary lookup apparatus, phrase acquisition apparatus, and recording medium on which a program is recorded

Info

Publication number: JP3452353B2
Application number: JP2000202127A
Authority: JP
Inventors: 由嘉里金田
Original assignee: JustSystems Corporation (株式会社ジャストシステム)
Priority date: 2000-07-04
Filing date: 2000-07-04
Publication date: 2003-09-29
Anticipated expiration: 2020-07-04
Also published as: JP2002024233A

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to the following items, each suited to realizing an encrypted dictionary and fast dictionary lookup using that dictionary: a recording medium on which a dictionary data structure is recorded, a dictionary lookup method, a phrase acquisition method, a dictionary lookup apparatus, a phrase acquisition apparatus, and a recording medium on which a program is recorded.

[0002]

2. Description of the Related Art

Japanese kana-kanji conversion is used in word processors and similar software. For such conversion, a method has been proposed in which the dictionary is recorded on a recording medium as trie-structured data, so that it can be determined quickly whether a given character sequence is registered in the dictionary. FIG. 1 is a schematic diagram outlining a trie that implements part of a kana-kanji conversion dictionary.

[0003] As shown in the figure, this trie records the following entries, among others:
・ the spellings "和" and "話" for the reading "わ" (wa);
・ the spellings "綿" and "わた" for the reading "わた" (wata);
・ the spelling "私" for the reading "わたくし" (watakushi);
・ the spellings "私", "渡し" and "わたし" for the reading "わたし" (watashi).

[0004] As shown in the figure, a trie is generally represented as a tree. Lookup starts at the start node 101 and moves from node 104 to node 104. At each step it follows the arrow 103 labeled with the next character 102 of the kana string being looked up. The end character (#) 105 in the figure indicates that spelling data 106 is registered for the reading formed by the characters traversed up to that point.
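The lookup described in [0004] can be sketched as follows. This is a minimal illustration under stated assumptions:
・ each node is a Python `dict` keyed by one character;
・ the end character `#` maps to the list of registered spellings;
・ the function names `insert` and `lookup` are hypothetical.

Real kana-kanji dictionaries use far more compact encodings than this.

```python
from __future__ import annotations

END = "#"  # end character 105; its value holds the spelling data 106


def insert(trie: dict, reading: str, spellings: list[str]) -> None:
    """Register spellings for a reading, creating nodes along the way."""
    node = trie
    for ch in reading:
        node = node.setdefault(ch, {})
    node.setdefault(END, []).extend(spellings)


def lookup(trie: dict, reading: str) -> list[str] | None:
    """Follow one arrow per character; return the spellings or None."""
    node = trie  # start node 101
    for ch in reading:
        if ch not in node:
            return None  # no arrow labelled with this character
        node = node[ch]
    return node.get(END)  # None when the path exists but no word ends here


trie: dict = {}
insert(trie, "わ", ["和", "話"])
insert(trie, "わた", ["綿", "わた"])
insert(trie, "わたくし", ["私"])
insert(trie, "わたし", ["私", "渡し", "わたし"])

print(lookup(trie, "わたし"))  # ['私', '渡し', 'わたし']
print(lookup(trie, "わたく"))  # None: a prefix of わたくし, not itself registered
```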

[0005] Word processors also frequently perform morphological analysis, because it is needed for proofreading, summarizing and searching text. The aim of morphological analysis is to segment a written string into words and obtain word information, such as the reading of each word. Morphological analysis likewise uses trie-structured dictionaries.

[0006] FIG. 2 is a schematic diagram outlining a trie that implements part of a morphological analysis dictionary. In this figure, elements that perform the same functions as in the preceding figure are given the same reference numerals.

[0007] As shown in the figure, this trie records morphological information such as the following:
・ the readings "わたし", "わたくし" and "し" for the spelling "私";
・ the reading "しよう" for the spelling "私用";
・ the reading "わた" for the spelling "わた";
・ the reading "わたくし" for the spelling "わたくし";
・ the reading "わたし" for the spelling "わたし".

[0008] As in the previous figure, the trie is built from the start node 101, characters 102, arrows 103, nodes 104 and the end character (#) 105. Here, however, reading data 107 is recorded in place of spelling data 106.

[0009] These dictionaries can also be merged into a single trie. FIG. 3 is a schematic diagram outlining a trie in which parts of the above dictionaries are merged. In this figure, elements that perform the same functions as in the preceding figures are given the same reference numerals. The end characters distinguish whether the associated information is spelling information or reading information: the former is shown as #1 and the latter as #2.

[0010] Methods for implementing such a trie-structured dictionary with arrays are disclosed, for example, in the following documents:
・ Jun-ichi Aoe, "A high-speed digital search algorithm using a double array", IEICE Transactions D, Vol. J71-D, No. 9, pp. 1592-1600, September 1988.
・ Jun-ichi Aoe, "Natural language dictionary search: a high-speed digital search algorithm using a double array", bit, Vol. 21, No. 6, pp. 776-784, May 1989.

[0011]

PROBLEMS TO BE SOLVED BY THE INVENTION

In the conventional method, this information is stored as-is in the spelling data 106 and the reading data 107. This creates a problem when such a dictionary is created and distributed: the data recorded in it can be misappropriated, in particular the spelling and reading data of the registered words. There is therefore demand for a way to encrypt word data while retaining the advantages of trie-based dictionary lookup.
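To make this problem concrete, here is a minimal sketch of a conventional merged dictionary in the style of FIG. 3. It assumes nested Python dicts, with the two end characters written as the strings "#1" (spelling data) and "#2" (reading data). All names are hypothetical. Because the values are stored verbatim, a simple walk over the stored structure enumerates every registered spelling and reading without performing any lookup.

```python
from __future__ import annotations

SPELLING_END, READING_END = "#1", "#2"  # end characters of FIG. 3
END_MARKS = (SPELLING_END, READING_END)


def insert(trie: dict, key: str, end: str, values: list[str]) -> None:
    node = trie
    for ch in key:
        node = node.setdefault(ch, {})
    node.setdefault(end, []).extend(values)  # values stored as plain text


def lookup(trie: dict, key: str, end: str) -> list[str] | None:
    node = trie
    for ch in key:
        node = node.get(ch)
        if node is None:
            return None
    return node.get(end)


def dump_words(node: dict):
    """Walk the whole trie and yield every stored value verbatim."""
    for label, child in node.items():
        if label in END_MARKS:
            yield from child
        else:
            yield from dump_words(child)


trie: dict = {}
insert(trie, "わたくし", SPELLING_END, ["私"])
insert(trie, "わたし", SPELLING_END, ["私", "わたし"])
insert(trie, "私", READING_END, ["わたし", "わたくし"])
insert(trie, "わたし", READING_END, ["わたし"])

print(lookup(trie, "私", READING_END))  # ['わたし', 'わたくし']
print(sorted(set(dump_words(trie))))    # every word, recovered without any key
```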

[0012] An object of the present invention is twofold. The first is to prevent theft of word reading and spelling data by using an encrypted trie-structured dictionary. The second is to realize methods that use that dictionary for fast dictionary lookup and for restoring readings and spellings.

[0013]

MEANS FOR SOLVING THE PROBLEMS

To achieve the above object, a computer-readable recording medium according to a first aspect of the present invention records a dictionary data structure that holds phrases of arbitrary length in association with information data for each phrase. The structure is configured as follows.

[0014] The dictionary data structure stores a start node $n_1$.

[0015] Suppose the dictionary data structure holds a phrase expressed by a string of $s$ characters $c_1, c_2, \ldots, c_s$, in association with information data $d$ for that phrase. In that case it:
(a) stores a node $n_2$ in one-to-one correspondence with the pair $(n_1, c_1)$ of the start node $n_1$ and the character $c_1$;
(b) for each integer $i$ with $2 \le i \le s$, stores a node $n_{i+1}$ in one-to-one correspondence with the pair $(n_i, c_i)$ of the node $n_i$ and the character $c_i$; and
(c) stores the information data $d$ in one-to-one correspondence with the node $n_{s+1}$.
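The structure of [0014] and [0015] can be modelled with two maps, sketched below under stated assumptions:
・ a transition map from a pair (node, character) to the next node;
・ a map from the final node $n_{s+1}$ to the information data $d$.

Nodes are plain integers handed out in creation order, and the class name `PairTrie` is hypothetical. Neither map stores a phrase as a contiguous string: each character exists only as part of a key on an individual transition.

```python
from __future__ import annotations

from itertools import count


class PairTrie:
    """Hypothetical model of [0015]: (n_i, c_i) -> n_{i+1}, and n_{s+1} -> d."""

    def __init__(self) -> None:
        self._ids = count(1)
        self.start = next(self._ids)  # start node n_1 (= 1)
        self.next_node: dict[tuple[int, str], int] = {}
        self.info: dict[int, object] = {}

    def add(self, phrase: str, d: object) -> int:
        """Store phrase c_1..c_s with data d; return its final node n_{s+1}."""
        n = self.start
        for c in phrase:
            key = (n, c)
            if key not in self.next_node:  # (a)/(b): one new node per new pair
                self.next_node[key] = next(self._ids)
            n = self.next_node[key]
        self.info[n] = d  # (c): d is attached to n_{s+1}, not to the text
        return n


t = PairTrie()
print(t.add("わたし", "d1"))    # 4: nodes 2, 3, 4 are created
print(t.add("わたくし", "d2"))  # 6: shares nodes 2 and 3, then creates 5 and 6
print(t.info)                   # {4: 'd1', 6: 'd2'}
```

In a merged dictionary such as FIG. 3, the end character (#1 or #2) can simply be treated as the last character $c_s$ of the phrase.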

[0016] In the present invention, for each integer $i$ ($1 \le i \le s$), the pair $(n_i, c_i)$ and the next node $n_{i+1}$ are stored in one-to-one correspondence. The characters $c_1, c_2, \ldots, c_s$ expressing the phrase therefore need not be stored verbatim in the information data $d$, so the spelling and reading data can be encrypted.

[0017] At the same time, because the correspondences are one-to-one, the fast search techniques for tries can be applied unchanged.

[0018] A dictionary lookup method according to a second aspect of the present invention works on a recording medium storing the above dictionary data structure. It retrieves the information data held in association with a phrase expressed by a string of $s$ characters $e_1, e_2, \ldots, e_s$. The method comprises a head acquisition step, a sequential acquisition step and a data output step.

[0019] In the head acquisition step, the method acquires the node $n_2$ stored in one-to-one correspondence with the pair $(n_1, e_1)$ of the start node $n_1$ and the character $e_1$.

[0020] In the sequential acquisition step, for each integer $i$ ($2 \le i \le s$) in turn, the method acquires the node $n_{i+1}$ stored in one-to-one correspondence with the pair $(n_i, e_i)$. Here $n_i$ is the previously acquired node and $e_i$ the current character.

[0021] In the data output step, the method acquires and outputs the information data $d$ stored in one-to-one correspondence with the acquired node $n_{s+1}$.

[0022] According to the present invention, the information data for a phrase expressed by a character string can be obtained quickly from the above recording medium.

[0023] The dictionary lookup method of the present invention may further comprise a reporting step.

[0024] The reporting step applies when the head acquisition step or the sequential acquisition step cannot acquire a node stored in one-to-one correspondence. In that case, the method reports that no information data is held in association with the phrase.

[0025] According to the present invention, when the phrase expressed by a character string is not registered in the dictionary, a report to that effect can be obtained quickly.
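The lookup of [0018] to [0025] can be sketched over a transition-map model as below. The model's assumptions are:
・ the dictionary is a `next_node` mapping from `(node, character)` to the next node, plus an `info` mapping from final nodes to data;
・ both maps are built by hand here for the single phrase わたし;
・ `look_up` is a hypothetical name, and returning `None` stands in for the reporting step.

Treating a path that ends on a node without data as "not registered" is an added assumption; the reporting step described above covers only the missing-node case.

```python
from __future__ import annotations


def look_up(next_node: dict, info: dict, start: int, phrase: str):
    """Return the data d for phrase, or None to report that it is not held."""
    n = start  # n_1
    for e in phrase:  # i = 1 is the head step; i >= 2 the sequential steps
        n = next_node.get((n, e))
        if n is None:  # no node stored for the pair (n_i, e_i)
            return None  # reporting step
    return info.get(n)  # data output step: d attached to n_{s+1}


# Hand-built dictionary for the single phrase "わたし" (nodes 1 -> 2 -> 3 -> 4).
next_node = {(1, "わ"): 2, (2, "た"): 3, (3, "し"): 4}
info = {4: "data for わたし"}

print(look_up(next_node, info, 1, "わたし"))  # data for わたし
print(look_up(next_node, info, 1, "わし"))    # None: no node for (2, "し")
```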

[0026] A phrase acquisition method according to a third aspect of the present invention starts from information data $d$ held on a recording medium storing the above dictionary data structure. It obtains the phrase held in association with that data. The method comprises a node acquisition step, a head acquisition step, a sequential acquisition step and a character string output step.

[0027] In the node acquisition step, the method acquires the node $m_0$ that is stored in one-to-one correspondence with the information data $d$, or that is referenced by it.

[0028] In the head acquisition step, the method acquires the pair $(m_1, e_1)$ stored in one-to-one correspondence with the acquired node $m_0$.

[0029] In the sequential acquisition step, for each integer $j \ge 1$ in turn, the method acquires the pair $(m_{j+1}, e_{j+1})$ stored in one-to-one correspondence with the previously acquired node $m_j$.

[0030] The character string output step applies when the start node $n_1$ equals the first element $m_k$ of any of the pairs obtained in turn. In that case, the method outputs the sequence of their second elements $e_k, e_{k-1}, \ldots, e_2, e_1$ as the character string expressing the phrase held in association with the information data $d$.

[0031] According to the present invention, the reading or spelling string of a word can be restored from the word data $d$ of the encrypted dictionary.
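Phrase acquisition ([0026] to [0031]) needs the correspondence in the other direction: from a node back to the pair that produced it. Below is a minimal sketch with these assumptions:
・ a hypothetical `parent` map from each node $n_{i+1}$ to its pair $(n_i, c_i)$;
・ integer node numbers;
・ a missing link is treated as failure (also an assumption).

The walk stops when it reaches the start node $n_1$ and then reverses the collected characters, as in [0030].

```python
from __future__ import annotations


def recover_phrase(parent: dict, start: int, m0: int) -> str | None:
    """Rebuild the phrase whose final node is m0 by walking back to start."""
    chars = []
    m = m0
    while m != start:  # stop once some m_k equals the start node n_1
        pair = parent.get(m)
        if pair is None:  # assumed: a broken chain means no phrase
            return None
        m, e = pair  # (m_{j+1}, e_{j+1})
        chars.append(e)
    return "".join(reversed(chars))  # output e_k, ..., e_2, e_1


# Reverse links for "わたし" (1 -> 2 -> 3 -> 4); d stores only the number 4.
parent = {2: (1, "わ"), 3: (2, "た"), 4: (3, "し")}
print(recover_phrase(parent, 1, 4))  # わたし
print(recover_phrase(parent, 1, 3))  # わた
```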

[0032] A dictionary lookup apparatus according to a fourth aspect of the present invention works on a recording medium storing the above dictionary data structure. It retrieves the information data held in association with a phrase expressed by a string of $s$ characters $e_1, e_2, \ldots, e_s$. The apparatus comprises a head acquisition unit, a sequential acquisition unit and a data output unit.

[0033] The head acquisition unit acquires the node $n_2$ stored in one-to-one correspondence with the pair $(n_1, e_1)$ of the start node $n_1$ and the character $e_1$.

[0034] For each integer $i$ ($2 \le i \le s$) in turn, the sequential acquisition unit acquires the node $n_{i+1}$ stored in one-to-one correspondence with the pair $(n_i, e_i)$. Here $n_i$ is the previously acquired node and $e_i$ the current character.

[0035] The data output unit acquires and outputs the information data $d$ stored in one-to-one correspondence with the acquired node $n_{s+1}$.

[0036] The dictionary lookup apparatus of the present invention may further comprise a reporting unit.

[0037] The reporting unit applies when the head acquisition unit or the sequential acquisition unit cannot acquire a node because none is stored in one-to-one correspondence. In that case, it reports that no information data is held in association with the phrase.

[0038] A phrase acquisition apparatus according to a fifth aspect of the present invention starts from information data $d$ held on a recording medium storing the above dictionary data structure. It obtains the phrase held in association with that data. The apparatus comprises a node acquisition unit, a head acquisition unit, a sequential acquisition unit and a character string output unit.

[0039] The node acquisition unit acquires the node $m_0$ that is stored in one-to-one correspondence with the information data $d$, or that is referenced by it.

[0040] The head acquisition unit acquires the pair $(m_1, e_1)$ stored in one-to-one correspondence with the acquired node $m_0$.

[0041] For each integer $j \ge 1$ in turn, the sequential acquisition unit acquires the pair $(m_{j+1}, e_{j+1})$ stored in one-to-one correspondence with the previously acquired node $m_j$.

[0042] The character string output unit applies when the start node $n_1$ equals the first element $m_k$ of any of the pairs obtained in turn. In that case, it outputs the sequence of their second elements $e_k, e_{k-1}, \ldots, e_2, e_1$ as the character string expressing the phrase held in association with the information data $d$.

[0043] A computer-readable recording medium according to a sixth aspect of the present invention records the above dictionary data structure, configured as follows.

[0044] The dictionary data structure uses an array BASE and an array CHECK to hold the phrases, their nodes and their information data.

[0045] The following are each represented by integers:
・ the start node $n_1$;
・ the characters $c_1, c_2, \ldots, c_s$ expressing a held phrase;
・ the associated nodes $n_2, \ldots, n_s, n_{s+1}$.

[0046] For each integer $i$ ($1 \le i \le s$), the arrays hold the $n_i$-th element $\mathrm{BASE}[n_i]$ of BASE and the $n_{i+1}$-th element $\mathrm{CHECK}[n_{i+1}]$ of CHECK. These are held so that both of the following conditions are satisfied:

$$n_{i+1} = \mathrm{BASE}[n_i] + c_i, \qquad \mathrm{CHECK}[n_{i+1}] = n_i$$
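These two conditions are those of Aoe's double array. The sketch below shows one way to produce arrays that satisfy them; the patent text here does not specify a construction, so this is not presented as the patented one. It rests on these assumptions:
・ characters are mapped to small integer codes 1..K in code-point order;
・ the start node $n_1$ is index 1;
・ 0 in CHECK means "unused";
・ each BASE value is chosen greedily as the smallest value whose child slots are all free.

`build_double_array` and `check_conditions` are hypothetical names.

```python
from __future__ import annotations

from collections import deque


def build_double_array(phrases: list[str]):
    """Return BASE, CHECK, the character codes, and the final node of each phrase."""
    code = {c: i + 1 for i, c in enumerate(sorted({c for p in phrases for c in p}))}
    trie: dict = {}  # temporary nested-dict trie used only while building
    for p in phrases:
        node = trie
        for c in p:
            node = node.setdefault(c, {})
        node[""] = p  # "" marks a phrase end; never a real character
    base, check = [0, 0], [0, 0]
    used = {1}  # index 1 is the start node n_1
    final = {}  # phrase -> n_{s+1}
    queue = deque([(trie, 1)])
    while queue:
        node, n = queue.popleft()
        if "" in node:
            final[node[""]] = n
        kids = [c for c in node if c != ""]
        if not kids:
            continue
        b = 1
        while any(b + code[c] in used for c in kids):
            b += 1  # smallest BASE whose child slots are all free
        base[n] = b
        for c in kids:
            t = b + code[c]  # n_{i+1} = BASE[n_i] + c_i
            used.add(t)
            while len(check) <= t:
                base.append(0)
                check.append(0)
            check[t] = n  # CHECK[n_{i+1}] = n_i
            queue.append((node[c], t))
    return base, check, code, final


def check_conditions(base, check, code, final, start=1) -> bool:
    """Verify both conditions along the path of every stored phrase."""
    for phrase, end in final.items():
        n = start
        for c in phrase:
            nxt = base[n] + code[c]
            if check[nxt] != n:
                return False
            n = nxt
        if n != end:
            return False
    return True


base, check, code, final = build_double_array(["わた", "わたし"])
print(base, check)  # [0, 1, 0, 1, 1] [0, 0, 3, 4, 1]
print(final)        # {'わた': 3, 'わたし': 2}
print(check_conditions(base, check, code, final))  # True
```

Production double-array builders use smarter slot search and array compaction; the greedy scan here only keeps the example short.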

[0047] According to the present invention, an encrypted trie-structured dictionary can be realized with arrays.

[0048] A dictionary lookup method according to a seventh aspect of the present invention works on a recording medium storing the above dictionary data structure. It retrieves the information data held in association with a phrase expressed by a string of $s$ characters $e_1, e_2, \ldots, e_s$. The method comprises a sequential acquisition step and a data output step.

[0049] In the sequential acquisition step, for each integer $i$ ($1 \le i \le s$) in turn, the method forms the sum $t_i = \mathrm{BASE}[n_i] + e_i$. If $\mathrm{CHECK}[t_i] = n_i$, it acquires the integer $n_{i+1} = t_i$.

[0050] In the data output step, the method acquires and outputs the information data $d$ stored in one-to-one correspondence with the acquired integer $n_{s+1}$.

[0051] The dictionary lookup method of the present invention may further comprise a reporting step.

[0052] In the reporting step, if the condition is not satisfied in the sequential acquisition step, the method reports that no information data is held in association with the phrase.
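Here is a minimal sketch of the lookup and reporting steps of [0048] to [0052]. It assumes the small double array below, which holds the phrases わた and わたし and satisfies both conditions of [0046]. Its parameters are:
・ hypothetical character codes し=1, た=2, わ=3;
・ start node 1.

The `INFO` map from final node to data and the function name `da_lookup` are illustrative assumptions. Returning `None` plays the role of the report.

```python
from __future__ import annotations

CODE = {"し": 1, "た": 2, "わ": 3}  # hypothetical character integers
BASE = [0, 1, 0, 1, 1]
CHECK = [0, 0, 3, 4, 1]
INFO = {3: "data for わた", 2: "data for わたし"}  # keyed by n_{s+1}


def da_lookup(phrase: str, start: int = 1):
    n = start  # n_1
    for ch in phrase:
        e = CODE.get(ch)
        if e is None:
            return None  # character outside the coded alphabet: report
        t = BASE[n] + e  # t_i = BASE[n_i] + e_i
        if t >= len(CHECK) or CHECK[t] != n:
            return None  # CHECK[t_i] != n_i: reporting step
        n = t  # n_{i+1} = t_i
    return INFO.get(n)  # data output step


print(da_lookup("わたし"))  # data for わたし
print(da_lookup("わし"))    # None: BASE[4] + 1 = 2, but CHECK[2] is 3, not 4
```

Each character costs one addition and one comparison, which is why the double array keeps trie lookup fast while the data side stores no text.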

[0053] A phrase acquisition method according to an eighth aspect of the present invention starts from information data $d$ held on a recording medium storing the above dictionary data structure. It obtains the phrase held in association with that data. The method comprises a node acquisition step, a sequential acquisition step and a character string output step.

[0054] In the node acquisition step, the method acquires the integer $m_0$ representing a node that is either:
・ stored in one-to-one correspondence with the information data $d$; or
・ referenced one-to-one by it.

[0055] In the sequential acquisition step, for each integer $j \ge 0$ in turn, the method starts from the previously acquired integer $m_j$. It acquires the integer $m_{j+1} = \mathrm{CHECK}[m_j]$ and the character integer $e_{j+1} = m_j - \mathrm{BASE}[m_{j+1}]$. This inverts the conditions of [0046]:
・ $\mathrm{CHECK}[m_j]$ is the node from which $m_j$ was reached;
・ subtracting that node's BASE value from $m_j$ recovers the character on the connecting arc.

[0056] The character string output step applies when the integer $n_1$ representing the start node equals any of the integers $m_k$ obtained in turn. In that case, the method outputs the character integers $e_k, e_{k-1}, \ldots, e_2, e_1$, obtained in turn, as the character string expressing the phrase held in association with the information data $d$.
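Restoring a phrase from $m_0$ ([0053] to [0056]) needs only CHECK, BASE and the inverse of the character coding. The sketch below uses a small example double array holding the phrases わた and わたし, with hypothetical codes し=1, た=2, わ=3. It computes each character as $m_j - \mathrm{BASE}[m_{j+1}]$, where $m_{j+1} = \mathrm{CHECK}[m_j]$ is the parent node; this follows directly from $n_{i+1} = \mathrm{BASE}[n_i] + c_i$. The function name and the step limit that guards against malformed input are assumptions.

```python
from __future__ import annotations

CODE = {"し": 1, "た": 2, "わ": 3}  # hypothetical character integers
CHAR = {v: k for k, v in CODE.items()}  # inverse mapping for output
BASE = [0, 1, 0, 1, 1]
CHECK = [0, 0, 3, 4, 1]


def da_restore(m0: int, start: int = 1) -> str | None:
    """Rebuild the phrase whose final node is m0 (the number stored in d)."""
    chars = []
    m = m0
    for _ in range(len(CHECK)):  # a valid path is never longer than the array
        if m == start:  # some m_k equals n_1: output e_k, ..., e_1
            return "".join(reversed(chars))
        if not 0 < m < len(CHECK) or CHECK[m] == 0:
            return None  # not a node on a stored path
        parent = CHECK[m]  # m_{j+1} = CHECK[m_j]
        e = m - BASE[parent]  # e_{j+1} = m_j - BASE[m_{j+1}]
        if e not in CHAR:
            return None
        chars.append(CHAR[e])
        m = parent
    return None


print(da_restore(2))  # わたし
print(da_restore(3))  # わた
```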

[0057] A dictionary lookup apparatus according to a ninth aspect of the present invention works on a recording medium storing the above dictionary data structure. It retrieves the information data held in association with a phrase expressed by a string of $s$ characters $e_1, e_2, \ldots, e_s$. The apparatus comprises a sequential acquisition unit and a data output unit.

[0058] For each integer $i$ ($1 \le i \le s$) in turn, the sequential acquisition unit forms the sum $t_i = \mathrm{BASE}[n_i] + e_i$. If $\mathrm{CHECK}[t_i] = n_i$, it acquires the integer $n_{i+1} = t_i$.

[0059] The data output unit acquires and outputs the information data $d$ stored in one-to-one correspondence with the acquired integer $n_{s+1}$.

[0060] The dictionary lookup apparatus of the present invention may further comprise a reporting unit.

[0061] If the condition is not satisfied in the sequential acquisition unit, the reporting unit reports that no information data is held in association with the phrase.

[0062] A phrase acquisition apparatus according to a tenth aspect of the present invention starts from information data $d$ held on a recording medium storing the above dictionary data structure. It obtains the phrase held in association with that data. The apparatus comprises a node acquisition unit, a sequential acquisition unit and a character string output unit.

[0063] The node acquisition unit acquires the integer $m_0$ representing a node that is either:
・ stored in one-to-one correspondence with the information data $d$; or
・ referenced one-to-one by it.

[0064] For each integer $j \ge 0$ in turn, the sequential acquisition unit starts from the previously acquired integer $m_j$. It acquires the integer $m_{j+1} = \mathrm{CHECK}[m_j]$ and the character integer $e_{j+1} = m_j - \mathrm{BASE}[m_{j+1}]$.

[0065] The character string output unit applies when the integer $n_1$ representing the start node equals any of the integers $m_k$ obtained in turn. In that case, it outputs the character integers $e_k, e_{k-1}, \ldots, e_2, e_1$, obtained in turn, as the character string expressing the phrase held in association with the information data $d$.

[0066] A computer-readable recording medium according to an eleventh aspect of the present invention records a program. The program causes a computer to function as the above dictionary lookup apparatus or the above phrase acquisition apparatus.

[0067]

BEST MODE FOR CARRYING OUT THE INVENTION

Embodiments of the present invention are described below with reference to the accompanying drawings.

[0068] (Schematic configuration of the information processing apparatus) FIG. 4 is a schematic diagram of a typical information processing apparatus that functions as the dictionary lookup apparatus or the phrase acquisition apparatus of the present invention. Such apparatuses include general-purpose computers, various terminals, and dedicated devices such as mobile terminals, mobile phones and game devices. The following description refers to FIG. 4.

[0069] The information processing apparatus 401 is controlled by a CPU (Central Processing Unit) 402. When the information processing apparatus 401 is powered on, the CPU 402 executes an IPL (Initial Program Loader) stored in a ROM (Read Only Memory) 403.

[0070] The IPL is a program that reads and executes an OS (Operating System) program stored on a recording medium. Examples are the hard disk 404, a floppy disk (FD) inserted in the FD drive 410, or a CD-ROM (Compact Disc ROM) inserted in the CD-ROM drive 411.

[0071] After the OS has started, the CPU 402 executes application programs stored on the hard disk or a similar medium. It does so either in response to user instructions entered with the keyboard 405 or the mouse 406, or according to the contents of a configuration file written in advance on the hard disk or similar medium.

[0072] In small-scale information processing apparatuses such as mobile terminals, the IPL itself may serve as the OS or the application program.

[0073] When executing a program, the CPU 402 uses a RAM (Random Access Memory) 407 as a temporary work area. Registers and a cache (not shown) provided in the CPU 402 are also used as temporary work areas.

[0074] To report results to the user or show intermediate progress during program execution, the CPU 402 can display information on a display device 408, such as a liquid crystal display or a CRT (Cathode Ray Tube). With the mouse 406, moving the mouse moves a cursor shown on the screen, and clicking selects the menu item the cursor points to.

[0075] The information processing apparatus 401 can communicate with a computer network such as the Internet through an interface 409, such as an NIC (Network Interface Card) or a modem. Through the interface 409 it can, for example:
・ receive document data for processing;
・ send processing results;
・ receive programs and execute them.

[0076] (Outline of the trie structure) FIG. 5 is a schematic diagram of a dictionary constructed by the method of the present invention, with the same content as the merged trie-structured dictionary of FIG. 3. The following description refers to this figure.

[0077] To check whether a character string is registered in this dictionary 501:
・ start at the start node 101 and examine the characters of the string in order;
・ for each character 102, move to the next node 104 along the arrow 103 labeled with that character;
・ if an arrow 103 labeled with an end character 105 extends from the node 104 reached, a phrase corresponding to the string up to that point is registered.

The node 104 that this end-character arrow points to is a last node, and last nodes are given mutually distinct numbers. Although not shown, the other nodes 104 are also given mutually distinct numbers (node numbers).

[0078] From each last node 104, a further arrow points to spelling data 106 or reading data 107. The spelling data 106 and reading data 107 are assigned numbers that represent the spellings or readings.

[0079] The trie-structured dictionary 501 constructed by the method of the present invention records the following information:
・ for the reading "わたくし" (last node number 9), the spelling "私" (number 21);
・ for the spelling "わたくし" (last node number 10), the reading "わたくし" (number 9);
・ for the reading "わたし" (last node number 12), the spellings "私" (number 21) and "わたし" (number 13);
・ for the spelling "わたし" (last node number 13), the reading "わたし" (number 12);
・ for the spelling "私" (last node number 21), the readings "わたくし" (number 9) and "わたし" (number 12).

[0080] The spelling data 106 and reading data 107 hold the spelling or reading of the word formed by the sequence of characters 102 scanned from the start node 101 up to that data. This spelling or reading information is stored not as text but as a node number of the trie-structured dictionary 501.

[0081] For example, tracing "わ", "た", "く", "し", "#1" in order from the start node 101 leads, through the last node 104 (number 9), to the spelling data 106 giving the spelling "私" for the reading "わたくし".

[0082] Likewise, tracing "私", "#2" from the start node 101 leads, through the last node 104 (number 21), to the reading data 107 giving the readings "わたし" and "わたくし" for the spelling "私".

[0083] Here, the reading data 107 for the spelling "わたくし" stores the number 9. This is the number of the last node 104 reached by tracing "わ", "た", "く", "し", "#1" in order from the start node 101.

[0084] Similarly, the spelling data 106 for the reading "わたくし" stores the number 21. This is the number of the last node 104 reached by tracing "私", "#2" in order from the start node 101.

[0085] By starting at the last node 104 whose number is stored in the spelling data 106 or reading data 107, following the arrows backwards toward the start node 101, and reversing the order of the characters met on the way (then dropping the terminal character), the reading or spelling can be restored from that node number.
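The node-number scheme of [0080] to [0085] can be sketched as follows. This assumes a simple trie that records, for every node, its parent and the character on its incoming arrow; the labels "#1"/"#2" stand for the first and second terminal characters, and the class and method names are hypothetical. Each entry stores only the last-node number of its counterpart, and the counterpart's text is rebuilt by walking backwards.

```python
from __future__ import annotations

READING_END = "#1"   # first terminal character: reading -> spelling entries
SPELLING_END = "#2"  # second terminal character: spelling -> reading entries


class LinkedTrie:
    """Trie whose nodes remember their parent, so a node number alone
    is enough to restore the string that leads to it."""

    def __init__(self) -> None:
        self.parent = [-1]                          # parent[n]; node 0 is the start node
        self.label = [""]                           # character on the arrow into node n
        self.children: list[dict[str, int]] = [{}]
        self.info: dict[int, list[int]] = {}        # last node -> node numbers it refers to

    def _walk(self, chars: list[str], create: bool) -> int | None:
        node = 0
        for ch in chars:
            nxt = self.children[node].get(ch)
            if nxt is None:
                if not create:
                    return None
                nxt = len(self.parent)
                self.parent.append(node)
                self.label.append(ch)
                self.children.append({})
                self.children[node][ch] = nxt
            node = nxt
        return node

    def add_pair(self, reading: str, spelling: str) -> None:
        r_last = self._walk([*reading, READING_END], create=True)
        s_last = self._walk([*spelling, SPELLING_END], create=True)
        # Store node numbers, not strings: each entry points at the other's last node.
        self.info.setdefault(r_last, []).append(s_last)
        self.info.setdefault(s_last, []).append(r_last)

    def restore(self, node: int) -> str:
        """Walk the arrows backwards, drop the terminal, and reverse."""
        chars = []
        while node != 0:
            chars.append(self.label[node])
            node = self.parent[node]
        return "".join(reversed(chars[1:]))  # chars[0] is the terminal character

    def _related(self, chars: list[str]) -> list[str]:
        last = self._walk(chars, create=False)
        if last is None:
            return []
        return [self.restore(n) for n in self.info.get(last, [])]

    def spellings_of(self, reading: str) -> list[str]:
        return self._related([*reading, READING_END])

    def readings_of(self, spelling: str) -> list[str]:
        return self._related([*spelling, SPELLING_END])


d = LinkedTrie()
d.add_pair("わたくし", "私")
d.add_pair("わたし", "私")
d.add_pair("わたし", "わたし")
```

Since `info` holds only integers, nothing in it is readable text on its own; the strings exist only implicitly in the shape of the trie.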

[0086] As a result, the dictionary 501 is harder to misappropriate than if readings and spellings were stored as plain text in the spelling data 106 and reading data 107. In addition, because readings and spellings are embedded in the trie itself, the overall size of the dictionary 501 is expected to be smaller.

[0087] (Implementation with a double array) The following describes how the trie-structured dictionary 501 configured as above is stored in a double array.

[0088] Whether an arrow 103 leads from the current node 104 (node number r) via the character 102 with character code a is determined with two arrays, BASE and CHECK, as follows. First, compute t ← BASE[r] + a. Then check whether CHECK[t] = r holds. If it does, the node number of the next node 104 is t; if it does not, there is no next node.
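The transition test can be written as a small function. This is a sketch assuming BASE and CHECK are plain Python lists indexed by node number, with 0 in CHECK marking an unused slot (node numbers start at 1, as in FIG. 6); the function name and the toy arrays, which encode only the string "ab" with codes a = 2 and b = 3, are hypothetical.

```python
from typing import List, Optional


def next_node(base: List[int], check: List[int], r: int, a: int) -> Optional[int]:
    """Return the node reached from node r via character code a, or None."""
    t = base[r] + a  # t <- BASE[r] + a
    # The arrow exists only if slot t exists and was claimed by node r.
    if t < len(check) and check[t] == r:
        return t
    return None


# Hypothetical toy double array holding only "ab" (codes a = 2, b = 3);
# start node 1, and 0 in CHECK marks an unused slot.
BASE = [0, 1, 0, 2, 0, 0]
CHECK = [0, 0, 0, 1, 0, 3]
```

Each test costs one addition and one comparison, independent of how many arrows leave node r, which is what makes the lookup fast.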

[0089] FIG. 6 is an explanatory diagram giving a numerical example in which "わたし" and "わたくし" are actually registered in a double array. Note that the node numbers in this figure differ from those in the example above. The figure shows a dictionary for Japanese kana-kanji conversion, which obtains spelling information and the like from a reading, but the spelling data 106 are omitted from it.

[0090] In this example, character codes are assigned as follows:
End character → 1
"あ", "い", … → 2, 3, …
"き", "く", "け", … → 7, 8, 9, …
"し" → 12
"た" → 16
"わ" → 44

[0091] In FIG. 6, each array element is drawn as a pair of stacked boxes (the upper box is BASE, the lower box is CHECK), with the element's subscript written above it. The node number of the start node is 1. The procedure for scanning "わたし" in this case is briefly described below.

[0092] First, for the start node 1, BASE[1] = 1 as shown, so BASE[1] + わ(44) = 1 + 44 = 45. Since CHECK[45] = 1, words beginning with "わ" are registered.

[0093] Next, for node 45, BASE[45] = 2 as shown, so BASE[45] + た(16) = 2 + 16 = 18. Since CHECK[18] = 45, words beginning with "わた" are registered.

[0094] Next, for node 18, BASE[18] = 1 as shown, so BASE[18] + し(12) = 1 + 12 = 13. Since CHECK[13] = 18, words beginning with "わたし" are registered.

[0095] Finally, for node 13, BASE[13] = 1, so BASE[13] + #(1) = 1 + 1 = 2. Since CHECK[2] = 13, the word "わたし" itself is registered.
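The arithmetic of [0092] to [0095] can be replayed directly. The sketch below uses only the BASE and CHECK slots quoted in the text and treats every other slot of FIG. 6 as empty (an assumption, since the full figure is not reproduced here); the codes are those of [0090], with "#" standing for the end character, and the name `scan` is illustrative.

```python
from __future__ import annotations

BASE = {1: 1, 45: 2, 18: 1, 13: 1}            # slots quoted in [0092]-[0095]
CHECK = {45: 1, 18: 45, 13: 18, 2: 13}
CODE = {"#": 1, "し": 12, "た": 16, "わ": 44}  # codes from [0090]
START = 1


def scan(word: str) -> list[int]:
    """Return the node numbers visited while scanning word followed by '#'."""
    r, path = START, [START]
    for ch in word + "#":
        t = BASE.get(r, 0) + CODE[ch]          # t <- BASE[r] + code
        if CHECK.get(t) != r:                  # arrow exists only if CHECK[t] = r
            raise LookupError(f"no arrow for {ch!r} out of node {r}")
        path.append(t)
        r = t
    return path


print(scan("わたし"))  # prints [1, 45, 18, 13, 2]
```

Scanning "わた" fails at the end character: BASE[18] + 1 = 2, but CHECK[2] = 13, not 18.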

[0096] For a last node (number e), BASE[e] is not needed to express the node-and-arrow relationships of the trie, so it can instead hold the identification number of the spelling data 106 or reading data 107, the address at which those data are stored, and so on.

[0097] Next, the procedure for tracing in the reverse direction is described. When the dictionary allows a transition from node r to node t via character a, the character a and the node r can be recovered from node t by the following calculation, which follows from the relations above: r ← CHECK[t], a ← t - BASE[r].
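Why the reverse step is unambiguous can be shown in one substitution: the second forward condition gives r directly, and the first then gives a. Applied to the last node t = 2 of FIG. 6, it reproduces the first step of [0099].

$$
\begin{aligned}
t &= \mathrm{BASE}[r] + a, \qquad \mathrm{CHECK}[t] = r \\
\Rightarrow\; r &= \mathrm{CHECK}[t], \qquad a = t - \mathrm{BASE}[\mathrm{CHECK}[t]] \\
t = 2:\quad r &= \mathrm{CHECK}[2] = 13, \qquad a = 2 - \mathrm{BASE}[13] = 2 - 1 = 1 \quad (\text{end character})
\end{aligned}
$$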

[0098] Below, using the example in this figure, the character string is restored from node number 2, the last node.

[0099] First, CHECK[2] = 13 and 2 - BASE[13] = 2 - 1 = 1 give the character "#" (1) and the preceding node 13.

[0100] Next, CHECK[13] = 18 and 13 - BASE[18] = 13 - 1 = 12 give the character "し" (12) and the preceding node 18.

[0101] Then, CHECK[18] = 45 and 18 - BASE[45] = 18 - 2 = 16 give the character "た" (16) and the preceding node 45.

[0102] Then, CHECK[45] = 1 and 45 - BASE[1] = 45 - 1 = 44 give the character "わ" (44) and the preceding node 1. Since this is the start node, the iteration ends here. Arranging the characters obtained in order, "#", "し", "た", "わ", in reverse yields "わたし" followed by the end character.
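The backward walk of [0099] to [0102] in code, again using only the FIG. 6 slots quoted in the text. `CHAR` is simply the inverse of the [0090] code table, and the function name is illustrative.

```python
BASE = {1: 1, 45: 2, 18: 1, 13: 1}
CHECK = {45: 1, 18: 45, 13: 18, 2: 13}
CHAR = {1: "#", 12: "し", 16: "た", 44: "わ"}  # inverse of the [0090] codes
START = 1


def restore(last: int) -> str:
    """Rebuild the registered string from the number of its last node."""
    chars = []
    t = last
    while t != START:
        r = CHECK[t]         # r <- CHECK[t]
        a = t - BASE[r]      # a <- t - BASE[r]
        chars.append(CHAR[a])
        t = r
    # Characters come out last-first ("#", "し", "た", "わ"); reverse them.
    return "".join(reversed(chars))


print(restore(2))  # prints わたし#
```

The trailing "#" is the end character; [0102] drops it when reading off "わたし".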

[0103] Thus, with a double array, the sequence of characters that traces the trie from the start node to a last node can be obtained from that last node's node number alone. This works because, whenever a node r moves to another node t via a character a, the pair (r, a) and the number t are stored in one-to-one correspondence.

[0104] The method of building the dictionary by registering words into the double array one at a time is disclosed in the paper referenced above, so it is not described here.

[0105] (Dictionary lookup procedure) FIG. 7 is a flowchart of the dictionary lookup process in the double-array embodiment. Given a character string c₁, c₂, …, c_s (where c_s is the end character), the process checks whether the corresponding word is registered and, if so, obtains that word's information (reading data or spelling data). The description below refers to this figure.

[0106] These processes are executed by the CPU 402 of the information processing device 401 while it controls the RAM 407, the hard disk 404, the CD-ROM drive 411, and so on. Areas for the variables t, r, and i are assumed to have been reserved in advance in registers, the RAM 407, or the like.

[0107] First, the integer 1 is assigned to the variable i (step S701). This is a counter recording how many characters have been scanned.

[0108] Next, the node number of the start node is assigned to the variable r (step S702).

[0109] Next, BASE[r] + c_i is computed and the result is assigned to the variable t (step S703).

[0110] Then, it is checked whether CHECK[t] = r (step S704). If not (step S704; No), the process reports that the character string is not registered (step S705) and ends.

[0111] If CHECK[t] = r (step S704; Yes), it is checked whether i = s (step S706). If not (step S706; No), i is incremented by 1 (step S707), the value of t is assigned to r (step S708), and the process returns to step S703.

[0112] If i = s (step S706; Yes), that is, if c_i is the end character, the identification number or address of the information recorded in BASE[t] is obtained, that information is output (step S709), and the process ends.
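A sketch of the FIG. 7 procedure, with the step numbers as comments. Assumptions: BASE and CHECK are mappings in which missing slots read as empty, the input is already a list of character codes c₁ … c_s ending with the end character, and BASE of the last node holds an information ID as described in [0096]. The value 7 stored in BASE[2] below is hypothetical, since FIG. 6 omits the spelling data; the name `look_up` is likewise illustrative.

```python
from typing import Mapping, Optional, Sequence


def look_up(base: Mapping[int, int], check: Mapping[int, int],
            codes: Sequence[int], start: int = 1) -> Optional[int]:
    """FIG. 7: return the information ID for c_1..c_s, or None if not registered."""
    s = len(codes)
    i = 1                                   # S701: character counter
    r = start                               # S702: start node
    while True:
        t = base.get(r, 0) + codes[i - 1]   # S703: t <- BASE[r] + c_i
        if check.get(t) != r:               # S704: does CHECK[t] = r hold?
            return None                     # S705: report "not registered"
        if i == s:                          # S706: c_i was the end character
            return base.get(t)              # S709: information held in BASE[t]
        i += 1                              # S707
        r = t                               # S708, then back to S703


BASE = {1: 1, 45: 2, 18: 1, 13: 1, 2: 7}   # BASE[2] = 7 is a hypothetical info ID
CHECK = {45: 1, 18: 45, 13: 18, 2: 13}
```

For example, the codes [44, 16, 12, 1] ("わたし#") reach node 2 and return the ID held in BASE[2], while [44, 16, 1] ("わた#") stop at step S705.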

[0113] (Phrase acquisition procedure) FIG. 8 is a flowchart of the phrase acquisition process. When information such as spelling data or reading data stores the node number of a last node, this process obtains the character string of the word that leads to that information. The description below refers to this figure. The procedure is executed on an information processing device 401 configured in the same way as for the dictionary lookup procedure. In addition, an array e for storing the character string expressing the word is prepared in advance.

[0114] First, the variable i is set to 1 (step S801).

[0115] The node number of the last node is assigned to the variable t (step S802).

[0116] Next, the value of CHECK[t] is obtained and assigned to the variable r (step S803).

[0117] Then, it is checked whether r equals the node number of the start node (step S804). If not (step S804; No), t - BASE[r] is computed and assigned to the variable a (step S805).

[0118] Then, the value of a is assigned to the i-th element of the array e (step S806).

[0119] Then, i is incremented by 1 (step S807), the value of r is assigned to t (step S808), and the process returns to step S803.

[0120] If r equals the node number of the start node (step S804; Yes), the array elements e[i-1], e[i-2], …, e[1] are output in this order (step S809), and the process ends.
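A sketch of the FIG. 8 procedure under the same assumptions as the lookup sketch (BASE and CHECK as mappings, start node 1, only the FIG. 6 slots quoted in the text). One deliberate deviation: read literally, step S804 tests r before the character on the arrow out of the start node has been stored, which would drop the first character ("わ" in the FIG. 6 example). This sketch therefore stores a = t - BASE[r] first and tests r afterwards, which matches the worked example in [0099] to [0102] and claim 11. The name `phrase_of` is illustrative.

```python
from __future__ import annotations

from typing import Mapping


def phrase_of(base: Mapping[int, int], check: Mapping[int, int],
              last: int, start: int = 1) -> list[int]:
    """FIG. 8: character codes c_1..c_s on the path from the start node to `last`."""
    e: list[int] = []            # S801: array e (len(e) + 1 plays the role of i)
    t = last                     # S802
    while True:
        r = check[t]             # S803: r <- CHECK[t]
        e.append(t - base[r])    # S805-S807: a <- t - BASE[r]; e[i] <- a; i <- i + 1
        if r == start:           # S804, tested after storing a (see above)
            break
        t = r                    # S808, then back to S803
    return e[::-1]               # S809: output e[i-1], e[i-2], ..., e[1]


BASE = {1: 1, 45: 2, 18: 1, 13: 1}
CHECK = {45: 1, 18: 45, 13: 18, 2: 13}
print(phrase_of(BASE, CHECK, 2))  # prints [44, 16, 12, 1]
```

The codes 44, 16, 12, 1 decode to "わ", "た", "し", and the end character.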

[0121]

[Effects of the Invention] As described above, the present invention provides a recording medium storing a dictionary data structure, a dictionary lookup method, a phrase acquisition method, a dictionary lookup device, a phrase acquisition device, and a recording medium storing a program, all suited to realizing an encrypted dictionary and fast dictionary lookup using that dictionary.

[Brief description of drawings]

FIG. 1 is a schematic diagram outlining a trie structure that implements part of a kana-kanji conversion dictionary.

FIG. 2 is a schematic diagram outlining a trie structure that implements part of a morphological analysis dictionary.

FIG. 3 is a schematic diagram outlining a trie structure in which parts of the above dictionaries are merged.

FIG. 4 is a schematic diagram of the general configuration of a typical information processing device that functions as the dictionary lookup device or the phrase acquisition device of the present invention.

FIG. 5 is a schematic diagram of the general configuration of a trie-structured dictionary according to the present invention.

FIG. 6 is an explanatory diagram showing how the trie-structured dictionary of the present invention is implemented with a double array.

FIG. 7 is a flowchart of the dictionary lookup procedure of the present invention.

FIG. 8 is a flowchart of the phrase acquisition procedure of the present invention.

[Explanation of symbols]

101 Start node
102 Character
103 Arrow
104 Node
105 End character
106 Spelling data
107 Reading data
401 Information processing device
402 CPU
403 ROM
404 Hard disk
405 Keyboard
406 Mouse
407 RAM
408 Display device
409 Interface
410 FD drive
411 CD-ROM drive

Continuation of the front page
(56) References cited:
JP-A-11-7451 (JP, A)
Jun'ichi Aoe, "Searching natural-language dictionaries: a fast digital search algorithm using a double array," bit, Japan, Kyoritsu Shuppan, May 1, 1989, No. 270, pp. 776-784.
Makoto Nagao, Satoshi Sato, Sadao Kurohashi, Tatsuhiko Tsunoda, Iwanami Lecture Series on Software Science 15: Natural Language Processing, Japan, Iwanami Shoten, October 5, 1999, 4th printing, pp. 250-253.
(58) Fields searched (Int.Cl.⁷, DB name): G06F 17/30 - 17/30 419; G06F 17/22 520; JICST file (JOIS)

Claims

(57) [Claims]

1. A computer-readable recording medium storing a dictionary data structure that holds the reading of a word and the spelling of the word in association with each other by holding character strings and information data in association with each other, wherein the dictionary data structure stores a start node n₁; the dictionary data structure holds a character string c₁, c₂, …, c_s of length s and information data d in association with each other by storing (a) a node n₂ in one-to-one correspondence with the pair (n₁, c₁) of the start node n₁ and the character c₁, (b) for each integer i (2 ≤ i ≤ s), a node n_{i+1} in one-to-one correspondence with the pair (n_i, c_i) of the node n_i and the character c_i, and (c) the information data d in one-to-one correspondence with the node n_{s+1} (hereinafter, the node n_{s+1} is called the "last node for the character string c₁, c₂, …, c_s"); and the dictionary data structure, using a first terminal character and a second terminal character that are distinguishable from other characters, holds the reading and the spelling of the word in association with each other by storing (p) "the character string of the reading with the first terminal character appended" in association with "the information data of the spelling", (q) "the character string of the spelling with the second terminal character appended" in association with "the information data of the reading", (r) as "the information data of the spelling", information that refers to "the last node for the character string of the spelling with the second terminal character appended", and (s) as "the information data of the reading", information that refers to "the last node for the character string of the reading with the first terminal character appended".

2. A dictionary lookup method for retrieving, from a recording medium storing the dictionary data structure according to claim 1, the information data held in association with a word expressed by a character string e₁, e₂, …, e_s of length s, the method comprising: a head acquisition step of acquiring the node n₂ stored in one-to-one correspondence with the pair (n₁, e₁) of the start node n₁ and the character e₁; a sequential acquisition step of sequentially acquiring, for each integer i (2 ≤ i ≤ s), the node n_{i+1} stored in one-to-one correspondence with the pair (n_i, e_i) of the previously acquired node n_i and the character e_i; and a data output step of acquiring and outputting the information data d stored in one-to-one correspondence with the acquired node n_{s+1}; wherein, when retrieving the spelling information data held in association with a reading, the reading with the first terminal character appended is used as the character string, and when retrieving the reading information data held in association with a spelling, the spelling with the second terminal character appended is used as the character string.

3. The dictionary lookup method according to claim 2, further comprising a reporting step of reporting that no information data is held in association with the word when no node stored in one-to-one correspondence can be acquired in the head acquisition step or the sequential acquisition step.

4. A phrase acquisition method for obtaining, from a recording medium storing the dictionary data structure according to claim 1, the word held in association with information data d held therein, the method comprising: a node acquisition step of acquiring the node m₀ that is stored in one-to-one correspondence with the information data d or that the information data d refers to; a head acquisition step of acquiring the pair (m₁, e₁) stored in one-to-one correspondence with the acquired node m₀; a sequential acquisition step of sequentially acquiring, for each integer j (1 ≤ j), the pair (m_{j+1}, e_{j+1}) stored in one-to-one correspondence with the previously acquired node m_j; and a character string output step of outputting, when the start node n₁ matches the head element m_k of one of the sequentially acquired pairs, the sequence e_k, e_{k-1}, …, e₂, e₁ of the last elements of the acquired pairs as the character string expressing the word held in association with the information data d; wherein, in the character string output step, when the information data is reading information data, the acquired character string with its trailing terminal character removed is output as the reading, and when the information data is spelling information data, the acquired character string with its trailing terminal character removed is output as the spelling.

5. A dictionary lookup device for retrieving, from a recording medium storing the dictionary data structure according to claim 1, the information data held in association with a word expressed by a character string e₁, e₂, …, e_s of length s, the device comprising: a head acquisition unit that acquires the node n₂ stored in one-to-one correspondence with the pair (n₁, e₁) of the start node n₁ and the character e₁; a sequential acquisition unit that, for each integer i (2 ≤ i ≤ s), sequentially acquires the node n_{i+1} stored in one-to-one correspondence with the pair (n_i, e_i) of the previously acquired node n_i and the character e_i; and a data output unit that acquires and outputs the information data d stored in one-to-one correspondence with the acquired node n_{s+1}; wherein, when retrieving the spelling information data held in association with a reading, the reading with the first terminal character appended is used as the character string, and when retrieving the reading information data held in association with a spelling, the spelling with the second terminal character appended is used as the character string.

6. The dictionary lookup device according to claim 5, further comprising a reporting unit that reports that no information data is held in association with the word when the head acquisition unit or the sequential acquisition unit cannot acquire a node because no node is stored in one-to-one correspondence.

7. A phrase acquisition device for obtaining, from a recording medium storing the dictionary data structure according to claim 1, the word held in association with information data d held therein, the device comprising: a node acquisition unit that acquires the node m₀ that is stored in one-to-one correspondence with the information data d or that the information data d refers to; a head acquisition unit that acquires the pair (m₁, e₁) stored in one-to-one correspondence with the acquired node m₀; a sequential acquisition unit that, for each integer j (1 ≤ j), sequentially acquires the pair (m_{j+1}, e_{j+1}) stored in one-to-one correspondence with the previously acquired node m_j; and a character string output unit that outputs, when the start node n₁ matches the head element m_k of one of the sequentially acquired pairs, the sequence e_k, e_{k-1}, …, e₂, e₁ of the last elements of the acquired pairs as the character string expressing the word held in association with the information data d; wherein, when the information data is reading information data, the character string output unit outputs the acquired character string with its trailing terminal character removed as the reading, and when the information data is spelling information data, it outputs the acquired character string with its trailing terminal character removed as the spelling.

8. The recording medium according to claim 1, wherein the dictionary data structure holds words, nodes, and their information data using an array BASE and an array CHECK; the start node n₁, the characters c₁, c₂, …, c_s of a held word, and the associated nodes n₂, …, n_s, n_{s+1} are expressed as integers; and, for each integer i (1 ≤ i ≤ s), the n_i-th element BASE[n_i] of the array BASE and the n_{i+1}-th element CHECK[n_{i+1}] of the array CHECK are held so that both of the two conditions n_{i+1} = BASE[n_i] + c_i and CHECK[n_{i+1}] = n_i are satisfied.

9. A dictionary lookup method for retrieving, from a recording medium storing the dictionary data structure according to claim 8, the information data held in association with a word expressed by a character string e₁, e₂, …, e_s of length s, the method comprising: a sequential acquisition step of, for each integer i (1 ≤ i ≤ s), sequentially acquiring the integer n_{i+1} = t_i when the sum t_i = BASE[n_i] + e_i of the n_i-th element BASE[n_i] of the array BASE and the character e_i satisfies the condition CHECK[t_i] = n_i; and a data output step of acquiring and outputting the information data d stored in one-to-one correspondence with the acquired integer n_{s+1}.

10. The dictionary lookup method according to claim 9, further comprising a reporting step of reporting that no information data is held in association with the word when the condition is not satisfied in the sequential acquisition step.

11. A phrase acquisition method for obtaining, from a recording medium storing the dictionary data structure according to claim 8, the word held in association with information data d held therein, the method comprising: a node acquisition step of acquiring the integer m₀ expressing the node that is stored in one-to-one correspondence with the information data d or that the information data d refers to; a sequential acquisition step of sequentially acquiring, for each integer j (0 ≤ j), from the previously acquired integer m_j, the integer m_{j+1} = CHECK[m_j] and the character integer e_{j+1} = m_j - BASE[m_{j+1}]; and a character string output step of outputting, when the integer n₁ expressing the start node matches one of the sequentially acquired integers m_k, the sequentially acquired character integers e_k, e_{k-1}, …, e₂, e₁ as the character string expressing the word held in association with the information data d.

12. A dictionary lookup device for retrieving, from a recording medium storing the dictionary data structure according to claim 8, the information data held in association with a word expressed by a character string e₁, e₂, …, e_s of length s, the device comprising: a sequential acquisition unit that, for each integer i (1 ≤ i ≤ s), sequentially acquires the integer n_{i+1} = t_i when the sum t_i = BASE[n_i] + e_i of the n_i-th element BASE[n_i] of the array BASE and the character e_i satisfies the condition CHECK[t_i] = n_i; and a data output unit that acquires and outputs the information data d stored in one-to-one correspondence with the acquired integer n_{s+1}.

13. The dictionary lookup device according to claim 12, further comprising a reporting unit that reports that no information data is held in association with the word when the condition is not satisfied in the sequential acquisition unit.

14. A phrase acquisition device for obtaining, from a recording medium storing the dictionary data structure according to claim 8, the word held in association with information data d held therein, the device comprising: a node acquisition unit that acquires the integer m₀ expressing the node that is stored in one-to-one correspondence with the information data d or that the information data d refers to; a sequential acquisition unit that, for each integer j (0 ≤ j), sequentially acquires, from the previously acquired integer m_j, the integer m_{j+1} = CHECK[m_j] and the character integer e_{j+1} = m_j - BASE[m_{j+1}]; and a character string output unit that outputs, when the integer n₁ expressing the start node matches one of the sequentially acquired integers m_k, the sequentially acquired character integers e_k, e_{k-1}, …, e₂, e₁ as the character string expressing the word held in association with the information data d.

15. A computer-readable recording medium storing a program that causes a computer to function as the dictionary lookup device according to claim 5, 6, 12, or 13, or as the phrase acquisition device according to claim 7 or 14.