JPH05266067A

JPH05266067A - Data retrieving device

Info

Publication number: JPH05266067A
Application number: JP4066079A
Authority: JP
Inventors: Jinchiai Ro; ロ・ジンチァイ; Chishien Rin; リン・チシェン
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1992-03-24
Filing date: 1992-03-24
Publication date: 1993-10-15
Anticipated expiration: 2013-09-10
Also published as: JP2795038B2

Abstract

PURPOSE: To retrieve desired data from a dictionary efficiently and within a minimal search range. A corresponding flag is checked first to determine whether data corresponding to a search code exist; if they do, the data are located using the corresponding address associated with that flag.

CONSTITUTION: Using the leading part of the search code entered from an input unit 11 as a search key, a search unit 14 detects the block corresponding to the search code in a correspondence unit 16. Next, the corresponding flag of the detected block is checked using the part of the search code that follows the leading part already used. If the flag is '0', no corresponding data exist, and an output unit 19 displays a message that the data were not found. If the flag is '1', corresponding data exist, and the data in the range bounded by the flag's corresponding address and the next address are read from a dictionary 18. A detection unit 17 then detects the data that match the search code.

Description

Detailed Description of the Invention

[0001]

[Field of the Invention] The present invention relates to a data retrieval device that can quickly detect the information corresponding to an input, for use in input and conversion systems such as language translation devices, word processors, and desktop publishing systems.

[0002]

[Prior Art] The practicality of a data retrieval device depends on its data storage space and its retrieval speed. One conventional storage and search method for large-capacity dictionary data is described in Japanese Patent Laid-Open No. 62-197822; a system block diagram of that embodiment is shown in FIG. 4(a). The master index, the sub-index, and the dictionary body are stored in files as shown in FIG. 4(b). The dictionary body 312 is divided into fixed-length blocks, and only the first headword of each block is stored in the sub-index 311. The sub-index is likewise divided into fixed-length blocks, and only the first headword of each of its blocks is stored in the master index 310. In FIG. 4(b), a11, a21, a31, ... are the first headwords of the blocks in the dictionary body, so they are stored in the sub-index; the first headwords of the sub-index blocks, a11, b11, ..., x11, are stored in the master index in the same way.

[0003] The master index file 310 is read into the master index area 307, and a sequential search (first stage) determines whether the character string to be searched matches the n-th headword of the master index file or lies between the n-th and (n+1)-th headwords. Next, the n-th sector of the sub-index file is read into the buffer area 308, and a sequential search (second stage) determines whether the string matches the m-th headword of that sub-index sector or lies between the m-th and (m+1)-th headwords. Then, using the offset value k into the dictionary data file that is set for each sub-index sector, the (k+m)-th sector of the dictionary data file is read. Finally, the headwords in that sector are compared with the string to be searched, and the matching headword is found (third stage).

[0004]

[Problems to Be Solved by the Invention] In the conventional example above, suppose the input phonetic symbol string is used for retrieval without preprocessing. The matching strings stored in the master index, the sub-index, and the dictionary body then have no fixed length. Each of the three sequential searches therefore compares phonetic symbol strings of different lengths, so the search space cannot be narrowed step by step and search efficiency cannot be improved.

[0005] Furthermore, the original phonetic symbol strings may be used as-is for searching. These strings have variable lengths but are stored in fixed-length blocks, so unused space of variable length remains at the end of each block, wasting dictionary space.

[0006]

[Means for Solving the Problems] To solve the above problems, the present invention provides a data retrieval device comprising the following elements:
(a) a dictionary that stores index codes and corresponding data;
(b) a correspondence unit composed of a plurality of blocks, each block consisting of:
  - a corresponding flag group made up of a plurality of bits, each bit indicating by a flag value of 0 or 1 whether data corresponding to a search code exist; and
  - a corresponding address group that, for each flag value of 1, stores the storage location of the index code and corresponding data in the dictionary;
(c) a search unit that detects the relevant block of the correspondence unit from the leading part of the input search code; and
(d) detection means that, by comparison with a part of the input search code, extracts the corresponding flag and corresponding address from the detected block and detects the data corresponding to the search code in the dictionary.

[0007]

[Operation] With the above configuration, the search unit uses the leading part of the input search code as a search key to detect the relevant block of the correspondence unit. The detection means then uses the part of the search code that follows this leading part to extract the corresponding address from the correspondence unit, which fixes the minimum search range in the dictionary. Finally, the corresponding data are detected in the dictionary by comparison with the search code from that part onward, and the output unit outputs the search result. A large-capacity dictionary data file can thus be searched at high speed.

[0008]

[Embodiment] The operation of the present invention is described below, using a search of Chinese dictionary data as the example. In this embodiment, Chinese reading symbols are encoded to form the codes used for search comparison. The words and their usage frequencies serve as the corresponding data.

[0009] Chinese has somewhat over 1,300 valid readings, so each reading must be represented with two bytes. This scheme saves dictionary storage space and is also convenient for data retrieval. The initial (声母) and medial (介音) of each Chinese reading are placed together in one byte. The final (韻母) and tone are placed in the other byte. Each byte is mapped into the ASCII character table as shown in FIG. 5. Each reading can then be represented by a unique two-byte code, referred to here as its search code. The first, second, and third bytes of a converted search code string are called the first, second, and third search codes.

[0010] Each Chinese phonetic symbol shown in FIG. 6 is assigned an order value. The input phonetic symbol string can then be converted into the corresponding search code by the simple decisions and calculations shown in FIG. 7. Take the reading [External character 1] as an example. According to the order values in FIG. 5(a), the initial [External character 2] is 10th in the initial order and the medial [External character 3] is 2nd in the medial order. The first search code is therefore computed as follows:

21H + 10 × 4 + 2 = 4BH; 4BH is the ASCII code for "K".

[0017] The final [External character 4] is 8th in the final order, and the tone "…" is 0th in the tone order. The second search code is therefore computed as follows:

[0020] 26H + 8 × 5 + 0 = 4EH; 4EH is the ASCII code for "N".
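The two conversion formulas can be written as a small Python function. This is a sketch that assumes the order values are the positions read from FIG. 5(a). The ranges medial < 4 and tone < 5 are not stated in the text; they are inferred from the multipliers 4 and 5. The function names are hypothetical.

```python
def first_code(initial: int, medial: int) -> str:
    """First byte of a syllable: 21H + initial x 4 + medial."""
    if not 0 <= medial < 4:  # range inferred from the multiplier 4
        raise ValueError("medial order out of range")
    return chr(0x21 + initial * 4 + medial)


def second_code(final: int, tone: int) -> str:
    """Second byte of a syllable: 26H + final x 5 + tone."""
    if not 0 <= tone < 5:  # range inferred from the multiplier 5
        raise ValueError("tone order out of range")
    return chr(0x26 + final * 5 + tone)


def syllable_code(initial: int, medial: int, final: int, tone: int) -> str:
    """Two-byte search code for one reading."""
    return first_code(initial, medial) + second_code(final, tone)


print(syllable_code(10, 2, 8, 0))  # KN
```

The call reproduces the worked example: initial 10, medial 2, final 8, and tone 0 give "KN". Packing two fields into each byte with fixed multipliers keeps every reading at exactly two bytes. This fixed width is what makes the fixed-position lookups in the later steps possible.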

[0021] Accordingly, the search code for the reading [External character 1] is "KN". FIG. 1 is a system block diagram of an embodiment of the data retrieval device of the present invention. The units in FIG. 1 are as follows:
- 11 is an input unit through which a phonetic symbol string of arbitrary length can be entered.
- 12 is a storage unit containing registers and buffers:
  - the R register 121 holds the search code to be compared during a search;
  - the H register 122 holds the upper bound for a binary search;
  - the P register 123 holds the lower bound for a binary search;
  - the B register 124 holds the size of the bounded range used in the binary search and in detecting the corresponding word;
  - the Q register 125 holds the code value to be compared during a binary search.
- 13 is a search code processing unit that converts the input phonetic symbol string into search codes according to the structural features of Chinese pronunciation.
- 15 is an index storage unit that stores the main index table and the sub-index table used for retrieval.

As shown in FIG. 8, the main index table stores the codes that can occur as a first search code. Each main-index entry points to a specific block of the sub-index table. That block stores all valid second search codes for the given first search code. The size of each sub-index block is determined by the difference between the pointers of two adjacent main-index entries. Searching the main index table, and then searching the sub-index table with the second search code, yields the corresponding block of the correspondence unit 16.
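The main index and sub-index layout of FIG. 8 can be modelled with parallel sorted lists. This is a minimal Python sketch, not the patented data format. Each main-index entry stores a pointer into the sub-index. The sub-index block for entry i is the half-open range between pointer i and pointer i+1; the last entry is assumed to run to the end of the sub-index. `bisect_left` stands in for the register-level binary search. All names and toy values are hypothetical, except 53128, which is the block position for "1(" in the later worked example.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


def sub_block_bounds(main_ptrs: List[int], i: int, sub_len: int) -> Tuple[int, int]:
    """Half-open sub-index range [start, end) for main-index entry i.

    The block size is the difference between adjacent pointers; the last
    entry is assumed to run to the end of the sub-index.
    """
    start = main_ptrs[i]
    end = main_ptrs[i + 1] if i + 1 < len(main_ptrs) else sub_len
    return start, end


def find_corr_block(main_codes: List[str], main_ptrs: List[int],
                    sub_codes: List[str], sub_blocks: List[int],
                    c1: str, c2: str) -> Optional[int]:
    """Return the correspondence-unit block position for (c1, c2), or None."""
    i = bisect_left(main_codes, c1)                  # main index: first search code
    if i == len(main_codes) or main_codes[i] != c1:
        return None                                  # invalid initial + medial
    lo, hi = sub_block_bounds(main_ptrs, i, len(sub_codes))
    j = bisect_left(sub_codes, c2, lo, hi)           # sub-index block: second search code
    if j == hi or sub_codes[j] != c2:
        return None                                  # invalid final + tone for this c1
    return sub_blocks[j]


# Toy tables (hypothetical except 53128).
main_codes = ["!", "1", "K"]
main_ptrs = [0, 2, 5]
sub_codes = ["&", "(", "&", "(", "N", "(", "N"]
sub_blocks = [100, 200, 300, 53128, 400, 500, 600]
print(find_corr_block(main_codes, main_ptrs, sub_codes, sub_blocks, "1", "("))  # 53128
```

A first search code that is not in the main index is rejected immediately. A second search code that is missing from the sub-index block is also rejected. This early rejection matches the per-level error detection the text claims for the design.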

[0022] The correspondence unit 16 consists of blocks that correspond to the hierarchical index tables of the index storage unit 15. As shown in FIG. 9, each block is divided into a corresponding flag group 161 and a corresponding address group 162. The corresponding flag group 161 records whether a word exists for each possible value of the code that follows the search codes used in the index storage unit 15 (in this embodiment, the third search code). As shown in FIG. 5, there are 88 possible third search codes. Representing each by one bit requires 11 bytes, so in this embodiment the corresponding flag group 161 is 11 bytes long. Each search code maps to a bit of the corresponding flag group 161 according to ASCII order, as shown in FIG. 5(a). If the bit is "1", a word corresponding to that search code exists; if the bit is "0", the search code has no corresponding word. As shown in FIG. 9, the corresponding address group 162 stores the index value for each bit of the corresponding flag group 161 that is set to "1". This value is the start address at which the words for that search code are stored in the dictionary 18. In this embodiment, each corresponding address is stored in 2 bytes.
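One block of the correspondence unit can be sketched as an 11-byte bitmap plus a list of addresses. This Python sketch rests on several assumptions that the text does not state:
- bit 0 corresponds to the code '!' (21H), which places '%' at the 5th bit, as in the later example;
- bits are numbered most-significant-first within each byte;
- the address for a set bit is found by counting the set bits before it (a rank operation);
- the address list ends with a sentinel, so that the "next address" of the last set bit is defined.

The names and toy addresses are hypothetical, apart from 72 and 138, which come from the worked example.

```python
FLAG_BASE = 0x21   # assumed: bit 0 corresponds to the code '!'
NUM_CODES = 88     # number of possible third search codes (from the text)
FLAG_BYTES = 11    # 88 bits packed into 11 bytes


def _bit(flags: bytes, i: int) -> bool:
    # Assumed bit order: most significant bit first within each byte.
    return bool(flags[i // 8] & (0x80 >> (i % 8)))


def build_flags(codes: str) -> bytes:
    """Set one flag bit for every code that has at least one word."""
    flags = bytearray(FLAG_BYTES)
    for c in codes:
        i = ord(c) - FLAG_BASE
        flags[i // 8] |= 0x80 >> (i % 8)
    return bytes(flags)


def lookup_range(flags: bytes, addresses: list, code: str):
    """Return (start address, size) for a third search code, or None if no word exists."""
    i = ord(code) - FLAG_BASE
    if not 0 <= i < NUM_CODES or not _bit(flags, i):
        return None
    k = sum(_bit(flags, j) for j in range(i))   # rank: set bits before bit i
    start, end = addresses[k], addresses[k + 1]  # this address and the next one
    return start, end - start


flags = build_flags("!%K")          # hypothetical block: words exist for '!', '%', 'K'
addresses = [0, 72, 138, 200]       # one address per set bit, plus an end sentinel
print(lookup_range(flags, addresses, "%"))   # (72, 66)
```

Storing addresses only for set bits keeps each block compact. A clear bit rejects the code without any dictionary access.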

[0023] As shown in FIG. 9, the dictionary 18 stores index codes, the corresponding words, and related information. The search code of the first character is already held in the index storage unit 15, so the dictionary 18 stores search codes starting from the second character. In this embodiment, the usage frequency of each word is stored in 1 byte as the related information.

[0024] The search unit 14 takes the first two search codes obtained by the search code processing unit 13 and consults the index tables of the index storage unit 15 by binary search. It then retrieves from the correspondence unit 16 the block corresponding to those two search codes. 17 is detection means that uses the corresponding flag and corresponding address of the block obtained by the search unit 14 to detect the corresponding word in the dictionary 18 within the minimum search space. The output unit 19 outputs the word detected by the detection means 17 and its associated information.

[0025] The search operation of the data retrieval device of this embodiment is described below with reference to the processing flows of FIGS. 2 and 3. First, in S1 and S2, the phonetic symbol string entered from the input unit 11 is stored in the B buffer 127. Then, in S3, the phonetic symbol string in the B buffer 127 is converted into search codes as in the processing of FIG. 6 and stored in the A buffer 126.

[0026] Processing then passes to the search unit 14. In S4, the lower bound is set to "0" and the upper bound to the number of entries in the main index table (55 in this embodiment). These values are stored in the P register 123 and the H register 122, respectively. Next, the first search code stored in the A buffer 126 is read into the R register 121. In S5 to S12, a binary search of the main index table is performed for the first search code in the R register 121, between the lower bound in the P register 123 and the upper bound in the H register 122. This search produces three results:
- the search start position in the sub-index is set in the P register 123;
- the position just before the block that follows that start position's block is set in the H register 122;
- the second search code is read from the A buffer 126 into the R register 121.

Then, in S13 to S19, a binary search of the sub-index table is performed for the second search code in the R register 121, within the bounds held in the P register 123 and the H register 122. The resulting position of the corresponding block in the correspondence unit 16 is stored in the Q register 125.
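Steps S5 to S19 can be sketched as one binary search routine that is reused for both index levels. This minimal Python sketch uses the patent's register names as variable names. It treats the bounds as inclusive, so the upper bound for a 55-entry main index is 54; the first midpoint is still 27. It uses the standard update rule: p = b + 1 when r is greater than q, and h = b - 1 when r is smaller. The toy table is hypothetical.

```python
from typing import Optional, Sequence


def register_binary_search(table: Sequence[str], r: str, p: int, h: int) -> Optional[int]:
    """Find code r in table[p..h] (inclusive); return its index or None."""
    while p <= h:
        b = (p + h) // 2        # B register: midpoint
        q = table[b]            # Q register: code at the midpoint
        if q == r:
            return b
        if r > q:               # compare in ASCII order
            p = b + 1           # P register: raise the lower bound
        else:
            h = b - 1           # H register: lower the upper bound
    return None


# Hypothetical 55-entry main index of single-character codes starting at '!'.
main_index = [chr(0x21 + i) for i in range(55)]
print(register_binary_search(main_index, "1", 0, len(main_index) - 1))  # 16
```

For the second level, the same routine would be called on the sub-index. In that call, p is the block's start pointer and h is the next block's pointer minus 1 (214 and 241 in the worked example).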

[0027] Next, the detection means performs S20 to S25. First, the third search code is read from the A buffer 126 and stored in the R register 121. Next, the block of the correspondence unit 16 at the position stored in the Q register 125 is read into the B buffer 127. In S22, the third search code in the R register 121 is checked against the corresponding flag group 161 of that block to determine whether a corresponding word exists. If there is none, the output unit 19 displays a message that the word was not found. If a corresponding word exists, S23 sets three registers:
- the address value in the corresponding address group 162 identified by the comparison in S22 is stored in the P register 123;
- the next address value is stored in the H register 122;
- the difference between the P register 123 and the H register 122 (that is, the minimum search space) is stored in the B register 124.

In S24, the data in the dictionary 18 are read into the B buffer 127, starting at the search start address in the P register 123 and extending for the length held in the B register 124. The first two search codes are then deleted from the A buffer 126. In S25, within the minimum search space given by the B register 124, the remaining search codes in the A buffer 126 are compared sequentially with the contents of the B buffer 127. The matching word and its associated information are output by the output unit 19.
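Steps S24 and S25 can be sketched as a bounded sequential scan. The patent does not specify the byte layout of dictionary records, so this Python sketch assumes a hypothetical layout, in this order:
- a 1-byte code length;
- the ASCII search codes from the second character onward;
- a 1-byte word length;
- the word in UTF-8 (the original encoding is not stated);
- a 1-byte usage frequency.

Only the slice dictionary[p : p + b] is examined. The records and frequency values are toy data.

```python
from typing import Iterator, Optional, Tuple


def encode_record(codes: str, word: str, freq: int) -> bytes:
    """Pack one record in the hypothetical layout described above."""
    w = word.encode("utf-8")
    return bytes([len(codes)]) + codes.encode("ascii") + bytes([len(w)]) + w + bytes([freq])


def parse_records(segment: bytes) -> Iterator[Tuple[str, str, int]]:
    """Yield (codes, word, frequency) for each record in a dictionary segment."""
    pos = 0
    while pos < len(segment):
        n = segment[pos]
        codes = segment[pos + 1:pos + 1 + n].decode("ascii")
        pos += 1 + n
        m = segment[pos]
        word = segment[pos + 1:pos + 1 + m].decode("utf-8")
        pos += 1 + m
        freq = segment[pos]
        pos += 1
        yield codes, word, freq


def detect(dictionary: bytes, p: int, b: int, remaining: str) -> Optional[Tuple[str, int]]:
    """S24: read only dictionary[p:p + b]; S25: compare records sequentially."""
    for codes, word, freq in parse_records(dictionary[p:p + b]):
        if codes == remaining:
            return word, freq
    return None


# Toy dictionary: 10 filler bytes, then two records (frequencies are made up).
rec_a = encode_record("%.", "打破", 3)
rec_b = encode_record("%.a&C+wV1G2#", "打破沙鍋問到底", 5)
dictionary = bytes(10) + rec_a + rec_b
p, b = 10, len(rec_a) + len(rec_b)
print(detect(dictionary, p, b, "%.a&C+wV1G2#"))
```

The scan never looks outside the b bytes selected by the flag and address lookup. This is the "minimum search space" the text describes.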

[0028] The operation of the present invention is described in detail below with reference to FIGS. 2 and 3, using the Chinese phonetic symbol string [External character 5] as an example. For clarity, the values of the R register 121, H register 122, P register 123, B register 124, and Q register 125 are denoted r, h, p, b, and q, respectively.

[0031] When this phonetic symbol string is entered, the input unit 11 stores it in the B buffer 127. The search code processing unit 13 then converts it into search codes as shown in FIGS. 6 and 7. The result is "1(%.a&C+wV1G2#", which is stored in the A buffer 126. Next, p is set to zero and h to the total number of entries in the main index table. In this embodiment, the main index table holds all valid combinations of Chinese initials and medials, 55 in all. The first search code '1' stored in the A buffer 126 is then written to the R register 121. For the binary search, the midpoint of h and p (27 in this embodiment) is stored in the B register 124. As shown in FIG. 8, the code 'E' found at that midpoint is written to the Q register 125. Next, q is compared with r in ASCII code order. If r is greater than q, p is set to b + 1; if r is smaller, h is set to b - 1. The binary search of S5 to S10 in FIG. 2 repeats this register update until q equals r. The comparison results of q and r and the resulting changes in each register value are shown in Table 1 below.

【００３２】[0032]

[Table 1]

[0033] As shown in FIG. 10, this yields the search start position 214 in the sub-index table and the search position 242 of the next block. Accordingly, p is set to 214 and h to 242 - 1 = 241.

[0034] Next, the second search code '(' stored in the A buffer 126 is written to the R register 121. For the binary search, the code 'E' at the midpoint held in the B register 124 is written to the Q register 125. The binary search of S13 to S17 in FIG. 2 then repeats the register update until q equals r. The comparison results of q and r and the resulting changes in each register value are shown in Table 2 below.

【００３５】[0035]

[Table 2]

[0036] As shown in FIG. 10, this yields the position 53128 of the block in the correspondence unit 16 for "1(" and the position 54044 of the next block.

[0037] Processing then passes to the detection means 17. First, the third search code '%' stored in the A buffer 126 is written to the R register 121, and the contents of the corresponding block retrieved above are written to the B buffer 127. As shown in FIG. 5(a), the third search code '%' occupies the 5th position in this embodiment, so the 5th bit of the corresponding flag group 161 is examined. That bit is set to 1, which indicates that words exist for the first three search codes. The corresponding address 72 for the 5th bit of the corresponding flag group 161 is stored in the P register 123. The next corresponding address, 138, is stored in the H register 122. The difference between p and h, 66, is stored in the B register 124. That is, 53128 is the start of the storage area of the corresponding words in the dictionary, and 66 is the range to be searched.
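The numbers in [0031] to [0037] can be checked with a few lines of Python. This sketch assumes that bit 1 of the corresponding flag group corresponds to the code '!' (21H). That assumption is consistent with '%' occupying the 5th position, but the text does not state it explicitly. The helper name is hypothetical.

```python
def split_search_code(search_code: str):
    """Return the first three search codes and the codes left after dropping the first two."""
    return search_code[0], search_code[1], search_code[2], search_code[2:]


first, second, third, remaining = split_search_code("1(%.a&C+wV1G2#")
bit_position = ord(third) - 0x21 + 1   # 1-based position in the flag group (assumed base '!')
p, h = 72, 138                         # corresponding addresses from the example
b = h - p                              # size of the range to examine
print(first, second, third, bit_position, b, remaining)  # 1 ( % 5 66 %.a&C+wV1G2#
```

The 14-character search code covers the seven syllables of the word, with two codes per syllable. '1' and '(' select the main-index and sub-index entries, and '%' selects the flag bit.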

[0038] Then the data in the dictionary 18 within the range b (66), counted from p (53128), are read into the B buffer 127. Next, the first two search codes stored in the A buffer 126 are deleted. Limited to the range b, the remaining codes "%.a&C+wV1G2#" are compared with the index codes of the words read into the B buffer 127. The word with the matching index code, 打破沙鍋問到底 (an idiom meaning roughly "to get to the bottom of a matter"), and its associated information are detected and output by the output unit 19.

[0039] In this embodiment, the search codes up to the third one drive a three-level search on the branch-and-bound principle. Because the search range shrinks at each level, search speed improves. Narrowing the search range also reduces search time. It further allows errors to be reported immediately at each level of the search, such as input that violates the pronunciation rules or mistyped input. In addition, the search file organization makes searching efficient. In this organization:
- the set of valid first search codes is stored in the main index table;
- the set of valid second search codes is stored in the sub-index table;
- the third search codes are stored in the corresponding flags.

The first two search codes occupy very little space, so keeping them in main memory saves storage area. The scheme is therefore highly practical for retrieving data such as Chinese.

[0040] As described above, the data retrieval device of the present invention uses the first three search codes to perform a three-level search on the branch-and-bound principle. Search speed improves as the search range is narrowed step by step. The device can also respond immediately to misentered readings at each level of the search. The search file organization makes searching efficient. In this organization:
- the set of valid first search codes is stored in the main index table;
- the set of second search codes is stored in the sub-index table;
- the third search codes are stored at the head of the dictionary blocks.

Keeping the first two search codes in main memory allows efficient branching searches using only a very small, fixed storage area. Both data storage space and search speed are taken into account for retrieving data such as Chinese, so the device is highly practical.

[0041] The present invention is not limited to the above embodiment and can be modified as appropriate without departing from its gist. For example, the input symbols are not limited to phonetic symbols; simplified Cangjie symbols may also be used. In that case, the main and sub-index tables of the index storage unit and the dictionary are built from simplified Cangjie first-and-last codes. The present invention can also readily be applied to searching Japanese dictionary data.

[0042]

[Effects of the Invention] As described above, the data retrieval device of the present invention provides corresponding flags and corresponding addresses in the correspondence unit. This saves storage area, minimizes the search range, and improves search speed. The device is highly practical.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of a data retrieval device according to an embodiment of the present invention.

FIG. 2 is a flowchart showing the processing steps of the embodiment.

FIG. 3 is a flowchart showing the processing steps of the embodiment.

FIG. 4(a) is a block diagram showing the configuration of a conventional data retrieval device, and FIG. 4(b) is an explanatory diagram of the conventional dictionary structure.

FIG. 5 is an explanatory diagram showing the coding of one type of Chinese phonetic symbol according to the present invention.

FIG. 6 is an explanatory diagram showing the code order of Chinese phonetic symbols.

FIG. 7 is a flowchart showing the operation of the search code processing unit of the present invention.

FIG. 8 is an explanatory diagram of the configuration of the index storage unit of the present invention.

FIG. 9 is an explanatory diagram of the configuration of the correspondence unit and the dictionary of the embodiment.

FIG. 10 is an explanatory diagram of the method of searching the index storage unit to find the corresponding block of the correspondence unit in the embodiment.

[Explanation of symbols]

11 Input unit
12 Storage unit
13 Search code processing unit
14 Search unit
15 Index storage unit
16 Correspondence unit
17 Detection means
18 Dictionary
19 Output unit
121 R register
122 H register
123 P register
124 B register
125 Q register
126 A buffer
127 B buffer
161 Corresponding flag group
162 Corresponding address group

Continuation of front page: (72) Inventor: Chishien Rin (リン・チシェン), c/o ロウスン・シャ・ティエン・チ・チ・シュー・カイ・ファー・クゥー・フェン・ユウ・シエン・コン・スー, No. 136, Section 3, Ren'ai Road, Da'an District, Taipei 10628, Taiwan

Claims

[Claims]

1. A data retrieval device comprising: a dictionary that stores index codes and corresponding data; a correspondence unit composed of a plurality of blocks, each block comprising a corresponding flag group and a corresponding address group, the corresponding flag group being made up of a plurality of bits each of which indicates by a flag value of 0 or 1 whether data corresponding to a search code exist, and the corresponding address group storing, for each flag value of 1, the storage location of the index code and corresponding data in the dictionary; a search unit that detects the relevant block of the correspondence unit using the leading part of an input search code; and detection means that, by comparison with a part of the input search code, extracts the corresponding flag and corresponding address from the detected block and detects the data corresponding to the search code in the dictionary.

2. A Chinese dictionary data retrieval device comprising: a dictionary that stores index codes and corresponding words; a correspondence unit composed of a plurality of blocks, each block comprising a corresponding flag group and a corresponding address group, the corresponding flag group being made up of a plurality of bits each of which indicates by a flag value of 0 or 1 whether a word corresponding to a search code exists, and the corresponding address group storing, for each flag value of 1, the storage location of the index code and corresponding word in the dictionary; a search unit that detects the relevant block of the correspondence unit using the code of the first character of the input search code; and detection means that, by comparison with the code of the second character of the input search code, extracts the corresponding flag and corresponding address from the detected block and detects the word corresponding to the search code in the dictionary.