JPH02156378A

JPH02156378A - Word retrieve processing system

Info

Publication number: JPH02156378A
Application number: JP63309967A
Authority: JP
Inventors: Akinori Ishibashi; 石橋　昭憲; Satoshi Asakawa; 浅川　悟志
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1988-12-09
Filing date: 1988-12-09
Publication date: 1990-06-15

Abstract

PURPOSE: To reduce the burden on the user by dividing the input character string when the stored maximum number of matching characters is smaller than the number of characters in the input character string to be searched, and instructing a new search that treats the longest matching character string as one word. CONSTITUTION: When the character code string in the input character code storage unit 1 is matched against a word in the dictionary by the character code comparison unit 3, the maximum number of matching characters and the corresponding address in the dictionary are stored in the longest-match storage unit 4. Based on the number of characters stored in storage unit 4, the comparison target character code control unit 5 splits the character code string with a delimiter code, and comparison against the character code storage unit 2 is executed again. That is, when the word to be searched for in the input character string is not stored in the dictionary in its original form, the system stores the maximum number of characters matching that word and the dictionary address at that point. By applying control such as division to the search target word based on this information, the optimal word information stored in the dictionary can be obtained with a single binary search.

Description

[Detailed Description of the Invention]

[Industrial Application Field]

The present invention relates to a word search processing system, and in particular to a word search processing system that applies control to the search target word so that the search result is obtained with only a single binary search.

[Conventional technology]

With the development of database technology and the growing complexity of user tasks, there is demand for an information retrieval system that lets not only dedicated operators but also users unfamiliar with computers search information stored in a database quickly and with simple operations. One approach that meets this demand is to query the computer in natural language.

Conventionally, the mainstream way to look up words for natural-language queries has combined the longest-match method with the binary search method.

In the longest-match method, the input character code string is cut to the length of the longest word stored in the word storage device and searched for. If no matching word is found, the last character code is deleted from the input character code string and the search is performed again on the shortened string.
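
The repeated truncation used by the longest-match method can be sketched as follows. This is a minimal Python illustration, not code from the publication: the function name, the use of a set as the word store, and the sample words are assumptions, and each membership test stands in for one full dictionary search.

```python
def longest_match_by_truncation(text, words):
    """Return (matched word or None, number of dictionary searches)."""
    # Cut the input to the length of the longest stored word.
    max_len = max(len(w) for w in words)
    candidate = text[:max_len]
    searches = 0
    while candidate:
        searches += 1                 # one full search per attempt
        if candidate in words:
            return candidate, searches
        candidate = candidate[:-1]    # delete the last character and retry
    return None, searches


words = {"クロ", "クロイロ", "シロ"}   # hypothetical sample dictionary
result = longest_match_by_truncation("クロッポイ", words)
```

The search count grows with the number of characters that have to be stripped, which is the cost the invention aims to avoid.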

In the binary search method, the input character code string is first compared with the word at the center of the word array. On a mismatch, the method decides which half of the array to search next and compares with the central word of that half; this operation is repeated until a matching word is found.
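
A binary search over a sorted word array can be sketched as below. The sketch assumes that Python's code-point string comparison is an acceptable stand-in for the ordering of the character codes; the function name and the word list are illustrative only.

```python
def binary_search(sorted_words, target):
    """Return the index of target in sorted_words, or None if absent."""
    lo, hi = 0, len(sorted_words) - 1
    while lo <= hi:
        mid = (lo + hi) // 2          # compare with the central word
        if sorted_words[mid] == target:
            return mid
        if target < sorted_words[mid]:
            hi = mid - 1              # continue in the lower half
        else:
            lo = mid + 1              # continue in the upper half
    return None


sorted_words = sorted(["チャ", "キイロ", "クロイロ", "クロ", "シロ"])
```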

A word search method using the longest-match method and the binary search method is described, for example, in Japanese Unexamined Patent Publication (Kokai) No. S57-109088. That method eliminates the repeated process of searching for the string cut to the longest word length in the word storage device and, when no matching word is found, deleting the last character code and searching again. Specifically, for each word in the word storage device, it designates and stores the word that matches from the head position and has the largest number of character codes among those shorter than the word itself, and uses this designated word as the output of the search.

[Problem to be solved by the invention]

Thus, conventional word search methods repeat the search from the beginning many times, deleting or changing one character of the search target word at a time until a matching word is found, which degrades the performance of the whole system. The method described in the above publication obtains the target word information with a single binary search, but only because the user adds specific information used for longest matching, and adding this information is a burden on the user. The specific information records, for each word in the word storage device whose head matches another word, the address of the longest such word. For example, "クロイロ" (kuroiro) stored at some address matches "クロ" (kuro) at another address A in its first two characters, so address A is stored at that address as specific additional information. Adding such information in advance is extremely tedious and time-consuming for the user. Moreover, whenever the user updates, adds, or deletes dictionary entries, the specific information must be changed as well, which wastes a great deal of time.
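
To make the maintenance burden concrete, the following sketch computes the kind of specific information described for the prior-art method: for each word, the address of the longest other word that is a proper prefix of it. The function name and the addresses are hypothetical, and the prior-art publication is not claimed to compute the links this way; the point is only that every link may change when the dictionary changes.

```python
def build_prefix_links(entries):
    """entries maps address -> word. Return address -> address of the
    longest other word that is a proper prefix of the word, or None."""
    links = {}
    for addr, word in entries.items():
        best = None
        for other_addr, other in entries.items():
            is_proper_prefix = len(other) < len(word) and word.startswith(other)
            if other_addr != addr and is_proper_prefix:
                if best is None or len(other) > len(entries[best]):
                    best = other_addr
        links[addr] = best
    return links


entries = {5: "クロ", 6: "クロイロ", 7: "シロ"}
links = build_prefix_links(entries)

# Adding one word (hypothetical address 9) changes the link of an existing word.
links_after_add = build_prefix_links({**entries, 9: "クロイ"})
```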

An object of the present invention is to solve these conventional problems and to provide a word search processing system that obtains the target word information at high speed with a single binary search, without storing words in the character code storage device with specific information added.

[Means to solve the problem]

To achieve the above object, the word search processing system of the present invention comprises means for storing an input character string to be searched, a character storage dictionary that stores character string words and the next search addresses associated with each word, and character comparison means that compares the input character string with a word in the character storage dictionary to determine whether they match. The system is characterized by further comprising: maximum match storage means that stores the maximum number of matched characters when the character comparison means determines that characters match consecutively from the head of a word; and comparison target character control means that is activated by the maximum match storage means and, when the stored maximum number of characters is smaller than the number of characters in the input character string to be searched, divides the input character string and instructs a new search that treats the longest matched character string as one word.

[Operation]

The present invention has two new processing functions: (a) when the input character code string is matched against a word in the dictionary, the number of longest-matching characters and the address of that word in the dictionary are stored; and (b) the comparison target character code control unit controls which word is designated as the search target. That is, when the search target word in the input character string is not stored in the dictionary in that exact form, the system stores the number of characters of the longest match with the search target word and the dictionary address at that point. By applying control such as division to the search target word based on this information, the optimal word information stored in the dictionary can be obtained with a single binary search.

[Embodiment]

An embodiment of the present invention is described in detail below with reference to the drawings.

FIG. 1 is a block diagram of a word search processing system according to one embodiment of the present invention.

In FIG. 1, reference numeral 1 denotes an input character code storage unit, in which the input character codes (the natural-language query in kana characters) are set. 2 denotes a character code storage unit, which stores words consisting of character code strings together with, as information attached to each word, the next search addresses (left and right pointers) for binary search. 3 denotes a character code comparison unit, which determines whether the input character code string matches a word stored in the character code storage unit 2 by comparing the magnitudes of the character codes. 4 denotes a longest-match storage unit: from the result of the comparison unit 3 it detects the number of characters that match consecutively from the head of the word currently being compared, compares that number with the maximum number of matching characters stored so far, and stores the larger. 5 denotes a comparison target character code control unit: when it receives a signal from the longest-match storage unit 4, it splits the input with a delimiter code according to the number of characters stored in unit 4 and restarts comparison against the character code storage unit 2. 6 denotes an output area, which holds the matched character code string output from the character code storage unit 2.

The character code comparison unit 3 makes its determination by comparing the magnitudes of the character codes one character at a time from the head, that is, by comparing them as binary numbers. Binary values that increase in a-i-u-e-o (gojūon) order are assigned as character codes, so ア (a) is the smallest, オ (o) is the largest in the a-row, and the codes grow successively larger through the ka-row, sa-row, ta-row, and so on.
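
The character-by-character magnitude comparison can be sketched as below. The sketch assumes Unicode code points as a stand-in for the binary character codes of the embodiment (for the katakana used here the two orderings agree), and it treats a string that ends early as the smaller one. The function name and return convention are illustrative.

```python
def compare_codes(input_codes, word):
    """Compare two strings one character at a time from the head.

    Returns (sign, matched): sign is -1, 0 or 1 when the input is smaller
    than, equal to, or larger than the word; matched is the number of
    characters that agree consecutively from the head.
    """
    matched = 0
    for a, b in zip(input_codes, word):
        if a != b:
            return (-1 if ord(a) < ord(b) else 1), matched
        matched += 1
    if len(input_codes) == len(word):
        return 0, matched
    # Every shared position agreed: the shorter string counts as smaller.
    return (-1 if len(input_codes) < len(word) else 1), matched
```

For example, comparing クロッポイ with クロイロ reports two matching characters and a larger input, because ッ has a larger code than イ.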

FIG. 4 is a diagram showing the contents of the character code storage unit in FIG. 1.

As shown in FIG. 4, the character code storage unit 2 has columns for the address, the reading of the word, the word characters, the left pointer, and the right pointer. A conventional character code storage unit additionally needed an extra column that stored, as specific additional information, the address of a character code string whose reading partially overlaps the reading of the word in that row.

As the word table in FIG. 4 shows, the words are arranged in a-i-u-e-o order from the smallest address upward. Therefore, when the first comparison with the central word finds that the input character code is smaller than the binary value assigned to that word, the pointer moves the comparison to an earlier array position, that is, to a word that begins with a character of smaller binary value. Repeating this narrows the search toward a match.
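
The table layout can be sketched as a small data structure. Only the entries and pointers that the text names are included; pointers the text does not state are left as `None`, which is an assumption, and the class and function names are hypothetical. Code-point order again stands in for the embodiment's character codes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    word: str
    left: Optional[int] = None    # next search address when the input is smaller
    right: Optional[int] = None   # next search address when the input is larger


# Subset of FIG. 4 as far as the text describes it.
TABLE = {
    4: Entry("キイロ", right=6),
    5: Entry("クロ"),
    6: Entry("クロイロ", left=5, right=7),
    7: Entry("シロ"),
    8: Entry("チャ", left=4, right=12),  # the word at address 12 is not given in the text
}


def addresses_in_code_order(table):
    """Check that words are stored in ascending code order by address."""
    words = [table[a].word for a in sorted(table)]
    return all(x < y for x, y in zip(words, words[1:]))
```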

In its character-by-character determination from the head, when the character code comparison unit 3 determines that the code at the current character position in the input character code string is larger than the code at the same position in the word currently being compared in the character code storage unit 2, it selects a predetermined one of the two next search addresses attached to that word. For example, if the first comparison is with "チャ" (cha) at the central address 8 and the first character of the input is larger than the binary value assigned to "チ", the comparison moves to the word at address 12, indicated by the right pointer. Conversely, when the input code is determined to be smaller, the other next search address is selected to designate the next word to compare. For example, if the comparison with "チャ" in FIG. 4 finds that the first character of the input is smaller than the binary value assigned to "チ", the comparison moves to the word at address 4, indicated by the left pointer.

If, on the other hand, the two codes at that character position are equal, the character position to be compared is advanced by one character and the above process is repeated.

When the input character codes and the character code string of the word match consecutively from head to tail, that character code string in the character code storage unit 2 is output to the output area 6.

When the strings do not match all the way from head to tail but match only in part, what happens next depends on the contents of the longest-match storage unit 4. From the result of the comparison in the character code comparison unit 3, the longest-match storage unit 4 detects the number of characters that match consecutively from the head of the word currently being compared, and compares it with the maximum number of matching characters already stored. If the current number is larger, it is stored as the new maximum together with the address of the current word in the character code storage unit 2. For example, comparing the input "クロイロノ" (kuroirono) with "クロ" (kuro) at address 5 in FIG. 4 matches the first two characters, so the count 2 and the address 5 are stored in the longest-match storage unit 4. Comparing next with "クロイロ" (kuroiro) at address 6 matches four characters, more than the stored count, so the contents of the longest-match storage unit 4 are rewritten to the count 4 and the address 6.
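
The update rule of the longest-match storage unit 4 can be sketched as a small class. The class name, method name, and return labels are assumptions made for illustration; the two calls reproduce the クロイロノ example from the text.

```python
class LongestMatchStore:
    """Holds the largest match count seen so far and its dictionary address."""

    def __init__(self):
        self.count = 0        # initial value before any comparison
        self.address = None

    def update(self, count, address):
        """Keep the larger count and report how the new count compares."""
        if count > self.count:
            self.count, self.address = count, address
            return "stored"
        return "equal" if count == self.count else "shorter"


store = LongestMatchStore()
first = store.update(2, 5)    # "クロ" at address 5 matches two characters
second = store.update(4, 6)   # "クロイロ" at address 6 matches four characters
```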

If, on the other hand, the current number of matching characters is smaller than the number already stored in the longest-match storage unit 4, a signal is sent to the comparison target character code control unit 5. If the current number equals the stored number, processing divides into two cases: (a) a next search address is attached to the word in the character code storage unit 2 (that is, the left or right pointer designates a next address), and (b) no next search address exists (the pointer does not designate a next address).

When a next search address is attached to the word being compared in the character code storage unit 2, one of the two predetermined next search addresses is selected. The information stored in the longest-match storage unit 4 is left unchanged.

When no next search address exists, a signal is sent to the comparison target character code control unit 5, just as when the current count is smaller than the count stored in the longest-match storage unit 4.
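
The conditions under which the control unit 5 is signalled can be condensed into one predicate. The function and parameter names are hypothetical. The text does not say what happens when the current match is longer than the stored one and no next search address exists; the sketch assumes a signal is sent in that case too, since nothing is left to compare.

```python
def needs_control_signal(current, stored, has_next_address):
    """Decide whether the comparison target control unit should be signalled.

    current: characters matched in this comparison
    stored:  maximum match count held before this comparison
    """
    if current < stored:
        return True                 # the match got shorter
    # Equal counts: continue only while a next search address exists.
    # Longer match with no next address: assumed to signal as well.
    return not has_next_address
```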

When the signal arrives from the longest-match storage unit 4, the comparison target character code control unit 5 splits the input character code string in the input character code storage unit 1 with a delimiter code (for example, a space code) at the position given by the maximum number of matching characters stored in the longest-match storage unit 4. It makes the character code comparison unit 3 treat the character codes up to the delimiter as the string to be judged, and restarts dictionary matching by magnitude comparison with the word at the address in the character code storage unit 2 that is held in the longest-match storage unit 4.

Dictionary matching at this stage only determines whether the comparison target word and a word in the character code storage unit 2 match.

If they match, the word is output to the output area 6. If they do not, the last character of the comparison target string is cut off and moved to the head of the character string that follows the delimiter code.

The string with its last character removed then becomes the new comparison target, and dictionary matching by magnitude comparison is performed starting from the word in the character code storage unit 2 designated by the address held in the longest-match storage unit 4.

At this point the data in the longest-match storage unit 4 is initialized.

For example, comparing the input character code string "クロッポイ" (kuroppoi) with "クロ" (kuro) in the character code storage unit 2 matches the first two characters, so the count 2 and the address are stored in the longest-match storage unit 4. If the next comparison is with "ク" (ku), the first character matches, but the count 1 is smaller than the stored count 2, so a signal is sent to the comparison target character code control unit 5.

The comparison target character code control unit 5 splits the input at the stored count 2 into "クロ" and "ッポイ", and restarts dictionary matching with "クロ". If the comparison here is with "ク" in the character code storage unit 2, there is no match, so the last character of "クロ" is cut off and moved to the head of "ッポイ" after the delimiter code, giving "ロッポイ". Dictionary matching then starts again with the shortened "ク" as the comparison target string.
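
The splitting and the character shift performed by the control unit 5 can be sketched with two small helpers. The names are hypothetical, and a space is used as the delimiter code, following the example the text gives.

```python
DELIMITER = " "  # the text names a space code as one possible delimiter


def split_at(text, count):
    """Split the input after the first `count` characters."""
    return text[:count], text[count:]


def shift_last_char(head, tail):
    """Move the last character of the comparison target to the head of the rest."""
    return head[:-1], head[-1] + tail


head, tail = split_at("クロッポイ", 2)
buffer = head + DELIMITER + tail              # input as seen by the comparison unit
new_head, new_tail = shift_last_char(head, tail)
```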

The output area 6 outputs the word in the character code storage unit 2 that the character code comparison unit 3 has found to match the input consecutively from the head up to the end code or the delimiter code.

FIG. 2 is a diagram of the binary tree traversed using the character code storage unit of FIG. 4, and FIG. 3 is an operation flowchart of one embodiment of the present invention.

A concrete example is described below with reference to FIGS. 2 to 4.

Consider "クロッポイ" (kuroppoi) as the input character code string. First, "クロッポイ" is set in the input character code storage unit 1 as the comparison target (step 31 in FIG. 3).

As shown in FIG. 2, this embodiment assumes that address 8 is stored in the initial address register of the character code storage unit 2. The character code comparison unit 3 therefore first compares "クロッポイ" with "チャ" (cha) at address 8 (step 32). Because binary values increasing in a-i-u-e-o order are assigned as character codes, the first character code of the input, "ク", is determined to be smaller than "チ", the first character of the word being compared (step 33). The left pointer 4 of "チャ" in FIG. 4 therefore designates "キイロ" (kiiro) as the next word to compare in the character code storage unit 2 (step 34). At this stage the longest-match storage unit 4 holds its initial value 0, so the number of longest-matching characters has not decreased (step 35), and processing returns to compare with a dictionary word again (step 32).

Comparing the first characters of "クロッポイ" and "キイロ", the input character code string is determined to be larger than the word referenced in the character code storage unit 2 (step 33), and the right pointer 6 of "キイロ" designates "クロイロ" (kuroiro) as the next word to compare (step 34).

In this case the first character "ク" and the second character "ロ" match. At the third character, the input is "ッ" and the word in the character code storage unit 2 has "イ", so the input character code string is determined to be larger. The right pointer 7 of "クロイロ" therefore designates "シロ" (shiro) as the next word to compare (step 34). The longest-match storage unit 4 stores the maximum matched length so far, 2, together with address 6 of the corresponding word "クロイロ" (step 35).

Next, the input "クロッポイ" is compared with "シロ" at address 7 (step 32). In the comparison of the first character, the input is determined to be smaller than the referenced word, and the word indicated by the left pointer of "シロ" is designated (step 34). However, the longest-match storage unit 4 holds a maximum of 2 matched characters, which is larger than the 0 characters matched this time, so an activation signal is sent from the longest-match storage unit 4 to the comparison target character code control unit 5 (step 35). This signal establishes that no word matching "クロッポイ" exists in the character code storage unit 2.

Using the maximum count 2 held in the longest-match storage unit 4, the control unit 5 splits "クロッポイ" in the input character code storage unit 1 into its first two characters "クロ" and the remaining string "ッポイ" (step 36). Then, based on the information in the longest-match storage unit 4, it determines whether a word matching "クロ" exists in the character code storage unit 2 (step 31). This determination restarts from address 6 of "クロイロ", the address held in the longest-match storage unit 4.

First, the character code comparison unit 3 compares "クロ" with "クロイロ" at address 6 (step 32). The first and second characters match, but at the third character the input holds the delimiter code (a blank code in this embodiment) while the stored word has "イ", so the input is determined to be smaller. Codes increase in a-i-u-e-o order, and the space is taken to be smaller than the first character "ア". The left pointer 5 of "クロイロ" therefore designates "クロ" as the next word to compare (step 34). Next, the input character code string "クロ" is compared with "クロ" at address 5 (step 32).

This time the input character code string and the word in the character code storage unit 2 match (step 33), so the information for the word "クロ" in the character code storage unit 2 is output to the output area 6 (step 37).

The word "クロ" at address 5 of the character code storage unit 2 is thus the word that matches the input character code string for the greatest length, consecutively from its head position.
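
The whole walkthrough can be reproduced with the following sketch. It is one simplified reading of the flowchart, not the implementation of the embodiment: the names are hypothetical, only the dictionary entries named in the text are included, code-point order stands in for the character codes, a string that ends early is treated as smaller (mirroring the delimiter being smaller than ア), and the retry loop over shorter prefixes condenses the shift-and-retry step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    word: str
    left: Optional[int] = None    # followed when the input is smaller
    right: Optional[int] = None   # followed when the input is larger


# Entries named in the text; the right pointer of address 8 (to address 12)
# is omitted because the word stored there is not given.
DICTIONARY = {
    8: Entry("チャ", left=4),
    4: Entry("キイロ", right=6),
    6: Entry("クロイロ", left=5, right=7),
    5: Entry("クロ"),
    7: Entry("シロ"),
}
ROOT = 8


def common_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def find_longest_word(text, dictionary=DICTIONARY, root=ROOT):
    """Return (matched word or None, list of addresses visited)."""
    best_len, best_addr = 0, None
    visited = []
    addr = root
    # First descent: ordinary search that remembers the longest prefix match.
    while addr is not None:
        entry = dictionary[addr]
        visited.append(addr)
        if entry.word == text:
            return entry.word, visited
        n = common_prefix_len(text, entry.word)
        if n > best_len:
            best_len, best_addr = n, addr
        elif n < best_len:
            break                      # match got shorter: signal the control step
        addr = entry.left if text < entry.word else entry.right
    # Control step: split at the stored count and restart from the stored address.
    for k in range(best_len, 0, -1):
        target = text[:k]
        addr = best_addr
        while addr is not None:
            entry = dictionary[addr]
            visited.append(addr)
            if entry.word == target:
                return entry.word, visited
            addr = entry.left if target < entry.word else entry.right
    return None, visited
```

For the input クロッポイ the sketch visits addresses 8, 4, 6, and 7, then restarts at 6 and finds クロ at address 5, which mirrors the steps described above.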

For a computer to understand the meaning of the words in a query sentence, the words must be checked against a dictionary, and the speed of this dictionary matching is an important factor in the performance of the whole system. In this embodiment, the newly provided longest-match storage unit 4 and comparison target character code control unit 5 make it possible to find, in a single search, the dictionary word that equals or best matches the search target word, which improves word search performance.

This embodiment also improves the performance of the word search system while eliminating the extra work that such performance gains previously required when building the dictionary and when changing its words. The conventional character code storage unit had to store each word with specific information attached as table information; the present invention omits this, reducing the user's workload. In particular, the user no longer has to revise the specific information whenever words in the character code storage unit are updated, added, or deleted.

[Effect of the invention]

As described above, according to the present invention, a single binary search during word lookup quickly identifies the character code string in the character code storage unit that matches the input character code string for the greatest length from its head.

A further advantage is that the user's workload when creating or updating the dictionary does not increase.

[Brief explanation of the drawing]

FIG. 1 is a functional block diagram of a word search processing system according to one embodiment of the present invention; FIG. 2 is a diagram of the binary tree illustrating the processing operation of FIG. 1; FIG. 3 is a flowchart of word search processing according to one embodiment of the present invention; and FIG. 4 is a diagram showing the contents of the character code storage unit in FIG. 1.

1: input character code storage unit, 2: character code storage unit, 3: character code comparison unit, 4: longest-match storage unit, 5: comparison target character code control unit, 6: output area.

Claims

[Claims]

1. A word search processing system comprising: means for storing an input character string to be searched; a character storage dictionary that stores character string words and a next search address associated with each character string word; and character comparison means for comparing the input character string with a word in the character storage dictionary to determine whether they match, the system further comprising: maximum match storage means for storing the maximum number of matched characters when the character comparison means determines that characters match consecutively from the head of a word; and comparison target character control means that is activated by the maximum match storage means and, when the stored maximum number of characters is smaller than the number of characters in the input character string to be searched, divides the input character string and instructs a new search using the longest matched character string as one word.