JPS5827529B2

JPS5827529B2 - Kanji/kana mixed sentence creation device

Info

Publication number: JPS5827529B2
Application number: JP54160586A
Authority: JP
Inventors: 清大井; 多喜子富士; 弘林
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1979-12-11
Filing date: 1979-12-11
Publication date: 1983-06-10
Also published as: JPS5682974A

Description

DETAILED DESCRIPTION OF THE INVENTION The present invention relates to a kanji/kana mixed sentence creation device. Such a device takes a character string input, for example, as a kana character string, matches the words in that string against the words stored in a word dictionary, determines each word based on its grammatical connection relationship with the preceding and/or following word and on the so-called longest match method, and converts the character string into a kanji/kana mixed sentence. In particular, the invention relates to a device in which a word extracted by the longest match method is determined and output without a check of its grammatical connection relationship when the match extends beyond a predetermined number of characters.

Recently, kana-kanji conversion devices that take a kana character string as input and convert it into a kanji/kana mixed sentence have been under development.

A kanji/kana mixed sentence creation device using such a conversion device must extract words from the input kana character string and convert them into kanji. Two methods are used effectively for this conversion: checking the grammatical connection relationships before and after an extracted word, and the so-called longest match method, which compares the input against the words stored in a word dictionary and extracts the longest matching entry.

In conventional devices of this kind, however, a word that has already been processed is treated as correct. If the following word does not have a valid grammatical connection to it, the following word is rejected, no matter how many characters of it match a stored word.

This conventional scheme is effective as long as the input character string is entered correctly, because the grammatical connection relationships then hold.

If the input character string contains an input error, however, the affected portion is converted incorrectly. The incorrect conversion may then propagate through the grammatical connection relationships, or the subsequent words may become impossible to convert.

The present invention aims to solve this problem. On the condition that a match with a stored word extends beyond a predetermined number of characters, the matched word is output as correct without the grammatical connection check. Processing can thereby proceed without abandoning the principle of checking grammatical connection relationships.

To this end, the kanji/kana mixed sentence creation device of the present invention is a device in which, for an input character string in which at least the portions to be converted into kanji are input as kana characters, the words in the character string are matched against a word dictionary and determined based on the grammatical connection relationship with the preceding and/or following word and on the longest match method, which selects the longest of the matches obtained against the words stored in the word dictionary, and the input character string is converted into a kanji/kana mixed sentence. The device comprises a word dictionary search unit and a grammar information check circuit unit. When matching the words in the input character string against the word dictionary, it extracts words by the longest match method. A word that matches beyond a predetermined number of characters is output without a check by the grammar information check circuit unit, and a word that matches at or below the predetermined number of characters is output after a check by the grammar information check circuit unit.

The invention is described below with reference to the drawings.

Fig. 1 shows the configuration of an embodiment of the present invention; Fig. 2 is an explanatory diagram of the grammatical connection relationships between words; Fig. 3 shows the configuration of an embodiment of the word dictionary; Fig. 4 shows an example of an input character string containing an input error; Fig. 5 shows an example of processing by a conventional device; and Fig. 6 shows an example of processing according to the present invention.

In Fig. 1, 1 denotes a character string input device, 2 a buffer register, 3 a word dictionary search device, 4 a word dictionary, 5 a grammar information check circuit unit, and 6 an output file.

Before the present invention is described with reference to Fig. 1, the concept of the conventional device and its problems are explained.

The conventional device can be regarded as the configuration of Fig. 1 without the route indicated by hatching.

In Japanese text, as shown in Fig. 2, it is known whether each word with a part of speech listed in the left-hand column can follow a word with a part of speech listed in the top row (an asterisk marks a position where the grammatical connection is permitted). Based on this, a word dictionary such as the one shown in Fig. 3 is prepared as the word dictionary 4 of Fig. 1.
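The connection table of Fig. 2 can be pictured as a set of permitted (preceding, following) part-of-speech pairs. The sketch below is only an illustration: the actual rows and columns of Fig. 2 are not reproduced in the text, so the labels and the permitted pairs are hypothetical.

```python
# Hypothetical part-of-speech labels; Fig. 2 itself is not available here.
# Each pair (preceding, following) plays the role of one asterisk in the table.
ALLOWED = {
    ("noun", "particle"),
    ("particle", "noun"),
    ("particle", "verb_stem"),
    ("verb_stem", "ending_u"),
    ("verb_stem", "ending_e"),
    ("ending_u", "noun"),  # e.g. 使う + noun
    # ("ending_e", "noun") is deliberately absent: 使え + noun is not permitted
}


def can_follow(prev_pos: str, next_pos: str) -> bool:
    """Return True if a word of next_pos may follow a word of prev_pos."""
    return (prev_pos, next_pos) in ALLOWED
```

A set of pairs is used here only for brevity; a two-dimensional bit matrix indexed by part of speech would model the table of Fig. 2 equally well.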

That is, the kanji corresponding to the word whose heading is 「ギン」 is stored in the form of a kanji code as 「銀」. Likewise, for headings such as 「ギンコウ」, 「ギンコウマン」, and so on, the kanji 「銀行」, 「銀行マン」, and so on are stored in the word dictionary 4 in the form of kanji codes.

For each heading/kanji code pair, the dictionary also gives the number of characters in the heading (heading length), the number of characters after conversion to kanji (kanji code length), and an index (INDX) indicating which parts of speech the word can follow and which parts of speech can follow it.
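One way to picture a dictionary entry of this kind is sketched below. The field names and the single-string form of the index are assumptions made for illustration; the text states only that each entry carries a heading, a kanji code, the two lengths, and INDX.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    heading: str  # kana heading used as the search key
    kanji: str    # kanji code, represented here as an ordinary string
    indx: str     # hypothetical stand-in for INDX (a part-of-speech label)

    @property
    def heading_len(self) -> int:
        """Number of characters in the heading (heading length)."""
        return len(self.heading)

    @property
    def kanji_len(self) -> int:
        """Number of characters after conversion (kanji code length)."""
        return len(self.kanji)


WORD_DICT = {
    e.heading: e
    for e in (
        Entry("ギン", "銀", "noun"),
        Entry("ギンコウ", "銀行", "noun"),
        Entry("ギンコウマン", "銀行マン", "noun"),
    )
}
```

In this sketch the lengths are derived from the strings rather than stored, which avoids inconsistent entries; the dictionary described in the text stores them explicitly.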

Suppose the word dictionary 4 is now searched with 「ギンコウ」. Both stored words 「銀」 and 「銀行」 match, but under the longest match method the one with the longer heading, 「銀行」, is extracted.
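The longest match lookup can be sketched as a search for the longest dictionary heading that is a prefix of the remaining input. This is a minimal illustration with a plain Python dict standing in for the word dictionary 4; the function name is hypothetical.

```python
from typing import Mapping, Optional


def longest_match(text: str, dictionary: Mapping[str, str]) -> Optional[str]:
    """Return the longest heading that is a prefix of text, or None."""
    # Try the longest possible prefix first and shrink one character at a time.
    for n in range(len(text), 0, -1):
        if text[:n] in dictionary:
            return text[:n]
    return None


DICT = {"ギン": "銀", "ギンコウ": "銀行", "ギンコウマン": "銀行マン"}
```

With this dictionary, `longest_match("ギンコウ", DICT)` selects the heading 「ギンコウ」 rather than 「ギン」, so the word 「銀行」 is extracted.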

The grammatical connection relationship is then checked by means of the index.

This is explained with reference to Fig. 1. A kana character string is input by the character string input device 1 and set in the buffer register 2.

The word dictionary search device 3 then searches the word dictionary 4 with the characters (a, b, c, d, e, f, g, ...) in the buffer register 2. If these correspond to the stored words (a), (a, b), (a, b, c), and (a, b, c, d) in the dictionary 4, then under the longest match method the word (a, b, c, d) is extracted as the single word present at the head of the characters (a, b, c, d, e, ...).

The character string (e, f, g, ...) is left in the buffer register 2, and the grammatical connection relationship of the extracted word (a, b, c, d) is checked by the grammar information check circuit unit 5.

That is, if a word (X, Y, Z) has been determined immediately before the extracted word, the grammatical connection relationship between the determined word (X, Y, Z) and the extracted word (a, b, c, d) is checked.

If the connection is grammatically valid, the extracted word (a, b, c, d) is determined to be correct and output to the output file 6.

If it is not grammatically valid, the word (a, b, c, d) is rejected and, for example, the next-longest matching word (a, b, c) is extracted.
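The conventional selection step described above (take the longest match, check its connection to the previously determined word, and fall back to the next-longest match on failure) might be sketched as follows. The part-of-speech labels, the dictionary layout, and the function names are assumptions for illustration only.

```python
def all_matches(text, dictionary):
    """Headings that are prefixes of text, longest first."""
    return [text[:n] for n in range(len(text), 0, -1) if text[:n] in dictionary]


def pick_conventional(text, prev_pos, dictionary, allowed):
    """Return (heading, kanji, pos) of the longest grammatically valid match."""
    for heading in all_matches(text, dictionary):
        kanji, pos = dictionary[heading]
        if (prev_pos, pos) in allowed:
            return heading, kanji, pos
        # Otherwise the candidate is rejected and the next-longest one is tried.
    return None


# Abstract example following the text: stored words a, ab, abc, abcd.
EXAMPLE_DICT = {
    "a": ("A", "noun"),
    "ab": ("AB", "noun"),
    "abc": ("ABC", "noun"),
    "abcd": ("ABCD", "verb"),  # hypothetical: does not connect to a particle
}
EXAMPLE_ALLOWED = {("particle", "noun")}
```

Here `pick_conventional("abcdefg", "particle", EXAMPLE_DICT, EXAMPLE_ALLOWED)` rejects the longest match (a, b, c, d) and falls back to (a, b, c).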

The conventional device processes input as described above, and the hatched route in Fig. 1 did not exist.

Consequently, if an input that should have been 「ギンコウデツカウジドウヨキンキハ・・・」 (銀行で使う自動預金機は・・・) is mistakenly entered as 「ギンコウデツカエジドウヨキンキハ・・」 as shown in Fig. 4, it is processed as shown in Fig. 5.

That is, even if the word 「自動預金機」 (automatic deposit machine) is extracted by the longest match method from the position following the mistyped 「え」, it is rejected, because a noun cannot connect to 「え」, a conjugational ending of a wa-row godan (five-grade) verb.

As a result, no word that connects to 「え」 can be found, and the text is output unconverted as 「じ」, 「ど」, and so on.

In some cases, if a word such as 「内線電話機」 (extension telephone) rather than 「自動預金機」 happened to follow the 「え」, the input could even be processed as 「銀行で使えない線で・・・」.

In other words, the location of the input error causes an erroneous conversion.

To improve on this, the present invention provides the hatched route shown in Fig. 1. For example, when a word extracted by the longest match method is longer than 「5」 characters, it is output as a correct word without the grammatical connection check by the grammar information check circuit unit 5.

Only words with a length of 「5」 characters or less are checked by the grammar information check circuit unit 5.

For this purpose it suffices for the word dictionary search device 3 of Fig. 1 to examine the length of the longest matching word and to pass only words of 「5」 characters or less to the grammar information check circuit unit 5.
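The routing decision made in the word dictionary search device 3 can be sketched as a single length comparison. The threshold of 5 is the example value given in the text; the function name and the returned labels are hypothetical.

```python
THRESHOLD = 5  # example value from the text


def route(heading: str, threshold: int = THRESHOLD) -> str:
    """Decide where a longest-matched word goes next.

    Words longer than the threshold bypass the grammar check (the hatched
    route of Fig. 1); all others go to the grammar information check.
    """
    if len(heading) > threshold:
        return "output_directly"
    return "grammar_check"
```

Because any shorter fallback candidate is necessarily shorter than the longest match, applying this comparison to each candidate in turn has the same effect as applying it to the longest match only.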

In this way, even when a character string containing an input error such as that of Fig. 4 is converted, the word 「自動預金機」 is determined to be correct as shown in Fig. 6, and conversion continues with 「は」 and so on.

In other words, even if 「使え」 is mistakenly entered where 「使う」 should have been, the conversion process returns to normal at the point where a long word such as 「自動預金機」 is extracted.
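The recovery behaviour can be illustrated end to end with a toy converter run on the Fig. 4 input. Everything below is a sketch under stated assumptions: the dictionary contents, the part-of-speech labels, and the connection table are hypothetical, and unconverted characters are simply passed through as entered, so the output does not reproduce Figs. 5 and 6 exactly.

```python
# Hypothetical dictionary: heading -> (converted text, part of speech).
DICT = {
    "ギン": ("銀", "noun"),
    "ギンコウ": ("銀行", "noun"),
    "デ": ("で", "particle"),
    "ツカ": ("使", "verb_stem"),
    "ウ": ("う", "ending_u"),
    "エ": ("え", "ending_e"),
    "ジドウヨキンキ": ("自動預金機", "noun"),
    "ハ": ("は", "particle"),
}
# Hypothetical connection table of permitted (preceding, following) pairs.
ALLOWED = {
    ("start", "noun"), ("noun", "particle"), ("particle", "noun"),
    ("particle", "verb_stem"), ("verb_stem", "ending_u"),
    ("verb_stem", "ending_e"), ("ending_u", "noun"),
}


def convert(text, threshold=None):
    """Convert kana text; threshold=None models the conventional device."""
    out, prev, i = [], "start", 0
    while i < len(text):
        chosen = None
        for n in range(len(text) - i, 0, -1):  # longest candidate first
            entry = DICT.get(text[i:i + n])
            if entry is None:
                continue
            kanji, pos = entry
            bypass = threshold is not None and n > threshold
            if bypass or (prev, pos) in ALLOWED:
                chosen = (n, kanji, pos)
                break
        if chosen is None:
            out.append(text[i])  # no valid word: pass one character through
            i += 1
        else:
            n, kanji, prev = chosen
            out.append(kanji)
            i += n
    return "".join(out)


MISTYPED = "ギンコウデツカエジドウヨキンキハ"
print(convert(MISTYPED))               # conventional behaviour
print(convert(MISTYPED, threshold=5))  # with the bypass for long words
```

Without the bypass, everything after the mistyped 「エ」 stays unconverted in this sketch; with a threshold of 5, the seven-character heading 「ジドウヨキンキ」 is accepted without the connection check and the following 「ハ」 converts normally again.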

As described above, the present invention makes it possible to break the chain of erroneous conversions caused by input errors and the like and to return to correct processing, without fundamentally abandoning the principles of the longest match method and the grammatical connection check.

[Brief explanation of drawings]

Fig. 1 shows the configuration of an embodiment of the present invention; Fig. 2 is an explanatory diagram of the grammatical connection relationships between words; Fig. 3 shows the configuration of an embodiment of the word dictionary; Fig. 4 shows an example of an input character string containing an input error; Fig. 5 shows an example of processing by a conventional device; and Fig. 6 shows an example of processing according to the present invention. In the figures, 1 denotes a character string input device, 2 a buffer register, 3 a word dictionary search device, 4 a word dictionary, 5 a grammar information check circuit unit, and 6 an output file.

Claims

[Claims]

1. A kanji/kana mixed sentence creation device in which, for an input character string in which at least the portions to be converted into kanji are input as kana characters, the words in the character string are matched against a word dictionary and determined based on the grammatical connection relationship with the preceding and/or following word and on the longest match method, which selects the longest of the matches obtained against the words stored in the word dictionary, and the input character string is converted into a kanji/kana mixed sentence, the device comprising a word dictionary search unit and a grammar information check circuit unit, wherein, when the words in the input character string are matched against the word dictionary, words are extracted by the longest match method, a word that matches beyond a predetermined number of characters is output without a check by the grammar information check circuit unit, and a word that matches at or below the predetermined number of characters is output after a check by the grammar information check circuit unit.