JPH0630098B2

JPH0630098B2 - Kana-Kanji converter

Info

Publication number: JPH0630098B2
Application number: JP59194307A
Authority: JP
Inventors: 泰男小山
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 1984-09-17
Filing date: 1984-09-17
Publication date: 1994-04-20
Anticipated expiration: 2009-04-20
Also published as: JPS6172361A

Description

[Technical field]

The present invention relates to a kana-kanji conversion device that converts Japanese text entered as kana codes into mixed kana-kanji text and outputs it.

[Prior art]

In conventional kana-kanji conversion devices (see, for example, Japanese Patent Laid-Open No. 57-127267), when a search of the Japanese dictionary finds no matching word, or when the input is grammatically wrong because of a typing mistake or the like, either the entire kana-kanji conversion fails or at least the affected phrase (bunsetsu) or word is reported as an error. Multi-phrase conversion, however, requires long kana strings to be entered, so it is very uneconomical for a single typing mistake to cause an error at the phrase or word level.

[Purpose]

In view of this problem, an object of the present invention is to provide a kana-kanji conversion device that, when a word is not in the dictionary, handles that word minimally, analyzes the remaining character string as effectively as possible, keeps re-entry to a minimum, and thereby enables economical input.

[Characteristics of the Invention]

The present invention is a kana-kanji conversion device that analyzes multiple phrases (bunsetsu) in a kana character string entered by key input or the like and performs kana-kanji conversion. It is characterized by comprising: phrase candidate generation means that searches a dictionary based on the input kana string and generates phrase candidates; phrase-end word verification means that grammatically analyzes the connection between a generated phrase candidate and the word located at the rear of that candidate, at the boundary position of the phrase to which the candidate belongs; and indefinite word generation means that extracts, as an indefinite word, any word that the phrase-end word verification means judges not to connect grammatically with the phrase candidate, and lets analysis of the rest of the input kana string continue.

[Example]

The invention is described in detail below with reference to the illustrated embodiment.

FIG. 1 is a schematic block diagram of one embodiment of the kana-kanji conversion device of the present invention. Reference numeral 1 denotes a data input unit such as a keyboard. Of the data entered through data input unit 1, only the kana data is accumulated in kana text receiving/storage unit 2. When data input unit 1 issues the control information that starts analysis, phrase candidate expansion means 3 takes kana position 1 as its base point and has dictionary search means 4 look up Japanese dictionary 5 from that point. Phrase candidate expansion means 3 then stores the phrase candidates in phrase candidate storage unit 7, together with the end position of each candidate that can end a phrase. Next, 1 is added to the base kana position and analysis starts again from the new base point. If at this time no position from the base point onward can end a phrase, indefinite word generation means 6 generates a one-character kana indefinite-word phrase and treats that position as a phrase end.

Once the phrase candidates have been expanded in this way and stored in phrase candidate buffer 8, phrase candidate joining verification means 9 checks the grammatical information using grammar check unit 10 and sets an error flag on any data in phrase candidate buffer 8 that contains a grammatical error. When this processing is complete, document candidate re-editing means 11 tests whether the data in phrase candidate buffer 8 can still be joined as a whole without the indefinite words. If it can, the indefinite-word data is deleted as erroneous, together with the data flagged by phrase candidate joining verification means 9, and the valid data is reorganized in phrase candidate buffer 8.
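The re-editing step can be illustrated with a minimal Python sketch. It is not the patent's implementation: the `Candidate` record, the function names `chain_exists` and `reedit`, and the sample candidates are all assumptions made for illustration, and the sketch assumes that grammar-flagged data is dropped in every case while indefinite words are dropped only when the text can be covered without them.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    start: int                # 1-based position of the first kana
    end: int                  # 1-based position of the last kana
    text: str
    indefinite: bool = False  # produced by the indefinite word generator
    error: bool = False       # flagged by the grammar check

def chain_exists(cands: list, length: int) -> bool:
    """True if some chain of candidates covers positions 1..length end to end."""
    reachable = {1}  # positions at which a phrase may start
    for c in sorted(cands, key=lambda c: c.start):
        if c.start in reachable:
            reachable.add(c.end + 1)
    return length + 1 in reachable

def reedit(buffer: list, length: int) -> list:
    """Drop flagged data, and drop indefinite words when they are not needed."""
    valid = [c for c in buffer if not c.error]
    definite = [c for c in valid if not c.indefinite]
    # Keep indefinite words only if the text cannot be covered without them.
    return definite if chain_exists(definite, length) else valid

# Hypothetical buffer contents for an 8-character input.
clean = [Candidate(1, 4, "きょうは"),
         Candidate(2, 2, "ょ", indefinite=True),
         Candidate(5, 8, "てんきが")]
broken = [Candidate(1, 1, "き"),
          Candidate(2, 2, "ょ", indefinite=True),
          Candidate(3, 3, "ぅ", indefinite=True),
          Candidate(4, 4, "は"),
          Candidate(5, 8, "てんきが")]

print([c.text for c in reedit(clean, 8)])
print([c.text for c in reedit(broken, 8)])
```

In the first buffer the definite candidates already cover the whole input, so the indefinite word is discarded. In the second they do not, so the indefinite words are kept.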

When this series of operations is complete, phrase candidate output means 12 outputs the reorganized data in phrase candidate buffer 8 to a CRT or similar display.

Next, the operation of the embodiment of FIG. 1 is described in more detail with reference to the flowchart of FIG. 2.

First, the kana character position pointer (not shown), which indicates a position within the kana string entered through data input unit 1 and accumulated in kana text receiving/storage unit 2, is set to 1 (<1>). Next, the pointer is compared with the number of characters in the input kana string (<2>). If the pointer value is less than or equal to the number of characters, processing proceeds to <3>. If the pointer value exceeds the number of characters, processing proceeds to <10> and the steps that follow it, described later.

In <3>, dictionary search means 4 looks up Japanese dictionary 5, and phrase candidate expansion means 3 stores the phrase candidates in phrase candidate storage unit 7, together with the end position of each candidate that can end a phrase. This point is explained in more detail with reference to FIG. 3.

In FIG. 3, 30 is the kana string entered from data input unit 1, and 31 is the phrase-end position table corresponding to its character positions. A flag is set in the table at the location corresponding to each position where a word that ends a phrase exists.

To give an example: first the independent word 「わたし」 (watashi, "I") is found. (Strictly speaking, the independent word 「わ」 (wa, 輪, "ring") is also found, but for convenience the explanation uses 「わたし」.) Then 「は」 (wa) is found as the ending that follows it, and the phrase 「わたしは」 is cut out. Phrase candidate expansion means 3 then verifies whether this phrase end is correct. When, as here, a valid word follows as the phrase end, a flag is set in the phrase-end position table at the corresponding position. If, however, 「ん」 had been entered in place of 「は」, the phrase-end verification would judge the word at the phrase end to be incorrect, and no flag would be set in the table.
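The flag-setting behaviour in this example can be sketched in Python as follows. This is an illustrative sketch only: the toy word list `STEMS`, the ending set `ENDINGS`, and the function name `mark_phrase_ends` are assumptions, and positions are 0-based here whereas the description counts from 1.

```python
# Hypothetical toy data: independent words and the endings that may follow them.
STEMS = {"わたし", "わ"}
ENDINGS = {"は", "が", "を"}

def mark_phrase_ends(kana: str, start: int) -> list:
    """Return a flag table with True at each 0-based position where a
    phrase beginning at `start` can validly end."""
    flags = [False] * len(kana)
    for stem in STEMS:
        if not kana.startswith(stem, start):
            continue
        end = start + len(stem)  # index of the character right after the stem
        if end < len(kana) and kana[end] in ENDINGS:
            flags[end] = True    # a valid phrase-final word sits at this position
    return flags

print(mark_phrase_ends("わたしは", 0))  # flag set under 「は」
print(mark_phrase_ends("わたしん", 0))  # no flag: 「ん」 is not a valid ending here
```

The shorter stem 「わ」 also matches at the start, but the character after it is not a valid ending, so it contributes no flag.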

Returning to the flowchart of FIG. 2: after <3>, it is checked whether a flag is set in the phrase-end table at the nearest phrase-end position following the stem of the independent word that was cut out (the position of 「は」 in the FIG. 3 example) (<4>). If the flag is set, the kana character position pointer is set to the flagged position plus one (<5>), and processing returns to <2>. If no flag is set at <4>, indefinite word generation means 6 generates a one-character indefinite word (<6>). That is, the kana character at the position where a flag should have been set is treated as an indefinite word. After <6>, the pointer is set to its previous value (which indicates the position of the indefinite word) plus one (<7>), and processing returns to <2>.

If, at <2>, the kana character position pointer exceeds the number of characters in the input kana string, it is then determined whether the word at the end of the text is valid as a sentence-final word (<10>). If it is not, the sentence-final word is made an indefinite word (<11>) and processing ends. If it is, processing ends as is.
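Steps <1> to <11> can be summarized as a short Python loop. This is a hedged sketch rather than the patented procedure: `segment`, `find_phrase_end`, `is_valid_final`, the phrase list `PHRASES`, and the sample input are hypothetical, and the dictionary lookup and flag test of <3> and <4> are collapsed into a single callback. As a simplification, <11> demotes the whole final piece.

```python
from typing import Callable, Optional

def segment(kana: str,
            find_phrase_end: Callable[[str, int], Optional[int]],
            is_valid_final: Callable[[str], bool]) -> list:
    """Split `kana` into (text, is_indefinite) pieces.

    find_phrase_end(kana, pos) returns the 1-based position of the flagged
    phrase end for a phrase starting at `pos`, or None if no flag is set.
    """
    pieces = []
    pos = 1                                       # <1> pointer starts at 1
    while pos <= len(kana):                       # <2> compare with input length
        end = find_phrase_end(kana, pos)          # <3>/<4> look up, test the flag
        if end is not None:
            pieces.append((kana[pos - 1:end], False))
            pos = end + 1                         # <5> resume after the phrase end
        else:
            pieces.append((kana[pos - 1], True))  # <6> one-character indefinite word
            pos += 1                              # <7> advance by one
    # <10>/<11> demote the last piece if it cannot end the text
    if pieces and not pieces[-1][1] and not is_valid_final(pieces[-1][0]):
        pieces[-1] = (pieces[-1][0], True)
    return pieces

PHRASES = ["わたしは", "ほんを", "よむ"]  # hypothetical dictionary of whole phrases

def toy_find(kana: str, pos: int) -> Optional[int]:
    for p in PHRASES:
        if kana.startswith(p, pos - 1):
            return pos - 1 + len(p)   # 1-based position of the phrase's last kana
    return None

# A stray 「ん」 was typed after 「わたしは」.
result = segment("わたしはんほんをよむ", toy_find, lambda w: True)
print(result)
```

Only the stray character is set aside as an indefinite word; the phrases before and after it are still recognized, which is the behaviour the flowchart aims for.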

FIG. 4 shows an example in which, during expansion by phrase candidate expansion means 3 of FIG. 1, indefinite word generation means 6 generated indefinite words as candidates. It takes a relatively common typing mistake: 「きょうはてんきが」 (kyou wa tenki ga, "today the weather ...") was intended, but 「きょぅはてんきが」 was entered, with a small 「ぅ」 in place of 「う」.

For convenience, the Japanese dictionary used here is assumed to contain only the entries shown.

In this case, the second and third characters, 「ょ」 and 「ぅ」, are each generated as one-character indefinite words.

As a result, analysis of 「きょぅは」, which corresponds to the intended 「きょうは」 in the first half, fails, but analysis of the second half, 「てんきが」, succeeds.

[Effect]

As described above, the present invention grammatically analyzes the connection between a generated phrase candidate and the word located at the rear of that candidate, at the boundary position of the phrase to which the candidate belongs, and extracts indefinite words based on the result. Indefinite words can therefore be extracted by examining only a very small number of word combinations, which makes the extraction simple and fast.

Moreover, because the extraction is based on grammatical analysis, indefinite words can be extracted even when a typing mistake produces a sequence of words that do not connect grammatically (for example 「わたしあ」, where each individual word is in the dictionary).
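The 「わたしあ」 case can be illustrated with a small connection-table check in Python. The lexicon, the part-of-speech labels, the connection table, and the function names below are all assumptions chosen for illustration; the patent does not specify how its grammar check is encoded.

```python
from typing import Optional, Tuple

# Hypothetical toy lexicon: surface form -> part of speech.
LEXICON = {"わたし": "noun", "は": "particle", "あ": "interjection"}
# Hypothetical connection table: (left, right) pairs allowed inside one phrase.
CONNECTS = {("noun", "particle")}

def connects(stem: str, tail: str) -> bool:
    """True if `tail` may grammatically follow `stem` inside one phrase."""
    return (LEXICON.get(stem), LEXICON.get(tail)) in CONNECTS

def split_indefinite(stem: str, tail: str) -> Tuple[str, Optional[str]]:
    """Return (phrase, indefinite_word) for a stem followed by a tail word."""
    if connects(stem, tail):
        return stem + tail, None
    return stem, tail  # tail does not connect: extract it as an indefinite word

print(split_indefinite("わたし", "は"))
print(split_indefinite("わたし", "あ"))
```

Both 「わたし」 and 「あ」 are in the toy lexicon, yet 「あ」 is still set aside because the pair is not in the connection table. A dictionary lookup alone would not detect this.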

[Brief description of drawings]

FIG. 1 is a functional block diagram outlining one embodiment of the kana-kanji conversion device of the present invention.
FIG. 2 is a block flowchart of the operation of the embodiment of FIG. 1.
FIG. 3 is a diagram explaining part of the block flowchart of FIG. 2 in more detail.
FIG. 4 is a concrete example of phrase candidate expansion to which the present invention is applied.

3: phrase candidate expansion means
6: indefinite word generation means

Claims

[Claims]

1. A kana-kanji conversion device that analyzes multiple phrases (bunsetsu) in a kana character string entered by key input or the like and performs kana-kanji conversion, the device comprising: phrase candidate generation means that searches a dictionary based on the input kana character string and generates phrase candidates; phrase-end word verification means that grammatically analyzes the connection between a generated phrase candidate and the word located at the rear of that candidate, at the boundary position of the phrase to which the candidate belongs; and indefinite word generation means that extracts, as an indefinite word, a word that the phrase-end word verification means judges not to connect grammatically with the phrase candidate, and lets analysis of the rest of the input kana character string continue.