JP2766084B2

JP2766084B2 - Kana-Kanji conversion method

Info

Publication number: JP2766084B2
Application number: JP3043201A
Authority: JP
Inventors: 正博阿部; 正紀川瀬
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1991-03-08
Filing date: 1991-03-08
Publication date: 1998-06-18
Anticipated expiration: 2013-06-18
Also published as: JPH04211862A

Description

[0001] [Field of Industrial Application] The present invention relates to a kana-kanji conversion device that converts kana input into mixed kanji-kana text, and more particularly to a kana-kanji conversion device suited to so-called solid input (betagaki), in which the input can be converted even though it is not divided into phrase (bunsetsu) units.

[0002] [Prior Art] Well-known methods of kana-kanji conversion for solid input include the longest-match method combined with backtracking, described in "HITAC L-320/30H, 50H Document Processing Functions", Hitachi Hyoron, Vol. 63, No. 5, May 1981 (hereinafter Reference (1)); the two-phrase longest-match method, described in "Kana-Kanji Conversion System for Solid-Written Sentences", Proceedings of the 19th National Convention of the Information Processing Society of Japan (1978), 5E-4 (hereinafter Reference (2)); and the minimum-phrase-count method, described in "On a Phrase Structure Analysis Algorithm Using a Table Method and Its Efficiency", Information Processing Society of Japan, Special Interest Group on Computational Linguistics, report 25-6 (1981) (hereinafter Reference (3)).

[0003] The method of Reference (1) cuts an independent word (jiritsugo) out of the input, starting from the left end, by longest match, and then attaches function words (fuzokugo) that can grammatically follow that independent word. This phrase conversion is repeated until the right end of the input is reached. If the right end cannot be reached, the method backtracks and tries a different phrase conversion before proceeding.

[0004] The method of Reference (2) exhaustively extracts every possible conversion candidate spanning two phrases from the left end of the input and selects the candidate whose two phrases have the greatest combined length.
The end of the first of those two phrases is then taken as the first phrase boundary, and the same procedure is repeated from that point, so that boundaries are fixed one phrase at a time.

[0005] The method of Reference (3) exhaustively cuts phrases out of the input from the left end and selects the combination with the fewest phrases as the conversion result.

[0006] [Problem to Be Solved by the Invention] In solid input there is freedom in where to place the boundaries between phrases, so more conversion ambiguity generally arises than when the input is divided into phrases. For example, the input 「すうがくかいせきじょうでは」 (sūgaku kaiseki jō de wa) admits many interpretations, including 「数学解析上では」 ("in mathematical analysis"), 「数学科移籍上では」 ("in transferring to the mathematics department"), 「数学会席上では」 ("at a meeting of the mathematical society") and 「数学か遺跡上では」 ("mathematics, or on the ruins"). It is therefore difficult to obtain high conversion accuracy even with the methods of References (1), (2) and (3).

[0007] An object of the present invention is to provide means, suited to solid input, for efficiently extracting and holding a plurality of conversion candidates with a high likelihood of being correct, and for easily and quickly selecting and confirming the correct conversion result among them.

[0008] [Means for Solving the Problem] To achieve the above object,
the kana-kanji conversion method of the present invention performs kana-kanji conversion with a kana-kanji conversion device comprising: character string input means for inputting a kana character string that contains a plurality of phrases and is not divided into phrase units; kana-kanji conversion means for cutting a plurality of phrases out of the kana character string and performing kana-kanji conversion by extracting, for each phrase, at least one conversion candidate beginning with the first character of that phrase; and display means. In response to an input instructing kana-kanji conversion, a candidate character string made up of the extracted conversion candidates with the highest likelihood for each phrase is displayed on the display means, and the conversion candidate for the phrase of interest is displayed so as to be distinguishable from the others. In response to an input that changes the phrase boundary of the phrase of interest, conversion candidates obtained by re-evaluating the conversion likelihood of the part of the kana character string following the changed boundary are displayed on the display means.

[0009] [Operation] Because candidates with a high likelihood are selected and held in this way, another candidate can be displayed and selected quickly even when a conversion error occurs, and the storage needed for this is kept to a minimum. Moreover, when a conversion error is corrected by selecting another candidate, not only that part but also the subsequent candidates are re-evaluated. For example, if the position of a boundary between candidates changes, the sequence of following candidates starting at the new boundary is prepared automatically, which reduces the effort of later corrections.

[0010] [Embodiment] An embodiment of the present invention is described below.

[0011] FIG. 1 shows the configuration of the kana-kanji conversion device of the present invention, which consists of an input/output unit 1, a control unit 2 and a storage unit 3. The input/output unit 1 comprises a keyboard for entering kana and conversion and selection instructions, and a display for showing conversion results and selection candidates.

[0012] The control unit 2 controls the execution of kana-kanji conversion. The storage unit 3 temporarily holds the input kana character string and the data used in conversion.

[0013] FIG. 2 shows how conversion is carried out with this kana-kanji conversion device. The user enters a kana character string from the input/output unit 1 and instructs conversion (10). For this kana input, the control unit 2 performs a conversion process (20) that creates candidates with a high likelihood and stores them in the storage unit 3. Next, from the candidates in the storage unit 3, the control unit 2 creates the candidate sequence with the highest likelihood together with the alternative candidates available for its first candidate (30), and has the input/output unit 1 display them (40). The user selects the desired candidate from the selection candidates (50).
The control unit 2 then confirms the selected portion and learns the selected candidate (60). Next, the control unit 2 performs the display candidate creation process (30) again on the unconfirmed portion, creating and displaying the conversion result and alternative candidates for the following portion. The display (40), selection (50), confirmation (60) and display candidate creation (30) are repeated until all conversion results have been confirmed.

[0014] The operation of the control unit 2 described above is explained in more detail below.

[0015] FIG. 3 is a flowchart of the conversion process (20), and FIG. 4 shows an example of the data that the conversion process (20) creates in the storage unit 3.
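As a reading aid, the FIG. 2 cycle (conversion 20, display candidate creation 30, display 40, selection 50, confirmation 60) can be written as a small control loop. This is a hypothetical Python sketch, not the device's implementation: `run_session` and the four callables `convert`, `make_display`, `choose` and `confirm` are illustrative names standing in for processes 20, 30, 40/50 and 60, and P is 0-based here rather than starting at 1 as in step 208.

```python
from typing import Any, Callable, List, Tuple


def run_session(
    kana: str,
    convert: Callable[[str], Any],
    make_display: Callable[[Any, int], Tuple[List[Any], List[Any]]],
    choose: Callable[[List[Any], List[Any]], Any],
    confirm: Callable[[Any, int, Any], int],
) -> List[Any]:
    """Drive the FIG. 2 cycle until every part of `kana` is confirmed.

    Hypothetical stand-ins:
      convert      -> conversion process (20), builds the candidate network
      make_display -> display candidate creation (30): (result, alternatives)
      choose       -> display (40) and user selection (50)
      confirm      -> confirmation (60), returns the new start P of the
                      unconfirmed part
    """
    network = convert(kana)            # process 20
    p = 0                              # step 208: nothing confirmed yet (0-based)
    confirmed = []
    while p < len(kana):               # repeat until everything is confirmed
        result, alternatives = make_display(network, p)   # process 30, shown in 40
        choice = choose(result, alternatives)             # selection 50
        new_p = confirm(network, p, choice)               # process 60
        if new_p <= p:                 # added guard: P must move forward
            raise ValueError("confirmation must advance P")
        p = new_p
        confirmed.append(choice)
    return confirmed


# Usage with trivial stubs that always accept the first candidate.
network = {0: [("数学", 4)], 4: [("解析", 4)], 8: [("上では", 5)]}
picked = run_session(
    "すうがくかいせきじょうでは",
    convert=lambda kana: network,
    make_display=lambda net, p: ([s for s, _ in net[p]], net[p]),
    choose=lambda result, alternatives: alternatives[0],
    confirm=lambda net, p, choice: p + choice[1],
)
print([s for s, _ in picked])  # ['数学', '解析', '上では']
```

In the device, `choose` is the user's keyboard selection; the guard against a P that does not advance is an added safety check that the description does not specify.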
The conversion process (20) of FIG. 3 is explained concretely below using the example of FIG. 4.

[0016] Suppose the input character string is 「すうがくかいせきじょうでは」. In FIGS. 3 and 4, n is a pointer giving a character position counted from the start of the input character string. In step 201 of FIG. 3, n is first set to 1, positioning it at the start of the input character string. Step 202 checks whether position n is a phrase boundary, that is, whether some previously extracted phrase ends immediately before it; the start of the character string is treated as a phrase boundary as a special case. In step 203, phrase conversion is performed: one phrase is converted with position n as its first character, and the possible conversion candidates are extracted. Here a phrase means one independent word, optionally preceded by a prefix and optionally followed by a suffix and by function words. A compound word is treated as a phrase. In the example of FIG. 4, 「数学」 (mathematics), 「数」 (number), 「吸う」 (to suck), 「数学会」 (mathematical society), 「数学界」 (mathematical world), 「数学階」 (mathematics floor), 「数が」 and 「吸うが」 (with the particle が) and others are extracted as candidates from the start. A well-known method for this phrase-by-phrase kana-kanji conversion is the one described in "Kana-Kanji Conversion by Computer", NHK Technical Research, Vol. 25, No. 5, May 1973. Next, in step 204, the likelihood of each conversion candidate is evaluated to decide whether it should be kept in the storage unit 3 or discarded (this is called pruning). Basically, as in References (1), (2) and (3), the likelihood is defined so that decomposing the input character string into a sequence of longer phrases, in other words into fewer phrases, gives a higher likelihood. However, adnominals such as 「この」 (this) and 「その」 (that) and formal nouns such as 「こと」 (koto) and 「もの」 (mono) are often used attached to other phrases, so they are not counted as independent phrases the way nouns and verbs are; when phrases are counted, they contribute less than 1. In this way the phrase count is computed with weights that reflect part of speech, frequency of occurrence and the like, and the smaller the weighted count, the higher the likelihood. Specifically, nouns, verbs, adjectives and adjectival verbs receive a weight of 1; formal nouns, auxiliary verbs, adnominals and the like receive 0.1; and prefixes and suffixes, treated as semi-independent words, receive 0.5.

[0017] For pruning, the likelihood from the start of the character string to the end of the phrase currently being evaluated is computed and taken as the likelihood at the character position of that phrase end. If a likelihood has already been obtained at the same character position, the new value is compared with it, and the candidate is pruned if the new value exceeds a tolerance. In this embodiment, the tolerance is the minimum weighted phrase count at that character position plus 1. Needless to say, the present invention is not limited by how the likelihood is defined or by the size of the pruning tolerance. A feature of the present invention is that all phrase candidates whose likelihood lies within a certain range are extracted and held in the storage unit 3, so that later selection and correction can be performed quickly and easily.

[0018] In FIG. 4, all the phrases extracted from the start are stored in the storage unit 3 (step 205).
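Steps 201 to 207, with the part-of-speech weights and the pruning tolerance described above, can be sketched as a left-to-right lattice construction. The Python below is a minimal sketch under stated assumptions, not the patent's code: `make_converter` and the toy `LEXICON` are hypothetical stand-ins for the phrase conversion of step 203 (the NHK method is not reproduced), each candidate carries one part-of-speech tag that fixes its weight, weights are stored in tenths so comparisons stay exact, and the count at a phrase's start is taken to be the smallest count stored at that boundary. Boundaries are 0-based, so the description's n = 3 corresponds to boundary 2.

```python
from collections import defaultdict

# Part-of-speech weights from the description, kept in tenths
# (1 -> 10, 0.5 -> 5, 0.1 -> 1) so every comparison is exact integer arithmetic.
POS_WEIGHT = {
    "noun": 10, "verb": 10, "adjective": 10, "adjectival_verb": 10,
    "formal_noun": 1, "auxiliary_verb": 1, "adnominal": 1,
    "prefix": 5, "suffix": 5,
}
SLACK = 10  # pruning tolerance: minimum weighted count at the position + 1


def make_converter(lexicon):
    """Hypothetical stand-in for step 203: yield (surface, length, weight)
    for every lexicon reading that starts at `start`."""
    def convert_phrase(text, start):
        for end in range(start + 1, len(text) + 1):
            for surface, pos in lexicon.get(text[start:end], ()):
                yield surface, end - start, POS_WEIGHT[pos]
    return convert_phrase


def build_lattice(text, convert_phrase):
    """Steps 201-207 with 0-based boundaries: boundary k lies after the k-th
    character, so it equals the 1-based position of a phrase's last character."""
    best = {0: 0}              # boundary -> smallest weighted count stored so far
    edges = defaultdict(list)  # start boundary -> [(end, surface, weight)]
    for n in range(len(text)):                 # steps 201, 206, 207
        if n not in best:                      # step 202: no phrase ends here
            continue
        for surface, length, weight in convert_phrase(text, n):  # step 203
            end = n + length
            count = best[n] + weight           # step 204: weighted count to `end`
            if end in best and count > best[end] + SLACK:
                continue                       # pruned; earlier entries are kept
            edges[n].append((end, surface, weight))  # step 205
            best[end] = min(best.get(end, count), count)
    return edges, best


# Hypothetical toy lexicon covering the first four characters of the example.
LEXICON = {
    "すう": [("数", "noun"), ("吸う", "verb")],
    "すうがく": [("数学", "noun")],
    "がく": [("額", "noun"), ("楽", "noun")],
}
edges, best = build_lattice("すうがく", make_converter(LEXICON))
print([surface for _, surface, _ in edges[2]])  # ['額', '楽']
print(best[4])                                  # 10, i.e. 1.0 via 「数学」
```

With the toy lexicon, 「額」 and 「楽」 reach boundary 4 with a count of 2 against the minimum of 1 held by 「数学」, so they survive, as in the example above. A candidate whose count exceeds the minimum by more than 1 (for instance a third one-character phrase on the input `abc` when `abc` is itself a phrase) is dropped. The sketch never revisits entries stored before a smaller minimum appears at the same boundary; the description does not say whether the device does.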
Storing the data as a network like that of FIG. 4 reduces the storage capacity it occupies. For example, 「数」 and 「吸う」 share the same phrase right end, so the phrases that follow them need to be held only once. Concrete methods for representing data in this way are well known as list processing.

[0019] In step 206 of FIG. 3, n is incremented by 1 to move the pointer to the next input character position. Step 207 determines whether input remains at the character position of pointer n, and if so, the process returns to step 202.

[0020] When n is 2, no phrase right end exists at this character position, so the process goes directly to step 206 and moves pointer n to the next character position. When n is 3, the phrase ends of 「数」 and 「吸う」 are possible, so step 203 extracts the next phrases starting at this position, obtaining phrases such as 「額」 (gaku, "amount"), 「楽」 (gaku, "music"), 「額か」, 「額会」 and 「額科」. In step 204, 「数学」 has already been obtained as a candidate ending at position 4, with a weighted phrase count of 1. For the phrase candidates 「額」 and 「楽」, the weighted phrase count at position 4 is 2: 1 for 「数」 plus 1 for 「額」 or 「楽」. Under the pruning condition, 「額」 and 「楽」 fall within the tolerance, so they are registered in the network in step 205.

[0021] Repeating steps 202 to 207 in this way until n reaches 14 (one past the last of the 13 input characters) yields the complete network shown in FIG. 4. Here 「会」, 「界」, 「階」, 「科」, 「化」, 「上」, 「場」 and 「状」 are treated as suffixes.

[0022] In step 208, a pointer P indicating the start of the unconfirmed part of the conversion result is initialized to 1, indicating that nothing has been confirmed yet.

[0023] This completes the conversion process (20) in the control unit 2, which then proceeds to the display candidate creation process (30).

[0024] FIG. 5 is a flowchart of the display candidate creation process (30).

[0025] In step 301, the candidate sequence with the highest likelihood among the candidate sequences from point P to the end of the input character string is set in the display buffer as the conversion result. If several candidate sequences have the same likelihood, a learning table in the storage unit 3, which holds compound words and independent words with affixes taken from previously confirmed character strings, is consulted, and the sequence containing the most words found in that table is selected as the conversion result.
If this still does not determine a single sequence, the sequence with the largest total independent-word length is selected. Other ways of choosing the conversion result, such as using word frequencies, are also possible, and the factors described above can be combined freely to create other selection methods.

[0026] In step 302, the other candidates starting at point P are set in the display buffer so that the user can select among them.

[0027] In the example of FIG. 4, when P is 1, 「数学解析上では」 is selected as the conversion result, and 「数学」, 「数」 and 「吸う」 are taken as the candidates for selection and set in the display buffer. In addition, to make clear which part of the character string is currently subject to selection, the corresponding parts of the conversion result and of the input character string are highlighted (display (40)).

[0028] FIG. 7 shows the screen displayed on the input/output unit 1 immediately after input (10) is completed, and FIG. 8 shows the screen displayed after the display candidate creation process (30) is completed. On this screen, the bottom line is the input line, the upper part is the conversion result output line, and the lower right is the selection candidate display area.

[0029] The user looks at the conversion result on the screen and makes a selection (50). A selection is made by choosing one candidate from the selection candidates with the keyboard; if the highlighted conversion result at the top of the screen is already correct, it can also be chosen by pressing a particular key.

[0030] When the user makes a selection (50), the control unit 2 next executes the confirmation process (60).

[0031] FIG. 6 is a flowchart of the confirmation process (60).

[0032] In step 601, the network data in the storage unit 3 corresponding to the candidate the user selected are marked, and the length of the selected candidate is added to P so that P indicates the start of the following unconfirmed part.

[0033] In step 602, the selected candidate is stored, together with its reading, in the learning table in the storage unit 3.

[0034] Through the display candidate creation process (30), the control unit 2 then sets in the display buffer the subsequent conversion result, obtained by re-evaluating the likelihood of the candidate selected in the confirmation process (60) and of the network of candidates from point P onward, together with the other candidates starting at point P.

[0035] FIG. 9 shows the screen after the user selects 「数学」. The highlighting has moved to 「解析」, the part after 「数学」. The selection candidates displayed are 「解析」, 「会」, 「界」, 「階」, 「科」, 「化」 and 「か」.

[0036] If the user selects 「会」 instead of 「解析」, then, as shown in FIG. 10, the conversion result after 「会」 is automatically changed to 「席上」 (sekijō, "at the meeting"), obtained from the network of FIG. 4, and 「席上」, 「関」 and 「咳」 are displayed as the selection candidates. Because the adjacency between candidates is held in this way, when a phrase boundary is changed by a selection (50), the subsequent display can be corrected at the same time, reducing the effort of later corrections.

[0037] Because the user selected 「会」, the learning table is also updated in the confirmation process (60). In this case 「数学」 is already registered and 「会」 is a suffix, so 「会」 is appended to the previously registered 「数学」 and the entry is re-registered as 「数学会」.
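The learning-table update of step 602, including the merge of a suffix into the previously learned word just described, might look like the following hypothetical Python sketch. `confirm_unit`, the part-of-speech tags and the reading-to-surface dictionary are illustrative choices. Two behaviors are assumptions rather than statements of the description: the earlier entry is replaced by the combined form instead of being kept alongside it, and a prefix is held back until the word it attaches to is confirmed, following the principle stated below that affixes are learned together with their independent words.

```python
def confirm_unit(table, history, reading, surface, pos):
    """Hypothetical sketch of step 602: store a confirmed unit in the learning
    table (reading -> surface), learning affixes only together with the word
    they attach to."""
    if history and (pos == "suffix" or history[-1][2] == "prefix"):
        prev_reading, prev_surface, _ = history.pop()
        table.pop(prev_reading, None)      # assumption: replace the older entry
        reading, surface = prev_reading + reading, prev_surface + surface
        pos = "compound"
    if pos != "prefix":                    # a bare prefix waits for its word
        table[reading] = surface
    history.append((reading, surface, pos))


table, history = {}, []
confirm_unit(table, history, "すうがく", "数学", "noun")
confirm_unit(table, history, "かい", "会", "suffix")
print(table)  # {'すうがくかい': '数学会'}
```

Because each entry is keyed by the whole reading of the learned unit, a compound such as 「新聞記者」 would be stored under 「しんぶんきしゃ」 rather than under 「きしゃ」 alone.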
FIG. 11 shows the contents of the learning table after 「数学会」 has been registered.

[0038] The control unit 2 repeats the display candidate creation process (30) and the confirmation process (60) until no unconfirmed part remains, and ends the processing when all conversion results have been confirmed.

[0039] When the series of processes described above ends, the network data in the storage unit 3 are erased, but the data in the learning table are retained; when the same character string is entered later, the learned result is adopted preferentially, so conversion accuracy improves with use. Learning functions accompanying conversion have been used before, but learning prefixes and suffixes on their own often lowered conversion accuracy instead, because homophonous affixes frequently appear in the same text. Moreover, when homophones appear, as in 「新聞記者が汽車で」 ("the newspaper reporter, by train"), learning was done in units of the reading 「きしゃ」 (kisha), so 「記者」 (reporter) and 「汽車」 (train) could not be distinguished. According to the present invention, prefixes and suffixes are learned together with the independent words they attach to, and compound words are learned as long units without being decomposed into independent words, so learning yields a larger improvement in accuracy.

[0040] [Effects of the Invention] According to the present invention, candidates with a high likelihood are held, so if the conversion result contains an error, another candidate can be selected immediately, which speeds up correction. Performing the processing that creates and holds the candidates in parallel while the user is still typing can further shorten the time the user waits for conversion.
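Step 301 and the re-evaluation after a boundary change (FIGS. 9 and 10) can both be served by one backward pass over the candidate network, since that pass yields the best sequence from every boundary rather than only from P. The Python sketch below is illustrative only: the edge format, the hypothetical lattice `EDGES` (loosely modeled on the example input, not a copy of FIG. 4), weights in tenths and a weight of 0 for the function words 「では」 are assumptions. Sequences are compared by weighted phrase count, then by the number of learning-table words, then by total independent-word length, mirroring the three criteria of step 301; the first sequence found wins any remaining tie.

```python
# Hypothetical lattice for the 13-character example input, 0-based boundaries.
# Edge: (end boundary, surface, weight in tenths, is an independent word?)
EDGES = {
    0: [(4, "数学", 10, True), (2, "数", 10, True), (2, "吸う", 10, True)],
    2: [(4, "額", 10, True)],
    4: [(8, "解析", 10, True), (6, "会", 5, False)],
    6: [(11, "席上", 10, True)],
    8: [(11, "上", 5, False)],
    11: [(13, "では", 0, False)],   # assumption: function words weigh 0
}


def best_from(edges, length, learned):
    """One backward pass: best[b] = (key, path) for the best sequence from
    boundary b to the end. Keys compare lexicographically as
    (weighted count, -learning-table words, -independent-word length)."""
    best = {length: ((0, 0, 0), ())}
    for b in range(length - 1, -1, -1):
        for end, surface, weight, independent in edges.get(b, ()):
            if end not in best:
                continue                      # no complete sequence from `end`
            (count, neg_learned, neg_len), path = best[end]
            key = (count + weight,
                   neg_learned - int(surface in learned),
                   neg_len - ((end - b) if independent else 0))
            if b not in best or key < best[b][0]:
                best[b] = (key, ((b, end, surface),) + path)
    return best


def display(edges, length, learned, p):
    """Steps 301-302: best sequence from P plus the alternatives starting at P."""
    best = best_from(edges, length, learned)
    result = [surface for _, _, surface in best[p][1]]
    alternatives = [s for end, s, _, _ in edges.get(p, ()) if end in best]
    return result, alternatives


print(display(EDGES, 13, {"解析"}, 0))  # (['数学', '解析', '上', 'では'], ['数学', '数', '吸う'])
print(display(EDGES, 13, set(), 0)[0])  # ['数学', '会', '席上', 'では']
print(display(EDGES, 13, {"解析"}, 6))  # (['席上', 'では'], ['席上'])
```

With 「解析」 in the learning table the sketch returns 「数学解析上では」 and offers 「数学」, 「数」 and 「吸う」 at P = 0. With an empty table the first two criteria tie and the independent-word-length rule prefers 「数学会席上では」, so the toy result depends on the tie-breaks. Choosing 「会」 at P = 4 moves P to 6, and the best sequence from there (「席上」「では」) is read from the same pass; the device's own alternatives (such as 「界」 or 「関」) are not in the toy lattice.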

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a configuration diagram of the kana-kanji conversion device according to the present invention.
FIG. 2 is a diagram showing how conversion is carried out.
FIG. 3 is a flowchart of the conversion process.
FIG. 4 is a diagram showing an example of the data.
FIG. 5 is a flowchart of the display candidate creation process.
FIG. 6 is a flowchart of the confirmation process.
FIG. 7 is a diagram showing screen display information.
FIG. 8 is a diagram showing screen display information.
FIG. 9 is a diagram showing screen display information.
FIG. 10 is a diagram showing screen display information.
FIG. 11 is a diagram showing an example of the contents of the learning table.
Description of Reference Numerals: 1: input/output unit; 2: control unit; 3: storage unit

Continuation of front page: (56) References cited: JP-A-56-38663 (JP, A); JP-A-58-115528 (JP, A); JP-A-53-7132 (JP, A)

Claims

(57) [Claims]
1. A kana-kanji conversion method for performing kana-kanji conversion with a kana-kanji conversion device comprising: character string input means for inputting a kana character string that contains a plurality of phrases and is not divided into phrase units; kana-kanji conversion means for cutting a plurality of phrases out of the kana character string and performing kana-kanji conversion by extracting, for each phrase, at least one conversion candidate beginning with the first character of that phrase; and display means, the method comprising: in response to an input instructing kana-kanji conversion, displaying on the display means a candidate character string composed of the extracted conversion candidates having the highest likelihood for each phrase, while displaying the conversion candidate corresponding to a phrase of interest so as to be distinguishable from the others; and, in response to an input changing the phrase boundary of the phrase of interest, displaying on the display means conversion candidates obtained by re-evaluating the conversion likelihood of the part of the kana character string that follows the changed phrase boundary.