JP2695772B2

JP2695772B2 - Kana-Kanji conversion device

Info

Publication number: JP2695772B2
Application number: JP61188444A
Authority: JP
Inventors: 義光大島; 正博阿部
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1986-08-13
Filing date: 1986-08-13
Publication date: 1998-01-14
Anticipated expiration: 2013-01-14
Also published as: JPS6345676A

Description

【発明の詳細な説明】〔産業上の利用分野〕本発明は仮名入力を漢字仮名混じり文に変換する仮名
漢字変換装置に関わり、特に文節単位に分ち書きされて
いない仮名入力文を、変換キー等の特別な変換開始指示
手段なしに、入力に追随して変換を行なう仮名漢字変換
装置に関する。〔従来の技術〕従来、文節単位に分ち書きされていない仮名入力（い
わゆるべた書き入力）の文を漢字仮名混じり文に変換す
る方式はいくつか知られているが、その一つとして特開
昭60−189565「仮名漢字変換装置」に記されている方法
がある。ここで示されている方法は、仮名入力を漢字仮
名混じり文へ精度良く変換すること、変換の結果に曖昧
さがある場合は、それに対する複数の候補を効率良く抽
出し、保持し、その中から正しい変換結果を素速く容易
に選択する手段を提供している。また、日経コンピュータ1985年11月25日号「全文仮名
漢字変換方式の技術」で述べられている方式は、仮名文
字の入力と並行して、それまでに入力されている仮名文
字列の任意の部分文字列に対応する自立語の辞書引きを
行なうとともに、簡単な文節マツチングを行なつて、そ
の結果一定量（最長４文節）の仮名文字列の入力が認識
されると、自動的に正規の仮名漢字変換を開始するとい
う方式である。操作者が変換キーを押して変換開始の指
示をすることなく変換が行なわれ、操作者の認識を容易
にする方式である。さらに、特開昭60−22226「かな漢字変換装置」では
１文字入力されるたびに最長一致で辞書検索を行ない、
仮名漢字変換を即時に高速に実行する方式が述べられて
いる。〔発明が解決しようとする問題点〕上記三者の従来技術のうち、最初のものは、先にある
程度仮名文を入力し、その後変換キーを押して仮名漢字
変換の開始を指示する必要があり、変換キーを押してか
ら変換結果が得られるまでに時間がかかること、また文
節単位の変換方式ではないにしろ、操作者が入力文の区
切りを意識して変換キーを押さなくてはならず、煩わし
いという問題点があつた。二番目の従来例は、この問題点を解消しようとするも
のである。しかし、入力文節数が４以上になつてから変
換を開始するために、変換結果を得るまでの時間の問題
は完全に解消されていないこと、前処理での４文節の切
り出しは完全なものでなく、後に続く変換過程で修正を
施す必要があり、処理に重複する部分があるという問題
点がある。また、変換処理の前に全ての自立語を辞書か
ら読み出しているが、その多くは後の処理では不要にな
るので、この部分でも処理に無駄がある。最後の例は、変換処理は高速に実行することが可能で
あるが、専ら辞書照合を最長一致のみで行なつており、
第一の従来例と比べ、変換精度の点で問題がある。本発明は、このような従来技術による仮名漢字変換に
おいて、相伴に解決することが困難であつた変換精度と
処理時間の問題を改善し、高速かつ高精度な変換処理手
段を提供し、かつまた修正にあたつても容易に他の候補
を取り出すことが可能な仮名変換装置を提供することを
目的とする。〔問題点を解決するための手段〕上記目的を達成するために、次のような手段を用い
る。（１）仮名を含む文字を入力するキーボードと、入力し
た文字列を貯えるメモリと、単語の読みを牽引した手段
として単語の表記、品詞などの情報を格納してある辞書
と、該入力文字列メモリから部分文字列を切り出して該
辞書内の対応する読みを持つ語を検索する辞書検索手段
と、検索された語をもとに前後の語との接続検定を含む
仮名漢字変換を行なう手段と、変換処理により生成され
る複数の仮名漢字変換候補を記憶する手段と、変換候補
を尤度の順に表示する手段と、表示された候補から正し
い語を選択する手段を具備し、該辞書検索手段は該入力
文字列メモリ内の文字列を直接読み出し、またメモリ内
の文字が不足する場合には文字入力処理を行ない、文字
列の入力と辞書検索を逐次交代的に処理することによ
り、文字の入力に追随して仮名漢字変換を実行する手段
を設ける。（２）また、直前の入力誤りを修正するための後退キー
（バツクスペースキー）を設けるとともに、後退キーの
入力に対応して既に入力されている入力文字列メモリ内
の末尾文字を削除する手段、これに対応して削除された
入力文字を含む変換候補を変換記憶手段から削除する手
段、および変換候補記憶手段中に残されている各変換候
補の末尾または変換候補列の先頭位置から新たな変換候
補の作成を開始する手段を設ける。（３）さらにまた、上記のように複数候補が発生する処
理の開始時点において、各候補に対する変換処理の開始
を記憶管理する手段と、この記憶管理手段に記憶されて
いる実行可能な処理を順に取り出して処理を実行する手
段と、実行中の処理が辞書検索中新しい入力仮名文字列
の入力待ち状態になつたときに、この処理が入力待ち状
態になつたことを上記の記憶管理手段に登録するととも
に、実行待ち状態にある別の実行可能な処理を取り出し
て実行を再開させる手段を設ける。〔作用〕前記第１の手段（１）により、仮名文字の入力と並行
して変換処理を行なうことが可能となる。したがつて、
前記第１の従来例（特開昭60−189565）のように、ある
程度文字列を入力後変換キーを押して変換を実行すると
いうような手順をとる必要がなく、操作者を待たせるこ
となく変換処理を実行できる。しかも本手段を上記従来
例の一部として組み込むことも可能で、これにより高精
度かつ高速な仮名漢字変換装置を実現できる。第２の手段（２）は後退キーの機能に対する内部的な
処理手段について記したもので、ここに記した各手段を
連繋動作させることにより、実現される。また、可能な
位置からの再変換を開始させることにより、削除後の新
たな入力文字に対する変換にも対応させることができ
る。第３の手段（３）は並列処理の対象範囲を拡げ、より
徹底した並列処理を行なうことにより、処理の効率を上
げ高速化をはかるためのものである。手段（１）および手段（３）により、前記第２の従来
例よりも高速かつ即時的な変換処理が可能となる。〔実施例〕以下、本発明を実施例を用いて詳細に説明する。第１図に本発明に基づく仮名漢字変換装置の一実施例
を示す。１は入力用キーボード、２は入力した文字列と
貯える入力文字列メモリ、３は変換用辞書、４は辞書検
索部、５は辞書検索部から得られた語をもとに仮名漢字
変換を行なう変換部、６は変換候補の記憶部、７は変換
候補の表示および選択の制御を行なう表示選択制御部、
８は変換候補を見やすい形で画面に表示する表示部であ
る。第２図は、第１図の仮名漢字変換部5,変換候補記憶部
6,表示選択制御部７における処理をまとめてフローチヤ
ートである。変換処理が開始されると、まず、ステツプ201で、入
力文字列メモリ２の文節変換開始位置を示すポインタｉ
を初期値０にセツトし、入力文字列の先頭（未だ文字を
入力していない状態）に位置づける。次にステツプ202
でその位置が文節と文節の間の区切り、すなわち文節端
であるかどうかチエツクする。入力文字列の先頭は文節
端と見なしうるので、この場合は次のステツプ203の文
節変換へと進む。文節変換では、第１図の構成図からわかるように、キ
ーボードから仮名等の文字を入力しつつ、順次それを文
節を構成する形態素へと変換していく。ここで文節とは
一つの自立語を中心としてその前に一つ以上の省略可能
な接頭語、その後ろにやはり一つ以上の省略可能な接尾
語、さらにその後ろに省略可能な一つ以上の付属語が連
なつた形式のものを指す。複合語は其を構成する一つ
の自立語を含むブロツクに分け、それぞれを文節と見な
す。例えば「産業上の利用分野」に対しては、「産業上
の」（自立語1,接尾語1,付属語１）、「利用」「分野」
（各々自立語１）それぞれ一つの文節である。この文節
単位の仮名漢字変換の方式としては、NHK技術研究第25
巻第５号（昭和48年５月）「計算機によるカナ漢字変
換」に示されている方法が良く知られており、ここでも
基本的にその方法を適用することができる。ここで一つ
の完全な文節が抽出されると、次にステツプ204で変換
候補の確からしさの尤度を判定し、変換候補記憶部６に
保持すべきか、捨てる（「枝切り」と呼ぶ）べきかを決
める。確からしさの尤度は、昭和53年度情報処理学会第
19回全国大会論文集5E−４「べた書き文のカナ漢字変換
システム」や、昭和56年情報処理学会計算言語学研究会
資料25−６「表形式を用いた文節構造分析アルゴリズム
とその能率について」などで述べられているように、入
力文字列をより長い文節の列、別の言い方をすればより
少ない文節の列に分解する方が尤度が高くなるように決
める。ただし「この」「その」などの連体詞や、「こ
と」「もの」などの形式名詞などは他の文節に付属して
使用されることが多いので、名詞や動詞のように独立し
た文節とは見なさず、文節数を数える場合、１より小さ
な値とする。このように、品詞および出現頻度等を考慮
した重みをかけて文節数を求め、その数が少ない程尤度
が高いとする。具体的には、名詞，動詞，形容詞，形容
動詞，等には重み1,形式名詞，補助動詞，連体詞等には
0.1、接頭語，接尾語は準自立語扱いして0.5の重みを与
える、などとする。枝刈りは、文字列先頭から現在判定の対象となつてい
る文節の後端までの尤度を求め、それをその文節後端の
文字位置における尤度と定めて、もし既に同じ文字位置
において尤度がもとまつている場合はその値と比較し
て、その値がある許容値を超える場合枝刈りを実行す
る。許容値としては、例えば、同じ文字位置における重
みつきの文節数に最小値＋１などをとる。ステツプ204の枝刈りをパスした各文節は、ステツプ2
05で変換候補記憶部６内に格納される。また、ステツプ
206では、それまでに得られている各文節のうちでも最
も尤度の高い候補の文節列を表示部８の画面上に表示す
る（実際には、各文節の処理ごとに、最も尤度の高い候
補を追加表示していく）。ステツプ207でｉに１を加えてポインタを次の入力文
字位置に進める。ステツプ208ではポインタｉの文字位
置に入力文字がまだあるかどうか判定し、あればステツ
プ202へ戻る。なお、文字の入力はステツプ203の文節変換中に並行
して行なわれる。入力が継続して行われていれば、ステ
ツプ208では入力終りには達しえないので、第２図のル
ープは繰り返して処理される。また、文字の入力が終了
していても、入力文字列中に未処理の文字が残つている
ときはループを繰り返す。以上のループを繰り返し、入力文字をすべて取り込
み、またそれに対する変換処理が終了すると、ステツプ
209へ進み、操作者から選択入力を受け付け、各部分の
候補中から操作者の意図したものを順に選び結果を確定
していく。選択操作および処理の詳細はここでは述べな
いが、例えば特開昭60−189565に示してある方法により
実施することができる。以上に述べた処理に従つて変換候補記憶部６のなかに
形成されるデータの具体例を第３図に示す。例としてと
つた入力文字列は「すうがくかいせきじようでは」であ
るが、これを図の上部に示してある。図の下部には変換
処理の結果作成される複数の変換候補のデータを示して
ある。変換候補のデータの重複を防ぐためにネツトワー
ク状のデータ構造を採用し、記憶容量の節約をはかつて
いる。第４図に、第２図ステツプ203の文節変換の処理フロ
ーを示す。接頭語処理（ステツプ311）、自立語処理
（ステツプ312）、接尾語処理（ステツプ313）、付属語
処理（ステツプ314）から構成される。図から明らかな
ように、自立語処理のみが必須で他の処理は省略可能と
なつている。これは、前述した文節の定義をそのまま図
式化したものとなつており、実際には接頭語，接尾語，
付属語の処理それぞれについて、その処理を実施する場
合と実施しない場合の各々の組合せに対するすべての場
合について処理を実行し、可能な文節変換候補を全て抽
出することを試みる。また、付属語については一つだけ
でなく複数連なる場合もあるが、それらについても全て
の処理を試みる。これらの意味で、第４図は通常のフロ
ーチヤートの表現とは異なり、説明の補助用のものであ
る。第４図の各処理においては入力文字列メモリ２上でポ
インタｉが指す仮名文字から始まる読みを持つ語を辞書
と照合しつつ取り出すとともに、充分な文字が入力文字
列メモリ２上にそろつていないときは、キーボード１か
ら入力文字を待つ。また、辞書との照合により一致した
語を取り出し、先行する語との接続チエツクを行なう。第５図に自立語処理のフローを示す。接頭語処理，接
尾語処理，付属語処理も基本的構造は自立語処理と同様
なので、以下自立語処理で代表させて説明する。ステツプ321でまず辞書検索部４に要求を出し、入力
文字列メモリ２内の部分文字列と一致する語を辞書３か
ら読み出す。このとき入力文字列メモリ２上で先頭の文
字位置を同じくする（ポインタｉの値が同じ）語は全て
取り出す。第２図の例では，「巣」「酢」「数」「吸
う」「数学」などが、先頭から変換候補として抽出され
る。次にステップ322で変換候補として抽出されたもの
を一つずつ取り出し、ステツプ323で先行する形態素
（語）との接続チエツクを行なう。上の例では、取り出
し開始点が入力文字列の先頭、すなわち文節の先頭であ
るので、全ての自立語は接続条件を満足し、「巣」
「酢」「数」「吸う」「数学」などがステツプ324のチ
エツクをパスし、文節候補の一部と認定される。そし
て、ステツプ325で既存の部分以降（この場合は先頭）
につながれる。一般の位置の例では、第３図で助詞の
「が」（付属語に相当）を切り出している部分がある
が、助詞「が」は格助詞の場合と接続助詞の場合があ
り、それぞれ名詞などの体言および動詞などの用言と接
続可能であるので、図のように「数」には格助詞の
「が」が、「吸う」には接続助詞の「が」が切り出され
て接続される。このようにして取り出された変換候補全
てについて接続チエツクが完了する（ステツプ326）と
自立語処理が終了する。接頭語処理，接尾語処理，付属語処理も処理内容はほ
ぼ同様である。第５図ステツプ321における辞書検索部４の処理フロ
ーを第６図に示す。ステツプ401でまず入力文字列メモ
リ２上で辞書検索する語の読みの後端を表わすポインタ
ｋを語の先端位置ポインタｉの値に初期化する。次にス
テップ402で、入力文字列メモリ２上のポインタｋの位
置に既に入力文字が登録されているかチエツクし、あれ
ば下へ進む。入力文字がない場合はキーボードから文字
が入力されるのを待つ（ステツプ403）。文字が入力さ
れたらそれを読み取り、入力文字列メモリ２のポインタ
ｋの位置へ登録する（ステツプ404）。ステツプ405では
ｉとｋではさまれる区間の文字列を読みとして辞書の検
索を行なう。検索の結果、一致する語が得られたときは
（ステツプ406）、それを辞書検索語バツフアに登録す
る（ステツプ407）。ここで辞書検索語バツフアとは辞
書検索部４内に設けられた一時記憶用のメモリである。
次にステツプ408でポインタｋを一つ進め、ループの先
頭に戻る。ステツプ406で、辞書中に一致する語を見つ
からなかつたときは、ループを脱出して処理を終わる。
また辞書検索部バツフアに登録されている語は第５図の
ステツプ322に渡される。なお、ステツプ406で辞書中に一致する語が見つから
なかつたときは処理を終わるとしたが、この処理フロー
では短い読みから長い読みの方向へ向かつて処理を行な
つているので、一致する読みの語がたまたまなくても、
さらに読みの長さを長くすると一致する語が見つかる場
合がある。例えば「すうがく」という文字列が入力文字
列メモリ２内にあるとき、読み「すう」に対する語とし
ては「数」「吸う」があるが、「すうが」に対する語は
ない。しかしさらに読みを延長して「すうがく」とする
と、「数学」という語がある。このような状況に対処するためには、辞書の牽引部の
構造を、例えば第７図のような読み順の木構造にすれば
よい。上記の例で言えば、読み「す」に対しては「巣」
「酢」などが、「すう」に対しては「数」「吸う」など
の語があることがわかる。「すうが」に対しては対応す
る語はないが、さらに読みを伸ばしたときには一致する
語があることが簡単に判定できるようになつている。ま
た、「すうがく」の「く」まで来たときには、それより
先には一致する語がないことが直ちにわかるようになつ
ている。この点を利用して、「数学」を検索した時点で
それより長い語がないことを検知し、第６図のループを
余分に回ることなく、直ちにループを脱出するように第
６図のフローを変更することも出来る。なお、第６図のように与えられた文字列に対応する複
数の長さの読みの語を得るような場合、検索の結果得ら
れた語のいちいちバツフア（辞書検索語バツフア）に貯
めこむことなく、語が得られるたびに次のステツプ（こ
の場合は第５図のステツプ322）へその語を渡していく
ような制御をとることができる。このようにすれば、辞
書から全ての語が得られるまで、接続チエツク等の他の
処理を待たせる必要がなく並列して実行できるので、処
理効率を上げることが出来る。ところで、自立語は接頭語，接尾語あるいは付属語と
比べて読みの長さが長いものが多い。この場合、短い読
みのものから長い読みのものまで全てに対して、接続チ
エツクなどの処理をしても無駄になることが多い。実用
的には最長一致あるいは次最長一致（最長一致の語の読
みに次いで一致する長さの語）により得られる語に制限
しても、仮名漢字変換の精度上は殆ど問題ない。したが
つて、この場合、まず最長一致の語が得られるまで次々
に読みの長さを伸ばしていき、最長一致の語が得られた
あと次最長一致の語を検索するというようにすることが
出来る。以上、文字が誤りなく入力されている場合の例につい
て説明してきたが、人間がキーボードから文字を入力し
ている場合、時として入力ミスを犯すことがある。しか
し、入力ミスはその場で気がつく場合が多い。キーボー
ドには通常これを修正するために、直前の入力文字を取
り消すための後退キーが設けられている。次に、この後
退キーに関する動作について説明する。第８図に、後退キー処理のフローチヤートを示す。後
退キーが押されると、まずステツプ341で入力文字列メ
モリ２内の末尾の文字を１文字削除する。次にステツプ
342では、辞書検索部４で実行中のその文字を含む辞書
アクセスのキヤンセルを行なう。またステツプ343で
は、変換候補記憶部６内で削除された文字を含む形態素
（語）の変換候補をネツトワークから削除する。最後に
（ステツプ344）、ネツトワーク上の各形態素（語）の
末尾および変換文字列の先頭から再度変換処理を試み
る。この場合、既に切り出されている形態素（語）につ
いては再度処理を行なう必要はなく、削除された文字の
部分から先へ新しく入力される文字を含む文字列につい
て変換処理すなわち辞書検索を行なえばよい。また、辞
書中にある語の読みの最大長がわかつている場合は、ネ
ツトワーク上の全ての形態素（語）の末尾および変換候
補列の先頭から再変換処理を行なう必要はなく、入力文
字列の最後端から辞書中の語の読みの最大長分遡つたと
ころまでの範囲で再変換処理を行なえばよい。以上で第１の実施例の説明を終わる。ところで、第３図から明らかなように、一般に変換対
象の入力文字列が与えられたとき、それに対応する仮名
漢字変換候補は複数存在する。単なる同音語だけでな
く、入力文字列の語への分割のしかた、あるいはその品
詞についても複数の候補が存在しうる。このとき、第１
の実施例で示した第１図の各部の処理が充分高速で、１
文字入力されるたびに必要な処理が実行され、次の入力
が入る前までに処理が終了するのならば問題ないが、実
際には、第１図の各部の処理には有限の実行時間を必要
とするので、キーボードからの文字の入力に処理の速度
が追いつかなくなる状況が生じる可能性がある。これは
例えば、熟練した操作者が非常な高速で文字を入力した
とき、あるいは並列に存在しうる変換候補が非常に多数
生じたときなどに起こる。このような状況においても、効率的に仮名漢字変換を
実行することが出来る第２の実施例を次に説明する。第９図に本実施例の構成を示す。第９図において、１
〜８は第１図におけるものと同一である。９は、上述の並
列に存在する複数候補の処理を次々に切り換えて実行さ
せるための並列処理制御部である。第10図に示した具体例を用いて、本実施例における仮
名漢字変換装置の動作を説明する。この例は、最終的に
第３図のような結果になる過程で「すうがくか」まで入
力された状態を示している。「すうがく」の部分では変
換候補が明らかになり、「か」およびそれ以降の処理を
開始した状態にある。「か」については副助詞の「か」
あるいは接尾語の「科」「化」などがあり、これらは既
にネツトワークに登録されている。さらに両者ともこの
位置で文節を終了しうるので、その後端から次の文節の
処理が開始されている。また「か」とそれに続く入力文
字列からなる文節の処理も開始されていることを示して
いる。第１図の実施例で仮名漢字変換を実行中、第６図のス
テツプ402およびステツプ403でキーボード入力待ち状態
に入つたときに、第10図のような状況になつている場合
を考える。このとき、副助詞「か」と接尾語「科」
「化」が切り出されたあと「か」とそれに続く入力文字
列からなる文節の処理を開始しているときは問題ない
が、これが逆に「か」または「科」「化」が切り出され
る前に、「か」それに続く入力文字列に対する文節の処
理の途中で入力待ちの状態に入つてしまい、「か」また
は「科」「化」の処理が実行可能であるにも拘わらず待
たされてしまう場合がある。このようなとき、一つの文
節の処理が入力待ちの状態に入つたことを検知し、その
空き時間を利用して別の実行可能な処理を取り出して実
行するような構成にすれば、どちらの処理が先になろう
とも、そのとき実行可能な処理を全て実行することが可
能となる。このような、キーボード入力の待ちの状態を利用する
並列処理の候補としては次のようなものがある。（１）入力文字列メモリ２上で、文節開始点を異にする
複数の文節処理、およびそれに続く処理。（２）接頭語，自立語，接尾語，付属語など前後の接続
関係が異なる複数の処理。また、品詞，活用などの多義
による複数の処理。（３）始点を同一にするが、終点すなわち長さの異なる
語に対する複数の処理。これらについて、多義が生じる時点でそれぞれ処理の
実行開始を並列処理制御部９へ通知登録し、各処理が辞
書アクセス中にキーボード入力待ちになつたとき、並列
処理制御部９が既登録の処理のなかから実行可能なも
のを選び出して起動する。なお、多義が生じる時点と
は、例えば、第２図でポインタｉを移動させながら文節
端の有無を調べつつ文節変換を起動するところ、第４図
で、接頭語処理，接尾語処理，付属語処理に実行有無に
関する分岐部分などである。〔発明の効果〕以上述べたごとく、本発明によれば、文節単位に分ち
書きされていない仮名文字列を入力としてこれを仮名漢
字変換する装置において、特別な変換キーを押すことな
しに仮名漢字変換を行なうことが可能となり、しかも入
力に追随して変換が行なわれるので、変換結果の確認が
容易となり、操作性の良い仮名漢字変換装置が実現可能
となる。また、本発明を従来例（特開昭60−189565）の
一部として組み込まれることにより、高精度かつ高速な
仮名漢字変換装置で実現できる。なお、本発明は順番に入力される仮名文字列を逐次的
に変換していく構成であつて、対話型による音声入力装
置のように逐次的に仮名文字列が入力されるような装置
に対しても有効なことは言うまでもない。

[Detailed Description of the Invention]

[Field of Industrial Application] The present invention relates to a kana-kanji conversion device that converts kana input into text written in mixed kanji and kana. In particular, it relates to a kana-kanji conversion device that converts kana input not segmented into phrases (bunsetsu), following the input as it is typed, without any special means of instructing the start of conversion such as a conversion key.

[Prior Art] Several methods are known for converting kana input that is not segmented into phrases (so-called solid, or "beta-gaki", input) into mixed kanji-kana text. One is the method described in Japanese Patent Application Laid-Open No. 60-189565, "Kana-Kanji Conversion Device". That method provides means for converting kana input into mixed kanji-kana text with high accuracy, for efficiently extracting and holding multiple candidates where the conversion result is ambiguous, and for quickly and easily selecting the correct conversion result from among them.

The method described in "Technology of Whole-Text Kana-Kanji Conversion" (Nikkei Computer, November 25, 1985 issue) looks up, in parallel with kana input, the independent words corresponding to every substring of the kana string entered so far, and also performs simple phrase matching. Once it recognizes that a certain amount of kana (at most four phrases) has been entered, it automatically starts the regular kana-kanji conversion. Conversion proceeds without the operator pressing a conversion key to start it, which makes the result easier for the operator to follow.

Further, Japanese Patent Application Laid-Open No. 60-22226, "Kana-Kanji Conversion Device", describes a method that performs a longest-match dictionary search each time a character is entered, executing kana-kanji conversion immediately and at high speed.

[Problems to be Solved by the Invention] Of the three prior-art methods above, the first requires the operator to enter a certain amount of kana text and then press a conversion key to start conversion. It takes time from the key press until the conversion result is obtained, and, even though the method does not convert phrase by phrase, the operator must still press the conversion key with the breaks in the input text in mind, which is burdensome.

The second prior-art example attempts to solve this problem. However, because conversion starts only after four or more phrases have been entered, the delay before a result is obtained is not fully eliminated. The four-phrase segmentation in preprocessing is also imperfect and has to be corrected in the subsequent conversion stage, so part of the processing is duplicated. In addition, all independent words are read from the dictionary before conversion, yet many of them turn out to be unnecessary later, so this step is wasteful as well.

The last example can convert at high speed, but because it matches against the dictionary by longest match alone, its conversion accuracy is inferior to that of the first prior-art example.

The object of the present invention is to improve both conversion accuracy and processing time, which have been difficult to achieve together in prior-art kana-kanji conversion, by providing a fast and accurate means of conversion, and to provide a kana-kanji conversion device from which alternative candidates can easily be retrieved when a correction is needed.

[Means for Solving the Problems] To achieve the above object, the following means are used.

(1) The device comprises: a keyboard for entering characters including kana; a memory for storing the entered character string; a dictionary that stores information such as each word's written form and part of speech, indexed by the word's reading; dictionary search means for cutting substrings out of the input character string memory and searching the dictionary for words with the corresponding reading; means for performing kana-kanji conversion, including a connection check against the preceding and following words, based on the retrieved words; means for storing the multiple kana-kanji conversion candidates generated by conversion; means for displaying the candidates in order of likelihood; and means for selecting the correct word from the displayed candidates. The dictionary search means reads the character string in the input character string memory directly and, when the memory does not hold enough characters, performs character input processing. By alternating between character input and dictionary search in this way, kana-kanji conversion is carried out following the input of characters.
(2) A backspace key is provided for correcting an input error just made, together with: means for deleting the last character in the input character string memory in response to the backspace key; means for correspondingly deleting from the conversion candidate storage means any conversion candidate that contains the deleted input character; and means for starting the creation of new conversion candidates from the end of each conversion candidate remaining in the conversion candidate storage means, or from the head of the candidate sequence.

(3) Furthermore, there are provided: means for recording and managing, at each point where multiple candidates arise as described above, the start of conversion processing for each candidate; means for taking the executable processes recorded in this management means one at a time and executing them; and means which, when the running process comes to wait for new kana input during a dictionary search, registers in the management means that the process is waiting for input, then takes out another executable process that is waiting to run and resumes it.

[Operation] The first means (1) makes it possible to perform conversion in parallel with kana input. Unlike the first prior-art example (Japanese Patent Application Laid-Open No. 60-189565), there is no need to enter a certain amount of text and then press a conversion key, and conversion can proceed without keeping the operator waiting. This means can also be incorporated as part of that prior-art example, yielding a kana-kanji conversion device that is both accurate and fast.

The second means (2) describes the internal processing behind the backspace key function, which is realized by having the means listed there operate in concert. By restarting conversion from the positions where it is possible, conversion of characters newly entered after a deletion is also handled.

The third means (3) widens the scope of parallel processing and carries it out more thoroughly, thereby raising processing efficiency and speed.

Means (1) and (3) together make conversion faster and more immediate than in the second prior-art example.

[Embodiments] The present invention is described in detail below with reference to embodiments.

FIG. 1 shows one embodiment of a kana-kanji conversion device according to the present invention. Reference numeral 1 denotes an input keyboard, 2 an input character string memory that stores the entered character string, 3 a conversion dictionary, 4 a dictionary search unit, 5 a conversion unit that performs kana-kanji conversion based on the words obtained from the dictionary search unit, 6 a conversion candidate storage unit, 7 a display and selection control unit that controls the display and selection of conversion candidates, and 8 a display unit that shows the conversion candidates on screen in an easily readable form.

FIG. 2 is a flowchart summarizing the processing in the kana-kanji conversion unit 5, the conversion candidate storage unit 6, and the display and selection control unit 7 of FIG. 1.

When conversion starts, step 201 first sets the pointer i, which indicates the phrase conversion start position in the input character string memory 2, to its initial value 0, placing it at the head of the input character string (the state in which no character has yet been entered). Step 202 then checks whether that position is a boundary between phrases, that is, a phrase edge. The head of the input string can be regarded as a phrase edge, so in this case processing proceeds to phrase conversion in step 203.

In phrase conversion, as the configuration in FIG. 1 suggests, kana and other characters are entered from the keyboard and converted one after another into the morphemes that make up a phrase. A phrase here means a unit centered on one independent word, optionally preceded by one or more prefixes, optionally followed by one or more suffixes, and then optionally followed by one or more attached words. A compound word is divided into blocks each containing one of its constituent independent words, and each block is treated as a phrase. For example, 「産業上の利用分野」 consists of three phrases: 「産業上の」 (one independent word, one suffix, one attached word), 「利用」, and 「分野」 (one independent word each). A well-known method for this phrase-by-phrase kana-kanji conversion is given in "Kana-Kanji Conversion by Computer", NHK Technical Research, Vol. 25, No. 5 (May 1973), and it can basically be applied here as well.

Once a complete phrase has been extracted, step 204 evaluates the likelihood of the conversion candidate and decides whether to keep it in the conversion candidate storage unit 6 or discard it (referred to as pruning). As described in "A Kana-Kanji Conversion System for Unsegmented Text" (Proceedings of the 19th National Convention of the Information Processing Society of Japan, fiscal 1978, 5E-4) and "A Phrase Structure Analysis Algorithm Using a Tabular Format and Its Efficiency" (IPSJ Computational Linguistics Research Group, Document 25-6, 1981), likelihood is defined so that decomposing the input string into a sequence of longer phrases, in other words a sequence of fewer phrases, gives a higher likelihood. However, adnominals such as 「この」 and 「その」 and formal nouns such as 「こと」 and 「もの」 are usually used attached to another phrase, so they are not treated as independent phrases in the way nouns and verbs are, and each counts as less than 1 when phrases are counted. The phrase count is thus computed with weights that reflect part of speech, frequency of occurrence, and so on, and the smaller the count, the higher the likelihood. Concretely, nouns, verbs, adjectives, adjectival verbs, and the like are given a weight of 1; formal nouns, auxiliary verbs, adnominals, and the like a weight of 0.1; and prefixes and suffixes, treated as semi-independent words, a weight of 0.5.

Pruning works as follows. The likelihood from the head of the string to the end of the phrase under evaluation is computed and taken as the likelihood at the character position of that phrase's end. If a likelihood has already been obtained at the same character position, the two are compared, and the candidate is pruned when its value exceeds a certain tolerance. The tolerance may be, for example, the minimum weighted phrase count at the same character position plus 1.

Each phrase that passes pruning in step 204 is stored in the conversion candidate storage unit 6 in step 205. In step 206, the phrase sequence with the highest likelihood among those obtained so far is displayed on the screen of the display unit 8 (in practice, the most likely candidate is appended to the display as each phrase is processed).

Step 207 adds 1 to i, advancing the pointer to the next input character position. Step 208 determines whether there is still an input character at the position of pointer i and, if so, returns to step 202.

Characters are entered in parallel during phrase conversion in step 203. As long as input continues, step 208 never reaches the end of input, so the loop in FIG. 2 repeats. Even after input has ended, the loop repeats while unprocessed characters remain in the input string.

When the loop has taken in all input characters and finished converting them, processing proceeds to step 209, which accepts selection input from the operator and finalizes the result by choosing, in order, the candidate the operator intended for each part. The details of the selection operation and its processing are not described here; they can be implemented, for example, by the method shown in Japanese Patent Application Laid-Open No. 60-189565.

FIG. 3 shows a concrete example of the data formed in the conversion candidate storage unit 6 by the processing described above. The example input string, 「すうがくかいせきじようでは」, is shown in the upper part of the figure. The lower part shows the data for the multiple conversion candidates produced by conversion. A network-like data structure is adopted to avoid duplicating candidate data and so save storage.

FIG. 4 shows the processing flow of phrase conversion in step 203 of FIG. 2. It consists of prefix processing (step 311), independent word processing (step 312), suffix processing (step 313), and attached word processing (step 314). As the figure shows, only independent word processing is mandatory; the others may be skipped. The figure is a direct schematization of the definition of a phrase given above. In practice, for each of prefix, suffix, and attached word processing, every combination of performing or skipping it is tried, in an attempt to extract all possible phrase conversion candidates. Attached words may also occur several in a row rather than singly, and all such cases are tried too. In this sense FIG. 4 differs from an ordinary flowchart and serves only as an aid to explanation.

In each process in FIG. 4, words whose reading starts at the kana character indicated by pointer i in the input character string memory 2 are retrieved by matching against the dictionary, and when the memory does not yet hold enough characters, the process waits for input from the keyboard 1. Words that match in the dictionary are retrieved and checked for connectability with the preceding word.

FIG. 5 shows the flow of independent word processing. Prefix, suffix, and attached word processing have basically the same structure, so independent word processing is described below as representative.

Step 321 first issues a request to the dictionary search unit 4, which reads from the dictionary 3 the words matching substrings in the input character string memory 2. All words that share the same starting character position in the memory (the same value of pointer i) are retrieved. In the example of FIG. 2, 「巣」 (nest), 「酢」 (vinegar), 「数」 (number), 「吸う」 (to suck), 「数学」 (mathematics), and so on are extracted from the head as conversion candidates. Step 322 then takes the extracted candidates one at a time, and step 323 checks each for connectability with the preceding morpheme (word).
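The weighted phrase count and the pruning rule described above for step 204 can be illustrated with a minimal Python sketch. The part-of-speech labels, the `WEIGHTS` table, the `Pruner` class, and the choice to let unlisted parts of speech (such as particles) contribute 0 are assumptions made for illustration; only the weights 1, 0.1, and 0.5 and the "minimum + 1" tolerance come from the text.

```python
# Hypothetical sketch of the step-204 likelihood and pruning rule.
# Part-of-speech labels and all names are illustrative, not from the patent.
WEIGHTS = {
    "noun": 1.0, "verb": 1.0, "adjective": 1.0, "adjectival_verb": 1.0,
    "formal_noun": 0.1, "auxiliary_verb": 0.1, "adnominal": 0.1,
    "prefix": 0.5, "suffix": 0.5,
}


def weighted_count(words):
    """Weighted phrase count from the head of the string to a phrase end.

    `words` is a list of (surface, part_of_speech) pairs. Parts of speech
    missing from WEIGHTS (e.g. particles) add 0, which is an assumption.
    A smaller count means a higher likelihood.
    """
    return sum(WEIGHTS.get(pos, 0.0) for _, pos in words)


class Pruner:
    """Keeps, per end position, the smallest weighted count seen so far."""

    def __init__(self, tolerance=1.0):
        self.tolerance = tolerance
        self.best = {}  # end position -> minimum weighted count

    def keep(self, end_pos, count):
        """Return True if the candidate survives, False if it is pruned."""
        best = self.best.get(end_pos)
        if best is None or count < best:
            self.best[end_pos] = count
            best = count
        # Prune when the count exceeds the minimum at this position + tolerance.
        return count <= best + self.tolerance


pruner = Pruner()
one_phrase = weighted_count([("数学", "noun")])              # 1.0
print(pruner.keep(4, one_phrase))                            # kept
print(pruner.keep(4, weighted_count([("数", "noun"), ("が", "particle"),
                                     ("苦", "noun")])))      # 2.0, still kept
print(pruner.keep(4, 2.5))                                   # exceeds 1.0 + 1, pruned
```

In this sketch a candidate is compared only against the best count already recorded at the same end position, so the order in which candidates arrive can affect which ones are kept; a fuller implementation might re-check stored candidates when a better minimum appears.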
In the example of the candidates extracted in step 321, extraction starts at the head of the input string, that is, at the head of a phrase, so every independent word satisfies the connection condition: 「巣」, 「酢」, 「数」, 「吸う」, 「数学」, and so on pass the check in step 324 and are accepted as parts of phrase candidates. In step 325 they are linked after the existing portion (in this case, at the head). As an example at a general position, FIG. 3 includes a place where the particle 「が」 (an attached word) is cut out. The particle 「が」 can be either a case particle or a conjunctive particle, which connect to nominals such as nouns and to predicates such as verbs, respectively. As the figure shows, the case particle 「が」 is therefore cut out and connected to 「数」, and the conjunctive particle 「が」 to 「吸う」. Independent word processing ends when the connection check has been completed for all conversion candidates retrieved in this way (step 326).

Prefix, suffix, and attached word processing are almost the same in content.

FIG. 6 shows the processing flow of the dictionary search unit 4 in step 321 of FIG. 5. Step 401 first initializes pointer k, which marks the end of the reading of the word being looked up in the input character string memory 2, to the value of pointer i, the word's start position. Step 402 then checks whether an input character is already registered at the position of pointer k in the memory, and if so, processing continues downward. If not, the process waits for a character to be entered from the keyboard (step 403), then reads it and registers it at position k in the input character string memory 2 (step 404). Step 405 searches the dictionary using the string between i and k as the reading. When the search yields a matching word (step 406), the word is registered in the dictionary search word buffer (step 407), a temporary memory provided in the dictionary search unit 4.
Next, at step 408, the pointer k is advanced by one, and the process returns to the beginning of the loop. If no matching word is found in the dictionary in step 406, the processing exits from the loop.
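The loop of FIG. 6 (steps 401 to 408) can be sketched in Python as follows. This is only an illustration under stated assumptions: the dictionary is modeled as a plain `dict` from reading to words, the keyboard as an iterator of characters, and an exhausted iterator ends the search, whereas the device would simply keep waiting for a key. The function name `search_from` and the toy dictionary are hypothetical.

```python
def search_from(i, memory, keys, dictionary):
    """Collect words whose reading starts at memory[i], reading input on demand.

    memory     -- list of characters entered so far (input character string memory)
    keys       -- iterator standing in for the keyboard (assumption)
    dictionary -- mapping from reading to a list of words (assumption)
    """
    buffer = []                       # dictionary search word buffer
    k = i                             # step 401: k starts at the word's head
    while True:
        if k >= len(memory):          # step 402: no character at position k yet
            ch = next(keys, None)     # step 403: wait for a key
            if ch is None:            # assumption: no more input ends the search
                break
            memory.append(ch)         # step 404: register it at position k
        reading = "".join(memory[i:k + 1])    # step 405: string between i and k
        words = dictionary.get(reading)
        if not words:                 # step 406: no match, leave the loop
            break
        buffer.extend(words)          # step 407
        k += 1                        # step 408
    return buffer


toy_dictionary = {"す": ["巣", "酢"], "すう": ["数", "吸う"], "すうがく": ["数学"]}
memory = []
print(search_from(0, memory, iter("すうがく"), toy_dictionary))
print("".join(memory))
```

With this toy dictionary the call collects the words for 「す」 and 「すう」 and then stops at 「すうが」, which has no entry, having pulled three characters from the simulated keyboard into `memory`.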
The words registered in the dictionary search word buffer are passed to step 322 of FIG. 5.

Although processing is described as ending when step 406 finds no matching word, this flow proceeds from shorter readings to longer ones, so even when no word happens to match a given reading, a match may still be found once the reading is lengthened. For example, when the string 「すうがく」 is in the input character string memory 2, the reading 「すう」 matches 「数」 (number) and 「吸う」 (to suck), but nothing matches 「すうが」. Extending the reading further to 「すうがく」, however, yields the word 「数学」 (mathematics).

To handle this situation, the index part of the dictionary can be organized as a tree ordered by reading, as in FIG. 7 for example. In the above example, the tree shows that the reading 「す」 has words such as 「巣」 (nest) and 「酢」 (vinegar), and 「すう」 has words such as 「数」 and 「吸う」. No word corresponds to 「すうが」, but it is easy to determine that a match exists if the reading is extended further. Likewise, on reaching the 「く」 of 「すうがく」, it is immediately apparent that no matching word lies beyond it. The flow of FIG. 6 can be modified to exploit this: on retrieving 「数学」, the search detects that no longer word exists and leaves the loop at once, without going around it an extra time.

When, as in FIG. 6, words with readings of several lengths are obtained for a given string, control can also be arranged so that each word is passed to the next step (here, step 322 of FIG. 5) as soon as it is found, rather than accumulating every result in a buffer (the dictionary search word buffer). Other processing such as the connection check then need not wait until all words have been obtained from the dictionary and can run in parallel, which improves processing efficiency.

Independent words often have longer readings than prefixes, suffixes, or attached words. Running the connection check and other processing on every word from the shortest reading to the longest is then often wasted effort. In practice, restricting the candidates to the words obtained by the longest match and the second-longest match (the words with the next-longest matching reading after the longest) causes almost no loss in conversion accuracy. In this case the reading can therefore be lengthened step by step until the longest match is found, after which the second-longest match is searched for.

The description so far has assumed that characters are entered without error, but a person typing on a keyboard will occasionally make an input mistake.
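Returning to the reading-ordered tree index of FIG. 7 and the longest and second-longest match restriction described above, the idea can be sketched in Python as a simple trie. The class and function names (`Node`, `build_index`, `search`, `longest_and_next`) and the toy entries are assumptions for illustration; the patent does not specify the node layout.

```python
class Node:
    """One node of a reading-ordered index tree (hypothetical layout)."""

    def __init__(self):
        self.children = {}   # next kana character -> Node
        self.words = []      # words whose reading ends exactly here


def build_index(entries):
    """Build the tree from (reading, word) pairs."""
    root = Node()
    for reading, word in entries:
        node = root
        for ch in reading:
            node = node.children.setdefault(ch, Node())
        node.words.append(word)
    return root


def search(root, text, i=0):
    """Return (reading, word) pairs for every word whose reading starts at text[i]."""
    found = []
    node = root
    for k in range(i, len(text)):
        node = node.children.get(text[k])
        if node is None:            # no reading continues with this character
            break
        found.extend((text[i:k + 1], w) for w in node.words)
        if not node.children:       # nothing longer exists: stop immediately
            break
    return found


def longest_and_next(found):
    """Keep only the words with the longest and second-longest readings."""
    lengths = sorted({len(r) for r, _ in found}, reverse=True)[:2]
    return [(r, w) for r, w in found if len(r) in lengths]


index = build_index([("す", "巣"), ("す", "酢"), ("すう", "数"),
                     ("すう", "吸う"), ("すうがく", "数学")])
hits = search(index, "すうがくかいせき")
print(hits)
print(longest_and_next(hits))
```

Unlike a flat lookup, the walk continues through 「すうが」 even though no word ends there, because the node still has a child, and it stops right after 「すうがく」 because that node has none.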
Input mistakes are, however, often noticed on the spot. To correct them, the keyboard is usually provided with a backward key for canceling the immediately preceding input character. The operation related to the backward key is described next. FIG. 8 shows the flow of the backward key processing. When the backward key is pressed, first, at step 341, the last character in the input character string memory 2 is deleted. Then, at step 342, any dictionary access involving that character that is being executed by the dictionary search unit 4 is canceled. At step 343, the conversion candidates of morphemes (words) that contain the deleted character are deleted from the network in the conversion candidate storage unit 6. Finally, at step 344, the conversion processing is attempted again from the end of each morpheme (word) on the network and from the beginning of the character string to be converted. In this case, the processing need not be repeated for morphemes (words) that have already been cut out. It is sufficient to perform the conversion processing, that is, the dictionary search, for character strings that include the character newly input at the position of the deleted character.

If the input character string is longer than the maximum reading length of a word in the dictionary, the re-conversion processing need not be started from the end of every morpheme (word) on the network and from the beginning of the character string to be converted. It may be limited to the range reaching back from the end of the input by the maximum reading length of a word in the dictionary.

This concludes the description of the first embodiment.

As is apparent from FIG. 3, an input character string to be converted generally has a plurality of kana-kanji conversion candidates. Besides mere homonyms, there may be several candidates for how the input character string is divided into words and for the part of speech of each word. In this situation, the first embodiment poses no problem as long as the processing of each part in FIG. 1 that is required for each input character is completed before the next character is input. In reality, however, the processing of each part in FIG. 1 takes a finite execution time, so the processing may fail to keep up with the input of characters from the keyboard. This occurs, for example, when a skilled operator inputs characters at a very high speed, or when a large number of conversion candidates exist in parallel. A second embodiment capable of performing kana-kanji conversion efficiently even in such a situation is described below.

FIG. 9 shows the configuration of this embodiment. In FIG. 9, elements 1 to 8 are the same as those of the first embodiment (FIG. 1), and 9 is a parallel processing control unit that switches among the processes of a plurality of candidates existing in parallel and executes them one after another. The operation of the kana-kanji conversion device according to this embodiment is described using the specific example shown in FIG. 10. This example shows the state in which "suugakuka" has been input in the course of obtaining the final conversion result. For the portion "suugaku", the conversion candidates have been determined, and the processing of "ka" and what follows it has been started. For "ka", there are the auxiliary particle "ka" and suffixes such as "family" and "ka", and these have already been registered in the network. Further, since a phrase can end at this position, the processing of the next phrase has been started from its rear end. The example also shows that the processing of a phrase consisting of "ka" and the input character string following it has been started.

Suppose that, while kana-kanji conversion is being executed in the embodiment of FIG. 1, the situation shown in FIG. 10 is reached when the keyboard input wait state is entered at steps 402 and 403. There is no problem if the processing of the phrase consisting of "ka" and the following input character string is started after the auxiliary particle "ka" and the suffixes "family" and "ka" have been cut out. In the opposite case, however, the phrase processing for "ka" and the following input character string enters the input wait state before the particle and the suffixes have been cut out, and the device may be kept waiting even though the processing of the particle "ka" or of the suffixes "family" and "ka" could be executed. In such a case, if it is detected that the processing of one phrase has entered the input wait state, and another executable process is picked out and executed during the idle time, all the processing that can be executed at that time is carried out regardless of which process was started first.

The following are candidates for such parallel processing using the keyboard input wait state.
(1) A plurality of phrase processes on the input character string memory 2 with different phrase start points, and the processing that follows them.
(2) A plurality of processes with different connection relations before and after, such as prefix, independent word, suffix, and attached word. In addition, a plurality of processes arising from ambiguity in part of speech, inflection, and the like.
(3) A plurality of processes for words having the same start point but different end points, that is, words of different lengths.

For each of these, at the time the ambiguity arises, the start of execution of each process is notified to the parallel processing control unit 9 and registered. When a process enters the keyboard input wait state while accessing the dictionary, the parallel processing control unit 9 selects an executable process from among the registered processes and starts it. Ambiguity arises, for example, in FIG. 2 when phrase conversion is started while the pointer i is moved to check whether a phrase can end, and in FIG. 4 at the branches that decide whether prefix processing, suffix processing, and so on are to be executed.

[Effects of the Invention] As described above, according to the present invention, in a device that receives a kana character string not divided into phrase units and converts it into kanji-kana mixed text, kana-kanji conversion can be performed without a special conversion key. Because the conversion follows the input, the conversion result is easy to confirm, and a kana-kanji conversion device with good operability can be realized. Further, by incorporating the present invention as a part of the conventional example (Japanese Patent Laid-Open No. 60-189565), a highly accurate and fast kana-kanji conversion device can be realized. Because the present invention converts kana character strings sequentially as they are input, it is naturally also applicable to, and effective for, devices in which kana character strings are input sequentially, such as an interactive voice input device.
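The role of the parallel processing control unit 9 in the second embodiment can be illustrated by a small cooperative scheduler. The following is a sketch under stated assumptions rather than the disclosed design: each registered process is modeled as a Python generator, and the names `ParallelControl`, `WAIT_INPUT`, `STEP_DONE`, `phrase_process`, and `cut_out_process` are all hypothetical. A process that needs a character not yet typed yields `WAIT_INPUT`, and the controller then switches to any other registered process that can still run.

```python
from collections import deque

WAIT_INPUT = "wait-input"  # hypothetical signal: the process needs a character not yet typed
STEP_DONE = "step-done"    # hypothetical signal: one step finished, more work may remain


class ParallelControl:
    """Toy stand-in for the parallel processing control unit 9."""

    def __init__(self):
        self.ready = deque()  # registered processes that can run now
        self.waiting = []     # processes blocked on keyboard input

    def register(self, process):
        """Register a process at the moment an ambiguity creates it."""
        self.ready.append(process)

    def run_until_idle(self):
        """Run registered processes until each has finished or waits for input."""
        while self.ready:
            process = self.ready.popleft()
            try:
                signal = next(process)
            except StopIteration:
                continue                      # this process has finished
            if signal == WAIT_INPUT:
                self.waiting.append(process)  # park it and switch to another
            else:
                self.ready.append(process)

    def key_input(self):
        """Call after a character is appended: blocked processes become executable."""
        self.ready.extend(self.waiting)
        self.waiting.clear()
        self.run_until_idle()


def phrase_process(buffer, start, log):
    """Phrase processing that cannot proceed until the character at `start` exists."""
    while len(buffer) <= start:
        yield WAIT_INPUT
    log.append(f"phrase at {start} starts with {buffer[start]}")


def cut_out_process(label, log):
    """Cutting out a word whose characters are already in the buffer."""
    log.append(f"cut out {label}")
    yield STEP_DONE
```

If `phrase_process` for the position after "ka" is registered before `cut_out_process` for the suffix, `run_until_idle` still completes the cut-out during the input wait. The phrase process resumes only when `key_input` is called after the next character has been appended to the buffer.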

[Brief Description of the Drawings] FIG. 1 is a diagram showing a first embodiment of the present invention, FIG. 2 is the overall flow of the kana-kanji conversion processing, and FIG. 3 is a specific example of the data formed in the conversion candidate storage unit 6 by this processing. FIG. 4 is the processing flow of phrase conversion, FIG. 5 is the processing flow of independent word processing, and FIG. 6 is the processing flow of the dictionary search unit 4. FIG. 7 shows the configuration of the index portion of the dictionary used in the processing flow of FIG. 6. FIG. 8 is the flow of the backward key processing. FIG. 9 shows the configuration of the second embodiment, and FIG. 10 shows a specific example for explaining its operation.
1 ... keyboard, 2 ... input character string memory, 3 ... dictionary, 4 ... dictionary search unit, 5 ... kana-kanji conversion unit, 6 ... conversion candidate storage unit, 7 ... display selection control unit, 8 ... display unit, 9 ... parallel processing control unit.

Claims

(57) [Claims]

1. A kana-kanji conversion device comprising: a keyboard for inputting characters including kana; an input character string memory for storing the input character string; a dictionary for storing information including at least one of a word notation and a part of speech; dictionary search means for extracting a partial character string from the input character string memory and searching the dictionary for a word having the corresponding reading; means for performing kana-kanji conversion processing, including a connection check with the preceding word based on the retrieved word, to generate kana-kanji conversion candidates; means for storing a plurality of kana-kanji conversion candidates generated by the kana-kanji conversion processing; means for displaying the plurality of kana-kanji conversion candidates in order of likelihood; and means for selecting a correct word from the displayed kana-kanji conversion candidates, wherein the dictionary search means searches the dictionary sequentially in parallel with the input of characters from the keyboard, the device further comprising parallel processing means for selecting an executable process while waiting for input from the keyboard and starting the process as parallel processing.

2. The kana-kanji conversion device according to claim 1, wherein the execution of the dictionary search is started when an ambiguous character occurs in the input character string.

3. The kana-kanji conversion device according to claim 1, wherein the dictionary has an index portion having a tree structure in reading order.

4. The kana-kanji conversion device according to claim 1 or 2, wherein the means for performing the kana-kanji conversion processing generates the plurality of kana-kanji conversion candidates while determining the likelihood such that the likelihood is higher when the input character string is decomposed into a string of longer phrases.

5. The kana-kanji conversion device according to claim 1, wherein the means for storing the plurality of kana-kanji conversion candidates has a network-like data structure.

6. A kana-kanji conversion device comprising: a keyboard for inputting characters including kana; an input character string memory for storing the input character string; a dictionary for storing information including at least one of a word notation and a part of speech; dictionary search means for extracting a partial character string from the input character string memory and searching the dictionary for a word having the corresponding reading; means for performing kana-kanji conversion processing, including a connection check with the preceding word based on the retrieved word, to generate kana-kanji conversion candidates; means for storing a plurality of kana-kanji conversion candidates generated by the kana-kanji conversion processing; means for displaying the plurality of kana-kanji conversion candidates in order of likelihood; means for selecting a correct word from the displayed kana-kanji conversion candidates; and parallel processing means for selecting an executable process while waiting for input from the keyboard and starting the process as parallel processing, the device being configured so that the dictionary search means searches the dictionary sequentially in parallel with the input of characters from the keyboard, wherein a backward key for canceling the immediately preceding input character is provided on a part of the keyboard, the device further comprising: means for deleting, in response to input of the backward key, the last character already input in the input character string memory; means for deleting the kana-kanji conversion candidates containing the deleted character from the means for storing the plurality of kana-kanji conversion candidates; and means for performing the kana-kanji conversion processing again from at least one of the end and the beginning of each kana-kanji conversion candidate remaining in the means for storing the plurality of kana-kanji conversion candidates.
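For illustration only, the backward key means recited in claim 6, which correspond to steps 341 to 344 of FIG. 8 in the description, might be modeled as follows. This is a hedged sketch and not part of the claimed subject matter: the names `Candidate`, `State`, and `backward_key`, the half-open span convention, and the representation of pending dictionary lookups as (start, end) spans are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Candidate:
    start: int  # index of the first input character covered by the word
    end: int    # index one past the last input character covered
    word: str


@dataclass
class State:
    chars: list = field(default_factory=list)    # input character string memory
    network: list = field(default_factory=list)  # stored conversion candidates
    pending: list = field(default_factory=list)  # (start, end) spans of lookups in progress


def backward_key(state, max_reading_len):
    """Handle one press of the backward key; return the positions to re-convert from."""
    if not state.chars:
        return []
    deleted = len(state.chars) - 1  # index of the character being removed
    # Step 341: delete the last character of the input.
    state.chars.pop()
    # Step 342: cancel dictionary lookups whose span includes that character.
    state.pending = [(s, e) for (s, e) in state.pending if e <= deleted]
    # Step 343: drop candidates whose span includes that character.
    state.network = [c for c in state.network if c.end <= deleted]
    # Step 344: retry from the beginning of the string and from the end of each
    # remaining word, but no further back than the longest dictionary reading.
    earliest = max(0, len(state.chars) - max_reading_len)
    points = {0} | {c.end for c in state.network}
    return sorted(p for p in points if p >= earliest)
```

The returned positions are the points from which the dictionary search would be attempted again once a new character is input in place of the deleted one.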