JP3814000B2

JP3814000B2 - Character string conversion apparatus and character string conversion method

Info

Publication number: JP3814000B2
Application number: JP29997595A
Authority: JP
Inventors: 博喜阿望
Original assignee: 株式会社ジャストシステム
Priority date: 1995-11-17
Filing date: 1995-11-17
Publication date: 2006-08-23
Anticipated expiration: 2015-11-17
Also published as: JPH09146937A

Description

【０００１】
【発明の属する技術分野】
この発明は、文字列変換装置に関し、特に、辞書登録されたローマ字混入文字列の変換に関する。
【０００２】
【関連技術および発明が解決しようとする課題】
かな漢字変換において、英字で入力して、かな漢字変換を行なう英字入力かな漢字変換方法が知られている。この英字入力かな漢字変換方法において、英文字混在文字列を入力する場合には、その英文字混在文字列について、その綴りをそのまま入力し、後で逆変換を行なう方法がある。例えば、「ｏｒｉｇｉｎａｌの動作」という英文字混入文字列を入力する場合には、英字入力モードにて、「ｏｒｉｇｉｎａｌｎｏｄｏｕｓａ」と入力して、変換キーを押す。すると、「おりぎなｌの動作」と仮変換される。この状態で、逆変換する部分である「おりぎなｌ」を指定して、逆変換キーを押すと、指定された部分が「ｏｒｉｇｉｎａｌ」と英文字文字列に逆変換される。このように、文字列変換装置においては、英字入力モードにて英文字混入文字列を入力する場合に、一旦正しいスペルで英文字を入力しておいて後で逆変換する後変換機能が採用されている。
【０００３】
しかしながら、かかる後変換機能を用いても、一旦逆変換する部分を指定して逆変換を行なうという作業が必要となる。したがって、英文字混入文字列を入力する場合には、英文字が混入しない文字列を入力する場合と比べて、作業性が低下する。
【０００４】
かかる問題を解決する為に、前記後変換が行なわれた場合にはこれを自動的に辞書に登録し、次回からは通常の漢字と同様に変換キーを押すだけで、英文字に変換することも考えられる。しかし、かかる辞書機能を用いても、以下の様な場合には、正確に変換できないという問題があった。
【０００５】
例えば、入力文字列「おりぎなｌ」に対して変換後文字列「ｏｒｉｇｉｎａｌ」が辞書登録してある場合に、「ｏｒｉｇｉｎａｌ案で」という文字列に変換する為に、「ｏｒｉｇｉｎａｌａｎｄｅ」と入力して、変換キーを押したとする。この場合、前記入力文字列は、所定のローマ読み規則（この場合、「ｌａ」＝「ぁ」とする）に基づいて、かな文字列「おりぎなぁんで」として理解されて、例えば、変換後文字列「折技なぁんで」と変換されてしまう。
【０００６】
この発明は上記問題を解決し、ローマ字入力モードにて入力された英文字列を変換を正確かつ簡易に行なえる文字列変換装置および変換方法を提供することを目的とする。
【０００７】
【課題を解決するための手段】
本発明にかかる文字列変換装置は、Ａ）英文字を含むかな文字列を変換前文字列として、この変換前文字列に対応する変換後文字列を記憶する辞書手段、Ｂ）ローマ字入力モードにて入力された英文字列を、ローマ字読み規則に基づいてローマ字かな変換処理を行い読み文字列を生成するとともに、前記辞書手段を検索して前記読み文字列中に前記変換前文字列が存在する場合は、これに対応する英字文字列を変換後文字列として出力する変換手段を備えた文字列変換装置において、Ｃ）前記変換手段は、前記入力された英文字列について前記ローマ字かな変換処理の切れ目を示すフラグを記憶しており、前記読み文字列について前記切れ目ではなかった位置でローマ字かな変換処理を行ない英文字混入文字列を得る英文字混入文字列取得処理を実行するとともに、前記辞書手段を検索して前記変換前文字列中に前記英文字混入文字列が存在する場合は、これに対応する英字文字列を変換後文字列として出力する。
これにより、入力された文字列から英文字を含む読み文字列を得ることができる。したがって、ローマ字入力モードにて入力された英文字列を変換を正確かつ簡易に行なえる。
【０００８】
本発明にかかる文字列変換装置においては、前記英文字混入文字列取得処理は、前記読み文字列の内、最後尾または最前列の文字列から、順次位置をずらして繰り返し実行される。これにより、入力された文字列から英文字を含む読み文字列を得ることができる。
【０００９】
本発明にかかる文字列変換装置においては、前記変換手段は、変換候補文字列が変換候補として好ましいか否かの判断規則を記憶する判断規則記憶手段を備え、前記判断規則に基づいて、前記変換候補文字列が変換候補として好ましくないと判断した場合に、前記入力された英文字列中の英文字を含む読み文字列を得る。したがって、前記英文字列記憶手段を参照するだけで、入力された英文字列中の英文字を含むかな文字列を得ることができる。
【００１０】
本発明にかかる文字列変換方法においては、Ａ）英文字を含むかな文字列を変換前文字列として、この変換前文字列に対応する変換後文字列を記憶する辞書手段、Ｂ）ローマ字入力モードにて入力された英文字列を、ローマ字読み規則に基づいてローマ字かな変換処理を行い読み文字列を生成するとともに、前記辞書手段を検索して前記読み文字列中に前記変換前文字列が存在する場合は、これに対応する英字文字列を変換後文字列として出力する変換手段を備えた文字列変換装置を用いた文字列変換方法であって、Ｃ）前記変換手段に、以下の処理を実行させること、 c1) 前記入力された英文字列について前記ローマ字かな変換処理の切れ目を示すフラグを記憶させておき、前記読み文字列について前記切れ目ではなかった位置でローマ字かな変換処理を行ない英文字混入文字列を得させ、 c2) 前記辞書手段を検索して前記変換前文字列中に前記英文字混入文字列が存在する場合は、これに対応する英字文字列を変換後文字列として出力する。
【００１１】
本発明にかかる文字列変換方法においては、前記英文字混入文字列取得処理は、前記読み文字列の内、最後尾または最前列の文字列から、順次位置をずらして繰り返し実行される。これにより、入力された文字列から英文字を含む読み文字列を得ることができる。
【００１３】
請求項７の記憶媒体においては、コンピュータが実行可能なプログラムを記憶したコンピュータ可読の記憶媒体であって、前記プログラムは、請求項１ないし請求項６のいずれかの装置又は方法を実現するものであることを特徴とする。
【００１４】
【発明の効果】
請求項１、請求項６の文字列変換装置または文字列変換方法においては、前記読み文字列生成の際に、前記ローマ字読み規則にとらわれることなく、前記生成された読み文字列の一部のかな文字列について、前記入力された英文字列中の英文字を含む読み文字列を得て、この英文字列中の英文字を含む読み文字列中に前記変換前文字列が存在するか否かも判断する。これにより、入力された文字列から英文字を含む読み文字列を得ることができる。したがって、ローマ字入力モードにて入力された英文字列を変換を正確かつ簡易に行なえる。
【００１５】
請求項２の文字列変換装置においては、前記変換手段は、変換候補文字列が変換候補として好ましいか否かの判断規則を記憶する判断規則記憶手段を備え、前記判断規則に基づいて、前記変換候補文字列が変換候補として好ましくないと判断した場合に、前記入力された英文字列中の英文字を含む読み文字列を得る。したがって、変換候補文字列が変換候補として好ましくない場合のみ、前記入力された英文字列中の英文字を含む読み文字列を得ることができる。これにより、変換効率を向上させることができる。
【００１６】
請求項３の文字列変換装置においては、前記入力された英文字列を記憶する英文字列記憶手段を備え、前記変換手段は、生成した読み文字列の一部の文字について、前記英文字列記憶手段を参照して、入力された英文字列中の英文字を含むかな文字列を得る。したがって、前記英文字列記憶手段を参照するだけで、入力された英文字列中の英文字を含むかな文字列を得ることができる。
【００１７】
請求項４の文字列変換装置においては、前記変換手段は、前記入力された英文字列に、強制区分指示命令が付加されている場合には、前記入力された英文字列中の英文字列を含むかな文字列を得る。したがって、文字列入力者の意図に合致した場合に、前記入力された英文字列中の英文字を含む読み文字列を得ることができる。これにより、変換効率を向上させることができる。
【００１８】
請求項５の文字列変換装置においては、前記変換手段は、前記辞書手段に記憶されている変換前文字列中における英文字より前に位置するかな文字列と同じかな文字列が、前記ローマ字読み規則に基づいて生成された読み文字列中に存在するか否かを判断して、存在する場合には、前記生成された読み文字列において続くかな文字列について、対応する英文字を得て、両英文字列が一致するか否かを判断する。そして、両者が、一致する場合には、前記入力された英文字列中の英文字を含む読み文字列を得る。したがって、前記辞書手段に記憶された変換前文字列を含むかな文字列が、前記読み文字列中に存在する場合には、入力された文字列から英文字を含む読み文字列を得ることができる。これにより、ローマ字入力モードにて入力された英文字列を変換を正確かつ簡易に行なえる。
【００１９】
【発明の実施の態様】
１．機能ブロック図の説明
本発明の一実施例を図面に基づいて説明する。図１に示す文字列変換装置１は、入力手段４１、入出力制御手段４２、表示手段４３、出力手段４４、文字列記憶手段６３、変換手段５０および辞書手段７０を備えている。
【００２０】
入力手段４１には、各種の命令および変換対象となる英数文字列が入力される。入出力制御手段４２は、入力された英数文字列を文字列記憶手段６３に与える。文字列記憶手段６３は、英文字列記憶手段６４、読み文字列記憶手段６５、および表記文字列記憶手段６６を有している。入力された英数文字列は、英文字列記憶手段６４に記憶される。読み文字列記憶手段６５には、変換手段５０で変換された読み文字列が記憶される。この読み文字列記憶手段６５に記憶される文字列としては、かな文字列だけ、英数字文字列だけおよび双方の組合わせのいずれの場合もある。表記文字列記憶手段６６には、変換手段５０で変換された表記文字列が記憶される。表記文字列としては、かな文字列、漢字文字列、および英数字文字列のいずれの場合もある。
【００２１】
入出力制御手段４２は、出力命令を受けると、表記文字列記憶手段６６に記憶された表記文字列を表示手段４３に出力する。表示手段４３は、この表記文字列を表示する。出力手段４４は、表記文字列記憶手段６６に記憶された表記文字列を出力する。
【００２２】
辞書手段７０は、文法情報記憶手段７１、単語情報記憶手段７２、共起用例情報記憶手段７３、および学習情報記憶手段７４を有している。
【００２３】
文法情報記憶手段７１は、単語間の文法的な結びつきの正否に関する情報、その結びつきの強さに関する情報等を記憶する。例えば、「名詞（助詞なし）＋接尾語は結びつきが強い、名詞（助詞なし）＋名詞は結びつきが強い」等の情報が記憶されている。
【００２４】
単語情報記憶手段７２は、変換前文字列および対応する変換後文字列を記憶する。変換前文字列には、かな文字だけ、英文字だけ、英数文字混じりのかな文字のいずれの場合も含む。変換後文字列には、漢字、カタカナ、英文字およびこれらの組合わせ文字列を含む。具体的には、各々の単語の読み文字列、表記文字列、品詞情報及び活用情報等が記憶される。
【００２５】
共起用例情報記憶手段７３には、意味的な結びつきの強い単語間の２項関係情報が記憶される。この共起用例情報には、単に２つの結びつきの他に、「人」、「花」等の属性単位の共起用例や、付加的な制限情報を含むものである。付加的な制限情報としては、例えば、「を」、「が」等の助詞情報、成立する向きの情報等がある。共起用例としては、単語同志の結びつきとして、「暑い−夏」、「厚い−本」、「熱い−お湯」等が記憶される。属性単位の共起用例として、「人（彼、彼女、先生、恋人、等）に−会う」等が記憶される。属性単位の共起用例として、「花（チューリップ、菊、等）が−咲く」等が記憶される。助詞の制限情報として、「話を−聞く」、「薬が−効く」、「機転が−利く」等が記憶される。向きの制限情報として、「家庭−教育」、「教育−過程」が記憶される。
【００２６】
かかる共起用例情報は、意味情報であり、これを用いることにより同音語の多義性が解消される。例えば、共起用例情報７３による同音語の一例を示すと、「あつい／ほん」という読みに対して変換処理を行なう際に、共起用例情報７３における「厚い−本」という結びつきの情報から「厚い」が選択され、「熱い、暑い」等は選択されない。したがって、「厚い／本」という変換結果が即座に得られる。このように、共起用例情報は、変換効率の向上を図るために用いられる。
【００２７】
学習情報記憶手段７４は、複数の表記のうち、最近に使用された表記を優先して採用する場合の使用情報である。
【００２８】
変換手段５０は、読み文字列生成手段５１、読み規則記憶手段５２、基本解析手段５３、文節区切り処理手段５４、自動辞書登録手段５５、表記選択手段５６、およびローマ字区切り変更手段５８を備えている。
【００２９】
読み規則記憶手段５２は、英文字列に対するローマ字読み文字列が記憶されている。読み文字列生成手段５１は、読み規則記憶手段５２に記憶されたローマ字読み規則に基づいて、英文字列記憶手段６４に記憶された英文字列から読み文字列を生成して、読み文字列記憶手段６５に記憶する。
【００３０】
基本解析手段５３は、辞書手段７０の文法情報記憶手段７１および単語情報記憶手段７２を参照して文節の区切り位置を推定する。
【００３１】
文節区切り処理手段５４は、基本解析手段５３から出力される文節区切り候補に対して文法／単語情報のチェックをすると共に、共起用例情報記憶手段７３を参照して同音語選択処理等を行なう。さらに、文節区切り処理手段５４は文節区切り候補の絞り込みを行なう。
【００３２】
表記選択手段５６は、文節区切り処理手段５４の処理にて、表記が未決定の文節に対して表記の決定を行なう。表記選択手段５６は、切り出された文節の単語部分を、かな文字列からローマ字文字列に変換する逆ローマ字変換手段５７を有している。なお、逆ローマ字変換手段５７は、表示文字サイズの変更も可能である。
【００３３】
自動辞書登録手段５５は、表記選択手段５６で決定された単語が、辞書手段７０の単語情報記憶手段７２に登録されていない場合には、この単語の読み、表記文字列、品詞等の情報を単語情報記憶手段７２に自動登録する。
【００３４】
ローマ字区切り変更手段５８は、判断規則記憶手段５９を有しており、以下の様にして、ローマ字区切りを変更する。判断規則記憶手段５９には、変換候補文字列が変換候補として好ましいか否かの判断規則が記憶されている。ローマ字区切り変更手段５８は、この判断規則に基づいて、読み文字列記憶手段６５に記憶された読み文字列の一部の文字について、前記英文字列記憶手段に記憶された英文字列を参照して、入力された英文字列中の英文字を含むかな文字列を得る。得られた英文字を含むかな文字列について、基本解析手段５３および文節区切り処理手段５４は、再度文字列変換を行なう。
【００３５】
かかる構成により、変換手段５０は、ローマ字入力モードにて入力された英文字列を、ローマ字読み規則に基づいて読み文字列を生成するとともに、前記辞書手段を検索して前記読み文字列中に前記変換前文字列が存在する場合は、これに対応する英字文字列を変換後文字列として出力する。また、前記判断規則に基づいて、前記変換候補文字列が変換候補として好ましくないと判断した場合には、前記ローマ字読み規則にとらわれることなく、前記生成された読み文字列の一部のかな文字列について、前記入力された英文字列中の英文字を含む読み文字列を得て、前記辞書手段を検索することができる。
【００３６】
なお、本実施形態においては、判断規則記憶手段５９を設け、必要な場合にのみ、前記ローマ字区切り処理を行なっているが、これを設けず、全てについて前記ローマ字区切り処理を行なうようにしてもよい。
【００３７】
２．ハードウェア構成
図２に、図１に示す文字列変換装置１をＣＰＵを用いて実現したハードウェア構成の一例を示す。
【００３８】
文字列変換装置１は、ＣＰＵ２３、メモリ２７、ハードディスク２６、ＣＲＴ３０、ＦＤＤ２５、キーボード２８、マウス３１およびバスライン２９を備えている。ＣＰＵ２３は、ハードディスク２６に記憶された制御プログラムにしたがいバスライン２９を介して、各部を制御する。
【００３９】
この制御プログラムは、ＦＤＤ２５を介して、プログラムが記憶されたフレキシブルディスクから読み出されてハードディスク２６にインストールされたものである。なお、フレキシブルディスク以外に、ＣＤ−ＲＯＭ、ＩＣカード等のプログラムを実体的に一体化したコンピュータ可読の記憶媒体から、ハードディスクにインストールさせるようにしてもよい。さらに、通信回線を用いてダウンロードするようにしてもよい。
【００４０】
本実施形態においては、プログラムをフレキシブルディスクからハードディスク２６にインストールさせることにより、フレキシブルディスクに記憶させたプログラムを間接的にコンピュータに実行させるようにしている。しかし、これに限定されることなく、フレキシブルディスクに記憶させたプログラムをＦＤＤ２５から直接的に実行するようにしてもよい。なお、コンピュータによって、実行可能なプログラムとしては、そのままインストールするだけで直接実行可能なものはもちろん、一旦他の形態等に変換が必要なもの（例えば、データ圧縮されているものを、解凍する等）、さらには、他のモジュール部分と組合せて実行可能なものも含む。
【００４１】
ハードディスク２６には辞書データが記憶される。メモリ２７には各種の演算結果等が記憶される。ＣＲＴ３０には、変換候補等が表示される。
【００４２】
ＣＲＴ３０の表示について図３を用いて、説明する。ＣＲＴ３０の編集画面１００には、文章を表示するエリアの他に、画面下方にエリア１１０が設けられている。エリア１１０には、各種メッセージや状態が表示され、また、同音異義語の候補が表示される。エリア１１０の拡大図を同図の右側部分に示す。エリア１１０の中のエリア１２０は、次候補群表示のための指示が与えられたとき、それらの単語候補を表示する領域として使用される。
【００４３】
ここで、［挿入］とは、キーボード２８から入力された文字列をカーソル１０２の直前に挿入することを意味しており、［確定］とは、かな漢字変換等のかな文書変換をしてユーザが選択した単語を後で再度読みに戻すことをしないモードになっていることを意味する。［連カナ漢］とは、かな漢字変換モードが連文節であって、カナ入力であることを示している。
【００４４】
なお、上述の［挿入］については、所定の制御キーを操作することにより、［挿入］と［上書］が相互に適宜変更可能になっている。［カナ漢］についても、同様に、ローマ字入力の［Ｒ漢］と相互に変更可能である。
【００４５】
図３において、ユーザが、キーボード２８より文字列を入力し、変換キーを押下することにより、後述するように、文字列変換処理が実行される。この結果、画面のカーソル１０２の位置に、各々の読みで変換された優先度の高い単語からなる文（もしくは文節）が未確定の状態で表示される。このとき、次候補キーを押下すると、表示された未確定単語のうち、先頭の文節の候補群がエリア１２０に表示される。
【００４６】
本実施形態においては、スペースキーを、かな文書変換キーおよび次候補キーに対応させている。すなわち、読みを入力した直後にスペースキーが押下された時は、かな文書変換キーとして判断する。この状態で、続けてスペースキーが入力されたときには次候補キーとして処理される。なお、読みを入力せず、単にスペースキーが入力されたとき、又はコントロールキーとシフトキーとスペースキーとが同時に押下された時には、本来の空白文字の入力として処理される。
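The space-key behaviour described in this paragraph can be sketched as a small dispatch function. This is only an illustration: the names `SpaceAction`, `on_space_key`, and its parameters are hypothetical and do not appear in the patent.

```python
from enum import Enum


class SpaceAction(Enum):
    CONVERT = "convert"                # act as the kana conversion key
    NEXT_CANDIDATE = "next_candidate"  # act as the next-candidate key
    INSERT_SPACE = "insert_space"      # enter a literal blank


def on_space_key(reading_pending: bool, conversion_shown: bool,
                 ctrl_shift_held: bool = False) -> SpaceAction:
    """Decide what a space key press means (hypothetical helper)."""
    # No reading typed, or Ctrl+Shift+Space: ordinary blank character.
    if ctrl_shift_held or not reading_pending:
        return SpaceAction.INSERT_SPACE
    # A conversion result is already displayed: show the next candidates.
    if conversion_shown:
        return SpaceAction.NEXT_CANDIDATE
    # A reading was just typed: convert it.
    return SpaceAction.CONVERT
```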
【００４７】
候補の確定は、変換キーを押下したときの状態（第１候補）でよければ、その時点でリターンキーを押下するか、もしくは続いて読みキーを押下することでなされる。また、次候補キーを押下して候補群をエリア１２０に表示した時には、その候補のいずれかを選択（候補群には番号が付されていて、その番号を入力）したのち、リターンキー（確定キー）を押下することで確定する。なお、読みが確定している時に入力したリターンキーは文字どおり改行キーを示す。
【００４８】
３．フローチャート
つぎに、ハードディスク２６に記憶されているプログラムについて、説明する。
【００４９】
3.1 自動登録および変換処理
図４のフローチャートを用いて、文書処理動作を説明する。ユーザは、キーボード２８から、英文字列を入力する。ＣＰＵ２３は、英文字列が入力されるか否か判断しており（図４ステップＳＴ１）、英文字列が入力されると、入力された英文字列をローマ字バッファ２７ａに記憶する（図４ステップＳＴ３）。つぎに、入力された英文字列について、以下に示すローマ字読み規則に基づいて、ローマ字読みが可能か否か判断する（ステップＳＴ５）。
【００５０】
ローマ字読み規則
規則１：ローマ字文字列がローマ字かな変換テーブルに登録されている文字列の場合には、該当するローマ字文字列をかな文字に変換する。
【００５１】
規則２：英子音文字が連続して入力された場合は、最初の英子音文字を「っ」に変換し、以後規則１を繰り返す。
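Rules 1 and 2 above can be sketched as a greedy converter. The table below is a tiny hypothetical subset chosen for illustration (the actual romaji reading rules are those of Fig. 6, which is not reproduced here), the code uses ASCII letters instead of the full-width letters printed in the patent, and rule 2 is narrowed to a doubled identical consonant, which is an assumption based on the usual romaji convention. Characters that cannot be read as romaji are copied unchanged, as the description states for the reading buffer.

```python
# Hypothetical, deliberately small romaji-to-kana table (not the table of Fig. 6).
ROMAJI_TABLE = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "da": "だ", "te": "て", "to": "と", "ki": "き",
    "kyo": "きょ", "re": "れ", "n": "ん",
}
CONSONANTS = set("bcdfghjklmpqrstvwxyz")  # "n" is handled through the table


def romaji_to_kana(text: str) -> str:
    out = []
    i = 0
    while i < len(text):
        # Rule 1: convert the longest table entry starting at position i.
        for length in (3, 2, 1):
            chunk = text[i:i + length]
            if len(chunk) == length and chunk in ROMAJI_TABLE:
                out.append(ROMAJI_TABLE[chunk])
                i += length
                break
        else:
            # Rule 2 (narrowed, see lead-in): first of a doubled consonant -> small tsu.
            doubled = (i + 1 < len(text) and text[i] == text[i + 1]
                       and text[i] in CONSONANTS)
            # Anything else that cannot be read is stored as typed.
            out.append("っ" if doubled else text[i])
            i += 1
    return "".join(out)
```

With this table, `romaji_to_kana("atokyourei")` yields the reading 「あときょうれい」 that the later re-segmentation example starts from.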
【００５２】
ＣＰＵ２３は、ローマ字読みが可能な場合には、かな変換を行なう（ステップＳＴ７）。変換されたかな文字は読み文字列バッファ２７ｂに記憶される（ステップＳＴ９）。たとえば、図５Ａに示すような英文字列が入力された場合、「ＤＡ」と入力された段階で、母音であるので、かな変換処理が行なわれ、読み文字列バッファ２７ｂに、「だ」が記憶される。
【００５３】
このような処理を繰返す事により、入力された英文字列から読み文字列が生成される。例えば、図５Ａに示す文字列に対して、上述のローマ字かな変換規則を適用すると、図５Ｂに示すような読み文字列が生成されて読み文字列バッファ２７ｂに記憶される。
【００５４】
このようにして、ローマ字バッファ２７ａおよび読み文字列バッファ２７ｂには、順次データが記憶される。なお、ローマ字読みできない英字、数字、記号キーの場合には、そのまま読み文字列バッファ２７ｂへ記憶される。
【００５５】
ＣＰＵ２３は、変換キーが操作されるか否か判断しており（ステップＳＴ１１）、変換キーが操作されないうちは、ステップＳＴ１〜ステップＳＴ９の処理が繰返される。この状態で、ユーザが変換キーを操作すると、ＣＰＵ２３は、変換キーが操作されたと判断して、文節区切りおよび辞書変換処理を行なう（ステップＳＴ１３）。
【００５６】
ステップＳＴ１３の処理は、従来の文節区切りおよび辞書変換処理と同様である。簡単に説明すると、辞書部２６ｂに記憶されたデータ（単語情報、文法情報等）により、読み文字列バッファ２７ｂのデータについて、読み文字列の文節候補が抽出される。さらに、辞書部２６ｂに記憶された他のデータを用いて、抽出された文節候補が、文法的、意味的にチェックされて文節候補の絞り込みが行なわれる。かかる絞り込みとともに、文節区切り処理がなされる。
【００５７】
つぎに、ＣＰＵ２３は、最小コスト法により、変換候補を仮決定するとともに、仮決定した変換候補を、表記文字列バッファ２７ｃに記憶する（ステップＳＴ１５）。なお、最小コスト法以外に公知の方法を用いてもよい。
【００５８】
つぎに、ＣＰＵ２３は、ローマ字区切り変更処理が必要か否か判断する（図８ステップＳＴ１９）。ローマ字区切り変更処理については後述する。ローマ字区切り変更処理が必要でないと判断した場合には、この表記文字列バッファ２７ｃに記憶された文字列を、ＣＲＴ３０に表示する（ステップＳＴ２１）。
【００５９】
ユーザは、表示された変換結果について、変換が正しいか否か判断し、正しい場合は、確定キーを操作する。また、文節区切りが誤っていると判断した場合は、文節区切り修正キーを操作する。また、文節区切りは正しいが、間違った同音異義語が表示されている場合には、次候補キーを操作する。
【００６０】
ＣＰＵ２３は、ステップＳＴ２３にていずれのキーが操作されたか判断する。もし、文節区切り修正キーが操作された場合は、当該キーの内容に応じて、文節区切り位置を変更して（ステップＳＴ２５）、図４ステップＳＴ１３以下の処理を繰返す。
【００６１】
次候補キーが操作された場合は、ＣＰＵ２３は、逆ローマ字変換処理を行なう（ステップＳＴ２７）。逆ローマ字変換処理について、簡単に説明する。ここでは、図５Ｃの「だて」部分について、逆ローマ字変換処理を行なうものとする。ＣＰＵ２３は、「だて」を、ローマ字バッファ２７ａを参照して、英文字列に変換する。そして、全角で表記した第１候補（「ＤＡＴＥ」）と、半角サイズで表記した第２候補（「ｄａｔｅ」）が生成される。
【００６２】
このように、逆ローマ字変換処理を行なうことにより入力された英文字列を変換する事ができる。
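The reverse romaji conversion described above (looking up the typed letters in the romaji buffer, then offering a full-width and a half-width notation) can be sketched as follows. The buffer layout as a list of `(romaji, kana)` pairs and all function names are assumptions made for this illustration; the patent only states that the romaji buffer is referenced.

```python
FULLWIDTH_OFFSET = 0xFEE0  # distance from ASCII "!".."~" to the full-width forms


def to_fullwidth(text: str) -> str:
    """Map printable ASCII characters to their full-width counterparts."""
    return "".join(
        chr(ord(c) + FULLWIDTH_OFFSET) if "!" <= c <= "~" else c for c in text
    )


def reverse_romaji(chunks, start: int, end: int) -> str:
    """Recover the typed letters for chunks[start:end] from the romaji buffer."""
    return "".join(romaji for romaji, _kana in chunks[start:end])


def width_candidates(romaji: str):
    """First candidate: full-width capitals; second: half-width lower case."""
    return [to_fullwidth(romaji.upper()), romaji.lower()]


# Hypothetical buffer contents for an input that begins with "dateto".
chunks = [("da", "だ"), ("te", "て"), ("to", "と")]
typed = reverse_romaji(chunks, 0, 2)   # letters behind 「だて」
candidates = width_candidates(typed)
```

The description later notes that the order of these candidates (capitals or lower case first, full width or half width first) could be made user-configurable; the fixed order here is just the one used in the example.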
【００６３】
つぎに、ＣＰＵ２３は、これらの２つの候補に加えて、辞書部２６ｂから読み出した第３〜第７の候補とを合成して、ＣＲＴ３０の候補表示エリア１２０に提示する（図５Ｄ参照）。このようにして、逆ローマ字変換された文字列が、辞書部２６ｂに登録されている候補とともに、表示され、ユーザの選択対象として提示される。
【００６４】
ユーザは、エリア１２０に提示された次候補の中から、「１．ＤＡＴＥ」を選択する候補特定命令を与える。ＣＰＵ２３は、かかる候補特定命令が与えられると、エリア１２０の表示を図５Ｅに示すように変化させ、図５Ｆに示すように変更候補を表示する。
【００６５】
所望の変換結果が得られたので、ユーザは、確定キーを操作する。これにより、ＣＰＵ２３は、確定処理を行なう（ステップＳＴ３１）。これにより、表記文字列バッファ２７ｃに図５Ｆに示す文字列が記憶される。
【００６６】
つぎに、ＣＰＵ２３は、逆ローマ字変換された文字列が前記確定文字列中に存在するか否か判断する（ステップＳＴ３３）。存在する場合には、ＣＰＵ２３は、辞書部２６ｂに当該逆ローマ字変換された文字列を記憶する。この場合であれば、ＣＰＵ２３は、図９に示すように、文字列「ＤＡＴＥ」が読み「だて」として、追加登録するとともに、学習情報を更新する。
【００６７】
このように、ユーザの辞書登録操作を一切要求することなく、自動的に辞書部２６ｂに新しい単語が登録される。したがって、例えば、つぎにユーザが、「ひづけのかくにんには、だてとにゅうりょくする。」と文字入力して変換キーを押すと「日付の確認には、ＤＡＴＥと入力する。」と一回で正しくかな文書変換ができる。すなわち、日英辞書や英日辞書等を特別用意しておかなくても、外国語の単語をオリジナルのまま入力してかな漢字混じり文章の中に混在させることが可能となるとともに、新規単語をユーザの操作により新規登録する必要もない。
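The automatic registration step (steps ST31 to ST33) can be sketched as below. The dictionary is modelled as a plain mapping from a reading to a list of notations with the most recently used first, which merges the word information and the learning information into one structure for brevity; that layout, the function names, and the pre-existing entry 「伊達」 are assumptions for illustration only.

```python
def register_confirmed(dictionary, reading, surface):
    """Add `surface` under `reading` if new and move it to the front (learning)."""
    surfaces = dictionary.setdefault(reading, [])
    if surface in surfaces:
        surfaces.remove(surface)
    surfaces.insert(0, surface)


def on_confirm(dictionary, confirmed_text, reading, reverse_converted):
    """Register the reverse-converted string only if it survived in the confirmed text."""
    if reverse_converted in confirmed_text:
        register_confirmed(dictionary, reading, reverse_converted)
        return True
    return False


# Hypothetical dictionary state before the user confirms 「ＤＡＴＥ」.
user_dictionary = {"だて": ["伊達"]}
registered = on_confirm(user_dictionary, "ＤＡＴＥと入力する", "だて", "ＤＡＴＥ")
```

After this call the reading 「だて」 converts to 「ＤＡＴＥ」 first, which is the behaviour the paragraph above describes for the next input.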
【００６８】
このように、本実施形態においては、辞書部２６ｂに登録されていない単語が検出された場合には、変換候補として、これらのかな文字列を元のキー入力時のローマ字文字列に逆変換して、大文字全角、小文字半角、大文字半角、小文字全角等に変換し、ユーザに次候補の１つとして提示する。さらに、元のかな文字列のまま表示した文字列やひらがなをカタカナに変換した文字列を、次候補の１つとして提示するとユーザの選択範囲を拡大することができる。
【００６９】
なお、次候補の提示の順番として、逆ローマ字変換した文字列を優先的に提示するようにすると、日本語と外国語をより区別することなくユーザが取扱えるようになる。さらに、逆ローマ字変換の英大文字への変換が先か、英小文字への変換が先か、或いは全角表記が先か、半角表記文字列も次候補の１つとして表示するか等は、ユーザの選択によりカスタマイズできるようにしておくと、ユーザが逆ローマ字変換をより利用しやすくなる。
【００７０】
逆ローマ字変換された候補文字列が確定文字列として採用された場合、辞書部２６ｂに順次自動登録することが可能となり、この機能は日本語と外国語とを混在させた日本語文章を作成する場合、モード変換キーや辞書登録操作をユーザに一切要求しないで新規外国語を辞書登録できる。
【００７１】
このように、本実施形態においては、次候補キーが押下された場合に、次候補対象として、入力された英文字に戻した文字列を表示するようにしている。これにより、ユーザは、変換キーを操作するだけで、後変換キー（かな変換された文字列を再び英文字列に変換するキー）を操作することなく、所望の英文字列を得ることができる。さらに、かかる英文字列が確定された場合は、これを自動登録するようにしている。したがって、辞書に新たに登録する作業が不要となる。
【００７２】
なお、本実施形態においては、辞書に登録する際、前記自動登録処理を行なったが、これに限定されず、ユーザが後変換キーを操作して、英文字列に変換され、かかる変換が確定した時に、辞書登録するようにしてもよい。
【００７３】
3.2 ローマ字区切り変更処理
つぎに、ローマ字区切り変更処理について説明する。ローマ字区切り変更処理とは、入力文字列をローマ字読みする際に、ユーザの意図とは異なる部分でローマ字読みをした為に、辞書登録されている文字列に変換されない場合に、入力文字列を前記ローマ字読み規則にとらわれることなく、ローマ字読みする処理をいう。
【００７４】
例えば、辞書に「ＡＴＯＫ」が登録されており、「ＡＴＯＫ用例」と変換する為に「ａｔｏｋｙｏｕｒｅｉ」を入力した場合、上記ローマ字読み規則では、「あときょうれい」とローマ字読みされてしまい、ユーザの入力意図とはズレてしまう。ローマ字区切り変更処理を行なう事により、これをユーザの意図通りローマ字読みさせて、正しい変換結果を得る事ができる。
【００７５】
ＣＰＵ２３は、図８ステップＳＴ１９において、ローマ字区切り変更処理が必要である可能性が高い場合には、図１１ステップＳＴ４１以下の処理を行なう。本実施形態においては、ＣＰＵ２３は、一文字文節がある場合は、ローマ字区切り変更処理が必要である可能性が高いと判断するようにした。
【００７６】
例えば、「ＡＴＯＫ用例」と変換する為に、「ａｔｏｋｙｏｕｒｅｉ」と入力すると、「あときょうれい」とローマ字読みされて、図７に示す辞書を参照して「後，強冷」と変換される。この場合、漢字変換された文字列に、一文字文節「後」が存在するので、ステップＳＴ１９にて、ローマ字区切り変更処理が必要であると判断する。
【００７７】
ＣＰＵ２３は、該当文節における注目かな文字について、フラグｒｍｐｏｓが「０」の英文字で再区切りを行なう。フラグｒｍｐｏｓとは、入力された英文字列をローマ字読みしたときに、母音となるか子音となるかを区別する為のフラグである。これは、入力文字列をローマ字読みした段階で、ローマ字かな変換処理が可能となった部分を、フラグｒｍｐｏｓ［１］とし、それ以外の部分は、フラグｒｍｐｏｓ［０］とする。また、注目かな文字は、最初は、当該文節の後から１番目で、かつフラグｒｍｐｏｓ［０］を含むかな文字とする。
【００７８】
具体的に説明すると、図１０Ａに示すように、「ａｔｏｋｙｏｕｒｅｉ」と入力された場合、フラグｒｍｐｏｓは、図１０Ｂに示す様に、「１，０，１，０，０，１，１，０，１，１」となる。そして、この場合、注目かな文字は、「あ，と，き，ょ，う，れ，い」の「れ」となる。したがって、この場合、「れ」を英文字文字列「ｒ」，「ｅ」として、図１０Ｄに示すように、英文字混入読み文字列「あときょうｒ」に、かな変換する。
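The flag sequence quoted above can be reproduced with a short sketch: each letter that completes a romaji-to-kana conversion gets flag 1 and every other letter gets flag 0. The function name `break_flags` and the representation of the input as a list of already segmented romaji chunks are assumptions for illustration.

```python
def break_flags(romaji_chunks):
    """Return one rmpos-style flag per typed letter.

    Each chunk is the group of letters that was converted to kana together;
    only its last letter is a point where conversion became possible.
    """
    flags = []
    for chunk in romaji_chunks:
        flags.extend([0] * (len(chunk) - 1) + [1])
    return flags


# "atokyourei" read as あ / と / きょ / う / れ / い
flags = break_flags(["a", "to", "kyo", "u", "re", "i"])
```

The letters with flag 0 ("t", "k", "y", "r") are exactly the positions at which the re-segmentation described next may cut the input.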
【００７９】
ＣＰＵ２３は、この「あときょうｒ」が辞書部２６ｂに登録されているか否かを判断する（ステップＳＴ４３）。登録されていない場合には、つぎの区切りが可能か否か判断し（ステップＳＴ４５）、可能である場合は、ステップＳＴ４１の処理を繰返す。
【００８０】
この場合、つぎの区切りが可能であるので、つぎの、フラグｒｍｐｏｓ［０］を含むかな文字で、かな変換する。これにより、図１０Ｅに示す様に、「あとｋｙ」が得られる。ＣＰＵ２３は、この「あとｋｙ」が、辞書部２６ｂに登録されているか否かを判断する（ステップＳＴ４３）。登録されていない場合には、つぎの区切りが可能か否か判断し（ステップＳＴ４５）、可能である場合は、ステップＳＴ４１の処理を繰返す。
【００８１】
この場合、つぎの区切りが可能であるので、つぎの、フラグｒｍｐｏｓ［０］を含むかな文字で、かな変換する。これにより、図１０Ｆに示す様に、「あとｋ」が得られる。ＣＰＵ２３は、この「あとｋ」が、辞書部２６ｂに登録されているか否かを判断する（ステップＳＴ４３）。この場合、図７に示すように「あとｋ」が登録されているので、当該位置でローマ字かな変換処理を行ない（ステップＳＴ４９）、図４ステップＳＴ１３以下の処理を行なう。
【００８２】
なお、ステップＳＴ１３の処理を行なう事により、入力文字列「ａｔｏｋｙｏｕｒｅｉ」は、「ＡＴＯＫ用例」と変換される。
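The loop over steps ST41, ST43, and ST45 can be sketched as a generator that proposes English-mixed readings from the back of the input, longest reading first, and stops at the first one found in the dictionary. The `(romaji, kana)` chunk list, the function names, and the use of a plain set as the dictionary of readings are assumptions for this sketch; ASCII letters stand in for the full-width letters of the patent text.

```python
def mixed_reading_candidates(chunks):
    """Yield (mixed_reading, remaining_romaji), cutting only at flag-0 letters.

    chunks: list of (romaji, kana) pairs in input order.
    """
    for idx in range(len(chunks) - 1, -1, -1):
        romaji = chunks[idx][0]
        head_kana = "".join(kana for _r, kana in chunks[:idx])
        tail_romaji = "".join(r for r, _k in chunks[idx + 1:])
        # Cut inside the chunk, i.e. before the letter that completed the kana.
        for cut in range(len(romaji) - 1, 0, -1):
            yield head_kana + romaji[:cut], romaji[cut:] + tail_romaji


def resegment(chunks, dictionary_readings):
    """Return the first candidate registered in the dictionary, else None."""
    for mixed, rest in mixed_reading_candidates(chunks):
        if mixed in dictionary_readings:
            return mixed, rest
    return None


chunks = [("a", "あ"), ("to", "と"), ("kyo", "きょ"),
          ("u", "う"), ("re", "れ"), ("i", "い")]
first_three = [m for m, _ in mixed_reading_candidates(chunks)][:3]
hit = resegment(chunks, {"あとk"})
```

Here `hit` carries the registered reading and the letters still to be read as romaji ("yourei"), which then go through the normal conversion of step ST13. Returning `None` corresponds to the case where no further cut is possible and the provisional candidate is kept.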
【００８３】
なお、ステップＳＴ４５にて、つぎの区切りが可能でない場合は、ローマ字区切り変更処理は不要と判断して、図４ステップＳＴ１５にて、仮決定された変換候補を変換候補とする（ステップＳＴ４７）。そして、図８ステップＳＴ２１以下の処理を行なう。
【００８４】
このようにして、英文字を含む文字列を正確に変換することができる。なお、本実施形態においては、検索が成功するまで、読みの長い順に行なったが、読みの短い順に行なうこともできる。さらに、検索が成功すると処理を中止しているが、可能の場合全てについて検索を行ない、最も評価の高いものを選択するようにしてもよい。
【００８５】
なお、本実施形態においては、一文字文節がある場合は、ローマ字区切り変更処理が必要である可能性が高いと判断して、前記ローマ字区切り変更処理をするようにしたが、これ以外に、ローマ字区切り変更処理が必要である可能性が高い場合を設定しておき、同様の処理を行なうようにしてもよい。例えば、未登録語の文節がある場合である。
【００８６】
未登録語の文節としては、例えば、「ｏｒｉｇｉｎａｌ」が辞書登録されており、「ｏｒｉｇｉｎａｌアプリ」と変換する為に、「ｏｒｉｇｉｎａｌａｐｒｉ」と入力したとする。この場合、「おりぎなぁぷり」と、かな変換され、漢字変換すると、例えば「折技なぁぷり」となる。この場合、「なぁぷり」は未登録語の文節となる。この様な場合も、前記一文字文節がある場合と同様に、ローマ字区切り変更処理が必要である可能性が高いと判断できる。
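The two triggers discussed here (a one-character phrase, or a phrase that is not registered in the dictionary) can be sketched as a single predicate. The input format, a list of `(surface, registered)` pairs for the provisionally converted phrases, and the function name are assumptions for illustration.

```python
def needs_resegmentation(phrases):
    """Return True if romaji re-segmentation is likely to be needed.

    phrases: list of (surface, registered) pairs, where `registered` says
    whether the phrase was found in the dictionary.
    """
    return any(len(surface) == 1 or not registered
               for surface, registered in phrases)


one_char_case = needs_resegmentation([("後", True), ("強冷", True)])
unregistered_case = needs_resegmentation([("折技", True), ("なぁぷり", False)])
normal_case = needs_resegmentation([("日付の", True), ("確認には", True)])
```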
【００８７】
４．他の実施形態
上記実施形態においては、一旦漢字変換してからローマ字区切り変更処理が必要か否か判断しているが、文字入力がされた状態でローマ字区切り変更処理が必要であるか否か判断するようにしてもよい。例えば、シフトキーを押した状態で文字入力がされた場合は、ローマ字区切り変更処理が必要である可能性が高いと判断して、１回目の英文字列からかな変換する状態で、前記ローマ字読み規則にとらわれない区切り処理をするよう決定することができる。
【００８８】
例えば、辞書に「ＡＴＯＫ」が登録されており、「このＡＴＯＫ用例を用いて」と変換する為に、「ｋｏｎｏＡｔｏｋｙｏｕｒｅｉｗｏｍｏｔｉｉｔｅ」と「Ａ」をシフトキーを用いて入力した場合は、以下の様にして、ローマ字かな変換する。
【００８９】
ＣＰＵ２３は、シフトキーを用いて入力された文字「Ａ」から、英字の区切りに属しない箇所までで、ローマ字かな変換を行なう。すなわち、「ｋｏｎｏＡｔｏｋｙｏｕｒｅｉｗｏｍｏｔｉｉｔｅ」のうち、「Ａｔｏｋｙｏｕｒｅｉ」について、英字の区切りに属しない箇所までの文字列「ａｔｏｋｙｏｕｒ」で区切りを行ない、「あときょうｒ」に、かな変換する。そして、この「あときょうｒ」が辞書にあるか否かを判断する。
【００９０】
ない場合には、上記と同様にして、再区切り処理を行なう。このようにして、ユーザの命令がなくても、確実に英文字を含む文字列を変換することができる。
【００９１】
なお、この場合も、読みの短い順に行なうこともできる。
【００９２】
このように、入力手段に英文字列が入力される際に、強制区分指示命令が付加されている場合には、前記入力された英文字列中の英文字列を含むかな文字列を得る様にしてもよい。
【００９３】
また、以下の様にして、前記入力された英文字列中の英文字を含む読み文字列を得る様にしてもよい。辞書に存在する文字列のうち、英文字を含む文字列の当該英文字までのかなが含まれている部分があるか否かをローマ字かな変換をした段階で判断する。そして、存在する場合には、入力文字におけるこれにつづく英字と、辞書に記憶されている文字列におけるこれに続く英字が一致するか否か判断し、一致する場合にはその箇所で英字区切りを行なう。
【００９４】
たとえば、「あとｋ」が登録されている場合、「ａｔｏｋｙｏｕｒｅｉ」が入力されると「あときょうれい」とローマ字かな変換がなされる。かかる入力文字中に、辞書登録されている「あとｋ」の「あと」部分の文字列（以下かな共通部という）があるか否かを判断する。この場合、かな共通部が存在するので、入力文字列の「あときょうれい」についてかな共通部に続く英字を抽出する。なお、かな共通部に続く英字は、つぎの母音の部分の前までの組合わせ全てが抽出される。したがって、この場合、「ｋ」および「ｋｙ」の２種類が抽出される。辞書登録されている文字は、かな共通部につづく英字は「ｋ」である。この場合、「ｋ」と考えれば、両者が一致するので、この場合「ａｔｏｋｙｏｕｒｅｉ」は「ａｔｏｋ」と「ｙｏｕｒｅｉ」に区切られる。
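The kana-common-part check in this paragraph can be sketched as follows. The sketch is simplified in one respect that the patent does not require: it only looks for the common part at the start of the reading. The chunk layout and the function name are assumptions for illustration.

```python
def split_by_common_part(chunks, entry_kana, entry_letters):
    """Split the typed letters where a dictionary entry such as 「あとk」 fits.

    chunks: list of (romaji, kana) pairs for the input.
    entry_kana: kana before the English letters in the entry (e.g. "あと").
    entry_letters: English letters that follow in the entry (e.g. "k").
    """
    kana_so_far = ""
    for idx, (romaji, kana) in enumerate(chunks):
        if kana_so_far == entry_kana:
            # All letter combinations up to, but not including, the next vowel.
            prefixes = [romaji[:n] for n in range(1, len(romaji))]
            if entry_letters not in prefixes:
                return None
            typed = "".join(r for r, _k in chunks)
            cut = sum(len(r) for r, _k in chunks[:idx]) + len(entry_letters)
            return typed[:cut], typed[cut:]
        kana_so_far += kana
    return None


chunks = [("a", "あ"), ("to", "と"), ("kyo", "きょ"),
          ("u", "う"), ("re", "れ"), ("i", "い")]
split = split_by_common_part(chunks, "あと", "k")
```

For the chunk "kyo" the extracted prefixes are "k" and "ky", matching the two combinations named in the text, and the entry letter "k" selects the cut between "atok" and "yourei".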
【００９５】
この場合、全ての箇所についてかかる処理を行なうのは効率が悪いので、上記一文字文節がある場合、または未登録語がある場合のみ、かかる処理を行なうようにしてもよい。
【００９６】
上記実施形態においては、ローマ字バッファを参照して、英文字に戻すようにしてローマ字区切り変更処理を実行しているが、図６のローマ字読み規則を逆読みして、英文字に戻すようにして実行するようにしてもよい。
【００９７】
なお、本実施形態においては、図１に示す機能を実現する為に、ＣＰＵ２３を用い、ソフトウェアによってこれを実現している。しかし、その一部もしくは全てを、ロジック回路等のハードウェアによって実現してもよい。
【図面の簡単な説明】
【図１】本発明にかかる文字列変換装置１の機能ブロック図である。
【図２】図１に示す文字列変換装置１のハードウエア構成の一例を示す図である。
【図３】ＣＲＴ３０の表示を示す図である。
【図４】変換処理のフローチャートを示す。
【図５】ローマ字バッファ、かなバッファ、表記文字バッファのデータ内容および表示例を示す図である。
【図６】ローマ字読み規則を示す図である。
【図７】辞書部２６ｂの内容を示す図である。
【図８】変換処理のフローチャートを示す。
【図９】辞書部２６ｂに追加される単語データ内容を示す図である。
【図１０】ローマ字バッファ、かなバッファのデータ内容を示す図である。
【図１１】ローマ字区切り変更処理のフローチャートを示す。
【符号の説明】
６３・・・文字列記憶手段
５０・・・変換手段
７０・・・辞書手段
２３・・・ＣＰＵ
２７・・・メモリ
[0001]
[Technical Field of the Invention]
The present invention relates to a character string conversion device, and more particularly to the conversion of dictionary-registered character strings that contain roman letters.
[0002]
[Related Art and Problems to be Solved by the Invention]
In kana-kanji conversion, a method is known in which text is typed in alphabetic characters and then converted to kana and kanji (alphabetic-input kana-kanji conversion). With this method, one way to enter a character string that contains English letters is to type the English spelling as it is and reverse-convert it afterwards. For example, to enter the mixed string 「ｏｒｉｇｉｎａｌの動作」 (“operation of original”), the user types “originalnodousa” in alphabetic input mode and presses the conversion key. The input is provisionally converted to 「おりぎなｌの動作」. If the user then designates 「おりぎなｌ」 as the part to be reverse-converted and presses the reverse-conversion key, the designated part is converted back to the English string “original”. Character string conversion devices thus provide a post-conversion function: when a string containing English letters is entered in alphabetic input mode, the English word is first typed with its correct spelling and is reverse-converted afterwards.
[0003]
However, even with this post-conversion function, the user must still designate the part to be reverse-converted and then carry out the reverse conversion. Entering a string that contains English letters is therefore less efficient than entering one that does not.
[0004]
One conceivable way to solve this problem is to register the string in the dictionary automatically whenever the post-conversion is performed, so that from the next time on it is converted to English letters simply by pressing the conversion key, just like ordinary kanji. Even with such a dictionary function, however, accurate conversion is not possible in cases such as the following.
[0005]
For example, suppose that the converted string “original” is registered in the dictionary for the input reading 「おりぎなｌ」, and that the user types “originalande” and presses the conversion key in order to obtain 「ｏｒｉｇｉｎａｌ案で」 (“with the original plan”). In this case the input is read, under the predetermined romaji reading rules (here, “la” = 「ぁ」), as the kana string 「おりぎなぁんで」 and is converted to, for example, 「折技なぁんで」.
[0006]
An object of the present invention is to solve the above problems and to provide a character string conversion device and a conversion method capable of accurately and easily converting an English character string input in a Roman character input mode.
[0007]
[Means for Solving the Problems]
A character string conversion device according to the present invention comprises: A) dictionary means that takes a kana character string containing English letters as a pre-conversion character string and stores the post-conversion character string corresponding to it; and B) conversion means that applies romaji-kana conversion, based on romaji reading rules, to an English character string entered in romaji input mode to generate a reading character string, searches the dictionary means, and, when a pre-conversion character string is present in the reading character string, outputs the corresponding English character string as the post-conversion character string. In this device, C) the conversion means stores, for the entered English character string, flags indicating the breaks of the romaji-kana conversion; executes an English-mixed character string acquisition process that obtains an English-mixed character string by performing romaji-kana conversion on the reading character string at a position that was not a break; and searches the dictionary means and, when the English-mixed character string is present among the pre-conversion character strings, outputs the corresponding English character string as the post-conversion character string.
Thereby, a reading character string including English characters can be obtained from the input character string. Therefore, the English character string input in the Roman character input mode can be converted accurately and easily.
[0008]
In the character string conversion device according to the present invention, the English character mixed character string acquisition process is repeatedly executed by sequentially shifting the position from the last or frontmost character string in the reading character string. Thereby, a reading character string including English characters can be obtained from the input character string.
[0009]
In the character string conversion device according to the present invention, the conversion means includes determination rule storage means for storing a determination rule as to whether or not a conversion candidate character string is preferable as a conversion candidate, and the conversion based on the determination rule When it is determined that the candidate character string is not preferable as a conversion candidate, a reading character string including an English character in the input English character string is obtained. Therefore, it is possible to obtain a kana character string including an English character in the input English character string simply by referring to the English character string storage means.
[0010]
A character string conversion method according to the present invention uses a character string conversion device comprising: A) dictionary means that takes a kana character string containing English letters as a pre-conversion character string and stores the post-conversion character string corresponding to it; and B) conversion means that applies romaji-kana conversion, based on romaji reading rules, to an English character string entered in romaji input mode to generate a reading character string, searches the dictionary means, and, when a pre-conversion character string is present in the reading character string, outputs the corresponding English character string as the post-conversion character string. In this method, C) the conversion means is made to execute the following processing: c1) flags indicating the breaks of the romaji-kana conversion are stored for the entered English character string, and an English-mixed character string is obtained by performing romaji-kana conversion on the reading character string at a position that was not a break; and c2) the dictionary means is searched and, when the English-mixed character string is present among the pre-conversion character strings, the corresponding English character string is output as the post-conversion character string.
[0011]
In the character string conversion method according to the present invention, the English character mixed character string acquisition process is repeatedly executed by sequentially shifting the position from the last or frontmost character string in the reading character string. Thereby, a reading character string including English characters can be obtained from the input character string.
[0013]
The storage medium of claim 7 is a computer-readable storage medium storing a computer-executable program, wherein the program implements the device or method of any one of claims 1 to 6.
[0014]
[Effects of the Invention]
In the character string conversion device of claim 1 and the character string conversion method of claim 6, when the reading character string is generated, a reading character string that contains English letters from the entered English character string is obtained for part of the kana in the generated reading character string, without being bound by the romaji reading rules, and it is also determined whether a pre-conversion character string is present in this reading character string. A reading character string containing English letters can thus be obtained from the entered character string. Therefore, an English character string entered in romaji input mode can be converted accurately and easily.
[0015]
In the character string conversion device of claim 2, the conversion means includes determination rule storage means that stores a rule for determining whether a conversion candidate character string is preferable as a conversion candidate, and, when it determines on the basis of this rule that the conversion candidate character string is not preferable, obtains a reading character string that contains English letters from the entered English character string. Such a reading character string is therefore obtained only when the conversion candidate character string is not preferable as a conversion candidate. This improves conversion efficiency.
[0016]
The character string conversion device of claim 3 includes English character string storage means that stores the entered English character string, and the conversion means refers to the English character string storage means for some characters of the generated reading character string to obtain a kana character string that contains English letters from the entered English character string. Such a kana character string can therefore be obtained simply by referring to the English character string storage means.
[0017]
In the character string conversion device of claim 4, when a forced segmentation instruction is attached to the entered English character string, the conversion means obtains a kana character string that contains English letters from the entered English character string. A reading character string containing English letters is therefore obtained when this matches the intention of the person entering the text. This improves conversion efficiency.
[0018]
In the character string conversion device of claim 5, the conversion means determines whether the reading character string generated under the romaji reading rules contains the same kana as the kana that precedes the English letters in a pre-conversion character string stored in the dictionary means. If it does, the conversion means obtains the English letters corresponding to the kana that follows in the generated reading character string and determines whether the two English character strings match. If they match, it obtains a reading character string that contains the English letters from the entered English character string. Thus, when kana covering a pre-conversion character string stored in the dictionary means is present in the reading character string, a reading character string containing English letters can be obtained from the entered character string. An English character string entered in romaji input mode can therefore be converted accurately and easily.
[0019]
[Embodiments of the Invention]
1. Description of the Functional Block Diagram
An embodiment of the present invention will be described with reference to the drawings. The character string conversion device 1 shown in FIG. 1 includes input means 41, input/output control means 42, display means 43, output means 44, character string storage means 63, conversion means 50, and dictionary means 70.
[0020]
Various commands and alphanumeric character strings to be converted are input to the input means 41. The input / output control means 42 gives the input alphanumeric character string to the character string storage means 63. The character string storage unit 63 includes an English character string storage unit 64, a reading character string storage unit 65, and a written character string storage unit 66. The input alphanumeric character string is stored in the English character string storage means 64. The reading character string storage unit 65 stores the reading character string converted by the conversion unit 50. The character string stored in the reading character string storage means 65 may be a kana character string only, an alphanumeric character string only, or a combination of both. The written character string storage unit 66 stores the written character string converted by the conversion unit 50. The notation character string may be a kana character string, a kanji character string, or an alphanumeric character string.
[0021]
When the input / output control means 42 receives the output command, the input / output control means 42 outputs the written character string stored in the written character string storage means 66 to the display means 43. The display means 43 displays this notation character string. The output unit 44 outputs the written character string stored in the written character string storage unit 66.
[0022]
The dictionary means 70 includes grammatical information storage means 71, word information storage means 72, co-occurrence example information storage means 73, and learning information storage means 74.
[0023]
The grammar information storage means 71 stores information on whether a grammatical connection between words is valid, information on the strength of that connection, and the like. For example, it stores information such as “a noun (without a particle) followed by a suffix is strongly connected” and “a noun (without a particle) followed by a noun is strongly connected”.
[0024]
The word information storage means 72 stores pre-conversion character strings and the corresponding post-conversion character strings. A pre-conversion character string may consist of kana only, of English letters only, or of kana mixed with alphanumeric characters. A post-conversion character string may consist of kanji, katakana, English letters, or a combination of these. Specifically, the reading character string, the written character string, part-of-speech information, conjugation information, and the like are stored for each word.
[0025]
The co-occurrence example information storage means 73 stores binary relation information between words that have a strong semantic connection. Besides plain word pairs, this information includes co-occurrence examples defined per attribute, such as “person” or “flower”, and additional restriction information. The additional restriction information includes, for example, particle information such as 「を」 (wo) and 「が」 (ga), and information on the direction in which the pair holds. As word-to-word pairs, 「暑い−夏」 (hot weather, summer), 「厚い−本」 (thick, book), 「熱い−お湯」 (hot to the touch, hot water), and the like are stored. As attribute-level examples, 「人（彼、彼女、先生、恋人、等）に−会う」 (to meet a person: he, she, teacher, lover, etc.) and 「花（チューリップ、菊、等）が−咲く」 (a flower blooms: tulip, chrysanthemum, etc.) are stored. As particle restriction information, 「話を−聞く」 (to listen to a story), 「薬が−効く」 (a medicine takes effect), 「機転が−利く」 (to be quick-witted), and the like are stored. As direction restriction information, 「家庭−教育」 (home, education) and 「教育−過程」 (education, process) are stored.
[0026]
Such co-occurrence example information is semantic information, and using it resolves the ambiguity of homophones. For example, when the conversion process is performed on the reading “atsui / hon”, the “thick-book” entry in the co-occurrence example information storage means 73 causes “thick” to be selected, and homophones such as “hot” are not selected. The conversion result “thick / book” is therefore obtained immediately. In this way, the co-occurrence example information is used to improve conversion efficiency.
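As a rough illustration of how co-occurrence pairs could narrow down homophones, the following Python sketch returns the first candidate pair found in a small co-occurrence set. The kanji forms, the `COOCCURRENCE` set, and the `select_homophones` helper are assumptions made for this example; the description does not disclose the stored data format.

```python
# Hypothetical pairs modelled on "hot-summer", "thick-book" and "hot-hot water".
COOCCURRENCE = {("暑い", "夏"), ("厚い", "本"), ("熱い", "湯")}

def select_homophones(first_candidates, second_candidates):
    """Return the first (word, word) pair found in COOCCURRENCE, else the top candidates."""
    for first in first_candidates:
        for second in second_candidates:
            if (first, second) in COOCCURRENCE:
                return first, second
    return first_candidates[0], second_candidates[0]

# Reading "atsui / hon": three homophones for "atsui", one candidate for "hon".
print(select_homophones(["暑い", "熱い", "厚い"], ["本"]))  # ('厚い', '本')
```

When no pair matches, the sketch simply falls back to the first candidate of each list, which stands in for whatever default ordering the dictionary would otherwise supply.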
[0027]
The learning information storage means 74 stores usage information that is used to preferentially adopt a recently used notation from among a plurality of notations.
[0028]
The conversion unit 50 includes a reading character string generation unit 51, a reading rule storage unit 52, a basic analysis unit 53, a phrase delimiter processing unit 54, an automatic dictionary registration unit 55, a notation selection unit 56, and a Roman character delimiter changing unit 58.
[0029]
The reading rule storage means 52 stores Roman character reading rules for English character strings. The reading character string generation means 51 generates a reading character string from the English character string stored in the English character string storage means 64, based on the Roman character reading rules stored in the reading rule storage means 52, and stores it in the reading character string storage means 65.
[0030]
The basic analysis means 53 estimates the segment break position with reference to the grammar information storage means 71 and the word information storage means 72 of the dictionary means 70.
[0031]
The phrase break processing means 54 checks the grammar and word information for the phrase break candidates output from the basic analysis means 53, and performs a homophone selection process with reference to the co-occurrence example information storage means 73. Further, the phrase break processing means 54 narrows down the phrase break candidates.
[0032]
The notation selection means 56 determines the notation for any phrase whose notation has not been determined by the processing of the phrase delimiter processing means 54. The notation selection means 56 has reverse Roman character conversion means 57 for converting the word portion of the extracted phrase from a kana character string to a Roman character string. The reverse Roman character conversion means 57 can also change the display character size.
[0033]
When the word determined by the notation selection means 56 is not registered in the word information storage means 72 of the dictionary means 70, the automatic dictionary registration means 55 automatically registers information such as the reading of this word, its notation character string, and its part of speech in the word information storage means 72.
[0034]
The Roman character delimiter changing means 58 has a determination rule storage means 59, and changes the Roman character delimiters as follows. The determination rule storage means 59 stores a determination rule for judging whether or not a conversion candidate character string is preferable as a conversion candidate. Based on this determination rule, the Roman character delimiter changing means 58 refers to the English character string stored in the English character string storage means 64 for some of the characters of the reading character string stored in the reading character string storage means 65, and thereby obtains a kana character string that includes English characters from the input English character string. For the kana character string including the English characters thus obtained, the basic analysis means 53 and the phrase delimiter processing means 54 perform character string conversion again.
[0035]
With this configuration, the conversion means 50 generates a reading character string from the English character string input in the Roman character input mode based on the Roman character reading rules, searches the dictionary means 70, and, if a pre-conversion character string exists in the reading character string, outputs the corresponding English character string as the post-conversion character string. Further, when the conversion candidate character string is judged, based on the determination rule, not to be preferable as a conversion candidate, a reading character string including English characters from the input English character string is obtained for part of the generated kana reading character string, without being bound by the Roman character reading rules, and the dictionary means 70 is searched with it.
[0036]
In the present embodiment, the determination rule storage means 59 is provided, and the Roman character delimiter change processing is performed only when necessary. However, the Roman character delimiter change processing may instead be performed in all cases.
[0037]
2. Hardware configuration
FIG. 2 shows an example of a hardware configuration in which the character string conversion device 1 shown in FIG. 1 is realized using a CPU.
[0038]
The character string conversion device 1 includes a CPU 23, a memory 27, a hard disk 26, a CRT 30, an FDD 25, a keyboard 28, a mouse 31, and a bus line 29. The CPU 23 controls each unit via the bus line 29 according to a control program stored in the hard disk 26.
[0039]
This control program is read via the FDD 25 from the flexible disk storing the program and installed in the hard disk 26. Instead of a flexible disk, the program may be installed on the hard disk from another computer-readable storage medium that holds the program, such as a CD-ROM or an IC card. Furthermore, it may be downloaded using a communication line.
[0040]
In the present embodiment, the program stored in the flexible disk is indirectly executed by the computer by installing the program from the flexible disk to the hard disk 26. However, the present invention is not limited to this, and the program stored in the flexible disk may be directly executed from the FDD 25. Note that programs executable by a computer include not only those that can be executed directly once installed, but also those that must first be converted into another form (for example, decompressed from a compressed state), and those that are executable in combination with other module parts.
[0041]
The hard disk 26 stores dictionary data. The memory 27 stores various calculation results and the like. Conversion candidates and the like are displayed on the CRT 30.
[0042]
The display on the CRT 30 will be described with reference to FIG. 3. In the editing screen 100 of the CRT 30, an area 110 is provided in the lower part of the screen in addition to the area for displaying the text. In the area 110, various messages and states are displayed, and candidates for homonyms are displayed. An enlarged view of the area 110 is shown on the right side of the figure. Area 120 in area 110 is used as an area for displaying word candidates when an instruction for displaying the next candidate group is given.
[0043]
Here, [Insert] means that a character string input from the keyboard 28 is inserted immediately before the cursor 102, and [Confirm] means that, once the user has performed kana document conversion such as kana-kanji conversion, the selected word is not later returned to its reading. “Kana-Kana-Kan” indicates that the kana-kanji conversion mode is continuous-phrase conversion with kana input.
[0044]
As for the above-mentioned [Insert], [Insert] and [Overwrite] can be appropriately changed by operating a predetermined control key. Similarly, [Kana Han] can be changed mutually with [R Han] for Roman character input.
[0045]
In FIG. 3, when a user inputs a character string from the keyboard 28 and presses a conversion key, a character string conversion process is executed as described later. As a result, a sentence (or phrase) consisting of a high-priority word converted by each reading is displayed in an undetermined state at the position of the cursor 102 on the screen. At this time, when the next candidate key is pressed, a candidate group of the first phrase among the displayed unconfirmed words is displayed in the area 120.
[0046]
In the present embodiment, the space key is associated with the kana document conversion key and the next candidate key. That is, when the space key is pressed immediately after inputting a reading, it is determined as a kana document conversion key. In this state, when the space key is continuously input, it is processed as the next candidate key. When a space key is simply input without inputting a reading, or when the control key, the shift key, and the space key are pressed simultaneously, the input is processed as an original blank character.
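The space key handling described above can be sketched as a small decision function. This is only an illustration in Python: the function name `classify_space` and its boolean parameters are hypothetical, and the order of the checks is an assumption where the description is silent.

```python
def classify_space(reading_pending, just_converted, ctrl=False, shift=False):
    """Return how a space key press is treated: 'convert', 'next_candidate' or 'blank'."""
    if ctrl and shift:
        return "blank"            # Ctrl + Shift + Space is processed as a blank character
    if just_converted:
        return "next_candidate"   # Space pressed again after a conversion
    if reading_pending:
        return "convert"          # Space pressed right after a reading was input
    return "blank"                # Space pressed with no reading input

print(classify_space(reading_pending=True, just_converted=False))  # convert
```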
[0047]
If the state displayed when the conversion key is pressed (the first candidate) is acceptable, the candidate is confirmed by pressing the return key at that point or by going on to input the next reading. When the candidate group is displayed in the area 120 by pressing the next candidate key, one of the candidates is selected (the candidates are numbered, and the number is entered) and then confirmed by pressing the return key (confirmation key). Note that a return key input after the reading has been confirmed acts literally as a line feed key.
[0048]
3. Flowchart
Next, a program stored in the hard disk 26 will be described.
[0049]
3.1 Automatic registration and conversion processing
The document processing operation will be described with reference to the flowchart of FIG. 4. The user inputs an English character string from the keyboard 28. The CPU 23 determines whether or not an English character string has been input (step ST1 in FIG. 4). When an English character string is input, it is stored in the Roman character buffer 27a (step ST3). Next, it is determined whether or not the input English character string can be read based on the following Roman character reading rules (step ST5).
[0050]
Romaji reading rules
Rule 1: If the Roman character string is registered in the Roman character kana conversion table, it is converted to the corresponding kana character.
[0051]
Rule 2: When the same English consonant character is input twice in succession, the first English character is converted to “っ” (the small tsu), and then Rule 1 is applied again.
[0052]
If the Roman characters can be read, the CPU 23 performs kana conversion (step ST7). The converted kana character is stored in the reading character string buffer 27b (step ST9). For example, when an English character string as shown in FIG. 5A is input, a vowel has been entered by the time “DA” has been input, so kana conversion is performed and the kana for “DA” is stored in the reading character string buffer 27b.
[0053]
By repeating such processing, a reading character string is generated from the input English character string. For example, when the above-described Roman character reading rules are applied to the character string shown in FIG. 5A, a reading character string as shown in FIG. 5B is generated and stored in the reading character string buffer 27b.
[0054]
In this way, data is sequentially stored in the Roman character buffer 27a and the reading character string buffer 27b. English letters, numbers, and symbols that cannot be read as Roman characters are stored in the reading character string buffer 27b as they are.
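The reading generation of steps ST5 to ST9 can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: `ROMAJI_TABLE` is a hypothetical, heavily abridged stand-in for the conversion table of FIG. 6, the longest-match lookup is an assumed strategy, and Rule 2 is assumed to produce the small tsu for a doubled consonant.

```python
# Hypothetical abridged romaji-to-kana table (a stand-in for FIG. 6).
ROMAJI_TABLE = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ta": "た", "to": "と", "te": "て", "da": "だ",
    "re": "れ", "kyo": "きょ",
}
VOWELS = set("aiueo")
MAX_KEY = max(len(key) for key in ROMAJI_TABLE)

def to_reading(english):
    """Convert an input English character string into a reading character string."""
    s = english.lower()
    out, i = [], 0
    while i < len(s):
        # Rule 2 (assumed form): the first of two identical consonants becomes "っ".
        if (i + 1 < len(s) and s[i] == s[i + 1]
                and s[i].isalpha() and s[i] not in VOWELS):
            out.append("っ")
            i += 1
            continue
        # Rule 1: longest table match starting at the current position.
        for size in range(min(MAX_KEY, len(s) - i), 0, -1):
            chunk = s[i:i + size]
            if chunk in ROMAJI_TABLE:
                out.append(ROMAJI_TABLE[chunk])
                i += size
                break
        else:
            # Characters that cannot be read are stored as they are.
            out.append(english[i])
            i += 1
    return "".join(out)

print(to_reading("atokyourei"))  # あときょうれい
```

With this table, `to_reading("date")` gives "だて" and an unreadable character such as a digit is passed through unchanged.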
[0055]
The CPU 23 determines whether or not the conversion key is operated (step ST11), and steps ST1 to ST9 are repeated as long as the conversion key is not operated. In this state, when the user operates the conversion key, the CPU 23 determines that the conversion key has been operated, and performs the phrase break and dictionary conversion processing (step ST13).
[0056]
The process of step ST13 is the same as the conventional phrase segmentation and dictionary conversion process. Briefly, phrase candidates for the reading character string are extracted from the data in the reading character string buffer 27b using the data (word information, grammatical information, etc.) stored in the dictionary unit 26b. Furthermore, using the other data stored in the dictionary unit 26b, the extracted phrase candidates are checked grammatically and semantically to narrow them down. Along with this narrowing down, the phrase break process is performed.
[0057]
Next, the CPU 23 tentatively determines conversion candidates by the minimum cost method, and stores the tentatively determined conversion candidates in the written character string buffer 27c (step ST15). Another known method may be used instead of the minimum cost method.
[0058]
Next, the CPU 23 determines whether or not the Roman character break changing process is necessary (step ST19 in FIG. 8). The Romaji break change process will be described later. If it is determined that the Romaji break changing process is not necessary, the character string stored in the written character string buffer 27c is displayed on the CRT 30 (step ST21).
[0059]
The user determines whether the displayed conversion result is correct or not, and if it is correct, operates the confirmation key. If the user determines that the phrase break is incorrect, the user operates the phrase break correction key. If the phrase break is correct but the wrong homonym is displayed, the user operates the next candidate key.
[0060]
The CPU 23 determines which key has been operated in step ST23. If the phrase break correction key is operated, the phrase break position is changed in accordance with the contents of the key (step ST25), and the processing from step ST13 onward in FIG. 4 is repeated.
[0061]
When the next candidate key is operated, the CPU 23 performs reverse romaji conversion processing (step ST27). The reverse romaji conversion process will be briefly described. Here, it is assumed that the reverse Roman character conversion process is performed on the “date” portion of FIG. 5C. The CPU 23 converts “date” into an English character string with reference to the Roman character buffer 27a. Then, a first candidate (“DATE”) expressed in full-width and a second candidate (“date”) expressed in half-width size are generated.
[0062]
In this way, the input English character string can be converted by performing the reverse romaji conversion process.
[0063]
Next, in addition to these two candidates, the CPU 23 combines the third to seventh candidates read from the dictionary unit 26b and presents them in the candidate display area 120 of the CRT 30 (see FIG. 5D). In this way, the character string that has been subjected to reverse romaji conversion is displayed together with the candidates registered in the dictionary unit 26b and presented as a user's selection target.
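The candidate list described above can be sketched as follows. The sketch assumes that the first reverse-converted candidate is the uppercase full-width form and the second the lowercase half-width form; the helper names and the sample dictionary candidates are hypothetical. Full-width forms are obtained by shifting printable ASCII characters into the Unicode full-width range.

```python
FULLWIDTH_OFFSET = 0xFEE0  # distance from ASCII "!".."~" to U+FF01..U+FF5E

def to_fullwidth(text):
    """Map printable ASCII characters to their full-width counterparts."""
    return "".join(
        chr(ord(c) + FULLWIDTH_OFFSET) if "!" <= c <= "~" else c for c in text
    )

def next_candidates(romaji, dictionary_candidates):
    """Put the reverse-converted forms first, then the dictionary candidates."""
    reverse = [to_fullwidth(romaji.upper()), romaji.lower()]
    return reverse + [c for c in dictionary_candidates if c not in reverse]

# "date" is the keyed-in spelling held in the Roman character buffer.
print(next_candidates("date", ["伊達", "だて"]))
```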
[0064]
The user gives a candidate specifying command for selecting “1.DATE” from the next candidates presented in the area 120. When such a candidate specifying command is given, the CPU 23 changes the display of the area 120 as shown in FIG. 5E and displays the change candidates as shown in FIG. 5F.
[0065]
Since the desired conversion result is obtained, the user operates the confirmation key. Thereby, the CPU 23 performs a confirmation process (step ST31). As a result, the character string shown in FIG. 5F is stored in the written character string buffer 27c.
[0066]
Next, the CPU 23 determines whether or not a character string that has been subjected to reverse Romaji conversion exists in the confirmed character string (step ST33). When one exists, the CPU 23 stores the reverse-Romaji-converted character string in the dictionary unit 26b. In this case, as shown in FIG. 9, the CPU 23 adds the character string “DATE” with the reading “date” and updates the learning information.
[0067]
In this way, a new word is automatically registered in the dictionary unit 26b without requiring any dictionary registration operation by the user. Therefore, for example, when the user next inputs the characters “To date, enter date” and presses the conversion key, they are converted correctly and in a single operation to “Enter DATE to check the date.” In other words, even without preparing a Japanese-English or English-Japanese dictionary, the user can enter foreign words as they are spelled and mix them into kana-kanji text, and no separate registration operation is needed.
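A minimal Python sketch of the automatic registration at confirmation time follows. The data layout (a `words` mapping from reading to written forms and a `learning` mapping from reading to the most recently used form) and the sample entries are assumptions for illustration only.

```python
def confirm(segments, words, learning):
    """segments: (reading, written, reverse_converted) triples of the confirmed string."""
    for reading, written, reverse_converted in segments:
        if reverse_converted and written not in words.setdefault(reading, []):
            words[reading].append(written)  # automatic dictionary registration
        learning[reading] = written         # remember the most recently used notation

words = {"だて": ["伊達"]}   # hypothetical existing entry
learning = {}
confirm([("だて", "ＤＡＴＥ", True)], words, learning)
print(words["だて"])  # ['伊達', 'ＤＡＴＥ']
```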
[0068]
As described above, in the present embodiment, when a word that is not registered in the dictionary unit 26b is detected, its kana character string is reverse-converted into the Roman character string as it was keyed in, and that string is presented to the user as one of the next candidates in forms such as uppercase full-width, lowercase half-width, uppercase half-width, and lowercase full-width. Furthermore, if the original kana character string or a character string obtained by converting its hiragana into katakana is also presented as one of the next candidates, the user's range of selection can be expanded.
[0069]
If the reverse-romaji-converted character string is presented preferentially in the order of next candidates, the user can handle Japanese and foreign words without being particularly conscious of the distinction. In addition, if the user can customize by selection which forms are displayed as next candidates, such as whether uppercase or lowercase comes first and whether full-width or half-width notation comes first, reverse Romaji conversion becomes easier to use.
[0070]
When a reverse-romaji-converted candidate character string is adopted as the confirmed character string, it can be automatically and successively registered in the dictionary unit 26b. With this function, when creating Japanese text in which Japanese and foreign words are mixed, new foreign words can be registered in the dictionary without the user performing any mode switching key operation or dictionary registration operation.
[0071]
As described above, in this embodiment, when the next candidate key is pressed, the character string returned to the input English characters is displayed as a next candidate. Thereby, the user can obtain a desired English character string only by operating the conversion key, without operating the post-conversion key (the key for converting a kana-converted character string back into an English character string). Further, when such an English character string is confirmed, it is automatically registered. This eliminates the need for new registration in the dictionary.
[0072]
In the present embodiment, registration in the dictionary is performed by the automatic registration process. However, the present invention is not limited to this; the dictionary registration may also be performed when the user operates the post-conversion key to convert to an English character string and that conversion is confirmed.
[0073]
3.2 Romaji delimiter change processing
Next, the Romaji delimiter change processing will be described. When an input character string is read as Roman characters and the reading breaks fall at positions different from those the user intended, this processing reads the Roman characters without being bound by the Romaji reading rules, so that the input character string can be converted to a character string registered in the dictionary.
[0074]
For example, suppose “ATOK” is registered in the dictionary and “atokyourei” is input in order to obtain “example for ATOK”. Reading the input according to the Roman character reading rules described above turns the whole string into kana (“あときょうれい”), which deviates from the user's input intention. By performing the Romaji break changing process, it is possible to read the Romaji as intended by the user and to obtain a correct conversion result.
[0075]
In step ST19 in FIG. 8, when there is a high possibility that the Romaji break changing process is necessary, the CPU 23 performs the processes from step ST41 in FIG. 11 onward. In the present embodiment, when there is a one-character phrase, the CPU 23 determines that there is a high possibility that the Romaji break changing process is necessary.
[0076]
For example, when “atokyourei” is input in order to obtain “ATOK example”, the input is read as Romaji into “Atokyorei” and converted to “after, strong cold” with reference to the dictionary shown in FIG. 7. In this case, since the kanji-converted character string contains the one-character phrase “after”, it is determined in step ST19 that the Roman character delimiter change processing is necessary.
[0077]
The CPU 23 re-divides the kana character of interest in the corresponding phrase at an English character whose flag rmpos is “0”. The flag rmpos distinguishes, for each character of the input English character string, whether it is read as a vowel or as a consonant in Roman character reading. In this embodiment, a character at which conversion to kana was possible when the input character string was read is given the flag rmpos [1], and every other character is given the flag rmpos [0]. The kana character of interest is initially the first kana character, counting from the rear, that includes a flag rmpos [0].
[0078]
More specifically, as shown in FIG. 10A, when “atokyourei” is input, the flag rmpos is set to “1, 0, 1, 0, 0, 1, 1, 0, 1, 1”. In this case, the kana character of interest is “れ” (re) in “あ, と, き, ょ, う, れ, い”. Therefore “れ” is returned to the English characters “r” and “e” and the string is re-divided after “r”, giving the English-character-mixed reading character string “あときょうr” (the kana up to “う” followed by the English character “r”), as shown in FIG. 10D.
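The flag values quoted above can be reproduced with a short Python sketch. It assumes that a flag of 1 is placed on the letter that completes a kana and 0 on every other letter; `ROMAJI_TABLE` is a hypothetical abridged table and the longest-match lookup is an assumed strategy.

```python
# Hypothetical abridged table; the actual rules are those of FIG. 6.
ROMAJI_TABLE = {"a": "あ", "i": "い", "u": "う", "to": "と", "re": "れ", "kyo": "きょ"}

def rmpos_flags(english):
    """Flag 1 on the letter that completes a kana, flag 0 on every other letter."""
    flags, i = [], 0
    while i < len(english):
        for size in (3, 2, 1):
            chunk = english[i:i + size]
            if chunk in ROMAJI_TABLE:
                flags.extend([0] * (len(chunk) - 1) + [1])
                i += len(chunk)
                break
        else:
            flags.append(0)  # a letter that cannot be read stays unconverted
            i += 1
    return flags

print(rmpos_flags("atokyourei"))  # [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]
```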
[0079]
The CPU 23 determines whether or not this “あときょうr” is registered in the dictionary unit 26b (step ST43). If it is not registered, it is determined whether or not the next division is possible (step ST45). If it is possible, the process of step ST41 is repeated.
[0080]
In this case, since the next delimiter is possible, kana conversion is performed with the kana character including the next flag rmpos [0]. As a result, “after ky” is obtained as shown in FIG. 10E. The CPU 23 determines whether or not “after ky” is registered in the dictionary unit 26b (step ST43). If it is not registered, it is determined whether or not the next separation is possible (step ST45). If it is possible, the process of step ST41 is repeated.
[0081]
In this case, since the next division is possible, the re-division is performed at the kana character including the next flag rmpos [0]. As a result, “あとk” is obtained as shown in FIG. 10F. The CPU 23 determines whether or not “あとk” is registered in the dictionary unit 26b (step ST43). In this case, since “あとk” is registered as shown in FIG. 7, Roman character kana conversion is performed from that position (step ST49), and the processes from step ST13 in FIG. 4 onward are performed.
[0082]
By performing the process of step ST13, the input character string “atokyourei” is converted to “example for ATOK”.
[0083]
If the next delimiter is not possible in step ST45, it is determined that the Roman delimiter changing process is unnecessary, and the conversion candidate provisionally determined in step ST15 in FIG. 4 is set as a conversion candidate (step ST47). Then, the processing from step ST21 onward in FIG. 8 is performed.
[0084]
In this way, a character string including English characters can be accurately converted. In this embodiment, the search is performed in the long reading order until the search is successful. However, the search may be performed in the short reading order. Furthermore, if the search is successful, the processing is stopped, but if possible, the search may be performed for all, and the highest evaluation may be selected.
[0085]
In the present embodiment, the presence of a one-character phrase is taken to indicate a high possibility that the Romaji break changing process is necessary, and the process is then performed. However, other cases in which the process is highly likely to be necessary may be defined and handled in the same way, for example the presence of a phrase consisting of an unregistered word.
[0086]
As an example of an unregistered-word phrase, suppose “original” is registered in the dictionary and “original appli” is input in order to obtain “original app”. In this case, kana conversion yields “Originapuri”, and kanji conversion yields, for example, “Folding technique Napuri”. Here “Naapuri” is a phrase consisting of an unregistered word. In such a case as well, it can be determined that there is a high possibility that the Romaji break changing process is necessary, as in the case where a one-character phrase is present.
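The two determination rules discussed here (a one-character phrase, or a phrase made of an unregistered word) can be expressed as a simple predicate. The function name, the data layout, and the sample phrases below are hypothetical and serve only as an illustration in Python.

```python
def needs_delimiter_change(phrases):
    """phrases: list of (written_form, registered_in_dictionary) pairs."""
    return any(len(written) == 1 or not registered
               for written, registered in phrases)

# A one-character phrase followed by a two-character phrase.
print(needs_delimiter_change([("後", True), ("強冷", True)]))  # True
```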
[0087]
4. Other embodiments
In the above embodiment, whether or not the Roman character break changing process is necessary is determined after kanji conversion, but it may instead be determined at the time the characters are input. For example, when a character is input while the shift key is pressed, it may be determined that the Roman character delimiter changing process is highly likely to be necessary, and delimiter processing that is not bound by the Roman character reading rules may be performed starting from that English character.
[0088]
For example, suppose “ATOK” is registered in the dictionary and, in order to obtain “Using this example for ATOK”, the user inputs “konoAtokyoureiwomotiite”, typing the “A” with the shift key.
[0089]
The CPU 23 performs Roman character kana conversion starting from the character “A” input using the shift key and continuing to a position that is not a Roman character break. That is, within the input, “Atokyourei” is delimited at “Atokyour”, which ends at a position that is not a Roman character break, and is converted to “あときょうr”. It is then determined whether or not this “あときょうr” is in the dictionary.
[0090]
If not, re-separation processing is performed in the same manner as described above. In this manner, a character string including English characters can be reliably converted without a user command.
[0091]
In this case as well, the search may be performed in short reading order.
[0092]
In this way, a kana character string including English characters from the input English character string may be obtained when a forced delimiter instruction command accompanies the English character string given to the input means.
[0093]
Alternatively, a reading character string including English characters from the input English character string may be obtained as follows. For a dictionary entry that includes English characters, it is determined at the Roman character conversion stage whether the input contains the kana portion of that entry up to its English characters. If it does, it is determined whether the English characters that follow in the input match the English characters that follow in the entry stored in the dictionary, and if they match, the input is divided at that position.
[0094]
For example, suppose “あとk” is registered and “atokyourei” is input; Roman character reading converts the input to “あときょうれい”. It is determined whether the “あと” portion of the registered “あとk” (hereinafter referred to as the “kana common part”) exists in the input. Since the kana common part exists in this case, the English characters that follow it are extracted from the input character string. All combinations up to just before the next vowel are extracted, so in this case two are obtained: “k” and “ky”. In the dictionary entry, the English character following the kana common part is “k”. Since this matches the extracted “k”, “atokyourei” is divided into “atok” and “yourei”.
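The kana-common-part matching just described can be sketched in Python as follows. The sketch is deliberately simplified and rests on assumptions: the table is a hypothetical abridged one, the kana common part is only looked for at the start of the input, and the helper name `split_on_entry` is invented for this example.

```python
# Hypothetical abridged table; the actual rules are those of FIG. 6.
ROMAJI_TABLE = {"a": "あ", "i": "い", "u": "う", "to": "と", "re": "れ", "kyo": "きょ"}
VOWELS = "aiueo"

def split_on_entry(english, entry_kana, entry_letters):
    """Split `english` after an entry made of kana followed by English letters."""
    kana, i = "", 0
    # Read Roman letters until the kana common part has been produced.
    while kana != entry_kana:
        for size in (3, 2, 1):
            chunk = english[i:i + size]
            if chunk in ROMAJI_TABLE:
                kana += ROMAJI_TABLE[chunk]
                i += len(chunk)
                break
        else:
            return None  # unreadable or exhausted input before the common part
        if not entry_kana.startswith(kana):
            return None  # the input does not begin with the kana common part
    # Collect every letter combination up to just before the next vowel.
    end = i
    while end < len(english) and english[end] not in VOWELS:
        end += 1
    combos = [english[i:k] for k in range(i + 1, end + 1)]
    if entry_letters in combos:
        cut = i + len(entry_letters)
        return english[:cut], english[cut:], combos
    return None

print(split_on_entry("atokyourei", "あと", "k"))
# ('atok', 'yourei', ['k', 'ky'])
```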
[0095]
In this case, since it is inefficient to perform such processing for all locations, such processing may be performed only when there is the one-character phrase or when there is an unregistered word.
[0096]
In the above embodiment, the Roman character delimiter changing process is executed by referring to the Roman character buffer and returning to the English characters. However, it may instead be performed using the Roman character reading rules of FIG. 6.
[0097]
In the present embodiment, the CPU 23 is used to realize the function shown in FIG. 1, and this is realized by software. However, some or all of them may be realized by hardware such as a logic circuit.
[Brief description of the drawings]
FIG. 1 is a functional block diagram of a character string converter 1 according to the present invention.
FIG. 2 is a diagram illustrating an example of a hardware configuration of the character string conversion device 1 illustrated in FIG. 1;
FIG. 3 is a diagram showing a display on a CRT 30;
FIG. 4 shows a flowchart of conversion processing.
FIG. 5 is a diagram showing data contents and display examples of a Roman character buffer, a Kana buffer, and a notation character buffer.
FIG. 6 is a diagram showing Roman character reading rules.
FIG. 7 is a diagram showing the contents of a dictionary unit 26b.
FIG. 8 shows a flowchart of a conversion process.
FIG. 9 is a diagram showing the contents of word data added to the dictionary unit 26b.
FIG. 10 is a diagram showing data contents of a Roman character buffer and a Kana buffer.
FIG. 11 is a flowchart of Roman character segment change processing.
[Explanation of symbols]
63 ... Character string storage means
50 ... Conversion means
70 ... dictionary means
23 ... CPU
27 ... Memory

Claims

Dictionary means for storing, with a kana character string including English characters as a pre-conversion character string, a post-conversion character string corresponding to the pre-conversion character string; and
conversion means for performing Roman character kana conversion processing, based on Roman character reading rules, on an English character string input in the Roman character input mode to generate a reading character string, and for searching the dictionary means and, if the pre-conversion character string exists in the reading character string, outputting the corresponding English character string as a post-conversion character string,
in a character string conversion device comprising the above means:
the conversion means stores a flag indicating the breaks of the Roman character kana conversion processing for the input English character string, executes English-character-mixed character string acquisition processing that performs Roman character kana conversion processing on the reading character string at a position that was not such a break to obtain an English-character-mixed character string, and, if searching the dictionary means finds the English-character-mixed character string among the pre-conversion character strings, outputs the corresponding English character string as the post-conversion character string,
A character string conversion device characterized by the above.

The character string conversion device according to claim 1,
The English character mixed character string acquisition processing is repeatedly executed by sequentially shifting the position from the last or frontmost character string in the reading character string,
A character string conversion device characterized by the above.

In the character string conversion device according to claim 1 or 2,
The conversion means includes determination rule storage means that stores a determination rule as to whether or not a conversion candidate character string is preferable as a conversion candidate, and, when the conversion candidate character string is determined not to be preferable as a conversion candidate based on the determination rule, obtains a reading character string including English characters from the input English character string,
A character string conversion device characterized by the above.

  Dictionary means for storing, with a kana character string including English characters as a pre-conversion character string, a post-conversion character string corresponding to the pre-conversion character string; and
  conversion means for performing Roman character kana conversion processing, based on Roman character reading rules, on an English character string input in the Roman character input mode to generate a reading character string, and for searching the dictionary means and, if the pre-conversion character string exists in the reading character string, outputting the corresponding English character string as a post-conversion character string,
  A character string conversion method using a character string conversion device comprising the above means,
  the method causing the conversion means to execute the following processing:
  storing a flag indicating the breaks of the Roman character kana conversion processing for the input English character string, and performing Roman character kana conversion processing on the reading character string at a position that was not such a break to obtain an English-character-mixed character string; and
  searching the dictionary means and, when the English-character-mixed character string exists among the pre-conversion character strings, outputting the corresponding English character string as a post-conversion character string.
  A character string conversion method using a character string conversion device, characterized by the above.

In the character string conversion method according to claim 4,
The English character mixed character string acquisition processing is repeatedly executed by sequentially shifting the position from the last or frontmost character string in the reading character string,
A character string conversion method characterized by the above.