JP3847869B2

JP3847869B2 - Character string conversion apparatus and method

Info

Publication number: JP3847869B2
Application number: JP34843496A
Authority: JP
Inventors: 雄二小林
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1996-12-26
Filing date: 1996-12-26
Publication date: 2006-11-22
Anticipated expiration: 2016-12-26
Also published as: JPH10187691A

Description

【０００１】
【発明の属する技術分野】
本発明は、文字列の変換を行う文字列変換装置およびその方法に関するものである。
【０００２】
【従来の技術】
従来、文字列の変換をおこなう文字処理装置として、かな読みを入力し、適切な漢字文字列に変換するかな漢字変換装置がある。
【０００３】
かな漢字変換においては、入力のかな読みに対して同音語と呼ばれる、可能な漢字表記が複数個存在することが多く、第１優先解として、どの漢字表記を出力するかについての種々の手法が実施されてきた。
【０００４】
例えば、単語の出現度数を頻度として格納し、頻度値が最も高い漢字表記に決定する方法、用例と呼ばれる単語対に格納された漢字表記を優先する方法などが実施されている。
【０００５】
【発明が解決しようとする課題】
しかしながら、従来行われてきた第１優先解決定手法によれば、かな読みに対して意味的に妥当な漢字表記を出力することはできるが、漢字表記文字数に関する制約がないため、使用可能な文字数に制限のある入力状況にあっては、必ずしも適切な漢字表示を得ることが困難であった。例えば、文字数制限のある原稿を書くような状況にあっては、意図を伝えることができる範囲で表記の文字数を抑えた漢字候補を第２優先以下の候補の中から選びなおす必要があった。また、同じかな読みに対して得られる漢字表記は辞書に格納されている数種に限定されるため、その中で所望の意図を伝える文字数の少ない漢字表記が存在しない場合があり、異なる表現で入力し直さなければならなかった。
【０００６】
本発明は、上述の点に鑑みてなされたものであり、その目的とするところは、入力の文字列に対して意味的に妥当な複数の変換文字列のうち、最も文字列数の少ない変換文字列を優先して出力する文字列変換装置およびその方法を提供することである。
【０００７】
【課題を解決するための手段】
上記目的を達成するために、本発明の文字列変換装置は以下の構成を有する。すなわち、変換元となる変換元文字列を入力する入力手段と、前記変換元文字列に少なくとも１つの単語を対応づけ、各単語に少なくとも１つの表記文字列を対応づけて格納する辞書手段と、前記辞書手段から、入力された前記変換元文字列に対応する１つの単語を決定する決定手段と、前記辞書手段から、前記決定手段により決定された単語に対応する表記文字列の内、最短の表記文字列を検索する検索手段と、前記検索手段により検索された最短の表記文字列を、前記変換元文字列を変換した表記文字列として出力する出力手段とを備える。
【０００８】
あるいは、かな文字列と、該かな文字列に対応する少なくとも１つの単語と、各単語に対応する少なくとも１つの表記文字列と、該表記文字列が読みに対応した通常表記か読みに対応しない短縮表記かを示す表記コードと、前記単語の意味を示す語義コードと、該単語の尤度とを対応付けて記憶する第１の辞書手段と、前記語義コードと、該語義コードと対応する少なくとも１つの同義語の表記文字列とを対応付けて記憶する第２の辞書手段と、短縮表記を優先する短縮優先モードを指示する指示手段と、かな文字列を入力する入力手段と、前記入力手段により入力されたかな文字列を、前記第１の辞書手段に記憶された単語を基にして文節に分割して、１つ以上の文節候補を作成し、各文節候補を構成する単語の尤度に基づいて、最適な文節候補を決定する文節候補決定手段と、前記文節候補決定手段により決定された文節候補ごとに、前記短縮優先モードが指示されている場合には、前記文節候補を構成する単語の表記文字列を、前記第１の辞書手段から検索し、さらに、当該単語の語義コードに対応する表記文字列を第２の辞書手段から検索して、当該検索された表記文字列の中で最短の表記文字列を出力し、前記短縮優先モードが指示されていない場合には、前記文節候補を構成する単語の表記文字列を前記第１の辞書手段から検索し、前記表記コードが通常表記を示す表記文字列を出力する検索手段と、前記検索手段により出力された表記文字列を表示出力する出力手段とを備える。
【００２１】
【発明の実施の形態】
以下、図面を参照して本発明の実施の形態として、かな漢字変換機能を有する日本語処理装置を詳細に説明する。
＜装置の構成＞
図１は、本発明に係る日本語処理装置の全体構成の一例である。
【００２２】
図示の構成において、ＣＰＵ１０１は、マイクロプロセッサであり、文字処理のための演算、論理判断等を行ない、アドレスバスＡＢ、コントロールバスＣＢ、データバスＤＢを介して、それらのバスに接続された各構成要素を制御する。
【００２３】
アドレスバスＡＢはＣＰＵ１０１の制御の対象とする構成要素を指示するアドレス信号を転送する。コントロールバスＣＢはＣＰＵ１０１の制御の対象とする各構成要素のコントロール信号を転送して印加する。データバスＤＢは各構成機器相互間のデータ転送を行なう。
【００２４】
次にＲＯＭ１０２、読出し専用の固定メモリである。ＲＯＭ１０２は、図７〜図１４として後述するＣＰＵ１０１による制御の手順を記憶させたプログラムエリアが格納されている。
【００２５】
また、ＲＡＭ１０３は、１ワード１６ビットの構成の書込み可能のランダムアクセスメモリであって、各構成要素からの各種データの一時記憶に用いる。図１５に示すとおり、ＲＡＭ１０３は、入力読みバッファ（ＩＢＵＦ）、出力漢字バッファ（ＯＢＵＦ）、文節候補テーブル（ＢＣＴ）、最大文節尤度格納ワークメモリ（ｍｙｄ）、カレント文節候補尤度格納ワークメモリ（ｙｕｄ）、確定文節候補番号格納ワークメモリ（ｄｉｄ）、カレント文節候補番号格納ワークメモリ（ｂｉｄ）、最短漢字表記長格納ワークメモリ（ｋｃｌ）、カレント漢字表示長格納ワークメモリ（ｃｎｔ）、漢字表記一時記憶バッファ（ＯＴＢＵＦ）、短縮漢字表記優先指示フラグ（ｓｆｌｇ）、登録短縮表記読みバッファ（ＵＹＢＵＦ）、登録短縮表記バッファ（ＵＨＢＵＦ）などから構成されている。
【００２６】
入力読みバッファＩＢＵＦは、かな漢字変換を行う対象となる入力文字列を格納するバッファであり、出力漢字バッファＯＢＵＦはかな漢字変換処理後の変換済み漢字列を格納するバッファであって、かな漢字変換を行う同種の情報処理装置において一般的に用いられているので、詳細な説明は省略する。文節候補テーブルＢＣＴは、かな漢字変換処理途上で、入力の読み文字列に対して可能な漢字表記候補を求めるために必要な情報を格納するバッファであって、図６において後述する。最大文節尤度格納ワークメモリｍｙｄは文節候補テーブルにおいて現れる文節の開始位置を同じくする文節候補のうちの最大の文節尤度を格納するワークメモリである。カレント文節候補尤度格納ワークメモリｙｕｄは、現在処理対象となっている文節候補の尤度を格納するワークメモリである。確定文節候補番号格納ワークメモリｄｉｄは、最大文節尤度を持つ文節候補の番号を格納するワークメモリである。カレント文節候補番号格納ワークメモリｂｉｄは、現在処理対象となっている文節候補の番号を格納するワークメモリである。最短漢字表記長格納ワークメモリｋｃｌは確定文節候補番号格納ワークメモリに示される文節候補の持つ可能な漢字表記のうち、最も漢字表記文字数の少ない漢字候補の文字数を格納するワークメモリである。カレント漢字表記長格納ワークメモリｃｎｔは、確定文節候補番号格納ワークメモリに示される文節候補の持つ可能な漢字表記のうち、現在注目している漢字表記候補の漢字文字数を格納するワークメモリである。短縮漢字表示優先指示フラグはｓｆｌｇ、短縮表記を第１優先解として出力するかどうかを指定する変換条件指示フラグで、１＝“短縮表記を優先する”、０＝“短縮表記を優先しない”の値を格納する。登録短縮読みバッファＵＹＢＵＦおよび登録短縮表記バッファＵＨＢＵＦは、短縮表記として追加登録あるいは削除する際の、登録短縮表記の読み、表記をそれぞれ格納するバッファである。
【００２７】
ＫＢ１０５はキーボードであって、アルファベットキー、ひらがなキー、カタカナキー、句点等の文字記号入力キー、短縮表示登録、削除、および操作の取消を指示する操作指示キー、短縮表記の優先、非優先を指示する短縮表記優先指示キー、ひらがなで入力された文字列を漢字混じりの文字列に変換するかな漢字変換キー、及び、カーソル移動を指示するカーソル移動キー等のような各種の機能キーを備えている。
【００２８】
ＤＩＳＫ１０４は図４において後述するかな漢字変換用単語辞書４０１、図５において後述する同義語辞書５０１および文書データ等を記憶するための外部メモリである。文書データ等は必要に応じて保管され、また、保管されたデータはキーボードの指示により、必要な時呼び出される。
【００２９】
ＣＲ１０６はカーソルレジスタである。ＣＰＵ１０１により、カーソルレジスタの内容を読み書きできる。後述するＣＲＴコントローラ（ＣＲＴＣ）１０８は、ここに蓄えられたアドレスに対する表示装置ＣＲＴ上の位置にカーソルを表示する。
【００３０】
ＤＢＵＦ１０７は表示用バッファメモリで、表示すべきデータのパターンを蓄える。
【００３１】
ＣＲＴＣ１０８はカーソルレジスタＣＲ１０６及びバッファＤＢＵＦ１０７に蓄えられた内容を表示器ＣＲＴ１０９に表示する役割を担う。
【００３２】
また、ＣＲＴ１０９は陰極線管等を用いた表示装置であり、その表示装置ＣＲＴ１０９におけるドット構成の表示パターンおよびカーソルの表示をＣＲＴコントローラ１０８で制御する。
【００３３】
さらに、ＣＧ１１０はキャラクタジェネレータであって、表示装置ＣＲＴ１０９に表示する文字、記号のパターンを記憶するものである。
【００３４】
かかる各構成要素からなる本発明文字処理装置においては、キーボードＫＢ１０５からの各種の入力に応じて作動するものであって、キーボードＫＢ１０５からの入力が供給されると、まず、インタラプト信号がＣＰＵ１０１に送られ、そのＣＰＵ１０１がＲＯＭ１０２に記憶してある各種の制御信号を読出し、それらの制御信号に従って、各種の制御が行われる。
＜変換される文字列の例＞
上記の構成よりなる本実施例装置におけるかな漢字変換が実行される例を図２を参照して以下に説明する。
【００３５】
（ａ）は、かな漢字変換の対象となる文字列を入力したときの入力読みバッファの状態を示している。図中において、「ろうどうくみあいのせんもんいいんかい」は、かな漢字変換の対象となる文字列であり、この状態で「変換」キーが押下されると、かな漢字変換の対象となる文字列に対してかな漢字変換が行われ、（ｂ）のように「労組の専門委」のように、（ｃ）に示す非短縮表示優先指示状態での変換結果「労働組合の専門委員会」より短い表記長の漢字候補が優先された状態になる。
＜短縮表記登録時の表示例＞
図３は、短縮表記を追加登録あるいは登録されている短縮表示を削除する実行例である。キーボードＫＢ１０５上の登録キーあるいは削除キーの押下により、図示の様なメッセージが表示され、オペレータによる短縮表記読みおよび短縮表記が促される。引き続き、入力した短縮表記に対する登録あるいは削除キーの押下により、機能が実行され、単語辞書に対する登録あるいは削除が行われる。図示の状態での登録キー押下により、読み「わーくすてーしょん」に対して「ＷＳ」なる短縮漢字表記が登録される。また、削除キーの押下により、該短縮表記漢字が単語辞書より削除される。
＜単語辞書の例＞
図４は、本発明の実施形態に係る日本語処理装置における単語辞書の例である。
【００３６】
図中に示されるように６つのフィールドより構成され、第１フィールドである単語連番は、単語辞書に格納されている単語をユニークにする連続番号であり、後述する単語の読みと品詞と語義コードが等しい単語に対してただ一つの連続番号を与える。第２フィールドである読みは、単語の読みを格納する。第３フィールドは単語の品詞を格納する。第４フィールドに格納される語義コードは、単語を語義別に分類した語義コードを格納する。語義コードが異なる単語は異なる単語連番を持つ別の単語の扱いを受ける。また、この語義コードは図５に示す同義語を示す語義コードと同じものが格納される。第５フィールドには、単語の優先度を表す指標値として尤度を格納する。尤度が大きいほど単語としての尤もらしさが高くなる。第６フィールドの表記、表記タイプは単語の漢字表記を格納するとともに、その漢字表記が該当読みに対して、通常使用で変換されるべきタイプ（通常タイプ）であるか、短縮表記優先指示がなされている場合のみ変換されるべきタイプ（短縮表記タイプ）であるかの区別を格納する。図において（Ｎ）は通常タイプを、（Ｓ）は短縮表記タイプを表している。
＜同義語辞書の例＞
図５は、同じ語義コードを持つ同義語を語義コード別に分類して格納した同義語辞書である。一つの語義コードに対して格納される同義語は同じ意味を有する言い換え可能な単語であるとみなされる。
【００３７】
図６は、入力の読み列に対して解析途上の文節候補を格納する文節候補テーブルの一例である。文節候補テーブルは、個々の文節候補を識別する文節番、文節の自立部を構成する単語の読みを格納する自立語単語、自立語単語の単語辞書における識別番号である単語連番、文節候補の付属部の読みを格納する付属部、該文節候補の尤もらしさの指標となる尤度を格納する。尤度はその数が大きいほど文節の尤もらしさが高いことを意味する。同文節は入力の読み列上での文節候補の開始位置が等しい他の文節候補へのリンク情報を格納する。同じ文節開始位置である次の文節候補の文節番を格納し、“−１”は、同じ開始位置の文節候補がそれ以上存在しないことを意味する終端リンクである。次文節は該文節候補に引き続いて現れる次の文節候補へのリンク情報を格納する。同文節リンクと同じく次の文節候補の文節番を格納し、“−１”は、次の文節候補が存在しないことを意味する終端リンクである。同文節と次文節とがともに“−１”である文節候補は、入力読み列の終端まで達した最後の文節候補である。
【００３８】
上述の実施例の動作をフローに従って説明する。
＜キー入力に応じた処理手順＞
図７は本発明文字列変換装置の動作を示すフローチャートである。
【００３９】
（Ａ−１）においてキーボード１０５よりキーが押下され、割り込みが発生するのを待つ。キーが入力されると、（Ａ−２）において、キーが押下される直前の状態におけるキーに対応する機能を判別し、機能の種類に応じて（Ａ−３）、（Ａ−４）、（Ａ−５）、（Ａ−６）、（Ａ−７）のいずれかのステップに分岐する。
【００４０】
（Ａ−３）は、（Ａ−２）において文字入力と判定された場合の処理であり、入力された文字のコードを入力読みバッファＩＢＵＦに蓄える。この処理は、かな漢字変換を行う同種の情報処理装置において一般に行われている処理であり、公知であるので特に記述しない。
【００４１】
（Ａ−４）は、（Ａ−２）においてかな漢字変換と判定された場合の処理であり、入力読みバッファＩＢＵＦに蓄えられている文字を漢字に変換する。図８において詳述する。
【００４２】
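(A-5) is the processing performed when (A-2) determines that the key requests registration or deletion of an abbreviated notation. As shown in FIG. 3, an abbreviated kanji notation with the reading and notation specified by the operator is registered in the word dictionary 401, or the specified abbreviated kanji notation is deleted from the word dictionary. This is described in detail with reference to FIG. 14.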
【００４３】
（Ａ−６）は、（Ａ−２）において短縮表示優先モード設定と判定された場合の処理であり、操作者ができるだけ短い表記を優先したかな漢字変換結果を得たいときに短縮表記優先モードとし、また短縮表記優先でない通常のモードに戻す設定を行う。キーボードＫＢ上の短縮表記優先指示キーの押下に従い、短縮漢字表記優先指示フラグｓｆｌｇの値を“１”に設定、再押下で“０”に設定とトグル式に値を設定する。
【００４４】
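(A-7) is the processing performed when (A-2) determines that the key is something other than character input, kana-kanji conversion, abbreviated notation registration, or mode setting (for example, cursor movement). This processing is commonly performed in information processing apparatuses of the same kind and is well known, so it is not described here.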
【００４５】
（Ａ−８）は、上記の処理の結果、変更された部分を表示する表示処理である。表示するデータを読んではパターンに展開し、表示バッファに出力するという通常広く行われている処理である。
＜かな漢字変換処理の手順＞
図８は、（Ａ−４）の処理を詳細化したフローチャートである。
【００４６】
（Ｂ−１）は、入力読みバッファＩＢＵＦに格納された読みに従い、単語辞書４０１を検索し、検索された単語の文節接続検定を行うことにより、文節単位に分かち、文節候補を作成する文節分割処理である。文節分割処理では、文節候補テーブルＢＣＴを下記のごとく作成する。即ち、入力読みバッファＩＢＵＦの先頭から順次文節候補を作成していき、文節番をつけ、単語辞書より検索された自立語単語の読みと単語連番を格納する。また文節接続検定により接続可能と判定された付属部読みを付属部に格納する。また、以下の式をもって得られる優先度の指標値を尤度として格納する。
【００４７】

文節接続検定を行い、付属部尤度を算出する処理は、かな漢字変換を行う同種の装置における手法に準ずるものであるので、詳述しない。ひとつの文節候補を作成すると、次の文節候補を作成し、同文節リンク、次文節リンクを設定して文節候補テーブルを終端させる。
【００４８】
（Ｂ−２）は、文節候補テーブルに格納された文節候補より、第１候補解として最適な文節候補を選択し、漢字表記として出力バッファＯＢＵＦに出力する表記決定処理であり、図９に詳述する。
【００４９】
（Ｂ−３）は、次の文節候補を決定し出力するために、（Ｂ−２）で決定された文節候補の次文節リンクを取り出す。
【００５０】
（Ｂ−４）は（Ｂ−３）で取り出された次文節リンクが“−１”であるか否かにより、文節候補テーブルＢＣＴ上のすべての文節を変換終了したかを判定し、全文節変換ならばリターンし、そうでなければ（Ｂ−２）で次の文節を決定出力する。
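The loop formed by steps (B-2) to (B-4) can be sketched in Python as follows. This is only an illustration under stated assumptions: the phrase candidate table is modeled as a dictionary of dictionaries, the kanji notation of each candidate is assumed to be already resolved (the `notation` key), the likelihoods of phrase numbers 0 and 3 and the `<other>` notation are placeholders, and the function names `decide_phrase` and `convert` are hypothetical.

```python
END = -1  # terminal link value used in the phrase candidate table

def decide_phrase(table, start):
    """(B-2), simplified: pick the most likely candidate at this start position."""
    best = start
    n = table[start]["same"]
    while n != END:
        if table[n]["likelihood"] > table[best]["likelihood"]:
            best = n
        n = table[n]["same"]
    chosen = table[best]
    return best, chosen["notation"] + chosen["attached"]

def convert(table, first=0):
    """Walk the next-phrase links until the terminal link is reached."""
    out = []
    current = first
    while current != END:                             # (B-4) stop at the terminal link
        chosen, text = decide_phrase(table, current)  # (B-2) decide and output one phrase
        out.append(text)
        current = table[chosen]["next"]               # (B-3) follow the next-phrase link
    return "".join(out)

# Table shaped like the worked example; likelihoods 100 and "<other>" are placeholders.
table = {
    0: {"notation": "労組", "attached": "の", "likelihood": 100, "same": END, "next": 1},
    1: {"notation": "専門", "attached": "", "likelihood": 120, "same": 2, "next": 3},
    2: {"notation": "<other>", "attached": "", "likelihood": 80, "same": END, "next": 3},
    3: {"notation": "委", "attached": "", "likelihood": 100, "same": END, "next": END},
}
print(convert(table))
```

The sketch leaves out the segmentation step (B-1) entirely and starts from a table that has already been built.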
＜１文節の表記決定の手順＞
図９は、（Ｂ−２）の処理を詳細化したフローチャートである。
【００５１】
（Ｃ−１）は、文節候補テーブルＢＣＴの中で文節開始位置を同じくする文節候補のうち、最も尤度の高い文節候補をひとつ決定する文節候補決定処理であり、図１０に詳述する。
【００５２】
（Ｃ−２）は、（Ｃ−１）で決定された文節候補の可能な漢字表記から、第１優先解として適当な漢字表記を決定する漢字表記決定処理であり、図１１に詳述する。
＜文節の候補を決定する手順＞
図１０は、（Ｃ−１）の処理を詳細化したフローチャートである。
【００５３】
（Ｄ−１）は、最大文節尤度格納ワークメモリｍｙｄを０で初期化する。
【００５４】
（Ｄ−２）は、カレント文節候補尤度格納ワークメモリｙｕｄに現在注目している文節候補の文節尤度を、カレント文節候補番号格納ワークメモリｂｉｄにその文節番号を格納する。
【００５５】
（Ｄ−３）は、現在注目している文節候補の文節尤度が最大であるかどうかを最大文節尤度格納ワークメモリｍｙｄと比較することで判定する。最大の尤度でなければ、（Ｄ−６）へ、最大であれば（Ｄ−４）に分岐する。
【００５６】
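(D-4) updates the maximum phrase likelihood storage work memory myd with the phrase likelihood of the phrase candidate currently under consideration.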
【００５７】
（Ｄ−５）は、最大文節尤度を持つ現在注目している文節候補の文節番を確定文節候補番号格納ワークメモリｄｉｄに格納する。
【００５８】
（Ｄ−６）は、同じ文節開始位置である次の文節候補が存在するかどうかを判定する。文節候補の同文節リンクが“−１”であれば、同じ文節開始位置の他の文節候補は存在しないので（Ｄ−７）へ、そうでなければ、カレント文節候補番号格納ワークメモリｂｉｄを同文節リンクに示される文節番の値に更新して（Ｄ−３）へ分岐する。
【００５９】
（Ｄ−７）は、文節開始位置を同じくする文節候補中で最大の文節尤度を持つ文節候補を確定文節候補番号格納ワークメモリｄｉｄに示される文節候補に決定してリターンする。
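Steps (D-1) to (D-7) amount to a maximum search over a linked list. A minimal Python sketch follows, reusing the work memory names myd, yud, bid, and did as local variables. The function name and the two dictionaries passed in are assumptions made for illustration; the sketch also reloads yud on every pass, which the text implies but does not state when it branches back to (D-3).

```python
END = -1  # terminal same-phrase link

def decide_candidate(likelihood, same_link, start):
    """Return the phrase number with the highest likelihood among the
    candidates chained by same-phrase links from `start`."""
    myd = 0          # (D-1) maximum likelihood seen so far
    did = start      # decided phrase number
    bid = start      # current phrase number
    while True:
        yud = likelihood[bid]          # (D-2) likelihood of the current candidate
        if yud > myd:                  # (D-3) is it the maximum so far?
            myd = yud                  # (D-4)
            did = bid                  # (D-5)
        if same_link[bid] == END:      # (D-6) no further candidate at this position
            return did                 # (D-7)
        bid = same_link[bid]           # move to the next candidate at this position

# Phrase numbers 1 and 2 from the worked example (likelihoods 120 and 80).
print(decide_candidate({1: 120, 2: 80}, {1: 2, 2: END}, 1))
```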
＜文節候補に対する漢字表記を決定する手順＞
図１１は、（Ｃ−２）の処理を詳細化したフローチャートである。
【００６０】
（Ｅ−１）は、短縮漢字表記優先モードであるかどうかを短縮漢字表記優先指示フラグｓｆｌｇが“１”であるかどうかにより判定する。短縮漢字表記優先モードであれば（Ｅ−５）へ、そうでなければ（Ｅ−２）へ分岐する。
【００６１】
（Ｅ−２）以下は、短縮漢字表示優先でない通常の変換モードの際の漢字表示決定処理を示したものである。（Ｅ−２）は、決定された文節候補番号を持つ文節候補の自立語単語連番を取り出し、その連番を持つ単語の漢字表記を単語辞書の表記格納エリアから漢字表記を取り出す。
【００６２】
（Ｅ−３）で、取り出された漢字表記の表記タイプが短縮表記であるかどうかを判定し、短縮表記であるならば、この漢字表記を読み飛ばし、（Ｅ−４）で次の漢字表記を取り出して同様の判定処理を繰り返す。短縮表記でなかったならば、検索された漢字表記に決定する。
【００６３】
（Ｅ−５）以下の処理は、短縮漢字表記優先モードでの漢字表記決定処理を示したものである。（Ｅ−５）は、決定された文節候補番号を持つ文節候補の自立語単語連番を取り出し、その連番を持つ単語の漢字表記を単語辞書の表記格納エリアから最も漢字表記数の少ない漢字表記を取り出す処理であって、図１２に詳述する。
【００６４】
（Ｅ−６）は、決定された自立語単語の語義コードを取り出し、同義語辞書を検索し、同じ語義コードを持つ同義語の中から、最も漢字表記数の少ない漢字表記を取り出して、より表記文字数の少ない漢字表記に決定する処理であり、図１３に詳述する。
【００６５】
（Ｅ−７）で、（Ｅ−３）もしくは（Ｅ−６）で決定された漢字表記を出力バッファＯＢＵＦに格納して終了する。
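The branch on sflg in (E-1) to (E-7) can be summarized in a few lines of Python. This is a condensed sketch, not the flowchart itself: notations are assumed to be (text, type) pairs with type "N" or "S", the synonym list is passed in directly, the contents of the synonym list for 「委員会」 other than 「委」 are an assumption, and `decide_notation` is a hypothetical name.

```python
def decide_notation(sflg, notations, synonyms):
    """Choose the kanji notation for one independent word.

    sflg == 1: shortest string among dictionary notations and synonyms.
    sflg == 0: first dictionary notation that is not of the abbreviated type.
    """
    if sflg == 1:
        # (E-5) and (E-6): word dictionary first, then the synonym dictionary
        candidates = [text for text, _ in notations] + list(synonyms)
        return min(candidates, key=len)   # the earliest shortest candidate wins
    for text, kind in notations:          # (E-2) to (E-4): skip abbreviated entries
        if kind != "S":
            return text
    return None

union = [("労働組合", "N"), ("労組", "S")]
committee = [("委員会", "N")]
print(decide_notation(1, union, ["労働組合", "労組"]))   # abbreviation priority mode
print(decide_notation(0, union, ["労働組合", "労組"]))   # normal mode
print(decide_notation(1, committee, ["委員会", "委"]))
```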
＜単語辞書から最短表記を検索する手順＞
図１２は、（Ｅ−５）の処理を詳細化したフローチャートである。
【００６６】
（Ｆ−１）で最短漢字表記長格納ワークメモリｋｃｌに表記文字数として可能な最大値を越える大きな値を格納する。例えば、メモリｋｃｌに格納し得る最大値をセットする。
【００６７】
（Ｆ−２）で単語連番の示す単語の漢字表記を単語辞書４０１の漢字表記エリアから取り出してくる。
【００６８】
（Ｆ−３）で取り出した漢字表記の表記文字数をカウントし、その結果をカレント漢字表記長格納ワークメモリｃｎｔにセットする。
【００６９】
（Ｆ−４）で、最短の漢字表記であるかどうかを最短漢字表記長格納ワークメモリｋｃｌと比較することによって判定し、最短でなければ（Ｆ−７）へ、最短表記であれば（Ｆ−５）で最短漢字表記長格納ワークメモリｋｃｌを更新する。
【００７０】
（Ｆ−６）で、漢字表記一時記憶バッファＯＴＢＵＦに漢字表記を格納する。
【００７１】
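In (F-7), the kanji notation area is searched for a next kanji notation candidate. If there are no more, the shortest kanji notation is decided in (F-9). If there is one, attention moves to the next kanji notation in (F-8) and the process loops back to (F-2).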
【００７２】
In (F-9), the kanji notation stored in the kanji notation temporary storage buffer OTBUF is decided as the shortest kanji notation.
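The scan in (F-1) to (F-9) is a minimum search with a sentinel initial value. The Python sketch below mirrors it step by step, using kcl, cnt, and otbuf as local variables. The function name is hypothetical, `sys.maxsize` stands in for "the largest value kcl can hold", and character counts are taken with `len`.

```python
import sys

def shortest_in_word_dictionary(notations):
    """Return (shortest notation, its length) from a list of notation strings."""
    kcl = sys.maxsize          # (F-1) larger than any possible notation length
    otbuf = None               # temporary buffer for the best notation so far
    for text in notations:     # (F-2), (F-7), (F-8): visit each notation in turn
        cnt = len(text)        # (F-3) count the characters
        if cnt < kcl:          # (F-4) shorter than the shortest so far?
            kcl = cnt          # (F-5)
            otbuf = text       # (F-6)
    return otbuf, kcl          # (F-9)

print(shortest_in_word_dictionary(["労働組合", "労組"]))
```

Returning kcl alongside the notation lets a later synonym search continue the comparison from the same threshold.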
＜同義語辞書から最短表記を検索する手順＞
図１３は、（Ｅ−６）の処理を詳細化したフローチャートである。
【００７３】
（Ｇ−１）は、決定された文節候補の自立語単語が持つ語義コードと同じ同義語が存在するか同義語辞書５０１を検索する。
【００７４】
（Ｇ−２）で同義語が存在するかどうか判定し、存在しなければ終了する。
【００７５】
（Ｇ−３）で同義語辞書の同義語漢字表記を取り出してくる。
【００７６】
（Ｇ−４）で同義語漢字表記文字数をカウントし、カレント漢字表記長格納ワークメモリｃｎｔにセットする。
【００７７】
（Ｇ−５）で最短の漢字表記であるかどうかを最短漢字表記長格納ワークメモリｋｃｌと比較することによって判定し、最短でなければ（Ｇ−８）へ、最短表記であれば（Ｇ−６）で最短漢字表記長格納ワークメモリｋｃｌを更新する。
【００７８】
（Ｇ−７）で、漢字表記一時記憶バッファＯＴＢＵＦに漢字表記を格納する。
【００７９】
（Ｇ−８）で、同義語辞書に次の同義語があるかどうか検索し、それ以上存在しなければ（Ｇ−１０）で最短漢字表記を決定する。存在すれば、（Ｇ−３）へループする。
【００８０】
（Ｇ−１０）で漢字表記一時記憶バッファＯＴＢＵＦに格納されている漢字表記を最短漢字表記として決定する。
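Because (G-5) compares against the value already held in kcl, the synonym search only replaces the word dictionary result when a strictly shorter string exists. A Python sketch of (G-1) to (G-10) under stated assumptions follows: the synonym dictionary is modeled as a mapping from meaning code to a list of strings, the entry 「委員会」 under code 1012 is an assumption, and the function name is hypothetical.

```python
def shorten_with_synonyms(meaning_code, synonym_dict, otbuf, kcl):
    """Continue the shortest-notation search over the synonyms of a meaning code.

    otbuf and kcl carry over the best notation and its length found so far.
    """
    synonyms = synonym_dict.get(meaning_code)   # (G-1) look up the meaning code
    if not synonyms:                            # (G-2) no synonyms: keep the result
        return otbuf, kcl
    for text in synonyms:                       # (G-3), (G-8): visit each synonym
        cnt = len(text)                         # (G-4)
        if cnt < kcl:                           # (G-5) strictly shorter only
            kcl = cnt                           # (G-6)
            otbuf = text                        # (G-7)
    return otbuf, kcl                           # (G-10)

synonym_dict = {98730: ["労働組合", "労組"], 1012: ["委員会", "委"]}
print(shorten_with_synonyms(1012, synonym_dict, "委員会", 3))
print(shorten_with_synonyms(98730, synonym_dict, "労組", 2))
```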
＜入力文字列の変換例＞
ここで、図２を例として具体的な動作を説明する。
【００８１】
かな入力に先立って、短縮表記優先キーを押下して短縮表記優先指示フラグｓｆｌｇを“１”にし、短縮表記に変換するよう指示しておく。
【００８２】
次に、かな漢字変換を行うためにかなで「ろうどうくみあいのせんもんいいんかい」と入力し、かな漢字変換キーを入力する。これにより図２（ａ）の文字列を対象として図７（Ａ−４）のかな漢字変換が実行される。
【００８３】
まず図８の（Ｂ−１）で、図６に示す文節候補テーブルが作成される。すなわち、単語辞書４０１を読みで検索し、「ろうどうくみあい」（単語連番９６５６８）と付属する「の」，「せんもん」（単語連番１３２８７），「せんもん」（単語連番１３２８８），「いいんかい」（単語連番１２８５）なる４つの文節候補を、文節番を０〜３として図６のように得ることができる。このうち、かな文字列中の同じ位置から始まる文節は文節番１と２の候補であるため、文節番１の同文節欄の値は２となっている。それらに後続する次文節はともに文節番３の文節候補である。また文節番３の「いいんかい」は、後続の文節が無いため、次文節の欄は−１である。また、尤度の欄は、前出の式を適用してそれぞれの文節候補ごとに得られる。
【００８４】
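Next, in (B-2), the kanji notation of each phrase is decided. To do so, a phrase candidate is first decided by the procedure of FIG. 10. The first phrase candidate is phrase number 0, 「ろうどうくみあい」; since it is the only candidate, it is decided as the phrase candidate.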
【００８５】
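Next, the kanji notation for the decided phrase candidate is decided by the procedure of FIG. 11. When the abbreviated notation priority instruction key has been used to give priority to abbreviated notation, the notation with the fewest characters among those registered in the word dictionary 401 is selected by the procedure of FIG. 12 and decided as the kanji notation candidate for that character string. For the character string 「ろうどうくみあい」, the shortest notation registered in the word dictionary 401 is 「労組」, so it is decided as the notation candidate obtained from the word dictionary search.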
【００８６】
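Next, the synonym dictionary 501 is searched by the procedure of FIG. 13. The meaning code registered in the word dictionary 401 is 98730, and searching the synonym dictionary 501 for it finds two different character strings, 「労働組合」 and 「労組」. Neither is shorter than the notation already obtained from the word dictionary 401, so nothing is selected from the synonym dictionary; in the end the character string 「労組」 is decided as the converted candidate and is stored in the output buffer OBUF in (E-7). At this time, the attached part 「の」 is output together with the independent word 「労組」 converted to kanji.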
【００８７】
こうして１文節の処理が終了すると、図８（Ｂ−３）により次の文節候補について処理が行われる。
【００８８】
図６によれば、文節候補テーブルＢＣＴの内容からつぎなる文節は文節番０の次文節欄を見ると文節１の「せんもん」である。また、文節番１の同文節欄を見ると−１ではなく、文節番２の文字列「せんもん」が同じ位置から始まる文節候補として存在している。
【００８９】
そこで、図１０の手順により、最も尤度の高い文節候補を選び出す。この場合、図６の通り、文節番１の尤度が１２０、文節番２の尤度が８０であり、文節番１が文節候補として決定される。こうして決定された文節候補について漢字表記を、前の文節と同じく図１１の手順で決定するが、この文字列に関しては短縮表記も同義語の登録もないため、単語辞書４０１の登録通りに「専門」が出力バッファＯＢＵＦに出力される。
【００９０】
次に、候補となった文節番１の次文節欄をたどって、文節番３について文節決定・漢字表記の処理を行なう。文節番３は最後の文節であり、他に候補もないため、文節番３の「いいんかい」が図１０の手順により文節候補に決定される。
【００９１】
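When abbreviated notation is instructed, the word dictionary 401 is searched for this phrase by the procedure of FIG. 12 to obtain the kanji notation 「委員会」. This is the shortest notation registered in the word dictionary.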
【００９２】
次に図１３の手順により同義語辞書５０１を語義コード１０１２で検索し、最短文字列「委」を得る。これが漢字表記として決定され、出力バッファＯＢＵＦに出力される。
【００９３】
以上のようにして得られた図２（ｂ）に示す短縮表記による文字列「労組の専門委」が、図７の（Ａ−８）により表示される。
【００９４】
また、短縮表記優先指示キーを再び押下して、短縮表記優先指示フラグｓｆｌｇを“０”にしてから図２（ａ）の文字列「ろうどうくみあいのせんもんいいんかい」を入力すると、候補として決定される文節は短縮表示指示フラグｓｆｌｇが１の場合と同一であるが、各文節の漢字表記を決定する手順が異なる。この場合に、図１１の（Ｅ−１）から（Ｅ−２）に進むため、自立語に対応して単語辞書４０１に登録された最初の短縮表記でない文字列が漢字表記として出力される。そのため、出力される文字列は図２の（ｃ）のように「労働組合の専門委員会」という通常の表記の文字列である。
＜短縮表記の単語辞書への登録＞
図１４は、（Ａ−５）の処理を詳細化したフローチャートである。
【００９５】
（Ｈ−１）で登録あるいは削除する短縮表記の読みを入力し、登録短縮表記読みバッファＵＹＢＵＦに格納する。
【００９６】
（Ｈ−２）で登録あるいは削除する短縮漢字表記を入力し、登録短縮表記バッファＵＨＢＵＦに格納する。
【００９７】
（Ｈ−３）で登録か削除か実行する機能を指定し、（Ｈ−４）で実行機能の判定を行う。「登録」が指定されれば、（Ｈ−５）で短縮表記登録を行い、「削除」が指定されれば（Ｈ−６）で短縮表記削除を実行する。「取消」が指定されれば、何も実行せず、ただちに終了する。
【００９８】
（Ｈ−５）は短縮表記を追加登録する処理である。登録短縮表記読みバッファＵＹＢＵＦから読みを取り出し、登録短縮表記バッファＵＨＢＵＦから短縮表記漢字を取り出す。指定された読みで単語辞書の挿入ポイントを検索し、表記タイプを短縮表記タイプとして追加する。
【００９９】
（Ｈ−６）は短縮表記を単語辞書から削除する処理である。登録短縮表記読みバッファＵＹＢＵＦから読みを取り出し、登録短縮表記バッファＵＨＢＵＦから短縮表記漢字を取り出す。指定された読みで単語辞書の短縮表記を検索し、見つかった該短縮表記を単語辞書から抹消する。
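The registration and deletion branches (H-4) to (H-6) can be sketched in Python as follows. This is a simplified model under stated assumptions: the word dictionary is reduced to a mapping from reading to a list of (notation, type) pairs, the action strings and the function name are hypothetical, and the normal notation 「ワークステーション」 in the sample data is an assumption.

```python
def update_abbreviation(dictionary, action, reading, notation):
    """Register or delete an abbreviated notation of type "S" for a reading.

    Any action other than "register" or "delete" (for example "cancel")
    leaves the dictionary unchanged.
    """
    item = (notation, "S")
    if action == "register":                        # (H-5)
        entries = dictionary.setdefault(reading, [])
        if item not in entries:
            entries.append(item)
    elif action == "delete":                        # (H-6)
        entries = dictionary.get(reading, [])
        if item in entries:
            entries.remove(item)
    return dictionary

words = {"わーくすてーしょん": [("ワークステーション", "N")]}
update_abbreviation(words, "register", "わーくすてーしょん", "ＷＳ")
print(words["わーくすてーしょん"])
```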
【０１００】
この処理を行う際には、図３の表示画面から欄３０１に読みを、欄３０２に対応する表記を入力する。図３の例では、その状態で実行キーを押すと、文字列「ＷＳ」が短縮表記として、表記タイプ（Ｓ）を付して単語辞書に登録される。
【０１０１】
以上のように辞書に短縮表記を登録し、短縮表記を望む場合には、優先的に短縮表記に変換することで、簡単な操作で出力文字列の長さを短いものとして得ることができる。
【０１０２】
また、同義語を同義語辞書として登録しておいた場合、短縮表記を望む場合には、同義語辞書からも文字列を検索して最も短い文字列を得ることで、入力する文章の意図を変えずに同義語も含めて最も短い文字列に変換することができる。
【０１０３】
また、短縮表記を短縮表記と明示して辞書に登録しておくことで、入力した読み通りの正確な文字列を出力として得たい場合には、変換後の候補として不要な短縮表記の候補を表示せずに済み、煩雑さを防止できる。
【０１０４】
なお、以上の説明において、短縮表記を優先変換する対象となる機能として、かな漢字変換の機能に対する例しか示していないが、本発明はかな漢字変換に限定されるものではない。より表記文字数の少ない変換先文字列を優先して第１候補解とする本発明の主旨を逸脱するものでなければよい。かな漢字変換以外の機能に対しても、同様に短縮表記への優先的な文字列変換を行うことができる。例えば、英語文を日本語文に翻訳出力する翻訳装置においても、同様に、より表記長をつめた翻訳結果の日本語文の生成に本発明を適用することができる。
【０１０５】
上記装置の機能もしくは方法の機能によって達成される本発明の目的は、前述の本発明を実施した装置におけるプログラムを記憶させた記憶媒体によっても達成できる。すなわち、上記装置に、その記憶媒体を装着し、その記憶媒体から読出したプログラム自体が本発明の新規な機能を達成するからである。このための、本発明に係るプログラムの構造的特徴は図１６に示す通りである。図１６のマップの右側が、図４及び図５及びフローチャートにおける符号に対応している。
【０１０６】
本実施例においては、短縮表示優先モード時のみ変換される短縮漢字表記を通常の漢字表記と同じ辞書に格納するように構成したが、通常の漢字表記とは異なる、短縮漢字表記のみを格納するような辞書に格納するようにしてもよい。
【０１０７】
【他の実施形態】
なお、本発明は、複数の機器（例えばホストコンピュータ，インタフェイス機器，リーダ，プリンタなど）から構成されるシステムに適用しても、一つの機器からなる装置（例えば、複写機，ファクシミリ装置など）に適用してもよい。
【０１０８】
また、本発明の目的は、前述した実施形態の機能を実現するソフトウェアのプログラムコードを記録した記憶媒体を、システムあるいは装置に供給し、そのシステムあるいは装置のコンピュータ（またはＣＰＵやＭＰＵ）が記憶媒体に格納されたプログラムコードを読出し実行することによっても達成される。
【０１０９】
この場合、記憶媒体から読出されたプログラムコード自体が本発明の新規な機能を実現することになり、そのプログラムコードを記憶した記憶媒体は本発明を構成することになる。
【０１１０】
プログラムコードを供給するための記憶媒体としては、例えば、フロッピディスク，ハードディスク，光ディスク，光磁気ディスク，ＣＤ−ＲＯＭ，ＣＤ−Ｒ，磁気テープ，不揮発性のメモリカード，ＲＯＭなどを用いることができる。
【０１１１】
また、コンピュータが読出したプログラムコードを実行することにより、前述した実施形態の機能が実現されるだけでなく、そのプログラムコードの指示に基づき、コンピュータ上で稼働しているＯＳ（オペレーティングシステム）などが実際の処理の一部または全部を行い、その処理によって前述した実施形態の機能が実現される場合も含まれる。
【０１１２】
さらに、記憶媒体から読出されたプログラムコードが、コンピュータに挿入された機能拡張ボードやコンピュータに接続された機能拡張ユニットに備わるメモリに書込まれた後、そのプログラムコードの指示に基づき、その機能拡張ボードや機能拡張ユニットに備わるＣＰＵなどが実際の処理の一部または全部を行い、その処理によって前述した実施形態の機能が実現される場合も含まれる。
【０１１３】
【発明の効果】
以上説明したように、本発明に係る文字列変換装置及びその方法によれば、変換文字列候補群として変換文字列数の少ない文字列を自動的に選択することができる。また、かな読みの異なる文字列であっても、入力意図を損なわずに短い漢字表記を得ることができるため、変換文字列の中から所望の短い文字列を選びなおす必要がなく、簡単に文字列数を抑制した変換結果を得る操作性の高い文字列変換装置を実現することができるという効果を奏する。
【０１１４】
【図面の簡単な説明】
【図１】実施形態の日本語処理装置の全体構成を示すブロック図である。
【図２】短縮表記優先変換の変換例を示した図である。
【図３】短縮表記のユーザ登録例を示した図である。
【図４】本実施例の単語辞書の構成の例を示した図である。
【図５】本実施例の同義語辞書の構成の例を示した図である。
【図６】本実施例の文節候補テーブルの構成の例を示した図である。
【図７】本実施例の動作全体の処理手順の一例を示すフローチャートである。
【図８】本実施例のかな漢字変換の処理手順の一例を示すフローチャートである。
【図９】本実施例のかな漢字変換第１候補決定の処理手順の一例を示すフローチャートである。
【図１０】本実施例の文節候補決定の処理手順の一例を示すフローチャートである。
【図１１】本実施例の漢字表記決定の処理手順の一例を示すフローチャートである。
【図１２】本実施例の最少漢字表記検索の処理手順の一例を示すフローチャートである。
【図１３】本実施例の同義語検索の処理手順の一例を示すフローチャートである。
【図１４】本実施例の短縮表記登録の処理手順の一例を示すフローチャートである。
【図１５】ＲＡＭの内容の構成例を示す図である。
【図１６】本発明を実現するプログラムを格納した記憶媒体におけるメモリマップの図である。
[0001]
[Technical field to which the invention belongs]
The present invention relates to a character string conversion apparatus and method for converting character strings.
[0002]
[Prior art]
Conventionally, as a character processing device that performs character string conversion, there is a kana-kanji conversion device that inputs kana readings and converts them into appropriate kanji character strings.
[0003]
In kana-kanji conversion, an input kana reading often has several possible kanji notations, called homophones, and various methods have been used to decide which kanji notation to output as the first-priority solution.
[0004]
For example, a method of storing the appearance frequency of a word as a frequency and determining the kanji notation having the highest frequency value, a method of giving priority to the kanji notation stored in a word pair called an example, and the like have been implemented.
[0005]
[Problems to be solved by the invention]
However, although the conventional methods for determining the first-priority solution can output a kanji notation that is semantically valid for the kana reading, they place no constraint on the number of characters in the notation. In input situations where the number of usable characters is limited, it was therefore difficult to always obtain an appropriate kanji notation. For example, when writing a manuscript with a character limit, the user had to reselect, from among the second-priority and lower candidates, a kanji candidate with fewer characters that still conveys the intended meaning. In addition, because the kanji notations available for a given kana reading are limited to the few stored in the dictionary, there may be no short notation among them that conveys the desired meaning, and the user then had to re-enter the text using a different expression.
[0006]
The present invention has been made in view of the above points, and its object is to provide a character string conversion apparatus and method that preferentially output, from among a plurality of conversion character strings that are semantically valid for the input character string, the one with the fewest characters.
[0007]
[Means for Solving the Problems]
In order to achieve the above object, the character string conversion apparatus of the present invention has the following configuration. That is, it comprises: input means for inputting a conversion source character string; dictionary means for storing at least one word in association with the conversion source character string, and at least one notation character string in association with each word; determining means for determining, from the dictionary means, one word corresponding to the input conversion source character string; search means for searching the dictionary means for the shortest notation character string among the notation character strings corresponding to the word determined by the determining means; and output means for outputting the shortest notation character string found by the search means as the notation character string into which the conversion source character string is converted.
[0008]
Alternatively, the apparatus comprises: first dictionary means for storing, in association with one another, a kana character string, at least one word corresponding to the kana character string, at least one notation character string corresponding to each word, a notation code indicating whether the notation character string is a normal notation that corresponds to the reading or an abbreviated notation that does not, a meaning code indicating the meaning of the word, and a likelihood of the word; second dictionary means for storing the meaning code in association with the notation character string of at least one synonym corresponding to that meaning code; instruction means for instructing an abbreviation priority mode in which abbreviated notation is given priority; input means for inputting a kana character string; phrase candidate determining means for dividing the kana character string input by the input means into phrases on the basis of the words stored in the first dictionary means to create one or more phrase candidates, and for determining the optimum phrase candidate on the basis of the likelihoods of the words constituting each phrase candidate; search means which, for each phrase candidate determined by the phrase candidate determining means, when the abbreviation priority mode is instructed, retrieves the notation character strings of the word constituting the phrase candidate from the first dictionary means, further retrieves the notation character strings corresponding to the meaning code of that word from the second dictionary means, and outputs the shortest of the retrieved notation character strings, and which, when the abbreviation priority mode is not instructed, retrieves the notation character strings of the word constituting the phrase candidate from the first dictionary means and outputs a notation character string whose notation code indicates normal notation; and output means for displaying the notation character string output by the search means.
[0021]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, a Japanese language processing apparatus having a kana-kanji conversion function will be described in detail as an embodiment of the present invention with reference to the drawings.
<Device configuration>
FIG. 1 is an example of the overall configuration of a Japanese language processing apparatus according to the present invention.
[0022]
In the configuration shown in the figure, a CPU 101 is a microprocessor that performs arithmetic operations, logical decisions, and the like for character processing, and controls the components connected to the address bus AB, the control bus CB, and the data bus DB via those buses.
[0023]
The address bus AB transfers an address signal indicating a component to be controlled by the CPU 101. The control bus CB transfers and applies a control signal for each component to be controlled by the CPU 101. The data bus DB performs data transfer between the component devices.
[0024]
The ROM 102 is a read-only fixed memory. It contains a program area that stores the control procedures executed by the CPU 101, described later with reference to FIGS. 7 to 14.
[0025]
The RAM 103 is a writable random access memory organized as 16 bits per word, and is used for temporary storage of various data from each component. As shown in FIG. 15, the RAM 103 comprises an input reading buffer (IBUF), an output kanji buffer (OBUF), a phrase candidate table (BCT), a maximum phrase likelihood storage work memory (myd), a current phrase candidate likelihood storage work memory (yud), a fixed phrase candidate number storage work memory (did), a current phrase candidate number storage work memory (bid), a shortest kanji notation length storage work memory (kcl), a current kanji notation length storage work memory (cnt), a kanji notation temporary storage buffer (OTBUF), an abbreviated kanji notation priority instruction flag (sflg), a registered abbreviated notation reading buffer (UYBUF), a registered abbreviated notation buffer (UHBUF), and the like.
[0026]
The input reading buffer IBUF stores the input character string to be subjected to kana-kanji conversion, and the output kanji buffer OBUF stores the converted kanji string after kana-kanji conversion processing. Both are commonly used in information processing apparatuses of the same kind that perform kana-kanji conversion, so a detailed description is omitted. The phrase candidate table BCT is a buffer that stores the information needed, during kana-kanji conversion processing, to obtain the possible kanji notation candidates for the input reading string; it is described later with reference to FIG. 6. The maximum phrase likelihood storage work memory myd stores the maximum phrase likelihood among the phrase candidates in the phrase candidate table that share the same phrase start position. The current phrase candidate likelihood storage work memory yud stores the likelihood of the phrase candidate currently being processed. The fixed phrase candidate number storage work memory did stores the number of the phrase candidate having the maximum phrase likelihood. The current phrase candidate number storage work memory bid stores the number of the phrase candidate currently being processed. The shortest kanji notation length storage work memory kcl stores the character count of the kanji candidate with the fewest characters among the possible kanji notations of the phrase candidate indicated by the fixed phrase candidate number storage work memory. The current kanji notation length storage work memory cnt stores the character count of the kanji notation candidate currently under consideration among those possible kanji notations. The abbreviated kanji notation priority instruction flag sflg is a conversion condition flag that specifies whether an abbreviated notation is output as the first-priority solution; it stores 1 for "prioritize abbreviated notation" and 0 for "do not prioritize abbreviated notation". The registered abbreviated notation reading buffer UYBUF and the registered abbreviated notation buffer UHBUF store, respectively, the reading and the notation of an abbreviated notation that is being additionally registered or deleted.
[0027]
The KB 105 is a keyboard. It has character and symbol input keys such as alphabet keys, hiragana keys, katakana keys, and punctuation keys; operation instruction keys for registering or deleting an abbreviated notation and for canceling an operation; an abbreviated notation priority instruction key for specifying whether abbreviated notation is given priority; a kana-kanji conversion key for converting a character string entered in hiragana into a character string containing kanji; and various function keys such as cursor movement keys for moving the cursor.
[0028]
The DISK 104 is an external memory for storing a kana-kanji conversion word dictionary 401 described later in FIG. 4, a synonym dictionary 501 described later in FIG. 5, document data, and the like. Document data and the like are stored as necessary, and the stored data is called up when necessary by an instruction from the keyboard.
[0029]
CR 106 is a cursor register. The CPU 101 can read and write the contents of the cursor register. A CRT controller (CRTC) 108 to be described later displays a cursor at a position on the display device CRT for the address stored here.
[0030]
The DBUF 107 is a display buffer memory and stores a pattern of data to be displayed.
[0031]
The CRTC 108 plays a role of displaying the contents stored in the cursor register CR 106 and the buffer DBUF 107 on the display CRT 109.
[0032]
The CRT 109 is a display device using a cathode ray tube or the like, and the display pattern of the dot configuration and the display of the cursor in the display device CRT 109 are controlled by the CRT controller 108.
[0033]
Further, the CG 110 is a character generator and stores a pattern of characters and symbols displayed on the display device CRT 109.
[0034]
The character processing apparatus according to the present invention comprising such components operates in response to various inputs from the keyboard KB 105. When input from the keyboard KB 105 is supplied, an interrupt signal is first sent to the CPU 101. The CPU 101 reads out various control signals stored in the ROM 102, and various controls are performed according to the control signals.
<Example of character string to be converted>
An example in which kana-kanji conversion is executed in the apparatus of the present embodiment having the above configuration will be described below with reference to FIG.
[0035]
(a) shows the state of the input reading buffer when the character string to be subjected to kana-kanji conversion has been entered. In the figure, 「ろうどうくみあいのせんもんいいんかい」 is the character string to be converted. When the "Convert" key is pressed in this state, kana-kanji conversion is performed on it and, as shown in (b), the result is 「労組の専門委」: kanji candidates with a shorter notation length are given priority over the conversion result 「労働組合の専門委員会」 obtained in the state shown in (c), in which abbreviated notation priority is not instructed.
<Display example when registering shorthand notation>
FIG. 3 shows an example of additionally registering an abbreviated notation or deleting a registered one. When the registration key or the delete key on the keyboard KB 105 is pressed, a message like the one shown in the figure is displayed, prompting the operator to enter the reading and the abbreviated notation. When the registration or delete key is then pressed for the entered abbreviated notation, the function is executed and the entry is registered in or deleted from the word dictionary. Pressing the registration key in the state shown in the figure registers the abbreviated notation 「ＷＳ」 for the reading 「わーくすてーしょん」 ("workstation"). Pressing the delete key deletes that abbreviated notation from the word dictionary.
<Example of word dictionary>
FIG. 4 is an example of a word dictionary in the Japanese language processing apparatus according to the embodiment of the present invention.
[0036]
As shown in the figure, each entry consists of six fields. The first field, the word sequence number, is a serial number that uniquely identifies a word stored in the word dictionary; exactly one sequence number is assigned to words whose reading, part of speech, and meaning code (described below) are all equal. The second field stores the reading of the word. The third field stores the part of speech of the word. The fourth field stores a meaning code that classifies the word by meaning. Words with different meaning codes are treated as different words with different word sequence numbers. This meaning code is the same as the meaning code that identifies synonyms in FIG. 5. The fifth field stores a likelihood as an index of the priority of the word; the larger the likelihood, the more plausible the word. The sixth field, notation and notation type, stores the kanji notation of the word together with a distinction between a type that should be converted in normal use for the reading (normal type) and a type that should be converted only when abbreviated notation priority has been instructed (abbreviated notation type). In the figure, (N) denotes the normal type and (S) denotes the abbreviated notation type.
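The six fields can be pictured as a small record type. The Python sketch below is only an illustration: the class and attribute names are hypothetical, the part of speech and the likelihood value 100 are placeholders, and the sample entry uses the sequence number 96568 and meaning code 98730 that the worked example later in the description gives for 「ろうどうくみあい」.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

NORMAL = "N"        # normal type: used in ordinary conversion
ABBREVIATED = "S"   # abbreviated type: used only in abbreviation priority mode

@dataclass
class WordEntry:
    serial: int                 # field 1: word sequence number
    reading: str                # field 2: kana reading
    part_of_speech: str         # field 3
    meaning_code: int           # field 4: shared with the synonym dictionary
    likelihood: int             # field 5: larger means more plausible
    notations: List[Tuple[str, str]] = field(default_factory=list)  # field 6

def notations_of_type(entry: WordEntry, kind: str) -> List[str]:
    """Return the notations of one type, in dictionary order."""
    return [text for text, t in entry.notations if t == kind]

entry = WordEntry(96568, "ろうどうくみあい", "noun", 98730, 100,
                  [("労働組合", NORMAL), ("労組", ABBREVIATED)])
print(notations_of_type(entry, NORMAL))
print(notations_of_type(entry, ABBREVIATED))
```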
<Example of synonym dictionary>
FIG. 5 is a synonym dictionary in which synonyms having the same meaning code are classified and stored by meaning code. Synonyms stored for one semantic code are considered to be paraphrasable words having the same meaning.
[0037]
FIG. 6 is an example of the phrase candidate table, which stores the phrase candidates under analysis for the input reading string. The table stores, for each candidate: the phrase number, which identifies the individual phrase candidate; the independent word, which is the reading of the word forming the independent part of the phrase; the word sequence number, which is the identification number of that independent word in the word dictionary; the attached part, which is the reading of the attached part of the phrase candidate; and the likelihood, which is an index of how plausible the phrase candidate is. The larger the likelihood, the more plausible the phrase. The same-phrase field stores link information to another phrase candidate that has the same start position in the input reading string: it holds the phrase number of the next phrase candidate with the same start position, and "−1" is a terminal link meaning that no further candidate with that start position exists. The next-phrase field stores link information to the phrase candidate that follows this one: like the same-phrase link, it holds the phrase number of the next phrase candidate, and "−1" is a terminal link meaning that no next phrase candidate exists. A phrase candidate whose same-phrase and next-phrase fields are both "−1" is the last phrase candidate, reaching the end of the input reading string.
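The two link fields make the table a pair of interleaved linked lists. The following Python sketch models it with the values that the worked example later in the description gives for 「ろうどうくみあいのせんもんいいんかい」. The class name, attribute names, and the helper `alternatives` are hypothetical, and the likelihoods of phrase numbers 0 and 3 are placeholders because the text states only 120 and 80 for phrase numbers 1 and 2.

```python
from dataclasses import dataclass
from typing import Dict, List

END = -1  # terminal link

@dataclass
class PhraseCandidate:
    number: int            # phrase number
    independent_word: str  # reading of the independent word
    word_serial: int       # word sequence number in the word dictionary
    attached: str          # reading of the attached part
    likelihood: int        # larger means more plausible
    same_phrase: int       # next candidate with the same start position, or END
    next_phrase: int       # candidate that follows this phrase, or END

table: Dict[int, PhraseCandidate] = {
    0: PhraseCandidate(0, "ろうどうくみあい", 96568, "の", 100, END, 1),
    1: PhraseCandidate(1, "せんもん", 13287, "", 120, 2, 3),
    2: PhraseCandidate(2, "せんもん", 13288, "", 80, END, 3),
    3: PhraseCandidate(3, "いいんかい", 1285, "", 100, END, END),
}

def alternatives(tbl: Dict[int, PhraseCandidate], start: int) -> List[int]:
    """Follow same-phrase links and list every candidate at one start position."""
    found = []
    n = start
    while n != END:
        found.append(n)
        n = tbl[n].same_phrase
    return found

print(alternatives(table, 1))
```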
[0038]
The operation of the above embodiment will be described according to the flow.
<Processing procedure according to key input>
FIG. 7 is a flowchart showing the operation of the character string conversion apparatus of the present invention.
[0039]
In (A-1), a key is pressed on the keyboard 105 and an interrupt is generated. When a key is input, (A-2) determines the function assigned to that key in the state immediately before the key was pressed, and the process branches to one of steps (A-3), (A-4), (A-5), (A-6), and (A-7).
[0040]
(A-3) is a process when it is determined that a character is input in (A-2), and the code of the input character is stored in the input reading buffer IBUF. This processing is generally performed in the same type of information processing apparatus that performs kana-kanji conversion, and is not particularly described because it is publicly known.
[0041]
(A-4) is a process performed when kana-kanji conversion is determined in (A-2), and converts the characters stored in the input reading buffer IBUF into kanji. This will be described in detail with reference to FIG.
[0042]
(A-5) is a process performed when registration or deletion of a shorthand notation is determined in (A-2). As shown in FIG. 3, an abbreviated kanji notation whose reading and notation are specified by the operator is registered in the word dictionary 402, or the designated abbreviated kanji notation is deleted from the word dictionary. This will be described in detail with reference to FIG.
[0043]
(A-6) is a process performed when setting of the abbreviated notation priority mode is determined in (A-2). It sets the abbreviated notation priority mode when the operator wants a kana-kanji conversion result that gives priority to the shortest possible notation, and it also returns to the normal mode, which does not give priority to the shorthand notation. When the abbreviated notation priority instruction key on the keyboard KB is pressed, the value of the abbreviated kanji notation priority instruction flag sflg is set to “1”, and when the key is pressed again the value is set to “0”. The flag thus toggles with each press.
[0044]
(A-7) is a process performed when (A-2) determines a function other than character input, kana-kanji conversion, shorthand notation registration, and mode setting (for example, cursor movement). This processing is generally performed in apparatuses of the same type and is publicly known, so it is not particularly described.
[0045]
(A-8) is a display process that displays the part changed as a result of the above processes. It is a widely performed process that reads the data to be displayed, develops it into a pattern, and outputs it to a display buffer.
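The branching of steps (A-2) to (A-8) can be sketched as a simple dispatcher. This is a minimal illustration, not the apparatus itself: the class `Converter`, the function labels passed to `on_key`, and the `log` list are hypothetical, and the conversion, registration, and display steps are reduced to log entries.

```python
class Converter:
    def __init__(self):
        self.ibuf = []   # input reading buffer IBUF
        self.sflg = 0    # abbreviated notation priority instruction flag
        self.log = []    # hypothetical trace of the steps taken

    def on_key(self, function, char=""):
        """(A-2): branch on the function assigned to the pressed key."""
        if function == "char":        # (A-3) store the input character
            self.ibuf.append(char)
        elif function == "convert":   # (A-4) kana-kanji conversion
            self.log.append("convert")
        elif function == "register":  # (A-5) register or delete shorthand
            self.log.append("register")
        elif function == "priority":  # (A-6) toggle sflg between 0 and 1
            self.sflg = 1 - self.sflg
        else:                         # (A-7) other processing
            self.log.append("other")
        self.display()                # (A-8) display the changed part

    def display(self):
        self.log.append("display")


conv = Converter()
conv.on_key("priority")   # first press: sflg becomes 1
conv.on_key("char", "a")
conv.on_key("priority")   # second press: sflg returns to 0
```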
<Procedure for Kana-Kanji conversion processing>
FIG. 8 is a flowchart detailing the process (A-4).
[0046]
(B-1) is a process that searches the word dictionary 401 according to the reading stored in the input reading buffer IBUF, performs phrase connection verification on the retrieved words, and thereby divides the reading into phrases and creates phrase candidates. In the phrase division process, the phrase candidate table BCT is created as follows. Phrase candidates are created sequentially from the head of the input reading buffer IBUF, phrase numbers are assigned, and the independent words retrieved from the word dictionary and their word sequence numbers are stored. Appendix readings determined to be connectable by the phrase connection verification are stored in the appendix field. Further, the priority index value obtained by the following equation is stored as the likelihood.
[0047]

The process of performing the phrase connection test and calculating the appendix likelihood is similar to the method used in apparatuses of the same type that perform kana-kanji conversion, and will not be described in detail. Each time one phrase candidate is created, the next phrase candidate is created and the same phrase link and the next phrase link are set, and finally the phrase candidate table is terminated.
[0048]
(B-2) is a notation determination process that selects the optimal phrase candidate as the first candidate solution from the phrase candidates stored in the phrase candidate table and outputs it to the output buffer OBUF in kanji notation. This is described in detail below.
[0049]
(B-3) takes out the next phrase link of the phrase candidate determined in (B-2) in order to determine and output the next phrase candidate.
[0050]
(B-4) determines whether all the phrases in the phrase candidate table BCT have been converted, according to whether the next phrase link extracted in (B-3) is “−1”. If all have been converted, the process returns. Otherwise, the next phrase is determined and output in (B-2).
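The loop formed by (B-2), (B-3), and (B-4) can be sketched as follows. The function `convert_all`, the dictionary-based table layout, and the two callables passed in are assumptions for illustration. Creation of the table in (B-1) is omitted.

```python
END = -1  # terminal link value


def convert_all(table, first, determine_phrase, determine_notation):
    """Walk the next-phrase links and collect one notation per phrase."""
    obuf = []            # stands in for the output buffer OBUF
    current = first
    while current != END:                     # (B-4) stop at the terminal link
        chosen = determine_phrase(table, current)
        obuf.append(determine_notation(table[chosen]))  # (B-2)
        current = table[chosen]["next"]       # (B-3) take the next phrase link
    return obuf


# Hypothetical two-phrase table with no alternatives.
table = {
    0: {"word": "w0", "same": END, "next": 1},
    1: {"word": "w1", "same": END, "next": END},
}
result = convert_all(
    table, 0,
    determine_phrase=lambda t, n: n,               # keep the first candidate
    determine_notation=lambda c: c["word"].upper() # placeholder conversion
)
print(result)
```

Passing the two determination steps as callables keeps the loop independent of how a phrase candidate and its notation are chosen.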
<Procedure for determining the description of one phrase>
FIG. 9 is a flowchart detailing the process (B-2).
[0051]
(C-1) is a phrase candidate determination process for determining one phrase candidate having the highest likelihood among the phrase candidates having the same phrase start position in the phrase candidate table BCT, which will be described in detail with reference to FIG.
[0052]
(C-2) is a kanji notation determination process for determining an appropriate kanji notation as the first priority solution from the kanji notations possible for the phrase candidate determined in (C-1), and will be described in detail with reference to FIG.
<Procedure for determining phrase candidates>
FIG. 10 is a flowchart detailing the process (C-1).
[0053]
(D-1) initializes the maximum phrase likelihood storage work memory myd with 0.
[0054]
(D-2) stores the phrase likelihood of the currently selected phrase candidate in the current phrase candidate likelihood storage work memory yud, and the phrase number in the current phrase candidate number storage work memory bid.
[0055]
(D-3) determines whether the phrase likelihood of the phrase candidate currently in focus is the maximum by comparing it with the maximum phrase likelihood storage work memory myd. If it is not the maximum, the process branches to (D-6), and if it is the maximum, the process branches to (D-4).
[0056]
(D-4) updates the maximum phrase likelihood storage work memory myd with the phrase likelihood value of the phrase candidate currently in focus.
[0057]
(D-5) stores the phrase number of the currently focused phrase candidate having the maximum phrase likelihood in the fixed phrase candidate number storage work memory did.
[0058]
(D-6) determines whether there is a next phrase candidate at the same phrase start position. If the same phrase link of the phrase candidate is “−1”, there is no other phrase candidate at the same phrase start position, so the process goes to (D-7). Otherwise, the current phrase candidate number storage work memory bid is updated to the phrase number indicated in the same phrase link, and the process branches to (D-3).
[0059]
In (D-7), the phrase candidate having the maximum phrase likelihood among the phrase candidates having the same phrase start position is determined as the phrase candidate indicated in the confirmed phrase candidate number storage work memory did, and the process returns.
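Steps (D-1) to (D-7) amount to a maximum search along the same-phrase links. A minimal sketch follows, reusing the work memory names myd, yud, bid, and did from the text. The function name, the dictionary-based table layout, and the use of a strict comparison in (D-3) are assumptions, and likelihoods are assumed to be positive.

```python
END = -1  # terminal link value


def determine_phrase_candidate(table, start):
    """Return the phrase number with the largest likelihood among the
    candidates that share the start position of `start`."""
    myd = 0       # (D-1) maximum phrase likelihood so far
    did = END     # confirmed phrase candidate number
    bid = start   # current phrase candidate number
    while True:
        yud = table[bid]["likelihood"]   # (D-2) likelihood of current candidate
        if yud > myd:                    # (D-3) is it the maximum so far?
            myd = yud                    # (D-4)
            did = bid                    # (D-5)
        same = table[bid]["same"]        # (D-6) another candidate here?
        if same == END:
            return did                   # (D-7)
        bid = same


# Two candidates at the same start position; likelihoods 120 and 80
# are the values given for phrase numbers 1 and 2 in the description.
table = {
    1: {"likelihood": 120, "same": 2},
    2: {"likelihood": 80, "same": END},
}
print(determine_phrase_candidate(table, 1))
```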
<Procedure for determining kanji notation for phrase candidates>
FIG. 11 is a flowchart detailing the process (C-2).
[0060]
In (E-1), it is determined whether or not the abbreviated Chinese character notation priority mode is set by whether or not the abbreviated Chinese character notation priority instruction flag sflg is “1”. If it is the abbreviated Chinese character notation priority mode, the process branches to (E-5), and if not, the process branches to (E-2).
[0061]
(E-2) and the following steps show the kanji notation determination process in the normal conversion mode, in which priority is not given to abbreviated kanji notation. (E-2) takes out the independent word sequence number of the phrase candidate having the determined phrase candidate number, and takes out the kanji notation of the word having that sequence number from the notation storage area of the word dictionary.
[0062]
In (E-3), it is determined whether or not the notation type of the extracted kanji notation is abbreviated notation. If it is an abbreviated notation, the next kanji notation is taken out and the same determination process is repeated. If it is not an abbreviated notation, the retrieved kanji notation is determined as the result.
[0063]
(E-5) and the following steps show the kanji notation determination process in the abbreviated kanji notation priority mode. (E-5) takes out the independent word sequence number of the phrase candidate having the determined phrase candidate number and extracts, from the notation storage area of the word dictionary, the kanji notation with the fewest characters among the kanji notations of the word having that sequence number. This will be described in detail with reference to FIG.
[0064]
(E-6) takes out the meaning code of the determined independent word, searches the synonym dictionary, and takes out the kanji notation with the fewest characters from the synonyms having the same meaning code, thereby determining a kanji notation with still fewer characters. This will be described in detail with reference to FIG.
[0065]
In (E-7), the Chinese character notation determined in (E-3) or (E-6) is stored in the output buffer OBUF, and the process ends.
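The branch on sflg in (E-1) and the two determination paths can be sketched as one function. The function name `determine_notation`, the pair-based representation of notations, and the placeholder strings are assumptions. Ties in length are resolved here in favor of the earlier notation, which the description does not specify.

```python
def determine_notation(notations, synonyms, sflg):
    """notations: list of (text, type) pairs, type "N" or "S".
    synonyms: notations registered under the same meaning code."""
    if sflg != 1:                                # (E-1) normal mode
        for text, kind in notations:             # (E-2) take out a notation
            if kind != "S":                      # (E-3) skip abbreviated ones
                return text                      # (E-7) output
        return None                              # no normal notation found
    # (E-5) shortest notation in the word dictionary
    best = min((text for text, _ in notations), key=len)
    for synonym in synonyms:                     # (E-6) try the synonyms
        if len(synonym) < len(best):
            best = synonym
    return best                                  # (E-7) output


# Placeholder strings standing in for kanji notations.
word = [("LU", "S"), ("labor union", "N")]
print(determine_notation(word, ["union"], 0))
print(determine_notation(word, ["union"], 1))
```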
<Procedure for searching shortest notation from word dictionary>
FIG. 12 is a flowchart detailing the process (E-5).
[0066]
In (F-1), the shortest Kanji notation length storage work memory kcl stores a large value exceeding the maximum possible number of notation characters. For example, the maximum value that can be stored in the memory kcl is set.
[0067]
In (F-2), the kanji notation of the word indicated by the word sequence number is extracted from the kanji notation area of the word dictionary 401.
[0068]
In (F-3), the number of characters in the extracted kanji notation is counted, and the result is set in the current kanji notation length storage work memory cnt.
[0069]
In (F-4), whether or not the notation is the shortest kanji notation is determined by comparing its length with the shortest kanji notation length storage work memory kcl. If it is the shortest, the shortest kanji notation length storage work memory kcl is updated in (F-5).
[0070]
In (F-6), the kanji notation is stored in the kanji notation temporary storage buffer OTBUF.
[0071]
In (F-7), a search is made as to whether there is a next kanji notation candidate in the kanji notation area. If there are no more kanji notation candidates, the shortest kanji notation is determined in (F-9). If one exists, the point of interest is moved to the next kanji notation in (F-8) and the process loops to (F-2).
[0072]
In (F-9), the kanji notation stored in the kanji notation temporary storage buffer OTBUF is determined as the shortest kanji notation.
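Steps (F-1) to (F-9) describe a running-minimum search over the notations of one word. A minimal sketch is given below with the work memory names kcl, cnt, and OTBUF kept from the text. The function name and the use of a strict comparison in (F-4) are assumptions, and the placeholder strings stand in for kanji notations.

```python
import sys


def shortest_in_word_dictionary(notations):
    """Return (shortest notation, its length) from a list of notations."""
    kcl = sys.maxsize     # (F-1) larger than any possible notation length
    otbuf = None          # temporary storage buffer OTBUF
    for notation in notations:   # (F-2), (F-7), (F-8) visit each notation
        cnt = len(notation)      # (F-3) count the characters
        if cnt < kcl:            # (F-4) shorter than the shortest so far?
            kcl = cnt            # (F-5)
            otbuf = notation     # (F-6)
    return otbuf, kcl            # (F-9)


print(shortest_in_word_dictionary(["labor union", "LU"]))
```

Returning kcl together with the notation lets a later synonym search continue from the same minimum.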
<Procedure for searching the shortest notation from the synonym dictionary>
FIG. 13 is a flowchart detailing the process (E-6).
[0073]
(G-1) searches the synonym dictionary 501 for synonyms having the same meaning code as the independent word of the determined phrase candidate.
[0074]
In (G-2), it is determined whether a synonym exists.
[0075]
In (G-3), the synonym kanji notation of the synonym dictionary is extracted.
[0076]
At (G-4), the number of synonym kanji characters is counted and set in the current kanji character length storage work memory cnt.
[0077]
In (G-5), whether or not the synonym is the shortest kanji notation is determined by comparing its length with the shortest kanji notation length storage work memory kcl. If it is the shortest, the shortest kanji notation length storage work memory kcl is updated in (G-6).
[0078]
In (G-7), the kanji notation is stored in the kanji notation temporary storage buffer OTBUF.
[0079]
In (G-8), it is searched whether there is the next synonym in the synonym dictionary. If there is no more, the shortest Kanji character notation is determined in (G-10). If it exists, loop to (G-3).
[0080]
In (G-10), the kanji notation stored in the kanji notation temporary storage buffer OTBUF is determined as the shortest kanji notation.
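The synonym search of (G-1) to (G-10) continues the same running minimum, starting from the notation already held in OTBUF. In this sketch the function name, the dictionary layout keyed by meaning code, and the placeholder strings are assumptions made for illustration.

```python
def shortest_with_synonyms(synonym_dict, meaning_code, otbuf, kcl):
    """Replace otbuf by a synonym only if the synonym is strictly shorter."""
    synonyms = synonym_dict.get(meaning_code, [])   # (G-1) look up synonyms
    if not synonyms:                                # (G-2) none registered
        return otbuf
    for synonym in synonyms:        # (G-3), (G-8) visit each synonym
        cnt = len(synonym)          # (G-4) count the characters
        if cnt < kcl:               # (G-5) shorter than the shortest so far?
            kcl = cnt               # (G-6)
            otbuf = synonym         # (G-7)
    return otbuf                    # (G-10)


# Hypothetical synonym dictionary keyed by meaning code.
synonym_dict = {98730: ["labor union", "union"], 1012: ["cmte"]}
print(shortest_with_synonyms(synonym_dict, 98730, "LU", 2))
print(shortest_with_synonyms(synonym_dict, 1012, "committee", 9))
```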
<Input string conversion example>
Here, a specific operation will be described with reference to FIG.
[0081]
Prior to the kana input, the abbreviated notation priority key is pressed to set the abbreviated notation priority instruction flag sflg to “1” to instruct conversion to the abbreviated notation.
[0082]
Next, in order to perform kana-kanji conversion, “Rodokumiai no Senmon Iinkai” is entered and the kana-kanji conversion key is pressed. As a result, the kana-kanji conversion shown in (A-4) of FIG. 7 is executed for the character string shown in FIG.
[0083]
First, at (B-1) in FIG. 8, the phrase candidate table shown in FIG. 6 is created. That is, the word dictionary 401 is searched by reading, and four phrase candidates are obtained: “Rodokumiai” (word sequence number 96568) with the appendix “no” attached, “Senmon” (word sequence number 13287), “Senmon” (word sequence number 13288), and “Iinkai” (word sequence number 1285). Among these, the phrases starting from the same position in the kana character string are the candidates with phrase numbers 1 and 2, so the value in the same phrase column of phrase number 1 is 2. The next phrase following them is the phrase candidate with phrase number 3. Phrase number 3, “Iinkai”, has no subsequent phrase, so its next phrase column is −1. The likelihood column is obtained for each phrase candidate by applying the above formula.
[0084]
Next, in (B-2), the kanji notation of each phrase is determined. For this purpose, a phrase candidate is first determined by the procedure shown in FIG. The first phrase candidate is “Rodokumiai” with phrase number 0, and since it is the only candidate, it is determined as the phrase candidate.
[0085]
Next, the kanji notation is determined for the determined phrase candidate by the procedure of FIG. If the abbreviated notation priority instruction key has been used to give priority to abbreviated notation, the notation with the fewest characters registered in the word dictionary 104 is selected by the procedure of FIG. 12 and determined as the kanji notation candidate for the character string. For the character string “Rodokumiai”, the shortest notation registered in the word dictionary 104 is “Labor Union”, so it is determined as the notation candidate by searching the word dictionary.
[0086]
Next, the synonym dictionary 501 is searched by the procedure of FIG. The meaning code registered in the word dictionary 401 is 98730, and searching the synonym dictionary 501 with it finds two different character strings registered, “labor union” and “union”. Since neither is shorter than the notation already obtained, no notation is selected from the synonym dictionary, and the character string “labor union” is finally determined as the converted candidate and stored in the output buffer OBUF in (E-7). At this time, the appendix “no” is output together with the independent word “labor union” converted into kanji.
[0087]
When the processing for one phrase is completed in this way, the next phrase candidate is processed according to (B-3) of FIG. 8.
[0088]
According to the contents of the phrase candidate table BCT in FIG. 6, the next phrase column of phrase number 0 indicates that the next phrase is “Senmon” of phrase number 1. In addition, the same phrase column of phrase number 1 is not “−1”, so the character string “Senmon” of phrase number 2 also exists as a phrase candidate starting from the same position.
[0089]
Therefore, the most likely phrase candidate is selected by the procedure of FIG. In this case, as shown in FIG. 6, the likelihood of phrase number 1 is 120 and the likelihood of phrase number 2 is 80, so phrase number 1 is determined as the phrase candidate. The kanji notation for the determined phrase candidate is determined by the procedure of FIG. 11 as for the previous phrase. Since no abbreviated notation or synonym is registered for this character string, its registered kanji notation is output to the output buffer OBUF.
[0090]
Next, the next phrase field of the determined phrase number 1 is traced, and phrase determination and kanji notation processing are performed for phrase number 3. Since phrase number 3 is the last phrase and there are no other candidates, phrase number 3 “Iinkai” is determined as the phrase candidate by the procedure of FIG.
[0091]
When abbreviated notation is instructed, the word dictionary 401 is searched for this phrase and the kanji notation “committee” is obtained. This is the shortest notation registered in the word dictionary.
[0092]
Next, the synonym dictionary 501 is searched with the meaning code 1012 according to the procedure shown in FIG. This is determined as a Chinese character notation and output to the output buffer OBUF.
[0093]
The character string “Labor Union Special Committee” by the shorthand notation shown in FIG. 2B obtained as described above is displayed by (A-8) in FIG.
[0094]
If the abbreviated notation priority instruction key is pressed again to set the abbreviated notation priority instruction flag sflg to “0” and the character string “Rodokumiai no Senmon Iinkai” in FIG. is then converted, the phrases determined are the same as when sflg is 1, but the procedure for determining the kanji notation of each phrase differs. In this case, processing proceeds from (E-1) to (E-2) in FIG. 11, so the first character string registered in the word dictionary 401 for the independent word that is not an abbreviated notation is output as the kanji notation. The character string output is therefore the normal notation “Labor Union Special Committee” as shown in FIG.
<Registration to the abbreviation word dictionary>
FIG. 14 is a flowchart detailing the process (A-5).
[0095]
The short notation reading to be registered or deleted is input in (H-1) and stored in the registered short notation reading buffer UYBUF.
[0096]
The abbreviated Kanji notation to be registered or deleted is input in (H-2) and stored in the registered abbreviated notation buffer UHBUF.
[0097]
The function to be registered or deleted is designated in (H-3), and the execution function is determined in (H-4). If “registration” is designated, the short notation registration is performed in (H-5), and if “delete” is designated, the short notation is deleted in (H-6). If "Cancel" is specified, nothing is executed and the process is immediately terminated.
[0098]
(H-5) is a process for additionally registering a shorthand notation. The reading is taken out from the registered short notation reading buffer UYBUF, and the abbreviated kanji notation is taken out from the registered short notation buffer UHBUF. The insertion point in the word dictionary is searched for with the specified reading, and the notation is added with its notation type set to the shorthand notation type.
[0099]
(H-6) is a process of deleting the shorthand notation from the word dictionary. A reading is taken out from the registered short notation reading buffer UYBUF, and abbreviated notation kanji is taken out from the registered short notation buffer UHBUF. The abbreviation in the word dictionary is searched with the specified reading, and the found abbreviation is deleted from the word dictionary.
[0100]
When this processing is performed, the reading is entered in the column 301 on the display screen of FIG. In the example of FIG. 3, when the execution key is pressed in this state, the character string “WS” is registered in the word dictionary with the notation type (S) as an abbreviated notation.
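The registration and deletion of (H-5) and (H-6) can be sketched as two small operations on a dictionary keyed by reading. The function names, the dictionary layout, the duplicate check, and the reading "wakusuteshon" are hypothetical. Only the notation “WS” and the (S) type marker come from the example above.

```python
SHORT = "S"  # shorthand notation type


def register_short(word_dict, reading, notation):
    """(H-5): add the notation under the reading with type (S)."""
    entries = word_dict.setdefault(reading, [])
    if (notation, SHORT) not in entries:   # assumption: avoid duplicates
        entries.append((notation, SHORT))


def delete_short(word_dict, reading, notation):
    """(H-6): delete the shorthand notation and report whether it existed."""
    entries = word_dict.get(reading, [])
    if (notation, SHORT) in entries:
        entries.remove((notation, SHORT))
        return True
    return False


# Hypothetical dictionary with one normal notation already present.
word_dict = {"wakusuteshon": [("workstation", "N")]}
register_short(word_dict, "wakusuteshon", "WS")
print(word_dict["wakusuteshon"])
```

Deletion only matches pairs of type (S), so a normal notation with the same text is left in place.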
[0101]
As described above, when abbreviated notation is registered in the dictionary and the abbreviated notation is desired, the length of the output character string can be shortened with a simple operation by preferentially converting to the abbreviated notation.
[0102]
Also, when synonyms are registered in a synonym dictionary and abbreviated notation is desired, the synonym dictionary is searched for the character string to obtain the shortest character string, so the input can be converted to the shortest character string, synonyms included, without changing the meaning.
[0103]
Also, because abbreviated notations are registered in the dictionary with their notation type specified as abbreviated, unnecessary abbreviated candidates need not be displayed as conversion candidates when a character string faithful to the input is desired as output, which prevents complication.
[0104]
In the above description, only kana-kanji conversion is shown as an example of a function that preferentially converts to abbreviated notation, but the present invention is not limited to kana-kanji conversion. Any function is acceptable as long as it does not depart from the gist of the present invention, in which the conversion destination character string with the smaller number of characters is given priority as the first candidate solution. Preferential character string conversion to abbreviated notation can be performed in the same way for functions other than kana-kanji conversion. For example, in a translation apparatus that translates an English sentence into a Japanese sentence and outputs it, the present invention can be similarly applied to generate a Japanese sentence with a shorter notation length as the translation result.
[0105]
The object of the present invention achieved by the function of the above apparatus or the method can also be achieved by a storage medium storing a program in the above-described apparatus implementing the present invention. That is, the storage medium is mounted on the above-mentioned apparatus, and the program itself read from the storage medium achieves the novel function of the present invention. For this purpose, the structural features of the program according to the present invention are as shown in FIG. The right side of the map in FIG. 16 corresponds to the reference numerals in FIGS. 4 and 5 and the flowchart.
[0106]
In this embodiment, the abbreviated kanji notations that are converted only in the abbreviated notation priority mode are stored in the same dictionary as the normal kanji notations, but the abbreviated kanji notations alone may instead be stored in a dictionary separate from that of the normal kanji notations.
[0107]
[Other Embodiments]
Note that the present invention may be applied to a system including a plurality of devices (for example, a host computer, an interface device, a reader, and a printer) or to an apparatus consisting of a single device (for example, a copier or a facsimile machine).
[0108]
The object of the present invention can also be achieved by supplying a storage medium storing software program code that implements the functions of the above-described embodiments to a system or apparatus, and having the computer (or CPU or MPU) of the system or apparatus read and execute the program code stored in the storage medium.
[0109]
In this case, the program code itself read from the storage medium realizes the novel function of the present invention, and the storage medium storing the program code constitutes the present invention.
[0110]
As a storage medium for supplying the program code, for example, a floppy disk, a hard disk, an optical disk, a magneto-optical disk, a CD-ROM, a CD-R, a magnetic tape, a nonvolatile memory card, a ROM, or the like can be used.
[0111]
Further, the functions of the above-described embodiments are realized not only when the computer executes the program code it has read. Also included is the case where an OS (operating system) running on the computer performs part or all of the actual processing based on the instructions of the program code, and the functions of the above-described embodiments are realized by that processing.
[0112]
Further, after the program code read from the storage medium is written into a memory provided in a function expansion board inserted into the computer or a function expansion unit connected to the computer, the function expansion is performed based on the instruction of the program code. This includes a case where the CPU or the like provided in the board or the function expansion unit performs part or all of the actual processing, and the functions of the above-described embodiments are realized by the processing.
[0113]
[Effects of the Invention]
As described above, the character string conversion apparatus and method according to the present invention can automatically select, from the group of converted character string candidates, a character string with a small number of characters as the converted character string. In addition, even for character strings with different kana readings, a short kanji notation can be obtained without impairing the input intention. This makes it possible to realize a highly operable character string conversion apparatus that obtains conversion results with the number of characters suppressed.
[0114]
[Brief description of the drawings]
FIG. 1 is a block diagram showing an overall configuration of a Japanese language processing apparatus according to an embodiment.
FIG. 2 is a diagram illustrating a conversion example of short notation priority conversion.
FIG. 3 is a diagram showing an example of user registration in abbreviated notation.
FIG. 4 is a diagram illustrating an example of a configuration of a word dictionary according to the present embodiment.
FIG. 5 is a diagram illustrating an example of a configuration of a synonym dictionary according to the present embodiment.
FIG. 6 is a diagram illustrating an example of the configuration of a phrase candidate table according to the embodiment.
FIG. 7 is a flowchart illustrating an example of a processing procedure of overall operation of the present exemplary embodiment.
FIG. 8 is a flowchart illustrating an example of a kana-kanji conversion processing procedure according to the present exemplary embodiment.
FIG. 9 is a flowchart illustrating an example of a processing procedure for determining a kana-kanji conversion first candidate according to the embodiment.
FIG. 10 is a flowchart illustrating an example of a processing procedure for determining phrase candidates according to the embodiment.
FIG. 11 is a flowchart illustrating an example of a procedure for determining kanji notation according to the embodiment.
FIG. 12 is a flowchart illustrating an example of a processing procedure of minimum kanji notation search according to the embodiment.
FIG. 13 is a flowchart illustrating an example of a synonym search processing procedure according to the embodiment.
FIG. 14 is a flowchart illustrating an example of a processing procedure of shorthand notation registration according to the embodiment.
FIG. 15 is a diagram illustrating a configuration example of contents of a RAM.
FIG. 16 is a diagram of a memory map in a storage medium storing a program for realizing the present invention.

Claims

A character string conversion apparatus comprising:
input means for inputting a conversion source character string;
dictionary means for associating at least one word with the conversion source character string and storing at least one notation character string in association with each word;
determining means for determining, from the dictionary means, one word corresponding to the input conversion source character string;
search means for searching the dictionary means for the shortest notation character string among the notation character strings corresponding to the word determined by the determining means; and
output means for outputting the shortest notation character string found by the search means as the notation character string converted from the conversion source character string.

The character string conversion apparatus according to claim 1, wherein the dictionary means includes a word dictionary in which a word and a notation character string of the word are registered in association with each other, and a synonym dictionary in which a word and a notation character string of a synonym of the word are registered in association with each other, and the search means searches for the shortest notation character string among the notation character string of the word determined by the determining means and the notation character strings of the synonyms of the word.

The character string conversion apparatus according to claim 1, further comprising: registration means for registering, in the dictionary means, an abbreviated notation character string in association with a word; and instruction means for instructing that the abbreviated notation character string be output preferentially as a search result when the search means searches the dictionary means.

The input means inputs a kana character string as the conversion source character string, and the dictionary means associates a kana character string with at least one word that reads the kana character string, and at least one written character for each word The character string conversion apparatus according to claim 1, further comprising a word dictionary that stores the strings in association with each other.

The character string conversion apparatus according to claim 4, wherein the word dictionary includes a likelihood for each word, the apparatus further comprising selection means for selecting an optimal phrase candidate from the phrase candidates included in the character string input by the input means, based on the likelihood of the words constituting each phrase candidate.

A kana character string, at least one word corresponding to the kana character string, at least one notation character string corresponding to each word, and whether the notation character string is a normal notation corresponding to reading or a shortened notation not corresponding to reading A first dictionary means for storing the notation code indicating the meaning, the meaning code indicating the meaning of the word, and the likelihood of the word in association with each other;
Second dictionary means for storing the meaning code and a character string of at least one synonym corresponding to the meaning code in association with each other;
An instruction means for instructing an abbreviated priority mode that prioritizes abbreviated notation;
An input means for inputting a kana character string;
The kana character string input by the input means is divided into phrases based on the words stored in the first dictionary means to create one or more phrase candidates, and the words constituting each phrase candidate A phrase candidate determining means for determining an optimal phrase candidate based on the likelihood of
For each phrase candidate determined by the phrase candidate determining means, when the shortening priority mode is instructed, the notation character string of the word constituting the phrase candidate is searched from the first dictionary means, Further, the notation character string corresponding to the meaning code of the word is searched from the second dictionary means, the shortest notation character string is output among the searched notation character strings, and the shortening priority mode is instructed. If not, search means for searching for a notation character string of words constituting the phrase candidate from the first dictionary means, and outputting a notation character string in which the notation code indicates normal notation,
An output means for displaying and outputting the written character string output by the search means;

A character string conversion method in a character processing device comprising input means for inputting a conversion source character string, dictionary storage means for associating at least one word with the conversion source character string and storing at least one notation character string in association with each word, output means for outputting a notation character string, processing means for executing various processes based on a program, and program storage means for storing the program, the device further comprising determination means, search means, and character string output means realized by cooperation of the processing means and the program storage means, the method comprising:
A determination step in which the determination means determines, from among the words stored in the dictionary storage means, a single word corresponding to the conversion source character string input from the input means;
A search step in which the search means searches the dictionary storage means for the shortest notation character string among the notation character strings corresponding to the word determined in the determination step; and
An output step in which the character string output means outputs, to the output means, the shortest notation character string found in the dictionary storage means in the search step, as the notation character string obtained by converting the conversion source character string.
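The three steps of this claim (determination, search, output) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the dictionary entries, the names `DICTIONARY`, `determine_word`, `search_shortest`, and `convert`, and the use of a likelihood value to pick the single word, which this claim does not specify.

```python
# Hypothetical toy dictionary: reading -> list of (word id, likelihood, notations).
# The entries are illustrative assumptions, not data from the patent.
DICTIONARY = {
    "こくさいれんごう": [
        ("united_nations", 0.9, ["国際連合", "国連"]),
    ],
    "こうえん": [
        ("park", 0.6, ["公園"]),
        ("lecture", 0.4, ["講演"]),
    ],
}


def determine_word(reading):
    """Determination step: pick one word for the reading.

    Assumption: the word with the highest likelihood is chosen.
    """
    return max(DICTIONARY[reading], key=lambda entry: entry[1])


def search_shortest(entry):
    """Search step: return the shortest notation stored for the word."""
    return min(entry[2], key=len)


def convert(reading):
    """Output step: the shortest notation is the conversion result."""
    return search_shortest(determine_word(reading))


if __name__ == "__main__":
    print(convert("こくさいれんごう"))
    print(convert("こうえん"))
```

In this sketch the reading こくさいれんごう resolves to a single word with two stored notations, and the shorter one is returned in place of the full form.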

8. The character string conversion method according to claim 7, wherein the dictionary storage means includes a word dictionary in which a word is registered in association with a notation character string of the word, and a synonym dictionary in which a word is registered in association with a notation character string of a synonym of the word, and
wherein, in the search step, the shortest notation character string is searched for among the notation character string of the word determined in the determination step and the notation character string of the synonym of the word.

The character processing device further includes registration means and instruction means realized by cooperation of the processing means and the program storage means,
The registration means performs a registration step of registering, in the dictionary storage means, an abbreviated character string in association with a word,
The instruction means performs an instruction step of instructing that the abbreviated character string be output preferentially as the search result when the dictionary storage means is searched in the search step. The character string conversion method described in.

The character string conversion method according to claim 7, wherein the input means inputs a kana character string as the conversion source character string, and the dictionary storage means includes a word dictionary that stores the kana character string in association with at least one word having the kana character string as its reading, and stores at least one notation character string in association with each word.

The character string conversion method according to claim 10, wherein the character processing device further includes selection means realized by cooperation of the processing means and the program storage means,
the word dictionary includes a likelihood for each word, and the method further comprises a selection step in which the selection means selects, from among the phrase candidates included in the character string input by the input means, an optimal phrase candidate based on the likelihood of the words constituting each phrase candidate.
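The selection step can be sketched as follows. This is only an illustration under stated assumptions: the claim does not say how word likelihoods are combined, so the sketch scores each candidate by the product of its word likelihoods, and the table `LIKELIHOOD` and the functions `segmentations` and `select_best` are hypothetical names with made-up values.

```python
from math import prod

# Hypothetical word dictionary: reading -> likelihood (values are made up).
LIKELIHOOD = {
    "きょう": 0.6,
    "は": 0.8,
    "きょうは": 0.1,
    "はれ": 0.7,
    "れ": 0.05,
}


def segmentations(kana):
    """Yield every split of `kana` into readings found in LIKELIHOOD."""
    if not kana:
        yield []
        return
    for end in range(1, len(kana) + 1):
        head = kana[:end]
        if head in LIKELIHOOD:
            for rest in segmentations(kana[end:]):
                yield [head] + rest


def select_best(kana):
    """Selection step: choose the candidate with the highest score.

    Assumption: the score is the product of the word likelihoods.
    Returns None when no segmentation exists.
    """
    candidates = list(segmentations(kana))
    if not candidates:
        return None
    return max(candidates, key=lambda words: prod(LIKELIHOOD[w] for w in words))


if __name__ == "__main__":
    print(select_best("きょうはれ"))
```

With these values the input きょうはれ has three possible segmentations, and the one made of きょう and はれ scores highest (0.6 × 0.7 = 0.42).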

A kana-kanji conversion method in a character processing apparatus comprising first dictionary storage means for storing, in association with one another, a kana character string, at least one word corresponding to the kana character string, at least one notation character string corresponding to each word, a notation code indicating whether the notation character string is a normal notation corresponding to the reading or a shortened notation not corresponding to the reading, a meaning code indicating the meaning of the word, and a likelihood of the word, second dictionary storage means for storing the meaning code in association with the notation character string of at least one synonym corresponding to the meaning code, input means for inputting a kana character string, output means for displaying and outputting a notation character string, processing means for executing various processes based on a program, and program storage means storing the program, the apparatus having instruction means, phrase candidate determination means, search means, and character string output means realized by cooperation of the processing means and the program storage means, the method comprising:
An instruction step in which the instruction means instructs a shortening priority mode that gives priority to shortened notation;
A phrase candidate determination step in which the phrase candidate determination means divides the kana character string input by the input means into phrases based on the likelihood stored in the first dictionary storage means to create one or more phrase candidates, and determines an optimal phrase candidate based on the likelihood of the words constituting each phrase candidate;
A search step in which, for each phrase candidate determined in the phrase candidate determination step, the search means, when the shortening priority mode is instructed, searches the first dictionary storage means for the notation character strings of the words constituting the phrase candidate, further searches the second dictionary storage means for the notation character strings corresponding to the meaning code of each word, and outputs the shortest notation character string among those found, and, when the shortening priority mode is not instructed, searches the first dictionary storage means for the notation character strings of the words constituting the phrase candidate and outputs a notation character string whose notation code indicates normal notation; and
An output step in which the character string output means displays and outputs, on the output means, the notation character string output from the first or second dictionary storage means in the search step.
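The mode-dependent search step can be sketched as below. The sketch treats each phrase candidate as a single word for brevity, and every name and entry in it is hypothetical: `FIRST_DICT`, `SECOND_DICT`, `Word`, `Notation`, the meaning codes `M001` and `M002`, and the fallback used when a word has no normal notation.

```python
from dataclasses import dataclass

NORMAL, SHORTENED = "normal", "shortened"  # notation codes (names assumed)


@dataclass
class Notation:
    text: str
    code: str  # NORMAL or SHORTENED


@dataclass
class Word:
    meaning_code: str
    likelihood: float
    notations: list


# First dictionary: kana reading -> words (hypothetical entries).
FIRST_DICT = {
    "こくさいれんごう": [
        Word("M001", 0.9, [Notation("国際連合", NORMAL), Notation("国連", SHORTENED)]),
    ],
    "でんしけいさんき": [
        Word("M002", 0.8, [Notation("電子計算機", NORMAL)]),
    ],
}

# Second dictionary: meaning code -> synonym notations (hypothetical entries).
SECOND_DICT = {
    "M002": ["計算機", "電算機"],
}


def search(word, shorten_priority):
    """Search step for one word, depending on the shortening priority mode."""
    if shorten_priority:
        # Collect notations from the first dictionary, then synonyms found
        # through the meaning code, and return the shortest of them all.
        candidates = [n.text for n in word.notations]
        candidates += SECOND_DICT.get(word.meaning_code, [])
        return min(candidates, key=len)
    # Mode not instructed: return a notation whose code is NORMAL.
    for n in word.notations:
        if n.code == NORMAL:
            return n.text
    return word.notations[0].text  # assumed fallback, not stated in the claim


def convert(reading, shorten_priority=False):
    """Pick the most likely word for the reading, then run the search step."""
    word = max(FIRST_DICT[reading], key=lambda w: w.likelihood)
    return search(word, shorten_priority)


if __name__ == "__main__":
    print(convert("こくさいれんごう"))
    print(convert("こくさいれんごう", shorten_priority=True))
    print(convert("でんしけいさんき", shorten_priority=True))
```

When two candidates tie on length, `min` returns the first one encountered, so in this sketch first-dictionary notations take precedence over synonyms of equal length; the claim itself does not define a tie-breaking rule.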