JP3700193B2

JP3700193B2 - Kana-kanji conversion device and kana-kanji conversion method

Info

Publication number: JP3700193B2
Application number: JP32355594A
Authority: JP
Inventors: 庸雄河西; 隆志山村
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 1994-11-30
Filing date: 1994-11-30
Publication date: 2005-09-28
Anticipated expiration: 2020-09-28
Also published as: JPH08161324A

Description

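[0001]
[Industrial Field of Application]
The present invention relates to a kana-kanji conversion device and a kana-kanji conversion method. More particularly, it relates to a kana-kanji conversion device and method that use information on dependency relations between words to select kanji candidates for the words making up a phrase (bunsetsu).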
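[0002]
[Prior Art]
Various kana-kanji conversion devices have been proposed, as Japanese text input devices or Japanese text editing devices, that convert a kana string entered from a keyboard or the like into the desired mixed kana-kanji text. Such a device should not require the user to specify word or phrase boundaries one by one, and the converted string should be written the way the user intended. Japanese has many words that share the same reading (homophones) or the same kun reading. Producing the desired mixed kana-kanji text without errors therefore probably requires, in the end, analyzing the meaning of the sentence. Semantic analysis, however, needs a knowledge base of at least tens of thousands of organically interrelated words, which is extremely difficult to build.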
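[0003]
Conventional kana-kanji conversion devices therefore try to give the user the desired result without semantic analysis. They do this by refining the phrase segmentation process and the learning used to choose among homophones.

Segmentation methods include the following:
・the two-phrase longest-match method, which takes two phrases as the basic unit and makes the first candidate the pair in which the longest possible phrase is obtained;
・the minimum-cost method, which assigns costs to the words that could make up a phrase and to combinations of those words, and makes the first candidate the segmentation whose score satisfies a predetermined condition.

Known learning processes include:
・homophone learning, in which the word the user last selected among homophones becomes the top candidate the next time;
・phrase-length learning, in which the length the user last specified for a phrase containing a given word takes top priority.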
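[0004]
More recently, devices have been proposed that focus on specific relations between words, for example 熱い ("hot") and お茶 ("tea") in 熱いお茶, or 暑い ("hot") and 夏 ("summer") in 暑い夏. These devices prepare a dictionary that stores such relations. When one word (for example お茶) has been determined, the related word (for example 熱い among the candidates for あつい) is selected as the first candidate. Examples are the "kana-kanji conversion device" of Japanese Patent Laid-Open No. 3-105664 and the "kana-kanji conversion device" of Japanese Patent Laid-Open No. 4-277861. Such specific relations between words are called "dependency" (kakari-uke) or "co-occurrence".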
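[0005]
[Problems to Be Solved by the Invention]
However, the syntactic rules of real languages are extremely complex, and simply consulting dependency relations does not produce the desired Japanese. The procedure for testing dependencies has therefore been a major problem. For example, no clear answer has been given to how dependencies should be tested when the input string contains brackets or punctuation.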
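[0006]
Several further questions arise when several phrases are converted at once, and none of them has been settled:
・where dependency testing should start;
・how to prioritize when several dependency relations are found;
・whether, when several are found, there are cases in which they may coexist and cases in which they may not.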
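[0007]
The kana-kanji conversion device and kana-kanji conversion method of the present invention resolve these problems and use dependency relations to obtain the desired mixed kana-kanji text. They adopt the following configurations.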
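[0008]
[Means and Operation for Solving the Problems]
The kana-kanji conversion device of claim 1 is
a kana-kanji conversion device that receives a kana string and generates candidate mixed kana-kanji strings by referring to a grammar dictionary, comprising:
segmentation means for segmenting the input kana string into phrases by referring to the grammar dictionary;
a dependency information dictionary storing dependency information between predetermined phrases;
dependency testing means for testing, starting from one segmented phrase and referring to the dependency information dictionary, for the presence of dependencies with other phrases in order of increasing inter-phrase distance;
delimiter search means for searching for delimiters present in the input kana string; and
dependency-test termination means for ending the dependency test that starts from said one phrase when a delimiter is found during the dependency test, taking that position as the end position of the test;
wherein the dependency testing means further comprises:
next-candidate testing means for taking the starting phrase as the modifier and testing for the head of the dependency successively over the next candidates of the target phrase; and
next-modifier testing means for continuing the head test, after the next-candidate testing means has finished, with the next candidate of the starting phrase as the modifier.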
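[0009]
In this device, the segmentation means first segments the input kana string into phrases by referring to the grammar dictionary. Starting from one segmented phrase, the dependency testing means then refers to the dependency information dictionary, which stores dependency information between predetermined phrases. It tests for dependencies with other phrases in order of increasing distance from the starting phrase.

Meanwhile, the delimiter search means searches the input kana string for delimiters. If a delimiter is found during a dependency test that starts from a given phrase, the dependency-test termination means takes that position as the end position and ends the test for that phrase.

The device also tests more than the phrases chosen as first candidates: it tests their next candidates too. It gives priority to the next candidates of the head and only then tests the next candidates of the modifier.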
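[0010]
The kana-kanji conversion device of claim 2 is
a kana-kanji conversion device that receives a kana string and generates candidate mixed kana-kanji strings by referring to a grammar dictionary, comprising:
segmentation means for segmenting the input kana string into phrases by referring to the grammar dictionary;
range search means for searching the input kana string for ranges delimited by paired symbols indicating a start and an end;
a dependency information dictionary storing dependency information between predetermined phrases;
dependency testing means for testing, starting from one segmented phrase and referring to the dependency information dictionary, for the presence of dependencies with other phrases in order of increasing inter-phrase distance; and
test-range removal means for removing the ranges found by the range search means from the range over which the dependency testing means performs the dependency test;
wherein the dependency testing means further comprises:
next-candidate testing means for taking the starting phrase as the modifier and testing for the head of the dependency successively over the next candidates of the target phrase; and
next-modifier testing means for continuing the head test, after the next-candidate testing means has finished, with the next candidate of the starting phrase as the modifier.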
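[0011]
In this device, the segmentation means first segments the input kana string into phrases by referring to the grammar dictionary. Starting from one segmented phrase, the dependency testing means then refers to the dependency information dictionary, which stores dependency information between predetermined phrases. It tests for dependencies with other phrases in order of increasing distance from the starting phrase.

Meanwhile, the range search means searches the input kana string for ranges delimited by paired symbols indicating a start and an end. The test-range removal means removes these ranges from the range over which the dependency testing means operates. The dependency test therefore skips over ranges enclosed by paired start and end symbols.

The device also tests more than the phrases chosen as first candidates: it tests their next candidates too. It gives priority to the next candidates of the head and only then tests the next candidates of the modifier.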
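[0014]
The kana-kanji conversion device of claim 4 is the kana-kanji conversion device of claim 1 or claim 2, comprising
dependency testing means that, when segmentation yields three or more phrases, tests for dependencies with other phrases, starting from one segmented phrase and referring to the dependency information dictionary, in order starting from the side nearer the end of the sentence among the three or more phrases.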
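[0015]
This device fixes the priority of dependency testing when segmentation yields three or more phrases. Starting from one segmented phrase and referring to the dependency information dictionary, the dependency testing means tests for dependencies with other phrases. It works through the three or more phrases starting from the side nearer the end of the sentence.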
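[0016]
The kana-kanji conversion device of claim 5 provides the dependency testing means of claim 4 with:
first testing means that first performs the tests starting from the modifier nearest the end of the sentence; and
second testing means that, when the first testing means finds a dependency, looks at the phrases preceding the starting phrase and accepts as established only those dependencies that hold together with the dependency found by the first testing means.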
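[0017]
This device therefore tests the dependencies near the end of the sentence first. Then, for the phrases preceding that phrase, it accepts as established only those dependencies that hold together with dependencies already established.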
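[0018]
The kana-kanji conversion method of claim 6 is
a kana-kanji conversion method that receives a kana string and generates candidate mixed kana-kanji strings by referring to a grammar dictionary, in which:
the input kana string is segmented into phrases by referring to the grammar dictionary;
starting from one segmented phrase and referring to a dependency information dictionary storing dependency information between predetermined phrases, the presence of dependencies with other phrases is tested in order of increasing inter-phrase distance, while
delimiters present in the input kana string are searched for; and
when a delimiter is found during the dependency test, its position is taken as the end position of the dependency test and the test starting from said one phrase is ended;
and further, in the dependency test,
the starting phrase is taken as the modifier, and the head of the dependency is tested successively over the next candidates of the target phrase, and
after the test over these next candidates is finished, the head test is continued with the next candidate of the starting phrase as the modifier.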
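[0019]
The kana-kanji conversion method of claim 7 is
a kana-kanji conversion method that receives a kana string and generates candidate mixed kana-kanji strings by referring to a grammar dictionary, in which:
ranges in the input kana string delimited by paired symbols indicating a start and an end are searched for, while
the input kana string is segmented into phrases by referring to the grammar dictionary; and
starting from one segmented phrase, the presence of dependencies with other phrases is tested in order of increasing inter-phrase distance, excluding the ranges found from the range of the dependency test and referring to a dependency information dictionary storing dependency information between predetermined phrases;
and further, in the dependency test,
the starting phrase is taken as the modifier, and the head of the dependency is tested successively over the next candidates of the target phrase, and
after the test over these next candidates is finished, the head test is continued with the next candidate of the starting phrase as the modifier.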
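[0020]
The kana-kanji conversion method of claim 8 is the kana-kanji conversion method of claim 6 or claim 7, that is, a method that receives a kana string and generates candidate mixed kana-kanji strings by referring to a grammar dictionary, in which:
the input kana string is segmented into phrases by referring to the grammar dictionary; and
when segmentation yields three or more phrases, the presence of dependencies with other phrases is tested, starting from one segmented phrase and referring to a dependency information dictionary storing dependency information between predetermined phrases, in order starting from the side nearer the end of the sentence among the three or more phrases.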
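[0021]
[Embodiments]
To further clarify the configuration and operation of the invention described above, preferred embodiments are described below. Fig. 1 is a block diagram showing the control logic of the kana-kanji conversion. Fig. 2 is a block diagram showing the hardware on which this control logic actually runs. As shown in Fig. 2, the device is built around a well-known CPU 21, to which the following units are connected by a bus 31. Each of these units is described briefly below.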
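[0022]
ROM 22: mask ROM storing the kana-kanji conversion program and the like;
RAM 23: readable and writable memory constituting main memory;
keyboard interface 25: interface handling key input from the keyboard 24;
CRTC 27: CRT controller controlling signal output to the color CRT 26;
printer interface 29: interface controlling data output to the printer 28;
hard disk controller (HDC) 30: interface controlling the hard disk 32.
The hard disk 32 stores the following:
・various programs that are loaded into the RAM 23 and executed;
・a kana-kanji conversion program provided in the form of a device driver;
・the various conversion dictionaries that the kana-kanji conversion program refers to.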
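[0023]
This hardware is used to enter, convert, edit, display, and print text. A character string entered from the keyboard 24 is processed by the CPU 21 and stored in a predetermined area of the RAM 23. It is then displayed on the screen of the CRT 26 via the CRTC 27.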
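[0024]
The functions executed by this hardware are described next with reference to Fig. 1, which outlines the configuration and operation of each unit. All of this processing is carried out by the central processing unit (CPU 21) on the basis of data entered from the keyboard 24.

For kana-kanji conversion, operating the keyboard 24 triggers a predetermined interrupt routine. That routine starts a device driver which converts the entered key image into the corresponding kana string and then converts the kana string into a mixed kana-kanji string.

On a computer capable of parallel processing, kana-kanji conversion may instead be performed by a single application (an input method) that hands the conversion results to whichever applications need them. In that case, the input method takes over all input from the keyboard 24.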
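[0025]
Key images from the keyboard 24 are received by the character input unit 40 and converted into the corresponding kana string. For romaji input, the conversion to kana uses a predetermined conversion table. Each time a kana character is obtained, the character input unit 40 sends it to the conversion control unit 42.

The conversion control unit 42 plays the central role in kana-kanji conversion. It controls the various conversion steps described below and sends the result to the converted character string output unit 44. In practice, the converted character string output unit 44 sends signals to the CRTC 27 to display the converted string on the CRT 26.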
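[0026]
The conversion control unit 42 passes each kana character it receives to the character string input unit 50, which stores the kana string in the character storage unit 52. From this string, the independent-word candidate generation unit 54 and the attached-word candidate generation unit 64 generate word candidates:
・the independent-word candidate generation unit 54 extracts independent-word candidates using the independent-word dictionary 58 stored in advance on the hard disk 32, under the control of the independent-word analysis position management unit 56;
・the attached-word candidate generation unit 64 extracts attached-word candidates using the attached-word dictionary 68, under the control of the attached-word analysis position management unit 66.
The extraction of independent-word and attached-word candidates while the analysis position moves is described later.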
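[0027]
Through learning, the independent-word dictionary 58 changes the priority order of homophones, affixes, and the like. Five learning units perform this learning:
・dependency learning unit 70: when a dependency condition holds but the user selects a word other than the one given by the dependency, it learns the relation so that, for the same word combination, the user's choice takes priority;
・independent-word learning unit 72: within a group of homophonous independent words, it makes the most recently selected word the top-priority candidate;
・auxiliary-word learning unit 74: it learns which written form to use for an auxiliary word, for example ください or 下さい;
・affix learning unit 76: it learns the written form of prefixes and suffixes, for example 御 or ご;
・character conversion learning unit 78: when the user commits an input string unchanged as hiragana or katakana, it learns that string and offers the committed hiragana or katakana as a candidate in later conversions.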
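The homophone learning attributed to the independent-word learning unit 72 amounts to a move-to-front update of the candidate order for one reading. The following is a minimal sketch under that interpretation. The function name `learn_selection` and the plain list standing in for a dictionary entry are assumptions; the patent does not describe how the dictionary stores its order.

```python
def learn_selection(candidates: list[str], selected: str) -> list[str]:
    """Return a new candidate order with `selected` moved to the front.

    Hypothetical sketch of homophone learning: the list stands in for the
    candidates stored under one reading in the independent-word dictionary.
    """
    if selected not in candidates:
        raise ValueError(f"{selected!r} is not a candidate")
    # The chosen word becomes first; the others keep their relative order.
    return [selected] + [c for c in candidates if c != selected]


order = ["暑い", "熱い", "厚い"]  # candidates for the reading あつい
order = learn_selection(order, "熱い")
print(order)  # ['熱い', '暑い', '厚い']
```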
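[0028]
The word data creation unit 80 takes the word candidates created by the units 54 and 64 and creates data for each candidate. For each word it combines two results:
・the connection check made by the connection check unit 82, using the connection check table 84, on connections between independent and attached words, between independent words, and between phrases of the form "independent word + attached word";
・the overall cost computed by the cost calculation unit 86.
It outputs these as per-word data. The word data is held temporarily in the word data storage unit 100. After adjustment by the dependency candidate adjustment unit 90, it is used for phrase segmentation.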
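[0029]
The dependency candidate adjustment unit 90 tests dependencies. It receives the word candidates from the units 54 and 64 via the word data creation unit 80, the word data storage unit 100, and the phrase segmentation unit 102. The tests refer to the dependency dictionary 98, which is prepared in advance on the hard disk 32.

To keep the dictionary small, only one entry is stored for a dependency even when the reverse relation also exists. The transposition information adjustment unit 99 therefore extends the information in the dependency dictionary 98, using grammatical analysis, when dependency candidates are adjusted. For example, only the dependency of the modifier 花が ("the flower" as subject) on the head 美しい ("beautiful") is stored in the dependency dictionary 98. The dependency of the modifier 美しい on the head 花 is tested from the same entry.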
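[0030]
The range of dependency testing is managed by the dependency range management unit 96. Dependency relations are also subject to several admissibility conditions, which are judged by the causative/passive analysis unit 92, the particle admissibility analysis unit 94, and so on.

The first candidate for phrase segmentation is chosen from the phrase candidates adjusted by this dependency testing and stored in the phrase data storage unit 106. The conversion character string output unit 108 outputs the stored candidate to the conversion control unit 42. The conversion control unit 42 displays this string as the candidate string. Because an unwanted string may become the candidate, it also handles user instructions such as displaying or selecting the next candidate.

These instructions and selections are passed to the phrase data storage unit 106 and to the learning units 70 to 78 described above. They are used to commit parts of phrases and to rewrite priorities through learning. Although not shown, when the user commits the string, all data held temporarily in each unit is erased in preparation for the next conversion.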
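[0031]
The path from kana input to the output of the converted string has now been outlined. The details of each process follow: first general phrase segmentation, then the dependency processing that is the essence of the invention.

Fig. 3 is a flowchart outlining phrase segmentation by the minimum-cost method. Processing begins with initialization (step S200), such as clearing temporarily stored data and setting the analysis position to the first character. The analysis position is then determined (step S210). The analysis position advances one character at a time from the beginning of the kana string entered so far.

For example, if the kana string くるまではこをはこぶ (the example sentence of Fig. 4, 車で箱を運ぶ, "carry the box by car") has been entered, the first analysis position is its first character く. At this position, the independent-word dictionary 58 and attached-word dictionary 68 stored on the hard disk 32 are searched (step S220).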
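[0032]
After the dictionary search, each word found is checked for whether it can connect to the preceding words (step S230). If only words that cannot connect have been found, the dictionary is searched further. In the example of Fig. 4, the は in こをはこぶ is found in the attached-word dictionary 68 as the binding particle は. This particle cannot follow the case particle を immediately before it, so the connection check by the word data creation unit 80 and connection check unit 82 treats it as invalid. In Fig. 4, words judged invalid by the connection check are marked "×".

Allowed connections between words are stored in advance in the connection check table 84. This table gives the possibility of connection between parts of speech; in the embodiment it is a matrix of about 400 × 400 entries. When the dictionary search and connection check at one analysis position are finished, the analysis position is advanced and the process is repeated.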
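[0033]
For words that can connect, costs are then calculated to find the minimum total cost of each word (step S240). This calculation is done by the cost calculation unit 86. In the example of Fig. 4(A), くるま can be split as く+る+ま, くる+ま, or くるま. When words are assigned to these pieces, an independent word costs 2 and an attached word costs 0. For 苦 (independent) + 流 (independent), the total cost of 流 is therefore 4.

The cost of 間 is 4 because the minimum total cost is taken: the cost 4 of 来る + 間 is adopted rather than the cost 6 of 苦 + 流 + 間. Because で and は are attached words, their own cost is that of the cheapest preceding word, 車 = 2. In Fig. 4, the cost of each word is shown at its lower right.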
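[0034]
After this cost calculation, the cost of each word is checked and words with inappropriate costs are invalidated (step S250). An inappropriate cost is one where a word combination costs more than other combinations. For example, the combination 区 + 留 costs more than the other words available up to the same position, 来る or 繰る. It is therefore judged inappropriate and excluded from the phrase candidates.

In Fig. 4, words not adopted under this minimum-cost principle are marked "●" at their upper right. A "○" in Fig. 4 marks a word that survived both the connection check and the cost check and may therefore form a phrase candidate.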
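[0035]
Next, the word candidates that have been given costs are linked (step S260): words with valid connections are related by setting pointers between them. In the example of Fig. 4, 来る, 繰る, 車, まで, で, は, では, and others were valid and had their minimum total costs calculated. 来る and 繰る are therefore linked to まで, 車 is linked to で and では, and so on.

The connection check, cost calculation, and linking are repeated until every word at one analysis position has been looked up. When the dictionary search at that position is complete, the analysis position advances by one, new candidate words are considered, and the connection check, cost calculation, and related steps are repeated.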
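[0036]
When the analysis position reaches the last kana entered and all words have been analyzed (step S265), the minimum-cost path is searched for (step S270). This is done by the phrase segmentation unit 102, which looks among the valid word combinations for the one with the smallest sum of word costs. For くるまではこをはこぶ, the segmentation 車で + 箱を + 運ぶ has a total cost of 18 and is selected as the minimum. It is shown as path J (solid line) in Fig. 4(B).

Other segmentation candidates that are not minimal are also found. One example is 車では + 子を + 運ぶ (cost 20), shown as path B (broken line) in Fig. 4(B). After the segmentation candidates have been created (step S280), candidates within each phrase are created (step S290). Within one segmentation, for example, 箱を and 函を are prepared as candidates for はこを. These phrase and word candidates are used when the user asks to change how the phrases are divided or to display the next candidate.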
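Steps S220 to S270 can be viewed together as a lattice search in which each word keeps a link to its cheapest valid predecessor. The sketch below illustrates this under stated assumptions. The toy dictionary, the part-of-speech tags, and the small `CONNECTABLE` set (a hypothetical stand-in for the roughly 400 × 400 connection check table 84) are invented for illustration. Only the per-word costs, 2 for an independent word and 0 for an attached word, come from the embodiment. The cost check of step S250 is folded into keeping only the cheapest predecessor, and alternative segmentations and in-phrase candidates (steps S280, S290) are not produced.

```python
import math
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy dictionary: reading -> [(surface, part of speech)].
DICTIONARY = {
    "く": [("苦", "noun"), ("区", "noun")],
    "る": [("流", "noun"), ("留", "noun")],
    "くる": [("来る", "verb"), ("繰る", "verb")],
    "ま": [("間", "noun")],
    "くるま": [("車", "noun")],
    "で": [("で", "particle")],
}
INDEPENDENT_COST, ATTACHED_COST = 2, 0  # per-word costs from the embodiment

# Hypothetical stand-in for the connection check table 84: allowed
# (previous POS, next POS) pairs; "BOS" marks the start of the input.
CONNECTABLE = {
    ("BOS", "noun"), ("BOS", "verb"), ("noun", "noun"), ("verb", "noun"),
    ("noun", "particle"), ("particle", "noun"), ("particle", "verb"),
}


@dataclass
class Node:
    surface: str
    pos: str
    cost: int               # minimum total cost of any path ending here
    prev: Optional["Node"]  # link to the cheapest valid predecessor


def segment(kana: str) -> tuple[float, list[str]]:
    """Return (total cost, words) of the minimum-cost segmentation."""
    ending_at: dict[int, list[Node]] = {0: [Node("", "BOS", 0, None)]}
    for start in range(len(kana)):                 # analysis position
        for end in range(start + 1, len(kana) + 1):
            for surface, pos in DICTIONARY.get(kana[start:end], []):
                # Connection check: keep predecessors the table allows.
                preds = [n for n in ending_at.get(start, [])
                         if (n.pos, pos) in CONNECTABLE]
                if not preds:
                    continue                       # invalid word ("×")
                best = min(preds, key=lambda n: n.cost)
                step = ATTACHED_COST if pos == "particle" else INDEPENDENT_COST
                ending_at.setdefault(end, []).append(
                    Node(surface, pos, best.cost + step, best))
    finals = ending_at.get(len(kana), [])
    if not finals:
        return math.inf, []
    node = min(finals, key=lambda n: n.cost)
    total, words = node.cost, []
    while node.prev is not None:                   # follow links back to BOS
        words.append(node.surface)
        node = node.prev
    return total, words[::-1]


print(segment("くるまで"))  # (2, ['車', 'で'])
```

In this toy lattice 間 receives cost 4 through 来る, as in the text. The totals are not meant to reproduce the figures of Fig. 4.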
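[0037]
Next comes the dependency testing, which is performed after phrase segmentation, once kanji candidates have been created for each phrase. Three figures are involved:
・Fig. 5 shows the example sentence to be converted;
・Fig. 6 is a flowchart of the end position search routine, which finds where dependency testing ends;
・Fig. 7 is a flowchart of the dependency test routine.
The example, shown in Fig. 5(A), is 『「あかいはながうつくしい」まどをあけてわたしはいった。』 ("I opened the window and said, 'The red flower is beautiful.'"). It contains punctuation and brackets (「」). Assume that segmentation by the minimum-cost method has produced the phrases shown in Fig. 5(B). The end position search routine is described first, with reference to Fig. 6.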
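[0038]
The end position search routine of Fig. 6 ultimately determines the range of dependency testing. When segmentation has produced several phrases from the input kana string, it decides which span of phrases dependency testing should cover.

When the routine starts, the first of the segmented phrases is set as the start of the dependency analysis range (step S300). It is then determined whether this phrase was generated from word data (step S310). Some phrases consist of indeterminate words, for example X=Aならば ("if X = A"). A phrase that contains no independent word from the independent-word dictionary 58 cannot take part in a dependency, so such a phrase does not need to be tested for the range.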
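[0039]
If the phrase was generated from word data, its last word is taken out (step S320). A phrase consists of an independent word plus attached words, and punctuation and similar symbols may be attached at its end. The last word is therefore checked for whether it is:
・a left bracket, such as （, 「, or ［ (step S330);
・a right bracket, such as ）, 」, or ］ (step S340);
・a full stop, such as 。, ・, ．, ，, ？, or ！ (step S350).
The routine then determines whether the search for the analysis range has covered all phrases produced by segmentation (step S360). It does this in two cases: when the last word is none of these symbols, or when step S310 found that the phrase was not generated from word data. If not all phrases have been covered, the routine moves to the next phrase (step S370) and repeats the processing from step S320.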
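[0040]
The current phrase is taken as the end of the dependency analysis range (step S380) in two cases: when its last word is a left bracket, right bracket, or full stop (steps S330, S340, S350), or when all phrases have been examined (step S360). The routine then exits via END.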
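[0041]
In the example of Fig. 5, the end position search proceeds as follows.
・It starts from the phrase 『「赤い』. Its last word is not a bracket or full stop, so the routine moves to the next phrase, 『花が』.
・The last word of 『花が』 is not one either, so the routine moves on to 『うつくしい」』.
・Here the last word is the right bracket 」, so this phrase is taken as the end of the dependency analysis range.
The first dependency test range fixed is therefore 『「赤い花がうつくしい」』. The following phrases are judged in the same way, and 『窓を開けて私は言った。』 is fixed as the next dependency test range.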
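The end position search of Fig. 6, applied repeatedly as described for the Fig. 5 example, can be sketched as a single pass. The pass closes a test range at every phrase whose last word is a bracket or full stop. Representing a phrase as a list of word strings, with a flag for the check of step S310, is an assumption made for illustration. The symbol sets are those listed for steps S330 to S350.

```python
from typing import NamedTuple

LEFT_BRACKETS = {"（", "「", "［"}
RIGHT_BRACKETS = {"）", "」", "］"}
FULL_STOPS = {"。", "・", "．", "，", "？", "！"}  # as listed for step S350


class Phrase(NamedTuple):
    words: list[str]             # words of the phrase, trailing symbols last
    from_word_data: bool = True  # False for phrases such as "X=Aならば"


def find_test_ranges(phrases: list[Phrase]) -> list[tuple[int, int]]:
    """Split phrase indices into inclusive (start, end) test ranges."""
    ranges, start = [], 0
    for i, phrase in enumerate(phrases):
        last = phrase.words[-1] if phrase.words else ""
        boundary = phrase.from_word_data and (
            last in LEFT_BRACKETS or last in RIGHT_BRACKETS or last in FULL_STOPS)
        if boundary or i == len(phrases) - 1:  # S330-S360 lead to S380
            ranges.append((start, i))
            start = i + 1                      # next range starts after it
    return ranges


sentence = [Phrase(["「", "赤い"]), Phrase(["花", "が"]), Phrase(["美しい", "」"]),
            Phrase(["窓", "を"]), Phrase(["開け", "て"]), Phrase(["私", "は"]),
            Phrase(["言っ", "た", "。"])]
print(find_test_ranges(sentence))  # [(0, 2), (3, 6)]
```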
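[0042]
Once the dependency test ranges are fixed, the dependency test routine of Fig. 7 is started. In Fig. 6, test ranges were determined for all phrases produced by segmentation. Alternatively, the test routine of Fig. 7 may be started each time a single test range is fixed.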
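[0043]
When the dependency test routine of Fig. 7 starts, the phrase immediately before the end of the test range is set as the modifier (step S400). In the embodiment, dependencies are searched modifier-first, looking for the matching head. Choosing the phrase one before the end of the range guarantees that the modifier has at least one possible head. The processing variables are also initialized at this point (for example, n is set to 1).

Next, the phrase n positions after the modifier is set as the head (step S410), and the dependency dictionary 98 is searched (step S420). Entries in the dependency dictionary take the form "head stem" + "modifier".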
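[0044]
Fig. 8 shows an example of the dictionary. Fig. 8(A) schematically shows the entry for the dependency 機転 + 利く (kiten, "quick wit", + kiku, "to work"). The entry's headword is the reading ききてん, and the words 利く and 機転 are registered under it. Fig. 8(B) shows the entry for the dependency 花 + 美しい: its headword is the reading うつくしはな, and the words 美しい and 花 are registered under it. Attached-word information is appended at the end of each entry for the attached-word admissibility analysis described later. The actual dictionary also carries a search index, word-length information, and so on.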
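The key format ("head stem" + "modifier") and the single-direction storage mentioned for the transposition information adjustment unit 99 can be sketched as follows. This is a hypothetical reading of Fig. 8. The stem rule (drop the final inflecting kana of verbs and adjectives), the `Word` and `Entry` types, and the transposed lookup are all assumptions; the patent does not say how unit 99 extends the entries.

```python
from typing import NamedTuple, Optional


class Word(NamedTuple):
    surface: str
    reading: str
    inflects: bool  # True for verbs and adjectives (assumed stem rule below)


class Entry(NamedTuple):
    modifier: str              # kakari-go, e.g. 機転
    head: str                  # uke-go, e.g. 利く
    particles: frozenset[str]  # attached words licensed between them


def stem(word: Word) -> str:
    # Assumption: drop the final inflecting kana (きく -> き, うつくしい -> うつくし).
    return word.reading[:-1] if word.inflects else word.reading


def key(modifier: Word, head: Word) -> str:
    return stem(head) + modifier.reading  # "head stem" + "modifier"


KITEN, KIKU = Word("機転", "きてん", False), Word("利く", "きく", True)
HANA, UTSUKUSHII = Word("花", "はな", False), Word("美しい", "うつくしい", True)
DEPENDENCY_DICT = {
    key(KITEN, KIKU): [Entry("機転", "利く", frozenset({"の", "が"}))],  # ききてん
    key(HANA, UTSUKUSHII): [Entry("花", "美しい", frozenset({"が"}))],   # うつくしはな
}


def find_dependency(modifier: Word, head: Word) -> Optional[Entry]:
    for e in DEPENDENCY_DICT.get(key(modifier, head), []):
        if (e.modifier, e.head) == (modifier.surface, head.surface):
            return e
    # Transposed lookup, a hypothetical stand-in for unit 99:
    # 美しい -> 花 reuses the stored 花 -> 美しい entry.
    for e in DEPENDENCY_DICT.get(key(head, modifier), []):
        if (e.modifier, e.head) == (head.surface, modifier.surface):
            return e
    return None
```

With this layout, `find_dependency(UTSUKUSHII, HANA)` finds the entry stored under うつくしはな. A homophone such as 鼻 ("nose") finds nothing, because the surfaces recorded in the entry do not match.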
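[0045]
The dependency dictionary is searched with the words currently set as modifier and head, and the routine checks whether the headword built from them (for example うつくしはな) exists in the dictionary (step S430).
・If no such dependency is found, it is determined whether testing has reached the end of the test range (step S440). If not, n is incremented by 1 (step S450) and the process repeats from step S410.
・If the dependency is found in the dependency dictionary 98 (step S430), it is determined whether a dependency has already been established for the head (step S460).
The reason for step S460 is shown in Fig. 9. Once the dependency Q1 + R2 has been established, the processing changes when the preceding phrase P is judged next, so that the already established dependency Q1 + R2 takes priority.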
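[0046]
If no dependency has yet been established for the head side (that is, for Q when the dependency between P and Q is judged), processing moves to step S470 and the following steps. There, words forming a dependency are first searched for with modifier priority. Step S430 has already shown that at least one word capable of the dependency exists, so the question is which word pair forms it, and the modifier's candidate order decides first.

This search is illustrated in Fig. 10. Suppose several words have been found on the modifier side for its reading. The first candidate X1 is fixed, and the head Y is then checked in the learned order already held in the independent-word dictionary 58, from the highest rank down: Y1 → Y2 → Y3 → Y4 ... (search A1 in Fig. 10). If no word satisfying the dependency is found, the next modifier candidate X2 is selected and the same test is performed (search A2).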
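The modifier-priority order of Fig. 10 is a nested loop in which the head candidate varies fastest. In this sketch, `holds` is a hypothetical predicate standing in for the dictionary lookup. Both candidate lists are assumed to be in learned order already.

```python
from itertools import product
from typing import Callable, Optional


def first_dependency(modifier_cands: list[str], head_cands: list[str],
                     holds: Callable[[str, str], bool]) -> Optional[tuple[str, str]]:
    """Fig. 10 order: fix X1 and try Y1, Y2, ...; only then move to X2."""
    # product() varies its last argument fastest: search A1, then A2, ...
    for x, y in product(modifier_cands, head_cands):
        if holds(x, y):
            return x, y
    return None


print(list(product(["X1", "X2"], ["Y1", "Y2", "Y3"]))[:4])
# [('X1', 'Y1'), ('X1', 'Y2'), ('X1', 'Y3'), ('X2', 'Y1')]
```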
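[0047]
When this search finds a modifier-head pair that satisfies a dependency read from the dependency dictionary 98, attached-word admissibility analysis is performed next (step S480). This process is described below.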
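[0048]
Particle admissibility analysis determines whether the admissibility relation defined for each dependency type is satisfied. The dependency types fall into the following patterns.
[I] Continuative (adverbial) modification
(1) Noun + particle + predicate: the particle is
the case particle が, から, で, と, に, へ, より, を, or の, or the binding particle は
(2) Predicate in continuative form + predicate
(3) Noun + predicate (particle omitted): the omissible particles are
が, は, binding particles, and adverbial particles
[II] Adnominal modification
(4) Noun + particle + noun: the particle is
の
(5) Nominal + nominal (coordination): the particle is
や or と
(6) Predicate in attributive form + noun
(7) Adnominal + noun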
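[0049]
Suppose the relation between two words judged to be in a dependency falls under one of (1) to (7) above, and the attached word between them (usually a particle or particle-like expression) is one of those listed. The attached word is then tested, because the dependency dictionary 98 records which particles are allowed for the words of each dependency. For example, suppose the dependency between 機転 and 利く carries the allowed particles の and が. It belongs to case (1), noun + particle + predicate. The particles の and が may therefore appear between the two words (機転が利いた and 機転の利いた are acceptable). Other case particles such as から or で are not allowed (機転から利いた and 機転で利いた are unacceptable).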
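[0050]
For each of the relations (1) to (7), anything other than the items listed is judged admissible. Examples judged admissible are listed below. They may include combinations that would never occur as dependencies in real expressions. However, dependency is a broad concept in actual human language use, and overly strict dependency rules often fit reality poorly. Overly strict rules would also needlessly bloat the dependency dictionary 98 and slow down dependency testing.

This embodiment therefore divides the relations in which dependencies arise into (1) to (7). Where a particle's admissibility within these relations is clear-cut, it is stored in the dependency dictionary as admissible, together with the words forming the dependency. Everything else is treated as admissible.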
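The admissibility rule above reduces to a two-level check. A particle listed for the pattern must be licensed by the dictionary entry, and any other particle is admitted. The sketch below models only patterns (1), (4), and (5). The pattern numbers mirror the list above, while the function name and the set-based representation of the licensed particles are assumptions.

```python
# Particles that must be explicitly licensed by the dictionary entry, per
# pattern. Pattern (3) (particle omitted) and the patterns that list no
# particles, (2), (6) and (7), are not modelled here.
RESTRICTED = {
    1: {"が", "から", "で", "と", "に", "へ", "より", "を", "の", "は"},
    4: {"の"},
    5: {"や", "と"},
}


def particle_admissible(pattern: int, particle: str, licensed: set[str]) -> bool:
    """Return True if `particle` may stand between the two words."""
    if particle in RESTRICTED.get(pattern, set()):
        return particle in licensed  # clear-cut case: consult the entry
    return True                      # everything else is admitted


kiten_kiku = {"の", "が"}  # particles licensed for 機転 + 利く
print([p for p in ("が", "の", "から", "で", "として")
       if particle_admissible(1, p, kiten_kiku)])
# ['が', 'の', 'として']
```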
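[0051]
[III] Admissible expressions: continuative (adverbial) modification
・Case-particle-like expressions in noun + case-particle-like expression + predicate:
ずつ, として, のため, において, によって, etc.
・Binding particles in noun + binding particle + predicate:
こそ, さえ, しか, でも, も, etc.
・Adverbial particles in noun + adverbial particle + predicate:
きり, くらい, ずつ, だけ, etc.
・Adverbial-particle-like expressions in noun + adverbial-particle-like expression + predicate:
なので, なら, etc.
・Particles in predicate + particle + predicate:
のは, etc.
・Conjunctive particles: ので, から, て, etc.
・Conjunctive-particle-like expressions: からには, ためには, ほど, うえ, etc.
・Expressions coordinating predicate + predicate: か, し, たり, と同時に, etc.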
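[0052]
[IV] Admissible expressions: adnominal modification
・Particle-like expressions in noun + particle-like expression + noun:
における, に関する, に基づいて, etc.
・Particle-like expressions in predicate + particle-like expression + noun:
ための, といった, に伴う, などの, ごとき, etc.
・The expression coordinating nominal + nominal: か.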
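[0053]
These rules decide whether the attached words between two words found to be in a dependency are admissible. In the example of 花 and 美しい, the admissible case particle is が, so the dependency is accepted. This judgment is made in step S480. If the dependency holds, the accepted words are fixed as the first candidates for the modifier and head among the independent words forming the phrases (step S490). In other words, the learned registration order of homophones in the independent-word dictionary 58 is swapped.

The span from the modifier to the head thus found is then registered and managed as an established dependency range (step S500). The process then determines whether the dependency search has covered the whole range (step S510).

If the attached word between the two words rules out the dependency, the first candidates are not changed. If other dependency information exists for the same modifier and head, the same test is applied to it (not shown). Otherwise, processing moves on to determine whether the whole range has been covered.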
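[0054]
Suppose dependency testing has not yet covered the whole test range, that is, the whole dependency analysis range fixed by the processing of Fig. 6. In that case n is reset to 1 (step S520) and the modifier is moved to the preceding phrase (step S530). The above processing (steps S410 to S500) is repeated until dependency testing over the whole range is complete.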
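[0055]
If step S460 finds that a dependency has already been established for the head, the routine determines whether a dependency holds using the head as already fixed (step S540). In the example of Fig. 9(A), a dependency has been established between Q1 and R2. When the dependency between P and Q is then judged, the head Q1 is held fixed. Even if the dependency P1 + Q2 exists, it is therefore not adopted. If, however, a dependency P2 + Q1 with head Q1 is found, it is established. The chain P2 + Q1 + R2 shown in Fig. 9(B) is therefore established.

The example sentence of Fig. 5 works the same way. In the test range あかいはながうつくしい, the dependency 花 + 美しい is found first, and 花 and 美しい are learned as first candidates. After that, the dependency 赤い + 鼻 ("red" + "nose") is not adopted even if it exists, whereas the dependency 赤い + 花 ("red" + "flower") is.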
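[0056]
In this case, the first candidate of the modifier (P2 in the example) is fixed (step S550). Dependency range management (step S500) and the check for whether testing has covered the whole range (step S510) then follow as before. When dependency testing is complete over the whole range fixed as the test range, the routine exits via END.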
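Steps S400 to S550 can be put together as the loop sketched below. It processes modifiers from the end of the range toward the start and tries heads in order of increasing distance n. Once a phrase has been fixed by an established dependency, later modifiers may only attach to that fixed candidate, as in Fig. 9. Several simplifications are assumptions, not the patent's method:
・`holds` is a hypothetical predicate combining the dictionary lookup and the particle check;
・when no candidate pair holds, the sketch tries the next head instead of ending the search for that modifier;
・the rule that links never cross (paragraph [0059]) is implemented as a simple interval check.

```python
from typing import Callable, Optional


def resolve_dependencies(cands: list[list[str]],
                         holds: Callable[[str, str], bool]):
    """Dependency test over one test range, loosely following Fig. 7.

    cands[i] lists phrase i's candidates in learned order. Returns the chosen
    surface for every phrase and the established (modifier, head) links.
    """
    chosen: list[Optional[str]] = [None] * len(cands)  # None = not fixed yet
    links: list[tuple[int, int]] = []
    for m in range(len(cands) - 2, -1, -1):          # S400, then S530
        for h in range(m + 1, len(cands)):           # n = h - m (S410, S450)
            # Assumed realization of the range management of S500:
            # a new link may not end strictly inside an established one.
            if any(a < h < b for a, b in links):
                continue
            # S460/S540: a phrase that is already fixed keeps its candidate.
            heads = [chosen[h]] if chosen[h] is not None else cands[h]
            pair = next(((x, y) for x in cands[m] for y in heads
                         if holds(x, y)), None)      # S470, modifier first
            if pair is None:
                continue
            chosen[m], chosen[h] = pair              # S490 / S550
            links.append((m, h))                     # S500
            break                                    # one head per modifier
    return [c if c is not None else cs[0] for c, cs in zip(chosen, cands)], links


known = {("花", "美しい"), ("赤い", "鼻"), ("赤い", "花")}
print(resolve_dependencies([["赤い"], ["鼻", "花"], ["美しい"]],
                           lambda x, y: (x, y) in known))
# (['赤い', '花', '美しい'], [(1, 2), (0, 1)])
```

In this example, 鼻 is listed before 花 for はな. Without the fixed-head rule, 赤い + 鼻 would be found first.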
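[0057]
The embodiment described above has the following effects. First, the dependency test range is well defined. Dependencies are never judged across left or right brackets or full stops, so they are judged over a range close to the reach of dependencies in ordinary syntax. The range is therefore not enlarged needlessly, and processing does not take excessive time. In the example of Fig. 5, the test ranges are split into 『「あかいはながうつくしい」』 and 『まどをあけてわたしはいった。』. Even if the dependency dictionary 98 contained a dependency such as 美しい + 窓 ("beautiful" + "window"), it would not be detected.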
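[0058]
In this embodiment, dependency testing starts from the position nearest the end of the sentence within the test range and gives priority to the modifier. This proved extremely effective in making the word candidates fixed by dependencies match what the user intends. Two likely reasons are as follows. First, in Japanese the predicate at the end of the sentence often carries the sentence's meaning. Second, it is more common for the subject's action to vary than for the action (usually described by the predicate at the end) to stay the same while the subject (usually described nearer the beginning) changes.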
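[0059]
Once a dependency is judged to be established, the span between its modifier and its head is managed as an established dependency range, so dependency ranges never cross. Nor is one modifier ever judged to attach to two or more heads. Because dependencies are also judged beyond adjacent phrases (when n ≥ 2), the test works correctly even when, for example, an adverb intervenes between the two words. When several dependencies hold, three combinations are therefore allowed:
・independent dependencies that hold separately, as in Fig. 11(A);
・one head that receives two or more modifiers, as in Fig. 11(B);
・one dependency that straddles another, as in Fig. 11(C).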
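[0060]
A second embodiment of the invention is described next. Its kana-kanji conversion device replaces the end position search routine of Fig. 6 from the first embodiment with the dependency test range determination routine of Fig. 12. In all other respects it is identical to the first embodiment.

As shown in Fig. 12, the routine first searches all input phrases after segmentation (step S600). It determines whether paired symbols, such as 「 and 」 or （ and ）, are present (step S610). If they are, analysis moves inside them (step S620). Once no paired symbols remain, each range delimited by full stops (。, ？, ！, etc., but not the commas 、 or ，) is fixed as a dependency test range (step S630). Outside the ranges enclosed by paired symbols, the parts before and after each enclosed range are joined into one, and each range delimited by full stops is again fixed as a dependency test range (step S640).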
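[0061]
Consider the example sentence of Fig. 13, 『かれは「なつはあつい」と、ぼくにはなした』 ("He told me, 'Summer is hot.'"). Writing the ranges with the first-candidate words obtained by segmentation, this processing yields two dependency test ranges, where / marks a phrase boundary:
・inside the paired symbols 「」: 『夏は/厚い』;
・outside them: 『彼は/と、/僕に/放した。』.
Because of the dependencies B1, B2, and B3, the first candidates learned are 『夏は暑い』 in the former range (暑い, "hot", replacing 厚い, "thick") and 『彼はと、僕に話した。』 in the latter (話した, "told", replacing 放した, "let go"). Dependencies that span a large clause enclosed in brackets can therefore also be judged correctly.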
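The routine of Fig. 12 can be sketched as a recursive split: each bracketed part is analyzed on its own, and the text around it is treated as one continuous sequence. The sketch assumes that brackets arrive as separate tokens and that phrases are plain strings. Both are representation choices made for illustration, and the function names are hypothetical.

```python
PAIRS = {"「": "」", "（": "）"}
FULL_STOPS = ("。", "？", "！")  # commas 、 and ， are not boundaries


def split_at_full_stops(phrases: list[str]) -> list[list[str]]:
    """Close a test range after every phrase ending in a full stop."""
    ranges, current = [], []
    for p in phrases:
        current.append(p)
        if p.endswith(FULL_STOPS):
            ranges.append(current)
            current = []
    if current:
        ranges.append(current)
    return ranges


def bracket_aware_ranges(tokens: list[str]) -> list[list[str]]:
    """Analyse each bracketed part separately (S620, S630) and join the
    text around it into one sequence before splitting it (S640)."""
    ranges, outside, i = [], [], 0
    while i < len(tokens):
        tok = tokens[i]
        if tok not in PAIRS:
            outside.append(tok)
            i += 1
            continue
        depth, j = 0, i
        while j < len(tokens):          # find the matching closing symbol
            if tokens[j] == tok:
                depth += 1
            elif tokens[j] == PAIRS[tok]:
                depth -= 1
                if depth == 0:
                    break
            j += 1
        ranges.extend(bracket_aware_ranges(tokens[i + 1:j]))
        i = j + 1                       # resume after the closing symbol
    return ranges + split_at_full_stops(outside)


tokens = ["彼は", "「", "夏は", "厚い", "」", "と、", "僕に", "放した。"]
print(bracket_aware_ranges(tokens))
# [['夏は', '厚い'], ['彼は', 'と、', '僕に', '放した。']]
```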
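[0062]
In the two embodiments above, dependency testing proceeded from one starting phrase in order of increasing inter-phrase distance. An alternative is shown in Fig. 14. Within the test range, all pairs at inter-phrase distance 1 (C + D, B + C, A + B) are tested first, then those at distance 2 (B + D, A + C), and then the farther ones (A + D). In this case too it is preferable, as Fig. 14 shows, to test dependencies between later phrases first.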
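The alternative order of Fig. 14 can be generated directly: group pairs by distance and, within each distance, list the pair nearest the end of the sentence first. The function name is hypothetical.

```python
def distance_order(phrases: list[str]) -> list[tuple[str, str]]:
    """Fig. 14 order: all pairs at distance 1, then 2, ..., later pairs first."""
    n = len(phrases)
    return [(phrases[m], phrases[m + d])
            for d in range(1, n)                # inter-phrase distance
            for m in range(n - 1 - d, -1, -1)]  # rear pairs before front ones


print(distance_order(["A", "B", "C", "D"]))
# [('C', 'D'), ('B', 'C'), ('A', 'B'), ('B', 'D'), ('A', 'C'), ('A', 'D')]
```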
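[0063]
Embodiments of the invention have been described above, but the invention is in no way limited to them. It can of course be carried out in various forms without departing from its gist, for example:
・using another segmentation method, such as the two-phrase longest-match method, instead of the minimum-cost method;
・allowing a switch between modifier priority and head priority in dependency testing;
・limiting the maximum inter-phrase distance tested (the size of n in the embodiment) to a predetermined value;
・using other dependency dictionary structures.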
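[0064]
[Effects of the Invention]
As described above, the kana-kanji conversion device of claim 1 and the kana-kanji conversion method of claim 6 work as follows.
・The input kana string is segmented into phrases by referring to a grammar dictionary.
・Starting from one segmented phrase, dependencies with other phrases are tested in order of increasing distance from the starting phrase. The tests refer to a dependency information dictionary that stores dependency information between predetermined phrases.
・Meanwhile, the input kana string is searched for delimiters. If a delimiter is found during a dependency test that starts from some phrase, that position is taken as the end position and the test for that phrase is ended.
・With the starting phrase as the modifier, the head is tested successively over the next candidates of the target phrase. Once that test is finished, the head test continues with the next candidate of the starting phrase as the modifier.
The range of dependency testing can therefore be set appropriately in line with Japanese sentence structure. In addition, the device tests more than the phrases chosen as first candidates: it tests next candidates too, giving priority to the next candidates of the head and then testing those of the modifier. The other candidates are therefore tested as well.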
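[0065]
The kana-kanji conversion device of claim 2 and the kana-kanji conversion method of claim 7 work as follows.
・The input kana string is segmented into phrases by referring to a grammar dictionary.
・Starting from one segmented phrase, dependencies with other phrases are tested in order of increasing distance from the starting phrase. The tests refer to a dependency information dictionary that stores dependency information between predetermined phrases.
・Meanwhile, the input kana string is searched for ranges delimited by paired start and end symbols, and these ranges are excluded from the dependency test range.
・With the starting phrase as the modifier, the head is tested successively over the next candidates of the target phrase. Once that test is finished, the head test continues with the next candidate of the starting phrase as the modifier.
The dependency test therefore skips ranges enclosed by paired start and end symbols, which makes it possible to detect dependencies that hold across inserted phrases or clauses. In addition, the device tests more than the phrases chosen as first candidates: it tests next candidates too, giving priority to the next candidates of the head and then testing those of the modifier. The other candidates are therefore tested as well.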
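[0067]
The kana-kanji conversion device of claim 4 and the kana-kanji conversion method of claim 8 fix the priority of dependency testing when segmentation yields three or more phrases. Starting from one segmented phrase and referring to the dependency information dictionary, they test for dependencies with other phrases, beginning on the side nearer the end of the sentence. Dependencies can therefore be used to obtain kana-kanji conversion candidates that follow natural Japanese syntax.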
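[0068]
The kana-kanji conversion device of claim 5 first tests the dependencies near the end of the sentence. Then, for the phrases preceding that phrase, it accepts only dependencies that hold together with those already established. Dependencies are therefore never forced, and kana-kanji conversion candidates that follow natural Japanese syntax can be obtained.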
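[Brief Description of the Drawings]
[Fig. 1] Functional block diagram showing how the kana-kanji conversion function is realized in a kana-kanji conversion device according to an embodiment of the invention.
[Fig. 2] Block diagram showing the hardware on which the kana-kanji conversion device of the embodiment is realized.
[Fig. 3] Flowchart of the phrase segmentation process executed in the phrase segmentation unit 102.
[Fig. 4] Explanatory diagram showing an example of phrase segmentation by the minimum-cost method in the embodiment.
[Fig. 5] Explanatory diagram showing an example sentence subjected to dependency testing in the embodiment.
[Fig. 6] Flowchart of the end position search routine.
[Fig. 7] Flowchart of the dependency test routine in the embodiment.
[Fig. 8] Explanatory diagram showing an example of the dependency dictionary in the embodiment.
[Fig. 9] Explanatory diagram showing how dependencies are established among several phrases.
[Fig. 10] Explanatory diagram showing the priority order in dependency testing.
[Fig. 11] Explanatory diagram showing the patterns that arise when several dependencies exist within one input string.
[Fig. 12] Flowchart of the dependency test range determination routine of the second embodiment.
[Fig. 13] Explanatory diagram showing an example sentence for the test range determination of the second embodiment.
[Fig. 14] Explanatory diagram showing an example of the order of dependency tests within a test range.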
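[Description of Reference Numerals]
21 … CPU
22 … ROM
23 … RAM
24 … Keyboard
25 … Keyboard interface
26 … CRT
27 … CRTC
28 … Printer
29 … Printer interface
30 … Hard disk controller (HDC)
31 … Bus
32 … Hard disk
40 … Character input unit
42 … Conversion control unit
44 … Converted character string output unit
50 … Character string input unit
52 … Character storage unit
54 … Independent-word candidate generation unit
56 … Independent-word analysis position management unit
58 … Independent-word dictionary
64 … Attached-word candidate generation unit
66 … Attached-word analysis position management unit
68 … Attached-word dictionary
70 … Dependency learning unit
70 … Learning unit
72 … Independent-word learning unit
74 … Auxiliary-word learning unit
76 … Affix learning unit
78 … Character conversion learning unit
80 … Word data creation unit
82 … Connection check unit
84 … Connection check table
86 … Cost calculation unit
90 … Dependency candidate adjustment unit
92 … Passive analysis unit
94 … Particle admissibility analysis unit
96 … Dependency range management unit
98 … Dependency dictionary
99 … Dependency transposition information adjustment unit
100 … Word data storage unit
102 … Phrase segmentation unit
104 … Dependency transposition information adjustment unit
106 … Phrase data storage unit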
１０８…変換文字列出力部[0001]
[Industrial application fields]
The present invention relates to a kana-kanji conversion device and a kana-kanji conversion method, and more particularly, to a kana-kanji conversion device and a kana-kanji conversion method used for selecting a kanji candidate for a word constituting a phrase using information on dependency between words. About.
[0002]
[Prior art]
2. Description of the Related Art Conventionally, various kana-kanji conversion devices that convert a kana character string input from a keyboard or the like into a desired kana-kanji mixed sentence have been proposed as a Japanese sentence input device or a Japanese sentence editing apparatus. In such a kana-kanji conversion device, it is not necessary for the user to specify each word or phrase separation position, and it is desired that the converted character string has the notation desired by the user. In Japan, there are many homophones and homonyms, so in order to obtain the correct kana-kanji mixed sentence without error, you will probably have to analyze the meaning of the sentence. In order to analyze the meaning, it is necessary to have a knowledge base of tens of thousands of words related at least organically, which is extremely difficult to realize.
[0003]
Therefore, the conventional Kana-Kanji conversion device attempts to obtain the desired result without analyzing the meaning by devising the phrase segmentation process and the learning process for selecting the homonym. As segmentation processing, a two-phrase longest match method in which two clauses are obtained as the first candidate among the phrases that can be composed of two clauses as a basic unit, or a word that can be a candidate for a word constituting the clause In addition, there is a minimum cost method in which a cost is given to a combination of words and a phrase whose score satisfies a predetermined condition is a first candidate. Also, in the learning process, learning of homophones with the highest priority given to the word selected by the user immediately prior to the homophone is used as the next candidate, or used as the length of the phrase containing a word. Learning the phrase length that gives the highest priority to the length specified by the person is known.
[0004]
Furthermore, recently, we have focused on specific relationships between words (for example, “hot” and “tea” in “hot tea” or “hot” and “summer” in “hot summer”) and remembered this relationship. By preparing a dictionary, when one word (for example, “tea”) is specified, a word related to this word (for example, “hot” among candidates for “hot”) is selected as the first candidate. Some have been proposed (for example, “Kana-Kanji conversion device” in Japanese Patent Laid-Open No. 3-105664, “Kana-Kanji conversion device” in Japanese Patent Laid-Open No. 4-277661, etc.). These specific relationships between words are called “dependency” or “co-occurrence”.
[0005]
[Problems to be solved by the invention]
However, the syntax rules of the languages used in reality are extremely complex, and simply referring to the dependency relationship does not give the desired Japanese language. What procedure does the dependency test do? It was a big problem. For example, when there are parentheses or punctuation marks in the input character string, there is no clear solution to the problem of how to test the dependency.
[0006]
In addition, when performing kana-kanji conversion for multiple phrases at once, where to check the dependency relationship, if multiple dependency relationships are found, what to do with the priority order, or even multiple dependency relationships Whether or not coexistence was allowed or not when the relationship was found was not made clear.
[0007]
The kana-kanji conversion apparatus and the kana-kanji conversion method of the present invention have been made for the purpose of clarifying these problems and obtaining a desired kana-kanji mixed sentence using the dependency relationship.
[0008]
[Means and Actions for Solving the Problems]
  The kana-kanji conversion device according to claim 1 is:
  A kana-kanji conversion device for inputting a kana character string and referring to a grammar dictionary to generate a kana-kanji mixed character string candidate,
  A segmentation means for segmenting the input kana character string with reference to the grammar dictionary;
  A dependency information dictionary storing dependency information between predetermined phrases;
  Dependency verification means for verifying the presence of a dependency with other clauses in order from the one having the smallest distance between phrases, starting from the one phrase that has been divided, and referring to the dependency information dictionary;
  Delimiter search means for searching for delimiters present in the input kana character string;
  A dependency test ending means for ending the dependency test starting from the one clause when the delimiter is found during the dependency test and using the position as the end position of the dependency test;
  WithIn addition,
The dependency verification means is:
Next candidate test means for sequentially determining the next candidate of the target phrase, with the test of the dependency test as a starting word for the phrase,
After the test by the next candidate test means is completed, the next candidate test means for continuing the test of the received word with the next candidate of the one phrase as the starting point as a related word;
  The main point is that
[0009]
  In this kana-kanji conversion device, the segmentation means refers to the grammar dictionary for the input kana character string, and segmentation is performed by referring to the grammar dictionary. With reference to a dependency information dictionary storing dependency information between clauses, the presence of dependency with other clauses is sequentially examined from the ones with the smallest distance between clauses from the starting clause. On the other hand, if the delimiter search means is searching for a delimiter that exists in the input kana character string and the delimiter is found during a dependency test starting from a certain phrase, The acceptance test end means uses the position as the end position of the dependency test and ends the dependency test starting from one phrase.In addition, the kana-kanji conversion device does not perform the dependency test only for the first candidate phrase, but also performs the next candidate, and gives priority to the test for the next candidate for the received word, Thereafter, the next candidate for the clerk is tested.
[0010]
  The kana-kanji conversion device according to claim 2 is:
  A kana-kanji conversion device for inputting a kana character string and referring to a grammar dictionary to generate a kana-kanji mixed character string candidate,
  A segmentation means for segmenting the input kana character string with reference to the grammar dictionary;
  A range search means for searching a range delimited by a pair of codes indicating start and end in the input kana character string;
  A dependency information dictionary storing dependency information between predetermined phrases;
  Dependency verification means for verifying the presence of a dependency with other clauses in order from the one having the smallest distance between phrases, starting from the one phrase that has been divided, and referring to the dependency information dictionary;
  A test range removing unit for removing the range searched by the range search unit from the range of the dependency test by the dependency test unit;
  WithIn addition,
The dependency verification means is:
Next candidate test means for sequentially determining the next candidate of the target phrase, with the test of the dependency test as a starting word for the phrase,
After the test by the next candidate test means is completed, the next candidate test means for continuing the test of the received word with the next candidate of the one phrase as the starting point as a related word;
  The main point is that
[0011]
  In this kana-kanji conversion device, the segmentation means refers to the grammar dictionary for the input kana character string, and segmentation is performed by referring to the grammar dictionary. With reference to a dependency information dictionary storing dependency information between clauses, the presence of dependency with other clauses is sequentially examined from the ones with the smallest distance between clauses from the starting clause. On the other hand, the range search means searches the input kana character string for a range delimited by a pair of signs indicating the start and end, and the test range removal means is determined by the dependency test means. Excluded from the scope of dependency testing. Therefore, the dependency test is performed by skipping a range surrounded by a pair of codes indicating start and end.In addition, the kana-kanji conversion device does not perform the dependency test only for the first candidate phrase, but also performs the next candidate, and gives priority to the test for the next candidate for the received word, Thereafter, the next candidate for the clerk is tested.
[0014]
  The kana-kanji conversion device according to claim 4 is:A kana-kanji conversion device according to claim 1 or 2, wherein
  When the segmented phrase is 3 or more, the one segment segmented as a starting point is referred to, and the presence of a dependency with another clause is referred to by referring to the dependency information dictionary. Dependency verification means to verify sequentially from the side near the end of the sentence
  The main point is that
[0015]
This kana-kanji conversion device determines the priority of dependency test when there are three or more shared phrases, and the dependency verification means starts with one shared phrase as a dependency information dictionary. The presence of a dependency with other clauses is sequentially examined from the side closer to the end of the sentence among three or more clauses.
[0016]
The kana-kanji conversion device according to claim 5 is the device according to claim 4, in which the dependency test means comprises:
first test means that first performs the test starting from a dependent word close to the end of the sentence; and
second test means that, when the first test means has found a dependency, turns to the phrase preceding the starting phrase and accepts as established only a dependency that can coexist with the dependency found by the first test means.
This is the gist.
[0017]
Accordingly, this kana-kanji conversion device first performs the dependency test near the end of the sentence and then, for the preceding phrases, recognizes as established only those dependencies that can coexist with the dependencies already established.
[0018]
  The kana-kanji conversion method of claim 6 is
  a kana-kanji conversion method for inputting a kana character string and referring to a grammar dictionary to generate kana-kanji mixed character string candidates, in which
  the input kana character string is segmented into phrases by referring to the grammar dictionary;
  starting from one of the segmented phrases, a dependency information dictionary storing dependency information between predetermined phrases is consulted, and the presence of a dependency with the other phrases is tested sequentially in ascending order of the distance between phrases;
  a delimiter present in the input kana character string is searched for; and
  when the delimiter is found during the dependency test, its position is taken as the end position of the dependency test, and the dependency test starting from the one phrase is terminated.
Furthermore, in the dependency test,
with the starting phrase as the dependent word, the next candidates of the target phrase (the receiving word) are tested in sequence, and
after the test of these next candidates is completed, the test of the receiving word continues with the next candidate of the starting phrase as the dependent word.
  This is the gist.
[0019]
  The kana-kanji conversion method of claim 7 is
  a kana-kanji conversion method for inputting a kana character string and referring to a grammar dictionary to generate kana-kanji mixed character string candidates, in which
  a range delimited by a pair of symbols indicating a start and an end is searched for in the input kana character string;
  the input kana character string is segmented into phrases by referring to the grammar dictionary; and
  with the searched range excluded from the dependency test range, and starting from one of the segmented phrases, a dependency information dictionary storing dependency information between predetermined phrases is consulted, and the presence of a dependency with the other phrases is tested sequentially in ascending order of the distance between phrases.
Furthermore, in the dependency test,
with the starting phrase as the dependent word, the next candidates of the target phrase (the receiving word) are tested in sequence, and
after the test of these next candidates is completed, the test of the receiving word continues with the next candidate of the starting phrase as the dependent word.
  This is the gist.
[0020]
  The kana-kanji conversion method of claim 8 is the kana-kanji conversion method according to claim 6 or 7,
  being a kana-kanji conversion method for inputting a kana character string and referring to a grammar dictionary to generate kana-kanji mixed character string candidates, in which
  the input kana character string is segmented into phrases by referring to the grammar dictionary, and
  when there are three or more segmented phrases, one segmented phrase is taken as the starting point, a dependency information dictionary storing dependency information between predetermined phrases is consulted, and the presence of a dependency with the other phrases is tested sequentially, starting from the side of the three or more phrases closer to the end of the sentence.
  This is the gist.
[0021]
【Example】
In order to further clarify the configuration and operation of the present invention described above, preferred embodiments of the present invention are described below. FIG. 1 is a block diagram showing the control logic for kana-kanji conversion, and FIG. 2 is a block diagram showing the hardware on which this control logic actually runs. As shown in FIG. 2, the apparatus is built around a known CPU 21, to which the following units are connected via a bus 31:
[0022]
ROM 22: a mask ROM that stores the kana-kanji conversion program and the like;
RAM 23: a readable and writable memory constituting the main memory;
Keyboard interface 25: an interface that manages key input from the keyboard 24;
CRTC 27: a CRT controller that controls signal output to the CRT 26, which can display in color;
Printer interface 29: an interface that controls data output to the printer 28;
Hard disk controller (HDC) 30: an interface that controls the hard disk 32.
The hard disk 32 stores the various programs that are loaded into the RAM 23 and executed, a kana-kanji conversion processing program provided in the form of a device driver, and the various conversion dictionaries referenced by that program.
[0023]
With the hardware configured in this manner, text can be input, converted from kana to kanji, edited, displayed, printed, and so on. A character string input from the keyboard 24 undergoes predetermined processing by the CPU 21, is stored in a predetermined area of the RAM 23, and is displayed on the screen of the CRT 26 via the CRTC 27.
[0024]
Next, the functions executed by this hardware are described with reference to FIG. 1, outlining the configuration and operation of each unit shown there. All of this processing is executed by the CPU 21 on the basis of data input from the keyboard 24. For kana-kanji conversion, a predetermined interrupt process is activated when the keyboard 24 is operated; it converts the input key image into the corresponding kana character string and then starts converting that string into a kana-kanji mixed character string. Of course, on a computer capable of parallel processing, kana-kanji conversion may be performed by a single application (an input method) that passes the conversion result to whichever application requires it. In this case, all input from the keyboard 24 is accepted by the input method.
[0025]
The key image from the keyboard 24 is received by the character input unit 40 and converted there into the corresponding kana character string; for romaji input, the conversion refers to a predetermined conversion table. Each time one kana character is obtained, the character input unit 40 sends it to the conversion control unit 42. The conversion control unit 42 plays the central role in kana-kanji conversion: it controls the various conversion processes described later and sends the result to the converted character string output unit 44, which actually sends a signal to the CRTC 27 and displays the converted character string on the CRT 26.
[0026]
The conversion control unit 42 passes each received kana character to the character string input unit 50, which stores the kana character string in the character storage unit 52. From this character string, the independent word candidate creation unit 54 and the adjunct word candidate creation unit 64 create word data candidates. The independent word candidate creation unit 54 uses the independent word dictionary 58 stored in advance on the hard disk 32 and extracts independent word candidates from the kana character string under the management of the independent word analysis position management unit 56. Likewise, the adjunct word candidate creation unit 64 uses the adjunct word dictionary 68 and extracts adjunct word candidates under the management of the adjunct word analysis position management unit 66. The process of extracting independent word and adjunct word candidates while moving the analysis position is described later.
[0027]
The priorities of homophones, affixes, and the like in the independent word dictionary 58 are changed by learning. This learning is performed by the dependency learning unit 70, the independent word learning unit 72, the adjunct word learning unit 74, the affix learning unit 76, and the character conversion learning unit 78. When, in a situation where a dependency is satisfied, the user selects a word other than the one corresponding to that dependency, the dependency learning unit 70 learns this so that the user's selection takes priority for the same combination of words. The independent word learning unit 72 learns the most recently selected word as the highest-priority candidate within a group of homophonous independent words. The adjunct word learning unit 74 learns the written form into which an adjunct word such as "kudasai" ("please") is converted, for example whether it is written with kanji or in kana. The affix learning unit 76 learns the conversion form of prefixes and suffixes (for example, how the prefix "go" is written). The character conversion learning unit 78 learns character strings that the user confirmed as hiragana or katakana without conversion, and outputs that hiragana or katakana as a candidate in subsequent conversions.
[0028]
The word candidates created by the independent word candidate creation unit 54 and the adjunct word candidate creation unit 64 are passed to the word data creation unit 80, which creates data for each word candidate. The connection verification unit 82 checks, by referring to the connection verification table 84, the connections between independent words and adjunct words, between independent words, and between phrases of the form "independent word + adjunct words", and the cost calculation unit 86 calculates the overall cost; these results are output as the data for each word. This word data is temporarily stored in the word data storage unit 100, adjusted by the output of the dependency candidate adjustment unit 90, and used in the phrase segmentation process.
[0029]
The dependency candidate adjustment unit 90 receives the word candidates from the independent word candidate creation unit 54 and the adjunct word candidate creation unit 64 via the word data creation unit 80, the word data storage unit 100, and the phrase segmentation unit 102, and tests them for dependencies by referring to a dependency dictionary 98 prepared in advance on the hard disk 32. To reduce its size, the dependency dictionary 98 stores only one form of a dependency even when the same pair of words can also stand in the inverted relationship. The transposed information adjustment unit 99 therefore expands the information in the dependency dictionary 98, with the help of grammatical analysis, to adjust the dependency candidates. For example, even if only the dependency of the dependent word "hana ga" ("the flower") on the receiving word "utsukushii" ("is beautiful") is stored in the dependency dictionary 98, the dependency of the dependent word "utsukushii" ("beautiful") on the receiving word "hana" ("flower") can also be tested.
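The single-orientation storage can be pictured with the minimal sketch below. It is an illustration under stated assumptions, not the patent's implementation: each entry is modeled as a hypothetical `Dependency(dependent, receiving, relation)` tuple, and the grammatical analysis of the transposed information adjustment unit 99 is reduced to one assumed rule, namely that a stored "noun + particle + predicate" entry also licenses the inverted "predicate + noun" form.

```python
from typing import Iterable, Iterator, NamedTuple


class Dependency(NamedTuple):
    dependent: str  # reading of the dependent (modifying) word
    receiving: str  # reading of the receiving (modified) word
    relation: str   # hypothetical relation label


# Only one orientation is stored: "hana (ga)" -> "utsukushii".
STORED = [Dependency("hana", "utsukushii", "noun+particle+predicate")]


def expand_transposed(entries: Iterable[Dependency]) -> Iterator[Dependency]:
    """Yield each stored entry followed by its transposed counterpart, if any."""
    for entry in entries:
        yield entry
        if entry.relation == "noun+particle+predicate":
            # "hana ga utsukushii" (the flower is beautiful) also licenses
            # "utsukushii hana" (beautiful flower): predicate + noun.
            yield Dependency(entry.receiving, entry.dependent, "predicate+noun")


expanded = list(expand_transposed(STORED))
```

Deriving the inverted form at lookup time means the dictionary never has to hold both orientations of the same pair, which is the capacity saving the text describes.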
[0030]
The range over which the dependency test is performed is managed by the dependency range management unit 96. Several admissibility conditions also apply to the dependency test; these are judged by the causative/passive analysis unit 92, the particle admissibility analysis unit 94, and so on. The first-choice phrase segmentation candidate is determined from the phrase candidates adjusted by this dependency test and stored in the phrase data storage unit 106, and the converted character string output unit 108 outputs the stored candidates to the conversion control unit 42. The conversion control unit 42 displays this character string as the candidate; because an unwanted string may be proposed, it also displays and selects next candidates and so on in response to the user's instructions. These instructions, the selection results, and the like are fed to the phrase data storage unit 106 and to the learning units 70 to 78 described above, where they are used to fix parts of phrases, to rewrite priorities by learning, and so on. Although not shown, when the user confirms the character string, all data temporarily held in each unit is cleared in preparation for the next conversion.
[0031]
The flow from the input of kana characters to the output of the converted character string has been outlined above. Next, each process is described in detail: first the general phrase segmentation process, and then the dependency process that forms the main part of the present invention. FIG. 3 is a flowchart outlining phrase segmentation by the minimum cost method. As shown in the figure, after initialization (step S200), such as erasing temporarily stored data and setting the analysis position to the first character, the analysis position is obtained (step S210). The analysis position advances one character at a time through the kana character string input so far. For example, if the kana character string "kurumadehakowohakobu" is input as shown in FIG. 4, the first analysis position is the first character, "ku". At this analysis position, the independent word dictionary 58 and the adjunct word dictionary 68 stored on the hard disk 32 are searched (step S220).
[0032]
After the dictionary search, each word obtained is checked for whether it can combine with the preceding word (step S230); if only words that cannot combine are obtained, the dictionary search continues. In the example of FIG. 4, the binding particle "wa" retrieved from the adjunct word dictionary 68 for the "ha" at the start of "hakobu" cannot follow the immediately preceding case particle "wo", so the connection verification performed by the word data creation unit 80 and the connection verification unit 82 treats it as invalid data. In FIG. 4, words judged invalid by this combination check are marked "×". The permissible connections between words are stored in advance in the connection verification table 84, which gives the possibility of connection between the parts of speech of words as a matrix of … × 400. When the dictionary search and the combination check at one analysis position are completed, the analysis position is advanced and the process is repeated.
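The combination check of step S230 amounts to a table lookup by part of speech. The sketch below assumes a tiny hypothetical part-of-speech inventory and illustrative matrix values; the real connection verification table 84 is far larger, and its contents are not given in the text.

```python
# Hypothetical part-of-speech inventory; the real table covers far more classes.
POS_INDEX = {"noun": 0, "verb": 1, "case_particle": 2, "binding_particle": 3}

# CONNECT[left][right] is True when a word of part of speech `right`
# may directly follow a word of part of speech `left` (illustrative values).
CONNECT = [
    # noun   verb   case   binding
    [True,  True,  True,  True],   # noun
    [True,  False, False, False],  # verb
    [True,  True,  False, False],  # case particle: "wo" + "wa" is rejected
    [True,  True,  False, False],  # binding particle
]


def can_connect(left_pos: str, right_pos: str) -> bool:
    """Look up the connection matrix for a pair of parts of speech."""
    return CONNECT[POS_INDEX[left_pos]][POS_INDEX[right_pos]]


def filter_candidates(prev_pos, candidates):
    """Keep the (reading, pos) candidates that may follow the previous word.

    Rejected candidates correspond to the words marked "x" in FIG. 4.
    """
    return [(reading, pos) for reading, pos in candidates if can_connect(prev_pos, pos)]


# At the "ha" of "hakobu", right after the case particle "wo":
valid = filter_candidates("case_particle", [("wa", "binding_particle"), ("hakobu", "verb")])
```

Here `valid` keeps only `("hakobu", "verb")`, mirroring how the binding particle "wa" is discarded after "wo".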
[0033]
For words that can combine, the cost is calculated next, and the minimum total cost up to each word is obtained (step S240). This is done by the cost calculation unit 86. In the example of FIG. 4A, "kuruma" can be divided as "ku" + "ru" + "ma", as "kuru" + "ma", or taken as "kuruma". Assigning words to these readings with a cost of 2 for an independent word and 0 for an adjunct word, the path "ku" ("bitter", an independent word) + "ru" ("flow", an independent word) gives "ru" a total cost of 4. The cost of "ma" ("between") is then 4, because the minimum total cost is taken: the cost 4 of "kuru" ("come") + "ma" is adopted rather than the cost 6 of "ku" + "ru" + "ma". Since "de" and "wa" are adjunct words, each takes as its own total the lowest total among the preceding words, here the cost 2 of "kuruma" ("car"). In FIG. 4, the cost of each word is shown at its lower right.
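The cost rule above can be checked with the short sketch below. It assumes that the connection check has already removed invalid words and uses only the two costs given in the text (2 for an independent word, 0 for an adjunct word); the `Word` class and its labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List

INDEPENDENT_COST = 2  # cost of an independent word, as in the example
ADJUNCT_COST = 0      # cost of an adjunct word, as in the example


@dataclass
class Word:
    label: str
    start: int                   # index of the first kana of the word
    end: int                     # index just past the last kana
    adjunct: bool = False
    total: float = float("inf")  # minimum total cost up to and including this word


def assign_min_costs(words: List[Word]) -> None:
    """Give each word its own cost plus the cheapest total of any word ending where it starts."""
    # A predecessor always starts strictly earlier, so processing by start position
    # finishes every predecessor before the words that follow it.
    for word in sorted(words, key=lambda w: w.start):
        own = ADJUNCT_COST if word.adjunct else INDEPENDENT_COST
        if word.start == 0:
            best_prev = 0
        else:
            best_prev = min((p.total for p in words if p.end == word.start),
                            default=float("inf"))
        word.total = best_prev + own


lattice = [
    Word("ku (bitter)", 0, 1), Word("ru (flow)", 1, 2), Word("kuru (come)", 0, 2),
    Word("ma (between)", 2, 3), Word("kuruma (car)", 0, 3),
    Word("de", 3, 4, adjunct=True),
]
assign_min_costs(lattice)
```

After the call, "ru (flow)" and "ma (between)" both carry a total of 4, and "de" inherits the total 2 of "kuruma (car)", matching the figures quoted above.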
[0034]
After the cost calculation, the cost of each word is checked and words with an inappropriate cost are invalidated (step S250). An inappropriate cost is one that is high compared with other combinations of words covering the same span. For example, selecting the combination "ku" + "ru" would cost more than the other words "kuru" ("come") and "kuru" ("reel") obtained up to that position, so it is judged inappropriate and excluded from the phrase candidates. In FIG. 4, words not adopted under this minimum cost principle are marked "●" at their upper right, and "◯" marks words that remain, after the combination check and cost check described above, as words that may form phrase candidates.
[0035]
Next, the word candidates to which costs have been assigned are linked (step S260): for each valid combination, the relationship is recorded by setting a pointer. In the example of FIG. 4, words such as "kuru" ("come"), "kuruma" ("car"), "de", and "wa", for which the minimum total cost has been calculated, are linked to the words that can follow them; for example, "kuruma" is linked to "de". The combination check, cost calculation, and linking are repeated until all words at one analysis position have been searched. When the dictionary search at an analysis position is completed, the analysis position is advanced by one character, the establishment of new words is examined, and the combination check and cost calculation are repeated in the same way.
[0036]
When the analysis position reaches the last input kana character and all words have been analyzed (step S265), the path with the minimum cost is searched for on the basis of the above processing (step S270). This is done by the phrase segmentation unit 102 and consists of finding the combination of valid words that minimizes the sum of the costs assigned to the words. In the example "kurumadehakowohakobu", the division "kuruma de" ("by car") + "hako wo" ("a box") + "hakobu" ("carry"), shown by the solid line J in FIG. 4B, has a total cost of 18 and is selected as the minimum cost. Other segmentation candidates that are not of minimum cost are also searched; for example, the division "kuruma de wa" ("by car") + "ko wo" ("a child") + "hakobu" ("carry"), shown by the broken line B in FIG. 4B, has a cost of 20. After the segmentation candidates are created in this way (step S280), candidates within each phrase are created (step S290); for example, several kanji candidates are prepared for "hako" within one phrase. These phrase candidates and word candidates are used when the user asks to change the segmentation or to display the next candidate.
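The minimum-cost path search of step S270 can be sketched as dynamic programming over kana positions, assuming (as in the toy costs above) that a path's cost is just the sum of its word costs. With only those costs, the two readings J and B tie at 6; the values 18 and 20 in FIG. 4B evidently include cost terms the text does not spell out, so the tie-break here (first chain found wins) is an artifact of the sketch. All names are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Word(NamedTuple):
    label: str
    start: int
    end: int
    cost: int  # 2 for an independent word, 0 for an adjunct word


def min_cost_path(words: List[Word], length: int) -> Optional[Tuple[int, List[str]]]:
    """Return (total cost, word labels) of the cheapest chain covering [0, length)."""
    # best[pos] = (cheapest cost to reach pos, last word on that cheapest chain)
    best: Dict[int, Tuple[int, Optional[Word]]] = {0: (0, None)}
    for word in sorted(words, key=lambda w: w.start):
        if word.start not in best:
            continue  # no valid chain reaches this word
        cost = best[word.start][0] + word.cost
        if word.end not in best or cost < best[word.end][0]:
            best[word.end] = (cost, word)  # on ties the first chain found is kept
    if length not in best:
        return None
    labels, pos = [], length
    while pos != 0:  # follow the back pointers from the end of the string
        word = best[pos][1]
        labels.append(word.label)
        pos = word.start
    return best[length][0], labels[::-1]


# "kurumadehakowohakobu" = ku ru ma de ha ko wo ha ko bu (10 kana)
lattice = [
    Word("kuruma", 0, 3, 2), Word("kuru", 0, 2, 2), Word("ma", 2, 3, 2),
    Word("de", 3, 4, 0), Word("wa", 4, 5, 0), Word("hako", 4, 6, 2),
    Word("ko", 5, 6, 2), Word("wo", 6, 7, 0), Word("hakobu", 7, 10, 2),
]
result = min_cost_path(lattice, 10)
```

The stored back pointers play the role of the links set in step S260; alternative segmentations such as B would require keeping more than one entry per position.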
[0037]
Next, the dependency test, which is executed after phrase segmentation has been performed and kanji candidates have been created for each phrase, is described. FIG. 5 is an explanatory diagram showing an example sentence to be converted, FIG. 6 is a flowchart of the end position search routine that searches for the end position of the dependency test, and FIG. 7 is a flowchart of the dependency test routine. The example sentence, shown in FIG. 5, contains a passage enclosed in quotation marks (“ ”). Assume that the phrases shown in FIG. 5B are extracted by segmentation using the minimum cost method. First, the end position search routine is described with reference to FIG. 6.
[0038]
The end position search routine shown in FIG. 6 determines the range of the dependency test: when the phrase segmentation process has extracted a plurality of phrases from the input kana character string, it determines from which phrase to which phrase the dependency test should be performed. When the routine starts, the first of the segmented phrases is set as the start position of the dependency analysis range (step S300), and it is determined whether this phrase was generated from word data (step S310). Some phrases consist of unregistered words, such as "if X = A". A phrase that contains no independent word retrieved from the independent word dictionary 58 cannot establish a dependency, so it need not be checked as a possible end of the dependency range.
[0039]
If the phrase was generated from word data, the last word in the phrase is extracted (step S320). A phrase consists of an independent word plus adjunct words, but a punctuation mark or the like may be attached at its end. The extracted last word is therefore checked for being a left bracket such as "(", "“", or "[" (step S330), a right bracket such as ")", "”", or "]" (step S340), or a punctuation mark such as "。", "・", "、", "?", or "!" (step S350). If it is none of these, or if in step S310 the phrase of interest was not generated from word data, it is determined whether all phrases created by the segmentation have been examined for the dependency analysis range (step S360). If not, the routine moves to the next phrase (step S370) and repeats the processing from step S320.
[0040]
If, on the other hand, the last word of the phrase is judged to be a left bracket, a right bracket, or a punctuation mark (steps S330, S340, S350), or if all phrases have been examined (step S360), the current phrase is determined to be the end of the dependency analysis range (step S380), and the routine ends at "END".
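Steps S300 to S380 can be sketched as follows. The delimiter sets, the `Phrase` tuple, and the romanized example sentence are hypothetical stand-ins (the original example sentence is not reproduced in this text); the control flow follows the flowchart description: only phrases built from word data are checked, and the range closes at the first delimiter or at the last phrase.

```python
from typing import List, NamedTuple, Tuple

LEFT_BRACKETS = {"(", "“", "[", "「"}
RIGHT_BRACKETS = {")", "”", "]", "」"}
PUNCTUATION = {"。", "・", "、", "?", "!"}
DELIMITERS = LEFT_BRACKETS | RIGHT_BRACKETS | PUNCTUATION


class Phrase(NamedTuple):
    text: str
    last_word: str
    from_word_data: bool = True  # False for phrases such as "X=A"


def find_range_end(phrases: List[Phrase], start: int) -> int:
    """Steps S300-S380: index of the phrase that closes the range starting at `start`."""
    i = start
    while True:
        phrase = phrases[i]
        # S310-S350: only phrases built from word data are checked for delimiters.
        if phrase.from_word_data and phrase.last_word in DELIMITERS:
            return i  # S380
        if i == len(phrases) - 1:
            return i  # S360: every phrase has been examined
        i += 1        # S370: move to the next phrase


def split_test_ranges(phrases: List[Phrase]) -> List[Tuple[int, int]]:
    """Partition the phrases into consecutive dependency test ranges."""
    ranges, start = [], 0
    while start < len(phrases):
        end = find_range_end(phrases, start)
        ranges.append((start, end))
        start = end + 1
    return ranges


# Hypothetical romanized rendering: "「akai hana ga utsukushii」 to mado wo akete itta。"
sentence = [
    Phrase("「akai", "akai"), Phrase("hana ga", "ga"), Phrase("utsukushii」", "」"),
    Phrase("to mado wo", "wo"), Phrase("akete", "te"), Phrase("itta。", "。"),
]
```

Here `split_test_ranges(sentence)` yields `[(0, 2), (3, 5)]`: the quoted passage forms the first test range and the rest of the sentence the second.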
[0041]
With this processing, in the example sentence of FIG. 5, when the end position search starts from the phrase "akai" ("red"), its last word is not a bracket or punctuation mark, so the search moves to the phrase "hana ga" ("flower"). Its last word is not a left bracket or the like either, so the search moves on to the next phrase, "utsukushii" ("is beautiful"). Here the last word is the closing quotation mark "”", so this phrase is judged to be the end of the dependency analysis range; that is, the quoted passage "red flowers are beautiful" is determined as the first dependency test range. The following phrases are judged in the same way, and the passage ending with "I opened the window and said." is determined as the next dependency test range.
[0042]
Once the dependency test range has been determined, the dependency test routine shown in FIG. 7 is started. In FIG. 6 the dependency test ranges are determined for all phrases generated by the segmentation, but the test routine of FIG. 7 may equally well be started each time one dependency test range is determined.
[0043]
When the dependency test routine of FIG. 7 starts, the phrase immediately before the end of the test range is first set as the dependent word (step S400). In this embodiment dependencies are searched with priority given to the dependent word, so the dependent word is set to the phrase immediately before the end of the test range so that at least one receiving word follows it. At this point the variables used in the processing are also initialized (for example, n is set to 1). Next, the n-th phrase after the dependent word is set as the receiving word (step S410), and the dependency dictionary 98 is searched (step S420). The dependency dictionary is recorded in the format "receiving word stem" + "dependent word".
[0044]
An example of the dictionary is shown in FIG. 8. FIG. 8A schematically shows the dictionary contents for the dependency "promotion" + "working": the reading "Kikiten" is registered as the headword together with the words "handy" and "motivation". For the dependency "flower" + "beautiful", as shown in FIG. 8, the reading "utsukushihana" is registered as the headword together with the words "beautiful" and "flower". Adjunct word information for the adjunct word admissibility analysis described later is appended at the end of each entry. The actual dictionary also carries an index for searching, information on word lengths, and so on.
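The headword format ("receiving word stem" + "dependent word") can be illustrated with the sketch below. The dictionary contents, the `Entry` tuple, and `lookup` are hypothetical; the two "hana" entries anticipate the "red nose"/"red flower" homophone example discussed later, with 花 (flower) and 鼻 (nose) both read "hana".

```python
from typing import Dict, FrozenSet, List, NamedTuple


class Entry(NamedTuple):
    receiving: str             # surface form of the receiving word
    dependent: str             # surface form of the dependent word
    particles: FrozenSet[str]  # adjunct word information for admissibility


# Hypothetical contents; keys are "receiving-word stem reading" + "dependent-word reading".
DEPENDENCY_DICT: Dict[str, List[Entry]] = {
    "utsukushi" + "hana": [Entry("美しい", "花", frozenset({"ga"}))],
    "hana" + "akai": [Entry("鼻", "赤い", frozenset()), Entry("花", "赤い", frozenset())],
}


def lookup(receiving_stem: str, dependent_reading: str) -> List[Entry]:
    """Build the headword from the two readings and return the matching entries."""
    return DEPENDENCY_DICT.get(receiving_stem + dependent_reading, [])
```

Keying on readings rather than kanji lets one lookup serve every homophone candidate of both phrases; the entries then say which kanji pairs actually form the dependency.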
[0045]
Using the words set as the dependent word and the receiving word, the dependency dictionary is searched to determine whether a headword built from them (for example, "utsukushihana") exists (step S430). If no dependency is found in the dictionary, it is determined whether the test has reached the end of the test range (step S440); if not, the variable n is incremented by 1 (step S450) and the process is repeated from step S410. If a dependency is found in the dependency dictionary 98 (step S430), it is determined whether a dependency has already been established for the receiving word (step S460). This is because, as shown in FIG. 9, once the dependency Q1 + R2 has been established, the processing is changed so that the already found dependency Q1 + R2 takes priority when dependencies are next determined for the preceding phrase P.
[0046]
If no dependency has yet been established on the receiving side (that is, for word Q when determining the dependency between words P and Q), the process proceeds to step S470 and the following steps, and searches for the words that establish the dependency, giving priority to the dependent word. Priority is given to the dependent word because the determination in step S430 shows only that some pair of words can have the dependency, not which words it holds for; the candidate order on the dependent-word side is therefore considered first. This search is illustrated in FIG. 10. When several words are found for the reading of the dependent word, the first candidate X1 is fixed, and the receiving words Y are checked for the dependency in the learned order already recorded in the independent word dictionary 58, from the highest down: Y1 → Y2 → Y3 → Y4 ... (search A1 in FIG. 10). If this search finds no word that satisfies the dependency, the next dependent word X2 is selected and tested in the same way (search A2).
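The search order of FIG. 10 is a nested loop with the dependent-word candidates outside and the receiving-word candidates inside. In the sketch below, `known_pairs` is a hypothetical stand-in for the pairs confirmed by the dependency dictionary 98, and the candidate lists are assumed to be already sorted in learned order.

```python
from typing import Optional, Sequence, Set, Tuple

Pair = Tuple[str, str]


def search_dependency(dependent_cands: Sequence[str],
                      receiving_cands: Sequence[str],
                      known_pairs: Set[Pair]) -> Optional[Pair]:
    """Searches A1, A2, ... of FIG. 10: fix X1 and try Y1, Y2, ...; then move to X2."""
    for x in dependent_cands:      # dependent-word candidates, in learned order
        for y in receiving_cands:  # receiving-word candidates, in learned order
            if (x, y) in known_pairs:
                return x, y
    return None


# X1 pairs with Y3 only; X2 pairs with Y1. Exhausting X1's receivers comes first.
hit = search_dependency(["X1", "X2"], ["Y1", "Y2", "Y3"], {("X1", "Y3"), ("X2", "Y1")})
```

`hit` is `("X1", "Y3")`: every receiving candidate of X1 is tried before X2 is considered, even though X2 + Y1 uses a higher-ranked receiving word.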
[0047]
When this search finds a combination of dependent word and receiving word that satisfies a dependency read from the dependency dictionary 98, adjunct word admissibility analysis is performed (step S480). This processing is described below.
[0048]
The adjunct word admissibility analysis determines whether the admissibility relationship defined for each dependency type is satisfied. The dependency types are as follows.
[I] Predicate (ren'yō) modification type
(1) Noun + particle + predicate
Case particles "ga", "kara", "de", "to", "ni", "to", "yori", "to", "no"
Binding particle "wa"
(2) Idiom + idiom
(3) Noun + predicate (particle omission type): optional particles
Particles "ga" and "wa", adverbial particles
[II] Noun (rentai) modification type
(4) Noun + particle + noun
"no"
(5) Nominal + nominal (parallel)
Particles "ya", "to"
(6) Attributive form of a predicate + noun
(7) Conjunction + noun
[0049]
That is, the relationship between two words judged to be in a dependency relationship is assumed to belong to one of the types (1) to (7) above, and the admissibility of the adjunct words between the two words (mostly particles or particle-like expressions) is verified against the admissible particles set in the dependency dictionary 98 for words having that dependency relationship. For example, when the dependency between "promotion" and "handedness" carries the admissible-particle setting ("no", "ga"), it belongs to type (1) above (noun + particle + predicate), so "no" and "ga" may appear between the two words (○), while the other case particles, such as "kara" and "de", are not admissible (×).
[0050]
For the relationships (1) to (7), any adjunct word other than those listed is judged admissible. Examples of expressions judged admissible are listed below; some of them may not actually form a dependency as real expressions. However, dependency is a broad notion in actual human language use, and an overly strict arrangement of dependencies often fails to match reality. An overly strict arrangement would also needlessly enlarge the dependency dictionary 98 and slow down dependency verification. In this embodiment, therefore, the relationships in which a dependency occurs are divided into types (1) to (7) with respect to adjunct word admissibility; adjunct words that are clearly admissible or clearly inadmissible are stored in the dependency dictionary together with the words in the dependency relationship, and all others are admitted.
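The "controlled if listed, admitted otherwise" policy can be sketched as below. The `CONTROLLED` table is a hypothetical, partial reading of types (1), (4), and (5) above, and `allowed` stands for the admissible-particle setting stored with a dictionary entry.

```python
from typing import Dict, FrozenSet

# Particles each dependency type explicitly controls (partial, hypothetical).
CONTROLLED: Dict[int, FrozenSet[str]] = {
    1: frozenset({"ga", "kara", "de", "ni", "yori", "no", "wa"}),  # noun + particle + predicate
    4: frozenset({"no"}),                                          # noun + particle + noun
    5: frozenset({"ya", "to"}),                                    # nominal + nominal (parallel)
}


def particle_admissible(dep_type: int, particle: str, allowed: FrozenSet[str]) -> bool:
    """Controlled particles must appear in the entry's allowed set; others pass."""
    if particle in CONTROLLED.get(dep_type, frozenset()):
        return particle in allowed
    return True  # not controlled for this type: admitted by default


setting = frozenset({"no", "ga"})  # the ("no", "ga") setting of the example above
```

With this setting, "ga" and "no" pass for type (1), "kara" and "de" fail, and an expression the type does not control, such as "demo", is admitted by default. Keeping the controlled sets small is what keeps the dictionary compact, as the text argues.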
[0051]
[III] Admissible expressions: predicate (ren'yō) modification
・ Noun + case-particle-like expression + predicate
"By", "As", "For", "In", "By", etc.
・ Noun + binding particle + predicate
"Some", "Even", "Shi", "But", "Mo", etc.
・ Noun + adverbial particle + predicate
"Kiri", "About", "One by one", "Only", etc.
・ Noun + adverbial-particle-like expression + predicate
"So", "If", etc.
・ Phrase + particle + particle
"No wa", etc.
・ Conjunctive particles: "So", "From", "From", "Te", etc.
・ Conjunctive-particle-like expressions
・ Parallel expressions of predicate + predicate: "ka", "shi", "ri", "simultaneously", etc.
[0052]
[IV] Admissible expressions: noun (rentai) modification
・ Noun + particle-like expression + noun
"In", "related", "based on", etc.
・ Phrase + particle-like expression + noun
"For", "like", "with", etc.
・ Parallel expression of nominal + nominal: "ka".
[0053]
Using the above rules, the admissibility of the adjunct words between two words found to be in a dependency relationship is determined. In the example of "hana" ("flower") and "utsukushii" ("beautiful"), the admissible case particle is "ga", and in this case the dependency is recognized as established. This is determined (step S480), and when the dependency is established, the words for which it is recognized are made the first candidates of the dependent word and of the receiving word among the independent words constituting the phrases (step S490); in other words, the registration order of homophones learned in the independent word dictionary 58 is changed. The range from the receiving word back to the dependent word thus found is then registered and managed as a range in which a dependency is established (step S500), and the process proceeds to the determination of whether the dependency search has covered the entire range (step S510). If the establishment of the dependency is denied because of an adjunct word between the two words, the first candidates are not changed; if there is other dependency information for the dependent word and the receiving word, it is tested in the same way (not shown in the figure), and the process then proceeds to the determination of whether the entire range has been covered.
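Steps S490 and S500 can be sketched as promoting the two words to the front of their candidate lists and recording the established range. As a simplification, per-phrase candidate lists stand in for the learned registration order in the independent word dictionary 58; `promote`, `establish`, and the candidate data are hypothetical.

```python
from typing import Dict, List, Tuple


def promote(candidates: List[str], chosen: str) -> List[str]:
    """Move `chosen` to the front, keeping the order of the other candidates."""
    return [chosen] + [c for c in candidates if c != chosen]


def establish(cands: Dict[int, List[str]], ranges: List[Tuple[int, int]],
              dep_i: int, recv_i: int, dep_word: str, recv_word: str) -> None:
    """Steps S490-S500: make both words first candidates and record the range."""
    cands[dep_i] = promote(cands[dep_i], dep_word)
    cands[recv_i] = promote(cands[recv_i], recv_word)
    ranges.append((dep_i, recv_i))  # from the dependent word up to the receiving word


# Phrase 1 ("hana") lists the homophones 鼻 (nose), 花 (flower), 華 (flower, literary).
cands = {0: ["赤い"], 1: ["鼻", "花", "華"], 2: ["美しい"]}
ranges: List[Tuple[int, int]] = []
establish(cands, ranges, 1, 2, "花", "美しい")
```

After the call, `cands[1]` becomes `["花", "鼻", "華"]` and `ranges` holds `(1, 2)`; the recorded range is what the later coexistence check consults.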
[0054]
If the dependency test has not yet been completed for the entire dependency test range, that is, the entire dependency analysis range determined by the processing shown in FIG. 6, the variable n is reset to 1 (step S520). The dependency word is then moved to the preceding phrase (step S530), and the above processing (steps S410 to S500) is repeated until the dependency test has been completed for the entire range.
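The order in which this loop visits phrase pairs can be sketched as below. The sketch assumes the test range has already been bounded by the end position search of FIG. 6, indexes its phrases from 0, and assumes, as is usual in Japanese, that the received word always follows the dependency word; `test_order` is a hypothetical helper name.

```python
from typing import Iterator, Tuple


def test_order(m: int) -> Iterator[Tuple[int, int]]:
    """Yield (dependency phrase, received phrase) index pairs for a test
    range of m phrases (hypothetical helper). The dependency word starts at
    the second-to-last phrase and moves one phrase toward the start of the
    sentence each round (S530); for each dependency word the distance n is
    reset to 1 (S520) and grows until the end of the range."""
    for i in range(m - 2, -1, -1):
        for n in range(1, m - i):
            yield i, i + n


names = "ABCD"
print([names[i] + "+" + names[j] for i, j in test_order(4)])
# ['C+D', 'B+C', 'B+D', 'A+B', 'A+C', 'A+D']
```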
[0055]
If step S460 determines that a dependency has already been established for the received word, it is then determined whether a dependency that uses this already-fixed received word holds (step S540). That is, in the example shown in FIG. 9A, once a dependency has been established between word Q1 and word R2, then when the dependency between words P and Q is tested, the received word Q1 is fixed and only dependencies consistent with it can be established. Accordingly, even if a dependency P1 + Q2 exists, it is not adopted; but if a dependency P2 + Q1 with received word Q1 is found, it is adopted, and, as shown in FIG. 9B, the chain P2 + Q1 + R2 is established. In the example shown in FIG. 4, the dependency “flower” + “beautiful” is found within the dependency test range, and “flower” and “beautiful” are learned as first candidates. After that, even though a dependency “red” + “nose” exists, it is not adopted; the dependency “red” + “flower” is adopted instead.
[0056]
In this case, therefore, the dependency word (word P2 in the example) is set as the first candidate (step S550). Thereafter, as before, the dependency range is managed (step S500) and it is determined whether the dependency test has been completed for the entire range (step S510). When the dependency test has been completed for every range determined as a dependency test range, the routine reaches “END” and terminates.
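Putting steps S460 to S550 together, a rough simulation of the first embodiment on the FIG. 4 example might look like the following. It assumes a toy candidate list in which “鼻” (nose) precedes “花” (flower) as the first candidate for “はな”, and a hypothetical three-entry dependency dictionary; the range-crossing check of step S500 is left out for brevity. `run_dependency_test` is a hypothetical name, not the patent's routine.

```python
from typing import Dict, List, Optional, Set, Tuple


def run_dependency_test(
    candidates: List[List[str]],
    dictionary: Set[Tuple[str, str]],
) -> List[str]:
    """Hypothetical sketch of the first embodiment's dependency test loop.

    candidates: kanji candidates for each phrase, first candidate first.
    dictionary: hypothetical (dependency word, received word) pairs.
    Returns the first candidate of every phrase after testing.
    """
    m = len(candidates)
    fixed: Dict[int, str] = {}  # phrase index -> candidate fixed by an established dependency
    for i in range(m - 2, -1, -1):          # dependency word moves toward the start (S530)
        for n in range(1, m - i):           # distance n restarts at 1 (S520)
            j = i + n
            # a phrase already fixed may only be used with its fixed candidate (S540)
            dep_cands = [fixed[i]] if i in fixed else candidates[i]
            rec_cands = [fixed[j]] if j in fixed else candidates[j]
            hit: Optional[Tuple[str, str]] = next(
                ((d, r) for d in dep_cands for r in rec_cands if (d, r) in dictionary),
                None,
            )
            if hit is not None:
                fixed[i], fixed[j] = hit    # both become first candidates (S490 / S550)
                break                       # one received word per dependency word
    return [fixed.get(k, cands[0]) for k, cands in enumerate(candidates)]


phrases = [["赤い"], ["鼻", "花"], ["美しい"]]   # toy candidates for あかい / はなが / うつくしい
deps = {("花", "美しい"), ("赤い", "鼻"), ("赤い", "花")}
print(run_dependency_test(phrases, deps))  # ['赤い', '花', '美しい']
```

Because “花” is fixed by the sentence-final test first, the entry “赤い” + “鼻” is never adopted even though “鼻” was the initial first candidate; remove “花” + “美しい” from the toy dictionary and the result falls back to “赤い鼻”.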
[0057]
The present embodiment described above provides the following effects. First, the dependency test range is clearly defined. That is, dependencies are not tested across left or right parentheses or punctuation marks, so they are tested within a range close to the span over which dependencies extend in ordinary syntax. As a result, the test range does not grow excessively and the processing does not take excessive time. In the example sentence shown in FIG. 5, the dependency test ranges are “akai hana ga utsukushii” (“the red flower is beautiful”) and “I opened the window and I said.” Therefore, even if a dependency such as “beautiful” + “window” exists in the dependency dictionary 98, it is not detected as a dependency.
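A sketch of how a phrase sequence might be cut into dependency test ranges at delimiters follows. It assumes that phrases and delimiters arrive as separate tokens and that both “。” and “、” as well as the bracket characters count as delimiters in this embodiment; the exact delimiter set, the token form of the example, and the names `split_at_delimiters` and `DELIMITERS` are all assumptions.

```python
from typing import List

# Assumed delimiter set for the first embodiment: punctuation and brackets.
DELIMITERS = {"。", "、", "「", "」", "（", "）"}


def split_at_delimiters(tokens: List[str]) -> List[List[str]]:
    """Cut a token sequence into dependency test ranges (hypothetical helper).
    No dependency is tested across a delimiter: each delimiter acts as the
    end position of the test for the phrases before it."""
    ranges: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        if tok in DELIMITERS:
            if current:
                ranges.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        ranges.append(current)
    return ranges


print(split_at_delimiters(["あかい", "はなが", "うつくしい", "。", "まどを", "あけた", "。"]))
# [['あかい', 'はなが', 'うつくしい'], ['まどを', 'あけた']]
```

Since “うつくしい” and “まどを” end up in different ranges, a dictionary entry pairing “beautiful” with “window” is never consulted.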
[0058]
In the present embodiment, the dependency test starts from the position closest to the end of the sentence within the dependency test range, and the dependency word is given priority. This configuration is highly effective in making the choice of word candidates by dependency more appropriate. This is thought to be because, in Japanese, the predicate side at the end of a sentence often carries the meaning of the sentence, and because, comparing the action (which generally corresponds to the sentence-final predicate) with the subject (which generally corresponds to the description at the beginning of the sentence), it is the subject's action that often varies.
[0059]
Further, once a dependency has been judged established, the span from the received word to the dependency word is managed as a dependency-established range, so dependency ranges never cross. In addition, a single dependency word is never judged to be received by two or more received words. Moreover, because dependencies are also tested beyond the adjacent phrase (when n ≥ 2), the test works correctly even when a modifier such as an adverb stands between the two words of a dependency relationship. Consequently, when multiple dependencies are established, the result is one of the combinations shown in FIG. 11: independent dependencies established separately (FIG. 11A), the combination shown in FIG. 11B, or a combination in which one dependency straddles another (FIG. 11C).
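The two constraints above (one received word per dependency word, and no crossing of established ranges) can be expressed as a small check on index spans. This sketch assumes each dependency is recorded as a (dependency phrase, received phrase) index pair with the dependency phrase first; `crosses` and `can_add` are hypothetical names. Nested spans (one dependency straddling another) and chains that share an endpoint, such as P2 + Q1 + R2, are allowed.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (dependency phrase index, received phrase index), dep < rec


def crosses(a: Span, b: Span) -> bool:
    """True if the two spans partially overlap (hypothetical helper).
    Disjoint spans, nested spans, and spans that merely share an endpoint
    do not cross."""
    (s1, e1), (s2, e2) = a, b
    return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1


def can_add(established: List[Span], new: Span) -> bool:
    """Accept a new dependency only if its dependency word has no received
    word yet and its span crosses no established span (hypothetical helper)."""
    if any(dep == new[0] for dep, _ in established):
        return False
    return not any(crosses(new, span) for span in established)


print(can_add([(1, 2)], (0, 3)))  # True: (0, 3) straddles (1, 2)
print(can_add([(1, 3)], (0, 2)))  # False: the spans cross
```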
[0060]
Next, a second embodiment of the present invention will be described. The kana-kanji conversion apparatus of the second embodiment executes the dependency test range determination routine shown in FIG. 12 in place of the end position search routine of FIG. 6, which determines the dependency range in the first embodiment; everything else is the same as in the first embodiment. As shown in FIG. 12, the dependency test range determination routine of the second embodiment first searches all the phrases that have been input and segmented (step S600) and determines whether paired symbols, such as the corner brackets “「” and “」” or the parentheses “(” and “)”, are present (step S610). If they are, the analysis target is moved inside the pair (step S620); once no paired symbols remain, each range delimited by sentence-final punctuation (such as “.”, “?”, and “!”, but not commas) is determined as a dependency test range (step S630). For the text outside a range enclosed by paired symbols, the parts before and after the enclosed range are joined, and each range delimited by punctuation is determined as a dependency test range (step S640).
[0061]
That is, with this processing, for the example sentence shown in FIG. 13 (“He told me, ‘Summer is hot’”), the dependency test ranges, formed from the first-candidate words obtained by phrase segmentation, are the part inside the paired symbols “”, namely “summer is / hot”, and the part outside them, namely “he / ... / told me.” (“/” marks a phrase boundary). From the dependencies B1, B2, and B3 that are found, “summer is hot” is learned as the first candidate in the former range and “... told me.” in the latter. It is therefore possible to correctly determine a dependency that holds across a large constituent enclosed in brackets.
[0062]
In the two embodiments described above, the dependency test proceeds from one starting phrase in order of increasing inter-phrase distance. Alternatively, the test may first be performed for all pairs at an inter-phrase distance of 1 (“C” + “D”, “B” + “C”, “A” + “B”), then for all pairs at distance 2 (“B” + “D”, “A” + “C”), and then for the more distant pairs (“A” + “D”). Even in this case, as shown in FIG. 14, it is desirable to test the dependencies between later phrases first.
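The distance-first alternative, which visits every pair at distance 1, then distance 2, and so on, taking later phrases first within each distance, can be sketched as follows. Phrases are indexed from 0 within one test range, and `distance_major_order` is a hypothetical helper name.

```python
from typing import List, Tuple


def distance_major_order(m: int) -> List[Tuple[int, int]]:
    """Return (dependency phrase, received phrase) pairs for m phrases,
    grouped by inter-phrase distance n = 1, 2, ...; within each distance
    the pair closest to the end of the sentence comes first."""
    order: List[Tuple[int, int]] = []
    for n in range(1, m):
        for i in range(m - 1 - n, -1, -1):
            order.append((i, i + n))
    return order


names = "ABCD"
print([names[i] + "+" + names[j] for i, j in distance_major_order(4)])
# ['C+D', 'B+C', 'A+B', 'B+D', 'A+C', 'A+D']
```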
[0063]
Although embodiments of the present invention have been described above, the present invention is not limited to them. For example, another phrase segmentation method such as the two-phrase longest-match method may be used instead of the minimum cost method; the dependency test may be made switchable between giving priority to the dependency word and giving priority to the received word; the maximum inter-phrase distance for dependency testing (the size of n in the embodiments) may be limited to a predetermined value; and the dependency dictionary may take other configurations. Needless to say, the present invention can be implemented in various forms without departing from its scope.
[0064]
【Effects of the Invention】
  As described above, in the kana-kanji conversion device of claim 1 and the kana-kanji conversion method of claim 6 of the present invention, the input kana character string is segmented into phrases by referring to the grammar dictionary, and, starting from one segmented phrase, the presence of dependencies with other phrases is tested by referring to the dependency information dictionary, which stores dependency information between predetermined phrases, in order of increasing inter-phrase distance from the starting phrase. Meanwhile, delimiters present in the input kana character string are searched for, and if a delimiter is found during a dependency test starting from a phrase, that position is taken as the end position of the dependency test and the test starting from that phrase is terminated. In addition, with the starting phrase as the dependency word, received words are tested sequentially over the next candidates of the target phrase, and after that test is complete, the received-word test continues with the next candidate of the starting phrase as the dependency word. The range over which dependencies are tested can therefore be set appropriately in accordance with Japanese syntactic structure. Furthermore, because the kana-kanji conversion device tests not only the first candidate of each phrase but also its next candidates, first for the received word and then for the dependency word, the other candidates are tested as well.
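The next-candidate ordering described here amounts to a nested loop: the outer loop holds one candidate of the starting (dependency) phrase and the inner loop runs through the candidates of the target (received) phrase. A minimal sketch with the hypothetical name `candidate_test_order`; it lists every pair, whereas an actual test would presumably stop at the first pair found in the dependency dictionary.

```python
from typing import List, Tuple


def candidate_test_order(dep_cands: List[str],
                         rec_cands: List[str]) -> List[Tuple[str, str]]:
    """Return candidate pairs in test order (hypothetical helper): every
    received-word candidate is tried against one dependency-word candidate
    before moving on to the dependency word's next candidate."""
    order: List[Tuple[str, str]] = []
    for dep in dep_cands:          # outer: candidates of the starting phrase
        for rec in rec_cands:      # inner: candidates of the target phrase
            order.append((dep, rec))
    return order


print(candidate_test_order(["暑い", "熱い"], ["夏", "納っ"]))
```

The same sequence could be produced with `itertools.product(dep_cands, rec_cands)`, which also varies its last argument fastest; the explicit loops simply make the priority visible.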
[0065]
  In the kana-kanji conversion device of claim 2 and the kana-kanji conversion method of claim 7, the input kana character string is segmented into phrases by referring to the grammar dictionary, and, starting from one segmented phrase, the presence of dependencies with other phrases is tested by referring to the dependency information dictionary, which stores dependency information between predetermined phrases, in order of increasing inter-phrase distance from the starting phrase. Meanwhile, ranges in the input kana character string delimited by a pair of symbols indicating a start and an end are searched for, and these ranges are excluded from the dependency test range. In addition, with the starting phrase as the dependency word, received words are tested sequentially over the next candidates of the target phrase, and after that test is complete, the received-word test continues with the next candidate of the starting phrase as the dependency word. The dependency test therefore skips ranges enclosed by such paired symbols, so that dependencies holding across an inserted phrase or clause can be tested. Furthermore, because the kana-kanji conversion device tests not only the first candidate of each phrase but also its next candidates, first for the received word and then for the dependency word, the other candidates are tested as well.
[0067]
In the kana-kanji conversion device of claim 4 and the kana-kanji conversion method of claim 8, the priority order of dependency testing is defined for the case of three or more segmented phrases: starting from one segmented phrase, the presence of dependencies with other phrases is tested by referring to the dependency information dictionary, proceeding sequentially from the side closer to the end of the sentence. Kana-kanji conversion candidates that follow natural Japanese syntax can therefore be obtained by using dependencies.
[0068]
In the kana-kanji conversion device of claim 5, the dependency test near the end of the sentence is performed first, and then, for the preceding phrases, only dependencies that can hold together with the dependencies already found are accepted. Kana-kanji conversion candidates that follow natural Japanese syntax can therefore be obtained without forcing dependencies.
[Brief description of the drawings]
FIG. 1 is a functional block diagram showing the configuration of the kana-kanji conversion function in a kana-kanji conversion apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating the hardware that realizes the kana-kanji conversion device of the embodiment.
FIG. 3 is a flowchart showing the phrase segmentation process executed in the phrase segmentation unit 102.
FIG. 4 is an explanatory diagram showing an example of phrase segmentation by the minimum cost method in the embodiment.
FIG. 5 is an explanatory diagram showing an example sentence subjected to the dependency test in the embodiment.
FIG. 6 is a flowchart showing the end position search processing routine.
FIG. 7 is a flowchart showing the dependency test processing routine in the embodiment.
FIG. 8 is an explanatory diagram illustrating an example of the dependency dictionary in the embodiment.
FIG. 9 is an explanatory diagram showing how dependencies are established across multiple phrases.
FIG. 10 is an explanatory diagram showing priorities in dependency testing.
FIG. 11 is an explanatory diagram showing the possible patterns when one input character string contains multiple dependencies.
FIG. 12 is a flowchart showing the dependency test range determination processing routine of the second embodiment.
FIG. 13 is an explanatory diagram showing an example sentence for the dependency test range determination process in the second embodiment.
FIG. 14 is an explanatory diagram illustrating an example of the dependency test order within a dependency test range.
[Explanation of symbols]
21 ... CPU
22 ... ROM
23 ... RAM
24 ... Keyboard
25 ... Keyboard interface
26 ... CRT
27 ... CRTC
28 ... Printer
29 ... Printer interface
30 ... Hard disk controller (HDC)
31 ... Bus
32 ... Hard disk
40 ... Character input unit
42 ... Conversion control unit
44 ... Converted character string output unit
50 ... Character string input unit
52 ... Character storage unit
54 ... Independent word candidate creation unit
56 ... Independent word analysis position management unit
58 ... Independent word dictionary
64 ... Ancillary word candidate creation unit
66 ... Ancillary word analysis position management unit
68 ... Ancillary word dictionary
70 ... Dependency learning unit
70 ... Learning unit
72 ... Independent word learning unit
74 ... Ancillary word learning unit
76 ... Affix learning unit
78 ... Character conversion learning unit
80 ... Word data creation unit
82 ... Connection verification unit
84 ... Connection verification table
86 ... Cost calculation unit
90 ... Dependency candidate adjustment unit
92 ... Received word analysis unit
94 ... Particle permissibility analysis unit
96 ... Dependency range management unit
98 ... Dependency dictionary
99 ... Dependency transposition information adjustment unit
100 ... Word data storage unit
102 ... Phrase segmentation unit
104 ... Dependency transposition information adjustment unit
106 ... Phrase data storage unit
108 ... Converted character string output unit

Claims

A kana-kanji conversion device that receives a kana character string and generates kana-kanji mixed character string candidates by referring to a grammar dictionary, comprising:
segmentation means for segmenting the input kana character string into phrases by referring to the grammar dictionary;
a dependency information dictionary storing dependency information between predetermined phrases;
dependency verification means for testing, starting from one segmented phrase and referring to the dependency information dictionary, the presence of dependencies with other phrases in order of increasing inter-phrase distance;
delimiter search means for searching for delimiters present in the input kana character string; and
dependency test completion means for setting the phrase at which a delimiter is found as the end position of the dependency test;
wherein the dependency verification means comprises, for the range from the one segmented phrase to the end position of the dependency test:
next candidate test means for testing received words sequentially over the next candidates of the target phrase, with the starting phrase as the dependency word; and
next candidate test means for continuing the received-word test, after the test by the preceding next candidate test means is complete, with the next candidate of the starting phrase as the dependency word.

A kana-kanji conversion device that receives a kana character string and generates kana-kanji mixed character string candidates by referring to a grammar dictionary, comprising:
segmentation means for segmenting the input kana character string into phrases by referring to the grammar dictionary;
range search means for searching for ranges in the input kana character string delimited by a pair of symbols indicating a start and an end;
a dependency information dictionary storing dependency information between predetermined phrases;
dependency verification means for testing, starting from one segmented phrase and referring to the dependency information dictionary, the presence of dependencies with other phrases in order of increasing inter-phrase distance; and
test range setting means for determining the dependency test range by excluding the ranges found by the range search means from the range tested by the dependency verification means;
wherein the dependency verification means comprises, within the dependency test range:
next candidate test means for testing received words sequentially over the next candidates of the target phrase, with the starting phrase as the dependency word; and
next candidate test means for continuing the received-word test, after the test by the preceding next candidate test means is complete, with the next candidate of the starting phrase as the dependency word.

A kana-kanji conversion device according to claim 1 or 2,
wherein the segmentation means treats each phrase to be segmented as consisting of an independent word, or of an independent word and ancillary words, and segments the input kana character string into the combination that minimizes the number of independent words constituting it.

A kana-kanji conversion device according to claim 1 or 2, wherein
when there are three or more segmented phrases, the dependency verification means tests, starting from one segmented phrase and referring to the dependency information dictionary, the presence of dependencies with other phrases sequentially from the side of the three or more phrases closer to the end of the sentence.

A kana-kanji conversion device according to claim 4,
wherein the dependency verification means comprises:
first verification means for first performing tests starting from the dependency word closest to the end of the sentence; and
second verification means for, when the first verification means finds a dependency, examining the phrases preceding the starting phrase and extracting as a result only those dependencies that can hold together with the dependency found by the first verification means.

A kana-kanji conversion method that receives a kana character string and generates kana-kanji mixed character string candidates by referring to a grammar dictionary, wherein:
the input kana character string is segmented into phrases by referring to the grammar dictionary;
starting from one segmented phrase, and referring to a dependency information dictionary storing dependency information between predetermined phrases, the presence of dependencies with other phrases is tested in order of increasing inter-phrase distance;
delimiters present in the input kana character string are searched for;
the phrase at which a delimiter is found is taken as the end position of the dependency test; and
further, in the dependency test, over the range from the one segmented phrase to the end position of the dependency test,
received words are tested sequentially over the next candidates of the target phrase, with the starting phrase as the dependency word, and
after the test over the next candidates is complete, the received-word test is continued with the next candidate of the starting phrase as the dependency word.

A kana-kanji conversion method that receives a kana character string and generates kana-kanji mixed character string candidates by referring to a grammar dictionary, wherein:
ranges in the input kana character string delimited by a pair of symbols indicating a start and an end are searched for;
the input kana character string is segmented into phrases by referring to the grammar dictionary;
while the searched ranges are excluded from the dependency test range, the presence of dependencies with other phrases is tested, starting from one segmented phrase and referring to a dependency information dictionary storing dependency information between predetermined phrases, in order of increasing inter-phrase distance; and
further, in the dependency test,
received words are tested sequentially over the next candidates of the target phrase, with the starting phrase as the dependency word, and
after the test over the next candidates is complete, the received-word test is continued with the next candidate of the starting phrase as the dependency word.

A kana-kanji conversion method according to claim 6 or 7,
wherein, in the dependency test, when there are three or more segmented phrases, the presence of dependencies with other phrases is tested, starting from one segmented phrase and referring to the dependency information dictionary storing dependency information between predetermined phrases, sequentially from the side of the three or more phrases closer to the end of the sentence.