JP3847801B2

JP3847801B2 - Character processing apparatus and processing method thereof

Info

Publication number: JP3847801B2
Application number: JP23473994A
Authority: JP
Inventors: 雄二小林; 千佳小楠
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1994-09-29
Filing date: 1994-09-29
Publication date: 2006-11-22
Anticipated expiration: 2021-11-22
Also published as: JPH0895972A

Description

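[0001]
[Field of Industrial Application]
The present invention relates to a character processing apparatus, and a processing method therefor, that takes Japanese sentences as input and narrows down the phrase (bunsetsu) structure of a target sentence. It has a function for checking the phrase structure formed by dependency relations between phrases, as well as the reading, notation, and part of speech of the words that make up each phrase.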
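[0002]
[Prior Art]
A known character processing apparatus that analyzes the phrase structure of Japanese input sentences is the kana-kanji conversion apparatus. It accepts a kana reading, analyzes its phrase structure, and converts it into an appropriate kanji character string.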
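[0003]
In kana-kanji conversion, the input reading string is analyzed to create possible phrase candidates. Conversion candidates are then determined from combinations of these phrase candidates and presented in order of likelihood, and the operator selects the desired candidate from those presented. Conventional approaches have tried to make the first presented candidate more likely to be correct in two ways: by narrowing the conversion candidates according to part-of-speech agreement, and by ordering the candidates by word frequency.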
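[0004]
Another proposed method determines the phrase structure by referring to case patterns of sentence types, called valency patterns, and converts the input into an appropriate kanji character string.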
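[0005]
A technique called example-based conversion has also been used. Word pairs that occur together, such as 「本を読む」 ("read a book"), are registered in advance in an example dictionary, and at conversion time the first candidate is determined according to the examples in that dictionary. Besides examples that describe relations between specific words, as in 「本を読む」, examples have been proposed that describe relations with semantic classes, which are groups of words collected by meaning. For instance, if the example 「［具象物］に書く」 ("write on [concrete object]") is registered, it also applies to 「紙」 ("paper"), which belongs to [concrete object], so 「紙に書く」 ("write on paper") is obtained as the first conversion candidate. Registering semantically possible word pairs in the example dictionary and using this mechanism made a correct conversion result more likely.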
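[0006]
[Problems to be Solved by the Invention]
However, conventional phrase structure analysis that refers to examples consulted only applicable examples. As a result, the example dictionary had to contain every semantically possible example. For pairs not registered in it, even a clearly inconsistent combination could not be excluded, so the system sometimes output odd conversion results that made no semantic sense. The number of semantically possible examples is also enormous, and so is the memory an example dictionary needs to store all of them. This made it difficult to achieve highly accurate analysis at low cost. In addition, depending on the field of the document, giving priority to an interpretation because an example applies can yield a semantically inconsistent analysis. Proverbs and figurative expressions, for instance, should preferably not be given priority in technical fields. The idiom 「雲をつかむ」 (literally "grab a cloud", used figuratively for something vague or elusive) is one example: it should not be applied in a technical document about living organisms.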
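[0007]
The present invention has been made in view of the above. Its object is to provide a character processing apparatus and a processing method therefor in which word pairs that are inappropriate as the first analysis candidate for the phrase structure are treated as suppression examples. These suppression examples are stored separately from examples suitable as first analysis candidates, together with information on the fields in which they should be suppressed. When a suppression example that matches the field of the input document is detected while the phrase structure is being determined, the word pair that matches it is prevented from being output as the first analysis candidate.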
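[0008]
[Means for Solving the Problems]
To achieve the above object, the character processing apparatus of the present invention has the following configuration.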
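[0009]
The apparatus comprises:
word dictionary means for storing the readings and notations of words in association with each other;
example storage means for storing the following items in association with each other: the notations of the first and second words that make up an example; an example type indicating whether the example is to be prioritized or suppressed; and an example application field, which names the specific field the example applies to or, if there is none, indicates that it applies in general;
input means for inputting a kana character string;
field designation information accepting means for accepting, from the operator, field designation information that designates the field to be given priority when converting character strings input through the input means;
field designation information storage means for storing the field designation information accepted by the field designation information accepting means;
phrase candidate creation means for dividing the character string input through the input means into phrase candidates by referring to the word dictionary means;
search means for searching the phrase candidates created by the phrase candidate creation means for word combinations that correspond to examples stored in the example storage means;
example type determination means for determining whether the example matched by a word combination found by the search means is of the type to be prioritized or the type to be suppressed;
field determination means for determining whether the application field of the example matched by a word combination found by the search means is general, or is a specific field that matches the field designated by the field designation information stored in the field designation information storage means; and
phrase candidate determination means for determining phrase candidates as follows. When the example type determination means determines that the example is of the type to be prioritized, the word combination corresponding to the example is prioritized as a phrase candidate. When it determines that the example is of the type to be suppressed, the word combination is suppressed as a phrase candidate if the field determination means determines that the example applies in general or matches the designated field. If the example is determined not to match the designated field, the word combination is not suppressed.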
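The decision rule of the phrase candidate determination means can be sketched as a small Python function. This is a minimal illustration, not the patented implementation. The names `ExampleType`, `GENERAL`, and `decide` are hypothetical, and a field is modeled as a plain string, with `None` standing for "applies in general".

```python
from enum import Enum
from typing import Optional, Set


class ExampleType(Enum):
    PRIORITIZE = "normal"     # pattern should be preferred as a phrase candidate
    SUPPRESS = "suppression"  # pattern must not become the first candidate


GENERAL: Optional[str] = None  # hypothetical marker: the example applies in general


def decide(example_type: ExampleType, example_field: Optional[str],
           designated_fields: Set[str]) -> str:
    """Treatment of a word combination that matches one registered example.

    Returns "prioritize", "suppress", or "keep" (left as an ordinary candidate).
    """
    if example_type is ExampleType.PRIORITIZE:
        return "prioritize"
    # Suppression example: it applies if it is general or its field is designated.
    if example_field is GENERAL or example_field in designated_fields:
        return "suppress"
    return "keep"


print(decide(ExampleType.SUPPRESS, "biology", {"biology"}))  # suppress
print(decide(ExampleType.SUPPRESS, "biology", set()))        # keep
```

The third outcome, "keep", corresponds to the claim's case where a suppression example's field does not match the designated field, so the combination is neither favored nor suppressed.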
[0012]
The processing method of the character processing apparatus according to the present invention comprises the following steps.
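[0013]
The method is performed by a character processing apparatus that includes the following: word dictionary means for storing the readings and notations of words in association with each other; example storage means for storing, in association with each other, the notations of the first and second words that make up an example, an example type indicating whether the example is to be prioritized or suppressed, and an example application field, which names the specific field the example applies to or, if there is none, indicates that it applies in general; input means for inputting a kana character string; field designation information storage means for storing field designation information that designates the field to be given priority when converting character strings input from the input means; phrase candidate storage means for storing phrase candidates; processing means for executing various processes based on a program; and program storage means storing the program. The apparatus also includes phrase candidate creation means, search means, example type determination means, field determination means, and phrase candidate determination means, which are realized by the processing means working with the program storage means. The method comprises:
a phrase candidate creation step, in which the phrase candidate creation means divides the character string input from the input means into phrase candidates by referring to the word dictionary means and stores the created phrase candidates in the phrase candidate storage means;
a search step, in which the search means searches the phrase candidates stored in the phrase candidate storage means for word combinations that correspond to examples stored in the example storage means;
an example type determination step, in which the example type determination means looks up, in the example storage means, the example type of the example matched by a word combination found in the search step and determines whether that example is to be prioritized or suppressed;
a field determination step, in which the field determination means looks up, in the example storage means, the example application field of the example matched by a word combination found in the search step and determines whether that field is general, or is a specific field that matches the field designated by the field designation information stored in the field designation information storage means; and
a phrase candidate determination step, in which the phrase candidate determination means determines phrase candidates as follows. When the example type determination step determines that the example is to be prioritized, the word combination corresponding to the example is prioritized as a phrase candidate. When that step determines that the example is to be suppressed, the word combination is suppressed as a phrase candidate if the field determination step determines that the example applies in general or matches the designated field. If the example is determined not to match the designated field, the word combination is not suppressed.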
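[0015]
[Embodiment]
The present invention is described in detail below with reference to the drawings. The example used is a kana-kanji conversion apparatus whose input is limited to Japanese kana reading strings.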
[0016]
FIG. 1 shows the overall configuration of an embodiment of the present invention.
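[0017]
In the figure, reference numeral 1 denotes a microprocessor (CPU). It performs arithmetic operations, logical decisions, and similar work for character processing, and it controls the components connected to it through an address bus (AB), a control bus (CB), and a data bus (DB). The address bus carries address signals that designate the component the CPU 1 is controlling. The control bus carries the control signals that the CPU 1 applies to each component it controls. The data bus transfers data between the components. Reference numeral 2 denotes a read-only memory (ROM), and 2a denotes a program area (PA) that stores the control procedures executed by the CPU 1 (FIGS. 9 to 15) and other data.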
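[0018]
Reference numeral 3 denotes a writable random access memory (RAM) with 16-bit words, used to temporarily store various data from the components. 3a is a word dictionary (WDIC) used for kana-kanji conversion, and its structure is shown in FIG. 3. 3b is an example dictionary (YRDIC) that stores combinations of phrases, and its structure is shown in FIG. 4. 3c is an input reading buffer memory (YBUF) that stores readings entered from the keyboard. 3d is a document buffer (TBUF), a memory for storing document information entered from the keyboard. 3e is a phrase candidate table (BCT) that stores phrase candidates during kana-kanji conversion, and its structure is shown in FIG. 6. 3f is a homophone buffer memory (DBF) that stores unconfirmed phrase candidates of kana-kanji conversion, and its structure is shown in FIG. 7. 3g is a homophone pool memory (DBP) that stores the other kana-kanji conversion candidates corresponding to the homophone buffer memory, and its structure is shown in FIG. 8. 3h is a flag (BFLG) that designates priority conversion for words in the word dictionary that carry field information, and its structure is shown in FIG. 5.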
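[0019]
Reference numeral 4 denotes an external memory (DISK) for storing standard documents. It stores the documents that are created, and a stored document is recalled when needed by an instruction from the keyboard. Reference numeral 5 denotes a keyboard (KB). It has character and symbol input keys, such as alphabet, hiragana, and katakana keys, and various function keys, such as a conversion key for instructing conversion. The keys used here are as follows:
5a, a key for entering readings (YOMI);
5b, a conversion instruction key (CON) for converting the entered reading;
5c, a next-candidate conversion instruction key (NXT) for replacing the conversion candidate with the next candidate;
5d, a selection key (SEL) that confirms the currently displayed homophone and also instructs the apparatus to learn its notation;
5e, a biology field designation key (BIO) that designates priority conversion of biology-field words stored in the word dictionary WDIC.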
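[0020]
Reference numeral 6 denotes a cursor register (CR), whose contents are read and written by the CPU 1. The CRT controller described below displays a cursor on the display device at the position corresponding to the address stored in this register. Reference numeral 7 denotes a display buffer memory (DBUF), which stores patterns such as the document information held in the document buffer 3d. Reference numeral 8 denotes a CRT controller (CRTC), which displays the contents of the cursor register 6 and the display buffer 7 on the display device. Reference numeral 9 denotes a display device (CRT) using a cathode ray tube or the like. The dot patterns and the cursor shown on it are controlled by the CRT controller 8. Reference numeral 10 denotes a character generator (CG), which stores the patterns of the characters and symbols displayed on the display device CRT.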
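[0021]
The character processing apparatus composed of these components operates in response to inputs from the keyboard 5. When a key input arrives from the keyboard 5, an interrupt signal is first sent to the CPU 1. The CPU 1 then reads the control instructions stored in the ROM 2 and performs control in accordance with them.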
[0022]
An example of kana-kanji conversion performed by the apparatus of this embodiment is described below with reference to FIG. 2.
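[0023]
In FIG. 2, the symbol "/" in the input reading strings on the left represents the conversion key. In (a), the input "もじをいしにかく" (moji o ishi ni kaku) is converted to 「文字を石に書く」 ("write characters on a stone") as the first candidate. This happens because the example 「［物質］に書く」 ("write on [substance]") is registered in the example dictionary 3b, and the word dictionary 3a describes 「石」 ("stone") as a [substance]. The other words with the reading "いし", 「意志」 ("will") and 「医師」 ("doctor"), are not [substance], so the example does not apply to them. Without such an example, the result could be a semantically inconsistent conversion such as 「意志に書く」 ("write on a will").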
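[0024]
In (b), the input is "みほんをみずにかく" (mihon o mizu ni kaku). As in (a), the word dictionary 3a classifies 「水」 ("water") as a [substance]. However, 「水に書く」 ("write on water") is registered in the example dictionary 3b as an example to be suppressed, so the example 「［物質］に書く」 is not applied and 「水に書く」 is excluded from the first conversion candidate. In (c), the input is "くもをつかむ" (kumo o tsukamu). When a general document is being entered, the example 「雲をつかむ」 ("grab a cloud") is applied and becomes the first conversion candidate. When a biology-field document is being entered, 「雲をつかむ」 is suppressed and the result is 「蜘蛛をつかむ」 ("grab a spider").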
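[0025]
FIG. 3 shows an example of the word dictionary 3a of the character processing apparatus of this embodiment. For each word, the word dictionary 3a has seven fields: reading (YM), notation (HK), part of speech (GI), word likelihood (FQ), semantic class (SF), field information (CT), and field likelihood (CF). The word likelihood holds a value from 1 to 5 that indicates how likely the word itself is, based on frequency information and the like. A value of 5 is the most likely, and smaller values make the word less likely as the first conversion candidate. The semantic class classifies the meaning of the word and holds values such as "concept" and "substance". It is assigned only to words whose part of speech is noun. The field likelihood indicates how likely the word is to belong to the field given in its field information.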
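As a minimal sketch, one record of the word dictionary 3a can be modeled as a Python dataclass with the seven fields listed above. The class name `WordEntry`, the `lookup` helper, and every numeric value in the sample entries are hypothetical. So are the "human" class and "medicine" tag on 「医師」. Only the readings, the notations, and the "substance"/"concept" classes come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class WordEntry:
    ym: str                   # reading (kana)
    hk: str                   # notation
    gi: str                   # part of speech
    fq: int                   # word likelihood: 1 (least likely) .. 5 (most likely)
    sf: Optional[str] = None  # semantic class, assigned to nouns only
    ct: Optional[str] = None  # field information
    cf: int = 0               # field likelihood


# Hypothetical entries: readings and notations follow FIG. 2,
# but the likelihood values, "human" class and "medicine" tag are invented.
WDIC: List[WordEntry] = [
    WordEntry("いし", "石", "noun", 3, sf="substance"),
    WordEntry("いし", "意志", "noun", 4, sf="concept"),
    WordEntry("いし", "医師", "noun", 4, sf="human", ct="medicine", cf=2),
    WordEntry("かく", "書く", "verb", 5),
]


def lookup(reading: str) -> List[WordEntry]:
    """All entries with this reading, most likely first (ties keep table order)."""
    return sorted((w for w in WDIC if w.ym == reading), key=lambda w: -w.fq)


print([w.hk for w in lookup("いし")])                         # ['意志', '医師', '石']
print([w.hk for w in lookup("いし") if w.sf == "substance"])  # ['石']
```

The second line shows why an example written with the class [substance] selects 「石」 among the homophones of "いし", even though 「石」 has the lowest word likelihood in this made-up data.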
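[0026]
FIG. 4 shows an example of the example dictionary 3b of the character processing apparatus of this embodiment. For each example, the example dictionary 3b has five fields: first word information (LW), second word information (RW), particle connection information (CJ), example application type (TP), and example application field (BY). The first and second word information hold pointers to the beginning of the corresponding words in the word dictionary 3a. Either word information field can hold a semantic class instead of a word, so an example can designate a whole group of words that share that class. The particle connection information describes the particle that connects the first and second word information. The example application type holds one of two types, "normal type" or "suppression type". A normal-type example is one whose registered pattern should be given priority. A suppression-type example is one whose registered pattern must not become the first candidate. The example application field holds the field in which the example should be applied or suppressed. For example, the example 「［物質］に書く」 in the figure applies in general to words with the semantic class [substance]. The example 「雲をつかむ」 applies outside the biology field, but in the biology field it should be prevented from becoming the first candidate.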
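The example dictionary 3b can be sketched in the same way. In this illustration a slot holds either a word or a semantic class, which mirrors the option described above. Two details are assumptions. 「水に書く」 is given the general field (`None`), and 「雲をつかむ」 is modeled as two entries: a general normal-type entry and a biology suppression-type entry. The text does not say how FIG. 4 encodes both behaviors. The names `Slot`, `Example`, and `find_examples` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Slot:
    """One side of an example: either a specific word or a semantic class."""
    word: Optional[str] = None
    sem_class: Optional[str] = None

    def matches(self, notation: str, sem_class: Optional[str]) -> bool:
        if self.word is not None:
            return self.word == notation
        return self.sem_class is not None and self.sem_class == sem_class


@dataclass(frozen=True)
class Example:
    lw: Slot           # first word information
    rw: Slot           # second word information
    cj: str            # particle connecting the two words
    tp: str            # example application type: "normal" or "suppression"
    by: Optional[str]  # example application field; None = general

    def matches(self, left, left_class, particle, right, right_class=None) -> bool:
        return (self.cj == particle
                and self.lw.matches(left, left_class)
                and self.rw.matches(right, right_class))


# Hypothetical contents modeled on FIG. 2 and FIG. 4; the field values are assumptions.
YRDIC: List[Example] = [
    Example(Slot(sem_class="substance"), Slot(word="書く"), "に", "normal", None),
    Example(Slot(word="水"), Slot(word="書く"), "に", "suppression", None),
    Example(Slot(word="雲"), Slot(word="つかむ"), "を", "normal", None),
    Example(Slot(word="雲"), Slot(word="つかむ"), "を", "suppression", "biology"),
]


def find_examples(left, left_class, particle, right, right_class=None) -> List[Example]:
    return [e for e in YRDIC if e.matches(left, left_class, particle, right, right_class)]


print([e.tp for e in find_examples("石", "substance", "に", "書く")])  # ['normal']
print([e.tp for e in find_examples("水", "substance", "に", "書く")])  # ['normal', 'suppression']
```

For 「水に書く」 both the class-based normal example and the word-based suppression example match. Deciding which one wins is the job of the search processing in steps S503 to S505.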
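[0027]
FIG. 5 shows an example of the field designation flag 3h of the character processing apparatus of this embodiment. In the field designation flag 3h, each field corresponds to one bit, which indicates whether priority conversion is performed for that field. In the state shown in the figure, the "biology" field, whose bit is "1", is more readily given priority in conversion than the other fields.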
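The field designation flag 3h is a bitmask with one bit per field. A minimal Python sketch follows. The bit order is hypothetical: only "medicine" and "biology" are named among the eight fields in this description, so the other six are placeholders. `clear_field` is included for completeness, although the text does not describe how a designation is cleared.

```python
from typing import Set, Tuple

# Hypothetical bit order; only "medicine" and "biology" are named in the text.
FIELDS: Tuple[str, ...] = ("medicine", "biology", "field2", "field3",
                           "field4", "field5", "field6", "field7")


def set_field(bflg: int, field: str) -> int:
    """Set the bit for `field`, as the BIO key does in step S107."""
    return bflg | (1 << FIELDS.index(field))


def clear_field(bflg: int, field: str) -> int:
    """Clear the bit for `field` (not described in the text; added for completeness)."""
    return bflg & ~(1 << FIELDS.index(field))


def designated_fields(bflg: int) -> Set[str]:
    """All fields whose bit is 1, which is what step S705 reads out."""
    return {f for i, f in enumerate(FIELDS) if (bflg >> i) & 1}


bflg = set_field(0, "biology")
print(bin(bflg), designated_fields(bflg))  # 0b10 {'biology'}
```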
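[0028]
FIG. 6 shows an example of the phrase candidate table 3e of the character processing apparatus of this embodiment. The table represents every sequence of phrases that can be analyzed from the beginning to the end of the input reading string as a tree whose nodes are individual phrase candidates. It consists of five fields:
BCID, an ID number that uniquely identifies each phrase candidate.
BCJW, the independent word of the phrase candidate, stored as a pointer to that word's entry in the word dictionary 3a.
BCFW, the string of functional words that forms the ancillary part of the phrase candidate, stored as the start and end indices of those words in the input reading buffer 3c. The symbol "φ" means that there is no functional word string.
SLNK, the ID of another phrase candidate that starts at the same reading position. A value of "-1" is a nonexistent ID that serves as a terminal code, meaning that no further phrase starts at that position.
DLNK, the ID of the phrase candidate that follows this one, stored in the same way as SLNK. As with SLNK, a value of "-1" means that no phrase candidate follows.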
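[0029]
For example, the phrase candidate 「水に」 ("on water") with ID "0" has two other candidates that start at the same reading position: 「見ずに」 ("without looking") and 「診ずに」 ("without examining"). Its following candidate is 「各」 ("each"), and 「核」 ("nucleus"), 「書く」 ("write"), and 「欠く」 ("lack") start at the same position as 「各」.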
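The following sketch shows how the SLNK and DLNK links of the phrase candidate table 3e encode every phrase candidate sequence for the reading "みずにかく". It is a hypothetical reconstruction. The DLNK targets and the order of IDs are assumed, and the single `surface` field replaces the separate BCJW pointer and BCFW indices for readability.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass(frozen=True)
class PhraseCandidate:
    surface: str  # stands in for BCJW (independent word) + BCFW (functional words)
    slnk: int     # next candidate with the same reading start position; -1 = none
    dlnk: int     # first candidate of the following phrase; -1 = none


# Hypothetical table for "みずにかく" after paragraph [0029]; the dict key plays BCID.
BCT: Dict[int, PhraseCandidate] = {
    0: PhraseCandidate("水に", 1, 3),
    1: PhraseCandidate("見ずに", 2, 3),
    2: PhraseCandidate("診ずに", -1, 3),
    3: PhraseCandidate("各", 4, -1),
    4: PhraseCandidate("核", 5, -1),
    5: PhraseCandidate("書く", 6, -1),
    6: PhraseCandidate("欠く", -1, -1),
}


def sequences(table: Dict[int, PhraseCandidate], start: int = 0) -> Iterator[List[str]]:
    """Yield every phrase candidate sequence that begins at candidate `start`."""
    bcid = start
    while bcid != -1:            # walk the SLNK chain of alternatives
        cand = table[bcid]
        if cand.dlnk == -1:      # no following phrase: the sequence ends here
            yield [cand.surface]
        else:                    # follow DLNK to the next phrase position
            for rest in sequences(table, cand.dlnk):
                yield [cand.surface] + rest
        bcid = cand.slnk


all_seqs = list(sequences(BCT))
print(len(all_seqs), all_seqs[0])  # 12 ['水に', '各']
```

Three alternatives for the first phrase times four for the second give the twelve sequences that the first candidate determination processing later scores one by one.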
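[0030]
FIG. 7 shows an example of the homophone buffer 3f of the character processing apparatus of this embodiment. Its fields are as follows:
HID, the homophone ID, which uniquely identifies each homophone created for a phrase.
JW, the independent word of the homophone, stored as a pointer into the word dictionary 3a.
FW, the functional word string of the homophone, stored in the same kana reading codes as the reading field of the word dictionary 3a.
PHID, the homophone ID of the following paired homophone that forms an example pair with this homophone. A value of "-1" means that there is no following paired homophone.
APYR, the example that was applied between this homophone and its following paired homophone when the first candidate was determined, stored as a pointer to the beginning of that example in the example dictionary 3b.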
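[0031]
FIG. 8 shows an example of the homophone pool 3g of the character processing apparatus of this embodiment. The homophone buffer stores only the information for the first candidate, while the homophone pool 3g stores every word candidate that is possible for that homophone. In the figure, HID indicates which homophone an entry belongs to, using the homophone ID from the corresponding homophone buffer DBF. JWP stores an independent word that is possible as a homophone of the phrase, as a pointer into the word dictionary 3a. FWP stores the functional word string for the independent word JWP. For example, the phrase with homophone ID "10" has two homophone candidates, 「花を」 ("flower" + を) and 「鼻を」 ("nose" + を).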
[0032]
The operation of this embodiment is described next with reference to the flowchart in FIG. 9.
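[0033]
In step S101, the apparatus waits for a key on the keyboard 5 to be pressed and generate an interrupt. When a key is entered, the process proceeds to step S102, which identifies the key. The process then branches to one of steps S103 to S108 depending on the type of key.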
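[0034]
Step S103 runs when the reading input key (YOMI) 5a is pressed. It stores the code of the entered reading in the input reading buffer memory (YBUF) 3c. Step S104 runs when the conversion key (CON) 5b is pressed. It converts the character string accumulated in step S103 into kanji and outputs the result to an output buffer (not shown). Step S105 runs when the next-candidate conversion key (NXT) 5c is pressed. It displays another candidate for the homophone, taken from the homophone pool that corresponds to the homophone buffer output in step S104.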
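[0035]
Step S106 runs when the selection key (SEL) 5d is pressed. It confirms the homophones in the output buffer shown on the screen and outputs the confirmed character string to the document. It also checks whether the selected word agrees with the example that was applied to determine the first candidate. If necessary, it stores the applied example as a suppression pattern, associated with field information. Step S107 runs when the biology field designation key (BIO) 5e is pressed. It sets to "1" the bit of the priority conversion field designation flag (BFLG) 3h that corresponds to the designated field. Step S108 runs when any key other than YOMI, CON, NXT, SEL, or BIO is pressed, for example a document-editing key such as a cursor movement key. This processing is common in character processing apparatuses of this kind and is well known, so it is not described here.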
[0036]
The "conversion processing" of step S104 is described in detail next with reference to the flowchart in FIG. 10.
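[0037]
In step S201, phrase candidate creation processing, described in detail later, creates the phrase candidate table (BCT) 3e. In step S202, first candidate determination processing, also described later, narrows the phrase candidates in the phrase candidate table 3e down to the one best suited as the first conversion candidate. In step S203, a conversion result based on the first candidate from step S202 is created in the homophone buffer (DBF) 3f and output. At the same time, the other candidates with the same reading start position and phrase length that did not become the first candidate are stored in the homophone pool (DBP) 3g.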
[0038]
The "phrase candidate creation processing" of step S201 is described in detail below with reference to the flowchart in FIG. 11.
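[0039]
In step S301, two indices are initialized to "0": the phrase candidate creation start index i into the input reading buffer (YBUF) 3c, and the index j into the phrase candidate table (BCT) 3e. In step S302, the word dictionary (WDIC) 3a is searched using the reading in the reading buffer 3c at position i, which yields word candidates. In step S303, phrase connection test processing analyzes the functional word strings that can attach to each word candidate found, which yields phrase candidates. In step S304, the phrase candidates are stored in the phrase candidate table 3e. The independent word from the word search goes into the BCJW field, and the functional word string from the phrase connection test goes into the BCFW field. SLNK and DLNK are also set on the other phrase candidates that can link to this one. After all the information has been stored, the index j of the phrase candidate table 3e is incremented in step S305.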
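[0040]
In step S306, the process checks whether every word whose phrase candidate start position is i, that is, every word with the same reading start position, has been searched. If words with that start position remain, the process returns to step S302 and continues the search. Otherwise it proceeds to step S307. There, the phrase end position given by the functional word reading string recorded in the phrase candidate table 3e is taken as the reading start position of the following phrase, and i is set to that position. If all phrase candidates have terminated at this point, that is, if the DLNK values in the phrase candidate table 3e are "-1", the end of the input reading has been reached and phrase candidate creation ends. Otherwise, the process returns to step S302 with the new value of i and continues creating the following phrase candidates.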
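Steps S301 to S307 amount to building a lattice of phrase candidates one start position at a time. The sketch below is a simplified worklist formulation, not a transcription of FIG. 11, and it rests on three assumptions. The dictionary is a plain dict from reading to notations. The phrase connection test is reduced to checking a fixed set of particle strings, with "" allowed for a bare word. And `build_candidates` is a hypothetical name.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Candidate = Tuple[int, int, str, str]  # (start, end, independent word, functional words)


def build_candidates(reading: str, word_dict: Dict[str, List[str]],
                     particles: Set[str]) -> List[Candidate]:
    """Worklist approximation of steps S301-S307."""
    candidates: List[Candidate] = []
    pending = deque([0])          # reading start positions still to analyse (index i)
    seen: Set[int] = set()
    while pending:
        i = pending.popleft()
        if i in seen or i >= len(reading):
            continue              # already analysed, or end of the input reached
        seen.add(i)
        for j in range(i + 1, len(reading) + 1):
            for word in word_dict.get(reading[i:j], []):      # S302: dictionary search
                for p in sorted(particles):                   # S303: connection test
                    if reading.startswith(p, j):
                        end = j + len(p)
                        candidates.append((i, end, word, p))  # S304: store candidate
                        pending.append(end)                   # S307: next start position
    return candidates


print(build_candidates("みずにかく", {"みず": ["水"], "かく": ["書く", "欠く"]}, {"", "に"}))
```

For this call the result is `[(0, 2, '水', ''), (0, 3, '水', 'に'), (3, 5, '書く', ''), (3, 5, '欠く', '')]`. The first tuple, 水 with no particle, is a dead end because no word starts at position 2, so it never ends up in a complete sequence.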
[0041]
The "first candidate determination processing" of step S202 is described in detail next with reference to the flowchart in FIG. 12.
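[0042]
In step S401, a counter memory max, which serves as the index value for determining the first candidate, is initialized to "0". max holds the largest likelihood among all phrase candidates, and 0 is the smallest possible likelihood. In step S402, one phrase candidate sequence is taken from the phrase candidate table 3e. In step S403, the phrase-pair likelihood of that sequence is calculated as follows.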
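[0043]
(phrase-pair likelihood) = Σ(word likelihood) + Σ(inter-phrase likelihood) + Σ(field likelihood of words)
The word likelihood is the value of the FQ field of the word dictionary 3a. The inter-phrase likelihood indicates how plausible the connection between phrases is, and in this embodiment it is fixed at "0". A word's field likelihood is CF × 8, that is, eight times its field likelihood, when the field information in the CT field of the word dictionary 3a matches the field designated at conversion time by the field designation flag (BFLG) 3h.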
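The phrase-pair likelihood formula can be written as a short function. This sketch assumes two things that are consistent with the text but not spelled out in it. A word whose CT field does not match a designated field contributes 0 field likelihood, and there is one inter-phrase term per adjacent pair of phrases. Each word is represented as a hypothetical `(FQ, CT, CF)` tuple.

```python
from typing import Iterable, Optional, Set, Tuple

Word = Tuple[int, Optional[str], int]  # (FQ, CT, CF) of one independent word

INTER_PHRASE_LIKELIHOOD = 0  # fixed at 0 in this embodiment


def phrase_pair_likelihood(words: Iterable[Word], designated: Set[str]) -> int:
    words = list(words)
    word_sum = sum(fq for fq, _, _ in words)
    # Assumed: one inter-phrase term per adjacent pair of phrases.
    inter_sum = INTER_PHRASE_LIKELIHOOD * max(len(words) - 1, 0)
    # Field likelihood CF x 8 only when CT matches a designated field, else 0 (assumed).
    field_sum = sum(cf * 8 for _, ct, cf in words if ct is not None and ct in designated)
    return word_sum + inter_sum + field_sum


# Two words: FQ 3 with no field, and FQ 4 tagged "biology" with CF 2.
print(phrase_pair_likelihood([(3, None, 0), (4, "biology", 2)], {"biology"}))  # 23
print(phrase_pair_likelihood([(3, None, 0), (4, "biology", 2)], set()))        # 7
```

The factor of 8 lets a single field-tagged word outweigh the 1 to 5 spread of ordinary word likelihoods once its field is designated: 3 + 4 + 2 × 8 = 23.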
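[0044]
In step S404, the examples that apply to the extracted phrase candidate sequence are searched for, and the example application likelihood is added to the phrase-pair likelihood. This example dictionary search processing is described later with reference to FIG. 13. In step S405, the process checks whether the calculated phrase-pair likelihood is greater than the maximum likelihood max. If it is, max is updated to the calculated value in step S406. In step S407, the phrase candidate sequence corresponding to this maximum likelihood is stored in a work memory did. The work memory did has the same structure as the phrase candidate table 3e, except that its SLNK and DLNK values are set to link only the phrase candidates of the sequence with the maximum likelihood, one after another.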
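[0045]
In step S408, the process checks whether another phrase candidate sequence can be taken from the phrase candidate table 3e. If so, it returns to step S402 and continues. When all phrase candidate sequences have been processed and none remain, the process returns.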
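Steps S401 to S408 are an argmax over the candidate sequences with a strict comparison. A minimal sketch follows. `choose_first` is a hypothetical name and the scores are illustrative only. Because `best_score` starts at 0 and only a strictly greater score replaces it, a sequence suppressed to 0 can never be chosen, and among equal scores the first sequence examined wins. The text does not say what happens if every sequence scores 0; this sketch returns `None` in that case.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

S = TypeVar("S")


def choose_first(seqs: Iterable[S], score: Callable[[S], int]) -> Tuple[Optional[S], int]:
    """Steps S401-S408: keep the sequence with the strictly largest likelihood."""
    best_score = 0                     # S401: max starts at 0
    best: Optional[S] = None           # plays the role of the work memory "did"
    for seq in seqs:                   # S402/S408: every candidate sequence
        s = score(seq)                 # S403/S404: likelihood incl. example adjustment
        if s > best_score:             # S405: strictly greater, so 0 never wins
            best_score, best = s, seq  # S406/S407
    return best, best_score


# Illustrative scores: the first sequence has been suppressed to 0.
scores = {"雲を/つかむ": 0, "蜘蛛を/つかむ": 7}
print(choose_first(scores, scores.__getitem__))  # ('蜘蛛を/つかむ', 7)
```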
[0046]
The "example dictionary search processing" of step S404 is described in detail next with reference to the flowchart in FIG. 13.
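[0047]
In step S501, a phrase of interest is taken from the phrase candidate sequence extracted in step S402 of FIG. 12, and the example dictionary (YRDIC) 3b is searched for a paired phrase candidate that forms an example with it. In step S502, the process checks whether such a paired phrase candidate was found. If not, the process returns. If one was found, the processing from step S503 onward is performed.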
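[0048]
In step S503, the type of the found example is checked. If it is a suppression-type example, its application field is checked in step S504. If the application field matches the field information indicated by the field designation flag (BFLG) 3h, the process proceeds to step S505, which sets the phrase-pair likelihood to "0" and so prevents this phrase from becoming the first candidate. This works because max starts at 0 and is updated only when a likelihood is strictly greater (step S405), so a sequence with likelihood 0 can never be selected.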
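[0049]
If step S503 determines that the example is of the normal type, a fixed value of 1012 is added to the phrase-pair likelihood as the example application likelihood, and the process returns.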
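The example adjustment of steps S503 to S505 and paragraph [0049] can be sketched as follows. Two choices here are assumptions rather than statements of the text. First, a general-field suppression example (field `None`) always applies, following the claims. Second, when one word pair matches both a normal example and an applicable suppression example, suppression wins. `apply_examples` is a hypothetical name.

```python
from typing import Iterable, Optional, Set, Tuple

NORMAL_BONUS = 1012  # example application likelihood from paragraph [0049]


def apply_examples(likelihood: int,
                   matched: Iterable[Tuple[str, Optional[str]]],
                   designated: Set[str]) -> int:
    """Adjust a phrase-pair likelihood for the examples found in S501/S502.

    `matched` holds (type, field) pairs; field None means "general" (assumption).
    """
    matched = list(matched)
    for tp, by in matched:                                     # S503/S504
        if tp == "suppression" and (by is None or by in designated):
            return 0                                           # S505
    if any(tp == "normal" for tp, _ in matched):
        return likelihood + NORMAL_BONUS                       # [0049]
    return likelihood                                          # no example applies


kumo = [("normal", None), ("suppression", "biology")]  # the two 「雲をつかむ」 entries
print(apply_examples(9, kumo, set()))        # 1021
print(apply_examples(9, kumo, {"biology"}))  # 0
```

The large fixed bonus means a sequence supported by a normal example almost always beats one that relies on word and field likelihoods alone. Setting a likelihood to 0 removes the sequence from contention entirely.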
[0050]
Returning to FIG. 9, the "next candidate display processing" of step S105 is described in detail with reference to the flowchart in FIG. 14.
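[0051]
In step S601, the homophone ID of the homophone whose next candidate is to be displayed is obtained. In step S602, the next candidate with the same homophone ID is taken from the homophone pool (DBP) 3g, using the pool's independent word JWP and functional word string FWP. It is stored in the corresponding JW and FW fields of the homophone buffer (DBF) 3f and displayed.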
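Next-candidate display is a lookup in the homophone pool keyed by the homophone ID. In this sketch the buffer is a dict and the pool is a list of tuples. Cycling back to the first candidate after the last one is an assumption, because the text does not say what NXT does at the end of the list. `next_candidate` is a hypothetical name.

```python
from typing import Dict, List, Tuple

PoolRow = Tuple[int, str, str]  # (HID, JWP, FWP)


def next_candidate(pool: List[PoolRow], dbf: Dict[int, Tuple[str, str]],
                   hid: int) -> Tuple[str, str]:
    """Steps S601-S602: copy the next pool entry for `hid` into the homophone buffer."""
    options = [(jwp, fwp) for h, jwp, fwp in pool if h == hid]
    current = dbf[hid]
    k = options.index(current) if current in options else -1
    dbf[hid] = options[(k + 1) % len(options)]  # wrap-around is an assumption
    return dbf[hid]


pool = [(10, "花", "を"), (10, "鼻", "を")]
dbf = {10: ("花", "を")}
print(next_candidate(pool, dbf, 10))  # ('鼻', 'を')
print(next_candidate(pool, dbf, 10))  # ('花', 'を')
```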
[0052]
The "selection processing" of step S106 is described in detail next with reference to the flowchart in FIG. 15.
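[0053]
In step S701, the homophone ID of the homophone being selected is obtained. In step S702, the paired homophone ID stored for that homophone ID in the homophone buffer 3f is obtained. In step S703, the independent word of the selected homophone and the independent word of the paired homophone are obtained. In step S704, the process checks whether this pair of independent words agrees with the example applied when the first candidate was determined, which is recorded in the APYR field of the homophone buffer 3f. To do this, the example indicated by APYR is retrieved from the example dictionary 3b, and its pair of independent words is compared with the selected pair. If the example indicated by APYR is written in terms of semantic classes, the check compares the semantic classes of the independent words JW in the homophone buffer instead.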
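[0054]
If the example applied when the first candidate was determined still holds at selection time, no suppression example needs to be registered, and the process returns. If the example does not match, the processing from step S705 onward registers the applied example as a suppression-type example.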
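[0055]
In step S705, the currently designated field information is read from the field designation flag (BFLG) 3h, that is, every field whose BFLG bit is "1" is retrieved. In step S706, the applied example APYR of the target homophone buffer entry is registered in the example dictionary 3b as a "suppression type" example, together with the application field information obtained in step S705. From then on, that example is not applied when the first candidate is determined in the designated fields.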
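The learning step of [0053] to [0055] can be sketched as follows. The sketch compares independent words only, so the semantic-class comparison for class-based examples is omitted. It represents the example dictionary as a list of tuples and stores the designated fields as a frozenset. `learn_on_select` and the tuple layout are hypothetical.

```python
from typing import List, Optional, Set, Tuple

ExampleRow = Tuple[str, str, str, str, frozenset]  # (LW, RW, CJ, TP, BY)


def learn_on_select(selected: Tuple[str, str],
                    applied: Optional[Tuple[str, str, str]],
                    designated: Set[str],
                    yrdic: List[ExampleRow]) -> bool:
    """Steps S704-S706, comparing words only (semantic-class matching omitted).

    selected: (independent word, paired independent word) confirmed by the operator.
    applied: (LW, RW, CJ) of the example recorded in APYR, or None.
    Returns True if a suppression example was registered.
    """
    if applied is None:
        return False
    lw, rw, cj = applied
    if (lw, rw) == selected:  # S704: the example still holds, nothing to learn
        return False
    # S705-S706: register the applied example as suppression type for the
    # currently designated fields (an empty set is a case the text leaves open).
    yrdic.append((lw, rw, cj, "suppression", frozenset(designated)))
    return True


yrdic: List[ExampleRow] = []
print(learn_on_select(("蜘蛛", "つかむ"), ("雲", "つかむ", "を"), {"biology"}, yrdic))  # True
print(yrdic[0][3], sorted(yrdic[0][4]))  # suppression ['biology']
```

This matches example (c) of FIG. 2. If 「雲をつかむ」 was offered first and the operator picks 「蜘蛛をつかむ」 while the biology field is designated, the applied example is stored as a biology-only suppression example.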
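[0056]
<Modifications>
The embodiment above used a kana-kanji conversion apparatus, whose input is limited to kana, as the example of a character processing apparatus that analyzes phrase structure. The present invention is not limited to this. Any configuration may be used as long as it keeps to the gist of the invention: analyzing the phrase structure and, based on registered counterexamples (that is, suppression examples) of relations between phrases, excluding those counterexamples from the analysis candidates. For example, the invention can also be implemented as a document proofreading apparatus. Such an apparatus takes mixed kanji-kana sentences as input, analyzes their phrase structure, checks their semantic consistency, and points out errors.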
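[0057]
In this embodiment, the word combinations in the example dictionary were pairs of two words (or semantic classes). Triples, or in general n-tuples (where n is an integer of 2 or more), can be processed in the same way. Also, the examples and the counterexamples were kept in a single dictionary, with the application type distinguishing one from the other. Keeping them in separate dictionaries works equally well.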
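[0058]
This embodiment assumed that the counterexamples already exist, but they need not be prepared in advance and may be accumulated gradually through operator actions alone. Also, a counterexample is represented here by pointers into the word dictionary for the words that make it up. Anything that identifies those words may be used instead. For example, the words may be stored by identifying attributes held in the word dictionary, such as reading, notation, and part of speech. Alternatively, each word in the word dictionary may be given a unique serial number, and the words of a counterexample may be identified by those numbers.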
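[0059]
Furthermore, storing counterexamples together with the examples that should be applied in their place can further improve the accuracy of the first-candidate analysis.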
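[0060]
Eight fields, such as "medicine" and "biology", were given as examples of designatable field information, but the fields are not limited to these eight. Field information may designate the semantic groups in which words, examples, and counterexamples are used, and it may also designate text styles such as "greetings", "editorials", and "prose". It may also be set per user, such as per operator or per operator's organization, and processed in the same way.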
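[0061]
The present invention may be applied to a system made up of multiple devices or to an apparatus consisting of a single device. It can also, of course, be applied where the invention is achieved by supplying a program to a system or apparatus.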
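[0062]
[Effects of the Invention]
As described above, the present invention can prioritize a specific word combination as a phrase candidate. It can also suppress the combination in a specific field while leaving it unsuppressed outside that field, or suppress it regardless of field. The conversion accuracy for Japanese input sentences can therefore be improved flexibly according to the field.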
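[0063]
The present invention also does not require every example that suits a phrase structure to be registered exhaustively. Instead, word pairs that do not suit a phrase structure are registered together with information on the fields in which they should be suppressed. This keeps memory capacity to a minimum, so a character processing apparatus that analyzes the phrase structure of Japanese sentences with high accuracy can be realized at low cost.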
[0064]
Furthermore, the present invention can register unsuitable word pairs by semantic class. This widens the coverage of the unsuitable counterexamples while keeping memory capacity even smaller.
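[0065]
[Brief Description of the Drawings]
[FIG. 1] A block diagram showing the overall configuration of the character processing apparatus of this embodiment.
[FIG. 2] A diagram showing examples of kana-kanji conversion performed in this embodiment.
[FIG. 3] A diagram showing the structure of the word dictionary of this embodiment.
[FIG. 4] A diagram showing the structure of the example dictionary of this embodiment.
[FIG. 5] A diagram showing the structure of the field designation flag of this embodiment.
[FIG. 6] A diagram showing the structure of the phrase candidate table of this embodiment.
[FIG. 7] A diagram showing the structure of the homophone buffer of this embodiment.
[FIG. 8] A diagram showing the structure of the homophone pool of this embodiment.
[FIG. 9] A flowchart showing the processing procedure of this embodiment.
[FIG. 10] A flowchart showing the conversion processing of this embodiment.
[FIG. 11] A flowchart showing the phrase candidate creation processing of this embodiment.
[FIG. 12] A flowchart showing the first candidate determination processing of this embodiment.
[FIG. 13] A flowchart showing the example dictionary search processing of this embodiment.
[FIG. 14] A flowchart showing the next candidate display processing of this embodiment.
[FIG. 15] A flowchart showing the selection processing of this embodiment.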
[Explanation of Reference Numerals]
1 CPU
2 ROM
3 RAM
4 External memory
5 Keyboard
6 Cursor register
7 Display buffer
8 CRT controller
9 Display device
10 Character generator
[Industrial application fields]
The present invention relates to a Japanese sentence, a character processing apparatus having a function of narrowing down a phrase structure of a target sentence, checking a phrase structure by phrase dependency, reading of words constituting the phrase, notation, and part of speech, and processing thereof It is about the method.
[0002]
[Prior art]
Conventionally, as a character processing device with a function to analyze the phrase structure of an input sentence for a Japanese sentence, input a kana reading, analyze the phrase structure, and convert it to an appropriate kanji character string There is a Kanji conversion device.
[0003]
In Kana-Kanji conversion, input reading sequences are analyzed to create possible phrase candidates, conversion candidates are determined from the combinations, and presented in the most likely order. Then, a candidate desired by the operator is selected from the presented conversion candidates. In order to increase the likelihood of conversion candidates presented as the first candidate, conventionally, a method of narrowing down conversion candidates to different parts of speech and a method of arranging them in order of word frequency have been implemented.
[0004]
Also, a method has been proposed in which a phrase structure is determined with reference to a sentence pattern case pattern called a valence pattern, and conversion processing to an appropriate kanji character string is performed.
[0005]
In addition, examples that appear in pairs, such as “read a book”, are registered in the example dictionary in advance, and the first candidate is determined according to the examples in the example dictionary by referring to the example dictionary at the time of conversion. A technique called example conversion has also been implemented. In addition to examples that describe the relationship between words such as “read a book”, examples that describe relationships with semantic classifications that group words together are proposed. For example, if the example of “write on [concrete object]” is registered, this example is also applied to “paper” which is [concrete object], and the first conversion candidate “write on paper” can be obtained. it can. By selecting a semantically possible word pair as an example to be registered in the example dictionary and using this example conversion mechanism, the possibility of obtaining a correct conversion result has increased.
[0006]
[Problems to be solved by the invention]
However, in the phrase structure analysis process that refers to the conventional example, only the applicable examples are referred to, so it is necessary to have all the semantically possible examples in the example dictionary, and the examples are not registered. On the other hand, even a combination that is clearly semantically inconsistent cannot be eliminated, and an unusually strange conversion result may be output. In addition, the number of semantically possible examples is enormous, and the capacity of the example dictionary memory for storing all these examples is enormous, making it difficult to achieve high-precision analysis efficiency at low cost. It was. Furthermore, depending on the document field, if an example is applied and priority interpretation is performed, analysis results that are semantically inconsistent may be obtained. For example, it is desirable not to give priority to ecstasy and metaphor expression in the technical field. To give an example, the idiom “grab a cloud” is not preferred in technical documents that discuss organisms.
[0007]
The present invention has been made in view of the above points, and its purpose is to distinguish an inappropriate word pair as the first analysis candidate of the phrase structure from the example suitable for the first analysis candidate. If the suppression example that matches the document field of the input sentence is detected when determining the phrase structure by storing the suppression example together with the field information to be suppressed, the first word pair that matches the suppression example It is an object to provide a character processing device and a processing method for controlling such that it is not output as an analysis candidate.
[0008]
[Means for Solving the Problems]
In order to achieve the above object, the character processing apparatus of the present invention has the following configuration.
[0009]
Word dictionary means for storing word readings and notations in association with each other, notation of first and second words constituting the example, and an example type indicating whether the example should be prioritized or suppressed , An example storage means for indicating a specific field to which the example is to be applied and indicating the field, and indicating that it should be generally applied if there is no field to be applied; Input means for inputting a character string, and field designation information for designating a field to be converted with priority given to the character string input by the input means. The field designation information accepting means received from the operator and the field designation information received by the field designation information accepting means The field designation information storage means for storing, the word dictionary means, the character string input by the input means is divided into phrase candidates, the phrase candidate creating means, and the phrase candidate created by the phrase candidate creating means The search means for searching for a combination of words corresponding to the example stored in the example storage means, and whether the combination of the words found by the search means corresponds to the example to be prioritized or the example to be suppressed The application field of the example to which the combination of the example type determination means for determination and the word found by the search means corresponds should be generally applied, or is a specific field and stored in the field designation information storage means The field determination means for determining whether the field matches the field specified by the field specification information, and the example type determination means should prioritize the type of the example. 
When it is determined as an example, the combination of words corresponding to the example is given priority as a phrase candidate, and when the example type determination unit determines that the example type should be suppressed, the field determination unit Therefore, if it is determined that it should be generally applied or matches the specified field, the word combination corresponding to the example is suppressed as a phrase candidate and determined not to match the specified field In such a case, the combination of words corresponding to the example is not suppressed as a phrase candidate, and has phrase candidate determination means for determining a phrase candidate.
[0012]
The processing method of the character processing apparatus according to the present invention includes the following steps.
[0013]
Word dictionary means for storing word readings and notations in association with each other, notation of first and second words constituting the example, and an example type indicating whether the example should be prioritized or suppressed , An example storage means for indicating a specific field to which the example is to be applied and indicating the field, and indicating that it should be generally applied if there is no field to be applied; Input means for inputting a character string, field designation information storage means for storing field designation information for designating a field to be converted with priority given to a character string input from the input means, and a phrase for storing phrase candidates Candidate storage means; processing means for executing various processes based on a program; A phrase candidate creation means, a search means, an example type judgment means, a field judgment means, which are realized by cooperating with the processing means and the program storage means. Means for determining phrase candidates A processing method of a character processing device comprising: Create phrase candidates Means for dividing the character string input from the input means into phrase candidates with reference to the word dictionary means, and storing the created phrase candidates in the phrase candidate storage means; Search A means for searching for a combination of words corresponding to the example stored in the example storage unit among the phrase candidates stored in the phrase candidate storage unit; Example type judgment An example type determination step in which means refers to an example type of an example to which a combination of words found by the search step corresponds from the example storage unit, and determines whether the example type is a priority example or a suppression example. 
And said Field judgment The means refers to the example application field of the example to which the combination of words found in the search step corresponds from the example storage unit, and the example application field should be generally applied or is a specific field. A field determination step of determining whether or not the field specified by the field specification information stored in the field specification information storage means matches the field Determination of phrase candidates When it is determined in the example type determination step that the example type of the example is a priority example, the means prioritizes a combination of words corresponding to the example as a phrase candidate, and the example type determination step If the example type is determined to be an example to be suppressed, the field determination step should generally apply, or if it is determined to match the specified field, the word corresponding to the example If it is determined that the combination is not matched with the specified field, the combination of words corresponding to the example is not suppressed as a phrase candidate, and the phrase candidate is determined. Process.
[0015]
【Example】
Hereinafter, the present invention will be described in detail with reference to the drawings, taking as an example a kana-kanji conversion device in which an input target is limited to a Japanese kana reading sequence.
[0016]
FIG. 1 is a diagram showing an overall configuration of an embodiment according to the present invention.
[0017]
In the figure, reference numeral 1 denotes a microprocessor (CPU) which performs operations for character processing, logical determination, etc., and is connected via an address bus (AB), a control bus (CB), and a data bus (DB). Control each component. Here, the address bus transfers an address signal indicating a component to be controlled by the CPU 1. The control bus transfers and applies a control signal of each component to be controlled by the CPU 1. The data bus transfers data between the component devices. Reference numeral 2 denotes a read-only fixed memory (ROM), and reference numeral 2a denotes a program area (PA) for storing control procedures (FIGS. 9 to 15) by the CPU 1.
[0018]
Reference numeral 3 denotes a writable random access memory (RAM) composed of 16 bits per word, and is used for temporary storage of various data from each component. Reference numeral 3a denotes a word dictionary (WDIC) for performing kana-kanji conversion, and FIG. 3 shows its configuration. Reference numeral 3b denotes an example dictionary (YRDIC) storing combinations of phrases and phrases, and FIG. 4 shows the configuration thereof. An input reading buffer memory (YBUF) 3c stores readings input from the keyboard. Reference numeral 3d denotes a document buffer (TBUF), which is a memory for storing document information input from the keyboard. 3e is a phrase candidate table (BCT) that stores phrase candidates in the middle of kana-kanji conversion. 3f is a homophone buffer memory (DBF) for storing unconfirmed phrase candidates for Kana-Kanji conversion, and FIG. 6 shows the configuration thereof. 3g is a homophone pool memory (DBP) for storing other candidates for kana-kanji conversion corresponding to the homophone buffer memory, and FIG. 7 shows its configuration. 3h is a flag (BFLG) that designates preferential conversion of words to which field information is given among words stored in the word dictionary.
[0019]
Reference numeral 4 denotes an external memory (DISK) for storing a standard document. The created document is stored, and the stored document is called when necessary by an instruction from the keyboard. Reference numeral 5 denotes a keyboard (KB), which includes various function keys such as alphabetic keys, hiragana keys, character symbol input keys such as katakana keys, and conversion keys for instructing conversion. Here, 5a is a key for inputting a reading (YOMI), 5b is a conversion instruction key (CON) for converting the input reading, and 5c is a next candidate for changing a conversion candidate and converting it to a next candidate. A conversion instruction key (NXT), 5d is determined in the current homophone display direction, and at the same time a selection key (SEL) for instructing to learn the candidate notation, and 5e is stored in the word dictionary WDIC. This is a biological field designation key (BIO) for designating priority conversion of the biological field.
[0020]
Reference numeral 6 denotes a cursor register (CR), whose contents are read and written by the CPU 1. A CRT controller, which will be described later, displays a cursor at a position on the display device with respect to the address stored here. Reference numeral 7 denotes a display buffer memory (DBUF), which stores patterns such as document information stored in the document buffer 3d. Reference numeral 8 denotes a CRT controller (CRTC), which plays a role of displaying the contents stored in the cursor register 6 and the display buffer 7 on the display device. Reference numeral 9 denotes a display device (CRT) using a cathode ray tube or the like, and a dot configuration pattern and cursor display in the display device are controlled by a CRT controller 8. Reference numeral 10 denotes a character generator (CG) which stores character and symbol patterns to be displayed on the display device CRT.
[0021]
The character processing device including the respective components operates in response to various inputs from the keyboard 5. When a key input is supplied from the keyboard 5, an interrupt signal is first sent to the CPU 1, and the CPU 1 Reads various control signals stored in the ROM 2 and performs various controls in accordance with these control signals.
[0022]
An example of performing kana-kanji conversion in the apparatus of the present embodiment having the above-described configuration will be described below with reference to FIG.
[0023]
In FIG. 2, the symbol “/” in the left input reading string represents a conversion key for instructing conversion. First, in (a), when “Moji is Ishiyake” is entered, “character is written on stone” is converted as the first candidate. This is because the example “write in [substance]” is registered in the example dictionary 3b, and “stone” is described as [substance] in the word dictionary 3a. Other “wills” and “doctors” with the same reading “Ishi” are not [substance], so this example is not applicable. Without such an example, a semantically inconsistent conversion such as “write to will” would result.
[0024]
Next, when “Mihon Mizukunikaku” is entered in (b), “Water” is also designated as “Substance” in the word dictionary 3a as in (a), but in the example dictionary 3b, Since “write in water” is registered as an example to be suppressed, the example “write in [substance]” is not applied, and “write in water” is excluded from the first conversion candidates. Further, in (c), the input “grab the cloud” is input, and the example “grab a cloud” is applied as a first conversion candidate when inputting a general document. However, in the input of a biological field document, “grab the cloud”. Will be suppressed and become "grab".
[0025]
FIG. 3 is a diagram illustrating an example of the word dictionary 3a of the character processing device according to the present embodiment. The word dictionary 3a includes “reading (YM)”, “notation (HK)”, “part of speech (GI)”, “word likelihood (FQ)”, “semantic classification (SF)”, and “field information (CT)” for one word. "7 fields of field likelihood (CF)". Here, a value of 1 to 5 is stored in the word likelihood as information indicating the likelihood of the word itself such as frequency information. Likelihood value 5 is interpreted as most likely, and as the value decreases, the likelihood of being the first conversion candidate is lacking. The semantic classification describes the meaning of the word and stores “concept”, “substance”, and the like. This semantic classification is given only to words whose part of speech is a noun. The field likelihood is a value indicating the likelihood of the field information indicated in the field information.
[0026]
FIG. 4 is a diagram illustrating an example of the example dictionary 3b of the character processing device according to the present embodiment. Each entry of the example dictionary 3b consists of five fields: "first word information (LW)", "second word information (RW)", "particle connection information (CJ)", "example application type (TP)" and "example application field (BY)". The first and second word information each store a pointer to the head of the corresponding word in the word dictionary 3a. Besides a word, each word information field can also store a semantic classification, which designates the group of words having that classification. The particle connection information describes the particle that connects the first word and the second word. The example application type stores either "normal type" or "suppression type": a "normal type" example is one whose registered pattern should be given priority, and a "suppression type" example is one whose registered pattern must not become the first candidate. The example application field stores the field in which the example should be applied or suppressed. For example, "write in [substance]" in the figure is an example that applies generally to words with the semantic classification [substance]. "Grab a cloud" is an example that can be applied outside the biological field, but in the biological field it should be suppressed from becoming the first candidate.
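A rough sketch of the five-field example record and of matching a word pair against it is given below, assuming that a slot written as "[class]" designates a semantic classification and any other slot designates a word notation. Example, slot_matches, find_examples and the entries of EXAMPLE_DIC are hypothetical names chosen for illustration; the embodiment stores pointers into the word dictionary 3a rather than notation strings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Example:
    """One record of the example dictionary 3b (fields of FIG. 4)."""
    left: str              # LW: word notation, or "[class]" for a semantic classification
    right: str             # RW: word notation, or "[class]"
    particle: str          # CJ: particle joining LW and RW
    kind: str              # TP: "normal" or "suppression"
    field: Optional[str]   # BY: None means the example applies generally


def slot_matches(slot: str, notation: str, semantic_class: Optional[str]) -> bool:
    """A slot matches a word by notation, or by semantic class if written as [class]."""
    if slot.startswith("[") and slot.endswith("]"):
        return semantic_class == slot[1:-1]
    return slot == notation


def find_examples(dic: List[Example], left_word: str, left_class: Optional[str],
                  particle: str, right_word: str,
                  right_class: Optional[str]) -> List[Example]:
    """Return every example whose particle and both slots match the word pair."""
    return [e for e in dic
            if e.particle == particle
            and slot_matches(e.left, left_word, left_class)
            and slot_matches(e.right, right_word, right_class)]


EXAMPLE_DIC = [
    Example("[substance]", "書く", "に", "normal", None),     # write in [substance]
    Example("水", "書く", "に", "suppression", None),         # write in water
    Example("雲", "つかむ", "を", "suppression", "biology"),  # grab a cloud
]
```

For "水に書く" (write in water) both the general example and the suppression example match, which is the situation of FIG. 2(b).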
[0027]
FIG. 5 is a diagram illustrating an example of the field designation flag 3h of the character processing device according to the present embodiment. The field designation flag 3h assigns one bit to each field, and each bit indicates whether priority conversion is performed for that field. In the state shown in the figure, the bit of the "biology" field is "1", so that field is converted preferentially over the other fields.
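The one-bit-per-field layout can be expressed with Python's enum.IntFlag, as in the sketch below. Only the medical and biology fields are named in this document, so the bit positions and the helper names designate, is_designated and designated_fields are assumptions.

```python
from enum import IntFlag
from typing import List


class Field(IntFlag):
    # Hypothetical bit positions; the document names "medical" and "biology"
    # among eight fields but does not give the bit layout.
    MEDICAL = 1 << 0
    BIOLOGY = 1 << 1


def designate(bflg: int, field: Field) -> int:
    """Set a field's bit, as the BIO key does in step S107."""
    return bflg | field


def is_designated(bflg: int, field: Field) -> bool:
    """True if priority conversion is on for the field."""
    return bool(bflg & field)


def designated_fields(bflg: int) -> List[Field]:
    """All fields whose bit is 1 (what step S705 extracts)."""
    return [f for f in Field if bflg & f]


bflg = designate(0, Field.BIOLOGY)  # the state shown in FIG. 5
```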
[0028]
FIG. 6 is a diagram illustrating an example of the phrase candidate table 3e of the character processing device according to the present embodiment. The phrase candidate table 3e represents, as a tree whose nodes are individual phrase candidates, the sequences of phrases into which the input reading string can be analyzed from beginning to end; each entry consists of five fields. In the figure, BCID is an ID number that uniquely identifies each phrase candidate. BCJW represents the independent word of the phrase candidate and is a pointer to that word in the word dictionary 3a. BCFW is the attached-word string that forms the appendix of the phrase candidate, stored as the start and end indexes of the attached words in the input reading buffer 3c; the symbol "φ" means that there is no attached-word string. SLNK links to another phrase candidate with the same phrase start reading position and stores the ID of that candidate. An SLNK value of "−1" is a nonexistent phrase candidate ID that serves as a terminator, indicating that there are no more phrases with the same reading start position. DLNK links to the phrase candidates that follow this one and, like SLNK, stores the ID of the link destination. A DLNK value of "−1" likewise means that no further phrase candidate follows.
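The SLNK/DLNK linkage can be walked recursively to enumerate every phrase candidate string, which is what step S402 later consumes. The sketch below assumes notation strings in place of the BCJW dictionary pointers and the BCFW buffer indexes, and uses a small hypothetical table for the reading "mizu ni kaku" rather than the exact contents of FIG. 6.

```python
from typing import Iterator, List, NamedTuple

END = -1  # terminator value used by SLNK and DLNK


class PhraseCandidate(NamedTuple):
    bcid: int   # BCID: unique ID
    bcjw: str   # BCJW: independent word (a dictionary pointer in the embodiment)
    bcfw: str   # BCFW: attached-word string, "" standing for φ
    slnk: int   # SLNK: next candidate with the same start position
    dlnk: int   # DLNK: first candidate of the following position


def phrase_strings(table: List[PhraseCandidate],
                   head: int = 0) -> Iterator[List[PhraseCandidate]]:
    """Yield every phrase candidate string from the start to the end of input.

    The SLNK chain gives the alternatives at one position; DLNK descends to
    the alternatives at the following position.
    """
    cid = head
    while cid != END:
        cand = table[cid]
        if cand.dlnk == END:
            yield [cand]
        else:
            for rest in phrase_strings(table, cand.dlnk):
                yield [cand] + rest
        cid = cand.slnk


# Simplified, hypothetical table for the reading "みずにかく".
TABLE = [
    PhraseCandidate(0, "水", "に", 1, 2),      # in water
    PhraseCandidate(1, "見ず", "に", END, 2),  # without looking
    PhraseCandidate(2, "書く", "", 3, END),    # write
    PhraseCandidate(3, "欠く", "", END, END),  # lack
]
```

The table yields 水に書く, 水に欠く, 見ずに書く and 見ずに欠く; because both "mizu ni" candidates share the same DLNK target, the following phrases are stored once and reused.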
[0029]
For example, phrase candidate ID "0", "mizu ni" ("in water"), has "without looking" and "without checking" as phrase candidates at the same reading start position, and "kaku" ("each") as a subsequent phrase candidate, which in turn has the same-reading-start-position phrase candidates "kaku" ("nucleus"), "kaku" ("write") and "kaku" ("lack").
[0030]
FIG. 7 is a diagram illustrating an example of the homophone buffer 3f of the character processing device according to the present embodiment. In FIG. 7, HID is a homophone ID, an ID number that uniquely identifies each homophone created for each phrase. JW represents the independent word of the homophone and stores a pointer to the word dictionary 3a. FW stores the attached-word string of the homophone as a code similar to the kana reading code of the reading field in the word dictionary 3a. PHID stores the homophone ID of the subsequent pair homophone, that is, the homophone that forms an example pair with this homophone; a PHID of "−1" means that there is no subsequent pair homophone. APYR stores the example applied between this homophone and its pair homophone when the first candidate was determined, as a pointer to the head of that example in the example dictionary 3b.
[0031]
FIG. 8 is a diagram illustrating an example of the homophone pool 3g of the character processing device according to the present embodiment. Whereas the homophone buffer stores only the first candidate, the homophone pool 3g stores all possible word candidates as homophone candidates. In the figure, HID stores the homophone ID of the homophone buffer (DBF) entry to which each candidate belongs. JWP stores, as a pointer to the word dictionary 3a, an independent word that can serve as a homophone of the corresponding phrase, and FWP stores the attached-word string for the independent word JWP. For example, the phrase with homophone ID "10" has two homophone candidates, "flower" and "nose" (both read "hana").
[0032]
Next, the operation of the present embodiment will be described with reference to the flowchart shown in FIG. 9.
[0033]
First, in step S101, the device waits for an interrupt generated by a key press on the keyboard 5. When a key is input, the process proceeds to step S102, where the input key is identified, and the process branches to one of steps S103 to S108 according to the type of the key.
[0034]
Step S103 is the processing performed when a reading input key (YOMI) 5a is pressed; it stores the reading code of the pressed key in the input reading buffer memory (YBUF) 3c. Step S104 is the processing performed when the conversion key (CON) 5b is pressed; it performs kana-kanji conversion of the reading string input in step S103 and outputs the result to an output buffer (not shown). Step S105 is the processing performed when the next candidate conversion key (NXT) 5c is pressed; for the homophone buffer output in step S104, it displays another candidate from the corresponding homophone pool.
[0035]
Step S106 is the processing performed when the selection key (SEL) 5d is pressed; it confirms the homophone in the output buffer displayed on the screen and outputs the confirmed character string to the document. It also determines whether the selected word still matches the example that was applied to determine the first candidate and, if necessary, stores the applied example as a suppression pattern in association with the field information. Step S107 is the processing performed when the biological field designation key (BIO) 5e is pressed; it sets to "1" the bit of the priority conversion field designation flag (BFLG) 3h that corresponds to the designated field. Step S108 is the processing performed when any key other than YOMI, CON, NXT, SEL and BIO is pressed, for example a document editing key such as a cursor movement key; this is well-known processing common to this type of character processing device and is not described here.
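The key loop of steps S101 to S108 is essentially a dispatch table. The following sketch assumes a hypothetical event format of (key class, reading text) tuples and stubs out conversion, next-candidate display and selection; Session, run and BIOLOGY_BIT are illustrative names, not part of the embodiment.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Key = Tuple[str, str]  # hypothetical event format: (key class, reading text)
BIOLOGY_BIT = 1 << 1   # hypothetical bit position of the biology field


class Session:
    """Minimal state for the main loop: YBUF, BFLG and a trace of actions."""

    def __init__(self) -> None:
        self.ybuf = ""               # input reading buffer (YBUF) 3c
        self.bflg = 0                # field designation flag (BFLG) 3h
        self.trace: List[tuple] = []

    def handlers(self) -> Dict[str, Callable[[Key], None]]:
        return {"YOMI": self.on_yomi, "CON": self.on_con, "NXT": self.on_nxt,
                "SEL": self.on_sel, "BIO": self.on_bio}

    def on_yomi(self, key: Key) -> None:   # S103: store the reading code
        self.ybuf += key[1]

    def on_con(self, key: Key) -> None:    # S104: conversion (stubbed)
        self.trace.append(("convert", self.ybuf))

    def on_nxt(self, key: Key) -> None:    # S105: next candidate (stubbed)
        self.trace.append(("next",))

    def on_sel(self, key: Key) -> None:    # S106: selection (stubbed)
        self.trace.append(("select",))

    def on_bio(self, key: Key) -> None:    # S107: set the biology bit
        self.bflg |= BIOLOGY_BIT

    def on_other(self, key: Key) -> None:  # S108: editing keys and the like
        self.trace.append(("other", key[0]))


def run(events: Iterable[Key], session: Session) -> None:
    table = session.handlers()
    for key in events:                            # S101: one key-press interrupt
        table.get(key[0], session.on_other)(key)  # S102: branch on the key class
```

Keeping the handlers in a dictionary means that adding a further field key only requires one more entry, while unknown keys fall through to the S108 handler.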
[0036]
Next, the details of the "conversion process" in step S104 will be described with reference to the flowchart shown in FIG. 10.
[0037]
First, in step S201, a phrase candidate creation process (detailed later) is performed to create a phrase candidate table (BCT) 3e. Next, in step S202, a first candidate determination process, which will be described later in detail, is performed to narrow down candidates suitable as first conversion candidates from the phrase candidates stored in the phrase candidate table 3e. In step S203, based on the first candidate determined in step S202, the conversion result is created and output to the homophone buffer (DBF) 3f. At the same time, other candidates for the same reading start position and phrase length that did not become the first candidate are stored in the homophone pool (DBP) 3g.
[0038]
The details of the "phrase candidate creation process" in step S201 are described below with reference to the flowchart shown in FIG. 11.
[0039]
First, in step S301, the phrase candidate creation start index i on the input reading buffer (YBUF) 3c and the index j of the phrase candidate table (BCT) 3e are initialized to "0". In step S302, the word dictionary (WDIC) 3a is searched using the reading in the reading buffer 3c starting at i to obtain word candidates. Next, in step S303, phrase connection verification, which analyzes the attached-word strings that can connect to each word candidate found, is performed to obtain phrase candidates. In step S304, the phrase candidates obtained are stored in the phrase candidate table 3e: the independent words and attached-word string information obtained from the word search and phrase connection verification are stored in the BCJW and BCFW fields, respectively, and the SLNK and DLNK links of other phrase candidates that can link to this phrase candidate are also set. After all the information has been stored in the phrase candidate table 3e, the index j of the phrase candidate table 3e is incremented in step S305.
[0040]
Next, in step S306, it is determined whether all words with the same phrase start position, that is, the same reading start position, have been searched. If not, the process returns to step S302 and the search continues. If all words at that reading start position have been searched, the process proceeds to step S307, where the phrase end position indicated by the attached-word string recorded in the phrase candidate table 3e is taken as the reading start position of the subsequent phrase, and i is set to this position as the next phrase candidate start position. If at this point every phrase candidate has reached a terminator, that is, if the DLNK values in the phrase candidate table 3e are "−1", the end of the input reading has been reached and the process ends. Otherwise, the process returns to step S302 with the new i, and creation of subsequent phrase candidates continues.
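Phrase candidate creation can be pictured as building a lattice over the reading. The sketch below is a simplified stand-in for steps S301 to S307: it keeps candidates as plain (start, end, word, attached words) tuples instead of the linked table 3e, reduces the phrase connection check to "no particle or one particle", and uses the hypothetical mini dictionaries WORDS and PARTICLES.

```python
from collections import deque
from typing import Dict, Iterator, List, Tuple

Candidate = Tuple[int, int, str, str]  # (start, end, independent word, attached words)

# Hypothetical mini dictionaries: readings of independent words, and particles.
WORDS: Dict[str, List[str]] = {"みず": ["水", "見ず"],
                               "かく": ["書く", "欠く", "核", "各"]}
PARTICLES = ["に", "を"]


def create_phrase_candidates(ybuf: str) -> List[Candidate]:
    """Simplified S301-S307: build candidates start position by start position."""
    candidates: List[Candidate] = []
    pending = deque([0])   # phrase start positions still to process (index i)
    seen = {0}
    while pending:
        i = pending.popleft()
        for reading, notations in WORDS.items():        # S302: dictionary search
            if not ybuf.startswith(reading, i):
                continue
            j = i + len(reading)
            # S303: phrase connection check, here just "no particle or one particle"
            endings = [""] + [p for p in PARTICLES if ybuf.startswith(p, j)]
            for word in notations:
                for fw in endings:
                    end = j + len(fw)
                    candidates.append((i, end, word, fw))     # S304-S305
                    if end < len(ybuf) and end not in seen:   # S307: next start
                        seen.add(end)
                        pending.append(end)
    return candidates


def covering_sequences(cands: List[Candidate], n: int,
                       start: int = 0) -> Iterator[List[Candidate]]:
    """Yield every candidate sequence that spans the reading from start to n."""
    if start == n:
        yield []
        return
    for c in cands:
        if c[0] == start:
            for rest in covering_sequences(cands, n, c[1]):
                yield [c] + rest
```

For the reading "みずにかく" this produces eight covering sequences, the first being "水に書く"; the candidates "水" and "見ず" without a particle are created but never complete a sequence, since no word starts at the position they end on.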
[0041]
The details of the "first candidate determination process" in step S202 are described below with reference to the flowchart shown in FIG. 12.
[0042]
First, in step S401, the counter memory max, which serves as the reference value for determining the first candidate, is initialized to "0". max holds the maximum likelihood found among all the phrase candidate strings, and a likelihood of 0 is the minimum value. In step S402, one phrase candidate string is extracted from the phrase candidate table 3e. Next, in step S403, the phrase pair likelihood of the extracted phrase candidate string is calculated as follows.
[0043]
(phrase pair likelihood) = Σ(word likelihood) + Σ(inter-phrase likelihood) + Σ(word field likelihood)
Here, the value of the FQ field of the word dictionary 3a is used as the word likelihood. The inter-phrase likelihood indicates how likely two phrases are to connect and is fixed at "0" in the present embodiment. The field likelihood of a word is CF × 8 when the field information in the CT field of the word dictionary 3a matches the field designation flag (BFLG) 3h, which indicates the field in effect at conversion time.
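As a worked illustration of the formula, the sketch below computes the phrase pair likelihood for a list of words. It assumes that a word whose CT field does not match a designated field contributes 0 field likelihood, a case the text does not state explicitly; Word, BIOLOGY and the numeric values in the usage are hypothetical.

```python
from typing import List, NamedTuple, Optional

BIOLOGY = 1 << 1  # hypothetical bit of the biology field in BFLG


class Word(NamedTuple):
    fq: int            # FQ: word likelihood
    ct: Optional[int]  # CT: field as a BFLG bit, None if the word has no field
    cf: int            # CF: field likelihood


def phrase_pair_likelihood(words: List[Word], bflg: int) -> int:
    """Sum of word, inter-phrase and word-field likelihoods (step S403)."""
    word_sum = sum(w.fq for w in words)
    inter_phrase_sum = 0  # fixed at 0 in this embodiment
    # Field likelihood is CF x 8 when CT matches a designated field; a
    # non-matching word is assumed here to contribute 0.
    field_sum = sum(w.cf * 8 for w in words if w.ct is not None and bflg & w.ct)
    return word_sum + inter_phrase_sum + field_sum
```

With the biology bit set, a pair made of a biology word (FQ 2, CF 3) and a field-less word (FQ 4) scores 2 + 4 + 3 × 8 = 30; with no field designated it scores 6.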
[0044]
Next, in step S404, the examples applicable to the extracted phrase candidate string are searched for, and the example application likelihood is added to the phrase pair likelihood. This example dictionary search process is described later with reference to FIG. 13. In step S405, it is determined whether the calculated phrase pair likelihood is greater than the maximum likelihood max. If it is, the process proceeds to step S406, where max is updated with the calculated phrase pair likelihood. In step S407, the phrase candidate string with the maximum likelihood is stored in the work memory did. The work memory did has the same configuration as the phrase candidate table 3e, except that its SLNK and DLNK values are set so as to link only the phrase candidates of the maximum-likelihood phrase candidate string.
[0045]
In step S408, it is determined whether another phrase candidate string can be extracted from the phrase candidate table 3e. If it can be extracted, the process returns to step S402 to continue the search. When all the phrase candidate strings have been processed and no more phrase candidate strings can be extracted, the process returns.
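Steps S401 to S408 amount to an arg-max over the candidate strings with a strict "greater than" comparison against max, which starts at 0. A minimal sketch, with the scoring function passed in as a parameter (an assumption made for illustration):

```python
from typing import Callable, Iterable, Optional, Sequence, TypeVar

T = TypeVar("T")


def determine_first_candidate(
        strings: Iterable[Sequence[T]],
        likelihood: Callable[[Sequence[T]], int]) -> Optional[Sequence[T]]:
    """Return the phrase candidate string with the highest likelihood (S401-S408)."""
    max_likelihood = 0                  # S401: 0 is the minimum likelihood
    did: Optional[Sequence[T]] = None   # work memory "did"
    for s in strings:                   # S402 / S408: next candidate string
        score = likelihood(s)           # S403-S404
        if score > max_likelihood:      # S405
            max_likelihood = score      # S406
            did = s                     # S407
    return did
```

Because max starts at 0 and the comparison is strict, a string whose likelihood was forced to 0 by a suppression example is never selected, and if every string scores 0 this sketch returns None; how the embodiment handles that case is not described. Ties keep the earlier string.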
[0046]
The details of the "example dictionary search process" in step S404 are described below with reference to the flowchart shown in FIG. 13.
[0047]
First, in step S501, a phrase of interest for the example search is taken from the phrase candidate string extracted in step S402 of FIG. 12, and the example dictionary (YRDIC) 3b is searched for a pair phrase candidate that forms an example together with it. In step S502, it is determined whether such a pair phrase candidate exists; if none is found, the process returns, and if one is found, the processing from step S503 onward is performed.
[0048]
In step S503, the type of the example found is checked. If it is a "suppression type" example, its application field is further checked in step S504; if the application field matches the field information indicated by the field designation flag (BFLG) 3h, the process proceeds to step S505, where the phrase pair likelihood is set to "0" so that the phrase does not become the first candidate.
[0049]
On the other hand, if it is determined in step S503 that the example is “normal type”, the fixed value 1012 is added to the phrase pair likelihood as the example application likelihood, and the process returns.
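The adjustment of steps S501 to S505 can be sketched as a function of the current likelihood and the examples found for the phrase pair. Following claim 1, a suppression example applies when its field is general or matches a designated field. When both a normal and a suppression example match, as for "write in water" in FIG. 2(b), the sketch lets suppression win, and it adds the bonus only once even if several normal examples match; both of these choices are assumptions, and Example and apply_examples are hypothetical names.

```python
from typing import List, NamedTuple, Optional

EXAMPLE_BONUS = 1012  # fixed example application likelihood stated in the text


class Example(NamedTuple):
    kind: str             # TP: "normal" or "suppression"
    field: Optional[int]  # BY as a BFLG bit mask; None means "apply generally"


def apply_examples(likelihood: int, matched: List[Example], bflg: int) -> int:
    """Adjust a phrase pair likelihood with the examples found in S501-S502."""
    if not matched:                                 # S502: nothing found
        return likelihood
    for ex in matched:                              # S503-S505
        if ex.kind == "suppression" and (ex.field is None or bflg & ex.field):
            return 0                                # cannot become the first candidate
    if any(ex.kind == "normal" for ex in matched):  # normal type: add the bonus
        return likelihood + EXAMPLE_BONUS
    return likelihood
```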
[0050]
Returning to FIG. 9, the details of the "next candidate display process" in step S105 are described below with reference to the flowchart shown in FIG. 14.
[0051]
First, in step S601, the homophone ID of the homophone whose next candidate is to be displayed is acquired. In step S602, the next candidate with the same homophone ID is taken from the homophone pool (DBP) 3g; its independent word JWP and attached-word string FWP are stored in the corresponding JW and FW fields of the homophone buffer (DBF) 3f, and the candidate is displayed.
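Next-candidate display can be sketched as rotating through the pool entries that share a homophone ID. The class and function names are hypothetical, notation strings stand in for dictionary pointers, and the wrap-around from the last pooled candidate back to the first is an assumption not stated in the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BufferEntry:
    """One homophone buffer (DBF) record: the candidate currently shown."""
    hid: int  # HID
    jw: str   # JW: independent word (a dictionary pointer in the embodiment)
    fw: str   # FW: attached-word string


@dataclass
class PoolEntry:
    """One homophone pool (DBP) record."""
    hid: int  # HID of the DBF entry this candidate belongs to
    jwp: str  # JWP: independent word
    fwp: str  # FWP: attached-word string


def next_candidate(dbf: Dict[int, BufferEntry], dbp: List[PoolEntry], hid: int) -> str:
    """S601-S602: replace the DBF entry with the next pooled candidate."""
    entry = dbf[hid]                                  # S601: target homophone
    pool = [p for p in dbp if p.hid == hid]
    current = next((k for k, p in enumerate(pool)
                    if (p.jwp, p.fwp) == (entry.jw, entry.fw)), -1)
    nxt = pool[(current + 1) % len(pool)]             # wrap-around is assumed
    entry.jw, entry.fw = nxt.jwp, nxt.fwp             # S602: store in JW and FW
    return entry.jw + entry.fw                        # text to display
```

For homophone ID 10, repeated calls alternate between "花が" (flower) and "鼻が" (nose); the particle "が" is made up for the example.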
[0052]
The details of the "selection process" in step S106 are described below with reference to the flowchart shown in FIG. 15.
[0053]
First, in step S701, the homophone ID of the homophone to be selected is extracted, and in step S702 the pair homophone ID is taken from the homophone buffer 3f entry with that homophone ID. Then, in step S703, the independent word of the selected homophone and the independent word of the pair homophone are extracted, and in step S704 it is determined whether this independent-word pair matches the example that was applied when the first candidate was determined, which is stored in the APYR field of the homophone buffer 3f. The example indicated by APYR is retrieved from the example dictionary 3b and compared with the independent-word pair. If the example indicated by APYR is described by a semantic classification, the match is checked by comparing it with the semantic classification of the independent word JW in the homophone buffer.
[0054]
If the example applied when the first candidate was determined still holds at selection time, registration of a suppression example is judged unnecessary and the process returns. If it does not hold, the processing from step S705 onward is performed to register the applied example as a suppression type.
[0055]
In step S705, the currently designated field information, that is, every field whose BFLG bit is "1", is extracted from the field designation flag (BFLG) 3h. Next, in step S706, the applied example APYR of the target homophone buffer is registered in the example dictionary 3b as a "suppression type" example together with the field information extracted in step S705, so that it is no longer applied when the first candidate is determined in those fields.
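The learning step of S704 to S706 can be sketched as follows. Examples are reduced to two slots plus type and field bits (the particle is omitted), and when no field bit is set the sketch records the new suppression example as generally applicable; that fallback, and all names used, are assumptions for illustration.

```python
from typing import List, NamedTuple, Optional, Tuple

Word = Tuple[str, Optional[str]]  # (notation, semantic classification)


class Example(NamedTuple):
    left: str              # LW: notation, or "[class]"
    right: str             # RW: notation, or "[class]"
    kind: str              # TP: "normal" or "suppression"
    fields: Optional[int]  # BY as BFLG bits; None = apply generally


def slot_ok(slot: str, notation: str, sem: Optional[str]) -> bool:
    """A "[class]" slot matches by semantic classification, any other by notation."""
    if slot.startswith("[") and slot.endswith("]"):
        return sem == slot[1:-1]
    return slot == notation


def learn_on_select(example_dic: List[Example], applied: Optional[Example],
                    left: Word, right: Word, bflg: int) -> None:
    """S704-S706: register the applied example as suppression if it no longer fits."""
    if applied is None:                      # no example was applied (APYR empty)
        return
    if slot_ok(applied.left, *left) and slot_ok(applied.right, *right):
        return                               # S704: still holds, nothing to learn
    fields = bflg if bflg else None          # S705 (fallback to general is assumed)
    example_dic.append(applied._replace(kind="suppression", fields=fields))  # S706
```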
[0056]
<Modification>
In the embodiment described above, a kana-kanji conversion device whose input sentences are limited to kana characters was taken as an example of a character processing device that analyzes phrase structure, but the present invention is not limited to this. Any device that analyzes phrase structure may be used without departing from the gist of the present invention, which is to exclude counterexamples from the analysis candidates on the basis of registered counterexamples describing relations between phrases. For example, the invention can also be implemented as a document proofreading device that takes sentences mixing kanji and kana as input, analyzes their phrase structure, checks their semantic consistency and points out errors.
[0057]
In the present embodiment, each example in the example dictionary is a pair of two words (or semantic classifications) that appear together. However, combinations of three words, or in general n words (where n is an integer of 2 or more), can be processed in the same manner. Likewise, although examples and counterexamples are stored here in the same dictionary and distinguished by their application type, the same processing applies when they are stored in a separate example dictionary and counterexample dictionary.
[0058]
Furthermore, although counterexamples are assumed to exist in advance in the present embodiment, they need not be prepared beforehand and may instead be accumulated gradually through the operator's operations. Although a counterexample is expressed by pointers to the word dictionary entries of its constituent words, any representation that identifies those words may be used instead: the words may be stored together with identifying attributes such as reading, notation and part of speech, or each word stored in the word dictionary may be given a unique serial number by which the constituent words of the counterexample are specified.
[0059]
Furthermore, using examples to be applied together with counterexamples can further improve the analysis accuracy of the first candidate.
[0060]
Moreover, although eight fields such as the "medical field" and the "biological field" have been given as examples of designatable field information, the present invention is not limited to these eight fields. Field information may designate not only the semantic domains in which words, examples and counterexamples are used but also text types such as "greeting text", "article" and "prose", and the same processing can be performed even if fields are set per user, for example according to the user's affiliation.
[0061]
The present invention may be applied to a system composed of a plurality of devices or an apparatus composed of a single device. It goes without saying that the present invention can also be applied to a case where the object is achieved by supplying a program to a system or apparatus.
[0062]
[Effect of the Invention]
As explained above, according to the present invention, a specific word combination can be given priority as a phrase candidate, can be suppressed as a phrase candidate in a specific field while not being suppressed outside that field, or can be suppressed regardless of field. The conversion accuracy for input Japanese sentences can therefore be improved flexibly according to the field.
[0063]
In addition, according to the present invention, instead of exhaustively registering every example suitable as a phrase structure, only unsuitable word pairs are registered in association with the fields in which they are to be suppressed, which minimizes the memory capacity required. A character processing device that analyzes the phrase structure of Japanese sentences with high accuracy can therefore be realized at low cost.
[0064]
Furthermore, according to the present invention, word pairs unsuitable as phrase structures can be registered in terms of semantic classifications, which improves the coverage of unsuitable counterexamples and further reduces the memory capacity required.
[0065]
[Brief description of the drawings]
FIG. 1 is a block diagram showing an overall configuration of a character processing apparatus according to an embodiment.
FIG. 2 is a diagram illustrating an execution example of kana-kanji conversion according to the present embodiment.
FIG. 3 is a diagram showing a configuration of a word dictionary according to the present embodiment.
FIG. 4 is a diagram showing a configuration of an example dictionary of the present embodiment.
FIG. 5 is a diagram showing a configuration of a field designation flag according to the present embodiment.
FIG. 6 is a diagram illustrating a configuration of a phrase candidate table according to the embodiment.
FIG. 7 is a diagram showing a configuration of a homophone buffer of the present embodiment.
FIG. 8 is a diagram showing a configuration of a homophone pool of the present embodiment.
FIG. 9 is a flowchart showing a processing procedure of the present embodiment.
FIG. 10 is a flowchart illustrating conversion processing according to the present exemplary embodiment.
FIG. 11 is a flowchart illustrating phrase candidate creation processing according to the embodiment.
FIG. 12 is a flowchart illustrating first candidate determination processing according to the embodiment.
FIG. 13 is a flowchart showing an example dictionary search process according to the present embodiment.
FIG. 14 is a flowchart showing a next candidate display process according to the embodiment.
FIG. 15 is a flowchart illustrating selection processing according to the embodiment.
[Explanation of symbols]
1 CPU
2 ROM
3 RAM
4 External memory
5 Keyboard
6 Cursor register
7 Display buffer
8 CRT controller
9 Display device
10 Character generator

Claims

Word dictionary means for storing word reading and notation in association with each other;
Example storage means for storing, in association with one another, the notations of the first and second words constituting an example, an example type indicating whether the example is to be prioritized or suppressed, and an example application field indicating the specific field to which the example is to be applied, if there is one, or otherwise indicating that the example is to be applied generally;
An input means for inputting a kana character string;
Field designation information receiving means for receiving, from the operator, field designation information designating a field in which the character string input by the input means is to be preferentially converted;
Field designation information storage means for storing the field designation information received by the field designation information receiving means;
Phrase candidate creating means for dividing the character string input by the input means into phrase candidates with reference to the word dictionary means;
Search means for searching for a combination of words corresponding to the example stored in the example storage means among the phrase candidates created by the phrase candidate creation means;
An example type determination means for determining whether the combination of words found by the search means corresponds to an example to be prioritized or to be suppressed;
Field determination means for determining whether the application field of the example matched by the combination of words found by the search means is one to be applied generally, or is a specific field that matches the field designated by the field designation information stored in the field designation information storage means; and
Phrase candidate determining means for determining phrase candidates such that, when the example type determination means determines that the example is one to be prioritized, the combination of words corresponding to the example is given priority as a phrase candidate, and when the example type determination means determines that the example is one to be suppressed, the combination of words corresponding to the example is suppressed as a phrase candidate if the field determination means determines that the example is to be applied generally or matches the designated field, and is not suppressed as a phrase candidate if the field determination means determines that the example does not match the designated field; the character processing device being characterized by comprising the above means.

The character processing device according to claim 1, wherein the word dictionary means stores the reading, notation and semantic classification of each word in association with one another, and the example storage means further stores, in association with one another, the notation of the first word constituting an example, the semantic classification of the second word, an example type indicating whether the example is to be prioritized or suppressed, and an example application field indicating the specific field to which the example is to be applied, if there is one, or otherwise indicating that the example is to be applied generally.

A processing method of a character processing device comprising: word dictionary means for storing word readings and notations in association with each other; example storage means for storing, in association with one another, the notations of the first and second words constituting an example, an example type indicating whether the example is to be prioritized or suppressed, and an example application field indicating the specific field to which the example is to be applied, if there is one, or otherwise indicating that the example is to be applied generally; input means for inputting a character string; field designation information storage means for storing field designation information designating a field in which the character string input from the input means is to be preferentially converted; phrase candidate storage means for storing phrase candidates; processing means for executing various processes based on a program; program storage means storing the program; and phrase candidate creating means, search means, example type determination means, field determination means and phrase candidate determination means implemented by the processing means in cooperation with the program storage means, the method comprising:
A phrase candidate creating step in which the phrase candidate creating means divides the character string input from the input means into phrase candidates with reference to the word dictionary means and stores the created phrase candidates in the phrase candidate storage means;
A search step in which the search means searches the phrase candidates stored in the phrase candidate storage means for combinations of words corresponding to the examples stored in the example storage means;
An example type determination step in which the example type determination means refers, in the example storage means, to the example type of the example matched by the combination of words found in the search step and determines whether the example is one to be prioritized or one to be suppressed;
A field determination step in which the field determination means refers, in the example storage means, to the example application field of the example matched by the combination of words found in the search step and determines whether that field is one to be applied generally or is a specific field that matches the field designated by the field designation information stored in the field designation information storage means; and
A phrase candidate determination step in which the phrase candidate determination means determines phrase candidates such that, when the example type determination step determines that the example is to be prioritized, the combination of words corresponding to the example is given priority as a phrase candidate, and when the example type determination step determines that the example is to be suppressed, the combination of words corresponding to the example is suppressed as a phrase candidate if the field determination step determines that the example is to be applied generally or matches the designated field, and is not suppressed as a phrase candidate if the field determination step determines that the example does not match the designated field.