JP3772214B2

JP3772214B2 - Natural sentence ambiguity elimination device and natural sentence ambiguity elimination program

Info

Publication number: JP3772214B2
Application number: JP2003132527A
Authority: JP
Inventors: 敦子木田; 英子山本; 享子桝山; 均井佐原
Original assignee: National Institute of Information and Communications Technology
Current assignee: National Institute of Information and Communications Technology
Priority date: 2003-05-12
Filing date: 2003-05-12
Publication date: 2006-05-10
Anticipated expiration: 2023-05-12
Also published as: JP2004334729A

Description

【０００１】
【発明の属する技術分野】
本発明は、コンピュータに入力済の日本語自然文からなるテキストデータについて、その自然文が表す意味の曖昧性を解消する装置及びそのプログラムに関するものである。
【０００２】
【従来の技術】
中世以前の日本語には、係助詞と文末の活用形とが形態的な呼応関係を持つ「係り結び」の用法が存在したが、「係り結び」が消滅した現代の日本語の文章の場合、述語が文末に置かれるため、文の終末まで進まないとその文章の内容が確定しない。そのため、長文で複雑な内容の文章では、その内容が肯定的なのか否定的なのか、或いは疑問を表しているのかが文末まで読まないことには明らかにならない。すなわち、同一の自然文中に一つの「係り」となる語に対応する「結び」となる語が複数出現した場合、その文の意味を理解する上でどの「結び」語が真に「係り」語に対応しているかを的確に把握することが難しいといえる。ここで、現代日本語の文構造の研究において、現代語ではある種の副詞などが古語の係助詞と似た役割を果たしており、後続要素を予告しているとの示唆がなされている（例えば、非特許文献１参照）。例えば「たぶん……だろう」や「おそらく……だろう」などといった組み合わせは、呼応関係を形成する先行要素（呼要素）及び後続要素（応要素）のペアとして内省や直感である程度予測がつくと考えられると指摘されている（例えば、非特許文献２参照）。
【０００３】
【非特許文献１】
大野晋，「係り結びの研究」，第１版，岩波書店，１９９３年１月１２日，ｐ３５０−３５１
【非特許文献２】
益岡隆志，「モダリティの文法」，第１版，くろしお出版，１９９１年５月２５日，ｐ２９−４６
【０００４】
【発明が解決しようとする課題】
ところが、このような呼応ペアについては未だ体系立てた研究がなされておらず、上述した文献や教科書等においても少数の呼応ペアが例示されるに留まっているのが現状である。すなわち、現代日本語における「係り結び」の研究では、内省や直感では予測し得ないような呼応ペアが不足しているために、ある語とそれと共に現れる（共起する）語とが本当に呼応関係にあるのか否かを明らかにするには基礎的データが余りにも少ないといわざるを得ない。したがって、ユーザが自分でコンピュータに入力した日本語の文章が正しい呼応関係にある「係り結び」の表現を行っているのか検証を行ったり、或いはユーザが他者から受け取った日本語自然文がの意味を理解するには、そのユーザ自身の直感や内省に頼るしかないのが現状である。このような問題は、入力された日本語自然文が長文であればあるほど顕著となる。
【０００５】
そこで本発明は、以上のような問題に鑑みて、日本語文における「係り結び」を形成する呼応ペアを蓄積したデータベースを利用することで、入力済の日本語自然文からなるテキストデータにおいて、真に正しいと推測される呼応関係を見出し、当該自然文の曖昧性を解消しようとするものである。
【０００６】
【課題を解決するための手段】
すなわち、本発明は、日本語の自然文中において共起する２つの語のうち係り結びを形成する呼要素と応要素とを対にした呼応ペアの集合を、当該呼応ペアを含む元の自然文中における呼要素と応要素との平均語間距離を付加した状態で格納してある呼応ペアデータベースを検索することによって、入力された日本語自然文テキストにおける呼応関係の曖昧性を解消するコンピュータからなる自然文曖昧性解消装置、並びに当該コンピュータを自然文曖昧性解消装置として機能させるためのプログラムである。
【０００７】
図１に基本的な機能構成図を実線で示すように、この自然文曖昧性解消装置Ａは、日本語テキストデータの入力を受け付ける入力受付手段１と、入力受付手段１で受け付けた日本語テキストデータについて形態素解析を実行する形態素解析手段３と、形態素解析手段３で形態素解析された各形態素に先頭から順に語番号を付与する語番号付与手段４と、形態素解析手段３で形態素解析された形態素から所定の呼要素を抽出する呼要素抽出手段５と、呼応ペアデータベースＤＢ１から、前記呼要素抽出手段５で抽出した呼要素に対応する応要素を検索し、その検索結果に基づいて形態素解析手段３で形態素解析された形態素のうち応要素に該当するものを抽出する応要素抽出手段６と、呼要素抽出手段５で抽出した呼要素と前記応要素抽出手段６で抽出した全ての応要素との語間距離を前記語番号付与手段４で付与された語番号に基づいて算出する語間距離演算手段８と、語間距離演算手段８による全ての演算結果を平均して得られる平均語間距離と所定の関係にある語間距離を有する応要素を前記呼応ペアデータベースＤＢ１から特定する応要素特定手段９とを具備していることを特徴とするものである。
【０００８】
ここで、呼要素とは、現代日本語文において「係り結び」を構成する２単語のうち「係り」に該当する語を意味する。呼要素となり得る単語には、例えば『基礎日本語文法改訂版』（益岡隆志、田窪行則、くろしお出版、１９９２年）の分類によると、「提題助詞」、「取り立て助詞」、「陳述の副詞」が該当する。具体的な対象語としては、「こそ」、「しか」、「さえ」、「は」、「も」、「ばかり」、「のみ」、「すら」、「なら」、「くらい（ぐらい）」、「だけ」、「なんて」、「けっして（決して）」、「おそらく（恐らく）」、「たぶん（多分）」、「ぜひ（是非）」、「まるで」、「もし」、「きっと」等の語を挙げることができるが、必ずしもこれらに限定されるわけではい。そして、「結び」に該当する「応要素」は、現代日本語自然文において「呼要素」が出現した場合にそれと同時に出現する単語である。本発明では、「呼要素」と「応要素」とが同一文中に同時に出現することを「共起する」と定義するとともに、この「共起」する「呼要素」と「応要素」との組み合わせを「共起ペア」と呼び、その「共起ペア」のうち「呼要素」と「応要素」とがこの順で出現することを「呼応する」と定義するとともに、この「呼応」する「呼要素」と「応要素」との組み合わせを特に「呼応ペア」と呼ぶものとする。そして、呼応ペアデータベースＤＢ１には、多数の現代日本語の自然文データを原文データとして抽出した呼要素と応要素とが対をなす「呼応ペア」として格納されている。さらに、さらに、これら呼応ペアのデータは、呼要素ごとに分類されており、各呼応ペアには、呼応ペアデータベースを生成する基礎となった多数の原文において当該呼要素と応要素との語間距離の平均値が付与されている。この語間距離は、原文を形態素解析した場合に、隣接する形態素同士の距離を１として求められている。
【０００９】
したがって、このように自然文曖昧性解消装置Ａとしてコンピュータを機能させることで、入力済の日本語自然文から「係り結び」すなわち呼応関係にある語を選び出す。そして、呼要素とそれに対応する一以上の応要素との各語間距離と、呼応ペアデータベースＤＢ１に格納されている当該呼要素と当該応要素との平均距離とを比較して、対応する１つの応要素を選出することで、その自然文における呼応関係を明確にして、ユーザによる文章の検証や理解を補助することができる。ここで、「入力」とは、キーボードやマウスやタブレット等の入力装置を用いてコンピュータに文字データを入力することや、他のコンピュータ等の外部の機器等から入力を受けることなどを意味する。また、「平均語間距離と所定の関係にある語間距離を有する応要素」とは、平均語間距離と同一又は近似する語間距離を有する応要素や、平均語間距離と一定の演算式関係にある語間距離を有する応要素などを意味している。
【００１０】
特に、この自然文曖昧性解消装置Ａを利用するユーザに対する文章理解や文章検証の補助をより利便性の高い者とするためには、コンピュータを、入力受付手段１で受け付けた日本語テキストデータを出力する出力手段２と、呼要素抽出手段５で抽出した呼要素と前記応要素抽出手段６で抽出した応要素との関連性を示す関連性情報を出力する関連性情報出力手段７と、応要素特定手段９で特定した応要素を明示する応要素明示情報を出力する応要素明示手段１０としてさらに機能させることが好ましく、さらに視覚的に使い勝手もよいものとするには、出力手段２、関連性情報出力手段７、応要素明示手段１０を、それぞれディスプレイ等の表示装置に日本語テキストデータ、関連性情報、応要素明示情報を表示出力するものとするとよい。
【００１１】
【発明の実施の形態】
以下、本発明の一実施形態を、図面を参照して説明する。
【００１２】
この実施形態に係る自然文曖昧性解消装置Ａは、図１に機能構成を示したように、日本語自然文から「係り結び」を形成する呼要素と応要素とのペアである呼応ペアデータを収集した呼応ペアデータベースＤＢ１を利用して、入力済の日本語自然文を表すテキストデータにおける一の呼要素と一以上の応要素とを見出し、その文章の意味を示す一の応要素を特定して、当該日本語自然文の検証やユーザによる理解を補助するためのものである。この自然文曖昧性解消装置Ａは、呼応ペアデータベースＤＢ１を内蔵し又は外部に接続して検索することができる状態にあるコンピュータにより構成されるものである。このコンピュータは、図２に概略的な機器構成図を示すように、バス線等で電気的に接続されたＣＰＵ１０１、メモリ１０２等の内部機器を有しており、これらにハードディスク等の記憶装置１０３、ディスプレイ等の表示装置１０４、キーボードやマウス等の入力装置１０５、各種通信インターフェース１０６等の外部機器を備えた通常のパーソナルコンピュータ等からなり、例えば外部に呼応ペアデータベースＤＢ１を通信線を介して接続してある。なお、この呼応ペアデータベースＤＢ１に格納されるデータは、ハードディスク等の記憶装置に格納させることもできる。
【００１３】
そして、記憶装置１０３に記録した自然文曖昧性解消プログラムをＣＰＵ１０１が読み出してメモリ１０２に記憶させ、当該ＣＰＵ１０１が前記プログラムに従った処理を行い、メモリ１０２、ハードディスク等の記憶装置１０３、モニタ等の表示装置１０４、キーボードやマウス等の入力装置１０５、各種通信インターフェース１０６等の機器を駆動させることによって、このコンピュータは、自然文曖昧性解消装置Ａとして機能する。ここで、自然文曖昧性解消装置Ａの機能とは、図１に示したように、入力受付手段１、出力手段２、形態素解析手段３、語番号付与手段４、呼要素抽出手段５、応要素抽出手段６、関連性情報出力手段７、語間距離演算手段８、応要素特定手段９、応要素明示手段１０を指す。特に、入力受付手段１は、キーボードやマウス等の入力装置１０５からユーザによって入力されたデータを受け付けてメモリ１０２に格納する機能を有するものであり、出力手段２、意味情報出力手段４及び確定意味情報出力手段６は、ディスプレイ等の表示装置１０４に文字を表示させる機能を有するものである。
【００１４】
また、呼応ペアデータベースＤＢ１は、例えば新聞記事等から収集した膨大な日本語自然文のデータを原文データとしてこれを形態素解析し、その形態解析後のデータに基づいて同一文中で共起した一対の語（共起ペア）を補完類似度の演算により求め、その共起ペアのなかから「呼要素」「応要素」の順で原文データ中に出現したものについて信頼度を演算することによって得られた「呼応ペア」を格納したものである。ここで「信頼度」とは、呼応ペアごとに着目して、ある呼要素が出現する原文数に対する当該呼応ペアが出現する原文数の割合を意味し、最終的に得られた呼応ペアはこの信頼度が所定値以上を示したものである本実施形態では原文データとして「毎日新聞記事データ」と「日経新聞記事データ」の各１０年分を利用している。ここで、呼要素は、本実施形態では「提題助詞」、「取り立て助詞」、「陳述の副詞」に分類される語、具体的には、「こそ」、「しか」、「さえ」、「は」、「も」、「ばかり」、「のみ」、「すら」、「なら」、「くらい（ぐらい）」、「だけ」、「なんて」、「けっして（決して）」、「おそらく（恐らく）」、「たぶん（多分）」、「ぜひ（是非）」、「まるで」、「もし」、「きっと」を採用して予め漸進的文解釈支援プログラムに記述してあり、呼応ペアデータベースＤＢ１ではこれら呼要素ごとに呼応ペアが分類されている。そして、この呼応ペアデータベースＤＢ１においては、各原文データを形態素解析して得られる各形態素に先頭から語番号を付与しておき、各呼応ペアについて、当該呼応ペアが出現した原文データにおいて応要素の語番号と呼要素の語番号との差を語間距離とし、その呼応ペアが出現した全原文データにおける語間距離の平均値を「平均語間距離」として付与した状態でデータを格納してある。なお、ここでは隣接する形態素間の語間距離を１としている。図３に、呼要素「決して」に関する呼応ペアデータベースＤＢ１の内容の一部を示す。同図中欄には、呼要素「決して」に対応する応要素が記述されており、各応要素の右欄には、呼要素「決して」と当該応要素との平均語間距離が記述されている。
【００１５】
自然文曖昧性解消装置Ａの動作は、上述した入力受付手段１、出力手段２、形態素解析手段３、語番号付与手段４、呼要素抽出手段５、応要素抽出手段６、関連性情報出力手段７、語間距離演算手段８、応要素特定手段９、応要素明示手段１０に対応して次のように行われる。
【００１６】
この自然文曖昧性解消装置Ａは、図４のフローチャートに示すように、まず、入力受付手段１の機能によってキーボード等の入力装置１０５から日本語自然文のテキストデータの入力を受け付けると（Ｓ１）、そのテキストデータを一時的にメモリ１０２に格納しつつ出力手段２の機能によって即時的にディスプレイ等の表示装置１０４に当該文字列を表示していく（Ｓ２）。次に、入力を受け付けたテキストデータをメモリ１０２から形態素解析手段３が読み出して形態素解析を実行し、当該テキストデータを品詞分解する（Ｓ３）。ここで、形態素解析プログラムには、例えば日本語形態素解析プログラムとして、「ＪＵＭＡＮ」（http://www.lab25.kuee.kyoto-u.ac.jp/nl-resource/juman.html）や「茶筅」(http://chasen.aist-nara.ac.jp/)を用いることができる。そして、語番号付与手段４の機能によって、各形態素に対して文頭から順に語番号を付与する（Ｓ４）。また、呼要素抽出手段５の機能により、形態素解析手段３の形態素解析結果から得られた形態素から上述した呼要素に該当する形態素を抽出する（Ｓ５）。この抽出された形態素は、一時的にメモリ１０２に格納される。さらに、応要素抽出手段６の機能によって、先に抽出してある呼要素について呼応ペアデータベースＤＢ１を検索して、当該呼要素に対応する応要素に該当する応要素となる一以上の形態素を、ステップＳ３で形態素解析された形態素から抽出する（Ｓ６）。この抽出された応要素も一時的にメモリ１０２に格納される。ここで、関連性情報出力手段７の機能により、メモリ１０２から呼要素と応要素とを読み出して、関連性情報として呼要素を明示するマーク（一例として、当該呼要素を○印で囲むマークや当該呼要素に付された下線）と、当該マークから応要素に向けて延びる矢印マークとを表示装置１０４に出力して、ステップＳ２で表示装置１０４に表示させた文字列に重ね合わせて表示させる（Ｓ７）。その一方、語間距離演算手段８の機能によって、メモリ１０２に格納されている各応要素の語番号から応要素の語番号を差し引く演算を実行し、各応要素についての語間距離を算出する（Ｓ８）。この演算結果である語間距離も対応する応要素に関連づけてメモリ１０２に格納される。さらに、応要素特定手段９の機能によって、各応要素に関する語間距離と、呼応ペアデータベースＤＢ１に格納されている当該応要素の平均語間距離とを比較し、ここでは平均語間距離に最も近い値を示した語間距離に対応する応要素を、ステップＳ５で抽出した呼要素に対応する応要素として確定する（Ｓ９）。最後に、応要素明示手段１０の機能によって、ステップＳ９で確定された応要素を明示するための応要素明示情報を表示装置１０４に表示出力する（Ｓ１０）。ここでは応要素明示情報を表示する処理の一例として、当該応要素に下線を引くとともに、呼要素から当該応要素へ延びる矢印マークと他の応要素へ延びる矢印マークとを区別するために、該当する矢印マークを実線で表して他の矢印マークを破線で表す方法を採用しているが、矢印マークを色分けするなどの他の処理を行ってもよい。
【００１７】
ここで、図５を参照して、入力後にディスプレイ等の表示装置１０４に表示された日本語自然文における曖昧性解消の具体例を挙げて説明する。なお、以下の例は、呼要素「決して」に関するものである。まず、図５（ａ）に示すように、「道のりは決して楽ではない。」という日本語自然文が入力装置１０５を利用して入力され表示装置１０４に表示される場合、上述した自然文曖昧性解消装置Ａの動作のフローチャートに従って、この自然文について形態素解析が行われ各形態素に語番号が付与されたうえで、呼要素「決して」が抽出される。なお、同図に示した「｜」（縦線）は、隣接する形態素間の区切りを示している。そして、図３に示したような呼応ペアデータベースＤＢ１を参照して、呼要素「決して」に対応する応要素として「楽」「で」「は」「ない」が抽出されると、関連性情報として呼要素「決して」から各応要素「楽」「で」「は」「ない」に対して矢印が付され、これが表示装置１０４に表示される。次いで、呼要素「決して」と各応要素との語間距離が算出されて、その語間距離に最も近い平均語間距離を有する応要素「ない」が呼応ペアデータベースＤＢ１から確定されたうえで、前記関連性情報のうち「決して」から「ない」に向かう矢印が応要素明示情報として太線で表示装置１０４に表示される。したがって、この自然文曖昧性解消装置Ａを使用するユーザは、入力した又は入力された自然文中に正しい「係り結び」表現が用いられており、そのなかでも「決して」と「ない」とが正しい呼応関係にあることが分かる。なお、このような短い自然文に限らず、例えば図５（ｂ）に示すような比較的長い自然文の場合であっても、上述した手順と同様にして、「決して」と「ない」とが正しい呼応関係にあることを明示することができる。
【００１８】
以上のようにして、本実施形態では、入力済の日本語自然文のテキストデータ中から、当該文中において真に呼応するというに妥当する呼要素と応要素とを明示して、複数の呼応関係による自然文の曖昧さを解消することができる。そのため、この自然文曖昧性解消装置Ａを利用するユーザが、自分で入力したテキストデータに基づく文章が正しい呼応関係を有しているか否かを検証したり、他者から受け取ったテキストデータに基づく文章の真の意味を理解することに大いに役立つことになる。このことは、この自然文曖昧性解消装置Ａや自然文曖昧性解消プログラムを、日本語入力装置や日本語入力プログラムおいて自然文の検証装置又は検証プログラムの一部として応用することができることになる。
【００１９】
なお、本発明は上述した実施形態に限られるものではなく、「決して」以外の呼要素についても同様にして、入力後の日本語自然文から正しい「係り結び」関係にある呼要素と応要素とを抽出し、その日本語自然文の曖昧性を解消することができる。また、例えば、呼要素「きっと」に対応する応要素「違い」と応要素「ない」とを組み合わせた複数の形態素からなる応要素「違いない」を生成しておき、これとその平均語間距離とを呼応ペアデータベースに格納しておけば、上述した実施形態と同様にして、より実用的に真の呼要素と応要素との組み合わせを入力済のテキストデータから抽出することができる。
【００２０】
その他、各部の具体的構成や機能についても上記実施形態に限られるものではなく、本発明の趣旨を逸脱しない範囲で種々変形が可能である。
【００２１】
【発明の効果】
以上に詳述したように、本発明に係る自然文曖昧性解消装置又はそのための自然文曖昧性解消プログラムによれば、入力済のテキストデータに基づく日本語自然文中において、複数の呼応関係が出現してその文章の意味が曖昧となった場合に、当該自然文中の呼要素と応要素との語間距離のうち、当該呼要素と当該応要素との平均的な語間距離と所定の関係にある語間距離を有し呼応関係にある応要素を機械的に選出するようにしているので、最も確からしい呼応関係が明らかとなって曖昧性が解消され、ユーザが自然文の意味を理解したり自然文において正しい呼応関係を用いているかを検証したりするのに非常に役立つ。また、日本語入力プログラムや日本語検証プログラム等に適用することで、本発明は、それらの信頼性を向上することも可能である。
【図面の簡単な説明】
【図１】本発明及びその一実施形態に係る自然文曖昧性解消装置の機能構成を概略的に示す図。
【図２】同実施形態に係る自然文曖昧性解消装置を構成するコンピュータの概略的機器構成図。
【図３】同実施形態において利用される呼応ペアデータの一部を示す図。
【図４】同自然文曖昧性解消装置の動作の概観を示すフローチャート。
【図５】同自然文曖昧性解消装置を利用したテキストデータの曖昧性解消の一具体例を示す図。
【符号の説明】
Ａ…自然文曖昧性解消装置
ＤＢ１…呼応ペアデータベース
１…入力受付手段
２…出力手段
３…形態素解析手段
４…語番号付与手段
５…呼要素抽出手段
６…応要素抽出手段
７…関連性情報出力手段
８…語間距離演算手段
９…応要素特定手段
１０…応要素明示手段

[0001]
[Technical field of the invention]
The present invention relates to a device and a program for resolving the ambiguity of the meaning represented by a natural sentence in text data composed of a Japanese natural sentence already input to a computer.
[0002]
[Prior art]
Japanese before the medieval period had a construction called "kakari-musubi" (係り結び), in which a binding particle agreed morphologically with the conjugated form at the end of the sentence. In modern Japanese, where "kakari-musubi" has disappeared, the predicate is placed at the end of the sentence, so the content of a sentence is not settled until its end is reached. Consequently, for a long sentence with complex content, whether it is affirmative, negative, or interrogative does not become clear until it has been read to the end. In other words, when several words that could serve as the "musubi" (closing word) for a single "kakari" (opening word) appear in the same natural sentence, it is difficult to grasp accurately which "musubi" word truly corresponds to the "kakari" word, which is necessary for understanding the sentence. In research on the sentence structure of modern Japanese, it has been suggested that certain adverbs and similar words in the modern language play a role resembling that of the binding particles of classical Japanese and foreshadow the elements that follow (see, for example, Non-Patent Document 1). It has also been pointed out that combinations such as 「たぶん……だろう」 (tabun ... darō) and 「おそらく……だろう」 (osoraku ... darō) can be predicted to some extent by introspection and intuition as pairs of a preceding element (call element) and a following element (response element) that form a call-response relation (see, for example, Non-Patent Document 2).
[0003]
[Non-Patent Document 1]
Susumu Ōno, "Kakari-musubi no Kenkyū" (係り結びの研究), 1st ed., Iwanami Shoten, January 12, 1993, pp. 350-351
[Non-Patent Document 2]
Takashi Masuoka, "Modariti no Bunpō" (モダリティの文法), 1st ed., Kurosio Publishers, May 25, 1991, pp. 29-46
[0004]
[Problems to be solved by the invention]
However, such call-response pairs have not yet been studied systematically, and the documents cited above, textbooks, and the like give only a small number of example pairs. That is, in research on "kakari-musubi" in modern Japanese, call-response pairs that cannot be predicted by introspection or intuition are lacking, so there is far too little basic data to determine whether a given word and a word that appears together with it (co-occurs with it) really stand in a call-response relation. Therefore, at present, a user who wants to verify whether Japanese text that he or she has input to a computer uses "kakari-musubi" expressions with correct call-response relations, or to understand the meaning of a Japanese natural sentence received from someone else, can rely only on his or her own intuition and introspection. This problem becomes more pronounced the longer the input Japanese natural sentence is.
[0005]
In view of the above problems, the present invention uses a database that accumulates call-response pairs forming "kakari-musubi" in Japanese sentences to find, in text data consisting of already-input Japanese natural sentences, the call-response relation presumed to be truly correct, and thereby to resolve the ambiguity of the natural sentence.
[0006]
[Means for Solving the Problems]
That is, the present invention is a natural sentence ambiguity elimination device consisting of a computer that resolves the ambiguity of call-response relations in input Japanese natural-sentence text by searching a call-response pair database, and a program for causing the computer to function as that device. The call-response pair database stores a set of call-response pairs, each pairing a call element and a response element that form "kakari-musubi" among two words co-occurring in a Japanese natural sentence, with the average inter-word distance between the call element and the response element in the original natural sentences containing the pair attached to each pair.
[0007]
As shown by the solid lines in the basic functional configuration diagram of FIG. 1, the natural sentence ambiguity elimination device A comprises: input receiving means 1 that receives input of Japanese text data; morphological analysis means 3 that performs morphological analysis on the Japanese text data received by the input receiving means 1; word number assigning means 4 that assigns word numbers, in order from the beginning, to the morphemes obtained by the morphological analysis means 3; call element extraction means 5 that extracts predetermined call elements from the morphemes obtained by the morphological analysis means 3; response element extraction means 6 that searches the call-response pair database DB1 for response elements corresponding to the call element extracted by the call element extraction means 5 and, based on the search result, extracts those morphemes obtained by the morphological analysis means 3 that correspond to response elements; inter-word distance calculation means 8 that calculates, based on the word numbers assigned by the word number assigning means 4, the inter-word distance between the call element extracted by the call element extraction means 5 and every response element extracted by the response element extraction means 6; and response element specifying means 9 that specifies, from the call-response pair database DB1, a response element whose inter-word distance has a predetermined relation to the average inter-word distance obtained by averaging all calculation results of the inter-word distance calculation means 8.
[0008]
Here, a call element is the word corresponding to the "kakari" of the two words that make up "kakari-musubi" in a modern Japanese sentence. According to the classification in "Kiso Nihongo Bunpō, Kaiteiban" (基礎日本語文法改訂版; Takashi Masuoka and Yukinori Takubo, Kurosio Publishers, 1992), the words that can serve as call elements are topic particles (提題助詞), focus particles (取り立て助詞), and declarative adverbs (陳述の副詞). Specific target words include 「こそ」 (koso), 「しか」 (shika), 「さえ」 (sae), 「は」 (wa), 「も」 (mo), 「ばかり」 (bakari), 「のみ」 (nomi), 「すら」 (sura), 「なら」 (nara), 「くらい（ぐらい）」 (kurai/gurai), 「だけ」 (dake), 「なんて」 (nante), 「けっして（決して）」 (kesshite), 「おそらく（恐らく）」 (osoraku), 「たぶん（多分）」 (tabun), 「ぜひ（是非）」 (zehi), 「まるで」 (marude), 「もし」 (moshi), and 「きっと」 (kitto), but are not necessarily limited to these. A "response element", corresponding to the "musubi", is a word that appears together with a "call element" when the call element appears in a modern Japanese natural sentence. In the present invention, the appearance of a "call element" and a "response element" in the same sentence is defined as "co-occurring", and the combination of a co-occurring call element and response element is called a "co-occurrence pair". Among co-occurrence pairs, the appearance of the call element and the response element in this order is defined as "call-response" (呼応), and such a combination is specifically called a "call-response pair". The call-response pair database DB1 stores, as "call-response pairs", paired call elements and response elements extracted from a large body of modern Japanese natural-sentence data used as source text. Furthermore, the pair data are classified by call element, and each pair is given the average of the inter-word distances between its call element and response element over the many source sentences on which the database was built. The inter-word distance is obtained by morphologically analyzing the source text and taking the distance between adjacent morphemes as 1.
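The layout of the pair database DB1 described above can be sketched as a mapping from each call element to its response elements and their average inter-word distances. This is only an illustrative sketch: the names `PairEntry`, `DB1`, and `responses_for` are hypothetical, and the numeric averages are invented, since the actual values of FIG. 3 are not reproduced in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairEntry:
    response: str        # response element (surface form)
    avg_distance: float  # average inter-word distance over the source sentences


# Entries are grouped by call element, as the description states.
# The averages below are made-up placeholders, not values from the patent.
DB1 = {
    "決して": [PairEntry("ない", 4.2), PairEntry("は", 3.1)],
    "たぶん": [PairEntry("だろう", 6.0)],
}


def responses_for(call):
    """Return {response element: average inter-word distance} for a call element."""
    return {entry.response: entry.avg_distance for entry in DB1.get(call, [])}
```

Looking up a call element that is not registered simply returns an empty mapping, which lets the caller skip it.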
[0009]
Accordingly, by causing a computer to function as the natural sentence ambiguity elimination device A, words in a "kakari-musubi" relation, that is, a call-response relation, are picked out of an already-input Japanese natural sentence. The inter-word distance between the call element and each of the one or more corresponding response elements is then compared with the average distance between that call element and that response element stored in the call-response pair database DB1, and a single corresponding response element is selected. This makes the call-response relation in the natural sentence explicit and assists the user in verifying and understanding the text. Here, "input" means entering character data into the computer with an input device such as a keyboard, mouse, or tablet, or receiving input from an external device such as another computer. Also, "a response element having an inter-word distance in a predetermined relation to the average inter-word distance" means, for example, a response element whose inter-word distance is equal or close to the average inter-word distance, or a response element whose inter-word distance is related to the average inter-word distance by a given formula.
[0010]
In particular, to make the assistance in understanding and verifying text more convenient for the user of the natural sentence ambiguity elimination device A, it is preferable to cause the computer to function additionally as output means 2 that outputs the Japanese text data received by the input receiving means 1, relevance information output means 7 that outputs relevance information indicating the relation between the call element extracted by the call element extraction means 5 and the response elements extracted by the response element extraction means 6, and response element indication means 10 that outputs response element indication information identifying the response element specified by the response element specifying means 9. For better visual usability, the output means 2, the relevance information output means 7, and the response element indication means 10 should display the Japanese text data, the relevance information, and the response element indication information, respectively, on a display device such as a display.
[0011]
[Embodiments of the invention]
Hereinafter, an embodiment of the present invention will be described with reference to the drawings.
[0012]
As shown in the functional configuration of FIG. 1, the natural sentence ambiguity elimination device A according to this embodiment uses the call-response pair database DB1, which collects call-response pair data (pairs of a call element and a response element forming "kakari-musubi") from Japanese natural sentences, to find one call element and one or more response elements in text data representing an already-input Japanese natural sentence and to specify the one response element that indicates the meaning of the sentence, thereby assisting verification of the sentence and the user's understanding of it. The device A is constituted by a computer that either contains the call-response pair database DB1 or is connected to it externally so that it can be searched. As shown in the schematic device configuration of FIG. 2, this computer is an ordinary personal computer or the like that has internal devices such as a CPU 101 and a memory 102 electrically connected by a bus line or the like, together with external devices such as a storage device 103 (for example, a hard disk), a display device 104 (for example, a display), an input device 105 (for example, a keyboard and a mouse), and various communication interfaces 106. In this example, the call-response pair database DB1 is connected externally via a communication line. The data held in the call-response pair database DB1 can also be stored in a storage device such as a hard disk.
[0013]
The CPU 101 reads the natural sentence ambiguity elimination program recorded in the storage device 103 into the memory 102 and performs processing in accordance with the program, driving devices such as the memory 102, the storage device 103 (a hard disk or the like), the display device 104 (a monitor or the like), the input device 105 (a keyboard, a mouse, or the like), and the various communication interfaces 106. The computer thereby functions as the natural sentence ambiguity elimination device A. Here, the functions of the device A are, as shown in FIG. 1, the input receiving means 1, output means 2, morphological analysis means 3, word number assigning means 4, call element extraction means 5, response element extraction means 6, relevance information output means 7, inter-word distance calculation means 8, response element specifying means 9, and response element indication means 10. In particular, the input receiving means 1 receives data entered by the user from the input device 105, such as a keyboard or mouse, and stores it in the memory 102, while the output means 2, the relevance information output means 7, and the response element indication means 10 display characters on the display device 104, such as a display.
[0014]
The call-response pair database DB1 is built as follows. A large volume of Japanese natural-sentence data collected from newspaper articles and the like is used as source text and morphologically analyzed. From the analyzed data, pairs of words that co-occur in the same sentence (co-occurrence pairs) are obtained by computing a complementary similarity measure, and a confidence value is computed for those co-occurrence pairs that appear in the source text in the order "call element", "response element". The resulting "call-response pairs" are stored in the database. Here, "confidence" means, for each call-response pair, the ratio of the number of source sentences in which the pair appears to the number of source sentences in which its call element appears; the pairs finally retained are those whose confidence is at or above a predetermined value. In this embodiment, ten years each of "Mainichi Shimbun article data" and "Nikkei Shimbun article data" are used as source text. The call elements in this embodiment are words classified as topic particles, focus particles, or declarative adverbs, specifically 「こそ」, 「しか」, 「さえ」, 「は」, 「も」, 「ばかり」, 「のみ」, 「すら」, 「なら」, 「くらい（ぐらい）」, 「だけ」, 「なんて」, 「けっして（決して）」, 「おそらく（恐らく）」, 「たぶん（多分）」, 「ぜひ（是非）」, 「まるで」, 「もし」, and 「きっと」. These are written in advance into the progressive sentence interpretation support program, and in the call-response pair database DB1 the pairs are classified by these call elements. In the database, word numbers are assigned from the beginning to the morphemes obtained by morphologically analyzing each source sentence. For each call-response pair, the inter-word distance in a source sentence in which the pair appears is the difference between the word number of the response element and the word number of the call element, and the mean of the inter-word distances over all source sentences in which the pair appears is attached to the stored pair as its "average inter-word distance". The inter-word distance between adjacent morphemes is taken as 1. FIG. 3 shows part of the contents of the call-response pair database DB1 for the call element 「決して」 (kesshite). The middle column of the figure lists the response elements corresponding to the call element 「決して」, and the column to the right of each response element gives the average inter-word distance between 「決して」 and that response element.
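The confidence ratio and the average inter-word distance defined above can be illustrated with a small sketch. It makes several simplifying assumptions that are not in the source: the corpus is already segmented into morphemes, the complementary-similarity filtering step is skipped (every morpheme after the call element is treated as a candidate), only the first occurrence of a call element or response in a sentence is counted, and the function name `build_pairs` and the two sample sentences are hypothetical.

```python
from collections import defaultdict


def build_pairs(sentences, call_elements, min_confidence):
    """Build {call: {response: average inter-word distance}} from segmented sentences."""
    call_count = defaultdict(int)    # sentences containing the call element
    pair_count = defaultdict(int)    # sentences containing call ... response in this order
    dist_sum = defaultdict(float)    # summed inter-word distances per pair
    for morphemes in sentences:
        for call in call_elements:
            if call not in morphemes:
                continue
            call_count[call] += 1
            pos = morphemes.index(call)  # first occurrence only (simplification)
            seen = set()
            # offset is the inter-word distance; adjacent morphemes are 1 apart
            for offset, m in enumerate(morphemes[pos + 1:], start=1):
                if m in seen:
                    continue
                seen.add(m)
                pair_count[(call, m)] += 1
                dist_sum[(call, m)] += offset
    db = defaultdict(dict)
    for (call, resp), n in pair_count.items():
        confidence = n / call_count[call]
        if confidence >= min_confidence:
            db[call][resp] = dist_sum[(call, resp)] / n
    return dict(db)


# Hypothetical two-sentence corpus, already split into morphemes.
corpus = [
    ["道のり", "は", "決して", "楽", "で", "は", "ない", "。"],
    ["彼", "は", "決して", "来", "ない", "。"],
]
db = build_pairs(corpus, {"決して"}, min_confidence=1.0)
```

With this toy corpus, 「ない」 follows 「決して」 at distances 4 and 2, so its average is 3.0, while 「楽」 appears in only one of the two sentences and is dropped by the confidence threshold.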
[0015]
The natural sentence ambiguity elimination device A operates as follows, in correspondence with the above-described input receiving means 1, output means 2, morphological analysis means 3, word number assigning means 4, call element extraction means 5, response element extraction means 6, relevance information output means 7, inter-word distance calculation means 8, response element specifying means 9, and response element indication means 10.
[0016]
As shown in the flowchart of FIG. 4, when the natural sentence ambiguity elimination device A receives input of Japanese natural-sentence text data from the input device 105, such as a keyboard, through the input receiving means 1 (S1), it temporarily stores the text data in the memory 102 while the output means 2 displays the character string on the display device 104 in real time (S2). Next, the morphological analysis means 3 reads the received text data from the memory 102, performs morphological analysis, and breaks the text into parts of speech (S3). As the morphological analysis program, a Japanese morphological analyzer such as "JUMAN" (http://www.lab25.kuee.kyoto-u.ac.jp/nl-resource/juman.html) or "ChaSen" (茶筅; http://chasen.aist-nara.ac.jp/) can be used. The word number assigning means 4 then assigns word numbers to the morphemes in order from the beginning of the sentence (S4). The call element extraction means 5 extracts, from the morphemes obtained by the morphological analysis means 3, those that correspond to the call elements listed above (S5). The extracted morphemes are temporarily stored in the memory 102. The response element extraction means 6 then searches the call-response pair database DB1 for the extracted call element and extracts, from the morphemes analyzed in step S3, one or more morphemes that are response elements corresponding to that call element (S6). The extracted response elements are also temporarily stored in the memory 102. The relevance information output means 7 then reads the call element and the response elements from the memory 102 and outputs to the display device 104, as relevance information, a mark identifying the call element (for example, a circle around the call element or an underline beneath it) and arrow marks extending from that mark toward the response elements, superimposed on the character string displayed in step S2 (S7). Meanwhile, the inter-word distance calculation means 8 subtracts the word number of the call element from the word number of each response element stored in the memory 102 to obtain the inter-word distance for each response element (S8). The resulting inter-word distances are also stored in the memory 102 in association with the corresponding response elements. The response element specifying means 9 then compares the inter-word distance of each response element with the average inter-word distance of that response element stored in the call-response pair database DB1 and, in this embodiment, fixes the response element whose inter-word distance is closest to the average inter-word distance as the response element corresponding to the call element extracted in step S5 (S9). Finally, the response element indication means 10 displays on the display device 104 response element indication information identifying the response element fixed in step S9 (S10). In this embodiment, as one example, the response element is underlined and, to distinguish the arrow from the call element to that response element from the arrows to the other response elements, the former is drawn as a solid line and the others as broken lines; other treatments, such as color-coding the arrows, are also possible.
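Steps S4 to S9 above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: the sentence is assumed to be already segmented (a real system would obtain the morphemes from an analyzer such as JUMAN or ChaSen), the function name `disambiguate`, the sample sentence, and the database values are hypothetical, and "closest to the average" is taken as the smallest absolute difference.

```python
def disambiguate(morphemes, db):
    """Return {word number of call element: word number of chosen response element}.

    db maps each call element to {response element: average inter-word distance}.
    """
    numbered = list(enumerate(morphemes, start=1))  # S4: word numbers start at 1
    result = {}
    for call_no, call in numbered:
        if call not in db:                          # S5: keep only call elements
            continue
        averages = db[call]
        # S6: later morphemes that are registered as responses to this call element
        candidates = [(no, m) for no, m in numbered
                      if no > call_no and m in averages]
        if not candidates:
            continue
        # S8: distance = response number - call number
        # S9: pick the candidate whose distance is closest to its stored average
        best_no, _ = min(
            candidates,
            key=lambda c: abs((c[0] - call_no) - averages[c[1]]),
        )
        result[call_no] = best_no
    return result


# Hypothetical input with two occurrences of 「ない」 after 「決して」.
tokens = ["彼", "は", "決して", "嘘", "を", "つか", "ない", "人", "で", "は", "ない", "。"]
toy_db = {"決して": {"ない": 4.2}}  # invented average distance
links = disambiguate(tokens, toy_db)
```

Here 「決して」 is word 3 and the two 「ない」 candidates are words 7 and 11, at distances 4 and 8; the first is closer to the assumed average of 4.2, so it is selected.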
[0017]
With reference to FIG. 5, a concrete example of disambiguation of a Japanese natural sentence displayed on the display device 104 after input is now described. The example concerns the call element 「決して」 (kesshite). First, as shown in FIG. 5(a), when the Japanese natural sentence 「道のりは決して楽ではない。」 ("The road is by no means easy.") is input using the input device 105 and displayed on the display device 104, the sentence is morphologically analyzed and word numbers are assigned to the morphemes in accordance with the flowchart described above, and the call element 「決して」 is extracted. The "|" (vertical bar) in the figure marks the boundary between adjacent morphemes. Then, when the call-response pair database DB1 as shown in FIG. 3 is consulted and 「楽」, 「で」, 「は」, and 「ない」 are extracted as response elements corresponding to the call element 「決して」, arrows are drawn as relevance information from 「決して」 to each of these response elements and displayed on the display device 104. Next, the inter-word distance between 「決して」 and each response element is calculated, the response element 「ない」, whose average inter-word distance is closest to its inter-word distance, is fixed from the call-response pair database DB1, and the arrow from 「決して」 to 「ない」 among the relevance information is displayed in bold on the display device 104 as response element indication information. The user of the device A can thus see that the natural sentence he or she has input or received uses a correct "kakari-musubi" expression and, in particular, that 「決して」 and 「ない」 are in the correct call-response relation. This is not limited to such short sentences: even for a relatively long natural sentence such as the one shown in FIG. 5(b), the same procedure can indicate that 「決して」 and 「ない」 are in the correct call-response relation.
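The display side of this example (arrows from the call element to every candidate, with the chosen one emphasized) can be approximated in plain text. The sketch below is only an assumption about one possible rendering: the function name `render_links` and the arrow symbols are hypothetical, and the segmentation of 「道のり」 as a single morpheme is assumed, since FIG. 5 is not reproduced here.

```python
def render_links(morphemes, call_no, candidate_nos, chosen_no):
    """Render the sentence and one line per arrow; '==>' marks the chosen response."""
    lines = ["|".join(morphemes)]  # '|' separates adjacent morphemes
    call = morphemes[call_no - 1]  # word numbers are 1-based
    for no in candidate_nos:
        arrow = "==>" if no == chosen_no else "-->"
        lines.append(
            f"{call} {arrow} {morphemes[no - 1]} (word {no}, distance {no - call_no})"
        )
    return "\n".join(lines)


sentence = ["道のり", "は", "決して", "楽", "で", "は", "ない", "。"]
# 「決して」 is word 3; candidates 「楽」「で」「は」「ない」 are words 4 to 7.
text = render_links(sentence, call_no=3, candidate_nos=[4, 5, 6, 7], chosen_no=7)
```

The inter-word distances printed for the four candidates are 1, 2, 3, and 4, which follows directly from the word numbering described in the text.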
[0018]
As described above, this embodiment indicates, in the text data of an already-input Japanese natural sentence, the call element and response element that can properly be said to truly correspond in that sentence, and thereby resolves the ambiguity of the natural sentence caused by multiple call-response relations. The natural sentence ambiguity elimination device A is therefore of great help to a user in verifying whether text that the user has input has correct call-response relations and in understanding the true meaning of text received from others. This also means that the device A and the natural sentence ambiguity elimination program can be applied as part of a natural-sentence verification device or verification program in a Japanese input device or Japanese input program.
[0019]
It should be noted that the present invention is not limited to the above-described embodiment. For call elements other than “never”, call elements and response elements that are in a correct kakari-musubi relationship can likewise be extracted from an input Japanese natural sentence, and the ambiguity of the sentence can be resolved. Also, for example, a response element “no difference” composed of a plurality of morphemes may be generated by combining the response element “difference”, which corresponds to the call element “probably”, with the response element “no”. If this compound response element is stored in the responsive pair database together with its average inter-word distance, the combination of the true call element and response element can be extracted from the input text data in a more practical way, in the same manner as in the above-described embodiment.
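One way to match a response element made up of several morphemes is sketched below in Python. It is only an illustration under assumptions: the compound is stored as a tuple of consecutive morphemes, its position is taken to be the word number of its first morpheme, and the sample sentence is an invented sequence of English glosses. The function name `find_compound` is hypothetical.

```python
from typing import List, Sequence


def find_compound(morphemes: Sequence[str], compound: Sequence[str], start: int = 0) -> List[int]:
    """Return the 1-based word numbers at which `compound` begins.

    Only positions at or after the 0-based index `start` are examined,
    so a caller can restrict the search to morphemes after the call element.
    """
    width = len(compound)
    hits = []
    for i in range(start, len(morphemes) - width + 1):
        # Compare a window of consecutive morphemes with the stored compound.
        if tuple(morphemes[i:i + width]) == tuple(compound):
            hits.append(i + 1)
    return hits


# Invented gloss sequence containing the compound ("difference", "no").
tokens = ["probably", "rain", "difference", "no"]
print(find_compound(tokens, ("difference", "no"), start=1))  # [3]
```

The returned word number can then be used for the inter-word distance in the same way as for a single-morpheme response element.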
[0020]
In addition, the specific configuration and function of each part are not limited to the above embodiment, and various modifications can be made without departing from the spirit of the present invention.
[0021]
[Effect of the invention]
As described above in detail, according to the natural sentence ambiguity eliminating apparatus or the natural sentence ambiguity eliminating program of the present invention, even when a plurality of responsive relationships appear in a Japanese natural sentence based on input text data and the meaning of the sentence becomes ambiguous, the response element whose inter-word distance from the call element in the natural sentence is in a predetermined relationship with the average inter-word distance between that call element and response element is mechanically selected. The most probable responsive relationship is thereby revealed and the ambiguity is resolved, which is very useful for helping the user understand the meaning of the natural sentence and for verifying whether a correct responsive relationship is used in it. Further, applying the present invention to a Japanese input program, a Japanese verification program, or the like can also improve their reliability.
[Brief description of the drawings]
FIG. 1 is a diagram schematically showing a functional configuration of a natural sentence ambiguity eliminating apparatus according to the present invention and an embodiment thereof.
FIG. 2 is a schematic device configuration diagram of a computer configuring the natural sentence ambiguity eliminating device according to the embodiment.
FIG. 3 is a view showing a part of responsive pair data used in the embodiment.
FIG. 4 is a flowchart showing an overview of the operation of the natural sentence ambiguity eliminating apparatus.
FIG. 5 is a diagram showing a specific example of disambiguation of text data using the natural sentence disambiguation apparatus.
[Explanation of symbols]
A ... Natural sentence ambiguity eliminating apparatus
DB1 ... Responsive pair database
1 ... Input receiving means
2 ... Output means
3 ... Morphological analysis means
4 ... Word number assigning means
5 ... Call element extraction means
6 ... Response element extraction means
7 ... Relevance information output means
8 ... Inter-word distance calculation means
9 ... Response element specifying means
10 ... Response element explicit means

Claims

1. A natural sentence ambiguity eliminating apparatus comprising a computer that resolves the ambiguity of responsive relationships in input Japanese natural sentence text by searching a responsive pair database, the database storing a set of responsive pairs, each pairing a call element and a response element that form kakari-musubi among two words co-occurring in a Japanese natural sentence, with the average inter-word distance between the call element and the response element in the original natural sentences containing the responsive pair added, the apparatus comprising:
input receiving means for receiving input of Japanese text data;
morphological analysis means for performing morphological analysis on the Japanese text data received by the input receiving means;
word number assigning means for sequentially assigning word numbers, from the beginning, to each morpheme analyzed by the morphological analysis means;
call element extraction means for extracting a predetermined call element from the morphemes analyzed by the morphological analysis means;
response element extraction means for searching the responsive pair database for response elements corresponding to the call element extracted by the call element extraction means and, based on the search result, extracting morphemes corresponding to those response elements from the morphemes analyzed by the morphological analysis means;
inter-word distance calculation means for calculating, based on the word numbers assigned by the word number assigning means, the inter-word distance between the call element extracted by the call element extraction means and each of the response elements extracted by the response element extraction means; and
response element specifying means for specifying, from among all calculation results of the inter-word distance calculation means, one response element whose inter-word distance is in a predetermined relationship with the average inter-word distance obtained from the responsive pair database.

2. The natural sentence ambiguity eliminating apparatus according to claim 1, further comprising:
output means for outputting the Japanese text data received by the input receiving means;
relevance information output means for outputting relevance information indicating the relevance between the call element extracted by the call element extraction means and the response elements extracted by the response element extraction means; and
response element explicit means for outputting response element explicit information that explicitly indicates the response element specified by the response element specifying means.

3. The natural sentence ambiguity eliminating apparatus according to claim 2, wherein the output means, the relevance information output means, and the response element explicit means display and output the Japanese text data, the relevance information, and the response element explicit information, respectively, on a display device such as a display.

4. A natural sentence ambiguity eliminating program for causing a computer, which resolves the ambiguity of responsive relationships in input Japanese natural sentence text by searching a responsive pair database, the database storing a set of responsive pairs, each pairing a call element and a response element that form kakari-musubi among two words co-occurring in a Japanese natural sentence, with the average inter-word distance between the call element and the response element in the original natural sentences containing the responsive pair added, to function as a natural sentence ambiguity eliminating apparatus comprising:
input receiving means for receiving input of Japanese text data;
morphological analysis means for performing morphological analysis on the Japanese text data received by the input receiving means;
word number assigning means for sequentially assigning word numbers, from the beginning, to each morpheme analyzed by the morphological analysis means;
call element extraction means for extracting a predetermined call element from the morphemes analyzed by the morphological analysis means;
response element extraction means for searching the responsive pair database for response elements corresponding to the call element extracted by the call element extraction means and, based on the search result, extracting morphemes corresponding to those response elements from the morphemes analyzed by the morphological analysis means;
inter-word distance calculation means for calculating, based on the word numbers assigned by the word number assigning means, the inter-word distance between the call element extracted by the call element extraction means and each of the response elements extracted by the response element extraction means; and
response element specifying means for specifying, from among all calculation results of the inter-word distance calculation means, one response element whose inter-word distance is in a predetermined relationship with the average inter-word distance obtained from the responsive pair database.

5. The natural sentence ambiguity eliminating program according to claim 4, further causing the computer to function as a natural sentence ambiguity eliminating apparatus comprising:
output means for outputting the Japanese text data received by the input receiving means;
relevance information output means for outputting relevance information indicating the relevance between the call element extracted by the call element extraction means and the response elements extracted by the response element extraction means; and
response element explicit means for outputting response element explicit information that explicitly indicates the response element specified by the response element specifying means.

6. The natural sentence ambiguity eliminating program according to claim 5, wherein the output means, the relevance information output means, and the response element explicit means are caused to function so as to display and output the Japanese text data, the relevance information, and the response element explicit information, respectively, on a display device such as a display.