JP4579400B2

JP4579400B2 - NATURAL LANGUAGE RESPONSE DEVICE AND METHOD, AND RECORDING MEDIUM CONTAINING NATURAL LANGUAGE RESPONSE PROGRAM

Info

Publication number: JP4579400B2
Application number: JP2000330450A
Authority: JP
Inventors: 昇韮塚
Original assignee: 株式会社システムイグゼ
Priority date: 2000-10-30
Filing date: 2000-10-30
Publication date: 2010-11-10
Anticipated expiration: 2020-10-30
Also published as: JP2002132766A

Description

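[0001]
[Technical Field of the Invention]
The present invention relates to a natural language response device and method. Information contained in character strings expressed in natural language is extracted as character strings and accumulated as knowledge data. Questions about that knowledge data are then asked in natural language, and the stored information is retrieved.
[0002]
[Prior Art]
In conventional morphological analysis, the meanings of words are defined in advance using probabilities, frequency of use, set theory, and tree structures. The meaning of a sentence is then judged from logical values derived from word combinations or from regions defined by set theory. Likewise, operating a computer through words works by storing specific words, registering an action, word, or sentence for each of them, and executing the action through combinations in logic circuits.
[0003]
Grammatical analysis techniques are applied to the contextual analysis of sentences. In this approach, grammar is classified into fixed patterns, and meaning is determined by the semantic joining conditions of the words that match a pattern.
[0004]
However, the meaning of the words that make up a sentence is determined by their mutual relationships, and the grammar should emerge as a result of those relationships. Defining word meanings and grammar in fixed form therefore restricts these relationships. Analyzing sentences by vocabulary and grammar inflates the number of fixed patterns, while analyzing sentences with a limited set of patterns amounts to approximation by pattern recognition, which makes it difficult to capture information accurately.
[0005]
Examples of such semantic analysis include:
Japanese Unexamined Patent Publication No. H04-191338, "Question Answering Method and Device";
Japanese Unexamined Patent Publication No. H05-230701, "Information Extraction Device";
Japanese Unexamined Patent Publication No. H07-078355, "Natural Language Semantic Analysis Processing Device";
Japanese Unexamined Patent Publication No. H10-304583, "Natural Language Processing Device and Method".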
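[0006]
[Problems to Be Solved by the Invention]
The meaning of a sentence expressed in natural language has to be determined as follows. Within the combination of words that make up the sentence, each word influences the others through their relationships; the meaning of each individual word is fixed as a result of that mutual influence; and only then is the meaning of the sentence as a whole determined. Moreover, because a sentence carries the meaning expressed by the sentence as a whole, a method is needed that determines the image expressed by the words of the sentence and, from it, fixes the meaning of each individual word.
[0007]
Accordingly, to realize morphological analysis that needs no grammar patterns for syntactic parsing, the present invention lets each word of a sentence autonomously judge how it is influenced by the other words, and lets a word change the combination of words that make up the sentence (hereinafter, this is defined as the word's influence on other words). In this way the invention determines the words that form the elements of the sentence, estimates the meaning of the word combinations, estimates the overall meaning of the sentence, extracts the information contained in the sentence, and stores the extracted information in natural language. The object is to retrieve, through natural language, information that has been stored in natural language.
[0008]
[Means for Solving the Problems]
To achieve the above object, the natural language response device and method according to the present invention treat the smallest unit of a sentence as a definition word. A definition word is defined for any character string made of kanji, hiragana, katakana, Latin letters, digits, symbols, or combinations of these whose meaning can be identified. A sentence in natural language is input. For every definition word that can be a constituent of a sentence, processing procedures that describe the influence the word exerts on other definition words are written in a description language and, as needed, registered as the word's usage method, extraction method, action method, and answer method.
Word choices that humans have traditionally made by applying common sense about differences in meaning and usage are instead captured by defining procedures, written in the description language, as part of each definition word's data. By executing the procedures defined for the individual definition words in the combination formed by the words of a sentence, each definition word itself judges its influence on itself or on other definition words, changes the combination structure of the definition words of the sentence, and thereby produces the optimal linkage of definition words for that sentence.
[0009]
Likewise, when extracting the information contained in a sentence or answering a question, the conditions that arise from the combination of words are handled by executing the description-language procedures defined in the action method and answer method of the definition words, which produce search conditions for the stored information. Answers can thus be generated precisely, according to each individual situation.
[0010]
Let definition word information denote the data that holds a definition word, the procedures describing the above methods for that word, and its linguistic attributes. A sentence can then be defined as a linked sequence of definition words. Each piece of definition word information stores data that can run as a small program, and the procedure of the morphological analysis is determined by the combination of definition word information that makes up the sentence. By analogy, robots placed on a production line in an assembly plant produce one product, and other products can be produced by changing the robots' functions and configuration. Each robot uses only a limited function, yet the production line as a whole can produce any product by configuring and balancing the individual robots. The procedures written in the definition word information correspond to the robots, each with a narrowly defined behavior, and the sentence formed by linking definition word information corresponds to the production line. The product of these "robots" working together is the analyzed syntax of the sentence and the information it contains.
[0011]
The invention is thus characterized in that there is no need to interpret the grammar of an entire sentence: by writing the influence that a definition word has on other definition words as data of that word, any sentence expression formed by combining definition words can be handled.
[0012]
Because the above operations extract the information contained in a text as individual sentences, there is no need to convert the text or its words into mathematical models or numerical values. This reduces both the amount of information needed for analysis and the amount of data to be stored, and shortens analysis time so that dialogue can proceed at a natural speed. In addition, by storing the information contained in a text as knowledge data in the form of character strings built from definition words, the information and meaning carried by expressions that are hard to quantify can be stored in a small amount of data.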
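[0013]
[Embodiments of the Invention]
Embodiments of the present invention are described in detail below using an example. As a simple exchange of words, this example assumes the following:
Information 1: 「公園へ行き弁当を食べた。」 ("I went to the park and ate a box lunch.")
Question 1: 「何を食べましたか。」 ("What did you eat?")
Question 2: 「何処で弁当を食べましたか。」 ("Where did you eat the box lunch?")
Question 3: 「公園で何を食べましたか。」 ("What did you eat in the park?")
Question 4: 「弁当を食べましたか。」 ("Did you eat the box lunch?")
Question 5: 「公園で弁当を食べましたか。」 ("Did you eat the box lunch in the park?")
Question 6: 「弁当を公園で食べましたか。」 (the same question as Question 5, with the object and place phrases in the opposite order)
Question 7: 「公園でリンゴを食べましたか。」 ("Did you eat an apple in the park?")
A human can easily answer Questions 1 to 7 about Information 1. With conventional techniques, however, it has not been easy to store Information 1 in a computer, have a human type in these questions, and have the computer answer them, because conventional methods try to achieve dialogue with a computer through a limited vocabulary and grammar. Below, the natural language response device is explained by describing in detail how, once information such as Information 1 has been stored, the invention accurately answers variously phrased questions such as Questions 1 to 7.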
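[0014]
The present invention is applied, for example, to a natural language response device 1 as shown in FIG. 1. The device 1 has an input unit 2 through which an operator operates the device by entering information or character strings. The input unit 2 consists of, for example, a keyboard or a pointing device such as a mouse. Operator input becomes the target of input processing and is output to the character conversion unit 3 as a character string representing the natural language converted by the input unit.
[0015]
The input unit 2 is not limited to a keyboard or similar device. For example, it may convert natural language into a character string by performing speech recognition on an audio signal from a microphone. It may also read external text data written to a data file; any means capable of inputting natural language can be used.
[0016]
Suppose the sentence of Information 1 is entered in natural language through the input unit 2 and the character string output to the character conversion unit 3 is in hiragana. The character conversion unit 3 receives 「こうえんへいきべんとうをたべた。」 as the character string of Information 1 produced by the input unit 2 and, using the definition word information stored in the definition word storage unit 4, converts it into a normalized standard character string.
The definition word information 20 has areas R1 to R14, as shown in FIG. 2, and corresponds to what conventional morphological analysis calls a dictionary. Rather than merely storing fixed information such as words and parts of speech, however, it holds part of the morphological analysis processing as its own data. In other words, the analysis that a conventional main program performs to judge the connections and syntax of the words in a sentence is removed from the main program, which keeps only a standard analysis process. Any sentence analysis that the standard process cannot handle is defined as a procedure in the description language and stored as data of the definition word information. The analysis performed by the natural language response device 1 is therefore determined by the combination of procedures written in the definition word information 20 of the words that make up the sentence.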
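[0017]
FIG. 2 shows the data structure of the definition word information 20. The definition word information 20 records, as a definition word, a character string of kanji, hiragana, katakana, Latin letters, digits, symbols, or combinations of these whose meaning can be identified. A definition word need not be a word of the language: any character string with an identifiable meaning will do, including a file name or image data expressed as a character string. For the definition word R1, the following attributes are stored: identification number R2, reading R3, okurigana (inflectional kana suffix) R4, part of speech R5, special conjugation R6, conjugation R7, and notation R8. To encode the meaning of the definition word R1 electronically, the following are also stored: usage method R9, a procedure that modifies the standard selection procedure; action method R10, a procedure describing an action; sentence meaning extraction method R11, a procedure that modifies the standard procedure for extracting sentence meaning; answer generation method R12, a procedure that modifies the standard answer generation procedure; related definition words R13, one or more other definition words, as needed, that the definition word brings to mind; and synonymous definition words R14, one or more other definition words, as needed, that have the same meaning.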
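To make the field list of FIG. 2 concrete, here is a minimal Python sketch of one definition word record. The class name `DefinitionWord`, the field names, and the representation of the R9 to R12 procedures as optional Python callables are assumptions for illustration; the patent stores these procedures in an unspecified description language.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stand-in for a description-language procedure:
# it receives an analysis context and returns an updated one.
Procedure = Callable[[dict], dict]


@dataclass
class DefinitionWord:
    """One entry of definition word information 20 (fields R1-R14 of FIG. 2)."""
    word: str                                # R1: the definition word itself
    ident: int                               # R2: unique identification number
    reading: str                             # R3: per-character reading, space-separated
    okurigana: tuple[str, ...] = ()          # R4: okurigana needed when inflected
    pos: str = "noun"                        # R5: part of speech (may be extended)
    special_conj: Optional[str] = None       # R6: special conjugation class
    conj: Optional[str] = None               # R7: ordinary verb conjugation class
    notation: str = ""                       # R8: normalized written form
    usage: Optional[Procedure] = None        # R9: modifies standard candidate selection
    action: Optional[Procedure] = None       # R10: action procedure
    extraction: Optional[Procedure] = None   # R11: modifies sentence-meaning extraction
    answer: Optional[Procedure] = None       # R12: modifies answer generation
    related: frozenset[str] = frozenset()    # R13: words this word brings to mind
    synonyms: frozenset[str] = frozenset()   # R14: same meaning, different expression

    def standard_form(self) -> str:
        """Return the R8 notation if one is defined, else the word itself."""
        return self.notation or self.word
```

Homographs such as 金 read "kin" and 金 read "kane" become two records with the same `word` and different `ident` values (R2).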
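[0018]
The identification number R2 is a unique number assigned to the definition word R1. It identifies the definition word when homonyms or different readings carry different meanings. For example, 金 read "kin" (gold) and 金 read "kane" (money) share the notation 金, but their usage and meaning differ with the reading. Such words are distinguished by their identification numbers R2.
[0019]
The reading R3 gives the pronunciation of the definition word R1 in hiragana, divided per character of R1. When 構造 ("structure") is defined as R1, the reading R3 is defined as "こう□ぞう" (□ denotes a space), with a space separating the segments. This makes it possible to find 構ぞう, こう造, and こうぞう as 構造. The okurigana R4 defines one or more okurigana where an idiom or similar expression needs them when inflected; the okurigana of verbs, adjectives, and the like are defined by R1, the special conjugation R6, and the conjugation R7. To support search with omitted okurigana, for spellings such as 行う and 行なう ("to perform"), "おこな" is registered in R3, and an okurigana-omission search attribute is defined as part of the notation R8 of R1. For a definition word with this attribute, when the last character of the reading R3 coincides with the okurigana, the last character of R3 is dropped for the search, so that both 行う and 行なう are found as the standard string 行う.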
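The two search aids of paragraph [0019] can be sketched as follows, assuming R3 is stored as one space-separated kana segment per character. The function names are hypothetical, and the okurigana check is a simplified reading of the rule: it accepts the stem followed either by the okurigana alone or by the last kana of R3 plus the okurigana.

```python
from itertools import product


def spelling_variants(word: str, reading: str) -> set[str]:
    """All mixed kanji/kana spellings allowed by a per-character reading (R3).

    For each character of `word`, either the character itself or its kana
    segment may appear, so 構造 with reading "こう ぞう" yields four spellings.
    """
    segments = reading.split()
    if len(segments) != len(word):
        raise ValueError("R3 must contain one segment per character of R1")
    choices = [(char, kana) for char, kana in zip(word, segments)]
    return {"".join(combo) for combo in product(*choices)}


def matches_with_okurigana(text: str, stem: str, reading: str, okurigana: str) -> bool:
    """Okurigana-omission match: accept both 行う and 行なう for stem 行.

    `reading` is R3 for the stem (e.g. "おこな") and `okurigana` is R4 (e.g. "う").
    The longer spelling repeats the last kana of R3 before the okurigana.
    """
    if text == stem + okurigana:
        return True
    return text == stem + reading[-1] + okurigana
```

Both spellings of 行う then map to the single standard string defined in R8.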
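[0020]
The part of speech R5 defines the grammatical part of speech used in linguistics. Distinctions such as family name versus given name in personal names, and units, are also defined in R5 as extended parts of speech.
[0021]
The special conjugation R6 defines special inflection patterns such as the sa-hen verb conjugation, the adjectival-noun conjugation, the attributive-form conjugation, the adverbial conjugation, and the classical ku, shiku, tari, and nari conjugations.
When 定義 ("definition") is defined as a definition word, R6 records that it follows the sa-hen verb conjugation. Inflected forms that include particles, sentence-final particles, case particles, and the like, such as 定義する ("define"), 定義した ("defined"), and 定義しない ("do not define"), are then defined, and from the type of inflection the form and logical value of the sentence (present, past, progressive, affirmative, negative, interrogative, and so on) can be recognized.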
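A minimal sketch of how an R6 entry could let the analyzer read tense, polarity, and mood off an inflected sa-hen verb. The patent does not list its inflection table, so every entry in `SAHEN_ENDINGS` below is a hypothetical example, and the labels are illustrative.

```python
from typing import Optional

# Hypothetical inflection table for a sa-hen stem such as 定義 (R6 = sa-hen).
# Each ending maps to (tense or aspect, polarity, mood).
SAHEN_ENDINGS = {
    "する": ("present", "affirmative", "declarative"),
    "した": ("past", "affirmative", "declarative"),
    "しない": ("present", "negative", "declarative"),
    "しなかった": ("past", "negative", "declarative"),
    "している": ("progressive", "affirmative", "declarative"),
    "しますか": ("present", "affirmative", "interrogative"),
    "しましたか": ("past", "affirmative", "interrogative"),
}


def classify_sahen(text: str, stem: str) -> Optional[tuple[str, str, str]]:
    """Return (tense, polarity, mood) for an inflected sa-hen verb, or None if unknown."""
    if not text.startswith(stem):
        return None
    return SAHEN_ENDINGS.get(text[len(stem):])
```

An "interrogative" result is the kind of signal paragraph [0037] later uses to switch automatically from storing information to answering a question.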
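[0022]
The conjugation R7 defines grammatical verb conjugations other than those in R6: godan (five-row), kami-ichidan, shimo-ichidan, kami-nidan, shimo-nidan, and the irregular ka-row, sa-row, na-row, and ra-row conjugations. As with R6, this allows sentence forms and logical values such as present, past, progressive, affirmative, negative, and interrogative to be recognized.
[0023]
The notation R8 defines a standardized written form. For example, it defines as standard strings that キロメートル ("kilometer") is written Km and that 林檎 ("apple") is written リンゴ. It likewise defines 行う as the normalized standard string for okurigana variants such as 行う and 行なう.
[0024]
The usage method R9, action method R10, sentence meaning extraction method R11, and answer generation method R12 are procedures written in a program description language that modify the natural language analysis procedure. Conditioned on the relationships among the definition words R1 in a sentence, they define procedures that change the combination of definition words.
[0025]
The related definition words R13 define, as needed, one or more other definition words that the definition word R1 brings to mind. If リンゴ ("apple") is defined as R1, definition words such as 食物 ("food"), 果物 ("fruit"), 木 ("tree"), 赤 ("red"), and 美味い ("tasty") can be defined as its R13. The unified meaning that arises from a combination of definition words can then be estimated using R13. The synonymous definition words R14 define, as needed, one or more definition words with the same meaning but a different expression. ゴールド ("gold", a loanword) and 金 (kin) differ in notation but both refer to the precious metal gold, so when ゴールド appears in a sentence it can be recognized as 金 (kin).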
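[0026]
The character conversion unit 3 of FIG. 1 obtains definition word information 20 from the definition word storage unit 4 for the character string 「こうえんへいきべんとうをたべた。」 and builds the combinations that can make up that string. Based on the position of each definition word R1 within the string, entries of definition word information 20 that start at the same position form the rows of one column, and the columns are linked in position order. This combined state is called the definition word conversion matrix and is shown in FIG. 3. In the matrix of FIG. 3, the first column collects the words that start with こ, the first character of the input string, and the second column collects the words that start with う, its second character. The definition word conversion matrix M1 shows the initial state for the input string. In M1, a rectangle around a word indicates that the entry consists of fields R1 to R14 of the definition word information 20 in FIG. 2, and the word inside the rectangle is the definition word R1.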
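The initial matrix M1 can be sketched as a list of columns indexed by start position, each holding every (kana span, candidate) pair whose reading begins there. The lexicon below is a hypothetical fragment chosen to reproduce the candidates named in the text, and ordering rows longest-span first is an assumption; positions are 0-based character offsets, not the column numbers used in FIG. 3.

```python
# Hypothetical lexicon fragment: kana reading -> candidate definition words.
SAMPLE_LEXICON: dict[str, list[str]] = {
    "こうえん": ["後援", "公園"],
    "こう": ["校"],
    "へいき": ["併記", "兵器", "平気"],
    "へ": ["へ"],
    "いき": ["行き"],
    "べんとう": ["弁当"],
    "を": ["を"],
    "たべた": ["食べた"],
}


def build_conversion_matrix(text: str,
                            lexicon: dict[str, list[str]]) -> list[list[tuple[str, str]]]:
    """Column i lists (kana span, candidate) pairs whose reading starts at text[i]."""
    columns: list[list[tuple[str, str]]] = [[] for _ in text]
    for i in range(len(text)):
        for reading, candidates in lexicon.items():
            if text.startswith(reading, i):
                columns[i].extend((reading, cand) for cand in candidates)
        # Longer spans first; the sort is stable, so lexicon order breaks ties.
        columns[i].sort(key=lambda pair: -len(pair[0]))
    return columns
```

Positions where no entry starts (such as the ん at offset 3) simply yield empty columns; selecting a candidate then fixes which later column it must connect to.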
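[0027]
Because the columns of M1 group the conversion candidates by their starting position in the input string, some columns go unused depending on which definition words R1 are selected.
FIG. 3 shows 後援 ("sponsorship") selected in row 1 of column 1 as the conversion candidate for the input substring こうえん. This part of the input, こうえん, is called the conversion region of column 1. If the conversion region of column 1 is assumed to be こうえん, the conversion regions of columns 2 and 3 overlap it and cannot connect to column 1; in FIG. 3, columns 2 and 3 are drawn one step lower to show that they cannot connect. The column that can connect to column 1 is column 4, which adjoins the end of column 1's conversion region, and 併記 ("writing side by side") is a candidate for its conversion region へいき. The candidate that can connect to column 4 is 弁当 ("box lunch"), for the conversion region べんとう in column 7. All three candidates are clearly nouns, but to judge that they join into a new compound noun, semantic unity through the related definition words R13 of the definition word information 20 is a necessary condition.
[0028]
The related definition words R13 of the definition word information 20 are defined as other definition words that R1 brings to mind. Defining the image evoked by the string or word R1 through other words lets the group to which R1 belongs be grasped conceptually, without forming fixed groups. Registering several words in R13 can define several different conceptual groups, and registering similar ones can form narrower conceptual groups at the same time. For example, for the noun 犬 ("dog"), broad meanings can be defined with words such as 動物 ("animal") and 生物 ("living thing"), while breeds such as 秋田犬 ("Akita"), シェパード ("German shepherd"), and ブルドッグ ("bulldog") are defined at the same time to form a group for dogs themselves; there is no need to be aware of the mutual or dependency relationships among registered words.
[0029]
Judging semantic unity with R13 determines whether the above candidates would create a meaning if joined into a compound noun. Any definition word that R1 brings to mind can be defined in R13. Suppose, for the three nouns above:
R13 of 後援 contains 人 ("person"), 会場 ("venue"), 会 ("meeting"), 講堂 ("auditorium"), and so on;
R13 of 併記 contains 記述 ("description"), 説明 ("explanation"), 並べる ("line up"), and so on;
R13 of 弁当 contains 食物 ("food"), 食事 ("meal"), 愛妻 ("beloved wife"), 家庭 ("home"), and so on.
The R13 entries of these three nouns share no word that would establish semantic unity. Moreover, even after fetching the definition word information 20 of the words in R13 (for example 人 or 記述) and comparing their R13 entries, no common word is found among these second-order related definition words. The three nouns are therefore judged to lack semantic unity and cannot be joined, and joining only the first two nouns, 後援 and 併記, is likewise judged impossible. It follows that 併記, the second candidate, is ruled out. Applying the same operation to the other candidates for the conversion region へいき in column 4 of M1 in FIG. 3 shows that 兵器 ("weapon") and 平気 ("calm") also lack semantic unity as compound nouns. Hence, if the conversion region of column 1 is assumed to be こうえん, choosing へいき as the conversion region that connects to it is judged to be wrong. When the candidates for へいき are moved to the bottom rows of column 4, the conversion region of column 4 becomes へ.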
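The first- and second-order R13 comparison of paragraph [0029] can be sketched as a bounded expansion of each word's related-word neighbourhood. The `SAMPLE_R13` table reuses entries quoted in the text; the entries for 人 and 記述 are hypothetical second-order data. Comparing the two full neighbourhoods up to `depth` hops is a simplification of the step-by-step comparison the patent describes.

```python
SAMPLE_R13: dict[str, set[str]] = {
    "後援": {"人", "会場", "会", "講堂"},
    "併記": {"記述", "説明", "並べる"},
    "弁当": {"食物", "食事", "愛妻", "家庭"},
    "公園": {"場所", "遊び"},
    "行く": {"場所", "建物"},
    "1970年": {"日付"},
    "生まれ": {"誕生日"},
    "誕生日": {"日付"},
    "人": {"生物"},     # hypothetical second-order entry
    "記述": {"文章"},   # hypothetical second-order entry
}


def related_closure(word: str, r13: dict[str, set[str]], depth: int) -> set[str]:
    """Words reachable from `word` through R13 links in at most `depth` hops."""
    seen: set[str] = set()
    frontier = {word}
    for _ in range(depth):
        frontier = set().union(*(r13.get(w, set()) for w in frontier)) - seen
        seen |= frontier
    return seen


def semantically_unified(a: str, b: str, r13: dict[str, set[str]], depth: int = 2) -> bool:
    """True if the R13 neighbourhoods of a and b share at least one word."""
    return bool(related_closure(a, r13, depth) & related_closure(b, r13, depth))
```

With `depth=2`, 1970年 and 生まれ are unified through 誕生日 → 日付, which is the inference used later in paragraph [0054]; with `depth=1` they are not.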
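[0030]
If the conversion region of column 4 is assumed to be へ, the candidate in column 4 is the case particle へ ("to"). The usage of へ is written in the description language in the usage method R9 of its definition word information 20. FIG. 4 shows an example of such a usage method registered in R9 of the case particle へ. As connection conditions for へ, this R9 entry defines, in priority order, "noun + へ + verb", "noun + へ + adjective", "noun + へ + adverb", and "noun + へ", and it specifies that the part of speech connected after へ is moved to the top of the column it belongs to. The move is allowed only if the noun before へ and the part of speech being moved are semantically unified.
The column that connects to the conversion region of column 4 is column 5. Applying the semantic unity judgment to combinations of the noun candidates for the conversion region こうえん of column 1 and the candidates for the conversion region いき of column 5 shows that, among the connection conditions, "noun + へ + verb" is satisfied by 公園 ("park", column 1, row 2) and 行き ("go", column 5, row 1), giving 公園 + へ + 行き. This is because R13 entries such as 場所 ("place") and 遊び ("play") can be defined for the noun 公園, and R13 entries such as 場所 ("place") and 建物 ("building") for the verb 行く ("go"), so semantic unity is established through the related definition word 場所. Hence こうえんへいき is fixed as 公園へ行き ("went to the park"). The remaining part, べんとうをたべた, is processed in the same way. By defining connection conditions such as "noun + を + verb", "noun + を + adjective", "noun + を + adverb", and "noun + を" in the usage method R9 of the particle を (the object marker), and taking into account the semantic unity of the noun before を and the part of speech after it, 弁当を食べた ("ate the box lunch") is judged to hold. The matrix M1 of FIG. 3 thus finally takes the structure shown in FIG. 5, and the definition words in the top rows of the connected columns of FIG. 5 give the conversion result for the input string.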
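FIG. 4's description-language text is not reproduced in this document, so the following Python function is only a hypothetical stand-in for an R9 procedure such as that of へ. It tries the connection patterns in priority order and, when a pattern and the semantic unity test both hold, moves the matching candidate to the top of the following column. The unity test here is a plain one-level R13 overlap on sample data, and all names are assumptions.

```python
from typing import Callable, Optional

Candidate = tuple[str, str]  # (surface form, part of speech)

# Connection conditions for へ, highest priority first, as listed in [0030].
HE_PATTERNS: list[tuple[str, Optional[str]]] = [
    ("noun", "verb"), ("noun", "adjective"), ("noun", "adverb"), ("noun", None),
]

# Hypothetical R13 data for the one-level unity test used in the example.
SAMPLE_R13 = {"公園": {"場所", "遊び"}, "行き": {"場所", "建物"}, "息": {"呼吸"}}


def sample_unified(a: str, b: str) -> bool:
    """One-level semantic unity: the two R13 sets share a word."""
    return bool(SAMPLE_R13.get(a, set()) & SAMPLE_R13.get(b, set()))


def apply_usage_rule(prev: Candidate, column: list[Candidate],
                     patterns: list[tuple[str, Optional[str]]],
                     unified: Callable[[str, str], bool]) -> Optional[list[Candidate]]:
    """Sketch of an R9 procedure for a particle between `prev` and `column`.

    Returns the following column, reordered so the accepted candidate is on
    top, or None if no connection pattern applies.
    """
    prev_word, prev_pos = prev
    for left_pos, right_pos in patterns:
        if prev_pos != left_pos:
            continue
        if right_pos is None:            # "noun + へ": nothing required afterwards
            return list(column)
        for i, (word, pos) in enumerate(column):
            if pos == right_pos and unified(prev_word, word):
                return [column[i]] + column[:i] + column[i + 1:]
    return None
```

Keeping the rule as data attached to the particle, rather than inside the main program, mirrors the patent's idea that each definition word carries its own small program.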
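[0031]
The information extraction unit 5 of FIG. 1 uses the definition word conversion matrix M1 of FIG. 5, produced by the character conversion unit 3, to extract the information contained in the input string. Extraction is done by dividing the sentence into its smallest units. Normally a sentence is divided with "noun + verb" as one unit, but if an extraction method is written in the sentence meaning extraction method R11 of the definition word information 20, the division is done by executing that procedure. Dividing the conversion result 公園へ行き弁当を食べた into "noun + verb" units gives two parts: 公園へ行き and 弁当を食べた. The input substrings corresponding to these two parts, こうえんへいき and べんとうをたべた, are obtained, and the definition word conversion matrices M2 and M3 of the divided sentences, shown in FIG. 6, are created with the conversion method described above.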
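The default "noun + verb" division can be sketched as closing a unit after each verb. This covers only the default rule; overrides written in R11 (such as the で rule of paragraph [0040]) are not modelled, and the token format is an assumption.

```python
def split_noun_verb_units(tokens: list[tuple[str, str]]) -> list[list[str]]:
    """Default extraction: end the current unit after every verb."""
    units: list[list[str]] = []
    current: list[str] = []
    for surface, pos in tokens:
        current.append(surface)
        if pos == "verb":
            units.append(current)
            current = []
    if current:  # trailing words without a verb form their own unit
        units.append(current)
    return units


# The conversion result of FIG. 5, tagged with parts of speech.
INFO1_TOKENS = [
    ("公園", "noun"), ("へ", "particle"), ("行き", "verb"),
    ("弁当", "noun"), ("を", "particle"), ("食べた", "verb"),
]
```

Joining each unit gives the two pieces of information, 公園へ行き and 弁当を食べた, from which M2 and M3 are rebuilt.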
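[0032]
The matrices M2 and M3 result from dividing the definition words of M1 into the smallest units that can form a sentence, and they keep the influence of the usage method R9 and the related definition words R13 to a minimum. M2 and M3 are therefore converted through direct relationships between words. If M1 disagrees with M2 and M3, the conversion results of M2 and M3 are more reliable, and reflecting them back into M1 increases reliability. In general, whether the conversion result of M1 is correct can be judged by M2, M3, ... Mn (where n is the number of pieces of information contained in the input string), and if M1 is wrong it can be corrected automatically using M2, M3, ... Mn.
[0033]
The matrix M1 for the input string is the result of converting it into a standard string by means of definition words, while M2 and M3 are the information contained in the input string.
The information storage unit 6 of FIG. 1 assembles M1, M2, and M3 per sentence into the structure shown in FIG. 7 and stores them in the knowledge data storage unit 7 of FIG. 1. For the input string above, the structure of FIG. 7 represents the fact of going to the park as M2, the fact of eating the box lunch as M3, and, through the linkage of M2 and M3, the time sequence and condition that the box lunch was eaten after going to the park.
[0034]
Information is stored in the knowledge data storage unit 7 of FIG. 1 in the data units of FIG. 7, arranged in the hierarchical structure of FIG. 8. The hierarchy is created as needed; for example, text to be stored is written to a file, and chapter and section numbers are attached to chapter and section titles to produce the hierarchy. In the hierarchy, M1, to which the matrices M2, M3, ... Mn are linked, is given the address pointers PV (P2) and NX (P3) for linking to other records. A title sentence is given the address pointers BP (P1), PV (P2), and NX (P3). A title sentence that opens a level of the hierarchy is also given level-end data P4: the start address of the P4 record is stored in P1 at the start of the level, and the start address of the P1 record is stored in P4. The whole formed by linking M2 and M3 to the sentence of title 1 (M1) and giving M1 the pointers P1, P2, and P3 is record 1 (R1), and FIG. 8 shows the connections among records R1 to R6 (record numbers, distinct from the fields R1 to R14 of FIG. 2). R1 opens a level and R6 is defined as its end record: the start address of R6 is stored in P1 of R1, and the start address of R1 in P1 of R6. P2 stores the start address of the previous record, and P3 that of the next record. R1 and R6 are singular records: the start address of R6 is stored in P2 of R1, and the start address of R1 in P3 of R6. Storing record start addresses in P1, P2, and P3 of R1 to R6 in this way creates the hierarchy of the information stored as text and builds the positional relationships of the whole text, managed line by line. Because P1, P2, and P3 form a bidirectional circular search chain over the records of all lines, the range of the needed information can be estimated and high-speed search becomes possible.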
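The pointer layout of FIG. 8 can be sketched with object references in place of record addresses. The class and helper names are hypothetical; the sketch links records into the bidirectional circular chain (P2/P3, with the first and last records pointing at each other) and pairs a level's opening and end records through P1.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Record:
    """One line of stored text (FIG. 8); pointers hold references, not addresses."""
    text: str
    bp: Optional["Record"] = field(default=None, repr=False)  # P1 (BP): other end of a level
    pv: Optional["Record"] = field(default=None, repr=False)  # P2 (PV): previous record
    nx: Optional["Record"] = field(default=None, repr=False)  # P3 (NX): next record


def link_circular(records: list[Record]) -> None:
    """P2/P3 chain; the first record's PV and the last record's NX wrap around."""
    n = len(records)
    for i, rec in enumerate(records):
        rec.pv = records[i - 1]        # index -1 wraps to the last record
        rec.nx = records[(i + 1) % n]  # the last record wraps to the first


def link_level(start: Record, end: Record) -> None:
    """P1 pairing between the record that opens a level and its end record."""
    start.bp, end.bp = end, start


def walk(start: Record, forward: bool = True) -> list[str]:
    """Follow NX (or PV) around the chain until arriving back at `start`."""
    out, rec = [], start
    while True:
        out.append(rec.text)
        rec = rec.nx if forward else rec.pv
        if rec is start:
            return out
```

Because the chain is circular in both directions, a search can start at any record and move either way, and the P1 link lets it jump over a whole level at once.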
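[0035]
When the information in an input string is insufficient, the character conversion unit 3 and the information extraction unit 5 search the knowledge data stored in the knowledge data storage unit 7 to infer the missing information. For example, when converting ろがつかえるようになった ("the ro can now be used"), there is not enough information to convert ろ, and it cannot be decided whether to convert it to 炉 ("furnace") or 櫓 ("scull", a boat oar). If the knowledge data storage unit 7 holds information converted as 舟に乗った ("rode a boat"), defining 舟 ("boat") in the related definition words R13 of 櫓 (FIG. 2) lets the influence of 舟 decide on 櫓, producing 櫓が使えるようになった. If the storage unit instead holds 皿を作った ("made a plate"), defining 陶器 ("ceramics"), 陶芸 ("pottery"), ガラス ("glass"), and 瀬戸物 ("earthenware") in R13 of 皿 ("plate"), and 焼く ("fire"), 陶芸 ("pottery"), 溶かす ("melt"), and so on in R13 of 炉, lets the system decide on 炉, producing 炉が使えるようになった. If no corresponding knowledge data exists in the knowledge data storage unit 7, the candidate can be fixed by asking the operator, using the R13 entries of 炉 and 櫓, what the ろ is used for.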
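A minimal sketch of the ろ example: each candidate is scored by how many of its R13 words appear in the stored knowledge, where the knowledge context is the stored words plus their own R13 entries. The scoring rule and the entry 漕ぐ ("row") are assumptions; a `None` result corresponds to the case where the operator must be asked.

```python
from typing import Optional

# R13 entries from [0035]; 漕ぐ is a hypothetical extra entry.
SAMPLE_R13 = {
    "櫓": {"舟", "漕ぐ"},
    "炉": {"焼く", "陶芸", "溶かす"},
    "皿": {"陶器", "陶芸", "ガラス", "瀬戸物"},
}


def choose_candidate(candidates: list[str], knowledge_words: list[str],
                     r13: dict[str, set[str]]) -> Optional[str]:
    """Pick the candidate whose R13 best overlaps the stored knowledge.

    Returns None when no candidate overlaps at all, i.e. when the system
    should ask the operator instead of guessing. Ties keep the earlier candidate.
    """
    context = set(knowledge_words)
    for word in knowledge_words:
        context |= r13.get(word, set())
    best, best_score = None, 0
    for cand in candidates:
        score = len(r13.get(cand, set()) & context)
        if score > best_score:
            best, best_score = cand, score
    return best
```

The second case only succeeds through the R13 of 皿, which is the "influence of stored knowledge" the paragraph describes.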
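[0036]
FIG. 9 shows an example screen displaying the conversion and information extraction results of the above operations. The input string こうえんへいきべんとうをたべた (a1) is read from the keyboard, the definition word storage unit 4 is searched, the matrices M1, M2, and M3 are created, their analysis results are displayed taking the meanings in the definition word data into account, and the extracted information is stored in the knowledge data storage unit 7. Display a2 shows the entered string. Display a3 shows the result converted with M1, M2, and M3, separated by spaces per definition word and written according to the notation R8 of the definition word information 20 of FIG. 2. Display a4 shows the conversion result separated by spaces per part of speech. Display a5 shows the reading R3 of each definition word in the conversion result of a3, separated by spaces. Displays a6 and a7 show the information extracted from the sentence of a2, one piece of extracted information each.
[0037]
Questions in natural language are accepted about the knowledge data stored in the knowledge data storage unit 7 by the above storage method. Switching between storing information and asking questions is done with a pointing device such as a mouse. The switch can also be made automatically by judging, from the special conjugation R6 and the conjugation R7 of the verb's definition word information 20 in the definition word conversion matrix produced by sentence analysis, that the sentence is a question.
The following describes how an answer is produced when the input string of Information 1 has been stored as information and Question 1 is entered.
By the conversion and information extraction methods above, Question 1, なにをたべましたか, entered through the input unit 2 of FIG. 1, is output to the character conversion unit 3; the information extraction unit 5 receives the conversion result, and the definition word conversion matrix shown in FIG. 10 is created. Question 1 contains only one piece of information, M2, so M1 and M2 give the same result, and the conversion result is 何を食べましたか ("What did you eat?").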
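[0038]
The answer generation unit 8 of FIG. 1 receives the matrices M1 and M2 and generates an answer to Question 1. An answer is generated by defining, in the description language, an answer method in the answer generation method R12 of the definition word that is key to the answer. For Question 1, the answer method is written in R12 of the pronoun 何 ("what"); FIG. 11 shows an example of this R12 entry. For questions of the forms "何 + conjunctive particle + verb" and "何 + case particle + verb", the answer generation method of FIG. 11 uses the related definition words R13 of the verb 食べた ("ate") to search M2, M3, ... Mn of the knowledge data stored in the knowledge data storage unit 7, and forms the answer sentence from the noun connected to the matching verb together with the particle and verb used in Question 1. The procedure therefore takes the noun 弁当 from M3 in FIG. 6, the particle を from Question 1, and the verb 食べた from Information 1, links them, and uses the conjugation R7 of FIG. 2 to obtain 食べました as the answer form of 食べた, generating the answer 弁当を食べました ("I ate the box lunch").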
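A minimal sketch of the 何 answer method. Two simplifications are assumptions: verbs are matched through a hypothetical lemma table instead of through R13, and only ichidan verbs are converted to the polite past answer form, standing in for the R7 lookup.

```python
from typing import Optional

# Hypothetical slice of R7 data: surface form -> (lemma, conjugation class).
VERB_INFO = {
    "食べた": ("食べる", "shimo-ichidan"),
    "食べました": ("食べる", "shimo-ichidan"),
    "行き": ("行く", "godan"),
}


def polite_past(lemma: str, conj: str) -> str:
    """Answer form derived from R7; only ichidan verbs are covered in this sketch."""
    if conj.endswith("ichidan"):
        return lemma[:-1] + "ました"  # 食べる -> 食べました
    raise NotImplementedError("godan and irregular verbs need their own rules")


def answer_what(question: tuple[str, str, str],
                facts: list[tuple[str, str, str]]) -> Optional[str]:
    """Answer a '何 + particle + verb' question from stored (noun, particle, verb) units."""
    q_noun, q_particle, q_verb = question
    if q_noun != "何" or q_verb not in VERB_INFO:
        return None
    lemma, conj = VERB_INFO[q_verb]
    for noun, _particle, verb in facts:
        if VERB_INFO.get(verb, (None,))[0] == lemma:
            # Noun from the knowledge data + particle from the question + answer form.
            return noun + q_particle + polite_past(lemma, conj)
    return None
```

With the units M2 and M3 of Information 1 as `facts`, the question 何を食べましたか yields 弁当を食べました.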
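[0039]
The answer display unit 9 of FIG. 1 outputs the answer sentence to an output device such as a display or a speech output, so that 弁当を食べました is given as the answer to Question 1 about Information 1, achieving a response in natural language.
[0040]
When Question 2, 「何処で弁当を食べましたか。」, is entered, a procedure is written in the sentence meaning extraction method R11 of the case particle で that takes "noun + で" and "pronoun + で" as extraction units and extracts the rest of the sentence as "noun + verb". The extraction result of R11 then yields 何処で ("where") and 弁当を食べました as matrices M2 and M3. In a question, extracted information that contains no pronoun becomes a condition for searching the knowledge data. Information 1 satisfies this condition, so it is judged to be the sentence to answer from. A procedure is written in the answer generation method R12 of the pronoun 何処 that searches the knowledge data for a noun whose R13 indicates 場所 ("place") and uses it as the answer condition. This produces the answer condition 公園で ("in the park"), and 公園で食べました ("I ate it in the park") is output as the answer.
Conversely, for Question 3, 公園で becomes the search condition, and processing in the same way outputs 弁当を食べました.
In Question 4, the pronoun 何 of Question 1 is replaced by the noun 弁当, so Question 4 itself becomes the search condition. The answer is then はい ("yes") or いいえ ("no"); because knowledge data matching the condition of Question 4 exists, the answer はい食べました ("Yes, I ate it") is output in the form "はい + verb".
Question 5 has two conditions, 公園で and 弁当を食べた. It differs from Question 4 only in the number of conditions, and the same processing gives the same result.
Question 6 is extracted in the same way as Question 2 and has the same search conditions as Question 5,
so it receives the same answer as Question 5.
Question 7 is processed like Question 5, but the knowledge data contains no sentence that satisfies the condition 公園で and also satisfies リンゴを食べた ("ate an apple"), so いいえ is output as the answer.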
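The behaviour described for Questions 1 to 7 can be sketched with a slot-based view of the stored units. Several parts are assumptions, not the patent's procedures: the questions are given already extracted into place/object/verb slots, a place reached with へ in an earlier unit is treated as the place of later units (the time linkage of FIG. 7), `PLACES` stands in for "nouns whose R13 contains 場所", and the answer form comes from a small table.

```python
# Information 1 as linked units (M2, M3) with verbs reduced to lemmas.
INFO1 = [("公園", "へ", "行く"), ("弁当", "を", "食べる")]
PLACES = {"公園"}                 # hypothetical: nouns whose R13 contains 場所
WH = {"何", "何処"}                # pronouns that ask for a slot (their R12)
POLITE = {"食べる": "食べました"}   # hypothetical answer forms (R7)


def episode_events(units):
    """Turn linked units into events; a place reached with へ carries forward."""
    place, events = None, []
    for noun, particle, verb in units:
        if particle == "へ" and noun in PLACES:
            place = noun
        elif particle == "を":
            events.append({"place": place, "object": noun, "verb": verb})
    return events


def answer(question: dict, units) -> str:
    """question: 'verb' lemma plus optional 'place'/'object' nouns or WH pronouns."""
    for ev in episode_events(units):
        if ev["verb"] != question["verb"]:
            continue
        asked = [s for s in ("place", "object") if question.get(s) in WH]
        fixed = {s: question[s] for s in ("place", "object")
                 if s in question and s not in asked}
        if all(ev[s] == value for s, value in fixed.items()):
            if asked:  # WH question: fill the asked slot from the knowledge data
                slot = asked[0]
                particle = "で" if slot == "place" else "を"
                return ev[slot] + particle + POLITE[question["verb"]]
            return "はい" + POLITE[question["verb"]]  # yes/no question, all conditions hold
    return "いいえ"
```

Questions 5 and 6 differ only in word order, so after extraction they produce the same slot dictionary and therefore the same answer.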
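[0041]
As described above, by storing text that contains concrete facts and information as knowledge data while keeping the sentence structure as it is, the desired information can be obtained by asking questions in natural language, with the sentence constituents serving as search conditions. Moreover, by writing word-specific usage and answer methods, which analysis patterns and grammar patterns cannot express, into the key words themselves, the meaning of many kinds of expression can be estimated.
[0042]
[Examples]
Concrete examples using the above embodiment of the present invention are described with reference to the drawings.
[0043]
FIG. 12 shows an example of a standard character string automatic conversion device 10 according to the present invention. Like the natural language response device 1 of FIG. 1 described in the embodiment, the device 10 has an input unit 2, a character conversion unit 3, a definition word storage unit 4, and an information extraction unit 5. It also has a conversion result temporary storage unit 11 that temporarily stores conversion results and a character output unit 12 that outputs the conversion results to a display device.
The input unit 2 is not limited to a keyboard or similar device; for example, it may convert natural language into a character string by performing speech recognition on an audio signal from a microphone.
It may also read external text data written to a data file; any means capable of inputting natural language can be used.
[0044]
For a character string entered through the input unit 2, the definition word information 20 corresponding to the input string is obtained from the definition word storage unit 4 by the method of the embodiment, and the character conversion unit 3 creates the definition word conversion matrix M1 shown in FIG. 3.
[0045]
The information extraction unit 5 uses M1 to extract the information contained in the input string by the information extraction method described in the embodiment, creates the matrices M1, M2, ... Mn shown in FIGS. 5 and 6, and saves them in the conversion result temporary storage unit 11.
The character output unit 12 writes the converted string to an output buffer for display.
[0046]
Through this processing, word selection, which used to rely on frequency of use or on menu selection among homonyms, is made automatically by grasping the meaning of the sentence. When automatic conversion is impossible because information is missing, the operator is asked a question to supply it, so the operator does not need to learn any operating procedure for conversion.
[0047]
Furthermore, the device 10 does not merely convert hiragana into kanji. As explained in the embodiment, it converts strings written in kanji, hiragana, katakana, Latin letters, digits, symbols, or combinations of these into normalized standard strings, using the definition word R1, reading R3, okurigana R4, and notation R8 of the definition word information 20.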
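[0048]
The example shown in FIG. 13 is a database search method using natural language. It is a flowchart of processing that searches, in natural language, the data of an external database, whether connected through a communication line or built on the same computer.
By the method described in the embodiment, summary descriptions of the data registered in the external database are written to an external file, the file is read, and knowledge data about the external database 30 is created in the knowledge data storage unit 7 in advance.
[0049]
For example, if the external file states 「テーブルAのカラムBは社員の生年月日である。」 ("Column B of Table A is the employee's date of birth.") for table A, column B of the external database 30, the information extracted from the matrices M1, M2, ... Mn of this sentence is テーブルAのカラムBは社員の生年月日, and this result is stored in the knowledge data storage unit 7. In addition, 誕生日 ("birthday") is registered in R13 of the definition word 生年月日 ("date of birth"), and 日付 ("date") in R13 of the definition word 誕生日.
[0050]
In step 1 (S1), a condition for searching the external database 30 is entered in natural language from an input device. Any device capable of inputting natural language can be used to enter the string, such as a keyboard, speech from a microphone converted to text by a speech recognition system, or a document stored in an external file.
[0051]
In step 2 (S2), if an operation to end processing has been performed, processing ends. If an input string exists, it is processed by the method described in the embodiment. Step 3 (S3) corresponds to the character conversion unit 3 of FIG. 1, and step 4 (S4) to the information extraction unit 5. For the input string, S3 creates a definition word conversion matrix M1 as shown in FIG. 3, and S4 creates the matrices M1, M2, ... Mn shown in FIGS. 5 and 6. In step 5 (S5), M1, M2, ... Mn are used to extract information into the structures of FIGS. 7 and 8, and the result is kept in temporary storage.
[0052]
In step 6 (S6), the information stored in S5 is used to search the information about the external database 30 held in the knowledge data storage unit 7, checking whether stored information exists for the data expressions in the input string (table names, column names, data values, and so on). If none exists, the missing information is reported to the operator, who is prompted to re-enter the condition.
[0053]
In step 7 (S7), the conditions extracted in S4 are used to generate a condition for searching the external database 30. The search condition is generated from the part of speech R5 and the synonymous definition words R14 of the definition word information 20 that makes up the stored information. Nouns are judged to be table names, column names, or data values, and adjectives and adverbs become data range conditions. Verbs, particles, case particles, and conjunctions are judged to be operators that build the condition; operators can be created through R14, and for conditions involving grammar, the generation procedure is written in the description language in the answer generation method R12.
[0054]
For example, if 「1970年生まれの社員」 ("employees born in 1970") is entered as a search condition, the information extracted from the matrices M1, M2, ... Mn of this phrase is 1970年生まれの社員, in which 1970年 ("the year 1970") is judged to be a date and 社員 ("employee") a personal noun. For this, a procedure is written in the usage method R9 of the definition word 生まれ ("born"), which has noun attributes and has 誕生日 registered in its R13: for the pattern "date + 生まれ + personal noun", the noun is moved to the top row of its column in the definition word conversion matrix. 1970年生まれ is "noun + noun", indicating a compound noun. The R13 of 1970年 is 日付 ("date") and the R13 of 生まれ is 誕生日 ("birthday"), which do not match. Consulting the R13 entries of the definition words 日付 and 誕生日, however, shows that 日付 appears in the R13 of 誕生日. The R13 estimated for the compound noun 1970年生まれ is therefore 日付 or 誕生日. Searching the knowledge data stored in the knowledge data storage unit 7 in the above example with the conditions 日付 and 社員, and 誕生日 and 社員,
retrieves テーブルAのカラムBは社員の生年月日. This sentence contains nouns connected by case particles, but a connection by the case particle の
can unconditionally be interpreted as a single noun, so the sentence is judged to have the form "A1 は A2", and A2 is judged to be the part that contains 誕生日 and 社員. Accordingly, a procedure is written in the answer generation method R12 of the definition word for the particle は that extracts the table and column names from the result of searching the knowledge data in the knowledge data storage unit 7, and the range condition from the entered search condition.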
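[0055]
When this natural-language search condition is applied to a database search written in SQL, extracting the table and column names from the result of searching the knowledge data storage unit 7 gives table name = TableA and column name = ColumnB, and extracting the range condition from the entered search condition gives 1970. The SQL statement containing the search condition is therefore:
SELECT * FROM TableA WHERE ColumnB = 1970;
The name of the external database to access is known to the system, but the database can also be opened by writing its name in a document file and storing it as knowledge data. Where several external databases exist, writing the table and column information of each database in a document file allows the external database to be selected from the search condition.
In step 7 (S7), the above search condition is stored.
[0056]
In step 8 (S8), the operator is asked whether to search the external database 30 with the above search condition; if not, the next condition is entered.
If 「または、1960年生まれの社員」 ("or employees born in 1960") is entered as the next search condition, it is appended to the above condition to form a compound condition. For this, OR is registered in the synonymous definition words R14 of the definition word for the conjunction または ("or").
Analyzing the condition as in the above example gives the SQL condition item
OR ColumnB = 1960
so the SQL statement combining the two conditions can be:
SELECT * FROM TableA WHERE ColumnB = 1970 OR ColumnB = 1960;
[0057]
Display conditions can be set in the same way. If table A also has columns C and D and カラムC、カラムDを表示 ("display column C and column D") is entered as a display condition, combining columns C and D as display columns with the above search condition gives the SQL statement:
SELECT ColumnC, ColumnD FROM TableA
WHERE ColumnB = 1970 OR ColumnB = 1960;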
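Steps S7 and S8 can be sketched as assembling the SELECT statement from parts already extracted from the knowledge data: table and column names, the range values, the connectives mapped through R14, and optional display columns. One deliberate change from the statements above: values are passed as `?` placeholders rather than inlined, which is the standard way to bind values with Python's `sqlite3` module. The `CONNECTIVES` table, the function names, and the demo rows are hypothetical.

```python
import sqlite3
from typing import Optional, Sequence

# Hypothetical R14 entries: conjunction definition word -> SQL operator.
CONNECTIVES = {"または": "OR", "かつ": "AND"}


def quote_ident(name: str) -> str:
    """Quote an SQL identifier taken from the knowledge data."""
    return '"' + name.replace('"', '""') + '"'


def build_select(table: str, column: str, values: Sequence[int],
                 connectives: Sequence[str] = (),
                 display: Optional[Sequence[str]] = None) -> tuple[str, list[int]]:
    """Build a parameterized SELECT; one connective joins each pair of conditions."""
    if len(connectives) != len(values) - 1:
        raise ValueError("need exactly one connective between consecutive values")
    cols = ", ".join(quote_ident(c) for c in display) if display else "*"
    where = f"{quote_ident(column)} = ?"
    for conj in connectives:
        where += f" {CONNECTIVES[conj]} {quote_ident(column)} = ?"
    return f"SELECT {cols} FROM {quote_ident(table)} WHERE {where};", list(values)


def demo() -> list[tuple[str, str]]:
    """Run the S9 search against a throwaway in-memory table with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE "TableA" ("ColumnB" INTEGER, "ColumnC" TEXT, "ColumnD" TEXT)')
    conn.executemany('INSERT INTO "TableA" VALUES (?, ?, ?)',
                     [(1970, "c1", "d1"), (1960, "c2", "d2"), (1980, "c3", "d3")])
    sql, params = build_select("TableA", "ColumnB", [1970, 1960], ["または"],
                               ["ColumnC", "ColumnD"])
    rows = conn.execute(sql, params).fetchall()
    conn.close()
    return sorted(rows)
```

The builder only assembles text; deciding which noun is a table, a column, or a value remains the job of the R5/R12 analysis described in [0053] and [0054].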
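[0058]
If, in S8, an instruction is given to search the external database 30 with the above search condition, in step 9 (S9) the external database 30 is accessed with that condition, the matching data is retrieved, and the search results are shown on a display device.
[0059]
[Effects of the Invention]
As described above, the natural language response device and method of the present invention make it easy for humans and computers, and for computers with one another, to converse in natural language, enabling dialogue through speech input or character string input.
[0060]
Furthermore, because the natural language response device and method according to the present invention use an analysis method that needs no grammar patterns and treat the processing method of each word as that word's own data, the many processing forms that language requires can be reduced to simple, specific processes.
[0061]
In a kana-kanji conversion device, applying the method of the present invention to the meaning of sentences and phrases makes it easy to convert multi-line strings of kanji, hiragana, Latin letters, digits, symbols, and their combinations into normalized standard strings, instead of performing conventional kana-kanji conversion.
[0062]
Because the method of the present invention retrieves information stored in a computer while deriving the operating procedure through dialogue, operation requires no knowledge of computers or of the stored data.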
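[Brief Description of the Drawings]
[FIG. 1] A block diagram showing the configuration of a natural language response device to which the present invention is applied.
[FIG. 2] An explanatory diagram showing the areas of the definition word information.
[FIG. 3] An explanatory diagram showing the data structure of the definition word conversion matrix in its initial state.
[FIG. 4] A diagram showing an example of a procedure registered in the usage method of the definition word information.
[FIG. 5] An explanatory diagram showing the data structure of the definition word conversion matrix in its finalized state.
[FIG. 6] An explanatory diagram showing the definition word conversion matrices of the semantically divided character strings.
[FIG. 7] An explanatory diagram showing the information extraction state of a sentence.
[FIG. 8] An explanatory diagram showing the hierarchical knowledge data structure.
[FIG. 9] An explanatory diagram showing a display example of the conversion results and information extraction results of the definition word conversion matrix.
[FIG. 10] An explanatory diagram showing the data structure of the finalized definition word conversion matrix for a question.
[FIG. 11] A diagram showing an example of a procedure registered in the answer method of the definition word information.
[FIG. 12] A block diagram showing the configuration of the standard character string automatic conversion device. (Example 1)
[FIG. 13] A flowchart of the processing that searches the data of an external database in natural language. (Example 2)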
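[Explanation of Reference Signs]
1 Natural language response device
2 Input unit
3 Character conversion unit
4 Definition word storage unit
5 Information extraction unit
6 Information storage unit
7 Knowledge data storage unit
8 Answer generation unit
9 Answer display unit
10 Standard character string automatic conversion device
11 Conversion result temporary storage unit
12 Character output unit
13 Display
14 Speaker
20 Definition word information
30 External database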
[Technical field to which the invention belongs]
The present invention relates to a natural language response device and method for asking a question in natural language for knowledge data obtained by extracting information stored in a character string expressed in a natural language and storing the information, and acquiring the stored information About.
[0002]
[Prior art]
In conventional morphological analysis, the meaning of a vocabulary is defined in advance using probabilities, frequency of use, set theory, and tree structure, and the meaning of a sentence is determined based on a logical value based on a combination of vocabularies and an area based on set theory. In addition, the computer operation by words stores a specific word, registers an operation corresponding to the word, a word, and a sentence, and executes the operation by a combination of logic circuits.
[0003]
Grammar analysis techniques are applied in context analysis of sentences. In this method, the grammar is classified into a fixed pattern, and the meaning is determined by the semantic connection condition of the vocabulary that matches the pattern.
[0004]
However, the meaning of the vocabulary that makes up the sentence is determined by the relationship between the vocabularies,
As a result, the grammar needs to be constructed, and fixedly defining the meaning and grammar of the vocabulary limits these relationships. The analysis of sentences using vocabulary or grammar increases the number of fixed patterns, and the analysis of sentences using limited patterns involves approximation by pattern recognition, making it difficult to grasp accurate information.
[0005]
Semantic analysis as described above
Japanese Patent Laid-Open No. 4-191338 “Question Answering Method and Device”,
Japanese Laid-Open Patent Publication No. 5-230701 “Information Extraction Device”,
Japanese Laid-Open Patent Publication No. 7-0778355 “Natural Language Semantic Analysis Processing Device”,
Japanese Laid-Open Patent Publication No. 10-304583 “Natural Language Processing Device and Method” is available.
[0006]
[Problems to be solved by the invention]
The meaning of sentences expressed in natural language is determined by the combination of the vocabularies that compose the sentences, and the meaning of each vocabulary that composes the sentences as a result of the influence of one vocabulary on the other vocabulary. And the meaning of the whole sentence needs to be determined. Further, since the sentence includes the meaning indicated by the whole sentence, it is necessary to use a method of determining the image expressed by the vocabulary constituting the sentence and determining the meaning of each vocabulary constituting the sentence.
[0007]
Therefore, in order to realize a morphological analysis that does not require a grammatical pattern for syntactic analysis, the present invention autonomously determines the influence of vocabularies constituting a sentence on the influence of other vocabularies, and determines a combination of vocabularies constituting the sentence. By changing the vocabulary (hereinafter, this is defined as the influence on other vocabularies), the vocabulary that constitutes the sentence is determined, the meaning of the combination of the vocabulary is estimated, and the meaning that gives an overview of the whole sentence It is an object to obtain information stored in a natural language by estimating and extracting information contained in a sentence and storing the extracted information in a natural language.
[0008]
[Means for Solving the Problems]
In order to solve the above-mentioned object, the natural language response device and method according to the present invention uses a minimum unit constituting a sentence as a definition word, and the definition word is a kanji, hiragana, katakana, alphabet, numeral that can specify the meaning. , Symbols and combinations of character strings. How to use a definition word, extraction method, operation method, answer method to input a sentence in natural language and describe the processing procedure that describes the effect of all definition words that can be constituent elements of the sentence on other definition words in the description language Define as necessary.
Definition words that make up a sentence by defining the procedure described in the description language as part of the definition word data, where human beings have determined differences in meaning and usage as general common sense. By executing the procedure defined for each definition word, the definition word itself judges the influence on itself or other definition words, and the combination structure of the definition words constituting the sentence is determined. Change, create the optimal connection of the definition words that compose the sentence,
[0009]
Even when extracting information contained in sentences or when answering questions, search the stored information by executing the procedure in the descriptive language defined in the operation method of the definition word and the answer method, which is caused by the combination of phrases By creating conditions, it is possible to accurately create answers according to individual situations.
[0010]
When the definition word information is data defining the definition word and the procedure describing the above method for the definition word and the linguistic attribute of the definition word, the sentence can be defined as the connection state of the definition words. The definition word information stores data that can operate as a small program, and the procedure of the morphological analysis process can be determined by the combination of the definition word information constituting the sentence. This means that a robot can be arranged on a production line such as an assembly factory to produce one product, but other products can be produced by changing the function and configuration of the robot. This is because each robot uses only a limited function, but the entire production line can produce any product by balancing and configuring the individual robots. The procedure described in the definition word information corresponds to this robot, the procedure has a limited motion definition, and the sentence formed by connecting the definition word information corresponds to the production line. As a product, the syntax of a sentence can be analyzed and information contained in the sentence can be acquired by the operation of definition word information that is a robot.
[0011]
This is characterized by the fact that it is not necessary to interpret the grammar of the entire sentence, and that it is possible to cope with any sentence expression by a combination of definition words by describing the influence of the definition word on other definition words as data. Yes.
[0012]
By performing the above operations and extracting the information contained in the sentence as individual sentences, it is not necessary to convert the sentence or vocabulary into a mathematical model or numerical value, so the amount of information required for analysis and the amount of data to be stored are reduced. It is possible to talk at a natural speed by shortening the analysis time. Also, by storing the information contained in the text as knowledge data using character strings of combinations of defined words, it is possible to accumulate information and meaning contained in the expression of words that are difficult to quantify with less data It is characterized by being able to do it.
[0013]
DETAILED DESCRIPTION OF THE INVENTION
In order to solve the above-described problems, embodiments of the present invention will be described in detail with examples. In this example, as a simple exchange of words,
Information 1 “I went to the park and ate a lunch.”
Question 1 "What did you eat?"
Question 2 “Where did you eat lunch?”
Question 3 “What did you eat in the park?”
Question 4 "Did you eat lunch?"
Question 5 “Did you eat lunch in the park?”
Question 6 “Did you eat lunch in the park?”
Question 7 “Did you eat an apple in the park?”
Assuming a conversation with questions 1 to 7 with respect to the above information 1, humans can easily answer, but information 1 is stored in a computer and the above questions are input by humans. In the prior art, it is not easy for the computer to answer the question. This is because the conventional method attempts to realize dialogue with the computer by using limited words and grammar. In the present invention, by accumulating information such as information 1, a method for accurately answering variously expressed questions such as the above questions 1 to 7 is described in detail, and the natural language response device is described. Do.
[0014]
The present invention is applied to, for example, a natural language response device 1 as shown in FIG. The natural language response device 1 includes an input unit 2 for operating the natural language response device 1 by an operator inputting information or a character string. The input unit 2 is composed of a pointing device such as a keyboard or a mouse, for example, and becomes an object of input processing when operated by an operator, and is sent to the character conversion unit 3 as a character string indicating a natural language converted by the input unit. Is output.
[0015]
The input unit 2 described above is not limited to a keyboard or the like. For example, the input unit 2 may convert a natural language into a character string by performing voice recognition on a voice signal input from a microphone. Moreover, it is possible to input external text data written in a data file, and any means capable of inputting a natural language can be widely applied.
[0016]
When the sentence of the information 1 is input by the input unit 2 in a natural language, if the character string output to the character conversion unit 3 is, for example, hiragana,
The character conversion unit 3 inputs “I have eaten the word” as the character string of the information 1 converted by the input unit 2, and the definition word information stored in the definition word storage unit 4 is input. Use it to convert to a standard character string normalized.
The definition word information 20 has areas R1 to R14 as shown in FIG. 2 and corresponds to what is called a dictionary in the conventional morphological analysis, but is not simply stored fixed information such as vocabulary and part of speech. A part of the morpheme analysis process is defined as data of the definition word information 20. This removes the analysis process from the conventional main program that determines the connection information and syntax of the vocabulary constituting the sentence, and sets only the standard analysis process in the main program. For sentence analysis processing that cannot be performed by the standard analysis processing of the main program, the analysis processing is defined as a procedure in a description language as data of definition word information. Therefore, the analysis processing of the natural language response device 1 is determined by a combination of the procedures described in the definition word information 20 constituting the sentence.
[0017]
FIG. 2 shows the data structure of the definition word information 20. The definition word information 20 records, as definition words, character strings whose meaning can be specified by kanji, hiragana, katakana, alphabetic characters, numbers, symbols, and combinations thereof. As long as the definition word is a character string whose meaning can be specified, a file name, image data, or the like may be expressed by a character string, and need not be a word. In the definition word R1, the identification number R2, the reading R3, the sending kana R4, the part of speech R5, the special use R6, the use R7, and the notation method R8 are stored as attributes, and in order to digitize the meaning of the definition word R1, Usage method R9 describing the procedure for changing the standard selection procedure, operation method R10 describing the operation procedure, sentence meaning extraction method R11 describing the procedure for changing the standard sentence meaning extraction procedure, standard answer generation procedure Generation method R12 that describes the procedure for changing the name, related definition word R13 that describes one or more other definition words that can be imagined from the definition word, and other definition words that have the same meaning as necessary One or more synonymous definition terms R14 are stored in accordance with.
[0018]
The identification number R2 is a unique number assigned to the definition word R1, and is used to specify the definition word R1 when the meaning differs depending on the homonym and the reading. In “Kin” and “Kane”, the notation “Kin” is the same, but the usage and meaning differ depending on how it is read. In order to distinguish such vocabulary, the definition word R1 is distinguished by the identification number R2.
[0019]
The reading R3 gives the hiragana pronunciation of the definition word R1, divided to match the individual characters of R1. For example, when 構造 (“structure”) is defined as the definition word R1, R3 is defined as “こう□ぞう” (kō□zō), with the reading of each character separated by a blank (□ represents a space character). This allows 構造 to be found from the inputs こうぞう, 構ぞう, and こう造. The okurigana R4 defines one or more okurigana where they are required, as in idiomatic phrases; the okurigana of verbs, adjectives, and similar words are defined by the definition word R1, special conjugation R6, and conjugation R7. In addition, to support searches in which okurigana are abbreviated, a word whose okurigana can be written in full or abbreviated form is registered in R3 with the full okurigana, and an abbreviated-okurigana search attribute is defined as part of the notation method R8 of the definition word R1. For a definition word R1 that has this attribute, when the last character of the input matches the last character of R3, the word is also searched with the last character of R3 omitted, so that either spelling is found and converted to the standard character string.
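The per-character reading in R3 can be sketched as follows, assuming a reading string with one space-separated kana unit per character of R1. The helper names spelling_variants and build_index are hypothetical. The sketch enumerates every mix of kanji and kana so that any of those spellings finds the same definition word.

```python
from itertools import product


def spelling_variants(word: str, reading_r3: str) -> set[str]:
    """All spellings that mix each character of R1 with its kana reading."""
    units = reading_r3.split()  # one reading unit per character of R1
    if len(units) != len(word):
        raise ValueError("R3 must give one reading unit per character of R1")
    # For every character, either keep the character or use its reading.
    choices = [(char, kana) for char, kana in zip(word, units)]
    return {"".join(combo) for combo in product(*choices)}


def build_index(entries):
    """Map every spelling variant back to the definition words it can denote."""
    index = {}
    for word, reading in entries:
        for variant in spelling_variants(word, reading):
            index.setdefault(variant, set()).add(word)
    return index
```

For 構造 with R3 “こう ぞう”, the index accepts 構造, 構ぞう, こう造, and こうぞう.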
[0020]
The part of speech R5 defines the grammatical part of speech used in linguistics. In addition, for personal names, the distinction between family name and given name, and their unit, are defined in R5 as an extended part of speech.
[0021]
Special conjugation R6 defines special conjugation types such as sa-irregular conjugation, adjectival-verb conjugation, compound-form conjugation, adverb conjugation, and the ku-, shiku-, tari-, and nari-conjugations.
For example, when 定義 (“definition”) is registered as a definition word, R6 specifies sa-irregular conjugation. This defines its inflected forms together with attached particles, such as “define”, “defined”, and “not defined”, including sentence-final and case particles, so that present, past, and progressive forms, affirmation, negation, questions, and similar sentence forms can be recognized.
[0022]
Conjugation R7 covers verb conjugations other than the special conjugations of R6. It defines grammatical verb conjugation types such as godan (five-row), kami-ichidan (upper one-row), shimo-ichidan (lower one-row), kami-nidan (upper two-row), shimo-nidan (lower two-row), ka-irregular, sa-irregular, na-irregular, and ra-irregular conjugation. As with R6, this allows sentence forms and logical values such as present, past, and progressive tense, affirmation, negation, and questions to be recognized grammatically.
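As a rough illustration of how a conjugation class lets the device recognize tense, polarity, and questions, the sketch below classifies inflected forms of a sa-irregular verb such as 定義する. The ending table is a small hypothetical excerpt, not the device's actual R6 data, and treating a final か as the question marker is a simplifying assumption.

```python
from typing import NamedTuple, Optional


class SentenceForm(NamedTuple):
    tense: str
    polarity: str
    question: bool


# Hypothetical excerpt of endings for sa-irregular verbs (stem + ending).
SA_IRREGULAR_ENDINGS = {
    "する": ("present", "affirmative"),
    "した": ("past", "affirmative"),
    "しない": ("present", "negative"),
    "しなかった": ("past", "negative"),
    "している": ("progressive", "affirmative"),
    "します": ("present", "affirmative"),
    "しました": ("past", "affirmative"),
    "しません": ("present", "negative"),
}


def sentence_form(surface: str, stem: str,
                  endings=SA_IRREGULAR_ENDINGS) -> Optional[SentenceForm]:
    """Classify an inflected form of `stem`, or return None if it does not match."""
    if not surface.startswith(stem):
        return None
    ending = surface[len(stem):]
    question = ending.endswith("か")  # sentence-final question particle
    if question:
        ending = ending[:-1]
    if ending not in endings:
        return None
    tense, polarity = endings[ending]
    return SentenceForm(tense, polarity, question)
```

For example, 定義しましたか is classified as past, affirmative, and a question.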
[0023]
The notation method R8 defines a standardized notation. For example, writing “kilometer” as “km”, or fixing one of the several written forms of “apple”, is defined as the standard character string. Likewise, for a verb that can be written with different okurigana, R8 defines which spelling is the standardized standard character string.
[0024]
The usage method R9, operation method R10, sentence meaning extraction method R11, and answer generation method R12 are procedures written in a program description language that change the natural language analysis procedure. Each defines a procedure that changes the combination of definition words, conditioned on how the definition word R1 relates to the other words in the sentence.
[0025]
The related definition words R13 define, as needed, one or more other definition words R1 that can be imagined from the definition word R1. When “apple” is defined as the definition word R1, definition words such as “food”, “fruit”, “tree”, “red”, and “delicious” can be defined as its related definition words R13. The unified meaning produced by combining definition words R1 can then be estimated from these related definition words R13. The synonymous definition words R14 define, as needed, one or more definition words that have the same meaning but a different form of expression. For example, “kin” (金) and “gold” refer to the same precious metal in different notations, so when “gold” appears in a text it can be recognized as “kin”.
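Normalizing input through the notation method R8 and the synonymous definition words R14 can be sketched as a single lookup table. The table contents and the names build_normalizer and normalize are hypothetical. Folding R14 synonyms into the same table as R8 notations is a simplification, not necessarily how the device stores them.

```python
# Hypothetical R8 data: variant notation -> standard character string.
NOTATION_R8 = {"キロメートル": "km"}
# Hypothetical R14 data: definition word -> synonyms that should map to it.
SYNONYMS_R14 = {"金": {"gold"}}


def build_normalizer(notation, synonyms):
    """Merge R8 notations and R14 synonyms into one variant -> standard map."""
    table = dict(notation)
    for standard, alternates in synonyms.items():
        for alternate in alternates:
            table[alternate] = standard
    return table


def normalize(tokens, table):
    """Replace each token by its standard form; unknown tokens pass through."""
    return [table.get(token, token) for token in tokens]
```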
[0026]
For the input character string (a kana string meaning “went to the park and ate a boxed lunch”), the character conversion unit 3 in FIG. 1 acquires from the definition word storage unit 4 the definition word information 20 that can make up the string and creates combinations of it. In these combinations, based on position information in the string, the definition words R1 that start at the same position are grouped as rows into one column, and the columns are connected in order of position. This state of combination is called a definition word conversion matrix and is shown in FIG. 3. In the definition word conversion matrix of FIG. 3, the words starting from “ko”, the first character of the input string, form the first column, and the words starting from the second character “u” form the second column. This definition word conversion matrix M1 shows the initial state for the input string. In M1, a definition word enclosed in a rectangle stands for definition word information 20 with the areas R1 to R14 shown in FIG. 2, the word inside the rectangle being the definition word R1.
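A minimal sketch of building a definition word conversion matrix, assuming a toy dictionary keyed by kana reading. The input string, the dictionary contents, and the names Candidate, build_matrix, and can_connect are hypothetical. Columns here are indexed from 0 by character position, so the numbering need not match the column numbers used for FIG. 3.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    surface: str  # definition word R1
    reading: str  # matched part of the input (the conversion region)
    start: int
    end: int      # exclusive


# Hypothetical toy dictionary: reading -> definition words with that reading.
DICTIONARY = {
    "こうえん": ["公園", "後援", "講演"],
    "へいき": ["併記", "兵器", "平気"],
    "へ": ["へ"],
    "いき": ["行き"],
    "べんとう": ["弁当"],
    "を": ["を"],
    "たべた": ["食べた"],
}


def build_matrix(text, dictionary=DICTIONARY):
    """Column i holds every candidate whose reading starts at position i."""
    columns = []
    for i in range(len(text)):
        column = [Candidate(word, reading, i, i + len(reading))
                  for reading, words in dictionary.items()
                  if text.startswith(reading, i)
                  for word in words]
        columns.append(column)  # may be empty: an unused column
    return columns


def can_connect(left: Candidate, right: Candidate) -> bool:
    """Two conversion regions connect only if they touch without overlapping."""
    return left.end == right.start
```

Choosing a reading in one column fixes which later column can follow it; overlapping columns such as the one starting at う are simply skipped.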
[0027]
Each column of the definition word conversion matrix M1 groups the conversion candidates for the input string by their start position in the string, so depending on which definition words R1 are selected, some columns go unused.
FIG. 3 shows “sponsorship” selected, in the first row of the first column, as the conversion candidate for the input string “kouen”. In this case, the part of the input string “kouen” is called the conversion region of the first column. If the conversion region of the first column is “kouen”, the conversion regions of the second and third columns overlap it and cannot be connected to the first column; in FIG. 3 these columns are drawn one level lower to show that they cannot be connected. The fourth column can be connected to the first column, because its conversion region adjoins that of the first column. In the seventh column, which can be connected to the fourth column, the conversion candidate “bento” (boxed lunch) covers the conversion region “bento”. These three conversion candidates are clearly all nouns, but for a noun combination to form a new noun, the meanings given by their related definition words R13 in the definition word information 20 must be unified.
[0028]
The related definition words R13 of the definition word information 20 are other definition words that can be imagined from the definition word R1. Defining, with other words, the images that the character string of R1 calls up conceptualizes the groups to which R1 belongs, without having to form fixed groups. Defining several words in R13 lets one definition word belong to several different concept groups, and defining closely related concepts forms narrower concept groups at the same time. For example, defining the broad meaning of the noun “dog” with words such as “animal” and “living thing”, while also defining dog breeds such as “Akita”, “Shepherd”, and “Bulldog”, makes up a group of dogs without any need to be aware of the interrelationships and dependencies among the registered words.
[0029]
Checking the unity of meaning with the related definition words R13 determines whether a noun combination of conversion candidates yields a meaning. R13 can contain any other definition word that can be imagined from the definition word R1. Suppose the three nouns have the following related definition words R13:
R13 of “sponsorship”: “person”, “venue”, “meeting”, “lecture hall”, and so on
R13 of “concurrent writing”: “description”, “explanation”, “arrange”, and so on
R13 of “bento”: “food”, “meal”, “loving wife”, “home”, and so on
The R13 of these three nouns share no common word from which a unity of meaning could be judged. Moreover, even if the definition word information 20 of the words listed in R13 (for example, “person” and “description”) is retrieved and their own related definition words R13 are compared at this second level, no shared word is found. It can therefore be determined that the three nouns have no unity of meaning and cannot be combined. Likewise, the first two nouns, “sponsorship” and “concurrent writing”, cannot be combined. From the above, “concurrent writing”, the second of the three candidates, can be rejected as a conversion candidate. For the other candidates in the fourth column of the definition word conversion matrix M1 in FIG. 3, “weapon” and “calm”, it can likewise be determined that no unified meaning arises by noun combination. Consequently, if the conversion region of the first column is assumed to be “kouen”, treating the region covered by these fourth-column noun candidates as the region connected to the first column is an error. When the conversion candidate of the fourth column is changed to the one in the lowest row of that column, the conversion region becomes “to” (the case particle へ).
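The unity-of-meaning test, including the second-level lookup through the R13 of the R13 words, can be sketched as a set intersection. The RELATED data copies the examples in the text; the entries for “1970”, “born”, and “birthday” come from the database example later in this description. The function names and the fixed depth of two levels are assumptions.

```python
# Related definition words R13, taken from the examples in the text.
RELATED = {
    "sponsorship": {"person", "venue", "meeting", "lecture hall"},
    "concurrent writing": {"description", "explanation", "arrange"},
    "bento": {"food", "meal", "loving wife", "home"},
    "park": {"place", "play"},
    "go": {"place", "building"},
    "1970": {"date"},
    "born": {"birthday"},
    "birthday": {"date"},
}


def expand(word, related, depth):
    """R13 of `word`, plus R13 of those words, up to `depth` levels."""
    seen = set(related.get(word, ()))
    frontier = set(seen)
    for _ in range(depth - 1):
        frontier = {r for w in frontier for r in related.get(w, ())} - seen
        seen |= frontier
    return seen


def unified(words, related=RELATED, depth=2):
    """True if all words share at least one (expanded) related definition word."""
    sets = [expand(word, related, depth) for word in words]
    return bool(sets) and bool(set.intersection(*sets))
```

The three nouns fail the test, while “park” and “go” pass through the shared word “place”.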
[0030]
Assuming the conversion region of the fourth column is “to”, the conversion candidate in the fourth column is the case particle “to”. The usage method R9 of the definition word information 20 for the case particle “to” describes, in a description language, how “to” is used. FIG. 4 shows an example of the usage method registered in R9 of the case particle “to”. As connection conditions for the case particle “to”, it defines, in order of priority, “noun + to + verb”, “noun + to + adjective”, “noun + to + adverb”, and “noun + to”, and it specifies that the word of the matching part of speech that follows “to” is moved to the top row of the column to which it belongs. Such a move, however, requires that the noun preceding “to” and the word being moved have a unified meaning.
The fifth column connects to the conversion region of the fourth column. When the combinations of the noun candidates for the first-column conversion region “kouen” and the candidates for the fifth-column conversion region “iki” are checked for unity of meaning as described above, the connection condition “noun + to + verb” is satisfied by “park” in the second row of the first column and “go” in the first row of the fifth column, so “park + to + go” holds. This is because R13 such as “place” and “play” can be defined for the noun “park”, and R13 such as “place” and “building” for the verb “go”, so the shared related definition word “place” shows that their meanings are unified. The string “kouen e iki” can therefore be determined to mean “go to the park”. The same processing applies to the rest of the input string. By defining connection conditions such as “noun + o + verb”, “noun + o + adjective”, “noun + o + adverb”, and “noun + o” in the usage method R9 of the case particle を (o), and checking the unity of meaning between the noun before を and the word after it, “ate the lunch” can be determined to hold. The definition word conversion matrix M1 of FIG. 3 thus finally takes the structure shown in FIG. 5. Reading the definition words in the first row of the connected columns of FIG. 5 gives the conversion result for the input string.
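The effect of a usage method R9 for a case particle can be sketched as a priority-ordered scan of the next column. This is a hypothetical Python rendering of the behavior described above, not the description language of FIG. 4. The candidate “breath” and its R13 are invented for illustration, and the one-level shares_related check stands in for the fuller unity test.

```python
from typing import NamedTuple


class Word(NamedTuple):
    surface: str
    pos: str  # part of speech R5


# Hypothetical R13 data; "park" and "go" follow the text, "breath" is invented.
RELATED = {"park": {"place", "play"}, "go": {"place", "building"},
           "breath": {"lungs"}}

# Connection conditions for the particle "to", highest priority first.
TO_CONDITIONS = ("verb", "adjective", "adverb")


def shares_related(a, b, related=RELATED):
    """One-level unity check: do the two words share a related definition word?"""
    return bool(related.get(a, set()) & related.get(b, set()))


def apply_usage(noun, next_column, conditions=TO_CONDITIONS, unified=shares_related):
    """Move the highest-priority, meaning-compatible candidate to the top row.

    Returns the reordered column, or None when only "noun + to" applies."""
    for pos in conditions:
        for i, cand in enumerate(next_column):
            if cand.pos == pos and unified(noun.surface, cand.surface):
                return [cand] + next_column[:i] + next_column[i + 1:]
    return None
```

With “park” before the particle, “go” rises to the top of the following column; with “sponsorship”, no condition fires.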
[0031]
The information extraction unit 5 in FIG. 1 extracts the information contained in the input string using the definition word conversion matrix M1 shown in FIG. 5, dividing it into the smallest units that can make up a sentence. Normally one unit has the form “noun + verb”, but when an extraction procedure is described in the sentence meaning extraction method R11 of the definition word information 20, that procedure is executed instead. Dividing the conversion result of FIG. 5, “I went to the park and ate lunch”, into “noun + verb” units gives “go to the park” and “eat the lunch”. For these two sentences, the corresponding parts of the input string, “kouen e iki” and “bentou o tabeta”, are taken, and the definition word conversion matrices M2 and M3 of the divided sentences, shown in FIG. 6, are created with the conversion method above.
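The default “noun + verb” division, with an optional R11 override, can be sketched like this. The token format (word, part of speech) and the names split_units and extract are assumptions. A unit simply closes at each verb.

```python
def split_units(tokens):
    """Split (word, pos) tokens into units that each close at a verb."""
    units, current = [], []
    for word, pos in tokens:
        current.append(word)
        if pos == "verb":
            units.append(current)
            current = []
    if current:  # trailing words without a verb form a last unit
        units.append(current)
    return units


def extract(tokens, r11=None):
    """Use an R11 extraction procedure when one is given, else the default split."""
    return r11(tokens) if r11 is not None else split_units(tokens)
```

Applied to 公園 へ 行き 弁当 を 食べた, the default split yields the two units that become M2 and M3.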
[0032]
The definition word conversion matrices M2 and M3 result from dividing the definition words of M1 into the smallest units that can form a sentence, so the influence of the usage method R9 and the related definition words R13 on them is minimal, and they are converted through direct relations between words. If the results of M1, M2, and M3 differ, the conversion results of M2 and M3 are the more reliable, and reliability can be improved further by reflecting M2 and M3 back into M1. In general, M2, M3, ..., Mn (n being the number of pieces of information in the input string) can be used to check whether the conversion result of M1 is correct and, if it is not, to correct M1 automatically.
[0033]
The definition word conversion matrix M1 is the conversion of the input string into standard character strings based on the definition words, and the definition word conversion matrices M2 and M3 represent the information contained in the input string.
The information storage unit 6 in FIG. 1 assembles the matrices M1, M2, and M3 sentence by sentence into the structure shown in FIG. 7 and stores them in the knowledge data storage unit 7 in FIG. 1. For the input string above, the structure of FIG. 7 holds M2 as the fact “went to the park”, M3 as the fact “ate the lunch”, and the link between M2 and M3 as the time sequence and condition “after going to the park, ate the lunch”.
[0034]
The information stored in the knowledge data storage unit 7 in FIG. 1 is held in the data units of FIG. 7 and organized in the hierarchical structure of FIG. 8. The hierarchy is created as needed: for example, when the sentences holding the information are written in a file with chapter and section numbers added to the chapter and section titles, a hierarchy is built from them. In the hierarchy, address pointers PV (P2) and NX (P3) for linking to other records are provided in M1, to which the definition word conversion matrices M2, M3, ... are connected. For title sentences, address pointers BP (P1), PV (P2), and NX (P3) are provided. For a title sentence that opens a hierarchy, hierarchy end data P4 is provided: the head address of the P4 record is stored in P1 of the record at the start of the hierarchy, and the head address of that record is stored in P4. In FIG. 8, M2 and M3 are connected to M1 of the title 1 sentence; this whole unit, with P1, P2, and P3 provided in M1, is shown as record 1 (R1), together with the links among records R1 to R6. R1 opens a hierarchy and defines R6 as its end record: the head address of R6 is stored in P1 of R1, and the head address of R1 is stored in P1 of R6. P2 stores the head address of the previous record, and P3 the head address of the next record. R1 and R6 are special records: the head address of R6 is stored in P2 of R1, and the head address of R1 is stored in P3 of R6. Storing record head addresses in P1, P2, and P3 of R1 to R6 in this way creates the hierarchy of the information held in the sentences and captures the positional relations of the whole text, managed line by line. Because each row of records forms a bidirectional circular search chain through P1, P2, and P3, the range containing the required information can be estimated and searched quickly.
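The record links of FIG. 8 behave like a circular doubly linked list with an extra pointer pairing the start and end of a hierarchy. The sketch below uses Python object references in place of head addresses. The class and function names are hypothetical, and it models only the BP, PV, and NX pointers of records R1 to R6.

```python
class Record:
    """One record of FIG. 8; object references stand in for head addresses."""

    def __init__(self, name):
        self.name = name
        self.bp = None  # P1 (BP): pairs a hierarchy's start record with its end record
        self.pv = None  # P2 (PV): previous record
        self.nx = None  # P3 (NX): next record


def link_chain(records):
    """Join records into a bidirectional circular chain through PV and NX."""
    n = len(records)
    for i, rec in enumerate(records):
        rec.nx = records[(i + 1) % n]
        rec.pv = records[(i - 1) % n]  # the first record's PV wraps to the last
    return records


def link_hierarchy(start, end):
    """Point the start and end records of a hierarchy at each other via BP."""
    start.bp, end.bp = end, start


def walk(start, step="nx"):
    """Visit every record once, forward ("nx") or backward ("pv")."""
    rec = start
    while True:
        yield rec
        rec = getattr(rec, step)
        if rec is start:
            break
```

The chain can be traversed from any record in either direction, and the BP pair jumps straight from a hierarchy's start to its end.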
[0035]
When the information in the input string is insufficient, the character conversion unit 3 and the information extraction unit 5 estimate the missing information by searching the knowledge data stored in the knowledge data storage unit 7. For example, when converting a phrase meaning “the ro can be used”, there is not enough information to convert “ro”, and it cannot be decided whether it should become “furnace” (炉) or “scull” (櫓, a stern-mounted oar). In such a case, if the knowledge data storage unit 7 holds information converted as “rode on a boat”, and “boat” is defined in the related definition word R13 (FIG. 2) of “scull”, the influence of “boat” shows that “ro” should be converted to “scull”, giving “the scull can be used”. If the knowledge data storage unit 7 instead holds “made a dish”, then by defining “pottery”, “porcelain”, “glass”, and “Seto ware” in the related definition word R13 of “dish”, and “baking”, “ceramics”, “melting”, and the like in R13 of “furnace”, “ro” can be converted to “furnace”, giving “the furnace can be used”. If no relevant knowledge data exists in the knowledge data storage unit 7, the operator can be asked what the object is used for, using the related definition words R13 of “furnace” and “scull”, and the conversion candidate confirmed from the reply.
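Choosing between homophone candidates from stored context can be sketched as an overlap score between each candidate's R13 and the context words plus their own R13. The data is a simplified, hypothetical version of the example: “ceramics” is placed in both the “dish” and “furnace” sets so that a direct overlap exists. Returning None to trigger a question to the operator is an assumed convention.

```python
# Simplified, hypothetical R13 data for the "ro" example.
RELATED = {
    "scull": {"boat"},
    "furnace": {"baking", "ceramics", "melting"},
    "dish": {"pottery", "ceramics", "glass", "Seto ware"},
}


def choose(candidates, context, related=RELATED):
    """Pick the candidate whose R13 overlaps most with the context.

    Returns None when nothing overlaps or the best score is tied, meaning the
    operator should be asked instead."""
    expanded = set(context)
    for word in context:
        expanded |= related.get(word, set())
    scores = {c: len(related.get(c, set()) & expanded) for c in candidates}
    best = max(scores.values(), default=0)
    winners = [c for c, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]
```

Context containing “boat” selects “scull”, context containing “dish” selects “furnace”, and an empty context returns None.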
[0036]
FIG. 9 shows an example in which the conversion result and the information extraction result by the above operation are displayed on the screen.
The input “kouen e iki bentou o tabeta” (a1) is read, the definition word storage unit 4 is searched, and the definition word conversion matrices M1, M2, and M3 are created with the meaning taken into account; the analysis results of M1, M2, and M3 are displayed, together with the extracted information stored in the knowledge data storage unit 7. Display a2 shows the input character string. Display a3 shows the result converted with the definition word conversion matrices M1, M2, and M3, written according to the notation method R8 of the definition word information 20 in FIG. 2. Display a4 shows the conversion result separated by blank characters at each part of speech. Display a5 uses the conversion result of a3 and the reading R3 of each definition word R1 to show the readings, separated by a blank character for each definition word. Displays a6 and a7 show the information extracted from the sentence a2, one display per extracted item.
[0037]
Questions about the knowledge data stored in the knowledge data storage unit 7 by the above information storage method are accepted in natural language. Switching between storing information and asking questions is done with a pointing device such as a mouse. The mode can also be switched by determining, from the definition word conversion matrix that results from analyzing the sentence, that the sentence is a question, using the special conjugation R6 and conjugation R7 of the definition word information 20 of the verb.
The following describes the response when question 1 is input after the input string of information 1 has been stored as information.
Question 1, “What did you eat?”, entered through the input unit 2 of FIG. 1, is passed to the character conversion unit 3, and the conversion result is passed to the information extraction unit 5, which creates the corresponding definition word conversion matrices by the conversion and information extraction methods above. Question 1 contains only one piece of information, M2; the results of M1 and M2 are identical, and the conversion result is
“What did you eat?”
[0038]
The answer generation unit 8 in FIG. 1 receives the definition word conversion matrices M1 and M2 and generates an answer to question 1. An answer is generated by defining an answer method, in a description language, in the answer generation method R12 of the definition word information 20 of the definition word that is the key to the answer. For question 1, the answer method is described in R12 of the definition word information 20 of the pronoun “what”; FIG. 11 shows an example of this R12 description. For the question forms “what + particle + verb” and “what + case particle + verb”, the answer generation method of FIG. 11 searches the knowledge data M2, M3, ..., Mn stored in the knowledge data storage unit 7, using the related definition words R13 of the verb “eat”, and builds the answer sentence from the noun connected to the matching verb together with the particle and verb taken from question 1. By this procedure, the noun “bento” is taken from M3 of information 1 in FIG. 6, whose verb “eat” matches that of question 1, and joined with the particle and verb from question 1; the answer form “ate” is obtained from the conjugation R7 in FIG. 2, so the answer sentence “I ate lunch” can be generated.
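A “what” answer method of this kind can be sketched as: find a stored unit whose verb matches the question's verb (directly or through R13), then join its noun with the question's particle and a past form. The Unit layout, the PAST table standing in for forms derived from R7, and the function name answer_what are hypothetical; the data mirrors information 1.

```python
from typing import NamedTuple


class Unit(NamedTuple):
    noun: str
    particle: str
    verb: str  # dictionary form


# Information 1 as stored units: 公園へ行く (go to the park), 弁当を食べる (eat lunch).
KNOWLEDGE = [Unit("公園", "へ", "行く"), Unit("弁当", "を", "食べる")]
# Stand-in for past forms that the device would derive from conjugation R7.
PAST = {"行く": "行った", "食べる": "食べた"}


def answer_what(particle, verb, knowledge=KNOWLEDGE, related=None):
    """Answer "what + particle + verb" from stored units, or return None."""
    related = related or {}
    for unit in knowledge:
        if unit.verb == verb or verb in related.get(unit.verb, set()):
            # noun from knowledge + particle and past-form verb from the question
            return unit.noun + particle + PAST.get(verb, verb)
    return None
```

For question 1 (何を食べた), the sketch returns 弁当を食べた, “I ate lunch”.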
[0039]
The answer display unit 9 in FIG. 1 outputs the answer sentence above to an output device, for example as a display or voice output, giving “I ate lunch” as the answer to question 1 about information 1 and thereby realizing a natural-language response.
[0040]
Suppose question 2, “Where did you eat the lunch?”, is entered. The sentence meaning extraction method R11 of the case particle で (de, “at/in”) describes a procedure that takes “noun + de” and “pronoun + de” as extraction units and extracts the rest of the sentence as “noun + verb”. The extraction results of R11 are then “where” and “ate the lunch”, held as the definition word conversion matrices M2 and M3. In a question sentence, extracted information that contains no pronoun serves as the condition for searching the knowledge data. Information 1 satisfies this condition, so it can be determined to be the sentence to answer from. The answer generation method R12 of the pronoun “where” describes a procedure that, as the answer condition, searches the knowledge data for a noun whose related definition words R13 include “place”. This produces the answer condition “in the park”, and “I ate in the park” is output as the answer.
Question 3 above is processed in the same way, with “in the park” as the search condition, and “I ate lunch” can be output as the answer.
In question 4 above, the pronoun “what” of question 1 is replaced by the noun “bento”, so question 4 itself is the search condition. In this case the answer is “yes” or “no”; because knowledge data matching the conditions of question 4 exists, “yes” can be output in the form “yes + verb”.
Question 5 above has two conditions, “in the park” and “eat lunch”; it differs from question 4 only in the number of conditions, and the same processing gives the same result.
In question 6 above, information is extracted as for question 2 and the search condition is the same as for question 5, so the answer is the same as for question 5.
Question 7 is processed like question 5, but among the knowledge data sentences that satisfy the search condition “in the park”, none satisfies “ate an apple”, so “no” can be output as the answer.
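The “where” and yes/no answers for questions 2, 5, and 7 can be sketched over sentences stored as (noun, verb) units, where a condition is a noun with an optional verb. The data layout and function names are assumptions, and the English tokens stand in for the Japanese words of information 1.

```python
# Information 1 as (noun, verb) units; R13 data follows the text.
KNOWLEDGE = [[("park", "go"), ("lunch", "eat")]]
RELATED = {"park": {"place", "play"}, "lunch": {"food", "meal"}}


def unit_matches(cond, unit):
    """A condition (noun, verb or None) matches a unit with that noun (and verb)."""
    noun, verb = cond
    return unit[0] == noun and (verb is None or unit[1] == verb)


def matching_sentences(conditions, knowledge=KNOWLEDGE):
    """Stored sentences in which every condition is matched by some unit."""
    return [s for s in knowledge
            if all(any(unit_matches(c, u) for u in s) for c in conditions)]


def answer_where(conditions, knowledge=KNOWLEDGE, related=RELATED):
    """Return a noun whose R13 contains "place" from a matching sentence."""
    for sentence in matching_sentences(conditions, knowledge):
        for noun, _ in sentence:
            if "place" in related.get(noun, set()):
                return noun
    return None


def answer_yes_no(conditions, knowledge=KNOWLEDGE):
    return "yes" if matching_sentences(conditions, knowledge) else "no"
```

Question 2 reduces to answer_where([("lunch", "eat")]), question 5 to a yes/no test with two conditions, and question 7 fails because no stored unit mentions an apple.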
[0041]
As described above, sentences containing specific facts and information are stored as knowledge data in sentence form, and the target information can be obtained by asking questions in natural language that use sentences as search conditions. In addition, the word-specific usage and answer methods that analysis patterns or grammar patterns cannot express in natural language analysis are described in the key words themselves, which allows the meaning of many different expressions to be estimated.
[0042]
【Example】
A specific example using the embodiment of the present invention will be described with reference to the drawings.
[0043]
FIG. 12 shows an embodiment of the standard character string automatic conversion device 10 according to the present invention. The device 10 includes an input unit 2, a character conversion unit 3, a definition word storage unit 4, and an information extraction unit 5, which are the same as those of the natural language response device 1 of FIG. 1 described in the above embodiment. It further includes a conversion result temporary storage unit 11 that temporarily stores the conversion result and a character output unit 12 that displays the conversion result on a display device.
The input unit 2 is not limited to a keyboard or the like. For example, the input unit 2 may convert a natural language into a character string by performing voice recognition on a voice signal input from a microphone.
Moreover, it is possible to input external text data written in a data file, and any means capable of inputting a natural language can be widely applied.
[0044]
For a character string entered through the input unit 2, the character conversion unit 3 uses the method of the above embodiment to obtain the definition word information 20 stored in the definition word storage unit 4 that corresponds to the string, and creates the definition word conversion matrix M1 shown in FIG. 3.
[0045]
Using the definition word conversion matrix M1, the information extraction unit 5 extracts the information contained in the input string by the information extraction method of the above embodiment, creates the definition word conversion matrices M1, M2, ..., Mn described above, and stores them in the conversion result temporary storage unit 11.
The character output unit 12 outputs the converted character string to an output buffer for display.
[0046]
With the processing above, words that have until now been chosen by usage frequency or by selecting among homonyms from a menu can be selected automatically by grasping the meaning of the sentence. In addition, when automatic conversion is impossible for lack of information, the missing information is supplied by asking the operator a question, so the operator does not need to learn any special operation for conversion.
[0047]
Beyond converting hiragana to kanji, the standard character string automatic conversion device 10 can convert character strings written in kanji, hiragana, katakana, English letters, numbers, symbols, and combinations of these into standardized standard character strings, using the definition word R1, reading R3, okurigana R4, and notation method R8 of the definition word information 20 as described in the above embodiment.
[0048]
The embodiment shown in FIG. 13 is a method of searching a database in natural language. FIG. 13 is a flowchart of the process of searching, in natural language, the data of an external database connected through a communication line or built on the same computer.
By the method described in the above embodiment, an outline description of the data registered in the external database is written as an external file, the external file is read, and knowledge data about the external database 30 is created in the knowledge data storage unit 7.
[0049]
For example, suppose the sentence “Column B of table A is the date of birth of the employee.”, describing table A and column B of the external database 30, is written in the external file. The information extracted by the definition word conversion matrices M1, M2, ... is then
“Column B of table A is the date of birth of the employee”, and this result is stored in the knowledge data storage unit 7. “Birthday” is registered in the related definition word R13 of the definition word “date of birth”, and “date” in R13 of the definition word “birthday”.
[0050]
In step 1 (S1), a condition for searching the external database 30 is entered in natural language from an input device. The character string may be entered from a keyboard, but the invention applies equally to any means of entering natural language, such as a character string produced by a voice recognition system from speech input through a microphone, or a document stored in an external file.
[0051]
In step 2 (S2), if an operation to end the process has been performed, the process ends. If an input string exists, it is processed with the method described in the embodiment of the present invention: step 3 (S3) corresponds to the character conversion unit 3 in FIG. 1, and step 4 (S4) to the information extraction unit 5. In S3 the definition word conversion matrix M1 shown in FIG. 3 is created for the input string, and in S4 the definition word conversion matrices M1, M2, ..., Mn are created. In step 5 (S5), information is extracted from M1, M2, ..., Mn into the structure shown in FIGS. 7 and 8 and stored in a temporary storage device.
[0052]
In step 6 (S6), the knowledge data about the external database 30 stored in the knowledge data storage unit 7 is searched with the information stored in S5, and the data representation (table name, column name, ...) is obtained. If no matching stored information exists, the missing information is reported to the operator, who is prompted to re-enter the conditions.
[0053]
In step 7 (S7), a condition for searching the external database 30 is generated from the conditions extracted in S4. The search condition is generated using the part of speech R5 and the synonymous definition words R14 of the definition word information 20 that make up the stored information. Nouns are treated as table names, column names, and data values, and adjectives and adverbs as data range conditions. Verbs, particles, case particles, and conjunctions are treated as the operators that make up a condition; operators can be created using the synonymous definition words R14, and the procedure for generating grammatical conditions is described in a description language in the answer generation method R12.
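The part-of-speech rules of step 7 can be sketched as a simple classifier. The Token type, the role names, and the OPERATOR_SYNONYMS table standing in for R14 are hypothetical. Verbs and particles without an R14 operator mapping are simply ignored here, which is a simplifying assumption.

```python
from typing import NamedTuple


class Token(NamedTuple):
    surface: str
    pos: str  # part of speech R5


# Hypothetical R14-style mapping from connective words to SQL operators.
OPERATOR_SYNONYMS = {"or": "OR", "and": "AND"}


def classify(tokens, operators=OPERATOR_SYNONYMS):
    """Sort tokens into name/value candidates, range conditions, and operators."""
    roles = {"names_or_values": [], "ranges": [], "operators": []}
    for tok in tokens:
        if tok.pos == "noun":
            roles["names_or_values"].append(tok.surface)
        elif tok.pos in ("adjective", "adverb"):
            roles["ranges"].append(tok.surface)
        elif tok.pos in ("verb", "particle", "case particle", "conjunction"):
            op = operators.get(tok.surface)
            if op is not None:  # only words with an operator mapping are kept
                roles["operators"].append(op)
    return roles
```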
[0054]
For example, when “employees born in 1970” is entered as a search condition, the extraction information of the definition word conversion matrices M1, M2, ...
In “employees born in 1970”, “1970” can be judged to be a date and “employees” a personal noun. Here the definition word “born” has the noun attribute, and “birthday” is registered in its related definition word R13. Its usage method R9 describes, for the pattern “date + born + personal noun”, a procedure for moving to the first row of the column of the definition word conversion matrix that contains this noun. “Born in 1970” then becomes “noun + noun”, indicating a noun combination state. In this case R13 of “1970” contains “date” and R13 of “born” contains “birthday”, and “date” and “birthday” do not match. Looking up R13 of the definition words “date” and “birthday”, however, shows that “date” is registered in R13 of “birthday”. The R13 estimated for the combined noun “born in 1970” is therefore “date” or “birthday”. Consequently, when the knowledge data stored in the knowledge data storage unit 7 are searched with the conditions “date” and “employee”, or “birthday” and “employee”,
the sentence “Column B of table A is the date of birth of the employee” can be retrieved. This sentence contains nouns connected by a case particle (here, the particle rendered as “of”). Because nouns connected by this particle can be unconditionally interpreted as a single noun, the sentence can be read as “A1 is composed of A2”, where A2 includes “birthday” and “employee”. Accordingly, a procedure that extracts the table name and column name from the result of searching the knowledge data in the knowledge data storage unit 7, and the range condition from the input search condition, is described in advance in the answer generation method R12 of the definition word for the conjunction “has”.
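The two-step reasoning above (estimating R13 for the combined noun, then using it to find the stored sentence) can be sketched in Python. The table contents, the matching rule (accept two related words when one appears in the other's R13), and the flattened knowledge record with pre-extracted table and column names are assumptions made for illustration; in the patent the knowledge data are natural-language sentences analysed with the definition word conversion matrix.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical related-definition-word table (field R13).
RELATED: Dict[str, Set[str]] = {
    "1970": {"date"},
    "born": {"birthday"},
    "birthday": {"date"},
    "date": set(),
}


def combined_related(left: str, right: str) -> Set[str]:
    """Estimate R13 of the combined noun left+right, as in the "born in 1970" example."""
    a, b = RELATED.get(left, set()), RELATED.get(right, set())
    if a & b:  # the related words agree directly
        return a & b
    for x in a:  # otherwise accept x and y if one is listed in the other's R13
        for y in b:
            if x in RELATED.get(y, set()) or y in RELATED.get(x, set()):
                return {x, y}
    return set()  # no relation found


# Hypothetical knowledge data: (stored sentence, table, column, key definition words).
KNOWLEDGE: List[Tuple[str, str, str, Set[str]]] = [
    ("Column B of table A is the date of birth of the employee",
     "table A", "column B", {"birthday", "employee"}),
]


def find_column(related: Set[str], entity: str) -> Optional[Tuple[str, str]]:
    """Return (table, column) of the first entry matching the entity and any related word."""
    for _sentence, table, column, keys in KNOWLEDGE:
        if entity in keys and keys & related:
            return table, column
    return None


related = combined_related("1970", "born")
print(related)                           # contains 'date' and 'birthday'
print(find_column(related, "employee"))  # ('table A', 'column B')
```

Searching with either "date" or "birthday" mirrors the text's two alternative conditions; here the stored keys contain "birthday", so that branch produces the match.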
[0055]
When the natural-language search condition is applied to a database search written as an SQL statement, extracting the table name and column name from the result of searching the knowledge data storage unit 7 gives table name = “table A” and column name = “column B”, and extracting the range condition from the input search condition gives “1970”. The SQL statement containing the search condition is therefore
“SELECT * FROM table A WHERE column B = 1970;”.
Here the name of the external database to be accessed is assumed to be known to the system; the database can also be opened by describing its name in a document file and storing that file as knowledge data. When more than one external database exists, the external database to use can be selected from the search conditions by describing the table and column information of each external database in the document file.
In step 7 (S7), the generated condition is stored as the search condition.
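As a hedged sketch of this step, the SQL text can be generated from the extracted table name, column name, and value. This version is an assumption rather than the patent's procedure: it quotes identifiers with standard SQL double quotes (needed because names such as `table A` contain spaces) and passes the value as a bound parameter instead of splicing it into the string.

```python
from typing import Any, Tuple


def quote_ident(name: str) -> str:
    """Quote an SQL identifier with double quotes, doubling any embedded quote."""
    return '"' + name.replace('"', '""') + '"'


def build_select(table: str, column: str, value: Any) -> Tuple[str, Tuple[Any, ...]]:
    """Build a single-condition SELECT; the value is returned as a bound parameter."""
    sql = f"SELECT * FROM {quote_ident(table)} WHERE {quote_ident(column)} = ?;"
    return sql, (value,)


sql, params = build_select("table A", "column B", 1970)
print(sql, params)
# SELECT * FROM "table A" WHERE "column B" = ?; (1970,)
```

Binding the value keeps user-entered text such as "1970" out of the SQL string, which matters once conditions come from free natural-language input.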
[0056]
In step 8 (S8), the operator is asked whether to search the external database 30 with the above search condition. If no search is performed, the next condition is entered.
When “or an employee born in 1960” is input as the next search condition, a composite condition is created by adding the input condition to the search condition. In this case, “OR” is registered in the synonymous definition word R14 of the definition word indicating the conjunction “or”.
When the condition is analyzed as in the above example, the condition item in the SQL statement becomes “OR column B = 1960”. The SQL statement combining the two conditions is therefore
“SELECT * FROM table A WHERE column B = 1970 OR column B = 1960;”.
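The way a follow-up sentence extends the stored condition can be sketched with a small accumulator. `SearchCondition` and its methods are hypothetical names for illustration; the connective is assumed to arrive already mapped to "AND" or "OR" through the synonymous definition word R14, and values are again bound as parameters rather than written into the SQL.

```python
from typing import Any, List, Optional, Tuple


def quote_ident(name: str) -> str:
    """Quote an SQL identifier with double quotes, doubling any embedded quote."""
    return '"' + name.replace('"', '""') + '"'


class SearchCondition:
    """Accumulates equality terms entered one sentence at a time (hypothetical helper)."""

    def __init__(self, column: str, value: Any) -> None:
        self.terms: List[Tuple[Optional[str], str, Any]] = [(None, column, value)]

    def add(self, connective: str, column: str, value: Any) -> None:
        # connective is the R14 synonym of the conjunction, e.g. "or" -> "OR"
        if connective not in ("AND", "OR"):
            raise ValueError(f"unsupported connective: {connective}")
        self.terms.append((connective, column, value))

    def where_clause(self) -> Tuple[str, Tuple[Any, ...]]:
        """Render terms left to right; normal SQL precedence (AND before OR) applies."""
        parts: List[str] = []
        params: List[Any] = []
        for connective, column, value in self.terms:
            if connective:
                parts.append(connective)
            parts.append(f"{quote_ident(column)} = ?")
            params.append(value)
        return " ".join(parts), tuple(params)


cond = SearchCondition("column B", 1970)  # "employees born in 1970"
cond.add("OR", "column B", 1960)          # "or an employee born in 1960"
where, params = cond.where_clause()
print(f'SELECT * FROM {quote_ident("table A")} WHERE {where};', params)
# SELECT * FROM "table A" WHERE "column B" = ? OR "column B" = ?; (1970, 1960)
```

Terms are joined without parentheses, so mixing AND and OR follows standard SQL precedence; a real system might need to group terms explicitly.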
[0057]
Display conditions can be set in the same way. If table A also has columns C and D and “display column C and column D” is entered as a display condition, combining columns C and D as display columns with the above search condition gives the SQL statement
“SELECT column C, column D FROM table A WHERE column B = 1970 OR column B = 1960;”.
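Adding display columns only changes the select list. The sketch below assumes a hypothetical `build_query` helper that falls back to `*` when no display condition has been entered; the literal values in the WHERE text are used here only for readability.

```python
from typing import Sequence


def quote_ident(name: str) -> str:
    """Quote an SQL identifier with double quotes, doubling any embedded quote."""
    return '"' + name.replace('"', '""') + '"'


def build_query(table: str, display_columns: Sequence[str], where_clause: str) -> str:
    """Render a SELECT listing the display columns, or * when none were requested."""
    cols = ", ".join(quote_ident(c) for c in display_columns) or "*"
    return f"SELECT {cols} FROM {quote_ident(table)} WHERE {where_clause};"


where = '"column B" = 1970 OR "column B" = 1960'
print(build_query("table A", ["column C", "column D"], where))
# SELECT "column C", "column D" FROM "table A" WHERE "column B" = 1970 OR "column B" = 1960;
```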
[0058]
If, in S8, an instruction is given to search the external database 30 with the above search condition, the external database 30 is accessed according to the search condition in step 9 (S9), the retrieved data are acquired, and the search result can be sent to the display device and displayed.
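Step S9 can be sketched with Python's standard `sqlite3` module standing in for the external database 30. The in-memory table and its rows are made-up sample data for illustration only, and `run_search` is a hypothetical helper name.

```python
import sqlite3


def run_search(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list:
    """Execute the generated search and return all matching rows (step S9)."""
    return conn.execute(sql, params).fetchall()


# Hypothetical stand-in for the external database 30, with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "table A" ("column B" INTEGER, "column C" TEXT, "column D" TEXT)')
conn.executemany(
    'INSERT INTO "table A" VALUES (?, ?, ?)',
    [(1970, "Sato", "Sales"), (1965, "Suzuki", "Accounting"), (1960, "Tanaka", "Planning")],
)

sql = ('SELECT "column C", "column D" FROM "table A" '
       'WHERE "column B" = ? OR "column B" = ?')
rows = run_search(conn, sql, (1970, 1960))
for row in rows:
    print(row)  # a real system would send these rows to the display device
conn.close()
```

Without an ORDER BY clause the row order is not guaranteed, so a display step that needs a stable order should add one.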
[0059]
【Effects of the Invention】
As described above, the natural language response apparatus and method of the present invention facilitate natural-language interaction between a human and a computer and between computers, enabling dialogue through voice input or character string input.
[0060]
In addition, because the natural language response device and method of the present invention use an analysis method that does not require grammatical patterns and handle the processing method for each vocabulary item as vocabulary data, the many processing forms of words can be handled in a simplified way.
[0061]
By applying the method of the present invention to a kanji conversion device, character strings made of kanji, hiragana, alphabetic characters, numbers, symbols, and combinations thereof can easily be converted into a standardized character string instead of by conventional kanji conversion.
[0062]
With the method of the present invention, information stored in a computer can be retrieved while the operating procedure is worked out through dialogue, so the computer can be operated without knowledge of the computer itself or of the stored data.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a natural language response device to which the present invention is applied.
FIG. 2 is an explanatory diagram showing an area of definition word information to which the present invention is applied.
FIG. 3 is an explanatory diagram showing a data structure in an initial state of a definition word conversion matrix to which the present invention is applied.
FIG. 4 is a diagram showing an example of a procedure registered in the usage method of the definition word information to which the present invention is applied.
FIG. 5 is an explanatory diagram showing a data structure of a defined state of a definition word conversion matrix to which the present invention is applied.
FIG. 6 is an explanatory diagram showing a definition word conversion matrix of a semantic division character string to which the present invention is applied.
FIG. 7 is an explanatory diagram showing an information extraction state of a sentence to which the present invention is applied.
FIG. 8 is an explanatory diagram showing a knowledge data structure having a hierarchical structure to which the present invention is applied.
FIG. 9 is an explanatory diagram showing a display example of a conversion result and an information extraction result of a definition word conversion matrix to which the present invention is applied.
FIG. 10 is an explanatory diagram showing a data structure of a defined state of a definition word conversion matrix for a question to which the present invention is applied.
FIG. 11 is a diagram showing an example of a procedure registered in the answering method of the definition word information to which the present invention is applied.
FIG. 12 is a block diagram showing the configuration of a standard character string automatic conversion apparatus to which the present invention is applied. (Example 1)
FIG. 13 is an explanatory diagram showing a flowchart of a process for searching for data in an external database using natural language, to which the present invention is applied. (Example 2)
[Explanation of symbols]
1 Natural language response device
2 Input section
3 Character conversion part
4 Definition word storage
5 Information extraction part
6 Information storage
7 Knowledge data storage
8 Answer generator
9 Answer display area
10 Standard character string automatic converter
11 Conversion result temporary storage
12 Character output part
13 Display
14 Speaker
20 Definition word information
30 External database

Claims

A character string composed of kanji, hiragana, katakana, alphabetic characters, numbers, symbols and combinations thereof that can specify the meaning is called a definition word.
Input means for receiving natural language input;
A conversion means for converting the input natural language into a normalized standard character string;
For each definition word, the description language describes how to use the definition word when it is used in the input natural language, how to extract the definition word, and how to answer the question about the definition word. Definition word storage means for storing a procedure and a plurality of attributes of the definition word as semantic determination information;
Among the meaning determination information, semantic estimation means for estimating the meaning of each definition word by determining a method of using each definition word and a combination of definition words constituting the standard character string using a plurality of attributes,
Information extracting means for extracting information contained in the natural language input by the method of extracting each definition word estimated by the meaning estimating means;
Knowledge data storage means for storing knowledge data information extracted by the information extraction means;
A natural language question is received by the input means, a combination of definition words constituting the question is estimated by the semantic estimation means, and a definition word answering method in the combination and knowledge data stored by the knowledge data storage means A natural language response device comprising: an answer generation unit configured to generate an answer corresponding to a question using information.

The meaning determination information stored in the definition word storage means stores a processing procedure in which the definition words constituting the input natural language influence each other due to the relationship in the definition word usage method described in the description language. The natural language response apparatus according to claim 1, wherein the natural language analysis is performed using the stored processing procedure.

The semantic determination information stored in the definition word storage means stores a processing procedure in which the definition words constituting the input natural language influence each other due to the relationship described in the definition word extraction method in the description language. And using the memorized processing procedure,
The natural language response device according to claim 1, wherein information included in the input natural language is extracted with a character string expressed by a combination of definition words.

The meaning determination information stored in the definition word storage means stores a processing procedure in which the definition words constituting the input natural language influence each other due to the relationship in the definition word answering method described in the description language. And using the memorized processing procedure,
The natural language response apparatus according to claim 1, wherein an answer generation process corresponding to an information acquisition request included in the input natural language is performed.

The natural language response device according to claim 1, wherein character conversion, vocabulary meaning determination, sentence meaning determination, extraction of information contained in the sentence, vocabulary-based changes to the vocabulary structure and syntax structure, and response sentence generation are performed using a definition word conversion matrix.

A natural language response method executed by a computer,
A character string composed of kanji, hiragana, katakana, alphabetic characters, numbers, symbols and combinations thereof that can specify the meaning is called a definition word.
For each definition word, the description language describes how to use the definition word when it is used in the input natural language, how to extract the definition word, and how to answer the question about the definition word. Semantic determination information consisting of a procedure and multiple attributes of its definition word is stored,
Accepts natural language input, converts the input natural language into a standardized standard string,
Among the above-mentioned semantic determination information, the meaning estimation process of each definition word is performed by determining the combination of definition words constituting the input natural language using the use method of each definition word and a plurality of attributes. The information included in the natural language input by each definition word extraction method is extracted, and the extracted information is stored as knowledge data information.
Accepts a natural language question input, estimates the combination of definition words constituting the question by the semantic estimation process, and responds to the question using the definition word answer method in the combination and the knowledge data information Generate an answer,
Natural language response method characterized by that.

On the computer,
A character string composed of kanji, hiragana, katakana, alphabetic characters, numbers, symbols and combinations thereof that can specify the meaning is called a definition word.
For each definition word, the description language describes how to use the definition word when it is used in the input natural language, how to extract the definition word, and how to answer the question about the definition word. Semantic determination information consisting of a procedure and multiple attributes of its definition word is stored,
Accepts natural language input, converts the input natural language into a standardized standard string,
Among the above-mentioned semantic determination information, the meaning estimation process of each definition word is performed by determining the combination of definition words constituting the input natural language using the use method of each definition word and a plurality of attributes. The information included in the natural language input by each definition word extraction method is extracted, and the extracted information is stored as knowledge data information.
Accepts a natural language question input, estimates the combination of definition words constituting the question by the semantic estimation process, and responds to the question using the definition word answer method in the combination and the knowledge data information Generate an answer,
The computer-readable recording medium which recorded the program for performing the process with.

A character string composed of kanji, hiragana, katakana, alphabetic characters, numbers, symbols and combinations thereof that can specify the meaning is called a definition word.
Input means for receiving natural language input;
A conversion means for converting the input natural language into a normalized standard character string;
For each definition word, the description language describes how to use the definition word when it is used in the input natural language, how to extract the definition word, and how to answer the question about the definition word. Definition word storage means for storing a procedure and a plurality of attributes of the definition word as semantic determination information;
Among the meaning determination information, semantic estimation means for estimating the meaning of each definition word by determining a method of using each definition word and a combination of definition words constituting the standard character string using a plurality of attributes,
Information extracting means for extracting information contained in the natural language input by the method of extracting each definition word estimated by the meaning estimating means;
Means for converting the input natural language into a more normalized standard character string by constructing a sentence from a combination of definition words, using the analysis results of the semantic estimation means and the information extraction means; a standard character string automatic conversion device comprising the above means.

The semantic determination information stored in the definition word storage means defines a state in which the definition words constituting the input natural language influence each other depending on the relationship in the state used in the input natural language. 9. The input natural language is converted into a normalized standard character string using a processing procedure that is classified into a word usage method and an extraction method and described in a description language. Standard string automatic conversion device.

The meaning determination information stored in the definition word storage means stores a processing procedure in which the definition words constituting the input natural language influence each other due to the relationship in the definition word usage method described in the description language. 9. The standard character string automatic conversion device according to claim 8, wherein natural language analysis is performed using the stored processing procedure.

The semantic determination information stored in the definition word storage means stores a processing procedure in which the definition words constituting the input natural language influence each other due to the relationship described in the definition word extraction method in the description language. And using the memorized processing procedure,
9. The standard character string automatic conversion device according to claim 8, wherein information included in the input natural language is extracted as a character string expressed by a combination of definition words.

9. The character conversion, lexical meaning determination, sentence semantic determination, extraction of information included in a sentence, vocabulary structure change and lexical structure change centered on a vocabulary are performed using a definition word conversion matrix. Standard string automatic conversion device described in 1.

A standard character string automatic conversion method executed by a computer,
A character string composed of kanji, hiragana, katakana, alphabetic characters, numbers, symbols and combinations thereof that can specify the meaning is called a definition word.
For each definition word, the description language describes how to use the definition word when it is used in the input natural language, how to extract the definition word, and how to answer the question about the definition word. Semantic determination information consisting of a procedure and multiple attributes of its definition word is stored,
Accepts natural language input, converts the input natural language into a standardized standard string,
Among the above-mentioned meaning determination information, the meaning estimation process of each definition word is performed by determining a combination of definition words constituting the standard character string using a method of using each definition word and a plurality of attributes.
Perform information extraction processing to extract information included in the natural language input by the method of extracting each definition word estimated by the semantic estimation processing,
By using the analysis result of the semantic estimation process and the information extraction process, by constructing a sentence by a combination of definition words, the input natural language is converted into a more normalized standard character string.
The standard character string automatic conversion method characterized by this.

On the computer,
A character string composed of kanji, hiragana, katakana, alphabetic characters, numbers, symbols and combinations thereof that can specify the meaning is called a definition word.
For each definition word, the description language describes how to use the definition word when it is used in the input natural language, how to extract the definition word, and how to answer the question about the definition word. Semantic determination information consisting of a procedure and multiple attributes of its definition word is stored,
Accepts natural language input, converts the input natural language into a standardized standard string,
Among the above-mentioned meaning determination information, the meaning estimation process of each definition word is performed by determining a combination of definition words constituting the standard character string using a method of using each definition word and a plurality of attributes.
Perform information extraction processing to extract information included in the natural language input by the method of extracting each definition word estimated by the semantic estimation processing,
By using the analysis result of the semantic estimation process and the information extraction process, by constructing a sentence by a combination of definition words, the input natural language is converted into a more normalized standard character string.
The computer-readable recording medium which recorded the program for performing the process with.

The plurality of attributes of each definition word of the meaning determination information are:
One or more strings describing the reading of the definition word;
The part of speech of the definition word,
The usage method of the definition word for each part of speech,
An attribute that describes the standardized standard string representation of the definition word,
One or more other definition words that can be associated with the definition word, as needed,
One or more other definition words representing the same meaning as the definition word,
A computer-readable recording medium on which the program according to claim 8 or 14 is recorded.