JP4057962B2

JP4057962B2 - Question answering apparatus, question answering method and program

Info

Publication number: JP4057962B2
Application number: JP2003188988A
Authority: JP
Inventors: 佳美齋藤
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2003-06-30
Filing date: 2003-06-30
Publication date: 2008-03-05
Anticipated expiration: 2023-06-30
Also published as: JP2005025418A

Description

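[0001]
[Technical Field of the Invention]
The present invention relates to a question answering apparatus, a question answering method, and a program that output an answer to a question entered by a user.
[0002]
[Description of the Related Art]
As exemplified by Internet search engines, techniques that retrieve and rank documents matching a user's search request are in widespread use. Document retrieval can satisfy requests such as "I want to read newspaper articles about ..." or "I want to see Web pages about ...". However, it cannot directly answer questions such as "Who is the president of company ○×?", "How high is Mt. Fuji?", or "Are whales becoming extinct?". Because document retrieval only outputs documents or passages within documents, users must find the answer in the output themselves.
[0003]
Question answering systems directly output answers to questions of the latter kind. For a question such as "Who is the president of company ○×?", a question answering system outputs the name of that company's president rather than a document about the company (for example, its home page). For a question such as "How high is Mt. Fuji?", it outputs an answer such as "Mt. Fuji is 3776 m."
[0004]
Such question answering systems have recently attracted attention as an outgrowth of research on information retrieval and information extraction, and it has become possible to return reasonably direct answers to user questions. For example, as in Patent Document 1, a system is known that outputs two things for a user's question: a reasonably direct answer, and an evidence document that lets the user confirm that the answer actually answers the question. The evidence document is, for example, the source document from which the system extracted the answer. From the evidence document, the user can tell which document the answer is based on.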
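[0005]
[Patent Document 1]
JP 2002-132812 A
[0006]
[Problems to Be Solved by the Invention]
There may be only one evidence document as described above, but in general there can be several. When there are several evidence documents, conventional question answering systems simply list all of them or select and present a representative one. A user sometimes wants only the answer to a single question, but often wants a series of related knowledge and information. In the latter case, the evidence document for the answer may also contain other useful information, for example related information such as the answers to questions the user would ask next. If so, the user can acquire information efficiently by browsing that evidence document, for example without asking further questions.
[0007]
However, conventional question answering systems provide no function for presenting information in the evidence document other than the answer itself. For example, evidence documents are commonly evaluated by how well they match the search keywords contained in the question sentence. With such a method, it is difficult to evaluate what useful information, besides the answer to the question, a document contains. Likewise, methods that display a summary of the evidence document either generate the same summary regardless of the question or generate a summary biased toward the original question sentence.
[0008]
The present invention was made in view of these circumstances. Its object is to provide a question answering apparatus, a question answering method, and a program that, when presenting the answer to a question sentence, can also take into account the information contained in the evidence documents for that answer.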
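[0009]
[Means for Solving the Problems]
A question answering apparatus according to the present invention comprises:
(a) a first database that stores a plurality of documents;
(b) a second database that stores, for each document stored in the first database, the character strings in the document that can be answer candidates, each associated with category information indicating its category as an answer candidate;
(c) input means for inputting a question sentence in natural language;
(d) search means for searching the first database on the basis of the question sentence;
(e) extraction means for extracting, from the documents retrieved by the search means, an answer character string that answers the question sentence;
(f) acquisition means for referring to the second database and acquiring category information corresponding to character strings, other than the answer character string, that are contained in an evidence document, that is, a retrieved document containing the answer character string; and
(g) output means for outputting answer information that includes the answer character string and the category information corresponding to those other character strings.
[0010]
The present invention relating to an apparatus also holds as an invention relating to a method, and the present invention relating to a method also holds as an invention relating to an apparatus.
The present invention relating to an apparatus or a method also holds as a program and as a computer-readable recording medium on which the program is recorded. The program causes a computer to execute the procedure corresponding to the invention, or to function as means corresponding to the invention, or to realize functions corresponding to the invention.
[0011]
According to the present invention, when the answer to a question sentence is presented, the presentation can also take into account the information contained in its evidence documents. The user therefore obtains the answer to the question and can also easily see, for example, what information besides the answer character string an evidence document contains.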
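[0012]
[Detailed Description of the Invention]
Embodiments of the invention are described below with reference to the drawings.
[0013]
FIG. 1 shows a configuration example of a question answering system according to an embodiment of the present invention.
[0014]
As shown in FIG. 1, the question answering system comprises an input unit 1, a document search unit 2, an answer type determination unit 3, an answer candidate extraction unit 4, an answer generation unit 5, an evidence document information adding unit 6, and an output unit 7.
[0015]
The question answering system may also comprise a document database 11, an expression category database 12, an answer candidate database 13, a question pattern database 14, and an evidence document information table 15.
[0016]
A configuration without the document database 11 and/or the answer candidate database 13 is also possible. For example, these databases 11 and 13 may be reached, and searched, over a network such as a LAN or the Internet.
[0017]
The answer candidate extraction unit 4 and the expression category database 12 may be omitted in two cases: when the answer candidate database 13 is not provided, or when it is provided but its answer candidate contents can be obtained from outside via a network such as the Internet.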
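[0018]
The units in FIG. 1 are outlined below.
[0019]
The input unit 1 accepts input of a question sentence.
[0020]
The document search unit 2 searches the document database 11 on the basis of the input question sentence and scores the retrieved documents.
[0021]
The answer type determination unit 3 uses the question pattern database 14 to determine the answer type of the input question sentence.
[0022]
The answer candidate extraction unit 4 extracts answer candidate character strings from the search target documents stored in the document database 11, for example by using the expression category database 12. It assigns an answer category to each answer candidate character string to generate answer candidate information, and stores that information in the answer candidate database 13.
[0023]
The answer generation unit 5 generates answer information from four inputs: the input question sentence, the search result from the document search unit 2, the answer type from the answer type determination unit 3, and the answer candidate information stored in the answer candidate database 13.
[0024]
The evidence document information adding unit 6 uses the evidence document information table 15 to add evidence document information, that is, information about the evidence documents, to the answer information generated by the answer generation unit 5.
[0025]
The output unit 7 outputs the answer information to which the evidence document information has been added.
[0026]
When the question answering system is implemented on a computer, the input unit 1 and the output unit 7 correspond to the user interface. They can be realized with input devices such as a keyboard, mouse, or microphone and output devices such as a computer display or speaker. The document search unit 2, answer type determination unit 3, answer candidate extraction unit 4, answer generation unit 5, and evidence document information adding unit 6 can be realized by a program.
[0027]
When the question answering system is implemented as a client-server system, for example, the input unit 1 and the output unit 7 are installed on the client computer and the remaining components on the server computer.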
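[0028]
The processing flow of the question answering system is described below using a concrete example.
[0029]
FIG. 2 shows an example of the document database 11. In this example, each document contains a document ID, a title, and text. Here, ○○○ is the name of a film director, and XXX, YYY, ZZZ, and WWW are all titles of films directed by ○○○.
[0030]
The answer candidate extraction unit 4 and the answer candidate database 13 are described first.
[0031]
When the answer candidate extraction unit 4 is used, it generates answer candidate information in advance from the search target documents registered in the document database 11, and this information is held in the answer candidate database 13. Known techniques such as proper noun (named entity) extraction or ontology taggers may be used for this processing. Answer candidate extraction may operate on the surface text of the search target documents, on the results of morphological analysis, or on the results of syntactic and dependency analysis.
[0032]
A concrete example of the answer candidate extraction process follows.

First, pairs of the form (information for detecting an answer candidate character string, answer category) are registered in the expression category database 12. For example, suppose the string "○○○監督" ("Director ○○○") is to be detected in target documents containing it and given the answer category "producer". In that case, the pair (○○○監督, producer), which uses the specific answer candidate character string, is registered in the expression category database 12.

The answer candidate extraction unit 4 then compares a search target document registered in the document database 11 with an expression registered in the expression category database 12. For example, it compares the document with document ID = 00050 in FIG. 2 with ○○○監督 from the pair (○○○監督, producer), and extracts the answer candidate character string ○○○監督. To this string it attaches two items:
- the document ID of the document it was extracted from (here, 00050), and
- the answer category information corresponding to the extracted string (here, "producer" from the pair (○○○監督, producer)).
The resulting answer candidate information, containing the document ID, the answer candidate character string, and the answer category information, is held in the answer candidate database 13.

The expression category database 12 can take various forms even in this example. For example, the pair (監督, producer) may be registered instead, using a string (監督, "director") that an answer candidate character string should contain. The string "○○○監督", which contains "監督", is then extracted from the target document and given the answer category "producer". As another example, the pair (「*」, title) may be registered, where * denotes an arbitrary character string. The string "「XXX」" enclosed in corner brackets is then detected in the target document. The part "XXX" left after removing the brackets "「」" becomes the candidate character string and is given the answer category "title". Various other methods are also possible.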
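Two of the expression category entry forms described above, the literal-string entry and the bracket entry, can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's implementation:
- Each entry is assumed to be a regular expression paired with a category. A capture group, if present, marks the part to keep.
- Categories are given English labels.
- The names `AnswerCandidate`, `EXPRESSION_CATEGORIES`, and `extract_candidates`, and the sample sentence, are hypothetical.

The substring form (監督, producer) is not shown. Deciding where "○○○監督" begins in unsegmented Japanese text would require a morphological analyzer.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class AnswerCandidate:
    doc_id: str    # ID of the document the string was extracted from
    text: str      # answer candidate character string
    category: str  # answer category information


# Hypothetical expression category entries: (regex, answer category).
# If a pattern has a capture group, group 1 is the candidate string;
# otherwise the whole match is used.
EXPRESSION_CATEGORIES = [
    (re.escape("○○○監督"), "producer"),  # literal entry, like (○○○監督, producer)
    (r"「([^」]+)」", "title"),            # (「*」, title): keep the text inside the brackets
]


def extract_candidates(doc_id, text, rules=EXPRESSION_CATEGORIES):
    """Return the answer candidate records found in one document."""
    candidates = []
    for pattern, category in rules:
        for match in re.finditer(pattern, text):
            surface = match.group(1) if match.re.groups else match.group(0)
            candidates.append(AnswerCandidate(doc_id, surface, category))
    return candidates


# Hypothetical sample text; the actual documents are shown in FIG. 2.
for candidate in extract_candidates("00050", "○○○監督の映画「XXX」と「YYY」"):
    print(candidate)
```

Using regular expressions puts all three entry kinds in one table. The trade-off is that literal entries must be escaped, which `re.escape` handles.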
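[0033]
FIG. 3 shows an example of the answer candidate database 13 in which the answer candidate information generated from the documents stored in the document database 11 of FIG. 2 is registered.
[0034]
If, on the other hand, the answer candidate extraction unit 4 is not used and answer candidate information must be obtained from outside and registered in the answer candidate database 13, that work is done in advance.
[0035]
If the answer candidate extraction unit 4 is not used and answer candidate information does not need to be obtained from outside and registered in the answer candidate database 13, the answer candidate database 13 is simply accessed when needed.
[0036]
FIG. 4 shows an example of the processing flow of the question answering system, from input of a question sentence to output of the result.
[0037]
The user enters a question sentence as text, speech, or the like via the input unit 1.
[0038]
A spoken question can be converted into text data by known speech recognition techniques; in that case, for example, the input unit 1 may provide the conversion function. The following description therefore assumes that the question sentence is available as text data.
[0039]
On receiving a question sentence from the user, the input unit 1 sends it to the document search unit 2, the answer type determination unit 3, and the answer generation unit 5 (step S1).
[0040]
Assume here that the question sentence "「YYY」の監督は誰ですか？" ("Who is the director of YYY?") has been entered.
[0041]
The document search unit 2 proceeds as follows (step S2):
1. It searches the search target documents in the document database 11 on the basis of the question sentence received from the input unit 1.
2. It computes a document score for each retrieved document.
3. It selects a prescribed number of documents (for example, a number fixed in advance or a number specified by the user) in descending order of document score.
4. It sends the answer generation unit 5 a search result containing the document IDs and document scores of the selected documents, which are the evidence documents.
Known techniques may be used for the processing of the document search unit 2.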
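Selecting the prescribed number of top-scoring documents is a plain top-N operation. The sketch below is minimal and rests on these assumptions:
- Scores are already available as a dict from document ID to score, with higher meaning a better match, as in FIG. 5.
- The function name and the scores are hypothetical.
- Scoring itself is left to whatever known retrieval technique is used.

```python
import heapq


def select_evidence_documents(scores, n):
    """Return the n (doc_id, score) pairs with the highest document scores,
    best first. `scores` maps document ID -> document score."""
    return heapq.nlargest(n, scores.items(), key=lambda item: item[1])


# Hypothetical scores for four documents; keep the top three.
print(select_evidence_documents({"00050": 0.4, "00210": 0.7, "00560": 0.9, "d4": 0.1}, 3))
```

`heapq.nlargest` avoids sorting the whole result list when only a few evidence documents are kept.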
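[0042]
FIG. 5 shows an example of the output obtained when the documents stored in the document database 11 of FIG. 2 are searched with the question sentence "「YYY」の監督は誰ですか？". In this example, a larger document score means the document was judged a better match for the question sentence.
[0043]
Before, after, or in parallel with this search, the answer type determination unit 3 determines the answer type. It compares the question sentence received from the input unit 1 with the expressions registered in the question pattern database 14, and sends answer type information containing the result to the answer generation unit 5 (step S3). Known techniques may be used for the processing of the answer type determination unit 3 (see, for example, JP 2002-132812 A).
[0044]
FIG. 6 shows an example of the question pattern database 14. In this example:
- if the question sentence contains いつ ("when"), the answer type is "date/time";
- if it contains 誰 ("who"), the answer type is "person name";
- if it contains どこ ("where"), the answer type is "place".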
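The keyword patterns of FIG. 6 can be sketched as an ordered list checked by substring containment. This is a minimal sketch under two assumptions: the first matching pattern wins, and the pattern list and function name are hypothetical. A real system might use richer patterns or a classifier.

```python
# Hypothetical encoding of the FIG. 6 patterns: (keyword, answer type).
QUESTION_PATTERNS = [
    ("いつ", "date/time"),   # "when"
    ("誰", "person name"),   # "who"
    ("どこ", "place"),       # "where"
]


def determine_answer_type(question, patterns=QUESTION_PATTERNS):
    """Return the answer type of the first pattern whose keyword occurs in
    the question, or None if no pattern applies."""
    for keyword, answer_type in patterns:
        if keyword in question:
            return answer_type
    return None


print(determine_answer_type("「YYY」の監督は誰ですか？"))  # person name
```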
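[0045]
FIG. 7 shows an example of the answer type determined for the question sentence "「YYY」の監督は誰ですか？" using the question pattern database 14 of FIG. 6. Because the question sentence contains 誰, the answer type is determined to be "person name".
[0046]
The answer generation unit 5 then carries out step S4:
1. It expands the answer type information received from the answer type determination unit 3 into one or more pieces of answer category information, according to predetermined expansion rules.
2. It searches the answer candidate database 13 using two keys: the answer category information from step 1, and the document IDs in the search result from the document search unit 2. This yields answer candidate information.
3. It merges pieces of the obtained answer candidate information that have the same surface character string.
4. It creates answer information containing the answer character string, the answer category information, and the document IDs of the evidence documents. The question sentence and document scores may also be included.
5. It sends the answer information to the evidence document information adding unit 6.
[0047]
The expansion rules associate each piece of answer type information with the one or more pieces of answer category information into which it is to be expanded. For example:
- the answer type "place" is expanded into the answer categories "country name", "place name", and "birthplace";
- the answer type "date/time" is expanded into "year", "date", and "time";
- the answer type "person name" is expanded into "person name".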
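The expansion rules above amount to a mapping from answer type to a list of answer categories. In this minimal sketch, the mapping reproduces the examples just given, and the function name is hypothetical. Falling back to the type itself when no rule exists is an assumption of this sketch; it corresponds to the case, mentioned later in the description, where an answer type and an answer category share a name.

```python
# Expansion rules as in the examples above: answer type -> answer categories.
EXPANSION_RULES = {
    "place": ["country name", "place name", "birthplace"],
    "date/time": ["year", "date", "time"],
    "person name": ["person name"],
}


def expand_answer_type(answer_type, rules=EXPANSION_RULES):
    """Return the answer categories for an answer type. Falling back to the
    type itself when no rule exists is an assumption of this sketch."""
    return list(rules.get(answer_type, [answer_type]))


print(expand_answer_type("date/time"))  # ['year', 'date', 'time']
```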
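[0048]
In the concrete example above, the answer type is determined to be "person name", so applying the example expansion rules yields the answer category "person name".
[0049]
In this example, the answer generation unit 5 then searches the answer candidate database 13 using as keys the answer category "person name" obtained by the expansion and the document IDs 00050, 00210, and 00560 contained in the search result from the document search unit 2. The answer generation unit 5 also holds, for example, knowledge information indicating that the answer category "person name" matches the answer category "producer". As a result, answer candidate information with the answer category "producer" is extracted in addition to answer candidate information with the answer category "person name".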
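Step S4 with the category-matching knowledge can be sketched as a filter followed by a merge on the surface string. This is a minimal sketch; its assumptions are:
- Candidate records are (doc_id, text, category) tuples.
- The matching knowledge is a dict from a category to the extra categories it matches.
- `MATCHING_CATEGORIES`, `generate_answers`, and the sample records are hypothetical. FIG. 8 and FIG. 9 are not reproduced here.

```python
# Hypothetical knowledge that one answer category also matches others.
MATCHING_CATEGORIES = {"person name": {"producer"}}


def generate_answers(candidate_db, doc_ids, categories, matching=MATCHING_CATEGORIES):
    """Look up answer candidates and merge those with the same surface string.

    candidate_db: iterable of (doc_id, text, category) records.
    doc_ids: IDs of the retrieved (evidence) documents.
    categories: answer categories obtained by expansion.
    Returns {answer string: {"categories": set, "doc_ids": list}}.
    """
    wanted = set(categories)
    for category in categories:
        wanted |= matching.get(category, set())
    answers = {}
    for doc_id, text, category in candidate_db:
        if doc_id not in doc_ids or category not in wanted:
            continue
        entry = answers.setdefault(text, {"categories": set(), "doc_ids": []})
        entry["categories"].add(category)
        if doc_id not in entry["doc_ids"]:
            entry["doc_ids"].append(doc_id)
    return answers


# Hypothetical candidate records for the three evidence documents.
db = [
    ("00050", "○○○監督", "producer"),
    ("00210", "○○○監督", "producer"),
    ("00210", "XXX", "title"),
    ("00560", "○○○監督", "producer"),
]
print(generate_answers(db, {"00050", "00210", "00560"}, ["person name"]))
```

Merging by surface string means one answer row can cite several evidence documents. The evidence document information adding unit 6 depends on exactly that list of document IDs.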
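[0050]
FIG. 8 shows an example of the answer candidate information obtained by the answer generation unit 5 in this concrete example. FIG. 9 shows an example of the answer information sent from the answer generation unit 5 to the evidence document information adding unit 6 in this case; the question sentence is omitted in FIG. 9.
[0051]
In the above, the answer type "person name" was expanded into the answer category "person name". At the search stage, the system used both the expanded category "person name" and the knowledge that "person name" matches "producer". Alternatively, the answer type "person name" may be expanded directly into answer categories such as "producer". The search is then based on those expanded categories, without using the matching knowledge.
[0052]
Next, the evidence document information adding unit 6 generates data on the related information held by the evidence documents and records it in the evidence document information table 15. It bases this data on the answer information received from the answer generation unit 5, the answer candidate information in the answer candidate database 13, and the document information in the document database 11. From the recorded data and the answer information, it then generates presentation information containing the answer information and the evidence document information, and sends it to the output unit 7 (step S5).
[0053]
The processing of the evidence document information adding unit 6 is now described in more detail.
[0054]
On receiving answer information from the answer generation unit 5, the evidence document information adding unit 6 proceeds as follows:
1. It searches the document database 11 using the document ID of each evidence document in the answer information as a key, obtains the title information of each evidence document, and records it in the evidence document information table 15.
2. It searches the answer candidate database 13 using the document ID of each evidence document as a key.
3. It excludes any answer candidate information whose answer candidate character string matches either an expression extracted from the question sentence ("YYY" in this example) or the answer character string ("○○○監督" in this example).
4. It extracts the answer category information of the remaining answer candidate information and records it in the evidence document information table 15, together with the count for each category.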
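The exclusion-and-count step can be sketched with `collections.Counter`. This is a minimal sketch; its assumptions are:
- Candidate records are (doc_id, text, category) tuples.
- The strings to exclude (question expressions and the answer string) are supplied by the caller.
- The sample records are hypothetical. They were chosen only so that the counts reproduce those described for FIG. 11: three "title", one "birthplace", and one "year" for one document, one "title" for another, and none for the third. "PPP" and "NNNN" are placeholder strings.

```python
from collections import Counter


def build_evidence_table(evidence_doc_ids, candidate_db, excluded):
    """Count, per evidence document, the answer categories of candidates whose
    string is neither a question expression nor the answer string.

    candidate_db: iterable of (doc_id, text, category) records.
    excluded: strings to drop, e.g. {"YYY", "○○○監督"}.
    """
    records = list(candidate_db)  # iterated once per evidence document
    return {
        doc_id: Counter(
            category
            for d, text, category in records
            if d == doc_id and text not in excluded
        )
        for doc_id in evidence_doc_ids
    }


# Hypothetical candidate records; "PPP" and "NNNN" are placeholder strings.
db = [
    ("00560", "○○○監督", "producer"),
    ("00210", "○○○監督", "producer"), ("00210", "YYY", "title"),
    ("00210", "XXX", "title"), ("00210", "ZZZ", "title"), ("00210", "WWW", "title"),
    ("00210", "PPP", "birthplace"), ("00210", "NNNN", "year"),
    ("00050", "○○○監督", "producer"), ("00050", "XXX", "title"),
]
table = build_evidence_table(["00560", "00210", "00050"], db, {"YYY", "○○○監督"})
print(table)
```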
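[0055]
FIG. 10 shows an example of the evidence document information table 15 in this concrete example.
[0056]
Finally, the output unit 7 outputs the question sentence, the answer to it, and the evidence document information, on the basis of the information received from the evidence document information adding unit 6 (step S6).
[0057]
FIG. 11 shows an example of a display screen output by the output unit 7 in this concrete example. The screen shows the question "「YYY」の監督は誰ですか？" and its answer "○○○監督", plus evidence document information. That information lists the three evidence documents in order of document score (see FIG. 5). For each evidence document, the "other information" field presents the answer category information in FIG. 10 and the number of items in the document with that category:
- document (1) has no "other information";
- document (2) has three items with the answer category "title", one with "birthplace", and one with "year";
- document (3) has one item with the answer category "title".
[0058]
Although FIG. 11 lists the three evidence documents in order of document score, they may be ordered by other criteria. For example, they may be presented in order of the evaluation score described later. In that case, the documents are listed in the order of document IDs 00210, 00050, and 00560 (see FIG. 13).
[0059]
In FIG. 11, the content of an evidence document may also be displayed when the user selects it, for example by clicking its title information with a mouse or entering its rank number on the keyboard.
[0060]
In FIG. 11, only the title information and other information are displayed for every evidence document. However, the content of a document may also be displayed, either instead of or together with its title information and other information. This may apply to the top-ranked evidence document (by document score, evaluation score, or the like), or to the evidence documents from the top rank down to a predetermined rank.
[0061]
As described above, according to this embodiment, the answer to a question is presented together with what other information the documents underlying that answer contain. The user can therefore easily tell whether, or how likely it is that, the evidence documents contain information the user wants beyond the current question.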
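[0062]
Other examples of the processing of the evidence document information adding unit 6 and of the output method of the output unit 7 are described below.
[0063]
(First Modification)
The first modification is described first.
[0064]
On receiving answer information from the answer generation unit 5, the evidence document information adding unit 6 searches the document database 11 using the document ID of each evidence document in the answer information as a key. It obtains the title information of each evidence document and records it in the evidence document information table 15. It also searches the answer candidate database 13 using the document ID of each evidence document as a key. It excludes any answer candidate information whose answer candidate character string matches either an expression extracted from the question sentence ("YYY" in this example) or the answer character string ("○○○監督" in this example). It then extracts the answer category information of the remaining answer candidate information and records it in the evidence document information table 15. So far, the processing is the same as in the example already described (see FIG. 10).
[0065]
Next, the evidence document information adding unit 6 computes an evaluation score for each evidence document according to the rules registered in an evidence document evaluation pattern database (not shown), and records the scores in the evidence document information table 15.
[0066]
FIG. 12 shows an example of the information registered in the evidence document evaluation pattern database. Each rule has three fields:
- answer category (1), the answer category information of the answer information (see FIG. 9);
- answer category (2), each piece of answer category information in the evidence document information table 15 (see FIG. 11);
- points, the value added to the evaluation score when that combination of answer category (1) and answer category (2) occurs.
The evaluation score of each evidence document is, for example, the sum of the points of all rules in FIG. 12 that apply to it. Suppose the answer category information of the answer is "producer", as in this concrete example, and the evidence document information table 15 is as in FIG. 10. The table with evaluation scores added for each evidence document is then, for example, as shown in FIG. 13.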
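The rule-based evaluation score can be sketched as a lookup table keyed by (answer category (1), answer category (2)). This is a minimal sketch with these assumptions:
- The point values are hypothetical, since FIG. 12 is not reproduced. They were chosen only so that the ranking matches the order 00210, 00050, 00560 stated for FIG. 13.
- Each applicable rule is counted once per document. The text does not say whether the points are multiplied by the number of matching candidates.
- Function names are hypothetical.

```python
from collections import Counter

# Hypothetical rules in the style of FIG. 12:
# (answer category (1), answer category (2)) -> points. Values are made up.
EVALUATION_RULES = {
    ("producer", "title"): 2,
    ("producer", "birthplace"): 1,
    ("producer", "year"): 1,
}


def evaluation_score(answer_category, doc_categories, rules=EVALUATION_RULES):
    """Sum the points of every rule that applies to the document.
    Each rule is counted once, however many candidates share the category."""
    return sum(rules.get((answer_category, c), 0) for c in set(doc_categories))


def rank_by_evaluation(answer_category, table, rules=EVALUATION_RULES):
    """Return evidence document IDs, highest evaluation score first."""
    return sorted(
        table,
        key=lambda doc_id: evaluation_score(answer_category, table[doc_id], rules),
        reverse=True,
    )


# Category counts per evidence document, as described for FIG. 11.
table = {
    "00560": Counter(),
    "00210": Counter({"title": 3, "birthplace": 1, "year": 1}),
    "00050": Counter({"title": 1}),
}
print(rank_by_evaluation("producer", table))  # ['00210', '00050', '00560']
```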
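[0067]
In this case, the output unit 7 presents the evidence document information on the basis of the evaluation scores assigned to the evidence documents.
[0068]
FIG. 14 shows an example of this case. The content is presented only for the top-ranked evidence document, the one with the highest evaluation score (here, the document with document ID = 00210 in FIG. 2). For the other documents, only the fact that there are two of them is presented. Each of the other documents is displayed when the user selects it, for example by clicking its number with the mouse or entering the number on the keyboard. For example, selecting number 1 displays the content of the evidence document with the second-highest evaluation score (here, the document with document ID = 00050 in FIG. 2).
[0069]
In the above, the content is presented only for the top-ranked evidence document (here, document ID = 00210 in FIG. 2). Alternatively, the content may be presented for the evidence documents from the top rank down to a predetermined rank.
[0070]
In FIG. 14, no specific information is presented for the evidence documents other than the one whose content is shown. Their title information and other information may, however, be displayed as in FIG. 11. In this case too, the content of an evidence document may be displayed when the user selects it, for example by clicking its title information with the mouse or entering its rank number on the keyboard.
[0071]
As described above, according to this embodiment, the answer to a question is presented, and the evidence documents judged to contain the most related information besides the answer can be displayed first. The user can therefore easily tell whether, or how likely it is that, the evidence documents contain information the user wants beyond the current question. If an evidence document does contain that information, the user can obtain it simply by browsing the document, without entering a new question sentence and searching again.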
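[0072]
(Second Modification)
Next, the second modification is described.
[0073]
On receiving answer information from the answer generation unit 5, the evidence document information adding unit 6 searches the document database 11 using the document ID of each evidence document in the answer information as a key. It obtains the title information of each evidence document and records it in the evidence document information table 15. It also searches the answer candidate database 13 using the document ID of each evidence document as a key. It excludes any answer candidate information whose answer candidate character string matches either an expression extracted from the question sentence ("YYY" in this example) or the answer character string ("○○○監督" in this example). It then extracts the answer category information of the remaining answer candidate information and records it in the evidence document information table 15. So far, the processing is the same as in the example already described (see FIG. 10).
[0074]
Next, the evidence document information adding unit 6 classifies the evidence documents by the similarity of the answer categories registered in the evidence document information table 15. It reflects the result in the table, for example by adding to each evidence document information on the class it belongs to. Known techniques may be used for the classification method.
[0075]
For example, the evidence document information table 15 of FIG. 10 yields three classes:
- document class 1: evidence documents containing answer candidates with the answer category "title" (document IDs 00210 and 00050);
- document class 2: evidence documents containing answer candidates with the answer category "birthplace" (document ID 00210);
- document class 3: evidence documents containing answer candidates with the answer category "year" (document ID 00210).
In the evidence document information table 15, information indicating document classes 1, 2, and 3 is added to the evidence document with document ID 00210, and information indicating document class 1 is added to the one with document ID 00050. For the evidence document with document ID 00560, either no class information is added or information indicating that it belongs to no class is added.
[0076]
In this case, the output unit 7 presents the evidence document information on the basis of the class information assigned to the evidence documents.
[0077]
FIG. 15 shows an example of this case. The classes are presented in descending order of the number of evidence documents they contain. For each class, the display shows its answer category and the titles of the evidence documents that belong to it.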
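The patent leaves the classification method to known techniques. One simple scheme consistent with the example classes above is one class per answer category, with classes listed by document count as in FIG. 15. This is a minimal sketch of that grouping, not a similarity-based clustering method. The function name is hypothetical, and the table reuses the counts described for FIG. 11.

```python
def classify_by_category(table):
    """Group evidence documents by the answer categories they contain.

    table: {doc_id: {category: count}}. Returns (category, [doc_ids]) pairs,
    classes with more documents first (ties keep first-seen order);
    documents with no categories fall into no class.
    """
    classes = {}
    for doc_id, counts in table.items():
        for category in counts:
            classes.setdefault(category, []).append(doc_id)
    return sorted(classes.items(), key=lambda item: len(item[1]), reverse=True)


table = {
    "00210": {"title": 3, "birthplace": 1, "year": 1},
    "00050": {"title": 1},
    "00560": {},
}
for category, doc_ids in classify_by_category(table):
    print(category, doc_ids)
```

Because `sorted` is stable even with `reverse=True`, classes of equal size keep the order in which their categories were first seen.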
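[0078]
In FIG. 15, each class may also show, for each evidence document, how many pieces of answer candidate information with that class's answer category the document contains. For example, the evidence document titled "YYY" in document class 1 (document ID 00210) contains three pieces of answer candidate information with the answer category "title" (see FIG. 10). The count may then be displayed next to that document's title in that class, as in "YYY (3)".
[0079]
Also in FIG. 15, the content of the top-ranked evidence document (by document score, evaluation score, or the like) may be displayed. The same may be done for the evidence documents from the top rank down to a predetermined rank.
[0080]
As described above, according to this embodiment, the answer to a question is presented, and the evidence documents can be classified and displayed according to the related information they contain besides the answer. The user can therefore easily tell whether, or how likely it is that, the evidence documents contain information the user wants beyond the current question. If an evidence document does contain that information, the user can obtain it simply by browsing the document, without entering a new question sentence and searching again.
[0081]
The variations described above, for both the processing of the evidence document information adding unit 6 and the output method of the output unit 7, can be combined as appropriate. Several such processing and output methods may also be provided, with the user choosing which one to use.
[0082]
In the above description, the search target documents were plain text, but the invention can equally be carried out with documents tagged in advance, such as XML documents. In that case, the pre-assigned tag information can also be used as answer category information.
[0083]
In the above description, answer types and answer categories were defined as having a one-to-many correspondence. The invention can equally be carried out when an answer type and an answer category have the same name, or when they have a many-to-many or many-to-one correspondence.
[0084]
In the above description, higher-level concept tags such as "person name" and "place name" were used as answer categories. The invention can equally be carried out with meta-concepts such as "definition expression" and "means expression" as tags.
[0085]
The above description did not explicitly use morphological analysis or syntactic analysis, but the invention can equally be carried out when such means are used in each process. In that case, category attributes can be added to the morphological analysis dictionary, and categories can be identified by syntactic pattern matching.
[0086]
Each of the functions above can also be implemented as software and executed by a computer with a suitable mechanism.
This embodiment can also be carried out as a program that causes a computer to execute predetermined means, to function as predetermined means, or to realize predetermined functions. It can further be carried out as a computer-readable recording medium on which the program is recorded.
[0087]
The present invention is not limited to the above embodiment as it stands. At the implementation stage, its components can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the components disclosed in the above embodiment. For example, some components may be removed from the full set shown in the embodiment, and components from different embodiments may be combined as appropriate.
[0088]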
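[Effects of the Invention]
According to the present invention, when the answer to a question sentence is presented, the presentation can also take into account the information contained in its evidence documents.
[Brief Description of the Drawings]
[FIG. 1] A diagram showing a configuration example of a question answering system according to an embodiment of the present invention
[FIG. 2] A diagram showing an example of information in the document database
[FIG. 3] A diagram showing an example of information in the answer candidate database
[FIG. 4] A flowchart showing an example of the processing procedure of the question answering system according to the embodiment
[FIG. 5] A diagram showing an example of a search result
[FIG. 6] A diagram showing an example of information in the question pattern database
[FIG. 7] A diagram showing an example of an answer type determination result
[FIG. 8] A diagram showing an example of extracted answer candidate information
[FIG. 9] A diagram showing an example of answer information containing an answer character string, an answer category, and the document IDs of evidence documents
[FIG. 10] A diagram showing an example of the evidence document information table
[FIG. 11] A diagram showing a display example of presentation information
[FIG. 12] A diagram showing an example of information in the evidence document evaluation pattern database
[FIG. 13] A diagram showing another example of the evidence document information table
[FIG. 14] A diagram showing a display example of presentation information
[FIG. 15] A diagram showing a display example of presentation information
[Description of Reference Numerals]
1: input unit, 2: document search unit, 3: answer type determination unit, 4: answer candidate extraction unit, 5: answer generation unit, 6: evidence document information adding unit, 7: output unit, 11: document database, 12: expression category database, 13: answer candidate database, 14: question pattern database, 15: evidence document information table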
BACKGROUND OF THE INVENTION
The present invention relates to a question answering apparatus, a question answering method, and a program for outputting an answer to a question inputted by a user.
[0002]
[Prior art]
As represented by Internet search engines, techniques for searching and ranking documents that match user search requests are widely used. However, the document search can satisfy search requests such as “I want to read newspaper articles about…” or “I want to see a web page about…”, but “Who is the president of XX?” Can't answer directly to questions such as "Whales are going to be extinct?" Since the document search only outputs a document or a passage in the document, the user must search for an answer from the output result.
[0003]
There is a question answering system that directly outputs an answer to a question like the latter. The question answering system does not output a document about XX company (for example, XX company's website) to a question such as "Who is XX company president?" A person's name is output, and an answer such as “Mt. Fuji is 3776 m” is output to a question such as “How tall is Mt. Fuji?”.
[0004]
Such a question answering system has recently attracted attention as a development form of research such as information retrieval and information extraction, and it has become possible to return a direct answer to a user's question to some extent. For example, as in Patent Document 1, a direct response to a certain degree to a user's question text and a rationale document that allows the user to confirm that the answer is an answer to the question text are output. The system is known. The basis document is, for example, a source document from which the system has extracted answers to questions. The user can know which document is the answer based on the basis document.
[0005]
[Patent Document 1]
JP 2002-132812 A
[0006]
[Problems to be solved by the invention]
There may be only one evidence document as described above, but generally there may be a plurality of documents. In the conventional question answering system, when there are a plurality of ground documents, all the ground documents are listed and presented, or representative ground documents are selected and presented. However, the user may want only an answer to a single question, but often desires a series of knowledge and information. In the latter case, if the rationale document relating to the answer of a certain question contains useful information (eg, related information such as answers to other questions that the user should do afterwards) in addition to the answer For example, the user can efficiently acquire information by browsing the rationale document (for example, without asking other questions).
[0007]
However, the conventional question answering system cannot provide a presentation function related to information other than the answer existing in the ground document. For example, depending on the document evaluation method that is based on the degree of matching with a search keyword included in a question sentence, which is generally used as an evaluation method for a ground document, how information other than the answer to the question sentence is displayed. It is difficult to evaluate whether useful information is included. In addition, for example, in the method of displaying the summary of the ground document, there are only ones in which the same summary is generated regardless of the content of the question, or a summary that is biased toward the original question sentence is generated.
[0008]
The present invention has been made in consideration of the above circumstances, and in presenting a response to a question sentence, a question answering apparatus, a question answering method, and a program capable of presenting in consideration of information included in the basis document. The purpose is to provide.
[0009]
[Means for Solving the Problems]
A question answering apparatus according to the present invention includes a first database that stores a plurality of documents, a character string that can be an answer candidate included in the document for each document stored in the first database, Based on the second database for storing category information indicating categories as character string answer candidates in association with each other, input means for inputting a question sentence in natural language, and the first sentence based on the question sentence. A search means for searching a database; an extraction means for extracting an answer character string as an answer to the question sentence from the document searched by the search means; and the search means by referring to the second database. Acquisition means for acquiring category information corresponding to a character string other than the answer character string included in the basis document that is the document including the answer character string among the retrieved documents , Characterized in that an output means for outputting the response information including the category information corresponding to the character string other than the answer string and the reply string.
[0010]
The present invention relating to the apparatus is also established as an invention relating to a method, and the present invention relating to a method is also established as an invention relating to an apparatus.
Further, the present invention relating to an apparatus or a method has a function for causing a computer to execute a procedure corresponding to the invention (or for causing a computer to function as a means corresponding to the invention, or for a computer to have a function corresponding to the invention. It is also established as a program (for realizing) and also as a computer-readable recording medium on which the program is recorded.
[0011]
According to the present invention, when an answer to a question sentence is presented, it is possible to present the information in consideration of information included in the basis document. Therefore, according to the present invention, an answer to the question can be obtained, and for example, what information is included in the basis document in addition to the answer character string can be easily grasped.
[0012]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the invention will be described with reference to the drawings.
[0013]
FIG. 1 shows a configuration example of a question answering system according to an embodiment of the present invention.
[0014]
As shown in FIG. 1, the question answering system includes an input unit 1, a document search unit 2, an answer type determination unit 3, an answer candidate extraction unit 4, an answer generation unit 5, an evidence document information addition unit 6, and an output unit 7. It has.
[0015]
The question answering system may include a document database 11, an expression category database 12, an answer candidate database 13, a question pattern database 14, and a rational document information table 15.
[0016]
A configuration without the document database 11 and / or the answer candidate database 13 is also possible. For example, these databases 11 and 13 may be connected to each other via a network such as a LAN or the Internet and be searchable.
[0017]
If the answer candidate database 13 is not provided or the answer candidate database 13 is provided, but the contents of the answer candidates can be acquired from the outside via a network such as the Internet, the answer candidate extraction unit 4 and the expression category database 12 are provided. You do n’t have to.
[0018]
The outline of each part in FIG. 1 is as follows.
[0019]
The input unit 1 inputs a question sentence.
[0020]
The document search unit 2 searches the document database 11 based on the inputted question sentence, and scores the obtained document.
[0021]
The answer type determination unit 3 uses the question pattern database 14 to determine the answer type of the input question sentence.
[0022]
The answer candidate extraction unit 4 extracts an answer candidate character string from a search target document stored in the document database 11 using, for example, the expression category database 12 and assigns an answer category to the answer candidate character string. , Answer candidate information is generated and stored in the answer candidate database 13.
[0023]
The answer generation unit 5 receives the input question text, the search result obtained by the document search unit 2, the answer type obtained by the answer type determination unit 3, and the answer candidate information stored in the answer candidate database 13. Based on the above, response information is generated.
[0024]
The basis document information adding unit 6 uses the basis document information table 15 to add the basis document information related to the basis document to the response information generated by the response generation unit 5.
[0025]
The output unit 7 outputs the answer information to which the ground document information is added.
[0026]
Here, when this question answering system is realized using a computer, the input unit 1 and the output unit 7 correspond to a user interface. For example, an input device such as a keyboard, a mouse, and a microphone, a computer display, a speaker, and the like. It can be realized using an output device. Further, the document search unit 2, the response type determination unit 3, the response candidate extraction unit 4, the response generation unit 5, and the rational document information addition unit 6 can be realized by a program.
[0027]
Further, when the question answering system is realized as a client / server system, for example, the input unit 1 and the output unit 7 among the components are mounted on the client computer side, and other parts are mounted on the server computer side. .
[0028]
Hereinafter, the flow of processing of the question answering system will be described using a specific example.
[0029]
FIG. 2 shows an example of the document database 11. In this example, each document includes a document ID, a title, and text. XX is the name of a movie director, and XXX, YYY, ZZZ, and WWW are all titles of a movie work directed by XXX.
[0030]
Here, the answer candidate extraction unit 4 and the answer candidate database 13 will be described.
[0031]
First, when the answer candidate extraction unit 4 is used, the answer candidate extraction unit 4 generates answer candidate information based on a search target document registered in the document database 11 in advance, and stores the answer candidate information in the answer candidate database 13. Keep it. About the process of the answer candidate extraction part 4, you may use well-known techniques, such as what is called a proper noun extraction and ontology tagger. In answer candidate extraction, processing may be performed on the surface representation of the search target document, processing may be performed on the result of morphological analysis, or processing may be performed on the result of syntax and dependency analysis. May be.
[0032]
A specific example of the answer candidate extraction process is shown below. First, a pair of (information for detecting an answer candidate character string, answer category) is registered in the expression category database 12. For example, when “XX Director” is detected from the target document including the phrase “XXX Director” and an answer category “producer” is to be assigned, the expression category database 12 stores the specific category. The answer candidate character string (director, producer, producer) is registered. Then, the answer candidate extraction unit 4 searches for the search target document registered in the document database 11 (for example, the document with document ID = 0000000 in FIG. 2) and the expression registered in the expression category database 12 (for example, the above-described document). (XX director, producer) compared to XX director), extract the answer candidate character string (for example, XXX director), and extract the answer candidate character string to the document ID ( In this example, document ID = 000050) and answer category information corresponding to the extracted answer candidate character string (in this example, XXX director) (in this case, (XX director, producer) above) Is created, and answer candidate information including a document ID, answer candidate character string, and answer category information is generated and stored in the answer candidate database 13. Even in the case of this example, the form of the expression category database 12 can be various. For example, in the expression category database 12, a character string that should be included in the answer candidate character string (director, producer) is registered, and a character string “XXX director” including “director” is extracted from the target document. An answer category “producer” may be assigned thereto. 
Also, for example, (“*”, title) is registered in the expression category database 12 (where * indicates an arbitrary character string), and “XXX” ”enclosed in parentheses from the target document. A character string may be detected, and a portion “XXX” obtained by omitting ““ ”from the character string may be used as a candidate character string, and an answer category“ title ”may be assigned thereto. Is possible.
[0033]
FIG. 3 shows an example of the answer candidate database 13 in which the answer candidate information generated based on the documents stored in the document database 11 of FIG. 2 is registered.
[0034]
On the other hand, if it is necessary not to use the answer candidate extraction unit 4 and to acquire answer candidate information from the outside and register it in the answer candidate database 13, this work is performed.
[0035]
Further, when it is not necessary to obtain the answer candidate information from the outside and register it in the answer candidate database 13 without using the answer candidate extraction unit 4, the answer candidate database 13 may be accessed when necessary.
[0036]
FIG. 4 shows an example of the flow of processing from question text input to result output of the question answering system.
[0037]
The user inputs a question sentence by text or voice via the input unit 1.
[0038]
In addition, since it is possible to convert a question sentence into text data by a well-known voice recognition technique even when voice input is performed (in this case, for example, the input unit 1 may have a function of the conversion). Hereinafter, the case where the question sentence is obtained as text data will be described as an example.
[0039]
When receiving an input of a question sentence from the user, the input unit 1 sends the input question sentence to the document search unit 2, the answer type determination unit 3, and the answer generation unit 5 (step S1).
[0040]
Here, who is the director of “YYY”? ”Is entered.
[0041]
The document search unit 2 searches the search target document in the document database 11 based on the question sentence received from the input unit 1, obtains the document score of each obtained document, and ranks higher in the document score order. Select a document from a specified number (for example, a fixed number or a user-specified number as appropriate), and search results including the document ID and document score of the selected document (foundation document), The data is sent to the answer generation unit 5 (step S2). A known technique may be used for the processing of the document search unit 2.
[0042]
In Figure 5, “Who is the director of“ YYY ”? 2 shows an example of an output result when a document stored in the document database 11 of FIG. 2 is searched based on the question sentence “(In this example, the larger the document score value, the more suitable the question sentence is. ).
[0043]
On the other hand, before or after or in parallel with this search processing, the answer type determination unit 3 compares the question text received from the input unit 1 with the expressions registered in the question pattern database 14 to determine the answer type. The response type information including the determination result is sent to the response generation unit 5 (step S3). Note that a known technique may be used for the processing of the document type determination unit 3 (for example, see JP-A-2002-132812).
[0044]
FIG. 6 shows an example of the question pattern database 14. In this example, when “when” is included in the question sentence, it is determined that the answer type is “date and time”, and when “who” is included in the question sentence, the answer type is “ It is determined that it is “person name”, and when “where” is included in the question sentence, it is determined that the answer type is “location”.
[0045]
In Figure 7, “Who is the director of“ YYY ”? 6 shows an example of the result of determination of the answer type by the question pattern database 14 of FIG. 6. In this case, the answer type is determined to be “person name” by “who” in the question text. The
[0046]
Now, the response generation unit 5 expands the response type information sent from the response type determination unit 3 into one or a plurality of response category information based on a predetermined expansion rule, and a plurality of response category information obtained by this expansion. Using the answer category information and the document ID included in the search result sent from the document search unit 2 as a key, the answer candidate database 13 is searched to obtain answer candidate information (step S4). Further, in step S4, the answer generating unit 5 further merges the obtained answer candidate information having the same surface character string, and includes answer information including the answer character string, the answer category information, and the document ID of the basis document ( A question sentence and a document score may be included), and the response information is sent to the ground document information adding unit 6.
[0047]
The expansion rules are defined, for example, by associating each piece of answer type information with the one or more pieces of answer category information into which it is expanded. For example:
- the answer type information “location” is expanded into the answer category information “country name”, “place name”, and “hometown”;
- the answer type information “date” is expanded into “year”, “date”, and “time”;
- the answer type information “person name” is expanded into “person name”.
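The expansion rules above amount to a mapping from an answer type to a list of answer categories. The sketch below illustrates this under that assumption. EXPANSION_RULES and expand_answer_type are hypothetical names. The fallback for an unregistered answer type is also an assumption; the source does not specify it.

```python
from typing import Dict, List

# Hypothetical expansion rules taken from the examples in the text.
EXPANSION_RULES: Dict[str, List[str]] = {
    "location": ["country name", "place name", "hometown"],
    "date": ["year", "date", "time"],
    "person name": ["person name"],
}


def expand_answer_type(answer_type: str) -> List[str]:
    """Expand one answer type into its answer categories.

    Assumption (not from the source): an answer type with no registered rule
    is used unchanged as its only answer category.
    """
    return list(EXPANSION_RULES.get(answer_type, [answer_type]))
```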
[0048]
In the above specific example, the answer type is determined to be “person name”, so applying the expansion rules above yields the answer category information “person name”.
[0049]
In this specific example, the answer generation unit 5 searches the answer candidate database 13 using two sets of keys: the answer category information “person name” obtained by the above expansion, and the document IDs 0000000, 00201, and 00560 included in the search result sent from the document search unit 2. The answer generation unit 5 also holds, for example, knowledge information indicating that the answer category information “person name” matches the answer category information “producer”. As a result, answer candidate information with the answer category information “producer” is extracted in addition to answer candidate information with the answer category information “person name”.
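The lookup and merge of step S4 can be sketched as follows, including the knowledge that “person name” matches “producer”. This is a minimal illustration that assumes the answer candidate database 13 is a list of (document ID, candidate string, answer category) records. COMPATIBLE_CATEGORIES, AnswerInfo, and generate_answers are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record layout: (document ID, candidate string, answer category).
Candidate = Tuple[str, str, str]

# Hypothetical "knowledge information": categories treated as matching.
COMPATIBLE_CATEGORIES: Dict[str, Set[str]] = {"person name": {"producer"}}


@dataclass
class AnswerInfo:
    answer: str
    categories: Set[str] = field(default_factory=set)
    doc_ids: List[str] = field(default_factory=list)


def generate_answers(candidates: Iterable[Candidate],
                     categories: List[str],
                     doc_ids: List[str]) -> List[AnswerInfo]:
    """Select candidates by category and document ID, then merge by surface string."""
    wanted = set(categories)
    for category in categories:
        wanted |= COMPATIBLE_CATEGORIES.get(category, set())
    hits = set(doc_ids)
    merged: Dict[str, AnswerInfo] = {}
    for doc_id, text, category in candidates:
        if doc_id not in hits or category not in wanted:
            continue
        info = merged.setdefault(text, AnswerInfo(text))
        info.categories.add(category)
        if doc_id not in info.doc_ids:
            info.doc_ids.append(doc_id)  # this document becomes a basis document
    return list(merged.values())
```

With invented records in which "XXX Director" appears under “producer” in one retrieved document and under “person name” in another, the two hits merge into a single AnswerInfo. That merged entry lists both documents as basis documents.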
[0050]
FIG. 8 shows an example of the answer candidate information obtained by the answer generation unit 5 in this specific example. FIG. 9 shows an example of the answer information sent from the answer generation unit 5 to the basis document information adding unit 6 in this case (the question text is omitted from FIG. 9).
[0051]
In the above, the answer type information “person name” is expanded into the answer category information “person name”. At the search stage, the search uses not only this expanded category but also the knowledge that “person name” matches the answer category information “producer”. Alternatively, the answer type information “person name” may be expanded directly into answer category information such as “producer”. The search is then performed on the expanded answer category information, such as “producer”, without using that knowledge.
[0052]
Next, the basis document information adding unit 6 draws on three sources: the answer information received from the answer generation unit 5, the answer candidate information held in the answer candidate database 13, and the document information held in the document database 11. From these it generates data on the related information contained in each basis document and records it in the basis document information table 15. Based on the recorded data and the answer information, the basis document information adding unit 6 then generates presentation information containing the answer information and the basis document information, and sends it to the output unit 7 (step S5).
[0053]
Here, the processing of the basis document information adding unit 6 is described in more detail.
[0054]
Upon receiving the answer information from the answer generation unit 5, the basis document information adding unit 6 searches the document database 11 using the document ID of each basis document included in the answer information as a key. It acquires the title information of each basis document and records it in the basis document information table 15. It also searches the answer candidate database 13 using the document ID of each basis document as a key. From the results, it excludes any answer candidate information whose answer candidate character string matches either an expression extracted from the question text (“YYY” in this specific example) or the answer character string (“XXX Director” in this specific example). It then extracts the answer category information of the remaining answer candidate information, counts the entries for each category, and records each category with its count in the basis document information table 15.
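The exclusion and per-category counting described above can be sketched as follows. The sketch assumes exact string matching and the same hypothetical (document ID, candidate string, category) record layout. build_basis_table is a hypothetical name, and title lookup in the document database 11 is omitted.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Candidate = Tuple[str, str, str]  # (document ID, candidate string, answer category)


def build_basis_table(candidates: Iterable[Candidate],
                      basis_doc_ids: Iterable[str],
                      excluded_strings: Set[str]) -> Dict[str, Counter]:
    """Count the 'other information' categories found in each basis document.

    excluded_strings holds the expressions taken from the question and the
    answer character string; candidates whose string equals one of them are skipped.
    """
    table: Dict[str, Counter] = {doc_id: Counter() for doc_id in basis_doc_ids}
    for doc_id, text, category in candidates:
        if doc_id in table and text not in excluded_strings:
            table[doc_id][category] += 1
    return table
```

A basis document whose only candidates are the question expression and the answer ends up with an empty Counter. That corresponds to a document with no “other information”.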
[0055]
FIG. 10 shows an example of the basis document information table 15 in this specific example.
[0056]
Finally, the output unit 7 outputs a question sentence, an answer thereto, and the basis document information based on the information received from the basis document information adding unit 6 (step S6).
[0057]
FIG. 11 shows an example of the display screen output by the output unit 7 in this specific example. In this example, the basis document information is displayed together with the question “Who is the director of ‘YYY’?” and its answer “XXX Director”. In the basis document information of FIG. 11, the three basis documents are arranged in order of their document scores. The “other information” shown for each basis document lists the answer categories in FIG. 10 together with the number of answer candidates that have each category:
- Document (1) has no “other information”.
- For document (2), the “other information” shows three, one, and one answer candidates with the answer categories “title”, “place of origin”, and “year”, respectively.
- For document (3), the “other information” shows one answer candidate with the answer category “title”.
[0058]
Although FIG. 11 shows the three basis documents arranged in order of their document scores, they may be arranged according to other criteria. For example, they may be presented in order of the evaluation scores described later. In that case the documents are arranged, for example, in the order of document IDs 00100, 050,000, and 00560 (see FIG. 13).
[0059]
In FIG. 11, the content of a basis document may be displayed when, for example, the user selects its title information with a mouse or selects its rank number with a keyboard.
[0060]
In FIG. 11, only the title information and other information are displayed for every basis document. However, the content of some basis documents may be displayed instead of, or together with, their title information and other information. This applies, for example, to the top-ranked basis document by document score or evaluation score, or to the basis documents from the first rank down to a predetermined rank.
[0061]
As described above, according to the present embodiment, the answer to the question is presented together with an indication of what other information the documents underlying that answer contain. The user can therefore easily grasp whether the basis documents contain information, beyond the answer to the question, that the user wants to know.
[0062]
Other examples of the processing of the basis document information adding unit 6 and of the output method of the output unit 7 are described below.
[0063]
(First modification)
First, a first modification will be described.
[0064]
Upon receiving the answer information from the answer generation unit 5, the basis document information adding unit 6 searches the document database 11 using the document ID of each basis document included in the answer information as a key. It acquires the title information of each basis document and records it in the basis document information table 15. It also searches the answer candidate database 13 using the document ID of each basis document as a key. It excludes any answer candidate information whose answer candidate character string matches either an expression extracted from the question text (“YYY” in this specific example) or the answer character string (“XXX Director” in this specific example). It then extracts the answer category information of the remaining answer candidate information and records it in the basis document information table 15. This is the same as in the example already described (see FIG. 10).
[0065]
Next, the basis document information adding unit 6 obtains an evaluation score for each basis document based on the rules registered in a basis document evaluation pattern database (not shown), and records it in the basis document information table 15.
[0066]
FIG. 12 shows an example of the information registered in the basis document evaluation pattern database. Each rule has three parts:
- answer category (1): the answer category information of the answer information (see FIG. 9);
- answer category (2): a piece of answer category information in the basis document information table 15 (see FIG. 10);
- score: the value added to the evaluation score when that combination of answer category (1) and answer category (2) occurs.

In the evaluation score calculation, for example, the sum of the scores of all applicable rules in FIG. 12 is used as the evaluation score. Suppose, for example, that the answer category information of the answer to the question text is “producer” and the basis document information table 15 is as shown in FIG. 10. After evaluation, the content of the basis document information table 15 is then as shown in FIG. 13.
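The rule-based scoring and ranking can be sketched as follows. The sketch assumes each applicable (answer category (1), answer category (2)) rule contributes its score once per basis document, regardless of how many candidates share that category; the source leaves this open. The rule table, its scores, and the function names are hypothetical and are not taken from FIG. 12.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical evaluation pattern database in the spirit of FIG. 12:
# (answer category (1), answer category (2)) -> score. The scores are invented.
EVALUATION_PATTERNS: Dict[Tuple[str, str], int] = {
    ("producer", "title"): 3,
    ("producer", "year"): 2,
    ("producer", "place of origin"): 1,
}


def evaluation_score(answer_categories: Iterable[str],
                     doc_categories: Iterable[str]) -> int:
    """Sum the scores of all rules whose category pair occurs for this document."""
    answer_set = set(answer_categories)
    doc_set = set(doc_categories)
    return sum(score
               for (cat1, cat2), score in EVALUATION_PATTERNS.items()
               if cat1 in answer_set and cat2 in doc_set)


def rank_basis_documents(answer_categories: List[str],
                         table: Dict[str, List[str]]) -> List[str]:
    """Order basis document IDs by evaluation score, highest first (stable on ties)."""
    scores = {doc_id: evaluation_score(answer_categories, cats)
              for doc_id, cats in table.items()}
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
```

Python's sort is stable even with `reverse=True`, so documents with equal scores keep their original (for example, document score) order.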
[0067]
In this case, the output unit 7 presents the basis document information based on the evaluation scores given to the basis documents.
[0068]
FIG. 14 shows an example of this case. In this example, only the content of the top-ranked basis document with the highest evaluation score (the document with document ID = 00210 in FIG. 2) is presented. For the other documents, the screen merely indicates that two more exist. The content of one of these other documents can be displayed when, for example, the user selects its number with a mouse or keyboard. For example, selecting number 1 displays the content of the basis document with the second-highest evaluation score (the document with document ID = 0000000 in FIG. 2).
[0069]
In the above description, only the content of the top-ranked basis document with the highest evaluation score (the document with document ID = 00210 in FIG. 2) is presented. Instead, the contents of the basis documents from the first rank down to a predetermined rank may be presented.
[0070]
In FIG. 14, no specific information is presented for the basis documents other than the one whose content is shown. However, their title information and other information may be displayed, for example, as in FIG. 11. In this case as well, the content of a basis document may be displayed when, for example, the user selects its title information with a mouse or selects its rank number with a keyboard.
[0071]
As described above, according to the present embodiment, the answer to the question is presented. Among the documents on which that answer is based, those judged to contain much related information can be displayed preferentially. The user can therefore easily grasp whether the basis documents contain information, beyond the answer to the question, that the user wants to know. If they do, the user can obtain that information simply by browsing the basis documents, without entering a new question and searching again.
[0072]
(Second modification)
Next, a second modification will be described.
[0073]
Upon receiving the answer information from the answer generation unit 5, the basis document information adding unit 6 searches the document database 11 using the document ID of each basis document included in the answer information as a key. It acquires the title information of each basis document and records it in the basis document information table 15. It also searches the answer candidate database 13 using the document ID of each basis document as a key. It excludes any answer candidate information whose answer candidate character string matches either an expression extracted from the question text (“YYY” in this specific example) or the answer character string (“XXX Director” in this specific example). It then extracts the answer category information of the remaining answer candidate information and records it in the basis document information table 15. This is the same as in the example already described (see FIG. 10).
[0074]
Next, the basis document information adding unit 6 classifies the basis documents based on the similarity of the answer categories registered in the basis document information table 15. It reflects the classification result in the table, for example by adding to each document in the basis document information table 15 information on the classifications to which it belongs. A known technique may be used as the classification method.
[0075]
For example, the basis document information table 15 shown in FIG. 10 yields three document classifications:
- document classification 1: the basis documents containing answer candidates with the answer category “title” (document IDs = 00210 and 0950);
- document classification 2: the basis document containing answer candidates with the answer category “place of origin” (document ID = 00210);
- document classification 3: the basis document containing answer candidates with the answer category “year” (document ID = 00210).

In the basis document information table 15, information indicating document classifications 1, 2, and 3 is added to the basis document with document ID = 00210. Information indicating document classification 1 is added to the basis document with document ID = 000050. The basis document with document ID = 00560 receives either no classification information or information indicating that it belongs to no classification.
[0076]
In this case, the output unit 7 presents the basis document information based on the classification information given to the basis documents.
[0077]
FIG. 15 shows an example of this case. In this example, the classifications are presented in descending order of the number of basis documents belonging to each. For each classification, the related answer category and the titles of its basis documents are shown.
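The text leaves the classification method open (“a known technique may be used”). The sketch below is therefore only the simplest grouping consistent with the FIG. 10 example, not a general similarity-based clustering: one classification per answer category, presented most-populated first. classify_by_category is a hypothetical name.

```python
from typing import Dict, Iterable, List, Tuple


def classify_by_category(table: Dict[str, Iterable[str]]) -> List[Tuple[str, List[str]]]:
    """Group basis documents into one classification per answer category.

    A document joins every classification whose category it contains, so it may
    appear in several; a document with no other information joins none.
    Classifications are returned with the most populated first.
    """
    groups: Dict[str, List[str]] = {}
    for doc_id, categories in table.items():
        for category in dict.fromkeys(categories):  # de-duplicate, keep order
            groups.setdefault(category, []).append(doc_id)
    return sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)
```

With invented documents A (no other information), B (“title”, “place of origin”, “year”) and C (“title”), the result puts “title” (B and C) first, followed by the single-document classifications.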
[0078]
In FIG. 15, the number of answer candidates with the relevant answer category in each basis document may also be presented for each document classification. For example, the basis document titled “YYY” in document classification 1 (document ID = 00210) contains three pieces of answer candidate information with the answer category “title” (see FIG. 10). The number may therefore be shown next to its title, as in “YYY (3)”.
[0079]
Further, in FIG. 15, for example, the content of some basis documents may also be displayed. This applies to the top-ranked basis document by document score or evaluation score, or to the basis documents from the first rank down to a predetermined rank.
[0080]
As described above, according to the present embodiment, the answer to the question is presented. The documents on which that answer is based are classified and presented according to the related information they contain in addition to the answer. The user can therefore easily grasp whether the basis documents contain information, beyond the answer to the question, that the user wants to know. If they do, the user can obtain that information simply by browsing the basis documents, without entering a new question and searching again.
[0081]
The variations of the processing of the basis document information adding unit 6 and of the output method of the output unit 7 described above may be combined as appropriate. Alternatively, several methods may be provided for each, and the user may select which to use.
[0082]
In the above description, the documents to be searched are plain text, but the invention can be implemented in the same way for documents tagged in advance, such as XML documents. In this case, the pre-assigned tag information can be used as the answer category information.
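Using pre-assigned XML tags as answer category information can be sketched with the standard library's xml.etree.ElementTree. The sketch assumes each tagged element's tag name serves directly as the answer category and its full text as the candidate string. candidates_from_xml and the tag names in the example are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def candidates_from_xml(doc_id: str, xml_text: str) -> List[Tuple[str, str, str]]:
    """Turn pre-tagged elements into (document ID, string, category) records.

    Assumes the root element only wraps the document and every element below
    it marks an answer candidate whose tag name is the answer category.
    """
    root = ET.fromstring(xml_text)
    records = []
    for element in root.iter():  # document order, root first
        if element is root:
            continue
        text = "".join(element.itertext()).strip()
        if text:
            records.append((doc_id, text, element.tag))
    return records
```

For example, `<doc>The film <title>YYY</title> was directed by <person>XXX</person>.</doc>` yields one “title” record and one “person” record. These records could be stored in the answer candidate database 13 directly.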
[0083]
In the above description, answer types and answer categories are in a one-to-many correspondence. However, the invention can be implemented in the same way when answer types and answer categories share the same names, or when they are in a many-to-many or many-to-one correspondence.
[0084]
Also, in the above description, tags for higher-level concepts such as “person name” and “place name” are used as answer categories. The invention can be implemented in the same way when tags for meta-concepts such as “definition expression” and “means expression” are used.
[0085]
Further, the above description does not explicitly use morphological analysis or syntactic analysis, but the invention can be implemented in the same way when such analysis is used in each process. In this case, category attributes can also be assigned in the morphological analysis dictionary, and categories can be identified by syntactic pattern matching.
[0086]
Each of the above functions can also be written as software and executed on a computer with a suitable mechanism.
The present embodiment can also be implemented as a program for causing a computer to execute predetermined means, causing a computer to function as predetermined means, or causing a computer to realize predetermined functions. In addition, the present invention can be implemented as a computer-readable recording medium on which the program is recorded.
[0087]
The present invention is not limited to the above embodiment as it stands. At the implementation stage, its constituent elements may be modified without departing from the spirit of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the embodiment. For example, some constituent elements may be omitted from all those shown in the embodiment, and constituent elements from different embodiments may be combined as appropriate.
[0088]
【The invention's effect】
According to the present invention, when an answer to a question sentence is presented, the presentation can also take into account the information contained in the basis documents.
[Brief description of the drawings]
FIG. 1 is a diagram showing a configuration example of a question answering system according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of information in the document database.
FIG. 3 is a diagram showing an example of information in the answer candidate database.
FIG. 4 is a flowchart showing an example of the processing procedure of the question answering system according to the embodiment.
FIG. 5 is a diagram showing an example of search results.
FIG. 6 is a diagram showing an example of information in the question pattern database.
FIG. 7 is a diagram showing an example of an answer type determination result.
FIG. 8 is a diagram showing an example of extracted answer candidate information.
FIG. 9 is a diagram showing an example of answer information including an answer character string, an answer category, and the document IDs of basis documents.
FIG. 10 is a diagram showing an example of the basis document information table.
FIG. 11 is a diagram showing a display example of presentation information.
FIG. 12 is a diagram showing an example of information in the basis document evaluation pattern database.
FIG. 13 is a diagram showing another example of the basis document information table.
FIG. 14 is a diagram showing a display example of presentation information.
FIG. 15 is a diagram showing a display example of presentation information.
[Explanation of symbols]
DESCRIPTION OF SYMBOLS 1 ... Input part, 2 ... Document search part, 3 ... Answer type determination part, 4 ... Answer candidate extraction part, 5 ... Answer generation part, 6 ... Basis document information adding part, 7 ... Output part, 11 ... Document database, 12 ... Expression category database, 13 ... Answer candidate database, 14 ... Question pattern database, 15 ... Basis document information table

Claims

A question answering apparatus that inputs a question sentence in a natural language and outputs an answer character string that is a character string serving as an answer to the question sentence, the apparatus comprising:
A first database storing a plurality of documents;
A second database that stores, for each document stored in the first database, any number of pieces of answer candidate information, each including a candidate character string extracted from the character strings contained in the document as a possible candidate for the answer character string and category information indicating a category relating to the semantic content of that candidate character string;
Input means for inputting the question sentence;
Search means for searching the first database based on the question sentence;
Extraction means for extracting the answer character string from the documents retrieved by the search means;
Acquisition means that operates on each basis document, which is a document containing the answer character string among the documents retrieved by the search means. From all the answer candidate information stored in the second database for that basis document, it excludes the answer candidate information that includes a candidate character string matching the answer character string, and it acquires the category information included in the answer candidate information that remains; and
Output means for outputting the answer character string extracted by the extraction means and the category information acquired by the acquisition means.

The question answering apparatus according to claim 1, wherein the acquisition means operates on each basis document, which is a document containing the answer character string among the documents retrieved by the search means. From all the answer candidate information stored in the second database for that basis document, it excludes the answer candidate information that includes a candidate character string matching either a character string contained in the question sentence or the answer character string, and it acquires the category information included in the answer candidate information that remains.

The question answering apparatus according to claim 1 or 2, wherein, when performing the output, the output means outputs, for each basis document, information indicating the basis document in association with the category information acquired for that basis document.

The question answering apparatus according to claim 3, wherein, when performing the output, the output means further outputs, for each piece of category information acquired for a basis document, information indicating the number of candidate character strings in that basis document that correspond to the category information.

The question answering apparatus according to claim 3 or 4, wherein, when performing the output and there are a plurality of basis documents, the output means outputs the category information for the basis documents in order, starting from the basis document most highly evaluated by a predetermined evaluation method.

The question answering apparatus according to claim 3, wherein, when performing the output, the output means classifies the basis documents based on the category information acquired for them. For each classification, it outputs information indicating the basis documents belonging to that classification and the category information related to that classification.

The question answering apparatus according to any one of claims 1 to 3, wherein, when performing the output, the output means outputs the contents of the basis document most highly evaluated by a predetermined evaluation method.

A question answering apparatus comprising:
A first database storing a plurality of documents;
A second database that stores, for each document stored in the first database, character strings contained in the document that can be answer candidates, each in association with category information indicating its category as an answer candidate;
Input means for inputting a question sentence in a natural language;
Search means for searching the first database based on the question sentence;
Extraction means for extracting an answer character string that serves as an answer to the question sentence from the documents retrieved by the search means;
Acquisition means for referring to the second database to acquire two kinds of category information for each basis document, which is a document containing the answer character string among the documents retrieved by the search means: first category information corresponding to each of the character strings in the basis document other than the answer character string, and second category information corresponding to the answer character string;
Evaluation means for obtaining, for each basis document, an evaluation score based on rules registered in a pattern database for combinations of the acquired first category information and the acquired second category information, and evaluating the basis document according to the evaluation score; and
Output means for outputting the answer character string and the contents of a predetermined number of basis documents selected based on the evaluation result of the evaluation means.

The question answering apparatus according to claim 8, wherein, when evaluating a basis document, the evaluation means refers to a third database storing information that indicates an evaluation score determined for each combination of first category information and second category information. It uses, as the evaluation value of the basis document, the sum of the evaluation scores determined by combining each piece of first category information acquired from the basis document with the second category information.

The question answering apparatus according to claim 8, wherein the output means outputs the content of the basis document most highly evaluated by the evaluation means.

The question answering apparatus according to claim 1, 2 or 8, wherein the extraction means obtains, based on the question sentence, a condition to be satisfied by the category information corresponding to the answer character string for the question sentence,
the acquisition means obtains, from among the category information stored in the second database for the documents retrieved by the search means, the category information that satisfies the condition, and
the extraction means uses, as the answer character string, the candidate character string stored in the second database in the answer candidate information in association with category information satisfying the condition.

The question answering apparatus according to claim 1, 2 or 8, further comprising means for extracting, from the documents stored in the first database, character strings that can be answer candidates as candidate character strings, assigning the category information to each candidate character string, and storing, for each document, answer candidate information including the candidate character string and the category information in the second database.

A question answering method for a question answering apparatus that inputs a question sentence in a natural language and outputs an answer character string that is a character string serving as an answer to the question sentence. The apparatus comprises a first database storing a plurality of documents; a second database that stores, for each document stored in the first database, any number of pieces of answer candidate information, each including a candidate character string extracted from the character strings contained in the document as a possible candidate for the answer character string and category information indicating a category relating to the semantic content of that candidate character string; input means; search means; extraction means; acquisition means; and output means. The method comprises:
An input step in which the input means inputs the question sentence;
A search step in which the search means searches the first database storing the plurality of documents based on the question sentence;
An extraction step in which the extraction means extracts the answer character string from the documents retrieved in the search step;
An acquisition step in which the acquisition means operates on each basis document, which is a document containing the answer character string among the documents retrieved in the search step. From all the answer candidate information stored in the second database for that basis document, it excludes the answer candidate information that includes a candidate character string matching the answer character string, and it acquires the category information included in the answer candidate information that remains; and
An output step in which the output means outputs the answer character string extracted in the extraction step and the category information acquired in the acquisition step.

The question answering method according to claim 13, wherein the acquisition means operates on each basis document, which is a document containing the answer character string among the documents retrieved by the search means. From all the answer candidate information stored in the second database for that basis document, it excludes the answer candidate information that includes a candidate character string matching either a character string contained in the question sentence or the answer character string, and it acquires the category information included in the answer candidate information that remains.

A question answering method for a question answering apparatus comprising a first database that stores a plurality of documents; a second database that stores, for each document stored in the first database, each character string in the document that can be an answer candidate in association with category information indicating the category of that character string as an answer candidate; an input means; a search means; an extraction means; an acquisition means; an evaluation means; and an output means, the method comprising:
An input step in which the input means inputs a question sentence in natural language;
A search step in which the search means searches the first database based on the question sentence;
An extraction step in which the extraction means extracts, from the documents retrieved in the search step, an answer character string that answers the question sentence;
An acquisition step in which the acquisition means refers to the second database and acquires, for each basis document, that is, each document among those retrieved in the search step that contains the answer character string, first category information corresponding to each character string in the basis document other than the answer character string, and second category information corresponding to the answer character string;
An evaluation step in which the evaluation means obtains an evaluation score for each basis document based on rules, registered in a pattern database, for combinations of the acquired first category information and the acquired second category information, and evaluates the basis document according to the evaluation score; and
An output step in which the output means outputs the answer character string and the contents of a predetermined number of basis documents selected based on the evaluation results of the evaluation step.
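The evaluation and selection steps of the claim above can be sketched as follows, under assumptions the claim leaves open. The pattern database is modelled as a hypothetical dictionary mapping a pair (second category, i.e. the answer's category; first category, i.e. another string's category) to a number of points; a document's evaluation score is assumed to be the sum of points over every such pair it contains; and the "predetermined number" is the parameter `n`. For a PERSON answer, for example, rules that reward ORGANIZATION or DATE strings favour basis documents carrying related facts beyond the answer itself.

```python
from typing import Dict, List, Tuple

# Hypothetical pattern database:
# (second category = answer's category, first category = another string's category) -> points
PatternDB = Dict[Tuple[str, str], float]


def score_document(answer_category: str, other_categories: List[str], patterns: PatternDB) -> float:
    """Sum the points of every registered rule matched by (answer category, other category)."""
    return sum(patterns.get((answer_category, cat), 0.0) for cat in other_categories)


def select_basis_documents(
    basis: Dict[str, Tuple[str, List[str]]],
    patterns: PatternDB,
    n: int,
) -> List[str]:
    """Rank basis documents by evaluation score and return the ids of the top n."""
    scored = [
        (score_document(answer_cat, others, patterns), doc_id)
        for doc_id, (answer_cat, others) in basis.items()
    ]
    # Highest score first; ties broken by document id so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:n]]


patterns = {("PERSON", "ORGANIZATION"): 2.0, ("PERSON", "DATE"): 1.0}
basis = {
    "d1": ("PERSON", ["ORGANIZATION", "DATE"]),  # score 3.0
    "d2": ("PERSON", ["LOCATION"]),              # score 0.0
    "d3": ("PERSON", ["DATE", "DATE"]),          # score 2.0
}
print(select_basis_documents(basis, patterns, 2))
# ['d1', 'd3']
```

Summing per string means a document with several strings of a rewarded category scores higher; a real rule set might instead cap or weight repeated categories.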

A program for causing a computer to function as a question answering apparatus that receives a question sentence in natural language and outputs an answer character string, which is a character string that answers the question sentence, the apparatus including a first database that stores a plurality of documents and a second database that stores, for each document stored in the first database, an arbitrary number of pieces of answer candidate information, each including a candidate character string extracted from the character strings in the document as a candidate for the answer character string and category information indicating a category relating to the semantic content of that candidate character string, the program causing the computer to realize:
An input function for inputting the question sentence;
A search function for searching the first database based on the question sentence;
An extraction function for extracting the answer character string from the documents retrieved by the search function;
An acquisition function that, for each basis document, that is, each document among those retrieved by the search function that contains the answer character string, excludes, from all of the answer candidate information stored in the second database for that basis document, any answer candidate information including a candidate character string that matches the answer character string, and acquires the category information included in the answer candidate information that remains; and
An output function for outputting the answer character string extracted by the extraction function and the category information acquired by the acquisition function.

The program according to claim 16, wherein the acquisition function, for each basis document, that is, each document among those retrieved by the search function that contains the answer character string, excludes, from all of the answer candidate information stored in the second database for that basis document, any answer candidate information including a candidate character string that matches either a character string contained in the question sentence or the answer character string, and acquires the category information included in the answer candidate information that remains.

A program for causing a computer to function as a question answering apparatus, the program causing the computer to realize:
An input function for inputting a question sentence in natural language;
A search function for searching a first database storing a plurality of documents based on the question sentence;
An extraction function for extracting, from the documents retrieved by the search function, an answer character string that answers the question sentence;
An acquisition function for referring to a second database that stores, for each document stored in the first database, each character string in the document that can be an answer candidate in association with category information indicating the category of that character string as an answer candidate, and acquiring, for each basis document, that is, each document among those retrieved by the search function that contains the answer character string, first category information corresponding to each character string in the basis document other than the answer character string, and second category information corresponding to the answer character string;
An evaluation function for obtaining an evaluation score for each basis document based on rules, registered in a pattern database, for combinations of the acquired first category information and the acquired second category information, and evaluating the basis document according to the evaluation score; and
An output function for outputting the answer character string and the contents of a predetermined number of basis documents selected based on the evaluation results of the evaluation function.