JP2006091993A

JP2006091993A - Question/answering device and method and question/answering program

Info

Publication number: JP2006091993A
Application number: JP2004273510A
Authority: JP
Inventors: Makoto Koyama; 誠小山; Tetsuya Sakai; 哲也酒井
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2004-09-21
Filing date: 2004-09-21
Publication date: 2006-04-06

Abstract

PROBLEM TO BE SOLVED: To quickly answer, by searching a database, a question that contains a relative expression and whose answer must be derived from the contents of a plurality of text data.

SOLUTION: A database generating part 107 extracts related information from the text data in a text information database 104 from a viewpoint that allows the extraction results to be ordered, and generates an extraction information database 108 from the information extracted from the same viewpoint. An answer can then be obtained by searching the generated extraction information database 108 for the point or range indicated by the relative expression included in the question.

COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a question answering apparatus, a question answering method, and a question answering program that search a database in response to a natural-language question from a user and output an answer.

A question answering system that outputs an answer to a natural-language question from a user usually holds a database for answers, searches this database for an answer, and outputs it to the user. In recent years in particular, question answering systems that search for an answer in a database storing a large amount of text data have been realized by using techniques developed in the fields of text information retrieval and text information extraction (see, for example, Non-patent document 1).

For example, given the question “Who is the president of Company ○×?”, such a question answering system first searches the database using the question. If the search results include text data such as “○× Corporation ... announced the appointment of Mr. ○△ as president.”, the system extracts the character string “Mr. ○△” from this text data and outputs it as the answer to the question.

In such a question answering apparatus, the search does not work well when the expressions used in the question differ from those in the text data to be searched. This happens, for example, when the question uses the word “アメリカ” (America) while the text data to be searched uses “米国” (the United States).

To resolve this problem and allow the system to obtain an answer, the difference in expression between the question and the text data must be eliminated. Proposed ways of doing so include searching with synonyms and related words and transforming the question. One known system of this kind uses transformation rules prepared in advance to transform the question so that its similarity to the text data to be searched increases, and then obtains the answer (for example, JP 2002-14990 A).
Patent document 1: JP 2002-14990 A
Non-patent document 1: E. Voorhees, "The TREC-8 question answering track report. In Proceedings of TREC-8, 1999.", [online], [searched on September 21, 2004], Internet <URL: http://trec.nist.gov/data/qa/t8_qadata.html>

Now consider the case where the question answering system is given a question containing a relative expression such as “before ○○”, “next after ○○”, or “prior to ○○”. For example, suppose that the question “Who was the president of ○× Corporation before Mr. ○△?” is given and that the database to be searched contains the following text data A, B, and C.

A: “In 1995, on month ○, day ○, ○× Corporation announced that Mr. ○◇ had been appointed president ...”
B: “In 1997, on month ○, day ○, ○× Corporation announced that Mr. ○△ had been appointed president ...”
C: “In 1999, on month ○, day ○, ○× Corporation announced that Mr. ○□ had been appointed president ...”
To obtain the correct answer to a question that uses such a relative expression, the dates described in the individual text data must be compared. In the above example, comparing the dates described in text data A, B, and C shows that the answer is “Mr. ○◇”, which is contained in text data A. This assumes that the database contains the text data from each occasion on which the president of ○× Corporation changed.
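The date comparison described above can be illustrated with a minimal Python sketch. The record layout, the function name `predecessor`, and the January 1 dates are assumptions made for illustration only; the source elides the month and day of each announcement.

```python
from datetime import date

# Hypothetical records standing in for texts A, B, and C: (announcement date, president).
# The source elides month and day, so January 1 is an arbitrary placeholder.
appointments = [
    (date(1997, 1, 1), "Mr. ○△"),  # text B
    (date(1995, 1, 1), "Mr. ○◇"),  # text A
    (date(1999, 1, 1), "Mr. ○□"),  # text C
]

def predecessor(records, name):
    """Return the president appointed immediately before `name`, or None."""
    ordered = sorted(records)  # tuples sort by date first
    names = [person for _, person in ordered]
    i = names.index(name)
    return names[i - 1] if i > 0 else None

print(predecessor(appointments, "Mr. ○△"))
```

The answer depends on all three records at once, which is why a search that looks at one text at a time cannot resolve the relative expression.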

However, neither conventional search using synonyms and related words nor question transformation using transformation rules performs processing that depends on the contents of the multiple text data to be searched. Consequently, there has been a problem that a question containing a relative expression on an axis defined by some viewpoint (the time axis in the above example) cannot be processed appropriately to obtain an answer.

To achieve the above object, the present invention provides a question answering apparatus comprising: database generating means for extracting, from a text information database and by using a plurality of patterns prepared in advance, axis data that can be sorted on a certain axis and related data associated with the axis data, and for generating an extraction information database composed of a plurality of extracted information data corresponding to the plurality of patterns, each pairing the axis data with the related data; input means for inputting a natural-language question from a user; question analyzing means for analyzing the question and extracting keywords and the answer type of the answer to the question; relative expression analyzing means for analyzing a relative expression with respect to the axis in the question input from the input means; extracted information data determining means for determining, by using the answer type and the keywords, which of the plurality of extracted information data in the extraction information database is to be used; answer search means for searching the extraction information database for an answer to the question by using the extracted information data determined by the extracted information data determining means and the analysis result of the relative expression analyzing means; and output means for outputting the answer found by the answer search means.

The present invention relating to the apparatus also holds as an invention relating to a method, and the present invention relating to the method also holds as an invention relating to an apparatus.
Furthermore, the present invention relating to the apparatus or the method also holds as a program for causing a computer to execute the procedure corresponding to the invention (or for causing a computer to function as the means corresponding to the invention, or for causing a computer to realize the functions corresponding to the invention), and as a computer-readable recording medium on which the program is recorded.

The question answering apparatus according to the present invention extracts related information from text data from a viewpoint that allows ordering, generates a database from the information extracted from the same viewpoint on the basis of the extracted contents, and obtains an answer by finding in this database the point or range indicated by the relative expression included in the question. This makes it possible to answer, by searching a database, a question containing a relative expression whose answer must be derived from the descriptions in a plurality of text data.

This embodiment is typically realized by a computer controlled by software. The software in this case includes programs and data and realizes the operational effects of the present invention by physically utilizing the hardware of the computer; suitable conventional techniques are applied to the parts where conventional techniques are applicable. Furthermore, the specific types and configurations of the hardware and software that realize the present invention, the range of processing performed by software, and the like can be changed freely. The following description therefore uses a virtual functional block diagram in which each function constituting the present invention is shown as a block. Note that a program that operates a computer to realize the present invention is also one aspect of the present invention.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.
(First embodiment)
FIG. 1 is a diagram showing the system configuration of a question answering apparatus according to the first embodiment of the present invention.
The question answering apparatus 100 of the present invention comprises: a question input unit 101 that inputs a user's question; a question processing unit 102 that processes the question to extract keywords and analyze the answer type; a relative expression analysis unit 103 that analyzes a relative expression included in the question; a text information database 104; a related information extraction unit 105 that extracts related information from the text information database 104 from a viewpoint that allows the extraction results to be ordered; an extracted information organizing unit 106 that organizes the multiple pieces of information extracted from the same viewpoint; a database generation unit 107 that generates a database from the extracted and organized information; an extracted information database 108 generated by the database generation unit 107; an answer information search unit 109 that searches the text information database 104 or the extracted information database 108 for information containing an answer; an answer extraction unit 110 that extracts the answer to be presented to the user from the search results of the answer information search unit 109; and an answer output unit 111 that outputs the answer to the user.

Here, the text information database 104 is a database in which the text data to be searched for answers is registered, and it is generated by an index generation technique used in existing text information retrieval, such as the inverted index method.
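As a rough illustration of the inverted index method mentioned above, the following Python sketch maps each token to the documents that contain it. The function name `build_inverted_index`, the sample documents, and the whitespace tokenization are assumptions of this sketch; Japanese text would need morphological analysis instead of `split()`.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each token to the sorted list of IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        # Whitespace tokenization is a simplification for this sketch.
        for token in text.lower().split():
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}

# Hypothetical documents, not taken from the source.
docs = {
    1: "ABC was released in March",
    2: "CDE was released in April",
}
index = build_inverted_index(docs)
```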

Next, the operation of the entire apparatus and the operation of each unit will be described.
The operation of the entire apparatus is divided into a database generation phase, in which the extracted information database for answers is generated in advance, and a question answering phase, in which questions are answered.
In the following, the operation in the database generation phase is described first with reference to the flowchart of FIG. 2, and then the operation in the question answering phase is described with reference to the flowchart of FIG. 3.

(Database generation phase)
In the database generation phase, first, in step S201, the related information extraction unit 105 uses patterns prepared in advance to extract, from the text data registered in the text information database 104, the related information that matches the patterns. Various patterns are prepared for this extraction in anticipation of the questions that may be asked. For example, the following patterns are prepared.
Pattern P1: “@PRODUCT was released on @DATE”
Pattern P2: “@PRODUCT (released on @DATE)”
Now consider the case where the text data to be extracted from contains the following descriptions.
Text T1: “... ABC was released on month ○, day ×.”
Text T2: “... CDE (released on month ○, day △) ...”
The related information extraction unit 105 first extracts proper names from the input text data. Proper names are extracted from text data by existing techniques that use a proper name dictionary, extraction rules based on pattern matching, and the like. For example, a dictionary is used for person names and place names, and extraction rules are used for dates. From the example texts above, the product names and dates can be extracted as follows, where “@PRODUCT” represents a product name and “@DATE” represents a date.
Text T1': “... @PRODUCT was released on @DATE.”
Text T2': “... @PRODUCT (released on @DATE) ...”
Given this result and patterns P1 and P2, pattern P1 matches text T1' and pattern P2 matches text T2'.
For text T1', referring back to the original text T1 for the matched proper names yields
“(month ○ day ×, ABC)”
Similarly, for text T2', referring back to the original text T2 for the matched proper names yields
“(month ○ day △, CDE)”
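The tag-then-match procedure of step S201 might look like the following Python sketch. The product dictionary, the date rule, the English renderings of patterns P1 and P2, and the concrete dates in the usage are all hypothetical; the sketch also assumes at most one product and one date per text.

```python
import re

PRODUCTS = ["ABC", "CDE"]             # hypothetical proper-name dictionary
DATE_RULE = r"[A-Z][a-z]+ \d{1,2}"    # hypothetical date rule, e.g. "March 3"

product_alternatives = "|".join(map(re.escape, PRODUCTS))
ENTITY = re.compile(rf"(?P<PRODUCT>{product_alternatives})|(?P<DATE>{DATE_RULE})")

PATTERNS = [
    "@PRODUCT was released on @DATE",   # English rendering of pattern P1
    "@PRODUCT (released on @DATE)",     # English rendering of pattern P2
]

def tag_entities(text):
    """Replace each proper name with its tag; return the tagged text and the originals."""
    found = []
    def substitute(match):
        found.append((match.lastgroup, match.group()))
        return "@" + match.lastgroup
    return ENTITY.sub(substitute, text), found

def extract(text):
    """Return (date, product) if the tagged text contains one of the patterns."""
    tagged, found = tag_entities(text)
    for pattern in PATTERNS:
        if pattern in tagged:
            values = dict(found)  # refer back to the original strings
            return (values["DATE"], values["PRODUCT"])
    return None

print(extract("ABC was released on March 3."))
print(extract("CDE (released on April 5) is popular."))
```

Keeping the original strings alongside the tags is what allows the matched tags to be mapped back to concrete values, as the text describes for T1' and T2'.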
As another example, (date, company name, president name) is extracted by using a regular expression pattern such as
Pattern P3: “@COMPANY.*@PERSON became president of @COMPANY on @DATE”
In this pattern, the person name extracted by @PERSON is taken as the president name.
Suppose, for example, that the following text data T3 and T4 exist.
Text T3: “Company ○× ... that Mr. ○□ was appointed president on year ○, month △, day × ...”
Text T4: “Company ○× ... that Mr. ○△ was appointed president on year □, month △, day × ...”
The related information extraction unit 105 extracts proper names from the text data T3 and T4 and can extract the company name, person name, and date as follows. Here, “@COMPANY” represents a company name, “@PERSON” a person name that becomes the president name, and “@DATE” a date.
Text T3': “@COMPANY ... that @PERSON was appointed president on @DATE ...”
Text T4': “@COMPANY ... that @PERSON was appointed president on @DATE ...”
From descriptions such as these, (date, company name, president name) is extracted, giving
“(year ○ month △ day ×, Company ○×, Mr. ○□)”
“(year □ month △ day ×, Company ○×, Mr. ○△)”
Next, in step S202, the extracted information organizing unit 106 gathers the multiple pieces of information extracted from the same viewpoint and organizes them on the basis of the extracted contents.
For example, FIG. 4 shows the result of sorting the (date, product name) information from the extraction example above. In FIG. 4, the extracted pieces of information are sorted in chronological order.
Although time-related information such as (date, product name) and (date, company name, president name) is described here, other information such as (price, product name) or (location, building name) can be processed in the same way. When such information is extracted, product names can be sorted by price, and building names can be sorted in the north-south (or east-west) direction.
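The organizing step S202 amounts to ordering tuples by their axis value. The Python sketch below assumes that each extracted tuple stores the axis value (date or price) as its first element; the dates and prices shown are hypothetical, since the source gives only placeholders.

```python
from datetime import date

# Hypothetical extracted tuples: (axis value, related data).
releases = [(date(2004, 5, 20), "CDE"), (date(2004, 3, 3), "ABC")]
prices = [(29800, "CDE"), (19800, "ABC")]

def order_by_axis(records):
    """Sort extracted tuples by their first element, the axis value."""
    return sorted(records, key=lambda record: record[0])

print(order_by_axis(releases))
print(order_by_axis(prices))
```

The same function serves both the time axis and the price axis because only the comparability of the first element matters.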

In this embodiment, sorting the data and detecting the answer from the sorted data is considered the most natural and efficient method, so this method is described. However, as long as the data can be compared, the answer may be detected without necessarily sorting the data.

Next, in step S203, the database generation unit 107 generates the extracted information database 108 for searching the extracted and organized information. Here, it generates an index for searching the previously extracted information, such as (date, product name) and (date, company name, president name), by each element (date, product name, company name, president name, and so on). This can be realized by existing database generation techniques, for example by generating a hash table keyed on each element.
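One way to picture the hash table keyed on each element is the Python sketch below. The function name `build_element_index` and the ISO date strings are assumptions of this sketch; the three records follow the style of the FIG. 5 example used later in the description.

```python
from collections import defaultdict

def build_element_index(records):
    """Map every element value to the positions of the records containing it."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        for element in record:
            index[element].append(position)
    return index

# Records in the style of FIG. 5: (date, company name, president name).
records = [
    ("1999-10-11", "Company ○×", "Mr. ◇×"),
    ("2003-03-31", "Company ○×", "Mr. □△"),
    ("2003-04-01", "Company ○×", "Mr. ○△"),
]
index = build_element_index(records)
print(index["Mr. ○△"])
```

Because every element is a key, the same structure answers lookups by date, by company name, or by president name.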

In the above embodiment, the database generation unit 107 generates the extracted information database 108 before a question is received, but this is not strictly necessary. For example, after a question is input by the user in the question answering phase described below, the database generation unit 107 may use the keywords and answer type obtained when the question processing unit 102 analyzes the question to generate, from the text information database 104, the extracted information database 108 required for that question.

If generating the extracted information database 108 at this point takes a long time because the text information database 104 holds a large amount of data, the text information database 104 is first searched with the keywords extracted from the question, and the items most closely related to the keywords, for example the top several tens, are retrieved. Generating the extracted information database 108 from these retrieved data lowers the accuracy of the answer slightly but shortens the time required to generate the extracted information database 108. As a result, the time from receiving a question to outputting an answer can be shortened.
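The pre-filtering step can be sketched in Python as follows. The scoring by keyword count, the function name `top_documents`, and the default limit of 30 are assumptions of this sketch; the source says only that the top several tens of closely related items are retrieved.

```python
def top_documents(documents, keywords, limit=30):
    """Rank documents by how many keywords they contain and keep the best `limit`."""
    scored = [(sum(k in text for k in keywords), text) for text in documents]
    # sorted() is stable, so documents with equal scores keep their original order.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [text for _, text in ranked[:limit]]

# Hypothetical documents, not taken from the source.
docs = [
    "○× Corporation announced a new president",
    "weather report",
    "○× Corporation opened a store",
]
print(top_documents(docs, ["○× Corporation", "president"], limit=1))
```

Only the documents returned here would then be passed to pattern extraction, which is where the trade-off between speed and answer accuracy arises.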

(Question answering phase)
Next, the processing in the question answering phase will be described.
(Step S301)
First, in step S301, the question input unit 101 inputs a question from the user.
(Step S302)
Next, in step S302, the question processing unit 102 extracts keywords from the question input from the question input unit 101 and analyzes the answer type. Keyword extraction can be realized with existing techniques. For example, morphological analysis can be applied to the question, and morphemes with specific part-of-speech information (for example, common nouns and proper nouns) can be extracted from the result. Answer type analysis can also be realized with existing techniques. For example, rules corresponding to each answer type are prepared, and the answer type is determined from the result of matching them against the question. Examples of such analysis rules are as follows.
Rule R1: “.*はだれ” (“.* is who”) → @PERSON
Rule R2: “.*はいつ” (“.* is when”) → @DATE
Under rule R1, when the question matches the pattern “.*はだれ”, the answer type is set to @PERSON.
Under rule R2, when the question matches the pattern “.*はいつ”, the answer type is set to @DATE.
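Rules R1 and R2 can be written directly as regular expressions. The following Python sketch uses the Japanese patterns “.*はだれ” and “.*はいつ” from the original specification; the function name `answer_type` and the second sample question are assumptions made for illustration.

```python
import re

# Rules R1 and R2, written as Python regular expressions.
ANSWER_TYPE_RULES = [
    (re.compile(r".*はだれ"), "@PERSON"),  # R1: "... is who"
    (re.compile(r".*はいつ"), "@DATE"),    # R2: "... is when"
]

def answer_type(question):
    """Return the answer type of the first rule that matches, or None."""
    for pattern, tag in ANSWER_TYPE_RULES:
        if pattern.match(question):
            return tag
    return None

print(answer_type("○△氏の一代前の社長はだれ"))  # question Q1 from the description
print(answer_type("ABCの発売はいつ"))            # hypothetical question
```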
(Step S303)
Next, in step S303, the relative expression analysis unit 103 analyzes the relative expression (such as “previous” or “next”) included in the question. Patterns of the relative expressions to be analyzed are prepared in advance, and the relative expression included in the question is analyzed on the basis of the result of matching these patterns against the question.

For example, the following character string patterns are prepared.
Pattern B1: “@PERSONの前の” (“before @PERSON”): A = −1
Pattern B2: “@PERSONの一代前の” (“one generation before @PERSON”): A = −1
Pattern B3: “@PERSONの次の” (“next after @PERSON”): A = +1
Pattern B4: “@PERSONの前の前の” (“before the one before @PERSON”): A = −2
Pattern B5: “@PERSONの二代前の” (“two generations before @PERSON”): A = −2
Pattern B6: “@PERSONより前の” (“earlier than @PERSON”): A < @PERSON
Here, the right-hand side of the “:” in each pattern indicates the point or range that the relative expression represents. A = −1 indicates the entry one before the extracted @PERSON, that is, one generation earlier.
The following description assumes that the question below was input in step S301.
Q1: “Who was the president one generation before Mr. ○△?” (“○△氏の一代前の社長はだれ”)
First, the relative expression analysis unit 103 extracts proper names from the question. As described for the related information extraction unit 105, proper names can be extracted from the question with existing techniques. Extracting proper names from the question gives the following result.
Q1': “Who was the president one generation before @PERSON?” (“@PERSONの一代前の社長はだれ”)
Pattern matching on this result succeeds for pattern B2. This shows that question Q1 contains a relative expression, and the original proper name can then be recovered from question Q1.
As a result, the proper name serving as the base point and the point or range that the relative expression indicates relative to it are obtained as follows (step S303).
(“Mr. ○△”, A = −1)
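The relative expression analysis of step S303 can be sketched in Python as below. The person-name dictionary, the tuple encoding of the relation, and the order in which patterns are tried are assumptions of this sketch; the Japanese pattern strings are those of patterns B1 to B6 in the original specification.

```python
PERSONS = ["○△氏", "□△氏"]  # hypothetical person-name dictionary

# Patterns B1 to B6. B4 is listed before B1 so that the longer pattern
# is tried first (an ordering chosen for this sketch).
RELATIVE_PATTERNS = [
    ("@PERSONの前の前の", ("offset", -2)),  # B4
    ("@PERSONの一代前の", ("offset", -1)),  # B2
    ("@PERSONの二代前の", ("offset", -2)),  # B5
    ("@PERSONより前の", ("before", None)),  # B6: A < @PERSON
    ("@PERSONの前の", ("offset", -1)),      # B1
    ("@PERSONの次の", ("offset", +1)),      # B3
]

def analyze_relative_expression(question):
    """Return (base proper name, relation), or None if no pattern matches."""
    for person in PERSONS:
        if person not in question:
            continue
        tagged = question.replace(person, "@PERSON", 1)
        for pattern, relation in RELATIVE_PATTERNS:
            if pattern in tagged:
                return (person, relation)
    return None

print(analyze_relative_expression("○△氏の一代前の社長はだれ"))
```

Returning `None` when nothing matches corresponds to the "No" branch of step S304 described later.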
Next, the answer information search unit 109 searches the text information database 104 for data containing the answer to question Q1. First, the answer information search unit 109 obtains the analyzed keywords and answer type from the question processing unit 102 and the analysis result from the relative expression analysis unit 103.

(No in step S304)
If the relative expression analysis unit 103 finds no relative expression in question Q1 (No in step S304), question Q1 is judged to contain no relative expression. In that case, the answer information is searched for in the text information database 104 using only the keywords and answer type analyzed from question Q1 (step S306), and, for example, the top five answer candidates are presented to the user. The top five candidates are extracted because it is difficult to obtain the single correct answer from the text information database 104 using only the keywords and the answer type. These operations can be realized with existing text information retrieval techniques.

(Yes in step S304)
Conversely, if the relative expression analysis unit 103 finds a relative expression in question Q1 (Yes in step S304), the answer information search unit 109 searches the extracted information database 108 for the answer information on the basis of this analysis result (step S305).

First, the answer information search unit 109 searches the extracted information database 108 using the proper name in the analysis result.
Next, from among these search results, the information at the point or in the range indicated by the relative expression is retrieved from the extracted information database 108 as the answer information. For example, when the analysis result of the relative expression analysis unit 103 is (“Mr. ○△”, A = −1), the extracted information database 108 is first searched using the proper name “Mr. ○△”.
Here, it is assumed that information (date, company name, president name) such as that shown in FIG. 5 is registered in the extracted information database 108. It is also assumed that, in addition to what FIG. 5 shows, various other extracted information data are registered, such as data on when products were released.

The answer information search unit 109 searches the extracted information database 108 for data by using the keywords and answer type input from the question processing unit 102. In the above example, the keywords “Mr. ○△”, “Company ○×”, and “president” and the answer type @PERSON are used.

回答情報検索部１０９は、このとき回答タイプの＠ＰＥＲＳＯＮおよびキーワード“社長”より、社長名（ＰＥＲＳＯＮ）が含まれる抽出情報データが検索対象とし、更にこの抽出情報データから、キーワード“○×社”、“○△氏”により検索を行うことで、
“（２００３年４月１日、○×社、○△氏）”、
…、
“（２００４年３月２８日、○×社、○△氏）”、
という検索結果を得る。
さらに抽出情報データベース１０８において、Ａ＝−１から、これら情報より１つ前の情報、すなわち
“（２００３年３月３１日、○×社、□△氏）”
が検索され、この検索結果から“□△氏”が回答情報とされる。
詳細には、抽出情報データベース１０８に登録されている情報の中から、情報の日時が２００３年４月１日よりも過去のデータであって、キーワード：“○×社”，“社長”、回答タイプ：＠ＰＥＲＳＯＮのデータを抽出して日時で並べ替えることにより、図５の
“（１９９９年１０月１１日、○×社、◇×氏）”、
…、
“（２００３年３月２９日、○×社、□△氏）”、
“（２００３年３月３１日、○×社、□△氏）”
を得る。そしてこの並べ替えた中から最も新しいデータから検索していき、“○△氏”から初めて（１回目に）変わる社長名のデータである、
“（２００３年３月３１日、○×社、□△氏）”
が検索され、この検索結果から“□△氏”が回答情報とされる。
なお、質問の内容が一代前ではなく二代前であった場合には、“○△氏”から２回目に変わる社長名のデータが検索され回答情報を得ることになる。
また、検索方法の変形例としては、一旦、キーワード：“○×社”，“社長”、回答タイプ：＠ＰＥＲＳＯＮを用いて抽出情報データベース１０８から全ての日時に関するデータを検索し、この検索結果から、“○△氏”から初めて（１回目に）変わる社長名のデータを回答とするようにしても、同じ回答を得ることができる。

At this time, based on the answer type @PERSON and the keyword “President”, the answer information search unit 109 selects the extracted information data containing president names (PERSON) as the search target, and then searches this extracted information data with the keywords “○× Company” and “Mr. ○△” to obtain the search results
“(April 1, 2003, ○× Company, Mr. ○△)”,
…,
“(March 28, 2004, ○× Company, Mr. ○△)”.
Further, since A = −1, the information immediately preceding these results in the extracted information database 108, that is,
“(March 31, 2003, ○× Company, Mr. □△)”
is retrieved, and “Mr. □△” is taken from this search result as the answer information.
Specifically, from the information registered in the extracted information database 108, the data dated earlier than April 1, 2003 that matches the keywords “○× Company” and “President” and the answer type @PERSON is extracted and sorted by date, yielding the following entries of FIG. 5:
“(October 11, 1999, ○× Company, Mr. ◇×)”,
…,
“(March 29, 2003, ○× Company, Mr. □△)”,
“(March 31, 2003, ○× Company, Mr. □△)”.
The sorted data is then scanned from the newest entry, and the first entry whose president name differs from “Mr. ○△”, namely
“(March 31, 2003, ○× Company, Mr. □△)”
is retrieved, and “Mr. □△” is taken from this search result as the answer information.
If the question asks about the president two generations back rather than one, the entry at which the president name changes for the second time, counting back from “Mr. ○△”, is retrieved to obtain the answer information.
As a variation of the search method, all data regardless of date may first be retrieved from the extracted information database 108 using the keywords “○× Company” and “President” and the answer type @PERSON, and the entry at which the president name first changes from “Mr. ○△” may then be taken from this search result as the answer; the same answer is obtained.
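The search for the president one or two generations back can be sketched as follows. This is a minimal illustration, not the implementation described in the specification: the `RECORDS` list, the function name `find_relative`, and the tuple layout (date, company, president) are assumptions, and the rows are a hypothetical subset of FIG. 5 (the entries elided by “…” are not reconstructed). It also assumes the reference person served a single continuous term.

```python
from datetime import date

# Hypothetical rows of the extracted information database 108:
# (date, company, president). Only the entries quoted in the text are used.
RECORDS = [
    (date(1999, 10, 11), "○×社", "◇×氏"),
    (date(2003, 3, 29), "○×社", "□△氏"),
    (date(2003, 3, 31), "○×社", "□△氏"),
    (date(2003, 4, 1), "○×社", "○△氏"),
    (date(2004, 3, 28), "○×社", "○△氏"),
]


def find_relative(records, company, reference, offset):
    """Return the president `-offset` changes before `reference` (offset < 0).

    Returns None when the reference person or the requested change is not found.
    """
    # Keep only the rows for the requested company, ordered on the date axis.
    rows = sorted(r for r in records if r[1] == company)
    ref_dates = [d for d, _, person in rows if person == reference]
    if not ref_dates:
        return None
    start = min(ref_dates)  # earliest date on which the reference person appears

    current, changes = reference, 0
    # Scan the earlier rows from newest to oldest, counting name changes.
    for _, _, person in reversed([r for r in rows if r[0] < start]):
        if person != current:
            changes += 1
            current = person
            if changes == -offset:
                return person
    return None
```

Calling `find_relative(RECORDS, "○×社", "○△氏", -1)` corresponds to the A = −1 case in the text; passing `-2` corresponds to the question about the president two generations back.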

なお、ここで検索が失敗して、抽出情報データベース１０８から回答情報が得られなかった場合は、質問から相対表現情報が解析されなかった場合であるステップＳ３０４のＮｏと同様の方法を用いて、改めてテキスト情報データベース１０４を対象に回答情報を検索する。 If the search fails here and no answer information is obtained from the extracted information database 108, the answer information is searched for again in the text information database 104, using the same method as the No branch of step S304 (the case where no relative expression information was analyzed from the question).
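The fallback described above can be pictured as a simple two-stage lookup. The sketch below is only illustrative; `answer_with_fallback` and the two search callables are hypothetical names standing in for the search of the extracted information database 108 and of the text information database 104.

```python
from typing import Callable, Optional


def answer_with_fallback(
    question: str,
    search_extracted: Callable[[str], Optional[str]],
    search_text: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Try the extracted information database first, then the text database."""
    result = search_extracted(question)
    if result is not None:
        return result
    # The extracted information database gave no answer information:
    # search the text information database instead.
    return search_text(question)
```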

次に、ステップＳ３０７にて、回答抽出部１１０は、回答情報検索部１０９の検索結果から回答を抽出する。これは既存の質問応答システムに用いられている技術で可能である。固有名抽出により検索結果から回答タイプと同じタイプの情報を抽出し、この抽出された各情報と質問Ｑ１中の各キーワードとの距離等に基づき、適切と考えられる回答を選択することができる。固有名抽出についても、関連情報抽出部１０５における処理と同様に可能である。 Next, in step S307, the answer extraction unit 110 extracts an answer from the search results of the answer information search unit 109. This can be done with techniques used in existing question answering systems: information of the same type as the answer type is extracted from the search results by proper name extraction, and the answer considered most appropriate is selected based on, for example, the distance between each extracted item and each keyword in question Q1. Proper name extraction can be performed in the same way as the processing in the related information extraction unit 105.
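One way to picture distance-based answer selection is the sketch below. It assumes that proper name extraction has already produced a set of candidate tokens of the right answer type, and it scores each candidate by its mean token distance to the question keywords. The function name `select_answer` and the scoring rule are assumptions for illustration; the specification only says that the selection is based on distance and similar cues.

```python
def select_answer(tokens, candidates, keywords):
    """Pick the candidate token closest, on average, to the keyword tokens.

    tokens:     the search result split into tokens
    candidates: tokens already tagged with the answer type (e.g. PERSON)
    keywords:   keywords taken from the question
    """
    keyword_pos = [i for i, t in enumerate(tokens) if t in keywords]
    if not keyword_pos:
        return None
    best, best_score = None, None
    for i, token in enumerate(tokens):
        if token not in candidates:
            continue
        # Mean absolute token distance to every keyword occurrence.
        score = sum(abs(i - k) for k in keyword_pos) / len(keyword_pos)
        if best_score is None or score < best_score:
            best, best_score = token, score
    return best
```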

最後に、ステップＳ３０８にて、回答出力部１１１は、回答抽出部１１０で得た回答をユーザに出力する。
なお、この実施形態では例として、日付（＠ＤＡＴＥ）に関する相対表現が質問に含まれる場合について説明を行った。しかし、本発明は日付等の時間軸上での相対表現に限定するものではない。例えば、（価格、製品名）のような情報を抽出して“○□の次に安い○◇は何”という質問に回答したり、（地点、建物名）のような情報を抽出して“○×の北にある○△は何”という質問に回答したりすることも可能である。このように抽出した情報が順序づけて並べ替え可能であれば、任意の観点において抽出される情報に対して同様に適用できる。

Finally, in step S308, the answer output unit 111 outputs the answer obtained by the answer extraction unit 110 to the user.
In this embodiment, the case where the question contains a relative expression concerning a date (@DATE) has been described as an example. However, the present invention is not limited to relative expressions on a time axis such as dates. For example, it is also possible to extract information such as (price, product name) and answer the question “What is the next cheapest ○◇ after ○□?”, or to extract information such as (location, building name) and answer the question “What is the ○△ to the north of ○×?”. As long as the extracted information can be ordered and sorted in this way, the invention applies equally to information extracted from any viewpoint.
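The generalization to any sortable axis can be sketched as a lookup of a neighbor on the sorted axis. The function name `neighbor_on_axis`, the (axis value, name) pair layout, and the sample prices are hypothetical and serve only to illustrate the idea.

```python
def neighbor_on_axis(records, reference, offset):
    """Return the name `offset` positions away from `reference` on the sorted axis.

    records: iterable of (axis_value, name) pairs, e.g. (price, product name)
    offset:  +1 for the next item in ascending order, -1 for the previous one
    """
    names = [name for _, name in sorted(records)]
    if reference not in names:
        return None
    index = names.index(reference) + offset
    return names[index] if 0 <= index < len(names) else None


# Hypothetical (price, product name) data, in no particular order.
PRICES = [(300, "C"), (100, "A"), (200, "B")]
```

With prices sorted in ascending order, “the next cheapest after A” is `neighbor_on_axis(PRICES, "A", 1)`; using a latitude as the axis value would handle “north of” questions in the same way.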

以上説明した通り本発明によれば、複数のテキストデータにおける記述内容から回答を求めることが必要となる相対的な表現を含む質問に対し、データベースの検索によって回答を求めることが可能となる。 As described above, according to the present invention, it is possible to obtain an answer by searching a database for a question including a relative expression that requires an answer to be obtained from descriptions in a plurality of text data.

なお、本願発明はテキスト情報データベース１０４に蓄積されている範囲から分かる範囲内の回答であり、例えば質問された内容に関するテキスト情報がテキスト情報データベース１０４に全く蓄積されていないような場合には正しい答えが導き出されない。しかし、これは本発明では誤差の範囲内である。 Note that the present invention provides answers only within the range that can be determined from what is stored in the text information database 104. For example, if no text information about the subject of the question is stored in the text information database 104 at all, the correct answer cannot be derived. In the present invention, however, this is within the margin of error.

（第２の実施形態）
図６は、本発明の第２の実施形態に係る質問応答装置のシステム構成を示す図である。
この実施形態に係る質問応答装置は、更に、外部のデータを入力するデータ入力手段６０１を更し、外部データベース等の外部データ６０２を抽出情報データベース１０８の生成に利用することもできる。このシステムでは、データ入力部６０１から入力される外部データが関連情報抽出部１０５に送られ、抽出情報データベース１０８の生成に利用される。この他の構成や動作は第１の実施形態と同様であり説明を省略する。

(Second Embodiment)
FIG. 6 is a diagram showing the system configuration of a question answering apparatus according to the second embodiment of the present invention.
The question answering apparatus according to this embodiment further includes data input means 601 for inputting external data, so that external data 602 such as an external database can also be used to generate the extracted information database 108. In this system, the external data input from the data input unit 601 is sent to the related information extraction unit 105 and used to generate the extracted information database 108. The other configurations and operations are the same as in the first embodiment, and their description is omitted.
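The flow of the second embodiment, in which external data passes through the same extraction step as the internal text data, can be sketched as below. The regular expression, the sentence format it matches, and the function names `extract` and `build_database` are all assumptions made for illustration; the patterns actually prepared in advance are not given in this section.

```python
import re
from datetime import date

# Hypothetical extraction pattern: "YYYY-MM-DD: <person> is president of <company>"
PATTERN = re.compile(r"(\d{4})-(\d{2})-(\d{2}): (\S+) is president of (\S+)")


def extract(texts):
    """Extract (date, company, person) rows from an iterable of text strings."""
    rows = []
    for text in texts:
        for y, m, d, person, company in PATTERN.findall(text):
            rows.append((date(int(y), int(m), int(d)), company, person))
    return rows


def build_database(internal_texts, external_texts=()):
    """Merge rows from both sources, drop duplicates, and sort on the date axis."""
    return sorted(set(extract(internal_texts)) | set(extract(external_texts)))
```

Because both sources go through `extract`, rows obtained from the external data end up in the same sorted structure and can be searched in the same way as rows from the internal text data.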

質問応答装置のシステム構成例を示す図。 A diagram showing an example system configuration of the question answering apparatus.
データベース生成フェーズにおける処理を示す図。 A diagram showing the processing in the database generation phase.
質問応答フェーズにおける処理を示す図。 A diagram showing the processing in the question answering phase.
抽出情報の整理結果を示す図。 A diagram showing the result of organizing the extracted information.
抽出情報データベース１０８の検索を示す図。 A diagram showing a search of the extracted information database 108.
質問応答装置の他のシステム構成例を示す図。 A diagram showing another example system configuration of the question answering apparatus.

Explanation of symbols

１００…質問応答装置、１０１…質問入力部、１０２…質問処理部、１０３…相対表現解析部、１０４…テキスト情報データベース、１０５…関連情報抽出部、１０６…抽出情報整理部、１０７…データベース生成部、１０８…抽出情報データベース、１０９…回答情報検索部、１１０…回答抽出部、１１１…回答出力部。 100 ... question answering apparatus, 101 ... question input unit, 102 ... question processing unit, 103 ... relative expression analysis unit, 104 ... text information database, 105 ... related information extraction unit, 106 ... extracted information organizing unit, 107 ... database generation unit, 108 ... extracted information database, 109 ... answer information search unit, 110 ... answer extraction unit, 111 ... answer output unit.

Claims

A question answering apparatus comprising:
database generating means for extracting from a text information database, using a plurality of patterns prepared in advance, axis data that can be ordered along a given axis and related data associated with the axis data, and for generating an extraction information database comprising a plurality of extraction information data, each of which groups the axis data with the related data and corresponds to one of the plurality of patterns;
input means for inputting a natural language question from a user;
question analysis means for analyzing the question and extracting an answer type and a keyword for an answer to the question;
relative expression analysis means for analyzing, from the question input through the input means, a relative expression with respect to the axis;
extraction information data determination means for determining, using the answer type and the keyword, which of the plurality of extraction information data in the extraction information database to use;
answer searching means for searching the extraction information database for an answer to the question, using the extraction information data determined by the extraction information data determination means and the analysis result of the relative expression analysis means; and
output means for outputting the answer found by the answer searching means.

The question answering apparatus according to claim 1, further comprising external data input means for inputting text information from the outside,
wherein the database generating means extracts the axis data and the related data from the text information input through the external data input means.

A question answering method comprising:
extracting from a text information database, using a plurality of patterns prepared in advance, axis data that can be ordered along a given axis and related data associated with the axis data, and generating an extraction information database comprising a plurality of extraction information data, each of which groups the axis data with the related data and corresponds to one of the plurality of patterns;
inputting a natural language question from a user;
analyzing the question and extracting an answer type and a keyword for an answer to the question;
analyzing, from the question, a relative expression with respect to the axis;
determining, using the answer type and the keyword, which of the plurality of extraction information data in the extraction information database to use;
searching the extraction information database for an answer to the question, using the determined extraction information data and the analysis result of the relative expression; and
outputting the answer found by the search.

A question answering program for causing a computer to function as a question answering apparatus, the program causing the computer to:
extract from a text information database, using a plurality of patterns prepared in advance, axis data that can be ordered along a given axis and related data associated with the axis data, and generate an extraction information database comprising a plurality of extraction information data, each of which groups the axis data with the related data and corresponds to one of the plurality of patterns;
accept input of a natural language question from a user;
analyze the question and extract an answer type and a keyword for an answer to the question;
analyze, from the input question, a relative expression with respect to the axis;
determine, using the answer type and the keyword, which of the plurality of extraction information data in the extraction information database to use;
search the extraction information database for an answer to the question, using the determined extraction information data and the analysis result of the relative expression; and
output the answer found by the search.