JP2002014990A

JP2002014990A - Question answering system, question answering processing method, transformation rule automatic acquisition processing method and recording medium for these system and method programs

Info

Publication number: JP2002014990A
Application number: JP2000193671A
Authority: JP
Inventors: Masaki Murata; 真樹村田; Masao Uchiyama; 将夫内山; Hitoshi Isahara; 均井佐原
Original assignee: Communications Research Laboratory
Current assignee: Communications Research Laboratory
Priority date: 2000-06-28
Filing date: 2000-06-28
Publication date: 2002-01-18

Abstract

PROBLEM TO BE SOLVED: To provide a question answering system that achieves a high rate of correct answers to question sentences and is easy and flexible to build. SOLUTION: To obtain an answer to a question sentence from a database 11, a matching unit 14 matches the question sentence against a database sentence and calculates the similarity between them. At the same time, a transformation unit 16 transforms both the question sentence and the database sentence using transformation rules stored in advance in a transformation rule storage unit 15. Matching by the unit 14 and transformation by the unit 16 are repeated to search for the database sentence most similar to the question sentence.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a computer-based natural language information processing system, and more particularly to a question answering system that uses inference based on similarity. It can be applied to information retrieval and information extraction.

[0002]

2. Description of the Related Art

A question answering system is a system that, given a question such as "The death of cells in which part of the brain is associated with the signs of Parkinson's disease?", searches a large body of electronic text for a sentence such as "Parkinson's disease is said to develop when the melanin-containing cells in the substantia nigra of the midbrain degenerate and dopamine, a neurotransmitter produced in the substantia nigra cells, is lost." It then accurately extracts from that sentence the phrase "substantia nigra", which answers the question, and outputs it as the answer.

[0003] One such question answering system is the "question answering system using syntactic information" that the present inventors published in Reference 1 below.

[Reference 1] Masaki Murata, Masao Utiyama, and Hitoshi Isahara, Question Answering System Using Syntactic Information, 1999, http://xxx.lanl.gov/abs/cs.CL/9911006.

This system first extracts keywords from the question sentence and retrieves from the database the sentences with a large total IDF (Inverse Document Frequency) of the keywords. Next, it parses the question sentence and the retrieved sentences. Using the syntactic information in the parse results, it matches the question sentence against each retrieved sentence, producing answer candidates while computing the similarity between the two sentences by a predetermined method. The answer is extracted from the retrieved sentence most similar to the question sentence: the phrase (bunsetsu) on the database side that is matched to the phrase containing the interrogative in the question sentence is taken as the answer. This system does not consider transforming the question sentence and the sentences obtained from the database so as to increase their similarity before matching them.

[0004] As prior art preceding this "question answering system using syntactic information", an information retrieval technique that uses sentence transformation, described in Reference 2 below, is known. Hereinafter, this is referred to as Katz's method.

[Reference 2] Boris Katz, Using English for Indexing and Retrieving, Artificial Intelligence at MIT, Vol. 2, MIT Press, 1990.

In general, when a question answering system derives an answer by matching a question sentence against database sentences, it obtains the answer from the database sentence that best matches the question sentence.

[0005] For example, when the input question sentence is "Where is the capital of Japan?", sentences similar to it are extracted from the database. Suppose the database contains the sentence "The capital of Japan is Tokyo." The system then matches "Where is the capital of Japan?" against "The capital of Japan is Tokyo" and outputs "Tokyo", the part corresponding to the interrogative "where", as the answer.

[0006] However, the question sentence and the database sentence do not always match exactly as above, so the answer part cannot always be extracted easily. For example, suppose the database does not contain "The capital of Japan is Tokyo" but only "Tokyo is the capital of Japan." Then the question sentence cannot be matched, and no answer is obtained.

[0007] Katz's method described in Reference 2 transforms every sentence into its most general form (called the base expression) before matching and extracting the answer, so that sentences can be matched even when the question sentence and the database sentence have different sentence patterns. In the example above, it is not obvious whether "The capital of Japan is Tokyo" or "Tokyo is the capital of Japan" is the base expression. If "The capital of Japan is Tokyo" is chosen as the base expression, the database sentence is transformed into it and then matched against the question "Where is the capital of Japan?" to obtain the answer.

[0008]

The "question answering system using syntactic information" of Reference 1 relies mainly on semantic constraints to match question sentences against database sentences. However, using semantic constraints and using sentence transformation each have advantages and disadvantages for sentence matching, and semantic constraints alone do not always achieve sufficient matching. Sentence transformation is therefore also considered effective for matching question sentences against database sentences.

[0009] Katz's method, however, has the following drawback. Every sentence must be transformed into the base expression, its most general form, yet the base expression is difficult to define precisely. For example, if the base expression is defined as the active voice and passive sentences are transformed into active ones, the transformation can be applied fairly uniformly. In the example above, however, there is no clear way to decide whether "Tokyo is the capital of Japan" or "The capital of Japan is Tokyo" should be the base expression. In other words, when several sentences can be transformed into one another, it is often ambiguous which of them should be the base expression, and Katz's method, which always transforms into the base expression, must arbitrarily designate one of them as the base expression in every such case. Deciding such base expressions is in practice a very difficult task.

[0010] In addition, mere transformation into base expressions cannot derive a sentence such as "The capital of Japan is in the Kanto region" from the two sentences "The capital of Japan is Tokyo" and "Tokyo is in the Kanto region."

[0011]

SUMMARY OF THE INVENTION

An object of the present invention is to solve the above problems and provide a question answering system that achieves a high rate of correct answers to question sentences and is easy and flexible to build. Specifically, when sentence transformation is used in a question answering system, the objects are to eliminate the need to define base expressions so that transformation rules are easy to write, to allow sentences to be transformed flexibly, and to provide a means for acquiring transformation rules automatically.

[0012]

The principal feature of the present invention is that, when a question sentence is matched against a database sentence, both sentences are rewritten using transformation rules stored in advance so as to increase the similarity between them.

[0013] The use of transformation resembles Katz's method. However, whereas Katz's method transforms every sentence into its base expression, the present invention is not limited to transformation into base expressions: it uses similarity as the measure and transforms sentences so that the similarity increases. This gives the present invention the following advantages.

[0014] Because the present invention does not require base expressions to be defined, transformation rules are easy to write. For example, if "The capital of Japan is Tokyo" and "Tokyo is the capital of Japan" can be transformed into each other, it is ambiguous which should be the base expression, and Katz's method, which always transforms into the base expression, must arbitrarily designate one of them as such. Such arbitrary designations are needed in many cases, which makes deciding base expressions impractical. In the present invention, by contrast, a transformation need not produce a base expression, so in this case the problem is solved simply by writing the following two rules.

[0015]

Rule 1: transform "The capital of Japan is Tokyo" into "Tokyo is the capital of Japan".
Rule 2: transform "Tokyo is the capital of Japan" into "The capital of Japan is Tokyo".

Thus a feature of the present invention is that the right-hand side of a transformation rule need not be a base expression. However, when the right-hand side may be something other than a base expression, a mechanism for controlling and managing the transformations is required; the present invention controls and manages them using similarity as the measure. That is, by transforming sentences so that the similarity increases, it improves the accuracy of matching question sentences against database sentences.
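Why a control measure is needed can be seen in a small sketch. It encodes Rules 1 and 2 as whole-sentence rewrites over lowercase English glosses and uses a hypothetical toy similarity (the number of word positions that agree), not the Score(p) measure of the embodiment. Applied blindly, the two rules undo each other indefinitely; accepting a rewrite only when it raises similarity to the question stops after one useful step.

```python
# Rules 1 and 2 as whole-sentence rewrites over English glosses (illustrative).
RULES = {
    "the capital of japan is tokyo": "tokyo is the capital of japan",  # Rule 1
    "tokyo is the capital of japan": "the capital of japan is tokyo",  # Rule 2
}


def similarity(question: str, sentence: str) -> int:
    """Toy similarity (assumption): count word positions that agree."""
    return sum(q == s for q, s in zip(question.split(), sentence.split()))


def blind_rewrites(sentence: str, steps: int) -> list:
    """Apply the matching rule `steps` times with no stopping criterion."""
    history = [sentence]
    for _ in range(steps):
        sentence = RULES.get(sentence, sentence)
        history.append(sentence)
    return history


def guided_rewrite(question: str, sentence: str) -> str:
    """Accept a rewrite only while it raises similarity to the question."""
    while True:
        candidate = RULES.get(sentence)
        if candidate is None or similarity(question, candidate) <= similarity(question, sentence):
            return sentence
        sentence = candidate


question = "the capital of japan is where"
print(blind_rewrites("tokyo is the capital of japan", 3))  # oscillates; only `steps` stops it
print(guided_rewrite(question, "tokyo is the capital of japan"))
```

The guided version terminates because each accepted rewrite strictly increases an integer score bounded by the sentence length.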

[0016] Furthermore, because the present invention does not restrict transformation rules to those that produce base expressions, transformation rules can also be acquired automatically from dictionary data such as a Japanese language dictionary or from data such as existing collections of questions and answers. That is, the explanatory (or definition) sentences for the same word are extracted from dictionary data read from multiple dictionary files, the extracted sentences are aligned with each other, synonyms or synonymous phrases are extracted from the result, and transformation rules are generated for rewriting a sentence into another sentence that expresses the same content.

[0017] Alternatively, existing question sentences and their answer sentences are input, each question sentence is aligned with its answer sentence, synonyms or synonymous phrases are extracted from the result, and transformation rules are generated from them.
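The alignment step can be sketched with Python's standard `difflib.SequenceMatcher`, which here stands in for whatever alignment the embodiment uses (an assumption; the text does not specify one). Two definition sentences for the same headword are aligned word by word, each `replace` span between shared context becomes a candidate synonym pair, and each pair yields rules in both directions, which the invention permits because rules need not target a base expression. The two definitions are invented for illustration; a question sentence and its answer sentence could be passed to `synonym_pairs` in the same way.

```python
from difflib import SequenceMatcher


def synonym_pairs(sent1: str, sent2: str) -> list:
    """Align two sentences word by word and return each differing span
    (flanked by shared context) as a candidate synonym pair."""
    w1, w2 = sent1.split(), sent2.split()
    matcher = SequenceMatcher(a=w1, b=w2, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            pairs.append((" ".join(w1[i1:i2]), " ".join(w2[j1:j2])))
    return pairs


def rules_from_pairs(pairs: list) -> list:
    """Turn each candidate pair into two rewrite rules, one per direction."""
    rules = []
    for left, right in pairs:
        rules.append((left, right))
        rules.append((right, left))
    return rules


# Hypothetical definitions of one headword taken from two dictionaries.
DEF1 = "a large body of salt water that covers most of the earth"
DEF2 = "a large expanse of salt water that covers most of the earth"
print(synonym_pairs(DEF1, DEF2))
print(rules_from_pairs(synonym_pairs(DEF1, DEF2)))
```

Real acquisition would need filtering (for example, discarding pairs that differ in part of speech), which this sketch omits.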

[0018] A program that implements each of the above processes on a computer can be stored on any suitable computer-readable recording medium, such as a portable medium memory, a semiconductor memory, or a hard disk.

[0019]

FIG. 1 shows an example system configuration according to the present invention. In the figure, 1 is the question answering system according to the present invention, and 2 is an automatic transformation rule acquisition system that automatically acquires sentence transformation rules from dictionary data and question-answer collections.

[0020] The question sentence input unit 10 is a means for inputting a question sentence in natural language. The database 11 stores digitized text of newspapers, papers, and other documents. The keyword extraction/information retrieval unit 12 is a means for extracting keywords from a question sentence and searching the database. The syntax analysis unit 13 is a means for parsing the question sentence and the sentences retrieved from the database 11 (called database sentences). The matching unit 14 is a means for matching the input question sentence against the database sentences and calculating their similarity. The transformation rule storage unit 15 stores rules for transforming a sentence into another sentence expressing the same content.

[0021] The transformation unit 16 is a means for rewriting question sentences and database sentences using the transformation rules stored in the transformation rule storage unit 15. The rewritten sentences are matched again by the matching unit 14 and their similarity is calculated; processing by the transformation unit 16 and by the matching unit 14 is repeated until the similarity stops improving. The answer output unit 17 is a means for extracting the answer from the database sentence found in the matching with the highest similarity and outputting it as the response sentence.

[0022] FIG. 2 shows examples of transformation rules stored in the transformation rule storage unit 15. FIG. 2(A) shows transformation rules for synonyms; rules of this kind can also be handled by Katz's method. FIG. 2(B) shows transformation rules for synonymous phrases; the rule "A is B" → "B is A" in this example cannot be handled by Katz's method.

[0023] Besides rules that express direct equivalence of meaning, as above, transformation rules involved in inference can also be used. FIG. 2(C) shows an example, in which the left-hand side of the rule takes two sentences as input: "It is A" and "If A, it is B." An example of its use is described next.

[0024] For example, suppose there are two sentences, "It is sunny" and "If it is sunny, an umbrella is unnecessary." Applying the transformation rule of FIG. 2(C) to these sentences matches "sunny" with A and "an umbrella is unnecessary" with B, and "An umbrella is unnecessary" is derived from the result.
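One way to picture the FIG. 2(C) rule operating on a set of sentences is the following sketch. The rule is encoded, hypothetically, as a regular expression over English glosses (the patent operates on Japanese sentences, where A and B are "sunny" and "umbrella is unnecessary"); the function `infer_once` returns whatever new sentences one application derives, which would then join the sentence set like any other rewrite.

```python
import re
from itertools import permutations

# Hypothetical string encoding of the FIG. 2(C) rule over English glosses:
# the premises "A" and "if A, B" together yield the new sentence "B".
CONDITIONAL = re.compile(r"^if (?P<A>.+?), (?P<B>.+)$")


def infer_once(sentences: set) -> set:
    """Return the sentences newly derivable in one application of the rule."""
    derived = set()
    for fact, rule in permutations(sentences, 2):
        m = CONDITIONAL.match(rule)
        if m and m.group("A") == fact:
            derived.add(m.group("B"))
    return derived - sentences


facts = {"it is sunny", "if it is sunny, an umbrella is unnecessary"}
print(infer_once(facts))
```

Because the rule's left-hand side spans two sentences, every ordered pair in the set is tried; this quadratic cost is one reason the pruning described later matters.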

[0025] In this way, the present invention can also handle knowledge obtained by inference through transformation rules. The left-hand and right-hand sides of a transformation rule may contain anything: part of a sentence, a combination of sentences, or any other description.

[0026] Next, the processing of the question answering system 1 shown in FIG. 1 is described with reference to the flowchart in FIG. 3.

[0027] First, the question sentence input unit 10 receives a question sentence (step S1). Any input method may be used, such as input from a terminal over a network or input from an application program such as an information retrieval system.

[0028] The keyword extraction/information retrieval unit 12 extracts keywords from the input question sentence (step S2). A simple keyword extraction method is, for example, to apply morphological analysis to the sentence and keep only the nouns.
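A minimal sketch of the noun-filtering method, assuming the morphological analyser has already produced (surface form, part of speech) pairs. The tagged tokens below are a hand-written, hypothetical rendering of the question "Where is the capital of Japan?" with the Japanese particles romanised; they are not the output of any particular analyser.

```python
def extract_keywords(tagged_tokens):
    """Keep the nouns, in order of first appearance, without duplicates."""
    keywords = []
    for surface, pos in tagged_tokens:
        if pos == "noun" and surface not in keywords:
            keywords.append(surface)
    return keywords


# "Nihon no shuto wa doko desu ka" with hypothetical part-of-speech tags
QUESTION = [("Japan", "noun"), ("no", "particle"), ("capital", "noun"),
            ("wa", "particle"), ("where", "pronoun"), ("desu", "copula"),
            ("ka", "particle")]
print(extract_keywords(QUESTION))  # ['Japan', 'capital']
```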

[0029] Next, the keyword extraction/information retrieval unit 12 extracts from the database 11 multiple sentences with a large total IDF (Inverse Document Frequency) of the keywords (step S3). A simple way to calculate the IDF is, for example, log(N/n), where N is the number of all strings in the database and n is the frequency with which the keyword occurs in the database. Alternatively, instead of using IDF values, all sentences in the database that contain a keyword appearing in the question sentence may simply be extracted.
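The retrieval step can be sketched as follows. For illustration, the database is a list of sentences, N is taken as the number of sentences and n as the number of sentences containing the keyword; the text defines N over all strings in the database, so these counts are an assumption. The database sentences are invented examples.

```python
import math

DB = [
    "tokyo is the capital of japan",
    "seoul is the capital of korea next to japan",
    "mount fuji is the highest mountain in japan",
    "paris is the capital of france",
]


def idf(keyword, sentences):
    """log(N / n), with N = number of sentences, n = sentences containing keyword."""
    n = sum(keyword in s.split() for s in sentences)
    return math.log(len(sentences) / n) if n else 0.0


def rank_by_idf(keywords, sentences, top_k=2):
    """Return up to top_k sentences ranked by the summed IDF of the keywords they contain."""
    weights = {k: idf(k, sentences) for k in keywords}
    scored = [(sum(w for k, w in weights.items() if k in s.split()), s) for s in sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for score, s in scored[:top_k] if score > 0]


print(rank_by_idf(["japan", "capital"], DB))
```

Sentences that contain both keywords outrank those containing only one; the cut-off `top_k` is a hypothetical parameter standing in for "multiple sentences".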

[0030] Next, the syntax analysis unit 13 parses the question sentence and all of the sentences extracted from the database 11 (database sentences), and makes them the set of question sentences and the set of database sentences, respectively (step S4). For this parsing, for example, the Japanese parser described in Reference 3 below can be used.

[Reference 3] Sadao Kurohashi, Japanese Dependency/Case Structure Analyzer KNP version 2.0b6, Department of Informatics, Kyoto University, 1998.

The matching unit 14 then initializes to 0 a variable S that holds the largest similarity obtained so far (step S5), and proceeds to step S6. In step S6, using the syntactic information from the syntax analysis unit 13, it matches every member of the set of question sentences against every member of the set of database sentences, computing the similarity between each question sentence and database sentence while obtaining an answer candidate for each pair.

[0031] An example of a formula for calculating the similarity is described below. The similarity between the question sentence and a database sentence p is given by Score(p) in the following formula.

[0032]

$$\mathrm{Score}(p) = \mathrm{BNST1}(p) + \alpha \, \mathrm{BNST2}(p) - \beta_1 \, \mathrm{BNUM}_{\mathrm{sent}}(p) - \beta_2 \, \mathrm{BNUM}_{\mathrm{all}}(p)$$

where

$$\mathrm{BNST1}(p) = \sum_{b} \mathrm{NEAR}(p,b) \, \mathrm{JIRITSU}(b) \qquad \text{(sum over the phrases } b \text{ of the question sentence)}$$

$$\mathrm{BNST2}(p) = \sum_{(b_1,b_2)} \mathrm{NEAR}(p,b) \, \mathrm{bnst2}(b_1,b_2) \qquad \text{(sum over all dependency relations } (b_1,b_2) \text{ in the question sentence, where } b_1 \text{ modifies } b_2)$$

$$\mathrm{bnst2}(b_1,b_2) = \begin{cases} \mathrm{JIRITSU}(b_1) + \mathrm{FUZOKU}(b_1) + \mathrm{JIRITSU}(b_2) & \text{if } \mathrm{JIRITSU}(b_1)\,\mathrm{FUZOKU}(b_1)\,\mathrm{JIRITSU}(b_2) \neq 0 \\ 0 & \text{otherwise} \end{cases}$$

$$\mathrm{NEAR}(p,b) = \mathrm{NEAR1}(p,b) + \mathrm{NEAR2}(p,b) + \mathrm{NEAR3}(p,b)$$

$$\mathrm{NEAR1}(p,b) = \begin{cases} \gamma_1 & \text{if } b \text{ is in the same sentence as the answer part} \\ 1 & \text{otherwise} \end{cases}$$

$$\mathrm{NEAR2}(p,b) = \begin{cases} 1 + \dfrac{\gamma_2}{1 + d} & \text{if } b \text{ is in the same sentence as the answer part} \\ 1 & \text{otherwise} \end{cases}$$

where d is the larger of the dependency distance between the interrogative and b in the question sentence and the dependency distance between the corresponding phrases in the database sentence.

$$\mathrm{NEAR3}(p,b) = \begin{cases} 1 + \dfrac{\gamma_3}{1 + d'} & \text{if } b \text{ is in the same sentence as the answer part} \\ 1 & \text{otherwise} \end{cases}$$

where d′ is the larger of the phrase distance between the interrogative and b in the question sentence and the phrase distance between the corresponding phrases in the database sentence.

Every phrase of the extracted database sentence is associated with some phrase on the input side so that Score(p) is maximized. JIRITSU(b) is the similarity of the independent (content) words of the input-side phrase b and the phrase associated with it, and FUZOKU(b) is the similarity of their ancillary (function) words.

[0033] BNUM_sent(p) is the number of phrases in the sentence of database sentence p that contains the answer part, and BNUM_all(p) is the number of phrases in database sentence p. The two BNUM terms ensure that, when all other information is equal, matching against a sentence with no extraneous phrases scores higher.

[0034] NEAR raises the value of phrases close to the answer part, which improves the matching between the question sentence and the database sentence. The "dependency distance" between two phrases used in computing NEAR is the number of edges between them in the parse tree, and the "phrase distance" between two phrases is the number of phrases between them plus 1.
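The two distances can be computed as in this sketch, assuming the parse is given as a head array in which `heads[k]` is the index of the phrase that phrase k modifies (-1 for the root). That is a common way to store dependency trees, but not a format the text prescribes. Dependency distance counts tree edges through the lowest common ancestor; phrase distance reduces to the difference of the positions.

```python
def dependency_distance(heads, i, j):
    """Number of parse-tree edges between phrases i and j."""
    def path_to_root(k):
        path = [k]
        while heads[k] != -1:
            k = heads[k]
            path.append(k)
        return path

    path_i, path_j = path_to_root(i), path_to_root(j)
    depth_in_j = {node: depth for depth, node in enumerate(path_j)}
    for depth_i, node in enumerate(path_i):
        if node in depth_in_j:  # first shared node = lowest common ancestor
            return depth_i + depth_in_j[node]
    raise ValueError("phrases are not in the same tree")


def phrase_distance(i, j):
    """Phrases strictly between i and j, plus 1, which equals |i - j|."""
    return abs(i - j)


# Phrases 0 and 1 both modify phrase 2, which modifies the root phrase 3.
heads = [2, 2, 3, -1]
print(dependency_distance(heads, 0, 1), phrase_distance(0, 1))  # 2 1
```

The example shows why the two measures differ: phrases 0 and 1 are adjacent in the sentence but two edges apart in the tree.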

[0035] α, β₁, β₂, γ₁, γ₂, and γ₃ are constants determined experimentally. The Score(p) formula shown here uses only BNST1 and BNST2, that is, relations involving at most two phrases (binary relations), but ternary, quaternary, and higher relations may additionally be used.
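The Score(p) formula can be transcribed into a short sketch. It assumes the phrase alignment has already been chosen and that each question phrase carries its JIRITSU and FUZOKU similarities and the distances d and d′; the default constant values are placeholders, since the text says only that they are set experimentally. The formula writes NEAR(p, b) inside BNST2 without saying which phrase of the pair b refers to, so the sketch assumes the modifier b₁ (a hypothetical choice).

```python
from dataclasses import dataclass


@dataclass
class Phrase:
    """A question-side phrase b, already aligned with a database phrase."""
    jiritsu: float        # JIRITSU(b): content-word similarity
    fuzoku: float         # FUZOKU(b): function-word similarity
    same_sentence: bool   # b lies in the same sentence as the answer part
    d: int = 0            # larger of the two dependency distances to the interrogative
    d_prime: int = 0      # larger of the two phrase distances to the interrogative


def near(b, g1, g2, g3):
    """NEAR = NEAR1 + NEAR2 + NEAR3; each term is 1 outside the answer's sentence."""
    if not b.same_sentence:
        return 3.0
    return g1 + (1 + g2 / (1 + b.d)) + (1 + g3 / (1 + b.d_prime))


def bnst2(b1, b2):
    """Credit for a dependency b1 -> b2, given only if all three similarities are non-zero."""
    if b1.jiritsu * b1.fuzoku * b2.jiritsu != 0:
        return b1.jiritsu + b1.fuzoku + b2.jiritsu
    return 0.0


def score(phrases, deps, bnum_sent, bnum_all, alpha=1.0, beta1=0.1, beta2=0.01,
          g1=1.0, g2=1.0, g3=1.0):
    """Score(p). `deps` holds (i, j) index pairs where phrases[i] modifies phrases[j].
    The constant defaults are placeholders, not experimentally tuned values."""
    total1 = sum(near(b, g1, g2, g3) * b.jiritsu for b in phrases)
    # Assumption: the NEAR(p, b) factor in BNST2 is evaluated at the modifier b1.
    total2 = sum(near(phrases[i], g1, g2, g3) * bnst2(phrases[i], phrases[j])
                 for i, j in deps)
    return total1 + alpha * total2 - beta1 * bnum_sent - beta2 * bnum_all


phrases = [Phrase(1.0, 1.0, True, d=1, d_prime=1), Phrase(1.0, 0.0, True, d=0, d_prime=1)]
print(score(phrases, deps=[(0, 1)], bnum_sent=2, bnum_all=2))
```

Maximizing Score(p) over alignments, as [0032] requires, would wrap this function in a search over phrase correspondences, which is omitted here.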

[0036] Once the similarity between each question sentence and database sentence has been calculated with the above formula, the largest similarity among all these combinations is taken as S, and the answer candidate for that combination as A (step S6).

[0037] Next, it is determined whether the current value of S is larger than the previous value of S (step S7). If it is larger, the process proceeds to step S8; otherwise, it proceeds to step S9.

[0038] In step S8, the transformation unit 16 rewrites each member of the set of question sentences and of the set of database sentences using the transformation rules stored in the transformation rule storage unit 15, and adds the rewritten sentences to the set of question sentences and the set of database sentences, respectively. The process then returns to step S6, and matching by the matching unit 14 is repeated. Repeated transformation by the transformation unit 16 makes both sets grow very large, and the computational cost becomes enormous. In that case, it is preferable to repeat the processing while keeping, during matching by the matching unit 14, only the members for which the similarity between question sentence and database sentence is reasonably high, and deleting the others.

[0039] If step S7 determines that the current value of S is not larger than the previous value, the answer output unit 17 extracts the answer from the answer candidate A with the highest similarity, outputs it as the response sentence to the question sentence (step S9), and the processing ends. The response is generated from the database sentence that yielded the answer candidate by, for example, taking as the answer the phrase in that database sentence that is associated with the phrase containing the interrogative in the question sentence.

[0040] In the above processing example, transformation by the transformation unit 16 is repeated until the similarity stops improving. To limit computation time, however, it is also possible to stop repeating the transformation once the similarity exceeds a threshold, or to fix a maximum number of iterations in advance and end the transformation after that many iterations.
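The control flow of steps S5 to S9, including the pruning of [0038] and the two alternative stopping conditions of [0040], can be sketched as below. The matcher and rewriter are passed in as functions; the toy versions in the usage example (positional word agreement as similarity, the word aligned with "where" as the answer candidate, and a rewrite "X is Y" → "Y is X") are hypothetical stand-ins for the matching unit 14 and the transformation unit 16. The parameters `keep`, `max_rounds`, and `good_enough` are illustrative names, not from the text.

```python
def answer(question, db_sentences, match, rewrite,
           keep=50, max_rounds=10, good_enough=float("inf")):
    """Steps S5-S9: match every question/database pair, rewrite both sets,
    and stop as soon as the best similarity S no longer increases."""
    questions, database = {question}, set(db_sentences)
    best_s, best_answer = 0.0, None                         # step S5
    for _ in range(max_rounds):                             # optional iteration cap
        scored = sorted(((*match(q, d), q, d) for q in questions for d in database),
                        key=lambda t: t[0], reverse=True)   # step S6
        if not scored or scored[0][0] <= best_s:            # step S7
            break
        best_s, best_answer = scored[0][0], scored[0][1]
        if best_s >= good_enough:                           # optional threshold
            break
        top = scored[:keep]                                 # prune low-scoring members
        questions = {q for _, _, q, _ in top}
        database = {d for _, _, _, d in top}
        questions |= {r for q in questions for r in rewrite(q)}   # step S8
        database |= {r for d in database for r in rewrite(d)}
    return best_s, best_answer                              # step S9


def toy_match(q, d):
    """Positional word agreement; the answer is the word aligned with 'where'."""
    qw, dw = q.split(), d.split()
    if len(qw) != len(dw) or "where" not in qw:
        return 0.0, None
    return float(sum(x == y for x, y in zip(qw, dw))), dw[qw.index("where")]


def toy_rewrite(s):
    """Hypothetical rule 'X is Y' -> 'Y is X', split at the first ' is '."""
    head, sep, tail = s.partition(" is ")
    return [f"{tail} is {head}"] if sep else []


QUESTION = "the capital of japan is where"
DATABASE = ["tokyo is the capital of japan", "the capital of korea is seoul"]
print(answer(QUESTION, DATABASE, toy_match, toy_rewrite))
```

With these toy components the first round prefers the Korea sentence, the rewrites then let "tokyo" align with "where", and the third round finds no improvement and stops.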

[0041] A concrete example of the above processing is described below.

[0042] (1) First, assume that the following sentence is input as the question sentence.

[0043]

Question sentence: "The capital of Japan is where?" (literal Japanese word order) --- a

This sentence becomes a member of the set of question sentences. The keyword extraction/information retrieval unit 12 obtains "Japan" and "capital" as keywords.

[0044] (2) The keyword extraction/information retrieval unit 12 searches the database 11 with "Japan" and "capital" as keywords and extracts multiple sentences with large IDF values. Assume that the following two sentences are obtained.

[0045]

Database sentence 1: "Tokyo is the capital of Japan" --- b
Database sentence 2: "The capital of Japan's neighbor Korea is Seoul" --- c

These two sentences become members of the set of database sentences.

[0046] (3) The syntax analysis unit 13 parses all the question sentences and database sentences, that is, sentences a, b, and c.

[0047] (4) At the start of matching, the matching unit 14 sets the variable S, which holds the maximum similarity, to 0.

[0048] (5) The matching unit 14 then matches the members of the set of question sentences against the members of the set of database sentences in every combination, that is, a with b and a with c. Suppose the similarity of a and b is 22 and that of a and c is 35. Since the maximum similarity is 35, S is set to 35.

[0049] (6) Since the value of S, 35, is larger than its previous value, 0, the processing of the transformation unit 16 (step S8) is executed. The transformation unit 16 applies transformation rules to each member of the set of question sentences and of the set of database sentences. For simplicity, assume that only the following transformation rule is used.

[0050] Transformation rule: "A is B" → "B is A" --- d
Rule d is applied to each of sentences a, b, and c, and the following three sentences are newly generated.

[0051] Transformation of a (using d): "Where is the capital of Japan?" --- e
Transformation of b (using d): "The capital of Japan is Tokyo" --- f
Transformation of c (using d): "Seoul is the capital of Korea, Japan's neighbor" --- g
These sentences are added to the sets: e is added to the set of question sentences, and f and g are added to the set of database sentences.

[0052] (7) The matching unit 14 again matches the members of the set of question sentences against the members of the set of database sentences in every combination, that is, a with b, a with c, a with f, a with g, e with b, e with c, e with f, and e with g. Suppose the similarities of a with f and of e with b are the highest, at 47. S is set to 47.

[0053] (8) Since the current value of S, 47, is larger than the previous value, 35, the transformation unit 16 is executed again. This time, however, it generates only sentences that are already in the set of question sentences or the set of database sentences (applying d to any member just yields another existing member), so neither set changes.

[0054] (9) The matching unit 14 again matches the members of the set of question sentences against the members of the set of database sentences in every combination. Because the members of the sets have not changed since the previous round, the matches of a with f and of e with b again have the highest similarity, 47. Assume that the answer candidate at this point is "Tokyo", the part that corresponded to the interrogative during matching.

[0055] (10) Since the current value of S, 47, is equal to the previous value, the iteration ends here, and the answer candidate "Tokyo" is output as the answer.

[0056] The present invention further includes a transformation rule automatic acquisition system 2 that automatically acquires the transformation rules used by the transformation unit 16. The transformation rule automatic acquisition system 2 can also be incorporated into the question answering system 1 as one of its processing functions.

[0057] In the Katz method described as prior art, each transformation rule converts a sentence into a specific base representation, so the rules must be written by hand. In contrast, the transformation rules used in the present invention need not distinguish between a transformation from sentence A to sentence B and one from sentence B to sentence A. Therefore, as described below, transformation rules can be generated automatically by computer processing of existing data.

[0058] The first method prepares several computer-readable Japanese-language dictionaries, matches their explanatory sentences (definition sentences) against one another to obtain knowledge about synonyms and synonymous phrases, and acquires transformation rules from that knowledge.

[0059] The second method takes data from a collection of questions and answers, obtains knowledge about the synonyms and synonymous phrases needed for each question and answer to correspond, and acquires transformation rules from that knowledge.

[0060] FIG. 4(A) is a processing flowchart of the first method for automatically acquiring transformation rules. First, several dictionary files, such as Japanese-language dictionaries, are prepared, and the explanatory sentences for the same headword (word) are extracted from the dictionary data read from them (step S11).

[0061] Next, the extracted explanatory sentences are matched against one another (step S12), and from the result, the differing expressions that remain after the matching parts are removed are extracted as synonyms or synonymous phrases (step S13). A transformation rule is generated and stored by assigning the synonyms or synonymous phrases to the left and right sides of the rule so that each can be converted into the other (step S14). This processing is repeated for every headword in the dictionary data; when all headwords have been processed, the processing ends (step S15).

[0062] A specific example follows. Suppose there are two dictionaries, dictionary #1 and dictionary #2, and a transformation rule is to be generated from their dictionary data for the headword "abekobe" ("reversed; the wrong way round").

[0063] Suppose the definition of "abekobe" in dictionary #1 is "that the relationship of order, position, and so on is swapped upside down", and the definition in dictionary #2 is "that order, position, and relationship are turned over". Because both are definitions of the same headword, the definition in dictionary #1 and the definition in dictionary #2 are considered to have the same meaning. When the two definitions are matched against each other, the parts "the relationship of order, position, and so on" and "order, position, and relationship" are very similar, so these parts are considered to match. It follows that the remaining parts of the two definitions, "swapped upside down" and "turned over", correspond to each other. From this, the following two transformation rules are obtained.

[0064] Transformation rule 1: "swapped upside down" → "turned over"
Transformation rule 2: "turned over" → "swapped upside down"
Here, the correspondence between definition sentences in several Japanese-language dictionaries is used, but the first method can also be applied to any other texts that correspond to each other in meaning.

[0065] FIG. 4(B) is a processing flowchart of the second method for automatically acquiring transformation rules. First, a question sentence and a response sentence are read from the text data of a digitized question-and-answer collection (step S21).

[0066] Next, the question sentence and the response sentence that were read are matched against each other (step S22), and synonyms or synonymous phrases are extracted from the result (step S23). A transformation rule is generated and stored by assigning the synonyms or synonymous phrases to the left and right sides of the rule so that each can be converted into the other (step S24). This processing is repeated for every question-and-answer pair; when all pairs have been processed, the processing ends (step S25).

[0067] A specific example follows. Suppose the question-and-answer collection yields the following question sentence and response sentence.

[0068] Question sentence: "The capital of Japan is where?"
Response sentence: "Tokyo is the capital of Japan"
Even a rough match shows that the interrogative "where", which asks for a place, corresponds to "Tokyo", and that the phrase "the capital of Japan" clearly corresponds between the two sentences. From this knowledge, the following transformation rule can be obtained. The Japanese question particle "ka" is disregarded.

[0069] Transformation rule: "A is B" → "B is A"
With this rule, the question sentence and the database sentence in the above example can be transformed until they match completely. In this way, transformation rules that raise the similarity between question sentences and database sentences as far as possible are acquired from phrase correspondences and similar clues.

[0070] The question answering system of the present invention is also useful as an information extraction technique. Information extraction is a technique for automatically extracting information from existing databases; for prefectures, for example, it extracts the prefecture name, the location of the prefectural office, the area, the population, the main products, and so on.

[0071] In general, current information extraction techniques depend heavily on knowledge specific to the target domain, and porting such a system to another domain is said to be costly.

[0072] In contrast, a question answering system such as that of the present invention answers "Morioka" when asked, for example, "Where is the prefectural office of Iwate Prefecture?" Question answering accepts a wide variety of natural-language questions, does not depend on any particular domain, and can freely obtain many kinds of information. To extract the tabular information about prefectures described above with this system, one first asks "What prefectures are there?" and then, for each prefecture, asks in turn "Where is the prefectural office?", "What is the area?", and so on. If this sequence of questions is programmed, the question answering system can solve the information extraction problem without domain dependence.

[0073] As a result, information such as the following can be extracted automatically.

[0074]
(Aomori Prefecture, Aomori, X km², population a, apples, ……)
(Akita Prefecture, Akita, Y km², population b, rice, ……)
(Iwate Prefecture, Morioka, Z km², population c, …, ……)
……
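The programmed sequence of questions in [0072] can be sketched as a driver around any question answering function. In this minimal sketch, `ask` is a hypothetical callable standing in for the question answering system, and a dictionary of canned answers replaces it for the demonstration. Only the prefectural-office question is used, because the area, population, and product values in [0074] are placeholders.

```python
from typing import Callable, List, Tuple


def extract_records(ask: Callable[[str], str], list_question: str,
                    attribute_questions: List[str]) -> List[Tuple[str, ...]]:
    """Ask for the entity list, then ask each attribute question per entity."""
    entities = [e.strip() for e in ask(list_question).split(",")]
    return [(entity, *(ask(q.format(entity=entity)) for q in attribute_questions))
            for entity in entities]


# Stand-in for the question answering system: canned answers for this demo only.
CANNED = {
    "What prefectures are there?": "Aomori Prefecture, Akita Prefecture, Iwate Prefecture",
    "Where is the prefectural office of Aomori Prefecture?": "Aomori",
    "Where is the prefectural office of Akita Prefecture?": "Akita",
    "Where is the prefectural office of Iwate Prefecture?": "Morioka",
}

records = extract_records(CANNED.__getitem__, "What prefectures are there?",
                          ["Where is the prefectural office of {entity}?"])
print(records)
# [('Aomori Prefecture', 'Aomori'), ('Akita Prefecture', 'Akita'), ('Iwate Prefecture', 'Morioka')]
```

Adding a column only means adding another question template, which is the domain independence the passage claims; nothing in the driver is specific to prefectures.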

[0075]

[Effects of the Invention] As described above, according to the present invention, when a question sentence is matched against a database sentence, both are transformed so that the similarity of the match increases, which improves the accuracy of matching question sentences with database sentences. Furthermore, because the transformation operations are controlled by the similarity measure, any transformation rule can be used, not only rules that convert to a base representation, and transformation rules can be written or generated easily.

[Brief description of the drawings]

FIG. 1 is a diagram showing an example of the system configuration of the present invention.

FIG. 2 is a diagram showing an example of transformation rules.

FIG. 3 is a flowchart of the question answering system.

FIG. 4 is a flowchart of the transformation rule automatic acquisition system.

[Explanation of symbols]

1 Question answering system
2 Transformation rule automatic acquisition system
10 Question sentence input unit
11 Database
12 Keyword extraction / information search unit
13 Syntax analysis unit
14 Matching unit
15 Transformation rule storage unit
16 Transformation unit
17 Answer output unit

─────────────────────────────────────────────────────

[Written Amendment]

[Submission Date] June 21, 2001

[Amendment 1]

[Name of Document to Be Amended] Specification

[Name of Item to Be Amended] Claims

[Amendment Method] Change

[Contents of Amendment]

[Claims]

Continued from the front page
(72) Inventor: Hitoshi Isahara, c/o Kansai Advanced Research Center, Communications Research Laboratory, Ministry of Posts and Telecommunications, 588-2 Iwaoka, Iwaoka-cho, Nishi-ku, Kobe-shi, Hyogo
F-terms (reference): 5B075 ND03 NK02 NK31 NK35 PP24 PR06 QM08 QP03

Claims


1. A question answering system that receives a question sentence in natural language, generates a response sentence by matching it against sentences in a database, and outputs the response sentence, the system comprising: a transformation rule storage unit that stores transformation rules for rewriting a sentence into another sentence having the same content; a matching unit that matches an input question sentence against a sentence extracted from the database and calculates the similarity between them; a transformation unit that, based on the similarity calculated by the matching unit, rewrites the question sentence and the sentence extracted from the database using the transformation rules stored in the transformation rule storage unit; and a solution output unit that, after the processing by the matching unit and the transformation unit has been repeated, outputs as the response sentence the solution extracted in the matching with the highest similarity.

2. A question answering processing method for receiving a question sentence in natural language and generating and outputting a response sentence by matching it against sentences in a database, the method comprising: a step of matching the input question sentence against a sentence extracted from the database and calculating their similarity; a step of rewriting the question sentence and the sentence extracted from the database, using prestored sentence transformation rules, until the similarity becomes highest; and a step of outputting, as the response sentence, the solution extracted in the matching with the highest similarity.

3. A transformation rule automatic acquisition processing method for generating, using a computer, transformation rules for transforming a sentence written in natural language into another sentence expressing the same content, the method comprising: a step of extracting explanatory sentences for the same word from dictionary data read from a plurality of dictionary files; a step of matching the plurality of extracted explanatory sentences and extracting synonyms or synonymous phrases from the result; and a step of generating, from the extracted synonyms or synonymous phrases, transformation rules for rewriting a sentence into another sentence expressing the same content.

4. A transformation rule automatic acquisition processing method for generating, using a computer, transformation rules for transforming a sentence written in natural language into another sentence expressing the same content, the method comprising: a step of inputting a question sentence and a response sentence to the question sentence; a step of matching the input question sentence against the response sentence and extracting synonyms or synonymous phrases from the result; and a step of generating, from the extracted synonyms or synonymous phrases, transformation rules for rewriting a sentence into another sentence expressing the same content.

5. A question answering processing program recording medium on which is recorded a program for receiving a question sentence in natural language and generating and outputting a response sentence by matching it against sentences in a database, the program causing a computer to execute: a process of matching the input question sentence against a sentence extracted from the database and calculating their similarity; a process of rewriting the question sentence and the sentence extracted from the database, using prestored sentence transformation rules, until the similarity becomes highest; and a process of outputting, as the response sentence, the solution extracted in the matching with the highest similarity.

6. A transformation rule automatic acquisition processing program recording medium on which is recorded a program for generating, using a computer, transformation rules for transforming a sentence written in natural language into another sentence having the same content, the program causing a computer to execute: a process of extracting explanatory sentences for the same word from dictionary data read from a plurality of dictionary files; a process of matching the plurality of extracted explanatory sentences and extracting synonyms or synonymous phrases from the result; and a process of generating, from the extracted synonyms or synonymous phrases, transformation rules for rewriting a sentence into another sentence having the same content.

7. A transformation rule automatic acquisition processing program recording medium on which is recorded a program for generating, using a computer, transformation rules for transforming a sentence written in natural language into another sentence expressing the same content, the program causing a computer to execute: a process of inputting a question sentence and a response sentence to the question sentence; a process of matching the input question sentence against the response sentence and extracting synonyms or synonymous phrases from the result; and a process of generating, from the extracted synonyms or synonymous phrases, transformation rules for rewriting a sentence into another sentence expressing the same content.