JP2012079161A

JP2012079161A - Natural language text generation device and computer program

Info

Publication number: JP2012079161A
Application number: JP2010224872A
Authority: JP
Inventors: Saeger Stijn De; デサーガステイン; Kentaro Torisawa; 健太郎鳥澤; Junichi Kazama; 淳一風間; Varga Istvan; イシュトヴァーンヴァルガ; Kiyotaka Otake; 清敬大竹
Original assignee: National Institute of Information and Communications Technology
Current assignee: National Institute of Information and Communications Technology
Priority date: 2010-10-04
Filing date: 2010-10-04
Publication date: 2012-04-19
Anticipated expiration: 2030-10-04
Also published as: JP5540335B2

Abstract

PROBLEM TO BE SOLVED: To provide a question answering system capable of automatically generating, with high accuracy, an answer sentence to a natural language question in a broad range of fields. SOLUTION: A question answering apparatus includes: a template set storage section 46 for storing templates of answer sentences to question sentences; a template estimation section 48 which, upon receiving a question sentence, estimates templates having a prescribed relation with the question; a template expansion processing section 52 for generating extended templates, in which words or the sentence structure are modified, by applying word classes and template expansion rules to the estimated templates; a matching section 60 for matching the extended templates against sentences of a Web corpus 32 and outputting answer candidates for the question; and a scoring and selection section 68 for calculating, for each candidate, a score indicating its eligibility as a reply to the question and outputting the candidates in descending order of score. Each extended template has two variables, and a word class that the substituted word must have is specified for each variable.

Description

The present invention relates to a system that receives a natural language question sentence and generates an answer, and more particularly to a question answering apparatus that automatically generates a highly accurate answer to an arbitrary question posed in natural language.

Development of question answering systems that give answers to various questions is in progress. For example, Non-Patent Document 1, listed below, discloses a technique for extracting word pairs that stand in a certain relation. In that technique, a small number of word pairs having the relation to be extracted are prepared, and patterns are learned from those word pairs. At answering time, the learned patterns are used to extract word pairs having that relation.

However, the technique described in Non-Patent Document 1 is said to have a problem with the accuracy of extracting the target word pairs.

On the other hand, Non-Patent Document 2 discloses an attempt to expand such patterns further by paraphrasing and to use them in a question answering system.

Non-Patent Document 1: Patrick Pantel et al., Espresso: Leveraging generic patterns for automatically harvesting semantic relations. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (Coling-ACL-06), pages 113-120, 2006.
Non-Patent Document 2: Dekang Lin et al., Discovery of Inference Rules for Question Answering. Natural Language Engineering, 7(4): 343-360, 2001.

However, although the technique disclosed in Non-Patent Document 2 can estimate answers to questions in fields that have been sufficiently learned, it has the problem that the fields it can handle are limited.

Restricting questions in advance to a specific field is useful in a system intended for one particular purpose, but a question answering system that can automatically answer a wider range of questions is desirable.

Therefore, an object of the present invention is to provide a question answering system capable of automatically generating, with high accuracy, answer sentences to natural language questions in a wide range of fields.

A question answering apparatus according to a first aspect of the present invention includes: template storage means for storing templates of answer sentences to question sentences, each template having first and second variable parts; template estimation means, responsive to receiving a question sentence, for estimating, among the templates stored in the template storage means, one or more templates serving as prototypes of an answer sentence to the question sentence, and outputting each template together with a constraint condition indicating the word in the question sentence that corresponds to the first variable part of that template; and template expansion means for generating one or more extended templates, in which words or the sentence structure are modified, by applying template expansion rules prepared in advance to those templates, among the templates output by the template estimation means, that are selected based on the word class of the word constituting the constraint condition of the template. Each extended template includes first and second variables indicating positions of answer candidates for the question sentence; the two variables are associated with the first and second variable parts, respectively, of the template stored in the template storage means, and an attribute that the variable must satisfy is specified for each of them. The template expansion means selects extended templates in which the attribute of the word constituting the constraint condition has a predetermined relation with the first variable. The question answering apparatus further includes: matching means for matching each of the extended templates output by the template expansion means against a large number of sentences prepared in advance so as to be available, and thereby outputting one or more candidates that can be substituted for the second variable of the extended template; and scoring and selection means for calculating, for each candidate output by the matching means, a score indicating its eligibility as an answer to the question sentence, based on the process by which the candidate was obtained or on the co-occurrence frequency between the candidate and the words contained in the extended template that matched the candidate, and outputting the candidates in descending order of score.

The attribute of a variable may be the word class of the word substituted for the variable.

Preferably, the question answering apparatus further includes question sentence receiving means for receiving the question sentence as a speech signal via communication, converting it into a text string by speech recognition, and inputting the text string to the template estimation means.

More preferably, the question answering apparatus further includes speech synthesis means for converting the candidates output by the scoring and selection means into speech by speech synthesis.

The question answering apparatus may further include question sentence receiving means for receiving the question sentence as a speech signal via communication, converting it into a text string by speech recognition, and inputting the text string to the template estimation means, and speech synthesis means for converting the candidates output by the scoring and selection means into a speech signal by speech synthesis and returning it to the terminal that sent the question sentence.

A computer program according to a second aspect of the present invention causes a computer to function as: template storage means for storing templates of answer sentences to question sentences, each template having first and second variable parts; template estimation means, responsive to receiving a question sentence, for estimating, among the templates stored in the template storage means, one or more templates serving as prototypes of an answer sentence to the question sentence, and outputting each template together with a constraint condition indicating the word in the question sentence that corresponds to the first variable part of that template; template expansion means for generating one or more extended templates, in which words or the sentence structure are modified, by applying, to each of the templates output by the template estimation means, the word class of the word constituting the constraint condition of the template and template expansion rules prepared in advance, each extended template including first and second variables indicating positions of answer candidates for the question sentence, the two variables being associated with the first and second variable parts, respectively, of the template stored in the template storage means; matching means for matching each of the extended templates output by the template expansion means against a large number of sentences prepared in advance so as to be available, and thereby outputting one or more candidates that can be substituted for the second variable of the extended template; and scoring and selection means for calculating, for each candidate output by the matching means, a score indicating its eligibility as an answer to the question sentence, based on the process by which the candidate was obtained or on the co-occurrence frequency between the candidate and the words contained in the extended template that matched the candidate, and outputting the candidates in descending order of score.

FIG. 1 is a block diagram of a question answering system according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of a seed pattern set.
FIG. 3 is a diagram showing an example of an extended pattern set.
FIG. 4 is a flowchart showing the control structure of a program for expanding seed patterns.
FIG. 5 is a flowchart showing the control structure of a program for extracting words that match a pattern from the Web corpus.
FIG. 6 is an external view of a computer system that realizes the question answering system according to an embodiment of the present invention.
FIG. 7 is a block diagram showing the hardware configuration of the computer shown in FIG. 6.

In the following description and drawings, the same components are denoted by the same reference numerals. Therefore, detailed description of them will not be repeated.

[Configuration]
Referring to FIG. 1, a question answering apparatus 30 according to an embodiment of the present invention uses natural language sentences contained in a Web corpus 32, which consists of sentences collected from the Web, to generate with high accuracy an appropriate answer sentence 34 for an input question sentence 20 written in natural language and concerning a wide range of fields. It will be apparent to those skilled in the art that an ordinary corpus may be used in place of the Web corpus 32 without any problem.

The question answering apparatus 30 includes a template set storage unit 46 that stores a large number of templates of answer sentences to question sentences, and a template estimation unit 48 for estimating and extracting, from among the templates stored in the template set storage unit 46, the templates whose patterns best match as answer sentences to the input question sentence 20. The template estimation unit 48 usually extracts a plurality of templates. Each of these is called a seed template, and together they are called a seed template set. For the estimation, the template estimation unit 48 can use an estimator that has learned in advance, by supervised machine learning, the relation between input sentences and their corresponding templates. The template estimation unit 48 identifies, among the words obtained from the input question sentence, the word that fills one of the variables contained in a seed template, and outputs it together with the seed template as a constraint condition on that seed template. Thus, for each seed template, the value to be placed in one of its variables is already determined.

The question answering apparatus 30 further includes: a seed template set storage unit 50 for storing the seed template set output by the template estimation unit 48; a thesaurus 58 for storing the word classes of words contained in the seed templates, semantic relations between words, and the like; a template expansion rule storage unit 54 for storing template expansion rules prepared in advance for expanding each seed template in the seed template set storage unit 50 into a larger number of templates; a template expansion processing unit 52 for generating, from each seed template stored in the seed template set storage unit 50, a large number of extended templates whose forms differ from the seed template, using the template expansion rules stored in the template expansion rule storage unit 54 and the thesaurus 58; and an extended template set storage unit 56 for storing the extended template set output by the template expansion processing unit 52.

The templates stored in the template set storage unit 46 are sentence patterns for answering questions. Various sentences are collected for learning, question forms are created from them, and these are prepared as templates in the template set storage unit 46.

The template estimation unit 48 analyzes the input question sentence 20 and extracts templates that answer it from the template set storage unit 46. It extracts seed templates by, for example, identifying the closest templates among those stored in the template set storage unit 46 based on similarity to the character string constituting the input question sentence 20. More than one estimation result can be used; that is, a plurality of seed templates can be stored in the seed template set storage unit 50.

FIG. 2 shows examples of seed templates. The example in FIG. 2 assumes that the input question sentence 20 is something like "What is the enemy of Ultraman?". As templates matching this question sentence, templates such as "the enemy of Y is X" and "X is the enemy of Y" are obtained, where X is a variable. When these templates are extracted from the input question sentence 20, "Ultraman" corresponds to the variable Y, and the variable X indicates the part that can be the answer to the input question sentence 20.

Templates can also be generated automatically from natural language sentences. In this embodiment, however, they are prepared manually. Templates in this embodiment are described using so-called regular expressions. Various regular expression notations are known, and any of them may be used here.
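
As a minimal illustrative sketch (not part of the disclosed embodiment), a two-variable template could be turned into a regular expression as follows. The English rendering of the template, the whitespace tokenization, the function name compile_template, and the choice of Python's re module are all assumptions made for illustration; the embodiment itself targets Japanese text and does not prescribe a notation.

```python
import re

VARIABLES = ("X", "Y")

def compile_template(template, bindings):
    """Compile a template into a regex.

    Variables that are already bound (for example Y taken from the
    question) become literal text; unbound variables become named groups.
    """
    parts = []
    for token in template.split():
        if token in VARIABLES and token in bindings:
            parts.append(re.escape(bindings[token]))
        elif token in VARIABLES:
            parts.append(rf"(?P<{token}>\w+)")
        else:
            parts.append(re.escape(token))
    return re.compile(r"\s+".join(parts))

# Seed template with the constraint Y = "Ultraman" from the question.
pattern = compile_template("the enemy of Y is X", {"Y": "Ultraman"})
match = pattern.search("Everyone knows the enemy of Ultraman is Baltan")
answer = match.group("X") if match else None
```

In this sketch the unbound variable X marks the answer position, so the captured group is the answer candidate.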

The thesaurus 58 associates each word with a word class as its attribute.

The template expansion rules stored in the template expansion rule storage unit 54 are rules for expanding the seed templates stored in the seed template set storage unit 50 to generate extended templates. Template expansion rules can also be described using regular expressions. In this embodiment, each expansion rule includes syntactic parse information (typically a parse tree) and a set of words or word strings (hereinafter "words etc.") arranged in the parse tree. This set of words etc. has two variables. A variable represents a position that can be replaced with another word. In this embodiment, an attribute (word class) is specified as a class restriction for each variable. Template expansion rules may include word replacement, modification of the sentence structure, and the like. For word replacement, various rules can be used, such as replacing a word with another word of the same word class, with a word denoting a more specific concept, or with a synonym.

When a template expansion rule is applied to a seed template, the word class of the word that was obtained from the question sentence and inserted into the variable part of the seed template is matched against the word class attached to the variable part of the template expansion rule. If the two agree, the template expansion rule is applied to the seed template. The resulting extended template has two variables, each of which carries its attribute (word class) as a class restriction.

Referring to FIG. 3, as an example of the contents of the extended template set, applying expansion rules to the seed templates shown in FIG. 2 under the constraint Y = "Ultraman" yields extended templates such as "X<monster/yokai> is the rival of Y<hero>", "X<monster/yokai> is the nemesis of Y<hero>", and "the enemy of Y<hero> was X<monster/yokai>". X and Y denote variables. Each extended template also has two variables, each annotated with a word class. Here it is assumed that the word class <hero> is obtained, using the thesaurus, from the word "Ultraman" contained in the input question sentence.
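
The class check that gates rule application can be sketched as follows. This is only an illustration under stated assumptions: the toy thesaurus, the ExpansionRule structure, the restriction to single-word replacement, and the English template text are all hypothetical and are not taken from the embodiment, whose rules also carry parse trees.

```python
from dataclasses import dataclass

# Hypothetical thesaurus: word -> word class.
THESAURUS = {"Ultraman": "hero", "Baltan": "monster"}

@dataclass(frozen=True)
class ExpansionRule:
    old: str       # word in the seed template to be replaced
    new: str       # replacement word
    x_class: str   # class restriction on X (the answer slot)
    y_class: str   # class restriction on Y (the slot filled from the question)

RULES = [
    ExpansionRule("enemy", "rival", "monster", "hero"),
    ExpansionRule("enemy", "nemesis", "monster", "hero"),
    ExpansionRule("enemy", "competitor", "company", "company"),
]

def expand(seed, bound_word, rules):
    """Apply every rule whose Y class equals the class of the bound word.

    Returns (template text, class of X, class of Y) triples.
    """
    word_class = THESAURUS.get(bound_word)
    tokens = seed.split()
    extended = []
    for rule in rules:
        if rule.y_class == word_class and rule.old in tokens:
            text = " ".join(rule.new if t == rule.old else t for t in tokens)
            extended.append((text, rule.x_class, rule.y_class))
    return extended

extended = expand("the enemy of Y is X", "Ultraman", RULES)
```

Because "Ultraman" belongs to the class hero in the toy thesaurus, only the first two rules fire, and each resulting template keeps a class restriction on both variables.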

Referring again to FIG. 1, the question answering apparatus 30 further includes: a matching unit 60 that matches each of the extended templates stored in the extended template set storage unit 56 against the sentences contained in the Web corpus 32, using the word classes obtained from the thesaurus 58, and extracts from the Web corpus 32 word strings having a part that matches any of the extended templates (called "match word strings"); a match word string set storage device 62 that stores the set of match word strings extracted from the Web corpus 32 by the matching unit 60; an answer candidate collection unit 66 for collecting, from the match word strings stored in the match word string set storage device 62, the parts likely to be answers to the question; and a scoring and selection unit 68 for assigning a score to each match word string output by the answer candidate collection unit 66, based on the template applied by the template estimation unit 48, the template expansion rule and word classes used for expanding the template in the template expansion processing unit 52, the co-occurrence frequency between the word corresponding to the answer and the words around it, and the like, and outputting the match word strings as answer sentences 34 in descending order of score.

Matching in the matching unit 60 uses the parse tree contained in the extended template and matches each subtree of the parse tree against the sentences contained in the Web corpus 32. For example, if the parse tree of a sentence from the Web corpus 32, with some node omitted, agrees with the parse tree of the extended template, the agreeing part is extracted as a match word string provided that the other conditions (agreement of the word classes of the variable parts and agreement of the remaining word strings) are satisfied. Consequently, word strings that could not be obtained by looking only at the one-dimensional sequence of words (word strings that do not actually appear in the Web corpus 32) can also be extracted as match word strings.

Referring to FIG. 4, a program for realizing the template expansion processing unit 52 of FIG. 1 on a computer has the following control structure. The program includes: step 98 of assigning, with reference to the thesaurus 58, word classes to the words that fill those variable parts of all the seed templates which are supplied by the input question sentence; step 100 of reading all the template expansion rules stored in the template expansion rule storage unit 54 into the main memory of the computer; step 102 of executing step 104, described below, for each rule read in step 100, thereby adding extended templates to the extended template set; and step 106 of outputting, after step 102 is completed, the extended template set obtained in step 102 and ending the process.

Step 104 includes step 110 of determining whether, among all the seed templates stored in the seed template set storage unit 50, there is one in which the word class of the variable filled from the input question sentence agrees with the word class of any of the variables of the expansion rule. If the determination in step 110 is negative, processing of this expansion rule ends and the process proceeds to the next expansion rule.

Step 104 further includes step 112 of executing, when the determination in step 110 is affirmative, step 114 described below for each seed template that satisfies the condition.

Step 112 includes: step 120 of applying the expansion rule currently being processed to the target seed template to generate a new template (an extended template); step 122 of calculating the product of the weight of the underlying seed template and the weight of the applied expansion rule and attaching it to the newly created extended template as its weight; and step 124 of merging the extended template created in step 120, together with the weight calculated in step 122, into the extended template set. If the extended template created in step 120 has already been merged into the extended template set, it is not added again.
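
The weighting and merging of steps 120 to 124 can be sketched roughly as below. The data layout (templates as strings, rules as word replacements with a weight) and all numeric weights are assumptions chosen for illustration, and the applicability test is a simplified stand-in for the word-class check of step 110.

```python
def build_extended_set(seeds, rules):
    """Build the extended template set.

    seeds: list of (template text, seed weight)
    rules: list of (old word, new word, rule weight)
    Returns a dict mapping extended template text to its weight.
    """
    extended = {}
    for old, new, rule_weight in rules:            # loop over rules (step 102)
        for text, seed_weight in seeds:            # loop over seeds (step 112)
            tokens = text.split()
            if old not in tokens:                  # simplified applicability test
                continue
            new_text = " ".join(new if t == old else t for t in tokens)  # step 120
            weight = seed_weight * rule_weight     # step 122: product of weights
            if new_text not in extended:           # step 124: skip duplicates
                extended[new_text] = weight
    return extended

seeds = [("the enemy of Y is X", 0.9), ("X is the enemy of Y", 0.8)]
rules = [("enemy", "rival", 0.5), ("enemy", "nemesis", 0.25), ("enemy", "rival", 0.25)]
extended_set = build_extended_set(seeds, rules)
```

The third rule produces templates that are already in the set, so it adds nothing and the weights assigned by the first rule are kept.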

Referring to FIG. 5, a program for realizing the matching unit 60 shown in FIG. 1 includes step 130 of executing step 132, described below, for each sentence stored in the Web corpus 32.

Step 130 includes: step 140 of performing morphological analysis on the sentence being processed; step 141 of receiving the word string (morpheme string) to which tags indicating word class, conjugated form, and the like have been attached by the morphological analysis, parsing it syntactically, and outputting a word string pattern consisting of a parse tree; and step 142 of executing step 144, described below, for each template stored in the extended template set storage unit 56. Because the target language here is Japanese, morphological analysis is performed in step 140. If the target language separates words with spaces, as English does, analysis such as part-of-speech tagging may be performed instead of morphological analysis. An existing morphological analysis program may be used for the morphological analysis, for example JUMAN (URL=http://nlp.kuee.Kyoto-u.ac.jp/nl-resource/juman.html) or ChaSen (URL=http://chasen-legacy.sourceforge.jp/).

Step 144 includes step 150 of determining whether the word string pattern being processed contains a part that matches the template being processed, including its tree structure, and step 152 of executing, when the determination in step 150 is affirmative, the processing of step 154 described below for each matching part.

Step 154 includes step 160 of attaching the weight of the template being processed to the variable part (variable X etc.) of the matched part being processed, and step 162 of further multiplying this template weight by a weight based on the co-occurrence frequency of the words etc. appearing in the obtained template, attaching the result to the template, outputting it, and moving on to the next matching part. When the determination in step 150 is negative, nothing is done and the process moves to the next template.
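
The text does not give a formula for the co-occurrence-based weight, so the following sketch substitutes a simple Jaccard-style ratio purely as an assumption. The toy sentences, the candidate names, the template weights, and the function names are likewise hypothetical; only the idea of multiplying the template weight by a co-occurrence weight and ranking by the result comes from the description.

```python
def cooccurrence_weight(candidate, keyword, sentences):
    """Assumed measure: sentences containing both words / sentences containing either."""
    both = sum(1 for s in sentences if candidate in s and keyword in s)
    either = sum(1 for s in sentences if candidate in s or keyword in s)
    return both / either if either else 0.0

def rank(candidates, keyword, sentences):
    """candidates: list of (answer word, template weight). Returns best first."""
    scored = [
        (word, template_weight * cooccurrence_weight(word, keyword, sentences))
        for word, template_weight in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy corpus: each sentence is represented as a set of its words.
sentences = [
    {"Baltan", "fights", "Ultraman"},
    {"Baltan", "attacks", "Ultraman"},
    {"Zetton", "fights", "Ultraman"},
    {"Zetton", "destroys", "Tokyo"},
]
ranked = rank([("Zetton", 0.5), ("Baltan", 0.5)], "Ultraman", sentences)
```

With equal template weights, the candidate that co-occurs more consistently with the question word is ranked first.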

As already described, the determination in step 150 compares every subtree of the parse tree of the target sentence with the parse tree of the template, and a subtree that matches becomes the target of step 152 and the steps that follow. For example, when the parse tree of an input sentence with some nodes omitted matches the parse tree of an extended template, only the matched part is extracted as a matching portion, provided the other conditions are satisfied. Consequently, word strings that do not actually exist in the Web corpus 32 can also be extracted as match strings.

[Operation]
The question answering apparatus 30 shown in FIGS. 1 to 5 operates as follows. A large number of sentences are collected from the Web in advance and stored in the Web corpus 32. Templates of answer sentences are prepared in advance in the template set storage unit 46, manually or by automatic processing. The thesaurus 58 is also prepared in advance. Template expansion rules are prepared in advance in the template expansion rule storage unit 54, manually or by automatic processing. All of these are stored in machine-readable form on a hard disk or the like.

When the input question sentence 20 is given to the question answering apparatus 30, the template estimation unit 48 extracts, from the templates stored in the template set storage unit 46, the one or more templates most suitable as answer sentences for the input question sentence 20, and outputs them as seed templates to the seed template set storage unit 50 (see FIG. 2). A classifier based on machine learning results is used for this extraction.

Once the seed templates are stored in the seed template set storage unit 50, the template expansion processing unit 52 operates and expands each seed template stored in the seed template set storage unit 50 by applying the template expansion rules stored in the template expansion rule storage unit 54. This expansion generates a large number of extended templates, which are stored in the extended template set storage unit 56.
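A single pass of this expansion can be sketched in Python as follows. The rule format (a regular expression paired with a replacement string), the name `expand_once`, and the English-glossed templates are assumptions made for illustration; the actual rules held in the template expansion rule storage unit 54 are not reproduced here.

```python
import re

# Hypothetical expansion rules: (regular expression, replacement).
RULES = [
    (r"\benemy\b", "rival"),                  # synonym substitution
    (r"\benemy\b", "archenemy"),              # synonym substitution
    (r"^X is Y's (\w+)$", r"Y's \1 was X"),   # structural paraphrase
]


def expand_once(templates, rules=RULES):
    """Apply every rule to every template and return the newly produced templates."""
    expanded = set()
    for template in templates:
        for pattern, replacement in rules:
            candidate = re.sub(pattern, replacement, template)
            if candidate != template:  # the rule actually applied
                expanded.add(candidate)
    return expanded


seeds = {"X is Y's enemy"}
for template in sorted(expand_once(seeds)):
    print(template)
```

Each rule that applies to a seed template yields one extended template; rules that do not apply leave the template unchanged and contribute nothing.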

The extended templates stored in the extended template set storage unit 56 are read by the matching unit 60 and held in a main storage unit (not shown). The matching unit 60 sequentially reads the many sentences stored in the Web corpus 32 and performs morphological analysis and parsing on each (steps 140 and 141 in FIG. 5). The matching unit 60 then determines whether the resulting parse tree, annotated with the word string obtained by the morphological analysis and parsing (a morpheme string tagged with word classes, semantic classes, and the like), has a part that matches any extended template held in the main storage unit (step 150). If a part matches one of the extended templates (the determination in step 150 is affirmative), the matching unit 60 outputs the part of the word string represented by that structure that corresponds to the answer (variable X) to the match word string set storage device 62, together with the weight attached to the matched template (step 152). The match word string set storage device 62 stores these word strings together with the tags and weights attached to the words. The matching unit 60 repeats this for every sentence stored in the Web corpus 32.

For each match word string stored in the match word string set storage device 62, the answer candidate collection unit 66 assigns a score to each answer candidate based on information such as the template, expansion rule, and word class used to obtain that answer, and on the co-occurrence frequency between the word and its surrounding words, and passes the scored candidates to the scoring/selection unit 68.

The scoring/selection unit 68 holds the answer candidates from the answer candidate collection unit 66 sorted in ascending order of score. Once the answer candidate collection unit 66 has finished outputting all answer candidates, the scoring/selection unit 68 outputs a predetermined number of them, starting from the highest score, as the answer sentence 34.
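The behavior of holding candidates in ascending score order and then emitting the top few can be sketched as below. The class name `ScoredCandidates`, its methods, and the scores are hypothetical; the patent does not specify a data structure.

```python
import bisect


class ScoredCandidates:
    """Keeps (score, answer) pairs sorted in ascending order of score."""

    def __init__(self):
        self._items = []

    def add(self, answer, score):
        # insort keeps the list sorted as each candidate arrives.
        bisect.insort(self._items, (score, answer))

    def top(self, n):
        """Return up to n answers, highest score first."""
        if n <= 0:
            return []
        return [answer for _, answer in reversed(self._items[-n:])]


held = ScoredCandidates()
# Illustrative scores only; they are not taken from the patent.
for answer, score in [("Pigmon", 0.9), ("Jamila", 0.4),
                      ("Alien Baltan", 0.8), ("Zetton", 0.6)]:
    held.add(answer, score)

print(held.top(2))
```

Keeping the list sorted on insertion means the final selection is a simple slice from the high-score end once all candidates have been received.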

The answer sentence 34 is created in this way. It is selected from among many answer candidates generated from sentences that fit either a seed template, selected from the initially prepared templates, or an extended template expanded from a seed template. A seed template is the most likely form of an answer sentence to the input question sentence 20. Extended templates are obtained by replacing words in the seed template with synonyms, paraphrasing expressions in the seed template, and so on. The answer sentence 34 is therefore selected from among many answer candidates for the input question sentence 20. Moreover, because the seed templates are expanded by the template expansion rules, a very large number of templates are stored in the extended template set storage unit 56. In addition, because these templates use regular expressions, matching the templates against the sentences in the Web corpus 32 extracts a very large number of answer candidates from the Web corpus 32. The matching also searches for matching portions down to the subtrees of the parse tree. Thus, although the word "extraction" is used here, expressions not contained in the Web corpus 32 are also extracted (generated) by the processing of the matching unit 60.

The Web corpus 32 is thought to contain the largest number of expressions of any available corpus. However, the expressions in the Web corpus 32 were written by humans, so their number is inevitably limited. In this embodiment, by contrast, the templates are expanded and the Web corpus 32 is matched against a variety of extended templates, so a far wider range of expressions than those written by hand is stored as match word strings. The answer sentence 34 that the scoring/selection unit 68 selects from these expressions is therefore more likely to be a suitable answer to the input question sentence 20. As a result, answer sentences can be generated and output automatically and accurately for a variety of input question sentences 20. As already noted, another corpus may of course be used in place of the Web corpus 32.

In the embodiment above, the variables of a seed template carry no indication of word attributes, such as word tags. The present invention is not limited to such an embodiment, however, and templates whose variables carry word tags may be used as seed templates. Also, although each seed template had two variables, a seed template may include three or more variables.

In the embodiment above, the template expansion processing unit 52 expands only the seed templates. The present invention is not limited to this, however. The number of templates may be increased further by applying the template expansion rules again to the extended templates obtained by applying the rules to the seed templates. In this case, the template expansion may be performed a predetermined number of times, or it may be repeated recursively until no new extended template appears.

In the embodiment above, various weights are assigned to the seed templates in advance. The present invention is not limited to such an embodiment, however. The weight assigned to the seed templates may be held constant, so that the weight of a template is determined solely by which template expansion rules were used. Alternatively, when several templates are applicable to a word string in the Web corpus 32, a weight that increases with the number of applicable templates may be given. When templates are created by applying the template expansion rules not only to seed templates but also to extended templates, it is desirable that the weight of a template decrease each time a template expansion rule is applied.
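The two variations above, repeated expansion and a weight that shrinks with each rule application, can be combined in one small Python sketch. Everything specific here is an assumption for illustration: the rule list, the uniform decay factor of 0.5, the name `expand_to_fixpoint`, and the English-glossed templates. The patent does not state a particular decay formula.

```python
import re

# Hypothetical expansion rules: (regular expression, replacement).
RULES = [
    (r"\benemy\b", "rival"),
    (r"\benemy\b", "archenemy"),
    (r"^X is Y's (\w+)$", r"Y's \1 was X"),
]
DECAY = 0.5  # assumed factor applied once per rule application


def expand_to_fixpoint(seeds, rules=RULES, decay=DECAY, max_rounds=None):
    """Expand weighted templates round by round.

    Stops when a round produces no new template, or after max_rounds
    rounds if a limit is given. Returns {template: weight}.
    """
    weights = dict(seeds)
    frontier = dict(seeds)
    rounds = 0
    while frontier and (max_rounds is None or rounds < max_rounds):
        next_frontier = {}
        for template, weight in frontier.items():
            for pattern, replacement in rules:
                candidate = re.sub(pattern, replacement, template)
                if candidate != template and candidate not in weights:
                    # With a uniform decay, the first round in which a template
                    # appears gives it its largest possible weight.
                    best = max(next_frontier.get(candidate, 0.0), weight * decay)
                    next_frontier[candidate] = best
        weights.update(next_frontier)
        frontier = next_frontier
        rounds += 1
    return weights


for template, weight in sorted(expand_to_fixpoint({"X is Y's enemy": 1.0}).items()):
    print(f"{weight:.2f}  {template}")
```

Passing `max_rounds=1` reproduces the single-pass behavior of the embodiment, while leaving it unset repeats the expansion until no new extended template appears.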

In the embodiment above, every rule is written using regular expressions. The present invention is not limited to such an embodiment, however. Any notation may be used as long as it can describe the rules accurately for the purpose at hand.

In the embodiment above, sentences collected from the Web are used as the Web corpus 32. Because a very large number of sentences currently exist on the Web, it is desirable for the Web corpus 32 to use the Web itself. Of course, a corpus other than the Web can also be used as the Web corpus 32.

[Operation example]
A concrete operation example according to this embodiment is described below. The question answering apparatus 30 is the server-side apparatus of a question answering system that can answer a wide variety of questions via the WWW. In this example, the similar relation pattern extraction method described in Non-Patent Document 1 is applied to several hundred million WWW pages to extract a large number of relation patterns in advance. The data used by the template estimation unit 48 to extract seed templates is prepared in advance in the same way.

As a concrete example, suppose the input question sentence 20 is 「ウルトラマンの敵は誰」 ("Who is Ultraman's enemy?"). Suppose that, for this input, the template estimation unit 48 estimates the templates 「XはYの敵」 and 「XがYの敵」 (both "X is Y's enemy", differing only in the particle は or が; Y = Ultraman) from the template set storage unit 46. The template estimation unit 48 stores these two templates as seed templates in the seed template set storage unit 50.

The template expansion processing unit 52 expands these two seed templates stored in the seed template set storage unit 50. As a result, templates such as "X is Y's rival" (「XはYのライバル」), "X is Y's archenemy" (「XがYの宿敵」), and "Y's enemy was X" (「Yの敵がXだった」) are obtained. Furthermore, based on the constraint (Y = Ultraman), the word class <hero> is assigned to Y, and expansion rules having a variable that matches this word class are applied to create extended templates. The extended templates obtained in this way are stored in the extended template set storage unit 56.

The matching unit 60 matches the many sentences in the Web corpus 32 against the extended templates stored in the extended template set storage unit 56, taking subtrees of the parse tree into account, and finds the expressions corresponding to variable X. These are the answer candidates; that is, the set of answer candidates = "Pigmon, Jamila, Alien Baltan, Zetton, ...". From this set, the scoring/selection unit 68 assigns a score to each candidate based on the original template, the extended template, the relation between the original word and the word that replaced it, the co-occurrence relation between the candidate word and the other words in the template, and so on, and outputs a predetermined number of high-scoring candidates, for example "Pigmon" and "Alien Baltan", as the answer sentence 34.

The answer sentence 34 may be output as speech by a speech synthesizer (not shown).

[Implementation by computer]
The question answering apparatus 30 according to this embodiment can be realized by computer hardware, a program executed by that computer hardware, and data stored in the computer hardware.

Referring to FIG. 6, a question answering system that includes the question answering apparatus 30 includes a computer system 330 that functions as the question answering apparatus 30 described above, and a mobile phone 300 that transmits a question sentence (the input question sentence 20 shown in FIG. 1) to the computer system 330 as a voice signal and receives the answer sentence (the answer sentence 34 shown in FIG. 1) from the computer system 330 as a voice signal. In this embodiment, within the communication between the mobile phone 300 and the computer system 330, question sentences and answer sentences are transmitted by voice. The mobile phone 300 therefore need only be capable of voice communication through an ordinary carrier. The computer system 330, on the other hand, must have a function of receiving voice signals from a telephone, a function of converting a voice signal into a text string by speech recognition and giving it to the template estimation unit 48 as the input question sentence 20, and a function of converting the answer sentence 34 output by the scoring/selection unit 68 into a voice signal by speech synthesis and returning it to the mobile phone 300 by telephone. All of these are existing functions, so their details are not described here. Of course, if the mobile phone 300 has a data communication function, the answer sentence from the computer system 330 may be sent to the mobile phone 300 in text form, or an HTML (HyperText Markup Language) document composed from the answer and URLs related to the answer may be returned to the mobile phone 300 so that a browser is launched on the mobile phone 300.

Referring to FIG. 6, the computer system 330 includes a computer 340 having an FD (flexible disk) drive 352 and a CD-ROM (compact disc read-only memory) drive 350, a keyboard 346, a mouse 348, and a monitor 342.

Referring to FIG. 7, in addition to the FD drive 352 and the CD-ROM drive 350, the computer 340 includes a CPU (central processing unit) 356; a bus 366 connected to the CPU 356, the FD drive 352, and the CD-ROM drive 350; a read-only memory (ROM) 358 that stores a boot-up program and the like; and a random access memory (RAM) 360 that is connected to the bus 366 and stores program instructions, system programs, working data, and the like. The computer system 330 further includes a network interface (I/F) 344 that provides a connection to the Internet. Although not shown, the computer 340 is connected to a mobile phone network through the network I/F 344 and can perform data communication with the mobile phone 300.

A computer program that causes the computer system 330 to operate as the question answering apparatus 30 is stored on a CD-ROM 362 or an FD 364 inserted into the CD-ROM drive 350 or the FD drive 352, and is then transferred to a hard disk 354. Alternatively, the program may be sent to the computer 340 over a network (not shown) and stored on the hard disk 354. The program is loaded into the RAM 360 at execution time. The program may also be loaded directly into the RAM 360 from the CD-ROM 362, from the FD 364, or over the network.

This program includes a plurality of instructions that cause the computer 340 to operate as the question answering apparatus 30 of this embodiment. Some of the basic functions needed for this operation are provided by the operating system (OS) running on the computer 340, by third-party programs, or by modules of the various toolkits installed on the computer 340. The program therefore need not include all the functions required to realize the system and method of this embodiment. It need only include those instructions that carry out the operation of the question answering apparatus 30 described above by calling the appropriate functions or "tools" (program libraries) in a controlled manner so as to obtain the desired result.

The Web corpus 32, the template set storage unit 46, the seed template set storage unit 50, the template expansion rule storage unit 54, the extended template set storage unit 56, the match word string set storage device 62, and the like shown in FIG. 1 are all realized by the hard disk 354 or the RAM 360 shown in FIG. 7. In particular, the Web corpus 32, the template expansion rule storage unit 54, and the like are normally stored on the hard disk 354 and loaded into the RAM 360 as needed when the program runs. The seed template set storage unit 50, the extended template set storage unit 56, the match word string set storage device 62, and the like have the character of work files. They are therefore created in the RAM 360 and saved to the hard disk 354 if they need to be retained.

The operation of the computer system 330 itself is well known and is not repeated here.

The embodiment disclosed here is merely illustrative, and the present invention is not limited to the embodiment described above. The scope of the present invention is indicated by each claim in the claims, read in light of the detailed description of the invention, and includes all modifications within the meaning and scope equivalent to the wording of those claims.

DESCRIPTION OF SYMBOLS
20 Input question sentence
30 Question answering apparatus
32 Web corpus
34 Answer sentence
46 Template set storage unit
48 Template estimation unit
50 Seed template set storage unit
52 Template expansion processing unit
54 Template expansion rule storage unit
56 Extended template set storage unit
60 Matching unit
62 Match word string set storage device
66 Answer candidate collection unit
68 Scoring/selection unit

Claims

A question answering device comprising:
template storage means for storing templates of answer sentences for question sentences, each template having first and second variable parts;
template estimation means for, in response to receiving an input question sentence, estimating, from the templates stored in the template storage means, one or more templates that serve as prototypes of an answer sentence for the question sentence, and outputting each estimated template together with a constraint condition indicating the word in the question sentence that corresponds to the first variable part of that template;
template expansion means for generating one or more extended templates in which a word or the sentence structure is modified, by applying template expansion rules prepared in advance to templates selected, from among the templates output by the template estimation means, based on the word class of the word constituting the constraint condition of the template,
each of the extended templates including first and second variables indicating positions of answer candidates for the question sentence, the two variables being associated respectively with the first and second variable parts of the template stored in the template storage means, and each specifying an attribute that the variable must satisfy,
the template expansion means selecting extended templates in which the attribute of the word constituting the constraint condition has a predetermined relationship with the attribute of the first variable;
matching means for matching each of the extended templates output by the template expansion means against a large number of sentences prepared in advance, and outputting one or more candidates that can replace the second variable of the extended template; and
scoring and selection means for calculating, for each candidate output by the matching means, a score indicating its suitability as an answer to the question sentence, based on the process by which the candidate was obtained or on the co-occurrence frequency between the candidate and the words included in the extended template that the candidate matched, and outputting the candidates in descending order of score.

The question answering device according to claim 1, wherein the attribute is a word class of a word assigned to a corresponding variable.

The question answering device according to claim 1 or 2, further comprising question sentence receiving means for receiving the question sentence as a voice signal by communication, converting it into a text string by speech recognition, and inputting the text string to the template estimation means.

The question answering apparatus according to any one of claims 1 to 3, further comprising speech synthesis means for converting a candidate output by the scoring and selection means into speech by speech synthesis.

The question answering device described, further comprising:
question sentence receiving means for receiving the question sentence as a voice signal by communication, converting it into a text string by speech recognition, and inputting the text string to the template estimation means; and
speech synthesis means for converting the candidates output by the scoring and selection means into a voice signal by speech synthesis and returning it to the terminal that transmitted the question sentence.

A computer program causing a computer to function as:
template storage means for storing templates of answer sentences for question sentences, each template having first and second variable parts;
template estimation means for, in response to receiving an input question sentence, estimating, from the templates stored in the template storage means, one or more templates that serve as prototypes of an answer sentence for the question sentence, and outputting each estimated template together with a constraint condition indicating the word in the question sentence that corresponds to the first variable part of that template;
template expansion means for generating one or more extended templates in which a word or the sentence structure is modified, by applying, to each of the templates output by the template estimation means, the word class of the word constituting the constraint condition of the template and template expansion rules prepared in advance,
each of the extended templates including first and second variables indicating positions of answer candidates for the question sentence, the two variables being associated respectively with the first and second variable parts of the template stored in the template storage means, and each specifying an attribute that the variable must satisfy;
matching means for matching each of the extended templates output by the template expansion means against a large number of sentences prepared in advance and made available, and outputting one or more candidates that can replace the second variable of the extended template; and
scoring and selection means for calculating, for each candidate output by the matching means, a score indicating its suitability as an answer to the question sentence, based on the process by which the candidate was obtained or on the co-occurrence frequency between the candidate and the words included in the extended template that the candidate matched, and outputting the candidates in descending order of score.