JP5882241B2 - Method and apparatus for generating search keyword for question answering, and program

Info

Publication number: JP5882241B2
Application number: JP2013001146A
Authority: JP
Inventors: 今村　賢治; 泉　朋子; 東中　竜一郎
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2013-01-08
Filing date: 2013-01-08
Publication date: 2016-03-09
Anticipated expiration: 2033-01-08
Also published as: JP2014134871A

Description

The present invention relates to a method, an apparatus, and a program for generating search keywords for question answering, and more particularly to a method, an apparatus, and a program that generate search keywords for retrieving an answer to a question sentence.

In a computer-based question answering system, the user's question is entered in natural language. For example, a factoid question, which asks about a simple fact or event, is entered in a form such as "Who brought glasses to Japan?" (「日本にメガネを伝えたのは誰？」). The question answering system is expected to return the answer "Francis Xavier" (「フランシスコ・ザビエル」).

A non-factoid question, which asks about a reason, cause, opinion, method, or the like, is entered in a form such as "I got bitten by a dog. What should I do?" (「犬にかまれちゃいました。どうしたらいいでしょう？」), and the question answering system displays a relevant web page or cuts out and displays the part of the page that describes the method. In the following, Example 1 is the case in which the question answering system returns "Francis Xavier" in response to the user question "Who brought glasses to Japan?", and Example 2 is the case in which it returns "If you are bitten by a dog, rinse the wound thoroughly and go to a hospital for treatment right away" in response to the user question "I got bitten by a dog. What should I do?"

FIG. 8 shows a configuration example of a question answering system. The system processes an input user question as follows. First, question analysis analyzes the user question, identifies the answer type, and converts the question into search keywords for information retrieval. Next, information retrieval runs an Internet search based on the search keywords passed from question analysis and, from among the documents containing the search keywords or their summaries (snippets), obtains the top n snippets that contain the most search keywords.

Answer candidate extraction then creates answer candidates from the n snippets obtained by information retrieval and the answer type obtained by question analysis. If the answer type is that of a factoid question, named entities are extracted from the snippets obtained by information retrieval. Methods for extracting answer candidates for non-factoid questions include the one described in Non-Patent Document 1. Finally, answer candidate evaluation selects and outputs the most suitable answer from among the candidates obtained by answer candidate extraction.

As a method for concisely paraphrasing an input sentence while preserving the factuality of the event it describes, a method is known that converts the predicate of the sentence into its simplest form while changing its meaning as little as possible (see, for example, Patent Document 1).

Patent Document 1: JP 2011-145844 A

Non-Patent Document 1: Murata M, Tsukawaki S, Kanamaru T, Ma Q, and Isahara H, "A System for Answering Non-Factoid Japanese Questions by Using Passage Retrieval Weighted Based on Type of Answer," in Proceedings of NTCIR-6 Workshop Meeting, Tokyo, Japan, 2007, pp. 477-482

One important point in this series of processes is that appropriate search keywords must be set so that information retrieval returns many snippets (documents) that contain the true answer. If inappropriate search keywords are set, many documents that do not contain the true answer are retrieved, and the correct answer cannot be output.

For example, if question analysis generates the content-word-only search keywords 「日本メガネ伝える」 ("Japan", "glasses", "bring") for the user question of Example 1, snippets such as documents 1-1 and 1-2 shown in FIG. 9 are obtained (when the number of snippets obtained by information retrieval is n = 2). However, neither document 1-1 nor document 1-2 obtained with these search keywords contains the correct answer "Francis Xavier", so the correct answer cannot be output.

Similarly, if question analysis generates the content-word-only search keywords 「犬かむ」 ("dog", "bite") for the user question of Example 2, snippets such as documents 1-1 and 1-2 shown in FIG. 10 are obtained. (An Internet search matches search keywords against documents after morphological analysis of the documents, so the search keyword 「かむ」 matches inflected forms that appear in documents, such as 「かま」, the irrealis form of 「かむ」, and 「かみ」, its continuative form.) However, document 1-1 is about dog training and document 1-2 is an advertisement for gum for dogs, so advice on what to do after being bitten by a dog cannot be output.

Thus, when only the content words of the user question are used as search keywords, the negation marker 「ない」, the voice marker 「れる（られる）」, tense, and modality information are lost, so many documents that do not match the intent of the user question are retrieved.

Another conceivable conventional method divides the user question into phrases (bunsetsu) and, for phrases that contain a verb or an adjective, uses the function words as well as the content words as search keywords. With this method, the search keywords 「日本メガネ伝えた」 are obtained from the user question of Example 1, and the snippets shown as documents 2-1 and 2-2 in FIG. 9 are obtained. These snippets contain "Francis Xavier", the correct answer for Example 1, so the question answering system can output the correct answer.

For Example 2, however, the search keywords 「犬かまれちゃいました」 are obtained, and documents that match 「ちゃいました」 are retrieved preferentially, as with documents 3-1 and 3-2 in FIG. 10, so advice on what to do after being bitten by a dog is not obtained.
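The two conventional approaches discussed above can be contrasted with a minimal Python sketch. The morpheme tuples, the coarse part-of-speech tags, and the `in_predicate` flag are hypothetical stand-ins for the output of a real morphological analyzer; only the first sentence of the Example 2 question is modeled.

```python
# Hypothetical pre-analyzed morphemes for 「犬にかまれちゃいました」:
# (surface form, base form, coarse part of speech, belongs to predicate phrase?)
QUESTION = [
    ("犬", "犬", "noun", False),
    ("に", "に", "particle", False),
    ("かま", "かむ", "verb", True),
    ("れ", "れる", "auxiliary", True),       # passive voice
    ("ちゃい", "ちゃう", "auxiliary", True),   # completion
    ("まし", "ます", "auxiliary", True),      # politeness
    ("た", "た", "auxiliary", True),         # past / completion
]

CONTENT_POS = {"noun", "verb", "adjective"}


def content_words_only(morphemes):
    """First conventional method: keep only base forms of content words."""
    return [base for _s, base, pos, _p in morphemes if pos in CONTENT_POS]


def keep_whole_predicate(morphemes):
    """Second conventional method: content words outside the predicate,
    plus the predicate phrase with all of its function words intact."""
    outside = [base for _s, base, pos, in_pred in morphemes
               if not in_pred and pos in CONTENT_POS]
    predicate = "".join(s for s, _b, _pos, in_pred in morphemes if in_pred)
    return outside + [predicate]


print(content_words_only(QUESTION))    # the passive marker れる is lost
print(keep_whole_predicate(QUESTION))  # ちゃう and ます are carried along
```

The first function drops the passive marker, so "bite" and "be bitten" become indistinguishable; the second carries politeness and colloquial markers into the query, which is the over-specification problem described for Example 2.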

In Japanese, when a clause modifies a noun, the polite expression 「ます」 tends to be dropped, and 「ちゃう」 and 「た」, which both express completion, rarely appear together; for instance, one does not say 「犬にかまれちゃいましたときは」. Consequently, if unnecessary functional expressions are included in the search keywords, many of the documents obtained by information retrieval do not contain the answer to the user question.

The present invention has been made to solve the above problem, and its object is to provide a method, an apparatus, and a program for generating search keywords for question answering that can obtain search keywords for accurately retrieving an answer to a question sentence.

To achieve the above object, a method for generating search keywords for question answering according to the present invention is performed in a search keyword generation apparatus for question answering that includes predicate functional expression normalization means, content word extraction means, and search keyword generation means. The method includes: a step in which the predicate functional expression normalization means normalizes, based on a morphological analysis result of an input question sentence, a predicate contained in the question sentence, the predicate being composed of a content word combined with a functional expression, which is a character string that follows the content word and contains at least one function word, by keeping predetermined function words that express negation, voice, tense, and modality and deleting all other function words; a step in which the content word extraction means extracts content words from the part of the question sentence other than the predicate, based on the morphological analysis result of the question sentence; and a step in which the search keyword generation means generates the combination of the predicate normalized by the predicate functional expression normalization means and the content words extracted by the content word extraction means as search keywords for retrieving an answer to the question sentence.

A search keyword generation apparatus for question answering according to the present invention includes: predicate functional expression normalization means that normalizes, based on a morphological analysis result of an input question sentence, a predicate contained in the question sentence, the predicate being composed of a content word combined with a functional expression, which is a character string that follows the content word and contains at least one function word, by keeping predetermined function words that express negation, voice, tense, and modality and deleting all other function words; content word extraction means that extracts content words from the part of the question sentence other than the predicate, based on the morphological analysis result of the question sentence; and search keyword generation means that generates the combination of the predicate normalized by the predicate functional expression normalization means and the content words extracted by the content word extraction means as search keywords for retrieving an answer to the question sentence.

According to the method and the apparatus for generating search keywords for question answering of the present invention, the predicate functional expression normalization means normalizes, based on the morphological analysis result of the input question sentence, a predicate contained in the question sentence and composed of a content word combined with a functional expression (a character string that follows the content word and contains at least one function word) by converting the functional expression into a simple form in a way that does not affect the meaning of the predicate.

The content word extraction means then extracts content words from the part of the question sentence other than the predicate, based on the morphological analysis result of the question sentence.

The search keyword generation means then generates the combination of the predicate normalized by the predicate functional expression normalization means and the content words extracted by the content word extraction means as search keywords for retrieving an answer to the question sentence.

In this way, the predicate is normalized by converting its functional expression into a simple form without affecting the meaning of the predicate of the question sentence, and the combination of the normalized predicate and the content words contained in the question sentence is generated as search keywords for retrieving an answer to the question sentence. Search keywords for accurately retrieving an answer to the question sentence can thereby be obtained.
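The content word extraction and keyword generation steps can be sketched as follows. This is a minimal illustration, not the patented implementation: the morpheme tuples, the tag names, and the function names are hypothetical, the predicate span is given as indices, and the normalized predicate is passed in as a ready-made string because its computation is a separate step.

```python
# Hypothetical morphological analysis of 「パソコンが壊れちゃったよ。」:
# (surface form, base form, coarse part of speech)
QUESTION = [
    ("パソコン", "パソコン", "noun"),
    ("が", "が", "particle"),
    ("壊れ", "壊れる", "verb"),
    ("ちゃっ", "ちゃう", "auxiliary"),
    ("た", "た", "auxiliary"),
    ("よ", "よ", "particle"),
    ("。", "。", "symbol"),
]

CONTENT_POS = {"noun", "verb", "adjective", "adverb"}


def extract_content_words(morphemes, predicate_start, predicate_end):
    """Return base forms of content words found outside the predicate span
    morphemes[predicate_start:predicate_end]."""
    outside = morphemes[:predicate_start] + morphemes[predicate_end:]
    return [base for _surface, base, pos in outside if pos in CONTENT_POS]


def generate_keywords(morphemes, predicate_start, predicate_end,
                      normalized_predicate):
    """Combine the content words outside the predicate with the
    normalized predicate to form the search keywords."""
    content = extract_content_words(morphemes, predicate_start, predicate_end)
    return content + [normalized_predicate]


# The predicate 「壊れちゃったよ」 spans indices 2..5; its normalized form
# is assumed here to be 「壊れた」.
print(generate_keywords(QUESTION, 2, 6, "壊れた"))
```

Keeping the predicate as a single keyword preserves its voice, tense, negation, and modality, while everything outside it is reduced to bare content words.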

The predicate functional expression normalization means may normalize the predicate, which is contained in the question sentence and composed of a content word combined with the functional expression, based on the morphological analysis result of the input question sentence, by deleting function words that do not affect the meaning of the predicate and redundant function words, thereby converting the functional expression into a simple form.

The predicate functional expression normalization means may delete, as function words that do not affect the meaning of the predicate, function words other than predetermined function words expressing negation, voice, tense, and modality.

A program according to the present invention is a program for causing a computer to execute each step of the above method for generating search keywords for question answering.

As described above, according to the method, apparatus, and program for generating search keywords for question answering of the present invention, the predicate is normalized by converting its functional expression into a simple form without affecting the meaning of the predicate of the question sentence, and the combination of the normalized predicate and the content words contained in the question sentence is generated as search keywords for retrieving an answer to the question sentence. This has the effect that search keywords for accurately retrieving an answer to the question sentence can be obtained.

FIG. 1 is a block diagram showing a configuration example of a question answering apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of a morphological analysis result for the question sentence 「パソコンが壊れちゃったよ。」 ("The PC has broken.").
FIG. 3 is a diagram showing an example of a predicate extraction result for the question sentence 「パソコンが壊れちゃったよ。」.
FIG. 4 is a diagram showing an example of a semantic label assignment result for the question sentence 「パソコンが壊れちゃったよ。」.
FIG. 5 is a diagram showing an example of a NULL deletion result for the question sentence 「パソコンが壊れちゃったよ。」.
FIG. 6 is a diagram showing an example of a redundant label deletion result for the question sentence 「パソコンが壊れちゃったよ。」.
FIG. 7 is a flowchart showing the question answering processing routine in the question answering apparatus according to the embodiment of the present invention.
FIG. 8 is an explanatory diagram for describing the prior art.
FIG. 9 is a diagram showing examples of search keywords for the question sentence 「日本にメガネを伝えたのは誰？」 ("Who brought glasses to Japan?") and of documents obtained by information retrieval.
FIG. 10 is a diagram showing examples of search keywords for the question sentence 「犬にかまれちゃいました。どうしたらいいでしょう？」 ("I got bitten by a dog. What should I do?") and of documents obtained by information retrieval.

<Overview>
First, an outline of an embodiment of the present invention is described.

In this embodiment, a question sentence written by a user is concisely paraphrased while preserving the factuality of the event, and search keywords are then generated from it. The aim is for information retrieval to obtain many documents, or summaries of documents (hereinafter referred to as snippets), that contain the correct answer the system should give, and to extract the answer to the question from the obtained snippets.

To concisely paraphrase the question sentence while preserving the factuality of the event, the method of converting the predicate of the question sentence into its simplest form while changing its meaning as little as possible is used (see Patent Document 1 above). For the predicate, the entire converted predicate is used as a search keyword; for the other parts, the content words are used as search keywords.

In the following, Example 1 is the case in which the question answering apparatus responds "Francis Xavier" to the question sentence "Who brought glasses to Japan?" (「日本にメガネを伝えたのは誰？」), and Example 2 is the case in which it responds "If you are bitten by a dog, rinse the wound thoroughly and go to a hospital for treatment right away" to the question sentence "I got bitten by a dog. What should I do?" (「犬にかまれちゃいました。どうしたらいいでしょう？」).

Embodiments of the present invention are described in detail below with reference to the drawings.

<System configuration of the question answering apparatus>
FIG. 1 is a block diagram showing a question answering apparatus 100 according to an embodiment of the present invention. The question answering apparatus 100 is a computer that includes a CPU, a RAM, and a ROM storing a program for executing a question answering processing routine described later, and is functionally configured as follows.

As shown in FIG. 1, the question answering apparatus 100 according to this embodiment includes an input unit 1, a calculation unit 2, and an output unit 3.

The input unit 1 receives a question sentence written by a user. The question sentence is written in natural language.

The calculation unit 2 includes a question analysis unit 20, an information retrieval unit 21, an answer candidate extraction unit 22, and an answer candidate evaluation unit 23.

The question analysis unit 20 analyzes the question sentence, identifies the answer type for the question sentence, and generates search keywords for information retrieval.

The question analysis unit 20 includes a morphological analysis unit 201, a predicate functional expression normalization unit 202, a content word extraction unit 203, a search keyword generation unit 204, and an answer type determination unit 205.

The morphological analysis unit 201 performs morphological analysis on the question sentence received by the input unit 1, splits it into words, and assigns a part of speech to each word.

The predicate functional expression normalization unit 202 converts the functional expression of the predicate of the question sentence into a simple form, based on the morphological analysis result produced by the morphological analysis unit 201, in a way that does not affect the meaning of the predicate. Here, a predicate is composed of a "content word" combined with a "functional expression", which is a character string that follows the content word and contains at least one function word. A "content word" is a word that carries general meaning, such as a verb, noun, adjective, adjectival noun, or adverb, and a "function word" is a word that plays a grammatical role, such as a particle or an auxiliary verb.

Specifically, the predicate functional expression normalization unit 202 extracts the predicate of the question sentence and then normalizes it by paraphrasing its functional expression into a simple expression. The normalization method is as disclosed in Patent Document 1 above: for a predicate composed of a content word combined with a functional expression, function words that do not affect the meaning of the predicate and redundant function words are deleted to convert the functional expression into a simple form.

Function words that do not affect the meaning of the predicate are deleted as follows: any function word that is not one of the predetermined function words expressing negation, voice, tense, or modality is deleted as a function word that does not affect the meaning of the predicate. That is, function words expressing negation, voice, tense, and modality, which affect the meaning of the event, are kept and all other function words are deleted, so that the functional expression of the predicate is paraphrased into a simple expression without changing the meaning of the event.

The operation of the predicate functional expression normalization unit 202 is now described in detail, taking as an example the case where the question sentence 「パソコンが壊れちゃったよ。」 ("The PC has broken.") is input to the question answering apparatus 100.

A functional expression semantic label dictionary is used to normalize the functional expression of the predicate. The dictionary combines three things: semantic labels, each representing the meaning of a functional expression that affects the meaning of a predicate; for each semantic label, a list of character strings giving the corresponding functional expressions in their standard forms; and, for each semantic label, a redundancy rule that specifies which function words to keep when the functional expression of a single predicate contains more than one function word with that label. The dictionary stores the semantic labels and character string lists given in (1) to (4) below.

(1) The semantic label "negation" (否定) is stored as a semantic label for distinguishing whether the event represented by the predicate is negated or affirmed, and 「ない」 and 「ねえ」 are stored as the character string list for this label. The function words in this list are examples of function words expressing negation.
(2) The semantic labels "passive" (受身) and "causative" (使役) are stored as semantic labels for distinguishing the voice of the predicate. The following character string lists are stored for these labels.
"Passive": 「れる」 「られる」
"Causative": 「せる」 「される」
The function words in these lists are examples of function words expressing voice.
(3) The semantic label "completion" (完了) is stored as a semantic label characterizing the tense of the event represented by the predicate, and 「た」, 「ちゃう」, 「ちまう」, 「て/しまう」, and 「て/おく」 are stored as the character string list for this label. The function words in this list are examples of function words expressing tense.
(4) The semantic labels "question" (疑問), "invitation/volition" (勧誘・意志), "desire" (願望), "request" (依頼), "recommendation" (勧め), "necessity" (必要), "permission" (許可), "conjecture" (推量), and "possibility" (可能) are stored as semantic labels for distinguishing whether the event represented by the predicate includes the subjective stance of the speaker (the user), that is, a modality expression. The following character string lists are stored for these labels.
"Question": 「か」
"Invitation/volition": 「う」
"Desire": 「たい」 「がな」 「たい/がる」
"Request": 「て/くれる」 「て/欲しい」
"Recommendation": 「た/方/が/いい」 「と/良い」
"Necessity": 「べき」 「ない/て/は/いける/ない」
"Permission": 「て/も/いい」
"Conjecture": 「かも/知れる/ない」 「らしい」 「よう」 「そう」 「だろう」
"Possibility": 「れる」 「こと/が/できる」
The function words in these lists are examples of function words expressing modality.
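One way to hold the label lists (1) to (4) in memory is a mapping from each semantic label to its entries, plus a reverse index for lookup. The sketch below is only an illustration: the English label names, `build_index`, and `lookup` are hypothetical, the "/" separator is taken to mark morpheme boundaries inside an entry, and the per-label redundancy rules are omitted.

```python
from collections import defaultdict

# Semantic label -> standard-form entries, copied from lists (1) to (4).
# "/" separates the morphemes of a multi-morpheme entry.
SEMANTIC_LABEL_ENTRIES = {
    "negation": ["ない", "ねえ"],
    "passive": ["れる", "られる"],
    "causative": ["せる", "される"],
    "completion": ["た", "ちゃう", "ちまう", "て/しまう", "て/おく"],
    "question": ["か"],
    "invitation_volition": ["う"],
    "desire": ["たい", "がな", "たい/がる"],
    "request": ["て/くれる", "て/欲しい"],
    "recommendation": ["た/方/が/いい", "と/良い"],
    "necessity": ["べき", "ない/て/は/いける/ない"],
    "permission": ["て/も/いい"],
    "conjecture": ["かも/知れる/ない", "らしい", "よう", "そう", "だろう"],
    "possibility": ["れる", "こと/が/できる"],
}


def build_index(entries_by_label):
    """Build a reverse index: tuple of base-form morphemes -> list of labels."""
    index = defaultdict(list)
    for label, entries in entries_by_label.items():
        for entry in entries:
            index[tuple(entry.split("/"))].append(label)
    return dict(index)


INDEX = build_index(SEMANTIC_LABEL_ENTRIES)


def lookup(morphemes):
    """Return every semantic label whose list contains this morpheme sequence."""
    return INDEX.get(tuple(morphemes), [])


print(lookup(["ちゃう"]))
print(lookup(["て", "しまう"]))
print(lookup(["れる"]))   # listed under two labels, so two labels come back
print(lookup(["よ"]))     # no entry: would receive the empty label NULL
```

Returning a list of labels makes the ambiguity of 「れる」 visible; how such ambiguity is resolved is not covered by this sketch.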

When the morphological analysis unit 201 performs morphological analysis on the question sentence 「パソコンが壊れちゃったよ。」 and a morphological analysis result such as that shown in FIG. 2 is input to the predicate functional expression normalization unit 202, 「ちゃ（う）」 and 「た」, which match entries in the lists of standard-form character strings in the functional expression semantic label dictionary (not shown), are first recognized as function words that affect the meaning of the predicate.

Next, 「よ」, whose part of speech is "particle" (助詞), is recognized as a function word. Finally, 「壊れ（る）」, whose part of speech is "verb, independent" (動詞−自立), is recognized as a content word. As a result, 「壊れちゃったよ」 is extracted as the predicate. FIG. 3 shows an example of this predicate extraction result.
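The predicate extraction just described can be approximated by scanning from the end of the sentence. The sketch below rests on several assumptions that the text does not state: the scan direction, the coarse tag names, the small `DICTIONARY_ENTRIES` subset, and the `extract_predicate` helper are all hypothetical.

```python
# Hypothetical morphological analysis of 「パソコンが壊れちゃったよ。」:
# (surface form, base form, coarse part of speech)
QUESTION = [
    ("パソコン", "パソコン", "noun"),
    ("が", "が", "particle"),
    ("壊れ", "壊れる", "verb"),
    ("ちゃっ", "ちゃう", "verb-dependent"),
    ("た", "た", "auxiliary"),
    ("よ", "よ", "particle"),
    ("。", "。", "symbol"),
]

FUNCTION_POS = {"particle", "auxiliary"}
CONTENT_POS = {"verb", "adjective", "noun"}
# Tiny illustrative subset of base forms listed in the label dictionary.
DICTIONARY_ENTRIES = {"ちゃう", "た", "ない", "れる", "られる"}


def extract_predicate(morphemes):
    """Return the sentence-final predicate (content word followed by at
    least one function word), or None if no such predicate is found."""
    end = len(morphemes)
    while end > 0 and morphemes[end - 1][2] == "symbol":
        end -= 1                      # ignore trailing punctuation
    i = end - 1
    while i >= 0:
        _surface, base, pos = morphemes[i]
        if base in DICTIONARY_ENTRIES or pos in FUNCTION_POS:
            i -= 1                    # function word: keep scanning backward
        else:
            break
    # Require a content word followed by at least one function word.
    if i < 0 or morphemes[i][2] not in CONTENT_POS or i == end - 1:
        return None
    return morphemes[i:end]


predicate = extract_predicate(QUESTION)
print("".join(surface for surface, _base, _pos in predicate))
```

The dictionary check matters for 「ちゃっ」: its part-of-speech tag alone would not mark it as a function word, but its base form 「ちゃう」 is a dictionary entry.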

Next, the extracted predicate 「壊れちゃったよ」 is analyzed by longest match from the end, starting with 「よ」. Because the functional expression semantic label dictionary has no entry for 「よ」, it is given the empty semantic label "NULL". 「た」 and 「ちゃう」 are then analyzed, and each is given the semantic label "completion". Label assignment ends at the last morpheme reached when the morphemes of the predicate are processed by longest match from the end, that is, immediately before the content word 「壊れる」. FIG. 4 shows an example of the semantic label assignment result. The method of assigning semantic labels is not limited to longest match from the end; other methods may be used.
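Longest match from the end can be sketched as follows. The `ENTRIES` table is a small hypothetical subset of the dictionary keyed by tuples of base forms, and `label_backward` is an illustrative name; ambiguous entries with more than one label are not handled here.

```python
# Tuple of base-form morphemes -> semantic label (illustrative subset).
ENTRIES = {
    ("ない",): "negation",
    ("た",): "completion",
    ("ちゃう",): "completion",
    ("て", "しまう"): "completion",
    ("かも", "知れる", "ない"): "conjecture",
}
MAX_LEN = max(len(key) for key in ENTRIES)


def label_backward(function_morphemes):
    """Assign semantic labels to the functional expression, scanning from
    the end and preferring the longest dictionary entry at each position.
    Morphemes without an entry receive the empty label "NULL"."""
    labeled = []
    end = len(function_morphemes)
    while end > 0:
        for length in range(min(MAX_LEN, end), 0, -1):
            span = tuple(function_morphemes[end - length:end])
            if span in ENTRIES:
                labeled.append(("/".join(span), ENTRIES[span]))
                end -= length
                break
        else:
            # No entry ends at this position: label a single morpheme NULL.
            labeled.append((function_morphemes[end - 1], "NULL"))
            end -= 1
    labeled.reverse()  # restore left-to-right order
    return labeled


# Base forms of the functional expression 「ちゃったよ」.
print(label_backward(["ちゃう", "た", "よ"]))
# A three-morpheme entry wins over the shorter match 「ない」.
print(label_backward(["かも", "知れる", "ない"]))
```

Preferring the longest entry keeps 「かも/知れる/ない」 from being misread as a negation, which a shortest-first scan would do.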

Next, in the functional expression "ちゃったよ" of the predicate, "よ", which was given the empty semantic label "NULL", is deleted, simplifying the functional expression to "ちゃった". FIG. 5 shows an example of the NULL deletion result.

Next, using the redundancy rules recorded in the functional expression semantic label dictionary, the entire entry (word information) of each redundant function word is deleted. The redundancy rules are based on the observation that "when several function words expressing the same meaning occur in one predicate, keeping only one of them preserves the meaning of the predicate." When the predicate after semantic labeling and NULL deletion is "壊れちゃった", the word information of each morpheme of that predicate is first checked for two or more occurrences of the same semantic label; if there are two or more, deletion is performed according to the redundancy rule. The word information for the morphemes of "壊れちゃった" contains the semantic label "completion" twice, so the redundancy rule applies. The redundancy rule for "completion" is "if the surface forms are the same, keep the first (First); otherwise, keep the last (Last)." Accordingly, the function word "ちゃっ" is deleted, and the function word "た" with the semantic label "completion" remains as the normalized functional expression. FIG. 6 shows an example of the redundant label deletion result.
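A minimal sketch of NULL deletion followed by the "completion" redundancy rule, assuming each function word is held as a (surface form, semantic label) pair. The function names and the data layout are hypothetical; only the rule itself (keep the first if the surface forms are the same, otherwise keep the last) comes from the description.

```python
def drop_null(labeled):
    """Remove function words that carry the empty semantic label "NULL"."""
    return [(surface, label) for surface, label in labeled if label != "NULL"]


def apply_completion_rule(labeled, label="completion"):
    """Keep exactly one function word with the given label.

    Rule from the description: if all surface forms are the same, keep the
    first occurrence; otherwise keep the last one.
    """
    positions = [i for i, (_, lab) in enumerate(labeled) if lab == label]
    if len(positions) < 2:
        return list(labeled)  # nothing redundant
    surfaces = {labeled[i][0] for i in positions}
    keep = positions[0] if len(surfaces) == 1 else positions[-1]
    return [item for i, item in enumerate(labeled)
            if i not in positions or i == keep]


if __name__ == "__main__":
    labeled = [("ちゃっ", "completion"), ("た", "completion"), ("よ", "NULL")]
    print(apply_completion_rule(drop_null(labeled)))
```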

Next, the content word "壊れ（る）" is joined with the normalized functional expression "た". A language-model-based conjugation generator can be used for joining morphemes, including conjugating the words. Such a generator uses a model trained in advance on correct data to learn "which connection is most plausible", with the surface form, part of speech, and conjugation type of the preceding word and the surface form and part of speech of the following word as features. Accordingly, by inputting to the generator the "surface form; part of speech; conjugation type" of the content word "壊れ（る）", namely "壊れ; verb-independent; ichidan", together with "た; auxiliary verb; special-ta" for the functional expression "た" and "。; symbol-period" for the sentence-final morpheme "。", the correctly joined predicate "壊れた。" can be generated. Even when the question sentence contains a predicate with no functional expression (a predicate that needs no normalization), that predicate can be included in the search keywords.

When the predicate function expression normalization unit 202 normalizes Example 1 above, the predicate of the question sentence is "伝えた" ("brought"), and the normalized predicate is also "伝えた". Here the predicate word "伝える" carries the function word "た", which expresses completion. Completion is a tense meaning, and deleting it would change the event to the present tense, so the predicate is normalized with "た" still attached to "伝える". In Example 2 above, "かまれちゃいました" ("got bitten") is identified as the predicate and normalized. In this predicate, "かむ" ("bite") is the predicate word, "れる" is a function word with the semantic label "passive", "ちゃう" is a function word with the semantic label "completion", "ます" is a function word expressing politeness, and "た" is a function word with the semantic label "completion". Of these, "ます" does not affect the meaning of the event and is therefore deleted. Because "ちゃう" and "た" both appear as function words labeled "completion" and are redundant, "ちゃう" is judged unnecessary and deleted. As a result, "かむ", "れる", and "た" remain, and "かまれた" ("was bitten") is output as the normalized predicate.

The content word extraction unit 203 extracts content words from the part of the question sentence other than the predicate, based on the morphological analysis result from the morphological analysis unit 201. In Example 1 above, for instance, the two words "日本" (Japan) and "メガネ" (glasses) are the content words outside the predicate. In Example 2 above, the content word extracted from outside the predicate is "犬" (dog).

The search keyword generation unit 204 generates, as the search keywords for retrieving an answer to the question sentence, the combination of the predicate normalized by the predicate function expression normalization unit 202 and the content words extracted by the content word extraction unit 203. In Example 1 above, this yields "日本メガネ伝えた" (Japan, glasses, brought) as the search keywords. In Example 2 above, "犬かまれた" (dog, was bitten) is generated as the search keywords.
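The combination step can be illustrated with a short sketch. It assumes the content words and the normalized predicate are already available as strings; the function name `generate_search_keywords` and the `sep` parameter are hypothetical, and the description does not say whether the keywords are passed to the search engine joined or separately, so both forms are returned.

```python
def generate_search_keywords(content_words, normalized_predicate, sep=""):
    """Combine content words with the normalized predicate.

    Returns the keyword list and the joined string. With the default
    empty separator the joined form matches the examples in the text.
    """
    keywords = list(content_words) + [normalized_predicate]
    return keywords, sep.join(keywords)


if __name__ == "__main__":
    print(generate_search_keywords(["日本", "メガネ"], "伝えた"))
    print(generate_search_keywords(["犬"], "かまれた"))
```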

The answer type determination unit 205 determines and outputs the answer type corresponding to the question sentence, based on the morphological analysis result from the morphological analysis unit 201. The answer types are fixed in advance. In this embodiment there are eight answer types for factoid questions (person name, place name, organization name, artifact name, date, time, amount of money, and percentage) and two answer types for non-factoid questions (reason and method). The answer type is determined from clue words (words or expressions) contained in the question sentence, or by a classifier built with machine learning. In Example 1 above, the clue word "誰" (who) is present, so the question is judged to ask for a person name. In Example 2 above, the clue word "どうしたら" (what should I do) is present, so the question is judged to ask for a method.
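A clue-word lookup of this kind might look like the sketch below. It is a simplification: it matches substrings of the raw question rather than working on the morphological analysis result, and it ignores the machine-learned classifier. Only the first two table entries come from the examples in the text; the remaining entries and all names are assumptions.

```python
# Clue word -> answer type, checked in order.
CLUE_TO_TYPE = [
    ("誰", "person name"),       # from Example 1
    ("どうしたら", "method"),    # from Example 2
    ("どこ", "place name"),      # assumed for illustration
    ("いつ", "date"),            # assumed for illustration
    ("なぜ", "reason"),          # assumed for illustration
]


def determine_answer_type(question):
    """Return the answer type of the first clue word found, or None."""
    for clue, answer_type in CLUE_TO_TYPE:
        if clue in question:
            return answer_type
    return None


if __name__ == "__main__":
    print(determine_answer_type("日本にメガネを伝えたのは誰？"))
    print(determine_answer_type("犬にかまれちゃいました。どうしたらいいでしょう？"))
```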

The information search unit 21 performs an Internet search based on the search keywords generated by the search keyword generation unit 204 and, from the search results, acquires the top n snippets that contain the most search keywords among the snippets containing them. Here n is usually several tens. In Example 1 above, for the search keywords "日本メガネ伝えた", the snippets of documents 2-1 and 2-2 shown in FIG. 9 above are obtained. In Example 2 above, for the search keywords "犬かまれた", snippets such as those of documents 2-1 and 2-2 shown in FIG. 10 above are obtained.
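The top-n selection can be sketched as below, with the Internet search itself left out. The scoring (number of distinct keywords found in a snippet by substring match) is an assumption, since the description only says the snippets containing the most keywords are taken; `top_snippets` is a hypothetical name.

```python
def top_snippets(snippets, keywords, n):
    """Return up to n snippets, ordered by how many keywords they contain.

    Snippets that contain none of the keywords are dropped. Ties keep
    the original search-result order because the sort is stable.
    """
    scored = [(sum(1 for k in keywords if k in s), s) for s in snippets]
    hits = [(score, s) for score, s in scored if score > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in hits[:n]]


if __name__ == "__main__":
    results = [
        "メガネの歴史",
        "メガネは1549年にフランシスコ・ザビエルが日本に伝えた",
        "天気予報",
    ]
    print(top_snippets(results, ["日本", "メガネ", "伝えた"], n=2))
```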

The answer candidate extraction unit 22 extracts answer candidates based on the n snippets acquired by the information search unit 21 and the answer type obtained by the answer type determination unit 205.
If the answer type determined by the answer type determination unit 205 is an answer type for factoid questions, named entities are extracted from the snippets obtained by the information search unit 21. In named entity extraction, morpheme sequences that represent a person name, place name, organization name, artifact name, date, time, amount of money, or percentage, corresponding to the eight answer types for factoid questions, are extracted from the snippets as named entities. If a snippet reads "メガネは１５４９年にフランシスコ・ザビエルが日本に伝えた" ("Glasses were brought to Japan by Francisco Xavier in 1549"), then "１５４９年" (1549) is extracted as a date, "フランシスコ・ザビエル" (Francisco Xavier) as a person name, and "日本" (Japan) as a place name. The answer candidate extraction unit 22 performs this named entity extraction on all snippets obtained by the Internet search and, from the extracted named entities, extracts as answer candidates all those that match the determined answer type (person name in Example 1 above).
If the answer type determined by the answer type determination unit 205 is an answer type for non-factoid questions, each snippet (document) is split into paragraphs and sentences, and processing that depends on the answer type is then performed (see Non-Patent Document 1 above). When the answer type is method, for example, paragraphs and sentences containing clue expressions such as "方法" (method), "手順" (procedure), "ことにより" (by doing), "〜には" (in order to), and "〜ときは" (when) are extracted as answer candidates (see Non-Patent Document 1 above).
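The two extraction branches can be sketched as follows. The named entity recognizer is assumed to exist elsewhere and to supply (text, type) pairs per snippet; splitting sentences on "。" and plain substring matching of the clue expressions are likewise simplifying assumptions, and the function names are hypothetical.

```python
# Clue expressions for the "method" answer type, taken from the description.
METHOD_CLUES = ("方法", "手順", "ことにより", "には", "ときは")


def extract_factoid_candidates(entities_per_snippet, answer_type):
    """Keep every named entity whose type matches the answer type.

    `entities_per_snippet` holds, for each snippet, a list of
    (text, entity_type) pairs produced by an external recognizer.
    """
    return [text
            for entities in entities_per_snippet
            for text, entity_type in entities
            if entity_type == answer_type]


def extract_method_candidates(snippet):
    """Return the sentences of a snippet that contain a method clue."""
    sentences = [s.strip() for s in snippet.split("。") if s.strip()]
    return [s for s in sentences if any(clue in s for clue in METHOD_CLUES)]


if __name__ == "__main__":
    entities = [[("1549年", "date"), ("フランシスコ・ザビエル", "person name"), ("日本", "place name")]]
    print(extract_factoid_candidates(entities, "person name"))
    print(extract_method_candidates("犬にかまれたときは、傷口をよく洗い流しましょう。犬はかわいい。"))
```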

The answer candidate evaluation unit 23 selects and outputs the most suitable answer from the answer candidates obtained by the answer candidate extraction unit 22. If the answer type determined by the answer type determination unit 205 is an answer type for factoid questions, the candidate that appears in the largest number of snippets is selected. In Example 1 above, for instance, if the person name "フランシスコ・ザビエル" (Francisco Xavier) appears in four snippets and "大内義隆" (Ouchi Yoshitaka) appears in two, "フランシスコ・ザビエル" is output as the answer. If, on the other hand, the answer type determined by the answer type determination unit 205 is an answer type for non-factoid questions, a score is calculated for each answer candidate extracted by the answer candidate extraction unit 22, and the answer is output based on that score and a predetermined threshold (see Non-Patent Document 1 above).
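For the factoid case, selection by snippet count can be sketched as below. Counting each candidate at most once per snippet follows the description; the function name, the input layout, and the handling of an empty candidate list are assumptions. Ties are not addressed in the text, so the sketch makes no promise about which tied candidate is returned.

```python
from collections import Counter


def select_factoid_answer(candidates_per_snippet):
    """Pick the candidate that appears in the largest number of snippets.

    `candidates_per_snippet` holds one list of candidate strings per
    snippet. Returns None when there are no candidates at all.
    """
    counts = Counter()
    for candidates in candidates_per_snippet:
        counts.update(set(candidates))  # count each snippet once per candidate
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    snippets = [["フランシスコ・ザビエル"]] * 4 + [["大内義隆"]] * 2
    print(select_factoid_answer(snippets))
```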

The output unit 3 outputs, as the result, the answer output by the answer candidate evaluation unit 23. Documents 2-1 and 2-2 shown in FIG. 9 above contain the correct answer, "Francisco Xavier", so the question answering apparatus 100 can output "Francisco Xavier" as the correct answer. Likewise, documents 2-1 and 2-2 shown in FIG. 10 above describe what to do when bitten by a dog, so the question answering apparatus 100 can output that passage as an answer that fits the question sentence.

<Operation of the question answering apparatus>
Next, the operation of the question answering apparatus 100 according to this embodiment will be described. When a question sentence written by a user is input to the question answering apparatus 100, the question answering apparatus 100 executes the question answering processing routine shown in FIG. 7.

First, in step S100, input of a question sentence is accepted. Next, in step S102, morphological analysis is performed on the question sentence accepted in step S100.

Next, in step S104, the predicate function expression normalization unit 202 identifies and extracts the predicate of the question sentence, based on the morphological analysis result obtained in step S102.

Then, in step S106, the predicate function expression normalization unit 202 normalizes the predicate of the question sentence extracted in step S104, based on the morphological analysis result from the morphological analysis unit 201, by converting its functional expression into a simple form without affecting the meaning of the predicate.

Then, in step S108, the content word extraction unit 203 extracts content words from the part of the question sentence other than the predicate, based on the morphological analysis result of step S102.

In step S110, the search keyword generation unit 204 generates, as the search keywords for retrieving an answer to the question sentence, the combination of the predicate normalized in step S106 and the content words extracted in step S108.

Next, in step S112, the answer type determination unit 205 determines the answer type corresponding to the question sentence, based on the morphological analysis result of step S102.

Then, in step S114, the information search unit 21 performs an Internet search based on the search keywords generated in step S110 and, from the search results, acquires the top n snippets that contain the most search keywords among the snippets containing them.

In step S116, the answer candidate extraction unit 22 extracts answer candidates based on the n snippets acquired in step S114 and the answer type determined in step S112.

Next, in step S118, the answer candidate evaluation unit 23 selects the most suitable answer from the answer candidates obtained in step S116.

Then, in step S120, the output unit 3 outputs the answer candidate selected in step S118 as the result, and the question answering processing routine ends.

As described above, the question answering apparatus 100 according to this embodiment normalizes the predicate of the question sentence by converting its functional expression into a simple form without affecting the meaning of the predicate, and generates the combination of the normalized predicate and the content words contained in the question sentence as the search keywords for retrieving an answer to the question sentence. Search keywords that retrieve answers to the question sentence with high accuracy can thus be obtained.

In addition, by generating search keywords in which the function words of negation, voice, tense, and modality, which relate to the event, are kept and all other function words are deleted from the question sentence, snippets containing the words needed to describe the event are retrieved preferentially.

In addition, because functional expressions that do not affect the event are deleted, the risk of retrieving snippets that cannot serve as answers is reduced, and the answer accuracy of question answering improves as a result.

In addition, when the search keywords generated by the question analysis unit 20 are used, the information search matches many snippets that contain the necessary functional expressions and no superfluous ones, so the question answering apparatus returns appropriate answers.

The present invention is not limited to the embodiment described above, and various modifications and applications are possible without departing from the gist of the invention.

For example, the predicate function expression normalization unit 202, the content word extraction unit 203, and the answer type determination unit 205 of the question answering apparatus 100 have been described as performing their processing based on the morphological analysis result from the morphological analysis unit 201, but the invention is not limited to this. For example, the question sentence input through the input unit 1 may already have been morphologically analyzed.

The question answering apparatus described above contains a computer system. Where a WWW system is used, the term "computer system" also includes the environment for providing (or displaying) web pages.

In addition, although this specification describes an embodiment in which the program is installed in advance, the program can also be provided stored on a computer-readable recording medium.

2 Calculation unit
20 Question analysis unit
100 Question answering apparatus
202 Predicate function expression normalization unit
203 Content word extraction unit
204 Search keyword generation unit

Claims

A method of generating search keywords for question answering in a search keyword generation apparatus for question answering that includes predicate function expression normalization means, content word extraction means, and search keyword generation means, the method comprising:
a step in which the predicate function expression normalization means, based on a morphological analysis result of an input question sentence, normalizes a predicate that is contained in the question sentence and consists of a combination of a content word and a functional expression, the functional expression being a character string that follows the content word and contains at least one function word, by keeping predetermined function words expressing negation, function words expressing voice, function words expressing tense, and function words expressing modality, and deleting function words other than the function words expressing negation, voice, tense, and modality;
a step in which the content word extraction means extracts a content word from the part of the question sentence other than the predicate, based on the morphological analysis result of the question sentence; and
a step in which the search keyword generation means generates, as search keywords for retrieving an answer to the question sentence, a combination of the predicate normalized by the predicate function expression normalization means and the content word extracted by the content word extraction means.

The method of generating search keywords for question answering according to claim 1, wherein, in the step of normalizing by the predicate function expression normalization means, based on the morphological analysis result of the input question sentence, the predicate that is contained in the question sentence and consists of the combination of the content word and the functional expression is normalized by deleting function words that do not affect the meaning of the predicate and redundant function words, thereby converting the functional expression into a simple form.

The method of generating search keywords for question answering according to claim 2, wherein, in the step of normalizing by the predicate function expression normalization means, predetermined function words other than the function words expressing negation, voice, tense, and modality are deleted as function words that do not affect the meaning of the predicate.

A search keyword generation apparatus for question answering, comprising:
predicate function expression normalization means for normalizing, based on a morphological analysis result of an input question sentence, a predicate that is contained in the question sentence and consists of a combination of a content word and a functional expression, the functional expression being a character string that follows the content word and contains at least one function word, by keeping predetermined function words expressing negation, function words expressing voice, function words expressing tense, and function words expressing modality, and deleting function words other than the function words expressing negation, voice, tense, and modality;
content word extraction means for extracting a content word from the part of the question sentence other than the predicate, based on the morphological analysis result of the question sentence; and
search keyword generation means for generating, as search keywords for retrieving an answer to the question sentence, a combination of the predicate normalized by the predicate function expression normalization means and the content word extracted by the content word extraction means.

The search keyword generation apparatus for question answering according to claim 4, wherein the predicate function expression normalization means, based on the morphological analysis result of the input question sentence, normalizes the predicate that is contained in the question sentence and consists of the combination of the content word and the functional expression by deleting function words that do not affect the meaning of the predicate and redundant function words, thereby converting the functional expression into a simple form.

The search keyword generation apparatus for question answering according to claim 5, wherein the predicate function expression normalization means deletes predetermined function words other than the function words expressing negation, voice, tense, and modality as function words that do not affect the meaning of the predicate.

A program for causing a computer to execute each step of the method of generating search keywords for question answering according to any one of claims 1 to 3.