JP4512826B2

JP4512826B2 - Question answering system

Info

Publication number: JP4512826B2
Application number: JP2005058390A
Authority: JP
Inventors: 敦藤井; 徹也石川; 英理三原
Original assignee: 国立大学法人筑波大学
Priority date: 2005-03-03
Filing date: 2005-03-03
Publication date: 2010-07-28
Anticipated expiration: 2025-03-03
Also published as: JP2006244102A

Description

The present invention relates to a question answering system that responds to a question from a user in some kind of trouble by answering with an action that serves as a solution.

With the spread of the Internet, a vast and diverse body of information now exists on the Web (short for World Wide Web). As a result, people increasingly use search engines to look for answers to everyday questions in information on the Web. However, using an existing search engine requires expressing the content of the question accurately as keywords. Furthermore, users must themselves find the words or passages that answer the question among the enormous number of retrieved pages.

Meanwhile, question answering systems have been studied as a way to answer questions posed in natural language. Existing research falls broadly into two types: systems that perform inference using knowledge and rules about a limited domain (inference-type systems), and systems that extract answers from an unorganized document collection (extraction-type systems). In recent years, research has focused mainly on the latter, extraction-type systems.
The conventional technology described above is explained in detail below.

Research on question answering in the field of artificial intelligence dates back to the 1960s. Specifically, this research produced expert systems and dialogue systems, which are "inference-type systems" that answer by inference. Their strengths are that they refine questions and answers through dialogue between the system and the user, and that they derive answers from multiple facts by inference. However, because domain knowledge and inference rules must be built by hand, the fields in which they can answer are limited and they are difficult to extend.

In the fields of natural language processing and information retrieval, on the other hand, research on question answering has flourished since the late 1990s at TREC (Text REtrieval Conference: http://trec.nist.gov/) and NTCIR (a project to build test collections for evaluating information retrieval systems). Unlike the inference-type systems of the artificial intelligence field, the main research subject here is the "extraction-type system". Because such a system uses an unorganized document collection as its information source, it differs from an inference-type system in that no domain knowledge or inference rules need to be built.

The question answering systems at TREC and NTCIR target objective facts (factoids). They answer with a noun in response to WH questions entered by the user, mainly who, when, where, and what. Systems that answer with objective facts include goo Lab Japanese natural-language search (http://labs.nttrd.com/) and MURAX (see Non-Patent Document 1 below). MURAX analyzes the input question to form a hypothesis about what the question concerns and what kind of answer is expected. Based on that hypothesis, it builds queries, runs searches, and finds the relevant passages in the document collection to produce an answer.

In addition, Non-Patent Document 2 below proposes a knowledge acquisition method aimed at answering "how-type questions", which ask about actions. By extracting important sentences from the bodies of question and answer e-mails posted to a mailing list for Linux users, it obtains knowledge for answering questions that ask about methods and remedies. However, the domain covered by this method is limited.
Non-Patent Document 1: Julian Kupiec. MURAX: Finding and Organizing Answers from Text Search. In T. Strzalkowski, editor, Natural Language Information Retrieval, pp. 311-332. Kluwer Academic Publishers, 1999.
Non-Patent Document 2: Yasuhiko Watanabe, Kazuya Sono, Yoshihiro Okada. "Acquisition of Knowledge for Question Answering System Using Mailing List", Research Report of Information Processing Society of Japan, 2004-NL-162, pp. 131-138, 2004.
Non-Patent Document 3: Takenobu Tokunaga, Makoto Iwayama, Kentaro Inui, Hozumi Tanaka. "Japanese word order estimation model and its application", Research Report of Information Processing Society of Japan, 91-NL-81, pp. 9-16, 1991.

Because the mainstream extraction-type systems described above answer with a noun in response to WH questions centered on who, when, where, and what, their answers do not necessarily match what the user wants to know (in particular, expressions that represent actions). Moreover, research addressing how-type questions, which ask about actions, has been limited to domains such as computer usage.
In view of the above, the present invention applies an extraction-type question answering system to the Web and provides a question answering system that can answer how-type questions, which ask about actions, without restricting the domain. In other words, it provides a help-desk-oriented question answering system that suggests actions to take in response to a question from a user in some kind of trouble.

To achieve the above object, the present invention provides the following.
[1] A question answering system comprising: a question extraction unit that is connected to the Web, searches web pages, and extracts frequently occurring question expressions as frequently asked questions; a search unit that is connected to the question extraction unit and to the Web and uses a search engine to exhaustively search for pages related to the frequently asked questions extracted by the question extraction unit; an answer extraction unit that is connected to the search unit and extracts, from the retrieved pages, expressions representing actions that answer the frequently asked questions and a plurality of paragraphs containing those expressions; an answer organization unit that is connected to the answer extraction unit and classifies the plurality of paragraphs extracted by the answer extraction unit into groups, each containing the same expression representing an action; a FAQ database unit that stores in advance, as pairs, the frequently asked questions extracted by the question extraction unit and the answers classified by the answer organization unit; a user interface device into which a question is input and from which an answer is output; and a determination unit that is connected to the FAQ database unit and the user interface device and determines whether the frequently asked questions stored in the FAQ database unit include a question identical to the question input to the user interface device, wherein, when the determination unit determines that an identical question exists, the answer corresponding to the question input to the user interface device is presented from the FAQ database unit, with the expression representing the action or a paragraph containing that expression as the unit of the answer, and, when the determination unit determines that no identical question exists, the search, answer extraction, and answer organization are performed on the question input to the user interface device to generate and present an answer.

[2] The question answering system according to [1] above, wherein a threshold on the number of characters is also used in extracting the paragraphs, and text of at least 120 and at most 200 characters is extracted as a paragraph.
[3] The question answering system according to [1] above, wherein the answer extraction unit assigns a score to each expression representing an action contained in the extracted paragraphs, and scores each paragraph by summing the scores of the expressions representing actions contained in that paragraph.

[4] The question answering system according to [3] above, wherein, in order to select expressions representing actions that are appropriate as answers, the score of an expression representing an action is assigned using the following five criteria: (a) the dependency distance is short; (b) the expression is accompanied by a recommended expression or a prohibited expression; (c) the page from which it was extracted ranks high in the search; (d) it is close to the expression representing an action contained in the question; and (e) it is an action that the questioner (user) should take.

The present invention provides the following effects.
(a) How-type questions can be answered without restricting the domain. In other words, a help-desk-oriented question answering system is provided that suggests actions to take to a user in some kind of trouble.
(b) Because the system has a FAQ database unit that stores frequently asked questions and their answers in advance, it can present answers to frequently asked questions quickly.

The question answering system of the present invention comprises: a question extraction unit that is connected to the Web, searches web pages, and extracts frequently occurring question expressions as frequently asked questions; a search unit that is connected to the question extraction unit and to the Web and uses a search engine to exhaustively search for pages related to the frequently asked questions extracted by the question extraction unit; an answer extraction unit that is connected to the search unit and extracts, from the retrieved pages, expressions representing actions that answer the frequently asked questions and a plurality of paragraphs containing those expressions; an answer organization unit that is connected to the answer extraction unit and classifies the plurality of paragraphs extracted by the answer extraction unit into groups, each containing the same expression representing an action; a FAQ database unit that stores in advance, as pairs, the frequently asked questions extracted by the question extraction unit and the answers classified by the answer organization unit; a user interface device into which a question is input and from which an answer is output; and a determination unit that is connected to the FAQ database unit and the user interface device and determines whether the frequently asked questions stored in the FAQ database unit include a question identical to the question input to the user interface device. When the determination unit determines that an identical question exists, the answer corresponding to the question input to the user interface device is presented from the FAQ database unit, with the expression representing the action or a paragraph containing that expression as the unit of the answer. When the determination unit determines that no identical question exists, the search, answer extraction, and answer organization are performed on the question input to the user interface device to generate and present an answer.

Embodiments of the present invention are described in detail below.
The system of the present invention searches for web pages related to the input question, then classifies and presents the extracted answers. It targets questions that ask about actions, of the what-if, what-to-do, and how-to-do types. To answer these questions, it focuses on "expressions representing actions" (hereinafter simply "action expressions"), each consisting of a noun phrase and a verb. For example, for the question "What should I do if I am stung by a bee?", the system answers with action expressions such as "pull out the stinger", "wash with running water", and "apply ointment", together with the paragraphs containing those action expressions.

FIG. 1 is a configuration diagram of the question answering system of the present invention.
In this figure, 1 is a user interface device, 2 is a question (question text: question information), 3 is an answer (answer text: answer information), 10 is a preprocessing unit, 11 is an answer unit, 12 is a FAQ database unit, 13 is the Web, 14 is a question extraction unit, 15 is a determination unit that determines whether a question is a frequently asked question, 16 is a search unit, 17 is an answer extraction unit, and 18 is an answer organization unit.

Because this system extracts answers by analyzing the retrieved pages, it takes longer to respond than an ordinary web search. The present invention therefore builds a database of likely questions and their answers in advance as a FAQ (Frequently Asked Questions), which makes quick answers possible.
First, the preprocessing (offline processing) that builds the FAQ database is described. It consists of four stages performed in sequence: "question extraction" by the question extraction unit 14, "search" by the search unit 16, "answer extraction" by the answer extraction unit 17, and "answer organization" by the answer organization unit 18.

First, the question extraction unit 14 extracts questions that appear frequently on the Web 13. Next, the search unit 16 uses an existing search engine to exhaustively search for pages related to the questions extracted by the question extraction unit 14. The answer extraction unit 17 then extracts paragraphs containing action expressions from the pages retrieved by the search unit 16. Finally, the answer organization unit 18 classifies the extracted paragraphs according to the action expressions they contain and stores the classified answers, paired with their questions, in the FAQ database unit 12.

Next, the online processing performed when a user uses the system is described. When the user enters a question 2 into the user interface device 1, the determination unit 15 searches the FAQ database unit 12 for an identical question. If an identical question exists, the corresponding answer is retrieved from the FAQ database unit 12 and presented. If not, then, as in the preprocessing, the search unit 16 performs the search, the answer extraction unit 17 extracts answers, and the answer organization unit 18 organizes them, and the dynamically created answer is generated in the answer unit 11 and presented.
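The online flow described here (FAQ lookup first, full pipeline only on a miss) can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `answer_question`, `FaqDatabase`, and `Answer`, the use of a dictionary as the FAQ store, and exact string equality as the "identical question" test are all assumptions made for the example.

```python
from typing import Callable, Dict, List

# Hypothetical data shapes; the source does not define concrete formats.
Answer = List[str]                 # e.g. a list of answer paragraphs
FaqDatabase = Dict[str, Answer]    # question text -> stored answer


def answer_question(
    question: str,
    faq: FaqDatabase,
    search: Callable[[str], List[str]],
    extract: Callable[[str, List[str]], List[str]],
    organize: Callable[[List[str]], Answer],
) -> Answer:
    """Return a stored FAQ answer if one exists, else build one on the fly."""
    # Role of determination unit 15: look for an identical question.
    if question in faq:
        return faq[question]
    # No identical question: search -> answer extraction -> answer organization.
    pages = search(question)
    paragraphs = extract(question, pages)
    return organize(paragraphs)
```

Passing the three stages in as callables keeps the sketch independent of any particular search engine or parser; the same three functions could be reused by the offline preprocessing.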

FIG. 2 is an example of a screen showing the answer.
The answer is presented on a screen like the one shown in FIG. 2. This is an example answer screen for the question "What should I do if I am stung by a bee?" The heading "go to the hospital" is the action expression that represents the group after organization, and the expressions below it, such as "pull out the stinger" and "get a medical examination", are other action expressions contained in this group. The paragraphs containing these action expressions are shown together with links to the pages from which they were extracted, making it easy to obtain detailed information.

That is, an "answer" in this system is a phrase expressing an action, such as "go to the hospital", and at the same time a passage of text, such as a paragraph containing the action expression. Depending on the usage environment and other factors, the user can choose what is displayed, for example a list of action-expression groups or a list of the paragraphs within a group.
The question extraction, search, answer extraction, and answer organization processes are each described below.

(1) Question extraction
A wide variety of question expressions are latent on the Web 13, ranging from questions prepared for FAQ sites to simple questions written on personal diary sites and weblogs, and these are collected.
Expressions such as "What should I do if XX does YY?" and "What if I do YY to XX?" are formalized as rules over words and parts of speech, and matching phrases and sentences are extracted. Specifically, the system extracts expressions satisfying the following rule: a noun is followed by a particle (が, に, or を) and a verb (in the 〜したら conditional form), which is followed in turn by a question mark or 「（、）どうする」 ("what should I do").
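As a rough illustration of the rule just stated, the following sketch checks a candidate phrase that has already been split into morphemes. Everything about the representation is an assumption for the example: the `Token` tuple, the part-of-speech tag names ("noun", "particle", "verb", "auxiliary"), and the way the morphological analyzer splits どうする are hypothetical and will differ between analyzers.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (surface form, part-of-speech tag); tag names are assumed

PARTICLES = {"が", "に", "を"}


def is_question_phrase(tokens: List[Token]) -> bool:
    """Check: noun, particle (が/に/を), verb in 〜たら form,
    then a question mark or (、)どうする."""
    if len(tokens) < 4:
        return False
    (_, pos0), (particle, pos1) = tokens[0], tokens[1]
    if pos0 != "noun" or pos1 != "particle" or particle not in PARTICLES:
        return False
    # Consume the verb and any auxiliaries that follow it.
    i = 2
    while i < len(tokens) and tokens[i][1] in ("verb", "auxiliary"):
        i += 1
    # The verb group must exist and end in the conditional たら (or voiced だら).
    if i == 2 or not tokens[i - 1][0].endswith(("たら", "だら")):
        return False
    # What follows must start with a question mark or (、)どうする.
    tail = "".join(surface for surface, _ in tokens[i:])
    return tail.startswith(("?", "？", "どうする", "、どうする"))
```

Because the check works on part-of-speech tags rather than on surface strings alone, differently worded phrases with the same structure match the same rule, which is in line with the recall-oriented goal of this step.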

Because morphemes are also used here, phrases that do not match the surface expression exactly can also be extracted. On the other hand, expressions other than questions (noise) may be detected. However, even if the FAQ contains noise, no real harm results unless a user asks a similar question containing that noise. Question extraction therefore prioritizes recall over precision.

The NTCIR (http://research.nii.ac.jp/ntcir/index-ja.html) web collection was used as the source data for extraction. This collection consists of 11,038,720 pages (about 100 GB) gathered from the JP domain. As a result, 6,961 phrases were extracted.
(2) Search
The search site Google is used to search the Web 13, and the input question is used as the query as is. However, scaling up the experiments will require developing a dedicated search engine suited to question answering.

(3) Answer extraction
The answer extraction unit 17 extracts paragraphs that contain action expressions and can serve as answers from the pages retrieved by the search unit 16, and scores them. Although PDF files and other formats also exist on the actual Web 13, the present invention targets only HTML files, because it needs to analyze the layout of the body text.

The extraction procedure is described in detail below.
First, paragraphs of a certain length are extracted from each retrieved page. This is because answers extracted at the word or sentence level carry too little information to explain an action, whereas answers at the page level tend to include unnecessary information. Among the HTML tags, <P> and <BR> were used to identify paragraphs.

Page authors use tags in many different ways, so identifying paragraphs by tags alone produces paragraphs of widely varying length. For example, on a page that uses the paragraph tag <P> in place of a line break, every sentence becomes its own paragraph and carries too little information. Conversely, reading a very long paragraph places a heavy burden on the user. A threshold on the number of characters was therefore used as well, and text of 120 to 200 characters was treated as a paragraph.
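A minimal sketch of this paragraph extraction step is shown below. It splits on <P> and <BR> tags and keeps only chunks of 120 to 200 characters. The function name `extract_paragraphs` is hypothetical, and simply discarding chunks outside the range is an assumption: the text does not say whether short fragments are merged with their neighbors or dropped.

```python
import re
from typing import List

MIN_CHARS, MAX_CHARS = 120, 200  # thresholds stated in the text

# Matches <p>, </p>, and <br> in any case, with or without attributes.
PARAGRAPH_BREAK = re.compile(r"(?i)</?p\b[^>]*>|<br\b[^>]*>")
ANY_TAG = re.compile(r"<[^>]+>")


def extract_paragraphs(html: str) -> List[str]:
    """Split HTML on <P>/<BR> and keep chunks of 120 to 200 characters."""
    paragraphs = []
    for chunk in PARAGRAPH_BREAK.split(html):
        text = ANY_TAG.sub("", chunk).strip()  # drop any remaining tags
        if MIN_CHARS <= len(text) <= MAX_CHARS:
            paragraphs.append(text)
    return paragraphs
```

A regular expression is enough for a sketch, but real pages with malformed markup would likely call for a proper HTML parser.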

Next, paragraphs are scored according to their appropriateness as answers. Because the action expression is the smallest unit of an answer in the system of the present invention, paragraphs containing more action expressions are more important as answers. Action expressions were therefore formalized and used for scoring.
To score a paragraph, a score is first assigned to each action expression appearing in the retrieved pages, and then the scores of the action expressions contained in the paragraph are summed.

Action expressions are described here.
A dependency analyzer is used to perform dependency analysis on the text portion of the retrieved pages. From the result, dependency structures of the form "noun + particle + verb" are extracted as the smallest unit of an action expression. For example, the sentence 「薬を患部に塗る。」 ("apply medicine to the affected area") yields two phrases: 「薬＋を＋塗る」 (medicine + を + apply) and 「患部＋に＋塗る」 (affected area + に + apply). The particle is then dropped, so that 「薬を塗る」 becomes 「薬：塗る」 (medicine: apply), and this is used as the action expression.
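The particle-dropping step can be illustrated with a small sketch. It assumes that dependency analysis has already produced (noun, particle, verb) triples; the `Dependency` tuple and the function name `to_action_expressions` are hypothetical, and an ASCII colon is used as the separator for convenience.

```python
from typing import List, Tuple

Dependency = Tuple[str, str, str]  # (noun, particle, verb), assumed parser output


def to_action_expressions(deps: List[Dependency]) -> List[str]:
    """Drop the particle from each noun + particle + verb dependency."""
    return [f"{noun}:{verb}" for noun, _particle, verb in deps]
```

For the example sentence, the two dependencies (薬, を, 塗る) and (患部, に, 塗る) become the action expressions 薬:塗る and 患部:塗る.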

The particle is dropped because, even when it changes through case alternation, as in 「蜂に人が刺される」 ("a person is stung by a bee") and 「蜂が人を刺す」 ("a bee stings a person"), the content of the action, "who does what to what", does not change, and the two are regarded as the same action expression.
This method can also treat as an action expression one in which multiple noun phrases (noun + particle) depend on the verb, as in "apply medicine to the affected area". With multiple noun phrases, however, the problem arises that "apply medicine" and "apply medicine to the affected area" are regarded as different action expressions. The experiments therefore limited each action expression to a single noun phrase.

In scoring action expressions, the following five criteria are used to select action expressions that are appropriate as answers.
(a) The dependency distance is short.
The dependency distance is the number of morphemes between the noun phrase and the verb. In Japanese, elements closer to the end of the sentence tend to restrict the meaning of the verb more strongly (see Non-Patent Document 3 above). A shorter distance also has the advantage of fewer dependency analysis errors.

(b) The expression is accompanied by a recommended expression such as 「〜すること」 ("be sure to ...") or 「〜しましょう」 ("let's ..."), or by a prohibited expression such as 「〜してはいけない」 ("do not ...").
Recommended expressions are used when describing remedies that are effective for solving a problem. Prohibited expressions are used when describing remedies that must not be taken, so they are as useful as recommended expressions. For example, for the question "What if I am stung by a bee?", 「薬を塗りましょう」 ("apply medicine") and 「病院へ行くこと」 ("go to the hospital") are recommended expressions, while 「アンモニアをかけてはいけない」 ("do not apply ammonia") and 「毒袋をつぶさないでください」 ("please do not crush the venom sac") are prohibited expressions. Such expressions are detected by matching against regular expressions.
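The regular-expression matching mentioned here might look like the sketch below. The two patterns are illustrative only: they are built from the four example sentences in the text and are not the patterns actually used, which the source does not list. The function name `has_advice_marker` is likewise hypothetical; its 1-or-0 return value corresponds to the flag for criterion (b).

```python
import re

# Illustrative patterns derived from the examples in the text (assumptions).
RECOMMENDED = re.compile(r"(こと|ましょう)[。！!]?$")
PROHIBITED = re.compile(r"(てはいけない|ないでください)[。！!]?$")


def has_advice_marker(sentence: str) -> int:
    """Return 1 if the sentence ends in a recommended or prohibited expression, else 0."""
    s = sentence.strip()
    return 1 if RECOMMENDED.search(s) or PROHIBITED.search(s) else 0
```

Anchoring the patterns at the end of the sentence reflects the fact that these markers are sentence-final in Japanese; a production pattern set would need many more endings.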

(c) The page from which the expression was extracted ranks high in the search.
The source page is the page from which the action expression was extracted. Because Google is currently used for the search, the rank of the source page means its rank in Google's search results. Through PageRank, Google ranks pages that are linked from many other pages highly. This allows the reliability of the source to be partially taken into account. This score is calculated by the following formula.

(number of retrieved pages - rank) / number of retrieved pages
(d) The expression is close, within the paragraph, to the action expression contained in the question.
This is judged from the number of morphemes between the verb in the action expression and the verb in the question. The closer the verb in the action expression is to the verb in the question, the more strongly it relates to the question, and the more likely it is to be an action expression that matters as a remedy.
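Criteria (c) and (d) both reduce to simple arithmetic, sketched below. The function names are hypothetical. Two assumptions are made: ranks are 1-based (so the top result on a 10-page result list scores 0.9), and morpheme positions are indices into the paragraph's morpheme sequence; the text does not specify either detail.

```python
def rank_score(rank: int, num_pages: int) -> float:
    """Criterion (c): (number of retrieved pages - rank) / number of retrieved pages."""
    if num_pages <= 0 or not 1 <= rank <= num_pages:
        raise ValueError("rank must be between 1 and num_pages")
    return (num_pages - rank) / num_pages


def morphemes_between(pos_a: int, pos_b: int) -> int:
    """Criterion (d): number of morphemes strictly between two morpheme positions."""
    return max(abs(pos_a - pos_b) - 1, 0)
```

The same `morphemes_between` helper could also serve for the dependency distance of criterion (a), which is likewise defined as a count of intervening morphemes.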

(e) The action is one that the questioner (user) should take.
To pick out actions whose agent is the questioner, action expressions whose subject is made explicit by the case particle 「が」 (hereinafter, the ga-case) are given a lower score. For example, in the sentence 「蜂が巣を守る。」 ("the bee protects the nest"), the agent is "the bee". The action expressions appearing in this sentence, 「蜂が守る」 ("the bee protects") and 「巣を守る」 ("protect the nest"), both describe actions performed by the bee and are not appropriate as answers to the questioner.
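One way to read this criterion is sketched below: every action expression whose verb has an explicit が-marked subject in the same sentence is flagged, which matches the example where both expressions from the bee sentence are rejected. This reading is an interpretation, and the `Dependency` tuple and the function name `ga_case_flags` are hypothetical. The input is assumed to be the (noun, particle, verb) dependencies of a single sentence.

```python
from typing import Dict, List, Tuple

Dependency = Tuple[str, str, str]  # (noun, particle, verb), from one sentence


def ga_case_flags(deps: List[Dependency]) -> Dict[str, int]:
    """Map each action expression 'noun:verb' to 1 if its verb has an explicit が subject."""
    verbs_with_subject = {verb for _noun, particle, verb in deps if particle == "が"}
    return {
        f"{noun}:{verb}": int(verb in verbs_with_subject)
        for noun, _particle, verb in deps
    }
```

For the bee sentence, both 蜂:守る and 巣:守る receive the flag 1, whereas 薬:塗る from a sentence with no が-marked subject receives 0.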

The score of each action expression is calculated by combining criteria (a) to (e) above according to Equation (1) below.

Here, x is an action expression appearing in the paragraph. a_x is the dependency distance, and d_x is the distance from the question. b_x takes the value 1 or 0 depending on whether a recommended or prohibited expression is present. c_x is the value calculated in (c) above. e_x takes the value 1 or 0 depending on whether the ga-case is present.

The score of a paragraph P is obtained by summing the scores of the action expressions contained in P, according to Equation (2) below.
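Going only by the verbal description (the score of P is the sum of the scores of the action expressions it contains), the paragraph score can be written as follows. The notation Score(·) is an assumption chosen for this illustration, with Score(x) standing for the per-expression score of Equation (1).

```latex
\mathrm{Score}(P) = \sum_{x \in P} \mathrm{Score}(x)
```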

(4) Answer organization
The answer organization unit 18 classifies the paragraphs extracted by the answer extraction method described above. Paragraphs containing the same action expression are placed in the same group. However, action expressions with low scores act as noise and are removed beforehand. At present, based on experience, only action expressions with a score of 2 or higher are used for classification. When presenting answers to the user, showing the highest-scoring paragraph and action expression for each group saves the user the trouble of reading several similar paragraphs and makes it easier to get an overview of what groups of answers exist.
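The grouping step can be sketched as follows. This is a simplification under stated assumptions: each group is keyed by a single action expression, the input tuple layout and the function name `organize` are hypothetical, and only the score threshold of 2 comes from the text.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

MIN_EXPR_SCORE = 2.0  # empirical threshold given in the text

# (paragraph text, paragraph score, {action expression: expression score})
ScoredParagraph = Tuple[str, float, Dict[str, float]]


def organize(paragraphs: List[ScoredParagraph]) -> Dict[str, str]:
    """Group paragraphs by shared action expression.

    Returns {action expression: highest-scoring paragraph in its group}.
    """
    groups: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for text, para_score, expr_scores in paragraphs:
        for expr, score in expr_scores.items():
            if score >= MIN_EXPR_SCORE:  # drop low-scoring (noisy) expressions
                groups[expr].append((para_score, text))
    # For each group, keep the paragraph with the highest paragraph score.
    return {expr: max(members)[1] for expr, members in groups.items()}
```

Returning one representative paragraph per group mirrors the presentation described above, where the user sees the best paragraph for each group rather than every similar paragraph.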

In addition, action expressions contained in the same group are in a dependent relationship, whereas action expressions that do not belong to the same group are in a mutually exclusive relationship. Such relationships between action expressions provide clues for extracting the ordering and dependency relationships among actions.
(Evaluation experiment)
An evaluation experiment conducted on the question answering system of the present invention is described below.

Because how-type question answering systems have hardly been studied, no evaluation method has been established. A test collection was therefore created by hand, and the evaluation focused on answer extraction, arguably the most important function of the system of the present invention, and in particular on the scoring of action expressions.
The reason is that the system groups paragraphs on the basis of high-scoring action expressions and presents the highest-scoring action expression in each group to the user as its representative (for example, "go to the hospital" in FIG. 2), so it is important that action expressions appropriate as answers score higher than the others. The scoring of action expressions was therefore evaluated from the viewpoint of whether the five criteria (a) to (e) proposed above are effective.

As questions, the following 13 phrases with high question frequency were selected from the phrases extracted by question extraction.
(1) if you forget your password, (2) if you have an accident, (3) if you get lost, (4) if you suddenly fall ill, (5) if you are a victim of theft, (6) if you are stung by a bee, (7) if you are injured, (8) if you get acne, (9) if you have a seizure, (10) if you lose your passport, (11) if you are sexually harassed, (12) if you get burned, (13) if a fire breaks out
For these questions, the appropriateness of each extracted action expression was judged by hand. Appropriate expressions were additionally judged as to whether they were prohibition expressions, and were treated as correct answers.

For the action expressions ranked by score, the F value was calculated using the precision and recall defined below.
Here, the F value (F-measure) is the harmonic mean of recall and precision. It is an evaluation measure that takes both into account: the larger the recall and precision, the larger the F value.

Now, if all possible action expressions for a given question are classified according to "whether the system output the action expression as an answer" and "whether it is a correct action expression for the question", the following contingency table can be constructed.

Output by the system, correct: A
Not output by the system, correct: B
Output by the system, incorrect: C
Not output by the system, incorrect: D

Here, A to D are the numbers of action expressions in the corresponding cells, and A + B + C + D is the total number of action expressions. Cells A to C correspond to the following cases.
A: success, B: failure (missed extraction), C: failure (erroneous extraction)
Based on the above contingency table, recall and precision are calculated by the following formulas.

Recall = A / (A + B)
Precision = A / (A + C)
Because the F value is the harmonic mean of recall (R) and precision (P), it is calculated as follows.
First, the arithmetic mean of the reciprocals of R and P is computed.

(1/R + 1/P) / 2 = ((A + B) / A + (A + C) / A) / 2 = (2A + B + C) / (2A)

Taking the reciprocal then gives F value = 2A / (2A + B + C).
That is, the F value expresses the proportion of "successes" among "successes and failures (missed extractions and erroneous extractions)".

In summary:
Precision = number of correct answers output / total number of action expressions output
Recall = number of correct answers output / number of correct answers among all action expressions
F value = 2 × recall × precision / (recall + precision)
(Results and discussion)
To examine how the F value changes with the number of answers, the number of action expressions was increased step by step and the F value was calculated for up to the top 200 action expressions. For every question used in the evaluation, the F value was calculated at each number of action expressions, and the averages were used for the comparative evaluation.
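The evaluation procedure described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the evaluation code used in the experiment: the function names and data layout are hypothetical, each ranked list is assumed to contain no duplicates, and the set of correct expressions is assumed to include correct expressions that the system never output.

```python
def f_value(ranked, correct, k):
    """F value for the top-k action expressions of one question.

    ranked:  action expressions ordered by descending score
    correct: set of action expressions judged correct for the question
    """
    output = ranked[:k]
    a = sum(1 for x in output if x in correct)  # A: output and correct
    b = len(correct) - a                        # B: correct but not output
    c = len(output) - a                         # C: output but incorrect
    denom = 2 * a + b + c
    # F = 2A / (2A + B + C); defined as 0 when nothing is output or correct.
    return 2 * a / denom if denom else 0.0


def mean_f_curve(questions, max_k=200):
    """Average F value over all questions at each cutoff 1..max_k.

    questions: list of (ranked, correct) pairs, one per question.
    """
    return [
        sum(f_value(ranked, correct, k) for ranked, correct in questions)
        / len(questions)
        for k in range(1, max_k + 1)
    ]


ranked = ["a", "b", "c", "d"]
correct = {"a", "c", "e"}
curve = mean_f_curve([(ranked, correct), (["x"], {"x"})], max_k=2)
```

Plotting such a curve for each ranking condition, with the cutoff on the horizontal axis, gives one series per condition for comparison.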

The results are shown in FIG. 3. The vertical axis of the graph is the average F value, and the horizontal axis is the number of action expressions. Each series uses the values evaluated under the following conditions, averaged over all questions.
all: ranking by the score using all of criteria (a) to (e)
Pa: ranking by the score using criterion (a), "dependency distance"
Pb: ranking by the score using criterion (b), "presence or absence of recommendation or prohibition expressions"
Pc: ranking by the score using criterion (c), "rank in the search"
Pd: ranking by the score using criterion (d), "distance from the action expression contained in the question"
Pe: ranking by the score using criterion (e), "action expression modified by a ga-case noun"
For comparison, the results evaluated under the following condition are also shown.
freq: ranking by frequency of occurrence

First, comparing freq with the other conditions, every criterion except criterion (d) raised the average F value and improved the results when used on its own, so the criteria other than (d) can be considered effective even when used alone.
Next, comparing all with Pa to Pe, all gave better results regardless of the number of action expressions output. In other words, combining the criteria was found to have a synergistic effect.

Although the evaluation experiment above examined how-type questions, the present invention is in principle also applicable to where-to-do and when-to-do questions, and can answer "questions that ask about actions" from a broad range of viewpoints.
The present invention is not limited to the embodiment described above. Various modifications are possible within the spirit of the present invention, and they are not excluded from its scope.

The question answering system of the present invention can be used as a question answering system capable of answering questions that ask about actions without restricting the domain.

FIG. 1 is a configuration diagram of a question answering system according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of an answer screen in the question answering system according to the embodiment of the present invention.
FIG. 3 is a graph of the F value under each condition in the experimental example of the present invention.

Description of symbols
1 User interface device
2 Question (question text: question information)
3 Answer (answer text: answer information)
10 Pre-processing unit
11 Answer unit
12 FAQ database unit
13 Web
14 Question extraction unit
15 Determination unit
16 Search unit
17 Answer extraction unit
18 Answer organization unit

Claims

A question answering system comprising:
(a) a question extraction unit that is connected to the web and extracts, as frequently asked questions, question expressions that appear frequently on web pages;
(b) a search unit that is connected to the question extraction unit and to the web and uses a search engine to comprehensively search for pages related to the frequently asked questions extracted by the question extraction unit;
(c) an answer extraction unit that is connected to the search unit and extracts, from the retrieved pages, expressions representing actions that are answers to the frequently asked questions, together with a plurality of paragraphs containing the expressions representing the actions;
(d) an answer organization unit that is connected to the answer extraction unit and classifies the plurality of paragraphs extracted by the answer extraction unit into groups, each containing expressions representing the same action;
(e) an FAQ database unit that stores in advance pairs of the frequently asked questions extracted by the question extraction unit and the answers classified by the answer organization unit;
(f) a user interface device for inputting a question and outputting an answer; and
(g) a determination unit that is connected to the FAQ database unit and the user interface device and determines whether the frequently asked questions stored in the FAQ database unit include the same question as the question input to the user interface device,
(h) wherein, when the determination unit determines that the same question exists, an answer corresponding to the question input to the user interface device is presented as an expression representing the action or as a paragraph containing an expression representing the action, and, when the determination unit determines that the same question does not exist, the search, answer extraction, and answer organization are performed on the question input to the user interface device to generate and present an answer.

The question answering system according to claim 1, wherein a threshold on the number of characters is used in combination when extracting the paragraphs, and passages of 120 characters or more and 200 characters or less are extracted as paragraphs.

The question answering system according to claim 1, wherein the answer extraction unit scores the expressions representing the actions contained in the extracted plurality of paragraphs, and scores each paragraph by summing the scores of the expressions representing the actions contained in that paragraph.

The question answering system according to claim 3, wherein, in order to select expressions representing actions that are appropriate as answers, the expressions representing the actions are scored using the following five criteria:
(a) the dependency distance is short;
(b) a recommendation expression or a prohibition expression accompanies the expression;
(c) the source page ranks high in the search;
(d) the distance from the expression representing the action contained in the question is short; and
(e) the action is one to be taken by the questioner (user).