JP4145776B2

JP4145776B2 - Question answering apparatus and question answering method

Info

Publication number: JP4145776B2
Application number: JP2003400110A
Authority: JP
Inventors: 佳美齋藤
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2003-11-28
Filing date: 2003-11-28
Publication date: 2008-09-03
Anticipated expiration: 2023-11-28
Also published as: JP2005165416A

Description

The present invention relates to a question answering apparatus that outputs an answer to a question input by a user, and more particularly to a question answering apparatus that finds the answer in natural language documents.

Document retrieval techniques that retrieve and rank documents matching a user's search request, typified by Internet search engines, are widely used. Document retrieval can satisfy requests such as "I want to read newspaper articles about XX" or "I want to see web pages about XX". However, it cannot directly answer questions such as "Who is the president of Company XX?", "How high is Mt. Fuji?", or "Are whales becoming extinct?". Because document retrieval outputs only documents or passages within documents, the user must find the answer in the output unaided.

A question answering apparatus outputs answers to questions such as those above directly. For example, in response to the question "Who is the president of Company XX?", a question answering apparatus outputs the name of the president of Company XX, rather than documents about Company XX such as its website. For a question such as "How high is Mt. Fuji?", it outputs an answer such as "Mt. Fuji is 3,776 m above sea level."

In recent years, question answering apparatuses have attracted attention as an extension of research on information retrieval and information extraction. It has become possible to return reasonably direct answers to users' questions. For example, a question answering apparatus is known that outputs, for a user's question, an answer together with a document that lets the user confirm that the answer actually answers the question (hereinafter called a rationale document) (see, for example, Patent Document 1).

Here, a rationale document is, for example, the source document from which the question answering apparatus extracted the answer to the question.
Patent Document 1: JP 2002-132812 A

However, because the input question is written in natural language, it is not necessarily simple. A complex question such as "Which confectionery manufacturer has its head office in Niigata?" may also be input. For such a question, a single document does not necessarily contain all the information showing that a given answer candidate answers it. That information may instead be scattered across several documents.

Suppose, for example, that one document states that Company A's head office is in Niigata and a separate document states that Company A is a confectionery manufacturer. In that case, searching for answer candidates with the input question as-is may push that candidate (Company A in this example) out of the top of the answer candidate list, so the correct answer cannot be output. For this problem of information stored in several locations, the field of database search has a known solution: splitting the query into several search conditions.

However, a question answering apparatus that finds answers in natural language documents differs from database search. In database search, the format and relations of the searched data are fixed in advance. Here they are unknown, so it is not always clear when a question should be divided. Moreover, database search carries little risk of retrieving wrong answers, whereas here retrieved wrong answer candidates may push the correct ones out of the candidate list. Depending on the expressions contained in the searched documents, querying with the undivided question sometimes extracts the more appropriate answer. In other cases, querying with the divided questions extracts the more appropriate answer.

Conventional question answering apparatuses therefore drop correct answer candidates from the candidate list when the information making up the answer is distributed across several documents.

In view of the above problems, an object of the present invention is to provide a question answering apparatus that can output appropriate answer candidates even when the information making up the answer is distributed across several documents.

To achieve the above object, the present invention provides a question answering apparatus that extracts answers to an input question from a plurality of documents and outputs them. The apparatus comprises the following means. Question dividing means divides the input question into a plurality of divided questions. Answer candidate storage means analyzes the question and the divided questions, obtains answer candidates from the plurality of documents, and stores each obtained answer candidate paired with the question or divided question that produced it. Answer candidate evaluation means evaluates the answer candidates stored by the answer candidate storage means and assigns the resulting evaluation score to each pair. Merging means merges the pairs obtained from the divided questions, per answer candidate. Answer output means outputs a predetermined number of answer candidates as answers, in descending order of the evaluation scores of the finally obtained pairs.

According to the present invention, a question answering apparatus can be provided that outputs appropriate answer candidates even when the information making up the answer is distributed across several documents.

Embodiments of the present invention will now be described in detail with reference to the drawings.
FIG. 1 is a block diagram showing a question answering apparatus according to an embodiment of the present invention.
The question answering apparatus comprises the following components:
- an input unit 101 for inputting a question
- a control unit 102 that controls the entire apparatus
- an output unit 103 that outputs the answer results
- a divided question generation unit 111 that divides a question
- a question list 112 that manages the input question and the divided questions
- an answer candidate holding unit 113 that holds answer candidates, their evaluation scores, and information on their rationale documents
- a question pattern database 121
- an answer type determination unit 122 that determines the answer type of a question
- a document search unit 123 that searches with a question and scores the documents
- a document database 124 storing natural language documents
- an answer expression extraction unit 125 that extracts expressions that may be answers from the documents to be searched and assigns them answer categories
- an expression category database 126
- an answer expression database 127
- an answer candidate generation unit 128 that selects answer candidates from the search results, the answer type, and the answer expressions, and assigns them evaluation scores

The question list 112, the answer candidate holding unit 113, the question pattern database 121, the document database 124, the expression category database 126, and the answer expression database 127 each comprise a storage medium such as memory or a hard disk drive.

The question answering apparatus may also be implemented on a computer whose hardware consists of a processor, ROM, RAM, and the like. In that case, each of the above functions is realized by the processor executing a computer program that the computer reads from a storage medium.

The overall processing flow of the question answering apparatus is described below with reference to FIG. 1.
Beforehand, the answer expression extraction unit 125 compares the documents to be searched, which are registered in the document database 124, with the expressions registered in the expression category database 126. It extracts expressions that may become answers (answer expressions) together with answer type information. It then stores the answer expression string, the ID of the document in which it appears, its position, and its answer type in the answer expression database 127.

The answer expression extraction unit 125 can apply established techniques such as named entity (proper noun) extraction or ontology taggers. Because this is not a characteristic part of the present invention, a detailed description is omitted. Answer expression extraction may operate on the surface form of the target documents, on the results of morphological analysis, or on the results of syntactic/dependency analysis.
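The following minimal Python sketch shows the kind of rows the answer expression database 127 is described as holding. It uses plain dictionary lookup on the surface text rather than the named entity extraction or ontology tagging referred to above. The names `AnswerExpression` and `extract_answer_expressions`, and the choice of substring matching, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnswerExpression:
    text: str         # answer expression string
    doc_id: str       # ID of the document it appears in
    position: int     # character offset within the document
    answer_type: str  # answer category, e.g. "ORGANIZATION"


def extract_answer_expressions(documents, expression_categories):
    """Find every occurrence of each dictionary expression in each document.

    documents: dict doc ID -> text (stand-in for the document database 124)
    expression_categories: dict expression -> answer type (stand-in for 126)
    Returns AnswerExpression rows (stand-in for the answer expression database 127).
    """
    rows = []
    for doc_id, text in documents.items():
        for expr, answer_type in expression_categories.items():
            start = text.find(expr)
            while start != -1:  # collect all occurrences, not just the first
                rows.append(AnswerExpression(expr, doc_id, start, answer_type))
                start = text.find(expr, start + 1)
    return rows
```

For example, with documents `{"D1": "Company A is headquartered in Niigata.", "D2": "Company A makes rice crackers."}` and categories `{"Company A": "ORGANIZATION", "Niigata": "LOCATION"}`, the sketch yields three rows: "Company A" in D1 and D2 at offset 0, and "Niigata" in D1 at offset 30.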

The user inputs a question, consisting of keywords or the like, via the input unit 101.
The control unit 102 registers the question input from the input unit 101 in the first row of the question list 112. It then outputs the question, identified by its question ID, to the divided question generation unit 111, the answer type determination unit 122, and the document search unit 123.

The answer type determination unit 122 compares the question input from the control unit 102 with the expressions registered in the question pattern database 121 to determine the answer type. It outputs the resulting answer type information to the control unit 102. The answer type determination unit 122 can use, for example, the technique of JP 2002-132812 A. Because this is not a main part of the present invention, a detailed description is omitted.
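The following is a minimal sketch of pattern-based answer type determination. It assumes the question pattern database 121 can be modeled as an ordered list of regular expressions, each paired with an answer type. The patterns, the type labels (`PERSON`, `LENGTH`, `ORGANIZATION`), and the function name are hypothetical. The patent defers the actual procedure to JP 2002-132812 A.

```python
import re

# Hypothetical stand-in for the question pattern database 121:
# (compiled pattern, answer type) pairs, tried in order.
QUESTION_PATTERNS = [
    (re.compile(r"^who\b", re.IGNORECASE), "PERSON"),
    (re.compile(r"\bhow (tall|high)\b", re.IGNORECASE), "LENGTH"),
    (re.compile(r"\b(company|manufacturer)\b", re.IGNORECASE), "ORGANIZATION"),
]


def determine_answer_type(question, patterns=QUESTION_PATTERNS, default="UNKNOWN"):
    """Return the answer type of the first pattern that matches the question."""
    for pattern, answer_type in patterns:
        if pattern.search(question):
            return answer_type
    return default
```

Because the patterns are tried in order, more specific patterns should appear earlier in the list.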

The document search unit 123 extracts search keywords from the question input from the control unit 102, performs a keyword search on the document database 124 (the documents to be searched), and obtains the keyword search results. For a predetermined number of the retrieved documents, taken in descending order of document score, it outputs the following to the control unit 102: each document ID and its document score, the matched search keywords, and the positions at which those keywords occur. Any of various known techniques may be used to compute the document score, for example giving a higher score to a document in which the keywords occur more frequently. The keyword search that produces the document score may use, for example, the techniques of JP 2002-132811 A or of the literature (Takenobu Tokunaga, Information Retrieval and Language Processing, University of Tokyo Press, 1994).
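The frequency-based document score mentioned above can be sketched as a plain term-frequency count. This sketch assumes the documents are held in memory as a dict from ID to text. `keyword_search` and `top_k` are hypothetical names. A real system would normally use an inverted index and a weighting scheme such as those in the cited literature.

```python
def keyword_search(documents, keywords, top_k=10):
    """Score each document by the total number of keyword occurrences.

    Returns up to top_k tuples (doc_id, score, {keyword: [positions]}),
    sorted by descending score; documents with no match are dropped.
    """
    results = []
    for doc_id, text in documents.items():
        hits = {}
        for kw in keywords:
            positions = []
            start = text.find(kw)
            while start != -1:  # record every occurrence position
                positions.append(start)
                start = text.find(kw, start + 1)
            if positions:
                hits[kw] = positions
        score = sum(len(p) for p in hits.values())
        if score > 0:
            results.append((doc_id, score, hits))
    results.sort(key=lambda r: r[1], reverse=True)
    return results[:top_k]
```

The function returns the matched keywords and their positions along with the document IDs and scores, which corresponds to the four items this unit is described as outputting.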

The control unit 102 passes two sets of results to the answer candidate generation unit 128. The first is the result of the answer type determination unit 122 (the answer type information). The second comes from the document search unit 123, which produces four items (the document ID, its document score, the search keywords, and their positions); of these, only the document ID and its document score are passed on.

The answer candidate generation unit 128 searches the answer expression database 127 using the answer type information, the document IDs, and their document scores input from the control unit 102 as keys. From this search it obtains answer candidate strings (answer candidate expressions) and their positions of occurrence.

The answer candidate generation unit 128 calculates the distance between each answer candidate string (answer candidate expression) and each keyword matched in the search of the answer expression database 127 that produced it. Using a predetermined formula, it then assigns each answer candidate string an evaluation score as an answer. For example, the score can be the sum of the reciprocals of the distances to the matched keywords. The question answering field offers various methods for giving higher evaluation scores to more plausible answer candidates. Because these are not a main part of the present invention, a detailed description is omitted.
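The reciprocal-distance example can be written as score(c) = Σ_k 1/d(c, k), where d(c, k) is the distance between candidate c and matched keyword k. The sketch below makes three assumptions: the distance is the difference of character offsets, floored at 1 to avoid division by zero; keyword positions per document are available; and answer rows are (text, doc ID, position, type) tuples. `evaluation_score` and `generate_candidates` are hypothetical names.

```python
def evaluation_score(candidate_pos, keyword_positions):
    """Sum of 1/distance from the candidate to each matched keyword occurrence.

    Distance is the absolute difference of start offsets (an assumption),
    floored at 1 so an overlapping keyword does not divide by zero.
    """
    return sum(1.0 / max(1, abs(candidate_pos - kw_pos)) for kw_pos in keyword_positions)


def generate_candidates(answer_rows, answer_type, doc_hits):
    """Select answer expressions of the requested type in retrieved documents and score them.

    answer_rows: (text, doc_id, position, answer_type) tuples (stand-in for database 127)
    doc_hits: dict doc_id -> list of matched keyword positions in that document
    Returns (text, doc_id, score) tuples.
    """
    out = []
    for text, doc_id, pos, a_type in answer_rows:
        if a_type == answer_type and doc_id in doc_hits:
            out.append((text, doc_id, evaluation_score(pos, doc_hits[doc_id])))
    return out
```

With this formula, a candidate that appears close to several matched keywords scores higher than one that appears near a single distant keyword.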

For question answering techniques in general, known techniques such as those of JP 2003-108583 A, JP H7-93351 A, and JP H7-192020 A may be used.

Next, the answer candidate generation unit 128 groups the obtained answer candidate strings (answer candidate expressions) by surface string. From each group it keeps only the candidate with the highest evaluation score and deletes the others.
The answer candidate generation unit 128 then outputs each answer candidate string (answer candidate expression) to the control unit 102, together with its question ID (its number in the question list 112), its evaluation score, and its document ID list.
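The deduplication step above can be sketched as follows. This sketch assumes candidates are (string, document ID, score) tuples and returns the survivors in descending score order for convenience. `dedupe_by_surface` is a hypothetical name.

```python
def dedupe_by_surface(candidates):
    """Keep only the highest-scoring candidate for each surface string.

    candidates: iterable of (text, doc_id, score) tuples.
    Returns the surviving tuples sorted by descending score.
    """
    best = {}
    for text, doc_id, score in candidates:
        if text not in best or score > best[text][2]:
            best[text] = (text, doc_id, score)
    return sorted(best.values(), key=lambda c: c[2], reverse=True)
```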

The control unit 102 outputs to the answer candidate holding unit 113 the information input from the answer candidate generation unit 128. This information consists of the answer candidate strings (answer candidate expressions), their question IDs (question IDs in the question list 112), their evaluation scores, and their document ID lists. The data format stored by the answer candidate holding unit 113 is described in detail later.

The answer candidate holding unit 113 holds this information input from the control unit 102.
The divided question generation unit 111 determines whether the question input from the control unit 102 can be divided. If it can, the unit generates divided questions from it and outputs them to the control unit 102.

For example, the divided question generation unit 111 parses the input question and decides from the structure of the syntax tree whether it can be divided, generating divided questions accordingly. Alternatively, it may use regular expressions both to decide whether the question can be divided and to generate the divided questions. For example, if the parse shows that the subject of the question is modified by several adnominal (noun-modifying) clauses, the question may be divided into one question per modifying clause. Likewise, if the question contains a punctuation mark partway through, it may be divided before and after that mark to generate separate questions.
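The punctuation-based division can be sketched for Japanese input, which uses the ideographic comma "、". This sketch relies on a hypothetical heuristic not stated in the source: the head noun phrase (here "会社は？", "which company?") is taken to be the text after the last "の" in the final segment, and it is appended to each earlier segment so that every divided question is complete. It does not attempt to reproduce the exact rephrasing shown in the patent's example. `split_at_punctuation` is a hypothetical name.

```python
def split_at_punctuation(question):
    """Divide a Japanese question at "、"; return [] if it is not divisible.

    Hypothetical heuristic: the head noun phrase is whatever follows the last
    "の" in the final segment, and it is appended to each earlier segment.
    """
    segments = [s for s in question.split("、") if s]
    if len(segments) < 2:
        return []  # no punctuation partway through: not divisible
    last = segments[-1]
    if "の" not in last:
        return []  # cannot locate a head noun phrase with this heuristic
    head = last.rsplit("の", 1)[1]
    return [seg + head for seg in segments[:-1]] + [last]
```

For the example question "新潟にある、お菓子製造の会社は？", the sketch returns "新潟にある会社は？" ("Which company is in Niigata?") and "お菓子製造の会社は？" ("Which company manufactures confectionery?"). A syntax-tree approach as in FIG. 2 would be more robust than this string heuristic.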

FIGS. 2 and 3 schematically show examples of generating divided questions from a question. FIG. 2 shows an example in which the input question is parsed and divided according to the syntax tree structure. FIG. 3 shows an example in which a question containing a punctuation mark partway through is divided before and after that mark.

The divided question generation unit 111 divides the question "Which confectionery manufacturer is in Niigata?" input from the control unit 102. It generates two divided questions, "Which company has its head office in Niigata?" and "Which company manufactures confectionery?", and outputs them to the control unit 102.

The following describes how the control unit 102 proceeds after it receives the divided questions from the divided question generation unit 111.
The control unit 102 registers all the divided questions input from the divided question generation unit 111 in the question list 112.
Next, the control unit 102 sends the divided questions registered in the question list 112, one at a time, to the answer type determination unit 122 and the document search unit 123.
The answer type determination unit 122 and the document search unit 123 process a divided question exactly as they process an input question, and output their results to the control unit 102.
Next, the control unit 102 outputs the results of the answer type determination unit 122 and the document search unit 123 to the answer candidate generation unit 128.
The answer candidate generation unit 128 likewise processes a divided question exactly as it processes an input question, and outputs its result to the control unit 102.
Next, the control unit 102 outputs the information input from the answer candidate generation unit 128 to the answer candidate holding unit 113.
The answer candidate holding unit 113 holds this information input from the control unit 102.
The control unit 102 repeats the same processing for every divided question.
When all the divided questions have been processed, the control unit 102 reads the answer candidate strings (answer candidate expressions) stored in the answer candidate holding unit 113 one at a time. It finds answer candidates that several divided questions have in common and updates their evaluation scores. This evaluation score update procedure is described in detail later.

Next, the control unit 102 reads the answer candidate strings (answer candidate expressions) held in the answer candidate holding unit 113, together with their evaluation scores, and sorts the answer candidates in descending order of evaluation score.

Finally, the control unit 102 takes the top n of the sorted answer candidates, where n is predetermined, and outputs them to the output unit 103. Although n is predetermined in this example, it need not be fixed and may vary. The output may also be restricted to candidates whose evaluation score is at or above a given threshold.
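The final selection step, with a variable n and an optional score threshold, can be sketched as follows. This sketch assumes candidates are (string, score) pairs. `select_answers`, `n`, and `min_score` are hypothetical names.

```python
def select_answers(candidates, n=5, min_score=None):
    """Return the top-n answer strings by descending score.

    candidates: list of (text, score) pairs.
    min_score: optional threshold; candidates scoring below it are dropped first.
    """
    pool = [c for c in candidates if min_score is None or c[1] >= min_score]
    pool.sort(key=lambda c: c[1], reverse=True)
    return [text for text, _ in pool[:n]]
```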

The output unit 103 comprises a display and shows the answer candidates input from the control unit 102 on it. An example of the display method is described in detail later.

This concludes the outline of the processing flow of the question answering apparatus.
The operation of the apparatus is now described in detail, using as an example the question from FIGS. 2 and 3, "Which confectionery manufacturer is in Niigata?".
FIG. 4 shows the data format of the question list 112.
Each row of the question list 112 consists of three items: question ID, type, and text. The first row holds question ID "001", type "input" (indicating the input question), and the text "Which confectionery manufacturer is in Niigata?". The second row holds question ID "002", type "divided" (indicating a divided question), and the text of one divided question, "Which company has its head office in Niigata?". The third row holds question ID "003", type "divided", and the text of the other divided question, "Which company manufactures confectionery?".

Each record in the answer candidate holding unit 113 consists of five items: the answer candidate string (answer candidate expression), its question ID (a question ID in the question list 112 shown in FIG. 4), the evaluation score, the document ID, and a flag. The flag marks data that has already been merged into other data and is no longer needed.
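The two record formats just described (FIG. 4 and the answer candidate holding unit 113) can be modeled with dataclasses. This is a sketch under these assumptions: question IDs in candidate records are stored as integers, so that the flowchart's "question ID value ≥ 2" test is a plain comparison; document IDs are kept as a list; and flag 1 means "still in use". The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QuestionRow:
    """One row of the question list 112 (FIG. 4)."""
    question_id: str  # e.g. "001"
    kind: str         # "input" or "divided"
    text: str


@dataclass
class CandidateRow:
    """One record of the answer candidate holding unit 113."""
    answer: str              # answer candidate string (answer candidate expression)
    question_ids: List[int]  # one question ID, or several after merging
    score: float             # evaluation score
    doc_ids: List[str]       # rationale document IDs
    flag: int = 1            # 1 = in use; 0 = merged into other data, no longer needed


# The question list of FIG. 4 for the running example.
QUESTION_LIST = [
    QuestionRow("001", "input", "Which confectionery manufacturer is in Niigata?"),
    QuestionRow("002", "divided", "Which company has its head office in Niigata?"),
    QuestionRow("003", "divided", "Which company manufactures confectionery?"),
]
```

A candidate for divided question 002 could then be recorded as `CandidateRow("Company A", [2], 0.8, ["D1"])`, where the values are illustrative only.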

First, the control unit 102 stores the answer candidates for the input question (question ID = 001), "Which confectionery manufacturer is in Niigata?", in the answer candidate holding unit 113 with flag = 1.

FIG. 5A shows the data in the answer candidate holding unit 113 at this point.
Next, the control unit 102 stores in the answer candidate holding unit 113 the answer candidates for one divided question (question ID = 002), "Which company has its head office in Niigata?" (see rows 6 to 8 of FIG. 5B). It also stores the answer candidates for the other divided question (question ID = 003), "Which company manufactures confectionery?" (see rows 9 to 11 of FIG. 5B). Again, flag = 1.

FIG. 5B shows the data in the answer candidate holding unit 113 at this point. It differs from FIG. 5A only in that the data for the divided questions (question IDs 002 and 003) has been added.

At this point, the answer candidates for all questions (the input question and the divided questions) are stored in the answer candidate holding unit 113.
The following describes, with reference to the flowchart of FIG. 6, how the control unit 102 proceeds once these answer candidates have been stored.

First, the number of answer candidates stored when processing starts is assigned to the variable cnt (step S601).
The variables n and i are each initialized to 1 (n = 1, i = 1) (step S602).
Step S611 checks whether n is greater than cnt (n > cnt?). If the result is Yes, all data in the answer candidate holding unit 113 is checked for data that has the same answer candidate string but a different question ID (or question ID list, i.e. several question IDs listed together when there is more than one). If such data is found, the flag of the one with the lower evaluation score is changed to 0 (step S691), and processing ends. If the result of step S611 is No, processing proceeds to step S612.

In other words, step S691 checks all data in the answer candidate holding unit 113 for data that has the same answer candidate string but a different question ID or question ID list. If such data is found, the flag of the data with the lower evaluation score is changed to 0.

If the result of step S611 is No (not all answer candidates have been processed yet), step S612 checks the answer candidate in row n. The test is whether its question ID value is 2 or more (i.e. it comes from a divided question) and its flag is 1 (i.e. it has not been merged): question ID value ≥ 2 and flag = 1? If the result of step S612 is Yes, the row holds a new answer candidate string that has not yet been merged. The variable p is therefore initialized to 0 (p = 0) in preparation for the merging that follows (step S651). If the result is No, n is incremented by 1 (n = n + 1) in step S613, and processing returns to step S611.

Following step S651, the answer candidate character string in the n-th row of the answer candidate holding unit 113 is assigned to the variable a (step S652).
Following step S652, the variable m is initialized to n + 1 (m = n + 1) (step S653). The variable m indicates which row of the answer candidate holding unit 113 is compared with the n-th row; its initial value is n + 1 because the n-th row is compared only with rows n + 1 onward.

Following step S653, it is determined whether the variable m is greater than the variable cnt (m > cnt?) (step S654), that is, whether merging for the answer candidate character string in the n-th row is complete. If Yes (merging for the n-th row is complete), n is incremented by 1 (n = n + 1) in step S613, and the process returns to step S611. If No (merging for the n-th row is not yet complete), the answer candidate character string in the m-th row of the answer candidate holding unit 113 is assigned to the variable b (step S655).

Following step S655, it is determined whether the variable a equals the variable b (a = b?) (step S656). If Yes (a = b), the process proceeds to step S657. If No (a ≠ b), m is incremented by 1 (m = m + 1) in step S662, and the process returns to step S654.

Next, it is determined whether the variable p is 0 (step S657). If Yes, the information of the answer candidate in the n-th row of the answer candidate holding unit 113 is copied to row cnt + i, and the flag of the n-th row is set to 0 (step S658). If No, the process proceeds to step S660.

Following step S658, the variable i is incremented by 1 (i = i + 1) (step S659), and the process proceeds to step S660.
Step S660 performs three operations. First, the evaluation score of the answer candidate in the m-th row is added to the evaluation score of the answer candidate in row cnt + i. Second, the question sentence ID and document ID of the m-th row are appended to row cnt + i. Third, the flag of the m-th row is set to 0.

Following step S660, the variable p is incremented by 1 (p = p + 1) (step S661), and the process proceeds to step S662.
In short, merging is performed in turn, starting from n = 1, on the answer candidates that have not yet been merged.
For each answer candidate whose question sentence ID is 2 or more (that is, it was obtained from a divided question sentence) and whose flag is 1 (that is, it has not yet been merged), the holding unit is searched for other rows with the same answer candidate character string.

If such rows are found, the topmost of the duplicate rows (the n-th row), that is, its answer candidate character string, question sentence ID, evaluation score, document ID and flag (flag = 1), is stored in row cnt + i of the answer candidate holding unit 113, and the flag of the n-th row is set to 0.

The duplicate rows appearing second and later are then merged into row cnt + i one by one. Specifically, the evaluation score of each m-th row whose answer candidate character string matches is added to the evaluation score of row cnt + i, its question sentence ID and document ID are appended, and the flag of the m-th row is set to 0.
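
The merge loop of steps S611 to S662 can be sketched in Python as follows. The `Candidate` record and `merge_divided_candidates` function are hypothetical. Indexes are 0-based rather than the 1-based rows of FIG. 5. Instead of the counters p and i, the sketch keeps a reference to the newly appended row, following the summary above (the top duplicate is copied to an empty row and later duplicates are added to that row).

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One row of the answer candidate holding unit 113 (field names are hypothetical)."""
    answer: str
    question_ids: list[str]
    score: float
    doc_ids: list[str]
    flag: int = 1  # 1 = not yet merged, 0 = merged into another row


def merge_divided_candidates(rows: list[Candidate]) -> None:
    """Sketch of steps S611-S662: rows from divided question sentences (ID >= 2)
    that share an answer string are merged into one newly appended row."""
    cnt = len(rows)                          # rows appended below are never scanned
    for n in range(cnt):                     # S611 / S613
        row_n = rows[n]
        if not (int(row_n.question_ids[0]) >= 2 and row_n.flag == 1):  # S612
            continue
        merged = None                        # None plays the role of p == 0
        for m in range(n + 1, cnt):          # S653 / S654 / S662
            row_m = rows[m]
            if row_m.answer != row_n.answer:  # S656 compares the strings only
                continue
            if merged is None:               # S657 / S658: copy row n to a new row
                merged = Candidate(row_n.answer, list(row_n.question_ids),
                                   row_n.score, list(row_n.doc_ids), flag=1)
                rows.append(merged)
                row_n.flag = 0
            merged.score += row_m.score      # S660: add the evaluation score,
            merged.question_ids += row_m.question_ids  # append the question IDs
            merged.doc_ids += row_m.doc_ids            # and the document IDs,
            row_m.flag = 0                   # then retire row m
```

Because cnt is fixed before the loop, rows appended during merging are never scanned again, which corresponds to the n > cnt exit test of step S611. A row with no duplicate is left in place with flag = 1 and is not copied.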

As a result of the above processing, the answer candidate data obtained from divided question sentences are merged.
For example, when n = 6, in step S658 the question sentence ID "002", evaluation score "0.2" and document ID "0021" in the 6th answer candidate row of FIG. 5B are copied to the 12th answer candidate row of FIG. 5C (row cnt + 1, i.e., row 11 + 1 = 12, where cnt = 11 is the number of entries held by the answer candidate holding unit 113 at this point). The flag of the 6th (n-th) row becomes 0, and the flag of the destination row cnt + 1 is 1.

Further duplicates among the answers to the divided question sentences are then searched for by advancing m. When m = 9, in step S660 the evaluation score "0.2" in the 9th answer candidate row of FIG. 5B is added to the 12th answer candidate row of FIG. 5C, giving a total of "0.4", and the question sentence ID "003" and document ID "0311" are appended to the 12th row. The flag of the 9th (m-th) row becomes 0. FIG. 5C shows the data of the answer candidate holding unit 113 at this point.
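
Written out, the bookkeeping in this example is the following, where s_k denotes the evaluation score of answer candidate row k (notation introduced here for illustration only):

```latex
\text{destination row} = \mathit{cnt} + i = 11 + 1 = 12,
\qquad
s_{12} = s_{6} + s_{9} = 0.2 + 0.2 = 0.4
```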

In short, of the duplicate rows, the topmost one is copied to an empty row (step S658), and the second and subsequent ones are added to that copied row (step S660). FIG. 5D shows the data of the answer candidate holding unit 113 at this point. As a result, among the answer candidates obtained from divided question sentences, only one row with flag = 1 remains for each answer candidate character string.

Finally, in step S691, rows in the answer candidate holding unit 113 that share an answer candidate character string but have different question sentence IDs (or question sentence ID lists) are compared, and the flag of the row with the smaller evaluation score is changed to 0.

FIG. 5D shows the data of the answer candidate holding unit 113 at this point.
This completes the flow of the process by which the control unit 102 updates the evaluation scores in the answer candidate holding unit 113.
In this embodiment, the answer candidates obtained from the input question sentence and those obtained from the divided question sentences are scored separately, without adding their scores together. However, the same effect can be obtained by totaling the scores without distinguishing whether an answer candidate came from the input question sentence or from a divided question sentence.

Next, the output format of the output unit 103 will be described in detail with reference to the drawings.
FIG. 7(a) is a display example in which the question sentence is not divided and answers are output to the output unit 103 only from the answer candidates for the input question sentence (the data with question sentence ID 001 in FIG. 5D, i.e., the data of FIG. 5A). In this example, the top three candidates are output as the first- to third-ranked answers, showing the answer candidate character strings of answer candidate rows 1 to 3. The answers are thus displayed sorted in descending order of evaluation score.

FIG. 7(b) is a display example in which answers are output from all answer candidates for both the input question sentence and the divided question sentences. In this example, answer candidate row 1 and answer candidate row 12 both have an evaluation score of 0.4, but the answer candidate for the input question sentence is assumed to be the more likely correct one and is ranked first. Ranked third is answer candidate row 2, which has the second-highest evaluation score, 0.32. The answers are thus displayed sorted in descending order of evaluation score.
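
The ordering used in FIG. 7(b) can be sketched as a sort on descending evaluation score, with ties broken in favor of the input question sentence. This assumes, as in FIG. 5D, that question sentence ID "001" marks the input question; the `Candidate` record and `rank_answers` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Minimal answer candidate row (field names are hypothetical)."""
    answer: str
    question_ids: list[str]
    score: float
    flag: int = 1


INPUT_QUESTION_ID = "001"  # assumed ID of the undivided input question sentence


def rank_answers(rows: list[Candidate], top_k: int = 3) -> list[str]:
    """Return the top_k active answer strings by descending evaluation score.
    Ties go to candidates from the input question sentence; any remaining
    ties keep their row order because Python's sort is stable."""
    active = [r for r in rows if r.flag == 1]
    active.sort(key=lambda r: (-r.score, r.question_ids != [INPUT_QUESTION_ID]))
    return [r.answer for r in active[:top_k]]
```

In the key tuple, `False` (a candidate from the input question) sorts before `True`, so the secondary key only matters when the scores are equal.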

Next, a display example that shows information indicating that an answer was obtained from divided question sentences will be described with reference to FIG. 8.
In FIG. 8, the second-ranked answer "XX Confectionery" was obtained from divided question sentences (answer candidate row 12 in FIG. 5D). The control unit 102 therefore displays the label "question division" next to that answer on the screen, in characters of a different color or shape, so that the user can tell that the second-ranked answer was obtained from divided question sentences.

In this example, the label is displayed next to the answer in characters of a different color or shape. Alternatively, answers obtained from divided question sentences may be made distinguishable from answers obtained from the input question sentence by rendering them in a different font or character decoration, for example by changing the color of the answer text itself.

When an answer is obtained from divided question sentences, the user may want to know which divided question sentences produced it. To address this, as shown in FIG. 9, the control unit 102 may also display the divided question sentences in addition to the answers of FIG. 8. In FIG. 9, the second-ranked answer "XX Confectionery" was obtained from two divided question sentences (question sentence ID = 002, "Which company has its head office in Niigata?", and question sentence ID = 003, "Which company manufactures confectionery?"). These are therefore displayed beside or below the second-ranked answer, in association with it, so that it is clear they belong to that answer. To do this, the control unit 102 obtains the divided question sentences by referring to the question sentence list 112 shown in FIG. 4 and displays them on the screen alongside the second-ranked answer.

The user may also want to know which documents an answer is based on. To address this, as shown in FIG. 10, the documents on which an answer is based are displayed together with the answers of FIG. 9. Specifically, the control unit 102 obtains the document IDs of the documents on which the answer is based (supporting documents) from the data of the answer candidate holding unit 113 shown in FIG. 5D, retrieves those documents from the document database 124 via the document search unit 123, and displays them on the screen alongside the second-ranked answer.
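
The display variants of FIGS. 8 to 10 can be sketched as a plain-text renderer. This is a sketch under several assumptions. A bracketed label stands in for the differently colored or shaped "question division" text. Plain dictionaries stand in for the question sentence list 112 and for the document database 124 (which the apparatus accesses through the document search unit 123). `render_answer` and the `Candidate` fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Merged answer candidate row (field names are hypothetical)."""
    answer: str
    question_ids: list[str]
    doc_ids: list[str]


INPUT_QUESTION_ID = "001"  # assumed ID of the undivided input question sentence


def render_answer(rank: int, cand: Candidate,
                  question_list: dict[str, str],
                  documents: dict[str, str]) -> str:
    """Format one ranked answer with the division label (FIG. 8), its divided
    question sentences (FIG. 9) and its supporting documents (FIG. 10)."""
    lines = [f"{rank}. {cand.answer}"]
    if cand.question_ids != [INPUT_QUESTION_ID]:
        lines[0] += "  [question division]"           # FIG. 8 label
        for qid in cand.question_ids:                  # FIG. 9: question list lookup
            lines.append(f"   Q{qid}: {question_list[qid]}")
    for doc_id in cand.doc_ids:                        # FIG. 10: supporting documents
        lines.append(f"   doc {doc_id}: {documents.get(doc_id, '(not found)')}")
    return "\n".join(lines)
```

For example, `render_answer(2, Candidate("XX Confectionery", ["002", "003"], ["0021", "0311"]), questions, docs)` yields the labeled answer line, followed by both divided question sentences and both supporting documents.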

As described above, in this embodiment, answer candidates are obtained both from the input question sentence and from the divided question sentences, evaluated, and presented appropriately, even when the evidence for answering the input question sentence is spread across multiple documents.

In the present invention, the evaluation scores are summed when answer candidates are merged, but a plain sum is not required. For example, the sum may be multiplied by a coefficient, as in (sum) × 0.9.
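
As a formula, with D the set of duplicate rows being merged and s_k their evaluation scores (notation introduced here for illustration), the weighted variant reads as follows. Here α = 1 gives the plain sum of the embodiment. Applied to the FIG. 5C example, α = 0.9 would give 0.9 × (0.2 + 0.2) = 0.36 instead of 0.4.

```latex
s_{\text{merged}} = \alpha \sum_{k \in D} s_k,
\qquad \alpha = 1 \ \text{(plain sum)} \quad \text{or, e.g.,} \quad \alpha = 0.9
```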

The present invention can also be applied in various other ways without departing from its spirit.
The present invention is not limited to the above embodiment as it is; at the implementation stage, the components may be modified without departing from the scope of the invention. Various inventions can also be formed by appropriately combining the components disclosed in the embodiment. For example, some components may be omitted from those shown in the embodiment, and components from different embodiments may be combined as appropriate.

FIG. 1: Block diagram showing the configuration of the question answering apparatus of the present invention.
FIG. 2: Diagram showing an example of a question sentence division rule (division conditioned on the syntax tree structure).
FIG. 3: Diagram showing an example of a question sentence division rule (questions generated by dividing the question sentence before and after punctuation marks).
FIG. 4: Diagram showing the format of the question sentence data stored in the question sentence list 112.
FIG. 5A: Diagram showing the format of the answer candidate data held by the answer candidate holding unit 113 (example with entries for the input question sentence).
FIG. 5B: Diagram showing the format of the answer candidate data held by the answer candidate holding unit 113 (example with entries for the input question sentence and the divided question sentences).
FIG. 5C: Diagram showing the format of the answer candidate data held by the answer candidate holding unit 113 (example for explaining merging).
FIG. 5D: Diagram showing the format of the answer candidate data held by the answer candidate holding unit 113 (example for explaining merging).
FIG. 6: Flowchart showing the flow of the merge processing of the control unit 102.
FIG. 7: Display examples of the output unit 103 ((a) answers output only from the answer candidates for the input question sentence; (b) answers output from all answer candidates for both the input question sentence and the divided question sentences).
FIG. 8: Display example of the output unit 103 (answers obtained from divided question sentences are displayed so that they can be identified as such).
FIG. 9: Display example of the output unit 103 (answer candidates obtained from divided question sentences are displayed in an identifiable manner).
FIG. 10: Display example of the output unit 103 (the documents on which the answers are based are displayed in addition to the answers).

Explanation of symbols

101 Input unit, 102 Control unit, 103 Output unit, 111 Divided question sentence generation unit, 112 Question sentence list, 113 Answer candidate holding unit, 121 Question pattern database, 122 Answer type determination unit, 123 Document search unit, 124 Document database, 125 Answer expression extraction unit, 126 Expression category database, 127 Answer expression database, 128 Answer candidate generation unit

Claims

A question answering apparatus that extracts answers to an input question sentence from a plurality of documents registered in a document database and outputs them, comprising:
question sentence dividing means for dividing the input question sentence into a plurality of divided question sentences;
answer candidate storage means for extracting search keywords from the question sentence and the divided question sentences, performing a keyword search over the plurality of documents, searching an answer expression database using the document ID and document score of each document obtained by the keyword search as keys to obtain answer candidate character strings, and storing each obtained answer candidate character string as a set together with the question sentence or divided question sentence from which it was obtained;
answer candidate evaluation means for evaluating the distance between each answer candidate character string stored in the answer candidate storage means and the search keywords, and assigning the evaluation score obtained from this evaluation to each set;
merging means for merging, among the sets obtained from the divided question sentences, those whose answer candidate character strings are duplicated, and adding their evaluation scores; and
answer output means for outputting a predetermined number of answer candidates as answers in descending order of the evaluation scores of the finally obtained sets.

The question answering apparatus according to claim 1, wherein the answer output means includes means for attaching, to an answer obtained from the divided question sentences, information indicating that the answer was obtained from the divided question sentences, and outputting the answer with that information.

The question answering apparatus according to claim 1, wherein the answer output means includes means for outputting, for an answer obtained from the divided question sentences, the divided question sentences paired with that answer.

A question answering method for extracting answers to an input question sentence from a plurality of documents registered in a document database and outputting them, comprising:
dividing, by question sentence dividing means, the input question sentence into a plurality of divided question sentences;
extracting, by answer candidate storage means, search keywords from the question sentence and the divided question sentences, performing a keyword search over the plurality of documents, searching an answer expression database using the document ID and document score of each document obtained by the keyword search as keys to obtain answer candidate character strings, and storing each obtained answer candidate character string as a set together with the question sentence or divided question sentence from which it was obtained;
evaluating, by answer candidate evaluation means, the distance between each answer candidate character string stored in the answer candidate storage means and the search keywords, and assigning the evaluation score obtained from this evaluation to each set;
merging, by merging means, among the sets obtained from the divided question sentences, those whose answer candidate character strings are duplicated, and adding their evaluation scores; and
outputting, by answer output means, a predetermined number of the answer candidates as answers in descending order of the evaluation scores of the finally obtained sets.