JP2005202627A

JP2005202627A - Question answering device and question answering system

Info

Publication number: JP2005202627A
Application number: JP2004007590A
Authority: JP
Inventors: Yoshitaka Hamaguchi; 佳孝濱口
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2004-01-15
Filing date: 2004-01-15
Publication date: 2005-07-28

Abstract

PROBLEM TO BE SOLVED: To provide a question answering device and a question answering system that preserve the superiority of a database builder.

SOLUTION: The question answering device includes a document analysis unit that analyzes a question sentence to obtain a search word; a document storage unit that stores a group of documents containing answer candidates for the question sentence; a document search unit that refers to the document storage unit to obtain an identifier group identifying the documents corresponding to the search word; a session management unit that lets a user arbitrarily select from the identifier group; and an answer extraction unit that refers to the document storage unit to obtain the documents corresponding to the identifier group received from the session management unit and extracts from them the answer candidates corresponding to a predetermined question type. The device further includes an answer processing control unit that, as an answer processing determination, determines the number of documents based on the identifier group and, when that number is equal to or less than a predetermined reference value, outputs as the answer result that there is no answer to the question sentence.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to a question answering device and a question answering system that obtain an answer to a question sentence from information held in advance.

Patent Document 1 discloses a question answering system that obtains an answer to a question sentence from information held in advance and responds with it.
According to the invention of Patent Document 1, an answer matching the content of the question is obtained by performing three processes: a question analysis process that analyzes the morphemes of the question sentence to obtain the search words and the question type that are key to the question; a document search process that searches a document set held in advance (the documents to be searched) for documents containing the search words; and an answer extraction process that, in the retrieved documents, assigns scores to words matching the question type based on the search words and their frequency of appearance, and determines the word to be given as the answer based on those scores.

For example, consider the case where one of the documents to be searched is held with the text "About fuel cells" and a question sentence reading "What is a battery that uses fuel?" is input.
The question analysis process performs morphological analysis on the question sentence and obtains the words separated by this analysis. That is, it analyzes the morphemes of "What is a battery that uses fuel?" and obtains words such as "fuel" and "battery".

The document search process takes each obtained word as a search word and checks whether that search word is contained in the documents to be searched. Specifically, the text of the searched document, "About fuel cells", is morphologically analyzed. As a result, "fuel cell" (in the Japanese original, literally "fuel battery") in the searched document is divided into "fuel" and "battery", and because these words match the words obtained as search words, the searched document corresponding to the question sentence is identified.

When there are multiple documents to be searched, session management is performed so that the user can arbitrarily select among them. The answer extraction process is then applied to the searched documents selected through this session management.

In the answer extraction process, a score is assigned to each word of the identified searched document based on the question type, the search words, and the frequency of appearance. Based on the assigned scores, the answer candidate words for the question sentence (hereinafter simply called answer candidates) are output together with the documents from which they were extracted.
Japanese Patent Laid-Open No. 2002-132811

Incidentally, when answer candidates are extracted, if only one source document exists for them, or if the user has deliberately selected only one document to be searched through session management, the user can easily identify which document the answer candidates were extracted from.
In other words, because the user can easily learn from which document (searched document) an answer candidate was extracted, the user can easily learn the relationship between each searched document and the answer candidates associated with it, that is, the content of the database. This raised the concern that the superiority of the database builder could be diminished.

Accordingly, an object of the present invention is to provide a question answering device and a question answering system that can preserve the superiority of the database builder.

To solve the above problem, the present invention adopts the following configurations.
<Configuration 1>
A question answering device comprising: a document analysis unit that analyzes a question sentence from a user to obtain a search word; a document storage unit that stores a document group containing answer candidates for the question sentence; a document search unit that refers to the document storage unit to obtain an identifier group for identifying the documents corresponding to the search word; a session management unit that allows the user to arbitrarily select from the identifier group received from the document search unit; and an answer extraction unit that refers to the document storage unit and, upon obtaining the documents corresponding to the identifier group from the session management unit, extracts from those documents the answer candidates corresponding to a predetermined question type; the device being characterized by further comprising an answer processing control unit that, as an answer processing determination, determines the number of documents based on the identifier group and, when the number of documents is equal to or less than a predetermined reference value, outputs as the answer result for the question sentence that there is no answer to the question sentence.

<Configuration 2>
The device further comprises a similar document search unit that refers to the document storage unit, causes the question analysis unit to obtain similar search words resembling the search word based on the documents associated with the identifier group, and causes the document search unit to obtain similar identifiers for identifying the similar documents corresponding to the similar search words. When the number of documents based on the identifier group obtained by the document search unit is equal to or less than the predetermined reference value, the answer processing control unit activates the similar document search unit. Upon obtaining the similar identifiers from the similar document search unit, it performs the answer processing determination based on the number of documents obtained by adding the similar identifiers to the identifier group. When this determination finds that the number of documents is greater than the predetermined reference value, it activates the answer extraction unit so that the answer candidates are extracted and output, together with their source documents, as the answer result for the question sentence.

<Configuration 3>
A question answering device comprising: a document analysis unit that analyzes a question sentence from a user to obtain a search word; a document storage unit that stores a document group containing answer candidates for the question sentence; a document search unit that refers to the document storage unit to obtain an identifier group for identifying the documents corresponding to the search word; a session management unit that allows the user to arbitrarily select from the identifier group received from the document search unit; and an answer extraction unit that refers to the document storage unit and, upon obtaining the documents corresponding to the identifier group from the session management unit, extracts from those documents the answer candidates corresponding to a predetermined question type; the device being characterized by further comprising an answer processing control unit that determines the number of documents based on an extraction identifier group identifying the documents from which the answer extraction unit extracted the answer candidates and, when the number of documents is equal to or less than a predetermined reference value, outputs as the answer result for the question sentence that there is no answer to the question sentence.

<Configuration 4>
A question answering device comprising: a document analysis unit that analyzes a question sentence from a user to obtain a search word; a document storage unit that stores a document group containing answer candidates for the question sentence; a document search unit that refers to the document storage unit to obtain an identifier group for identifying the documents corresponding to the search word; a session management unit that allows the user to arbitrarily select from the identifier group received from the document search unit; and an answer extraction unit that refers to the document storage unit and, upon obtaining the documents corresponding to the identifier group from the session management unit, extracts from those documents the answer candidates corresponding to a predetermined question type; the device being characterized by further comprising: a similar document search unit that refers to the document storage unit, causes the question analysis unit to obtain similar search words resembling the search word based on the documents associated with the identifier group, and causes the document search unit to obtain similar identifiers for identifying the similar documents corresponding to the similar search words; and an answer processing control unit that determines the number of documents based on an extraction identifier group identifying the source documents of the answer candidates from the answer extraction unit, and that, when the number of documents is equal to or greater than a predetermined reference value, outputs the answer candidates together with their source documents as the answer result for the question sentence, and, when the number of documents is less than the predetermined reference value, activates the similar document search unit and, upon obtaining the similar identifiers from the similar document search unit, if the number of documents indicated by the similar identifiers is equal to or greater than the predetermined reference value, activates the answer extraction unit based on new identifiers generated by adding the similar identifiers to the identifiers.

<Configuration 5>
The identifier is the file name of a document, or an index that associates each document stored in the document storage unit with the words that can serve as search words for it.

<Configuration 6>
When a similar identifier received from the document search unit is already included in the identifier group, the similar document search unit removes that similar identifier.
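
The removal described in Configuration 6 can be pictured with a minimal Python sketch. The function name and the use of plain strings as identifiers are assumptions made for illustration; the patent does not specify a data representation.

```python
def remove_known_identifiers(similar_ids, identifier_group):
    """Drop similar identifiers that already appear in the identifier group.

    Hypothetical helper: identifiers are assumed to be hashable values
    such as file names. Repeated similar identifiers are also collapsed,
    which is an extra assumption beyond the text of Configuration 6.
    """
    known = set(identifier_group)
    seen = set()
    result = []
    for doc_id in similar_ids:
        if doc_id not in known and doc_id not in seen:
            seen.add(doc_id)
            result.append(doc_id)
    return result
```

Keeping the result as an ordered list preserves the order in which the similar identifiers were found, which may matter if later stages rank documents.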

<Configuration 7>
A question answering system in which the question answering device is built in a server-client configuration, characterized in that the client includes the session management unit, and the server includes the answer processing control unit, which operates based on the identifiers selected by the session management unit, the units controlled by the answer processing control unit, and the document storage unit.

The question answering device of the present invention includes an answer processing control unit that determines the number of documents based on an identifier group identifying the documents corresponding to the search words obtained by analyzing the question sentence and, when the number of documents is equal to or less than a predetermined reference value, outputs as the answer result that there is no answer to the question sentence. The device therefore answers the question sentence only when there are multiple source documents and the user cannot tell which one an answer came from. As a result, the user cannot easily grasp the content of the database, and the superiority of the database builder can be preserved.

An embodiment of the present invention is described in detail below with reference to the drawings, taking a question answering system with a client-server configuration as an example.

As shown in FIG. 1, the question answering system 10 of the present invention includes a server 20 and a client 30. The server 20 and the client 30 are connected to a network (not shown) serving as a transmission path, and data communication between the server 20 and the client 30 takes place over this network.

The client 30 includes an input unit 1100 that accepts a question sentence from the user, a session management unit 1500 that manages exchanges with the server 20 based on the question sentence accepted by the input unit 1100, and an output unit 1700 that presents answer candidates for the question sentence to the user.

The server 20 includes a question analysis unit 1200 that performs morphological analysis on the question sentence and obtains the search words key to the question; a document search unit 1400 that searches for documents matching the search words and outputs the search result to the session management unit 1500; a document storage unit 1300 that holds the documents for the document search unit 1400; an answer processing control unit 1600 that controls the answer to the question sentence based on the number of documents obtained from the session management unit 1500; and an answer extraction unit 1800 that extracts answer candidates for the question sentence from the documents processed by the answer processing control unit 1600.

The question analysis unit 1200 performs natural language processing such as morphological analysis on the question sentence and obtains the key words of the question sentence as search words. For example, on receiving the question sentence "Inventor of the ○× device?", the question analysis unit 1200 segments it morphologically into "○× device / of / inventor / ?" and, from the segmented words, obtains as search words those whose part of speech can serve as a keyword of the question (for example, nouns). At this point the question analysis unit 1200 may also obtain, based on the search words, the question type used for the answer extraction process in the answer extraction unit 1800 described later.

The document storage unit 1300 holds the document set from which answer candidates for question sentences are extracted. To improve search efficiency in the document search unit 1400 described later, the document storage unit 1300 preferably performs natural language analysis such as morphological analysis on each document in advance and holds the resulting words (answer candidates) in association with the identifier of that document.
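
One way to picture the word-to-identifier association held by the document storage unit is an inverted index. The Python sketch below is illustrative only: the pre-tokenized word lists stand in for the output of morphological analysis, the file names are made up, and requiring every search word to match (AND semantics) is an assumption, since the text does not say how multiple search words are combined.

```python
from collections import defaultdict


def build_index(documents):
    """Map each word to the set of identifiers of documents containing it.

    `documents` maps a document identifier (here a file name) to the list
    of words obtained from that document in advance.
    """
    index = defaultdict(set)
    for doc_id, words in documents.items():
        for word in words:
            index[word].add(doc_id)
    return index


def search(index, search_words):
    """Return the identifier group of documents containing every search word."""
    id_sets = [index.get(word, set()) for word in search_words]
    if not id_sets:
        return set()
    return set.intersection(*id_sets)


# Hypothetical stored documents, already split into words.
documents = {
    "doc1.txt": ["fuel", "battery", "overview"],
    "doc2.txt": ["fuel", "pump"],
}
index = build_index(documents)
```

With this index, `search(index, ["fuel", "battery"])` yields only `doc1.txt`, while `search(index, ["fuel"])` yields both documents.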

The document search unit 1400 refers to the document storage unit 1300 to obtain the documents that match the search words from the question analysis unit 1200. In this embodiment, the identifiers (identifier group) of the documents obtained by the document search unit 1400 are output to the session management unit 1500 as the search result. The search words are output to the session management unit 1500 together with the identifiers.

On obtaining document identifiers and search words from the document search unit 1400 as a search result, the session management unit 1500 manages them in association with the question sentence. The session management unit 1500 outputs the identifiers and search words it manages to the answer processing control unit 1600. It also has an input and display function so that the managed search words and identifiers can be corrected as needed according to instructions from the user.

On obtaining the identifiers from the session management unit 1500, the answer processing control unit 1600 compares the number of identifiers with a predetermined reference value. When the number of identifiers is equal to or less than the reference value, it outputs information indicating "no answer" to the output unit 1700. When the number of identifiers is greater than the reference value, it outputs the identifiers and search words to the answer extraction unit 1800.
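
The determination made by the answer processing control unit can be sketched in a few lines of Python. The function name, the return convention, and the de-duplication of identifiers are assumptions for illustration; the default reference value of 1 follows the value used in the embodiment.

```python
NO_ANSWER = "no answer"


def control_answer_processing(identifiers, reference_value=1):
    """Decide whether answer extraction may proceed.

    Returns (NO_ANSWER, None) when the number of distinct identifiers is
    equal to or less than the reference value, and ("extract", identifiers)
    otherwise. Counting distinct identifiers is an assumption; the text
    only speaks of "the number of identifiers".
    """
    unique_ids = set(identifiers)
    if len(unique_ids) <= reference_value:
        return NO_ANSWER, None
    return "extract", sorted(unique_ids)
```

With the reference value of 1, a single matching document always leads to "no answer", so an answer is never traceable to one specific document.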

On obtaining the identifiers from the answer processing control unit 1600, the answer extraction unit 1800 refers to the document storage unit 1300 and, from the answer candidates associated with the identifiers, extracts the words whose attribute corresponds to the question type. The extracted answer candidates are output from the answer extraction unit 1800 to the output unit 1700. The question type may be derived from the question sentence by a conventionally known method; it may be specified arbitrarily by the user, for example through a GUI (Graphical User Interface) provided in the session management unit 1500; or it may be restricted in advance to specific question types.

The answer extraction unit 1800 assigns a score to each extracted answer candidate based on, for example, the number of times the candidate appears and the number of documents in which it appears, and then outputs the candidates to the output unit 1700.
The output unit 1700 presents the answer candidates to the user in association with the question sentence used to extract them, or presents the fact that there is no answer in association with the history of modifications made in the session management unit 1500.
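
The text names the number of appearances and the number of documents in which a candidate appears as inputs to the score but gives no formula. The Python sketch below simply adds the two counts; that formula, the function name, and the sample data are assumptions chosen for illustration.

```python
from collections import Counter


def score_candidates(candidates_by_doc):
    """Score answer candidates and return them best first.

    `candidates_by_doc` maps a document identifier to the list of
    candidate words extracted from that document. The score is
    (total appearances) + (number of documents containing the word),
    an assumed formula. Ties are broken alphabetically.
    """
    occurrences = Counter()
    doc_counts = Counter()
    for candidates in candidates_by_doc.values():
        occurrences.update(candidates)      # every appearance counts
        doc_counts.update(set(candidates))  # each document counts once
    scores = {word: occurrences[word] + doc_counts[word] for word in occurrences}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical extraction result for two source documents.
ranked = score_candidates({
    "d1": ["Suzuki", "Suzuki", "Tanaka"],
    "d2": ["Suzuki"],
})
```

Here "Suzuki" appears three times across two documents and "Tanaka" once in one document, so "Suzuki" ranks first.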

Next, the operation of the question answering system 10 of Embodiment 1 is described following the processing flow of FIG. 2.
The user inputs a question sentence through the input unit 1100 (step S10). The input question sentence is sent to the question analysis unit 1200 via the network. The question sentence is also sent from the input unit 1100 to the session management unit 1500.

On obtaining the question sentence, the question analysis unit 1200 performs morphological analysis on it, obtains the search words key to the question, and outputs them to the document search unit 1400 (step S20).

On obtaining the search words, the document search unit 1400 refers to the document storage unit 1300 to obtain the identifiers of the documents (searched documents) corresponding to the search words, and outputs the identifiers and the search words to the session management unit 1500 (step S30). The following description assumes that the document search unit 1400 has obtained multiple identifiers as an identifier group.
The session management unit 1500 manages the question sentence from the input unit 1100 in association with the search words and identifier group from the document search unit 1400 (step S40). When the session management unit 1500 accepts a question type from the user, it also manages that question type in association with the question sentence and the identifier group.

When the user instructs the session management unit 1500 to request an answer to the question sentence, the identifier group obtained from the search words is output from the session management unit 1500 to the answer processing control unit 1600.
On obtaining the identifier group from the session management unit 1500, the answer processing control unit 1600 determines whether the number of identifiers is equal to or less than a predetermined reference value (step S50). In this embodiment, the reference value is 1. When the number of identifiers is 1 or less, the answer processing control unit 1600 outputs information indicating that there is no answer to the output unit 1700. The output unit 1700 receives this information and presents it to the user.

When there are two or more identifiers, on the other hand, the answer processing control unit 1600 outputs the identifiers to the answer extraction unit 1800.
On obtaining the identifiers and search words, the answer extraction unit 1800 refers to the document storage unit 1300 to obtain the words (answer candidates) associated with the identifiers (step S60). At this point the answer extraction unit 1800 obtains the question type from the session management unit 1500 and extracts the answer candidates corresponding to that question type. Using a conventionally known method, the answer extraction unit 1800 assigns scores to the extracted answer candidates based on, for example, the number of appearances and the number of documents in which they appear, and then outputs them to the output unit 1700.
The output unit 1700 presents the answer candidates to the user in association with the question sentence used to extract them (step S70).

The following describes the operation when the user narrows down the identifier group managed by the session management unit 1500, for example through the GUI. When the user narrows down the group through the GUI in the session management unit 1500, a new identifier group is generated. Consider the case where this identifier group contains only one identifier (it is still called an identifier group below, although it holds a single identifier). When the identifier group is output to the answer processing control unit 1600, the answer processing control unit 1600 determines the number of identifiers. Having determined that the number of identifiers is 1 or less, the answer processing control unit 1600 outputs "no answer" to the output unit 1700 via the network without sending the identifier group to the answer extraction unit 1800. Thus, when the number of identifiers is 1 or less, the user cannot obtain answer candidates.

As described above, when answer candidates are obtained from the documents held in the document storage unit 1300 (the documents whose identifiers are associated with words), the question answering system 10 of Embodiment 1 outputs the answer candidates as usual if the user cannot tell which document they were extracted from, and outputs "no answer" as the answer to the question sentence if the source document could be identified. The user therefore cannot learn which document an answer candidate was extracted from.

Consequently, the user cannot learn the structure of the database that associates each document with the answer candidates obtainable from it, which reduces the risk of harm to the interests of the server operator who built the database.

Next, a question answering system 11 is described in which a similar document search unit is added to the configuration of the question answering system 10 of Embodiment 1. In the question answering system 11, as shown in FIG. 3, the server 20 and the client 30 are connected by a network (not shown).
The client 30 includes the input unit 1100, the session management unit 1500, and the output unit 1700.

The server 20 includes the document storage unit 1300 and the answer extraction unit 1800, together with the question analysis unit 2200, the document search unit 2400, and the answer processing control unit 2600 of Embodiment 2, and a newly added similar document search unit 2900.

Because the configuration on the client 30 side is the same as in Embodiment 1, its description is omitted; only the units of the server 20 that differ from the configuration of Embodiment 1 are described.
As in Embodiment 1, the question analysis unit 2200 obtains search words from the question sentence and outputs them to the document search unit 2400. In addition, the question analysis unit 2200 performs morphological analysis on documents received from the similar document search unit 2900 described later, obtains as similar search words the words whose part of speech can serve as a keyword of those documents, and outputs the similar search words to the similar document search unit 2900.

As in the first embodiment, on acquiring search terms from the question analysis unit 2200, the document search unit 2400 refers to the document storage unit 1300, acquires the identifier group associated with answer words (documents) that match the search terms, and outputs the identifier group to the session management unit 1500. In addition, on acquiring similar search terms from the similar document search unit 2900, the document search unit 2400 refers to the document storage unit 1300, takes the identifiers associated with answer words (documents) that match the similar search terms as similar identifiers, and outputs the similar identifiers to the similar document search unit 2900.
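The lookup performed by the document search unit 2400 can be pictured with a minimal sketch. It assumes the document storage unit 1300 can be modeled as an in-memory dictionary from identifier to text, and that a document matches when it contains at least one search term; the source specifies neither point, and the names `build_index` and `search` are hypothetical.

```python
from collections import defaultdict


def build_index(documents):
    """Map each word to the set of document identifiers that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, search_terms):
    """Return the identifier group: every document matching at least one term."""
    identifiers = set()
    for term in search_terms:
        # .get() avoids creating empty entries for unknown terms.
        identifiers |= index.get(term.lower(), set())
    return identifiers


# Toy stand-in for the document storage unit (illustrative data only).
documents = {
    "d1": "Mount Fuji is the highest mountain in Japan",
    "d2": "Tokyo is the capital of Japan",
    "d3": "Osaka is famous for its food",
}
index = build_index(documents)
```

The same `search` function would serve both ordinary search terms and similar search terms, which mirrors how the text reuses the document search unit for both kinds of query.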

On acquiring the identifier group from the session management unit 1500, the answer processing control unit 2600 determines whether the number of identifiers is equal to or less than a predetermined reference value. In this embodiment, the reference value is assumed to be 1 in the following description. When the number of identifiers is 1 or less, the answer processing control unit 2600 outputs the identifier group to the similar document search unit 2900. The answer processing control unit 2600 then acquires from the similar document search unit 2900 the identifiers of other answer words (documents) similar to the answer words (documents) associated with the output identifier group, that is, the similar identifiers. It generates a new identifier group by adding the similar identifiers to the identifier group, and determines whether the number of identifiers in the new identifier group is equal to or less than the predetermined reference value. The output based on this determination is the same as in the first embodiment.

That is, when the number of identifiers in the identifier group newly generated by adding the similar identifiers is 1 or less, the answer processing control unit 2600 outputs "no answer" to the output unit 1700.

On the other hand, when the number of identifiers is 2 or more, the answer processing control unit 2600 outputs the identifier group to the answer extraction unit 1800, as in the first embodiment, without any processing by the similar document search unit 2900.

The similar document search unit 2900 refers to the document storage unit 1300, acquires as similar identifiers the identifiers of documents similar to the documents associated with the identifier group from the answer processing control unit 2600, and outputs the similar identifiers to the answer processing control unit 2600.

That is, on acquiring the identifier group from the answer processing control unit 2600, the similar document search unit 2900 refers to the document storage unit 1300 to acquire the documents associated with the identifier group, and has the question analysis unit 2200 perform morphological analysis on those documents to acquire similar search terms. The similar document search unit 2900 then has the document search unit 2400 acquire similar identifiers based on the acquired similar search terms, and outputs the acquired similar identifiers to the answer processing control unit 2600.

Next, the operation of the question answering system 11 will be described.
FIG. 4 is a processing flow showing the processing timing of, in order, the question analysis unit 2200, the document search unit 2400, the similar document search unit 2900, the answer processing control unit 2600, and the answer extraction unit 1800; the description follows this processing flow.

The processing up to the point where the question analysis unit 2200 acquires search terms based on the question sentence acquired by the input unit 1100, the document search unit 2400 acquires an identifier group based on the search terms, and the session management unit 1500 manages the identifier group is the same as in the question answering system 10 of the first embodiment. That is, the question analysis unit 2200, having acquired the question sentence entered at the input unit 1100, performs morphological analysis and acquires search terms (step S201). When the search terms acquired by the question analysis unit 2200 are sent to the document search unit 2400, the document search unit 2400 refers to the document storage unit 1300 and outputs the identifier group associated with answer words (documents) that match the search terms to the session management unit 1500.

The following description uses an example in which the identifier group managed by the session management unit 1500 contains one identifier. The answer processing control unit 2600, having acquired the identifier group from the session management unit 1500, determines whether the number of identifiers in the identifier group is equal to or less than the predetermined reference value (1) (step S203). When this determination finds that the number of identifiers is 1 or less, the answer processing control unit 2600 outputs the identifier group to the similar document search unit 2900.

The similar document search unit 2900, having acquired the identifier group, refers to the document storage unit 1300 and acquires the document corresponding to each identifier (step S204). The acquired document is output from the similar document search unit 2900 to the question analysis unit 2200.

On acquiring the document from the similar document search unit 2900, the question analysis unit 2200 performs morphological analysis on the document and acquires similar search terms (step S205). The acquired similar search terms are output from the question analysis unit 2200 to the similar document search unit 2900.
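Step S205 can be illustrated with a deliberately simplified sketch. Real morphological analysis with part-of-speech filtering needs a dedicated analyzer; here a regular-expression tokenizer and a small stop-word list stand in for it, which is an assumption made purely for illustration. The function name and the stop-word list are hypothetical.

```python
import re

# Hypothetical stop-word list standing in for part-of-speech filtering.
STOP_WORDS = {"the", "is", "in", "of", "a", "an", "for", "its", "and", "to"}


def extract_similar_search_terms(document_text):
    """Return the distinct candidate keywords of a document, in order of first appearance."""
    terms = []
    for word in re.findall(r"[a-z0-9]+", document_text.lower()):
        if word not in STOP_WORDS and word not in terms:
            terms.append(word)
    return terms
```

For Japanese text, whitespace and regular-expression tokenization would not be adequate, so a production system would substitute a proper morphological analyzer at this point.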

The similar document search unit 2900, having acquired the similar search terms from the question analysis unit 2200, outputs the similar search terms to the document search unit 2400.
The document search unit 2400, having acquired the similar search terms, refers to the document storage unit 1300, acquires the identifiers corresponding to the similar search terms as similar identifiers, and outputs the similar identifiers to the similar document search unit 2900 (step S206).

The similar document search unit 2900, having acquired one or more similar identifiers, checks whether they include any identifier acquired in step S202. If so, it generates similar identifiers with those identifiers removed, and outputs the resulting similar identifiers to the answer processing control unit 2600 (step S207).
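Steps S206 and S207 amount to a lookup followed by a set difference. The following sketch assumes an inverted index held as a dictionary from term to a set of identifiers; that data structure and the function name are assumptions, not something the source prescribes.

```python
def find_similar_identifiers(index, similar_terms, original_ids):
    """Look up the similar search terms, then drop the identifiers already held."""
    hits = set()
    for term in similar_terms:
        hits |= index.get(term, set())  # S206: identifiers matching each term
    return hits - set(original_ids)     # S207: remove the original identifiers
```

Removing the original identifiers matters because the terms were taken from those very documents, so the lookup would otherwise always return them and inflate the count of similar documents.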

The answer processing control unit 2600, having acquired the similar identifiers, generates a new identifier group by adding the similar identifiers to the identifier group acquired in step S202 (step S208).
Having generated the new identifier group, the answer processing control unit 2600 determines whether the number of identifiers in that group is equal to or less than the predetermined reference value (1) (step S209). When this determination finds that the number of identifiers is 1 or less, it outputs information indicating "no answer" to the output unit 1700. When the number of identifiers is 2 or more, it outputs the identifier group to the answer extraction unit 1800.
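The decision logic of steps S203 through S209 can be summarized in a short sketch. The similar document search is passed in as a callable so the example stays self-contained; the function name, the return convention, and the `find_similar` parameter are all hypothetical.

```python
NO_ANSWER = "no answer"


def control_answer_processing(identifier_group, find_similar, reference_value=1):
    """Decide whether to extract answers or report that there is no answer."""
    ids = set(identifier_group)
    if len(ids) > reference_value:       # S203: enough documents already
        return ("extract", ids)
    ids |= set(find_similar(ids))        # S204 to S208: add similar identifiers
    if len(ids) <= reference_value:      # S209: still too few documents
        return (NO_ANSWER, set())
    return ("extract", ids)
```

With the reference value of 1 used in the text, a single matching document is either widened to at least two documents or suppressed, so an answer is never attributable to one document alone.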

The answer extraction unit 1800, having acquired the identifier group from the answer processing control unit 2600, extracts answer candidates for the question sentence as in the first embodiment, gives scores to the answer candidates, and outputs them to the output unit 1700 (step S210).

As described above, according to the question answering system 11 of the second embodiment, when answer candidates are obtained from the documents held in the document storage unit 1300 (the documents of the identifiers with which the words are associated) and the number of identifiers in the identifier group is equal to or less than the predetermined reference value, similar search terms are acquired from the documents corresponding to the identifier group, the similar identifiers corresponding to those similar search terms are added to the identifier group, and a new identifier group is generated. Because answer extraction is then performed on an identifier group made up of a plurality of identifiers, the user cannot tell from which document an answer candidate was extracted.

Consequently, the user cannot learn the database structure that associates each document with the answer candidates obtained from it, which reduces the risk of harm to the interests of the server operator who built the database.

Next, a question answering system 12 will be described, which includes an answer processing control unit 3600 in place of the answer processing control unit 1600 of the first embodiment and an answer extraction unit 3800 in place of the answer extraction unit 1800 of the first embodiment.
Apart from the answer processing control unit 3600 and the answer extraction unit 3800, the configuration of the question answering system 12 is the same as that of the question answering system 10 of the first embodiment. That is, in the question answering system 12, as shown in FIG. 5, the server 20 and the client 30 are connected by a network (not shown), and the client 30 includes the input unit 1100, the session management unit 1500, and the output unit 1700.

The server 20 includes a question analysis unit 1200, the document storage unit 1300, a document search unit 1400, the new answer processing control unit 3600, and the new answer extraction unit 3800.
Since everything other than the answer processing control unit 3600 and the answer extraction unit 3800 is the same as in the first embodiment, its description is omitted.

On acquiring the identifier group from the answer processing control unit 3600, described later, the answer extraction unit 3800 refers to the document storage unit 1300 and extracts, from the answer candidates associated with the identifiers, the words whose attribute corresponds to the question type. The answer extraction unit 3800 gives scores to the extracted answer candidates by a conventionally known method based on, for example, the number of occurrences and the number of documents in which they occur, and then outputs them to the answer processing control unit 3600. In addition, the answer extraction unit 3800 acquires an extracted identifier group made up of the identifiers associated with the extracted answer candidates, and outputs the extracted identifier group to the answer processing control unit 3600. The number of documents from which answer candidates were obtained may be output to the answer processing control unit 3600 instead.
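A minimal sketch of this extraction step follows. It assumes the document storage unit can be modeled as a dictionary from identifier to a list of (word, attribute) pairs, and it scores each candidate as its occurrence count plus the number of documents containing it. That scoring formula is only a placeholder for the "conventionally known method" mentioned in the text, and all names are hypothetical.

```python
from collections import Counter, defaultdict


def extract_answers(storage, identifier_group, question_type):
    """Return (scores per candidate, extracted identifier group)."""
    occurrences = Counter()
    source_docs = defaultdict(set)
    for doc_id in identifier_group:
        for word, attribute in storage.get(doc_id, []):
            if attribute == question_type:
                occurrences[word] += 1
                source_docs[word].add(doc_id)
    # Placeholder score: occurrence count plus number of source documents.
    scores = {w: occurrences[w] + len(source_docs[w]) for w in occurrences}
    # Extracted identifier group: documents that yielded at least one candidate.
    extracted_ids = set().union(*source_docs.values())
    return scores, extracted_ids
```

Note that the extracted identifier group can be smaller than the input identifier group, since a retrieved document may contain no word of the required attribute. This is why the third embodiment counts documents after extraction rather than before.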

The answer processing control unit 3600 outputs the identifier group from the session management unit 1500 to the answer extraction unit 3800 and, on acquiring the answer candidates and the extracted identifier group as the processing result of the answer extraction unit 3800, determines whether the number of identifiers in the extracted identifier group is equal to or less than a predetermined reference value. In this embodiment, the reference value is assumed to be 1 in the following description. When the number of identifiers is 1 or less, the answer processing control unit 3600 outputs "no answer" to the output unit 1700; when the number of identifiers (number of documents) is 2 or more, it outputs the answer candidates and their scores to the output unit 1700.

Next, the operation of the question answering system 12 will be described using a flowchart showing the operation of the answer processing control unit 3600, which characterizes this embodiment.
The processing up to the point where the question analysis unit 1200 acquires search terms based on the question sentence, the document search unit 1400 acquires an identifier group based on the search terms, the session management unit 1500 manages the identifier group, and the identifier group is output to the answer processing control unit 3600 is the same as in the question answering system 10 of the first embodiment.

The answer processing control unit 3600 outputs the acquired identifier group to the answer extraction unit 3800 and acquires the answer candidates and the extracted identifier group from the answer extraction unit 3800 as its processing result (step S301).
The answer processing control unit 3600 determines whether the number of identifiers in the extracted identifier group is 1 or less (step S302). When the number of identifiers (number of documents) is 2 or more, the answer processing control unit 3600 outputs the answer candidates and their scores to the output unit 1700.
On the other hand, when the number of identifiers is 1 or less, the answer processing control unit 3600 outputs "no answer" to the output unit 1700 (step S304).
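The check in steps S302 to S304 reduces to a single comparison on the size of the extracted identifier group. In the sketch below, the function name is hypothetical and sorting the candidates by descending score is an added assumption; the source only says that candidates and scores are output.

```python
def decide_output(scored_candidates, extracted_identifiers, reference_value=1):
    """Return 'no answer' or the scored candidates, depending on the source document count."""
    if len(set(extracted_identifiers)) <= reference_value:
        return "no answer"
    # Assumption: present candidates from highest to lowest score.
    return sorted(scored_candidates.items(), key=lambda item: item[1], reverse=True)
```

The comparison uses the number of distinct documents, not the number of candidates, so several candidates drawn from a single document are still suppressed.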

As described above, according to the question answering system 12 of the third embodiment, no answer to the question sentence is given when the number of identifiers associated with the answer candidates extracted by the answer extraction unit 3800 (the extracted identifier group) is equal to or less than the predetermined reference value. In other words, an answer is given only when the answer candidates were extracted from a plurality of documents, so the user cannot tell from which document an answer candidate was extracted.

Next, a question answering system 13 that combines the second and third embodiments will be described.
In the question answering system 13, as shown in FIG. 7, the server 20 and the client 30 are connected by a network (not shown), and the client 30 includes the input unit 1100, the session management unit 1500, and the output unit 1700.

The server 20 includes the question analysis unit 2200, the document search unit 2400, the similar document search unit 2900, the document storage unit 1300, the answer extraction unit 3800, and a new answer processing control unit 4600.

Since everything other than the answer processing control unit 4600 is the same as in the embodiments described above, its description is omitted.
The answer processing control unit 4600 of the fourth embodiment outputs the identifier group from the session management unit 1500 to the answer extraction unit 3800 and, on acquiring the answer candidates and the extracted identifier group as the processing result of the answer extraction unit 3800, determines whether the number of identifiers in the extracted identifier group is equal to or less than a predetermined reference value. In this embodiment, the reference value is assumed to be 1 in the following description. When the number of identifiers is 1 or less, the answer processing control unit 4600 outputs the identifier group to the similar document search unit 2900. The answer processing control unit 4600 then acquires from the similar document search unit 2900, as similar identifiers, the identifiers of other answer words (documents) similar to the answer words (documents) associated with the output identifier group, and generates a new identifier group by adding the similar identifiers to the identifier group.

The answer processing control unit 4600 then outputs the newly generated identifier group to the answer extraction unit 3800 and, on acquiring the answer candidates and the extracted identifier group as the processing result of the answer extraction unit 3800, determines whether the number of identifiers in the extracted identifier group is equal to or less than the predetermined reference value. The answer processing control unit 4600 repeats this processing a predetermined number of times, or until the number of identifiers (number of documents) in the extracted identifier group from the answer extraction unit 3800 reaches a predetermined value or more, or until no further similar identifiers can be obtained from the similar document search unit 2900.

As a result, when the number of identifiers in the identifier group is equal to or less than the predetermined value, the answer processing control unit 4600 outputs information indicating "no answer" to the output unit 1700; when the number of identifiers in the identifier group is greater than the predetermined value, it gives scores to the answer candidates and outputs them to the output unit 1700.

Next, the operation of the question answering system 13 will be described.
FIG. 8 is a processing flow showing the processing timing of, in order, the question analysis unit 2200, the document search unit 2400, the similar document search unit 2900, the answer processing control unit 4600, and the answer extraction unit 3800; the description follows this processing flow.

The processing up to the point where the question analysis unit 2200 acquires search terms based on the question sentence acquired by the input unit 1100, the document search unit 2400 acquires an identifier group based on the search terms, and the session management unit 1500 manages the identifier group is the same as in the embodiments described above. That is, the question analysis unit 2200, having acquired the question sentence entered at the input unit 1100, performs morphological analysis and acquires search terms (step S401). When the search terms acquired by the question analysis unit 2200 are sent to the document search unit 2400, the document search unit 2400 refers to the document storage unit 1300 and outputs the identifier group associated with answer words (documents) that match the search terms to the session management unit 1500 (step S402).

The answer processing control unit 4600, having acquired the identifier group from the session management unit 1500, outputs the identifier group to the answer extraction unit 3800. The answer extraction unit 3800, having acquired the identifier group, refers to the document storage unit 1300, acquires the answer candidates and the extracted identifier group, and outputs them to the answer processing control unit 4600 (step S403). The following description uses an example in which the extracted identifier group obtained by the answer extraction unit 3800 contains one identifier.

The answer processing control unit 4600 compares the number of identifiers in the acquired extracted identifier group with the predetermined reference value (step S404). When this determination finds that the extracted identifier group contains 2 or more identifiers, the answer processing control unit 4600 gives scores to the answer candidates and outputs them to the output unit 1700.

On the other hand, when the extracted identifier group contains 1 or fewer identifiers, the answer processing control unit 4600 determines whether the similar identifier acquisition process has already been performed at least once (step S404). If the similar identifier acquisition process has been performed at least once, the answer processing control unit 4600 outputs information indicating "no answer" to the output unit 1700.

If the similar identifier acquisition process has not yet been performed, the answer processing control unit 4600 outputs the identifier group to the similar document search unit 2900.
The similar document search unit 2900, having acquired the identifier group, refers to the document storage unit 1300 and acquires the document corresponding to each identifier (step S406). The acquired document is output from the similar document search unit 2900 to the question analysis unit 2200.

On acquiring the document from the similar document search unit 2900, the question analysis unit 2200 performs morphological analysis on the document and acquires similar search terms (step S407). The acquired similar search terms are output from the question analysis unit 2200 to the similar document search unit 2900.

The similar document search unit 2900, having acquired the similar search terms from the question analysis unit 2200, outputs the similar search terms to the document search unit 2400.
The document search unit 2400, having acquired the similar search terms, refers to the document storage unit 1300, acquires the identifiers corresponding to the similar search terms as similar identifiers, and outputs the similar identifiers to the similar document search unit 2900 (step S408).

The similar document search unit 2900, having acquired one or more similar identifiers, checks whether they include any identifier acquired in step S402. If so, it generates similar identifiers with those identifiers removed, and outputs the resulting similar identifiers to the answer processing control unit 4600 (step S409).

On acquiring the similar identifiers from the similar document search unit 2900, the answer processing control unit 4600 determines whether the number of similar identifiers is equal to or greater than a predetermined reference value (1) (step S410). When this determination finds that the number is not 1 or more, that is, when it is 0, the answer processing control unit 4600 outputs information indicating "no answer" to the output unit 1700.

On the other hand, when there are 1 or more similar identifiers, the answer processing control unit 4600 generates a new identifier group by adding the similar identifiers to the identifier group, and outputs the generated identifier group to the answer extraction unit 3800 so that processing resumes from step S403 (step S411).

The following description uses an example in which, in step S403, the answer extraction unit 3800 acquires answer candidates and an extracted identifier group containing a new identifier, based on the similar identifiers newly added in step S411.
In step S404, the answer processing control unit 4600 again determines the number of identifiers in the extracted identifier group, this time for the extracted identifier group from step S403 that contains the new identifier. The extracted identifier group contains a new identifier obtained from the similar identifiers added in step S411; that is, it is made up of the identifier acquired previously and another identifier acquired this time. Since the extracted identifier group therefore contains at least two identifiers, the answer processing control unit 4600 gives scores to the answer candidates and outputs them to the output unit 1700.
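The loop formed by steps S403 through S411 can be sketched as follows. Answer extraction and similar document search are passed in as callables to keep the example self-contained. The names, the return convention, and the `max_rounds` parameter are hypothetical: the default of one expansion round follows the flow described above, while a larger value would correspond to the variant that repeats the process a predetermined number of times.

```python
def answer_with_expansion(identifier_group, extract, find_similar,
                          reference_value=1, max_rounds=1):
    """Extract answers, widening the identifier group when too few documents contribute."""
    ids = set(identifier_group)
    rounds = 0
    while True:
        candidates, extracted_ids = extract(ids)   # S403: extract candidates
        if len(extracted_ids) > reference_value:   # S404: enough source documents
            return candidates
        if rounds >= max_rounds:                   # expansion already attempted
            return "no answer"
        similar = set(find_similar(ids)) - ids     # S406 to S409: new identifiers only
        if not similar:                            # S410: nothing similar found
            return "no answer"
        ids |= similar                             # S411: new identifier group
        rounds += 1
```

Because the loop exits on the round limit or on an empty similar set, it terminates even when similar documents never yield a second source document for the answer.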

As described above, according to the question answering system 13 of the present invention, when the number of identifiers associated with the answer candidates (the extracted identifier group) is equal to or less than the predetermined reference value, similar search terms are acquired from the documents corresponding to the identifier group, a new identifier group is generated by adding the similar identifiers corresponding to those similar search terms to the identifier group, and answer candidates are extracted again based on the generated identifier group, so that other identifiers associated with the answer candidates can be obtained. Since a plurality of identifiers, that is, a plurality of documents including the added similar documents, can thus be obtained, the user cannot tell from which document an answer candidate was extracted.

As a result, the user cannot learn the database structure that associates each document with the answer candidates obtained from it, which reduces the risk of harm to the interests of the server operator who built the database.

Although the embodiments have been described using question answering systems as examples, the present invention is not limited to a client-server configuration and may also be implemented as a question answering apparatus.

FIG. 1 is a block diagram of the question answering system of the first embodiment.
FIG. 2 is a diagram showing the processing flow of the question answering system of the first embodiment.
FIG. 3 is a block diagram of the question answering system of the second embodiment.
FIG. 4 is a diagram showing the processing flow of the question answering system of the second embodiment.
FIG. 5 is a block diagram of the question answering system of the third embodiment.
FIG. 6 is a flowchart showing the operation of the answer processing control unit of the third embodiment.
FIG. 7 is a block diagram of the question answering system of the fourth embodiment.
FIG. 8 is a diagram showing the processing flow of the question answering system of the fourth embodiment.

Explanation of symbols

10 Question answering system of Example 1
11 Question answering system of Example 2
12 Question answering system of Example 3
13 Question answering system of Example 4
20 Server
30 Client
1100 Input unit
1200, 2200 Question analysis unit
1300 Document storage unit
1400, 2400 Document search unit
1500 Session management unit
1600, 2600, 3600, 4600 Answer processing control unit
1700 Output unit
1800, 3800 Answer extraction unit
2900 Similar document search unit

Claims

In a question answering apparatus comprising: a document analysis unit that analyzes a question sentence from a user and obtains a search term; a document storage unit that stores a document group including answer candidates for the question sentence; a document search unit that refers to the document storage unit and acquires an identifier group identifying the documents corresponding to the search term; a session management unit that allows the user to arbitrarily select the identifier group obtained from the document search unit; and an answer extraction unit that refers to the document storage unit, acquires the documents corresponding to the identifier group from the session management unit, and extracts answer candidates corresponding to a predetermined question type from those documents,
the question answering apparatus is characterized by comprising an answer processing control unit that, as an answer processing determination, determines the number of documents based on the identifier group and, when the number of documents is equal to or less than a predetermined reference value, outputs, as the answer result for the question sentence, a result indicating that there is no answer to the question sentence.

The question answering apparatus according to claim 1, further comprising a similar document search unit that refers to the document storage unit, causes the question analysis unit to acquire, based on the documents associated with the identifier group, similar search words similar to the search term, and causes the document search unit to acquire similar identifiers identifying the similar documents corresponding to the similar search words,
wherein the answer processing control unit activates the similar document search unit when the number of documents based on the identifier group obtained by the document search unit is equal to or less than the predetermined reference value; upon acquiring the similar identifiers, performs the answer processing determination based on the number of documents obtained by adding the similar identifiers to the identifier group; and, when the number of documents exceeds the predetermined reference value in this determination, activates the answer extraction unit so that the extraction-source documents are output together with the extracted answer candidates as the answer result for the question sentence.

In a question answering apparatus comprising: a document analysis unit that analyzes a question sentence from a user and obtains a search term; a document storage unit that stores a document group including answer candidates for the question sentence; a document search unit that refers to the document storage unit and acquires an identifier group identifying the documents corresponding to the search term; a session management unit that allows the user to arbitrarily select the identifier group obtained from the document search unit; and an answer extraction unit that refers to the document storage unit, acquires the documents corresponding to the identifier group from the session management unit, and extracts answer candidates corresponding to a predetermined question type from those documents,
the question answering apparatus is characterized by comprising an answer processing control unit that determines the number of documents based on an extracted identifier group, obtained from the answer extraction unit, that identifies the documents from which the answer candidates were extracted, and that outputs the answer result for the question sentence.

In a question answering apparatus comprising: a document analysis unit that analyzes a question sentence from a user and obtains a search term; a document storage unit that stores a document group including answer candidates for the question sentence; a document search unit that refers to the document storage unit and acquires an identifier group identifying the documents corresponding to the search term; a session management unit that allows the user to arbitrarily select the identifier group obtained from the document search unit; and an answer extraction unit that refers to the document storage unit, acquires the documents corresponding to the identifier group from the session management unit, and extracts answer candidates corresponding to a predetermined question type from those documents,
the question answering apparatus is characterized by comprising: a similar document search unit that refers to the document storage unit, causes the question analysis unit to acquire, based on the documents associated with the identifier group, similar search words similar to the search term, and causes the document search unit to acquire similar identifiers identifying the similar documents corresponding to the similar search words; and
an answer processing control unit that determines the number of documents based on an extracted identifier group, obtained from the answer extraction unit, that identifies the extraction-source documents of the answer candidates; that, when the number of documents is equal to or greater than a predetermined reference value in this determination, outputs the extraction-source documents together with the extracted answer candidates; and that, when the number of documents is less than the predetermined reference value, activates the similar document search unit, obtains the similar identifiers from it, and, when the number of documents indicated by the similar identifiers is equal to or greater than a predetermined reference value, activates the answer extraction unit based on a new identifier group generated by adding the similar identifiers to the identifier group.

The question answering apparatus according to any one of claims 1 to 4, wherein the identifier is a file name of the document, or an index that associates the document with words that can serve as search words for the documents stored in the document storage unit.

The question answering apparatus according to claim 2, wherein, when a similar identifier from the document search unit is already included in the identifier group, the similar document search unit removes that similar identifier.

A question answering system in which the question answering apparatus according to claim 1, claim 3, and claim 4 is configured as a server and a client, wherein
the client includes the session management unit, and
the server includes the answer processing control unit, which operates based on the identifiers selected through the session management unit, the units controlled by the answer processing control unit, and the document storage unit.