JP2019008367A

JP2019008367A - Question word weight calculation apparatus, question answer retrieval apparatus, question word weight calculation method, question answer retrieval method, program and record medium

Info

Publication number: JP2019008367A
Application number: JP2017120725A
Authority: JP
Inventors: Narichika Nomoto; Hisako Asano; Yoshihiro Matsuo
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2017-06-20
Filing date: 2017-06-20
Publication date: 2019-01-17

Abstract

To reduce the influence of words in an FAQ question text that are unnecessary for the answer, and to perform FAQ retrieval with high accuracy even on an unmaintained FAQ. SOLUTION: A question word weight calculation apparatus 10 includes: a word weight calculation unit 14 that takes as input a text pair group consisting of pairs of a question text and an answer text to that question text and, for each pair in the group, uses a plurality of words extracted from the answer text to calculate a weight value indicating the importance, for the answer, of each word extracted from the question text; and an output unit 15 that outputs a question word weight table 16 storing, for each pair, the weight values of the words calculated for that pair by the word weight calculation unit 14. SELECTED DRAWING: Figure 1

Description

The present invention relates to a question word weight calculation device, a question answer search device, a question word weight calculation method, a question answer search method, a program, and a storage medium, and in particular to such a device, method, program, and storage medium that accumulate and use pairs of a question text and the answer text corresponding to it.

In recent years, as services have diversified, the number of services provided by companies and similar organizations has tended to increase. As a result, the amount of knowledge that these organizations must acquire about service specifications, handling procedures, and the like keeps growing. For example, it is costly for operators at a company's call center to acquire sufficient knowledge. On the other hand, letting an operator without sufficient knowledge handle customers can make customers uneasy, lengthen calls, and otherwise lead to a serious decline in service quality for the company.

Therefore, in order to provide high-quality service to customers while reducing the cost of acquiring the necessary knowledge, efforts are being made to organize in-house knowledge and know-how as frequently asked questions (hereinafter, FAQ) and to make use of this FAQ. However, continuing to maintain FAQs for a wide variety of inquiries is enormously costly, so the challenge is how to keep the FAQ up to date with new inquiries efficiently.

For example, the conventional technique described in Patent Document 1 makes it possible to build a search system that uses only the Q (question) part of an FAQ. With this technique, the Q whose meaning is closest to the customer's question sentence (the input query) can be found.

For a well-maintained FAQ, building an FAQ search system with the conventional technique is considered effective. That is, under the assumption that each Q in the FAQ contains only the minimum necessary information (words), finding candidates from the similarity between the input query and the question text is an effective approach.

Patent Document 1: JP 2011-85991 A

Customers sometimes make inquiries to an operator by electronic mail (hereinafter simply, mail). In this case, the customer's inquiry mail can be treated as Q (question) and the operator's reply mail as A (answer), and the history of Q and A pairs can be used as an FAQ as it is.

However, in an FAQ that has not been maintained, such as one built from mail inquiries, Q contains a large amount of information (words) that is not directly related to the answer A. Consider, for example, a customer who runs into a problem and sends an inquiry mail. Customers often do not know what the problem is or what information is needed to solve it, so their mails tend to contain many descriptions of the situation that are not directly related to the problem. Boilerplate phrases used in mail are likewise information that is not directly related to solving the problem.

For this reason, with the prior art, when the FAQ is prepared, the similarity cannot be calculated correctly because of words that act as noise, and sufficient accuracy is difficult to achieve.

In addition, Q often lacks the clue words needed to arrive at A. In this case as well, it is difficult to calculate the similarity correctly, which is another cause of reduced accuracy.

The present invention has been made in view of the above circumstances. Its object is to provide a question word weight calculation device, a question answer search device, a question word weight calculation method, a question answer search method, a program, and a storage medium that reduce the influence of words in an FAQ question text that are unnecessary for the answer, and that can perform FAQ search with high accuracy even on an FAQ that has not been maintained.

To achieve the above object, a question word weight calculation device according to a first invention includes: a word weight calculation unit that takes as input a text pair group consisting of pairs of a question text and an answer text to the question text and, for each pair of the text pair group, uses a plurality of words extracted from the answer text to calculate a weight value indicating the importance, for the answer, of each word extracted from the question text; and an output unit that outputs a question word weight table storing, for each pair, the weight value of each word calculated for the pair by the word weight calculation unit.

A question word weight calculation device according to a second invention is the device of the first invention, further including: an in-document text extraction unit that takes the text pair group and an external document as input and, for each pair of the text pair group, extracts from the external document, as in-document text, related passages containing words extracted from the answer text; and an expansion word extraction unit that, for each pair, extracts as expansion words those words in the in-document text extracted for the pair by the in-document text extraction unit that are not contained in the question text of the pair, and adds the weight values of the words extracted as expansion words to the question word weight table as word weights for the pair.

To achieve the above object, a question answer search device according to a third invention includes: a question word weight table that stores, for each pair of a text pair group consisting of pairs of a question text and an answer text to the question text, weight values that are calculated using a plurality of words extracted from the answer text and that indicate the importance, for the answer, of each word extracted from the question text; and an FAQ search unit that, based on each word contained in an input query and on the question word weight table, calculates for each pair a search score equal to the sum of the weight values of those words of the pair that match a word contained in the input query, and outputs, according to the search scores, the pair corresponding to the input query as a search result.

To achieve the above object, a question word weight calculation method according to a fourth invention includes: a step in which a word weight calculation unit takes as input a text pair group consisting of pairs of a question text and an answer text to the question text and, for each pair of the text pair group, uses a plurality of words extracted from the answer text to calculate a weight value indicating the importance, for the answer, of each word extracted from the question text; and a step in which an output unit outputs a question word weight table storing, for each pair, the weight value of each word calculated for the pair by the word weight calculation unit.

A question word weight calculation method according to a fifth invention is the method of the fourth invention, further including: a step in which an in-document text extraction unit takes the text pair group and an external document as input and, for each pair of the text pair group, extracts from the external document, as in-document text, related passages containing words extracted from the answer text; and a step in which an expansion word extraction unit, for each pair, extracts as expansion words those words in the in-document text extracted for the pair by the in-document text extraction unit that are not contained in the question text of the pair, and adds the weight values of the words extracted as expansion words to the question word weight table as word weights for the pair.

To achieve the above object, a question answer search method according to a sixth invention includes a step in which an FAQ search unit, based on each word contained in an input query and on a question word weight table, calculates for each pair a search score equal to the sum of the weight values of those words of the pair that match a word contained in the input query, and outputs, according to the search scores, the pair corresponding to the input query as a search result. Here, the question word weight table stores, for each pair of a text pair group consisting of pairs of a question text and an answer text to the question text, weight values that are calculated using a plurality of words extracted from the answer text and that indicate the importance, for the answer, of each word extracted from the question text.

To achieve the above object, a program according to a seventh invention causes a computer to function as each unit of the question word weight calculation device according to the first or second invention, or as each unit of the question answer search device according to the third invention.

Furthermore, to achieve the above object, a storage medium according to an eighth invention stores a program for causing a computer to function as each unit of the question word weight calculation device according to the first or second invention, or as each unit of the question answer search device according to the third invention.

As described above, the question word weight calculation device, question answer search device, question word weight calculation method, question answer search method, program, and storage medium according to the present invention reduce the influence of words in an FAQ question text that are unnecessary for the answer, and make it possible to perform FAQ search with high accuracy even on an FAQ that has not been maintained.

FIG. 1 is a block diagram showing an example of the functional configuration of the question word weight calculation device according to the first embodiment.
FIG. 2 is a block diagram showing an example of the functional configuration of the question answer search device according to the first embodiment.
FIG. 3 is a flowchart showing an example of the flow of processing by the question word weight calculation program according to the first embodiment.
FIG. 4 is a flowchart showing an example of the flow of processing by the question answer search program according to the first embodiment.
FIG. 5 is a block diagram showing an example of the functional configuration of the question word weight calculation device according to the second embodiment.
FIG. 6 is a flowchart showing an example of the flow of processing by the question word weight addition program according to the second embodiment.

An example of an embodiment for carrying out the present invention is described in detail below with reference to the drawings.

[First Embodiment]
In this embodiment, each word making up the question text is weighted, using the answer text, according to whether it is a word needed for the answer, or is excluded from the search target. This realizes an FAQ search in which the influence of unnecessary words is reduced.

FIG. 1 is a block diagram showing an example of the functional configuration of the question word weight calculation device 10A according to the first embodiment.
As shown in FIG. 1, the question word weight calculation device 10A according to this embodiment includes an FAQ database (hereinafter, FAQDB) 12, a word weight calculation unit 14, an output unit 15, and a question word weight table 16.

The FAQDB 12 stores a text pair group consisting of pairs of a question text contained in an inquiry mail from a customer and an answer text contained in the operator's reply mail to that inquiry mail.

The word weight calculation unit 14 receives from the FAQDB 12 each pair making up the text pair group and processes each FAQ pair in turn. That is, the word weight calculation unit 14 performs morphological analysis on the question text and on the answer text of one pair and extracts a plurality of words from each. It then uses the words extracted from the answer text to calculate a weight value indicating the importance, for the answer, of each word extracted from the question text.

Specifically, the words Wa_i (i = 1, ..., m) making up the answer text are used to weight each word Wq_j (j = 1, ..., n) making up the question text.

For example, the word weight calculation unit 14 calculates the weight value of each word Wq_j (j = 1, ..., n) of the question text based on whether that word is among the words Wa_i (i = 1, ..., m) of the answer text. Specifically, if Wq_j is among the Wa_i, the weight value S(Wq_j) is set to 1; if Wq_j is not among the Wa_i, S(Wq_j) is set to 0. A word with S(Wq_j) = 0 is excluded from the search target in later processing.
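The binary weighting described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the sample words are hypothetical, and the word lists stand in for the output of morphological analysis.

```python
def binary_question_weights(question_words, answer_words):
    """Return {word: 1 or 0}: 1 if the question word also occurs in the answer."""
    answer_vocab = set(answer_words)
    return {w: 1 if w in answer_vocab else 0 for w in question_words}


def searchable_words(weights):
    """Drop zero-weight words, which are excluded from the search target."""
    return {w for w, s in weights.items() if s != 0}


# Hypothetical, already-tokenized question and answer texts.
question = ["hello", "router", "lamp", "blinking", "thanks"]
answer = ["restart", "router", "if", "lamp", "keeps", "blinking"]

weights = binary_question_weights(question, answer)
kept = searchable_words(weights)
```

In this sample, greeting words such as "hello" and "thanks" receive weight 0 and drop out, while the words shared with the answer are kept.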

Alternatively, the word weight value may be calculated as a continuous value using the tf-idf (term frequency, inverse document frequency) value, which is widely used in information retrieval. The tf-idf value is one kind of weight value for a word in a text: tf is the frequency (number of times) with which the word appears in the text, and idf is the inverse document frequency. Writing the idf value of a word w as idf(w), idf(w) is calculated by the following formula, where df(w) is the number of texts in which the word w appears and N is the total number of texts.

idf(w) = log(N / df(w))

Writing the tf-idf value of a word w in a text d as TFIDF(d, w), TFIDF(d, w) is calculated by the following formula, where tf(d, w) is the frequency (number of times) with which the word w appears in the text d.

TFIDF(d, w) = tf(d, w) × idf(w)
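As a rough illustration of the two formulas above, the following Python sketch computes idf(w) and TFIDF(d, w) over a small collection of tokenized texts. The function names and sample data are hypothetical. The natural logarithm is an assumption, since the source does not state the base, and returning 0.0 for a word that appears in no text is likewise an assumption made to avoid division by zero.

```python
import math
from collections import Counter


def idf(word, texts):
    """idf(w) = log(N / df(w)), with N texts and df(w) texts containing w."""
    df = sum(1 for text in texts if word in text)
    if df == 0:
        return 0.0  # assumption: unseen words get idf 0 (not stated in the source)
    return math.log(len(texts) / df)


def tfidf(text, word, texts):
    """TFIDF(d, w) = tf(d, w) * idf(w), with tf the raw count of w in d."""
    return Counter(text)[word] * idf(word, texts)


# Hypothetical tokenized texts; each text counts as one document.
texts = [
    ["router", "lamp", "router"],
    ["router", "invoice"],
    ["invoice", "date"],
    ["lamp"],
]

router_score = tfidf(texts[0], "router", texts)  # tf = 2, idf = log(4 / 2)
date_idf = idf("date", texts)                    # log(4 / 1)
```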

When the word weight value is calculated as a continuous value using the tf-idf value above, the word weight calculation unit 14 first calculates the tf-idf value of each word using all of the answer texts in the FAQ, treating each text as one document. In this embodiment, the result of the tf-idf calculation for the answer texts is written as follows.

TFIDF(Wa_i), i = 1, ..., M

Similarly, the tf-idf value of each word is calculated using all of the question texts in the FAQ. In this embodiment, the result of the tf-idf calculation for the question texts is written as follows.

TFIDF(Wq_j), j = 1, ..., N

Finally, the word weight calculation unit 14 calculates the weight value S(Wq_j) of each word Wq_j of the question text by multiplying the tf-idf values from the answer text and the question text, using equation (1) below.

if Wq_j eq Wa_i then
S(Wq_j) = TFIDF(Wa_i) × TFIDF(Wq_j)   (1)

With equation (1), a word in the question text that does not appear in the answer text, or that is judged to be of low importance in the answer text, receives a low weight value. The calculation of the word weight value is not limited to the tf-idf value described above; other word weighting methods, such as Okapi BM25, may be used.
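One possible reading of equation (1) is sketched below in Python. It computes tf-idf separately over all question texts and over all answer texts, then multiplies the two values for each question word within a pair. The function names and sample data are hypothetical, and assigning weight 0.0 to a question word that is absent from the answer is an assumption: the source says only that such words receive a low value.

```python
import math
from collections import Counter


def tfidf_per_text(texts):
    """Return one {word: tf-idf} dict per text, treating each text as a document."""
    n = len(texts)
    df = Counter(w for text in texts for w in set(text))
    return [
        {w: tf * math.log(n / df[w]) for w, tf in Counter(text).items()}
        for text in texts
    ]


def question_word_weights(questions, answers):
    """Equation (1): S(Wq_j) = TFIDF(Wa_i) * TFIDF(Wq_j) when Wq_j equals Wa_i."""
    q_scores = tfidf_per_text(questions)
    a_scores = tfidf_per_text(answers)
    table = []
    for q, a in zip(q_scores, a_scores):
        # Assumption: a question word missing from the answer gets weight 0.0.
        table.append({w: a.get(w, 0.0) * s for w, s in q.items()})
    return table


# Hypothetical tokenized FAQ pairs (questions[i] pairs with answers[i]).
questions = [["hello", "router", "lamp"], ["hello", "invoice", "date"]]
answers = [["restart", "router", "lamp", "lamp"], ["invoice", "date", "is", "printed"]]

table = question_word_weights(questions, answers)
```

In this sample, "hello" appears in every question, so its question-side idf is log(2 / 2) = 0 and its weight is 0, while "lamp" is boosted because it occurs twice in the paired answer.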

The output unit 15 outputs the question word weight table 16, which stores the weight value of each word calculated for each FAQ pair by the word weight calculation unit 14 using equation (1) above.

The question word weight calculation device 10A according to this embodiment is configured as a computer that includes a CPU (Central Processing Unit), RAM (Random Access Memory), ROM (Read Only Memory), an HDD (Hard Disk Drive), and the like. The ROM stores the question word weight calculation program according to this embodiment. The question word weight calculation program may instead be stored on the HDD.

The question word weight calculation program may, for example, be installed in the question word weight calculation device 10A in advance. Alternatively, the program may be stored on a nonvolatile storage medium or distributed over a network and then installed in the question word weight calculation device 10A as appropriate. Examples of nonvolatile storage media include a CD-ROM (Compact Disc Read Only Memory), a magneto-optical disk, a DVD-ROM (Digital Versatile Disc Read Only Memory), flash memory, and a memory card.

The CPU functions as the word weight calculation unit 14 and the output unit 15 by reading and executing the question word weight calculation program stored in the ROM. The HDD functions as the FAQDB 12 and as the question word weight table 16.

Next, the configuration of the question answer search device 20, which performs FAQ search using the FAQDB 12 and the question word weight table 16, is described. This embodiment describes the case where the question answer search device 20 is separate from the question word weight calculation device 10A, but the two devices may be configured as a single unit.

FIG. 2 is a block diagram showing an example of the functional configuration of the question answer search device 20 according to the first embodiment.
As shown in FIG. 2, the question answer search device 20 according to this embodiment includes the FAQDB 12, the question word weight table 16, and an FAQ search unit 22. The FAQDB 12 and the question word weight table 16 may be provided outside the question answer search device 20.

The FAQ search unit 22 performs morphological analysis on an input query (question sentence) 24 and extracts each word from it. Based on the words contained in the input query 24 and on the question word weight table 16, the FAQ search unit 22 calculates, for each pair, a search score equal to the sum of the weight values of those words of the pair that match a word contained in the input query 24.

Specifically, for the words Wi_k (k = 1, ..., x) making up the input query 24, the search score SS(q) of a question text q is calculated by equation (2) below, using the weight value S(q, W) of a word W in q.

if S(q, Wi_k) is defined then
SS(q) = Σ_k S(q, Wi_k)   (2)

With equation (2), the more words, and the more important words, a question text q shares with the input query 24, the higher its search score SS(q).

Finally, the FAQ search unit 22 extracts question texts q, for example in descending order of search score SS(q), and outputs the pairs containing the extracted q as the search result 26 for the input query 24.
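The scoring of equation (2) and the ranking step can be sketched in Python as follows. The table layout (a dictionary from a pair identifier to that pair's word weights), the identifiers, and the weight values are hypothetical. Counting each distinct query word once is also an assumption, since the source does not say how repeated query words are treated.

```python
def search_score(query_words, pair_weights):
    """SS(q): sum of S(q, W) over query words W that have a weight defined for q."""
    # Assumption: each distinct query word is counted once.
    return sum(pair_weights[w] for w in set(query_words) if w in pair_weights)


def rank_pairs(query_words, weight_table):
    """Return (pair_id, score) tuples sorted by descending search score."""
    scores = {
        pair_id: search_score(query_words, weights)
        for pair_id, weights in weight_table.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical question word weight table for two FAQ pairs.
weight_table = {
    "faq-1": {"router": 0.5, "lamp": 1.0},
    "faq-2": {"invoice": 0.5, "date": 0.5},
}

ranked = rank_pairs(["my", "router", "lamp", "blinks"], weight_table)
```

Words with no entry in a pair's table, such as "my" here, simply contribute nothing to that pair's score.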

Like the question word weight calculation device 10A, the question answer search device 20 according to this embodiment is configured as a computer that includes a CPU, RAM, ROM, an HDD, and the like. The ROM stores the question answer search program according to this embodiment. The question answer search program may instead be stored on the HDD.

The question answer search program may, for example, be installed in the question answer search device 20 in advance. Alternatively, the program may be stored on a nonvolatile storage medium or distributed over a network and then installed in the question answer search device 20 as appropriate. Examples of nonvolatile storage media include a CD-ROM, a magneto-optical disk, a DVD-ROM, flash memory, and a memory card.

The CPU functions as the FAQ search unit 22 by reading and executing the question answer search program stored in the ROM.

Next, the operation of the question word weight calculation device 10A according to the first embodiment is described with reference to FIG. 3. FIG. 3 is a flowchart showing an example of the flow of processing by the question word weight calculation program according to the first embodiment.

First, in step 100 of FIG. 3, the word weight calculation unit 14 receives from the FAQDB 12 a pair of a question text and an answer text in the FAQ.

In step 102, the word weight calculation unit 14 performs morphological analysis on the question text and on the answer text of the received pair, and extracts a plurality of words from each.

In step 104, the word weight calculation unit 14 calculates the weight value of each word in the question text based on whether that word is among the words in the answer text. Alternatively, the word weight calculation unit 14 calculates the weight value S(Wq_j) of each word in the question text using equation (1) above.

In step 106, the output unit 15 outputs the question word weight table 16 storing the weight value of each word calculated for the pair.

In step 108, the output unit 15 determines whether the weight calculation process has been completed for all pairs stored in the FAQDB 12. If it determines that the process has been completed for all pairs (affirmative determination), the series of processes ends. If it determines that the process has not been completed for all pairs (negative determination), the flow returns to step 100 and the process is repeated.
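The loop of steps 100 to 108 can be sketched in Python as follows. This is a simplified illustration, not the program of the embodiment: the dictionary standing in for the FAQDB, the pair identifier, and the function names are hypothetical, and whitespace splitting is used only as a stand-in for morphological analysis, which real Japanese text would require.

```python
def tokenize(text):
    """Stand-in for morphological analysis (assumption: whitespace-separated words)."""
    return text.lower().split()


def binary_weigh(question_words, answer_words):
    """Weight 1 if the question word appears in the answer, otherwise 0."""
    answer_vocab = set(answer_words)
    return {w: 1 if w in answer_vocab else 0 for w in question_words}


def build_question_word_weight_table(faq_db, weigh):
    """Iterate over all pairs and collect one {word: weight} entry per pair."""
    table = {}
    for pair_id, (question, answer) in faq_db.items():  # steps 100 and 108
        question_words = tokenize(question)              # step 102
        answer_words = tokenize(answer)
        table[pair_id] = weigh(question_words, answer_words)  # steps 104 and 106
    return table


# Hypothetical FAQDB content: pair id -> (question text, answer text).
faq_db = {
    "faq-1": ("Hello my router lamp blinks", "Restart the router when the lamp blinks"),
}

table = build_question_word_weight_table(faq_db, binary_weigh)
```

Passing the weighting function as a parameter makes it possible to swap the binary scheme for a tf-idf or BM25-based scheme without changing the loop.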

Next, the operation of the question answer search device 20 according to the first embodiment is described with reference to FIG. 4. FIG. 4 is a flowchart showing an example of the flow of processing by the question answer search program according to the first embodiment.

First, in step 110 of FIG. 4, the FAQ search unit 22 receives the input query 24.

ステップ１１２では、ＦＡＱ検索部２２が、入力クエリ２４を形態素解析し、入力クエリ２４から複数の単語を抽出する。 In step 112, the FAQ search unit 22 performs morphological analysis on the input query 24 and extracts a plurality of words from the input query 24.

ステップ１１４では、ＦＡＱ検索部２２が、上記（２）式を用いて、入力クエリ２４と質問テキスト（ｑ）との間における検索スコアＳＳ（ｑ）を計算する。 In step 114, the FAQ search unit 22 calculates a search score SS (q) between the input query 24 and the question text (q) using the above equation (2).

ステップ１１６では、ＦＡＱ検索部２２が、ＦＡＱＤＢ１２に格納されている全ての質問テキストについて検索スコア算出処理が終了したか否かを判定する。全ての質問テキストについて検索スコア算出処理が終了したと判定した場合（肯定判定の場合）、上記一連の処理を終了する。一方、全ての質問テキストについて検索スコア算出処理が終了していないと判定した場合（否定判定の場合）、ステップ１１４に戻り処理を繰り返す。 In step 116, the FAQ search unit 22 determines whether or not the search score calculation processing has been completed for all the question texts stored in the FAQ DB 12. When it is determined that the search score calculation processing has been completed for all the question texts (in the case of an affirmative determination), the above series of processing ends. On the other hand, when it is determined that the search score calculation process has not been completed for all the question texts (in the case of negative determination), the process returns to step 114 and is repeated.

ステップ１１８では、ＦＡＱ検索部２２が、検索スコアが高い順にｑを抽出し、抽出したｑを含むペアを、入力クエリ２４に対応する検索結果２６として出力し、上記一連の処理を終了する。 In step 118, the FAQ search unit 22 extracts q in descending order of the search score, outputs a pair including the extracted q as the search result 26 corresponding to the input query 24, and ends the series of processes.
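
Steps 110 to 118 can be sketched as below. The score follows the wording of the claims (the sum of the weight values of the question words that match words in the input query) rather than reproducing equation (2) itself. The function names, the whitespace tokenization, the `top_k` parameter, and the sample data are assumptions for illustration.

```python
def search_score(query_words, word_weights):
    # Sum the weights of the question words that also occur in the query.
    query = set(query_words)
    return sum(weight for word, weight in word_weights.items() if word in query)


def search_faq(query, weight_table, faq_pairs, top_k=3):
    # Hypothetical stand-in for morphological analysis (step 112).
    query_words = query.lower().split()
    # Steps 114 and 116: score every question text in the table.
    scored = [
        (search_score(query_words, weights), pair_id)
        for pair_id, weights in weight_table.items()
    ]
    # Step 118: return pairs in descending order of search score.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(faq_pairs[pair_id], score) for score, pair_id in scored[:top_k]]


faq_pairs = [
    ("how do i reset the password", "open settings and choose reset password"),
    ("how do i change the billing plan", "open billing and choose change plan"),
]
weight_table = {
    0: {"how": 0.0, "do": 0.0, "i": 0.0, "reset": 1.0, "the": 0.0, "password": 1.0},
    1: {"how": 0.0, "do": 0.0, "i": 0.0, "change": 1.0, "the": 0.0,
        "billing": 1.0, "plan": 1.0},
}
results = search_faq("i forgot my password", weight_table, faq_pairs)
```

Because "i" carries a low weight in both pairs, only the match on "password" contributes, so the password FAQ ranks first.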

According to this embodiment, the influence of words in an FAQ question text that are unnecessary for the answer can be reduced. As a result, FAQ search can be performed with high accuracy even on FAQs that have not been well maintained.

[Second Embodiment]
In this embodiment, passages that are highly relevant to the answer text are extracted as in-document text from external documents, such as manuals related to the service, and words that are not included in the question text are extracted from that in-document text as extended words. This realizes an FAQ search that takes into account, as extended words, words that ought to have been included in the question text.

FIG. 5 is a block diagram showing an example of the functional configuration of the question word weight calculation device 10B according to the second embodiment.
The device configuration described in the second embodiment includes parts that are the same as in the first embodiment. Those parts are given the same reference numerals and their description is omitted; only the differences are described.

As shown in FIG. 5, the question word weight calculation device 10B according to this embodiment includes the FAQ DB 12, the word weight calculation unit 14, the output unit 15, and the question word weight table 16, and further includes an external document DB 30, an in-document text extraction unit 32, and an extended word extraction unit 34.

The external document DB 30 stores external documents, such as manuals and instruction manuals, relating to the services provided to customers.

The in-document text extraction unit 32 receives each pair of the text pair group from the FAQ DB 12 and receives the external documents from the external document DB 30. The in-document text extraction unit 32 processes each FAQ pair. That is, for each pair of a question text and an answer text, it extracts from the external documents, as in-document text, related passages that contain words extracted from the answer text. As one extraction method, for example, words are extracted from each answer text, and the number of extracted words contained in each block of sentences in the external document is counted. The unit of a block may be a sentence, a paragraph, or a chapter or section. The blocks are then sorted in descending order of the number of words contained, and related passages are extracted as in-document text according to a predetermined extraction criterion. The criterion may be, for example, a threshold on the number of words contained, or the top N blocks.
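
The counting-based extraction described above could look like the following sketch. It counts how many distinct answer words each block contains, sorts the blocks in descending order, and applies either a threshold or a top-N cut. The function name, the `min_count` and `top_n` parameters, the whitespace tokenization, and the sample blocks are assumptions for illustration.

```python
def extract_in_document_text(answer_words, chunks, top_n=None, min_count=1):
    """Select the blocks of an external document most related to an answer.

    chunks is a list of strings (sentences, paragraphs, or sections).
    min_count is an assumed threshold on the number of answer words contained;
    top_n, if given, keeps only the N highest-ranked blocks.
    """
    answer_vocab = set(answer_words)
    scored = []
    for chunk in chunks:
        chunk_words = set(chunk.lower().split())  # stand-in for morphological analysis
        count = len(answer_vocab & chunk_words)   # distinct answer words in this block
        scored.append((count, chunk))
    # Sort blocks in descending order of the number of answer words contained.
    scored.sort(key=lambda item: item[0], reverse=True)
    selected = [chunk for count, chunk in scored if count >= min_count]
    return selected if top_n is None else selected[:top_n]


answer_words = ["open", "settings", "choose", "reset", "password"]
chunks = [
    "billing plans are listed on the account page",
    "the reset option is under security",
    "to reset a password open the settings menu",
]
by_threshold = extract_in_document_text(answer_words, chunks, min_count=1)
top_one = extract_in_document_text(answer_words, chunks, top_n=1)
```

Here the third block contains four answer words and ranks first, the second contains one, and the billing block contains none and is dropped by the threshold.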

When counting the words contained, the tf-idf value of each word, calculated per answer text by the word weight calculation unit 14 described above, may be used. In this case, the sum of the tf-idf values is used instead of the count. This makes it possible to extract in-document text that takes word importance into account.
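
A minimal sketch of this variant is shown below: instead of counting matching words, each block is scored by the sum of the tf-idf values of the answer words it contains. The tf-idf values are supplied as a ready-made dictionary; how they are computed is outside this sketch, and the function name and the numbers are invented for illustration.

```python
def tfidf_chunk_score(chunk_words, answer_tfidf):
    # Sum the tf-idf values of the answer words that appear in the block.
    present = set(chunk_words)
    return sum(value for word, value in answer_tfidf.items() if word in present)


# Assumed per-answer tf-idf values (illustrative numbers only).
answer_tfidf = {"reset": 0.5, "open": 0.25, "password": 1.5}

chunk_a = "open the menu and press reset".split()   # contains two low-valued words
chunk_b = "enter the new password twice".split()    # contains one high-valued word

score_a = tfidf_chunk_score(chunk_a, answer_tfidf)
score_b = tfidf_chunk_score(chunk_b, answer_tfidf)
```

With a plain count, `chunk_a` (two matches) would outrank `chunk_b` (one match); with the tf-idf sum the order is reversed, which is the effect of taking word importance into account.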

For each FAQ pair, the extended word extraction unit 34 extracts, as extended words, the words in the in-document text extracted for that pair by the in-document text extraction unit 32 that are not included in the question text of the pair. The extended word extraction unit 34 adds the weight value S of each word extracted as an extended word to the question word weight table 16 as a word weight value for the pair. The weight value of each extended word may be set to a uniform value Z, or a tf-idf value with respect to the external document may be calculated for each block defined above and the weight value of each extended word determined based on that tf-idf value.
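
The uniform-weight option can be sketched as follows. Words that occur in the extracted in-document text but not in the question text are added to the pair's weight entries with one fixed value. The function name, the default `uniform_weight=0.5` (standing in for Z), and the sample data are assumptions for illustration; the tf-idf based option is not shown.

```python
def add_extended_words(question_words, in_document_chunks, pair_weights,
                       uniform_weight=0.5):
    """Return a copy of pair_weights with extended words added.

    uniform_weight plays the role of the uniform value Z; 0.5 is an
    arbitrary choice for this example.
    """
    question_vocab = set(question_words)
    updated = dict(pair_weights)  # leave the caller's table entry untouched
    for chunk in in_document_chunks:
        for word in chunk.lower().split():  # stand-in for morphological analysis
            # Extended word: appears in the in-document text, not in the question.
            if word not in question_vocab and word not in updated:
                updated[word] = uniform_weight
    return updated


question_words = ["how", "do", "i", "reset", "the", "password"]
pair_weights = {"how": 0.0, "do": 0.0, "i": 0.0,
                "reset": 1.0, "the": 0.0, "password": 1.0}
in_document_chunks = ["to reset a password open the settings menu"]

updated = add_extended_words(question_words, in_document_chunks, pair_weights)
```

After this step a query that mentions "settings" can match the pair even though the original question text never used that word.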

Like the question word weight calculation device 10A described above, the question word weight calculation device 10B according to this embodiment is configured as a computer including a CPU, a RAM, a ROM, an HDD, and the like. The ROM stores the question word weight calculation program and the question word weight addition program according to this embodiment. By reading and executing the question word weight calculation program and the question word weight addition program stored in the ROM, the CPU functions as the word weight calculation unit 14, the output unit 15, the in-document text extraction unit 32, and the extended word extraction unit 34.

Next, the operation of the question word weight calculation device 10B according to the second embodiment will be described with reference to FIG. 3 above and FIG. 6. FIG. 6 is a flowchart showing an example of the flow of processing by the question word weight addition program according to the second embodiment.

First, the question word weight calculation device 10B executes the processing of the question word weight calculation program shown in FIG. 3 above.
Next, the question word weight calculation device 10B executes the processing of the question word weight addition program shown in FIG. 6.

First, in step 120 of FIG. 6, the in-document text extraction unit 32 receives a pair of a question text and an answer text in the FAQ from the FAQ DB 12.

In step 122, the in-document text extraction unit 32 performs morphological analysis on the answer text of the received pair and extracts words from the answer text.

In step 124, the in-document text extraction unit 32 extracts, as in-document text, related passages containing the words extracted from the answer text from the external documents stored in the external document DB 30.

In step 126, the extended word extraction unit 34 extracts, as extended words, words that are not included in the question text from the in-document text extracted by the in-document text extraction unit 32.

In step 128, the extended word extraction unit 34 determines the weight value of each word extracted as an extended word.

In step 130, the extended word extraction unit 34 adds the weight values determined above to the question word weight table 16 as word weight values for the pair.

In step 132, the extended word extraction unit 34 determines whether the weight addition processing has been completed for all pairs stored in the FAQ DB 12. If it has (affirmative determination), the series of processes ends. If it has not (negative determination), the processing returns to step 120 and is repeated.

In this embodiment as well, the question answer search device 20 can be applied in the same way as in the first embodiment when an FAQ search is executed for the input query 24.

According to this embodiment, words that are related to the answer text but absent from the question text can be added from external documents as extended words, while the influence of words in the FAQ question text that are unnecessary for the answer is reduced. As a result, FAQ search can be performed with even higher accuracy on FAQs that have not been well maintained.

The question word weight calculation device and the question answer search device have been described above as example embodiments. An embodiment may take the form of a program for causing a computer to function as each unit of the question word weight calculation device or of the question answer search device. An embodiment may also take the form of a computer-readable storage medium storing this program.

The configurations of the question word weight calculation device and the question answer search device described in the above embodiments are examples, and may be changed according to the situation without departing from the gist.

The processing flows of the programs described in the above embodiments are also examples; unnecessary steps may be deleted, new steps may be added, and the processing order may be changed without departing from the gist.

The above embodiments describe the case where the processing is realized by a software configuration using a computer that executes a program, but the embodiments are not limited to this. An embodiment may be realized by, for example, a hardware configuration or a combination of a hardware configuration and a software configuration.

10A, 10B Question word weight calculation device
12 FAQ DB
14 Word weight calculation unit
15 Output unit
16 Question word weight table
20 Question answer search device
22 FAQ search unit
24 Input query
26 Search result
30 External document DB
32 In-document text extraction unit
34 Extended word extraction unit

Claims

A question word weight calculation device comprising:
a word weight calculation unit that takes as input a text pair group consisting of pairs of a question text and an answer text to the question text and, for each pair of the text pair group, calculates a weight value indicating the importance, for the answer, of each word extracted from the question text, using a plurality of words extracted from the answer text; and
an output unit that outputs, for each pair, a question word weight table storing the weight value of each word calculated for the pair by the word weight calculation unit.

The question word weight calculation device according to claim 1, further comprising:
an in-document text extraction unit that takes the text pair group and an external document as input and, for each pair of the text pair group, extracts from the external document, as in-document text, a related passage containing words extracted from the answer text; and
an extended word extraction unit that, for each pair, extracts as extended words the words that are not included in the question text of the pair from the in-document text extracted for the pair by the in-document text extraction unit, and adds the weight values of the words extracted as extended words to the question word weight table as word weights for the pair.

A question answer search device comprising:
a question word weight table storing weight values that indicate the importance, for the answer, of each word extracted from a question text, the weight values being calculated, with a text pair group consisting of pairs of a question text and an answer text to the question text as input, for each pair of the text pair group using a plurality of words extracted from the answer text; and
an FAQ search unit that, based on the words included in an input query and the question word weight table, calculates for each pair, as a search score, the sum of the weight values of those words of the pair that match words included in the input query, and outputs the pairs corresponding to the input query as a search result according to the search score.

A question word weight calculation method comprising:
a step in which a word weight calculation unit takes as input a text pair group consisting of pairs of a question text and an answer text to the question text and, for each pair of the text pair group, calculates a weight value indicating the importance, for the answer, of each word extracted from the question text, using a plurality of words extracted from the answer text; and
a step in which an output unit outputs, for each pair, a question word weight table storing the weight value of each word calculated for the pair by the word weight calculation unit.

The question word weight calculation method according to claim 4, further comprising:
a step in which an in-document text extraction unit takes the text pair group and an external document as input and, for each pair of the text pair group, extracts from the external document, as in-document text, a related passage containing words extracted from the answer text; and
a step in which an extended word extraction unit, for each pair, extracts as extended words the words that are not included in the question text of the pair from the in-document text extracted for the pair by the in-document text extraction unit, and adds the weight values of the words extracted as extended words to the question word weight table as word weights for the pair.

A question answer search method comprising a step in which an FAQ search unit, based on the words included in an input query and on a question word weight table, calculates for each pair, as a search score, the sum of the weight values of those words of the pair that match words included in the input query, and outputs the pairs corresponding to the input query as a search result according to the search score, the question word weight table storing weight values that indicate the importance, for the answer, of each word extracted from a question text, the weight values being calculated, with a text pair group consisting of pairs of a question text and an answer text to the question text as input, for each pair of the text pair group using a plurality of words extracted from the answer text.

A program for causing a computer to function as each unit of the question word weight calculation device according to claim 1 or 2, or as each unit of the question answer search device according to claim 3.

A storage medium storing a program for causing a computer to function as each unit of the question word weight calculation device according to claim 1 or 2, or as each unit of the question answer search device according to claim 3.