JP2013143066A

JP2013143066A - Question and answer program, server and method which use large amount of comment texts

Info

Publication number: JP2013143066A
Application number: JP2012003713A
Authority: JP
Inventors: Kazunori Matsumoto; 一則松本; Hajime Hattori; 元服部; Toshihiro Ono; 智弘小野
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2012-01-12
Filing date: 2012-01-12
Publication date: 2013-07-22
Anticipated expiration: 2032-01-12
Also published as: JP5711674B2

Abstract

PROBLEM TO BE SOLVED: To provide a question answering server and the like that can clearly present (or narrow down to) an answer sentence reflecting the user's intention when there are a plurality of candidate answer sentences for a question sentence from the user. SOLUTION: A question answering program comprises: question keyword extraction means which extracts a plurality of question keywords from a question sentence; comment text retrieval means which retrieves comment texts including the question keywords; topic classification means which classifies the retrieved comment texts into a plurality of topic groups; answer sentence detection means which associates with each topic group an answer sentence whose similarity to that topic group is at or above a prescribed threshold; representative keyword extraction means which extracts representative keywords for each topic group; differential keyword extraction means which extracts, as a differential keyword, a representative keyword occurring only in that topic group; differential keyword selection means which presents a plurality of differential keywords to the user so as to be selectable; and answer sentence output means which presents to the user the answer sentence corresponding to the selected differential keyword.

Description

The present invention relates to a question answering program that outputs an optimal answer sentence in response to an input question sentence.

In recent years, question answering systems based on FAQs (Frequently Asked Questions) have been built. An FAQ is a collection of answers to questions that many people commonly and frequently ask. A question answering system is software that accepts a natural-language question sentence about a specific type of information from a user and outputs an answer sentence. In general, a question answering system stores hypothetical question sentences and the candidate answer sentences linked to them in a database in advance. It then executes the following steps.
(1) Extract characteristic words from the question sentence input by the user as queries.
(2) Using a search engine, select a hypothetical question sentence in which the queries appear with high frequency.
(3) Select the answer sentence for the selected hypothetical question sentence.
(4) Present the selected answer sentence to the user.

Some question answering systems of this kind exist as standalone devices for the user, while others are connected to the Internet as question answering servers. Such a question answering server receives a question sentence over the network from a terminal operated by the user and transmits the answer sentence to that terminal (see, for example, Non-Patent Document 1).

As another technology, blog (Web log) servers and miniblog (mini Web log) servers (for example, twitter (registered trademark)) are connected to the Internet. Such a blog server receives comment texts from an unspecified number of third parties and publishes them to other third parties. Comment texts are published on a wide variety of topics and, naturally, many of them discuss matters related to the question sentences and answer sentences handled by the question answering systems described above.

Patent documents:
JP 2011-81626 A
JP 2005-141428 A
JP 2005-284209 A

Non-patent documents:
Non-Patent Document 1: KDDI, "au one NET Concierge", [online], [retrieved December 18, 2011], Internet <URL: http://concierge.auone-net.jp/inagoNetPeople/BrowserClient/GUI/help.html>
Non-Patent Document 2: Masashi Tsubosaka, "Introduction to Latent Dirichlet Allocation", [online], [retrieved December 18, 2011], Internet <URL: http://www.slideshare.net/tsubosaka/tokyotextmining>

However, the same question sentence can carry more than one possible intention on the part of the user. In such cases, an appropriate answer sentence is often not returned to the user.

Example of a user's question sentence:
Q: "Lost mobile phone"
For this question sentence, the question answering system extracts the following two keywords.
"mobile phone", "lost"
When answer sentences are searched with these keywords as queries, there are multiple candidate answers.
A: an answer sentence on "How to apply for the 携帯探せて安心サービス" (literally, the "find your mobile phone and feel safe" service)
A: an answer sentence on "How to use the 携帯探せて安心サービス"
In this case, even though the user meant to ask how to use the service to lock the lost mobile phone remotely, the question answering system may answer with how to apply for it.

Accordingly, an object of the present invention is to provide a question answering program, server, and method that can clearly present (or narrow down to) an answer sentence reflecting the user's intention when there are a plurality of candidate answer sentences for the user's question sentence.

According to the present invention, there is provided a question answering program that causes a computer, which has a comment text storage unit storing a large number of comment texts and an answer sentence storage unit storing a large number of answer sentences, to extract an answer sentence for a question sentence from a user, the program causing the computer to function as:
question sentence input means for inputting a question sentence;
question keyword extraction means for extracting a plurality of question keywords contained in the question sentence;
comment text search means for searching the comment text storage unit for comment texts containing the question keywords;
topic classification means for classifying the retrieved comment texts into a plurality of topic groups based on the distribution of the words that appear in them;
answer sentence detection means for calculating the similarity between the group of comment texts in each topic group and the text of each answer sentence, and associating with each topic group the answer sentences whose similarity is equal to or greater than a predetermined threshold;
representative keyword extraction means for extracting, for each topic group, representative keywords that characterize the topic group from among the keywords contained in the associated answer sentences;
differential keyword extraction means for extracting, for each topic group, a representative keyword that appears only in that topic group as a differential keyword; and
answer sentence output means for presenting each answer sentence detected by the answer sentence detection means together with its one or more corresponding differential keywords.
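The differential keyword extraction in the list above amounts to a set difference across topic groups. The following is a minimal sketch under stated assumptions: the function name `differential_keywords`, the topic labels, and the sample keywords are hypothetical and are not taken from the patent.

```python
def differential_keywords(representative):
    """For each topic group, keep only the representative keywords
    that appear in no other topic group."""
    result = {}
    for topic, keywords in representative.items():
        # Union of the representative keywords of every other topic group.
        others = set()
        for other_topic, other_keywords in representative.items():
            if other_topic != topic:
                others |= other_keywords
        result[topic] = keywords - others
    return result


# Hypothetical representative keywords for two topic groups.
representative = {
    "topic_1": {"mobile phone", "terminal", "lost", "application"},
    "topic_2": {"mobile phone", "terminal", "lost", "lock"},
}
diff = differential_keywords(representative)
print(diff)
```

In this sketch, keywords shared by both groups drop out, so only the words that distinguish one group from the other remain as candidates to show the user.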

According to another embodiment of the question answering program of the present invention, it is also preferable that
the program further causes the computer to function as differential keyword selection means for presenting a plurality of differential keywords to the user via a user interface and having the user select any one of them by a user operation, and
the answer sentence output means presents, via the user interface, the answer sentence corresponding to the selected differential keyword.

According to another embodiment of the question answering program of the present invention, it is also preferable that
the topic classification means classifies each comment text into exactly one topic group using the LDA (Latent Dirichlet Allocation) algorithm, which calculates the likelihood (topic ratio) that the comment text belongs to each topic group.

According to another embodiment of the question answering program of the present invention, it is also preferable that
the question keyword extraction means extracts keywords from the question sentence by morphological analysis and extracts words that are characteristic according to TF-IDF (Term Frequency - Inverse Document Frequency) as the question keywords,
and/or
the representative keyword extraction means extracts keywords from the answer sentences by morphological analysis and extracts words that are characteristic according to TF-IDF as the representative keywords.

According to another embodiment of the question answering program of the present invention, it is also preferable that
the answer sentence detection means
extracts keywords by morphological analysis from the group of comment texts in each topic group and calculates a first feature vector for the topic group by TF-IDF,
extracts keywords by morphological analysis from each answer sentence stored in the answer sentence storage unit and calculates a second feature vector for the answer sentence by TF-IDF, and
calculates the similarity based on the cosine distance between the first vector of the topic group and the second vector of the answer sentence.

According to another embodiment of the question answering program of the present invention, it is also preferable that
the representative keyword extraction means arranges the representative keywords of each topic group in priority order according to the Akaike information criterion.

According to another embodiment of the question answering program of the present invention, it is also preferable that
the comment texts are posted by an unspecified number of third parties, and
the comment text storage unit stores comment texts collected from those posted to a miniblog (mini Web log) server.

According to the present invention, there is provided a question answering server that has a comment text storage unit storing a large number of comment texts and an answer sentence storage unit storing a large number of answer sentences, and that extracts an answer sentence for a question sentence from a user, the server comprising:
question sentence input means for inputting a question sentence from a terminal;
question keyword extraction means for extracting a plurality of question keywords contained in the question sentence;
comment text search means for searching the comment text storage unit for comment texts containing the question keywords;
topic classification means for classifying the retrieved comment texts into a plurality of topic groups based on the distribution of the words that appear in them;
answer sentence detection means for calculating the similarity between the group of comment texts in each topic group and each answer sentence, and associating with each topic group the answer sentences whose similarity is equal to or greater than a predetermined threshold;
representative keyword extraction means for extracting, for each topic group, representative keywords that characterize the topic group from among the keywords contained in the associated answer sentences;
differential keyword extraction means for extracting, for each topic group, a representative keyword that appears only in that topic group as a differential keyword; and
answer sentence output means for presenting each answer sentence detected by the answer sentence detection means together with its one or more corresponding differential keywords.

According to another embodiment of the question answering server of the present invention, it is also preferable that
the server further comprises differential keyword selection means for presenting a plurality of differential keywords to the user via a user interface and having the user select any one of them by a user operation, and
the answer sentence output means presents, via the user interface, the answer sentence corresponding to the selected differential keyword.

According to the present invention, there is provided a question answering method for an apparatus that has a comment text storage unit storing a large number of comment texts and an answer sentence storage unit storing a large number of answer sentences, and that extracts an answer sentence for a question sentence from a user, the method comprising:
a first step of inputting a question sentence;
a second step of extracting a plurality of question keywords contained in the question sentence;
a third step of searching the comment text storage unit for comment texts containing the question keywords;
a fourth step of classifying the retrieved comment texts into a plurality of topic groups based on the distribution of the words that appear in them;
a fifth step of calculating the similarity between the group of comment texts in each topic group and each answer sentence, and associating with each topic group the answer sentences whose similarity is equal to or greater than a predetermined threshold;
a sixth step of extracting, for each topic group, representative keywords that characterize the topic group from among the keywords contained in the associated answer sentences;
a seventh step of extracting, for each topic group, a representative keyword that appears only in that topic group as a differential keyword; and
an eighth step of presenting each answer sentence detected in the fifth step together with its one or more corresponding differential keywords.

According to another embodiment of the question answering method of the present invention, it is also preferable that, in the eighth step,
a plurality of differential keywords are presented to the user via a user interface and the user is made to select any one of them by a user operation, and
the answer sentence corresponding to the selected differential keyword is presented via the user interface.

According to the question answering program, server, and method of the present invention, when there are a plurality of candidate answer sentences for the user's question sentence, an answer sentence reflecting the user's intention can be clearly presented (or narrowed down to).

FIG. 1 is a system configuration diagram according to the present invention.
FIG. 2 is a functional configuration diagram of the question answering server according to the present invention.
FIG. 3 is an explanatory diagram showing the processing of the question keyword extraction unit and the comment text search unit.
FIG. 4 is an explanatory diagram showing the processing of the topic classification unit.
FIG. 5 is an explanatory diagram showing the processing of the answer sentence detection unit.
FIG. 6 is an explanatory diagram showing the processing of the representative keyword extraction unit, the differential keyword extraction unit, the differential keyword selection unit, and the answer sentence output unit.
FIG. 7 is a sequence diagram according to the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a system configuration diagram according to the present invention.

According to FIG. 1, the question answering server 1 of the present invention is connected to the Internet. The question answering server 1 may store answer sentences in advance, or it may receive answer sentences from a separate answer sentence storage server 2. Note that, according to the present invention, there is no need to store candidate question sentences linked to candidate answer sentences in advance as in an FAQ. Only the candidate answer sentences are stored in advance.

The terminal 4 operated by the questioner accesses the question answering server 1 via an access network and the Internet. The terminal 4 transmits a question sentence to the question answering server 1 and receives an answer sentence from it in response. The embodiments below assume that the questioner types natural-language text into the terminal 4, but the question may instead be spoken by the questioner and converted to text.

According to FIG. 1, a blog server 3 that publishes comment texts posted by an unspecified number of third parties is also connected to the Internet. The blog server 3 is a miniblog server such as a twitter (registered trademark) server. Any of these third parties can freely post comment texts to the miniblog server 3 using their own terminals 5.

The question answering server 1 of the present invention collects a large volume of comment texts from the miniblog server 3. When there are a plurality of candidate answer sentences for the user's question sentence, the question answering server 1 uses the collected comment texts to clearly present (or narrow down to) an answer sentence reflecting the user's intention.

FIG. 2 is a functional configuration diagram of the question answering server according to the present invention.

According to FIG. 2, the question answering server 1 includes a communication interface unit 10, an answer sentence storage unit 101, an answer sentence acquisition unit 111, a comment text storage unit 102, and a comment text collection unit 112.

The answer sentence storage unit 101 stores a large number of answer sentences. The answer sentence acquisition unit 111 may receive these answer sentences from the answer sentence storage server 2 over the network and store them in the answer sentence storage unit 101.

The comment text storage unit 102 stores a large number of comment texts posted by an unspecified number of third parties. The comment text collection unit 112 may receive these comment texts from the blog server 3 over the network and store them in the comment text storage unit 102.

A "comment text" is, for example, a Japanese "tweet" (up to 140 characters) sent via twitter (registered trademark). A comment text includes, for example, a user ID (from_user_id), a tweet ID (id_str), a transmission time (created_at), and the tweet body (texts). The comment text collection unit 112 can also be configured to collect only comment texts that contain a plurality of keywords specified in advance.
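As an illustration of the record layout and keyword-filtered collection described above, here is a minimal sketch. The field names come from the text; the `Comment` class, the `collect` function, the sample records, and the choice to keep a comment when it contains at least one of the pre-specified keywords are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    from_user_id: str  # user ID
    id_str: str        # tweet ID
    created_at: str    # transmission time
    texts: str         # tweet body


def collect(comments, keywords):
    """Keep only comments whose body contains at least one pre-specified
    keyword (assumption: the text does not say whether any or all
    keywords must match)."""
    return [c for c in comments if any(k in c.texts for k in keywords)]


# Hypothetical sample records.
comments = [
    Comment("u1", "1", "2012-01-12T09:00:00", "I lost my mobile phone"),
    Comment("u2", "2", "2012-01-12T09:05:00", "Nice weather today"),
]
kept = collect(comments, ["mobile phone", "lost"])
print([c.id_str for c in kept])
```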

According to FIG. 2, the question answering server 1 also includes a question sentence input unit 121, a question keyword extraction unit 122, a comment text search unit 123, a topic classification unit 124, an answer sentence detection unit 125, a representative keyword extraction unit 126, a differential keyword extraction unit 127, a differential keyword selection unit 128, and an answer sentence output unit 129. These functional units are realized by executing a program that causes the computer mounted on the server to function.

[Question sentence input unit 121]
The question sentence input unit 121 receives a question sentence from the questioner's terminal 4 over the network. For example, the user's question sentence is as follows.
Q: "Lost mobile phone"
The question sentence is output to the question keyword extraction unit 122.

FIG. 3 is an explanatory diagram showing the processing of the question keyword extraction unit and the comment text search unit.

[Question keyword extraction unit 122]
The question keyword extraction unit 122 extracts a plurality of question keywords contained in the question sentence. It extracts keywords from the question sentence by morphological analysis and then extracts the words that are characteristic according to TF-IDF (Term Frequency - Inverse Document Frequency) as the question keywords.

The question keyword extraction unit 122 first extracts words from the question sentence by morphological analysis. "Morphological analysis" is a technique that splits text into meaningful words and uses a dictionary to determine their parts of speech and content. A "morpheme" is the smallest meaningful unit among the elements of a text. Instead of working word by word as in morphological analysis, the analysis may use "N-grams", which decompose the text character by character and count the occurrence frequency of each character together with the N-1 characters that follow it.
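The N-gram alternative mentioned above can be sketched in a few lines. This is only an illustration: the function name `char_ngrams` is hypothetical, and the sample string is the question sentence from the running example.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count every run of n consecutive characters in text, i.e. each
    character together with the n-1 characters that follow it."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


# Character bigrams of the example question sentence.
counts = char_ngrams("携帯電話機の紛失", 2)
print(counts)
```

Unlike morphological analysis, this needs no dictionary, at the cost of producing fragments that cross word boundaries.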

For morphological analysis, the open-source morphological analysis engine "MeCab" can be used, for example. This engine has a hierarchical part-of-speech system and can also analyze the part of speech of each morpheme. For each morpheme it outputs part-of-speech labels such as "noun", "proper noun", "organization", "region", and "general".

Next, keywords that are characteristic according to TF-IDF are extracted as the question keywords. TF-IDF is a technique that assigns a weight to each word, represents texts and the query as vectors in a vector space, and ranks texts by their similarity to the query. The higher a word's ranked value, the more important a keyword it is considered to be.
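The following is a minimal sketch of TF-IDF keyword scoring, assuming the text has already been split into words. The smoothed IDF formula, the function name `tf_idf`, and the tiny English corpus are assumptions for illustration; the patent does not specify the exact weighting.

```python
import math
from collections import Counter


def tf_idf(doc_tokens, corpus):
    """Score each word in doc_tokens by term frequency multiplied by
    inverse document frequency over corpus (a list of token lists)."""
    tf = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for word, count in tf.items():
        df = sum(1 for doc in corpus if word in doc)
        # Smoothed IDF so that unseen words do not divide by zero (assumption).
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[word] = (count / len(doc_tokens)) * idf
    return scores


# Hypothetical tokenized question and background corpus.
question = ["mobile phone", "of", "lost"]
corpus = [
    ["mobile phone", "of", "battery"],
    ["of", "price"],
    ["lost", "of", "wallet"],
    ["of", "weather"],
]
scores = tf_idf(question, corpus)
top2 = sorted(scores, key=scores.get, reverse=True)[:2]
print(top2)
```

Because "of" occurs in every document of the sample corpus, its IDF is the lowest and it falls out of the top two, leaving the two content words.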

In the example of FIG. 3, the extraction proceeds as follows.
Question sentence: "Lost mobile phone"
Question keywords: "mobile phone", "lost"

[Comment text search unit 123]
The comment text search unit 123 searches the comment text storage unit 102 for comment texts that contain the question keywords. Specifically, using the question keywords as queries, it extracts a TF (Term Frequency) value and a DF (Document Frequency) value for each comment text and retrieves the comment texts for which these values are equal to or greater than a predetermined threshold. The TF value is the frequency with which a search term appears in a text, and the DF value is the relative frequency of texts in which an index term appears. The comment text search unit 123 is a social media search function: from the large volume of posted tweets, it retrieves only those related to the question keywords.
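A minimal sketch of threshold-based comment retrieval follows. It is a simplification under stated assumptions: only a TF-style count of the question keywords is thresholded (the DF value is omitted), matching is by plain substring, and the function name `search_comments` and the sample comments are hypothetical.

```python
def search_comments(comments, keywords, min_tf=1):
    """Return the comments in which the question keywords occur at least
    min_tf times in total (assumption: summed substring counts)."""
    hits = []
    for text in comments:
        tf = sum(text.count(k) for k in keywords)
        if tf >= min_tf:
            hits.append(text)
    return hits


# Hypothetical stored comment texts.
stored = [
    "I lost my mobile phone on the train",
    "mobile phone battery died",
    "lovely weather",
    "lost and found desk",
]
hits = search_comments(stored, ["mobile phone", "lost"])
print(hits)
```

Raising `min_tf` narrows the result to comments that mention the keywords more often, which is one way to trade recall for relevance.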

In FIG. 3, for example, four comment texts are retrieved. Each of these comment texts contains at least "mobile phone" or "lost".

FIG. 4 is an explanatory diagram showing the processing of the topic classification unit.

[Topic classification unit 124]
The topic classification unit 124 classifies the retrieved comment texts into a plurality of topic groups based on the distribution of the words that appear in them. It classifies each comment text into exactly one topic group using the LDA (Latent Dirichlet Allocation) algorithm, which calculates the likelihood (topic ratio) that the comment text belongs to each topic group. In other words, the many comment texts related to the question keywords are classified into latent topic groups.

LDA takes LSI (Latent Semantic Indexing), a technique for reducing the dimensionality of the word-document matrix, and introduces into it a probabilistic framework based on fluctuation in the word feature vectors (see, for example, Non-Patent Document 2). The set of compressed dimensions is referred to as the topics.

According to FIG. 4, the topic classification unit 124 executes the following steps.
(S41) Words are extracted from the many comment texts related to the question keywords, and the occurrence frequency (number of occurrences) of each word is input to the LDA process. The occurrence frequency of each word is counted for each comment text.

(S42) Next, the LDA process obtains the word distribution of each topic and the topic ratio of each comment text (online opinion). Each comment text is classified into a topic group according to this topic ratio. Then, for each topic group, the occurrence frequency of each word contained in all of its comment texts is counted.

(S43) Next, for each comment text, the words belonging to each topic group are counted. The comment text is then classified into the topic group with the highest count.
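Step S43 can be sketched as a count-and-argmax over topic groups. This example assumes a hard word-to-topic mapping as a stand-in for the per-topic word distributions that LDA would produce; the function name `assign_topic`, the mapping, and the sample words are hypothetical.

```python
from collections import Counter


def assign_topic(tokens, word_topic):
    """Count how many words of a comment belong to each topic group and
    return the group with the highest count, or None if no word matches."""
    counts = Counter(word_topic[w] for w in tokens if w in word_topic)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical mapping from words to topic group indices.
word_topic = {"apply": 0, "contract": 0, "lock": 1, "remote": 1}

print(assign_topic(["lost", "phone", "remote", "lock"], word_topic))
print(assign_topic(["how", "apply"], word_topic))
```

A real LDA model gives each word a probability under every topic, so a fuller implementation would sum those probabilities rather than count hard memberships.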

FIG. 5 is an explanatory diagram showing the processing of the answer sentence detection unit.

[Answer sentence detection unit 125]
The answer sentence detection unit 125 calculates the similarity between the group of comment texts in each topic group and the text of each answer sentence, and associates with each topic group the answer sentences whose similarity is equal to or greater than a predetermined threshold.

The similarity is calculated, for example, as follows.
(S51) The answer sentence detection unit 125 extracts keywords by morphological analysis from the group of comment sentences included in each topic group, and calculates a first feature vector for the topic group by TF-IDF.
Each topic group: Ci (i = 1, 2, ...)
Comment sentences included in topic group i: Tij (j = 1, 2, ...)

(S52) Keywords are extracted by morphological analysis from each answer sentence stored in the answer sentence storage unit 101, and a second feature vector for the answer sentence is calculated by TF-IDF.
Answer sentence: Ak (k = 1, 2, ...)

(S53) The similarity is calculated based on the cosine distance between the first feature vector of the topic group and the second feature vector of the answer sentence. Specifically, the similarity Dist(Ci, Aj) between the topic group Ci, which contains the comment sentences Ti1, Ti2, ..., and the answer sentence Aj is calculated.
Dist(Ci, Aj) = average of the cosine distances D(Ti1, Aj), D(Ti2, Aj), ...
= arg max_i (Dist(Ci, Aj))
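Steps S51 to S53 can be illustrated with a short sketch under several assumptions: the sentences are already tokenized (the morphological analysis is not shown), the TF-IDF weighting uses the variant idf = log(n/df) + 1, and a single IDF table is computed over comments and answers together. None of these choices is specified in the source, and all function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per tokenized document."""
    n = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed variant: idf = log(n / df) + 1, so ubiquitous words stay non-zero.
        vectors.append({w: c * (math.log(n / df[w]) + 1.0) for w, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def dist(group_vectors, answer_vector):
    """Dist(Ci, Aj): mean cosine value between each comment in Ci and answer Aj."""
    if not group_vectors:
        return 0.0
    return sum(cosine(t, answer_vector) for t in group_vectors) / len(group_vectors)

def match_answers(groups, answers, threshold):
    """Map each topic group index to the answer indices with Dist >= threshold."""
    flat = [doc for group in groups for doc in group]
    vectors = tfidf_vectors(flat + answers)
    answer_vecs = vectors[len(flat):]
    matches, start = {}, 0
    for i, group in enumerate(groups):
        group_vecs = vectors[start:start + len(group)]
        start += len(group)
        matches[i] = [j for j, a in enumerate(answer_vecs)
                      if dist(group_vecs, a) >= threshold]
    return matches

# Toy data: two topic groups of tokenized comments and two tokenized answers.
groups = [[["lost", "phone", "apply"], ["apply", "service"]],
          [["found", "phone"], ["found", "use"]]]
answers = [["apply", "service", "lost"], ["use", "found", "phone"]]
matches = match_answers(groups, answers, threshold=0.3)
```

With this toy data each topic group is matched to exactly one answer. A topic group can also end up with an empty list, which corresponds to the case where no answer sentence reaches the threshold.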

According to FIG. 5, a large number of answer sentences are stored in the answer sentence storage unit 101.
Answer sentence 1 "..."
Answer sentence 2 "How to apply for the mobile phone finder safety service"
(... mobile phone ... terminal ... safety ... loss ... application ...)
Answer sentence 3 "..."
Answer sentence 4 "..."
Answer sentence 5 "How to use the mobile phone finder safety service"
(... mobile phone ... terminal ... discovery ...)
Answer sentence 6 "..."

According to FIG. 5, the similarity between topic group 1 and answer sentence 2 is higher than the predetermined threshold δ, so the two are judged to be similar. Likewise, the similarity between topic group 2 and answer sentence 5 is higher than the predetermined threshold δ, so these two are also judged to be similar. As a result, zero or more answer sentences are assigned to each of the topic groups C1, C2, ....

FIG. 6 is an explanatory diagram showing the processing of the representative keyword extraction unit, the difference keyword extraction unit, the difference keyword selection unit, and the answer sentence output unit.

[Representative keyword extraction unit 126]
For each topic group, the representative keyword extraction unit 126 extracts, from among the keywords included in the associated answer sentences, representative keywords that characterize the topic group. Specifically, the representative keyword extraction unit 126 extracts keywords from the answer sentences by morphological analysis and extracts characteristic words as representative keywords by TF-IDF.

According to FIG. 6, the following three words are extracted as representative keywords from answer sentence 2, which corresponds to topic group 1.
Answer sentence 2 (How to apply for the mobile phone finder safety service)
"terminal" "safety" "application"
Answer sentence 5 (How to use the mobile phone finder safety service)
"terminal" "discovery"

Here, the representative keyword extraction unit 126 preferably arranges the representative keywords of each topic group in order of priority according to the Akaike information criterion. Let w1, w2, ... denote the words that appear in any of the answer sentences assigned to the topic groups C1, C2, .... An index is then given that indicates whether a word wi is useful for discriminating the topic group Cj (see, for example, Patent Document 3).

The following describes how to calculate the index E(w, C), which indicates whether a word w is useful for discriminating a topic group C.
(S1) The following four frequencies are obtained from the set U of comment sentences (tweets) included in the topic groups.
n11 = number of comment sentences that are similar to topic group C and contain the word w
n12 = number of comment sentences that are similar to a topic group other than C and contain the word w
n21 = number of comment sentences that are similar to topic group C and do not contain the word w
n22 = number of comment sentences that are similar to a topic group other than C and do not contain the word w

(S2) Next, for n11, n12, n21, and n22, the Akaike information criterion (AIC: Akaike's Information Criterion) is used to calculate the value MLL_IM(w, C) for the independent model and the value MLL_DM(w, C) for the dependent model. This gives, for each pair of a word and a topic group, the degree to which the word appears disproportionately in that group.
MLL_IM(w, C) = (n11+n12) log(n11+n12) + (n11+n21) log(n11+n21)
+ (n21+n22) log(n21+n22) + (n12+n22) log(n12+n22) − 2 N log N
MLL_DM(w, C) = n11 log n11 + n12 log n12 + n21 log n21 + n22 log n22 − N log N
where N = n11 + n12 + n21 + n22

(S3) The following E(w, C) is calculated from the aforementioned MLL_IM(w, C) and MLL_DM(w, C).
AIC_IM(w, C) = −2 × MLL_IM(w, C) + 2 × 2
AIC_DM(w, C) = −2 × MLL_DM(w, C) + 2 × 3
E(w, C) = AIC_IM(w, C) − AIC_DM(w, C)
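The calculation in steps S2 and S3 can be written out directly from the four counts. The sketch below follows the formulas term by term. The one addition is the convention 0 × log 0 = 0 for empty cells, which is an assumption, since the text does not say how zero counts are handled. The names `xlogx` and `keyword_score` are hypothetical.

```python
import math

def xlogx(x):
    # Assumed convention: 0 * log 0 = 0 (the source does not address zero counts).
    return x * math.log(x) if x > 0 else 0.0

def keyword_score(n11, n12, n21, n22):
    """E(w, C) = AIC_IM(w, C) - AIC_DM(w, C) from the 2x2 frequency table."""
    n = n11 + n12 + n21 + n22
    # Maximum log-likelihood under the independent model.
    mll_im = (xlogx(n11 + n12) + xlogx(n11 + n21)
              + xlogx(n21 + n22) + xlogx(n12 + n22) - 2 * xlogx(n))
    # Maximum log-likelihood under the dependent model.
    mll_dm = xlogx(n11) + xlogx(n12) + xlogx(n21) + xlogx(n22) - xlogx(n)
    aic_im = -2 * mll_im + 2 * 2   # two free parameters
    aic_dm = -2 * mll_dm + 2 * 3   # three free parameters
    return aic_im - aic_dm

balanced = keyword_score(10, 10, 10, 10)  # word spread evenly across groups
skewed = keyword_score(30, 5, 5, 30)      # word concentrated in topic group C
```

For the evenly spread table the two log-likelihoods coincide, so E reduces to the difference of the penalty terms, 4 − 6 = −2. The skewed table gives a clearly positive value, which matches the statement that words useful for discrimination score higher.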

E(w, C) calculated above represents the degree to which the word w appears disproportionately in topic group C. In accordance with the Akaike information criterion, the more useful a word is for discriminating topic group C, the higher its value of E(w, C). According to the present invention, for each topic group Ci, the m words Ci,1, Ci,2, Ci,3, ..., Ci,m are extracted in descending order of E(w, C) and used as the representative keywords of topic group Ci.

[Difference keyword extraction unit 127]
For each topic group, the difference keyword extraction unit 127 extracts the representative keywords that appear only in that topic group as difference keywords. According to FIG. 6, the following difference keywords are extracted.
Topic group 1: "safety" "application" ("terminal" excluded)
Topic group 2: "discovery" ("terminal" excluded)
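The difference keyword extraction amounts to removing from each topic group's representative keywords any word that another topic group also has. A minimal sketch using the FIG. 6 example follows. The function name, the dictionary layout, and the English keyword strings are illustrative assumptions.

```python
def difference_keywords(representative):
    """representative: dict mapping topic group -> list of representative keywords.

    Returns a dict mapping each topic group to the keywords that appear
    in that group only.
    """
    result = {}
    for group, words in representative.items():
        others = set()
        for other_group, other_words in representative.items():
            if other_group != group:
                others.update(other_words)
        # Keep the original keyword order; drop anything another group also has.
        result[group] = [w for w in words if w not in others]
    return result

# Representative keywords of topic groups 1 and 2 as in FIG. 6.
representative = {
    1: ["terminal", "safety", "application"],
    2: ["terminal", "discovery"],
}
diff = difference_keywords(representative)
```

Here "terminal" is dropped from both groups because it is shared, leaving "safety" and "application" for topic group 1 and "discovery" for topic group 2.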

[Difference keyword selection unit 128]
The difference keyword selection unit 128 clearly indicates the plurality of difference keywords to the user via the user interface and has the user select one of them through a user operation. From the user's point of view, for example, after a question sentence is entered with a keyboard, the difference keywords of each topic group are shown on the display, and the user can select any one of them. According to FIG. 6, the user has selected "discovery". The difference keyword "discovery" selected by the user is output to the answer sentence output unit 129.

[Answer sentence output unit 129]
The answer sentence output unit 129 clearly indicates the answer sentences detected by the answer sentence detection means together with the corresponding one or more difference keywords. According to the present invention, when the user's question is ambiguous, the group of comment sentences is classified into a plurality of topic groups, and the answer sentences associated with each topic group can be obtained. When the number of answer sentences obtained is small, the difference keywords are useful information that helps the user recognize the tendency of the presented answer sentences.

When the number of answer sentences is large, it is preferable to narrow them down by interacting with the user. The answer sentence output unit 129 therefore clearly indicates, via the user interface, the answer sentence corresponding to the selected difference keyword. For example, the answer sentence is shown on a display viewed by the user. According to FIG. 6, the answer sentence "How to use the mobile phone finder safety service" is displayed to the user. The user can thereby recognize the answer sentence to the question sentence.
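The narrowing step can be pictured as a lookup from the selected difference keyword to the answer sentences of the topic group that owns it. The sketch below is only an illustration of that mapping: the data structures, the function name, and the shortened answer labels are assumptions and do not come from the source.

```python
def answers_for_keyword(selected, diff_keywords, group_answers):
    """Return the answers of every topic group whose difference keywords
    include the keyword the user selected."""
    answers = []
    for group, words in diff_keywords.items():
        if selected in words:
            answers.extend(group_answers.get(group, []))
    return answers

# Difference keywords and associated answer sentences as in FIG. 6.
diff_keywords = {1: ["safety", "application"], 2: ["discovery"]}
group_answers = {1: ["answer sentence 2"], 2: ["answer sentence 5"]}
shown = answers_for_keyword("discovery", diff_keywords, group_answers)
```

Because a difference keyword appears in only one topic group by construction, the lookup returns the answer sentences of a single group, in this example answer sentence 5.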

FIG. 7 is a sequence diagram of the present invention.

(S71) The user's question sentence is transmitted from the terminal 4 operated by the questioner to the question answering server 1 (see the question sentence input unit 121 in FIG. 2).
(S72) The question answering server 1 extracts a plurality of question keywords included in the question sentence (see the question keyword extraction unit 122 in FIG. 2).
(S73) The question answering server 1 uses the comment sentence storage unit 102 to search for comment sentences that include the question keywords (see the comment sentence search unit 123 in FIG. 2).
(S74) The question answering server 1 classifies the retrieved comment sentences into a plurality of topic groups based on the distribution of appearing words (see the topic classification unit 124 in FIG. 2).
(S75) The question answering server 1 calculates the similarity between the group of comment sentences included in each topic group and each answer sentence, and associates with each topic group the answer sentences whose similarity is equal to or greater than a predetermined threshold (see the answer sentence detection unit 125 in FIG. 2).
(S76) For each topic group, the question answering server 1 extracts, from among the keywords included in the associated answer sentences, representative keywords that characterize the topic group (see the representative keyword extraction unit 126 in FIG. 2).
(S77) For each topic group, the question answering server 1 extracts the representative keywords that appear only in that topic group as difference keywords (see the difference keyword extraction unit 127 in FIG. 2).
(S78) The question answering server 1 transmits the plurality of difference keywords to the terminal 4 operated by the user (see the difference keyword selection unit 128 in FIG. 2). At the terminal 4, one of the difference keywords is selected according to a user operation. The selected difference keyword is transmitted from the terminal 4 to the question answering server 1.
(S79) The question answering server 1 transmits the answer sentence corresponding to the selected difference keyword to the user's terminal 4 (see the answer sentence output unit 129 in FIG. 2).

As described above, the question answering server of the present invention extracts representative keywords expressing the intention of a question sentence from a large volume of comment sentences, such as those on Twitter, and complements the question sentence with them, so that answer sentences can be retrieved with high accuracy. Specifically, the keywords included in the question sentence are first extracted and used to search social media, the large number of search results is quickly classified into a plurality of topic groups (each topic corresponding to one search intention), and answer sentences similar to each topic are retrieved from the answer sentence storage unit. Next, the words specific to each topic group (difference keywords) are automatically extracted and presented to the user, and the answer sentences are narrowed down according to the user's selection. This dialogue with the user can be executed repeatedly.

As described in detail above, according to the question answering program, server, and method of the present invention, when there are a plurality of candidate answer sentences for a user's question sentence, the answer sentence that reflects the user's intention can be clearly indicated (narrowed down to).

Those skilled in the art can readily make various changes, modifications, and omissions to the embodiments of the present invention described above within the scope of its technical idea and viewpoint. The foregoing description is merely an example and is not intended to be limiting. The present invention is limited only as defined in the claims and their equivalents.

DESCRIPTION OF SYMBOLS
1 Question answering server
10 Communication interface unit
101 Answer sentence storage unit
102 Comment sentence storage unit
111 Answer sentence acquisition unit
112 Comment sentence collection unit
121 Question sentence input unit
122 Question keyword extraction unit
123 Comment sentence search unit
124 Topic classification unit
125 Answer sentence detection unit
126 Representative keyword extraction unit
127 Difference keyword extraction unit
128 Difference keyword selection unit
129 Answer sentence output unit
2 Answer sentence storage server
3 Blog server
4 Terminal
5 General-purpose terminal for comment posters

Claims

1. A question answering program that causes a computer having a comment sentence storage unit storing a large number of comment sentences and an answer sentence storage unit storing a large number of answer sentences to function so as to extract an answer sentence for a question sentence from a user, the program causing the computer to function as:
question sentence input means for inputting a question sentence;
question keyword extraction means for extracting a plurality of question keywords included in the question sentence;
comment sentence search means for searching, using the comment sentence storage unit, for comment sentences including the question keywords;
topic classification means for classifying the plurality of retrieved comment sentences into a plurality of topic groups based on the distribution of appearing words;
answer sentence detection means for calculating a similarity between the group of comment sentences included in each topic group and the text included in each answer sentence, and associating with each topic group an answer sentence whose similarity is equal to or greater than a predetermined threshold;
representative keyword extraction means for extracting, for each topic group, a representative keyword that characterizes the topic group from among the keywords included in the associated answer sentence;
difference keyword extraction means for extracting, for each topic group, a representative keyword that appears only in that topic group as a difference keyword; and
answer sentence output means for clearly indicating the answer sentence detected by the answer sentence detection means together with the corresponding one or more difference keywords.

2. The question answering program according to claim 1, further causing the computer to function as difference keyword selection means for clearly indicating the plurality of difference keywords to the user via a user interface and having one of the difference keywords selected according to a user operation,
wherein the answer sentence output means causes the computer to function so as to clearly indicate, via the user interface, the answer sentence corresponding to the selected difference keyword.

3. The question answering program according to claim 1 or 2, wherein the topic classification means causes the computer to function so as to classify each comment sentence into one of the topic groups using an LDA (Latent Dirichlet Allocation) algorithm that calculates the probability (topic ratio) of belonging to each topic group.

4. The question answering program according to any one of claims 1 to 3, wherein
the question keyword extraction means causes the computer to function so as to extract keywords from the question sentence by morphological analysis and to extract characteristic words as the question keywords by TF-IDF (Term Frequency-Inverse Document Frequency),
and/or
the representative keyword extraction means causes the computer to function so as to extract keywords from the answer sentence by morphological analysis and to extract characteristic words as the representative keywords by TF-IDF.

5. The question answering program according to claim 1, wherein the answer sentence detection means causes the computer to function so as to:
extract keywords by morphological analysis from the group of comment sentences included in each topic group, and calculate a first feature vector for the topic group by TF-IDF;
extract keywords by morphological analysis from each answer sentence stored in the answer sentence storage unit, and calculate a second feature vector for the answer sentence by TF-IDF; and
calculate the similarity based on the cosine distance between the first feature vector of the topic group and the second feature vector of the answer sentence.

6. The question answering program according to claim 1, wherein the representative keyword extraction means causes the computer to function so as to arrange the representative keywords of each topic group in order of priority according to the Akaike information criterion.

7. The question answering program, wherein
the comment sentences are posted by an unspecified number of third parties, and
the comment sentence storage unit causes the computer to function so as to collect and store comment sentences posted on a miniblog (mini Web log) server.

8. A question answering server that has a comment sentence storage unit storing a large number of comment sentences and an answer sentence storage unit storing a large number of answer sentences, and that extracts an answer sentence for a question sentence from a user, the server comprising:
question sentence input means for inputting a question sentence from a terminal;
question keyword extraction means for extracting a plurality of question keywords included in the question sentence;
comment sentence search means for searching, using the comment sentence storage unit, for comment sentences including the question keywords;
topic classification means for classifying the plurality of retrieved comment sentences into a plurality of topic groups based on the distribution of appearing words;
answer sentence detection means for calculating a similarity between the group of comment sentences included in each topic group and each answer sentence, and associating with each topic group an answer sentence whose similarity is equal to or greater than a predetermined threshold;
representative keyword extraction means for extracting, for each topic group, a representative keyword that characterizes the topic group from among the keywords included in the associated answer sentence;
difference keyword extraction means for extracting, for each topic group, a representative keyword that appears only in that topic group as a difference keyword; and
answer sentence output means for clearly indicating the answer sentence detected by the answer sentence detection means together with the corresponding one or more difference keywords.

9. The question answering server according to claim 8, further comprising difference keyword selection means for clearly indicating the plurality of difference keywords to the user via a user interface and having one of the difference keywords selected according to a user operation,
wherein the answer sentence output means clearly indicates, via the user interface, the answer sentence corresponding to the selected difference keyword.

10. A question answering method in a device that has a comment sentence storage unit storing a large number of comment sentences and an answer sentence storage unit storing a large number of answer sentences, and that extracts an answer sentence for a question sentence from a user, the method comprising:
a first step of inputting a question sentence;
a second step of extracting a plurality of question keywords included in the question sentence;
a third step of searching, using the comment sentence storage unit, for comment sentences including the question keywords;
a fourth step of classifying the plurality of retrieved comment sentences into a plurality of topic groups based on the distribution of appearing words;
a fifth step of calculating a similarity between the group of comment sentences included in each topic group and each answer sentence, and associating with each topic group an answer sentence whose similarity is equal to or greater than a predetermined threshold;
a sixth step of extracting, for each topic group, a representative keyword that characterizes the topic group from among the keywords included in the associated answer sentence;
a seventh step of extracting, for each topic group, a representative keyword that appears only in that topic group as a difference keyword; and
an eighth step of clearly indicating the answer sentence detected in the fifth step together with the corresponding one or more difference keywords.

11. The question answering method according to claim 10, wherein, in the eighth step,
the plurality of difference keywords are clearly indicated to the user via a user interface, and one of the difference keywords is selected according to a user operation, and
the answer sentence corresponding to the selected difference keyword is clearly indicated via the user interface.