JP2012208728A

JP2012208728A - Expert retrieval apparatus and expert retrieval method

Info

Publication number: JP2012208728A
Application number: JP2011073811A
Authority: JP
Inventors: Yoshikiyo Kato (加藤 義清); Susumu Akamine (赤峯 享)
Original assignee: National Institute of Information and Communications Technology
Current assignee: National Institute of Information and Communications Technology
Priority date: 2011-03-30
Filing date: 2011-03-30
Publication date: 2012-10-25
Anticipated expiration: 2031-03-30
Also published as: JP5780633B2

Abstract

PROBLEM TO BE SOLVED: To solve the problem that experts could not be retrieved accurately from Web documents.
SOLUTION: An expert retrieval apparatus comprises: a document retrieval unit that acquires one or more documents relating to a topic from one or more Web server devices; an expert candidate extraction unit that extracts one or more expert candidates from the documents; an opinion sentence extraction unit that extracts from the documents one or more opinion sentences, which are sentences expressing opinions about the topic; an opinion information acquisition unit that acquires opinion information from the opinion sentences for each expert candidate; a relation degree acquisition unit that acquires, for each expert candidate, a degree of relation between each document and the expert candidate; a score calculation unit that calculates, for each expert candidate, a specialty score for the topic using the opinion information and the relation degree; an expert selection unit that acquires one or more experts using the score; and an expert output unit that outputs the experts acquired by the expert selection unit. The apparatus can thereby retrieve experts accurately from Web documents.

Description

The present invention relates to an expert search system and the like that extracts one or more experts on a topic from Web documents.

Conventionally, there have been techniques for searching for experts in documents within an organization (see, for example, Non-Patent Documents 1 and 2) and techniques for searching for experts in research papers (see, for example, Non-Patent Document 3). What these prior-art techniques have in common is that they use, as evidence of an expert candidate's expertise, the degree of relevance to the topic of the documents in which the candidate appears or of which the candidate is an author. For organizational documents and research papers, document quality can be assumed to be above a certain level, so treating the topic relevance of the documents associated with an expert candidate as evidence of expertise poses no problem.

Non-Patent Document 1: Balog K, Azzopardi L, de Rijke M. Formal Models for Expert Finding in Enterprise Corpora. In: Proceedings of SIGIR'06; 2006:43-50.
Non-Patent Document 2: Macdonald C, Ounis I. Voting for Candidates: Adapting Data Fusion Techniques for an Expert Search Task. In: Proceedings of CIKM'06; 2006:387-396.
Non-Patent Document 3: Mimno D, McCallum A. Expertise Modeling for Matching Papers with Reviewers. In: Proceedings of KDD'07; 2007:500-509.

However, if expert search is restricted to documents within an organization, general users cannot use it. Moreover, some topics never become the subject of research papers, and some people who deserve to be called experts do not write research papers. It is therefore useful to extract experts from Web documents, that is, documents published on the Web.

When Web documents are the target, however, they include not only noise such as spam but also many documents of uneven quality, such as bulletin-board posts and blogs. Relevance to the topic alone therefore does not provide sufficient evidence of expertise and lowers the accuracy of expert search. In other words, conventional expert search systems could not search Web documents for experts with high accuracy.

In addition, the prior art treats expert candidates as given. When a specific organization or research field is targeted, the expert candidates can be identified, so this assumption is valid. When expert search is performed over Web documents on an arbitrary topic, however, obtaining expert candidates in advance is not realistic.

The expert search device of the first invention comprises: a reception unit that receives a topic; a document search unit that acquires one or more documents related to the topic from one or more Web server devices; an expert candidate extraction unit that extracts one or more expert candidates from the documents; an opinion sentence extraction unit that extracts from the documents one or more opinion sentences, which are sentences expressing opinions on the topic; an opinion information acquisition unit that acquires from the opinion sentences, for each expert candidate, opinion information including a positive opinion count (the number of positive opinion sentences) and a negative opinion count (the number of negative opinion sentences); a score calculation unit that calculates, for each expert candidate, an expertise score for the topic using the opinion information; an expert selection unit that acquires one or more experts from the expert candidates using the scores; and an expert output unit that outputs the experts acquired by the expert selection unit.

With this configuration, experts can be searched for with high accuracy in Web documents.

The expert search device of the second invention is the device of the first invention, further comprising a relevance acquisition unit that acquires, for each expert candidate extracted by the expert candidate extraction unit, the degree of relevance between the candidate and each document, among those acquired by the document search unit, in which that candidate appears. The score calculation unit then calculates, for each expert candidate, the expertise score for the topic using the opinion information and the degree of relevance.

The expert search device of the third invention is the device of the first or second invention, wherein the expert output unit sorts the experts in descending order of score and outputs them.

With this configuration, experts can be searched for with high accuracy in Web documents and presented appropriately.

The expert search device of the fourth invention is the device of any one of the first to third inventions, wherein the expert output unit, instead of outputting the experts, outputs the one or more documents or one or more opinion sentences from which the score of each expert acquired by the expert selection unit was calculated.

With this configuration, experts can be searched for with high accuracy in Web documents, and the documents written by those experts can be presented.

The expert search device of the fifth invention is the device of any one of the first to third inventions, further comprising a document output unit that outputs one or more documents corresponding to some or all of the experts, or one or more pieces of access information, that is, information for accessing those documents.

The expert search device according to the present invention can search for experts in Web documents with high accuracy.

Conceptual diagram of expert search system 1 according to Embodiment 1
Block diagram of expert search system 1
Flowchart explaining the operation of expert search device 13
Flowchart explaining the expert candidate extraction process
Flowchart explaining the opinion information acquisition process
Flowchart explaining the first relevance acquisition process
Flowchart explaining the second relevance acquisition process
Flowchart explaining the score calculation process
Diagram showing an example of a Web page
Diagram showing an example of opinion information
Overview of the computer system
Block diagram of the computer system

Embodiments of the expert search system and related devices are described below with reference to the drawings. Components given the same reference numerals in the embodiments perform the same operations, so repeated descriptions may be omitted.

(Embodiment 1)
This embodiment describes an expert search system that extracts one or more experts on a topic from Web documents and outputs them. The expert search system 1 described here also sorts the experts by score before output and presents highly reliable documents written by the experts.

FIG. 1 is a conceptual diagram of the expert search system 1 in this embodiment. The expert search system 1 includes one or more terminal devices 11, one or more Web server devices 12, and an expert search device 13. The terminal device 11 is a terminal that a user uses to search for experts. The Web server device 12 is an ordinary Web server on the Web and stores one or more documents. The expert search device 13 is a device that searches for experts. The terminal devices 11, the Web server devices 12, and the expert search device 13 communicate with one another via the Internet 4.

FIG. 2 is a block diagram of the expert search system 1 in this embodiment.

The terminal device 11 of the expert search system 1 includes a terminal reception unit 110, a terminal transmission unit 111, a terminal receiving unit 112, and a terminal output unit 113.

The Web server device 12 includes a document storage unit 121 and a document transmission unit 122.

The expert search device 13 includes a reception unit 130, a document search unit 131, an expert candidate extraction unit 132, an opinion sentence extraction unit 133, an opinion information acquisition unit 134, a relevance acquisition unit 135, a score calculation unit 136, an expert selection unit 137, an expert output unit 138, and a document output unit 139.

The terminal reception unit 110 of the terminal device 11 receives a topic from the user. A topic can be regarded as equivalent to a term: it is not limited to a single word and may be a phrase or a set of one or more words. Specifically, a topic is a term or the like indicating the field of expertise of the experts the user wants to find. The topic may be entered by any means, such as a keyboard, a mouse, or a menu screen. The terminal reception unit 110 can be implemented by a device driver for an input means such as a keyboard, control software for a menu screen, or the like.

The terminal transmission unit 111 transmits the topic received by the terminal reception unit 110 to the expert search device 13.

The terminal receiving unit 112 receives one or more experts, one or more documents, one or more opinion sentences, or the like from the expert search device 13.

The terminal output unit 113 outputs the experts, documents, opinion sentences, or the like received by the terminal receiving unit 112. Here, "output" is a concept that includes display on a screen, projection with a projector, printing on a printer, sound output, transmission to an external device, storage on a recording medium, and delivery of processing results to another processing device or program. The terminal output unit 113 may or may not be regarded as including output devices such as a display or speaker. It can be implemented by driver software for an output device, or by such driver software together with the output device.

The document storage unit 121 of the Web server device 12 stores one or more documents. Here, a document is a document on the Web, referred to as a Web document. Web documents include research papers and also CGM (consumer-generated media) documents such as blogs, miniblog posts, and posts on SNS. Documents need not be stored one per file; that is, any means of identifying document boundaries may be used.

The document transmission unit 122 transmits documents in the document storage unit 121 to the expert search device 13, usually in response to a request from the expert search device 13.

The reception unit 130 of the expert search device 13 receives a topic, usually from the terminal device 11. When the expert search device 13 operates standalone, however, the reception unit 130 receives the topic from an input device such as a keyboard or mouse. "Receiving" here is thus a concept that includes accepting information entered from an input device such as a keyboard, mouse, or touch panel; receiving information transmitted over a wired or wireless communication line; and accepting information read from a recording medium such as an optical disk, magnetic disk, or semiconductor memory.

The document search unit 131 acquires one or more documents related to the topic from one or more Web server devices 12, usually by receiving them from the Web server devices 12. The document search unit 131 may acquire all documents containing the topic, only documents that contain the topic in the part corresponding to a predetermined tag (for example, the title tag), or documents in which the topic occurs at least a threshold number of times; various other algorithms are also possible. The document search unit 131 may also provide so-called Web search using the topic as the key, in which case it is a Web search engine.
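The three filtering strategies named above can be sketched over an in-memory collection as follows. This is an illustration under stated assumptions, not the patent's implementation: the `Doc` class and all function names are hypothetical, fetching documents from the Web server devices is omitted, and the title check uses a naive regular expression where a real system would use an HTML parser.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Doc:
    """Hypothetical container for a fetched Web document."""
    url: str
    html: str


def contains_topic(doc: Doc, topic: str) -> bool:
    """Strategy 1: the topic appears anywhere in the document."""
    return topic in doc.html


def topic_in_title(doc: Doc, topic: str) -> bool:
    """Strategy 2: the topic appears inside the <title> element (naive regex)."""
    m = re.search(r"<title>(.*?)</title>", doc.html, re.IGNORECASE | re.DOTALL)
    return m is not None and topic in m.group(1)


def make_frequency_filter(threshold: int) -> Callable[[Doc, str], bool]:
    """Strategy 3: the topic occurs at least `threshold` times (non-overlapping)."""
    def predicate(doc: Doc, topic: str) -> bool:
        return doc.html.count(topic) >= threshold
    return predicate


def search_documents(docs: list[Doc], topic: str,
                     predicate: Callable[[Doc, str], bool] = contains_topic) -> list[Doc]:
    """Keep only the documents that the chosen strategy considers related to the topic."""
    return [d for d in docs if predicate(d, topic)]
```

Passing the strategy as a predicate keeps the search loop independent of the relevance criterion, so other algorithms can be swapped in without touching it.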

The expert candidate extraction unit 132 extracts one or more expert candidates from the documents. An expert candidate may be a person name or an organization name. The expert candidate extraction unit 132 extracts such proper names (person and organization names) using, for example, named entity (NE) extraction techniques (see, for example, "http://www.sophia-it.com/content/%E5%9B%BA%E6%9C%89%E8%A1%A8%E7%8F%BE%E6%8A%BD%E5%87%BA" and "http://www.ntt.co.jp/journal/0806/files/jn200806020.pdf"). More specifically, the expert candidate extraction unit 132 performs morphological analysis on each document to split it into words and then extracts names using a machine learning (sequence tagging) method. Alternatively, it may obtain person names using specific tags as clues, or it may use clue phrases such as "氏" (an honorific, "Mr./Ms.") and "氏名" ("name") to extract, for example, the person name "山田○夫" from "山田○夫氏". Techniques for obtaining person and organization names from text are well known.
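The clue-phrase approach (extracting "山田○夫" from "山田○夫氏") can be sketched with two regular expressions. The patterns below are hypothetical assumptions, not taken from the patent: they treat a run of 2 to 6 CJK characters (or the mask character ○) immediately before the honorific 氏 as a name, and likewise the run following the label 氏名 and a colon. This is far cruder than the morphological analysis plus sequence tagging described above; for example, a kanji word directly preceding a name would be absorbed into it.

```python
import re

# Hypothetical clue-phrase patterns (illustrative assumptions only):
# a run of 2-6 CJK characters or ○ directly before the honorific 氏,
# excluding the word 氏名 itself ...
HONORIFIC = re.compile(r"([\u4e00-\u9fff○]{2,6})氏(?!名)")
# ... or directly after the label 氏名 followed by a half- or full-width colon.
NAME_LABEL = re.compile(r"氏名[:：]\s*([\u4e00-\u9fff○]{2,6})")


def extract_candidates(text: str) -> list[str]:
    """Return person-name candidates: honorific matches first, then labelled names."""
    names = HONORIFIC.findall(text) + NAME_LABEL.findall(text)
    return list(dict.fromkeys(names))  # drop duplicates while keeping order
```

For the input "講演者は山田○夫氏です。担当者氏名：鈴木花子、所属は未定。" this sketch returns both "山田○夫" and "鈴木花子", while the negative lookahead keeps "担当者" from being read as a name before 氏名.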

The opinion sentence extraction unit 133 extracts from the documents one or more opinion sentences, which are sentences expressing opinions on the topic. For example, it extracts as opinion sentences those sentences that contain the topic together with words or phrases marking an assertion, such as "である。" ("is ...") or "と考えられる。" ("is considered to be ..."). Alternatively, it obtains opinion sentences using the technique described in Tetsuji Nakagawa, Kentaro Inui and Sadao Kurohashi: "Dependency Tree-based Sentiment Classification using CRFs with Hidden Variables", In Proceedings of Human Language Technologies: The 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL 2010), June 2010. With that technique, the opinion sentence extraction unit 133 parses the sentence to be processed and then, using clues such as the parts of speech of the words in the parsed sentence and the dependency relations between them, extracts the portions (phrases, etc.) that express opinions by sequence tagging. For each phrase extracted as an opinion, it then outputs the polarity of the opinion with a conditional random field, drawing on the dependency relations together with a polarity dictionary and a dictionary of polarity-reversing words. It also outputs the opinion type, such as merit, event, or obligation ("should" statement), using a classifier whose features are derived from the parsing result. Any other method of extracting opinion sentences may also be used.
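The simpler, cue-phrase method mentioned first can be sketched as below. The two cue phrases come from the description; the sentence splitter (splitting after each 。) and the function names are assumptions for illustration, and the CRF-based method of Nakagawa et al. is not reproduced here.

```python
import re

# Assertion cue phrases quoted in the description.
CUES = ("である。", "と考えられる。")


def split_sentences(text: str) -> list[str]:
    """Split Japanese text after each full stop (。), keeping the stop."""
    return [s.strip() for s in re.split(r"(?<=。)", text) if s.strip()]


def extract_opinion_sentences(text: str, topic: str, cues=CUES) -> list[str]:
    """Keep sentences that mention the topic and contain an assertion cue."""
    return [s for s in split_sentences(text)
            if topic in s and any(cue in s for cue in cues)]
```

Given "太陽光発電は有望である。太陽光発電の設備が届いた。風力発電は高価と考えられる。" and the topic "太陽光発電", only the first sentence is kept: the second lacks a cue and the third lacks the topic.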

The opinion information acquisition unit 134 acquires opinion information from the opinion sentences for each expert candidate. Opinion information is information about the opinion sentences acquired by the opinion sentence extraction unit 133. It is usually numerical, statistical data derived from one or more opinion sentences, and it usually includes the positive opinion count (the number of positive opinion sentences) and the negative opinion count (the number of negative opinion sentences). The opinion information acquisition unit 134 holds, for example, an evaluation word DB storing evaluation words such as "良い" (good), "すばらしい" (great), "悪い" (bad), and "評価できない" (not commendable). It uses these words to judge whether each opinion sentence is positive or negative and derives the opinion information from the results. Besides the positive and negative opinion counts, the opinion information may include the total number of positive and negative opinions and the opinion bias (the proportion of positive opinions among all opinions). It may also include the degree of relevance between the topic and the opinion sentence or the document containing it, and the domain type of the site where the document is stored (co.jp, go.jp, ac.jp, etc.).
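The counting described above, together with the optional total and bias, might look like the following sketch. It assumes the opinion sentences have already been grouped per expert candidate (how sentences are attributed to candidates is not specified in this passage), and it uses a toy evaluation-word DB built from the four example words. Plain substring matching ignores negation and context, so it stands in for, rather than reproduces, a real polarity judgment.

```python
from dataclasses import dataclass

# Toy evaluation-word DB built from the examples in the description.
POSITIVE_WORDS = {"良い", "すばらしい"}
NEGATIVE_WORDS = {"悪い", "評価できない"}


def polarity(sentence: str) -> int:
    """+1 positive, -1 negative, 0 if neither or both kinds of word appear."""
    pos = any(w in sentence for w in POSITIVE_WORDS)
    neg = any(w in sentence for w in NEGATIVE_WORDS)
    return int(pos) - int(neg)


@dataclass
class OpinionInfo:
    positive: int  # positive opinion count
    negative: int  # negative opinion count

    @property
    def total(self) -> int:
        return self.positive + self.negative

    @property
    def bias(self) -> float:
        """Share of positive opinions among all opinions (0.0 if there are none)."""
        return self.positive / self.total if self.total else 0.0


def opinion_info(sentences: list[str]) -> OpinionInfo:
    """Build opinion information for one candidate's opinion sentences."""
    labels = [polarity(s) for s in sentences]
    return OpinionInfo(positive=labels.count(1), negative=labels.count(-1))
```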

The relevance acquisition unit 135 acquires, for each expert candidate, the degree of relevance between the candidate and documents. For example, for each expert candidate extracted by the expert candidate extraction unit 132, it acquires the degree of relevance between that candidate and each document, among those acquired by the document search unit 131, in which the candidate appears.

More specifically, for each document acquired by the document search unit 131, the relevance acquisition unit 135 obtains the number N of expert candidates extracted from that document by the expert candidate extraction unit 132. It then sets the degree of relevance between each expert candidate appearing in the document and that document to 1/N.
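The 1/N rule can be sketched directly. The input format (a mapping from a document ID to the set of candidates extracted from it) and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def candidate_document_relevance(
        candidates_by_doc: dict[str, set[str]]) -> dict[str, dict[str, float]]:
    """Map candidate -> {doc_id: 1/N}, where N is the number of candidates in that doc."""
    relevance: dict[str, dict[str, float]] = defaultdict(dict)
    for doc_id, candidates in candidates_by_doc.items():
        n = len(candidates)
        for cand in candidates:  # loop body never runs when n == 0, so no division by zero
            relevance[cand][doc_id] = 1.0 / n
    return dict(relevance)
```

A document naming candidates A and B gives each of them 0.5, while a document naming only A gives A 1.0, so candidates who appear alone in a document receive a stronger association.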

The relevance acquisition unit 135 may also acquire the degree of relevance between a document and the topic, and may acquire, for each expert candidate, the degree of relevance between the topic and the documents in which that candidate appears. In that case, for example, the relevance acquisition unit 135 first determines one or more documents corresponding to each expert candidate and then acquires, for each candidate, the degree of relevance between the topic and each of those documents. For example, it uses the frequency of the topic in a document as the degree of relevance between that document and the topic. It may also compute the degree of relevance using cosine similarity in the vector space model or the Okapi BM25 method (see, for example, "http://gihyo.jp/dev/serial/01/search-engine/0008" and "http://wpedia.goo.ne.jp/enwiki/Probabilistic_relevance_model_(BM25)"). In short, the degree of relevance between a document and a topic can be acquired with well-known techniques.
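As one concrete option, Okapi BM25 can be sketched over pre-tokenized documents, treating the topic as a list of query terms. The parameters k1 = 1.2 and b = 0.75 are common defaults, and the IDF is the non-negative variant ln(1 + (N - n + 0.5) / (n + 0.5)). These are assumptions, because the description does not fix any parameters. Tokenization (for example, by morphological analysis) is assumed to have been done already, and the collection must be non-empty.

```python
import math
from collections import Counter


def bm25_scores(docs: list[list[str]], query: list[str],
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every tokenized document for the query terms."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs  # average document length
    df = Counter()  # document frequency of each term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)  # term frequencies in this document
        s = 0.0
        for q in query:
            if tf[q] == 0:
                continue
            idf = math.log(1 + (n_docs - df[q] + 0.5) / (df[q] + 0.5))
            norm = k1 * (1 - b + b * len(d) / avgdl)  # length normalization
            s += idf * tf[q] * (k1 + 1) / (tf[q] + norm)
        scores.append(s)
    return scores
```

With the documents ["solar", "power", "solar"], ["wind", "power"], and ["solar"] and the topic ["solar"], the one-word document slightly outscores the longer document that mentions "solar" twice. This illustrates how BM25's length normalization differs from using the raw topic frequency.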

The relevance acquisition unit 135 may also acquire both the degree of relevance between an expert candidate and a document (first relevance) and the degree of relevance between a document and the topic (second relevance).

The score calculation unit 136 calculates, for each expert candidate, an expertise score for the topic using the opinion information and the degree of relevance. For example, the score calculation unit 136 produces a larger score the larger the positive opinion count and the larger the degree of relevance. It holds a calculation formula whose parameters are items of the opinion information and the degree of relevance, and it uses that formula to calculate each candidate's expertise score for the topic. A concrete example of the formula is given later.
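The concrete formula is given later in the specification and is not reproduced here. Purely to illustrate the stated monotonicity (a higher positive count and a higher relevance give a higher score), the sketch below uses a hypothetical rule: total relevance multiplied by (1 + positive) / (1 + negative). It should not be read as the patent's formula.

```python
def expertise_score(positive: int, negative: int, relevances: list[float]) -> float:
    """HYPOTHETICAL scoring rule for illustration, not the patent's formula.

    Grows with the positive opinion count and with the candidate's total
    relevance, and is damped by negative opinions.
    """
    return sum(relevances) * (1 + positive) / (1 + negative)
```

For example, a candidate with 2 positive and 0 negative opinions and relevances 0.5 and 1.0 scores 1.5 × 3 = 4.5, while a candidate with 1 negative opinion and a single relevance of 1.0 scores 0.5.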

The expert selection unit 137 acquires one or more experts from the expert candidates using the scores calculated by the score calculation unit 136. Here, an expert is represented by a person name, an organization name, or the like. The expert selection unit 137 acquires as experts, for example, the expert candidates whose scores are at or above a threshold, or the top N candidates by score. It may further sort the experts or expert candidates by score.

The expert output unit 138 outputs the experts acquired by the expert selection unit 137, preferably sorted in descending order of score. Instead of the experts themselves, the expert output unit 138 may output the one or more documents or opinion sentences from which each expert's score was calculated. Here, output usually means transmission to the terminal device 11, but it may also be display on a screen, projection with a projector, printing on a printer, sound output, storage on a recording medium, delivery of processing results to another processing device or program, and so on.

The document output unit 139 outputs one or more documents associated with some or all of the one or more experts, or one or more pieces of access information, i.e. information for accessing those documents. When the expert output unit 138 itself outputs documents, access information, and the like, the document output unit 139 is unnecessary. The document output unit 139 may also rank the documents written by the experts, the access information, the opinion sentences, and so on in order of the experts' scores before outputting them. "Output" here normally means transmission to the terminal device 11, but it may also be display on a screen, projection with a projector, printing on a printer, sound output, storage on a recording medium, or handing the processing result over to another processing device or program.

The terminal transmission unit 111, the document transmission unit 122, the expert output unit 138, and the document output unit 139 are usually implemented by wireless or wired communication means, but may instead be implemented by broadcasting means.

The terminal receiving unit 112 and the acceptance unit 130 are usually implemented by wireless or wired communication means, but may instead be implemented by means for receiving broadcasts.

The document retrieval unit 131, the expert candidate extraction unit 132, the opinion sentence extraction unit 133, the opinion information acquisition unit 134, the relevance acquisition unit 135, the score calculation unit 136, and the expert selection unit 137 can usually be implemented with an MPU, memory, and the like. Their processing procedures are usually implemented in software recorded on a recording medium such as a ROM, but they may also be implemented in hardware (dedicated circuits).

Next, the operation of the expert search system 1 is described, starting with the terminal device 11. The terminal acceptance unit 110 of the terminal device 11 accepts a topic from the user, and the terminal transmission unit 111 transmits the accepted topic to the expert search device 13. In response, the terminal receiving unit 112 receives the experts on the topic, documents, opinion sentences, or the like from the expert search device 13. The terminal output unit 113 then outputs the experts, documents, opinion sentences, or the like received by the terminal receiving unit 112.

Next, the operation of the Web server device 12 is described. The document transmission unit 122 of the Web server device 12 normally transmits documents in the document storage unit 121 to the expert search device 13 in response to requests from the expert search device 13.

Next, the operation of the expert search device 13 is described with reference to the flowchart of FIG. 3.

(Step S301) The acceptance unit 130 determines whether a topic has been accepted. If so, the process proceeds to step S302; otherwise it returns to step S301.

(Step S302) The document retrieval unit 131 acquires, from one or more Web server devices 12, one or more documents related to the topic accepted in step S301.

(Step S303) The expert candidate extraction unit 132 extracts, from the one or more documents acquired in step S302, one or more expert candidates on the topic accepted in step S301. This is the expert candidate extraction process, described with reference to the flowchart of FIG. 4.

(Step S304) The opinion sentence extraction unit 133 extracts, from the one or more documents acquired in step S302, one or more opinion sentences about the topic accepted in step S301. The opinion information acquisition unit 134 then acquires opinion information from the extracted opinion sentences. This is the opinion information acquisition process, described with reference to the flowchart of FIG. 5.

(Step S305) The relevance acquisition unit 135 determines the one or more documents corresponding to each expert candidate extracted in step S303 and acquires, for each candidate, the degree of relevance between each of those documents and the topic. This is a relevance acquisition process, which may be called the first relevance acquisition process; the value it yields may be called the first degree of relevance. It is described with reference to the flowchart of FIG. 6. The relevance acquisition unit 135 also acquires, for each expert candidate, the degree of relevance between the candidate and each document. This is likewise a relevance acquisition process, which may be called the second relevance acquisition process; the value it yields may be called the second degree of relevance. It is described with reference to the flowchart of FIG. 7.

(Step S306) The score calculation unit 136 calculates, for each expert candidate, an expertise score for the topic using the opinion information acquired in step S304 and the degree of relevance acquired in step S305 (the first degree of relevance, the second degree of relevance, or both). This is the score calculation process, described with reference to the flowchart of FIG. 8.

(Step S307) The expert selection unit 137 uses the scores calculated in step S306 to select one or more experts from the one or more expert candidates. Normally, it selects as experts the candidates whose scores are sufficiently high (equal to or greater than a threshold).

(Step S308) The expert output unit 138 outputs the one or more experts selected in step S307.

(Step S309) The document output unit 139 outputs one or more documents associated with some or all of the experts, or one or more pieces of access information for accessing those documents, and the process returns to step S301.

Step S309 may be omitted from the flowchart of FIG. 3.

In the flowchart of FIG. 3, processing ends when the power is turned off or when an interrupt to end processing occurs.
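The data flow of FIG. 3 (steps S302 to S308) can be sketched as a pipeline in which each stage is passed in as a function. This is a hypothetical skeleton: the stage names and signatures are illustrative assumptions, since the description does not fix any concrete interface.

```python
from typing import Callable, Dict, List

Docs = List[str]
CandDocs = Dict[str, List[int]]  # candidate -> indices of the documents it appears in


def search_experts(
    topic: str,
    retrieve: Callable[[str], Docs],                                   # S302
    extract_candidates: Callable[[Docs], CandDocs],                    # S303
    opinion_info: Callable[[str, Docs, CandDocs], Dict[str, dict]],    # S304
    relevance: Callable[[str, Docs, CandDocs], Dict[str, list]],       # S305
    score: Callable[[dict, list], float],                              # S306
    select: Callable[[Dict[str, float]], List[str]],                   # S307
) -> List[str]:
    """Run one pass of the FIG. 3 flow for a topic accepted in S301."""
    docs = retrieve(topic)
    cand_docs = extract_candidates(docs)
    opinions = opinion_info(topic, docs, cand_docs)
    rels = relevance(topic, docs, cand_docs)
    # S306: one expertise score per candidate.
    scores = {c: score(opinions[c], rels[c]) for c in cand_docs}
    # S307; the caller outputs the returned list (S308).
    return select(scores)
```

Passing the stages in as functions mirrors the description's point that the individual algorithms (for example, the candidate extractor) are interchangeable.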

Next, the expert candidate extraction process of step S303 is described with reference to the flowchart of FIG. 4.

(Step S401) The expert candidate extraction unit 132 sets a counter i to 1.

(Step S402) The expert candidate extraction unit 132 determines whether an i-th document exists. If it does, the process proceeds to step S403; if not, it returns to the higher-level process (step S304).

(Step S403) The expert candidate extraction unit 132 performs morphological analysis on the i-th document.

(Step S404) The expert candidate extraction unit 132 acquires one or more expert candidates, for example by applying a machine learning technique (sequence tagging) to the result of the morphological analysis in step S403.

(Step S405) The expert candidate extraction unit 132 sets a counter j to 1.

(Step S406) The expert candidate extraction unit 132 determines whether a j-th expert candidate exists. If it does, the process proceeds to step S407; if not, to step S411.

(Step S407) The expert candidate extraction unit 132 determines whether the j-th expert candidate is already stored in a predetermined buffer. If it is not, the process proceeds to step S408; if it is, to step S410.

(Step S408) The expert candidate extraction unit 132 stores the j-th expert candidate in the predetermined buffer, at least temporarily, in association with the document identifier of the i-th document.

(Step S409) The expert candidate extraction unit 132 increments the counter j by 1 and returns to step S406.

(Step S410) The expert candidate extraction unit 132 stores the document identifier of the i-th document, at least temporarily, in association with the j-th expert candidate already present in the predetermined buffer, and the process proceeds to step S409.

(Step S411) The expert candidate extraction unit 132 increments the counter i by 1 and returns to step S402.

In the flowchart of FIG. 4, any algorithm may be used by the expert candidate extraction unit 132 to extract expert candidates from each document.
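The loop of FIG. 4 amounts to building a map from each expert candidate to the documents in which it was found. A minimal sketch, assuming documents arrive as a dict from document ID to text and that the candidate extractor is passed in as a function (sequence tagging, named entity recognition, or anything else, since the algorithm is left open). `build_candidate_index` is a hypothetical name.

```python
from typing import Callable, Dict, Iterable, List


def build_candidate_index(
    docs: Dict[str, str],
    extract: Callable[[str], Iterable[str]],
) -> Dict[str, List[str]]:
    """Map each expert candidate to the IDs of the documents it appears in."""
    index: Dict[str, List[str]] = {}
    for doc_id, text in docs.items():        # loop over documents i (S402-S411)
        for cand in extract(text):           # loop over candidates j (S406-S409)
            if cand not in index:            # S407: candidate not yet in the buffer
                index[cand] = [doc_id]       # S408: new candidate/document entry
            elif doc_id not in index[cand]:  # guard: same name twice in one document
                index[cand].append(doc_id)   # S410: add document ID to existing entry
    return index
```

The duplicate guard is an addition not shown in the flowchart; it keeps a document from being counted twice when a name occurs more than once in it.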

Next, the opinion information acquisition process of step S304 is described with reference to the flowchart of FIG. 5.

(Step S501) The opinion sentence extraction unit 133 sets a counter i to 1.

(Step S502) The opinion sentence extraction unit 133 determines whether an i-th document exists among the documents acquired in step S302. If it does, the process proceeds to step S503; if not, to step S514.

(Step S503) The opinion sentence extraction unit 133 sets a counter j to 1.

(Step S504) The opinion sentence extraction unit 133 determines whether a j-th sentence exists in the i-th document. If it does, the process proceeds to step S505; if not, to step S513. A "sentence" here may also be part of a sentence; that is, it is whatever unit is judged to be, or not to be, an opinion sentence.

(Step S505) The opinion sentence extraction unit 133 acquires the j-th sentence of the i-th document.

(Step S506) The opinion sentence extraction unit 133 determines whether the sentence acquired in step S505 is an opinion sentence and writes the result to a buffer. To do so, it parses the sentence and, using cues such as the parts of speech of the words in the sentence and their dependency relations, extracts the portions that express an opinion by sequence tagging, and thereby judges whether the sentence is an opinion sentence.

(Step S507) If the value in the buffer indicates an opinion sentence, the opinion sentence extraction unit 133 proceeds to step S508; if it indicates that the sentence is not an opinion sentence, it proceeds to step S512.

(Step S508) The opinion information acquisition unit 134 determines whether the j-th sentence of the i-th document is positive ("positive"), negative ("negative"), or neither ("neutral"). It may also calculate the polarity of the j-th sentence, for example with a conditional random field (CRF) technique that uses word dependency relations, a polarity dictionary, and a dictionary of polarity-reversing words. The opinion information acquisition unit 134 is assumed to hold the polarity dictionary and the dictionary of polarity-reversing words. Polarity is information indicating the degree to which a sentence is positive and/or negative.

(Step S509) The opinion information acquisition unit 134 stores the result of step S508 in a flag (called the stance flag) and attaches the stance flag to the opinion sentence.

(Step S510) The opinion information acquisition unit 134 acquires the degree of relevance between the j-th sentence of the i-th document and the topic, and attaches it to the opinion sentence.

(Step S511) The opinion information acquisition unit 134 acquires the domain type of the i-th document and attaches it to the opinion sentence. The domain type is acquired by the document retrieval unit 131 when it retrieves the document.

(Step S512) The opinion sentence extraction unit 133 increments the counter j by 1 and returns to step S504.

(Step S513) The opinion sentence extraction unit 133 increments the counter i by 1 and returns to step S502.

(Step S514) The opinion information acquisition unit 134 sets the counter i to 1.

(Step S515) The opinion information acquisition unit 134 determines whether an i-th expert candidate exists. If it does, the process proceeds to step S516; if not, it returns to the higher-level process (step S305).

(Step S516) The opinion information acquisition unit 134 acquires all document identifiers associated with the i-th expert candidate.

(Step S517) The opinion information acquisition unit 134 counts the sentences whose stance flag is "positive" across all documents identified by the document identifiers acquired in step S516. This count is the number of positive sentences.

(Step S518) The opinion information acquisition unit 134 counts the sentences whose stance flag is "negative" across all documents identified by the document identifiers acquired in step S516. This count is the number of negative sentences.

(Step S519) The opinion information acquisition unit 134 acquires the total number of opinions across all documents identified by the document identifiers acquired in step S516. The total number of opinions is normally the number of all opinion sentences.

(Step S520) The opinion information acquisition unit 134 acquires the proportion of positive sentences.

(Step S521) The opinion information acquisition unit 134 increments the counter i by 1 and returns to step S515.

The number of positive opinion sentences and the number of negative opinion sentences acquired in the flowchart of FIG. 5, together with the total number of positive and negative opinions, the proportion of positive sentences, the degree of relevance, the domain type, and so on, are examples of opinion information.
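Steps S514 to S521 aggregate the stance flags per candidate. A minimal sketch, assuming the stance flags of each document's opinion sentences are already available as lists of the strings "positive", "negative", and "neutral", and assuming that the proportion in S520 is positive sentences divided by all opinion sentences (the description does not state the denominator). `OpinionInfo` and `aggregate_opinions` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class OpinionInfo:
    positive: int  # S517: number of "positive" opinion sentences
    negative: int  # S518: number of "negative" opinion sentences
    total: int     # S519: number of all opinion sentences

    @property
    def positive_ratio(self) -> float:
        # S520 (assumed denominator: all opinion sentences); 0.0 if there are none.
        return self.positive / self.total if self.total else 0.0


def aggregate_opinions(
    flags_by_doc: Dict[str, List[str]],  # doc ID -> stance flags of its opinion sentences
    cand_docs: Dict[str, List[str]],     # candidate -> doc IDs (result of FIG. 4)
) -> Dict[str, OpinionInfo]:
    result: Dict[str, OpinionInfo] = {}
    for cand, doc_ids in cand_docs.items():  # S515-S521: loop over candidates
        # S516: pool the flags of every document linked to this candidate.
        flags = [f for d in doc_ids for f in flags_by_doc.get(d, [])]
        result[cand] = OpinionInfo(
            positive=flags.count("positive"),
            negative=flags.count("negative"),
            total=len(flags),
        )
    return result
```

For example, hypothetical flags for one document with two positive, one negative, and two neutral opinion sentences give `OpinionInfo(positive=2, negative=1, total=5)` and a positive ratio of 0.4.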

Next, the first relevance acquisition process of step S305 is described with reference to the flowchart of FIG. 6.

(Step S601) The relevance acquisition unit 135 sets a counter i to 1.

(Step S602) The relevance acquisition unit 135 determines whether an i-th document exists. If it does, the process proceeds to step S603; if not, to step S606.

(Step S603) The relevance acquisition unit 135 acquires the degree of relevance between the i-th document and the topic.

(Step S604) The relevance acquisition unit 135 stores the document identifier of the i-th document in association with its degree of relevance.

(Step S605) The relevance acquisition unit 135 increments the counter i by 1 and returns to step S602.

(Step S606) The relevance acquisition unit 135 sets the counter i to 1.

(Step S607) The relevance acquisition unit 135 determines whether an i-th expert candidate exists. If it does, the process proceeds to step S608; if not, it returns to the higher-level process (step S306).

(Step S608) The relevance acquisition unit 135 acquires from the buffer all document identifiers associated with the i-th expert candidate.

(Step S609) The relevance acquisition unit 135 acquires the degrees of relevance associated with all the document identifiers acquired in step S608.

(Step S610) The relevance acquisition unit 135 stores the degrees of relevance acquired in step S609 in the buffer in association with the i-th expert candidate.

(Step S611) The relevance acquisition unit 135 increments the counter i by 1 and returns to step S607.
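The two loops of FIG. 6 first score every document against the topic once (S601 to S605) and then collect those scores per candidate (S606 to S611). A minimal sketch; the document-topic relevance function is passed in because the description does not fix it at this point, and `first_relevance` and its parameters are hypothetical names.

```python
from typing import Callable, Dict, List


def first_relevance(
    docs: Dict[str, str],                              # doc ID -> text
    topic: str,
    doc_topic_relevance: Callable[[str, str], float],  # (text, topic) -> relevance
    cand_docs: Dict[str, List[str]],                   # candidate -> doc IDs
) -> Dict[str, List[float]]:
    # S601-S605: compute each document's relevance once and cache it by doc ID.
    r = {doc_id: doc_topic_relevance(text, topic) for doc_id, text in docs.items()}
    # S606-S611: for each candidate, gather the relevance of its documents.
    return {cand: [r[d] for d in ids] for cand, ids in cand_docs.items()}
```

Caching by document ID means a document shared by several candidates is scored only once, which is the point of splitting the flowchart into two loops.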

Next, the second relevance acquisition process of step S305 is described with reference to the flowchart of FIG. 7.

(Step S701) The relevance acquisition unit 135 sets a counter i to 1.

(Step S702) The relevance acquisition unit 135 determines whether an i-th expert candidate exists. If it does, the process proceeds to step S703; if not, it returns to the higher-level process (step S306).

(Step S703) The relevance acquisition unit 135 acquires from the buffer all document identifiers associated with the i-th expert candidate.

(Step S704) The relevance acquisition unit 135 sets a counter j to 1.

(Step S705) The relevance acquisition unit 135 determines whether a j-th document identifier exists among the document identifiers acquired in step S703. If it does, the process proceeds to step S706; if not, to step S710.

(Step S706) The relevance acquisition unit 135 acquires the number of expert candidates associated with the j-th document identifier.

(Step S707) The relevance acquisition unit 135 uses the number of expert candidates acquired in step S706 to calculate the degree of relevance between the document identified by the j-th document identifier and the i-th expert candidate. The formula for the degree of relevance between a document and an expert candidate is a decreasing function of the number of expert candidates.

(Step S708) The relevance acquisition unit 135 temporarily stores the degree of relevance calculated in step S707 in the buffer in association with the j-th document identifier and the i-th expert candidate.

(Step S709) The relevance acquisition unit 135 increments the counter j by 1 and returns to step S705.

(Step S710) The relevance acquisition unit 135 increments the counter i by 1 and returns to step S702.
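FIG. 7 only requires the document-candidate relevance to decrease as the number of candidates found in the document grows. The sketch below assumes the form 1/n, where n is the number of candidates extracted from the document; this agrees with the worked example later in this description (two candidates in one document give 1/2). The function name and the (document ID, candidate) key layout are assumptions.

```python
from typing import Dict, List, Tuple


def second_relevance(cand_docs: Dict[str, List[str]]) -> Dict[Tuple[str, str], float]:
    """Return the degree of relevance keyed by (document ID, candidate), assuming 1/n."""
    # S706: n[doc] = number of distinct candidates extracted from that document.
    n: Dict[str, int] = {}
    for doc_ids in cand_docs.values():
        for doc_id in set(doc_ids):
            n[doc_id] = n.get(doc_id, 0) + 1
    # S702-S710: loop over candidates i, then over their documents j.
    return {
        (doc_id, cand): 1.0 / n[doc_id]  # S707: decreasing in n (assumed 1/n)
        for cand, doc_ids in cand_docs.items()
        for doc_id in doc_ids
    }
```

A document that names a single person credits that person fully, while a document listing many names spreads its weight thin, which is the intuition behind using a decreasing function.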

The score calculation process of step S306 is described with reference to the flowchart of FIG. 8.

(Step S801) The score calculation unit 136 sets a counter i to 1.

(Step S802) The score calculation unit 136 determines whether an i-th expert candidate exists. If it does, the process proceeds to step S803; if not, it returns to the higher-level process (step S307).

(Step S803) The score calculation unit 136 acquires the opinion information associated with the i-th expert candidate.

(Step S804) The score calculation unit 136 acquires the degree of relevance associated with the i-th expert candidate. This is the first degree of relevance, the second degree of relevance, or both.

(Step S805) The score calculation unit 136 calculates the score using the opinion information acquired in step S803 and the degree of relevance acquired in step S804.

(Step S806) The score calculation unit 136 stores the score calculated in step S805 in the buffer in association with the i-th expert candidate.

(Step S807) The score calculation unit 136 increments the counter i by 1 and returns to step S802.

A specific example of the operation of the expert search system 1 in this embodiment is described below.

Suppose the user enters the topic "ゆとり教育" (yutori, or "relaxed", education) into the terminal device 11. The terminal acceptance unit 110 of the terminal device 11 accepts the topic "ゆとり教育", and the terminal transmission unit 111 transmits it to the expert search device 13.

Next, the acceptance unit 130 of the expert search device 13 receives the topic "ゆとり教育". The document retrieval unit 131 performs an ordinary Web search using the topic "ゆとり教育" as the query and acquires two or more Web pages. Suppose one of these is the Web page shown in FIG. 9.

Next, the expert candidate extraction unit 132 uses named entity recognition to extract, for example, the expert candidate "山田○夫" for the topic "ゆとり教育" from the document of FIG. 9.

Next, the opinion sentence extraction unit 133 extracts one or more sentences from the document of FIG. 9 and parses each of them. Parsing is performed, for example, with KNP (see http://nlp.kuee.kyoto-u.ac.jp/nl-resource/knp.html).

Next, the opinion sentence extraction unit 133 determines whether each sentence is an opinion sentence and writes the result to the buffer.

Next, the opinion sentence extraction unit 133 acquires the type and polarity of each opinion sentence. Specifically, it obtains, for example, one or more features from the parse result and gives them to a machine-learned classifier to obtain the type. It also obtains the polarity of each sentence from its parse, taking polarity-reversing words and dependency relations into account. Techniques for extracting opinion sentences and for acquiring their type and polarity are described in Tetsuji Nakagawa, Kentaro Inui and Sadao Kurohashi, "Dependency Tree-based Sentiment Classification using CRFs with Hidden Variables," In Proceedings of Human Language Technologies: The 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL 2010), June 2010. Opinion sentence extraction and polarity acquisition may also be performed by methods other than the one in that paper.

The opinion sentence extraction unit 133 thus obtains in the buffer, for the text of FIG. 9, the opinion sentence information shown in FIG. 10. This information is a set of records, each with the fields "type", "polarity", and "opinion sentence". A sentence acquired by the opinion sentence extraction unit 133 may also be part of a sentence.

Next, the opinion information acquisition unit 134 acquires the opinion information of the expert candidate "山田○夫" for the document of FIG. 9 as follows. It takes the number of opinion sentences with positive polarity in the table of FIG. 10, "2", as the number of positive opinions, and the number with negative polarity, "1", as the number of negative opinions. It also takes the number of records in the table of FIG. 10, "5", as the number of opinion sentences.

Next, the relevance acquisition unit 135 calculates the degree of relevance between an expert candidate and a document using Equation 1 below. The relevance acquisition unit 135 is assumed to hold this formula in advance.

$$a_{ij} = \frac{1}{n_i} \qquad \text{(Equation 1)}$$

In Equation 1, $a_{ij}$ is the degree of relevance between document $i$ and expert candidate $j$, and $n_i$ is the number of expert candidates extracted from document $i$.

For example, when the expert candidates {Yamada ○o, Terada ○kazu} are detected in the document of FIG. 9, the relevance degree acquisition unit 135 sets the degree of association a_ij between the expert candidate "Yamada ○o" and that document to 1/2. If only the expert candidate {Yamada ○o} is detected in the document of FIG. 9, a_ij between "Yamada ○o" and that document is 1.
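A minimal sketch of Equation 1 as described above (a_ij = 1/n_i for every candidate j found in document i). Collapsing repeated mentions of the same name within one document into a single candidate is an assumption; the text does not say how repeats are counted.

```python
from typing import Dict, List


def association_degrees(candidates_in_doc: List[str]) -> Dict[str, float]:
    """Return a_ij = 1 / n_i for each expert candidate j extracted from document i."""
    # Deduplicate while keeping first-seen order (assumption: repeats count once).
    unique = list(dict.fromkeys(candidates_in_doc))
    n_i = len(unique)
    if n_i == 0:
        return {}
    return {candidate: 1.0 / n_i for candidate in unique}


print(association_degrees(["Yamada ○o", "Terada ○kazu"]))
```

With two candidates each receives 0.5, and a lone candidate receives 1.0, matching the two cases in the text.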

The relevance degree acquisition unit 135 also calculates the degree of relevance between each document and the topic "yutori education". For example, it obtains the number of occurrences of the topic "yutori education" in the document of FIG. 9, "14". It then computes the relevance r_i between the document of FIG. 9 and the topic "yutori education" with an increasing function whose parameter is the occurrence count.
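The text only requires r_i to be an increasing function of the topic's occurrence count. The sketch below assumes log(1 + count) purely for illustration, and also assumes occurrences are counted by plain substring matching.

```python
import math


def topic_relevance(document_text: str, topic: str) -> float:
    """Relevance r_i of a document to a topic, increasing in the occurrence count.

    log(1 + count) is an assumed choice; any increasing function fits the text.
    """
    count = document_text.count(topic)  # non-overlapping substring occurrences
    return math.log1p(count)
```

A document mentioning the topic 14 times gets log(15), which exceeds the value for any document with fewer mentions; a document with no mentions gets 0.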

Next, the score calculation unit 136 calculates the score E_j of expert candidate j using Equation 2 below.

$$E_j = \sum_{i} r_i \, q_i \, a_{ij} \qquad \text{(Equation 2)}$$

Here the sum runs over the documents i from which expert candidate j was extracted.

The quality score q_i of document i is assumed to be calculated by Equation 3 below.

The feature types used in Equation 3 are a bias term, the number of positive opinions, the number of negative opinions, and the total number of opinions. In Equation 3, w is the model parameter, that is, the set of weights assigned to the feature types. With the weights ordered as (bias term, number of positive opinions, number of negative opinions, total number of opinions), the model parameter is, for example, w = (-1.44, 0.06, 0.390, 0.239). Also in Equation 3, x_i is the feature vector of document i, here (bias term, number of positive opinions, number of negative opinions, total number of opinions). The bias term is a parameter independent of the document that determines the shape of the quality score function.

The score calculation unit 136 also acquires the opinion information described above. Here, the opinion information is the feature vector of document i, x_i = (1, 2, 1, 5), whose first element is the bias-term feature.

The score calculation unit 136 then calculates the inner product w·x_i that appears in the quality score q_i of document i (see Equation 3): w·x_i = -1.44 + 0.12 + 0.390 + 1.195 = 0.265.

Next, according to Equation 3, the score calculation unit 136 calculates the quality score of document i as q_i = 0.566.
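Equation 3 itself is not reproduced in this text, but the worked numbers are consistent with a logistic (sigmoid) function of w·x_i, since 1 / (1 + e^(-0.265)) ≈ 0.566. The sketch below assumes that logistic form and reproduces the example with the weights and features given above.

```python
import math
from typing import Sequence


def quality_score(w: Sequence[float], x: Sequence[float]) -> float:
    """q_i = 1 / (1 + exp(-w . x_i)); the logistic form is an assumption."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    z = sum(wk * xk for wk, xk in zip(w, x))  # inner product w . x_i
    return 1.0 / (1.0 + math.exp(-z))


w = (-1.44, 0.06, 0.390, 0.239)  # weights for (bias, positive, negative, total)
x_i = (1, 2, 1, 5)               # bias feature 1, then the opinion counts
q_i = quality_score(w, x_i)
print(round(q_i, 3))  # prints 0.566
```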

Next, according to Equation 2, for the expert candidate "Yamada ○o", the score calculation unit 136 multiplies together the relevance r_i between the topic and each document i from which "Yamada ○o" was extracted, the quality score q_i of document i, and the degree of association a_ij between document i and "Yamada ○o". It computes this product of r_i, q_i, and a_ij for each such document and takes the sum of the products as the score of the expert candidate "Yamada ○o".

The above processing is also performed for the other expert candidates, yielding a score for each expert candidate.
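Equation 2 as described in the preceding paragraphs can be sketched as an accumulation over documents. The DocStats container is a hypothetical name introduced for illustration, and the example values are made up.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class DocStats(NamedTuple):
    relevance: float              # r_i: relevance of document i to the topic
    quality: float                # q_i: quality score of document i
    association: Dict[str, float] # a_ij for each candidate j extracted from document i


def expert_scores(docs: List[DocStats]) -> Dict[str, float]:
    """E_j = sum over documents i containing candidate j of r_i * q_i * a_ij."""
    scores: Dict[str, float] = defaultdict(float)
    for doc in docs:
        for candidate, a_ij in doc.association.items():
            scores[candidate] += doc.relevance * doc.quality * a_ij
    return dict(scores)


# Made-up example: candidate "A" appears in two documents, "B" in one.
docs = [
    DocStats(2.0, 0.5, {"A": 0.5, "B": 0.5}),
    DocStats(1.0, 0.8, {"A": 1.0}),
]
scores = expert_scores(docs)  # A: 2.0*0.5*0.5 + 1.0*0.8*1.0 = 1.3, B: 0.5
```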

Next, using the calculated scores, the expert selection unit 137 acquires from the one or more expert candidates, for example, the one or more experts whose scores are higher than a threshold. The document output unit 139 then sorts the experts in order of score and transmits them to the terminal device 11.
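The selection and ordering step might look like the following. The threshold is a hypothetical caller-supplied value (the text gives none), and breaking ties by name is one possible choice made here only for deterministic output.

```python
from typing import Dict, List, Tuple


def select_experts(scores: Dict[str, float], threshold: float) -> List[Tuple[str, float]]:
    """Keep candidates scoring above the threshold, sorted by descending score."""
    selected = [(name, s) for name, s in scores.items() if s > threshold]
    # Highest score first; ties broken alphabetically by name.
    return sorted(selected, key=lambda pair: (-pair[1], pair[0]))


print(select_experts({"A": 1.3, "B": 0.5, "C": 0.9}, threshold=0.6))
```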

The document output unit 139 also transmits to the terminal device 11 the URLs and the like of the one or more documents corresponding to each expert.

Next, the terminal receiving unit 112 of the terminal device 11 receives from the expert search device 13 the one or more experts, sorted by score, together with the URLs of the documents. The terminal output unit 113 then outputs the one or more experts and document URLs received by the terminal receiving unit 112.

As described above, according to the present embodiment, experts can be retrieved with high accuracy from Web documents.

Furthermore, according to the present embodiment, information written by experts can be ranked and presented from among the large volume of Web information of mixed quality. This gives users access to more reliable information on the Web and improves their convenience.

In addition, according to the present embodiment, a list of experts on a topic, or information published by those experts, can be presented to the user.

The processing in the present embodiment may be implemented in software. This software may be distributed by download or the like, or recorded on a recording medium such as a CD-ROM and distributed. The same applies to the other embodiments in this specification. The software that implements the expert search device of the present embodiment is the following program. That is, the program causes a computer to function as: a reception unit that receives a topic; a document search unit that acquires one or more documents related to the topic from one or more Web server devices; an expert candidate extraction unit that extracts one or more expert candidates from the one or more documents; an opinion sentence extraction unit that extracts, from the one or more documents, one or more opinion sentences, which are sentences expressing opinions about the topic; an opinion information acquisition unit that acquires, from the one or more opinion sentences and for each of the one or more expert candidates, opinion information including the number of positive opinion sentences and the number of negative opinion sentences; a score calculation unit that calculates, for each of the one or more expert candidates, a score of expertise on the topic using the opinion information; an expert selection unit that acquires one or more experts from the one or more expert candidates using the score; and an expert output unit that outputs the one or more experts acquired by the expert selection unit.
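The program structure above can be pictured as a pipeline of pluggable units. In the sketch below every callable is a hypothetical stand-in for one unit; attributing a document's opinion counts to every candidate extracted from that document follows the worked example but is a simplifying assumption, and the scoring function is left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Document:
    url: str
    text: str


@dataclass
class ExpertSearchPipeline:
    # Each field is a hypothetical stand-in for one functional unit.
    search: Callable[[str], List[Document]]                    # document search unit
    extract_candidates: Callable[[Document], List[str]]        # expert candidate extraction unit
    opinion_counts: Callable[[Document, str], Dict[str, int]]  # opinion sentence extraction + opinion information acquisition
    score: Callable[[List[Dict[str, int]]], float]             # score calculation unit
    threshold: float                                           # used by the expert selection unit

    def run(self, topic: str) -> List[str]:
        """Receive a topic and return selected experts, highest score first."""
        docs = self.search(topic)
        per_candidate: Dict[str, List[Dict[str, int]]] = {}
        for doc in docs:
            counts = self.opinion_counts(doc, topic)
            # Assumption: a document's opinion counts apply to each candidate in it.
            for candidate in self.extract_candidates(doc):
                per_candidate.setdefault(candidate, []).append(counts)
        scores = {c: self.score(infos) for c, infos in per_candidate.items()}
        selected = [c for c, s in scores.items() if s > self.threshold]
        # Expert output unit: sort by descending score.
        return sorted(selected, key=lambda c: -scores[c])
```

Keeping each unit as an injected callable lets the extraction, classification, and scoring methods be swapped independently, which matches the text's note that other extraction methods may be used.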

FIG. 11 shows the external appearance of a computer that executes the program described in this specification to implement the expert search device and the like of the above embodiments. The above embodiments can be implemented by computer hardware and a computer program executed on it. FIG. 11 is an overview of the computer system 340, and FIG. 12 is a block diagram of the computer system 340.

In FIG. 11, the computer system 340 includes a computer 341 having an FD drive and a CD-ROM drive, a keyboard 342, a mouse 343, and a monitor 344.

In FIG. 12, in addition to the FD drive 3411 and the CD-ROM drive 3412, the computer 341 includes an MPU 3413; a bus 3414 connected to the CD-ROM drive 3412 and the FD drive 3411; a ROM 3415 for storing programs such as a boot-up program; a RAM 3416, connected to the MPU 3413 and the ROM 3415, for temporarily storing application program instructions and providing temporary storage space; and a hard disk 3417 for storing application programs, system programs, and data. Although not shown, the computer 341 may further include a network card that provides a connection to a LAN.

A program that causes the computer system 340 to execute the functions of the expert search device and the like of the above embodiments may be stored on a CD-ROM 3501 or an FD 3502, inserted into the CD-ROM drive 3412 or the FD drive 3411, and then transferred to the hard disk 3417. Alternatively, the program may be transmitted to the computer 341 via a network (not shown) and stored on the hard disk 3417. The program is loaded into the RAM 3416 at execution time. The program may also be loaded directly from the CD-ROM 3501, the FD 3502, or the network.

The program does not necessarily have to include an operating system (OS), a third-party program, or the like for causing the computer 341 to execute the functions of the expert search device and the like of the above embodiments. The program need only include the portion of instructions that calls appropriate functions (modules) in a controlled manner so as to obtain the desired results. How the computer system 340 operates is well known, and a detailed description is omitted.

In the above program, steps such as transmitting and receiving information do not include processing performed only by hardware, for example processing performed by a modem or an interface card in the transmitting step.

The program may be executed by a single computer or by multiple computers; that is, either centralized processing or distributed processing may be performed.

In each of the above embodiments, two or more communication means in one device may, of course, be physically implemented by a single medium.

In each of the above embodiments, each process (each function) may be implemented by centralized processing in a single device (system) or by distributed processing across multiple devices.

The present invention is not limited to the above embodiments; various modifications are possible, and these are of course included within the scope of the present invention.

As described above, the expert search device according to the present invention has the effect of retrieving experts with high accuracy from Web documents, and is useful as a Web search engine for experts and the like.

Description of Symbols
1 Expert search system
11 Terminal device
12 Web server device
13 Expert search device
110 Terminal reception unit
111 Terminal transmission unit
112 Terminal receiving unit
113 Terminal output unit
121 Document storage unit
122 Document transmission unit
130 Reception unit
131 Document search unit
132 Expert candidate extraction unit
133 Opinion sentence extraction unit
134 Opinion information acquisition unit
135 Relevance degree acquisition unit
136 Score calculation unit
137 Expert selection unit
138 Expert output unit
139 Document output unit

Claims

An expert search device comprising:
a reception unit that receives a topic;
a document search unit that acquires one or more documents related to the topic from one or more Web server devices;
an expert candidate extraction unit that extracts one or more expert candidates from the one or more documents;
an opinion sentence extraction unit that extracts, from the one or more documents, one or more opinion sentences, which are sentences expressing opinions about the topic;
an opinion information acquisition unit that acquires, from the one or more opinion sentences and for each of the one or more expert candidates, opinion information, which is information about the opinion sentences;
a score calculation unit that calculates, for each of the one or more expert candidates, a score of expertise for the topic using the opinion information;
an expert selection unit that acquires one or more experts from the one or more expert candidates using the score; and
an expert output unit that outputs the one or more experts acquired by the expert selection unit.

The expert search device according to claim 1, further comprising a relevance degree acquisition unit that acquires, for each of the one or more expert candidates extracted by the expert candidate extraction unit, a degree of relevance for each document, among the one or more documents acquired by the document search unit, in which that expert candidate appears,
wherein the score calculation unit calculates, for each of the one or more expert candidates, a score of expertise for the topic using the opinion information and the degree of relevance.

The expert search device according to claim 1, wherein the expert output unit sorts the one or more experts in descending order of the score and outputs them.

The expert search device according to any one of claims 1 to 3, wherein the expert output unit outputs, instead of the one or more experts, the one or more documents or the one or more opinion sentences from which the scores of the one or more experts acquired by the expert selection unit were calculated.

The expert search device according to any one of claims 1 to 3, further comprising a document output unit that outputs one or more documents corresponding to some or all of the one or more experts, or one or more pieces of access information, which is information for accessing the one or more documents.

An expert search method implemented by a reception unit, a document search unit, an expert candidate extraction unit, an opinion sentence extraction unit, an opinion information acquisition unit, a relevance degree acquisition unit, a score calculation unit, an expert selection unit, and an expert output unit, the method comprising:
a reception step in which the reception unit receives a topic;
a document search step in which the document search unit acquires one or more documents related to the topic from one or more Web server devices;
an expert candidate extraction step in which the expert candidate extraction unit extracts one or more expert candidates from the one or more documents;
an opinion sentence extraction step in which the opinion sentence extraction unit extracts, from the one or more documents, one or more opinion sentences, which are sentences expressing opinions about the topic;
an opinion information acquisition step in which the opinion information acquisition unit acquires, from the one or more opinion sentences and for each of the one or more expert candidates, opinion information including the number of positive opinion sentences and the number of negative opinion sentences;
a relevance degree acquisition step in which the relevance degree acquisition unit determines one or more documents corresponding to each of the one or more expert candidates and acquires, for each of the one or more expert candidates, the degree of relevance between those documents and the topic;
a score calculation step in which the score calculation unit calculates, for each of the one or more expert candidates, a score of expertise for the topic using the opinion information and the degree of relevance;
an expert selection step in which the expert selection unit acquires one or more experts from the one or more expert candidates using the score; and
an expert output step in which the expert output unit outputs the one or more experts acquired by the expert selection unit.