JP2018067041A

JP2018067041A - Extraction apparatus and computer program

Info

Publication number: JP2018067041A
Application number: JP2016203564A
Authority: JP
Inventors: 明生大門; Akio Daimon
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2016-10-17
Filing date: 2016-10-17
Publication date: 2018-04-26

Abstract

PROBLEM TO BE SOLVED: To provide an extraction apparatus and a computer program capable of appropriately extracting a person conforming to a desired person image.

SOLUTION: An extraction apparatus includes a reception unit that receives text content corresponding to a desired person image; a storage unit that stores a transmission information database, which records a plurality of pieces of transmission information transmitted by a plurality of transmitters in association with identification information identifying each piece of transmission information, and a transmitter identification information database, which records transmitter identification information identifying the transmitter of each piece of transmission information; a calculation unit that, for each transmitter, calculates a similarity between the group of transmission information transmitted by the transmitter and the text content received by the reception unit, using the transmission information database and the transmitter identification information database; and an extraction unit that extracts, from the transmitter identification information database, the transmitter identification information of each transmitter whose similarity calculated by the calculation unit is equal to or greater than a predetermined degree.

SELECTED DRAWING: Figure 1

Description

The present invention relates to an extraction apparatus that extracts desired senders based on the information they have transmitted in a network community, and to a computer program that causes a computer to operate as the extraction apparatus.

商品・サービスの品質及び機能が成熟すると商品・サービスの差別化はデザイン、質感、コンセプトイメージ等の消費者の好みが多岐に亘る要素によるものとなる。多岐に亘る消費者の嗜好に合わせて商品・サービスを開発するに際し、少しでも多くの消費者から支持されるものとすべく消費者の意見を取り入れることが効果的であるとされている。このため消費者からのアンケート、レビュー、コメント等を参考にするのみならず、商品開発の場への参加を募るなどの取り組みが従前より行なわれている。 As the quality and function of products and services mature, the differentiation of products and services depends on a wide variety of consumer preferences such as design, texture, and concept image. In developing products and services that meet a wide range of consumer preferences, it is said that it is effective to incorporate the opinions of consumers so that they can be supported by as many consumers as possible. For this reason, in addition to referring to questionnaires, reviews, comments, etc. from consumers, efforts such as recruiting participation in product development have been made.

Patent Document 1 discloses a system that uses a network to let members registered in advance for a product exchange ideas by chat (sometimes together with the operator of the product development) and, further, vote on image illustrations and the like.

Patent Document 2 discloses a system (a monitor house) that provides a venue for product development to applicants who wish to commercialize a monitor product. The monitor house proposed in Patent Document 2 provides a venue where applicants exchange questionnaires with members registered in advance and with supporters of the commercialization of the monitor product, or a venue for requesting advertising after commercialization. The document also discloses that the members or supporters targeted by a questionnaire can be narrowed down by conditions such as gender and age.

Japanese Patent Laid-Open No. 2003-091628
Japanese Patent Laid-Open No. 2001-357194 (JP 2001-357194 A)

When idea exchanges, questionnaires, votes, and the like are conducted among consumers as in Patent Document 1, the outcome of the idea exchange or vote differs depending on what kind of people the participating members are. For this reason, as disclosed in Patent Document 2, businesses that develop products commonly narrow down questionnaire respondents by attribute according to the kind of consumer the product targets.

However, when the consumers (customers) to be targeted by a product or service under development are narrowed down by attribute information such as gender, age, region, and keywords representing preferences, several problems can prevent appropriate persons from being extracted. First, in the development of increasingly diverse products and services, expressing the person image of the target consumer as attribute information requires classifying that information into an enormous number of categories. Registering such attribute information by self-report therefore becomes very burdensome for general consumers, and accuracy is lost. Furthermore, attribute information makes it difficult to measure the strength or degree of each interest, so a person with a faint interest is hard to distinguish from a person with a strong one. Extraction based on attribute information may therefore extract persons who do not match the target person image, that is, persons who are not genuinely interested in the target product. In addition, attribute information rarely reflects the passage of time. For example, if a keyword from a field in which a person has already lost interest remains registered as that person's attribute information, a person who no longer has that interest may be extracted.

本発明は斯かる事情に鑑みてなされたものであり、所望の人物像に即した人物を適切に抽出することを可能とする抽出装置及びコンピュータプログラムを提供することを目的とする。 The present invention has been made in view of such circumstances, and an object thereof is to provide an extraction apparatus and a computer program that can appropriately extract a person in accordance with a desired person image.

The extraction apparatus according to the present disclosure includes: a reception unit that receives text content corresponding to a desired person image; a storage unit that stores a transmission information database, in which a plurality of pieces of transmission information transmitted by a plurality of senders are recorded in association with identification information identifying each piece of transmission information, and a sender identification information database, in which sender identification information identifying the sender of each piece of transmission information is recorded; a calculation unit that uses the transmission information database and the sender identification information database to calculate, for each sender, a similarity between the group of transmission information transmitted by that sender and the text content received by the reception unit; and an extraction unit that extracts, from the sender identification information database, the sender identification information of each sender whose similarity calculated by the calculation unit is equal to or greater than a predetermined degree.

In the extraction apparatus according to the present disclosure, the extraction unit sorts the sender identification information in descending order of the corresponding similarity.
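The calculation, threshold extraction, and descending sort described above can be sketched as follows. This is a minimal illustration and not the method of the disclosure: the text here does not name a similarity measure, so cosine similarity over bag-of-words counts is an assumption, as are whitespace tokenization (Japanese text would need a morphological analyzer) and the names `bag_of_words`, `cosine`, and `extract_senders`.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(text):
    # Assumption: whitespace tokenization stands in for real word segmentation.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two word-count vectors (assumed measure).
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def extract_senders(posts, text_content, threshold):
    """posts: iterable of (transmission_id, sender_id, text) tuples."""
    # Pool each sender's transmissions into one word-count vector.
    per_sender = defaultdict(Counter)
    for _transmission_id, sender_id, text in posts:
        per_sender[sender_id] += bag_of_words(text)
    target = bag_of_words(text_content)
    scored = [(sid, cosine(vec, target)) for sid, vec in per_sender.items()]
    # Keep senders at or above the threshold, highest similarity first.
    hits = [(sid, score) for sid, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


posts = [
    ("p1", "00001", "travel restaurant chef"),
    ("p2", "00002", "home cooking recipe garden"),
    ("p3", "00002", "cooking at home again"),
    ("p4", "00003", "restaurant dinner"),
]
result = extract_senders(posts, "growing ingredients at home and cooking", threshold=0.3)
```

With the sample data, only the sender whose pooled posts share vocabulary with the text content passes the threshold; lowering the threshold to 0.0 returns every sender in descending order of similarity.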

The extraction apparatus according to the present disclosure further includes a creation unit that creates, for each piece of sender identification information extracted by the extraction unit, display information for displaying information related to the calculation of the similarity and a plurality of pieces of transmission information that contribute to the similarity.

In the extraction apparatus according to the present disclosure, the transmission time of each piece of transmission information is recorded in the transmission information database in association with that transmission information. The calculation unit extracts, for each sender, the group of transmission information transmitted by that sender, narrows the extracted group down to the transmission information transmitted within a period of predetermined length ending at the most recent time, and uses the narrowed-down group to calculate the similarity to the text content received by the reception unit.
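The narrowing step can be illustrated with a short sketch. The tuple layout, the function name `recent_transmissions`, and the 90-day window are assumptions chosen for the example; the disclosure only states that a period of predetermined length is used.

```python
from datetime import datetime, timedelta


def recent_transmissions(transmissions, now, window):
    """Keep only transmissions sent within `window` before `now`.

    transmissions: list of (sender_id, sent_at, text) tuples (assumed layout).
    """
    cutoff = now - window
    return [t for t in transmissions if cutoff <= t[1] <= now]


now = datetime(2016, 10, 17)
transmissions = [
    ("00002", datetime(2016, 10, 1), "cooking at home"),
    ("00002", datetime(2015, 1, 5), "old hobby post"),
]
kept = recent_transmissions(transmissions, now, timedelta(days=90))
```

Only the narrowed list would then be passed on to the similarity calculation, so interests the sender has since abandoned no longer influence the score.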

In the extraction apparatus according to the present disclosure, the transmission time of each piece of transmission information is recorded in the transmission information database in association with that transmission information. The calculation unit assigns to each piece of transmission information extracted from the transmission information database a weighting coefficient that takes a higher value the closer the transmission time is to the time at which the similarity is calculated, and multiplies the number of occurrences of each word contained in the transmission information by the assigned weighting coefficient. For each sender, the calculation unit then calculates the similarity to the text content received by the reception unit using the words contained in the group of transmission information transmitted by that sender and the numbers of occurrences of those words.
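One way to realize such a weighting is sketched below. The disclosure only requires that more recent transmissions receive higher coefficients; the exponential half-life form, the 30-day default, and the name `weighted_counts` are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime


def weighted_counts(transmissions, now, half_life_days=30.0):
    """Sum word occurrences, each scaled by a recency weight.

    transmissions: list of (sent_at, text) tuples for one sender (assumed layout).
    """
    totals = Counter()
    for sent_at, text in transmissions:
        age_days = (now - sent_at).total_seconds() / 86400
        # Assumed weighting: 1.0 for a transmission sent now, halved every half-life.
        weight = 0.5 ** (age_days / half_life_days)
        for word, count in Counter(text.lower().split()).items():
            totals[word] += weight * count
    return totals


now = datetime(2016, 10, 17)
totals = weighted_counts(
    [(datetime(2016, 10, 17), "cooking cooking"), (datetime(2016, 9, 17), "travel")],
    now,
)
```

The resulting weighted counts can be used in place of raw counts when the per-sender vector is compared with the text content.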

In the extraction apparatus according to the present disclosure, the reception unit receives a plurality of text contents together, and the calculation unit either calculates a similarity for each of the plurality of text contents or calculates a similarity using information indicating features derived from the plurality of text contents.
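The two alternatives can be contrasted in a small sketch. Both the similarity measure (cosine over word counts) and the way features are derived from several text contents (summing their word counts) are assumptions; the disclosure does not specify either, and the function names are hypothetical.

```python
import math
from collections import Counter


def bow(text):
    # Assumption: whitespace tokenization.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def per_content_scores(sender_text, contents):
    # Alternative 1: one similarity per received text content.
    sender = bow(sender_text)
    return [cosine(sender, bow(content)) for content in contents]


def merged_score(sender_text, contents):
    # Alternative 2: derive one feature vector from all contents (assumed: summed counts).
    merged = Counter()
    for content in contents:
        merged += bow(content)
    return cosine(bow(sender_text), merged)


contents = ["home cooking", "garden vegetables"]
scores = per_content_scores("cooking vegetables at home", contents)
combined = merged_score("cooking vegetables at home", contents)
```

How per-content scores would be combined into a single ranking is left open here, since the text above does not say.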

In the extraction apparatus according to the present disclosure, the calculation unit calculates the similarity using the words contained in the transmission information or in the text content received by the reception unit, together with related words extracted from a related-word dictionary in which words related to those words are recorded.
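A related-word expansion of this kind might look like the sketch below. The dictionary contents, the name `expand_with_related`, and the choice to credit each related word with the full count of the original word are all assumptions for illustration.

```python
from collections import Counter

# Hypothetical related-word dictionary; a real one would be far larger.
RELATED = {"cooking": ["recipe", "kitchen"], "garden": ["vegetables"]}


def expand_with_related(counts, related):
    """Return word counts extended with related words from the dictionary."""
    expanded = Counter(counts)
    for word, count in counts.items():
        for related_word in related.get(word, ()):
            # Assumption: a related word inherits the count of the word that triggered it.
            expanded[related_word] += count
    return expanded


expanded = expand_with_related(Counter({"cooking": 2, "travel": 1}), RELATED)
```

Expanding both the sender's words and the text content's words before comparison lets two texts match even when they use different but related vocabulary.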

The computer program according to the present disclosure causes a computer to extract sender identification information, the computer being capable of reading from and writing to a transmission information database, in which a plurality of pieces of transmission information transmitted by a plurality of senders are recorded in association with identification information identifying each piece of transmission information, and a sender identification information database, in which sender identification information identifying the sender of each piece of transmission information is recorded. The computer program causes the computer to execute the steps of: receiving text content corresponding to a desired person image; calculating, for each sender, using the transmission information database and the sender identification information database, a similarity between the group of transmission information transmitted by that sender and the received text content; and extracting, from the sender identification information database, the sender identification information of each sender whose calculated similarity is equal to or greater than a predetermined degree.

With the extraction apparatus of the present disclosure, extracting the sender identification information of senders who transmit information similar to the text content makes it possible to appropriately extract persons who fit the desired person image matching the text content.

A schematic diagram showing an overview of the information processing system according to the present embodiment.
A block diagram showing the internal configuration of each device constituting the information processing system.
An explanatory diagram showing an example of the contents of the user information DB.
An explanatory diagram showing an example of the contents of the transmission information DB.
A flowchart showing an example of the procedure of the extraction process performed by the server device.
An explanatory diagram showing an example of the contents of transmission information.
An explanatory diagram showing an example of the contents of text content.
A flowchart showing an example of the procedure of the creation process in which the control unit creates display information for displaying the extraction results.
An explanatory diagram showing an example of a screen displayed on the display unit of the client device based on the display information created by the control unit.
A flowchart showing another example of the procedure of the extraction process performed by the server device.
A flowchart showing another example of the procedure of the extraction process performed by the server device.
A flowchart showing another example of the procedure of the extraction process performed by the server device.

本発明をその実施の形態を示す図面に基づいて具体的に説明する。 The present invention will be specifically described with reference to the drawings showing the embodiments thereof.

図１は、本実施の形態に係る情報処理システムの概要を示す模式図である。情報処理システムは、サーバ装置１と、サーバ装置１に通信接続が可能なクライアント装置２，３とを含んで構成される。サーバ装置１は、複数の一般ユーザによるクライアント装置２を介したチャット、掲示板への書き込み、コメント、レビュー等のテキストコンテンツの投稿が可能なネットワークコミュニティ１００を提供するサーバ機能を有する。そしてサーバ装置１は、ネットワークコミュニティ１００上での複数の一般ユーザからの発信情報（発言、書き込み、音声、ブログ記事等）Ｖに基づき、商品開発者である特定ユーザが所望する人物像に合致した一般ユーザを抽出する処理を行なう抽出装置として機能する。 FIG. 1 is a schematic diagram showing an outline of the information processing system according to the present embodiment. The information processing system includes a server device 1 and client devices 2 and 3 that can be connected to the server device 1 for communication. The server apparatus 1 has a server function that provides a network community 100 that allows a plurality of general users to post text contents such as chat, writing on a bulletin board, comments, and reviews via the client apparatus 2. Then, the server device 1 matches a person image desired by a specific user who is a product developer based on transmission information (speaking, writing, voice, blog article, etc.) V from a plurality of general users on the network community 100. It functions as an extraction device that performs processing for extracting a general user.

In the example shown in FIG. 1, user D, a product developer, wants the cooperation of general users in the product's target group in developing a distinctive product that uses a particular ingredient that is currently in vogue. Users A, B, and C, for example, are general users who post remarks and messages in a gourmet-related group on the network community 100. User A has long been very fond of the ingredient, but prefers to eat it at restaurants where cooks from its place of origin serve it, or to travel to the place of origin to eat it. User B is broadly interested in whatever is in vogue, including the ingredient, and regularly cooks at home. User C likewise likes the ingredient but mostly enjoys it when eating out. User D, the product developer, wishes to extract persons like user B, who are interested in trending ingredients and are comfortable with home cooking.

In such a case, if users are extracted using, for example, the name of the particular ingredient as attribute information, users A and C may be extracted, but user B, who has no especially strong interest in that ingredient, may not be. The information processing system according to the present embodiment does not use attribute information. Instead, it receives from user D, via the client device 3, text content S that the users to be targeted are likely to enjoy, and performs the extraction using this text content S. The text content S is not limited to articles in newspapers, magazines, or electronic media; it may also be audio or video converted into text. In the example shown in FIG. 1, the text content S is, for example, an article about the appeal of growing the trending ingredient at home and then cooking and eating it. Based on the text content S and the transmission information V from users A, B, and C in the network community 100, the server device 1 can appropriately extract user B, who appears close to the target person image that user D desires. Furthermore, by providing an opportunity for contact between user D and user B, the server device 1 enables user D to obtain user B's cooperation in product development.

このような適切な人物の抽出を実現するために情報処理システムにおけるサーバ装置１（抽出装置）は、例えば以下のような構成を有する。図２は、情報処理システムを構成する各装置の内部構成を示すブロック図である。サーバ装置１は、通信媒体４を介してクライアント装置２及び３と通信接続される。 In order to realize such an appropriate person extraction, the server apparatus 1 (extraction apparatus) in the information processing system has the following configuration, for example. FIG. 2 is a block diagram showing an internal configuration of each device constituting the information processing system. The server device 1 is communicatively connected to the client devices 2 and 3 via the communication medium 4.

通信媒体４は、ＬＡＮ４１、インターネット等の公衆網４２、公衆網４２へのアクセスポイント（ＡＰ）４３、通信キャリアが提供するキャリアネットワーク４４及び該キャリアネットワーク４４へアクセスするための基地局４５を含む。通信媒体４は、サーバ装置１、クライアント装置２，３間を有線又は無線により通信接続する。 The communication medium 4 includes a LAN 41, a public network 42 such as the Internet, an access point (AP) 43 to the public network 42, a carrier network 44 provided by a communication carrier, and a base station 45 for accessing the carrier network 44. The communication medium 4 connects the server apparatus 1 and the client apparatuses 2 and 3 through a wired or wireless communication connection.

The server device 1 is a server computer and includes a control unit 10, a storage unit 11, a temporary storage unit 12, and a communication unit 13. The control unit 10 uses a CPU (Central Processing Unit), a clock, and the like, and executes each process based on the server program 11P and the extraction program 12P stored in the storage unit 11, causing a general-purpose server computer to function as a community providing server and as the extraction apparatus. The temporary storage unit 12 uses volatile memory such as DRAM (Dynamic Random Access Memory) to temporarily store information generated by the processing of the control unit 10.

記憶部１１は、ハードディスクを用いてサーバプログラム１１Ｐ及び抽出プログラム１２Ｐのほか、制御部１０が参照するデータを記憶する。また記憶部１１は制御部１０により作成されるネットワークコミュニティ１００のユーザ情報をユーザ情報（発信者識別情報）ＤＢ１１１として記憶し、ネットワークコミュニティ１００上での発信情報Ｖを発信情報ＤＢ１１２として記憶する。なおユーザ情報ＤＢ１１１及び発信情報ＤＢ１１２は、制御部１０により情報の読み書きが可能であればその所在は限定されず、サーバ装置１外の記憶装置に記憶されている構成であってもよい。 The storage unit 11 stores data referred to by the control unit 10 in addition to the server program 11P and the extraction program 12P using a hard disk. The storage unit 11 stores user information of the network community 100 created by the control unit 10 as a user information (sender identification information) DB 111, and stores transmission information V on the network community 100 as a transmission information DB 112. Note that the location of the user information DB 111 and the transmission information DB 112 is not limited as long as the control unit 10 can read and write information, and the user information DB 111 and the transmission information DB 112 may be configured to be stored in a storage device outside the server device 1.

サーバプログラム１１Ｐは、サーバコンピュータをチャットサーバ、掲示板サーバ又はコンテンツ投稿サーバとしての機能を発揮させるためのサーバ用プログラムである。抽出プログラム１２Ｐは、ユーザの識別情報を抽出するための後述にて説明する処理を制御部１０に実行させるためのプログラムである。 The server program 11P is a server program for causing the server computer to function as a chat server, a bulletin board server, or a content posting server. The extraction program 12P is a program for causing the control unit 10 to execute processing that will be described later for extracting user identification information.

The communication unit 13 is a network card connected to the LAN 41 included in the communication medium 4. Through the communication unit 13, the control unit 10 can communicate via the communication medium 4. For example, the control unit 10 can establish a communication connection with the client devices 2 and 3 through a router and the public network 42.

The client device 2 and the client device 3 are each a smartphone, a tablet terminal, or a desktop or laptop PC. Any information terminal with a function for communicating via the communication medium 4, such as a book reader, a game machine, or a PDA, can also serve as the client devices 2 and 3. The client device 2 and the client device 3 have basically the same components: a control unit 20 (30), a storage unit 21 (31), a temporary storage unit 22 (32), a display unit 23 (33), an operation unit 24 (34), a voice input unit 25 (35), and a communication unit 26 (36). The different reference numerals reflect differences in interface that depend on the type of user. The common configuration is described below using the client device 2, and the interfaces that differ are described later.

制御部２０は、ＣＰＵ、クロック等を含み、記憶部２１に記憶されているクライアントプログラム２Ｐに基づいた各処理を実行し、汎用コンピュータをクライアント装置２として機能させる。一時記憶部２２は、ＤＲＡＭ等の揮発性メモリを用いて制御部２０の処理により生成される情報を一時的に記憶する。 The control unit 20 includes a CPU, a clock, and the like, executes each process based on the client program 2P stored in the storage unit 21, and causes the general-purpose computer to function as the client device 2. The temporary storage unit 22 temporarily stores information generated by the processing of the control unit 20 using a volatile memory such as a DRAM.

記憶部２１は、ハードディスク又はフラッシュメモリ等の不揮発性メモリを用いる。記憶部２１はクライアントプログラム２Ｐのほか、Ｗｅｂブラウザプログラム等のクライアント用の汎用プログラムを記憶し、更に制御部２０の読み書きする各種データを記憶する。 The storage unit 21 uses a non-volatile memory such as a hard disk or a flash memory. In addition to the client program 2P, the storage unit 21 stores general-purpose programs for clients such as a Web browser program, and further stores various data read and written by the control unit 20.

クライアントプログラム２Ｐは、後述するようにクライアント装置２の制御部２０に各処理を実行させるプログラムである。クライアントプログラム２Ｐは、図示しない記録媒体に記録されてあるクライアントプログラムを読取部により読み出し、又は通信部２６経由で取得し、記録したものであってもよい（いずれも図示せず）。 The client program 2P is a program that causes the control unit 20 of the client device 2 to execute each process, as will be described later. The client program 2P may be a client program recorded on a recording medium (not shown) that is read by the reading unit or acquired and recorded via the communication unit 26 (none of which is shown).

表示部２３は、タッチパネル内蔵型ディスプレイを用いる。制御部２０は、クライアントプログラム２Ｐに基づき、表示部２３へテキスト及びアイコン等の画像を含む各種操作画面を表示する。表示部２３は、タッチパネル内蔵型でないディスプレイでもよい。 The display unit 23 uses a touch panel built-in display. The control unit 20 displays various operation screens including images such as text and icons on the display unit 23 based on the client program 2P. The display unit 23 may be a display that does not have a built-in touch panel.

操作部２４は、表示部２３のディスプレイに内蔵されるタッチパネル及びクライアント装置２の筐体に設けられるボタン群を用いる。クライアント装置２がＰＣである場合、操作部２４はキーボード及びマウス等のユーザインタフェースを含む。操作部２４は、ユーザによる操作情報を制御部２０へ通知する。 The operation unit 24 uses a touch panel built in the display of the display unit 23 and a button group provided on the housing of the client device 2. When the client device 2 is a PC, the operation unit 24 includes a user interface such as a keyboard and a mouse. The operation unit 24 notifies the control unit 20 of operation information by the user.

音声入力部２５はマイクロフォンである。制御部２０は音声入力部２５から音声を入力する。制御部２０は、音声入力部２５が入力した音声を音声認識によってテキスト化することが可能である。 The voice input unit 25 is a microphone. The control unit 20 inputs voice from the voice input unit 25. The control unit 20 can convert the voice input by the voice input unit 25 into text by voice recognition.

The communication unit 26 either includes a network card connectable to a LAN cable and is thereby connected to the public network 42, or includes a wireless communication module based on a communication standard for connecting to the base station 45 and a wireless communication module supporting connection to the AP 43. Through the communication unit 26, the control unit 20 can establish a communication connection with the server device 1 via the communication medium 4.

Of the client devices 2 and 3, the client device 2 used by general users is provided by the client program 2P with a connection interface (GUI) to the bulletin boards and chat rooms on the network community 100 provided by the server device 1. The network community 100 is, for example, a questionnaire community reached from an information site that is widely published on the public network 42 and covers the various products offered by user D; only the client devices 2 of users who hold login information can connect to it. The client device 3 used by a business such as user D is provided with an interface that presents aggregated results of various information (questionnaires, votes, and the like) in the questionnaire community. The client device 3 also includes an interface for exchanging information, as described later, with the operator of the server device 1 that provides the network community 100.

FIG. 3 is an explanatory diagram showing an example of the contents of the user information DB 111. The user information DB 111 stores, as one record, user identification information (a user ID) that distinguishes users of the network community 100 from one another, in association with a user name (display name) and login information (a password and the like). In other words, the user information DB 111 corresponds to a database of information identifying the senders of the transmission information V. The login information is the login information for the network community 100. In the example of FIG. 3, users are divided into general users and business users, and each group is given user identification information that allows the two to be told apart: general users receive a 5-digit serial number beginning with "0" (zero), and business users receive a 5-digit serial number in the 80,000 range. For users A, B, C, and F, who are general users, the user information DB 111 may additionally store contact information such as an e-mail address and postal address, as well as a login history, as in the example of FIG. 3. The same kinds of contact information and login history may be stored for users D and E, who are business users. One record may also hold the user's attribute information (gender, age (date of birth), and results of questionnaires on preferences).
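The numbering scheme of FIG. 3 can be illustrated with a short sketch. The function name `user_kind` and the "unknown" fallback are assumptions made for illustration; the source only gives the two ID ranges as an example.

```python
def user_kind(user_id):
    """Classify a 5-digit user ID following the FIG. 3 numbering example."""
    if len(user_id) != 5 or not user_id.isdigit():
        raise ValueError("expected a 5-digit user ID")
    if user_id.startswith("0"):
        return "general"   # serial numbers beginning with 0
    if user_id.startswith("8"):
        return "business"  # serial numbers in the 80,000 range
    return "unknown"       # not covered by the example (assumption)

kind_b = user_kind("00003")
kind_d = user_kind("80001")
```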

Transmission information V on the network community 100, such as chat messages, bulletin board posts, product reviews, and comments, is stored in the transmission information DB 112 each time it is sent. FIG. 4 is an explanatory diagram showing an example of the contents of the transmission information DB 112. The transmission information DB 112 stores, in association with one another, text data representing the content of each piece of transmission information V, transmission-information identification information that distinguishes the pieces of transmission information V from one another, and the user identification information identifying each sender. As shown in FIG. 4, the transmission time (time of writing or posting) may also be stored in association with each entry.

Next, the process by which the server device 1, in response to a request, extracts the user identification information of persons matching a desired person image is described. FIG. 5 is a flowchart showing an example of the extraction procedure performed by the server device 1. The procedure shown in FIG. 5 starts, for example, in the following case. First, user D, a business user, logs in to the network community 100 from the client device 3. The post-login top page for business users, displayed on the display unit 33 of the client device 3, includes an interface element for accepting an "extraction request", and the procedure starts when this "extraction request" is selected. Alternatively, the server device 1 may start the following processing when it receives an e-mail in a predetermined format, addressed to the administrator of the network community 100, that carries an extraction request together with the text content S.

An interface for accepting the "extraction request" is displayed on the display unit 33 of the client device 3 (step S301). The interface is, for example, a Web page and includes an icon for selecting the text content S to upload. It may also include links (icons) to detailed settings pages such as those described later.

The control unit 30 accepts the selection of the text content S through the displayed interface (step S302). The control unit 30 uploads the selected text content S from the communication unit 36 to the server device 1 together with a request to extract general users (step S303). What is uploaded may be the data of the text content S itself or link information pointing to the text content S. With step S303, processing on the client device 3 ends for the time being.

The control unit 10 of the server device 1 receives the text content S through the communication unit 13 (step S101). In step S101, if the data of the text content S sent from the client device 3 is not the document data of the article itself but link information to a Web page, the control unit 10 downloads the document data from the link destination. The data of the text content S sent from the client device 3 may also be audio or video, in which case the control unit 10 may first convert it to text at this point.

The control unit 10 performs morphological analysis on the received text content S (step S102), derives information indicating the features of the text content S (step S103), and stores it in the storage unit 11 or the temporary storage unit 12 (step S104). The feature information in step S103 is, for example, the frequently occurring nouns and their occurrence counts.
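The counting part of steps S102 and S103 can be sketched as follows. Morphological analysis itself needs a tokenizer, which the source does not name; this sketch assumes the analyzer has already produced (surface, part-of-speech) pairs. The names `frequent_nouns` and `min_count` and the sample tokens are hypothetical.

```python
from collections import Counter

def frequent_nouns(tokens, min_count=1):
    """Count nouns in (surface, part_of_speech) pairs and keep the frequent ones."""
    counts = Counter(surface for surface, pos in tokens if pos == "noun")
    return {noun: n for noun, n in counts.items() if n >= min_count}

# Hypothetical analyzer output for a short text.
tokens = [
    ("pakuchi", "noun"), ("eat", "verb"), ("pakuchi", "noun"),
    ("cooking", "noun"), ("tasty", "adjective"), ("pakuchi", "noun"),
]
features = frequent_nouns(tokens, min_count=2)
```

The same function could be applied to a user's posts in step S109, which keeps the two feature sets directly comparable.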

Next, the control unit 10 acquires a plurality of pieces of user identification information corresponding to general users from the user information DB 111 in the storage unit 11 (step S105) and selects them one at a time according to a predetermined condition (step S106). The predetermined condition is, for example, all of the user identification information recorded in the user information DB 111, or the user identification information recorded in association with a predetermined group (bulletin board or chat room). Other possible conditions include users whose number of pieces of transmission information over the entire recorded period, or over a recent predetermined period (for example, one month or three months), is at least a predetermined number, or only users who have posted within the last month. The condition may further be users who have sent transmission information containing a predetermined word, or users who have done so within a predetermined period.

The control unit 10 extracts from the transmission information DB 112 the group of transmission information V whose sender is the user identified by the selected user identification information (step S107) and performs morphological analysis on all of the extracted transmission information (step S108). Based on the result of the morphological analysis of the extracted group of transmission information V, the control unit 10 derives information indicating the features of the transmission information sent by the user of the selected user identification information (step S109) and stores it in the storage unit 11 or the temporary storage unit 12 (step S110). The feature information in step S109 must be comparable, in the subsequent step S111, with the feature information derived in step S103; for example, it is the frequent nouns and their occurrence counts.

The control unit 10 compares the feature information of the text content S with the feature information of the extracted transmission information, calculates a similarity, and stores the calculated similarity in the storage unit 11 or the temporary storage unit 12 in association with the user identification information (step S111). The similarity is obtained, for example, by building vectors whose components are the nouns occurring at least a predetermined number of times and calculating either the cosine similarity or the Euclidean distance between them. The calculation is not limited to these; any known technique used in natural language processing to judge the similarity or relatedness of words may be used.
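The two measures named for step S111 can be sketched as below, assuming the feature information is held as a noun-to-count mapping. The function names and the choice to align both mappings on their combined vocabulary are assumptions; the source does not specify how the vectors are laid out.

```python
import math

def to_vectors(a, b):
    """Align two noun->count dicts on a shared, sorted vocabulary."""
    vocab = sorted(set(a) | set(b))
    return [a.get(w, 0) for w in vocab], [b.get(w, 0) for w in vocab]

def cosine_similarity(a, b):
    va, vb = to_vectors(a, b)
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0  # empty feature sets compare as 0

def euclidean_distance(a, b):
    va, vb = to_vectors(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))
```

Note that the two measures point in opposite directions: a larger cosine similarity means a closer match, while a smaller Euclidean distance does, so a distance would have to be inverted or sorted in ascending order before the ranking in step S113.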

Next, the control unit 10 determines whether all of the user identification information of users meeting the predetermined condition has been selected (step S112). If it determines that unselected user identification information remains (S112: NO), the control unit 10 returns the process to step S106 and selects the next user identification information according to the predetermined condition (S106).

If it is determined in step S112 that all have been selected (S112: YES), the control unit 10 sorts the user identification information in descending order of the associated similarity (step S113). The control unit 10 transmits the extraction result, for example the sorted user identification information narrowed down to the top 30 entries, from the communication unit 13 to the client device 3 (step S114) and ends the extraction process. The extraction result in step S114 can be delivered in various ways: for example, it may be created as a Web page (or link information to it) that can be reached from the business user's post-login top page, or link information to the created report or Web page may be sent by e-mail to the business user.
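Steps S113 and S114 amount to a descending sort followed by a cut-off. A minimal sketch, assuming the similarities are held in a dictionary keyed by user ID (the function name and the sample scores are hypothetical):

```python
def top_matches(similarity_by_user, limit=30):
    """Sort user IDs by similarity, highest first, and keep the top entries."""
    ranked = sorted(similarity_by_user.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]

# Made-up scores for three general users.
scores = {"00001": 0.41, "00003": 0.98, "00002": 0.77}
result = top_matches(scores, limit=2)
```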

On the client device 3, the control unit 30 receives the extraction result through the communication unit 36, outputs it (step S304), and ends the process. Output of the extraction result in step S304 includes display on the display unit 33, or transmission of print data from the communication unit 36 and the resulting printout on a print medium. It may also include read-aloud audio output using an audio output unit.

The processing of steps S105 to S110 and step S112 may instead be carried out as a batch process each time a predetermined period elapses on the network community 100. The batch process is preferably run, for example, once a day at a time when communication and processing loads are low, such as a time of day when posting is infrequent. In this case, the feature information is stored in the storage unit 11 in association with the user identification information and is updated by the batch process. When the control unit 10 receives text content S, it selects the user identification information one at a time and calculates, for all users, the similarity between the stored feature information associated with the selected user identification information and the feature information of the text content S. This avoids re-deriving the per-user feature information every time a different text content S is received, which reduces the processing load on the server device 1 and shortens the response time after the text content S is uploaded.
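The split between the batch step and the request step might look like the following sketch. It assumes posts are already reduced to lists of nouns, and it takes the similarity function as a parameter; the `shared_nouns` function used in the example is only a stand-in to keep the sketch short, not the measure described in the text. All names are hypothetical.

```python
from collections import Counter

feature_cache = {}  # user_id -> Counter of noun counts, refreshed by the batch job

def run_batch(posts_by_user):
    """Batch step: recompute per-user features (corresponds to S105 to S110)."""
    for user_id, noun_lists in posts_by_user.items():
        counts = Counter()
        for nouns in noun_lists:
            counts.update(nouns)
        feature_cache[user_id] = counts

def score_all(content_features, similarity):
    """Request step: compare one text content against every cached user."""
    return {uid: similarity(content_features, feats)
            for uid, feats in feature_cache.items()}

def shared_nouns(a, b):
    """Stand-in similarity: number of nouns the two feature sets share."""
    return len(set(a) & set(b))

run_batch({"00003": [["pakuchi", "cooking"], ["pakuchi"]], "00001": [["ramen"]]})
scores = score_all(Counter({"pakuchi": 25}), shared_nouns)
```

With this arrangement only `score_all` runs when a text content arrives; the per-user counting is paid once per batch interval.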

The order of accepting the text content S (step S101) and creating the transmission information DB 112 is not limited to the example above. In the flowchart of FIG. 5, the text content S is accepted after the transmission information DB 112 has been created, that is, after transmission information V has accumulated. However, the text content S may instead be accepted in advance and stored in the storage unit 11, and the control unit 10 may then calculate the similarity to the text content S for transmission information V sent afterwards.

(Example)
The embodiment described above will now be explained with a specific example.
FIG. 6 is an explanatory diagram showing an example of the content of the transmission information V. FIG. 6 shows, in chronological order, the transmission information V (a bulletin board conversation) of users A, B, and C in the specific example of the network community 100 shown in FIG. 1. In the example of FIG. 6, coriander ("pakuchi") is mentioned as something recently eaten, and the transmission information V concerns coriander, which has recently been in vogue.

FIG. 7 is an explanatory diagram showing an example of the text content S. The text content S is, for example, an article about coriander ("pakuchi"), specifically about the appeal of cooking and eating coriander dishes at home. Suppose that user D, a business user who develops distinctive products (for example, cooking kits) using the currently popular coriander, wants to ask people who are likely to be interested in the text content S to cooperate in product development. Using the information processing system according to the present embodiment, user D only needs to log in to the network community 100 as a business user, select "extraction request" on the top page, and upload the text content S.

When the text content S shown in FIG. 7 is uploaded while the group of transmission information V shown in FIG. 6 is recorded in the transmission information DB 112, the server device 1 derives from the text content S (and its link destination) the following list of frequent nouns and occurrence counts as the feature information.
(pakuchi [coriander], 25 times)
(uchi-paku [eating coriander at home], 10 times)
(cultivation, 5 times)
(cooking, 3 times)

From the group of transmission information V whose sender is the user with user identification information "00003" (user name "userB"), which includes transmission information V other than that shown in FIG. 6, the following list of frequent nouns and occurrence counts is derived as the feature information.
(pakuchi [coriander], 110 times)
(uchi-paku [eating coriander at home], 25 times)
(cultivation, 10 times)
(cooking, 5 times)

In the above example the frequent nouns coincide, so the similarity calculated as, for example, the cosine similarity over the frequent nouns that occur five or more times is "1".
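The figure of "1" depends on how the vector components are defined, which the source does not spell out. The sketch below works through two possible readings with the counts listed above; both readings, and all names in the code, are assumptions. With raw counts as components the value comes out at roughly 0.98, while with binary presence indicators over the nouns that meet the threshold in the text content S it is 1 (up to floating-point rounding).

```python
import math

def cosine(a, b):
    """Cosine similarity between two noun->value dicts."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

content_s = {"pakuchi": 25, "uchi-paku": 10, "cultivation": 5, "cooking": 3}
user_b = {"pakuchi": 110, "uchi-paku": 25, "cultivation": 10, "cooking": 5}

MIN_COUNT = 5  # "five or more times"
s_freq = {k: v for k, v in content_s.items() if v >= MIN_COUNT}
b_freq = {k: v for k, v in user_b.items() if v >= MIN_COUNT}

# Reading 1: components are the raw occurrence counts.
count_sim = cosine(s_freq, b_freq)

# Reading 2: components are 1/0 presence flags over S's frequent nouns.
s_bin = {k: 1 for k in s_freq}
b_bin = {k: 1 for k in s_freq if k in b_freq}
binary_sim = cosine(s_bin, b_bin)
```

Either way, user B ranks as a very close match to the text content S, which is the point the example makes.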

In the example, when transmitting the result of the extraction process in step S114 of the flowchart of FIG. 5, the control unit 10 of the server device 1 creates display information (screen data) for display on the client device 3 and transmits (outputs) that display information to the client device 3 as the result of the extraction process. FIG. 8 is a flowchart showing an example of the procedure by which the control unit 10 creates the display information for showing the extraction result.

Of the procedure shown in the flowchart of FIG. 5, the control unit 10 narrows the user identification information sorted in descending order of similarity in step S113 down to, for example, the top 30 entries (step S401).

From the narrowed-down user identification information, the control unit 10 selects one entry at a time in descending order of similarity (step S402), reads the user name for the selected user identification information from the user information DB 111, and outputs it to the display information as text (or an image) (step S403). Next, the control unit 10 outputs the numerical similarity stored in association with the selected user identification information to the display information as text (or an image) (step S404). The control unit 10 then selects a predetermined number of pieces of transmission information V, from the group sent by the user of the selected user identification information, as main comments and outputs them to the display information (step S405).

In step S405, the control unit 10 first selects the transmission information V that contains more of the nouns identified as frequent nouns, and among those, selects the transmission information V that contains more distinct frequent nouns. If the number selected is within the predetermined number, the control unit 10 may use the selection as the main comments as is; if more than the predetermined number are selected, it may, for example, give priority to transmission information V with more recent transmission times. In this way, the control unit 10 preferably ranks the extracted transmission information V by the number of frequent nouns, the number of distinct frequent nouns, and the transmission time, and selects only the top predetermined number as main comments.
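The ranking in step S405 can be sketched as a sort on a three-part key. The post structure (a dict with `nouns` and `sent_at`), the function name, and the sample data are assumptions made for illustration.

```python
from datetime import datetime

def select_main_comments(posts, frequent, limit=3):
    """Rank posts by frequent-noun hits, then distinct frequent nouns, then recency."""
    def priority(post):
        hits = [n for n in post["nouns"] if n in frequent]
        return (len(hits), len(set(hits)), post["sent_at"])
    # Posts with no frequent noun at all are not candidates.
    candidates = [p for p in posts if any(n in frequent for n in p["nouns"])]
    return sorted(candidates, key=priority, reverse=True)[:limit]

posts = [
    {"id": "p1", "nouns": ["pakuchi", "pakuchi"], "sent_at": datetime(2016, 10, 1)},
    {"id": "p2", "nouns": ["pakuchi", "cooking"], "sent_at": datetime(2016, 9, 1)},
    {"id": "p3", "nouns": ["ramen"], "sent_at": datetime(2016, 10, 5)},
    {"id": "p4", "nouns": ["pakuchi"], "sent_at": datetime(2016, 10, 10)},
]
main = select_main_comments(posts, frequent={"pakuchi", "cooking"}, limit=2)
```

Here p2 outranks p1 because both contain two frequent-noun occurrences but p2 covers two distinct nouns; recency only breaks ties that remain after the first two criteria.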

The control unit 10 determines whether the predetermined number of pieces of user identification information have been selected from the top (step S406). If it determines that they have not (S406: NO), it returns the process to step S402 and selects the user identification information with the next highest similarity.

If it is determined in step S406 that they have been selected (S406: YES), the control unit 10 outputs the display information as Web page data or as image document data such as PDF (step S407) and ends the creation process.

FIG. 9 is an explanatory diagram showing an example of a screen displayed on the display unit 33 of the client device 3 based on the display information created by the control unit 10. The example in FIG. 9 corresponds to the result (S114) of the extraction process performed, based on the text content S of FIG. 7, on the transmission information V on the network community 100, including the group of transmission information V shown in FIG. 6. It is an example of the screen displayed from the display information created as the result of the extraction process by the procedure of the flowchart of FIG. 8. As the example in FIG. 9 shows, the screen lists user identification information in descending order of similarity and, for each entry, includes the calculated similarity and several main pieces of transmission information V containing the frequent nouns that contribute to the similarity.

User B, whose user identification information "00003" (user name "userB") was extracted as having the highest similarity in the example of FIG. 9, is, as the conversation in FIG. 6 shows, a user keen on cooking with coriander at home. The conversation in FIG. 6 also suggests that user B later put "uchi-paku" (eating coriander at home) into practice, posting "Uchi-paku with friends today!". In other words, user B is likely to be interested in the text content S of FIG. 7 and can be said to match the person image that user D, the business user, has in mind. In this way, the simple operation of uploading text content S that fits the envisioned person image is enough to extract the user identification information of suitable persons. User D does not need to perform operations such as registering keywords corresponding to attribute information.

In the description of the flowchart of FIG. 5 and in the description referring to the specific examples of FIGS. 6 to 9, the feature information was the frequent nouns and their occurrence counts. However, the feature information derived from the result of morphological analysis is not limited to this. For example, as described later, a related-word dictionary in which words related to frequent nouns are registered may be consulted, and a vector whose components are the words including those related words may be used as the feature information. Alternatively, the TF-IDF (Term Frequency-Inverse Document Frequency) of the frequent nouns may be calculated and used as the feature information. Information obtained by other methods used in natural language processing may also be used.

(Modification 1)
FIG. 4 shows an example in which the transmission information DB 112 stores each piece of transmission information V in association with its transmission time. In Modification 1, the extraction process performed by the control unit 10 of the server device 1 uses this transmission time. FIG. 10 is a flowchart showing another example of the extraction procedure performed by the server device 1. Steps of the procedure in FIG. 10 that are common to the procedure in the flowchart of FIG. 5 are given the same step numbers, and their detailed description is omitted. The procedure on the client device 3 is the same, so it is omitted from FIG. 10 and from the description.

In Modification 1, the control unit 10 of the server device 1 further narrows the group of transmission information V extracted from the transmission information DB 112 in step S107 down to the transmission information V sent within a recent predetermined period, based on the associated transmission times (step S121). The recent predetermined period is, for example, the three months preceding the upload of the text content S (the date and time of acceptance in step S101). "Recent" need not be measured strictly from the time the text content S was accepted; it should be understood as meaning that old posts are excluded.

For each piece of transmission information V in the narrowed-down group, the control unit 10 assigns a weighting coefficient based on the associated transmission time, with a higher value the more recent the transmission (step S122). In step S122, the control unit 10 assigns, for example, a coefficient of 1.0 to transmission information V sent within the last week and 0.9 to transmission information V sent more than one week but within two weeks ago. It may likewise assign 0.8 to transmission information V sent more than two weeks but within three weeks ago, and a uniform 0.5 to transmission information V sent more than three weeks but within three months ago.

In deriving the feature information in step S109, the control unit 10 multiplies the occurrence count of each word obtained from the transmission information V by morphological analysis, for example each noun, by the weighting coefficient assigned in step S122, so that the weighting is applied to the total occurrence count. For example, if transmission information V containing "pakuchi" (coriander) was sent once in each of the four periods (within the last week, more than one week to two weeks ago, more than two weeks to three weeks ago, and more than three weeks to three months ago), the occurrence count is calculated not as "4" but as follows. The coefficients are those of the example above (1.0, 0.9, 0.8, 0.5), although they are of course not limited to these values.
(1 × 1.0 + 1 × 0.9 + 1 × 0.8 + 1 × 0.5) = 3.2 times
Compared with simply counting occurrences with equal weight over the entire period since storage of the transmission information V began, this makes it possible to calculate the similarity with greater weight on words contained in the most recent transmission information V. It thus becomes possible to extract the user identification information of persons who are likely to be interested in the text content S around the time the text content S is uploaded.
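The weighted count of steps S121, S122, and S109 can be reproduced with a short sketch. The band table mirrors the example coefficients in the text; treating "three months" as 90 days, the post structure, and all names are assumptions.

```python
from datetime import datetime, timedelta

# (upper bound on age, weight) pairs taken from the example coefficients.
WEIGHT_BANDS = [
    (timedelta(weeks=1), 1.0),
    (timedelta(weeks=2), 0.9),
    (timedelta(weeks=3), 0.8),
    (timedelta(days=90), 0.5),  # "three months" approximated as 90 days (assumption)
]

def weight_for(sent_at, now):
    """Return the weighting coefficient for a post sent at `sent_at`."""
    age = now - sent_at
    for limit, weight in WEIGHT_BANDS:
        if age <= limit:
            return weight
    return 0.0  # outside the recent period: excluded, as in step S121

def weighted_count(noun, posts, now):
    """Sum occurrences of `noun`, each scaled by its post's time weight."""
    return sum(p["nouns"].count(noun) * weight_for(p["sent_at"], now) for p in posts)

now = datetime(2016, 10, 17)
posts = [{"nouns": ["pakuchi"], "sent_at": now - timedelta(days=d)}
         for d in (3, 10, 17, 60)]
total = weighted_count("pakuchi", posts, now)  # approximately 3.2
```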

(Modification 2)
In Modification 2, the control unit 10 of the server device 1 accepts a plurality of text contents S and performs the extraction process. FIG. 11 is a flowchart showing another example of the extraction procedure performed by the server device 1. Steps of the procedure in FIG. 11 that are common to the procedure in the flowchart of FIG. 5 are given the same step numbers, and their detailed description is omitted. The procedure on the client device 3 is the same, so it is omitted from FIG. 11 and from the description.

In Modification 2, the control unit 10 of the server device 1 accepts a plurality of text contents S (step S131).

In Modification 2, which accepts a plurality of text contents S, TF-IDF, a measure used in natural language processing to derive the characteristic words of a document, can be used when deriving the feature information of the text contents S in step S103. In that case, the control unit 10 preferably also uses TF-IDF in step S109, calculating the characteristic words and their TF-IDF values when deriving the feature information of the extracted transmission information V. Calculating TF-IDF is not essential: frequent nouns and occurrence counts may be derived for each of the text contents S individually, or the plurality of text contents S may be treated as a single text content S from which frequent nouns and occurrence counts are derived.

In Modification 2, after storing in step S110 the feature information derived for each sender's group of transmission information V, the control unit 10 calculates the similarity between the plurality of text contents S accepted in step S131 and each user's group of transmission information V stored in step S110 (step S132).

In step S132, when TF-IDF is used as described above, the control unit 10 should calculate the cosine similarity between the TF-IDF values derived from the plurality of text contents S and the TF-IDF values derived from each user's transmission information V group. When TF-IDF is not used and feature information was derived in step S103 for each of the plurality of text contents S individually, the control unit 10 calculates in step S132 the similarity between each text content S and each user's transmission information V group. When TF-IDF is not used and feature information was derived in step S103 by treating the plurality of text contents S as a single text content, the control unit 10 calculates in step S132 a single similarity with the feature information derived from each user's transmission information V group.
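The TF-IDF and cosine-similarity calculation described for step S132 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the function names, the smoothed IDF formula, the choice to compute IDF over the text contents and the per-user groups together, and the merging of the text-content vectors into one query vector are all assumptions. The inputs are assumed to be nouns already obtained by morphological analysis.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: tf-idf} dict per tokenized document (assumed weighting)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_per_user(text_contents, user_groups):
    """text_contents: list of noun lists; user_groups: {user_id: noun list}."""
    user_ids = list(user_groups)
    docs = list(text_contents) + [user_groups[uid] for uid in user_ids]
    vectors = tf_idf_vectors(docs)
    # Assumption: merge the vectors of all text contents into one query vector.
    query = Counter()
    for vec in vectors[:len(text_contents)]:
        query.update(vec)
    user_vectors = vectors[len(text_contents):]
    return {uid: cosine_similarity(query, vec)
            for uid, vec in zip(user_ids, user_vectors)}


# Hypothetical data: two articles on the same theme and two users.
texts = [["pak chi", "salad", "home"], ["pak chi", "recipe", "home"]]
users = {"u1": ["pak chi", "salad", "recipe"], "u2": ["soccer", "match"]}
similarities = similarity_per_user(texts, users)
```

In this sketch a user whose posts share no feature word with the text contents receives a similarity of zero, while a user who writes about the same theme receives a positive value.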

The plurality of text contents S may share substantially the same content (theme) or may differ in content. An example of text contents S with the same content is the combination of the article about "Pak Chi" shown in FIG. 7 and a different article that is likewise about eating "Pak Chi" at home. When text contents S with the same content are used, TF-IDF makes it possible to measure the similarity with the transmission information V group using only the more important feature words, so the similarity can be calculated with high accuracy.

Text contents S that differ in content are, for example, the combination of the article about "Pak Chi" shown in FIG. 7 and an article about "travel to Thailand". When text contents S that differ in content are used, the control unit 10 may either treat the plurality of text contents S as a single text content, derive frequent nouns and their occurrence counts, and calculate the similarity, or calculate the similarity for each of the plurality of text contents S. When calculating the similarity for each of the text contents S, the control unit 10 should extract the user identification information of users for whom every one of the similarities is judged to be at or above the predetermined degree. This makes it possible to narrow down and accurately extract users who show interest in all of the different articles.
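The per-content variant, in which a user is extracted only when every similarity reaches the predetermined degree, might look like the sketch below. The function names, the use of raw occurrence counts as vector components, and the threshold value of 0.3 are illustrative assumptions.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two occurrence-count mappings."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def extract_users(text_contents, user_groups, threshold):
    """Return the ids of users whose similarity to EVERY text content is >= threshold."""
    extracted = []
    for user_id, nouns in user_groups.items():
        user_vector = Counter(nouns)
        sims = [cosine_similarity(Counter(text), user_vector)
                for text in text_contents]
        if all(sim >= threshold for sim in sims):
            extracted.append(user_id)
    return extracted


# Hypothetical data: one article about "pak chi", one about travel to Thailand.
texts = [["pak chi", "pak chi", "salad"], ["thailand", "travel", "bangkok"]]
users = {
    "u1": ["pak chi", "thailand", "travel", "salad"],  # interested in both
    "u2": ["pak chi", "salad", "pak chi"],             # only the first article
    "u3": ["thailand", "bangkok"],                     # only the second article
}
matched = extract_users(texts, users, threshold=0.3)
```

Requiring all similarities to pass, rather than any one of them, is what narrows the result to users interested in both themes.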

Extraction processing that accepts a plurality of text contents S in this way can be expected to allow the similarity to be calculated with higher accuracy and to allow the user identification information of users who match the desired person image to be extracted accurately.

(Modification 3)
In Modification 3, the extraction process calculates the similarity using not only the words actually contained in the text content S and the transmission information V (frequent nouns) but also their related words. In Modification 3, a related-word dictionary is stored in the storage unit 11 or in an external device and can be read by the control unit 10. FIG. 12 is a flowchart illustrating another example of the procedure of the extraction process performed by the server device 1. Among the processing steps shown in the flowchart of FIG. 12, those in common with the processing procedure shown in the flowchart of FIG. 5 are given the same step numbers, and their detailed description is omitted. The processing procedure in the client device 3 is unchanged, so its illustration and description are also omitted from FIG. 12.

In Modification 3, the control unit 10 of the server device 1 performs morphological analysis (S108) on the transmission information V group extracted from the transmission information DB 112 in step S107, and retrieves from the related-word dictionary the related words of the words contained in the transmission information V group (step S141). The control unit 10 derives the information indicating the features of the transmission information V group using not only the words obtained from it by morphological analysis but also their related words (step S142). In step S142, for example, the control unit 10 may extract each related word as a frequent noun associated with the same occurrence count as the original word, or may include the related words among the feature words when calculating TF-IDF.

The term "related words" here includes synonyms, co-occurring words, and the like. For example, the related-word dictionary contains "coriander" and "xiangcai" (香菜), which are synonyms of "Pak Chi", as related words. The related-word dictionary may also contain co-occurring words such as "Thai cuisine" and "Thailand" as related words of "Pak Chi".
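Steps S141 and S142 can be illustrated with a small sketch in which each related word inherits the occurrence count of the word it is related to. The dictionary contents, the function name, and the rule of keeping the larger count when a related word already occurs in the posts are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical related-word dictionary: synonyms and co-occurring words.
RELATED_WORDS = {
    "pak chi": ["coriander", "xiangcai", "thai cuisine", "thailand"],
}


def expand_with_related(noun_counts, related_words):
    """Add each related word with the same occurrence count as its source word."""
    expanded = Counter(noun_counts)
    for noun, count in noun_counts.items():
        for related in related_words.get(noun, []):
            # Assumption: if the related word already occurs, keep the larger count.
            expanded[related] = max(expanded[related], count)
    return expanded


# Frequent nouns of one user's transmission information group (hypothetical).
user_counts = Counter({"pak chi": 3, "salad": 1})
expanded_counts = expand_with_related(user_counts, RELATED_WORDS)
```

With this expansion, a user who only ever writes "pak chi" still overlaps with a text content that only uses "coriander".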

After storing the information indicating the features of the transmission information V group (S110), the control unit 10 calculates the similarity between the text content S and the transmission information V group using the related words of the transmission information V group, and stores the calculated similarity in the storage unit 11 or the temporary storage unit 12 in association with the user identification information (S143). Specifically, in step S143 the control unit 10 calculates the cosine similarity between a vector whose components are the frequent nouns that appear in the text content S at least a predetermined number of times and a vector whose components are the frequent nouns appearing in the transmission information V group and their related words. The Euclidean distance may be calculated instead.
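The vector comparison in step S143 can be sketched as below, showing both the cosine similarity and the Euclidean distance alternative. The minimum occurrence count of 2, the helper names, and the sample counts are assumptions; the user's counts are taken to already include related words.

```python
import math
from collections import Counter


def frequent_nouns(counts, min_count):
    """Keep only the nouns that occur at least min_count times."""
    return {noun: c for noun, c in counts.items() if c >= min_count}


def cosine_similarity(a, b):
    """Cosine similarity over the union of the terms of a and b."""
    dot = sum(c * b.get(term, 0) for term, c in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Euclidean distance; a smaller value means the vectors are closer."""
    terms = set(a) | set(b)
    return math.sqrt(sum((a.get(t, 0) - b.get(t, 0)) ** 2 for t in terms))


# Hypothetical occurrence counts.
text_counts = Counter({"pak chi": 4, "home": 2, "weekend": 1})
user_counts = {"pak chi": 3, "coriander": 3, "home": 1}

text_vector = frequent_nouns(text_counts, min_count=2)  # drops "weekend"
similarity = cosine_similarity(text_vector, user_counts)
distance = euclidean_distance(text_vector, user_counts)
```

Note that the two measures point in opposite directions: a threshold on cosine similarity selects values at or above it, whereas a threshold on distance would select values at or below it.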

Related words may also be applied to the text content S received in step S103: the related words of the words contained in the text content S may be retrieved from the related-word dictionary and used as information indicating the features of that text content S. Related words may be retrieved for only one of the text content S and the transmission information V group, or for both.

Extraction processing that uses related words in this way makes it possible to calculate the similarity while taking into account words that are really synonyms but would otherwise be excluded from the similarity judgment merely because of differences in usage, and to calculate a higher similarity when other related terms are shared. This makes it possible to extract, accurately and precisely, the user identification information of users who match the desired person image.

Modifications 1 to 3 described above can also be implemented in a form that combines any two or all of them.

In the embodiment described above, the network community 100 is not limited to one with restricted participation, such as the questionnaire community described above, and may be a widely public SNS such as Twitter (registered trademark), Facebook (registered trademark), or a blog. It is therefore possible to extract information identifying target persons broadly, using not only questionnaires about products and services but also a variety of transmission information, including impressions of products and services and of the content of television and radio broadcasts. In this case, extraction may be performed after a step that excludes, from the transmitted information, information intentionally attached as a "tag" used on the SNS. This makes it possible to extract persons who are potential targets without being influenced by the content of the "tags". Furthermore, the network community 100 is not only a venue for questionnaire surveys related to product development; it may also be a venue for information exchange in job hunting, job changing, or other personnel matching services. Applied to job hunting, job changing, or other personnel matching services, the apparatus can extract users who match the desired person image envisioned by a human resources officer or by the administrator of the matching service.
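The tag-exclusion step mentioned above could be realized as a simple preprocessing pass before morphological analysis. The sketch below assumes that a "tag" is a hashtag, that is, a "#" or full-width "＃" followed by non-space characters; the source does not specify the tag format, so the pattern and the function name are hypothetical.

```python
import re

# Assumption: tags are hashtags such as "#lunch"; other tag formats would
# need a different pattern.
TAG_PATTERN = re.compile(r"[#＃]\S+")


def strip_tags(post):
    """Remove hashtag-style tags and normalize the remaining whitespace."""
    return " ".join(TAG_PATTERN.sub(" ", post).split())


# Hypothetical posts.
posts = [
    "Tried a pak chi salad today #pakchi #lunch",
    "Watching the match tonight",
]
cleaned_posts = [strip_tags(post) for post in posts]
```

Only the body text then contributes to the frequent nouns, so a user is not matched merely because of a deliberately attached tag.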

In the present disclosure, as described above, an appropriate person is extracted based on transmission information sent via a communication medium on the network community 100. However, the present invention is not limited to this. It is also applicable when a conversation among a plurality of persons is recorded with a single sound collecting device and the transcribed recording is used as the transmission information, or when an interview article already in text form is used as the transmission information. In this case, an identification information database is created that identifies each person who took part in the conversation or answered the interview, and a transmission information database is created with the text data of each person's remarks as the transmission information. The process of the flowchart of FIG. 5 is then performed using the text content S corresponding to the desired person. In this way, appropriate person identification information can be extracted not only from transmission information in a virtual space such as the network community 100 but also from conversations in real space or from transmission information on paper media.

The embodiment disclosed above should be considered illustrative in all respects and not restrictive. The scope of the present invention is indicated by the claims rather than by the description above, and is intended to include all modifications within the meaning and scope equivalent to the claims.

1 Server device (extraction apparatus)
10 Control unit
11 Storage unit
111 User information DB (sender identification information database)
112 Transmission information DB (transmission information database)
12 Temporary storage unit
13 Communication unit
2, 3 Client device
20, 30 Control unit
23, 33 Display unit
26, 36 Communication unit

Claims

An extraction apparatus comprising:
a reception unit that receives text content corresponding to a desired person image;
a storage unit that stores a transmission information database in which a plurality of pieces of transmission information transmitted by a plurality of senders are recorded in association with identification information identifying each piece of transmission information, and a sender identification information database in which sender identification information identifying each sender of the transmission information is recorded;
a calculation unit that calculates, for each sender, using the transmission information database and the sender identification information database, the similarity between the transmission information group transmitted by that sender and the text content received by the reception unit; and
an extraction unit that extracts, from the sender identification information database, the sender identification information of each sender whose similarity calculated by the calculation unit is equal to or greater than a predetermined degree.

The extraction apparatus according to claim 1, wherein the extraction unit sorts the sender identification information in descending order of the corresponding similarity.

The extraction apparatus according to claim 1, further comprising a creation unit that creates, for each piece of sender identification information extracted by the extraction unit, display information for displaying information related to the calculation of the similarity and a plurality of pieces of transmission information that contributed to the similarity.

The extraction apparatus according to claim 1, wherein
the transmission time of each piece of transmission information is recorded in the transmission information database in association with that transmission information, and
the calculation unit
extracts, for each sender, the transmission information group transmitted by that sender,
narrows the extracted transmission information group down to the transmission information transmitted within the most recent period of a predetermined length, and
calculates the similarity with the text content received by the reception unit using the narrowed-down transmission information group.

The extraction apparatus according to claim 1, wherein
the transmission time of each piece of transmission information is recorded in the transmission information database in association with that transmission information, and
the calculation unit
assigns to each piece of transmission information extracted from the transmission information database a weighting coefficient that takes a higher value the closer its transmission time is to the time at which the similarity is calculated,
multiplies the number of occurrences of each word contained in the transmission information by the assigned weighting coefficient, and
calculates, for each sender, the similarity with the text content received by the reception unit using the words contained in the transmission information group transmitted by that sender and the numbers of occurrences of those words.

The extraction apparatus according to claim 1, wherein
the reception unit receives a plurality of text contents together, and
the calculation unit calculates the similarity for each of the plurality of text contents, or calculates the similarity using information indicating features derived from the plurality of text contents.

The extraction apparatus according to claim 1, wherein the calculation unit calculates the similarity using the words contained in each piece of the transmission information or in the text content received by the reception unit, together with related words retrieved from a related-word dictionary in which related words of those words are recorded.

A computer program that causes a computer, which can read from and write to a transmission information database in which a plurality of pieces of transmission information transmitted by a plurality of senders are recorded in association with identification information identifying each piece of transmission information and a sender identification information database in which sender identification information identifying each sender of the transmission information is recorded, to extract sender identification information, the computer program causing the computer to execute the steps of:
receiving text content corresponding to a desired person image;
calculating, for each sender, using the transmission information database and the sender identification information database, the similarity between the transmission information group transmitted by that sender and the received text content; and
extracting, from the sender identification information database, the sender identification information of each sender whose calculated similarity is equal to or greater than a predetermined degree.