JP2015005091A

JP2015005091A - Information collection device and information collection program

Info

Publication number: JP2015005091A
Application number: JP2013129401A
Authority: JP
Inventors: Keigo Machinaga; Rintaro Miyazaki
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2013-06-20
Filing date: 2013-06-20
Publication date: 2015-01-08
Anticipated expiration: 2033-06-20
Also published as: JP5781124B2

Abstract

PROBLEM TO BE SOLVED: To collect information about a user's hobbies, preferences, and the like from the Internet.

SOLUTION: An information collection device of the present invention includes: acquisition means that acquires a search query input by a first user; extraction means that extracts posted content related to the search query; and association means that associates the first user with a second user who posted the extracted content.

Description

The present invention relates to an information collection device and an information collection program.

Techniques for collecting information about a user's hobbies and preferences based on the user's Web activity history are conventionally known. For example, information about a user's hobbies and preferences can be collected by accumulating the user's past search queries (search terms) and information on websites the user has visited, and then analyzing their contents. There is also a technique that determines the similarity between two communities based on the past behavior of one community and the past behavior of the other, and estimates the preferences of one community from the preferences of the other (see, for example, Patent Document 1).

Information about a user's hobbies and preferences collected in this way is used, for example, to display or deliver targeted advertisements to that user.

Patent Document 1: JP 2008-102846 A

In recent years, methods for displaying and delivering advertisements and content on the Internet have become increasingly sophisticated and diverse. Presenting more effective information to a user therefore requires broad and detailed information about that user.

The present invention has been made in view of the above, and its object is to propose a technique for collecting information about a user's hobbies, preferences, and the like from the Internet.

To solve the above problem, an information collection device according to the present invention includes: acquisition means that acquires a search query input by a first user; extraction means that extracts, based on the search query, posted content related to the search query; and association means that associates the first user with a second user who posted the extracted posted content.

According to the embodiment of the present invention, information about a user's hobbies, preferences, and the like can be collected from the Internet.

FIG. 1 is a network configuration diagram of the information collection system.
FIG. 2 is a diagram showing a hardware configuration example of the information collection server 3.
FIG. 3 is a diagram showing a software configuration example of the information collection server 3.
FIG. 4 is a diagram showing a data configuration example of the user information DB 301c.
FIG. 5 is a diagram showing a data configuration example of the search query log 301a.
FIG. 6 is a diagram showing a data configuration example of the posted content log 301b.
FIG. 7 is a diagram showing a data configuration example of the user relevance information 301d.
FIG. 8 is a flowchart showing the information collection processing of the information collection server 3.
FIG. 9 is a diagram showing the relevance between users.

Preferred embodiments of the present invention are described below.

<Configuration>
(Network configuration)
FIG. 1 is a network configuration diagram of an information collection system according to an embodiment of the present invention. In the information collection system, a search server 1, a blog server 2, an information collection server 3, and a terminal 4 are connected via a network 5.

The search server 1 provides the terminal 4 with an information search service for searching information on the Internet. The search server 1 is thus configured, for example, as a so-called search engine. When a search query (search term) is input from a user's terminal 4, the search server 1 searches the Internet for information (for example, websites) that matches the search query.

The blog server 2 is a server that provides a blog service of the kind also called a miniblog, microblog, or tweet blog. Users post content such as their current status or miscellaneous notes to the blog server 2 as short messages. Users of the blog can also communicate with one another by exchanging posted messages. Because content posted to the blog server 2 is in most cases short text, it is easy to update, so communication often takes place in real time. Images, videos, URLs (Uniform Resource Locators), and other information can also be posted to the blog server 2.

The information collection server 3 collects information about users (for example, hobbies and preferences). For example, it collects and analyzes information about a search user's hobbies and preferences based on that user's Web activity history on the search server 1. In addition, in this embodiment, it collects and analyzes information about the search user's hobbies and preferences based on content posted to the blog server 2, as described in detail later.

The terminal 4 is a user terminal. A user accesses the search server 1 from the terminal 4 and enters a search term on the search screen to search for information on the Internet (for example, websites). A user also accesses the blog server 2 from the terminal 4 to post content such as their current status or miscellaneous notes. The terminal 4 is an information processing device such as a PC (Personal Computer), a smartphone, or a mobile phone, and includes an application such as a Web browser for accessing the search server 1 and viewing the search screen.

The network 5 is a network that may include wired and wireless links, for example, the Internet connecting the search server 1, the blog server 2, and the terminal 4.

(Hardware configuration)
FIG. 2 is a diagram showing a hardware configuration example of the information collection server 3 according to an embodiment of the present invention. The information collection server 3 includes a CPU (Central Processing Unit) 31, a ROM (Read Only Memory) 32, a RAM (Random Access Memory) 33, an input device 34, an output device 35, a communication device 36, and an HDD (Hard Disk Drive) 37.

The CPU 31 executes various programs and performs arithmetic processing. The ROM 32 stores programs required at startup and the like. The RAM 33 temporarily holds the results of processing by the CPU 31 and stores data. The input device 34 is a keyboard, a mouse, or the like (including a touch panel). The output device 35 is a display that outputs video and images, a speaker that outputs audio, or the like. The communication device 36 communicates with other devices, such as the terminal 4, via the network 5. The HDD 37 stores various data and programs.

(Software configuration)
FIG. 3 is a diagram showing a software configuration example of the information collection server 3 according to an embodiment of the present invention. The information collection server 3 includes, as its main functional units, a storage unit 301, a search query acquisition unit 302, a posted content extraction unit 303, a user association unit 304, and a user information collection unit 305.

The storage unit 301 stores a search query log 301a, a posted content log 301b, a user information DB (Database) 301c, and user relevance information 301d. These are described later with specific examples.

The search query acquisition unit 302 acquires a search query input by a search user and the input time of that search query.

The posted content extraction unit 303 calculates a relevance value between the search query and posted content, based on the search query input by the search user and the input time of that search query, and thereby extracts posted content related to the search query (posted content with high relevance to the search query).

The user association unit 304 associates the search user who input a search query with the blog user who posted the extracted posted content when the relevance between the search query and the posted content is high. The search query was input by a search user and the posted content was posted by a blog user, but when the relevance between the two is high (and the higher it is), the search user who input the query and the blog user who posted the content are considered likely to be the same user or users with similar hobbies and preferences. This is described in more detail later.

The user information collection unit 305 collects information about a blog user (for example, hobbies and preferences) from the content posted by the blog user associated with the search user who input the search query. This information is, in other words, treated as information about the search user (for example, hobbies and preferences).

The above is the main functional configuration of the information collection server 3. In practice, each of these functions is implemented by a program executed by the CPU 31 of the information collection server 3.

(User information DB 301c)
FIG. 4 is a diagram showing a data configuration example of the user information DB 301c according to an embodiment of the present invention. The user information DB 301c has data items such as "search user ID", "user name", "age", "address", "hobby preferences", "similar blog user ID", and "hobby preferences (extended)".
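As a rough illustration of the data items listed above, one record of the user information DB 301c might be modeled as follows. This is only a sketch: the class name, field names, and field types are assumptions chosen for readability, and the source does not specify a storage format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UserRecord:
    """One row of the user information DB 301c (field names are hypothetical)."""
    search_user_id: str                      # "search user ID"
    user_name: str = ""                      # "user name"
    age: Optional[int] = None                # "age"
    address: str = ""                        # "address"
    hobby_preferences: List[str] = field(default_factory=list)           # "hobby preferences"
    similar_blog_user_ids: List[str] = field(default_factory=list)       # "similar blog user ID"
    hobby_preferences_extended: List[str] = field(default_factory=list)  # "hobby preferences (extended)"


# A record is created at registration time; the last two fields start empty
# and are meant to be filled in later by the information collection process.
record = UserRecord(search_user_id="y-taro")
```

Using empty defaults for the last two fields reflects that they are populated only after the association step has run.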

"Search user ID" is the ID (unique identifier) of a search user who uses the search server 1. It is used to identify the person who input a search query (the searcher).

"User name", "age", and "address" are the user name, age, and address of the search user. These attributes are registered, for example, when a search user who uses the search server 1 performs user registration. Other attribute information may also be registered.

"Hobby preferences" is information about the search user's hobbies and preferences. It may be registered by the search user at the time of user registration. Alternatively, information about the search user's hobbies and preferences may be collected and analyzed based on that user's Web activity history, and the result registered.

"Similar blog user ID" stores the user ID of a blog user who, as a result of the information collection processing described later, is likely to be the same person as the search user or likely to have hobbies and preferences similar to those of the search user (referred to as a similar blog user). A blog user stored in the "similar blog user ID" column can therefore be regarded as a member of the same hobby-preference group as the search user.

"Hobby preferences (extended)" is information about the hobbies and preferences of similar blog users, collected from the content they posted. Because the search user identified by "search user ID" and the blog users identified by "similar blog user ID" have similar hobbies and preferences, "hobby preferences (extended)" is also likely to describe the search user's own hobbies and preferences. In this embodiment, therefore, information about a similar blog user's hobbies and preferences collected from that user's posted content is treated as also being the search user's own hobbies and preferences.

For example, the user with "search user ID" y-taro has "hobby preferences" of watching soccer, cars, and travel. These "hobby preferences" were either registered by the search user at the time of user registration or obtained from that user's Web activity history.

According to FIG. 4, "search user ID" y-taro is associated with "similar blog user IDs" taro12 and taro332, so y-taro's "hobby preferences" are similar to those of taro12 and taro332. The "hobby preferences" of y-taro can therefore be extended to include sake (Japanese rice wine), a "hobby preference" shared with taro12 and taro332.
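The extension described above can be sketched as a simple merge of the similar blog users' preferences into a separate "extended" list. The function name and the dictionary of blog user preferences are hypothetical; how the preferences are derived from posted content is not shown here.

```python
def extend_preferences(own_preferences, similar_user_preferences):
    """Return preferences held by similar blog users that the search user
    does not already have, preserving first-seen order (illustrative only)."""
    extended = []
    for preferences in similar_user_preferences:
        for preference in preferences:
            if preference not in own_preferences and preference not in extended:
                extended.append(preference)
    return extended


# Values follow the FIG. 4 example in the text.
own = ["watching soccer", "cars", "travel"]
blog_user_preferences = {   # hypothetical result of analyzing posted content
    "taro12": ["sake"],
    "taro332": ["sake"],
}
extended = extend_preferences(own, blog_user_preferences.values())
```

Keeping the extended preferences in their own list mirrors the separate "hobby preferences (extended)" column, so inferred items stay distinguishable from those the user registered.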

(Search query log 301a)
FIG. 5 is a diagram showing a data configuration example of the search query log 301a according to an embodiment of the present invention. The search query log 301a has log data items such as the "time" at which a search query was input, the "search user ID" of the search user who input it, and the "search query" itself. The "time" includes the year, month, day, hour, minute, and second.

A search user accesses the search server 1 from the terminal 4 and enters a search query (search term) on the search screen to search for information on the Internet (for example, websites). At this time, the information collection server 3 (or the search server 1) accumulates log entries such as "time", "search user ID", and "search query" in the search query log 301a.

In this embodiment, to identify the search user who input a search query, the search user (for example, "search user ID" y-taro) is assumed to be already logged in on the search screen. Of course, any other identifier may be used as long as it can identify the search user who input the search query. For example, a browser cookie can be used to identify the search user by a browser identifier, in which case the user need not be logged in.

(Posted content log 301b)
FIG. 6 is a diagram showing a data configuration example of the posted content log 301b according to an embodiment of the present invention. The posted content log 301b has log data items such as the "time" of posting, the "blog user ID" of the blog user who posted the content, and the "posted content" itself.

A blog user accesses the blog server 2 from the terminal 4 and posts content such as their current status or miscellaneous notes. The information collection server 3, at regular intervals or at predetermined timings, acquires information such as "time", "blog user ID", and "posted content" from the posted content published by the blog server 2 and accumulates it in the posted content log 301b. If the posting blog user's detailed user information (profile details) is public, details such as "blog user ID", "nickname", "age", "address", and "hobbies" are also accumulated in the posted content log 301b, within the scope of what is public.

(User relevance information 301d)
FIG. 7 is a diagram showing a data configuration example of the user relevance information 301d according to an embodiment of the present invention. The user relevance information 301d has data items such as "search user ID", "blog user ID", and "relevance value".

As described above, the posted content extraction unit 303 calculates a relevance value (similarity value) between the search query and each item of posted content, based on the search query input by the search user and its input time, and extracts posted content related to the search query (content with high relevance to it). The higher the relevance value between the search query and the posted content, the more likely it is that the search user who input the query and the blog user who posted the content are the same person, or are members of the same hobby-preference group with closely matching hobbies and preferences.

The posted content extraction unit 303 therefore adopts the relevance value (similarity value) between the search query and the posted content as the relevance value in the user relevance information 301d, that is, as the value indicating the degree of relevance between the search user who input the search query and the blog user who posted the content. This is because the higher the relevance value (similarity value) between the search query and the posted content, the more likely the two are the same user or, if not the same user, users with very similar hobbies and preferences.

The relevance value is not determined solely by the relevance value between one search query and one item of posted content. To make the relevance value more reliable, it may be determined from the relevance values between multiple search queries and multiple items of posted content over a certain period. Thus, for example, the relevance value between a search user and a blog user is updated each time, based on the relevance value between a search query newly input by the search user and newly posted content.

<Operation>
FIG. 8 is a flowchart showing the information collection processing of the information collection server 3 according to an embodiment of the present invention. The processing is described in detail below with reference to the drawings. As a precondition, the information collection server 3 already holds the data of the search query log 301a, the posted content log 301b, and the user information DB (Database) 301c.

S1: The search query acquisition unit 302 of the information collection server 3 acquires, from the search query log 301a, a search query input by a search user and the input time of that search query.

For example, in the case of FIG. 5, it acquires the search query "World Cup", "Japan" input by "search user ID" y-taro, and the input time of that search query, "3/10 22:18".

S2: The posted content extraction unit 303 extracts, from the posted content log 301b, posted content related to the search query (content with high relevance to it), based on the search query input by the search user.

Specifically, the search query input by the search user is compared with the posted content (its text) and a relevance value is calculated. The relevance value between the search query and the posted content can be calculated using, for example, morphological analysis and cosine similarity. If the relevance between the search query and the posted content is high, that posted content is extracted. Whether the relevance is high or low is determined from the relevance value: if the relevance value is at or above a predetermined threshold (for example, 0.7), the posted content is judged to be highly relevant to the search query.
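The thresholded cosine similarity mentioned above can be sketched as follows. The source names only "morphological analysis" and "cosine similarity", so the term-frequency weighting, the function names, and the hand-written token lists are assumptions; a real system would obtain the tokens from a Japanese morphological analyzer, which is not shown.

```python
import math
from collections import Counter

THRESHOLD = 0.7  # example threshold given in the text


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two token lists using raw term counts."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty side has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def is_relevant(query_tokens, post_tokens, threshold=THRESHOLD):
    """True when the post would be extracted for this query."""
    return cosine_similarity(query_tokens, post_tokens) >= threshold


# Hand-tokenized stand-ins for the output of a morphological analyzer.
query = ["world cup", "japan"]
related_post = ["japan", "world cup", "congratulations"]
unrelated_post = ["hotel", "reservation", "golden week"]

related_score = cosine_similarity(query, related_post)
unrelated_score = cosine_similarity(query, unrelated_post)
```

The scores produced by this toy weighting are not expected to reproduce the example values in the text, which depend on the actual analyzer and weighting used.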

For example, the search query "World Cup", "Japan" is compared with the posted content "Congrats, Nippon, on qualifying for the World Cup!" and the relevance value is calculated. If the relevance value is, for example, 0.800, this posted content is extracted.

Likewise, the search query "World Cup", "Japan" is compared with the posted content "I haven't been to a match lately. I want to watch the Brazil game live!" and the relevance value is calculated. If the relevance value is, for example, 0.846, this posted content is extracted.

Likewise, the search query "World Cup", "Japan" is compared with the posted content "Whatever you think of how they played, I'm glad Japan's World Cup spot is decided" and the relevance value is calculated. If the relevance value is, for example, 0.723, this posted content is extracted.

On the other hand, the search query "World Cup", "Japan" is compared with the posted content "I really need to book a hotel for GW (Golden Week) soon..." and the relevance value is calculated. If the relevance value is, for example, 0.023, this posted content is not extracted.

S3: Next, the posted content extraction unit 303 extracts, from the posted content extracted in S2, the posted content that is more closely related to the search query, based on the input time of the search query.

Specifically, the posted content extraction unit 303 extracts, from the posted content extracted in S2, content posted shortly before or after the input time of the search query (for example, within 30 minutes either way). When the input time of the search query and the posting time of the content are close, the search user who input the query and the blog user who posted the content are likely to have at least a certain degree of relevance.

For example, in the case of FIG. 5, the input time of the search query is "3/10 22:18". Posted content posted within 30 minutes before or after 3/10 22:18 is therefore extracted. Specifically, the following posted content is extracted:
"20xx/3/10 22:25 taro12 Congrats, Nippon, on qualifying for the World Cup!"
"20xx/3/10 22:35 taro12 I haven't been to a match lately. I want to watch the Brazil game live!"
"20xx/3/10 22:30 taro332 Whatever you think of how they played, I'm glad Japan's World Cup spot is decided"
Note that S2 and S3 may be processed together in a single step. That is, posted content related to the search query is extracted based on both the search query and its input time.
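The time-window filtering of S3 can be sketched as follows, assuming each log entry carries a full timestamp. The `Post` type, the function name, and the fourth (out-of-window) post are hypothetical, and because the source elides the year as "20xx", the year used here is only a placeholder.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Post(NamedTuple):
    """One entry of the posted content log 301b (field names are hypothetical)."""
    posted_at: datetime
    blog_user_id: str
    content: str


def within_window(posts, query_time, window=timedelta(minutes=30)):
    """Keep posts made no more than `window` before or after the query time."""
    return [p for p in posts if abs(p.posted_at - query_time) <= window]


YEAR = 2000  # placeholder only: the source gives the year as "20xx"
query_time = datetime(YEAR, 3, 10, 22, 18)
posts = [
    Post(datetime(YEAR, 3, 10, 22, 25), "taro12", "Congrats, Nippon, on qualifying for the World Cup!"),
    Post(datetime(YEAR, 3, 10, 22, 35), "taro12", "I want to watch the Brazil game live!"),
    Post(datetime(YEAR, 3, 10, 22, 30), "taro332", "I'm glad Japan's World Cup spot is decided"),
    Post(datetime(YEAR, 3, 11, 9, 0), "hanako7", "A hypothetical post made the next morning"),
]
selected = within_window(posts, query_time)
```

In this sketch the window is symmetric and inclusive at both ends; the source gives "30 minutes before and after" only as an example, so the width is left as a parameter.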

S4: Next, the user association unit 304 associates the search user who input the search query with the blog user who posted the extracted content, based on the relevance value.

FIG. 9 is a diagram showing the relevance between users according to an embodiment of the present invention. In S2, the relevance value between the search query and the posted content was calculated. When that relevance is high (for example, when the relevance value is at or above the predetermined threshold), the search user who input the query and the blog user who posted the content are considered likely to be the same user or users with similar hobbies and preferences. The relevance value calculated in S2 is therefore adopted as the numerical value indicating how closely the search user who input the search query and the blog user who posted the content are related.

For example, if the relevance value between the search query "World Cup", "Japan" and the posted content "Congrats, Nippon, on qualifying for the World Cup!" is 0.800, this value of 0.800 is registered in the user relevance information 301d as the relevance value between "search user ID" y-taro and "blog user ID" taro12.

Likewise, if the relevance value between the search query terms "World Cup" and "Japan" and the posted content "I haven't been to a match lately. I want to watch the Brazil match live!" is 0.846, this relevance value 0.846 is registered in the user relevance information 301d as the relevance value of "blog user ID" taro12 with respect to "search user ID" y-taro. However, if an entry already exists in the user relevance information 301d, the existing relevance value is updated to incorporate the new one. For example, if the existing relevance value is 0.800 and the new relevance value is 0.846, the two may be averaged and the entry updated to 0.823 (FIG. 7). Taking multiple relevance values between search queries and posted content into account in this way increases the reliability of the relevance value.
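The register-or-update behavior described above can be sketched as follows. This is an illustrative sketch only: the dictionary standing in for the user relevance information 301d and the function name `register_relevance` are assumptions, and the simple average is just the example update rule the description mentions.

```python
# Hypothetical in-memory stand-in for the user relevance information 301d,
# keyed by (search user ID, blog user ID).
user_relevance = {}


def register_relevance(table, search_user_id, blog_user_id, value):
    """Register a relevance value, averaging it with any existing entry."""
    key = (search_user_id, blog_user_id)
    if key in table:
        # Entry already registered: fold the new value in by simple averaging.
        table[key] = (table[key] + value) / 2
    else:
        table[key] = value
    return table[key]


register_relevance(user_relevance, "y-taro", "taro12", 0.800)
updated = register_relevance(user_relevance, "y-taro", "taro12", 0.846)
print(round(updated, 3))  # 0.823
```

Averaging the stored value with each new value gives later observations more weight once three or more values have been folded in; keeping a count and computing a running mean would weight all observations equally. The description leaves the choice open.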

Similarly, if the relevance value between the search query terms "World Cup" and "Japan" and the posted content "Whatever the quality of the play, I'm glad Japan's World Cup place is settled" is 0.723, this relevance value 0.723 is registered in the user relevance information 301d as the relevance value of "blog user ID" taro332 with respect to "search user ID" y-taro (FIG. 7).

Referring to FIG. 7, it can be seen that "search user ID" y-taro and "blog user ID" taro12 are the same user, or users with similar hobbies and preferences, to the degree indicated by the relevance value 0.823. Similarly, "search user ID" y-taro and "blog user ID" taro332 are the same user, or users with similar hobbies and preferences, to the degree indicated by the relevance value 0.723.

This embodiment has been described on the assumption that posted content with relatively high relevance exists, but there may be cases where only posted content with low relevance exists. In such cases, because the relevance may rise as further content is posted, even a blog user with low relevance is provisionally associated with the search user in the user relevance information 301d, together with the low relevance value. However, a blog user whose relevance is extremely low need not be associated with the search user who entered the search query.

Next, the user association unit 304 refers to the user relevance information 301d and, for the search user who entered the search query, regards each blog user whose relevance value is equal to or greater than a predetermined threshold as the same user or a user with similar hobbies and preferences, and registers that blog user under "similar blog user ID" in the user information DB 301c.

For example, for "search user ID" y-taro, who entered the search query, "blog user IDs" taro12 and taro332, whose relevance values are equal to or greater than the predetermined threshold of 0.7, are regarded as the same user as y-taro or as users with similar hobbies and preferences, and are registered under "similar blog user ID" in the user information DB 301c (FIG. 4).
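The threshold-based selection can be sketched as below. The table layout, the function name `similar_blog_users`, and the extra low-relevance entry `jiro9` are hypothetical and serve only to illustrate the step; the threshold 0.7 and the other two values are those of the example.

```python
def similar_blog_users(table, search_user_id, threshold=0.7):
    """Return blog user IDs whose relevance to the search user meets the threshold."""
    return sorted(
        blog_user_id
        for (search_id, blog_user_id), value in table.items()
        if search_id == search_user_id and value >= threshold
    )


# Stand-in for the user relevance information 301d (FIG. 7 values plus one
# invented low-relevance entry to show the filtering).
relevance_table = {
    ("y-taro", "taro12"): 0.823,
    ("y-taro", "taro332"): 0.723,
    ("y-taro", "jiro9"): 0.410,
}

print(similar_blog_users(relevance_table, "y-taro"))  # ['taro12', 'taro332']
```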

"Search user ID" y-taro is a person who searched for websites about "World Cup" and "Japan" by entering those terms as a search query. In other words, y-taro appears to be a person interested in "World Cup" and "Japan". Meanwhile, among the posted blog content, "blog user IDs" taro12 and taro332 posted content about "World Cup" and "Japan" at around the same time. Therefore, although "search user ID" y-taro and "blog user IDs" taro12 and taro332 have different user IDs on the systems of the search server 1 and the blog server 2, they may be the same person. Even if they are not the same person, they are users whose hobbies and preferences are similar with respect to the keywords "World Cup" and "Japan". Thus, regardless of whether they are the same person, "search user ID" y-taro and "blog user IDs" taro12 and taro332 in this embodiment are at least persons with the same or similar hobbies and preferences.

S5: The user information collection unit 305 collects information about the blog user (for example, hobbies and preferences) from the posted content of the blog user associated with the search user who entered the search query, and treats it as information about the search user.

For example, according to the user information DB 301c (FIG. 4), "search user ID" y-taro has the same or similar hobbies and preferences as "similar blog user IDs" taro12 and taro332. The user information collection unit 305 therefore searches the content posted in the past by "similar blog user IDs" taro12 and taro332 and analyzes the hobbies and preferences of taro12 and taro332. The user information collection unit 305 then registers the analyzed hobbies and preferences of taro12 and taro332 under "hobbies and preferences (extended)" in the user information DB 301c (FIG. 4).

For example, taro332 has posted the content "20xx/3/11 21:04 YYY I was given some delicious sake. It will be my evening drink tonight." (FIG. 6). When the user information collection unit 305 determines from its analysis that blog user taro332 has a preference for sake, it registers sake under "hobbies and preferences (extended)" for search user y-taro in the user information DB 301c (FIG. 4). From the "hobbies and preferences" field of the user information DB 301c alone, only watching soccer, cars, and travel are known as search user y-taro's preferences; with "hobbies and preferences (extended)", y-taro can additionally be treated as having a preference for sake.
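A minimal sketch of this expansion step is given below. The description does not specify how posted content is analyzed, so the keyword dictionary `INTEREST_KEYWORDS`, the function `expand_preferences`, and the substring matching are all assumptions made purely for illustration.

```python
# Hypothetical keyword-to-interest dictionary; a real analysis would need
# something far more robust than substring matching.
INTEREST_KEYWORDS = {
    "sake": "sake",
    "match": "watching soccer",
    "drive": "cars",
}


def expand_preferences(base_preferences, similar_blog_user_ids, posts_by_user):
    """Collect interests found in similar blog users' posts that are not already known."""
    extended = []
    for blog_user_id in similar_blog_user_ids:
        for text in posts_by_user.get(blog_user_id, []):
            lowered = text.lower()
            for keyword, interest in INTEREST_KEYWORDS.items():
                already_known = interest in base_preferences or interest in extended
                if keyword in lowered and not already_known:
                    extended.append(interest)
    return extended


base = ["watching soccer", "cars", "travel"]
posts = {
    "taro12": ["I want to watch the Brazil match live!"],
    "taro332": ["I was given some delicious sake. It will be my evening drink tonight."],
}
print(expand_preferences(base, ["taro12", "taro332"], posts))  # ['sake']
```

Only interests missing from the base field are returned, which mirrors keeping "hobbies and preferences" and "hobbies and preferences (extended)" as separate fields.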

Conventionally, when advertisements related to hobbies and preferences are delivered to "search user ID" y-taro, for example, advertisements about watching soccer, cars, and travel are delivered. According to this embodiment, however, the known hobbies and preferences can be expanded, so advertisements about sake can also be delivered to "search user ID" y-taro. Of course, besides advertisement delivery, various other kinds of control based on hobbies and preferences can also be applied to "search user ID" y-taro.

<Supplement>
In S4 described above, the relevance value between the search query and the posted content was registered in the user relevance information 301d as the relevance value of the blog user with respect to the search user (FIG. 7). In general, detailed user information (profile details) about the blog user who posted may be published together with the posted content. As described above, such user information details are accumulated in the posted content log 301b to the extent that they are public.

Therefore, the user association unit 304 can correct the relevance value of the blog user with respect to the search user based on the degree of match or similarity between the user attribute information in the user information DB 301c (for example, "user name", "age", "address") and the user information details in the posted content log 301b (for example, "blog user ID", "nickname", "age", "address", "hobby").

That is, the more closely the search user's user attribute information matches (or resembles) the blog user's user information details, the more likely it is that the search user and the blog user are the same person, so the relevance value is corrected to a higher value. This further increases the reliability of the relevance value.
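One way such a correction could look is sketched below. The description gives no formula, so everything here is an assumption: the function `corrected_relevance`, the exact-match comparison of shared attributes, the `weight` parameter, and the sample attribute values are hypothetical.

```python
def corrected_relevance(relevance, search_attrs, blog_attrs, weight=0.5):
    """Raise a relevance value in proportion to how many shared attributes match.

    The result stays within [relevance, 1.0]: a full match closes `weight`
    of the remaining gap to 1.0, and no match leaves the value unchanged.
    """
    shared = [name for name in search_attrs if name in blog_attrs]
    if not shared:
        return relevance  # nothing to compare, so no correction
    matches = sum(1 for name in shared if search_attrs[name] == blog_attrs[name])
    match_ratio = matches / len(shared)
    return relevance + weight * match_ratio * (1.0 - relevance)


search_user = {"age": 28, "address": "Tokyo"}
blog_user = {"nickname": "taro", "age": 28, "address": "Tokyo", "hobby": "sake"}
print(round(corrected_relevance(0.723, search_user, blog_user), 4))  # 0.8615
```

Scaling by the remaining gap to 1.0 keeps the corrected value bounded without clipping; an additive bonus with a cap would be an equally plausible choice.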

<Summary>
As described above, according to the embodiment of the present invention, information about a user's hobbies, preferences, and the like can be collected from the Internet. The present invention has been described by way of preferred embodiments. Although specific examples have been shown here, it is clear that various modifications and changes can be made to these examples without departing from the broad spirit and scope of the invention as defined in the claims. In other words, the present invention should not be construed as being limited by the details of the specific examples or by the accompanying drawings.

1 Search server
2 Blog server
3 Information collection server
4 Terminal
5 Network
31 CPU
32 ROM
33 RAM
34 Input device
35 Output device
36 Communication device
37 HDD
301 Storage unit
302 Search query acquisition unit
303 Posted content extraction unit
304 User association unit
305 User information collection unit

Claims

1. An information collection device comprising:
acquisition means for acquiring a search query input by a first user;
extraction means for extracting posted content related to the search query based on the search query; and
association means for associating the first user with a second user who posted the extracted posted content.

2. The information collection device according to claim 1, wherein the extraction means calculates a relevance value between the search query and the posted content, and extracts posted content whose relevance value is equal to or greater than a predetermined value.

3. The information collection device according to claim 2, wherein the extraction means extracts posted content posted within a predetermined time before or after the input time of the search query.

4. The information collection device according to claim 2 or 3, wherein the association means associates the first user with the second user based on the relevance value between the search query and the posted content.

5. The information collection device according to claim 1, wherein the extraction means calculates the relevance value so as to include a relevance value between user information of the first user and user information of the second user.

6. The information collection device according to claim 1, further comprising information collecting means for collecting information about the first user from the posted content of the second user associated with the first user.

7. An information collection program for causing a computer to function as:
acquisition means for acquiring a search query input by a first user;
extraction means for extracting posted content related to the search query based on the search query; and
association means for associating the first user with a second user who posted the extracted posted content.