JP5973927B2

JP5973927B2 - Feature estimation device and feature estimation method

Info

Publication number: JP5973927B2
Application number: JP2013018537A
Authority: JP
Inventors: 直史原
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2013-02-01
Filing date: 2013-02-01
Publication date: 2016-08-23
Anticipated expiration: 2033-02-01
Also published as: JP2014149723A

Description

The present invention relates to a feature estimation device and a feature estimation method for estimating features of a user.

Conventionally, user feature estimation based on information on the Internet has relied on data (such as text) belonging to the person being estimated and to users connected to that person, such as the person's friends (see Patent Document 1 below).

Patent Document 1: JP 2006-309660 A

However, the above method has the following problem. User features can be represented, for example, by proper nouns such as person names and place names that appear in text, together with how often they appear. Such user features are used in methods that recommend news and products. As a source from which to extract them, one could use, for example, the text that users post to the microblog Twitter (registered trademark) and the information indicating connections between users.

However, text posted to such a microblog ("it started raining", "the train is delayed", and so on) often contains fewer proper nouns usable for user feature estimation than ordinary text documents do. In other words, the microblog data of the person concerned and of the friends directly connected to that person may contain little or no data usable for user feature estimation. In that case, user features cannot be estimated with sufficient accuracy.

The present invention was made in view of this problem. Its object is to provide a feature estimation device and a feature estimation method that estimate user features with sufficient accuracy even when the data of the user whose features are to be estimated, and of the users directly connected to that user, does not yield enough data usable for feature estimation.

To achieve this object, a feature estimation device according to the present invention estimates features of a user and comprises the following means:
- link information acquisition means for acquiring link information indicating link relationships among a plurality of users;
- similarity calculation means for calculating, based on the link information acquired by the link information acquisition means, a similarity between one or more preset first users and one or more preset second users;
- specifying means for specifying a second user to be used for feature estimation of an estimation target user, based on the similarity, calculated by the similarity calculation means, between that second user and a first user having a link relationship with the estimation target user;
- feature estimation means for estimating features of the estimation target user using data relating to the second user specified by the specifying means; and
- output means for outputting information indicating the user features estimated by the feature estimation means.

In the feature estimation device according to the present invention, a second user is specified for feature estimation of the estimation target user. The choice is based on the similarity between that second user and a first user having a link relationship with the estimation target user. The user's features are then estimated from data relating to the second user. The second user need not have a direct link relationship with the estimation target user. Therefore, if users who have data usable for feature estimation are chosen as second users, the device can estimate user features with sufficient accuracy even when the data of the estimation target user, and of users directly connected to that user, does not yield enough usable data.

The similarity calculation means may calculate the similarity from the degree of overlap between the users linked to the first user and the users linked to the second user. With this configuration, the similarity between users can be calculated with little computation.

The similarity calculation means may define the degree of overlap as a fraction:
- numerator: the number of users linked to both the first user and the second user;
- denominator: the smaller of the number of users linked to the first user and the number of users linked to the second user.

With this configuration, the similarity between users can be calculated accurately even when very few users are linked to either the first user or the second user.
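Written as a formula, the degree of overlap above is the Simpson (overlap) coefficient. Here F(u) denotes the set of users linked to user u; this notation is introduced only for illustration. The worked numbers below are hypothetical. They show why dividing by the smaller set matters when one account has few followers.

```latex
\mathrm{sim}(a,b) = \frac{\lvert F(a) \cap F(b) \rvert}{\min\bigl(\lvert F(a) \rvert,\ \lvert F(b) \rvert\bigr)}

% Hypothetical example: a celebrity a with 1000 followers, an assistant b with 50,
% and 40 users following both.
\lvert F(a) \rvert = 1000,\quad \lvert F(b) \rvert = 50,\quad \lvert F(a) \cap F(b) \rvert = 40
\;\Rightarrow\; \mathrm{sim}(a,b) = \tfrac{40}{50} = 0.8
```

For the same sets, the Jaccard index would be 40 / (1000 + 50 − 40) = 40/1010 ≈ 0.04. The large celebrity audience would hide the fact that most of the assistant's followers also follow the celebrity.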

The feature estimation means may use words contained in text relating to the second user as the data relating to the second user when estimating the features of the estimation target user. With this configuration, user features can be estimated appropriately and reliably.

The feature estimation means may also use words contained in text relating to the estimation target user. With this configuration, user features can be estimated more appropriately because the user's own information is taken into account.

The link information acquisition means may acquire link information that also indicates the direction of each link relationship between users. With this configuration, user features can be estimated appropriately with the direction of links taken into account.

Besides being described as an invention of a feature estimation device, as above, the present invention can also be described as an invention of a feature estimation method, as follows. The two differ only in category. They are substantially the same invention and provide the same operations and effects.

That is, a feature estimation method according to the present invention estimates features of a user and comprises the following steps:
- a link information acquisition step of acquiring link information indicating link relationships among a plurality of users;
- a similarity calculation step of calculating, based on the link information acquired in the link information acquisition step, a similarity between one or more preset first users and one or more preset second users;
- a specifying step of specifying a second user to be used for feature estimation of an estimation target user, based on the similarity, calculated in the similarity calculation step, between that second user and a first user having a link relationship with the estimation target user;
- a feature estimation step of estimating features of the estimation target user using data relating to the second user specified in the specifying step; and
- an output step of outputting information indicating the user features estimated in the feature estimation step.
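The five method steps can be strung together in a minimal in-memory sketch. This is not the patent's implementation. The following are all assumptions made for illustration:
- the dictionary layouts;
- the function names `simpson` and `estimate_features`;
- whitespace tokenisation;
- using raw feature-word counts as the output vector.

The Simpson coefficient and the strict threshold comparison follow the embodiment's description.

```python
from collections import Counter


def simpson(a, b):
    """Overlap (Simpson) coefficient |A & B| / min(|A|, |B|); 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def estimate_features(target, follows, first_users, second_users,
                      texts, feature_words, threshold):
    """Hypothetical end-to-end sketch of the claimed method.

    follows: dict mapping follower ID -> set of followee IDs (directed links).
    texts: dict mapping user ID -> that user's text.
    feature_words: ordered list of preset feature words.
    """
    # Link information acquisition step: invert follows into followee -> followers.
    followers = {}
    for src, dsts in follows.items():
        for dst in dsts:
            followers.setdefault(dst, set()).add(src)

    # Similarity calculation step: every (first user, second user) pair.
    sims = {(f, s): simpson(followers.get(f, set()), followers.get(s, set()))
            for f in first_users for s in second_users}

    # Specifying step: second users whose similarity to a first user
    # that the target follows exceeds the threshold.
    linked_first = follows.get(target, set()) & set(first_users)
    chosen = {s for (f, s), v in sims.items() if f in linked_first and v > threshold}

    # Feature estimation step: count feature words in the chosen users' text
    # (whitespace split is an assumption; Japanese needs a morphological analyser).
    vocab = set(feature_words)
    counts = Counter()
    for s in chosen:
        counts.update(w for w in texts.get(s, "").split() if w in vocab)

    # Output step: one value per feature word, in dictionary order.
    return [counts[w] for w in feature_words]
```

Usage with toy data: `u0` follows only `celebA`, never `pr1`. Still, `pr1` shares two of its three followers with `celebA`, so its text is used.

```python
follows = {"u0": {"celebA"}, "u1": {"celebA", "pr1"}, "u2": {"celebA", "pr1"},
           "u3": {"pr1"}, "u4": {"news1"}}
texts = {"pr1": "SongX airs on ShowY", "news1": "Baseball results"}
estimate_features("u0", follows, ["celebA"], ["pr1", "news1"], texts,
                  ["SongX", "ShowY", "Baseball"], 0.5)  # [1, 1, 0]
```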

According to the present invention, user features can be estimated with sufficient accuracy even when the data of the user whose features are to be estimated, and of the users directly connected to that user, does not yield enough data usable for feature estimation.

FIG. 1 is a diagram showing the functional configuration of a feature estimation device according to an embodiment of the present invention.
FIG. 2 is a diagram showing the follow relationships between a user and other users.
FIG. 3 is a diagram showing the follow relationships between celebrities and other users, and between estimation assistants and other users.
FIG. 4 is a diagram showing the follow relationships between celebrities and other users, and between estimation assistants and other users.
FIG. 5 is a diagram showing the hardware configuration of the feature estimation device according to the embodiment of the present invention.
FIG. 6 is a flowchart showing the feature estimation method, which is the processing executed by the feature estimation device according to the embodiment of the present invention.

The feature estimation device and feature estimation method according to the present invention are described in detail below with reference to the drawings. In the drawings, identical elements carry identical reference numerals, and redundant description is omitted.

FIG. 1 shows a feature estimation device 10 according to the present embodiment. The feature estimation device 10 estimates features of users. The estimated features are used, for example, for content distribution. Specifically, the feature estimation device 10 is a server device connected to a network such as the Internet N. In this embodiment, user features are estimated from the data of a microblog such as Twitter, and Twitter is used as the running example.

Twitter is a system in which each user posts text (tweets). Other users of the Twitter system can view the posted text. In this embodiment, this text serves as user data for user feature estimation. Besides tweets, the textual user data may include a user's self-introduction, which the user registers and other users can view. Each Twitter user (account) is assigned a user ID, which identifies that user.

Twitter lets a user register another user so that the other user's posts are displayed. Registering another user in this way is called following. As shown in FIG. 2, for example, a user 30 follows general users 31 (friend A, friend B, and friend n) and a celebrity 32. Twitter thus creates links between users. A follow is a directed link from the following user to the followed user. In this embodiment, these link relationships are used for user feature estimation. Here, a celebrity is a user with many followers, such as an entertainer, a politician, or a TV personality.

As described above, ordinary users' Twitter posts often contain no information useful for feature estimation. Some users, however, frequently post text that is useful for it, for example sports newspapers or the publicity representatives of entertainment groups. Their posts concern news and entertainers, so they contain many proper nouns and can be used to estimate user features. In this embodiment, such users are called estimation assistants.

When the estimation target user and an estimation assistant are directly linked (for example, when the target user follows the assistant), the assistant's posts can be used to estimate the target user's features. Without a direct link, the assistant's posts usually cannot be used. This embodiment makes them usable even when the target user and the assistant are not directly linked. As shown in FIG. 3, celebrities 32 and estimation assistants 33 are followed by many general users 31, that is, users other than the celebrities 32 and the estimation assistants 33. In FIG. 3, the general users 31 drawn as hatched circles follow both a celebrity 32 and an estimation assistant 33.

Although this embodiment uses Twitter as the example, the present invention can be implemented with any system or information that provides user data and link relationships of this kind. For example, user data and inter-user link relationships in an SNS (social networking service) can be used.

The functions of the feature estimation device 10 according to this embodiment are now described in detail. The feature estimation device 10 is connected, via a network such as the Internet N, to a server that provides the Twitter service, so that it can acquire (receive) the data used for feature estimation. As shown in FIG. 1, the feature estimation device 10 comprises the following units:
- a celebrity user ID storage unit 11;
- an estimation assistant user ID storage unit 12;
- a data acquisition unit 13;
- a link information storage unit 14;
- a user data storage unit 15;
- a similarity calculation unit 16;
- a specifying unit 17;
- a user feature generation unit 18; and
- a user feature storage unit 19.

The celebrity user ID storage unit 11 stores the user IDs of celebrity accounts, which serve as the first users of the present invention. As described above, a celebrity is a user followed by many general users. An administrator of the feature estimation device 10 or the like decides in advance which users are celebrities and inputs their user IDs into the device. The celebrity user ID storage unit 11 stores the user IDs of one or more celebrities, for example as a list. In this embodiment the first users are celebrities, but a first user need not be famous. Any user who qualifies as a first user under the present invention may be used.

The estimation assistant user ID storage unit 12 stores the user IDs of estimation assistant accounts, which serve as the second users of the present invention. As described above, an estimation assistant is a user who makes posts usable for feature estimation and who is also followed by many general users. The administrator or the like decides in advance which users are estimation assistants and inputs their user IDs into the feature estimation device 10. The estimation assistant user ID storage unit 12 stores the user IDs of one or more estimation assistants, for example as a list.

The data acquisition unit 13 acquires the data needed for user feature estimation. It does so, for example, by requesting the data from the Twitter server via the Internet N and receiving it. Specifically, the data acquisition unit 13 serves as link information acquisition means that acquires link information indicating link relationships among a plurality of users. The link information represents the follow relationships between users described above.

The data acquisition unit 13 receives the user ID of the estimation target user as input. The ID comes, for example, from a terminal connected to the feature estimation device 10 when the administrator or the like operates that terminal. Using this user ID as a key, the data acquisition unit 13 acquires from the Twitter server, as link information, the list of user IDs of the users that this user follows. This link information represents links directed from the estimation target user to other users. The data acquisition unit 13 may also refer to the celebrity user IDs stored in the celebrity user ID storage unit 11 when acquiring this link information. It then acquires only the links between the target user and celebrities, that is, the target user's follows of celebrities.

The data acquisition unit 13 also reads the celebrity user IDs from the celebrity user ID storage unit 11. Using each celebrity's user ID as a key, it acquires from the Twitter server, as link information, the list of user IDs of the users who follow that celebrity. This link information represents links directed from other users to the celebrity.

Likewise, the data acquisition unit 13 reads the estimation assistant user IDs from the estimation assistant user ID storage unit 12. Using each assistant's user ID as a key, it acquires from the Twitter server, as link information, the list of user IDs of the users who follow that assistant. This link information represents links directed from other users to the assistant. The data acquisition unit 13 inputs all of the link information acquired in this way, which indicates follow relationships between users, into the link information storage unit 14.

The data acquisition unit 13 also acquires the estimation assistant data used for user feature estimation. Specifically, using each estimation assistant's user ID as a key, it acquires from the Twitter server the text of the assistant's posts (tweets) and the text of the assistant's self-introduction. In the same way, using the estimation target user's user ID as a key, it acquires the text of the target user's posts (tweets) and self-introduction. Celebrities' posts and self-introductions may be acquired in the same way.

The data acquisition unit 13 may acquire all of each user's posts, or only the posts from a fixed period, such as the past month. It inputs the text acquired for each user into the user data storage unit 15.
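A minimal sketch of the optional time-window restriction follows. It assumes each post is held as a `(timestamp, text)` pair and approximates "the past month" as 30 days. The data layout, the 30-day figure, and the function name `recent_posts` are hypothetical.

```python
from datetime import datetime, timedelta


def recent_posts(posts, now, days=30):
    """Keep only posts made within the last `days` days before `now`.

    posts: list of (datetime, text) tuples (assumed layout).
    days: window length; 30 is an assumed stand-in for "the past month".
    """
    cutoff = now - timedelta(days=days)
    return [(ts, text) for ts, text in posts if ts >= cutoff]
```

For example, with `now = datetime(2013, 2, 1)`, a post from 25 January 2013 is kept and a post from 1 December 2012 is dropped.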

In the above example, all data is acquired over a network such as the Internet N, that is, from an external Internet environment. If the data can be obtained from within the device's own server, it may be obtained there instead, for example when the feature estimation device 10 itself provides the microblog service. This shortens the time needed to acquire the data and improves the device's performance.

The link information storage unit 14 stores the link information input from the data acquisition unit 13. It may, for example, store the user ID of the following (source) user in association with the user ID of the followed (destination) user.
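One possible in-memory layout for the link information storage unit 14 is sketched below as an illustration only. The class name `LinkStore` and its methods are hypothetical. Each directed follow link is indexed in both directions, so both queries the device needs are direct lookups:
- "whom does this user follow?" for the estimation target user;
- "who follows this account?" for celebrities and estimation assistants.

```python
from collections import defaultdict


class LinkStore:
    """Hypothetical store of directed follow links (follower -> followee)."""

    def __init__(self):
        self._following = defaultdict(set)  # follower ID -> IDs that user follows
        self._followers = defaultdict(set)  # followee ID -> IDs following that user

    def add(self, follower, followee):
        """Record one directed link: `follower` follows `followee`."""
        self._following[follower].add(followee)
        self._followers[followee].add(follower)

    def following(self, user):
        """IDs that `user` follows (a copy, so callers cannot mutate the store)."""
        return set(self._following.get(user, ()))

    def followers(self, user):
        """IDs that follow `user` (a copy)."""
        return set(self._followers.get(user, ()))
```

Keeping two indexes doubles memory, but it avoids scanning every link whenever a follower set is needed for the similarity calculation.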

The user data storage unit 15 stores the text for each user input from the data acquisition unit 13. It may, for example, store each user's user ID in association with that user's text.

The similarity calculation unit 16 serves as similarity calculation means. Using the link information stored in the link information storage unit 14, it calculates the similarity between celebrities and estimation assistants, for every combination of one celebrity and one estimation assistant. The similarity is based on the degree of overlap, as recorded in the link information, between two groups:
- the users who follow the celebrity (users having a link relationship with the celebrity);
- the users who follow the estimation assistant (users having a link relationship with the assistant).

Specifically, the similarity calculation unit 16 counts three numbers of users (user IDs):
- the number following the celebrity;
- the number following the estimation assistant;
- the number following both the celebrity and the estimation assistant, that is, users linked to both.

The similarity is the number following both divided by the smaller of the first two counts. This value is the Simpson coefficient.
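The Simpson coefficient described above reduces to a few set operations on follower-ID sets. The function name is hypothetical. The choice to return 0.0 when either account has no followers is an assumption of this sketch; the text does not specify it.

```python
def simpson_similarity(celebrity_followers, assistant_followers):
    """Simpson (overlap) coefficient between two sets of follower IDs."""
    denom = min(len(celebrity_followers), len(assistant_followers))
    if denom == 0:
        # Assumption: with no followers on one side there is no evidence of similarity.
        return 0.0
    both = len(celebrity_followers & assistant_followers)  # users following both
    return both / denom
```

Take a celebrity with 5 followers and an assistant with 2 followers who both also follow the celebrity. The coefficient is 2/2 = 1.0, whereas the Jaccard index of the same sets would be only 2/5 = 0.4. Dividing by the smaller set keeps a small assistant audience from being swamped by a large celebrity audience.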

This similarity indicates how similar the followers of the celebrity and of the estimation assistant are. It can also be read as indicating how similar the celebrity and the estimation assistant are in their characteristics. The similarity calculation unit 16 may restrict the calculation to pairs of a celebrity followed by the estimation target user and any estimation assistant. The above calculation is only one example. Any method that derives the similarity between a celebrity and an estimation assistant from their respective link relationships may be used.

The similarity calculation unit 16 compares each calculated similarity with a threshold to determine whether it exceeds the threshold. The administrator or the like sets the threshold in advance and inputs it into the feature estimation device 10, where the similarity calculation unit 16 stores it. The threshold is set, for example, to a value above which the celebrity and the estimation assistant can be judged similar. When a similarity exceeds the threshold, the similarity calculation unit 16 outputs three associated values to the specifying unit 17:
- the celebrity's user ID;
- the estimation assistant's user ID;
- the similarity.

This information takes, for example, the form of the table in FIG. 4.
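The threshold step can be sketched as follows. It produces rows of the kind the text describes for FIG. 4: celebrity ID, assistant ID, similarity. The input layout (a dict from user ID to a set of follower IDs) and the function name `similar_pairs` are assumptions. The comparison is strictly greater than, matching "exceeds the threshold".

```python
from itertools import product


def similar_pairs(followers, celebrities, assistants, threshold):
    """Return (celebrity_id, assistant_id, similarity) rows above `threshold`.

    followers: dict mapping user ID -> set of follower IDs (assumed layout).
    """
    rows = []
    for c, a in product(celebrities, assistants):  # every celebrity/assistant pair
        fc, fa = followers.get(c, set()), followers.get(a, set())
        denom = min(len(fc), len(fa))
        sim = len(fc & fa) / denom if denom else 0.0  # Simpson coefficient
        if sim > threshold:  # strictly exceeds, as in the text
            rows.append((c, a, sim))
    return rows
```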

The specifying unit 17 serves as specifying means. Based on the similarities calculated by the similarity calculation unit 16, it specifies the estimation assistants to be used for estimating the features of the estimation target user. First, the specifying unit 17 identifies the celebrities that the estimation target user follows, that is, the celebrities linked to the target user. To do so, it refers to the link information stored in the link information storage unit 14 and the celebrity user IDs stored in the celebrity user ID storage unit 11.

Next, the specifying unit 17 uses the information from the similarity calculation unit 16 to find the estimation assistants associated with those celebrities. It specifies them as the assistants to be used for estimating the target user's features. Each such assistant has a similarity above the threshold with a celebrity that the target user follows.
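The specifying unit's two steps can be sketched as below. The sketch assumes two inputs are already available as plain Python structures:
- the target user's follow list;
- the threshold-filtered (celebrity, assistant, similarity) rows.

The function and parameter names are hypothetical.

```python
def specify_assistants(target, following, celebrity_ids, similar_rows):
    """Return IDs of assistants tied, above threshold, to a celebrity the target follows.

    following: dict mapping user ID -> set of IDs that user follows (assumed layout).
    celebrity_ids: preset celebrity (first-user) IDs.
    similar_rows: (celebrity_id, assistant_id, similarity) rows already
        filtered by the threshold.
    """
    # Step 1: celebrities the estimation target user follows.
    followed_celebs = following.get(target, set()) & set(celebrity_ids)
    # Step 2: assistants associated with any of those celebrities.
    return {a for c, a, _ in similar_rows if c in followed_celebs}
```

The target user need not follow the returned assistant. The connection runs entirely through the shared celebrity.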

An estimation assistant specified in this way is considered to resemble, in its characteristics, a celebrity that the user follows. The user's features can therefore be estimated from the assistant's data even when the user and the assistant are not directly linked. Suppose, for example, that the celebrity belongs to an entertainment group and a similar estimation assistant is that group's publicity representative. The assistant's posts can then be used for the user's feature estimation even if the user does not follow the representative.

As noted above, on Twitter, for example, users rarely make posts that are useful for feature estimation. This holds even for celebrities. Therefore, even if a user is interested in a celebrity and follows them, it is difficult to estimate the user's features from that celebrity's posts. Special users such as the estimation assistants described above, however, often make posts from which features can be estimated. For example, the publicity officer of a talent group posts information about the group, such as the TV programs its members appear in or the titles of songs it has released.

The specifying unit 17 outputs information identifying the specified estimation assistants to be used for feature estimation to the user feature generation unit 18. For a single estimation target user, the specifying unit 17 may specify multiple estimation assistants that satisfy the above criterion. Alternatively, it may specify a fixed number of estimation assistants (for example, one) in descending order of similarity.
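The alternative of keeping only a fixed number of assistants in descending order of similarity could look like the following sketch. It assumes the similarity unit passes along scored (celebrity, assistant, similarity) triples. It also scores an assistant by its highest similarity to any celebrity the user follows. That aggregation rule is an assumption made here, because the text does not say how to rank an assistant that is similar to several followed celebrities. Names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def top_k_assistants(
    followed_celebs: Set[str],
    scored_pairs: Iterable[Tuple[str, str, float]],
    k: int = 1,
) -> List[str]:
    """Hypothetical helper: the k assistants most similar to a followed celebrity."""
    best: Dict[str, float] = {}  # assistant -> highest similarity seen (assumed rule)
    for celeb, assistant, sim in scored_pairs:
        if celeb in followed_celebs and sim > best.get(assistant, float("-inf")):
            best[assistant] = sim
    # Descending similarity; ties broken by ID so the result is deterministic.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [assistant for assistant, _ in ranked[:k]]


scored = [("celebA", "prA", 0.8), ("celebA", "prX", 0.6), ("celebC", "prC", 0.9)]
print(top_k_assistants({"celebA"}, scored, k=1))  # ['prA']
```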

The user feature generation unit 18 is feature estimation means that estimates the features of the estimation target user using data related to the estimation assistants indicated by the information input from the specifying unit 17. As the data related to an estimation assistant, the user feature generation unit 18 reads the text related to that estimation assistant from the user data storage unit 15. It then estimates the features of the estimation target user using the words contained in that text. As information indicating the user's features, the user feature generation unit 18 generates numerical values associated with feature words, which are preset words. This information is a vector whose dimension equals the number of feature words. The larger the value associated with a feature word, the more strongly the user is indicated to have the feature represented by that word. The feature words are registered in a dictionary prepared by, for example, the administrator of the feature estimation device 10, and are stored in the user feature generation unit 18.

For each feature word, the user feature generation unit 18 counts how many times that word appears in the text related to the estimation assistants. It uses a value based on these per-word counts as the feature quantity. For example, the per-word counts themselves may serve as the information indicating the user's features.
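The counting described above, which produces one value per dictionary entry, can be sketched like this. It is a minimal illustration that assumes the feature-word dictionary is a plain Python list and that occurrences are found by simple substring matching. A production system would more likely tokenize the text first (for Japanese, with a morphological analyzer), and this sketch does not attempt that.

```python
from typing import List


def count_feature_words(texts: List[str], feature_words: List[str]) -> List[int]:
    """Return a vector with one count per feature word, in dictionary order.

    Substring matching via str.count (non-overlapping) stands in for real
    tokenization; this is an assumption of the sketch.
    """
    return [sum(text.count(word) for text in texts) for word in feature_words]


feature_words = ["Music Station", "single", "concert"]  # hypothetical dictionary
assistant_posts = [
    "New single out today! Tune in to Music Station tonight",
    "Music Station rehearsal done",
]
print(count_feature_words(assistant_posts, feature_words))  # [2, 1, 0]
```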

The user feature generation unit 18 may also estimate the features of the estimation target user using words contained not only in the text related to the estimation assistants but also in the text related to the estimation target user. In that case, the user feature generation unit 18 reads the text related to the estimation target user from the user data storage unit 15 and counts, for each feature word, how many times it appears in that text. For each feature word, it then adds the count in the estimation assistants' text to the count in the estimation target user's text, and uses the sums as the information indicating the user's features. The two counts may be weighted (at a fixed ratio) before being added. Text related to the celebrities that have a link relationship with the estimation target user may also be used to generate the information indicating that user's features.
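Combining the assistant counts with the target user's own counts, optionally at a fixed ratio, reduces to a weighted element-wise sum. In this sketch the weight parameters and their default of 1.0 (plain addition) are assumptions, because the text only says a fixed ratio may be applied. The function name is hypothetical.

```python
from typing import List


def combine_counts(
    assistant_counts: List[int],
    user_counts: List[int],
    assistant_weight: float = 1.0,
    user_weight: float = 1.0,
) -> List[float]:
    """Weighted per-feature-word sum; both vectors must share one dictionary."""
    if len(assistant_counts) != len(user_counts):
        raise ValueError("count vectors must come from the same feature-word dictionary")
    return [
        assistant_weight * a + user_weight * u
        for a, u in zip(assistant_counts, user_counts)
    ]


print(combine_counts([2, 1, 0], [0, 1, 3]))  # [2.0, 2.0, 3.0]
```

Passing, say, `assistant_weight=0.7, user_weight=0.3` gives the fixed-ratio variant.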

The information indicating the user's features is not limited to the above. Any information may be used, provided that the features of the estimation target user are estimated from data related to the estimation assistants. The user feature generation unit 18 outputs the information indicating the user's features, generated as described above, to the user feature storage unit 19. In other words, the user feature generation unit 18 is also output means that outputs information indicating the result of the user's feature estimation.

The user feature storage unit 19 is means for storing the information indicating the user's features input from the user feature generation unit 18, for example in association with the user ID. The information indicating the user's features generated by the feature estimation device 10 is used, for example, to decide which content to distribute or recommend to the user. For instance, content whose text (description) contains a feature word representing the user, such as a feature word whose value is at or above a certain level, is associated with that user; that is, it is selected as content to distribute or recommend to the user. Selecting content in this way makes it possible to distribute or recommend content that is highly relevant to the user. This concludes the description of the functional configuration of the feature estimation device 10.
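The content-matching example above can be sketched as follows. The threshold value, the dict layouts, and case-sensitive substring matching against descriptions are all assumptions made for illustration, and `recommend` is a hypothetical name.

```python
from typing import Dict, List


def recommend(
    user_features: Dict[str, float],
    contents: Dict[str, str],
    min_value: float,
) -> List[str]:
    """Return IDs of content whose description mentions a word representing the user."""
    # Feature words whose value is at or above the cutoff "represent" the user.
    user_words = [word for word, value in user_features.items() if value >= min_value]
    # Keep content (in insertion order) whose description contains any such word.
    return [cid for cid, desc in contents.items() if any(w in desc for w in user_words)]


features = {"Music Station": 2, "single": 1, "concert": 0}
contents = {
    "c1": "Highlights from Music Station",
    "c2": "Weather news",
    "c3": "Concert tickets on sale",
}
print(recommend(features, contents, min_value=2))  # ['c1']
```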

FIG. 5 shows the hardware configuration of the feature estimation device 10 according to this embodiment. As shown in FIG. 5, the feature estimation device 10 includes a computer with the following hardware: a CPU (Central Processing Unit) 101; a RAM (Random Access Memory) 102 and a ROM (Read Only Memory) 103, which serve as main storage; a communication module 104; and an auxiliary storage device 105 such as a hard disk. These components run under a program or the like to provide the functions of the feature estimation device 10 described above. This concludes the description of the configuration of the feature estimation device 10 according to this embodiment.

Next, the feature estimation method, which is the processing executed by the feature estimation device 10 according to this embodiment, is described with reference to the flowchart of FIG. 6. First, the user ID of the estimation target user is input to the data acquisition unit 13 (S01). The user ID is input from, for example, a terminal operated by the administrator of the feature estimation device 10. Next, the data acquisition unit 13 acquires the link information and user data (S02, link information acquisition step). The link information acquired includes at least the following: the user IDs of the users that the estimation target user follows; the user IDs of the users who follow the celebrities whose user IDs are stored in the celebrity user ID storage unit 11; and the user IDs of the users who follow the estimation assistants whose user IDs are stored in the estimation assistant user ID storage unit 12. The user data acquired includes at least the text related to the estimation assistants. The acquired link information is stored in the link information storage unit 14, and the acquired user data is stored in the user data storage unit 15.
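The minimum link information listed for step S02 could be extracted from a directed follow-edge list roughly as shown below. The edge-list input format and the helper name are assumptions, because the text does not specify how the data acquisition unit 13 obtains or stores its data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def acquire_link_info(
    edges: Iterable[Tuple[str, str]],
    target: str,
    celebrities: Set[str],
    assistants: Set[str],
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Hypothetical S02 helper.

    edges are (follower, followee) pairs. Returns the IDs the target follows and,
    for each celebrity or estimation assistant that has followers, their IDs.
    """
    watched = celebrities | assistants
    following: Set[str] = set()
    followers: Dict[str, Set[str]] = defaultdict(set)
    for follower, followee in edges:
        if follower == target:
            following.add(followee)
        if followee in watched:
            followers[followee].add(follower)
    return following, dict(followers)


edges = [("u1", "celebA"), ("u2", "celebA"), ("u2", "prA"), ("u3", "prA"), ("u3", "u4")]
following, followers = acquire_link_info(edges, "u1", {"celebA"}, {"prA"})
print(following)                 # {'celebA'}
print(sorted(followers["prA"]))  # ['u2', 'u3']
```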

Next, based on the link information stored in the link information storage unit 14, the similarity calculation unit 16 calculates the similarity between each celebrity and each estimation assistant from the degree of overlap between their link relationships (S03, similarity calculation step). The similarity calculation unit 16 then determines whether each calculated similarity exceeds a threshold. Information indicating each pair of a celebrity and an estimation assistant whose similarity exceeds the threshold is output from the similarity calculation unit 16 to the specifying unit 17.
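Step S03 can be sketched as a loop over every pair of a celebrity and an estimation assistant, using the overlap measure described in claim 3 (the Simpson coefficient: shared followers divided by the smaller follower count). The input layout, the strict `>` comparison mirroring "exceeds", the value 0.0 for an empty follower set, and the threshold in the example are assumptions.

```python
from itertools import product
from typing import Dict, List, Set, Tuple


def simpson(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / min(|A|, |B|); defined here as 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def similar_pairs(
    followers: Dict[str, Set[str]],
    celebrities: Set[str],
    assistants: Set[str],
    threshold: float,
) -> List[Tuple[str, str, float]]:
    """Return (celebrity, assistant, similarity) for pairs above the threshold."""
    result = []
    for celeb, assistant in product(sorted(celebrities), sorted(assistants)):
        sim = simpson(followers.get(celeb, set()), followers.get(assistant, set()))
        if sim > threshold:
            result.append((celeb, assistant, sim))
    return result


followers = {"celebA": {"u1", "u2", "u3", "u4"}, "prA": {"u2", "u3"}, "prB": {"u9"}}
print(similar_pairs(followers, {"celebA"}, {"prA", "prB"}, threshold=0.5))
# [('celebA', 'prA', 1.0)]
```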

Next, based on the similarities calculated by the similarity calculation unit 16, the specifying unit 17 specifies the estimation assistants to be used for estimating the features of the estimation target user (S04, specifying step). Specifically, the specifying unit 17 refers to the link information stored in the link information storage unit 14 and the celebrity user IDs stored in the celebrity user ID storage unit 11 to identify the celebrities that the estimation target user follows. The specifying unit 17 then refers to the information input from the similarity calculation unit 16 and specifies the estimation assistants associated with those celebrities as the estimation assistants to be used for estimating the features of the estimation target user. Information indicating the specified estimation assistants is output from the specifying unit 17 to the user feature generation unit 18.

Next, the user feature generation unit 18 retrieves from the user data storage unit 15 the text related to the estimation assistants indicated by the information input from the specifying unit 17, and estimates the features of the estimation target user from that text (S05, feature estimation step). Information indicating the result of the feature estimation is output from the user feature generation unit 18 to the user feature storage unit 19 and stored there (S06, output step). This concludes the description of the feature estimation method executed by the feature estimation device 10 according to this embodiment.
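A compact end-to-end sketch of steps S02 to S06 for one target user might look like this. It is illustrative only and makes several assumptions: directed (follower, followee) edges, the Simpson coefficient with a strict threshold for S03, substring counting of feature words for S05, and a plain dict standing in for the user feature storage unit 19. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def estimate_features(
    target: str,
    edges: List[Tuple[str, str]],
    texts: Dict[str, List[str]],
    celebrities: Set[str],
    assistants: Set[str],
    feature_words: List[str],
    threshold: float,
) -> Tuple[List[str], List[int]]:
    """Hypothetical end-to-end sketch of S02-S05 for one target user."""
    # S02: link information from (follower, followee) edges.
    following = {followee for follower, followee in edges if follower == target}
    followers: Dict[str, Set[str]] = defaultdict(set)
    for follower, followee in edges:
        followers[followee].add(follower)

    # S03: Simpson similarity for every celebrity/assistant pair, thresholded.
    def simpson(a: Set[str], b: Set[str]) -> float:
        return len(a & b) / min(len(a), len(b)) if a and b else 0.0

    pairs = [
        (c, s) for c in celebrities for s in assistants
        if simpson(followers[c], followers[s]) > threshold
    ]

    # S04: assistants paired with a celebrity the target follows.
    chosen = sorted({s for c, s in pairs if c in following})

    # S05: count each feature word across the chosen assistants' posts.
    vector = [
        sum(post.count(word) for s in chosen for post in texts.get(s, []))
        for word in feature_words
    ]
    return chosen, vector


user_feature_store: Dict[str, List[int]] = {}  # stand-in for storage unit 19
edges = [("u1", "celebA"), ("u2", "celebA"), ("u3", "celebA"),
         ("u2", "prA"), ("u3", "prA"), ("u9", "prB")]
texts = {"prA": ["New single out! Watch Music Station tonight"], "prB": ["Umbrella sale"]}
chosen, vector = estimate_features("u1", edges, texts, {"celebA"}, {"prA", "prB"},
                                   ["Music Station", "single", "Umbrella"], 0.5)
user_feature_store["u1"] = vector  # S06: store the result keyed by user ID
print(chosen, vector)  # ['prA'] [1, 1, 0]
```

Here `u1` follows only `celebA`, yet its feature vector comes entirely from `prA`'s post.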

As described above, in this embodiment the estimation assistants used for estimating the features of the estimation target user are specified based on the similarity between the estimation assistants and the celebrities that have a link relationship with the estimation target user. The user's features are then estimated from data related to those estimation assistants. An estimation assistant that has data usable for user feature estimation, that is, one that posts text usable for that purpose, need not have a direct link relationship with the estimation target user. Therefore, according to this embodiment, user features can be estimated with sufficient accuracy even when enough usable data cannot be obtained from the user whose features are to be estimated or from the users directly connected to that user.

As in this embodiment, the similarity between a celebrity and an estimation assistant can be calculated from the degree of overlap between the users who have a link relationship with each of them. This configuration allows the similarity to be calculated with little computation. More specifically, the Simpson coefficient can be used. Compared with the Jaccard coefficient, the Simpson coefficient allows the similarity between users to be calculated more accurately even when, for example, extremely few users have a link relationship with either the celebrity or the estimation assistant. However, the similarity need not be calculated by this method; any method may be used as long as the similarity is based on the link relationships.
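The advantage claimed for the Simpson coefficient over the Jaccard coefficient appears when one follower set is much smaller than the other. In the hypothetical numbers below, all 10 followers of a small account also follow a celebrity with 1,000 followers. Jaccard divides the overlap by the size of the union and is dragged toward zero, whereas Simpson divides by the smaller set and reports full overlap. The user IDs and set sizes are invented for illustration.

```python
from typing import Set


def simpson(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / min(|A|, |B|), the ratio described in claim 3."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


celeb_followers = {f"u{i}" for i in range(1000)}
assistant_followers = {f"u{i}" for i in range(10)}  # all also follow the celebrity
print(simpson(celeb_followers, assistant_followers))  # 1.0
print(jaccard(celeb_followers, assistant_followers))  # 0.01
```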

The user's features can be estimated using the feature words contained in the text related to the estimation assistants. This allows the user's features to be estimated appropriately and reliably, and to be represented concisely. As described above, text related to the estimation target user may also be used. The estimation target user's own posts, for example on Twitter, often do not contain enough information for feature estimation. However, if those posts contain even a little usable information, this configuration allows the user's features to be estimated more appropriately from the user's own information.

The direction of the link relationship between users may also be taken into account, as with the follow relationship used as the link relationship in this embodiment. For example, the embodiment above uses follow links directed toward the celebrities and estimation assistants (that is, their followers), but does not necessarily use follow links directed from them. With this configuration, the user's features can be estimated appropriately on the basis of link direction as well. However, the direction of the link relationship need not be considered, and the present invention can also be carried out using undirected link relationships.
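The effect of honoring link direction can be seen by building each account's neighbor set in two ways: from followers only (the directed reading used in the embodiment) or from any adjacent user (an undirected reading). In the hypothetical graph below, both accounts also follow unrelated users. Counting those out-links in the undirected case dilutes the overlap. The graph and helper names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def neighbor_sets(
    edges: Iterable[Tuple[str, str]], directed: bool = True
) -> Dict[str, Set[str]]:
    """directed=True: followers only (in-links); False: followers and followees."""
    nbrs: Dict[str, Set[str]] = defaultdict(set)
    for follower, followee in edges:
        nbrs[followee].add(follower)
        if not directed:
            nbrs[follower].add(followee)
    return nbrs


def simpson(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / min(len(a), len(b)) if a and b else 0.0


edges = [("u1", "celebA"), ("u2", "celebA"), ("u1", "prA"), ("u2", "prA"),
         ("celebA", "x1"), ("celebA", "x2"), ("celebA", "x3"),
         ("prA", "y1"), ("prA", "y2"), ("prA", "y3")]
for directed in (True, False):
    n = neighbor_sets(edges, directed)
    print(directed, simpson(n["celebA"], n["prA"]))
# True 1.0
# False 0.4
```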

Reference signs: 10 … feature estimation device, 11 … celebrity user ID storage unit, 12 … estimation assistant user ID storage unit, 13 … data acquisition unit, 14 … link information storage unit, 15 … user data storage unit, 16 … similarity calculation unit, 17 … specifying unit, 18 … user feature generation unit, 19 … user feature storage unit, 101 … CPU, 102 … RAM, 103 … ROM, 104 … communication module, 105 … auxiliary storage device.

Claims

1. A feature estimation device that performs feature estimation of a user, comprising:
Link information acquisition means for acquiring link information indicating a link relationship between a plurality of users;
Similarity calculation means for calculating, based on the link information acquired by the link information acquisition means, the similarity between one or more preset first users and one or more preset second users;
Specifying means for specifying the second user to be used for feature estimation of an estimation target user, based on the similarity, calculated by the similarity calculation means, between the second user and the first user having a link relationship with the estimation target user;
Feature estimation means for performing feature estimation of the estimation target user using data related to the second user specified by the specifying means; and
Output means for outputting information indicating the feature estimation of the user performed by the feature estimation means.

2. The feature estimation device according to claim 1, wherein the similarity calculation means calculates the similarity based on the degree of overlap between the users having a link relationship with the first user and the users having a link relationship with the second user.

3. The feature estimation device according to claim 2, wherein the similarity calculation means calculates the similarity as a ratio whose denominator is the smaller of the number of users having a link relationship with the first user and the number of users having a link relationship with the second user, and whose numerator is the number of users having a link relationship with both the first user and the second user.

4. The feature estimation device according to any one of claims 1 to 3, wherein the feature estimation means performs feature estimation of the estimation target user using words contained in text related to the second user as the data related to the second user.

5. The feature estimation device according to claim 4, wherein the feature estimation means performs feature estimation of the estimation target user further using words contained in text related to the estimation target user.

6. The feature estimation device according to any one of claims 1 to 5, wherein the link information acquisition means acquires link information that also indicates the direction of the link relationship between the users.

7. A feature estimation method for performing feature estimation of a user, comprising:
A link information acquisition step of acquiring link information indicating a link relationship between a plurality of users;
A similarity calculation step of calculating, based on the link information acquired in the link information acquisition step, the similarity between one or more preset first users and one or more preset second users;
A specifying step of specifying the second user to be used for feature estimation of an estimation target user, based on the similarity, calculated in the similarity calculation step, between the second user and the first user having a link relationship with the estimation target user;
A feature estimation step of performing feature estimation of the estimation target user using data related to the second user specified in the specifying step; and
An output step of outputting information indicating the feature estimation of the user performed in the feature estimation step.