JP5829180B2

JP5829180B2 - Class type estimation apparatus, program, and method for estimating ratio of each class type in all member objects in group

Info

Publication number: JP5829180B2
Application number: JP2012121733A
Authority: JP
Inventors: Kazufumi Ikeda (池田 和史); Hajime Hattori (服部 元); Tomohiro Ono (小野 智弘)
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2012-05-29
Filing date: 2012-05-29
Publication date: 2015-12-09
Anticipated expiration: 2032-05-29
Also published as: JP2013246757A

Description

The present invention relates to a technique for estimating class (attribute) types for an estimation target group that includes a plurality of member objects.

In recent years, an unspecified large number of people have become able to actively publish their own comments (text information) through SNS (Social Networking Service) site servers. An "SNS site server" publishes a message posted by one user to a group consisting of a plurality of users. Examples include facebook (registered trademark), twitter (registered trademark), google+ (registered trademark), and mixi (registered trademark), which are also commonly called miniblog sites. Each user registers an account with the SNS site server, and posts are published together with the account. Profile information for each user is also published. The profile information is the user's own self-introduction text, expresses the user's attribute types, and can be freely viewed by an unspecified large number of third parties.

Meanwhile, posts published through an SNS site server are often word-of-mouth information about goods and services. For marketing purposes, it is then desirable to estimate what kinds of users are giving what kinds of evaluations. Typically, a large number of posts relating to a given purpose are grouped, and the tendency of user attributes is estimated from the profile information of the member posters. The profile information of the member posters should therefore be as accurate as possible. To that end, it is also conceivable to estimate the profile information, that is, the user attributes (age group, gender, hobbies, and so on), from the content of the member posters' posts.

Conventionally, there is a technique for estimating a user's profile information from text information posted by that user (see, for example, Non-Patent Document 1). With this technique, the attributes of each user are estimated from the word distribution in that user's past tweets, and the ratio of a user attribute such as gender across all users in the target group can then be estimated. This technique also makes it possible to collect the attributes and television viewing histories of one or more users in advance and to present content (for example, a television program) that matches the preferences of that group of users. This increases the likelihood that the content is viewed within the group's social communication and facilitates smooth communication. In word-of-mouth marketing, for example, there is a strong need to acquire user profile information by using an SNS.

[Non-Patent Document 1] Kazufumi Ikeda, Hajime Hattori, Kazunori Matsumoto, Tomohiro Ono, Teruo Higashino, "Twitter Poster Profile Estimation Method for Market Estimation" (マーケット推定のためのTwitter投稿者プロフィール推定手法), DICOMO 2011, 7E-1.
[Non-Patent Document 2] Chubu University, 鈴木肇 (Suzuki), "Least Squares Method" (最小二乗法), [online], [retrieved May 3, 2012], Internet <URL: http://szksrv.isc.chubu.ac.jp/lms/lms1.html>
[Non-Patent Document 3] Wikipedia, "Least Squares Method" (最小二乗法), [online], [retrieved May 3, 2012], Internet <URL: http://ja.wikipedia.org/wiki/%E6%9C%80%E5%B0%8F%E4%BA%8C%E4%B9%97%E6%B3%95>

In the field of marketing, there is generally considered to be a need to know only the ratio of each class type among a plurality of class (attribute) types for an estimation target group that includes a plurality of member objects. For that purpose, each user's profile information must be clearly identified. However, because profile information on an SNS site is generally written by the users themselves, users often deliberately avoid stating their status or interests clearly. For example, a male user may deliberately pose as a woman when posting. For this reason, the technique described in Non-Patent Document 1 extracts characteristic words from the past posts of each poster in the group and estimates each poster's profile information. The ratio of each profile item (class) across all users in the group can then also be estimated.

However, the technique described in Non-Patent Document 1 first makes a definite estimate of the profile information of each user (member object) and then tallies the results. Because the profile information is definitively identified for each user, the required amount of computation grows as the number of users in the estimation target group increases. In addition, because the profile information of each user is forced into a definite value, the accuracy of the estimated ratio of profile items deteriorates when that profile information is ambiguous.

Accordingly, an object of the present invention is to provide a class ratio estimation apparatus, program, and method capable of calculating the ratio of each class type for an estimation target group that includes a plurality of member objects, without exactly identifying the class (attribute) type of each object.

According to the present invention, there is provided a class ratio estimation apparatus that calculates, for a group including a plurality of estimation target member objects i (i = 1 to N), the ratio of each class type j (j = 1 to M) among a plurality of class (attribute) types, wherein
each member object consists of a member identifier i and a feature value f_ki of each element k among a plurality of elements k (k = 1 to K), and
the apparatus comprises:
learning feature storage means for storing, for each class type j, the probability distribution function of the feature values of each element k, obtained from all member objects i (i = 1 to L) of a learning target group, as a learning feature distribution P_kj;
target feature distribution calculation means for calculating, from all the member objects of the estimation target group, the probability distribution function of the feature values of each element k as a target feature distribution P_k;
weight determination means for determining the weight w_j applied to the learning feature distribution P_kj of each class type j so as to minimize, under the condition w_j ≥ 0, the inter-distribution distance d(Σ_j w_j P_kj, P_k) between the weighted sum Σ_j w_j P_kj of the learning feature distributions of the class types j and the target feature distribution P_k; and
class type normalization means for normalizing the determined weights w_j of the class types j to calculate the ratio of each class type j.
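The weight determination and normalization described above can be written compactly as follows. This is only a restatement under one assumption: the source says the weights are "normalized" without giving a formula, so division by the sum of the weights is assumed here, and the symbols \hat{w}_j and r_j are introduced for illustration.

```latex
\hat{w} \;=\; \operatorname*{arg\,min}_{w_1,\dots,w_M \,\ge\, 0}\;
d\!\left(\sum_{j=1}^{M} w_j P_{kj},\; P_k\right),
\qquad
r_j \;=\; \frac{\hat{w}_j}{\sum_{j'=1}^{M} \hat{w}_{j'}} \quad (j = 1,\dots,M)
```

Under this assumption the ratios r_j are non-negative and sum to 1, and no individual member object is ever assigned to a class.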

According to another embodiment of the class ratio estimation apparatus of the present invention, it is also preferable that the weight determination means determines the weight w_j applied to the learning feature distribution P_kj of each class type j so as to minimize, under the condition w_j ≥ 0, an inter-distribution distance defined as the squared error between the weighted sum Σ_j w_j μ_kj of the mean values μ_kj of the learning feature distributions P_kj of each element k calculated for each class type j, and the mean value μ_k of the target feature distribution P_k for each element k:
d(Σ_j w_j P_kj, P_k) = Σ_k (Σ_j w_j μ_kj − μ_k)²
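The squared-error form of the weight determination is a non-negative least-squares problem. The following is a minimal sketch rather than the implementation of the invention: it assumes projected gradient descent as the solver (the source does not name one), assumes that normalization means dividing each weight by the sum of the weights, and uses made-up class means. The function name `estimate_class_ratios` and all numeric values are hypothetical.

```python
from typing import List


def estimate_class_ratios(class_means: List[List[float]],
                          target_mean: List[float],
                          steps: int = 2000) -> List[float]:
    """Estimate class ratios from mean feature vectors.

    class_means[j][k] plays the role of mu_kj and target_mean[k] of mu_k.
    Minimizes sum_k (sum_j w_j * mu_kj - mu_k)^2 subject to w_j >= 0.
    """
    m = len(class_means)
    k_dim = len(target_mean)
    # 2 * (sum of squared entries) bounds the curvature of the objective,
    # so its reciprocal is a safe step size.
    frob = sum(v * v for row in class_means for v in row)
    if frob == 0.0:
        raise ValueError("class means are all zero")
    lr = 1.0 / (2.0 * frob)
    w = [1.0 / m] * m
    for _ in range(steps):
        # residual_k = sum_j w_j * mu_kj - mu_k
        residual = [sum(w[j] * class_means[j][k] for j in range(m)) - target_mean[k]
                    for k in range(k_dim)]
        # d/dw_j of the squared error = 2 * sum_k residual_k * mu_kj
        grad = [2.0 * sum(residual[k] * class_means[j][k] for k in range(k_dim))
                for j in range(m)]
        # Gradient step, then projection onto the constraint w_j >= 0.
        w = [max(0.0, w[j] - lr * grad[j]) for j in range(m)]
    total = sum(w)
    if total == 0.0:
        raise ValueError("all weights are zero; ratios are undefined")
    # Assumed normalization: divide by the sum so the ratios add up to 1.
    return [x / total for x in w]


# Hypothetical class means over (government, smartphone, laundry, cleaning).
male = [0.8, 0.9, 0.2, 0.1]
female = [0.2, 0.6, 0.8, 0.4]
# A target that is an exact 25% / 75% mixture of the two classes.
ratios = estimate_class_ratios([male, female], [0.35, 0.675, 0.65, 0.325])
print(ratios)  # approximately [0.25, 0.75]
```

Because only the K-dimensional mean vectors enter the optimization, the cost of this step does not depend on the number of member objects N, which matches the stated aim of avoiding per-user classification.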

According to another embodiment of the class ratio estimation apparatus of the present invention, it is also preferable that the apparatus further comprises learning feature distribution calculation means that receives a group including a plurality of learning target member objects i (i = 1 to L), calculates, from all the member objects of that group, the probability distribution function of the feature values of each element k for each class type j, and outputs that probability distribution function to the learning feature storage means as the learning feature distribution P_kj.
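The learning step described above can be sketched as follows, assuming that each learning feature distribution is summarized by its mean (as a later embodiment allows) and that every learning member object carries a known class label. The function name `learn_class_means` and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def learn_class_means(features: Sequence[Sequence[float]],
                      labels: Sequence[str]) -> Dict[str, List[float]]:
    """Return, for each class label j, the mean feature vector (mu_kj over k)."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        for k, value in enumerate(vec):
            sums[label][k] += value
        counts[label] += 1
    # Divide each per-class sum by the number of members in that class.
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


# Three labeled learning members with two elements each (made-up data).
means = learn_class_means([[1, 0], [0, 1], [1, 1]], ["male", "female", "male"])
print(means)  # {'male': [1.0, 0.5], 'female': [0.0, 1.0]}
```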

According to another embodiment of the class ratio estimation apparatus of the present invention, it is also preferable that:
the member identifier is a poster identifier;
each member object is user text relating to the member poster corresponding to the poster identifier;
the class type is an attribute type relating to a user profile; and
the element is a keyword contained in the user text.

According to another embodiment of the class ratio estimation apparatus of the present invention, it is also preferable that the apparatus further comprises:
user text acquisition means for acquiring the user text of each member poster belonging to the estimation target group;
keyword extraction means for extracting keywords serving as elements from the user text by morphological analysis; and
feature extraction means for extracting, for each member poster, the count of each keyword serving as an element as a feature value.

According to another embodiment of the class ratio estimation apparatus of the present invention, it is also preferable that the keyword extraction means further extracts the domain name in a URL (Uniform Resource Locator) as a keyword serving as an element.

According to another embodiment of the class ratio estimation apparatus of the present invention, it is also preferable that the keyword extraction means further extracts, as keywords serving as elements, the member identifiers of the members belonging to the estimation target group handled by the user text acquisition means.

According to another embodiment of the class ratio estimation apparatus of the present invention, it is also preferable that the user text is a post sent by a member poster on a communication server and/or the profile text of the member poster.

According to another embodiment of the class ratio estimation apparatus of the present invention, it is also preferable that, when the profile text of a member poster contains a keyword representing a class type j stored in the learning feature storage means, the target feature distribution calculation means uses the probability distribution function P_kj of the learning feature values of each element k corresponding to that class type j in the learning feature storage means, as is, as the feature value f_ki of each element k of that member poster's member object.

According to the present invention, there is provided a class ratio estimation program that causes a computer to calculate, for a group including a plurality of estimation target member objects i (i = 1 to N), the ratio of each class type j (j = 1 to M) among a plurality of class (attribute) types, wherein
each member object consists of a member identifier i and a feature value f_ki of each element k among a plurality of elements k (k = 1 to K), and
the program causes the computer to function as:
learning feature storage means for storing, for each class type j, the probability distribution function of the feature values of each element k, obtained from all member objects i (i = 1 to L) of a learning target group, as a learning feature distribution P_kj;
target feature distribution calculation means for calculating, from all the member objects of the estimation target group, the probability distribution function of the feature values of each element k as a target feature distribution P_k;
weight determination means for determining the weight w_j applied to the learning feature distribution P_kj of each class type j so as to minimize, under the condition w_j ≥ 0, the inter-distribution distance d(Σ_j w_j P_kj, P_k) between the weighted sum Σ_j w_j P_kj of the learning feature distributions of the class types j and the target feature distribution P_k; and
class type normalization means for normalizing the determined weights w_j of the class types j to calculate the ratio of each class type j.

According to the present invention, there is provided a class ratio estimation method for an apparatus that calculates, for a group including a plurality of estimation target member objects i (i = 1 to N), the ratio of each class type j (j = 1 to M) among a plurality of class (attribute) types, wherein
each member object consists of a member identifier i and a feature value f_ki of each element k among a plurality of elements k (k = 1 to K),
the apparatus has a learning feature storage unit that stores, for each class type j, the probability distribution function of the feature values of each element k, obtained from all member objects i (i = 1 to L) of a learning target group, as a learning feature distribution P_kj, and
the apparatus executes:
a first step of calculating, from all the member objects of the estimation target group, the probability distribution function of the feature values of each element k as a target feature distribution P_k;
a second step of determining the weight w_j applied to the learning feature distribution P_kj of each class type j so as to minimize, under the condition w_j ≥ 0, the inter-distribution distance d(Σ_j w_j P_kj, P_k) between the weighted sum Σ_j w_j P_kj of the learning feature distributions of the class types j and the target feature distribution P_k; and
a third step of normalizing the determined weights w_j of the class types j to calculate the ratio of each class type j.

According to the class ratio estimation apparatus, program, and method of the present invention, the ratio of each class type can be calculated for an estimation target group that includes a plurality of member objects, without identifying the class (attribute) type of each object.

FIG. 1 is a system configuration diagram according to the present invention.
FIG. 2 is a functional configuration diagram of the class ratio estimation apparatus according to the present invention.
FIG. 3 is an explanatory diagram showing the feature value f_ki of each element k (k = 1 to K) for each member identifier i, and the probability distribution density P_k given as the mean value μ.
FIG. 4 is an explanatory diagram showing the probability distribution functions P_kj stored in the learning feature storage unit.
FIG. 5 is an explanatory diagram showing a comparison between the probability distribution function P_kj of the learning feature values and the probability distribution function P_k of the target feature values.
FIG. 6 is an explanatory diagram showing the processing performed by the learning feature distribution calculation unit.

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 is a system configuration diagram according to the present invention.

According to FIG. 1, an unspecified large number of third parties can each use their own terminal 3 to send posts to the communication site server 2 via the Internet. In the following, the communication site server 2 is described as being, for example, an SNS site server. It is of course not limited to an SNS site server, but it must form groups in which a plurality of posters send and subscribe to one another's comments, and it must publish the posts of each poster.

According to FIG. 1, the class ratio estimation apparatus 1 of the present invention communicates with the SNS site server 2 via the Internet. Through an API (Application Programming Interface), the class ratio estimation apparatus 1 can acquire from the SNS site server 2 a poster's social relationship list (group information) and the posts of each poster. It can also acquire the profile text of each poster. An API is an interface convention for using the functions of an application service, and a different one is provided by each server. The class ratio estimation apparatus 1 then calculates, for an estimation target group including a plurality of member objects i (i = 1 to N), the ratio of each class type j (j = 1 to M) among a plurality of class (attribute) types.

In the following, for concreteness, the mutually exclusive class types are described as the two class types "male" and "female". The member objects are assumed to be user texts on the Twitter site. The class ratio estimation apparatus 1 then estimates the class ratio (for example, the male-to-female ratio) of the whole group from the user texts of a large number of member objects.

FIG. 2 is a functional configuration diagram of the class ratio estimation apparatus according to the present invention.

The class ratio estimation apparatus 1 has a communication interface unit 10 and a user interface unit 11. The communication interface unit 10 communicates with the SNS site server 2 via, for example, the Internet. The user interface unit 11 can present the ratio of class types in the group to the user.

As <estimation functional units>, the class ratio estimation apparatus 1 of the present invention has a user text acquisition unit 21, a keyword extraction unit 22, a feature extraction unit 23, a target feature distribution calculation unit 24, a weight determination unit 25, a class type normalization unit 26, and a learning feature storage unit 30. As <learning functional units>, the class ratio estimation apparatus 1 further has a learning text acquisition unit 31 and a learning feature distribution calculation unit 34. These functional units are realized by executing a program that causes a computer installed in the apparatus to function accordingly.

<Estimation functional units>
[User text acquisition unit 21]
The user text acquisition unit 21 acquires, via the communication interface unit 10, the member object of each member identifier belonging to the estimation target group. For example, it acquires from the SNS site server 2 the user texts (including profile text; these are the member objects) of a plurality of member posters belonging to the group for the estimation target person. A poster is identified by a poster identifier (member identifier). The group for the estimation target person may be extracted from a social relationship list (follow/follower relationships) acquired from the SNS site server 2.

Here, a member poster on the SNS site server may be a public poster acting in a publicity capacity. Such a member poster has, for example, a member identifier that serves as a corporate publicity account and is followed by a large number of users. That is, for a publicity account, the ratio of the number of followers to the number of accounts it follows is large. Consequently, if the user text acquisition unit 21 collects, for a publicity account, the user texts of all the users who follow it, the estimation accuracy of the resulting class ratio is degraded. It is therefore also preferable that the user text acquisition unit 21 removes a user from the estimation targets when the ratio of that user's follower count to that user's follow count exceeds a predetermined threshold.
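The publicity-account filter can be sketched as a simple ratio test. This is an illustrative sketch only: the source specifies a "predetermined threshold" without giving a value, so the default of 10.0 is a hypothetical choice, as are the function name `is_publicity_account` and the handling of accounts that follow nobody.

```python
def is_publicity_account(followers: int, following: int,
                         threshold: float = 10.0) -> bool:
    """Return True when followers / following exceeds the threshold.

    The threshold value is hypothetical; the source only says "predetermined".
    """
    if following == 0:
        # Assumption: an account that follows nobody but has followers is
        # treated as having an unbounded ratio.
        return followers > 0
    return followers / following > threshold


accounts = {"corp_pr": (50000, 20), "alice": (150, 200)}
# Keep only accounts that are not treated as publicity accounts.
targets = [name for name, (fw, fg) in accounts.items()
           if not is_publicity_account(fw, fg)]
print(targets)  # ['alice']
```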

Note that the class ratio estimation apparatus 1 may instead hold the user texts of each group in a database in advance. In that case, the class ratio estimation apparatus 1 does not need to include the user text acquisition unit 21. In other words, communication between the class ratio estimation apparatus 1 and the SNS site server 2 is not essential.

[Keyword extraction unit 22]
The keyword extraction unit 22 extracts keywords serving as elements from the user text by morphological analysis. The keyword extraction unit 22 uses morphological analysis to split each collected user text into morphemes. A "morpheme" is the smallest meaningful unit among the constituents of a sentence. The keywords (morphemes) extracted in the present invention are limited to nouns, which serve as the elements of the feature values described later. The keywords for each poster identifier (member identifier) are output to the feature extraction unit 23.

As another embodiment, the user text may contain the URL (Uniform Resource Locator) of a website. It is therefore also preferable that the keyword extraction unit 22 further extracts the domain name in the URL as a keyword (morpheme) serving as an element. This is because, depending on their attributes, users tend to access the domains of particular URLs. Using the domain name as a feature element thus increases the number of elements available as evidence for estimating the class ratio.
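Domain-name extraction of this kind can be sketched with the Python standard library. The sketch assumes that URLs in the user text start with http:// or https:// and are delimited by whitespace; the regular expression and the function name `extract_domains` are illustrative and not taken from the source.

```python
import re
from typing import List
from urllib.parse import urlparse

# Assumption: a URL runs from its scheme up to the next whitespace character.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_domains(text: str) -> List[str]:
    """Return the host name of every URL found in the text, in order."""
    domains = []
    for url in URL_PATTERN.findall(text):
        host = urlparse(url).hostname  # lower-cased host, or None
        if host:
            domains.append(host)
    return domains


post = "new phone review http://example.com/a?b=1 and https://News.Example.org/x"
print(extract_domains(post))  # ['example.com', 'news.example.org']
```

Each returned host name can then be counted like any other keyword. Text in which a URL is immediately followed by non-whitespace characters would need a stricter pattern.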

As yet another embodiment, the user text acquisition unit 21 may be able to acquire the member identifiers of followed users from the social relationship list acquired from the SNS site server 2. It is therefore also preferable that the keyword extraction unit 22 further extracts, as keywords serving as elements, the member identifiers of the members belonging to the estimation target group handled by the user text acquisition unit 21. This is because, depending on their attributes, users tend to follow particular other users. Using the member identifiers of followed users as feature elements thus increases the number of elements available as evidence for estimating the class ratio.

[Feature extraction unit 23]
The feature extraction unit 23 extracts, for each member poster, the count of each keyword serving as an element, from the keywords contained in the user text, as a feature value. Here, a "member object T_i" consists of a member identifier i (i = 1 to N) and the feature value f_ki of each element k among a plurality of elements k (k = 1 to K). For each member object, that is, for each member identifier, the occurrences of each keyword serving as an element in the user text (and profile text) posted in the past by that member poster are counted.
T_i: object
f_Ti: feature value (vector) of object T_i
N: total number of member objects in the estimation target group
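The counting step can be sketched as follows. The sketch assumes that morphological analysis has already reduced each member's text to a list of noun keywords (no specific analyzer is assumed) and that a fixed vocabulary defines the K elements. The function name `build_feature_vectors`, the member identifiers, and the sample keywords are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def build_feature_vectors(member_keywords: Dict[str, Sequence[str]],
                          vocabulary: Sequence[str]) -> Dict[str, List[int]]:
    """Map each member identifier i to its feature vector (f_ki for k = 1..K)."""
    vectors = {}
    for member_id, keywords in member_keywords.items():
        counts = Counter(keywords)
        # A Counter returns 0 for keywords the member never used.
        vectors[member_id] = [counts[word] for word in vocabulary]
    return vectors


vocabulary = ["government", "smartphone", "laundry", "cleaning"]
keywords = {"m1": ["government", "smartphone", "smartphone"],
            "m2": ["laundry"]}
print(build_feature_vectors(keywords, vocabulary))
# {'m1': [1, 2, 0, 0], 'm2': [0, 0, 1, 0]}
```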

[Target feature distribution calculation unit 24]
The target feature distribution calculation unit 24 calculates the probability distribution function P_k of each element k from all the member objects of the estimation target group. The probability distribution density P_k may be, for example, the mean value μ.
μ = (1/N) · Σ_Ti f_Ti
μ: mean value

FIG. 3 is an explanatory diagram showing the feature value f_ki of each element k (k = 1 to K) for each member identifier i, and the probability distribution density P_k given as the mean value μ.

According to FIG. 3, the user text of each member contained the following keywords. Each keyword corresponds to one element.
The text of member L₁ contains "government" and "smartphone".
The text of member L₂ contains "laundry".
The text of member L₃ contains "government" and "smartphone".
The text of member L₄ contains "smartphone", "laundry", and "cleaning".
Also according to FIG. 3, the mean value μ is calculated as the probability distribution density P_k of each element k over all the member objects of the estimation target group.
f_k = (government, smartphone, laundry, cleaning, ...)
    = (0.5, 0.75, 0.5, 0.25, ...)
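The mean values in the FIG. 3 example can be reproduced with a few lines of code. The sketch assumes that each listed keyword occurs once in the member's text, so every feature value is 0 or 1; the function name `mean_feature_vector` is illustrative.

```python
from typing import List, Sequence


def mean_feature_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Compute mu = (1/N) * sum of the N member feature vectors."""
    n = len(vectors)
    k_dim = len(vectors[0])
    return [sum(vec[k] for vec in vectors) / n for k in range(k_dim)]


# Elements: (government, smartphone, laundry, cleaning)
members = [
    [1, 1, 0, 0],  # L1: government, smartphone
    [0, 0, 1, 0],  # L2: laundry
    [1, 1, 0, 0],  # L3: government, smartphone
    [0, 1, 1, 1],  # L4: smartphone, laundry, cleaning
]
print(mean_feature_vector(members))  # [0.5, 0.75, 0.5, 0.25]
```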

In another embodiment, when the profile sentence of a member poster contains a keyword representing a class type j stored in the learning feature amount storage unit 30, the target feature amount distribution calculation unit 24 preferably uses the probability distribution function P_kj of the learning feature amount of each element k for that class type j, as it is, as the feature amount f_ki of each element k of that member poster's member object. For example, if the profile sentence in the user text states "male", the learning feature amount storage unit 30 already holds a learning feature amount for the class type "male". In that case, the learning feature amount distribution of the class type "male" is taken from the learning feature amount storage unit 30 and used directly as the target feature amount distribution of that member poster. The user attribute stated in the profile sentence is thereby carried over unchanged into the target feature amount distribution.

[Learning feature amount storage unit 30]
The learning feature amount storage unit 30 stores, for each class type j, the probability distribution function P_kj of the learning feature amount of each element k.

FIG. 4 is an explanatory diagram showing the probability distribution functions P_kj stored in the learning feature amount storage unit.

In FIG. 4, the class types j are "male" and "female". For each class type j, the mean value μ (probability distribution P_kj) of each element k is stored in advance.
P(male) = (government, smartphone, laundry, cleaning, ...)
        = (1, 1, 0, 0, ...)
P(female) = (government, smartphone, laundry, cleaning, ...)
          = (0, 0.5, 1, 0.5, ...)
That is, a "class type" is an attribute type relating to the user profile. The information stored in the learning feature amount storage unit 30 may be preset as predetermined values, or may be output from the learning feature amount distribution calculation unit 34 described later.

[Weight determination unit 25]
The weight determination unit 25 determines the weight w_j assigned to the learning feature amount distribution P_kj of each class type j so as to minimize, under the condition w_j ≧ 0, the inter-distribution distance d(Σ_j w_j P_kj, P_k) between the weighted sum Σ_j w_j P_kj of the learning feature amount distributions of the class types j and the target feature amount distribution P_k.
Minimize d(Σ_j w_j P_kj, P_k) under the condition w_j ≧ 0
Σ_j w_j P_kj: weighted sum of the learning feature amount distributions of the class types j
P_k: target feature amount distribution
d(Σ_j w_j P_kj, P_k): inter-distribution distance

FIG. 5 is an explanatory diagram comparing the probability distribution functions P_kj of the learning feature amounts with the probability distribution function P_k of the target feature amount.

When there are two class types j (j = 1, 2), the weight w_j assigned to the learning feature amount distribution of each class type j can be calculated by the least squares method (see, for example, Non-Patent Document 2). Specifically, the weight determination unit 25 defines the inter-distribution distance as the squared error between the weighted sum Σ_j w_j μ_kj of the mean values μ_kj of the learning feature amount distributions P_kj of each element k, calculated for each class type j, and the mean value μ_k of the target feature amount distribution P_k for each element k, that is, d(Σ_j w_j P_kj, P_k) = Σ_k (Σ_j w_j μ_kj − μ_k)². It then determines the weight w_j assigned to the learning feature amount of each class type j so as to minimize this distance under the condition w_j ≧ 0.
Minimize Σ_k (Σ_j w_j μ_kj − μ_k)² under the condition w_j ≧ 0
Σ_j w_j μ_kj: weighted sum of the mean values μ_kj of the learning feature amount distributions P_kj of the class types j
μ_k: mean value of the target feature amount distribution P_k

Specifically, the squared error S is calculated by the following equation, and the weights w_j that minimize S under the condition w_j ≧ 0 are calculated.
S = (w_c1 · μ_{1,c1} + w_c2 · μ_{1,c2} − μ_1)²
  + (w_c1 · μ_{2,c1} + w_c2 · μ_{2,c2} − μ_2)²
  + (w_c1 · μ_{3,c1} + w_c2 · μ_{3,c2} − μ_3)²
  + (w_c1 · μ_{4,c1} + w_c2 · μ_{4,c2} − μ_4)²
c1: class type "male"
c2: class type "female"
k = 1 to 4: (government, smartphone, laundry, cleaning)
w_c1: weight of class type "male"
w_c2: weight of class type "female"
μ_{1,c1}: mean learning feature amount of class type "male" for "government"
μ_{2,c1}: mean learning feature amount of class type "male" for "smartphone"
μ_{3,c1}: mean learning feature amount of class type "male" for "laundry"
μ_{4,c1}: mean learning feature amount of class type "male" for "cleaning"
μ_{1,c2}: mean learning feature amount of class type "female" for "government"
μ_{2,c2}: mean learning feature amount of class type "female" for "smartphone"
μ_{3,c2}: mean learning feature amount of class type "female" for "laundry"
μ_{4,c2}: mean learning feature amount of class type "female" for "cleaning"
μ_1: mean target feature amount for "government"
μ_2: mean target feature amount for "smartphone"
μ_3: mean target feature amount for "laundry"
μ_4: mean target feature amount for "cleaning"

Next, the constants A to F in the following equation are calculated from the squared error S above, writing w₁ = w_c1 and w₂ = w_c2.
S = A + Bw₁² − 2Cw₁ − 2Dw₂ + 2Ew₁w₂ + Fw₂²
Setting the partial derivatives of S with respect to w₁ and w₂ to zero gives Bw₁ + Ew₂ = C and Ew₁ + Fw₂ = D, so w₁ and w₂ are calculated as follows.
w₁ = (FC − DE) / (FB − E²) = 0.5
w₂ = (BD − CE) / (FB − E²) = 0.5
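The two-class calculation can be checked numerically with the FIG. 3 and FIG. 4 values. The following Python sketch is an illustration only: it expands S into the constants B to F, then solves the two linear equations obtained by setting the partial derivatives of S with respect to w₁ and w₂ to zero (Bw₁ + Ew₂ = C and Ew₁ + Fw₂ = D). It returns the unconstrained stationary point, which is the constrained minimum only when both weights come out non-negative, as they do here. The function name `two_class_weights` is hypothetical.

```python
def two_class_weights(mu_c1, mu_c2, mu):
    """Least-squares weights (w1, w2) for two classes.

    mu_c1, mu_c2: mean learning feature amounts of classes c1 and c2.
    mu: mean target feature amounts of the estimation target group.
    """
    B = sum(a * a for a in mu_c1)               # coefficient of w1^2
    F = sum(b * b for b in mu_c2)               # coefficient of w2^2
    E = sum(a * b for a, b in zip(mu_c1, mu_c2))  # cross term
    C = sum(a * m for a, m in zip(mu_c1, mu))   # linear term of w1
    D = sum(b * m for b, m in zip(mu_c2, mu))   # linear term of w2
    det = B * F - E * E
    if det == 0:
        raise ValueError("class means are linearly dependent")
    # Cramer's rule for  B*w1 + E*w2 = C,  E*w1 + F*w2 = D.
    w1 = (C * F - D * E) / det
    w2 = (B * D - C * E) / det
    return w1, w2


male = [1.0, 1.0, 0.0, 0.0]      # P(male) from FIG. 4
female = [0.0, 0.5, 1.0, 0.5]    # P(female) from FIG. 4
target = [0.5, 0.75, 0.5, 0.25]  # mean values from FIG. 3
print(two_class_weights(male, female, target))  # (0.5, 0.5)
```

With these inputs B = 2, F = 1.5, E = 0.5, C = 1.25, and D = 1, so both weights equal 1.375 / 2.75 = 0.5.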

When there are three or more classes (the multidimensional case), the weights can be calculated, for example, by the method described in Non-Patent Document 3.
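For more than two classes, one generic way to minimize the same squared error under w_j ≧ 0 is projected coordinate descent. The sketch below is not the method of Non-Patent Document 3, which is not reproduced in this text; it is a substitute technique shown only to make the constrained problem concrete. The function name `nonnegative_weights`, the third class "student", and its mean values are hypothetical.

```python
def nonnegative_weights(class_means, target, sweeps=500):
    """Minimize sum_k (sum_j w_j * mu_kj - mu_k)^2 subject to w_j >= 0.

    class_means: one mean vector (mu_1j, ..., mu_Kj) per class type j.
    target: mean vector (mu_1, ..., mu_K) of the estimation target group.
    """
    m = len(class_means)
    # Gram matrix G[i][j] = sum_k mu_ki * mu_kj and right-hand side b[j].
    gram = [[sum(a * b for a, b in zip(class_means[i], class_means[j]))
             for j in range(m)] for i in range(m)]
    rhs = [sum(a * t for a, t in zip(class_means[j], target)) for j in range(m)]
    w = [0.0] * m
    for _ in range(sweeps):
        for j in range(m):
            if gram[j][j] == 0.0:
                continue  # an all-zero class mean cannot explain anything
            rest = sum(gram[j][i] * w[i] for i in range(m) if i != j)
            # Exact minimizer along coordinate j, clipped at zero.
            w[j] = max(0.0, (rhs[j] - rest) / gram[j][j])
    return w


means = [
    [1.0, 1.0, 0.0, 0.0],  # "male" (FIG. 4)
    [0.0, 0.5, 1.0, 0.5],  # "female" (FIG. 4)
    [0.0, 1.0, 0.0, 0.0],  # "student" (hypothetical third class)
]
weights = nonnegative_weights(means, [0.5, 0.75, 0.5, 0.25])
print([round(x, 6) for x in weights])
```

For the FIG. 3 target the weights converge to approximately 0.5, 0.5, and 0, so the hypothetical third class receives no share.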

[Class type normalization unit 26]
The class type normalization unit 26 normalizes the determined weight w_j of each class type j and calculates the ratio of each class type j. Normalization here means a proportional conversion to values between 0 and 1 so that the class types can be compared with one another.
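A minimal Python sketch of this normalization is shown below. It assumes that the proportional conversion divides each weight by the sum of all weights, so that the ratios add up to 1; the text only requires a proportional conversion into the range 0 to 1, so this choice, the function name `normalize_weights`, and the example weights are assumptions.

```python
def normalize_weights(weights):
    """Convert non-negative class weights w_j into ratios that sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {class_type: w / total for class_type, w in weights.items()}


# Hypothetical weights that do not already sum to 1.
print(normalize_weights({"male": 1.5, "female": 0.5}))
# {'male': 0.75, 'female': 0.25}
```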

<Functional units for learning>
[Learning text acquisition unit 31]
The learning text acquisition unit 31 acquires, from the user via the user interface unit 11, a group containing a plurality of learning-target member objects i (i = 1 to L). The acquired member objects are output to the learning feature amount distribution calculation unit 34.

[Learning feature amount distribution calculation unit 34]
The learning feature amount distribution calculation unit 34 calculates, from all member objects of the group, the probability distribution function P_ki of the learning feature amount of each element k for each class type j, and outputs it to the learning feature amount storage unit 30 as the probability distribution function P_kj.

FIG. 6 is an explanatory diagram showing the processing performed by the learning feature amount distribution calculation unit.

FIG. 6 shows a plurality of learning member objects whose class types are known: two with the class type "male" and two with the class type "female". Here, the mean value of the learning feature amount of each element k is calculated for each class type. In general, the more the number of feature elements K (the number of dimensions) exceeds the number of classes, the more closely the feature amounts of each member object can be expected to cluster around the feature amounts of its class type. The class ratio calculated by the present invention can therefore be an approximation sufficiently close to the true value.
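The per-class averaging can be sketched in Python as follows. The four labeled feature vectors are hypothetical training data chosen so that the result matches the FIG. 4 values; the actual contents of FIG. 6 are not reproduced in the text. The function name `class_means` is also hypothetical.

```python
from collections import defaultdict


def class_means(labeled_vectors):
    """Mean learning feature amount of each element k, per class type j."""
    grouped = defaultdict(list)
    for class_type, vector in labeled_vectors:
        grouped[class_type].append(vector)
    return {
        class_type: [sum(column) / len(vectors) for column in zip(*vectors)]
        for class_type, vectors in grouped.items()
    }


# Elements: (government, smartphone, laundry, cleaning); labels are known.
training = [
    ("male", [1, 1, 0, 0]),
    ("male", [1, 1, 0, 0]),
    ("female", [0, 0, 1, 0]),
    ("female", [0, 1, 1, 1]),
]
print(class_means(training))
# {'male': [1.0, 1.0, 0.0, 0.0], 'female': [0.0, 0.5, 1.0, 0.5]}
```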

In the embodiment described above, the class type is derived from the user text of each user by referring to a single learning feature amount storage unit 30. In practice, heavy posters who have posted many user texts are mixed with light posters who have posted only a few. For example, light posters with 100 or fewer past posts and heavy posters with 100 or more differ in the magnitude |f_Ti| of their feature amounts, that is, in the distribution tendency of their feature amounts. Estimating both together from a single learning feature amount storage unit 30 may therefore reduce accuracy. To make class estimation fit each user as closely as possible, it is also preferable to provide separate learning feature amount storage units for heavy posters and for light posters.

As described in detail above, the class ratio estimation apparatus, program, and method of the present invention can calculate the ratio of each class type for an estimation target group containing a plurality of member objects without having to identify the class (attribute) type of each individual object accurately. In particular, the present invention can efficiently estimate the unknown class ratio of a group of a large number of ambiguous member objects.

As a service, the class ratio estimation apparatus of the present invention can be used for social media analysis of an estimation target group. The prior art could only estimate negative/positive sentiment from members' posts. The present invention enables real-time estimation by class type such as age group, gender, and occupation, making it possible to recognize the voice of the market currently talking about a given product or service. Such estimation results are not only made viewable on a website but are also provided as a web tool for various applications.

Various changes, modifications, and omissions within the scope of the technical idea and viewpoint of the present invention can easily be made by those skilled in the art to the embodiments described above. The foregoing description is merely an example and is not intended to be limiting. The present invention is limited only as defined by the claims and their equivalents.

DESCRIPTION OF SYMBOLS
1 Class ratio estimation apparatus
10 Communication interface unit
11 User interface unit
21 User text acquisition unit
22 Keyword extraction unit
23 Feature amount extraction unit
24 Target feature amount distribution calculation unit
25 Weight determination unit
26 Class type normalization unit
30 Learning feature amount storage unit
31 Learning text acquisition unit
34 Learning feature amount distribution calculation unit
2 Communication site server, SNS site server
3 Terminal

Claims

1. A class ratio estimation apparatus that calculates, for an estimation target group including a plurality of member objects i (i = 1 to N), the ratio of each class type j (j = 1 to M) among a plurality of class (attribute) types, wherein
each member object includes a member identifier i and a feature amount f_ki of each of a plurality of elements k (k = 1 to K), the apparatus comprising:
learning feature amount storage means for storing, for each class type j, the probability distribution function of the feature amount of each element k obtained from all member objects i (i = 1 to L) of a learning target group, as a learning feature amount distribution P_kj;
target feature amount distribution calculating means for calculating, from all member objects of the estimation target group, the probability distribution function of the feature amount of each element k as a target feature amount distribution P_k;
weight determining means for determining the weight w_j assigned to the learning feature amount distribution P_kj of each class type j so as to minimize, under the condition w_j ≧ 0, the inter-distribution distance d(Σ_j w_j P_kj, P_k) between the weighted sum Σ_j w_j P_kj of the learning feature amount distributions P_kj of the class types j and the target feature amount distribution P_k; and
class type normalizing means for normalizing the determined weight w_j of each class type j and calculating the ratio of each class type j.

2. The class ratio estimation apparatus according to claim 1, wherein the weight determining means determines the weight w_j assigned to the learning feature amount distribution P_kj of each class type j so as to minimize, under the condition w_j ≧ 0, the inter-distribution distance d(Σ_j w_j P_kj, P_k) = Σ_k (Σ_j w_j μ_kj − μ_k)², defined as the squared error between the weighted sum Σ_j w_j μ_kj of the mean values μ_kj of the learning feature amount distributions P_kj of each element k calculated for each class type j and the mean value μ_k of the target feature amount distribution P_k for each element k.

3. The class ratio estimation apparatus according to claim 1, further comprising learning feature amount distribution calculating means that receives a group including a plurality of learning-target member objects i (i = 1 to L), calculates, from all member objects of the group, the probability distribution function P_ki of the feature amount of each element k for each class type j, and outputs that probability distribution function to the learning feature amount storage means as the learning feature amount distribution P_kj.

4. The class ratio estimation apparatus according to claim 1, wherein
the member identifier is a poster identifier,
each member object is user text relating to the member poster corresponding to the poster identifier,
the class type is an attribute type relating to a user profile, and
the element is a keyword included in the user text.

5. The class ratio estimation apparatus according to claim 4, further comprising:
user text acquisition means for acquiring the user text of each member poster belonging to the estimation target group;
keyword extracting means for extracting keywords as elements from the user text by morphological analysis; and
feature amount extracting means for extracting, for each member poster, the count of each keyword (element) as a feature amount.

6. The class ratio estimation apparatus according to claim 5, wherein the keyword extracting means further extracts, as a keyword serving as an element, a domain name in a URL (Uniform Resource Locator).

7. The class ratio estimation apparatus according to claim 5, wherein the keyword extracting means further extracts, as a keyword serving as an element, the member identifier of each member belonging to the estimation target group in the user text acquisition means.

8. The class ratio estimation apparatus according to any one of claims 5 to 7, wherein the user text is a posted message sent by a member poster and/or a profile sentence of a member poster on the communication server.

9. When the profile sentence of a member poster includes a keyword representing a class type j stored in the learning feature amount storage means, the target feature amount distribution calculating means uses the learning feature amount distribution P_kj of each element k corresponding to that class type j in the learning feature amount storage means, as it is, as the feature amount f_ki of each element k of the member object of that member poster. The class ratio estimation apparatus described in 1.

10. A class ratio estimation program that causes a computer to calculate, for an estimation target group including a plurality of member objects i (i = 1 to N), the ratio of each class type j (j = 1 to M) among a plurality of class (attribute) types, wherein
each member object includes a member identifier i and a feature amount f_ki of each of a plurality of elements k (k = 1 to K),
the program causing the computer to function as:
learning feature amount storage means for storing, for each class type j, the probability distribution function of the feature amount of each element k obtained from all member objects i (i = 1 to L) of a learning target group, as a learning feature amount distribution P_kj;
target feature amount distribution calculating means for calculating, from all member objects of the estimation target group, the probability distribution function of the feature amount of each element k as a target feature amount distribution P_k;
weight determining means for determining the weight w_j assigned to the learning feature amount distribution P_kj of each class type j so as to minimize, under the condition w_j ≧ 0, the inter-distribution distance d(Σ_j w_j P_kj, P_k) between the weighted sum Σ_j w_j P_kj of the learning feature amount distributions P_kj of the class types j and the target feature amount distribution P_k; and
class type normalizing means for normalizing the determined weight w_j of each class type j and calculating the ratio of each class type j.

11. A class ratio estimation method for an apparatus that calculates, for an estimation target group including a plurality of member objects i (i = 1 to N), the ratio of each class type j (j = 1 to M) among a plurality of class (attribute) types, wherein
each member object includes a member identifier i and a feature amount f_ki of each of a plurality of elements k (k = 1 to K),
the apparatus has a learning feature amount storage unit that stores, for each class type j, the probability distribution function of the feature amount of each element k obtained from all member objects i (i = 1 to L) of a learning target group, as a learning feature amount distribution P_kj, and
the apparatus performs:
a first step of calculating, from all member objects of the estimation target group, the probability distribution function of the feature amount of each element k as a target feature amount distribution P_k;
a second step of determining the weight w_j assigned to the learning feature amount distribution P_kj of each class type j so as to minimize, under the condition w_j ≧ 0, the inter-distribution distance d(Σ_j w_j P_kj, P_k) between the weighted sum Σ_j w_j P_kj of the learning feature amount distributions P_kj of the class types j and the target feature amount distribution P_k; and
a third step of normalizing the determined weight w_j of each class type j and calculating the ratio of each class type j.